Author: Tayyaba Syed

  • Linguistic Resources

    In this chapter, we will learn about the linguistic resources used in Natural Language Processing. Corpus − A corpus is a large and structured set of machine-readable texts that have been produced in a natural communicative setting; its plural is corpora. Corpora can be derived in different ways, such as text that was originally electronic, transcripts of spoken…
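
    To make the idea of a machine-readable corpus concrete, here is a minimal sketch that loads one, assuming NLTK and its Brown corpus are available; the specific corpus is an illustrative choice, not a resource prescribed by this chapter:

        # Minimal sketch: accessing a corpus with NLTK (assumes nltk is installed;
        # the Brown corpus is downloaded on first use and is only an example resource).
        import nltk
        from nltk.corpus import brown

        nltk.download('brown', quiet=True)          # fetch the corpus data if not already present
        print(brown.categories()[:5])               # the corpus is organised into genres/categories
        print(brown.words(categories='news')[:10])  # machine-readable tokens from the 'news' section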

  • Introduction

    Language is a method of communication with the help of which we can speak, read and write. For example, we think, make decisions, and plan in natural language; precisely, in words. However, the big question that confronts us in this AI era is whether we can communicate in a similar manner with computers…

  • Natural Language Processing Tutorial

    Language is a method of communication with the help of which we can speak, read and write. Natural Language Processing (NLP) is a subfield of Computer Science dealing with Artificial Intelligence (AI) that enables computers to understand and process human language. Audience − This tutorial is designed to benefit graduates, postgraduates, and research students who…

  • Dimensionality Reduction using PCA

    Dimensionality reduction, an unsupervised machine learning method, is used to reduce the number of feature variables for each data sample by selecting a set of principal features. Principal Component Analysis (PCA) is one of the popular algorithms for dimensionality reduction. Exact PCA − Principal Component Analysis (PCA) is used for linear dimensionality reduction using Singular Value Decomposition (SVD) of the data…
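
    As a minimal sketch of what exact PCA looks like in scikit-learn, the snippet below reduces the Iris data to two components; the dataset and the component count are illustrative assumptions, not choices mandated by the chapter:

        # Minimal sketch: exact PCA in scikit-learn (dataset and n_components are illustrative).
        from sklearn.datasets import load_iris
        from sklearn.decomposition import PCA

        X, _ = load_iris(return_X_y=True)
        pca = PCA(n_components=2)          # reduce 4 feature variables to 2 principal components
        X_reduced = pca.fit_transform(X)   # internally uses SVD of the centred data
        print(X_reduced.shape)             # (150, 2)
        print(pca.explained_variance_ratio_)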

  • Clustering Performance Evaluation

    There are various functions with the help of which we can evaluate the performance of clustering algorithms. Following are some important and most commonly used functions provided by Scikit-learn for evaluating clustering performance − Adjusted Rand Index − The Rand Index is a function that computes a similarity measure between two clusterings. For this computation, the Rand Index…
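
    As a minimal sketch of the Adjusted Rand Index, the snippet below compares two made-up label assignments; the label lists are purely illustrative:

        # Minimal sketch: Adjusted Rand Index with scikit-learn (labels are made-up examples).
        from sklearn.metrics import adjusted_rand_score

        labels_true = [0, 0, 1, 1, 2, 2]
        labels_pred = [1, 1, 0, 0, 2, 2]   # same grouping, different cluster ids
        print(adjusted_rand_score(labels_true, labels_pred))  # 1.0, since the score ignores label permutations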

  • Clustering Methods

    Here, we will study the clustering methods in Sklearn, which help in identifying any similarity in the data samples. Clustering methods, one of the most useful unsupervised ML methods, are used to find similarity and relationship patterns among data samples. After that, they cluster those samples into groups having similarity based on features…
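
    As one example of such a clustering method, the sketch below runs KMeans on synthetic blob data; both the data and the number of clusters are illustrative assumptions:

        # Minimal sketch: KMeans clustering in scikit-learn (synthetic data, illustrative k).
        from sklearn.datasets import make_blobs
        from sklearn.cluster import KMeans

        X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
        km = KMeans(n_clusters=3, n_init=10, random_state=0)
        labels = km.fit_predict(X)        # group samples by feature similarity
        print(km.cluster_centers_.shape)  # (3, 2)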

  • Boosting Methods

    In this chapter, we will learn about the boosting methods in Sklearn, which enable building an ensemble model. Boosting methods build an ensemble model in an incremental way. The main principle is to build the model incrementally by training each base model estimator sequentially. In order to build a powerful ensemble, these methods basically combine several weak…
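
    As a minimal sketch of a boosting ensemble, the snippet below uses AdaBoost on the breast-cancer dataset; the particular estimator, dataset, and estimator count are illustrative assumptions rather than the chapter's prescribed setup:

        # Minimal sketch: a boosting ensemble in scikit-learn (AdaBoost chosen as one example).
        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import train_test_split

        X, y = load_breast_cancer(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = AdaBoostClassifier(n_estimators=50, random_state=0)  # weak learners trained sequentially
        clf.fit(X_tr, y_tr)
        print(clf.score(X_te, y_te))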

  • Randomized Decision Trees

    This chapter will help you in understanding randomized decision trees in Sklearn. Randomized Decision Tree algorithms − As we know, a DT is usually trained by recursively splitting the data, but, being prone to overfitting, decision trees have been transformed into random forests by training many trees over various subsamples of the data. The sklearn.ensemble module has…
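
    As a minimal sketch of a randomized tree ensemble from sklearn.ensemble, the snippet below trains a random forest on the Iris data; the dataset and the number of trees are illustrative assumptions:

        # Minimal sketch: a random forest from sklearn.ensemble (dataset and n_estimators illustrative).
        from sklearn.datasets import load_iris
        from sklearn.ensemble import RandomForestClassifier

        X, y = load_iris(return_X_y=True)
        rf = RandomForestClassifier(n_estimators=100, random_state=0)  # many trees over random subsamples
        rf.fit(X, y)
        print(rf.score(X, y))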

  • Decision Trees

    In this chapter, we will learn about the learning method in Sklearn termed decision trees. Decision trees (DTs) are a powerful non-parametric supervised learning method. They can be used for classification and regression tasks. The main goal of DTs is to create a model predicting the target variable's value by learning simple…
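
    As a minimal sketch of a decision tree in scikit-learn, the snippet below fits a classifier on the Iris data; the dataset and the depth limit are illustrative assumptions:

        # Minimal sketch: a decision tree classifier in scikit-learn (dataset and max_depth illustrative).
        from sklearn.datasets import load_iris
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_iris(return_X_y=True)
        dt = DecisionTreeClassifier(max_depth=3, random_state=0)  # learns simple if/else splits on the features
        dt.fit(X, y)
        print(dt.predict(X[:2]))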

  • Classification with Naïve Bayes

    Naïve Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the strong assumption that all the predictors are independent of each other, i.e. the presence of a feature in a class is independent of the presence of any other feature in the same class. This is a naïve assumption that…
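
    As a minimal sketch of one Naïve Bayes variant, the snippet below fits GaussianNB on the Iris data; the particular variant and the dataset are illustrative assumptions:

        # Minimal sketch: Gaussian Naive Bayes in scikit-learn (variant and dataset illustrative).
        from sklearn.datasets import load_iris
        from sklearn.naive_bayes import GaussianNB

        X, y = load_iris(return_X_y=True)
        nb = GaussianNB()        # treats features as conditionally independent given the class
        nb.fit(X, y)
        print(nb.predict(X[:3]))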