Category: Tutorial

  • Dimensionality Reduction using PCA

    Dimensionality reduction, an unsupervised machine learning method is used to reduce the number of feature variables for each data sample selecting set of principal features. Principal Component Analysis (PCA) is one of the popular algorithms for dimensionality reduction. Exact PCA Principal Component Analysis (PCA) is used for linear dimensionality reduction using Singular Value Decomposition (SVD) of the data…

  • Clustering Performance Evaluation

    There are various functions with the help of which we can evaluate the performance of clustering algorithms. Following are some important and mostly used functions given by the Scikit-learn for evaluating clustering performance − Adjusted Rand Index Rand Index is a function that computes a similarity measure between two clustering. For this computation rand index…

  • Clustering Methods

    Here, we will study about the clustering methods in Sklearn which will help in identification of any similarity in the data samples. Clustering methods, one of the most useful unsupervised ML methods, used to find similarity & relationship patterns among data samples. After that, they cluster those samples into groups having similarity based on features.…

  • Boosting Methods

    In this chapter, we will learn about the boosting methods in Sklearn, which enables building an ensemble model. Boosting methods build ensemble model in an increment way. The main principle is to build the model incrementally by training each base model estimator sequentially. In order to build powerful ensemble, these methods basically combine several week…

  • Randomized Decision Trees

    This chapter will help you in understanding randomized decision trees in Sklearn. Randomized Decision Tree algorithms As we know that a DT is usually trained by recursively splitting the data, but being prone to overfit, they have been transformed to random forests by training many trees over various subsamples of the data. The sklearn.ensemble module is having…

  • Decision Trees

    In this chapter, we will learn about learning method in Sklearn which is termed as decision trees. Decisions tress (DTs) are the most powerful non-parametric supervised learning method. They can be used for the classification and regression tasks. The main goal of DTs is to create a model predicting target variable value by learning simple…

  • Classification with Naïve Bayes

    Naïve Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with a strong assumption that all the predictors are independent to each other i.e. the presence of a feature in a class is independent to the presence of any other feature in the same class. This is naïve assumption that…

  • KNN Learning

    k-NN (k-Nearest Neighbor), one of the simplest machine learning algorithms, is non-parametric and lazy in nature. Non-parametric means that there is no assumption for the underlying data distribution i.e. the model structure is determined from the dataset. Lazy or instance-based learning means that for the purpose of model generation, it does not require any training…

  • K-Nearest Neighbors (KNN)

    This chapter will help you in understanding the nearest neighbor methods in Sklearn. Neighbor based learning method are of both types namely supervised and unsupervised. Supervised neighbors-based learning can be used for both classification as well as regression predictive problems but, it is mainly used for classification predictive problems in industry. Neighbors based learning methods do not have a…

  • Anomaly Detection

    Here, we will learn about what is anomaly detection in Sklearn and how it is used in identification of the data points. Anomaly detection is a technique used to identify data points in dataset that does not fit well with the rest of the data. It has many applications in business such as fraud detection,…