Author: Tayyaba Syed
-
Modelling Process
This chapter deals with the modelling process involved in Sklearn. Let us understand about the same in detail and begin with dataset loading. Dataset Loading A collection of data is called dataset. It is having the following two components − Features − The variables of data are called its features. They are also known as predictors,…
-
Introduction
In this chapter, we will understand what is Scikit-Learn or Sklearn, origin of Scikit-Learn and some other related topics such as communities and contributors responsible for development and maintenance of Scikit-Learn, its prerequisites, installation and its features. What is Scikit-Learn (Sklearn) Scikit-learn (Sklearn) is the most useful and robust library for machine learning in Python.…
-
Scikit Learn Tutorial
Scikit-learn (Sklearn) is the most useful and robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling including classification, regression, clustering and dimensionality reduction via a consistence interface in Python. This library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib.…
-
Data Leakage
Data leakage is a common problem in machine learning that occurs when information from outside the training dataset is used to create or evaluate a model. This can lead to overfitting, where the model is too closely tailored to the training data and performs poorly on new data. There are two main types of data…
-
MLOps
MLOps (Machine Learning Operations) is a set of practices and tools that combine software engineering, data science, and operations to enable the automated deployment, monitoring, and management of machine learning models in production environments. MLOps addresses the challenges of managing and scaling machine learning models in production, which include version control, reproducibility, model deployment, monitoring,…
-
Entropy
Entropy is a concept that originates from thermodynamics and was later applied in various fields, including information theory, statistics, and machine learning. In machine learning, entropy is used as a measure of the impurity or randomness of a set of data. Specifically, entropy is used in decision tree algorithms to decide how to split the…
-
P-value
In machine learning, we use P-value to test the null hypothesis that there is no significant relationship between two variables. For example, if we have a dataset of house prices and we want to determine whether there is a significant relationship between the size of the house and its price, we can use P-value to…
-
Overfitting
Overfitting occurs when a model learns the noise in the training data, rather than the underlying patterns. This causes the model to perform well on the training data, but poorly on new data. Essentially, the model becomes too specialized to the training data, and is unable to generalize to new data. Overfitting is a common…
-
Regularization
In machine learning, regularization is a technique used to prevent overfitting, which occurs when a model is too complex and fits the training data too well, but fails to generalize to new, unseen data. Regularization introduces a penalty term to the cost function, which encourages the model to have smaller weights and a simpler structure,…
-
Perceptron
Perceptron is one of the oldest and simplest neural network architectures. It was invented in the 1950s by Frank Rosenblatt. The Perceptron algorithm is a linear classifier that classifies input into one of two possible output categories. It is a type of supervised learning that trains the model by providing labeled training data. The Perceptron…