This chapter focuses on the polynomial features and pipeline tools in scikit-learn.
Introduction to Polynomial Features
Linear models trained on non-linear functions of the data generally keep the fast performance of linear methods while being able to fit a much wider range of data. That is why such models are widely used in machine learning.
For example, a simple linear regression can be extended by constructing polynomial features from the input features.
Mathematically, a standard linear regression model for 2-D data looks like this:
$$Y = W_0 + W_1 X_1 + W_2 X_2$$
If we now combine the features into second-order polynomials, the model becomes:
$$Y = W_0 + W_1 X_1 + W_2 X_2 + W_3 X_1 X_2 + W_4 X_1^2 + W_5 X_2^2$$
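This is still a linear model, because it is linear in the weights W0 to W5: only the inputs have been transformed. The resulting polynomial regression therefore belongs to the same class of linear models and can be solved with the same techniques.
To do so, scikit-learn provides a transformer class named PolynomialFeatures in the sklearn.preprocessing module. It transforms an input data matrix into a new data matrix containing the polynomial combinations of the input features up to a given degree.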
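To make the linearity concrete, the second-order features can be built by hand and passed to an ordinary linear regression. The following is a minimal sketch, not part of the original example: the weights in `true_w` and the random data are made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical weights W0..W5, chosen only for this illustration
true_w = np.array([1.0, 2.0, -3.0, 0.5, 4.0, -1.5])

# Made-up 2-D input data
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(50, 2))
x1, x2 = X[:, 0], X[:, 1]

# Expanded design matrix with columns [1, X1, X2, X1*X2, X1^2, X2^2]
Z = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
y = Z @ true_w

# A plain linear regression on the expanded features; the column of ones
# in Z plays the role of the intercept, so fit_intercept is turned off
model = LinearRegression(fit_intercept=False).fit(Z, y)
recovered_w = model.coef_
print(np.round(recovered_w, 3))
```

Because the targets are an exact linear combination of the expanded columns, the fitted coefficients should match the chosen weights up to floating-point error.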
Parameters
The following table lists the parameters used by PolynomialFeatures:
Sr.No | Parameter & Description |
---|---|
1 | degree (integer, default = 2): It represents the degree of the polynomial features. |
2 | interaction_only (Boolean, default = False): If set to True, only features that are products of at most degree distinct input features are produced. Such features are called interaction features. |
3 | include_bias (Boolean, default = True): If True, a bias column is included, i.e. the feature in which all polynomial powers are zero (a column of ones). |
4 | order (str in {'C', 'F'}, default = 'C'): It represents the order of the output array in the dense case. 'F' order is faster to compute, but it may slow down subsequent estimators. |
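The effect of interaction_only and include_bias is easiest to see on a single sample. The sketch below uses an arbitrary input with two features, a = 2 and b = 3, chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: a = 2, b = 3 (arbitrary illustrative values)
X = np.array([[2.0, 3.0]])

# Default settings: columns are 1, a, b, a^2, a*b, b^2
full = PolynomialFeatures(degree=2).fit_transform(X)

# interaction_only=True drops the pure powers a^2 and b^2: 1, a, b, a*b
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)

# include_bias=False drops the leading column of ones: a, b, a^2, a*b, b^2
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

print(full)     # [[1. 2. 3. 4. 6. 9.]]
print(inter)    # [[1. 2. 3. 6.]]
print(no_bias)  # [[2. 3. 4. 6. 9.]]
```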
Attributes
The following table lists the attributes of a fitted PolynomialFeatures transformer:
Sr.No | Attributes & Description |
---|---|
1 | powers_ (array, shape (n_output_features, n_input_features)): powers_[i, j] is the exponent of the jth input in the ith output. |
2 | n_input_features_ (int): As the name suggests, it gives the total number of input features. In recent scikit-learn releases this attribute is named n_features_in_. |
3 | n_output_features_ (int): As the name suggests, it gives the total number of polynomial output features. |
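As a small illustration of these attributes, the sketch below fits the transformer on a (4, 2) integer array and then rebuilds the transformed matrix from powers_ by hand. The manual reconstruction is only a way to check what powers_ means, not how scikit-learn computes the features internally.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(8).reshape(4, 2)
poly = PolynomialFeatures(degree=2).fit(X)

# Number of generated columns: 1, a, b, a^2, a*b, b^2
print(poly.n_output_features_)  # 6

# Row i holds the exponents of each input feature in output column i
print(poly.powers_)

# Rebuild every output column as the product of inputs raised to those exponents
manual = np.prod(X[:, np.newaxis, :] ** poly.powers_[np.newaxis, :, :], axis=2)
print(np.allclose(manual, poly.transform(X)))  # True
```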
Implementation Example
The following Python script uses the PolynomialFeatures transformer to expand a (4, 2) array, built from the integers 0 to 7, into its degree-2 polynomial features:
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
Y = np.arange(8).reshape(4, 2)
poly = PolynomialFeatures(degree=2)
poly.fit_transform(Y)
Output
array(
[
[ 1., 0., 1., 0., 0., 1.],
[ 1., 2., 3., 4., 6., 9.],
[ 1., 4., 5., 16., 20., 25.],
[ 1., 6., 7., 36., 42., 49.]
]
)
Streamlining using Pipeline tools
This kind of preprocessing, i.e. transforming an input data matrix into a new data matrix of a given degree, can be streamlined with the Pipeline tools, which chain multiple estimators into one.
Example
The Python script below uses scikit-learn's Pipeline tools to streamline the preprocessing and fit an order-3 polynomial to the data.
#First, import the necessary packages.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np
#Next, create an object of Pipeline tool
Stream_model = Pipeline([('poly', PolynomialFeatures(degree=3)), ('linear', LinearRegression(fit_intercept=False))])
#Create the input array and the order-3 polynomial target, then fit the model.
x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
Stream_model = Stream_model.fit(x[:, np.newaxis], y)
#Retrieve the fitted polynomial coefficients.
Stream_model.named_steps['linear'].coef_
Output
array([ 3., -2., 1., -1.])
The above output shows that the linear model trained on polynomial features is able to recover the exact input polynomial coefficients.
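Once fitted, the pipeline behaves like a single estimator: calling predict applies the polynomial expansion and then the linear model. The sketch below repeats the setup from the example above under the name `model` and evaluates it on two new inputs; the values in `x_new` are arbitrary and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same two-step pipeline as in the example above
model = Pipeline([
    ("poly", PolynomialFeatures(degree=3)),
    ("linear", LinearRegression(fit_intercept=False)),
])

x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
model.fit(x[:, np.newaxis], y)

# Arbitrary new inputs outside the training range
x_new = np.array([5, 6])
y_pred = model.predict(x_new[:, np.newaxis])

# Values of the original polynomial at the new inputs, for comparison
y_true = 3 - 2 * x_new + x_new ** 2 - x_new ** 3
print(np.round(y_pred, 6))
print(y_true)
```

Since the training data is noise-free and the model has the same degree as the generating polynomial, the predictions should agree with the true values up to floating-point error.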