How to Perform Feature Scaling in Machine Learning

Introduction

Feature scaling is an important step in data pre-processing. Scaling converts the original values of a feature into a new set of values that lie within a specific range or follow a specific distribution. It can make the difference between a poorly performing and a well-performing machine learning model.

 

Need for feature scaling

Many machine learning models perform better when the input features are on a comparable scale. The features fed to a model can differ widely in magnitude or unit. Without feature scaling, a model may give more weight to a feature simply because its values are larger, and that can lead to poorer performance.

Let’s take an example for a better understanding.

Example 1

Petrol (liters)    Distance (km)
20                 500
23                 1000
34                 3000

Machine learning models see only numbers, not what the numbers mean. For the model to learn well, these features need to be brought onto a common scale.

Without scaling, the values in the Distance column are much larger than the values in the Petrol column. The model may then treat Distance as more important than Petrol purely because of its magnitude, which is not meaningful and can result in wrong predictions. This is one of the reasons for doing feature scaling.
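To make this concrete, here is a minimal sketch that measures how much each column contributes to the Euclidean distance between the first two rows of Example 1, before and after scaling. The helper name petrol_share is hypothetical and only used for illustration, and MinMaxScaler is just one possible choice of scaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Rows from Example 1: [Petrol (liters), Distance (km)]
X = np.array([[20.0, 500.0],
              [23.0, 1000.0],
              [34.0, 3000.0]])

def petrol_share(a, b):
    """Fraction of the squared Euclidean distance that comes from the Petrol column."""
    diff_sq = (a - b) ** 2
    return float(diff_sq[0] / diff_sq.sum())

# Raw data: the distance between rows 0 and 1 is almost entirely Distance (km)
share_raw = petrol_share(X[0], X[1])

# After min-max scaling, both columns lie in [0, 1] and contribute comparably
X_scaled = MinMaxScaler().fit_transform(X)
share_scaled = petrol_share(X_scaled[0], X_scaled[1])

print(f"Petrol share of squared distance (raw):    {share_raw:.6f}")
print(f"Petrol share of squared distance (scaled): {share_scaled:.6f}")
```

On the raw data the Petrol column accounts for far less than 1% of the squared distance, while after scaling the two columns contribute roughly equally.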

 

Example 2

Units cause a similar problem. Suppose two values are recorded as 1000 g and 5 kg. A model trained on the raw numbers sees 1000 and 5, and so treats the first value as the larger one, which is not correct. As noted above, a machine learning model understands only the numbers, not their meaning. Such data must therefore be converted to a common unit and a standard range to avoid this kind of wrong learning, because these values play a very important role in the performance of the model. This is another reason for performing feature scaling.

 

Is feature scaling always needed?

Not all machine learning models need feature scaling. Models that use a weighted sum of the inputs, such as linear regression, logistic regression, and neural networks, and models that use the distance between data points, such as K-nearest neighbors, benefit from feature scaling. Tree-based models such as decision trees and random forests are largely insensitive to the scale of the features.
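As a rough illustration of how much scaling can matter for a distance-based model, the sketch below compares a K-nearest neighbors classifier with and without standardization. The wine dataset, the choice of 5 neighbors, and 5-fold cross-validation are assumptions made for this example and do not come from the article.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wine dataset: 13 features whose ranges differ by orders of magnitude
X, y = load_wine(return_X_y=True)

# KNN on the raw features
raw_knn = KNeighborsClassifier(n_neighbors=5)

# Same KNN, but each feature is standardized first.
# Inside a pipeline the scaler is re-fitted on the training folds only.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

raw_acc = cross_val_score(raw_knn, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled_knn, X, y, cv=5).mean()

print(f"KNN accuracy without scaling: {raw_acc:.3f}")
print(f"KNN accuracy with scaling:    {scaled_acc:.3f}")
```

Wrapping the scaler and the model in a pipeline is a design choice that keeps the scaling statistics from being computed on the held-out data.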

 

Techniques for performing feature scaling

Normalization

In normalization, the data is rescaled from its original range to a new range between 0 and 1. This requires the minimum and maximum values of each feature. The formula used for normalization is:

Y = (X – Xmin)/(Xmax – Xmin)

Here,

Xmax = Maximum value in dataset

Xmin = Minimum value in the dataset

The Python scikit-learn library provides the MinMaxScaler class, which scales values this way. For the demonstration, I’ll use a Jupyter notebook.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler expects a 2-D array, so the 1-D data is reshaped into a single column
arr = np.array([1,2,3,4,5,6,7,8])
sc = MinMaxScaler()
result = sc.fit_transform(arr.reshape(-1, 1))
result

Output

array([[0.        ],
       [0.14285714],
       [0.28571429],
       [0.42857143],
       [0.57142857],
       [0.71428571],
       [0.85714286],
       [1.        ]])

The values in the array now range from 0 to 1. Bringing features into a common range like this often makes it easier for a model to learn.
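The normalization formula can also be applied by hand and checked against the scaler. The following is a small sketch using NumPy; the variable names manual and sk_result are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float).reshape(-1, 1)

# Y = (X - Xmin) / (Xmax - Xmin), computed per column
x_min = arr.min(axis=0)
x_max = arr.max(axis=0)
manual = (arr - x_min) / (x_max - x_min)

# Same transformation with scikit-learn
sk_result = MinMaxScaler().fit_transform(arr)

print(np.allclose(manual, sk_result))  # True
```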

 

Standardization

In standardization, the original data is converted into a new form of data that has a mean of zero and a standard deviation of 1. The formula used for standardization is:

y = (x – μ)/σ

Here,

μ = mean of the data

σ = standard deviation of the data

import numpy as np
from sklearn.preprocessing import StandardScaler

arr = np.array([1,2,3,4,5,6,7,8])
sc = StandardScaler()
result = sc.fit_transform(arr.reshape(-1, 1))
result

Output

array([[-1.52752523],
       [-1.09108945],
       [-0.65465367],
       [-0.21821789],
       [ 0.21821789],
       [ 0.65465367],
       [ 1.09108945],
       [ 1.52752523]])

This is how StandardScaler converts the data to a standard scale. After the standard scaler is applied, the data has a mean of zero and a standard deviation of one. We can verify this:

print("Mean: ",result.mean())
print("Standard Deviation: ", result.std())

Output

Mean:  0.0
Standard Deviation:  1.0

We can see that StandardScaler converts the data into a form with a mean of 0 and a standard deviation of 1. This often makes learning easier for the model and can help improve its performance.
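The standardization formula can be reproduced by hand as well. One detail worth knowing is that StandardScaler uses the population standard deviation (ddof=0 in NumPy terms), so dividing by the sample standard deviation (ddof=1) gives slightly different numbers. The variable names below are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float).reshape(-1, 1)

mu = arr.mean(axis=0)
sigma = arr.std(axis=0)          # population standard deviation (ddof=0)
manual = (arr - mu) / sigma      # y = (x - mu) / sigma

sk_result = StandardScaler().fit_transform(arr)
print(np.allclose(manual, sk_result))          # True

# Using the sample standard deviation does not match StandardScaler
with_sample_std = (arr - mu) / arr.std(axis=0, ddof=1)
print(np.allclose(with_sample_std, sk_result))  # False
```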

 

Maximum Absolute Scaler

In this method, each feature is scaled by its maximum absolute value. The scaler transforms each feature so that the maximum absolute value of the feature is 1. The scikit-learn library provides the MaxAbsScaler class to carry out this scaling. Let’s take a look at how this method scales the data.

import numpy as np
from sklearn.preprocessing import MaxAbsScaler

arr = np.array([1,2,3,4,5,6,7,8])
sc = MaxAbsScaler()
result = sc.fit_transform(arr.reshape(-1, 1))
result

Output

array([[0.125],
       [0.25 ],
       [0.375],
       [0.5  ],
       [0.625],
       [0.75 ],
       [0.875],
       [1.   ]])

We can see that the original data is transformed so that the maximum value is 1.
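The example above contains only positive values, so it does not show the "absolute" part of the scaler. A short sketch with made-up negative values illustrates it: the data is divided by the largest absolute value, the signs are kept, and zeros stay zero.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Illustrative data with negative values; the largest absolute value is 8
arr = np.array([-8.0, -4.0, 0.0, 2.0, 4.0]).reshape(-1, 1)

result = MaxAbsScaler().fit_transform(arr)

# Equivalent manual computation: x / max(|x|)
manual = arr / np.abs(arr).max(axis=0)

print(result.ravel())  # [-1.   -0.5   0.    0.25  0.5 ]
```

The output lies in the range [-1, 1] and the data is not shifted, only divided.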

 

Robust Scaler

This scaler centers the data on the median and scales it using the interquartile range (IQR). When outliers are present in the dataset, scaling with the mean and standard deviation does not work well, because outliers distort both the mean and the standard deviation. The median and the IQR are much less affected by outliers.

The interquartile range (IQR) is the difference between the third quartile (75th percentile) and the first quartile (25th percentile). Let’s take a look at how this scaler is used to scale the data.

import numpy as np
from sklearn.preprocessing import RobustScaler

arr = np.array([1,2,3,4,5,6,7,8])
sc = RobustScaler()
result = sc.fit_transform(arr.reshape(-1, 1))
result

Output

array([[-1.        ],
       [-0.71428571],
       [-0.42857143],
       [-0.14285714],
       [ 0.14285714],
       [ 0.42857143],
       [ 0.71428571],
       [ 1.        ]])

This is how the robust scaler is used to scale the data. Each value has the median (4.5) subtracted and is then divided by the IQR (3.5).
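The benefit of the robust scaler shows up once an outlier is present. In the sketch below, the last value of the array is replaced with 1000 (a made-up outlier), and the result is compared with StandardScaler. The manual computation assumes the scaler's default settings of median centering and a 25th to 75th percentile range.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Same data as above, but the last value is an extreme outlier
arr = np.array([1, 2, 3, 4, 5, 6, 7, 1000], dtype=float).reshape(-1, 1)

# Manual robust scaling: (x - median) / (Q3 - Q1)
median = np.median(arr, axis=0)
q1, q3 = np.percentile(arr, [25, 75], axis=0)
manual = (arr - median) / (q3 - q1)

robust = RobustScaler().fit_transform(arr)
standard = StandardScaler().fit_transform(arr)

# Spread of the seven non-outlier values after each scaling
robust_spread = robust[:7].max() - robust[:7].min()
standard_spread = standard[:7].max() - standard[:7].min()

print(np.allclose(manual, robust))  # True
print(f"Inlier spread with RobustScaler:   {robust_spread:.3f}")
print(f"Inlier spread with StandardScaler: {standard_spread:.3f}")
```

With RobustScaler the seven ordinary values keep the same scaled values as in the example without the outlier, while StandardScaler squeezes them into a very narrow band because the outlier inflates the standard deviation.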

 

Quantile Transformer scaling

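In this method, features are transformed so that they follow a uniform or a normal distribution. This reduces the impact of outliers. First, an estimate of the cumulative distribution function is used to map the data to a uniform distribution. The resulting values are then mapped to the required output distribution using the associated quantile function. By default, scikit-learn's QuantileTransformer outputs a uniform distribution; a normal distribution can be requested with output_distribution="normal".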

Let’s take a look at how it is implemented.

import numpy as np
from sklearn.preprocessing import QuantileTransformer

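# The last value (30) is an outlier
arr = np.array([1,2,3,4,5,6,7,30])
# n_quantiles must not exceed the number of samples (the default is 1000)
sc = QuantileTransformer(n_quantiles=8)
result = sc.fit_transform(arr.reshape(-1, 1))
result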

Output

array([[0.        ],
       [0.14285714],
       [0.28571429],
       [0.42857143],
       [0.57142857],
       [0.71428571],
       [0.85714286],
       [1.        ]])

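This is how the quantile transformer is used to scale the data. The values are spread evenly between 0 and 1 according to their rank, so the outlier 30 maps to 1 and no longer stretches the scale.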
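To obtain a normal distribution instead of a uniform one, the transformer can be created with output_distribution="normal". The sketch below applies it to a skewed sample. The exponential data, the sample size, and n_quantiles=100 are assumptions chosen for illustration, and skewness is a small hypothetical helper.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

def skewness(x):
    """Sample skewness: about 0 for symmetric data, positive for a long right tail."""
    x = np.asarray(x, dtype=float).ravel()
    return float(((x - x.mean()) ** 3).mean() / x.std() ** 3)

# Strongly right-skewed illustrative data
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(5000, 1))

qt = QuantileTransformer(n_quantiles=100,
                         output_distribution="normal",
                         random_state=0)
X_normal = qt.fit_transform(X)

skew_before = skewness(X)
skew_after = skewness(X_normal)

print(f"Skewness before: {skew_before:.2f}")
print(f"Skewness after:  {skew_after:.2f}")
```

After the transformation the skewness is close to zero, which indicates that the long right tail has been pulled in.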

 

Conclusion

Feature scaling is not important for all machine learning algorithms. Algorithms that use a weighted sum of the inputs or distances between points need scaled features. If scaling is not done in those cases, the model may make wrong predictions.

Normalization and standardization are the most popular techniques for feature scaling. Normalization is generally used when the data does not follow a Gaussian distribution, while standardization is generally used when the data does follow a Gaussian distribution. For the algorithms that need it, feature scaling is an essential step in data pre-processing.

Reference

Scikit-learn Documentation

Happy Learning 🙂
