## Introduction

Hyperparameters are the settings that control the learning process of a machine learning algorithm. Unlike model parameters, they are set before training rather than learned from the data. Hyperparameter tuning is the process of selecting a set of hyperparameter values for an algorithm so that it can learn the patterns in the data efficiently and produce a well-performing model.
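To make the distinction concrete, here is a minimal sketch. It uses a small synthetic dataset from `make_classification` (an assumption for illustration, not the dataset used later in this article), and the names `X_demo`, `y_demo` and `demo_model` are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, used only for illustration (not the article's dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyperparameters are chosen by us before training.
demo_model = LogisticRegression(C=0.5, max_iter=1000)
hyperparams = demo_model.get_params()
print(hyperparams["C"], hyperparams["max_iter"])

# Model parameters (the coefficients) are learned from the data during fit().
demo_model.fit(X_demo, y_demo)
print(demo_model.coef_.shape)  # one coefficient per feature
```

Here `C` and `max_iter` are hyperparameters, while `coef_` only exists after the model has been trained.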

**Why do we need to perform hyperparameter tuning?**

A machine learning algorithm may need different constraints or weights to identify the patterns in different datasets. A model trained with the default hyperparameters may not suit every kind of data.

Selecting good hyperparameter values is essential because they determine how the algorithm learns and how well it performs. With hyperparameter tuning, we can choose values that let the model give good predictions and perform well enough to solve the problem.

**Things one should know about hyperparameter tuning**

- It is computationally expensive
- It often yields only a small improvement in model performance
- It is time-consuming

### Hyperparameter tuning using GridSearchCV

This method trains the model on every possible combination of the given parameter values and computes the corresponding accuracy for each one. Because it tries every combination, it is time-consuming and computationally very expensive.

For demonstration, I’ll use **Jupyter Notebook** and a heart disease prediction dataset taken from **Kaggle**.
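To get a feel for the cost, the sketch below counts the combinations in the search space used later in this article. It relies on scikit-learn's `ParameterGrid` helper; the variable names are made up for illustration, and the fold count assumes the default `cv=5` of recent scikit-learn versions.

```python
from sklearn.model_selection import ParameterGrid

# Same search space as the GridSearchCV example in this article.
search_space = {
    'penalty': ['l1', 'l2', 'elasticnet', 'none'],
    'C': [0.8, 0.9, 1.0, 1.2, 1.4],
    'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
}

n_candidates = len(ParameterGrid(search_space))  # 4 penalties * 5 C values * 5 solvers
n_folds = 5                                      # assumed default number of CV folds
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)  # 100 500
```

Adding one more value to any list multiplies the work, which is why grid search becomes expensive quickly.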

    import pandas as pd
    from sklearn.metrics import accuracy_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Load the data: every column except the last is a feature, the last is the target
    df = pd.read_csv('heart.csv')
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]

    # Hold out 20% of the rows for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

    # Baseline: logistic regression with default hyperparameters
    LR = LogisticRegression()
    model = LR.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracy_score(y_test, pred)

**Output**

0.819672131147541

Logistic Regression gives an accuracy of about 82% (0.8197) without hyperparameter tuning. Let’s take a look at how the accuracy of the model increases after hyperparameter tuning.

### Implementation of GridSearchCV

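    from sklearn.model_selection import GridSearchCV

    # Values to try for each hyperparameter
    # (newer scikit-learn versions expect None instead of the string 'none')
    parameters = {
        'penalty' : ['l1', 'l2', 'elasticnet', 'none'],
        'C' : [0.8, 0.9, 1.0, 1.2, 1.4],
        'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']
    }

    LR = LogisticRegression()
    clf = GridSearchCV(LR, parameters)

    # The search is cross-validated on the full dataset (X, y)
    clf.fit(X, y)
    clf.best_params_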

**Output**

{'C': 0.9, 'penalty': 'l2', 'solver': 'newton-cg'}

These are the parameters best suited to this model. Now let’s train the model with these parameters and see how much the accuracy improves.

    # Retrain on the training split with the parameters found by the grid search
    LR = LogisticRegression(C = 0.9, penalty = 'l2', solver = 'newton-cg')
    model = LR.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracy_score(y_test, pred)

**Output**

0.8360655737704918

After hyperparameter tuning, the accuracy of the model has increased from about 82% (0.8197) to about 83.6% (0.8361).
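Not every penalty and solver pair in the grid above is supported by LogisticRegression (for example, lbfgs does not support the l1 penalty), so part of the search is spent on fits that fail. One possible way to avoid this, sketched here as an assumption rather than as what was run for this article, is to pass a list of dictionaries so that only compatible pairs are combined. The grouping below is limited to the l1 and l2 penalties, and the names `C_values` and `compatible_grid` are made up for the example.

```python
from sklearn.model_selection import ParameterGrid

C_values = [0.8, 0.9, 1.0, 1.2, 1.4]

# Each dictionary is expanded on its own, so solvers are only paired
# with penalties they support.
compatible_grid = [
    {'solver': ['newton-cg', 'lbfgs', 'sag'], 'penalty': ['l2'], 'C': C_values},
    {'solver': ['liblinear', 'saga'], 'penalty': ['l1', 'l2'], 'C': C_values},
]

n_valid = len(ParameterGrid(compatible_grid))  # 3*1*5 + 2*2*5
print(n_valid)  # 35
```

The same list can be given to GridSearchCV in place of the single dictionary, for example `GridSearchCV(LogisticRegression(), compatible_grid)`.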

### Hyperparameter tuning using RandomizedSearchCV

RandomizedSearchCV trains the model on only a randomly selected subset of the parameter combinations and checks the accuracy of each one. This is less time-consuming than GridSearchCV.

The main disadvantage of this method is that the parameters it returns may not be the best ones, since it checks the performance of only some of the combinations.

Note: We will use the same model and the same data for hyperparameter tuning with RandomizedSearchCV.
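The random selection can be sketched with scikit-learn's `ParameterSampler`, which draws candidates from a search space in a similar way. The `random_state` value and the variable names here are assumptions made for the example, so the candidates printed will not match the ones from the runs in this article.

```python
from sklearn.model_selection import ParameterSampler

# Same search space as in the GridSearchCV section (100 combinations in total).
search_space = {
    'penalty': ['l1', 'l2', 'elasticnet', 'none'],
    'C': [0.8, 0.9, 1.0, 1.2, 1.4],
    'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
}

# Draw 6 candidates; when every entry is a list, they are drawn without replacement.
sampler = ParameterSampler(search_space, n_iter=6, random_state=0)
sampled = list(sampler)

for candidate in sampled:
    print(candidate)
```

Only these 6 candidates would be evaluated, so the other 94 combinations are never tried.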

### Implementation of RandomizedSearchCV

    from sklearn.model_selection import RandomizedSearchCV

    clf = RandomizedSearchCV(LR, parameters, n_iter= 6)
    clf.fit(X, y)
    clf.best_params_

**Output**

{'solver': 'newton-cg', 'penalty': 'l2', 'C': 1.2}

n_iter sets the number of parameter combinations that are sampled and evaluated, i.e. checked for accuracy. Let’s use these parameters and check the accuracy of the model.

    # Retrain on the training split with the parameters found by the randomized search
    LR = LogisticRegression(C = 1.2, penalty = 'l2', solver = 'newton-cg')
    model = LR.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracy_score(y_test, pred)

**Output**

0.8360655737704918

So, the accuracy of the model has again improved from about 82% to about 83.6%, which is quite good.

    from sklearn.model_selection import RandomizedSearchCV

    # Same search, but with only 2 sampled combinations
    clf = RandomizedSearchCV(LR, parameters, n_iter= 2)
    clf.fit(X, y)
    clf.best_params_

**Output**

{'solver': 'saga', 'penalty': 'l1', 'C': 1.4}

With only 2 iterations, we got these parameters as the best ones. Let’s use them to evaluate the model.

    # Retrain on the training split with the parameters found using n_iter = 2
    LR = LogisticRegression(C = 1.4, penalty = 'l1', solver = 'saga')
    model = LR.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracy_score(y_test, pred)

**Output**

0.6721311475409836

Using the parameters obtained above has degraded the accuracy of the model to about 67%. So, RandomizedSearchCV might sometimes not end up with the best parameters for the model. It seems that n_iter must be chosen wisely to get better parameters.
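One rough way to see the effect of n_iter is to compare the best cross-validated score of a randomized search against a full grid search over the same space. The sketch below does this on synthetic data from `make_classification` with a small, made-up search space (an assumption for illustration, not the heart disease data or the grid used above), so the numbers it prints say nothing about the results in this article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic data and a small illustrative search space (10 combinations).
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=0)
space = {'C': [0.01, 0.1, 1.0, 10.0, 100.0], 'fit_intercept': [True, False]}
base = LogisticRegression(max_iter=1000)

# Full grid search: evaluates all 10 combinations.
grid_search = GridSearchCV(base, space, cv=5).fit(X_syn, y_syn)
grid_best = grid_search.best_score_
print("grid search:", grid_best)

# Randomized search with increasing n_iter over the same space and folds.
random_scores = {}
for n_iter in (2, 5, 10):
    rand_search = RandomizedSearchCV(base, space, n_iter=n_iter, cv=5, random_state=0)
    rand_search.fit(X_syn, y_syn)
    random_scores[n_iter] = rand_search.best_score_
    print("n_iter =", n_iter, ":", rand_search.best_score_)
```

Because the randomized search only sees a subset of the grid, its best score can match the grid search but not exceed it here, and with n_iter equal to the number of combinations the two coincide.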

## Conclusion

Hyperparameter tuning is an important step in machine learning. It is used for selecting the best parameters for a machine learning algorithm so that the algorithm can learn the pattern and perform efficiently to solve a problem.

GridSearchCV evaluates every possible combination of the given parameter values to select the best one, while RandomizedSearchCV evaluates a randomly selected subset of combinations and picks the best among them. GridSearchCV is more computationally expensive than RandomizedSearchCV, but it always finds the best combination within the grid it is given. Hence, hyperparameter tuning is essential in machine learning.

Happy Learning 🙂