## Introduction

Linear regression is a supervised machine learning model used for regression tasks. It models the linear relationship between a dependent variable and one or more independent variables, hence the name linear regression.

A regression model predicts a target (dependent) variable from one or more independent variables. It is used to quantify the relationship between variables and for forecasting. Depending on the number of independent variables, linear regression comes in two types:

- Simple Linear Regression
- Multiple Linear Regression

### Simple Linear Regression

In simple linear regression, there is only one independent variable. The formula that relates the dependent and independent variables is:

`y = Ø1 + Ø2*x`

y = Dependent variable (output variable)

x = Independent variable

Ø1 = Intercept

Ø2 = Slope

The simple regression model tries to find the **‘best-fit line’** (the blue line in the figure above) by adjusting the slope (Ø2) and the intercept (Ø1). The best-fit line is the line for which the sum of the squared differences between the predicted values and the true values is minimal.

In other words, the sum of the squared vertical distances from the points to the line is minimal. Once the best Ø1 and Ø2 are found, the model can predict the output for any given input.
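For a single independent variable, the best Ø1 and Ø2 can be computed directly. The sketch below uses the standard least-squares formulas (slope = covariance of x and y divided by the variance of x). The function name and the tiny dataset are made up for illustration and are not part of the example later in this article.

```python
# Closed-form least-squares fit for y = Ø1 + Ø2*x (Ø1 = intercept, Ø2 = slope).
# The function name and the data are hypothetical, chosen only for illustration.

def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # generated exactly from y = 1 + 2*x

intercept, slope = fit_simple_linear_regression(x, y)
print(intercept, slope)  # prints: 1.0 2.0
```

Because the toy data lies exactly on a line, the fit recovers the intercept 1 and the slope 2 that generated it.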

### Multiple Linear Regression

In practice, there is usually more than one independent variable. When the output variable depends on several independent variables, the model is called multiple linear regression. It also models a linear relationship between the dependent and independent variables. The formula is:

`y = Øo + Ø1*x1 + Ø2*x2 + . . . + Øn*xn`

y = Dependent variable

x1, x2, . . . , xn = Independent variables

Øo = Intercept

Øi = Slope coefficient for each of the independent variables, i = 1, 2, 3, . . . , n

n = Number of independent variables

The best-fit line (a plane or hyperplane when there are several independent variables) is determined by tuning the values of *Øo and Øi* such that the sum of the squared differences between the predicted and real values is minimal.
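As a rough sketch of how Øo and the Øi can be found for several independent variables, the snippet below solves the least-squares problem with NumPy’s `np.linalg.lstsq`. The data is hypothetical and generated without noise from known coefficients, so we can check that the fit recovers them.

```python
import numpy as np

# Hypothetical data: 5 observations, 2 independent variables (x1, x2).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])

# Targets generated exactly from y = 4 + 2*x1 + 3*x2 (no noise).
y = 4.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient plays the role of Øo.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: minimizes the sum of squared residuals.
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slopes = coeffs[0], coeffs[1:]

print("Intercept:", intercept)
print("Slopes:", slopes)
```

The recovered values should be close to 4 for the intercept and to 2 and 3 for the slopes, up to floating-point rounding.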

### Cost Function

After we’ve trained our learning algorithm and obtained a hypothesis, we need to examine how good our results are. This is done with a **cost function**.

The cost function measures the error of the hypothesis by comparing its predicted values with the actual true values.

By finding the best-fit regression line, the model aims to **predict the ‘y’** value such that the error between the predicted value and the real value is minimal.

So it is essential to update the values of *Øo and Øi* in the case of multiple regression, and the values of *Ø1 and Ø2* in the case of simple linear regression, to reach the values that **minimize the error** between the predicted value and the true value.

The cost function (J) used here is the **Root Mean Squared Error (RMSE)** between the predicted y and the true value of y. Because the square root is monotonic, minimizing the RMSE gives the same coefficients as minimizing the mean squared error or the sum of squared errors.
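Written out, `RMSE = sqrt( (1/n) * Σ (yi - ŷi)^2 )`, where ŷi is the predicted value for the i-th observation. A minimal sketch of that computation follows; the helper name `rmse` and the sample values are our own, chosen only to illustrate the formula.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values: the errors are 1, 0 and -2, so the squared errors are 1, 0 and 4.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

print(rmse(y_true, y_pred))  # sqrt(5/3), roughly 1.29
```

A perfect prediction gives an RMSE of 0, and larger errors are penalized more heavily because they are squared before averaging.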

### Gradient Descent

To update the coefficient values (*Øo and Ø1*, and so on) so as to reduce the cost function (minimizing the RMSE) and reach the best-fit line, the model can use gradient descent. The idea is to start with random coefficient values and then iteratively update them, moving step by step toward the minimum cost.
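To make the update loop concrete, here is a minimal sketch of gradient descent for one independent variable. It minimizes the mean squared error, which has the same minimizer as the RMSE and a simpler gradient. The function name, learning rate, iteration count, and toy data are all assumptions made for illustration. Note that scikit-learn’s `LinearRegression`, used later in this article, solves the least-squares problem directly rather than by gradient descent.

```python
# Gradient descent on the mean squared error for y = intercept + slope * x.
# learning_rate and n_iterations are hypothetical settings that work for this toy data.

def gradient_descent(x, y, learning_rate=0.05, n_iterations=2000):
    intercept, slope = 0.0, 0.0  # fixed starting point; a random start also works
    n = len(x)
    for _ in range(n_iterations):
        # Prediction error for every observation under the current parameters.
        errors = [(intercept + slope * xi) - yi for xi, yi in zip(x, y)]
        # Partial derivatives of the MSE with respect to each parameter.
        grad_intercept = (2 / n) * sum(errors)
        grad_slope = (2 / n) * sum(e * xi for e, xi in zip(errors, x))
        # Step against the gradient to reduce the cost.
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope


x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # generated exactly from y = 1 + 2*x

intercept, slope = gradient_descent(x, y)
print(round(intercept, 4), round(slope, 4))
```

With these settings the parameters settle very close to the intercept 1 and the slope 2. A learning rate that is too large makes the updates diverge, and one that is too small needs many more iterations.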

We’ll take a small example to see how linear regression works. For this, we’ll create a dummy dataset with ‘age’ and ‘no of hours’ as input features and ‘salary’ as the output. For the demonstration, I’ll be using **Jupyter Notebook**.

First, we’ll create a dummy dataset:

    import pandas as pd

    # Dummy data: two input features and one target column
    info = {
        'no of hours': [1, 2, 5, 7, 8, 10, 12, 15, 17],
        'age': [20, 34, 21, 27, 34, 21, 20, 45, 31],
        'salary': [1000, 3000, 5000, 8000, 8500, 9000, 12000, 15000, 22000],
    }

    df = pd.DataFrame(info)
    print(df)

**Output:** the DataFrame printed as a table with the columns ‘no of hours’, ‘age’, and ‘salary’ (nine rows).

Let’s visualize the dataset. First, we’ll import **matplotlib and seaborn** and plot age against salary:

    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.scatterplot(x="age", y="salary", data=df)
    plt.xlabel("age")
    plt.ylabel("salary")
    plt.title("age vs salary")
    plt.show()

**Output:** a scatter plot titled "age vs salary", with age on the x-axis and salary on the y-axis.

Next, we’ll plot the **no of hours vs salary** graph:

    sns.scatterplot(x="no of hours", y="salary", data=df)
    plt.xlabel("no of hours")
    plt.ylabel("salary")
    plt.title("no of hours vs salary")
    plt.show()

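**Output:** a scatter plot titled "no of hours vs salary", with no of hours on the x-axis and salary on the y-axis.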

Finally, we’ll take a look at the **‘no of hours’ vs ‘age’** graph:

    sns.scatterplot(x="age", y="no of hours", data=df)
    plt.xlabel("age")
    plt.ylabel("no of hours")
    plt.title("age vs no of hours")
    plt.show()

**Output:** a scatter plot titled "age vs no of hours", with age on the x-axis and no of hours on the y-axis.

Now, we will use a linear regression model to predict the salary based on the hours and age. The equation used will be in the form of:

`salary = Øo + Ø1 * no of hours + Ø2 * age`

Øo = Intercept

Ø1 = Coefficient of no of hours

Ø2 = Coefficient of age

Now, we will start building the model. Let’s select the features and target variables:

    X = df.iloc[:, :2]   # features: 'no of hours' and 'age'
    y = df.iloc[:, -1]   # target: 'salary'

Then, we’ll import the necessary libraries as:

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression

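Now, split the dataset into training and testing sets:

    # test_size=0.25 holds out 3 of the 9 rows for testing.
    # No random_state is set, so the split (and the outputs below) vary between runs.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)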
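To see what this split does with nine rows, and how to make it repeatable, here is a small sketch. The `random_state=42` value is an arbitrary choice of ours. The original call does not set one, so the outputs printed in this article come from an unknown random split, and fixing a seed will generally give different numbers.

```python
from sklearn.model_selection import train_test_split

rows = list(range(9))  # stand-in for the 9 row positions of the dataset

# The same random_state (an arbitrary, hypothetical value) gives the same split.
train_a, test_a = train_test_split(rows, test_size=0.25, random_state=42)
train_b, test_b = train_test_split(rows, test_size=0.25, random_state=42)

print(len(train_a), len(test_a))  # prints: 6 3
print(test_a == test_b)           # prints: True
```

With `test_size=0.25`, scikit-learn rounds the test share up, so 3 of the 9 rows are held out and 6 are used for training.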

**Build the model as:**

    lr = LinearRegression()
    model = lr.fit(X_train, y_train)   # fit() returns the fitted estimator
    pred = model.predict(X_test)       # predicted salaries for the held-out rows
    print(pred)

**Output**

[ 6454.68201512 12813.11470225 24376.50611935]

Now, let’s see the values of *Øo, Ø1, and Ø2*:

    print("Intercept :", model.intercept_)

**Output**

Intercept : -8477.293570728314

Here we can see that the value of intercept(Øo) = -8477.293570728314

    print("Slope :", model.coef_)

**Output**

Slope : [1059.73878119 376.83817716]

As a result:

Ø1 -> coefficient of no of hours = 1059.73878119

Ø2 -> coefficient of age = 376.83817716
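With these values, the fitted equation can be evaluated by hand. The sketch below plugs the printed intercept and coefficients into `salary = Øo + Ø1 * no of hours + Ø2 * age`. The helper name `predict_salary` is ours. The three rows are taken from the dataset because they happen to reproduce the predictions printed earlier, which suggests (as an inference, not something the run reports) that they formed the test set.

```python
# Values copied from the printed intercept and coefficients above.
intercept = -8477.293570728314
coef_hours = 1059.73878119
coef_age = 376.83817716


def predict_salary(no_of_hours, age):
    """Evaluate salary = Øo + Ø1 * no of hours + Ø2 * age."""
    return intercept + coef_hours * no_of_hours + coef_age * age


# (no of hours, age) pairs taken from rows of the dummy dataset.
for hours, age in [(2, 34), (8, 34), (15, 45)]:
    print(hours, age, round(predict_salary(hours, age), 2))
# prints:
# 2 34 6454.68
# 8 34 12813.11
# 15 45 24376.51
```

Read this way, each additional hour adds about 1059.74 to the predicted salary with age held fixed, and each additional year of age adds about 376.84 with hours held fixed.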

## Conclusion

Linear regression is a machine learning algorithm used for regression analysis. The model fits a linear relationship between the dependent and independent variables by minimizing the Root Mean Squared Error (RMSE) between the predicted and true values.

**Price prediction** is one example of a problem suited to linear regression. Linear regression is a simple yet very useful machine learning algorithm.

If you want to learn more about the machine learning algorithms types, then check the link here.

Happy Learning 🙂