Linear regression is a statistical technique for modeling the relationship between a dependent variable and one or more independent variables. It is used to predict the value of a continuous (numeric) variable, and it is widely applied in stock market analysis, weather forecasting, and sales prediction.
Linear regression is applied in two steps:
1. Estimate the relationship between two variables.
Examples: Does body weight influence blood cholesterol level? Does the size of a house affect its price?
2. Predict the value of the dependent variable from the values of the independent variables.
The simplest form of linear regression, with one dependent and one independent variable, is given by the following formula:
y = mx + c
Where y is the dependent variable, x is the independent variable, m is the slope, and c is the intercept of the line (the value of y when x is 0).
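The slope m is the change in y per unit change in x. For any two points (x1, y1) and (x2, y2) on the line, it is represented as:
m = (y2 - y1) / (x2 - x1)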
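Linear regression comes in two types: simple linear regression, which uses one independent variable, and multiple linear regression, which uses two or more.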
Let’s understand the intuition behind the regression line by using an example:
The table on the left represents the data; the data points are plotted on the graph on the right.
The next step is to calculate the mean of X and Y and plot the values on the graph.
Here, the mean of X is three, and the mean of Y is five.
The least-squares regression line always passes through the point formed by the mean of X and the mean of Y, which here is (3, 5).
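As a quick check, assuming the slope and intercept quoted later in this example (m = 1.3 and c = 1.1), substituting the mean of X into the line should give back the mean of Y:

```latex
\hat{y}\,\big|_{x=\bar{x}} = m\bar{x} + c = 1.3 \times 3 + 1.1 = 5 = \bar{y}
```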
Now, we need to find the equation of the regression line. For that, we need to calculate the following parameters.
Based on the calculated values, we solve for the slope (m) and the intercept (c).
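The calculation can be sketched in Python using the standard closed-form least-squares formulas: m is the sum of (x - mean of X)(y - mean of Y) divided by the sum of (x - mean of X) squared, and c is the mean of Y minus m times the mean of X. This is a minimal sketch, not necessarily the exact layout of the calculation used in this example. The function name fit_line is illustrative, and the data below is hypothetical: it was chosen only so that the means (3 and 5), slope, and intercept agree with the values quoted in the text, and it is not the original data table.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations from the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Sum of squared deviations of x from its mean
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    c = y_mean - m * x_mean  # forces the line through (x_mean, y_mean)
    return m, c


# Hypothetical data (assumption, not the article's table)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 5, 8]

m, c = fit_line(xs, ys)
print(round(m, 2), round(c, 2))  # prints: 1.3 1.1
```

Computing the intercept from the means is what guarantees that the fitted line passes through the point (mean of X, mean of Y).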
Let’s calculate the predicted values of Y for the corresponding values of X using the linear equation y = mx + c, where m = 1.3 and c = 1.1.
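A minimal sketch of this step in Python is shown here. The slope and intercept come from the text; the X values 1 through 5 are an assumption (they are consistent with a mean of three but are not confirmed by the original table), and the helper name predict is illustrative.

```python
m, c = 1.3, 1.1  # slope and intercept quoted in the text


def predict(x, m, c):
    """Evaluate the regression line y = m*x + c at x."""
    return m * x + c


xs = [1, 2, 3, 4, 5]  # assumed X values, not taken from the original table

# Round to one decimal place to hide floating-point noise
predicted = [round(predict(x, m, c), 1) for x in xs]
print(predicted)  # prints: [2.4, 3.7, 5.0, 6.3, 7.6]
```

Note that the prediction at x = 3 is 5.0, the mean of Y, as expected for a line that passes through the means.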
The best-fit line is the one with the smallest sum of squared errors, where each error e is the difference between an actual value of Y and the value predicted by the line.
The sum of the squared errors for this regression line is 3.9. We compute this sum for each candidate line and choose as the best fit the line with the smallest value.
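The comparison between candidate lines can be sketched as follows. The data is hypothetical (it is not the original table, so its error of about 3.1 differs from the 3.9 quoted above), and the two alternative lines are made-up candidates that also pass through the point (3, 5). The function name sum_squared_errors is illustrative.

```python
def sum_squared_errors(xs, ys, m, c):
    """Sum of squared differences between actual and predicted y values."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data (assumption, not the article's table)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 5, 8]

# Candidate lines as (slope, intercept); the first is the least-squares
# fit for this data, the other two are arbitrary alternatives
candidates = {
    "least squares": (1.3, 1.1),
    "flatter": (1.0, 2.0),
    "steeper": (1.6, 0.2),
}

sse = {name: sum_squared_errors(xs, ys, m, c)
       for name, (m, c) in candidates.items()}

for name, value in sse.items():
    print(f"{name}: {value:.2f}")

# The best-fit line is the candidate with the smallest sum of squared errors
best = min(sse, key=sse.get)
print("best fit:", best)
```

In practice the least-squares formulas give the minimizing line directly, so an explicit search over candidates like this is only useful for building intuition.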