Categories
3. R programming language in Data science

Linear Regression with R language

Linear regression is a statistical technique that is used to find relationships between a dependent variable and one or more independent variables. It is used to predict the outcome of a continuous (numeric) variable. It is widely used for stock market analysis, weather forecasting, and sales predictions.

Linear regression is applied in two steps:

1. Estimate the relationship between two variables.

Examples: Does body weight influence the blood cholesterol level? Will the size of the house affect house prices?

2. Predict the value of the dependent variable based on other independent variables.

The simplest form of a simple linear regression equation with one dependent and an independent variable is shown using the following formula:

y

Where y is the dependent variable, x is the independent variable, m is the slope, and c is the intercept/coefficient of the line.

The slope m is represented as:

slope-m

Below are the two types of linear regression:

simple

Let’s understand the intuition behind the regression line by using an example:

The table on the left represents the data; the data points are plotted on the graph on the right. 

variables

The next step is to calculate the mean of X and Y and plot the values on the graph.

Here, the mean of X is three, and the mean of Y is five.

The regression line should ideally pass through the mean of X and Y.

regression-line.

Now, we need to draw the equation of the regression line. For that, we need to calculate the following parameters.

Based on the calculated values, the values of slope (m) and coefficient (c) are solved.

coefficient

Let’s calculate the predicted values of Y for corresponding values of X using the linear equation where m=1.3 and c=1.1.

y-pred

The best fit line should have the least sum of squares of these errors, also known as e square.

e-square

The sum of the squared errors for this regression line is 3.9. We check this error for each line and conclude the best fit line having the least e square value.

Leave a Reply

Your email address will not be published. Required fields are marked *