3. R programming language in Data science

Linear Regression Analysis in R language

In this analysis, we’ll use R’s built-in cars dataset, which records the speed of cars and their stopping distances, to study the relationship between the two variables.

head(cars) – Displays the top six rows of the data frame


str(cars) – Displays the structure of the data frame (50 observations and two variables)


plot(cars) – Draws a scatter plot of stopping distance (dist) against speed


plot(cars$speed, cars$dist) – Draws the same scatter plot with the x and y variables named explicitly
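The exploration commands above can be run together as a short script. This is a minimal sketch; the axis labels and plot title are our own additions (the units come from the dataset's help page, ?cars).

```r
# cars ships with base R (package "datasets"), so nothing needs installing
data(cars)

head(cars)   # first six rows; the columns are speed and dist
str(cars)    # data frame with 50 observations of 2 numeric variables

# Scatter plot with explicit axes; the label text is an illustrative choice
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "cars: stopping distance vs speed")
```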

Correlation analysis studies the strength of the relationship between two continuous variables. It involves computing the correlation coefficient between the two variables.


If one variable consistently increases as the other increases, the two have a strong positive correlation (a coefficient close to +1).
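As a quick illustration, the correlation coefficient for the cars data can be computed with base R's cor() function, which returns the Pearson coefficient by default. The variable name r is just a placeholder.

```r
# Pearson correlation between speed and stopping distance
r <- cor(cars$speed, cars$dist)
print(r)   # roughly 0.81, i.e. a strong positive correlation
```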

Let’s fit a linear regression model on the entire dataset to estimate the coefficients:
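A minimal sketch of this step uses lm() with dist as the response and speed as the predictor. The object name linearMod is an assumption made for illustration.

```r
# Fit dist as a linear function of speed on all 50 rows
linearMod <- lm(dist ~ speed, data = cars)

# Named vector holding the intercept and the slope for speed
print(coef(linearMod))
```

On the full dataset this gives an intercept of about -17.58 and a slope of about 3.93, so the fitted line is approximately dist = -17.58 + 3.93 * speed.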


We can use the model to predict the dependent variable if the model is “statistically significant”.


By convention, the p-value should be less than 0.05 for the model to be considered statistically significant.
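One way to check this in code is to read the p-values out of the model summary. The sketch below assumes a model fitted on the full dataset; the names fit, p_speed and p_model are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
p_speed <- coef_table["speed", "Pr(>|t|)"]   # p-value for the slope

# p-value of the overall F-test, computed from the stored F statistic
f <- s$fstatistic
p_model <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

print(p_speed < 0.05)   # TRUE when the slope is significant at the 5% level
print(p_model < 0.05)   # TRUE when the model as a whole is significant
```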

Split the dataset into training and test sets:
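The article does not state the split ratio, so the sketch below assumes a random 80/20 split with a fixed seed for reproducibility. The seed value and the names trainingData and testData are our assumptions.

```r
set.seed(100)                      # assumed seed, only for reproducibility
n <- nrow(cars)

# Randomly pick 80% of the row indices for training (assumed ratio)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

trainingData <- cars[train_idx, ]  # 40 rows used for fitting
testData     <- cars[-train_idx, ] # remaining 10 rows held out for testing
```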

Fit the model on the training data and predict ‘dist’ on the test data:
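A sketch of the fit-and-predict step follows. It repeats an assumed 80/20 split so that it runs on its own; lmMod is the model name used in this article, while distPred and the split details are assumptions.

```r
# Assumed 80/20 random split (ratio and seed are not given in the article)
set.seed(100)
train_idx <- sample(seq_len(nrow(cars)), size = floor(0.8 * nrow(cars)))
trainingData <- cars[train_idx, ]
testData     <- cars[-train_idx, ]

# Fit on the training rows only
lmMod <- lm(dist ~ speed, data = trainingData)

# Predict stopping distance for the held-out rows
distPred <- predict(lmMod, newdata = testData)
print(head(distPred))
```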


Review model diagnostic measures: summary(lmMod)
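Besides printing the full summary, individual diagnostics can be pulled out of the summary object. The sketch below refits lmMod on an assumed 80/20 split so it is self-contained; the choice of which measures to show is ours.

```r
# Assumed 80/20 random split (ratio and seed are not given in the article)
set.seed(100)
train_idx <- sample(seq_len(nrow(cars)), size = floor(0.8 * nrow(cars)))
trainingData <- cars[train_idx, ]
lmMod <- lm(dist ~ speed, data = trainingData)

modelSummary <- summary(lmMod)
print(modelSummary)                  # coefficients, p-values, R-squared, F-statistic

print(modelSummary$r.squared)        # share of variance explained
print(modelSummary$adj.r.squared)    # R-squared adjusted for the number of predictors
print(modelSummary$sigma)            # residual standard error
print(AIC(lmMod))                    # Akaike information criterion (lower is better)
```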


A simple correlation between the actual and predicted values can be used to measure accuracy:
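This check might look like the sketch below, which places the actual and predicted values side by side and correlates them. The split, the seed and the names actuals_preds and correlation_accuracy are assumptions.

```r
# Assumed 80/20 random split (ratio and seed are not given in the article)
set.seed(100)
train_idx <- sample(seq_len(nrow(cars)), size = floor(0.8 * nrow(cars)))
trainingData <- cars[train_idx, ]
testData     <- cars[-train_idx, ]

lmMod <- lm(dist ~ speed, data = trainingData)
distPred <- predict(lmMod, newdata = testData)

# Actual and predicted stopping distances side by side
actuals_preds <- data.frame(actuals = testData$dist, predicteds = distPred)

# A value near +1 means the predictions move in step with the actual values
correlation_accuracy <- cor(actuals_preds$actuals, actuals_preds$predicteds)
print(correlation_accuracy)
```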


You can compute all the error metrics in one go using the regr.eval() function in the DMwR package. Run install.packages("DMwR") first if you have not installed the package before.
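If DMwR is not available for your R installation, the same kind of metrics (MAE, MSE, RMSE and MAPE) can be computed by hand in base R. The sketch below is not the package's implementation; the split, the seed and the variable names are assumptions, and the regr.eval() call is left commented out because it needs the package.

```r
# Assumed 80/20 random split (ratio and seed are not given in the article)
set.seed(100)
train_idx <- sample(seq_len(nrow(cars)), size = floor(0.8 * nrow(cars)))
trainingData <- cars[train_idx, ]
testData     <- cars[-train_idx, ]

lmMod <- lm(dist ~ speed, data = trainingData)
actuals    <- testData$dist
predicteds <- predict(lmMod, newdata = testData)

errors <- actuals - predicteds
mae  <- mean(abs(errors))             # mean absolute error
mse  <- mean(errors^2)                # mean squared error
rmse <- sqrt(mse)                     # root mean squared error
mape <- mean(abs(errors / actuals))   # mean absolute percentage error (as a fraction)

print(c(mae = mae, mse = mse, rmse = rmse, mape = mape))

# With the package installed, the equivalent call would be:
# DMwR::regr.eval(actuals, predicteds)
```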


Now that we have seen how linear regression works in R, let’s learn about decision trees.
