Let’s compare linear regression to logistic regression and take a look at the trendline that describes the model.
In the linear regression graph above, the trendline is a straight line, which is why it is called linear regression. However, linear regression cannot divide the output into two distinct categories, such as yes or no. To split the results into two categories, you would have to clip the line between 0 and 1: probabilities can only take values between 0 and 1, so if the y-axis is to represent a probability, nothing on it can fall below 0 or rise above 1.
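To make the clipping problem concrete, here is a small Python sketch (the intercept b0 and slope b1 are made-up illustration values, not from any fitted model) showing that a straight line produces values outside [0, 1], and that forcing them back in destroys the straight-line shape:

```python
# Hypothetical straight-line model: y = b0 + b1 * x
b0, b1 = -0.5, 0.25

xs = [0, 2, 4, 6, 8]
raw = [b0 + b1 * x for x in xs]   # raw line values; some fall below 0 or above 1
print(raw)

# Clipping forces every value into [0, 1], but the result is no
# longer a straight line -- it now has flat segments at 0 and 1.
clipped = [min(1.0, max(0.0, y)) for y in raw]
print(clipped)
```

The flat segments introduced by clipping are exactly why no single linear equation can describe the clipped curve.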
Thus you would have to clip the line, and once you clip it, the resulting curve can no longer be represented by a linear equation.
For logistic regression, you will make use of a sigmoid function, and the sigmoid curve is the line of best fit. Notice that it is not linear, but it does satisfy the requirement of being a single curve that does not need to be clipped.
For linear regression, you would use an equation of a straight line:
y = b0 + b1*x,
where x is the independent variable and y is the dependent variable.
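In code, evaluating the straight-line equation is a one-liner; this Python sketch uses arbitrary example coefficients (b0 = 1.0, b1 = 2.0), since no fitted values are given here:

```python
def predict_linear(x, b0=1.0, b1=2.0):
    """Straight-line model y = b0 + b1 * x (coefficients are illustrative)."""
    return b0 + b1 * x

print(predict_linear(3))  # 1.0 + 2.0 * 3 = 7.0
```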
Because you cannot use a linear equation for binary predictions, you need to use the sigmoid function, which is represented by the equation:
p = 1/(1 + e^(-y)),
where e is the base of the natural logarithm.
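The same formula as a small Python sketch; note that the output stays strictly between 0 and 1 no matter how extreme y becomes, which is exactly the property the clipped straight line lacked:

```python
import math

def sigmoid(y):
    """p = 1 / (1 + e^(-y)) -- squashes any real y into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

print(sigmoid(0))     # 0.5, the midpoint of the curve
print(sigmoid(10))    # very close to 1
print(sigmoid(-10))   # very close to 0
```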
Rearranging this equation and taking the natural log of both sides gives the log-odds form: the odds are p/(1 - p) = e^y, so ln(p/(1 - p)) = y = b0 + b1*x. In other words, the straight line from linear regression models the log-odds, and solving back for p gives the sigmoid function. By graphing it, you get the logistic regression line of best fit.
Next, let us get more clarity on logistic regression with a worked example in R.