
# Why Regression?

Let’s say you run a website whose revenue depends on its traffic, and you want to predict revenue from that traffic. Intuitively, you would assume that the more traffic is driven to your website, the higher your revenue will be.

In a plot of revenue versus website traffic, traffic is the independent variable and revenue is the dependent variable. The independent variable is also called the explanatory variable, and the dependent variable is also called the response variable, although the terms independent and dependent are more common. Our intuition tells us that the independent variable drives the dependent variable. If there is a relationship between the two, you can use the independent variable to make predictions about the dependent variable.

This chart shows a clear trend between website traffic and revenue: as traffic increases, revenue increases. You can draw a line to show that relationship and then use that line as a predictor. For example, what will revenue be if your traffic is 4,500? Draw a vertical line from 4.5K on the x-axis (the traffic axis) up to the orange regression line, sometimes called the line of best fit. Then draw a horizontal line from that point over to the y-axis (the revenue axis) and see where it lands. You can see that when traffic is around 4,500, revenue is around 13,000.

Usually, you wouldn’t draw those lines. Instead, you would generate an equation, which you would call a model, and plug a value of the independent variable into it. The output is the estimated value of the dependent variable, which you would call your prediction.
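To make the idea of "an equation as a model" concrete, here is a minimal sketch in plain Python. It assumes the model is a straight line, revenue = slope × traffic + intercept, fitted by ordinary least squares. The function names `fit_line` and `predict` and the six data points are made up for illustration; they are not the data behind the chart.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(traffic, slope, intercept):
    """Plug the independent variable into the fitted equation."""
    return slope * traffic + intercept


# Hypothetical observations (illustrative values only).
traffic = [1000, 2000, 3000, 4000, 5000, 6000]
revenue = [4300, 6700, 9400, 11600, 14400, 16600]

slope, intercept = fit_line(traffic, revenue)
print(f"revenue = {slope:.2f} * traffic + {intercept:.0f}")
print(f"predicted revenue at 4,500 visits: {predict(4500, slope, intercept):.0f}")
```

With these made-up numbers the prediction for a traffic of 4,500 comes out close to 13,000, which mirrors reading the value off the line of best fit by eye. In practice you would more likely reach for a library routine than write the fit by hand, but the arithmetic is the same.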