Regression analysis is a helpful statistical method that can be leveraged across an organization to determine the degree to which particular independent variables are influencing dependent variables.
The possible scenarios for conducting regression analysis to yield valuable, actionable business insights are endless.
The next time someone in your business proposes a hypothesis that one factor, whether or not you can control it, is affecting a portion of the business, suggest performing a regression analysis to determine just how confident you should be in that hypothesis! This will allow you to make more informed business decisions, allocate resources more efficiently, and ultimately boost your bottom line.
In order to conduct a regression analysis, you’ll need to define a dependent variable that you hypothesize is being influenced by one or several independent variables.
You’ll then need to establish a comprehensive dataset to work with. Administering surveys to your audiences of interest is a terrific way to establish this dataset. Your survey should include questions addressing all of the independent variables that you are interested in.
Let’s continue using our application training example. In this case, we’d want to measure the historical levels of satisfaction with the events from the past three years or so (or however long a period you judge gives you enough data), as well as any available information about the independent variables.
Perhaps we’re particularly curious about how the price of a ticket to the event has impacted levels of satisfaction.
To investigate whether there is a relationship between these two variables, we would begin by plotting the data points on a chart, which would look like the following theoretical example.
(Plotting your data is the first step in figuring out if there is a relationship between your independent and dependent variables)
Our dependent variable (in this case, the level of event satisfaction) should be plotted on the y-axis, while our independent variable (the price of the event ticket) should be plotted on the x-axis.
Once your data is plotted, you may begin to see correlations. If the theoretical chart above did indeed represent the impact of ticket prices on event satisfaction, then we’d be able to confidently say that the higher the ticket price, the higher the levels of event satisfaction.
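A quick way to put a number on what the plot suggests is the correlation coefficient. The sketch below assumes six made-up events; the prices and satisfaction scores are illustrative values, not data from this article.

```python
import numpy as np

# Hypothetical data: ticket price (x) and satisfaction score (y) for six events.
ticket_price = np.array([20.0, 35.0, 50.0, 65.0, 80.0, 95.0])
satisfaction = np.array([62.0, 70.0, 71.0, 80.0, 84.0, 90.0])

# Pearson correlation coefficient: +1 is a perfect upward line,
# -1 a perfect downward line, and 0 means no linear relationship.
r = np.corrcoef(ticket_price, satisfaction)[0, 1]
print(f"correlation between price and satisfaction: {r:.2f}")
```

A value close to +1 matches the upward pattern described above, but it only says the two variables move together, not how much satisfaction changes per unit of price.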
But how can we tell the degree to which ticket price affects event satisfaction?
To begin answering this question, draw a line through the middle of all of the data points on the chart. This line is referred to as your regression line, and it can be precisely calculated using a standard statistics program like Excel.
We’ll use a theoretical chart once more to depict what a regression line should look like.
The regression line represents the relationship between your independent variable and your dependent variable.
Excel will even provide the equation of the line, including its slope, which adds further context to the relationship between your independent and dependent variables.
The formula for a regression line might look something like Y = 100 + 7X + error term.
This tells you that if there is no “X”, then Y = 100. If X is our increase in ticket price, this informs us that if there is no increase in ticket price, event satisfaction would still be 100 points. The slope of 7 tells you that each one-unit increase in X is associated with a 7-point increase in Y.
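Outside Excel, the same kind of line can be estimated with an ordinary least-squares fit. Here is a minimal sketch in Python using NumPy; the five data points are invented so that the fitted line comes out as Y = 100 + 7X.

```python
import numpy as np

# Hypothetical data scattered around the line Y = 100 + 7X.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
scatter = np.array([1.0, -2.0, 0.0, 2.0, -1.0])  # made-up deviations from the line
y = 100 + 7 * x + scatter

# Fit a straight line by least squares.
# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
print(f"Y = {intercept:.1f} + {slope:.1f} * X")

# Use the fitted line to estimate Y for a new value of X.
predicted_at_3 = intercept + slope * 3.0
print(f"estimated Y at X = 3: {predicted_at_3:.1f}")
```

The intercept is the estimated value of Y when X is zero, and the slope is the estimated change in Y for each one-unit change in X.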
You’ll notice that the regression formula calculated by Excel includes an error term. Regression lines always include an error term because, in reality, independent variables are never perfect predictors of dependent variables. This makes sense when looking at the impact of ticket prices on event satisfaction: there are clearly other variables contributing to event satisfaction besides price.
Your regression line is simply an estimate based on the data available to you. So, the larger your error term, the less certain your regression line is.
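One common way to gauge how large the error term is in practice is to look at the residuals, the gaps between each observed value and the line. The sketch below assumes the same kind of made-up data as a simple straight-line fit and reports the residual standard error, which is one of several possible summaries.

```python
import numpy as np

# Hypothetical data scattered around a straight line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([101.0, 105.0, 114.0, 123.0, 127.0])

slope, intercept = np.polyfit(x, y, deg=1)

# Residuals: observed value minus the value the line predicts.
residuals = y - (intercept + slope * x)

# Residual standard error: typical size of a residual.
# Divide by n - 2 because two quantities (slope, intercept) were estimated.
n = len(x)
residual_std_error = np.sqrt(np.sum(residuals ** 2) / (n - 2))
print(f"typical miss of the line: {residual_std_error:.2f} points")
```

The larger this number is relative to the range of Y, the less weight the fitted line deserves on its own.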
Regression analysis is the “go-to method in analytics,” says Redman. And smart companies use it to make decisions about all sorts of business issues. “As managers, we want to figure out how we can impact sales or employee retention or recruiting the best people. It helps us figure out what we can do.”
Most companies use regression analysis to explain a phenomenon they want to understand (e.g. why did customer service calls drop last month?); predict things about the future (e.g. what will sales look like over the next six months?); or to decide what to do (e.g. should we go with this promotion or a different one?).
As a consumer of regression analysis, there are several things you need to keep in mind.
First, don’t tell your data analyst to go out and figure out what is affecting sales. “The way most analyses go haywire is the manager hasn’t narrowed the focus on what he or she is looking for,” says Redman. It’s your job to identify the factors that you suspect are having an impact and ask your analyst to look at those. “If you tell a data scientist to go on a fishing expedition, or to tell you something you don’t know, then you deserve what you get, which is bad analysis,” he says. In other words, don’t ask your analysts to look at every variable they can possibly get their hands on all at once. If you do, you’re likely to find relationships that don’t really exist. It’s the same principle as flipping a coin: do it enough times, you’ll eventually think you see something interesting, like a bunch of heads all in a row.
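The fishing-expedition problem is easy to reproduce. The simulation below is an illustrative sketch, not an analysis from the article: it generates a "sales" series and 1,000 candidate variables that are all pure random noise, then counts how many candidates look related to sales anyway. The 0.444 cutoff is approximately the correlation that a conventional 5% significance test would flag with 20 observations.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_months = 20        # hypothetical number of observations
n_candidates = 1000  # hypothetical number of variables thrown at the analyst

# Everything here is random noise: no candidate truly affects sales.
sales = rng.normal(size=n_months)
candidates = rng.normal(size=(n_candidates, n_months))

# Pearson correlation of every candidate with sales, computed in one pass.
sales_centered = sales - sales.mean()
cand_centered = candidates - candidates.mean(axis=1, keepdims=True)
r = (cand_centered @ sales_centered) / (
    np.linalg.norm(cand_centered, axis=1) * np.linalg.norm(sales_centered)
)

# Approximate |r| that a two-sided 5% test would call significant at n = 20.
threshold = 0.444
n_spurious = int(np.sum(np.abs(r) > threshold))
print(f"{n_spurious} of {n_candidates} unrelated variables look 'significant'")
print(f"strongest spurious correlation: {np.max(np.abs(r)):.2f}")
```

With enough candidates, some will clear the bar by chance alone, which is why the list of suspected factors should be narrowed before the analysis starts.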
Also keep in mind whether or not you can do anything about the independent variable you’re considering. You can’t change how much it rains so how important is it to understand that? “We can’t do anything about weather or our competitor’s promotion but we can affect our own promotions or add features, for example,” says Redman. Always ask yourself what you will do with the data. What actions will you take? What decisions will you make?
Second, “analyses are very sensitive to bad data” so be careful about the data you collect and how you collect it, and know whether you can trust it. “All the data doesn’t have to be correct or perfect,” explains Redman but consider what you will be doing with the analysis. If the decisions you’ll make as a result don’t have a huge impact on your business, then it’s OK if the data is “kind of leaky.” But “if you’re trying to decide whether to build 8 or 10 of something and each one costs $1 million to build, then it’s a bigger deal,” he says. The chart below explains how to think about whether to act on the data.
Redman says that some managers who are new to understanding regression analysis make the mistake of ignoring the error term. This is dangerous because they’re treating the relationship between the variables as more certain than it is. “Oftentimes the results spit out of a computer and managers think, ‘That’s great, let’s use this going forward.’” But remember that the results are always uncertain. As Redman points out, “If the regression explains 90% of the relationship, that’s great. But if it explains 10%, and you act like it’s 90%, that’s not good.” The point of the analysis is to quantify the certainty that something will happen. “It’s not telling you how rain will influence your sales, but it’s telling you the probability that rain may influence your sales.”
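One common reading of "explains 90% of the relationship" is the R-squared statistic: the share of the variation in the dependent variable that the fitted line accounts for. The sketch below assumes that reading and uses two made-up datasets with the same underlying line but different amounts of scatter; `r_squared` is a helper name introduced here for illustration.

```python
import numpy as np

def r_squared(x, y):
    """Share of the variation in y accounted for by a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    predicted = intercept + slope * x
    unexplained = np.sum((y - predicted) ** 2)  # variation left in the residuals
    total = np.sum((y - y.mean()) ** 2)         # total variation in y
    return 1.0 - unexplained / total

# Hypothetical data: same line, small scatter versus large scatter.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
scatter = np.array([1.0, -2.0, 0.0, 2.0, -1.0])
y_tight = 100 + 7 * x + scatter
y_loose = 100 + 7 * x + 10 * scatter

r2_tight = r_squared(x, y_tight)
r2_loose = r_squared(x, y_loose)
print(f"small scatter: R^2 = {r2_tight:.2f}")
print(f"large scatter: R^2 = {r2_loose:.2f}")
```

Both fits produce the same line, so the equation alone does not reveal how much to trust it; the R-squared value is what separates the two cases.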
The last mistake that Redman warns against is letting data replace your intuition.
“You always have to lay your intuition on top of the data,” he explains. Ask yourself whether the results fit with your understanding of the situation. And if you see something that doesn’t make sense ask whether the data was right or whether there is indeed a large error term. Redman suggests you look to more experienced managers or other analyses if you’re getting something that doesn’t make sense. And, he says, never forget to look beyond the numbers to what’s happening outside your office: “You need to pair any analysis with study of real world. The best scientists — and managers — look at both.”
Regression analysis comes with several applications in finance. For example, the statistical method is fundamental to the Capital Asset Pricing Model (CAPM). Essentially, the CAPM equation is a model that determines the relationship between the expected return of an asset and the market risk premium.
The analysis is also used to forecast the returns of securities, based on different factors, or to forecast the performance of a business.
1. Beta and CAPM
In finance, regression analysis is used to calculate the Beta (volatility of returns relative to the overall market) for a stock. It can be done in Excel using the SLOPE function.
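As a rough Python counterpart to the SLOPE approach, Beta can be computed as the slope of the stock's returns regressed on the market's returns, which equals their covariance divided by the variance of the market. The return series, risk-free rate, and expected market return below are all made-up assumptions for illustration.

```python
import numpy as np

# Hypothetical periodic returns for the market and for one stock.
market_returns = np.array([0.010, -0.020, 0.030, 0.000, 0.020])
stock_returns = np.array([0.018, -0.030, 0.044, 0.002, 0.031])

# Beta is the slope of stock returns regressed on market returns:
# covariance(stock, market) / variance(market).
covariance = np.cov(stock_returns, market_returns, ddof=1)[0, 1]
market_variance = np.var(market_returns, ddof=1)
beta = covariance / market_variance
print(f"beta: {beta:.2f}")

# CAPM: expected return = risk-free rate + beta * market risk premium.
risk_free_rate = 0.03          # assumed annual rate
expected_market_return = 0.08  # assumed annual market return
expected_return = risk_free_rate + beta * (expected_market_return - risk_free_rate)
print(f"CAPM expected return: {expected_return:.3f}")
```

A Beta above 1 means the stock has tended to move more than the market in the sample; the result depends heavily on the period and return frequency chosen.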
When forecasting financial statements for a company, it may be useful to do a multiple regression analysis to determine how changes in certain assumptions or drivers of the business will impact revenue or expenses in the future. For example, there may be a very high correlation between the number of salespeople employed by a company, the number of stores they operate, and the revenue the business generates.
The above example shows how to use the FORECAST function in Excel to calculate a company’s revenue, based on the number of ads it runs.
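A linear forecast of this kind can be sketched in Python as well. The function below is a rough stand-in for Excel's FORECAST and mirrors its argument order (the new x, then the known y values, then the known x values); the name `linear_forecast` and the ad and revenue figures are hypothetical.

```python
import numpy as np

def linear_forecast(new_x, known_y, known_x):
    """Predict y at new_x from a straight line fitted to the known points."""
    slope, intercept = np.polyfit(known_x, known_y, deg=1)
    return intercept + slope * new_x

# Hypothetical history: number of ads run and revenue in each period.
ads_run = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
revenue = np.array([76.0, 98.0, 125.0, 152.0, 174.0])

# Forecast revenue for a period in which 60 ads are planned.
forecast_revenue = linear_forecast(60.0, revenue, ads_run)
print(f"forecast revenue at 60 ads: {forecast_revenue:.1f}")
```

Note that 60 ads lies outside the range of the historical data, so the forecast assumes the straight-line pattern keeps holding beyond what has been observed.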
Excel remains a popular tool for conducting basic regression analysis in finance; however, there are many more advanced statistical tools that can be used.
Python and R are both powerful coding languages that have become popular for all types of financial modeling, including regression. These techniques form a core part of data science and machine learning where models are trained to detect these relationships in data.
Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model. The mathematical representation of multiple linear regression is:
Y = a + bX1 + cX2 + dX3 + ϵ
Y – Dependent variable
X1, X2, X3 – Independent (explanatory) variables
a – Intercept
b, c, d – Slopes
ϵ – Residual (error)
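The coefficients a, b, c, and d can be estimated by least squares. The sketch below uses NumPy's `lstsq` on invented data; the values are generated without any residual so that the fit recovers the chosen coefficients exactly, which real data would not do.

```python
import numpy as np

# Hypothetical observations of three independent variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
x3 = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0])

# Made-up "true" relationship with no residual: Y = 10 + 2*X1 + 3*X2 - 1*X3.
y = 10 + 2 * x1 + 3 * x2 - 1 * x3

# Design matrix: a column of ones for the intercept, then one column per variable.
design = np.column_stack([np.ones_like(x1), x1, x2, x3])

# Least-squares estimate of [a, b, c, d].
coefficients, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)
a, b, c, d = coefficients
print(f"Y = {a:.2f} + {b:.2f}*X1 + {c:.2f}*X2 + {d:.2f}*X3")
```

Each slope is read as the estimated change in Y for a one-unit change in that variable while the other variables are held fixed.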
Multiple linear regression follows the same conditions as the simple linear model. However, since there are several independent variables in multiple linear analysis, there is another mandatory condition for the model:
Non-collinearity: Independent variables should show minimal correlation with each other. If the independent variables are highly correlated with each other, it will be difficult to assess the true relationships between the dependent and independent variables.
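A simple first check for collinearity is to look at the pairwise correlations between the independent variables before fitting the model. The sketch below uses made-up figures for three drivers, and the 0.8 cutoff is an arbitrary assumption rather than a standard; a fuller check would also look at variance inflation factors.

```python
import numpy as np

# Hypothetical independent variables for six periods.
names = ["salespeople", "stores", "ad_spend"]
data = np.vstack([
    [10.0, 16.0, 19.0, 26.0, 29.0, 36.0],  # salespeople
    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],        # stores
    [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],        # ad_spend
])

# Correlation matrix: each row of `data` is treated as one variable.
correlations = np.corrcoef(data)

# Flag pairs whose correlation is strong enough to blur their separate effects.
cutoff = 0.8  # assumed threshold for illustration
flagged = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(correlations[i, j]) > cutoff:
            flagged.append((names[i], names[j]))
            print(f"{names[i]} and {names[j]}: r = {correlations[i, j]:.2f}")
```

When a pair is flagged, a typical response is to drop one of the two variables or combine them, since the model cannot reliably tell their effects apart.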
Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. It can be utilized to assess the strength of the relationship between variables and for modeling the future relationship between them.
Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.
Regression analysis offers numerous applications in various disciplines, including finance.