Categories
What is Linear Regression?

Linear Regression Test Value: Steps

Sample question: Given a set of data with sample size 8 and r = 0.454, find the linear regression test value.

Note: r is the correlation coefficient.

Step 1: Find r, the correlation coefficient, unless it has already been given to you in the question. In this case, r is given (r = .454). Not sure how to find r? See: Correlation Coefficient for steps on how to find r.

Step 2: Use the following formula to compute the test value (n is the sample size):

T = r√((n – 2)/(1 – r²))

How to solve the formula:

  1. Replace the variables with your numbers:
     T = .454√((8 – 2)/(1 – [.454]²))
  2. Subtract 2 from n:
     8 – 2 = 6
  3. Square r:
     .454 × .454 = .206116
  4. Subtract step (3) from 1:
     1 – .206116 = .793884
  5. Divide step (2) by step (4):
     6 / .793884 = 7.557779
  6. Take the square root of step (5):
     √7.557779 = 2.74914154
  7. Multiply r by step (6):
     .454 × 2.74914154 = 1.24811026

The Linear Regression Test value, T = 1.24811026

That’s it!
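The same arithmetic can be checked in a few lines of Python. This is a minimal sketch, assuming the test value is T = r√((n – 2)/(1 – r²)) as used in the worked steps. The function name `regression_test_value` is made up for illustration and is not part of any library.

```python
import math


def regression_test_value(r: float, n: int) -> float:
    """Return T = r * sqrt((n - 2) / (1 - r^2)) for correlation r and sample size n."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Sample question from the text: n = 8, r = 0.454
t_value = regression_test_value(r=0.454, n=8)
print(round(t_value, 4))  # 1.2481
```

The two guard clauses only protect against a division by zero or a negative square root. They are an addition for safety, not part of the original steps.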

Linear Regression Equation Microsoft Excel: Steps

Step 1: Install the Data Analysis Toolpak, if it isn’t already installed. For instructions on how to load the Data Analysis Toolpak, click here.

Step 2: Type your data into two columns in Excel. For example, type your “x” data into column A and your “y” data into column B. Do not leave any blank cells between your entries.

Step 3: Click the “Data” tab on the Excel ribbon, then click “Data Analysis.”

Step 4: Click “Regression” in the pop-up window and then click “OK.”

Step 5: Select your input Y range. You can do this two ways: either select the data in the worksheet or type the location of your data into the “Input Y Range” box. For example, if your Y data is in B2 through B10, then type “B2:B10” into the Input Y Range box.

Step 6: Select your input X range by selecting the data in the worksheet or typing the location of your data into the “Input X Range” box.

Step 7: Select the location where you want your output range to go by selecting a blank area in the worksheet or typing the location of where you want your data to go in the “Output Range” box.

Step 8: Click “OK”. Excel will calculate the linear regression and populate your worksheet with the results.

Tip: The linear regression equation information is given in the last output set (the “Coefficients” column). The first entry in the “Intercept” row is “a” (the y-intercept) and the first entry in the “X Variable” row is “b” (the slope).

How to Find a Linear Regression Equation: Steps

Step 1: Make a chart of your data, filling in the columns in the same way as you would fill in the chart if you were finding the Pearson’s Correlation Coefficient.

SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY      X²      Y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Σ         247       486                 20485   11409   40022

From the above table, Σx = 247, Σy = 486, Σxy = 20485, Σx² = 11409, Σy² = 40022. n is the sample size (6, in our case).

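Step 2: Use the following equations to find a and b.

a = ((Σy)(Σx²) – (Σx)(Σxy)) / (n(Σx²) – (Σx)²)
b = (n(Σxy) – (Σx)(Σy)) / (n(Σx²) – (Σx)²)

For this data set, the results are:

a = 65.1416
b = .385225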

Find a:

  • ((486 × 11,409) – (247 × 20,485)) / (6(11,409) – 247²)
  • 484,979 / 7,445
  • = 65.14

Find b:

  • (6(20,485) – (247 × 486)) / (6(11,409) – 247²)
  • (122,910 – 120,042) / (68,454 – 61,009)
  • 2,868 / 7,445
  • = .385225

Step 3: Insert the values into the equation.
y’ = a + bx
y’ = 65.14 + .385225x

That’s how to find a linear regression equation by hand!
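To double-check the hand calculation, the sums and the two formulas can be reproduced in Python. This is a minimal sketch using the age and glucose values from the table. The helper name `fit_line` is made up for illustration.

```python
# Age (x) and glucose level (y) for the six subjects in the table
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]


def fit_line(xs, ys):
    """Return (a, b) for y' = a + bx using the sum-based formulas from Step 2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


a, b = fit_line(ages, glucose)
print(f"y' = {a:.4f} + {b:.6f}x")  # y' = 65.1416 + 0.385225x
```

Both coefficients share the same denominator, n(Σx²) – (Σx)², so it is computed once.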

The Linear Regression Equation

Linear regression is a way to model the relationship between two variables. You might also recognize the equation as the slope-intercept form of a line. The equation has the form Y = a + bX, where Y is the dependent variable (the variable that goes on the Y axis), X is the independent variable (the variable plotted on the X axis), b is the slope of the line, and a is the y-intercept.

The first step in finding a linear regression equation is to determine if there is a relationship between the two variables. This is often a judgment call for the researcher. You’ll also need a list of your data in x-y format (i.e. two columns of data—independent and dependent variables).

Warnings:

  1. Just because two variables are related, it does not mean that one causes the other. For example, although there is a relationship between high GRE scores and better performance in grad school, it doesn’t mean that high GRE scores cause good grad school performance.
  2. If you try to find a linear regression equation for a set of data (especially through an automated program like Excel or a TI-83), you will find one, but that does not necessarily mean the equation is a good fit for your data. One technique is to make a scatter plot first, to see if the data roughly fits a line before you try to find a linear regression equation.

Linear regression example

We have 50 parts with various inside diameters, outside diameters, and widths. Parts are cleaned using one of three container types. Cleanliness is a measure of the particulates on the parts. This is measured before and after running the parts through the cleaning process. The response of interest is Removal. This is the difference between pre-cleaning and post-cleaning measures.

We’re interested in whether the inside diameter, outside diameter, part width, and container type have an effect on the cleanliness, but we’re also interested in the nature of these effects. The relationship we develop linking the predictors to the response is a statistical model or, more specifically, a regression model.

The term regression describes a general collection of techniques used in modeling a response as a function of predictors. The only regression models that we’ll consider in this discussion are linear models.

An example of a linear model for the cleaning data is shown below.

Removal = 10 + 1.2 × OD + 0.2 × Width

In this model, if the outside diameter increases by 1 unit, with the width remaining fixed, the removal increases by 1.2 units. Likewise, if the part width increases by 1 unit, with the outside diameter remaining fixed, the removal increases by 0.2 units. This model enables us to predict removal for parts with given outside diameters and widths.

For example, the predicted removal for parts with an outside diameter of 5 and a width of 3 is 16.6 units. In this example, we have two continuous predictors. When more than one predictor is used, the procedure is called multiple linear regression.
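The prediction above can be reproduced with a short Python sketch. The coefficients 1.2 and 0.2 come from the text. The intercept of 10 is an assumption: it is the only value consistent with the stated prediction of 16.6 for an outside diameter of 5 and a width of 3. The function name `predict_removal` is made up for illustration.

```python
def predict_removal(outside_diameter, width,
                    intercept=10.0, od_coef=1.2, width_coef=0.2):
    """Predict Removal from a two-predictor linear model.

    The intercept default is inferred from the worked prediction (16.6),
    not stated directly in the text.
    """
    return intercept + od_coef * outside_diameter + width_coef * width


predicted = predict_removal(outside_diameter=5, width=3)
print(round(predicted, 1))  # 16.6

# Holding width fixed, a 1-unit increase in outside diameter
# changes the prediction by the outside-diameter coefficient.
od_effect = predict_removal(6, 3) - predict_removal(5, 3)
print(round(od_effect, 1))  # 1.2
```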

When only one continuous predictor is used, we refer to the modeling procedure as simple linear regression. For the remainder of this discussion, we’ll focus on simple linear regression.

A scatterplot indicates that there is a fairly strong positive relationship between Removal and OD (the outside diameter). To understand whether OD can be used to predict or estimate Removal, we fit a regression line. The fitted line, Removal = 4.099 + 0.528 × OD, estimates the mean of Removal for a given fixed value of OD. The value 4.099 is the intercept and 0.528 is the slope coefficient. The intercept, which is used to anchor the line, estimates Removal when the outside diameter is zero. Because diameter can’t be zero, the intercept isn’t of direct interest.

The slope coefficient estimates the average increase in Removal for a 1-unit increase in outside diameter. That is, for every 1-unit increase in outside diameter, Removal increases by 0.528 units on average.

Regression vs. ANOVA

Let’s compare regression and ANOVA. In simple linear regression, both the response and the predictor are continuous. In ANOVA, the response is continuous, but the predictor, or factor, is nominal. The results are related statistically. In both cases, we’re building a general linear model. But the goals of the analysis are different.

Regression gives us a statistical model that enables us to predict a response at different values of the predictor, including values of the predictor not included in the original data.

ANOVA measures the mean shift in the response for the different categories of the factor. As such, it’s generally used to compare means for the different levels of the factor.

When to use regression

We are often interested in understanding the relationship among several variables. Scatterplots and scatterplot matrices can be used to explore potential relationships between pairs of variables. Correlation provides a measure of the linear association between pairs of variables, but it doesn’t tell us about more complex relationships. For example, if the relationship is curvilinear, the correlation might be near zero.

[Figure: Four Scatterplots]
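A small Python sketch illustrates the point about curvilinear relationships. The data are made up for illustration: y = x² on a range symmetric around zero is perfectly predictable from x, yet its Pearson correlation is zero, while a straight-line relationship on the same x values gives a correlation of 1. The helper `pearson_r` is written out by hand here rather than taken from a library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
curved = [x ** 2 for x in xs]        # perfect U-shaped relationship
straight = [2 * x + 1 for x in xs]   # perfect linear relationship

r_curved = pearson_r(xs, curved)
r_straight = pearson_r(xs, straight)
print(r_curved, r_straight)
```

This is why a scatterplot is worth checking before relying on a correlation value alone.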

You can use regression to develop a more formal understanding of relationships between variables. In regression, and in statistical modeling in general, we want to model the relationship between an output variable, or a response, and one or more input variables, or factors.

Depending on the context, output variables might also be referred to as dependent variables, outcomes, or simply Y variables, and input variables might be referred to as explanatory variables, effects, predictors, or X variables.

We can use regression, and the results of regression modeling, to determine which variables have an effect on the response or help explain the response. This is known as explanatory modeling.

We can also use regression to predict the values of a response variable based on the values of the important predictors. This is generally referred to as predictive modeling. Or, we can use regression models for optimization, to determine settings of factors to optimize a response. Our optimization goal might be to find settings that lead to a maximum response or to a minimum response. Or the goal might be to hit a target within an acceptable window.

For example, let’s say we’re trying to improve process yield.

  • We might use regression to determine which variables contribute to high yields,
  • We might be interested in predicting process yield for future production, given values of our predictors, or
  • We might want to identify factor settings that lead to optimal yields.

We might also use the knowledge gained through regression modeling to design an experiment that will refine our process knowledge and drive further improvement.

Assumptions of simple linear regression

Simple linear regression is a parametric test, meaning that it makes certain assumptions about the data. These assumptions are:

Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t change significantly across the values of the independent variable.

Independence of observations: the observations in the dataset were collected using statistically valid sampling methods, and there are no hidden relationships among observations.

Normality: The data follows a normal distribution.

Linear regression makes one additional assumption:

The relationship between the independent and dependent variable is linear: the line of best fit through the data points is a straight line (rather than a curve or some sort of grouping factor).

If your data do not meet the assumptions of homoscedasticity or normality, you may be able to use a nonparametric test instead, such as the Spearman rank test.

Example: Data that doesn’t meet the assumptions
You think there is a linear relationship between cured meat consumption and the incidence of colorectal cancer in the U.S. However, you find that much more data has been collected at high rates of meat consumption than at low rates of meat consumption, with the result that there is much more variation in the estimate of cancer rates at the low range than at the high range. Because the data violate the assumption of homoscedasticity, linear regression is not appropriate, so you perform a Spearman rank test instead.
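For readers curious about what the Spearman approach computes, here is a minimal Python sketch of the Spearman rank correlation coefficient: rank each variable (ties get the average rank), then take the Pearson correlation of the ranks. It covers only the coefficient, not the significance test, and the data are made up for illustration. The helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Illustrative data: y always increases with x, but not along a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
rho = spearman_rho(x, y)
print(rho)
```

Because only the ordering of the values matters, the outlying 100 does not pull the coefficient away from a perfect monotonic relationship.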

If your data violate the assumption of independence of observations (e.g. if observations are repeated over time), you may be able to fit a linear mixed-effects model that accounts for the additional structure in the data.

How to perform a simple linear regression

Simple linear regression formula

The formula for a simple linear regression is:

y = B0 + B1x + e

y is the predicted value of the dependent variable (y) for any given value of the independent variable (x).

B0 is the intercept, the predicted value of y when the x is 0.

B1 is the regression coefficient – how much we expect y to change as x increases.

x is the independent variable (the variable we expect is influencing y).

e is the error of the estimate, or how much variation there is in our estimate of the regression coefficient.

Linear regression finds the line of best fit through your data by searching for the regression coefficient (B1) that minimizes the total error (e) of the model.
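The idea of minimizing total error can be sketched in Python. This example assumes the usual ordinary least squares reading, where "total error" means the sum of squared differences between the observed and predicted y values, and where the intercept (B0) is chosen together with B1. It reuses the age and glucose data from the by-hand example earlier on this page. The function names are made up for illustration.

```python
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]


def fit_line(xs, ys):
    """Return (b0, b1) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_error(xs, ys, b0, b1):
    """Total squared distance between observed y and the line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


b0, b1 = fit_line(ages, glucose)
best = sum_squared_error(ages, glucose, b0, b1)

# Nudging either coefficient away from the fitted values increases the error.
nudged = [
    sum_squared_error(ages, glucose, b0 + db0, b1 + db1)
    for db0 in (-1, 0, 1)
    for db1 in (-0.05, 0, 0.05)
    if (db0, db1) != (0, 0)
]
print(all(best < other for other in nudged))  # True
```

Statistical programs find this minimum directly with closed-form formulas like the ones in `fit_line`, rather than by trial and error.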

While you can perform a linear regression by hand, this is a tedious process, so most people use statistical programs to help them quickly analyze the data.

Simple linear regression in R

R is a free, powerful, and widely used statistical program. Download the dataset to try it yourself using our income and happiness example.

An introduction to simple linear regression

Published on February 19, 2020 by Rebecca Bevans. Revised on October 26, 2020.

Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.

Simple linear regression is used to estimate the relationship between two quantitative variables. You can use simple linear regression when you want to know:

How strong the relationship is between two variables (e.g. the relationship between rainfall and soil erosion).

The value of the dependent variable at a certain value of the independent variable (e.g. the amount of soil erosion at a certain level of rainfall).

Example: You are a social researcher interested in the relationship between income and happiness. You survey 500 people whose incomes range from $15k to $75k and ask them to rank their happiness on a scale from 1 to 10.

Your independent variable (income) and dependent variable (happiness) are both quantitative, so you can do a regression analysis to see if there is a linear relationship between them.