# Linear regression example

We have 50 parts with various inside diameters, outside diameters, and widths. Each part is cleaned using one of three container types. Cleanliness, a measure of the particulates on a part, is measured before and after the part runs through the cleaning process. The response of interest is Removal, the difference between the pre-cleaning and post-cleaning measurements.

We’re interested in whether the inside diameter, outside diameter, part width, and container type have an effect on the cleanliness, but we’re also interested in the nature of these effects. The relationship we develop linking the predictors to the response is a statistical model or, more specifically, a regression model.

The term regression describes a general collection of techniques used in modeling a response as a function of predictors. The only regression models that we’ll consider in this discussion are linear models.

An example of a linear model for the cleaning data, using outside diameter (OD) and Width as predictors, is:

Removal = 10 + 1.2 × OD + 0.2 × Width

In this model, if the outside diameter increases by 1 unit, with the width remaining fixed, the removal increases by 1.2 units. Likewise, if the part width increases by 1 unit, with the outside diameter remaining fixed, the removal increases by 0.2 units. This model enables us to predict removal for parts with given outside diameters and widths.

For example, the predicted removal for parts with an outside diameter of 5 and a width of 3 is 16.6 units. In this example, we have two continuous predictors. When more than one predictor is used, the procedure is called multiple linear regression.
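The arithmetic behind that prediction can be sketched in a few lines. The coefficients 1.2 (OD) and 0.2 (Width) come from the text; the intercept of 10 is inferred from the worked prediction, since 16.6 − (1.2 × 5 + 0.2 × 3) = 10. The function name `predict_removal` is illustrative only.

```python
def predict_removal(od: float, width: float) -> float:
    """Predicted Removal from the two-predictor example model.

    The intercept (10.0) is inferred: 16.6 - (1.2 * 5 + 0.2 * 3) = 10.
    """
    intercept = 10.0  # inferred from the worked prediction, not a fitted value we have
    b_od = 1.2        # change in Removal per 1-unit increase in OD, Width held fixed
    b_width = 0.2     # change in Removal per 1-unit increase in Width, OD held fixed
    return intercept + b_od * od + b_width * width


# Prediction for OD = 5 and Width = 3.
print(round(predict_removal(5, 3), 1))  # 16.6

# Holding Width fixed, a 1-unit increase in OD changes the prediction by b_od.
print(round(predict_removal(6, 3) - predict_removal(5, 3), 1))  # 1.2
```

Because the model is linear, each coefficient is the change in the prediction for a 1-unit change in that predictor with the other predictor held fixed, whatever starting values you choose.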

When only one continuous predictor is used, we refer to the modeling procedure as simple linear regression. For the remainder of this discussion, we’ll focus on simple linear regression.

A scatterplot of Removal against OD (the outside diameter) shows a fairly strong positive relationship. To assess whether OD can be used to predict or estimate Removal, we fit a regression line: Removal = 4.099 + 0.528 × OD. The fitted line estimates the mean Removal for a given fixed value of OD. The value 4.099 is the intercept and 0.528 is the slope coefficient. The intercept anchors the line and estimates Removal when the outside diameter is zero. Because a part can't have an outside diameter of zero, the intercept isn't of direct interest here.
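The text reports the fitted coefficients but not the 50 observations, so the sketch below uses made-up toy numbers purely to show the mechanics. It assumes the line is fitted by ordinary least squares (the usual default for a simple regression line), where the slope is S_xy / S_xx and the intercept is ȳ − slope × x̄. The function name `fit_simple_linear` is hypothetical.

```python
from statistics import fmean


def fit_simple_linear(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


# Hypothetical toy data (not the cleaning data from the text).
b0, b1 = fit_simple_linear([1, 2, 3, 4], [2, 4, 5, 7])
print(round(b0, 3), round(b1, 3))  # 0.5 1.6
```

Applied to the actual OD and Removal measurements, the same formulas would produce the intercept and slope of the fitted line, assuming least squares was the method used.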

The slope coefficient estimates the average increase in Removal for a 1-unit increase in outside diameter. That is, for every 1-unit increase in outside diameter, Removal increases by 0.528 units on average.
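To make the slope interpretation concrete, the sketch below evaluates the reported fitted line (intercept 4.099, slope 0.528) at two outside diameters. The OD values 5 and 6 are chosen only for illustration, since the text doesn't give the range of the observed data, and `mean_removal` is a hypothetical name.

```python
INTERCEPT = 4.099  # reported intercept of the fitted line
SLOPE = 0.528      # reported slope coefficient for OD


def mean_removal(od: float) -> float:
    """Estimated mean Removal at a given outside diameter, from the fitted line."""
    return INTERCEPT + SLOPE * od


print(round(mean_removal(5), 3))                    # 6.739
# A 1-unit increase in OD raises the estimated mean Removal by the slope.
print(round(mean_removal(6) - mean_removal(5), 3))  # 0.528
```

The difference between the two estimates equals the slope at any starting OD, which is exactly what "0.528 units on average per 1-unit increase" means for a straight line.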