Simple linear regression
Simple linear regression is a statistical method that examines the relationship between an independent variable and a dependent variable. ‘Simple’ linear regression only concerns the relationship to one independent variable, whereas ‘multiple’ linear regression concerns two or more independent variables.
Example applications
Consider a situation where you have two variables of concern and would like to understand whether there is a linear relationship between them: that is, whether you can anticipate what one variable will do when the other goes up or down. Examples include:
- height and weight - if height increases, we could expect that weight would increase
- time spent studying and marks - if you spend more time studying, we could expect that you earn a better mark (in reality, there is probably a low correlation here!)
- responsibility and income - if the responsibility in your job increases, we could expect that the amount of money you earn increases
Steps
When preparing your answer, you should:
- create a scatter plot of the data
- determine the line of best fit using the least squares method (using a statistical package such as RStudio or a spreadsheet application makes this very easy)
- report the regression equation, and the S and r values (including whether r is positive or negative)
- provide a prediction of a value based on the regression equation
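The calculation behind these steps can be sketched in a few lines of Python. This is a minimal illustration rather than a replacement for a statistical package: the height and weight figures are invented, the function name `fit_simple_linear_regression` is hypothetical, and S is assumed here to mean the standard error of the regression (the square root of the residual sum of squares divided by n - 2). The scatter plot step is left out because it needs a plotting library.

```python
import math

# Hypothetical sample data (invented for illustration): height in cm, weight in kg.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 69.0, 75.0, 86.0]


def fit_simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r, s), where r is the correlation coefficient
    and s is assumed to be the standard error of the regression.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squared deviations and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)

    # Residual sum of squares, then the standard error of the regression.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return slope, intercept, r, s


slope, intercept, r, s = fit_simple_linear_regression(heights, weights)
print(f"regression equation: weight = {intercept:.2f} + {slope:.2f} * height")
print(f"S = {s:.3f}, r = {r:.3f}")

# Prediction for a height inside the range of the observed data.
new_height = 175.0
predicted_weight = intercept + slope * new_height
print(f"predicted weight at {new_height:.0f} cm: {predicted_weight:.2f} kg")
```

A positive r (and slope) means the two variables tend to increase together, and a negative r means one tends to fall as the other rises. Predictions are most trustworthy for values inside the range of the data used to fit the line.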
Key concepts
- an explanation of the type of data that can be used in simple linear regression
- an example that walks through how to build a scatter plot and solve the regression equation
- advice to the student engineer on how to interpret the results
Core resources
- John McDonald has authored an online textbook, Handbook of Biological Statistics, which has a very clear entry on Linear Regression.
- The R community has an active blog site, R-Bloggers. See their example of conducting simple linear regression.
Similar tools…
There are many statistical approaches for examining the relationship between variables, and some can be more appropriate when the assumptions of a simple linear model do not hold. If your data exhibits non-linear behaviour when you draw your scatter plot, you should consider digging a little deeper to find a better statistical approach.
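Besides looking at the scatter plot, one possible way to spot non-linear behaviour is to look at the residuals, the gaps between each observed value and the value the fitted line predicts. The sketch below is only an illustration under stated assumptions: the data is invented and deliberately follows a curve (y equals x squared), and the variable names are hypothetical.

```python
# Hypothetical data (invented for illustration) that follows a curve: y = x squared.
x_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_values = [x ** 2 for x in x_values]

n = len(x_values)
mean_x = sum(x_values) / n
mean_y = sum(y_values) / n

# Least squares slope and intercept for the straight line y = intercept + slope * x.
sxx = sum((x - mean_x) ** 2 for x in x_values)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value the fitted line predicts.
residuals = [y - (intercept + slope * x) for x, y in zip(x_values, y_values)]

for x, res in zip(x_values, residuals):
    print(f"x = {x:.0f}, residual = {res:+.1f}")
```

For this data the residuals are positive at both ends and negative in the middle. A systematic pattern like that, rather than a random scatter around zero, suggests that a straight line is not a good description of the relationship.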