Introduction to Scatter Plots, Line of Best Fit, and the Prediction Equation


SWBAT create a scatter plot, draw a line of best fit, and write an equation for the line of best fit to predict values inside and outside of the data.

Big Idea

The emphasis in this lesson is to take students a little beyond the basics of Scatter Plots to explain the correlation coefficient (r) and the coefficient of determination (r squared).

Warm up

15 minutes

To begin class, my students will complete the following online practice as a review of scatter plots.  I plan for my students to spend five minutes completing the exercise. Then, we will spend five minutes reviewing what they learned.  In particular, I will ask students to explain their reasoning. Much of our work today will require careful explanations.

It is usually the case that some of my students draw a blank when they try to recall this material. But, I want to communicate to them the fact that this content is review. Learning the vocabulary and using the different representations correctly are my expectations for what my students can do independently. I provide students with graph paper or an individual white board with a graph on it as a resource to use as they work on these two Warmup_Problems.

  • The Mall Problem data shows no correlation.
  • The price of gas compared to the price of milk is an example of correlation between variables when there is no causation.

In my course, the students have learned how to write the equation of a line. We have also discussed the selection of two "good points" for determining a best fit line. During the discussion of this Warm Up, I also introduce the terms interpolate and extrapolate. Since we will be using scatter plots to make predictions, I want my students to begin learning how to talk about their predictions using precise language.
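
The two-point method and the interpolate/extrapolate distinction can be sketched in a few lines of Python. This is only an illustration: the data values, the choice of "good points", and the function names are all made up for the example and are not taken from the Warm Up problems.

```python
# Hypothetical data: (x, y) pairs invented for illustration only.
data = [(1, 3), (2, 5), (3, 6), (4, 9), (5, 11)]

def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

def predict(slope, intercept, x):
    """Evaluate the prediction equation y = slope * x + intercept."""
    return slope * x + intercept

def prediction_type(x, xs):
    """Interpolation if x lies within the observed x-values, else extrapolation."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"

# Two "good points" chosen by eye (an assumption for this example).
slope, intercept = line_through((1, 3), (5, 11))
xs = [x for x, _ in data]

for x in (3.5, 8):
    print(x, predict(slope, intercept, x), prediction_type(x, xs))
```

A prediction at x = 3.5 falls inside the observed x-values, so it is an interpolation. A prediction at x = 8 falls outside them, so it is an extrapolation and deserves more caution.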

URL for Online Practice

 (Last accessed 3-21-15)




Guided Practice

25 minutes

After reviewing the Warm Up with the students, I provide students with a Guided Practice.  The purpose of this lesson is to take the students beyond the basics of Scatter Plots that were introduced in 8th grade.  I use this Guided Practice to help students build their understanding of the following: 

  • types (patterns) of correlation
  • correlation coefficient (r)
  • determination coefficient (r-squared)
  • least squares regression line
  • line of best fit
  • trend line

Once they are familiar with these terms, my students will be better able to explain their thinking when they interpret scatter plots of bivariate data. For today, students do not calculate by hand. Our efforts focus on developing intuitions and vocabulary for interpreting values taken from a graphing calculator. I want my students to be able to interpret these values meaningfully. 

The Guided Practice includes two sets of bivariate data. I will work through one of the data sets with the students. Then, my students will work with the other data set on their own. In my demonstration I focus on the correlation coefficient (r) and the determination coefficient (r-squared). As we work the problem, I encourage my students to contribute by asking them to explain concepts from the Warm Up: pattern of correlation, line of best fit, interpolation, extrapolation. I plan to introduce the following terms:

  • trend line
  • regression line

After creating a scatter plot and a line of best fit for the first data set, I will ask my students to draw a line that best represents the data in the second set. I expect that they will correctly draw a line in the direction of the points. I will remind them to try to keep about the same number of points above and below the line. Then I will pose the question, "Why do we try to keep about the same number of points above and below the line?" We will discuss this for a few minutes before I share the following video resources:
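
One way to explore this question numerically is to look at residuals, the vertical distances from each point to the line. For a least squares regression line, the positive and negative residuals always sum to zero, which is why a balanced eyeball line tends to land close to it. The balance applies to the distances, so the counts above and below need not match exactly. The sketch below assumes made-up data and a hypothetical helper name; it is not the lesson's data set.

```python
# Hypothetical data invented for illustration; not the lesson's data sets.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = least_squares_line(xs, ys)

# Residual = observed y minus predicted y (positive means the point is above the line).
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
above = sum(1 for e in residuals if e > 0)
below = sum(1 for e in residuals if e < 0)

print("points above:", above, "points below:", below)
print("sum of residuals:", round(sum(residuals), 9))
```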

1.  Correlation Coefficient (r)


2.  Least Squares Regression


3.  Coefficient of Determination (r squared) 


After watching the videos, I expect my students to understand that a strong correlation means that the r-value is close to one or negative one. In addition, the r-squared value gives the proportion of the variation in the y-values that is explained by the regression line. Thus, this value is better if it is close to one.
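
For a sense of what the calculator is doing behind the scenes, here is a minimal sketch of the standard formulas for r and the least squares regression line. The data set and function names are assumptions made for this example; the lesson itself relies on the graphing calculator rather than hand or code calculation.

```python
import math

# Hypothetical data invented for illustration; not the lesson's data sets.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

r = pearson_r(xs, ys)
slope, intercept = least_squares_line(xs, ys)

print("r =", round(r, 4))
print("r squared =", round(r ** 2, 4))
print("regression line: y =", round(slope, 4), "x +", round(intercept, 4))
```

For this made-up data, r is roughly 0.77 and r-squared is 0.6, so the line accounts for about 60% of the variation in the y-values.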

The final activity on the Guided Practice worksheet asks students to complete their approximations of r and r-squared and then compare them to the values from the calculator. I will use a calculator to demonstrate with the data on speed and braking distance on dry pavement. Then, I will have students complete the same process with the wet pavement data.


Exit slip

10 minutes

With about 10 minutes remaining in the period, I will hand my students an Exit Slip. The Exit Slip asks students to explain their results for the wet pavement data set from the Guided Practice. I first want students to state the differences in their approximations for r and r-squared compared to the calculator. I ask them to explain the differences between the approximate and calculated values in their own words.

I also want my students to argue whether or not a linear regression was an appropriate technique for modeling the data. Does the resulting model represent the data well? Here, I am expecting students to explain that the correlation coefficient, r, should be close to positive one or negative one for a strong linear fit.

Finally, I would like students to square their r-value and consider how much error results from using this model to represent the given data. At the end of the lesson, I want to remind students that the closer the r-squared statistic is to one, the better the fit of the least squares regression line.
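
The link between r-squared and error can be made concrete: for a least squares line, r-squared equals one minus the ratio of the squared error around the line to the total squared variation in the y-values. The sketch below assumes a small made-up data set and hypothetical variable names; it is meant only to show the arithmetic.

```python
# Hypothetical data invented for illustration; not the lesson's data sets.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope and intercept from the standard formulas.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# SSE: squared error left over after using the line to predict y.
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
# SST: total squared variation of y around its mean.
sst = sum((y - mean_y) ** 2 for y in ys)

r_squared = 1 - sse / sst
unexplained = sse / sst

print("r squared:", round(r_squared, 4))
print("share of variation not explained by the line:", round(unexplained, 4))
```

With this made-up data, the line explains 60% of the variation and leaves 40% unexplained. An r-squared near one means very little error remains.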