Lesson 7 of 10
Objective: SWBAT fit a linear function to data that suggests a linear association.
Today's lesson is an introduction to linear regression lines. The opener is on the second slide of today's lesson notes. To get started, we'll have a little fun.
I project this graph on the front board, and pose the question: If a man was 8 feet tall, how much would you expect him to weigh? It's low stakes, because the question is pretty absurd (though not impossible), and the idea is just to get kids thinking about making a prediction based on data in a scatter plot.
Note that as kids think about this question, they have to pay attention to how the x-axis is labeled and scaled. The question is about measuring height in feet, but the axis is labeled in inches. When they work on today's assignment, students will have to plan thoughtfully how they scale their axes.
There are plenty of possible answers here, because we can imagine differently placed trend lines on this graph. Depending on how we assess the trend, we'll get different answers, and that's part of the fun: it really gets kids thinking, talking, and having some good-natured debates about what might happen. If any students want to, I'll allow them to sketch some lines on the board. Even though we haven't yet studied lines of best fit, it's natural to want to fit a line to this data and see where it goes. On the 3rd and 4th slides, I provide some space for kids to extrapolate up to 96 inches and beyond.
Now that I've got everyone talking and thinking in these terms, I say that today, we're going to learn about how to use the data in a scatter plot to create models that can help us make predictions like this.
Notes and Mini-Lesson
What's a Regression?
I try to develop ideas and informal definitions as often as possible, so to begin today's lecture notes (see slides 5 through 14 of the lesson notes), I say that a regression is a statistical tool for modeling data. When we make predictions and sketch our own lines through data, like we did on the opener, we are doing the work that a regression does. Over the next few lessons, we'll look at some ways this is done.
Vocab Check: Association, Correlation, Causation
I post learning target 2.5 on the board (slide #5), which says:
I can fit a linear function to data that suggests a linear association.
I say, "One method of fitting a linear function to data is by running a linear regression. But before we go any further with that 'r' word, there's another word in the SLT that deserves our attention: association."
In the last two lessons, I've used the term correlation with students, but formally, we've really only been looking at associations thus far. Now (on slide #6), I make the distinction for students: association is a general term used to describe whether or not one set of data moves with another. Correlation is more strictly defined, because it implies a linear relationship. Additionally, correlation can be measured. That's coming up later this week.
Finally, on slide #7, I provide some background notes to finish framing our work with linear regression.
Mini-Lesson: Median-Median Lines
Today, I'm going to teach a lesser-known linear regression method called the median-median line. I find that this topic really helps students get a feel for the data, and it's a nice review of median and writing the equation of a line through two points. Check out this article published by the American Statistical Association that describes how this method provides a simple way "to motivate the idea of fitting a straight line to data." In addition to providing a deeper background on the median-median line regression method, the article includes a few data sets that are great to use with students.
For today's lesson, I'm briefly abandoning context. I just want my students to play with the numbers, and practice this skill. Tomorrow, we'll jump right back into using linear regressions in context.
My notes for students are on slides #8-18 of the lesson notes. We work step-by-step with an example, and I tell students to take notes at each step. Then, they practice this method. Here, I provide an overview of how I deliver this example.
- Step 1: First, we have to order our data from least to greatest by looking at the x-value (the predictor variable) of each point. Here, the y-values (the response variable) do not matter; we're only ordering points by x-value.
- Step 2: It's optional to plot the points on a graph, but it's part of what I'm asking kids to do today, and it helps students to see what's happening as we move along. I set up a graph on a side board, so I can continue to present notes from the PowerPoint slides.
- Step 3: Split the data into lowest, middle, and highest thirds. I illustrate what I mean on the graph. This is a simple example because there are just six points. At the end of the mini-lesson and as students practice on their own, I show students that the lowest and highest thirds of the data should always have the same number of points.
- Step 4: Next, we find the "median point" for each third of the data. The x-value of the median point is the median of all x-values in this 1/3 of the data, and the same is true for the y-value. Again, this is a simple example because there are just two points in each third, and as we add more points, this part of the activity will change a little. Once again, I illustrate this step by adding to our graph, where you can see the "median points" in green.
- Step 5: Here's where we get a nice review of finding the equation of a line through two points. The next step is to find the equation of the line through the first and third median points. I show what this step looks like on the graph, and I review the algebraic method too.
- Step 6: Then, we find the equation of the line through the middle median point that is parallel to the line we got in Step 5. Isn't it great how much review is built into this activity?
- Step 7: Finally, the equation for the median-median line has the same slope as the line we found in Step 5. Its y-intercept is defined as the average of the y-intercepts of the three lines with that slope that run through our "median points". Because the lines through the first and last median points are really the same line, another way to think about this is that the median-median line is 1/3 of the way from the first line (from Step 5) to the second one (from Step 6). It helps a lot to be able to see what this looks like on the graph.
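For anyone who wants to check student work quickly, the seven steps above can be sketched in a few lines of Python. This is only a minimal sketch, not the procedure from the slides: the function names `median_point` and `median_median_line` are made up for illustration, the six data points are hypothetical, and the rule used to size the thirds is one common convention that keeps the lowest and highest groups the same size.

```python
from statistics import median

def median_point(group):
    """Step 4: the 'median point' is (median of x-values, median of y-values)."""
    xs = [x for x, _ in group]
    ys = [y for _, y in group]
    return median(xs), median(ys)

def median_median_line(points):
    """Return (slope, intercept) of the median-median line for (x, y) pairs."""
    pts = sorted(points, key=lambda p: p[0])       # Step 1: order by x-value
    n = len(pts)
    outer = (n + 1) // 3                           # Step 3: size of lowest and highest thirds (assumed convention)
    low, mid, high = pts[:outer], pts[outer:n - outer], pts[n - outer:]
    (x1, y1), (x2, y2), (x3, y3) = (median_point(g) for g in (low, mid, high))
    slope = (y3 - y1) / (x3 - x1)                  # Step 5: line through first and third median points
    b_outer = y1 - slope * x1                      # y-intercept of that line
    b_mid = y2 - slope * x2                        # Step 6: parallel line through the middle median point
    intercept = (b_outer + b_mid + b_outer) / 3    # Step 7: average of the three y-intercepts
    return slope, intercept

# Hypothetical six-point data set, two points per third
data = [(1, 2), (2, 4), (3, 3), (4, 7), (5, 8), (6, 9)]
slope, intercept = median_median_line(data)
print(f"y = {slope}x + {intercept}")   # y = 1.375x + 0.6875
```

In this example the median points are (1.5, 3), (3.5, 5), and (5.5, 8.5). The outer line has y-intercept 0.9375 and the middle line has y-intercept 0.1875, so the final intercept of 0.6875 sits one third of the way from the first line to the second.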
After working through this example, I erase all the evidence of our steps to show our final result: a line running through the data. I find that it's important to include this step, because I want students to end up with the understanding of what we just did. We used a regression process to find a line that can be used to make some generalizations about this data set, and this is what it looks like.
Following the mini-lesson, students work to follow the steps and find the median-median lines for five data sets. Here is the two-sided handout; you'll also want to have graph paper available. I'm also including the solutions here, with graphs copied from Desmos. Just as food for thought, this answer key includes the results of a least-squares regression for each data set, so you can see how they differ.
Students work alone or in small groups to get the assignment done, and I circulate to help, check their work, and offer encouragement. Again, we're taking a brief break from context today, and just working with the numbers. This lesson is a review of all that background knowledge kids need to really grasp these ideas around quantitative bivariate data. Students have a chance to practice plotting points, finding medians, and writing linear functions in slope-intercept form.
The examples I provide here are more difficult than the one I used in the mini-lesson for a few reasons. First of all, the slopes and y-intercepts aren't very "nice" numbers. When it comes to sketching these lines on paper, most students will need help understanding how to use the decimal slope values that result from their median-median regressions. It helps to plot a few points and to connect them as we sketch these lines. On exercise #3, for example, we might just think about the values of y when x is 0, 20, 40, and 60, instead of really trying to "rise and run" that slope of -12.74.
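The "plot a few points" idea can be sketched as a small table of values. This is an illustration only: the helper name `line_points` is made up, and the y-intercept below is a hypothetical placeholder, not the answer to exercise #3. Only the slope of -12.74 comes from the exercise.

```python
def line_points(slope, intercept, xs):
    """Return (x, y) pairs on the line y = slope * x + intercept."""
    return [(x, round(slope * x + intercept, 2)) for x in xs]

slope = -12.74       # slope mentioned for exercise #3
intercept = 800.0    # HYPOTHETICAL placeholder; substitute the intercept from your own regression

# Evaluate the line at convenient x-values instead of counting "rise and run"
for x, y in line_points(slope, intercept, [0, 20, 40, 60]):
    print(f"x = {x:>2}   y = {y}")
```

Students can plot the resulting points and connect them with a straightedge.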
The other challenge here is to help students understand what to do when the number of data points is not divisible by 3. On slides #16-18, I provide examples of how to partition the data. For both of the challenges I've just noted, I wait for students to ask questions, and deliver these notes as needed.
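One common convention for partitioning, which keeps the lowest and highest groups the same size, can be sketched as follows. The function name `group_sizes` is made up for illustration, and I'm assuming this convention here rather than quoting the slides: with one leftover point, it goes to the middle group; with two leftover points, one goes to each outer group.

```python
def group_sizes(n):
    """Sizes of the (lowest, middle, highest) groups for n data points."""
    base, extra = divmod(n, 3)
    if extra == 0:
        return base, base, base          # divides evenly
    if extra == 1:
        return base, base + 1, base      # leftover point joins the middle group
    return base + 1, base, base + 1      # two leftovers: one to each outer group

for n in (6, 7, 8, 9, 10):
    print(n, group_sizes(n))
```

For example, 7 points split as 2, 3, 2 and 8 points split as 3, 2, 3.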
Debrief: One Thing
With a few minutes left in class, I call everyone to attention for a quick debrief. I ask everyone to share an observation, a question, or something that surprised them from today's lesson. This is a quick, informal way to get an overall vibe check of the room. Usually, I'm not surprised by what kids say, but it's a nice chance to get a full picture.
Usually, kids will need a little more time to finish the last exercise or two. I tell everyone to finish what they can for homework, and that we'll check answers tomorrow.