Explore Correlation on Gapminder
Lesson 5 of 10
Objective: SWBAT make and test predictions about the relationships between data sets as they identify positive and negative correlations.
Opener: Homework vs GPA
Today's opener is on the second slide of today's class notes. I ask students to sketch a graph that shows the relationship between the amount of homework students get done and their GPA.
Doing my best to avoid any sort of editorializing, I post the opener and tell students to sketch this graph in their notes. If anyone has a hard time getting started, I assure them that there are no wrong answers: "Your interpretation is what I'm looking for here. Of course, you should be able to back up your ideas!" I say. After a few minutes, I invite two or three volunteers to share what they've got by sketching their answers on the board. By the time students start to share, kids are talking about the question, and I allow those conversations to happen at each table. When someone posts their work on the board, we take a moment for them to explain their decisions.
By opening with this task, I hope to accomplish two sets of goals. The first set is on the surface: I want to get kids talking about this relationship, and if they propose one, I want them to back up their ideas with words. I want everyone to share an idea of what the graph might look like if two sets of data are positively correlated.
Going a little deeper, there are some finer mathematical points I hope kids will get. I want to get at some ideas about representations of continuous vs. discrete data. I ask the class: is it more accurate to draw a line or curve, or to sketch a set of points? When everyone recognizes this as discrete data - especially if we're talking about a relatively small sample - we can also start talking about specific points. Many points might follow the simple pattern that the more homework someone does, the higher their GPA will be. But aren't there exceptions?
I ask, "Is it possible to do no homework and still get a high grade?" (In a Standards-Based Grading system the answer is, yes, of course, but I digress.) If we don't already have such a point on the graph, I ask if anyone wants to come draw it. This serves as a double-check on whether kids understand how the graph is set up. Conversely, I ask, "Is it possible to get a lower grade even though you do all your work?" and we place a point like that on the graph as well.
Finally, we can note that some sort of line through the middle of the data can still be useful, because it shows a general trend. We're going to get to that in the next few days.
Watch the Video
With ideas about discrete data, relationships between data, and trends in mind, I tell everyone that we're going to watch a video about two more data sets.
If you've never seen Hans Rosling's video 200 Countries, 200 Years, 4 Minutes, then it's my great honor to make the introduction. Watch this video, and watch it with your students. It's a classic.
I distribute the four-page Exploring Correlation on Gapminder packet. The first page is for students to take notes as they watch the video. There's not really enough room on this page to answer all questions in great depth, but this page is primarily a guide for students to watch the video. If anyone wants to write more, I tell them it's fine to write these answers in a notebook.
After we watch, it's important to discuss how much is going on here. Rosling squeezes an awful lot of information into a small space! In fact, I note that this is really 5-dimensional data. There's the wealth and health data on the two axes. Then there's time, which Rosling animates in the video; really, it's just the same graph plotted with each year's data. Additionally, population is given by the size of each dot, and geography is denoted by the color of each dot. As we dig into our stats course, this is an exemplar worth returning to: look how much information we can express in one place!
For today, we're going to focus just on the axes, but it's always engaging to note the wealth of information in this video.
I give a short lecture using slides 3 through 11 of today's PowerPoint slides, and I introduce learning target 2.3:
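For teachers who want to make the five dimensions concrete, here is a minimal Python sketch, assuming a hypothetical record layout (the class and field names are illustrative, not Gapminder's own variable names, and the values are made up). Each record is one dot, and filtering by year recovers one still frame of the animation.

```python
from dataclasses import dataclass


@dataclass
class CountryYear:
    """One dot on a Gapminder-style bubble chart.

    Field names are hypothetical; they are not Gapminder's own variable names.
    """
    country: str
    income_per_person: float  # dimension 1: wealth, drawn on the x-axis
    life_expectancy: float    # dimension 2: health, drawn on the y-axis
    year: int                 # dimension 3: time, one animation frame per year
    population: int           # dimension 4: encoded as the size of the dot
    region: str               # dimension 5: encoded as the color of the dot


def frame(rows, year):
    """Return the dots for a single year: one still graph of the animation."""
    return [row for row in rows if row.year == year]


# Made-up values for illustration only; these are NOT real Gapminder data.
rows = [
    CountryYear("Country A", 1_000.0, 40.0, 1900, 5_000_000, "Region 1"),
    CountryYear("Country A", 20_000.0, 75.0, 2000, 9_000_000, "Region 1"),
    CountryYear("Country B", 2_000.0, 45.0, 1900, 3_000_000, "Region 2"),
]

print(len(frame(rows, 1900)))  # 2 dots in the 1900 frame
```

Drawing the frame for every year in order reproduces the animation; the other four fields decide where each dot sits and how it looks.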
I can represent data on two quantitative variables on a scatter plot, and describe how the variables are related.
I introduce the word "correlation," and then provide usage examples. "From what we saw in the video, we can say that the health and wealth of nations have a positive correlation," I say. Two quantitative measures might also share a negative correlation or no correlation at all. We can also talk about the strength of a correlation. "We'll get a little more technical over the next few days," I say, "but for today we're just going to look at some data sets and try to find some that correlate with each other in some way."
I provide a visual example of each of these, then we get started on the activity.
At www.gapminder.org/world, we can access the same tool that Hans Rosling uses in his video. I show students how it works, and that we can change the axes by choosing from some extensive menus of data sets.
I explain that today's task is for students to choose pairs of data sets that they think will have a positive, negative, or no correlation. On pages 2 through 4 of Exploring Correlation on Gapminder, students make and justify predictions about how different data sets might correlate.
Students will need help defining what some categories mean, and you can scaffold this part of the lesson as much or as little as your students need, with definitions and suggested starting places. I find this part of the lesson so engaging because we're talking about measuring all sorts of social, cultural, and economic phenomena, and I'm always fascinated by the conversations kids want to have.
On slides 15 through 21, I suggest some of the most accessible data sets. These slides can be posted and discussed individually from the front of the room, printed and posted around the room as a gallery walk, or consolidated onto one sheet of paper to make a menu of options (Gapminder Menu), with a few copies left on each table.
This is where the lesson really opens up - and I love this! - because it's impossible to know what will capture kids' attention and interest. They'll have all sorts of questions. Follow your students on whatever journey they'd like to go on.
Circulate as Students Work
As students get to work on their predictions, I check in around the room. I continue to help define what different data sets mean, and discuss how quantifying data always involves making certain editorial decisions. If a student is stuck, I suggest that they choose one data set that they find interesting, and then look for another that might relate to it in some way.
Students will be making sense of what makes a positive correlation vs. a negative one. I show students with my thumbs: "If one data measurement increases," I raise one thumb through the air, "while another increases," I raise the other thumb next to the first one, "then that's a positive correlation." To illustrate a negative correlation, I raise one thumb pointing up while lowering the other thumb pointing down.
This is a big task for 15 minutes, and it's fine if it takes longer. I try to strike a balance between exploring every fascinating tangent that might come up and illustrating the big idea: there's a lot of data out there, and some sets correlate with others.
When students finish their predictions, they show me, and if Step 1 is complete on all three pages, I give the student a computer so they can test their predictions on Gapminder.
Students go to www.gapminder.org/world, and choose from menus to make graphs that reveal whether their predictions came true or not. They sketch what they see, and answer three questions at the bottom of each page.
I do not ask for perfect representations of all the data shown on Gapminder. I tell students not to worry about the size or color of each dot. They should choose the most recent year available, which for some data sets, like Per Capita Income vs. Energy Use, may involve scrolling back a year or two to find the most recent data.
I do make sure that students pay close attention to scale. As with other parts of today's lesson, there can be a lot more to discuss than appears on the surface. For each data set on Gapminder, the axes rescale automatically to fit what the graph is displaying. This may involve switching between linear and logarithmic scales, which might be a new concept for some students. In itself, this is a great launching point for a whole separate lesson. For our purposes, I explain that a linear scale grows by a constant quantity (like an arithmetic sequence), while a log scale grows by a constant factor or percentage (like a geometric sequence). Of course, you should follow through on or extend this topic as much as you see fit.
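The thumbs gesture boils down to a sign. As a minimal sketch, assuming Python and made-up homework-vs-GPA numbers (not real student data), the hypothetical helper below reports only the direction of a relationship. It uses the sign of the sample covariance, which matches the sign of the correlation coefficient whenever that coefficient is defined.

```python
def correlation_direction(xs, ys):
    """Classify the direction of the relationship between two paired lists.

    Hypothetical helper. Uses the sign of the sample covariance: positive when
    the two measures tend to rise together (both thumbs up), negative when one
    tends to fall as the other rises (one thumb up, one thumb down).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean, scaled by n - 1.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"


# Hypothetical homework-vs-GPA points, made up for illustration only.
hours = [0, 2, 4, 6, 8]
gpa = [2.0, 2.5, 3.0, 3.4, 3.8]
print(correlation_direction(hours, gpa))        # positive
print(correlation_direction(hours, gpa[::-1]))  # negative
```

This says nothing about strength, which is where the more technical work of the next few days comes in, and it returns "none" only when the covariance is exactly zero. A curved pattern such as xs = [1, 2, 3, 4] and ys = [1, 3, 3, 1] also comes out "none", a reminder that "no correlation" means no straight-line trend, not no relationship at all.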
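The arithmetic-vs-geometric comparison can be shown in a few lines. This is a minimal Python sketch with hypothetical function names and tick values; it is not how Gapminder itself draws its axes.

```python
import math


def linear_ticks(start, step, count):
    """Linear scale: each tick ADDS the same amount (an arithmetic sequence)."""
    return [start + step * i for i in range(count)]


def log_ticks(start, factor, count):
    """Log scale: each tick MULTIPLIES by the same factor (a geometric sequence)."""
    return [start * factor ** i for i in range(count)]


def log_position(value, start):
    """Distance from `start` along a base-10 log axis, in powers of ten."""
    return math.log10(value / start)


print(linear_ticks(0, 10_000, 5))  # [0, 10000, 20000, 30000, 40000]
print(log_ticks(100, 10, 4))       # [100, 1000, 10000, 100000]
```

On a log axis, the ticks from log_ticks(100, 10, 4) sit 0, 1, 2, and 3 powers of ten from the start, so values that grow by a constant factor end up evenly spaced, just as the linear ticks do on a linear axis.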
Like Trademap earlier in the week, Gapminder is a pretty complex tool. A lesson like this provides an opportunity for students to engage with a richly-featured tool, and to see what learning detours might result. It might be messy, but that's part of the reason we're doing it!
Share-Out & Debrief
Toward the end of class, it's important to debrief on this activity. I ask for volunteers to share graphs that they found particularly surprising or enlightening. Was there a pair of data sets that you were sure would correlate that did not? Was the opposite true for any of your predictions? Conversations along these lines have been happening in small groups, so if I hear a juicy idea, I might ask those students to share what they've been talking about.
Finally - and to a statistician, most importantly - I share a word of caution: correlation is not causation, and that's what we'll explore tomorrow. Even though we are free to speculate about why a certain correlation can be observed, we have to be vigilant about not jumping to conclusions.