Correlation vs. Causation
Lesson 6 of 10
Objective: SWBAT distinguish between correlation and causation.
First of all, I hope my students notice that there is plenty wrong with this graph. It's on the second slide of today's lesson notes, and I have it projected at the front of the room as students arrive. We won't do anything too formal with this - I'm just trying to get some conversations started.
Before we even get to the whole correlation vs. causation discussion, I hope that my students are appalled by the x-axis of this graph. I'll make an informal assessment of how quickly each of my students expresses their concern about that.
Once we get over that initial shock, we'll compare this to yesterday's work with Gapminder. Although I didn't say it then, yesterday's work was grounded in ideas of causation. Implicit in kids making their predictions about relationships between data sets was an interpretation of why those relationships exist. Choosing which data set to put on the x-axis and which to put on the y-axis involves considering which one is the explanatory variable and which is the response variable. So with all of that in mind, does this graph relating the number of pirates in the world to rising global temperature even make sense? And, if we want to show a negative correlation, should the graph be decreasing from left to right?
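On that last question, the sign of a correlation coefficient always matches the sign of the best-fit slope, so a negative correlation does show up as a trend that falls from left to right. Here is a minimal sketch in Python for anyone who wants to check that by hand. The five data points are made up purely for illustration and have nothing to do with the pirate graph.

```python
import math

# Made-up values for illustration only: y tends to fall as x rises.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 2]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Sums of products and squares of the deviations from each mean.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

slope = sxy / sxx               # least-squares slope of y on x
r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient

print(f"slope = {slope:.2f}, r = {r:.2f}")
```

Both quantities share the numerator `sxy`, which is why they can never disagree in sign.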
I hope you can sense my facetiousness. In class, I'm using this approach to have a little fun, and to make it perfectly clear how things can get out of hand when we start searching for correlations among data sets. The thing about correlation vs. causation lessons in general is that they're so much fun, and that there's an infinite amount that we could choose to talk about!
Today's lesson is a quick survey of a few situations. It puts a few examples on the table and sets the stage for us to be able to talk about this any time for the rest of the year.
As we get started, a few more points to consider. Teaching causation is odd because it's a departure from a lot of the high school algebra curriculum. So much of how we teach functions is all about causation. This input, x, affects this output, y. It's built in. Today, I want to undo some of that. It's hard, after all the training kids have been through, not to take it for granted that there is a functional relationship between two quantities. Then, even if there is a causal relationship, we have to decide which variable influences the other, before we can put it on an axis.
To move from one completely preposterous proposition to another, we watch this video, titled "Correlation vs Causation." It's short, and it does a nice job explaining the often-cited correlation between ice cream consumption and murder rates.
The difference between this example and the opener is that at least there's a reasonable explanation for it. It's not that eating ice cream causes people to murder each other*; it's that both increase when the weather is warmer. I tell students that this is an important idea: when we're trying to determine whether a correlation indicates a causal relationship, we must always consider the possibility that some third variable is really causing both.
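For anyone who wants to see the third-variable idea in numbers, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are simulated, the coefficients and noise levels are invented, and the names `simulate`, `pearson_r`, and `partial_r` are hypothetical helpers, not part of the lesson materials. Temperature drives both series, and neither series has any effect on the other.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs (first-order partial correlation)."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(n=2000, seed=0):
    """Simulated, made-up data: temperature influences both series; they do not influence each other."""
    rng = random.Random(seed)
    temps = [rng.uniform(0, 35) for _ in range(n)]
    ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temps]
    murders = [0.5 * t + rng.gauss(0, 3) for t in temps]
    return temps, ice_cream, murders


temps, ice_cream, murders = simulate()
raw = pearson_r(ice_cream, murders)
controlled = partial_r(ice_cream, murders, temps)

print(f"ice cream vs. murders: r = {raw:.2f}")
print(f"same pair, controlling for temperature: r = {controlled:.2f}")
```

With data built this way, the first correlation should come out strongly positive and the second close to zero. The apparent link between ice cream and murder disappears once the third variable is accounted for, which is exactly the point of the video.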
To follow the video, I briefly post another absurd (but popular) graph and ask kids if they buy it. If they laugh and say "heck, no," then I know we're on the right track.
Our fast-paced opening survey of the data continues when we revisit this example from yesterday's Gapminder lesson. After the first few examples, at least this one doesn't seem completely crazy. There are a few different ways that we might explain the correlation between per capita income and energy use. I elicit ideas from the class, and often they'll start to get the idea on their own: even when some connection seems reasonable enough, sometimes it's hard to say that one causes the other.
To summarize a good discussion or to get kids thinking if they don't get to this point, I post slide #6 from the lesson notes. Then, on slide #7, I generalize that for any situation, it may be the case that phenomenon A is the cause of phenomenon B, that B is the cause of A, that some other factor is the cause of both, or that there is no relationship at all.
I lead a quick review of what we've seen so far. In the pirates vs. global warming example, it feels pretty safe to say that there's no relationship. In the case of ice cream vs. murder rates, the weather is the third variable, which causes both variables to rise and fall in tandem. The IE and murder example is just as ridiculous as the first one, and that brings us to income and energy use. "To answer the question of whether or not there is a causal relationship here, we'd need to do a little more research," I say.
Two Case Studies
It strikes me that distinguishing between correlation and causation (CCSS S-ID.9) plays an important role in Mathematical Practice #3. To be able to "construct viable arguments and critique the reasoning of others," students must be able to decide when one factor can be reliably attributed as the cause of another. As they practice to master these standards in tandem, I want students to read and critique the reasoning of others before putting their own arguments to paper. The brief case studies that follow are designed with that goal in mind.
Students receive this double-sided handout, and I provide links to news articles in class.
Case Study #1: Texting and "Shallowness"
A few years ago, Nicholas Carr released an excellent book called The Shallows, and the research cited here stemmed from that. I'm not totally sure I agree with the methods here, but what's most interesting about this example is how different newspapers interpreted the same research. I should also note that I got this idea reading a post on Quora about the relationship between correlation and causation.
Students read two articles about the same research:
Look at the dramatic difference between those two headlines! The question is, which one causes which? Does texting actually make people shallower, or do shallower people end up texting more often? Or maybe there's no cause at all? I don't rush to say any of this to my students. I hope that they'll pick up the articles and begin to notice the different approach being taken by each.
After giving students 20 minutes or so to read and write, I listen to the kinds of conversations that begin to happen around the room. Some students will consider it all preposterous, and others will argue that there's some truth here.
When it feels like everyone has had time to read and record their thoughts, I ask everyone to share out. I ask, "What assumptions are made in each of these headlines? Is there a difference?" Then, "Do you believe both articles equally, or does one feel more trustworthy to you? Why?"
The key is to bring it back to our generalizations: does this cause that, does that cause this, or is there something else entirely? The conversation can get very big here - maybe there's some other grand variable guiding our modern life that makes us all shallow texters - which I love to talk about, but I do my best to rein it in and tell kids that I'd love to shoot the breeze about that question after school anytime.
Case Study #2: Income and Education
To transition to the second case study, I post slide #9. It's pretty well documented - and students are told frequently enough - that getting an education will lead to opportunities for higher income later in life. But is it really as simple as that? Again, I want students to come to this question on their own, and I hope that this article, "Education Gap Grows Between Rich and Poor, Studies Show," helps them get there.
With this graphic posted at the front of the room, I tell students to talk at their tables about what they see. Are they surprised by anything? After a few minutes, I bring it back to today's lesson: even though we might accept the correlation between income and education, how clear is it that one is the "explanatory variable" and one is the "response variable"? I provide the article, and we repeat the process from Case Study #1. I give everyone time to read and write, and then we bring it back for a closing conversation.
The final task for this case study is to do some more research. I ask students to find other articles, data, or graphics online about this topic. I ask, "What is most (or least) convincing?" Once again, the wider implications of this distinction are worth our time. To recognize the extent to which a policy maker believes that a family's income level affects their educational prospects, versus believing that educational opportunities always lead to higher income, is to begin to understand how complicated modern politics can be.
Do we have time for all of that in today's class? Probably not - but as my students finish high school, it's so important that they ask these sorts of questions.
The focused search for more resources sets the stage for a continued call for students to bring articles about other topics to class. For the rest of the semester, I'll tell kids to keep their eyes out for articles that mistakenly assume that correlation implies causation. As I wrote at the beginning of the lesson, the list of things we could talk about is infinite. Over the next few days, be sure to see where kids might want to go with this.