SWBAT interpret differences in the center and spread of two or more different data sets, as represented in box plots.

Compared to data analysis, interpretation of data is less about working hard and more about thinking hard.

5 minutes

Today's opener should be done pretty quickly. I shoot for under five minutes, because it's the fourth consecutive day that class has started with percentages, and by now I expect to see that this topic has been sufficiently reviewed for all kids.

What's fun about this one is that all the answers are the same. I love saying nothing and just letting kids discover this. I can pretty accurately assess their confidence with percentage problems by tracking the amount of time between when they start their work and when they crack a smile.

For students still lacking confidence on these problems, I write these notes on the board after a few minutes, just to reinforce the algorithm. For the most part, this opener is a confidence builder, and then we're on our way.

10 minutes

Today's mini-lesson and problem set are focused on the following two standards:

**Use statistics appropriate to the shape of the data distribution to compare center and spread of two or more different data sets (S-ID.2).**

**Interpret differences in shape, center, and spread in the context of data sets (S-ID.3).**

In this lesson, my main focus is on center and spread, and I treat shape as an extension. All of the data that we look at today will be in the form of box plots that I've prepared using Fathom.

For both this mini-lesson and the problem set, I continue to draw on data collected in the survey from two weeks ago. This part is fun: in two of the questions, I asked students to guess the ages of their ELA and Science teachers (we work on a 9th grade team where all ninth graders rotate through the same teachers). Now I present the resulting data as a pair of box plots.

To begin, I briefly share my method. I recall that 99 students took the survey. I explain that I've already cleaned up the data a little bit by eliminating any answers that were lower than 20 or higher than 100. I note, "Someone thought Mr. Moore might be 800 years old. Is it ok that I left that out?" Students recall our conversation about 999 quadrillion text messages, and they recognize that here I've done the same work that they were asked to do on the first Social Media Problem Set.
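For anyone who wants to reproduce this cleaning step outside of Fathom, here is a minimal Python sketch. The function name and the sample responses are made up for illustration. Only the 20-to-100 cutoff comes from the lesson.

```python
def clean_age_guesses(guesses, low=20, high=100):
    """Keep only guesses from low to high, inclusive.

    Anything lower than `low` or higher than `high` is dropped,
    matching the rule described in the lesson.
    """
    return [g for g in guesses if low <= g <= high]


# Hypothetical responses, NOT the actual survey data.
raw = [35, 42, 800, 28, 15, 51, 100, 20]
cleaned = clean_age_guesses(raw)
print(cleaned)  # the 800 and the 15 are removed
```

Treating the cutoffs as inclusive is an assumption that follows the wording "lower than 20 or higher than 100": a guess of exactly 20 or exactly 100 stays in.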

Then we discuss the data, which appears on the first slide of the lecture notes. I ask, "Who would like to compare these two data sets?" and I guide the conversation from observations like, "Mr. Moore's graph is bigger than Mr. Sabourin's" to more statistical language, and then on to interpretations. If that's all it takes, that's great, and if not, I have a few questions on the second slide to guide the lecture.

When it comes to interpreting the data, the key is to note that these are student opinions. So when we try to interpret the difference in the median values of each plot, I ask, "Does this data prove that Mr. Moore is older than Mr. Sabourin?" It's satisfying to hear kids recognize that it doesn't: it just proves that, generally, the 9th graders who took this survey think that Mr. Moore is older. I push it a little farther, just to really hit on the word *interpret*. "So if I didn't know these guys, could I assume that Mr. Moore acts older than Mr. Sabourin? How would a stranger interpret this data? What would a stranger expect upon meeting these guys?"

Next, we talk about spread. From the box plots, we see that the maximum in each data set is the same (curiously, in this year's data, there was an empty gulf for both teachers between 60 and the answers above 100; some of my students attributed this to their awareness of the teacher retirement age). We also see that Mr. Moore's minimum was lower than Mr. Sabourin's, giving the first data set a greater range. I'm always very interested to hear how students interpret differences in range, so at first, I don't push anything here. Eventually, when I hear students talk about a "variety of answers" or describe the idea that a wider spread indicates more uncertainty, I build on that kind of thinking and guide the rest of the class toward it. My favorite summary so far was this: "It's just harder to tell how old Mr. Moore is!"
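The comparison above rests on the five-number summary behind each box plot. As a rough sketch of how those numbers, the range, and the IQR could be computed in Python: the two lists below are invented to mirror the pattern described (same maximum, lower minimum and higher median for Mr. Moore) and are not the real survey results. Quartile conventions also vary between tools, so Fathom's values for the same data could differ slightly from these.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return min, Q1, median, Q3, and max for a list of numbers."""
    # method="inclusive" is one of several quartile conventions.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": median,
            "q3": q3, "max": max(data)}


def spread(summary):
    """Return the range and interquartile range from a summary."""
    return {"range": summary["max"] - summary["min"],
            "iqr": summary["q3"] - summary["q1"]}


# Hypothetical age guesses, for illustration only.
moore_guesses = [22, 28, 32, 35, 38, 42, 45, 50, 60]
sabourin_guesses = [27, 29, 30, 32, 33, 35, 37, 40, 60]

moore = five_number_summary(moore_guesses)
sabourin = five_number_summary(sabourin_guesses)

print("Moore:", moore, spread(moore))
print("Sabourin:", sabourin, spread(sabourin))
```

With these made-up numbers, the first data set has both the larger range and the larger IQR, which is the kind of difference students are asked to interpret as greater uncertainty.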

The last two questions are about percentiles, and you can see my notes here. There are also questions of this sort on the problem set, so I note which students were able to follow confidently on these problems, and I know that I'll target those who could not when we get to work.
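The exact percentile questions live on the slides, so the following is only a sketch of the underlying idea: the percent of responses at or below a given value. The function name and the data are hypothetical.

```python
def percent_at_or_below(data, value):
    """Percent of values in `data` that are less than or equal to `value`."""
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)


# Hypothetical age guesses, NOT the actual survey data.
guesses = [24, 28, 30, 33, 36, 40, 45, 52]

print(percent_at_or_below(guesses, 33))  # 4 of 8 guesses
print(percent_at_or_below(guesses, 45))  # 7 of 8 guesses
```

On a box plot the same reasoning works without the raw data, because each of the four sections (the two whiskers and the two halves of the box) holds roughly a quarter of the responses.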

I have also prepared two additional slides that show the same data in stacked dot plots and histograms. These allow me to extend the lecture by asking which representation is the most useful, or by offering another way to talk about shape, if we get to that. I'll use these as time allows in each class, either now, after students have worked for a while, or in an upcoming review lesson.

3 minutes

We've been following two parallel lesson sequences in this class, so I take a moment to check in with students. There is the ongoing "Where Does My Stuff Come From?" project, from which we're taking a one-day break after two days of work. In this project, there's a natural break in between Part 3 (which was the focus of yesterday's class) and Part 4 (which will kick off tomorrow). This allows time today, if necessary, for groups to put finishing touches on the work of data organization in Part 3.

I've turned part of today's agenda into a to-do list for students, so they can see exactly what needs to be finished. First is the project. Next there's the first Social Media Problem Set, and the main reason this is up is to remind students to turn it in, if they haven't already. If both of those are done, then students are ready for today's new problem set, Social Media #2.

25 minutes

Following the first Social Media Problem Set, on which students were asked to sift through a lot of data, to plot it, and then to calculate measures of center and spread, today's problem set, Social Media #2, emphasizes analysis and interpretation. I hope to see that kids recognize that there's less *work* to do here, because a lot of what they had to do on the previous problem set is already done. This assignment is more about the synthesis and application of skills. Students must read a set of three box plots, compare them and interpret their differences, and calculate percentages.

I considered calling this assignment a quiz and decided against it, but once students get to work I treat it like one: I stay out of their hair and help only when they have questions. When they do have questions, I guide them to consult their notes rather than answering directly. If they forget what "IQR" stands for, I show them where to find it. If they need help on the writing prompt or on the percentage problems, I remind them about today's mini-lesson.

The writing prompt is the most important part of this assignment, because it shows me how well each student understands the idea of interpretation. Based on the data, it's actually pretty hard to argue that any network is more popular than Facebook. Having a straightforward answer allows kids to focus on the evidence. I hope to see each student reference the data specifically and then explain in their own words what the data means. I know that not everyone is there yet, but when I return their work, I'll reference a few exemplars. At the end of the unit, there is another writing prompt like this, with a less obvious answer.

There is no formal closing to today's lesson because there's enough work here that everyone is busy right up to the bell. With a minute or so left in class, I tell students to submit whatever they can now, and finish up whatever is not done for homework.