SWBAT compare two different data sets by using plots on the real number line, measures of center, and measures of spread. They will also make informal decisions about how to address outliers in the context of a real data set.

If we're going to use real data to answer a question, the first step is to make sense of that data.

10 minutes

One of the best things about using real, student-generated data in the classroom is that it never fails to open up unexpected teaching opportunities. So when, on a computer-based survey, I ask students how many texts they send daily, there will always be a few students who write an absurdly large number. This year, two of the 99 answers I received were 999,999,999,999,999,999 and 10,000,000,000,000,000 (Quick! Quiz yourself: how do you even say these two numbers?*). So of course, there must be a little fun we can have with these answers, right?

That's what I do with today's opener, by posing the question:

**How long would it take to send 999,999,999,999,999,999 texts?**

This opener serves two purposes. First, it will provide some background for the idea that we're going to have to scrub the data a little bit. Second, it allows us to have a little adventure in number sense as we play with this rather unfathomable number.

While the question is up, and as students take a moment to ponder it, I distribute the problem set that I describe in the next section of this lesson. Looking at the data in the problem set, students see that one of their colleagues claimed to send this many texts every day. The opener and the data set each give context to the other, so once kids see where I got this number, they're really engaged.

Immediately, answers start to fly: you could do it in a few days! A month or two! A year! Someone says a lifetime, which raises eyebrows but really gets us going. To help frame the problem, I say, "Suppose that you could send one text every second, without doing anything else. No eating, sleeping...nothing but text...text...text..." I continue, pressing the button of an imaginary phone as I keep counting off the seconds. "How long would it take to send this many texts?"

I write a dozen or so student guesses on the board, then ask what we'd have to do to figure this out. Students tell me that they'll need to know how many seconds there are in an hour, so we figure that out and compare 3,600 to our target. They laugh as they look at the two numbers side by side. "What now?" I ask. They realize that we'll need the number of seconds in a day (86,400) and in a year (31,536,000), and then that even these numbers look quite small. I ask, "So even if you could send one text every second for a whole year, would you be close?" Suddenly the idea of a lifetime sounds reasonable enough that kids start proposing "two lifetimes!" as a viable answer.

It's easy enough to get the number of seconds in a century by adding two zeros to the number of seconds in a year, but even that number, about 3.15 billion, looks weirdly small at this point. (This is also a good chance to mention that if you really love someone and want to surprise them with an unexpected party, it's fun to celebrate the moment they reach the 1-billion-second mark, a little before their 32nd birthday.)
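For anyone who wants to plan that party, here is a minimal Python sketch using the standard library's `datetime` module. The birth date and the function name `billion_second_moment` are hypothetical, and the sketch uses naive datetimes, so it ignores time zones and daylight saving time.

```python
from datetime import datetime, timedelta

ONE_BILLION_SECONDS = timedelta(seconds=10**9)
SECONDS_PER_AVERAGE_YEAR = 365.25 * 24 * 60 * 60  # averages in leap days


def billion_second_moment(birth: datetime) -> datetime:
    """Return the moment a person born at `birth` has lived 1,000,000,000 seconds."""
    return birth + ONE_BILLION_SECONDS


# Hypothetical birth date, for illustration only (naive datetime, no time zone).
birth = datetime(2000, 1, 1)
party = billion_second_moment(birth)

print(party)  # 2031-09-09 01:46:40
print(f"{10**9 / SECONDS_PER_AVERAGE_YEAR:.1f} years old")  # 31.7 years old
```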

Some intrepid students will now start cursing their calculators, because the number is too big for their calculators to work with. At this point, I don't want to spend too much time on the details: students get that it would take a long time, and they're primed for the big reveal. There are a few options here, but I like to just slowly write out the answer on the board, one digit at a time, and watch jaws drop as the digits keep coming. It turns out to be just under 32 billion years, or, put another way, almost seven times the age of our sun.
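The arithmetic from the opener fits in a few lines of Python, which can be handy for checking the reveal before class. This is a sketch under two stated assumptions: a 365-day year (matching the 31,536,000 figure above, so leap days are ignored) and a Sun age of about 4.6 billion years, a commonly cited estimate that the lesson itself doesn't state. Python integers have no fixed size limit, so the 18-digit target causes no trouble.

```python
TARGET_TEXTS = 999_999_999_999_999_999  # the survey answer from the opener
TEXTS_PER_SECOND = 1                    # "one text every second"

SECONDS_PER_HOUR = 60 * 60                    # 3,600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR       # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY      # 31,536,000 (no leap days)
SECONDS_PER_CENTURY = 100 * SECONDS_PER_YEAR  # 3,153,600,000

SUN_AGE_YEARS = 4.6e9  # assumption: commonly cited estimate of the Sun's age

# Show each benchmark the class builds up to, with thousands separators.
for label, seconds in [("hour", SECONDS_PER_HOUR), ("day", SECONDS_PER_DAY),
                       ("year", SECONDS_PER_YEAR), ("century", SECONDS_PER_CENTURY)]:
    print(f"seconds in a {label}: {seconds:,}")

years_needed = TARGET_TEXTS / TEXTS_PER_SECOND / SECONDS_PER_YEAR
print(f"years needed: {years_needed:,.0f}")
print(f"multiples of the Sun's age: {years_needed / SUN_AGE_YEARS:.1f}")
```

Changing `TEXTS_PER_SECOND` is an easy way to handle the inevitable "What if I could send two texts a second?" follow-up.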

"So," I ask with a smile, "is it possible to send this many texts in a day?"

* After trillions come quadrillions, and then quintillions: 10,000,000,000,000,000 is ten quadrillion, and 999,999,999,999,999,999 is just one short of a quintillion. One quadrillion bytes of data make a petabyte, which is a pretty legitimate object at this point. We don't yet live in an age where a quadrillion dollars is a real thing, but hey, is it possible that we're getting there?

28 minutes

Today's problem set, Social Media #1, asks kids to answer the question:

**These days, is it more common for teenagers to use texting or to use social networks when they communicate?**

There is, of course, a lot that we could reference as we formulate an answer to this question, but for now we're just going to look at the data that I gathered by polling students in last week's survey.

As we've already seen in the opener, we can't trust every answer on here. Our work on today's opener leads neatly into an informal definition of outliers (whose formal definition won't be a focus of our work this year). Before we begin to use the set of student answers, we'll have to clean it up a little bit. This is the real, often hidden work of conducting a survey. Today's assignment is about representing the data, but it is just as much about interpreting it in context and deciding how to deal with a real, unruly set of numbers.

The first task is to count the zeros. As I circulate, I ask each table what the zeros represent. It's funny to hear kids say that someone must be lying if they claim to send zero texts per day. I push these students to consider the alternatives. I ask, "Does everyone have a phone they can use for texting?" When I ask for a ballpark estimate of what percentage of students have their own phones, estimates usually come in around the 75-80% mark. When students count that 20 of 99 students report sending no texts per day, about 20% of the group and in line with that estimate, I suggest my interpretation: these 0s represent people who don't have phones. We should note how many people that is, but not include them in the analysis.

Then we have to look at the top end of the data. We all agree that no one can send a quadrillion texts per day, but below that, every student picks a different cutoff point for their data. Some believe that it's possible to send 1,000 texts per day, and others don't buy it; an argument could be made for either side. I try to make sure they're deciding where to cut the data based on what they actually believe, and not on what's convenient for their forthcoming graphs.
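The two cleaning decisions above, setting the zeros aside and choosing a cutoff at the top end, can be sketched as a small Python function. The function name `clean_texting_data` and the sample values are made up for illustration; they are not the class survey data. The cutoff is deliberately a parameter, since each student chooses their own.

```python
def clean_texting_data(responses, cutoff):
    """Sort raw 'texts per day' answers into three groups.

    - zeros: set aside (interpreted, as in the lesson, as students without phones)
    - kept: believable answers, 0 < answer <= cutoff
    - removed: answers above the cutoff (or negative), treated as not credible
    """
    zeros = [r for r in responses if r == 0]
    kept = [r for r in responses if 0 < r <= cutoff]
    removed = [r for r in responses if r < 0 or r > cutoff]
    return zeros, kept, removed


# Made-up answers for illustration; NOT the actual class survey data.
sample = [0, 12, 45, 0, 300, 80, 999_999_999_999_999_999, 5, 1200, 60]
zeros, kept, removed = clean_texting_data(sample, cutoff=1000)

print(f"{len(zeros)} of {len(sample)} reported zero texts ({len(zeros) / len(sample):.0%})")
print("kept:", kept)
print("removed:", removed)
```

Rerunning with `cutoff=500` instead of `1000` is a quick way to see how much a student's judgment call changes what's left to plot.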

Next, with the data that remains, it's time to decide on the best kind of plot. After making dot plots, box plots and histograms over the last two weeks, students now have a choice of which to use. It's great to see kids recognize, on their own, that sometimes a dot plot is not the best option. Dot plots are simple, sure, but that doesn't make them powerful or always easy to use, and that's an important idea when it comes to **strategically using appropriate tools (MP5)**. What's great is that this lesson basically teaches itself as kids experience it. It's always fun, after kids submit their work, to ask, "Was a histogram your first choice of representation here?" and to see their surprise as they say, "No - but wait, how did you know?"

Some students will recall stem-and-leaf plots, and if they do, I emphasize that this is a great tool for organizing data, but that I'd also like one of the plots from the learning target. I show these students what stem-and-leaf plots have in common with histograms: they're basically histograms turned on their side, with a bin width of 10 when the stems are the tens digits.
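The stem-and-leaf/histogram connection can be checked directly: if each stem is everything but a value's ones digit, the number of leaves on a stem equals the height of the histogram bar for the matching bin of width 10. The Python sketch below is illustrative only; the function names and data values are hypothetical, and it assumes non-negative whole numbers.

```python
from collections import Counter


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10).

    Assumes non-negative whole numbers.
    """
    plot = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot.setdefault(stem, []).append(leaf)
    return plot


def histogram_counts(values, bin_width=10):
    """Count values in bins [0, 10), [10, 20), ..., keyed by each bin's left edge."""
    return Counter((v // bin_width) * bin_width for v in values)


# Made-up values for illustration only.
data = [3, 7, 12, 15, 15, 21, 28, 34, 47, 49]
stems = stem_and_leaf(data)
bins = histogram_counts(data)

# Walk every stem in range so empty stems show up, just as they would on paper.
for stem in range(min(stems), max(stems) + 1):
    leaves = stems.get(stem, [])
    leaf_text = " ".join(str(leaf) for leaf in leaves)
    left = stem * 10
    print(f"{stem:>2} | {leaf_text:<8}  bar [{left}, {left + 10}) height: {bins[left]}")
```

Each printed row pairs a stem with the bar it corresponds to, which is the same side-by-side comparison I sketch for students on paper.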

After analyzing and representing the texting data, students then do the same with the set of answers to the question, "How many messages do you send on social media each day?" and calculate the measures of center and spread for each data set. Although they are not specifically prompted to interpret the differences between the two sets, they will naturally begin to explain their answers to the question at the top of the assignment. As these conversations happen, I ask students to support their answers with evidence from the data. In two upcoming assignments, we will more formally interpret differences in data sets.
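For checking student work, here is a minimal Python sketch that computes measures of center (mean, median) and spread (range, interquartile range, mean absolute deviation) for two cleaned data sets. The lesson doesn't list exactly which measures students calculate, so this selection is an assumption, and the data values are made up. Quartile conventions also vary; the sketch uses the "median of each half" rule common in classrooms, which may not match what a calculator or spreadsheet reports.

```python
from statistics import mean, median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    One common classroom convention: with an odd count, the middle value is
    left out of both halves. Software often uses other rules, so results can
    differ slightly. Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)


def summarize(values):
    """Measures of center (mean, median) and spread (range, IQR, MAD)."""
    q1, q3 = quartiles(values)
    avg = mean(values)
    return {
        "mean": avg,
        "median": median(values),
        "range": max(values) - min(values),
        "IQR": q3 - q1,
        "MAD": mean(abs(v - avg) for v in values),  # mean absolute deviation
    }


# Made-up, already-cleaned values for illustration; NOT the class survey data.
texting = [5, 12, 30, 45, 60, 80, 300]
social = [2, 3, 8, 10, 15, 40]

for name, values in [("texting", texting), ("social media", social)]:
    stats = summarize(values)
    print(name, {k: round(v, 1) for k, v in stats.items()})
```

Putting the two summaries side by side mirrors the comparison students start making on their own when they answer the question at the top of the assignment.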

As students work, I return their group quizzes from yesterday. If a group was missing one of the three representations, they should definitely try to make that one on today's problem set, which will give me further evidence of their mastery of SLT 2.1.

Students are highly engaged on this assignment because the data is real. They can see their own data points within this set. When they're working, and they ask, "Wait, what the heck is going on here?" it's because they're flexing their number sense muscles. That's really a mathematical question, and I always make sure to recognize this fact when I hear it.

I want to note again that this lesson really takes shape after I see the survey results. What I'm sharing here is a script for how this lesson went down in one particular year. The real, messy survey results change every year, and yours will be different if you use this lesson; if you pay attention to what your kids are telling you, there will be room to play. In general, I believe that this is one key ingredient in great teaching. It's more engaging for me to have a different data set every year, and that engagement definitely transfers to the kids.

Looking ahead, I won't often give students a data set quite this large. This assignment is right on the precipice of being busy work, but I want them to have the experience, at least this time, of getting their hands dirty in a big set of data. As we move forward, we'll look at representations that have already been made and we'll use technology to analyze data. When we do so, we'll have this assignment as a reference point for what it's like to look at a set of 99 numbers and figure out how to make it make sense.

On a side note, as I was researching this lesson, I stumbled upon this infographic: http://visual.ly/social-media-vs-text-messaging. It's a sponsored presentation with a clear message, but it's nice to have it in my back pocket. If there's a good moment just to put it up and ponder it, I will, even if I never use it in an official capacity.

5 minutes