SWBAT compare the shape, center, and spread of two data sets.

Students are active participants in John Ridley Stroop's famous experiment, which provides them with tangible reasons for the shape, center, and spread of a data set to change.

10 minutes

Today's opener is projected as students enter the room. It's on Slide #1 of today's Prezi, and here's what it says:

**Make a data set of 6 numbers, such that the mean of the data is 15 and the median is 12.**

This is one of my favorite kinds of problems, because it gives students the opportunity to consider the behavior of mean and median for a given data set. This problem is great for getting students to **reason abstractly and quantitatively (MP2)**, because there's room to move between specific values and generalizations about how mean and median work. I give students space to try it on their own, and I encourage them to talk to each other about it. If students are working together, I challenge them to compare notes, but to make solutions that are unique among everyone at their table. It's worth noting that there are infinitely many solutions to this problem, so at some point I'll ask how many there are.

I'd really like to stay out of this, if possible, by finding a student volunteer or two to show how they came up with their solution. A version of this problem will also show up on Problem Set #2, so I'm not too worried about everyone getting it perfectly right away, and I really like to let this problem generate conversations among students.

If most of the class is struggling, however, I'll sketch a diagram on the board consisting of six blank spaces, separated by commas. Then I'll ask, "How can we make sure that the median of this list will be 12?" Students see that the median only depends on the middle two numbers in the list, so they start with those. I hope for students to notice that the 3rd and 4th elements in the list can each be 12. In general, I'll point out, they must have an average of 12. When I see that students are amenable to that, I'll go one step further: "This means that the middle two values in the list have to add up to 24," I say.

With that established, I can give everyone a way to finish up: "Keeping in mind the relationship between mean and the sum of numbers in the list, then, what must the sum of all six numbers be?"
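For anyone who wants to check student solutions quickly, here is a minimal Python sketch of the reasoning above: the middle two values must add up to 24, and all six must add up to 6 × 15 = 90. The function name `fits_opener` and both example lists are my own illustrations, not part of the Prezi.

```python
from statistics import mean, median

def fits_opener(values, target_mean=15, target_median=12):
    """Return True if six values have the target mean and median."""
    return (
        len(values) == 6
        and mean(values) == target_mean
        and median(values) == target_median
    )

# Middle two values add up to 24; all six add up to 6 * 15 = 90.
simple = [1, 2, 12, 12, 13, 50]

# Challenge version: negatives, decimals, and one value above 100.
challenge = [-40, -8, 11.5, 12.5, 13, 101]

print(fits_opener(simple))     # True
print(fits_opener(challenge))  # True
```

Changing any one value and adjusting another by the opposite amount (without disturbing the middle two) gives a new solution, which is one way to see why there is no end to the list of answers.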

If any students need a challenge, I tell them to make a list that includes negative numbers, decimal numbers, and/or one number that's greater than 100.

**Source Link for Prezi**: (Accessed September 16 2014)

http://prezi.com/n6pfdghm7kfk/su1l4-the-stroop-effect/

15 minutes

I put up slide #2 on today's Prezi, and I try to provoke my class by throwing in a little light-hearted bluster. I tell the class to come up with a Mastermind code that they think will really stump me, and that I'll step out of the room while they work together to decide on a code.

When they invite me back in, I ask, "Is everyone sure they know the code? I just want to make sure you all know it, and that you're all going to make sure to give me the right feedback." They check with each other, and agree.

I follow a Mastermind strategy that is transparent enough for students to gain some insight from, and that pretty much has the game down to a coin-flip by my fifth guess. For fun, I guarantee to students that I'll get it within 6 guesses, and this ups the ante a bit. Some students have had a lucky game or two where they guess in 4, maybe even 3 -- but to many of my students, promising that I'll do it in 6 is preposterous.

I've made a little video showing how I play (see winning mm.mov), and to be honest, my algorithm is a very informal one that I came up with after playing a bunch of games. I definitely suggest playing on your own to come up with your own strategy (there's an iOS app called "Guess the Code" and several online versions of the Mastermind game). But here's the key:

**When I play, I have to think about it. I have to think through some logical steps. It might take me a minute or two, in front of the class, to come up with my third or fourth move. This is the most important thing for kids to see. I don't hurry.**

After I play once, I might repeat the demonstration, and this time I'll think out loud (see MM Example). This is even better. I'm teaching what it looks like to **make sense of a problem and persevere in solving it (MP1).**

After they watch me play, I pose a few debrief questions to students (Slide_3_of_Prezi). I really want students to take away that this is *not* a timed activity, and that it's OK to take some time in between guesses. I also want students to see that even when my guess is wrong, I'm still gathering information about the correct code. If on my first guess, I get no black or white dots, this means that I can eliminate all of the colors I've just guessed. Rather than being frustrated by this first guess, I am actually thrilled that I've eliminated a few options!
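The idea that a "wrong" guess still carries information can be sketched in a few lines of Python. This is only an illustration, not my in-class strategy: it assumes the standard game of 4 positions and 6 colors, and the color letters, `feedback`, and `consistent_codes` are names I made up for the example.

```python
from collections import Counter
from itertools import product

def feedback(secret, guess):
    """Return (black, white) dots for a guess against the secret code."""
    # Black: right color in the right position.
    black = sum(s == g for s, g in zip(secret, guess))
    # Colors shared by both codes, ignoring position.
    shared = sum((Counter(secret) & Counter(guess)).values())
    # White: right color, wrong position.
    return black, shared - black

def consistent_codes(candidates, guess, result):
    """Keep only the codes that would have produced this feedback."""
    return [code for code in candidates if feedback(code, guess) == result]

# Assumed setup: 6 colors, 4 positions, repeats allowed (1296 codes).
all_codes = ["".join(p) for p in product("RGBYOP", repeat=4)]

# A first guess that earns no dots at all rules out four colors at once.
remaining = consistent_codes(all_codes, "RGBY", (0, 0))
print(len(all_codes), len(remaining))  # 1296 16
```

Under these assumptions, a first guess with no black or white dots cuts the possibilities from 1296 to 16, which is why I'm thrilled rather than frustrated when it happens.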

20 minutes

**About Today's Investigation**

The purpose of today's investigation is to generate two new data sets that will be used to illustrate what differences in shape, center and spread look like, and that lend themselves to interpretation of why such differences exist. In order to do this, each student in the class is going to take the "Stroop Test," which you can read about here: http://faculty.washington.edu/chudler/words.html.

In summary, each student will be timed reciting a list of colors from a list of words. In the first trial, the name of the color matches the color of the word, so it's pretty easy. In the second trial, the color of the word and what the word says no longer match, so it's noticeably more difficult, and takes more time. When we collect the data from the first and second trials, the second data set will consist of longer times than the first, and what's great is that this will come as no surprise to students who have just participated in the experiment. When we analyze the two data sets, we should have concrete examples of differences in shape, center, and spread.

**Running the Experiment**

My instructions to students appear on slides #4 through #7 of today's Prezi. I run through the instruction slides with students, taking time to make sure that they know to say the color that's displayed, not the word itself. I instruct students to police each other by watching over the shoulder of a partner and saying "BUZZ!" if a color is misnamed. This serves two purposes. First, it's a small step toward maintaining the legitimacy of the experiment. Second, the buzz is likely to be invoked only on the second trial, and in my experience there are always one or two students who will argue with the person judging them, which in turn slows them down even more, producing outlier times that we'll be able to talk about later. For example, if one student is slowed down by an over-zealous judge, he might end up with a time of a minute or more. If this happens, that 60-second data point will stand out. My task as a teacher is to make sure that the students responsible for that data point have a sense of humor about it; my experience is actually that they'll delight in saying, "hey, that outlier was me!"

After I run through the instructions, students pick up their laptops and go to the site: http://faculty.washington.edu/chudler/java/ready.html, which is where the experiment is run. As they're getting started, I distribute sticky notes: each student gets a yellow one for their first trial and a blue for the second. I tell students to make sure that they write down their times immediately after receiving them, because I don't want anyone to forget anything. The web tool also says to write down the time when it's displayed.

As students complete the two trials, they place their sticky notes on the board, and we have more data to analyze!

**Other Versions of the Stroop Test**

- A lower-tech version of this experiment can be run using stopwatches and hand-written or color-printed sheets of paper.
- A slightly higher-tech version of the test can be found here: http://www.snre.umich.edu/eplab/demos/st0/stroopdesc.html - scroll to the bottom of the page, where a blue button says, "Go to the Stroop Test".

25 minutes

**Construct a Pair of Box Plots**

Now that we've generated two new data sets, we'll start by constructing a box plot for each. I briefly touch on the idea that because there are decimals this time, a dot plot would be a little harder to make. I also note that if anyone would prefer to make a pair of histograms, they should go for it, noting that all decimal values up to, but not including, the upper bound should be included in a given bin.
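For students who choose histograms, the bin rule above (include everything up to, but not including, the upper bound) can be sketched like this. The times and the bin width of 5 seconds are made up for illustration, and `bin_counts` is a hypothetical helper name.

```python
import math
from collections import Counter

def bin_counts(times, width):
    """Count values in half-open bins [lower, lower + width)."""
    # Each value is assigned to the bin whose lower bound is the
    # largest multiple of `width` that does not exceed the value.
    counts = Counter(math.floor(t / width) * width for t in times)
    return dict(sorted(counts.items()))

# Hypothetical times in seconds, not real class data.
times = [9.8, 10.0, 12.4, 14.99, 15.0, 19.2]

print(bin_counts(times, 5))  # {5: 1, 10: 3, 15: 2}
```

Note that 14.99 lands in the 10-to-15 bin while 15.0 starts the next one, which is exactly the boundary case students tend to ask about.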

For the purposes of today's lesson, box plots will be most useful. I challenge students to figure out how to place both box plots on the same number line, because this will help us to make comparisons between the two. As I circulate, this is where students will need the most help, just because they haven't seen it before. The number line should run from 0 to the maximum time recorded in the two trials, and the scale should then be set up accordingly. I make graph paper available to any students who want it.

I ask the first student to finish to put their result on the board, so we can use their plot to talk about some new learning targets.

**Review of Today's Learning Targets**

As students finish, I post SLT 1.2 and SLT 1.3 on the screen. I ask students to compare the two learning targets, because they are quite similar. The key difference is that 1.2 is about comparing two data sets, while 1.3 is about interpreting differences. Once we establish that the words "shape, center, and spread" are the key vocabulary in both, I say that we're going to define these vocabulary words by looking at our two box plots.

**Center, Spread and Shape**

When a lesson like this relies on data that is produced in class, anything can happen. I expect certain things, like the center of the first Stroop trial being lower than the second, and the range of values in the second trial to be wider, but of course, I can't make any guarantees. That's why this lesson is fun to teach. There is likely to be cause for some improvisation at some point, and it's possible that some of the points I'd hoped to make will be clouded by the messiness of real data. Still, I'd rather be grounded in something that actually happened than not.

The Stroop Effect investigation will serve as an example of how we name the differences in shape, center, and spread, and how we interpret those differences. As students move toward completing their Mastermind Projects and comparing the data sets for the three trials of that project, they will need to use the knowledge developed today. During the next class, as students work on their projects, I will refer back to this example to help students come up with ideas of how to explain what's going on in the project.

In the time that we have left today, I start by describing "center" as the part of statistics that students already know: mean, median, and mode. It always reassures students to hear that a new mathematical word describes something they're already comfortable with. I ask which of these is easiest to see on our box plots, and if everything is going well, students are pretty quick to say that it's the median. I observe that it's pretty easy to compare the medians of our two box plots, and ask for a volunteer to give it a try. The median of the first data set should be noticeably less than that of the second. Once everyone acknowledges that, I point out what SLTs 1.2 and 1.3 are asking us to do. The directive of one is simply to "compare" data sets, and we've done that. The other has us interpreting differences, so I tell the class that we should give that a try now. Without editorializing, I ask, "Who can interpret the difference between the median of each of these data sets in the context of the Stroop Test?" That's usually all it takes to set off a good conversation, and it gives students a toe-hold in what it means to interpret differences in data.

Next, we take a look at "spread," which I also introduce as a pretty simple idea. "Who knows how to find the range of a list?" I ask. Most students are comfortable with that. "How can we figure out the range of a data set by looking at a box plot?" I continue, and students are able to explain that it's just the distance from the minimum to the maximum, both of which show up on a box plot. IQR, I then explain, is just as simple as range, except that it's the distance between the first and third quartiles rather than the min and max. The toughest part about IQR is that "inter-quartile range" really feels like a big word, so I acknowledge that, then say that once you understand quartiles, IQR just isn't so bad. At this point, student opinions of box plots reach an all-time high -- here is a representation of data that basically does all the work for us! Now comes the fun part. It's likely that the measures of spread will be greater for the second trial than the first, but it's not guaranteed. I lead a conversation in which we compare the spread of each set, then we try to interpret what's happening. Have fun here, with whatever results come at you!
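To make the comparison of center and spread concrete, here is a minimal sketch that computes the five numbers behind each box plot, along with range and IQR. The times are invented, not real class data, and the quartile rule (median of the lower half and median of the upper half, leaving out the middle value when the count is odd) is an assumption; calculators and software sometimes use slightly different conventions.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Skip the middle value when the number of data points is odd.
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

def spread(data):
    """Return (range, IQR) for a data set."""
    low, q1, _, q3, high = five_number_summary(data)
    return high - low, q3 - q1

# Hypothetical times in seconds; trial 2 includes one "argued with the judge" outlier.
trial_1 = [8, 9, 9.5, 10, 10.5, 11, 12, 14]
trial_2 = [12, 14, 15.5, 17, 18, 20.5, 23, 61]

for name, trial in (("Trial 1", trial_1), ("Trial 2", trial_2)):
    print(name, five_number_summary(trial), spread(trial))
```

With these made-up numbers, the single 61-second time stretches the range of the second trial far more than it stretches the IQR, which is a nice opening for talking about what each measure of spread is sensitive to.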

Finally, we have a little time to talk about the shape of these two sets, so I introduce the ideas of symmetry and skew. Again, no promises of what will happen here, just be ready for it. Interpreting the differences in the shape of each set is the least direct part of today's work, but just having a conversation, even without conclusions, is a useful exercise.

5 minutes

Today's Record Sheet prompt is simply:

**Write a sentence or two about what you did today.**

I give students the last 5 minutes of class to respond to this. Please see my strategy folder for a description of Record Sheets.