Wednesday, April 3, 2013

CNN, statistical-minded proofreading, and percentages of percentages of percentages (of percentages)

This week's post should be a quick one - it has to do with the media.  The poor, poor media.


Don't worry, business cat will make sense by the end of all of this.

Specifically my problem this week is with CNN, though they're by no means the only ones guilty of poor statistical reporting.  They're simply the one that I have most recently noticed.  Once you start looking, though, it's no real trick to catch any of the major news outlets in the same kind of gaffe.

The story in question is here:

http://schoolsofthought.blogs.cnn.com/2013/03/11/when-teachers-are-the-bullys-target/?hpt=hp_bn11

Please take note, this is not a discussion of content.  Obviously I was at least a little interested in the content to be reading the article, but this particular discussion should be completely free of content.  The main content of the article could be written in Latin - what's important today are simply the numbers and how they report them.

Now, to drill down into it as quickly as possible, the only paragraph we need concern ourselves with is a little past the halfway point of the article.  For your sake, here it is, copied from the article as originally seen several weeks ago but accessed today, April 3rd, 2013:

"MetLife’s 2012 Survey of the American Teacher revealed that job satisfaction is the lowest in more than 20 years. The survey reported that 29% of teachers said they are likely to leave the profession. That’s 12% higher than the number of teachers who said they would leave in 2009."

I'm sure I'm not the only one that starts to get a fight or flight response when I see any media outlet reporting statistics or percentages, but this paragraph throws up some pretty obvious flags that make me a little anxious.

First off, they say that in 2012, x% of teachers are likely to do something.  They tell you that this is a z% increase from 2009, but fail to provide you with y, or rather y%, the percent of teachers that were likely to do this same thing back in 2009.

Some of you are getting a fight or flight response (I can feel it, even over the internets) because I just used letters instead of numbers.  I feel for you, I really do.  But this is grade school algebra I'm dropping on you.  If you have no idea how to do grade school algebra it shouldn't make you feel sad or angry or anxious, it should make you feel motivated to take a few hours and just learn grade school algebra.  If you know me, and want me to teach you, I will.  Honestly.  Just ask.  It will be quick and painless.

In any case, y is not given, but inferred.  We have an equation:

(y + (z/100)*y) = x

This equation has three variables, but only one unknown.  That means it's solvable for y.

To clarify - for those that are looking at that equation like it is Latin - all that's happening in it is that we're taking a 2009 number (y), and increasing it (+) by a percent (z).  Percents are given as numbers from 0 to 100, but to do math in terms of incrementing we actually want a proportion, which ranges from 0 to 1.  We can easily change a percentage to a proportion by dividing by 100 (/100).

This proportion is the part of the first number that increases.  If there is some percentage growth (z), we take the original number (y), and add on to it the share of itself that it is growing by ((z/100)*y).

To make it concrete for you, if y is 10, and it grew by 50%, then the way we figure out what the new value (x) should be is to start with 10, and add on half (or .50*10 = 5).  Thus, a 50% increase to the number 10 results in the number 15.
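If you'd rather see that as something you can run, here is a minimal Python sketch of the same equation.  The function name percent_increase is just something I made up for illustration; y and z mean exactly what they mean in the equation above.

```python
def percent_increase(y, z):
    """Grow the starting value y by z percent and return the new value x."""
    proportion = z / 100          # turn the percent (0-100) into a proportion (0-1)
    return y + proportion * y     # start with y, add on the share it grew by


# The example from the text: start at 10, grow by 50%.
x = percent_increase(10, 50)
print(x)  # 15.0
```

Nothing fancy is going on - it's the equation, typed out.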

Are we all on board with that?

Some of you might be saying, 'hey, this is different because you just did it on 10 and not on 10%'

You, good readers, have just hit on the teachable moment.

I used the number 10, but it doesn't matter.  For your sake I'm going to copy paste the same explanation but add in the % symbols.


To make it concrete for you, if y is 10%, and it grew by 50%, then the way we figure out what the new value (x) should be is to start with 10%, and add on half (or .50*10 = 5).  Thus, a 50% increase to the number 10% results in the number 15%.


Still following?  Because it's somewhere in there that CNN stopped following.  Business cat has also moved on to chasing a laser pointer across the floor.

I said we can use the equation up above to figure out the number that CNN isn't reporting (y).  I won't hold you in suspense much longer - or make you do the math - the value from the given x and z should be y = 25.892...
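For anyone who wants to check that number, here is a small Python sketch.  It assumes the equation above has been rearranged to y = x / (1 + z/100), which is what you get when you factor y out of the left side; the function name is my own invention.

```python
def value_before_percent_increase(x, z):
    """Solve (y + (z/100)*y) = x for y, given the new value x and the percent growth z."""
    return x / (1 + z / 100)


# CNN's numbers: 29% in 2012, described as "12% higher" than 2009.
y = value_before_percent_increase(29, 12)
print(round(y, 3))  # 25.893
```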

You see, if you start with just shy of 26%, and take 12% of that (it's around 3%) to add on, you end up at around 29%.  If the percents are confusing you, take the % signs off the 26, 3, and 29.

If you start with just shy of 26, and take 12% of that (it's around 3) to add on, you end up at around 29.

The % signs don't matter on any of those except for eventual interpretation in context of the content, and I've already told you I don't care one bit about interpretation of the content here.

Where calling something a percent does matter is on the 12.  You may also notice it's the only one I didn't remove the % sign from.  I start to worry when I read something like this because an increase of 12% is a lot different than an increase of 12 percentage points.

Let's walk through this a little more.  The equation we talked about above deals with an increase in percent:

(y + (z/100)*y) = x 

But if we're talking about percentage point increases it's a bit simpler:

y + z = x

In that case, you would be saying that the 2012 number is 29%, and since 2009 it has not grown 12%, but rather moved up 12 points on a percentage scale.  It's a lot easier to figure out the 2009 number, as it's simple subtraction.  y = 17%
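To see the two readings of "12" next to each other, here is a rough Python sketch.  Both function names are hypothetical; the first treats the 12 as a percent increase (the first equation), the second treats it as percentage points (the second equation).

```python
def before_percent_increase(x, z):
    """2009 value if the 2012 value x is z PERCENT higher: y + (z/100)*y = x."""
    return x / (1 + z / 100)


def before_point_increase(x, z):
    """2009 value if the 2012 value x is z percentage POINTS higher: y + z = x."""
    return x - z


x, z = 29, 12
print(round(before_percent_increase(x, z), 2))  # 25.89
print(before_point_increase(x, z))              # 17
```

Same two input numbers, two very different answers for 2009.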

The fact that this is a lot cleaner and simpler (and doesn't give a solution with a non-simplifying decimal) makes me wonder if this is in fact what they might have been doing there.

OH WAIT WE CAN FIGURE THIS OUT.

You see, despite their poor understanding of statistics and percents, CNN does at least take the time to link you to things they are citing (so they are actually doing a little better than some of the news outlets in that regard).  In this case, the link in that paragraph is actually a live link (at the moment) to the pdf research report from which they are drawing their numbers.  For those that want it as a separate link, here you go:

https://www.metlife.com/assets/cao/contributions/foundation/american-teacher/MetLife-Teacher-Survey-2011.pdf

It's not a small document, but we're looking for a very particular piece of information.  A quick search pulls it up, and reveals that CNN didn't even have to read the actual report - they're citing information from the executive summary.  Think of the executive summary like http://simple.wikipedia.org

You've never been to simple.wikipedia.org?  Stop wasting your time here, and start wasting (making use of?) your time here:

http://simple.wikipedia.org/wiki/Large_Hadron_Collider

or here:

http://simple.wikipedia.org/wiki/Special_relativity

or here:

http://simple.wikipedia.org/wiki/Love

or here:

http://simple.wikipedia.org/wiki/Candy

The last one containing what may be my favorite pair of sentences ever written in conjunction on the internets:



"Many people like candy and think it tastes good. Other people do not like it."



Anyway, back to the stats.

Finding the Executive Summary TL;DR, CNN appears to have conveniently found the 'Major Findings' bullet point list of the Executive Summary to be the place to go for numbers.  I don't even have a good comparison for a Major Findings bullet point list in an Executive Summary - simple.wikipedia.org is about as simple as my comparisons get.

Maybe, uh, quickmeme?

http://www.quickmeme.com/make/

Well NOW I've killed your day.  That's also where business cat came from.  Quickly, in fact.

The place I'm trying to get us to is this bullet point in the report:

"The percentage of teachers who say they are very or fairly likely to leave the profession has increased by 12 points since 2009, from 17% to 29%."

Bam.

Hopefully at this point - if you've been following - you can see that the people who were paid to put together a statistical report actually put it together correctly.  They used the correct terminology, and left a % symbol off of the number 12.  They did this as it is not a percent.  It is a growth in percent, not a percent growth.  These two things are very, very, very different.
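If you want to put numbers on just how different, here is a short Python sketch using the report's own figures (17% in 2009, 29% in 2012).  The function names are made up for illustration, and the percent-growth figure it prints is my own arithmetic, not something stated in the report or the article.

```python
def point_change(old, new):
    """Growth in percent: how many percentage points the number moved."""
    return new - old


def percent_change(old, new):
    """Percent growth: the change as a share of the starting value."""
    return (new - old) / old * 100


old, new = 17, 29  # 2009 and 2012 figures from the report
print(point_change(old, new))              # 12
print(round(percent_change(old, new), 1))  # 70.6
```

So the move from 17% to 29% is 12 points, and as a percent growth it would be somewhere around 70%, not 12%.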

If you've been reading the blog for a while you might recognize that this is the same thing that a lot of companies use to trick you into thinking things are much larger or smaller than they actually are.  The way CNN reworded things actually translated into only about 3 percentage points growth - not very impressive.  12 percentage points is...well, larger.

The same way that Jimmy Fallon can change something on the order of half of a percentage point increase into a drastically different 50% increase (or as we noted, much smaller increases into much larger percent increases), so too can poor statistical reporting change any effect into something that it is obviously not (in either direction).

Look for sources, and don't just read through numbers without thinking.  The person feeding you the information might be actively trying to deceive you to prey on your weaknesses (like Jimmy Fallon), or might simply be negligently ignorant about those statistics (like reporters at every major news outlet).
