Wednesday, February 27, 2013

OCD of National Proportions: Or How I Learned to Stop Worrying and Love Big Round Numbers

I'm going to make an assumption today that the majority of you reading this have spent your entire lives in a world where the United States of America consisted of 50 states.  I have, and it's always been one of those things that stands out in the back of my mind.

It just seems too convenient - 50 is just such a nice number, and graphical representations of the states (e.g. the fifty stars on the flag) are just so nicely symmetrical.

I've always wondered what it would take for the US to add more states from the very adequate list of territories (e.g. Puerto Rico, Guam, The Virgin Islands, American Samoa...Guam, etc), as this would potentially disrupt some pretty important stuff (like having to make a new flag!). 

It does seem a bit suspicious that it's been so long since we've changed off of a nice round number, though.  It got me wondering whether there were other numbers that the United States stayed with for a while during the slow growth of the nation.  Let's take a quick look:


You can see that the United States has stuck with 50 states for a while now (about a quarter of the time there have been states), and they had been at 48 for a while as well.  If we're looking at numbers that end in 5 or 0 as those that fit the criteria of big and round, you can see that the US also spent a brief time (around 1900) at the 45 mark. 

Other than that, though, things seem to be pretty random.  There's some periods of time where the number of states was constant for a while, but none of those numbers seem to be big or round. 
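If you'd like to sanity-check the "about a quarter" figure without the graph, here's a minimal sketch.  The admission dates are my own additions (they're standard historical dates, not numbers pulled from the data behind the graph), and I'm assuming the clock starts with the first state in 1787 and stops on the date of this post.

```python
from datetime import date

# Admission dates are standard historical facts, but they are assumptions of
# this sketch rather than values taken from the data behind the graph.
FIRST_STATE = date(1787, 12, 7)   # Delaware ratifies the Constitution
REACHED_45 = date(1896, 1, 4)     # Utah becomes the 45th state
LEFT_45 = date(1907, 11, 16)      # Oklahoma becomes the 46th state
REACHED_48 = date(1912, 2, 14)    # Arizona becomes the 48th state
LEFT_48 = date(1959, 1, 3)        # Alaska becomes the 49th state
REACHED_50 = date(1959, 8, 21)    # Hawaii becomes the 50th state
POST_DATE = date(2013, 2, 27)     # the date on this post


def years_between(start, end):
    """Elapsed time in years, using an average year length of 365.25 days."""
    return (end - start).days / 365.25


years_at_45 = years_between(REACHED_45, LEFT_45)
years_at_48 = years_between(REACHED_48, LEFT_48)
years_at_50 = years_between(REACHED_50, POST_DATE)
share_at_50 = years_at_50 / years_between(FIRST_STATE, POST_DATE)

print(f"45 states: {years_at_45:.1f} years")
print(f"48 states: {years_at_48:.1f} years")
print(f"50 states: {years_at_50:.1f} years ({share_at_50:.0%} of the time since 1787)")
```

Starting the clock in 1776 instead of 1787 nudges the share down a little, but it stays in the same neighborhood.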

Bit of trivia, by the way - all states except two have a well-established order in which they were admitted into the United States.  States admitted on the same day are often ordered by the sequence in which the president officially signed them into statehood.  President Harrison intentionally shuffled up the paperwork for two states, signed them in the resulting random order, and took the secret of the order to his grave.  Which two states?

Anyone?

North and South Dakota.

Before we move on to thinking outside of the states, I have another graph that I made to see what it would look like, and figured it was worth sharing.  It's the same graph as above, but takes into account that a number of states removed themselves from the US roster during the Civil War.  They weren't all readmitted immediately following the end of the war - they were readmitted over a period of a few years.

Anyway, here's what it looks like if we take the Civil War into account:


Beyond this, I started to wonder if other countries naturally settled into nice round numbers that help out with building their flags, etc.

Before we move on, let's have a quick quiz.  How many provinces does Canada have?  Does Mexico call their state-like things states or provinces?  How many of them do they have?

Let's start with Canada.  If you totally blanked on your quiz, they have 10 provinces.  Here's how that has played out historically:

    
You can see from this that Canada has actually spent more time at 10 provinces than the US has spent at 50 states.  Provinces have been added fairly slowly, but Canada has also added only a fifth as many as the US.  The bottom line would seem to be that they haven't made any changes in quite a while.  Right?

Well, no.  Some of you might be clever enough (or Canadian enough) to point out the fact that Canada has some territories that are much more like provinces than US territories are like states.  They're also contiguous, which helps to create an overall 'picture' of Canada that includes them.

Putting them on the graph as well produces this:

So Canada did spend a decent amount of time with 12 province-like things, but none of these numbers look as nice and round as 10 does.  They've also settled in - fairly recently - to a 13 province-like thing system.

By the way, if you live in the US and have no idea of what Canada did to change things up in 1999 you should spend a bit of time on the Googles.

Which brings us to Mexico.  Mexico has 31 states.  If you had no idea of that - or have no idea of what any of them might even be named - maybe you should head over to wikipedia for a bit. 

Here's the historical punchline:

Mexico spent a decent amount of time with 20 states (a nice round number), but only a little with 25, and jumped past 30 altogether!  Like Canada, they've also made more recent changes to their state makeup when compared to the US. 

I kind of feel that these graphs are interesting enough in and of themselves, but I just had to push myself a little farther.  What other countries are large and have a number of internal divisions? 

Who else is wondering how the graph for the People's Republic of China (PRC) would look? 

Well...messy.

If you know as much about China as you know about Mexico or Canada you may be unaware (as I will admit I was) of the number of different divisional concepts that the PRC has moved through in 60 or so years.

First off, the PRC started with some provinces already established from the prior several thousand years of civilization.  Most proximal are those that were in place in the preceding Republic of China (the remnants of which are now confined to Taiwan).

The PRC calls all of their divisional concepts (like states, etc) provinces, but one of the things on that list of provinces is also called a province - the kind that's most like states.  If we only look at things that the PRC calls 'provinces' within the larger concept of provinces, we can start by making something like this:

  
If you're thinking about things too much from a US perspective you might be wondering if some provinces seceded or something.  Nope.  The PRC is simply a lot more likely than the US to shift things around and redraw - or divide or combine - provinces as they see the need. 

This might make things look as if not much goes on in terms of China adding/removing provinces, but that couldn't be further from the truth.  In fact, a ton of shuffling has gone on over the last half century.

Let's add in all the other things that are in that big category of provinces.  This includes 'Greater Administrative Areas', 'Provinces', 'Autonomous Regions', 'Municipalities', 'Special Administrative Regions', 'Regions', 'Territories', and 'Administrative Territories'. 

If you're interested, the best source of information that I could find was actually on wikipedia (yes, yes, sources, etc), and specifically the article here:

http://en.wikipedia.org/wiki/History_of_the_political_divisions_of_China#List_of_all_provincial-level_divisions_since_the_proclamation_of_the_People.27s_Republic

It's also the first time on this blog that I've been unable to get all my numbers to match up.  The numbers work out against source for that last graph, but they don't for the next two.  They're close, but all the double-checking I've done has not revealed the small mistake I may have in there. 

That said, if you check out that wikipedia page you can see how difficult it is to systematically track the progression of these state-like things through all the different terminologies, as well as through all the mergers, dissolutions, and reinstatements.  At the end of the day, my take is that if I have any Chinese readers I would absolutely love to sit down and get some input on the last 60 years of your history.

You're waiting for the chart, so here it is:

   
You can see that there were a lot of changes to things in the 50s, as there seemed to be a drive to simplify some of the naming and state-like things.  To really paint this picture I think it's more interesting to look at the same graph set up like this:



The blue line, then, is really the difference between the red and yellow/orange lines.  You might be tempted to say that the PRC hasn't changed anything in a while, but keep note of the different scale of time we're talking about.  Like Canada, they were doing things up into the 90s.

It's hard to imagine a US where this much shuffling took place, but it's a good example of a country not getting stuck on one number or another.  Mexico is similar with their 31 states.  Canada is interesting, as they seem to be pretty stuck on 10 provinces but willing to toss around territories all willy-nilly.  Perhaps my many Canadian readers can illuminate me on what makes their territories different from their provinces.

Perhaps someday the US will follow Mexico's lead and head on up to 51 states.  My advice?  Start working on 51 through 55 star flags right now so you can win the new flag competition post-haste when it's introduced.  Because seriously, isn't that the important part of all of this?

Wednesday, February 20, 2013

On the things that fall out of the sky

If you've been following the news of the last week you might have heard about the recent meteor that was seen (and paparazzied to death) over Russia.  While it's only somewhat statistical (okay, minimally statistical), I figured it would be a fun topic to talk about this week. 

On second thought, let's get some stats out of the way up front.  You may have heard reporters talking/writing about this meteor impact as a one-in-one-hundred-year event.  Despite the difficulty of determining these sorts of meteor impacts over oceans (70% of the globe) before we had a satellite network, or of determining these impacts over non-explored portions of the globe before, say, the 1500s (90%?), let's say that this number is correct based on our 100 years or so of good global record-keeping.  I have literally read articles that paint this recent strike as a positive thing due to the fact that something like this shouldn't happen for another hundred years now that we've had our strike during this hundred years.

If you've been reading this blog for a while I'm not even going to insult you by explaining how much is wrong with that sort of assumption.  Let's get back to the fun stuff - meteors.  (If you're still wondering why there's so much wrong with such a claim just google 'statistical independence')    

There are a few reasons that I find it fun to talk about meteors.  First, meteors are cool.  That might be enough for some of you.  Beyond that, though, I think there are a lot of great points to learn about meteors and the atmosphere and speeds of things, etc.  Did I mention that meteors are cool?

Before we get too deep into it, if you haven't seen much footage of the actual impact, you can find a whole bunch of videos of it here:

http://say26.com/meteorite-in-russia-all-videos-in-one-place

There's a lot of information to take away from these videos, actually.  It's pretty fantastic that so many Russians have dash cameras on their cars or trucks - apparently a big part of it is simply the fact that having documentation of your driving helps out if you find yourself in an accident or pulled over for something you may or may not have done.  Makes fighting that traffic ticket in court a whole lot easier.

A pretty good view of what's happening can be found in the first video on that page, or on YouTube here:

http://www.youtube.com/watch?feature=player_embedded&v=tkzIQ6JlZVw

One of the first things you should take note of is the fact that you see what looks to be a pretty energetic reentry, but that you don't hear anything.  Are you thinking it's because the camera is in a car?  Well, no.

This is something that has been misrepresented for the better part of the last century - basically as long as we've been matching sound to video.  Have you ever seen video of an atomic bomb blast from back in the 40s or 50s?

You might be able to picture in your mind what this even looks like - the bright flash and then the rising mushroom cloud.  Along with the bright flash you probably also remember a pretty loud explosion.

If you've been to a track event started with a pistol and sat way up in the stands, been to a fireworks display from a distance, or watched a thunderstorm for a while, you might understand that this doesn't really make a lot of sense.  In fact, there are exceptionally few surviving clips of atomic tests with correctly matched audio - most footage uses stock explosion noises with the sight and sound of the explosion matched.  If you're interested, more information on the topic can be found here:

http://www.dailymail.co.uk/sciencetech/article-2174289/Ever-heard-sound-nuclear-bomb-going-Historian-unveils-surviving-audio-recordings-blast-1950s-Nevada-tests.html

The reason that you hear thunder some time after you see lightning is due to the differential speeds of light and sound.  Light is...pretty fast.  It's usually expressed in meters per second, and every second it actually goes a whole lot of meters.  With some rounding for simplicity (don't worry I'll use actual numbers for calculations), it's around 300,000,000 meters.  Every second.

Sound, on the other hand, is actually pretty slow.  In the same second that light can travel around the Earth seven or so times, sound doesn't even make it down to the corner store.  It takes over four seconds for sound to make it a mile - if sound was running a 5K it would put in a time of just under 15 seconds.

In that same time, light could run over 100,000 marathons.

[One important aside before we continue - the speed of light is usually given as the speed of light in a vacuum.  Light does travel slightly slower in atmosphere, but the difference is small enough to be negligible in most cases.  The speed of sound in most cases is given at sea level - these are the numbers I've used above.  The speed of sound does decrease as you travel up through the atmosphere, but - despite the fact it could be applicable here - I'm not going to go into that sort of detail.  This will be left up to the ambitious reader.]
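If you want to check the comparisons above yourself, here's a minimal sketch.  The 343 m/s figure for sound is my assumption for this example (a common textbook value for air at roughly room temperature near sea level); the Earth circumference is an approximate equatorial value.

```python
SPEED_OF_LIGHT = 299_792_458.0        # m/s in a vacuum (exact by definition)
SPEED_OF_SOUND = 343.0                # m/s; assumed value for air near sea level
METERS_PER_MILE = 1609.344
MARATHON_M = 42_195.0                 # standard marathon distance in meters
EARTH_CIRCUMFERENCE_M = 40_075_000.0  # approximate equatorial circumference


def travel_time(distance_m, speed_m_per_s):
    """Seconds needed to cover distance_m at a constant speed."""
    return distance_m / speed_m_per_s


sound_mile_s = travel_time(METERS_PER_MILE, SPEED_OF_SOUND)
sound_5k_s = travel_time(5_000.0, SPEED_OF_SOUND)
light_laps_per_second = SPEED_OF_LIGHT / EARTH_CIRCUMFERENCE_M
# How many marathons light covers while sound finishes its 5K.
light_marathons = SPEED_OF_LIGHT * sound_5k_s / MARATHON_M

print(f"Sound, one mile: {sound_mile_s:.2f} s")
print(f"Sound, 5K: {sound_5k_s:.2f} s")
print(f"Light, laps of the Earth per second: {light_laps_per_second:.2f}")
print(f"Light, marathons during sound's 5K: {light_marathons:,.0f}")
```

Swapping in a slightly different speed of sound (it changes with temperature) shifts these numbers a bit, but not enough to change the story.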

This might have been a lot of setup for something you're already pretty aware of.  If something that produces noise happens a distance away from you, the light from the event will reach you before the sound does.  That means that you'll see the event before you hear it.  How much earlier?  Well, it depends on how far away the event is.

If you're sitting a dozen or so feet away from your television, the difference between the light and sound coming at you is small enough that you don't notice any difference.  That is, the sound seems to match the image.

If you were watching a drive-in movie screen through binoculars from a mile away and relying on sound produced at the screen itself you'd start to notice that things weren't matching up.  How far off would the audio be at that distance? 

In the extreme, if we set off an atomic bomb on the moon, how out of synch would the sound be at that distance? 

Well, the first question can be answered, and we'll do it below.  The second is a trick question because sound (unlike light) needs a medium to propagate through, like air or water.  There's no air (or water) between us and the moon (or on the moon), and so no sound waves would propagate from the explosion.

To answer the first question we can start by taking a look at the time it takes sound and light to travel a range of distances. 


It's possible that you're asking: where's the line for light?  It's at the bottom - the red line isn't an axis, it's the time it takes light to travel these distances.  Compared to sound, the difference between light traveling 1 mile and light traveling 75 miles is fairly negligible.  Light can travel both of these distances in a fraction of a second.

It takes sound a little under 5 seconds (about 4.7) to travel 1 mile.  If you've ever heard the old rule that you can count the seconds between seeing lightning and hearing thunder then divide by five to figure how many miles away the strike was you now see why that makes sense.  For the distances you see and hear lightning the rounding doesn't really cause any major problems.

Why am I going on about all this when we should be talking about meteors?  Well, the fact that so many Russians have dash cameras means that we have a huge supply of data available to us.  We can even find a few examples where the incoming meteor is pretty close to directly overhead.  Here's a good example:

http://www.youtube.com/watch?feature=player_embedded&v=odKjwrjIM-k

Since we're looking at a dash camera we also have a second-by-second time stamp, which is great.  You can see that the person who uploaded this clip cut out a part of the middle - the time between seeing the meteor and actually feeling the shock wave.  We can figure out the difference here by taking note of when two events occur.

The first is the place in the video where the meteor seems to be directly overhead and most energetic - right around 43:05.  The second is when the shock wave hits and knocks some snow off the surrounding buildings.  It's seconds later in the clip as edited, but the time stamp reveals that it was just about a minute and a half later, at 44:35.

Imagine you were watching a thunderstorm and saw some lightning.  A minute and a half later you heard the accompanying thunder.  You might not even link these two events in your mind - you might associate the thunder with more recent lightning strikes that you may have missed.

Well, unless the thunder sounded like this:

http://www.youtube.com/watch?feature=player_embedded&v=w6uOzFo2MQg#!

From the numbers behind the above graph we can figure out what a minute and a half lag time means - turns out it's around 19 miles.
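Here's a minimal sketch of that calculation, using the two time stamps from the dash cam clip.  The speed of sound (343 m/s) is my assumed sea-level value, and the tiny amount of time the light takes to arrive is simply ignored.

```python
SPEED_OF_SOUND = 343.0       # m/s; assumed value for air near sea level
METERS_PER_MILE = 1609.344


def to_seconds(stamp):
    """Convert a 'MM:SS' dash cam time stamp to a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def distance_miles(lag_s):
    """Distance sound covers during lag_s seconds, in miles.

    Light's travel time over this distance is a tiny fraction of a second,
    so it is ignored here.
    """
    return lag_s * SPEED_OF_SOUND / METERS_PER_MILE


lag_s = to_seconds("44:35") - to_seconds("43:05")
miles = distance_miles(lag_s)
print(f"Lag of {lag_s} s puts the source about {miles:.1f} miles away")
```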

I can hear some of you yelling already, even through the internets.  You're using the speed of sound at sea level!  Yes, yes I am.  I told you that the speed of sound slows as you travel up in the atmosphere, and this meteor was obviously not at sea level.  This means that our estimate of 19 miles will be off, though we at least have a decent ballpark estimation. 

I can also hear a much smaller contingent of you yelling that things are a lot more complex than that and shock waves have different profiles than sound waves.  Well, yes.  I was hoping to keep this pretty simple to get across a point, but if you're so inclined you can learn a bit more here:

http://www.fas.org/sgp/othergov/doe/lanl/pubs/00326956.pdf

and here:

http://en.wikipedia.org/wiki/Shock_wave

19 miles is a bit of a distance.  The fact that damage was produced even at this distance is a testament to the amount of energy released from this particular meteor.  Current estimates have placed the energy released on the scale of nearly half a megaton of TNT (just under 500 kilotons).  Everyone is comparing that to the explosion of "Little Boy", the atomic bomb dropped on Hiroshima, which checked in at 16 kilotons. 

This brings us to some facts about meteors and the atmosphere that are a little less stats-y (not that we've been stats heavy to this point).   

Let's start with some simple stuff.  We've been using the term meteor here, and the use of that term actually carries with it some useful information. 

A piece of debris that's simply floating around in space isn't a meteor - it's a meteoroid if it's fairly small (roughly up to the size of a small car), an asteroid if it's a bit larger (up to the size of a small moon), and a planetoid if it's much larger (that's no moon!).

If any of these things is composed of ice - enough that it grows a tail - it is a comet.

Once one of these things comes in contact with the Earth's atmosphere (or any atmosphere, really) it becomes a meteor.  Thus, what was seen in the Russian sky was a meteor.  There are reports that some fragments of the meteor may have been found - if any parts of a meteor survive to the ground those fragments become meteorites.

You've also clearly seen the trail left in the sky by the meteor - a trail that persisted for some time.  I want you to think about two questions for a moment.  The first is why a meteor (or a space shuttle) heats up when it enters Earth's atmosphere.  The second is what causes a trail to be left in the sky behind a meteor such as the one filmed over Russia.  Think about both of these for a minute or so.

Okay, so what are you thinking?

Your ideas on this are probably again a bit polluted by a few sources.  Mainly movies and TV, I'd bet.

The first question is quite a bit easier, but also one of those that seems to be fairly misunderstood.  You might be thinking that meteors (or the space shuttle, etc) heat up on entry (or reentry) due to friction with the air.  Friction is actually a very small part of this process - what's really happening is that the air in front of whatever is entering the atmosphere is being compressed.  This is simply due to the fact that air can't move out of the way of an object fast enough once an object reaches certain speeds.  Since it can't get out of the way it becomes compressed.

If you've ever sat around and figured out how your refrigerator works (I would suggest it as a fun thought experiment as well) you might recognize what's happening as a bit of a reverse of that process.  As air is compressed it becomes hotter.  When a lot of air is compressed really quickly it becomes really hot.  This is what's heating meteors and space shuttles, etc. 

Looking to kill a bit more time?  Randall Munroe of XKCD has a really cool post on what would happen to a diamond meteor upon entry at different speeds here:

http://what-if.xkcd.com/20/

The space shuttle doesn't burn up on reentry due to some pretty sophisticated heat shielding, but meteors aren't so lucky.  This heat causes differential stress on parts of the meteor and it begins to burn up and break up.  This is why many meteors never become meteorites.

This leads to the second question, which I'm going to admit I'm not sure that I have a solid answer on - the internets don't seem to address it.  I'm suspecting that many of you are thinking that the trail behind a meteor is a smoke trail.  I can see how this idea would get planted in our minds - movies and television have given us plenty of examples of things on fire plummeting toward the ground with smoke trailing behind them.  Nicolas Cage's stellar performance in Con Air, anyone?

Like I said, I'm having trouble figuring this out with any actual sources, but it doesn't seem that a meteor streaking through the sky is the best place for normal combustion to take place.  Moreover, you'll notice that trails behind meteors are (from what I've found) universally white - combustion of different components of different meteors would presumably lead to smoke at least occasionally displaying darker shades.  You know, different shades like you're probably familiar with from movies like Deep Impact where you see a plume of dark smoke trailing the meteor as it streaks through the atmosphere.  Example here:

http://www.top10films.co.uk/img/deep-impact-disaster-movie.jpg

What is the alternative?  Well, cloud formation.

If you remember your grade school science fairs you might be familiar with the old 'make a cloud in a bottle' experiment.  If not, a good example is here:

http://weather.about.com/od/under10minutes/ht/cloudbottle.htm

Much of cloud formation relies on the compression and decompression of air with at least some water vapor content and dust particles.  We've already discussed how a meteor compresses air (which is free to decompress in the immediate wake of the meteor).  As long as local humidity is above zero the other ingredients are there as well, since the meteor is also producing a reasonable share of dust through the breaking/burning up process.

Similar to how airplanes in high atmosphere form contrails it seems that meteors might be leaving a trail that's nothing more than clouds formed by their fairly violent passage through the air.  Like I said, I can't find this in any of the intertubes, so it'd be interesting to have a discussion if people have other thoughts.       

One other thing before we go - in all the coverage of this meteor strike I've only seen one or two articles that discussed the angle of entry of this meteor.  You can probably figure it out from the name, but angle of entry relates to the angle at which something enters the atmosphere.  The extreme ends would be directly perpendicular to the ground (think Felix Baumgartner skydiving from space) and directly parallel to the ground (which might cause an object to even deflect off the atmosphere - think satellites that are in orbits that are allowed to decay).

This meteor was much closer to the second - you can tell pretty clearly from the videos of it that it had a shallow entry (it has been estimated at less than 20 degrees).  The angle of entry is important because it is one of the main determinants of how long something spends in atmosphere before it reaches the ground.  The numbers that I've seen seem to indicate that this particular meteor spent over 30 seconds in the atmosphere before it broke up.  The fact that it had 30 seconds of time in atmosphere was only because it was traveling at such a shallow angle - imagine if it had hit the atmosphere with an angle of entry closer to Felix Baumgartner's.

Well, to do some math on this we need to decide where the edge of space is.  As you travel up in the atmosphere, air gets thinner, and eventually there's no air.  It's not a hard edge, though, it's a slow gradient, so it's tricky to decide when a small amount of air is different from no air.  Most estimates that I've found seem to be in the 75-100 mile range.  This is good enough for some quick estimation.  

The Russian meteor entered Earth's atmosphere at a speed of around 11 miles per second.  If it was taking the shortest path through the atmosphere (straight on, perpendicular to the ground), it would reach the ground in somewhere around 9 seconds (if we're calling space 100 miles up).  If we go with 75 miles to the edge of space we're looking at closer to 7 seconds.  Sure, the air is slowing it as it descends, but this 9 seconds is a lot less than 30.  I'll also cede the fact that traveling at a steeper angle through the atmosphere creates a quicker pass through pressure gradients and might have caused breakup faster.  
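For anyone who wants to play with these numbers, here's a rough sketch.  It assumes a constant speed, a straight-line path, and a flat slab of atmosphere (no drag, no curvature of the Earth), so treat the shallow-angle results as ballpark only; the function name and the 15 degree example are mine, not from any source.

```python
import math

ENTRY_SPEED_MI_PER_S = 11.0   # entry speed quoted above


def transit_time_s(atmosphere_depth_mi, entry_angle_deg,
                   speed_mi_per_s=ENTRY_SPEED_MI_PER_S):
    """Seconds to cross the atmosphere in a straight line at constant speed.

    entry_angle_deg is measured from the horizontal, so 90 means straight
    down.  Drag and the curvature of the Earth are ignored (assumption).
    """
    path_mi = atmosphere_depth_mi / math.sin(math.radians(entry_angle_deg))
    return path_mi / speed_mi_per_s


for depth_mi in (75.0, 100.0):
    for angle_deg in (90.0, 20.0, 15.0):
        seconds = transit_time_s(depth_mi, angle_deg)
        print(f"{depth_mi:.0f} mi of atmosphere at {angle_deg:.0f} degrees: {seconds:.1f} s")
```

Under those assumptions a shallow path is roughly three or more times longer than the straight-down one, which is in the same ballpark as the 30 or so seconds reported.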

Still, if this meteor had entered at such an angle there's a real chance that it may have impacted the ground before it broke up.  The energy released in atmosphere was enough to blow out windows and doors 19 miles away - if this energy were transferred all at once to a stationary object (like the ground), well, then we'd really have something like Hiroshima on our hands.  The fact that some portion of the population says 'well, at least we don't have to worry about it for another 100 years' flies right in the face of what we should actually be taking out of this.

Anyway, that seems like as good a place as any to leave you with something to think about.  Thanks for all the dash cams, Russia.

Wednesday, February 13, 2013

Halo 4: Red vs Blue

A few weeks ago I looked at some of the rich stats provided by Halo 4, and wanted to follow up on some of the ideas that came out of it relating to the effects of team color on multiplayer success. 

For those who aren't familiar, Halo 4's main multiplayer component is highly team based.  With the exception of the relatively new playlist 'team doubles', teams range from four to eight players on a side.  There are always two teams, red and blue.  The red team always starts at the same place in any given map and game type, as does the blue team.  This leads to the possibility that team assignment may have some impact on the outcome for each team.

Like I've said, playing Halo 4 produces a great deal of data that is accessible to the player.  It's relatively easy to go into personal stats and code wins and losses by team color.  Since team assignment is assumed to be random, wins should be equivalent regardless of which color team I'm playing on.

I pulled down data from 136 games of multiplayer Halo 4 that I've played recently.  The assignment to teams in these cases is fairly even, with 65 instances of being on blue team and 71 instances of being on red team.  This works out to a split of 47.8% to 52.2%, which again we should hopefully consider to be at least somewhat random.

We can very easily create a contingency table from this which looks at the win/loss percentages for those instances of red and blue team membership.  This is what that looks like:
 

             Win   Loss   Total
Blue Team     32     33      65
Red Team      42     29      71
Total         74     62     136

We can also run some statistical tests (in this case Chi-square tests of association) on these numbers.  While there does appear to be a slightly better chance of winning on the red team overall, it turns out that this relationship of team color and outcome is not significant.
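For the curious, here's a minimal sketch of a Chi-square test of association on the overall table, using only the Python standard library.  I'm assuming the plain Pearson version with no continuity correction; the exact statistic you get from a stats package may differ slightly depending on its defaults.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of association for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom, with no
    continuity correction (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square tail probability reduces to erfc.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Blue team: 32 wins, 33 losses.  Red team: 42 wins, 29 losses.
statistic, p_value = chi_square_2x2(32, 33, 42, 29)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

A p value well above the usual .05 cutoff is what "not significant" means here.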

Now, the lack of an overall effect should really stop us here, but there are a lot of different game types within Halo 4 multiplayer.  Each is pretty distinct, so it's reasonable to believe that effects could be present in one game type and not another.  While there are a lot of different game types, two are fairly well represented in these 136 games, as it reflects what I've been playing recently - Team Slayer and Grifball.

We could have a long discussion about what Grifball is, so for those who don't already know I'll simply direct you to the halo wiki here:

http://halo.wikia.com/wiki/Grifball

For those who don't like reading, I'll direct you to this awesome image that I found on the halo wiki:



In these last 136 games I've played 57 games of Team Slayer and 35 games of Grifball.  I've played other things (obviously), but the next most played game type only has 16 games in this set so I'm a bit (more) skeptical about the stats I'd run on it.  That said, we can take a look at Team Slayer and Grifball and see if anything comes of it.

Let's cut to the chase - something does.  The numbers on Grifball are still a bit small, and while numbers do seem to be trending with a slight advantage for blue team we fail to find a significant relationship between team color and wins.  Here's the breakdown:



             Win   Loss   Total
Blue Team     10      6      16
Red Team       8     11      19
Total         18     17      35


For Team Slayer, there is a significant relationship between team color and winning - it's also the opposite of how Grifball is trending (though keep in mind a trend is not an effect).  In Team Slayer matches it turns out I tend to win significantly more when I'm on red team than on blue team.  Here's the breakdown:



             Win   Loss   Total
Blue Team      9     17      26
Red Team      21     10      31
Total         30     27      57
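Since both of these tables are on the small side, here's a sketch that runs the Chi-square test on each game type with and without a Yates continuity correction.  The correction is my addition for this example (a common conservative adjustment for small 2x2 tables), not necessarily what was used for the results reported here.

```python
import math


def chi_square_2x2(table, yates=False):
    """Chi-square test of association for a 2x2 table given as ((a, b), (c, d)).

    Returns (statistic, p_value) for 1 degree of freedom.  With yates=True
    the Yates continuity correction is applied (optional, for small samples).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2.0)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = n * diff ** 2 / denominator
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows are (wins, losses): blue team first, then red team.
GAME_TYPES = {
    "Grifball": ((10, 6), (8, 11)),
    "Team Slayer": ((9, 17), (21, 10)),
}

for name, table in GAME_TYPES.items():
    for yates in (False, True):
        statistic, p_value = chi_square_2x2(table, yates=yates)
        label = "with Yates" if yates else "uncorrected"
        print(f"{name} ({label}): chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

With these counts the Team Slayer result stays under the .05 cutoff either way, while Grifball stays well above it.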

I wasn't intending things to come down to Team Slayer and Grifball (potential reviewer comment: needs more theory), but it actually presents a very interesting dichotomy that I wish I actually had come up with on my own (and not just because I play a lot of Grifball). 

This dichotomy relates to the possible advantages that can be given to either team based on team color. 

Let's take a step back and explain some things about these two types of Halo 4 multiplayer.

In most of Halo 4 multiplayer the maps - and thus the team placement on maps - are asymmetric.  In this way the maps themselves may actually be able to provide some advantage to one team or another.  It might be the case in a Team Slayer match that the red team starts a little closer to better weapons or better vehicles.  This might give them the sort of starting advantage that could turn the tide of a game early on and impact wins and losses.  It's possible that this is what may be driving the effect we found above. 

Let's imagine that for some reason you didn't go read all about Grifball.  Here's the important part. 

Grifball is played in a perfectly symmetric (nearly if not perfectly square) arena in which each team starts in symmetric positions with the Grifball itself in the center of the room.  Neither team should be able to gain any advantage due to starting position, nearby weapons/vehicles, etc.  In fact, all players start with the same weapons (hammers and swords - I'm telling you that Grifball is pretty fun), and there are no additional weapons or vehicles on the map. 

Because of this, if we were to find a decisive and replicable relationship between team color and winning in the Grifball game type we could only attribute it to a) error or b) psychological differences of being placed on different teams. 

Like I said, I wish I had thought of it instead of stumbled onto it. 

Unfortunately, I simply don't have enough data at the moment to make any strong statements one way or the other.  Statistically, it does seem that I have an advantage in Team Slayer when playing on red team, though I'd also like to run some more numbers on it.  I have more games I can code, though it's only recently that I've been playing Grifball. 

So, I guess that instead of writing this post I should actually be playing some more Grifball...I mean collecting some more data.  See you in the arena!

Wednesday, February 6, 2013

Customers who switched to Zeno's car insurance saved up to 50%

I apologize in advance to anyone who is really hoping that the title of today's blog implies that I'm going to talk about some of Zeno's paradoxes.  I usually write the title first - in haste - to get it out of the way, then revise it accordingly later.  There's just something about this one that I find myself unable to revise.  Perhaps I could start by revising half of it?

See, there you go, we got that out of the way. The focus is actually on car insurance commercials, though to be fair what you find inside is really more about what you take with you.

Good to get those Empire quotes out of the way early on as well, right?

I'm sure that almost all of you have seen the kind of commercials that I'm going to talk about.  I spent some time on Google pulling some of them up, and it seems that almost every company has one (or two).  I could have spent plenty more time, but here are a few:



People who switched to Allstate saved an average of $348 per year.

Drivers who switched to Allstate saved an average of $396 a year. $473 if they dumped GEICO.

21st Century's customers saved an average of $474 a year by dumping their current carrier.

Drivers who switched to Progressive saved an average of $550. 

15 minutes could save you 15% or more on car insurance (GEICO)


I'm betting that you as a reader might have one of two predominant thoughts.  The first would be the thought that for this to hold true some of the companies must be lying.  The second would be the thought that these companies know how to pick their wording well.

The key wording here is that people who are saving (or can save) are those people who switched.  Well, of course you'd only switch if you were going to save, you might say.  Exactly.  This is another pretty nice example of commercials that are really talking to a small segment of the audience while making it sound like they're talking to everyone.

Let's walk through it, shall we?

I wanted to put together some data to illustrate some of what's happening here, and figured that a good way to do it was to come up with some random variables that give a potential picture of what people might be paying (or could be paying) across a number of car insurance companies.

I created a random variable, then created some random variables that correlate at a decent level with the first (~.70).  By virtue of the nature of these correlations these other variables also correlate a bit with each other.  The lowest correlation among any of these variables is around .45.

We could just use these random variables to illustrate our point, but we can also make things a bit more concrete by finding some actual numbers.  Numbers seem to be a bit tricky to find on the average annual cost of car insurance, and finding something like standard deviations on that average is that much more unlikely.

The broad average I've been able to find for annual car insurance costs is somewhere right around $1000, which is a reasonable place to start.  Standard deviations might be more important if we were looking to replicate the exact amounts that people are saving, but for these purposes I'm going to just use a SD of $100 to keep things pretty straightforward.

Using all this information it's easy enough to create a matrix of people and the insurance quotes they'd likely get at a number of different companies.  These are simply transformations of the random numbers that were generated.  I've used the base numbers as the 'middle of the road' company, which is closest to the actual mean of $1000.  Two companies are a bit cheaper (around $950 average), and two are a bit more expensive (around $1050 average).
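
Here's a rough sketch of how a matrix like this could be built.  It isn't the exact procedure used for the numbers in this post: the seed, the company labels, and the mixing formula (base variable plus independent noise) are all assumptions picked to hit the ~.70 correlation, the $100 SD, and the $950/$1000/$1050 means described above.

```python
import math
import random

random.seed(2013)  # arbitrary seed, only so the sketch is repeatable

N_PEOPLE = 100
R = 0.70   # target correlation between the base variable and each other one
SD = 100   # assumed standard deviation of quotes, in dollars
MEANS = {
    "cheap 1": 950, "cheap 2": 950,
    "middle": 1000,
    "expensive 1": 1050, "expensive 2": 1050,
}

def make_quotes(n=N_PEOPLE):
    """Return one dict of company -> annual quote per simulated person."""
    people = []
    for _ in range(n):
        base = random.gauss(0, 1)  # the 'middle of the road' variable
        person = {}
        for company, mean in MEANS.items():
            if company == "middle":
                z = base
            else:
                # Mix the base with fresh noise so corr(z, base) is about R.
                z = R * base + math.sqrt(1 - R * R) * random.gauss(0, 1)
            person[company] = mean + SD * z
        people.append(person)
    return people

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

quotes = make_quotes()
middle = [p["middle"] for p in quotes]
cheap1 = [p["cheap 1"] for p in quotes]
print(round(pearson(middle, cheap1), 2))
```

With this construction any two of the non-base variables correlate at about R squared (.49) in expectation, which is in the same neighborhood as the ~.45 minimum mentioned above.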

Again, we can argue all day about how accurate these numbers are, but you can also translate things to a different level by simply scaling all these numbers a little differently.  The intent is to illustrate a general concept, not to replicate the actual situation.  There are also more than five car insurance companies out there, so this by no means would cover the entire market. 

I've created 100 cases to work with, and each of those cases represents a person who can select from one of five car insurance companies.  If we look at the overall average of what things would be like if people were randomly assigned to a car insurance company the average cost of car insurance for this group (not surprisingly) is right around $1000.  I've heard there's certainly some money to be made by switching companies, though?

It's easy enough to examine - what's the average difference in cost between the company you're currently assigned to and each of the others?  Well, averaged across all companies it should be near zero, but if we look at each individual company we should see a pretty clear pattern.

Switching away from the two cheaper companies will - on average - cost you around $15.  Switching away from the more expensive companies will actually save on average around $50, and switching away from the middle of the road company for a random sample of those people will also save a little money (around $20).  Such is the nature of random noise and low sample size.

Looking at the reverse actually gives us a picture of how much customers can potentially save by switching to that company.  In this case there is a small benefit to switching to one of the two cheaper companies, but that's it.  On average the savings is right around $15.  Let's see if we can't make that number a little larger.

As was pointed out earlier there's no reason to switch to a company that's going to charge you more money (assuming that coverage stays constant).  If we look at this first cheap company there's some people who will save money by switching and some people who won't.  If you go back to the lines from commercials above you might now - if you haven't already - be picking up the language that lets us start to work these numbers.

$13 is the average savings for anyone to switch to the first cheap company.  But why switch if you're not going to save money?  There are plenty of people for whom this isn't the cheapest company.  If we look at just the people who have a reason to switch (i.e. those who would save money by doing so), we come up with a much different number.  Now we're talking about a savings of $119 - over $100 more.

Now that's something you can put in a commercial.

The reason is that all the people who wouldn't save money (in this metric people who would save negative dollars) are being removed from the calculation.  These sorts of numbers do little to give us an ordering or magnitude of how cheap or expensive a company is, but rather how much noise there is in the market.

I'm sure we can do better, though.  There are plenty of small values in there - $2, $0.74, etc.  If you wanted the numbers to look a little better you might even tell your sales staff to discourage individuals from switching if they weren't going to save much money at all - it might not be worth the hassle.  If we cut out the people who would save less than $10 annually we can move that average savings up to $129.  Not too shabby.  

This should only hold up for the cheapest companies, though, right?  Nope, the same should be true for the expensive ones (in a reasonable market).  There will be fewer people who save money by switching, but taking the average of those who have a reason to switch will always produce a savings (unless you're really doing something wrong/right).  The savings for those who switch to expensive company 1?  Right around $66.  We can make the same <$10 cut here and raise that number to an average savings of $75 for those who switched.

That's not the only trick, either.  If that $75 doesn't seem impressive enough we could also look at the 'up to' sorts of numbers.  It's rare, but a few people can actually save over $300 a year by switching to expensive company 1.  From this data I could make the statement that 'customers who switched to expensive company 1 can save up to $348 a year on their car insurance'.  Run the percent on that and you're looking at something even harder for the average person to wrap their head around.  
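
The three tricks so far (averaging only over people who save, dropping the trivial savers, and quoting the maximum) can be sketched in a few lines.  This is an illustration on freshly simulated data, so the dollar amounts won't match the ones above; the seed, the random assignment of current companies, and the function names are all assumptions.

```python
import math
import random

random.seed(348)  # arbitrary seed for repeatability

MEANS = {"cheap 1": 950, "cheap 2": 950, "middle": 1000,
         "expensive 1": 1050, "expensive 2": 1050}

def simulate(n=100, r=0.70, sd=100):
    """Return (quotes, current): per-person quotes and a random current company."""
    quotes, current = [], []
    for _ in range(n):
        base = random.gauss(0, 1)
        person = {}
        for name, mean in MEANS.items():
            z = base if name == "middle" else (
                r * base + math.sqrt(1 - r * r) * random.gauss(0, 1))
            person[name] = mean + sd * z
        quotes.append(person)
        current.append(random.choice(list(MEANS)))
    return quotes, current

def avg(values):
    return sum(values) / len(values) if values else float("nan")

def savings_summary(quotes, current, target, cutoff=10):
    """Summarise what non-customers would save by switching to `target`."""
    savings = [q[cur] - q[target]
               for q, cur in zip(quotes, current) if cur != target]
    savers = [s for s in savings if s > 0]            # people with a reason to switch
    big_savers = [s for s in savers if s >= cutoff]   # drop the trivial savers
    return {
        "everyone": avg(savings),
        "only those who save": avg(savers),
        "savers above cutoff": avg(big_savers),
        "up to": max(savings),
    }

quotes, current = simulate()
for company in ("cheap 1", "expensive 1"):
    summary = savings_summary(quotes, current, company)
    print(company, {k: f"${v:.0f}" for k, v in summary.items()})
```

Each filter can only push the advertised number up: the mean of the positive values is never below the mean of all of them, and the maximum is never below any mean.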


Before we go, there's another way we can look at this.  We have five companies, and without assigning customers to any of them we can simply compare the numbers and see what percent of the time each of these companies actually has the lowest rate of all five companies.  Here's how that breaks down:

Cheap company 1 = 30%
Cheap company 2 = 33%
Middle of the road company =  37%
Expensive company 1 = 0%
Expensive company 2 = 0%

Certainly interesting.  The easy question from this set of information is how the middle of the road company is able to provide 37% of people with the lowest rate while still having a higher average price overall.  Well, as all of this was derived randomly it does turn out that the middle of the road company has a slightly higher standard deviation than the cheaper companies.  Also, the difference in means is not very large, so it doesn't take too much to undercut the cheaper companies.  They end up making more money by - I'm sure some of you already have this figured out - charging a different segment of the population more than their average.
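
A small sketch of that "who is cheapest for whom" count, again on simulated data rather than the data behind the percentages above.  The wider standard deviation for the middle company (120 instead of 100) is a made-up value, there only to show how one might poke at the effect of extra spread; any single run of 100 people is noisy.

```python
import math
import random
from collections import Counter

random.seed(37)  # arbitrary seed for repeatability

MEANS = {"cheap 1": 950, "cheap 2": 950, "middle": 1000,
         "expensive 1": 1050, "expensive 2": 1050}

def cheapest_counts(sds, n=100, r=0.70):
    """Count how many of n simulated people get their lowest quote from each company."""
    counts = Counter({name: 0 for name in MEANS})
    for _ in range(n):
        base = random.gauss(0, 1)
        person = {}
        for name, mean in MEANS.items():
            z = base if name == "middle" else (
                r * base + math.sqrt(1 - r * r) * random.gauss(0, 1))
            person[name] = mean + sds[name] * z
        counts[min(person, key=person.get)] += 1
    return counts

equal_sd = {name: 100 for name in MEANS}
wider_middle = dict(equal_sd, middle=120)  # hypothetical extra spread

equal_counts = cheapest_counts(equal_sd)
wider_counts = cheapest_counts(wider_middle)
print("equal SDs:   ", dict(equal_counts))
print("wider middle:", dict(wider_counts))
```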

It's actually quite interesting in and of itself.  A market such as this - with fixed but correlated rates - would eventually settle out (over some period of time) such that everyone ended up with the insurance company that was the best for them.  The market does not have fixed rates, however, and those expensive companies need to find some way to stop the slow flux of customers away from them to the cheaper companies.  Left alone, they would eventually stabilize to zero market share.

They can do this by strategically cutting or raising rates on certain segments of the population.

While there's no customers in this group that find expensive company 1 or expensive company 2 to be the cheapest place to go for insurance, it is occasionally close.

If we look at the 10 people whom expensive company 1 already finds the cheapest to insure (sorting expensive company 1's rates over all people), we find that on average these people would pay about $85 more than their lowest option.  Thus, to get these 10 people on board they'd have to lower those 10 rates by at least $85 each, just to pull them from a company that actually has a lower average rate.  Let's say they decide to toss $90 a person at this (and now tell their sales staff that $5 is a big deal).  That's still $900 just to get 10 customers, which doesn't seem that great.

Or does it?  We still have 90 other people, some of whom might already be customers of expensive company 1.  All you have to do is transfer this loss onto the bills of people you've already sold, and you're set.  It gets more expensive as you try to get customers that would be more and more of a risk to you (exemplified by higher rates), but that problem actually solves itself.  If you bring on people with lower rates, you're still eventually going to have to raise those same rates.  You'd need to do this to cover the new people you're bringing on with lower rates, or to cover the original deal you gave them.  Soaking each of those new customers with an extra $90 the second year would be a very easy way to make this all work, obviously.  Things will eventually get to the point that people are paying a lot more than they should.  At this point one of two things will happen.  They'll either stay with you, or leave.

If they stay, great!  Keep raising their rates and hope they don't notice.  You didn't get a reputation of expensive company 1 for nothing.  If they leave, even better!  You now have new potential customers to win back by leveraging current customers' costs into a means of undercutting other companies.  If you don't believe that this works for insurance, might I point you to how cable companies work?  It's actually a lot more transparent (yet still somehow effective) there.  Try calling your cable company and getting your rate lowered - you can usually make some pretty quick money and solidify the fact that you're being overcharged.    

This concept works so well at destabilizing equilibrium that I find it very hard to believe that insurance companies *don't* use it.  Flaunting it in commercials is merely tipping their hand.

Let's take a step back, though.  There are a lot of points here, and the main one that I think has the potential to get lost as I continue to expand on it is the trap of allowing others to define their own reference groups, and thus hide useful information.  Saying that customers who switched to your company saved money is a triviality.  By simple definitions this will be the case for all companies in even semi-competitive markets.

To bring it back to the start I made the point that some of you would assume that every company having one of these commercials must mean that (at least) some of them are lying.  You can see now that it's possible that none of them are lying, depending on how you define lying (it's clear they're all misleading).  These are exactly the sort of situations where people like to cite the old 'lies, damned lies, and statistics'.  Statistics don't lie to people, car insurance companies do.

Does this mean you shouldn't have car insurance, or that you should switch companies several times a week?  No, and not necessarily, respectively.  You should get rid of cable, though.

Wednesday, January 30, 2013

Ranking every possible super bowl matchup (and then some)

For those of you paying attention to sports in any way whatsoever you may have noticed that the super bowl is coming up this weekend.  It's pretty easy to find a wide array of articles and analysis about it, and a week or two ago I came across an article at Bleacher Report with the title:

Ranking Every Possible Super Bowl Matchup
(http://bleacherreport.com/articles/1483126-power-ranking-every-possible-super-bowl-matchup?hpt=hp_t3 )

I was excited by the title because I thought this was going to be a ranking of *EVERY* super bowl matchup between every team to figure out which team would actually be the strongest, and not just a simple rundown of what the situation was from this point onward.

Since that was a disappointment, I figured I'd just do it myself.  Right?

Well, it's easy enough (if not a touch tedious) to pull down the scores from every game of this season.  Luckily, the NFL plays a relatively small number of games so it's a fairly reasonable set of data.  At most a team will play another team twice, so we can produce a somewhat odd 32x64 partially filled matrix containing all the win information in one direction and the loss information in the other direction.

The important thing that this allows us to do is to calculate some means and standard deviations.  Specifically, we can check out the mean score of each team both from an offensive and defensive standpoint.  The offensive score is the score that team was able to produce, and a higher score should indicate a better offense.  The defensive score is the score that the team allowed the other team to produce, and a lower score should indicate a better defense.

Right off the start this gives us some good numbers to check out - what teams performed the best and worst throughout the season as well as how consistent given teams were.

The Patriots showed the best offense this year, coming in just over 34 points a game on average.   

The worst offense?  Sorry, Kansas City Chiefs fans.  Do I have any readers who are Kansas City Chiefs fans?  Sorry, your offense only produced a little over 13 points on average.

The best defense goes to the Seattle Seahawks, only right around 15 and a quarter points per game allowed on average, and the worst defense goes to the New Orleans Saints, allowing on average just over 28 and a quarter points per game.

If we compare the average points of every team against every other team we can get a feel for what their records would have been if a) every team played every other team once and b) every team had the same defense.  Obviously one of those is a bit more of a jump, but let's keep an open mind for the moment.  This is how things would work out:


Team (offense) Count wins Count losses
New England Patriots  31 0
Denver Broncos  30 1
New Orleans Saints  29 2
New York Giants  28 3
Washington Redskins  27 4
Green Bay Packers  26 5
Atlanta Falcons  25 6
Houston Texans  24 7
Seattle Seahawks  23 8
Cincinnati Bengals  22 9
Baltimore Ravens  21 10
San Francisco 49ers  20 11
Tampa Bay Buccaneers  19 12
Minnesota Vikings  18 13
Dallas Cowboys  17 14
Detroit Lions  16 15
Chicago Bears  15 16
Carolina Panthers  13 17
Indianapolis Colts  13 17
San Diego Chargers  12 19
Buffalo Bills  11 20
Pittsburgh Steelers  10 21
Tennessee Titans  9 22
Cleveland Browns  8 23
St. Louis Rams  7 24
Oakland Raiders  6 25
Miami Dolphins  5 26
New York Jets  4 27
Philadelphia Eagles  3 28
Jacksonville Jaguars  2 29
Arizona Cardinals  1 30
Kansas City Chiefs  0 31

Due to the way this works out through mean comparisons, this is actually a ranking of how every team would do in a super bowl against every other team.  The Patriots would beat anyone, the Broncos would beat everyone but the Patriots, etc.  
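
The counting behind that table is simple enough to sketch.  This version uses only the six per-game scoring averages that are quoted elsewhere in this post (the update table at the bottom plus the 49ers figure), so the records are out of 5 rather than 31; the function name is made up, and treating equal means as neither a win nor a loss is an assumption based on how the tied teams appear in the table.

```python
# Mean points scored per game, as quoted elsewhere in this post.
MEAN_POINTS = {
    "Atlanta Falcons": 26.1875,
    "Baltimore Ravens": 24.875,
    "San Francisco 49ers": 24.8125,
    "Carolina Panthers": 22.3125,
    "Buffalo Bills": 21.5,
    "Arizona Cardinals": 15.625,
}

def head_to_head(means):
    """Give each team a (wins, losses) record from pairwise mean comparisons."""
    records = {}
    for team, score in means.items():
        wins = sum(1 for other, s in means.items() if other != team and score > s)
        losses = sum(1 for other, s in means.items() if other != team and score < s)
        records[team] = (wins, losses)  # equal means count as neither
    return records

records = head_to_head(MEAN_POINTS)
for team, (wins, losses) in sorted(records.items(), key=lambda kv: -kv[1][0]):
    print(f"{team:22s} {wins} {losses}")
```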

We can find the probabilities (roughly) associated with this actually being the outcome by taking into account the stability of those means via their standard deviations.  A proxy for this that I'm calling good enough for our immediate purposes is the probability associated with t-tests between these individual means.  The product of these reversed probabilities (due to the fact that a win or loss is more probable when the p-value is small; e.g. .02 should actually be .98) gives us something we can put in a table.  YES I KNOW I'M KIND OF BUTCHERING THE POINT OF P-VALUES. 
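
To make the "reversed probabilities" step concrete, here's a tiny sketch.  The three p-values are invented for illustration (they are not from the actual t-tests), and the variable names are made up.

```python
import math

# Hypothetical p-values from t-tests of one team's mean against three opponents.
p_values = [0.02, 0.10, 0.45]

# 'Reverse' each p-value (.02 becomes .98), then multiply them together.
reversed_probs = [1 - p for p in p_values]
sequence_probability = math.prod(reversed_probs)

print(round(sequence_probability, 4))  # 0.4851
```

Notice that the one close matchup (p = .45) nearly halves the product on its own, which is why a team with a lot of near-ties on its schedule ends up with a tiny number.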

Some of these numbers are actually reasonably finite, and we can add to the above table as such:


Team (offense) Count wins Count losses Probability of occurrence
New England Patriots  31 0 0.47089515
Denver Broncos  30 1 0.019777304
New Orleans Saints  29 2 0.000572806
New York Giants  28 3 6.88355E-09
Washington Redskins  27 4 1.22143E-07
Green Bay Packers  26 5 2.99026E-08
Atlanta Falcons  25 6 7.57896E-08
Houston Texans  24 7 8.88084E-10
Seattle Seahawks  23 8 4.09893E-11
Cincinnati Bengals  22 9 1.49184E-09
Baltimore Ravens  21 10 5.36586E-12
San Francisco 49ers  20 11 1.34842E-11
Tampa Bay Buccaneers  19 12 1.50215E-10
Minnesota Vikings  18 13 1.74783E-09
Dallas Cowboys  17 14 4.93673E-10
Detroit Lions  16 15 7.0116E-10
Chicago Bears  15 16 4.6336E-12
Carolina Panthers  13 17 0
Indianapolis Colts  13 17 0
San Diego Chargers  12 19 3.8835E-10
Buffalo Bills  11 20 3.40025E-09
Pittsburgh Steelers  10 21 6.20755E-07
Tennessee Titans  9 22 1.45213E-08
Cleveland Browns  8 23 1.73087E-06
St. Louis Rams  7 24 1.08623E-06
Oakland Raiders  6 25 2.8135E-07
Miami Dolphins  5 26 1.65687E-07
New York Jets  4 27 2.73281E-08
Philadelphia Eagles  3 28 1.04094E-06
Jacksonville Jaguars  2 29 0.000267479
Arizona Cardinals  1 30 0.000383025
Kansas City Chiefs  0 31 0.137665451


You can see that things sort of follow an upside down bell curve (let's call it a valley curve) - the most probable outcomes are those at the ends, while those in the middle have a bit more noise in them.  More of those middle games are likely to be close enough to drop the cumulative associated probabilities.

What we should keep in mind is that there are a lot of potential outcomes here.  There aren't just 31 (31-0 down to 0-31), but every possible combination of individual wins/losses that would get you to that point.  There's only one way to get 31-0 or 0-31, but there are 31 ways to go 30-1 or 1-30 (you could win or lose to any given team, and each of those has a probability associated with it).  If you'd like to kill a bit more time before you get back to work you can start working out the number of ways you can get to each potential outcome.  It also explains at least a little bit of the valley curve that we have going. 
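
If you'd rather let the computer kill that time for you, the count for each record is just a binomial coefficient.  A minimal sketch, assuming 31 games with no ties:

```python
import math

GAMES = 31  # one game against each of the other 31 teams

# Number of distinct win/loss sequences that end with a given number of losses.
ways = {losses: math.comb(GAMES, losses) for losses in range(GAMES + 1)}

print(ways[0], ways[1], ways[15])
print(sum(ways.values()))  # every possible sequence, i.e. 2 ** 31
```

The middle records can be reached in hundreds of millions of ways, so the single most likely sequence is a much smaller slice of the total there than it is at 31-0 or 0-31.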

Yes, the clever among you might have just realized that this table is excluding some potentially important information.  This probability isn't the cumulative probability of all situations that would produce a given outcome, but rather the probability associated with the most likely sequence that would produce that outcome.

For example, the number listed for the Broncos going 30-1 is actually the probability of the Broncos going 30-1 while losing to the Patriots.  There's another probability that they'd go 30-1 while losing to the Giants, or the Saints, or even the Chiefs (the probability of them just losing to the Chiefs *at all* in this metric is 4.57E-07; fairly unlikely).

There's also a strange coincidence here that you might notice - the Panthers and Colts actually produced the same mean score throughout the season.  There's an interesting discussion to be had about how the way points are earned (in chunks) allows this, but it's for another day.  We'll see it happen a few more times when we get to defense.

Overall these probabilities don't really instill a lot of confidence (except for the Chiefs - sorry again Chiefs fans).  We have to keep in mind that this is simply offense, and doesn't consider how difficult the defenses each team faced might have been.  Now that we've seen how this works we can also produce the same table based on the idea that a) every team plays every other team once and b) every team has the same offense.

Such a situation would mean that a team's defense was the only way to stand out, and we can produce the same table based on how things would play out from there:


Team (defense) Count wins Count losses Probability of Occurrence
Seattle Seahawks  31 0 0.033645703
San Francisco 49ers  30 1 5.72087E-06
Chicago Bears  29 2 3.03016E-05
Atlanta Falcons  28 3 7.53205E-07
Houston Texans  27 4 0
Miami Dolphins  26 5 5.97182E-10
Denver Broncos  25 6 4.65242E-06
Cincinnati Bengals  24 7 1.68938E-11
Pittsburgh Steelers  23 8 5.27437E-11
New England Patriots  22 9 0
Green Bay Packers  21 10 3.75177E-13
St. Louis Rams  20 11 1.21268E-13
Baltimore Ravens  19 12 0
Arizona Cardinals  18 13 5.77952E-13
Minnesota Vikings  17 14 1.17577E-13
Cleveland Browns  16 15 5.5051E-11
Carolina Panthers  15 16 2.73832E-11
San Diego Chargers  14 17 1.6091E-12
New York Giants  13 18 0
New York Jets  12 19 1.07632E-10
Indianapolis Colts  11 20 9.97422E-12
Washington Redskins  10 21 2.72878E-09
Tampa Bay Buccaneers  9 22 6.3485E-09
Dallas Cowboys  8 23 3.63465E-07
Kansas City Chiefs  7 24 9.21125E-08
Buffalo Bills  6 25 7.97968E-11
Tennessee Titans  5 26 2.93401E-05
Philadelphia Eagles  4 27 0
Jacksonville Jaguars  3 28 0
Detroit Lions  2 29 8.82456E-09
Oakland Raiders  1 30 5.65719E-11
New Orleans Saints  0 31 1.93842E-07


The same things about the other charts apply to this one, though it also gives us a picture of how strong different teams' defense was.  Unfortunately, this is also confounded with the fact that different defenses played different offenses.  We could simply look back at offenses, but those were already confounded by the fact that different offenses played different defenses.  You can see we're in a bit of a loop here.

While we're trying to think our way out of that one we can kill some time by taking a look at the average quality of defense that different teams faced throughout the season.  We can do this by averaging - for each team - the average points allowed by their specific list of opponents.   The more points that your list of opponents allowed, the easier it is to score points against them.
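
As a sketch of that averaging step: the points-allowed figures below are the ones from the update table at the bottom of this post, but the schedules are made-up partial lists for illustration only (a real one would have all 16 games), and the function name is hypothetical.

```python
# Mean points allowed per game, from the update table at the end of this post.
POINTS_ALLOWED = {
    "Arizona Cardinals": 22.3125,
    "Baltimore Ravens": 21.5,
    "Buffalo Bills": 27.1875,
    "Carolina Panthers": 22.6875,
}

# Made-up partial schedules; an opponent faced twice is listed twice.
SCHEDULE = {
    "Atlanta Falcons": ["Carolina Panthers", "Carolina Panthers", "Arizona Cardinals"],
    "Hypothetical Team": ["Buffalo Bills", "Baltimore Ravens"],
}

def opponent_defense(schedule, allowed):
    """Average the points allowed by each team's list of opponents."""
    return {team: sum(allowed[opp] for opp in opponents) / len(opponents)
            for team, opponents in schedule.items()}

result = opponent_defense(SCHEDULE, POINTS_ALLOWED)
for team, value in result.items():
    print(f"{team:18s} {value:.4f}")
```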


Team Opponent Defense
Atlanta Falcons  24.5025641
Pittsburgh Steelers  23.89198718
Cleveland Browns  23.5650641
Tampa Bay Buccaneers  23.50737179
Cincinnati Bengals  23.5025641
Indianapolis Colts  23.46634615
San Diego Chargers  23.40865385
Philadelphia Eagles  23.07948718
Jacksonville Jaguars  23.04807692
Houston Texans  22.96634615
Kansas City Chiefs  22.93269231
Baltimore Ravens  22.91121795
Carolina Panthers  22.88717949
Miami Dolphins  22.88461538
New Orleans Saints  22.86794872
Denver Broncos  22.81730769
Washington Redskins  22.81025641
Chicago Bears  22.76217949
New York Giants  22.67083333
Green Bay Packers  22.60576923
Oakland Raiders  22.55288462
Minnesota Vikings  22.53846154
Tennessee Titans  22.49038462
Buffalo Bills  22.46153846
New England Patriots  22.19230769
New York Jets  22.17788462
Seattle Seahawks  22.07467949
San Francisco 49ers  22.06730769
Dallas Cowboys  21.94230769
Detroit Lions  21.85576923
St. Louis Rams  21.62980769
Arizona Cardinals  21.39903846


Turns out things are actually pretty close when it gets to this level.  The Falcons faced the easiest defenses, with their average opponent allowing 24 and a half points.  The Cardinals - perhaps not enough to account for their fairly weak season - faced the most difficult defenses.

We can look at the same concept in terms of how well defenses performed against their opponents' offenses:


Team Opponent Offense
Arizona Cardinals  23.18269231
Atlanta Falcons  22.5599359
Baltimore Ravens  23.43974359
Buffalo Bills  20.9375
Carolina Panthers  23.58397436
Chicago Bears  22.27403846
Cincinnati Bengals  21.35801282
Cleveland Browns  22.59358974
Dallas Cowboys  23.9974359
Denver Broncos  23.49519231
Detroit Lions  22.05769231
Green Bay Packers  22.73301282
Houston Texans  23.25
Indianapolis Colts  21.77403846
Jacksonville Jaguars  23.12019231
Kansas City Chiefs  23.48557692
Miami Dolphins  22.0625
Minnesota Vikings  22.62980769
New England Patriots  21.67307692
New Orleans Saints  23.35801282
New York Giants  23.96634615
New York Jets  22.07211538
Oakland Raiders  22.34615385
Philadelphia Eagles  23.73301282
Pittsburgh Steelers  21.9974359
San Diego Chargers  22.38461538
San Francisco 49ers  23.44455128
Seattle Seahawks  22.56730769
St. Louis Rams  23.55288462
Tampa Bay Buccaneers  22.97339744
Tennessee Titans  22.73557692
Washington Redskins  23.25224359

At this level of aggregation we again seem to be washing out all useful variance.  

Overall, I'm not sure there's really enough variance here to warrant the meaningful inclusion of it unless things are really pretty close.

Speaking of close, we should at some point probably try to figure out who is going to win the *actual* super bowl.  One last combination before we get to that.  We might be able to get a little more out of offense and defense if we look at them in combination.  We can do this by combining the win/loss records for each team to produce a table like this:


Team (overall) Wins overall Losses overall
Denver Broncos  55 7
Seattle Seahawks  54 8
Atlanta Falcons  53 9
New England Patriots  53 9
Houston Texans  51 11
San Francisco 49ers  50 12
Green Bay Packers  47 15
Cincinnati Bengals  46 16
Chicago Bears  44 18
New York Giants  41 21
Baltimore Ravens  40 22
Washington Redskins  37 25
Minnesota Vikings  35 27
Pittsburgh Steelers  33 29
Miami Dolphins  31 31
New Orleans Saints  29 33
Carolina Panthers  28 33
Tampa Bay Buccaneers  28 34
St. Louis Rams  27 35
San Diego Chargers  26 36
Dallas Cowboys  25 37
Cleveland Browns  24 38
Indianapolis Colts  24 37
Arizona Cardinals  19 43
Detroit Lions  18 44
Buffalo Bills  17 45
New York Jets  16 46
Tennessee Titans  14 48
Kansas City Chiefs  7 55
Oakland Raiders  7 55
Philadelphia Eagles  7 55
Jacksonville Jaguars  5 57
      
Looks like that helps to put a bit more spread on things, though our apparent best teams aren't the ones in the super bowl.  Not shocking, as randomness can really play havoc with things when you play so few games and leave playoffs and finals up to single elimination matches.  While I'd be a bit more excited to watch a super bowl between the Broncos and the Seahawks (or the Bears and the Jaguars), that's not what we have this year.  
  
The Ravens and 49ers - going back to the earlier table - put up the 11th and 12th best offenses on average.  They're actually pretty close on that metric - the Ravens averaged 24.875 points per game, while the 49ers averaged 24.8125 points per game.  Given that their pooled standard deviation on those means is 11.70 points there's very little reason to believe that one of these teams has a substantially (or statistically) better offense.

The 49ers scored less than a tenth of a point less than the Ravens on average, though they also faced slightly more difficult opponents.  Their opponents allowed 22.0673 points on average, while the Ravens' opponents allowed 22.9112 points on average.  While this might allow us to tip things *a little* more in favor of the 49ers I'd still be hesitant to say that anything was even close to a sure thing.  I've thought about it a while and don't know if I have any meaningful way to combine points earned and points allowed by specific opponents.  

Let's take a look at defenses - the 49ers did hold up to some of the early promise of a good defense by coming up as the 2nd best, allowing only right around 17 points on average.  The Ravens were somewhat in the middle of the pack, coming up as 13th best defense with right around 21 and a half points on average.

Remember where we got caught in a loop a while ago?  One of the problems was that we had offense and defense to worry about, though at least for this pairing it seems the offenses are pretty close.  The small point difference is also offset by the difference in opponents.

If defense is where the difference is it's hardly enough to be impressed by - the difference in defensive strength is about 4.4 points, while the pooled standard deviation is just under 11 points.
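
One way to put the offensive and defensive gaps on the same footing is to express each in pooled-standard-deviation units.  This is a rough sketch, not a formal test: the means are the ones quoted in this post, and the defensive pooled SD is rounded to 11.0 because the post only gives it as "just under 11".

```python
offense = {"49ers": 24.8125, "Ravens": 24.875}   # mean points scored per game
defense = {"49ers": 17.0625, "Ravens": 21.5}     # mean points allowed per game

POOLED_SD_OFFENSE = 11.70
POOLED_SD_DEFENSE = 11.0   # assumption: stated only as "just under 11"

offense_gap = offense["Ravens"] - offense["49ers"]   # Ravens score slightly more
defense_gap = defense["Ravens"] - defense["49ers"]   # Ravens allow more

print(f"offense: {offense_gap:.4f} points, {offense_gap / POOLED_SD_OFFENSE:.3f} SDs")
print(f"defense: {defense_gap:.4f} points, {defense_gap / POOLED_SD_DEFENSE:.3f} SDs")
```

The offensive gap is a tiny fraction of a standard deviation; the defensive gap is larger but still well under half of one.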

The slight advantage held by the 49ers is also shown in that last table, as they show up as 6th overall while the Ravens come in at 11th.  Even this spread isn't huge, as it's partly due to the fact that a lot of teams are actually incredibly close in terms of mean points scored or allowed.  Forcing things into wins/losses allows for sorting, but carries a lot of error in these close match-ups that could have gone either way.  Let us keep in mind that the teams that are coming up on the top of our charts didn't have perfect seasons, but the games they lost they may have lost by very slim margins. 

All in all I was hoping that one of these teams would have meaningfully distinguished themselves on something, but it seems that these two teams in the super bowl really are pretty close - at least by the numbers.  If pushed it would seem that the 49ers have a slight edge, but what that relates to in terms of a point spread is pretty tricky.  If the 49ers are able to put up a defense that's able to stop 4.5 more points than the Ravens, and both play basically the same offense (with perhaps a slight advantage to the 49ers), then we're talking about less than a one possession spread.  Four to five points is right in that range of being just covered by a touchdown but not covered by a field goal.

If I had to make some guesses, then, the best things to work from are the scores we've seen so far - offensively 24.8125 vs 24.875 points per game, defensively 17.0625 vs 21.5 (49ers and Ravens, respectively).  Opponents of each team also gave up 22.0673 vs 22.9112 points on average, and scored 23.4446 vs 23.4397 on average.

So, if team A usually scores x points and team B usually holds its opponents to y points, the best estimate of team A's score is some weighted average of x and y, with the weights set by the relative importance of offense vs defense.  Given no reason to assume anything else I'm just going to call it an even split and take a plain average.  What that would mean is that the most likely score of this Super Bowl (still probably pretty unlikely) would be 23.15625 to 20.96875, 49ers.  Okay, so that score is not just unlikely, but impossible.  Silly imprecise sports.
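If you want to check my arithmetic, here's a small sketch of that calculation.  Each team's predicted score is the average of what it scores per game and what the other team allows per game.  The function name and the offense_weight parameter are just illustrative; 0.5 is the even split I assumed above, and you could plug in something else if you think offense or defense matters more.

```python
def blend(offense_ppg, opp_defense_ppg, offense_weight=0.5):
    """Weighted average of a team's points scored per game and the
    opponent's points allowed per game (0.5 = even split)."""
    return offense_weight * offense_ppg + (1 - offense_weight) * opp_defense_ppg


# Per-game averages quoted in the post.
niners_off, niners_def = 24.8125, 17.0625
ravens_off, ravens_def = 24.875, 21.5

niners_pred = blend(niners_off, ravens_def)  # 49ers offense vs Ravens defense
ravens_pred = blend(ravens_off, niners_def)  # Ravens offense vs 49ers defense

print(niners_pred, ravens_pred)  # 23.15625 20.96875
```

Note that this version ignores strength of opponents entirely; it only uses the raw scored and allowed averages.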

You might be going straight to the comments to point out that you can't earn 0.00005 points in a game of *normal* football.  More important is the fact that even the rounded score of 23-21 might not be the most common.  If we head back over to my favorite historical archive of all football scores ever ( http://www.pro-football-reference.com/boxscores/game_scores.cgi ) we can see that there have only ever been 46 games with an outcome of 23-21.  Given the amount of error we're playing with here I'm willing to take this prior information into account to some degree, especially given the fact that the *very* similar score of 23-20 is over three times as likely as 23-21.

In all, my best guess would be that the scores are pretty close, and both somewhere in the low 20s.  The 49ers seem to have a slight edge, but it's football and they only get to play one game.  Repeat this Super Bowl 100 times and then we can talk.


More than anything, it seems that this super bowl might actually be a close game.  I say that's always what I want, sooooooo I guess I'd better watch it.

Maybe I'll record it so I can get rid of the stupid commercials. (<- flame baiting)



 >>>>Update:
Here are the raw numbers for points scored and points allowed as requested in the comments.


Team                  Points scored/game  Points allowed/game
Arizona Cardinals     15.625              22.3125
Atlanta Falcons       26.1875             18.6875
Baltimore Ravens      24.875              21.5
Buffalo Bills         21.5                27.1875
Carolina Panthers     22.3125             22.6875
Chicago Bears         22.5625             17.3125
Cincinnati Bengals    25.0625             20
Cleveland Browns      18.875              23
Dallas Cowboys        23.5                25.53333333
Denver Broncos        30.0625             18.0625
Detroit Lions         23.25               27.3125
Green Bay Packers     27.0625             21
Houston Texans        26                  20.6875
Indianapolis Colts    22.3125             24.1875
Jacksonville Jaguars  15.9375             27.75
Kansas City Chiefs    13.1875             26.5625
Miami Dolphins        18                  19.8125
Minnesota Vikings     23.6875             20.875
New England Patriots  34.8125             20.6875
New Orleans Saints    28.8125             28.375
New York Giants       27.46666667         21.5
New York Jets         17.5625             23.4375
Oakland Raiders       18.125              27.6875
Philadelphia Eagles   17.5                27.75
Pittsburgh Steelers   21                  20.25
San Diego Chargers    21.875              21.875
San Francisco 49ers   24.8125             17.0625
Seattle Seahawks      25.75               15.3125
St. Louis Rams        18.6875             21.75
Tampa Bay Buccaneers  24.3125             24.625
Tennessee Titans      20.625              29.4375
Washington Redskins   27.25               24.25
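If you'd like to play with these numbers yourself, here's a rough sketch of one way to read rows like the ones above and sort teams by average point margin (scored minus allowed).  Only three rows are pasted in to keep it short, and the names RAW, parse_rows, and margin are mine.  Point margin is just one simple thing to look at; it's not the ranking method used earlier in the post.

```python
# Three rows copied from the table above; paste in the rest to use all 32.
RAW = """\
Baltimore Ravens  24.875 21.5
San Francisco 49ers  24.8125 17.0625
Seattle Seahawks  25.75 15.3125
"""


def parse_rows(text):
    """Map team name -> (points scored per game, points allowed per game)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # The last two whitespace-separated fields are the numbers;
        # everything before them is the team name.
        team, scored, allowed = line.rsplit(None, 2)
        table[team.strip()] = (float(scored), float(allowed))
    return table


table = parse_rows(RAW)
margin = {team: scored - allowed for team, (scored, allowed) in table.items()}

# Print teams from best to worst average margin.
for team in sorted(margin, key=margin.get, reverse=True):
    print(f"{team}: {margin[team]:+.4f}")
```

Because the parser splits from the right on any run of whitespace, it doesn't care how many spaces separate the columns.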