The Skeptical Statistician: June 2013

Wednesday, June 26, 2013

Coin spinning, flipping, standing OR How to bias 'randomness' (Part I)

A few weeks ago, Zane Lamprey broke the Guinness Book of World Records record for "Longest live audio broadcast streamed over the internet" with 25 hours of non-stop broadcasting. He also successfully funded his new show "Chug" through Kickstarter, for those of you who ever saw his prior shows.

During this 25 hours Zane and his co-hosts and guests talked about a lot of stuff. There was a limit of 5 seconds of continuous dead air, so they pretty much talked non-stop (especially Zane after learning about the 5 second rule). One of the things they talked about was the idea that while coin flips are basically fair, coin spins tend to be a bit more biased. Allegedly, it's even worse if you stand a coin on its edge and then bump the surface on which it's standing.

The general idea is that a coin flipped with enough velocity imparted into rotation is basically fair. A coin flipped by someone looking to bias it may be able to shift that fairness slightly, and someone looking to professionally bias it for a living may be able to eradicate that fairness altogether. Practically speaking, however, if you're trying to flip a coin in a fair way it's pretty easy to do so (e.g. get a good toss and spin on it, and don't pay attention to which side started up or down, or better yet have someone who hasn't seen this call it, etc).

Things happen differently when you stand a coin on its edge and then agitate the surface, or spin the coin instead of flipping it. In both cases it is alleged that you'll see a lot more 'tails' results, at least on Lincoln Memorial pennies.

The bottom line is that a coin flip or spin is simply a system of knowable physical properties. A physicist with enough gumption (these folks are mathematicians) could sit down and write a Lagrangian for the system to determine the outcome given starting conditions (e.g. position, initial velocity, initial height, etc). I'm by no means saying that it would be easy; I'm just saying that it could be done. [The easiest case to start on - for those of you who have already opened your lab notebooks - might be the standing on edge case where the force comes from one side.]

Without such a Lagrangian, people like to just come up with reasons for things they observe, and then apply them without worrying about pesky things like the scientific method.

From: http://www.makeitsolar.com/science-fair-information/01-the-scientific-method.htm

For instance, people who have examined this so far have come up with two main suggestions, the first far dominant over the second.

1) One side of the coin is heavier than the other, and so a spinning (or disturbed standing) coin will fall toward that side. Thus, the other side will come up more often.

2) The way coins are struck leaves the edges to one side smoother than the other, and so a spinning (or disturbed standing) coin will fall toward that side. Thus, the other side will come up more often.

Both would appear to make some sense, though warrant further examination.

Now, I'm also not saying that it's by any means practical to deduce any of these smaller effects by brute force statistics. The whole point - as will hopefully become apparent - is that by manipulating these effects we can potentially make them large enough to overpower any potential smaller confounds (like imperfect spinning, or the fact that the table I'm spinning on isn't a polished, frictionless surface). There is a balance to be struck here between the extremes of dismissing an experiment out of hand because it would be impossible (or impractical) to do it perfectly and not doing an experiment at all because you think you know how things work and don't want to take the time.

Let's start with some coin flipping.

Most of this research on pennies is a few years old, and as such focused on the old US penny design with the Lincoln Memorial on the 'tails' side. I was able to find one of those in excellent condition sitting around - the idea being that a penny that had some wear might a) collect 'gunk' on one side or the other, leading to an unequal weight distribution, b) receive uneven wear on the faces of the coin, leading to an unequal weight distribution, or c) receive wear on the edges, removing any bias in edges from striking.

Flipping was rather boring, and led to a fairly predictable outcome. Of 50 attempts, 24 came up tails and 26 came up heads. Is that biased?

Well, I could write a much longer post just about how many times you realistically have to flip a coin to tell if it's actually biased, but let's just quickly compare this result to something which we have reason to believe should be quite a bit less biased.

If you read the post a few weeks ago you know how to generate random numbers in a program like Excel or Google Docs. We can quickly generate 50 random numbers that way, which will fall between 0 and 1. Then we can check what proportion of those fall into the top or bottom half of that scale.

I would suggest you do this for yourself just to see how such a random distribution breaks down at this sample size. My first draw got me a distribution of 23 'bottom' and 27 'top'. It's perfect to illustrate my point, so I'll stop there - 24/26 split on coin toss seems fair enough to me for all practical purposes. I could flip it again until I have the same number for each side and then stop there - would that satisfy randomness any better?

No, that would be cheating.
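If you'd rather not do this by hand in a spreadsheet, here is a minimal sketch of the same exercise in Python. It draws 50 uniform random numbers, counts how many land in the bottom and top halves, and then repeats that experiment many times to see how often a split at least as lopsided as a given gap shows up. The function names, the seed, and the number of repeats are all my own choices for illustration.

```python
import random


def split_counts(n_draws=50, rng=random):
    """Draw n_draws uniform numbers in [0, 1) and count bottom vs top half."""
    draws = [rng.random() for _ in range(n_draws)]
    bottom = sum(1 for x in draws if x < 0.5)
    return bottom, n_draws - bottom


def share_at_least_as_lopsided(gap, n_draws=50, n_repeats=10_000, seed=2013):
    """Fraction of repeated experiments where |top - bottom| is >= gap."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_repeats):
        bottom, top = split_counts(n_draws, rng)
        if abs(top - bottom) >= gap:
            hits += 1
    return hits / n_repeats


if __name__ == "__main__":
    print(split_counts())                      # one experiment, like my first draw
    print(share_at_least_as_lopsided(gap=2))   # splits like 24/26 or worse
    print(share_at_least_as_lopsided(gap=4))   # splits like 23/27 or worse
```

The point of the repeats is just to get a feel for how ordinary a 24/26 or 23/27 split is at this sample size; it is not a substitute for a proper test.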

How does this same coin fare on the spin test, though? Well, it's a little worse. This time the split is 29 tails to 21 heads. I'm a little more impressed with this, for two reasons.

First off, it's a little more outside the range of what I'd be expecting. You might say, well, you just observed a 27/23 split on something you are holding up as random, so why is being two further off than that so impressive?

Well, second (off?), we have directionality in our hypothesis this time. The expectation is that tails will come up more, which is what is happening. I'm confirming something that has already been shown, rather than deriving something from sheer exploration. When flipping I wasn't expecting a bias in one way or the other, so being convinced of a bias in either direction should be more difficult. How much more difficult is an entirely different discussion.
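To put a rough number on the difference directionality makes, here is a small sketch that computes exact binomial tail probabilities for a fair coin. The one-sided version asks "how often would a fair coin give at least this many tails?", while the two-sided version asks "how often would it be at least this far from even in either direction?". The function names are mine, and treating each spin as an independent 50/50 event is an assumption of the sketch rather than something I've established.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def two_sided(k, n):
    """Chance a fair coin lands at least as far from n/2 as k, in either direction."""
    far = max(k, n - k)
    # A fair coin is symmetric, so doubling the upper tail covers both directions.
    return min(1.0, 2 * upper_tail(far, n))


if __name__ == "__main__":
    print(upper_tail(29, 50))  # directional: at least 29 tails in 50 spins
    print(two_sided(29, 50))   # exploratory: 29/21 or worse in either direction
```

With a directional hypothesis the one-sided number is the relevant one, and it is half the two-sided number for the same split. Neither is small enough on 50 spins to settle anything by itself.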

It should be said that these spins - at this point and continuing through the rest of this post - have all been done with the same direction of rotation relative to the ground. That direction is clockwise looking down from above, or a right-hand rule result of negative Z in a Cartesian coordinate system. There is much to be said about testing a counterclockwise or positive Z spin, but that's more than I want to realistically talk about today.

I wondered if this would hold up on other coins, so I found another shiny new penny - but this time of the new 'shield' penny variety. Without a hypothesis here I'm back to a bit of exploration, but found similar results of 29/21. Interestingly, though, this time in favor of heads instead of tails.

By the way, I have no reason to believe that this or any of my other coins is biased in flips (and frankly, I simply don't care), so I didn't check flips on this or any other coins. Looking back I'm wondering why I even did on the first penny.

Anyway, looking at one type of penny at a time seems prudent as a start, so I'm simply going to let the shield penny slide for now.

There's another part to this, though, and it's the whole standing up and bumping. The idea is that if you put a penny on its edge, then destabilize it, it's biased again toward tails.

Turns out it's not the easiest thing to stand a penny on its edge, but the results seemed so clear so quickly that I didn't have to do it for long. Of ten attempts, I ended up with 9 tails and only 1 heads.

Before you start speculating, I also tried to vary things as much as possible (you might say introduce as many confounds as possible) so as to check that it wasn't just the fact that my table was tilted in one direction or another. If one trial was in one area I tested the next with the coin 180 degrees from that, then 90, then 180 from the 90, then a whole different area of the table.

I also tried to bump the table with fairly uniform strikes from my fists some distance away from the coin and from random (and sometimes dual and competing) directions. Nothing really seemed to change the fact that this coin wanted to go tails up.

Intrigued, I reached the part where science comes into play.

Some of the brightest people I've ever worked with have always pushed the idea that unless you're able to manipulate an event, you don't actually understand it.

In this case, people claim to understand how this whole penny problem is working out. They postulate that it's one of the two above ideas, but I've been unable to find anyone who has actually sat down to manipulate either of them to actually strengthen (or weaken) the effect.

So, I sat down to try and strengthen (or weaken) the effect.

The first thing on my mind was how to see if weight played any role. There are two obvious ways to manipulate this - add weight to one side or remove weight from the other.

Not wanting to deface any coins at the moment, I decided to try to add weight to one side of the penny by simply adding some small cut squares of packing tape. I made sure that they didn't extend to the edges so that they wouldn't interfere with any spinning, and the result was a coin that from a distance didn't really look much different than normal (since packing tape is transparent).

I put the tape on the tails side to see if I couldn't negate the effect that we're seeing. If the heads side is in fact heavier, then adding weight to the tails side should (at some level of weight addition) eliminate that bias.

Ten trials with the coin stood on edge, and no great change in which way they fell. Instead of 9 tails and 1 heads, this time I found 8 tails and 2 heads. The difference is in the direction I was expecting, but it is by no means the game-changer that would eliminate that bias. Wondering if the weight was enough to do anything, I decided to try the same with spins.

Keep in mind, the last time I tried spinning this same penny resulted in the expected bias toward the tails side: 29 tails vs 21 heads. The effect wasn't as strong as the standing on side effect, but it was there.

Of 25 spins (I was getting a little lazy), the effect does seem to be reversing. By putting a small amount of tape on the tails side of the coin, it now appears to be biased a bit toward falling heads up - 16 heads to only 9 tails.

Putting the tape on the other side of the coin is also biased toward heads, though. It's a little less, 14 vs 11, but it might mean that the tape isn't really enough to do much, or there's something being canceled out here that's bringing things back to near-random.

A little bit of tape is one thing, but I found myself wanting to really knock this thing out of the park and fully bias a coin in one direction or another. Frankly, a penny is too small to add much weight to, but a quarter gives a lot more surface area to play around with.

To start I figured I'd check to see if an old (but good quality) eagle-backed US quarter would be biased in the spin test.

Emphatically yes, it would appear. In the first 10 spins only 1 came up heads. I tried another quarter, and it appeared to be similar. I could test more quarters and see if there's a consistent effect here, but what I'm really trying to do at this point is simply show that something I do can change the result. I don't care what the starting result is so much as I care that by some intervention it can be changed.

That intervention is (ostensibly) weight. A few pieces of tape on a penny is one thing, but for this test I wanted to really just overpower smaller effects. So, I used some tape to carefully tape a dime to one side of the quarter.

For reference, I used the old loop-the-tape-onto-itself-and-stick-it-underneath technique, like you might use to hang a poster on your wall. That way I didn't run the risk of taping too close to an edge and introducing other variables. The dime sat safely in the middle of the face of the quarter, so there was no worry that its edge might graze the table either - if the dime was touching the table it was already well on its way to (read: unavoidably) falling onto that side.

The weight of a dime would seem to be drastically greater than the weight differential of either side of the coin, and my only concern was the fact that I'd done something aerodynamically detrimental. Something to think about, though it turns out (later) that other things might actually be at play.

[By the way - as a quick aside - the addition of this weight was enough that the quarter was no longer willing to stand on its edge, thus making the standing and bumping aspect of this impossible.]

If the above weight arguments are working properly, then the idea would be that the head side of this quarter was heavier, and thus causing the quarter to fall with the tails side up. The first natural thing to try, then, was to place this dime on the tails side to see if that weight would pull that side down faster, leaving the heads side up.

And this is where things start to get weird.

10 spins.

10 tails side (with dime taped to it) up.

Oddly, then, it would seem that I'd solidified the effect that had already been occurring. There is simply no way that the weight differential was still in favor of the heads side of the quarter (if it ever had been).

There's a simple way to confirm that - move the dime to the heads side of the quarter and see if that will reverse the effect.

Well, yes. 10 spins, this time 8 heads (with dime up) and only 2 tails (with dime down).

I was in such doubt of my own results that I ran both cases again.

The dime on heads side was similar, with 7 heads (with dime up) and 3 tails (with dime down).

I kept spinning the dime-on-tails-side hoping that I'd eventually get a heads (dime side down) result. I have yet to reach that point. I spun 40 more times, just to get to a nice even 50 overall, and have yet to end with anything other than a tails (dime up) result.
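Fifty for fifty doesn't prove the dime-side-down result is impossible, so it's worth asking how large its per-spin probability could still be. A rough sketch: if each spin independently lands dime side down with probability p, then 50 straight dime-up results happen with probability (1 - p)^50, and solving (1 - p)^50 = 0.05 gives a one-sided 95% upper bound on p. The independence assumption and the function names are mine.

```python
def upper_bound_on_unseen_side(n_spins, confidence=0.95):
    """Largest per-spin probability of the unseen side that is still
    consistent (at the given confidence) with zero sightings in n_spins."""
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n_spins)


def chance_of_streak(n_spins, p_side=0.5):
    """Chance of n_spins identical results in a row if each has probability p_side."""
    return p_side ** n_spins


if __name__ == "__main__":
    print(upper_bound_on_unseen_side(50))  # close to the "rule of three" value, 3/50
    print(chance_of_streak(50))            # what a fair coin would have to pull off
```

So even after 50 spins, a dime-side-down rate of a few percent can't be ruled out, but anything resembling a fair coin can.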

It only took me a few spins on this second run to figure out what I believe might be going on in this case, though. It's not the fact that a heavier side will fall (as it might be if it was standing on edge), but rather that introducing more weight on one side of the coin shifts the center of mass.

Why is the center of mass of the two-coin system important? Well, it's potentially important on the one-coin system as well, but in the two-coin system it has a pretty profound impact on axial tilt even on a pretty hard spin. Systems rotate around their center of mass, so if that is pushed outside of one of the faces of the coin (or toward one of the faces, in a less extreme example), the axis will drift to keep that side inside the spin.

That is to say that even on a spin with a whole lot of energy the spinning coin fails to achieve a spin perfectly perpendicular to the surface on which it is spinning. It maintains an axial tilt that keeps the dime side 'internal', so to speak. As the coin slows, the axial tilt increases, as it's only the energy in the spin that is keeping it even close to a zero tilt system.
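To get a feel for how far the center of mass actually moves, here is a back-of-the-envelope sketch. It assumes the published US Mint specifications for clad coins (quarter: 5.670 g and 1.75 mm thick; dime: 2.268 g and 1.35 mm thick), treats each coin as a uniform disc, centers the dime on the quarter, and ignores the mass of the tape. The tape_gap_mm parameter is a hypothetical knob for the thickness of the tape loop; I didn't measure it.

```python
# Published US Mint specifications (assumed here, not measured by me).
QUARTER_MASS_G = 5.670
QUARTER_THICKNESS_MM = 1.75
DIME_MASS_G = 2.268
DIME_THICKNESS_MM = 1.35


def com_offset_mm(tape_gap_mm=0.0):
    """Distance the combined center of mass sits from the quarter's mid-plane,
    measured along the coin's axis toward the dime side."""
    quarter_center = 0.0
    dime_center = QUARTER_THICKNESS_MM / 2 + tape_gap_mm + DIME_THICKNESS_MM / 2
    total_mass = QUARTER_MASS_G + DIME_MASS_G
    return (QUARTER_MASS_G * quarter_center + DIME_MASS_G * dime_center) / total_mass


if __name__ == "__main__":
    print(com_offset_mm())           # offset with the dime flush against the face
    print(QUARTER_THICKNESS_MM / 2)  # where the quarter's face sits, for comparison
```

Under these assumptions the center of mass moves roughly halfway from the quarter's mid-plane to its dime-side face, which would make this the "toward one of the faces" case rather than the "outside the face" case.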

The short answer (and longer question) is that an unequal distribution of weight might cause differential effects in spinning vs standing coins due to the fact that axial forces can come into play during a spin but only gravitational forces will come into play (ideally) in a standing bump test.

All in all, from a numbers point there seems to be something here - though it's certainly not as simple as it might first appear.

What can we really get out of this so far?

Well, edging of the coin might not be a strong factor, at least not as strong as some of these weight changes.

The bump test and spin test seem to be producing effects of different sizes, or at least effects that are more robust to interference. More importantly the bump test might be driven by gravity and the spin test driven by shift in center of mass.

Placing a relatively large weight on one side of a quarter tends to favor that side ending up face up, though there are also some problems of shift in the center of mass and axial tilt. This might come into play in all coins, to a lesser degree.

Overall, though, I think I'm left with more questions than answers.

There may be reason to believe that at least some of the forces that might be working on a spinning coin are based on spin direction - I've kept that constant so far but might find drastically different things with a reversed spin. The only thing that should operate this way would be the Coriolis force, which seems like it may be one of the smaller effects operating in this system.

Taping a dime to a quarter is a quick proof of concept, but the additional protrusion that this introduces leaves me a little unhappy. I'd like to figure out a way to increase the weight distribution without changing the shape of the coin, but that would seem to involve some metalworking.

I didn't look specifically at edging yet, but it would seem that a quick brush with some sandpaper might be enough to give one edge or another a smoother...edge. The problem with this is - unlike some tape on the face of the coin - that such a technique is destructive to the object being tested. Unless I had two coins that I believed to be - for all intents and purposes - identical, I couldn't test the second edge sanded alone after I'd already sanded and tested the first one.

All in all I thought this was going to be fairly straightforward, but some of these odd results have me a bit intrigued. I've put a (Part I) on this because I want to spend some time thinking about this as well as running it past others - by all means if you have suggestions or thoughts post them in the comments. I'd love to figure out what's actually at play here.

Wednesday, June 19, 2013

Why the Stanley Cup Finals Don't Have Shootouts

It's Stanley Cup Finals time, and that means it is time to look at some hockey numbers.

(borrowed from http://www.printactivities.com/Mazes/Shape_Mazes/stanley-cup-hockey-maze.html in the spirit of fair use)

I always say that for most sports I don't care as much about win or lose as much as I care about just seeing a good game of [sport]. I think that holds fairly well for hockey, though in this particular Stanley Cup series I'm cheering pretty hard for my hometown team (the Blackhawks).

Nothing says good game of sport like playing it longer than normal, right? A blowout in either direction is usually pretty boring, and coming to the end of regulation in a tie usually means just the opposite - a fierce, well-matched game of sport has likely been played to that point.

If you like watching hockey, you've already received a sort of buy two-get one free deal on the first two games of the Stanley Cup Finals between the Blackhawks and the Bruins (and Bruins fans got an extra freebie on that third game). Two full periods - and some change from two other partial periods - were played outside of regulation in just the first two games.

If you watched those games, but don't really follow hockey a lot, you might have been scratching your head at some point during the second or third overtime. "When do they get to the point where it's like in the movies and guys just shoot on the goalie?"

Well, we're in playoffs now, so...never.

You see, the rules of overtime play are different in regular season and post-season play. In regular season hockey, overtime is a sudden-death (i.e. first team to score wins) five minute period, followed by a short break and then a shootout. In post-season play, you just keep playing sudden-death (but otherwise normal) 20 minute periods until someone scores.

As long as no one scores, the game will simply go on and on. If you're interested, the longest game of NHL hockey extended into the 6th overtime, and finished with a total of 176 minutes and 30 seconds of ice time. The game was in 1936 between Detroit and Montreal, and the goal scored 176 minutes into the game was the only goal of the game. If I could go back in time and watch any one game of hockey, that might just be it.

You may be asking "Yeah, but why make them play so much hockey? Why not just go to a shootout? If it's good enough for the regular season it's good enough for playoffs. Come on."

Well, truth be told, shootouts aren't really good enough for the regular season. It would appear that they're tolerated simply because they make great ends of movies (i.e. they're fun to watch).

I'd always been told that the reason shootouts aren't in the playoffs is because they are poor predictors of actual skill. Granted, they measure a particular level of skill (I couldn't go out there and score a shootout goal against...well, probably anyone), but they completely fail to differentiate skill levels within the range of skill being measured (professional hockey players).

Imagine if a basketball game tied after a few minutes of overtime simply came down to a free throw competition, or - more appropriately - a series of one-on-one layup attempts against a team's best defensive player.

Imagine if a baseball game's extra innings consisted of every fielder except the pitcher and catcher taking the bench, and every hit simply being an in-the-park home run (actually, that might be awesome).

Imagine if football's overtime was, well, exactly as they do it now. Come on, it's not like they want to play extra football.

Anyway, I've always just accepted the notion that this skepticism surrounding shootouts was true. Given the shortened season this year, though, I figured it would be easy enough to go into the team records and pull some data on who actually does win the shootouts in the regular season.

Given that teams play each other (duh), I was able to cover a large majority of the shootouts that took place this year by looking at three random teams in each division (there are five teams in each division, three divisions in each of two conferences).

This sample produced 83 regular season games which were decided by shootout.

I was also able to determine - based on their standings at the time of the game involving the shootout - which team was favored to win, and which was the underdog.

If shootouts are actually getting at the skill of the team, then we'd expect to see teams with better records more likely to win shootouts (because they've shown themselves to be better at winning games, which is the main established criterion of hockey skill).

I'm not sure I really have any way to continue to hold you in suspense other than this sentence, so here is this sentence that's really just designed to hold you in suspense for the time it takes you to read it.

Of those 83 games, 45 (54.2%) of them were won by the underdog. Only 38 (45.8%) were won by the team with the better record at the time.

Now, that's not too far from random chance (a 50-50 split), but random chance isn't what we're going for here. Random chance would be if the NHL simply ended ties in regulation with a coin flip. If we wanted to show that shootouts are useful they would need to display some bias toward the team with the better record.

Not only are we not seeing a bias toward the team with the better record, we're potentially seeing a bias against them.
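How seriously should "potentially" be taken with only 83 games? One rough way to check is a confidence interval on the favorite's shootout win rate. The sketch below uses a Wilson score interval, which is my choice of method, with a hard-coded z of 1.96 for an approximately 95% interval. It also assumes the 83 games are independent, which is a simplification since the same teams show up more than once.

```python
from math import sqrt


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win proportion."""
    p_hat = wins / games
    denom = 1 + z**2 / games
    center = (p_hat + z**2 / (2 * games)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    low, high = wilson_interval(wins=38, games=83)  # favorites won 38 of 83
    print(low, high)
```

The interval comfortably includes 50%, so this sample can't distinguish a shootout from a coin flip, and it can't establish a real bias against the favorite either.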

So next time you're watching a shootout in regular season play (or someone you're with complains about lack of shootouts in the playoffs), take a coin out and give it a flip beforehand. It might actually be a little bit more fair.

Wednesday, June 12, 2013

Distribution of birthdays and the availability heuristic

My birthday was this week, as were a number of my friends' birthdays. I've always held the belief that the week or two surrounding my birthday is heavily populated by other birthdays of people I know, more so than any other time of year. I know a few people that share my birthday, a few people that are the day before, a few that are the week after and before.

It can't be that my birthday week or two is special, though, right?

facebook is only good at a few things, and holding on to birthdays is one of them. If you go into the events sidebar and then click on some stuff that makes sense when you see it you can eventually get to a calendar view listing off all your friends' birthdays (or at least those that use facebook and have put their birthday on it, and that are telling the truth).

It's a pretty easy way to pull down some data on a large majority of my friends' birthdays. I have no reason to believe that birthday would have anything to do with friends' refusal to use facebook, so hopefully that data is missing at random.

It's a fun exercise, and I'd recommend you to do it if you're bored one day. You might also think that your time of the year is special, and - well - if you do I'm here to tell you that it's probably not.

Well, unless your birthday is Halloween:

October 31st seems to be the most popular birthday, by a decent margin. Seven of my facebook friends were born on Halloween (hi guys), with the next most popular date being June 7th (with five people - hi guys). All other dates have four or fewer birthdays (hi guys, sorry you're not popular).

June 7th is pretty close to my birthday, so maybe this is starting to look like I was right all along?

If we're looking at the days with the most birthdays, we can also take a look at those with the fewest. There are a lot of those - days where none of my facebook friends have a birthday. What's the longest stretch of birthday-free days? Well, there are two stretches of five birthday-free days.

Interestingly, they are right around the last place I would have expected them.

Well, that would seem to fly in the face of early June being anything particularly special.

There are still some other things we can look at, though. How about months? What's the most popular birthday month?

Oh, months have different numbers of days? Okay, here's the same thing but normalized.

Or we could just look at quarters.

Overall, it looks like the late summer months are somewhat weak, as August and September are the only months where the average birthdays per day drops below 1. Interestingly, May/June - despite some large gaps that we saw earlier - is still quite strong. Both these months are closing in on one and a half birthdays a day. This might actually start to explain some of what I've been picking up on through the years.

Perhaps the fact that June is still such a strong month even with so many days without birthdays is because the days that have birthdays (which happen to be around my birthday) have more birthdays than normal.

Well, the average birthdays per day across the whole year based on my friends' reported birthdays is right around 1.1 a day. Those 3s, 4s, and 5s in that above table are starting to look pretty good, right?

In all, it's kind of hard to say one way or another that any part of the year seems to be the most populated (though late summer does sort of seem to be kind of desolate). It is certainly clear that I have been at least a little biased by remembering the birthdays of individuals near mine (probably because I say at least once "hey your birthday is pretty close to mine!"). The spread of other birthdays does seem to be decently random, though Halloween still seems at least a little confusing. What's the deal with Halloween, seriously?
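One way to ask how confusing Halloween really is: scatter birthdays uniformly at random across the year and see how often the single busiest day reaches seven. The sketch below does that by simulation. The figure of 400 friends is my back-calculation from about 1.1 birthdays a day over 365 days, not an exact count, and uniform birthdays with no leap day is a simplifying assumption.

```python
import random
from collections import Counter


def busiest_day_count(n_friends, rng):
    """Assign each friend a uniformly random day and return the busiest day's count."""
    days = Counter(rng.randrange(365) for _ in range(n_friends))
    return max(days.values())


def chance_of_busy_day(threshold=7, n_friends=400, n_trials=2000, seed=31):
    """Fraction of simulated years where some day has at least `threshold` birthdays."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(
        1 for _ in range(n_trials) if busiest_day_count(n_friends, rng) >= threshold
    )
    return hits / n_trials


if __name__ == "__main__":
    print(chance_of_busy_day(threshold=5))  # a day like June 7th
    print(chance_of_busy_day(threshold=7))  # a day like Halloween
```

Note that this asks about any day reaching the threshold, not Halloween specifically, which is the fairer question since I only noticed Halloween after looking at the data.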

Vampire/werewolf/zombie/ghost doctor conspiracy?

The moral is clearly that you should never trust anyone with a Halloween birthday (sorry guys - you might be ghost doctors or something). Happy (belated/eventual) birthday, everyone!

Wednesday, June 5, 2013

Contestants' Row: Position is Everything (Games of The Price is Right)

Time for some more TPIR. Today, we're going to talk about Contestants' Row again. If you missed the first post on Contestants' Row you might start here, though unlike some posts this one is fairly standalone.

I've been coding more episodes, and one thing that has started to stand out is the proportion of time that the contestant who has the last bid on Contestants' Row wins. It is not surprising that they would win the bidding most frequently, though it seems that they win a lot.

From a simple betting standpoint they should have the odds in their favor for two reasons. First, they get to choose a bid mapping to a range of numbers with full knowledge of the other contestants' bids. Second, no other contestant gets to bid after them, and thus no other contestant has the potential of cutting out a part of that range of numbers.

For instance, if there are bids of 600 and 900 registered, and a contestant bids 601, they now have the range of numbers from 601-899 covered. If it is the third contestant bidding 601 there is still another contestant who could cut that - even all the way down to just one number (by bidding 602 and stealing the numbers from 602-899). If it is the fourth contestant bidding, that range of numbers (601-899) is perfectly safe.
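The same example can be sketched in a few lines of code. Under the closest-without-going-over rule, each bid "owns" every price from itself up to one dollar below the next higher bid. The max_price cap is a hypothetical stand-in for "any price above the highest bid".

```python
def covered_ranges(bids, max_price=10_000):
    """Map each bid to the inclusive (low, high) range of prices it would win."""
    ordered = sorted(set(bids))
    ranges = {}
    for i, bid in enumerate(ordered):
        # A bid wins every price up to one below the next higher bid.
        high = ordered[i + 1] - 1 if i + 1 < len(ordered) else max_price
        ranges[bid] = (bid, high)
    return ranges


if __name__ == "__main__":
    print(covered_ranges([600, 900, 601]))       # 601 owns 601-899
    print(covered_ranges([600, 900, 601, 602]))  # a later 602 cuts 601 to one number
```

If the 601 bid comes from the fourth contestant, the second call can never happen, which is exactly the protection described above.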

A simple frequency count on the winner of the Contestants' Row bids for the episodes I've watched reveals that my suspicion is quite correct - the fourth contestant does win a lot. Like, a lot.

If we break it down to simple odds, the fourth spot on Contestants' Row has nearly twice as great a chance of winning as the next best spot (39.3% for fourth spot, 21.7% for first spot, 21.4% for second spot, and 17.5% for third spot).

It's an interesting point that the third position is actually the weakest - they should have some advantage from knowing what the first two bids have been and only differ from the fourth bidder in that they have someone who goes after them. This may mean that the knowledge of prior bids is simply a much smaller effect than lacking someone going after you.

I have noticed in casual watching that the first contestant also seems to win frequently. Anecdotally it would appear that they often win by getting exceptionally close to the eventual price, blocking out the other contestants who bid after them. This would lead to the question: do winners in different positions on Contestants' Row need to get closer to the price in order to win?

We can take a look at this by looking at the winning bids of each position, when they win. Particularly, we can look at the error in the winning bids by position. Because we're only looking at wins the bias is unidirectional - a winner can't have bid over the price of the item, only below or exact.

Interestingly, the first position does not have the highest rate of perfect bids (as I might have guessed) - the second position does. The first position does, however, have the highest proportion of winning bids within a range of $100 - around 45% of winning bids (including those perfect bids and those from $1-100). In fact, we can rearrange this chart to better illustrate this point.

This shows the cumulative percent of contestants in each position who have won with any given bid or lower. It is clear that when the first position wins on Contestants' Row they're actually doing it by bidding fairly well. The later bidders may suffer from trying to move away from this already established (but good) bid, though that's hard to identify.

It is also clear that winning from the fourth spot takes the least degree of precision - winning bids from the fourth position are on average off by more than any other position. This is very likely due to the ability to bid either $1 (securing the range of all numbers below the lowest bid) or $1 above the highest bid (securing the range of numbers higher than the highest bid). Both these bids have the potential to win while being off by a large margin.
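For anyone who wants to play with the idea, here is a toy simulation of that last-bidder strategy. Everything about it is an assumption of mine rather than something from the coded episodes: prices are uniform between $500 and $2000, every contestant's guess is the true price times a random factor within 25%, the first three simply bid their guess, and the fourth bids either $1 or $1 above the highest earlier bid that isn't above their own guess.

```python
import random


def last_bid(previous_bids, guess):
    """Fourth bidder: $1 above the highest earlier bid at or below their guess, else $1."""
    below = [b for b in previous_bids if b <= guess]
    return max(below) + 1 if below else 1


def simulate(n_rounds=20_000, noise=0.25, seed=4):
    """Return the share of rounds won by each bidding position (first to fourth)."""
    rng = random.Random(seed)
    wins = [0, 0, 0, 0]
    played = 0
    while played < n_rounds:
        price = rng.randint(500, 2000)
        guesses = [
            max(1, round(price * rng.uniform(1 - noise, 1 + noise))) for _ in range(4)
        ]
        bids = guesses[:3]
        bids.append(last_bid(bids, guesses[3]))
        valid = [b for b in bids if b <= price]
        if not valid:
            continue  # everyone overbid; the show would rebid, so redraw the round
        # Closest without going over wins; a rare tie goes to the earlier bidder here.
        wins[bids.index(max(valid))] += 1
        played += 1
    return [w / n_rounds for w in wins]


if __name__ == "__main__":
    print(simulate())
```

I wouldn't read the exact percentages as anything more than illustration, since real contestants don't bid this way, but it's a quick way to see how much of an edge the clipping strategy alone might provide.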

Because the position moves around based on who wins, a fourth position win actually makes the prior third position the new fourth position. This is some solace to contestants trapped in the seemingly weakest third position, as long as there are a number of bids left. A fourth position contestant who loses their bid to the first bidder gets to retain their fourth position - a very fortunate turn of events for them, again given that there are some bids left.

Overall, it would appear that starting in fourth position is the best way to go, though it's also sort of out of your hands. While this information might not be practically applicable in the moment, it might make you feel a little better if you get trapped in third position and go home empty handed.