The Skeptical Statistician: This one is about Halo 4 (but also about the association of nominal dichotomous variables)

If you follow video games - and even if you don't - you may have heard of the Halo series. Halo 4 came out recently, and I've been playing a bit of it. It's a good game, though that's not really what we're here to talk about today.

The Halo series has always been pretty good at keeping very detailed statistics for everything you do in the game, and Halo 4 is no exception. The website Halo Waypoint allows you to access a ton of great information about how you've been playing the game.

Now, a major component of Halo 4 is the multiplayer aspect. Most of my playing time is spent online with friends - in fact, keeping in touch with distant friends is a big part of why I play. There are a number of different game types you can play, and there are also a number of different maps that you can play on.

While playing with different friends I started to notice an odd trend in one game type on a certain map. Specifically, for those who care, the game type that seems to be producing these odd trends is the objective-based game type of dominion. When a game is being set up, the way things work is that a number of players are found, and then those players vote on a map to play on. Once the map is selected, Halo divides the players into teams, assigns them to either red team or blue team, and then starts the game.

What I started to notice - and initially joked about - was the fact that while I was playing with one friend in particular we would always be placed on blue team on a certain map. Let's call that friend Brad. That map - again for those who care - is the map called Longbow.

This started as a joke, but as things progressed it started to seem more and more true that being in a group with Brad meant that the game would always assign us to blue team while playing dominion on Longbow. This assignment is (seemingly) random, and completely out of our hands.

Like I mentioned, Halo Waypoint allows you to pull down a whole lot of stats about what you're doing in the game. It was fairly easy to go into my play history and pull out all the games of dominion that I have played on this map. I was then able to sort these games on two criteria: whether or not I was playing with this particular friend, and whether we were on blue team or red team.

What this produces is a two by two table that looks like this:

	With Brad	Without Brad	Totals
Red Team	2	14	16
Blue Team	14	10	24
Totals	16	24	40

This is called a contingency table, and represents the multivariate frequency distribution of these variables. Each game can (and must) have one of two values on each of these two variables. In every game we have to be placed on either red or blue team, and in every game Brad is either there or he isn't.
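To make the sorting step concrete, here is a minimal sketch of how a list of games could be tallied into that table. The `games` list is a stand-in rebuilt from the counts above (an assumption for illustration); the real records came from my Halo Waypoint play history, and the variable names are my own.

```python
from collections import Counter

# Hypothetical game records rebuilt from the Longbow counts in the table.
# Each record is (team, brad_present); real data would come from play history.
games = (
    [("Red", True)] * 2
    + [("Red", False)] * 14
    + [("Blue", True)] * 14
    + [("Blue", False)] * 10
)

# Count how many games fall into each (team, brad_present) combination.
counts = Counter(games)

# Arrange the tallies as rows (team) by columns (with / without Brad).
table = [
    [counts[(team, True)], counts[(team, False)]]
    for team in ("Red", "Blue")
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

print(table)       # [[2, 14], [14, 10]]
print(row_totals)  # [16, 24]
print(col_totals)  # [16, 24]
```

Every game lands in exactly one of the four cells, which is why the row totals and the column totals both add up to the same 40 games.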

These variables are special in that each of them can take only two values. Variables of this type are a special kind of nominal categorical variable called dichotomous variables, named for the two values they can take.

If you have two dichotomous variables there are a few tests that can be used to look at the relationship between those variables. What we're going to use today is Fisher's exact test.

This test is due to R.A. Fisher, and the story behind it dates to the 1920s. Anecdotally, he devised the method to test a colleague who claimed that she could tell by taste whether the milk or the tea had been added first to a given cup of tea (a seemingly difficult claim). Fisher proposed that he would give her eight cups of tea - four of each type - prepared and presented in random order. The woman (Dr. Muriel Bristol) reportedly identified all eight cups correctly. Fisher later wrote the experiment up in his 1935 book The Design of Experiments.

Regardless of how successful she was, you can imagine producing a contingency table of this data. For her specific case it would look like this:

	Says Milk First	Says Tea First
Actually Milk First	4	0
Actually Tea First	0	4

Long story short, this data can be analyzed with Fisher's exact test. A significant result means that the pattern in the table would be unlikely to turn up if the two variables were unrelated, which we take as evidence of a relationship: information about one variable gives you better than random information about the other. In the case of the lady tasting tea (as the experiment is known), there is a relationship between what Dr. Bristol said and what was actually the case. In fact, there is a perfect relationship in this case - each of the 4 times that she said milk was added first the milk was actually added first, and each of the 4 times that she said tea was added first the tea was actually added first.
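For the tea example the arithmetic is small enough to do by counting. This is a rough sketch, assuming the taster knows there are four cups of each type and is purely guessing; the variable names are mine.

```python
from math import comb

cups = 8
milk_first = 4

# Number of distinct ways to pick which 4 of the 8 cups are "milk first".
possible_guesses = comb(cups, milk_first)

# Only one of those selections matches the true arrangement exactly.
p_all_correct = 1 / possible_guesses

print(possible_guesses)         # 70
print(round(p_all_correct, 4))  # 0.0143
```

So a pure guesser gets every cup right about 1 time in 70. Counting the equally unlikely opposite extreme (every cup wrong) as well doubles that to 2 in 70, which is still below 0.05.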

The lady tasting tea experiment data is significant, but what about our Halo data?

Well, it's significant too, actually. If you want to try it yourself there's a good online calculator here:

http://graphpad.com/quickcalcs/contingency1/

In order for this to be statistically significant we're looking for a p value below 0.05. Our two-tailed p value in this case is 0.0074, below 0.05 and thus significant at that level.
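If you would rather see where that number comes from than trust a calculator, here is a minimal sketch of a two-tailed Fisher's exact test in plain Python. The function name and the "sum every table no more likely than the observed one" rule for the two-tailed value are my assumptions about a common way to do it, not necessarily what the calculator does internally.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count
    observed = prob(a)

    # Add up every table that is no more likely than the observed one;
    # the small tolerance guards against floating point rounding.
    return sum(
        p
        for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Longbow: rows are red/blue team, columns are with/without Brad.
p_longbow = fisher_exact_two_tailed(2, 14, 14, 10)
print(round(p_longbow, 4))  # 0.0074
```

The same function takes any two by two table, so you can feed it your own counts in the order top-left, top-right, bottom-left, bottom-right.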

What this means is that there is a statistically significant relationship between our two variables: the presence of Brad and the assignment of team. The presence of Brad in my game is statistically related to the team to which we are assigned, which suggests that our team assignment is not random on this map.

This got me wondering about other maps. This effect was only large enough for me to casually pick up on for this one map, but might things be similar on others? If other maps don't show this relationship, then the map itself may be part of the effect.

The map Longbow gets played a lot for the dominion game type, but there's another map that gets similar (if not a little more) play. That map is Exile.

So in the same way as I pulled down the numbers for Longbow I went in and pulled down the numbers for Exile. Here they are:

	With Brad	Without Brad	Totals
Red Team	13	18	31
Blue Team	16	14	30
Totals	29	32	61

You can use the same calculator to run the same Fisher's exact test, but for this map we fail to find a significant result - the p value for these numbers is greater than 0.05 and thus not significant (it's actually 0.4462).

This would seem to suggest that the effect we're seeing is specific to the map Longbow, though failing to find an effect on Exile is not proof that there isn't one. Having Brad in my group doesn't seem to impact team selection on Exile, which fits with the fact that, unlike on Longbow, I haven't casually noticed the pattern on that map or on any others.
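One way to see the difference between the maps without a test at all is to compare how often we ended up on blue team in each condition. This is just a small sketch using the counts from the two tables; the dictionary layout and names are my own.

```python
# Counts copied from the two tables: (blue games, total games) per condition.
maps = {
    "Longbow": {"with Brad": (14, 16), "without Brad": (10, 24)},
    "Exile": {"with Brad": (16, 29), "without Brad": (14, 32)},
}

# Share of games spent on blue team, per map and per condition.
blue_share = {
    name: {cond: blue / total for cond, (blue, total) in conds.items()}
    for name, conds in maps.items()
}

for name, shares in blue_share.items():
    for cond, share in shares.items():
        print(f"{name:8s} {cond:13s} {share:.1%} blue")
```

On Longbow the blue share with Brad is 14 out of 16 games (87.5%) against 10 out of 24 without him, while on Exile the two conditions sit much closer together, at 16 out of 29 and 14 out of 32.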

So, there you go - find some other dichotomous variables in your life and put together some contingency tables. You might be surprised (like I was) at what you find.

343 Industries, maybe you should look into this? And what more could I find if I had access to your full overall data? Probably some pretty cool stuff. =)

The Skeptical Statistician

Wednesday, January 16, 2013
