The Skeptical Statistician: 2014

Wednesday, July 9, 2014

Procedural generation in video games: A great lesson in measurement and variability

It has now been a while since E3 2014, where by many accounts one game sort of stole the show. That game is called 'No Man's Sky'.

If you want a quick rundown, check out this video:

The premise of No Man's Sky is captured well by one of the things mentioned by the developer in the above video. The demonstration shows off a landscape and a bunch of creatures, and the developer notes that this whole scene isn't something they 'built', so to speak. It is something the game (sure, the game that they built) built, and something that they found.

It is a unique and potentially missed point that the developers themselves don't really know what might be on the next planet until they go there and discover it. They know the parameters of the universe, but they can experience the exploration of everything (the core of the game) just as much as any player.

It is a point perhaps best captured by a simple statement:

With a powerful enough computer you can simulate the universe.

There's a lot more nested in that statement. We could start talking about the Matrix or all of that sort of stuff and get into a discussion of whether we're actually in a simulation of the universe right now, but that's not what I want to talk about today. What I want to talk about are the underlying assumptions about variability that make this game (and many others) possible. This game is clearly possible because we're watching video of it, right?

In fact, a lot of people are bringing up the point that many people consider a game like this impossible.

http://kotaku.com/how-a-seemingly-impossible-game-is-possible-1592820595

Let's walk through why it's not.

The term 'procedurally generated' is sort of a hot buzzword in gaming at the moment. There are plenty of games that utilize this fairly new method of creating the worlds in which we can then game.

If you've never heard the term, the quick breakdown is that content is not generated by the game developer and then shipped to the customer. The game developer instead creates a game engine that generates content based on sets of rules. That game engine is what is sold to customers, and all the content is generated on the fly by the customer's machine before or as they play.

This idea has been around in some form, and to some degree, for some time. For instance, Fischer Random Chess is more or less procedurally generated chess based on a few rules of standard placement (e.g. bishops must end up on opposite color squares, the king must end up between the rooks). It was designed to try to nullify the decided advantage that comes from memorizing chess openings and instead give more of an advantage to the skill of reacting quickly and accurately to new situations.
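To make those placement rules concrete, here is a minimal sketch of one way to deal out a starting back rank. It only encodes the two rules mentioned above (bishops on opposite colors, king between the rooks); the function name and the piece letters are my own choices for illustration, not anything official.

```python
import random

def fischer_random_back_rank(rng: random.Random) -> str:
    """Return a random back rank obeying the two placement rules from the text."""
    rank = [None] * 8
    # Bishops go on opposite-colored squares: one even file, one odd file.
    rank[rng.choice(range(0, 8, 2))] = "B"
    rank[rng.choice(range(1, 8, 2))] = "B"
    # Queen and knights take any three of the six squares left over.
    free = [i for i, piece in enumerate(rank) if piece is None]
    for piece in ("Q", "N", "N"):
        square = rng.choice(free)
        free.remove(square)
        rank[square] = piece
    # The last three squares, left to right, get rook, king, rook,
    # which guarantees the king sits between the rooks.
    for square, piece in zip(sorted(free), ("R", "K", "R")):
        rank[square] = piece
    return "".join(rank)

print(fischer_random_back_rank(random.Random(2014)))
```

The randomness never gets a chance to break the rules, because the rules decide which squares each random choice is allowed to pick from.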

Not surprisingly, a lot of people who are really good at regular chess seem to hate this idea of being placed in new situations where all their accumulated knowledge isn't particularly useful.

Anyway, to bring it back to video games, I have recently been playing a game called Starbound. Similar to No Man's Sky, it produces a universe of random worlds which you can fly around and explore in your spaceship.

While No Man's Sky looks like this, though:

Starbound looks like this:

It's not to say that one is better or worse than the other; they're simply different takes on an idea. Starbound is a lot of fun, and also hits in the nostalgia feels for those of us who grew up on 2D platformers.

I've also played a bunch of games like Rogue Legacy:

And, well, I've certainly played way too much FTL:

You may have played some of these games, you may not have. The thing that links them all is the concept of procedural generation. No single run of FTL is going to be like any other run of FTL (well, except for the fact that you are likely to lose). No dungeon layout and set of character traits in Rogue Legacy are going to match up from one playthrough to the next. I'm sure I've played other games utilizing procedural generation, but probably not as much as these three.

http://en.wikipedia.org/wiki/Procedural_generation#Games_with_procedural_levels

The only other game that comes to mind, and which you have almost certainly at least heard of, is Minecraft.

I'm not sure why I'm reluctant to put Minecraft into the same category as the above three (or four) games, but it is either because it doesn't fit well enough or because it just fits too well. The world in Minecraft is procedurally generated by seed numbers and by some estimates can be explored to reveal somewhere from as little as two to as much as eight times the surface area of the earth.
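If the idea of a world generated "by seed numbers" sounds mysterious, a toy version might help. This is only a sketch of the general principle and certainly not Minecraft's actual terrain algorithm; the function name, the starting elevation, and the one-block step rule are all invented for the example.

```python
import random

def terrain_heights(seed: int, width: int = 8) -> list[int]:
    """Generate a strip of terrain heights that depends only on the seed."""
    rng = random.Random(seed)  # a private generator seeded by the world seed
    height = 64  # hypothetical starting elevation
    heights = []
    for _ in range(width):
        # Each column moves at most one block up or down from its neighbor.
        height += rng.choice((-1, 0, 1))
        heights.append(height)
    return heights

# Same seed, same world, every time it is generated.
assert terrain_heights(42) == terrain_heights(42)
print(terrain_heights(42))
print(terrain_heights(43))
```

The point is that nobody has to store the world. Anyone holding the same seed and the same rules can regenerate the same terrain whenever they walk over to it.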

Don't believe it? Check out this guy who generated a mine cart track within his world and is at the moment (for a few more days at least) riding at 20 m/s toward the end of the world. How long will this take? 17 days, by current estimates. Not 17 in-game days, 17 IRL 'Earth spins on its axis' days.

http://kotaku.com/itll-take-him-17-days-to-get-to-the-end-of-minecraft-1598772102
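For a sense of scale, the two numbers quoted above (20 m/s and 17 days) are enough to back out the distance being covered. A quick sketch of the arithmetic, using nothing but those two figures:

```python
SPEED_M_PER_S = 20          # speed quoted in the text
DAYS = 17                   # duration quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60

distance_m = SPEED_M_PER_S * DAYS * SECONDS_PER_DAY
print(f"{distance_m:,} m, or about {distance_m / 1000:,.0f} km")
```

That works out to a bit over 29 million meters, which is roughly three quarters of the way around the real Earth's equator.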

It's fairly beautiful to watch, and if you don't understand Minecraft it might be a good way to get a feel for the kind of exploration that pulls people into it. I have watched the stream here and there. Every once in a while I see something cool off in the distance and get what can only be described as a compulsion to explore it. It is exploring something new, something that no one has ever explored before. What's going to be over there? Nobody knows.

Probably some cows and chickens, though, most likely.

Like cows and chickens, there are some parts of Minecraft that don't change. Pigs are never purple, for instance.

To be fair, the enemies in Rogue Legacy aren't procedurally generated, nor are the weapons in FTL. In Starbound, for contrast, both enemies and weapons are procedurally generated. Perhaps I'm nit-picking.

If you checked out either of those links, though, you'll see that enemies (or weapons) in Starbound fall into some pretty basic categories. For instance, there is a section on quadruped enemies, which includes things that look like this:

It's very possible that I've never seen any of these enemies in my game of Starbound, however. Each is generated on the fly from a few basic parts. The wiki seems to think these parts (for quadruped enemies) are head, legs, body, and tail.

Already you might start to see some of how this is holding together. There is an assumption that all of these quadruped enemies will have some sort of a head, for instance. Here lies the tip of the iceberg on how all of this is possible.

The clip of No Man's Sky above has some things that look like dinosaurs. It also has some things that look (roughly) like gazelle. The generation of these creatures is random, but based on certain rules and algorithms.

For instance, creatures of a certain size or type might be required to have some number of legs not equal to zero. At the same time, number of legs might be bound to even numbers, and we might not ever run into a species of three-legged gazelle.

Quadrupeds also need to have a torso, presumably, to have a place for their legs to attach. At the moment they might also be bound by some rules about head and face, as well. All of these things are built into the algorithms to generate the world (or universe).
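One simple way to picture rules like these is as a filter on top of pure randomness: draw a body plan with no constraints at all, and throw it away unless it passes every rule. The sketch below does exactly that. The specific rules, ranges, and names are hypothetical stand-ins based on the examples above, not anything taken from the actual games.

```python
import random

def random_body_plan(rng):
    """Draw a completely unconstrained body plan."""
    return {
        "legs": rng.randint(0, 8),
        "has_torso": rng.random() < 0.5,
        "has_head": rng.random() < 0.5,
    }

def obeys_rules(plan):
    """Hypothetical rules: some legs, an even number of them, a torso, a head."""
    return (
        plan["legs"] > 0
        and plan["legs"] % 2 == 0
        and plan["has_torso"]
        and plan["has_head"]
    )

def generate_creature(rng):
    """Keep drawing until a body plan satisfies every rule."""
    while True:
        plan = random_body_plan(rng)
        if obeys_rules(plan):
            return plan

rng = random.Random(2014)
print([generate_creature(rng)["legs"] for _ in range(5)])
```

No three-legged gazelle ever makes it out of that loop, even though the underlying draws are perfectly happy to propose one.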

At the heart of all of this is a (probably implicit and potentially unspoken) discussion about measurement.

The first step of measurement, and most critical cornerstone, is definition.

There is a definition of what an animal is, and what a plant is. There is a definition of what makes a planet, and what makes the space between them. Some things are variable, and others are not. The No Man's Sky developers talk about the process in some of those articles. They don't build animals, they build prototypes of animals. They build things that look like animals and then decide which parts can vary and which parts are fixed.

A fully procedurally generated animal based on nothing but randomness might look something like this:

Not really too much like an animal, right?

In this there exists a crucial difference between randomness and randomness within boundary conditions.

The above image is randomness. The above games are randomness within boundary conditions.

By defining what things are crucial for something to be a quadruped (four legs, torso, head), you can start to vary the things that are not crucial for something to be a quadruped (color, size, facial expression).
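Here is a minimal sketch of that split between what is fixed and what is free, written as a template rather than a filter. The fixed traits are baked in as defaults that the generator never touches, and only the free traits are rolled at random. The palette, size range, and expressions are made up for illustration.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Quadruped:
    # Fixed by definition: these never vary.
    legs: int = 4
    has_torso: bool = True
    has_head: bool = True
    # Free to vary: these are filled in at random.
    color: str = "brown"
    size_m: float = 1.0
    expression: str = "neutral"

COLORS = ("brown", "green", "purple", "striped")   # hypothetical palette
EXPRESSIONS = ("neutral", "angry", "sleepy")        # hypothetical faces

def random_quadruped(rng: random.Random) -> Quadruped:
    """Vary only the non-crucial traits; the crucial ones keep their defaults."""
    return Quadruped(
        color=rng.choice(COLORS),
        size_m=round(rng.uniform(0.3, 5.0), 2),
        expression=rng.choice(EXPRESSIONS),
    )

rng = random.Random(1)
for _ in range(3):
    print(random_quadruped(rng))
```

Every creature that comes out is recognizably a quadruped, and deciding which fields go above or below that comment line is the whole definitional exercise.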

This is the real magic that is being done in all these games, and is the part that is seemingly going widely unrecognized. People are having to sit down to define what characteristics of a thing are free to vary and which things are uncompromisingly critical and not free to vary. This is a really cool exercise, and I think people are missing it.

A planet in No Man's Sky has to be spherical to weakly ellipsoidal. The planets you might find in a game like, say, Mario Galaxy, are not the kind of planets that would be procedurally generated by No Man's Sky algorithms.

Nothing against Mario Galaxy, I enjoyed it as a completely different type of game. It does get me thinking, though, and thinking often leads to Googling, and Googling often leads to me finding crazy things I never would have expected to exist.

So, I'll leave you today with that idea of procedural generation, and the concept that the next big innovation in gaming might be a content-lite industry simply churning out the coolest simulations of the worlds we want to game in.

I'll leave you with one other thing, though, the product of my above Google search. Think any games are immune to procedural generation? Exhibit A to get that discussion properly started can be found in the link below: procedurally generated Mario Bros. levels.

http://eis-blog.ucsc.edu/2010/04/infinite-fun-mario/

Enjoy!

Wednesday, July 2, 2014

The Facebook Study: Why we shouldn't be surprised/impressed/shocked/etc.

Well, it's that time again. The facebook did something you don't like, and now you're angry. Well, yeah. But what did they really do? And why are you really angry? More than that, why are you really surprised, and will you really be angry tomorrow? Tomorrow, will you even remember this happened?

Here's the breakdown of what seems to have happened.

At some point in 2012, someone at the facebook likely came up with an empirical question: 'does the content of the user feed impact the user experience?'

Start discussing this idea for a while and you'll get to the meat of the discussion: 'by altering content of the user feed, can we impact the user experience?'

They decided to try, and instead of doing this kind of research on a reasonable sample of individuals, decided to just go nuts and try it on about 700,000 people.

Then, once they did this, and had this data, the story seems to be that they had no idea what to do with it, and got in touch with some guy at the cornell. He was like 'cool, this is an easy publication', ran the numbers on it, and got it published in an academic journal.

On the surface, most of the individual pieces of this are nothing surprising. The heart of it (the idea that a company might change the user experience differentially and then see which seems to work better) is standard practice.

This part, alone, isn't surprising for the facebook, or for the Google, or the Apple, or the JCPenney. User experience testing is nothing new. Think about any website or store that you've visited over the course of a decade or so and you'll see small changes over time. Only the most poorly run of these companies make these changes blindly and without planning and testing. There are levels of this, though, and there is also a thing called restraint.

Let's think of a simple example. There is a lot you can learn from sitting people down in a room and asking them what they think about a change to something they know. Sometimes they like it, sometimes they don't. Collecting human subjects data in this way is rarely controversial. It's not very deep, and there's not really that much to it. Things only go south when you really have no idea what you're doing, or what you plan to do. Even then it's usually just the case that you'll get back poor quality data, not that you'll, I don't know, withhold treatment of syphilis from your participants.

If you've ever completed an IRB training course you might know what I'm talking about. If not, we'll come back to it in a bit.

I hate to be the one to have to tell you, but websites do this sort of user experience testing all the time. Can you really think of any website (sure, other than the Google homepage) that has remained static for the last decade? Take a spin on the wayback machine and check out 2005 Amazon:

https://web.archive.org/web/20050714084608/http://www.amazon.com/exec/obidos/subst/home/home.html

Or 2001 Yahoo:

https://web.archive.org/web/20010815022655/http://www.yahoo.com/

Or 1997 Geocities:

https://web.archive.org/web/19970702235214/http://www15.geocities.com/

Or any number of sites that you will find to be almost unrecognizable from their current iterations. The changes to these websites over time aren't random blundering in the darkness (okay, except maybe for Yahoo). Changes to these websites actually follow a fairly tried and true system of focus-testing-driven evolution.

The use of the term evolution here is no accident, as what these websites have the potential to do is differentially manipulate the underlying digital makeup of their webpage to see which fares best in the wild.

Have you ever been part of an early roll out? Probably, and you probably didn't even think about it that much. Remember when you had to have an invite to sign up for Gmail? You don't? Ask your grandparents, youngin. It is often the case that you might even be mad to not be in the early roll out.

Google does this sort of small scale testing all the time. To be fair, a large part of this is ostensibly stress testing the system in ways that regular testers can't, but the process also has great potential to just see how people react to changes. You would never know, at least in the moment, if you were in a Google early roll out that actually had two different experiences. If it was subtle you might never know.

People are already clamoring to be part of these roll outs, and they are already there to test the functionality of the product. So what if half the roll out had a minor tweak to the experience that didn't alter the functionality? Think of Gmail Google chat integration, years ago. It almost certainly had a soft roll out at some point. Nowadays, Google chat has a firmly rooted place on the left side of the screen. In a roll out on a few thousand people, you could split the sample in half and give half the left side chat and half the right side chat. They're already there to comment on the functionality and if that variability of position matters it should be one of the things they mention.

You really should be more worried that giants like Google might not be doing this. It's lost opportunity, and it really is some low-hanging fruit.

Think about it. Yahoo wants to know if making all the tables on their fantasy sports site look like semi-opaque vomit scattered with misdirecting links will make people visit those pages less. Pull down a subset of the user base (read: thousands or millions of people) and change the tables for half of them so they look horrible, and leave the other half of your sample alone. Record visit and click data over time.
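The mechanics of a split test like that are not complicated. As a rough sketch: shuffle the users, cut the list in half, show each half a different version, and compare click rates with a two-proportion z-test. Everything below is simulated, and the 30% and 25% click rates are invented purely so the example has something to compare.

```python
import random
from statistics import NormalDist

def assign_groups(user_ids, rng):
    """Randomly split users into a control half and a treatment half."""
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

rng = random.Random(0)
control, treatment = assign_groups(range(10_000), rng)

# Simulated visits: the click rates here are invented for the example.
control_clicks = sum(rng.random() < 0.30 for _ in control)
treatment_clicks = sum(rng.random() < 0.25 for _ in treatment)

z, p = two_proportion_z(control_clicks, len(control), treatment_clicks, len(treatment))
print(f"z = {z:.2f}, p = {p:.4f}")
```

The random assignment is the important part. It is what lets you blame a difference in clicks on the vomit tables rather than on whoever happened to end up in each group.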

This is as far as the facebook got, this time. It also appears to be the point where Yahoo stops, coincidentally, before just rolling out the half-coded vomit tables. Thanks, Yahoo. Thanks.

There seems to be a line in the sand here that is occasionally washed away by the tide and then redrawn. This time, on this day, the facebook seems to be on the wrong side of it. It has incurred the wrath of the internet (are we keeping score, because this isn't the first time).

I said in the title that this is something you really shouldn't be surprised by. The facebook makes absolutely no qualms about the fact that your data is their data, and your data is how the Zuck gets paid. You are not the customer, you are the product. This is not news. I am not some prophet coming down from the mountains telling you this. This is common knowledge. Advertisers are the customer, and your page impressions (visits) are what grease those wheels. It is not really that tricky of a business model to grasp. Let's break it down:

Phase 1: Collect users
Phase 2: ?
Phase 3: Profit

Simple, right? Unfortunately, the above seems to be pretty much how people perceive the facebook as operating. Best not to think too hard about that Phase 2.

It boils down to the simple fact that [more clicks] = [more money].

Have you been to a "news" website in, say, the last five years? Have you noticed how the headlines are really 80-90% clickbait? We could go to any "news" site out there and pick a dozen headlines that sound like:

"4 signs the stock market is overheating"
"6 months in, how's Colorado pot?"
"10 places we dare you to go"
"James and the worst headline ever"
"What a shot! 32 sports photos"
"The most powerful celebrity is..."
"THIS is to blame for car accidents"
"$32 for a hot dog?!"
"Watch dude's crazy pants dance"
"The upside of Pippa's backside"
"These celebs are sexy in their 50s"

Those headlines all happened to be from CNN, from the top half of the page (it gets worse down below). I was thinking of doing more sites, but really lost the will for it after that set.

Is that enough baiting? Have you yet to wonder, 'why'?

Well, people in the comments on CNN wonder about it all the time. Go to any of these stories and start reading comments (warning, strong fortitude recommended to ever read internet comments) and you'll quickly get to handfuls upon handfuls of people posting some variant of:

'wow cnn great clickbait this article sux'

Quickly followed with comments by those even more unfortunate souls of the internet, not realizing their own advice also applies to themselves on both the original level and the additional feeding the trolls level:

'but here you are still reading it, and still posting about it'

One can only hope (for CNN's sake, not for humanity's sake) that CNN has actually done this kind of experiment and found that people are more likely to click on the sensationalized clickbait than the normal, well, journalism. They've found their new model, and it's pretty simple, too:

Phase 1: Collect clicks
Phase 2: ?
Phase 3: Profit

Are you sad about that? Angry? As angry as at the facebook? What makes it different? Really start to ask yourself: 'what makes it different?'

We have a fairly bad history of cherry-picking the evil du jour based on some pretty sketchy foundations, without considering how many other things we should actually consider evil.

Now, to get it out there, the facebook is decidedly evil. I'm going to tip my hand on that one. There is simply no way around it in my book. Frankly, though, that's my opinion. You might love them. Sure, go for it. I long ago stopped proselytizing against the facebook, as the only really good place to do it is on the facebook. It just starts to feel a bit too much akin to a steadily growing ouroboros.

That said, my political views on the facebook are (happily) still listed as 'anti-facebook'. Small victories. Take 'em where you can get 'em.

The astute reader might note that while I consider the facebook evil, I still use it. Well, yeah. Sometimes it is all about the devil you know.

So maybe you've had a chance to think about the question I asked above. What makes this thing, this time, different from all the others?

Oh, it's because the facebook was actually trying to manipulate your emotions. Well, judging from the outcry, just talking about this study had a much larger effect size than actually running the study. In terms of bang for your buck the real punchline would be if talking about this earlier study was itself the actual study. Soooo meta.

I'm sorry to say that the facebook does not seem to be so clever.

So the facebook moved around the contents of the crap your friends had to say and gave you a crappier or rosier version of the world outside your window. They did this for a week, for something like 700,000 people. Then they looked at the things you posted, and the quantity of the things you posted, because that's how they operationalized your emotions.

Guess what. Half a million people might be right around the place that some of these statistical tests become overpowered. You know, give or take a half a million people. I guess no one at the facebook knows what a power analysis is? Or maybe the software to do one was just too expensive? Oh, that software is free? Well, maybe they didn't have administrator rights to install software on their machines. That's probably it. Always so hard to find the tech guy at such a big company.
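For anyone who has not run a power analysis, here is a rough sketch of the kind of calculation I mean, using only Python's standard library. It asks how small an effect (in Cohen's d) a two-sided two-sample z-test can reliably pick up at a given group size. The 350,000 per group is my assumption of an even two-way split of about 700,000 people, and the formula is the usual normal approximation rather than anything specific to this study.

```python
from statistics import NormalDist

def minimum_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d a two-sided two-sample z-test detects at this power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * (2 / n_per_group) ** 0.5

# 350,000 per group assumes the ~700,000 people were split into two equal arms.
for n in (50, 500, 5_000, 350_000):
    print(f"n per group = {n:>7,}: detectable d = {minimum_detectable_d(n):.4f}")
```

For reference, the conventional cutoff for a "small" effect is d = 0.2. At the largest group size in that loop, the test can reliably flag effects well over an order of magnitude smaller than that.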

So, they found some /significant/ results. I put slashes around it because I honestly have no idea what kind of quotes would even be appropriate around the word significant here, for so many reasons.

Sure, there's a difference between the groups, so they were able to manipulate something (noise?). Sure, that difference, albeit small, is statistically significant. Once you have a few thousand people you really need to start watching for significant but small (SBS) effects. The difference between these groups is non-zero. That is uninteresting. Frankly, the burden starts to fall on the "statisticians" at "the cornell" who even accepted a "sample" of this size. You don't need a crystal ball to know that this exact result is almost guaranteed from a sample like this. Finding no significant effect here would have been the impressive result.

I will leave it to the statistically inclined (or reclined) reader to run the odds (they are calculable) on finding a result in the absence of a result with this sample size on this test (they ran two sample z-tests, it would appear).

So the facebook did this, and they revealed a few things. First off, they're particularly bad experimentalists (if not good showmen). Also, they might have been able to change some people's moods a little. They also might have caused people to post a little more or a little less than normal.

Don't miss this among the noise, because this is the point that the facebook really cares about. Clicks are dollar dollar bills y'all, and the more things people are posting the more clicking they are doing and the more clicking they are causing their friends to do. If they had simply been happy with that result and taken it to the bank no one would be any the wiser and they'd be that much richer.

But for some reason, they decided they wanted to publish this. Beyond that, they wanted to call it an effect that people should care about. Turns out people do care about it, but maybe not for the reasons they expected. Hint: it isn't because this is a large effect. How big is the effect?

Well, the long and the short of it is that this is not the Stroop Effect.

It can really be said that, other than collecting a huge amount of data, the facebook study really has nothing of results to speak of. Have you seen all those decimal placeholder zeros in their effect sizes? If I can express an effect size just as concisely in scientific notation as in decimal notation I think we're safely in the zone of not very big. There's also a joke there about placeholder zeros, maybe something with like I haven't seen this many placeholder zeros since the line at the last midnight showing of [insert popular movie you dislike].

Let's put this in a different framework. I mentioned the IRBs before, and if you don't work at a place that has an IRB you might not know that it stands for Institutional Review Board (now you do). These are the folks that give the ethical green light to research conducted on human subjects at, well, academic institutions. The facebook doesn't have an IRB, because they're a corporation, and they don't have to worry about ethical research. *shrug* Tell me I'm wrong.

IRBs exist because it turns out that humans wanting to do research tend to kind of turn into jerks when left to their own devices without any regulation of their work. What kind of jerks? Bad ones. Much worse than those I'm about to talk about, if you can believe it and bring yourself to Google it.

Anyway, if you've run into a few studies that set the groundwork for modern IRBs you might be familiar with Zimbardo's Stanford Prison Experiment.

We talk about this experiment for a lot of reasons. That said, it is genuinely hard to argue against the fact that the main reason we talk about it to this day is because it worked. The effect sizes were huge. The guards - normal people - accepted and internalized their roles so completely that the result became downright mortifying. Like, torture. The study was stopped early, after six days, when it finally became clear that things had been out of control for, give or take, six days.

What if the Zimbardo Prison Experiment had an effect size of d=0.001 (the size of the facebook effect)? What if the guards acted pleasant and, well, normal? What if everyone carried on as happy and friendly folks and after two weeks everyone just went their separate ways? Or after a week everyone decided to switch roles, and then everyone was still pleasant and cheerful? Would we care as much?

The answer should be yes, but I would imagine that many of you might say no.

This is one of the most worrisome parts of this the facebook study that no one is talking about. We seem to be concerned about the after the fact ethical ramifications of research based largely on how big of an effect was found. People are giving the facebook a pass here because they didn't really find anything of substance, but what if this the facebook mood manipulation study had worked as well as something like the Zimbardo Prison Experiment? What is the end result of a normal effect size in this case?

At this point, I guess we don't know who these 700,000 people were. I've seen no reports of them being debriefed (something IRBs make you do), so to my knowledge no one knows if they were in these groups.

What we do know is that we're well within the law of large numbers. The odds scale (not necessarily in a linear fashion, mind you) with the number of people manipulated. If this effect had been large it is not outside of the zone of possibility (in fact it is decidedly well within the zone of possibility) that this mood manipulation might have made someone who was already sad just that much sadder, just enough to push them over the cusp to something a bit more drastic, like suicide, or homicide, or both. Even as a small effect on this large a sample there is still that risk, just smaller. It is a bit morbid to think about, but it is our job as ethical researchers to think about these things before the fact.
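The non-linear scaling is easy to sketch. If each person independently has some tiny probability p of a bad outcome, the chance that it happens to at least one of n people is 1 - (1 - p)^n. The value of p below is invented purely for illustration, and the independence assumption is mine; nobody knows the real per-person risk here.

```python
def prob_at_least_one(p_per_person: float, n_people: int) -> float:
    """Chance that an event with per-person probability p hits at least one of n people."""
    return 1 - (1 - p_per_person) ** n_people

P_HYPOTHETICAL = 1e-6  # invented per-person risk, for illustration only
for n in (1_000, 100_000, 700_000, 7_000_000):
    print(f"n = {n:>9,}: P(at least one) = {prob_at_least_one(P_HYPOTHETICAL, n):.4f}")
```

A risk that rounds to nothing for any one person stops rounding to nothing once you multiply the number of people by enough zeros.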

I'm having trouble finding 2012 data on the number of suicides in the US, but the 2010 number is right around 38,000. What are the odds that this the facebook study drove at least one person to suicide in 2012? Well, the odds might be small, but they are at least non-zero.

Think about it. If their sample was randomly drawn from the population then we can run some really quick back of the envelope math to show that if 38,000 (# of suicides) of 314,000,000 (population of US) people are committing suicide in a given year then we're looking at a little more than 1 suicide per 10,000 people.

Do you see where I'm going? With a sample size of 700,000 people, something like 85 people they selected into this study would be expected to commit suicide sometime over the course of the year just by sheer numerical chance.
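The envelope math is short enough to check directly. This sketch uses only the three figures quoted above and assumes, as the text does, that the sample behaves like a random draw from the US population. With the unrounded rate the expected count comes out near 85; rounding the rate down to an even 1 in 10,000 gives 70.

```python
SUICIDES_2010 = 38_000        # annual US figure quoted in the text
US_POPULATION = 314_000_000   # population figure quoted in the text
SAMPLE_SIZE = 700_000         # approximate study sample quoted in the text

rate = SUICIDES_2010 / US_POPULATION
expected_in_sample = rate * SAMPLE_SIZE

print(f"rate per 10,000 people: {rate * 10_000:.2f}")
print(f"expected in a sample of {SAMPLE_SIZE:,}: {expected_in_sample:.0f}")
```

Note that this is a base rate over a full year for a group this size. It says nothing about whether the study moved that number in either direction.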

You should really think about that. I'm not just making up these numbers.

A whole bunch of you are going to throw up your hands like a wacky waving inflatable arm flailing tube man and say 'oh hai hypocrite for talking earlier about sensationalism but now being super sensationalist', but hear me out.

Philip Zimbardo has his supporters and his detractors, and his general stance on the Stanford Prison Experiment (after the fact) is that he never expected it to go so far and he found himself wrapped up in it and continuing it against his better judgment. It's the sort of thing you have to say, but it is also believable that he never expected it to go as far as it did. The effect size was drastically larger than what a normal person might expect. The lesson to learn here is that just because you think that things can't go horribly wrong is no reason to progress with a situation where things could go horribly wrong.

Like I said, this experiment is one of the things that led to IRBs having more and more control over this process. The job of an IRB is not to look at a study after the fact and say how much damage it did, but to look at a study before the fact and say how much damage it might do.

If you've filled out an IRB application you might know that you always have to specify risks to participants that might never happen, even if you tend to do very low-risk research. The reason you do this is because the whole point of an IRB is to consider the worst case scenarios. Let's call them the places where d > 0.001.

So what are the potential risks of this the facebook study?

Take a quarter of a million people, and you're likely to find a few people in there that are particularly sad. You are going to find a few that are walking that ledge between rational and irrational decision making. By making those people sadder you are walking a fine line, and by walking that fine line 700,000 times you're just increasing the odds that something bad will happen. That's what we should be worried about if we are the IRB reviewing it before the fact. We are now after the fact, and it's already done. If you want to do some crazy investigations, it is those people who are the ones to hunt down and make sure they are okay. That is where the continuation of this story is, not on the 'the facebook made me angry but not sure why' or 'the facebook are a bunch of meanies and they won't even apologize and omg the Google+'

Take note, though, that even if you find a few people who were part of this study who committed suicide you are already expecting a few in both conditions by chance. You'd really need to piece together that full contingency table to say anything definitive.

Oh, or you know, take a reasonable sample from their sample of 700,000 and run your secondary stats on that.

The bottom line, though, is that the facebook is well within their rights in the current system to do whatever they want to their users with no regulation or oversight. Welcome to the 21st century. The only check to this free rein is if the things they do stop people from using their product, or if they break the law. They are a corporation, and they do not have an IRB, nor do they seem to have any of the ethical constraints that they would have at an institution, like, I don't know, the cornell.

The fact that some guy at the cornell entered this process after data had been collected fits into an odd loophole of IRB lore. As long as the data is already in existence there is much less intense questioning of how ethically or unethically that data was obtained. Was this data collected unethically? Well, lack of informed consent and/or debriefing seem to be the major red flags that point at yes.

It's hard not to see the ways to con this system. It might sound fairly conspiratorial (it is), but all a corporation like the facebook would have to have is one guy with a passing knowledge of research (it actually kind of sounds like they got the D student in this case) and then an exceptionally mutually beneficial collusion with someone who runs their numbers from a research institution to give it some publication cred (and IRB cred, which is often required in the publication process).

There are a lot of people out there not trying to con this system, but maybe this is a point where we are all just due for a collective slap on the wrist and a firm 'this is why you can't have nice things'. Maybe data not collected under the supervision of an IRB should never be granted IRB approval for analysis? That might be too harsh, but look at where we are. Look at what we've become.

The facebook shouldn't get a pass here just because their experiment sucked and they found a really weak effect. The facebook should be responsible for the whole range of things that could have happened, including effect sizes greater than d=0.001. Is it possible that the facebook caused the death of at least one of their 700,000 test subjects? I hate to sound like this guy, but yes, it is at least possible. If they didn't it is only another testament to how small of an effect they found.

The facebook also shouldn't get a pass on failing to meet the basic expectations of human subjects research like informed consent and debriefing just because they're a corporation and don't have an IRB. Unfortunately, this is something that they've already gotten a pass on from our past-selves. This is just regular corporate research, and the reasons that the facebook got called out this time are 1) they are huge, 2) the study sample size was huge, 3) they got greedy.

If you're mad about this the facebook thing, this is what you should be mad about. You should be mad at all of us, yourself included, for not worrying about this giant loophole until someone stepped right through it guns blazing, and also the bullets coming out of the guns are bad research.

At the same time, though, you shouldn't be surprised. You shouldn't hold the facebook to some personal standards that they have no hope of ever adhering to, and you shouldn't try to put them on some pedestal like this is something that you're shocked they did. The facebook is not kid George Washington bravely stating that they will never lie. This is not outside of the facebook's comfort zone, and they are likely to do very similar if not identical things in the future. Other companies are probably doing pretty similar stuff at this very moment. If you don't like it, stop using their service. Yeah, go ahead and try.

At the same time, you should again be more mad at the system that allows the facebook and the cornells to do this with absolutely no limitations or repercussions. It appears, at least at this point with the information that we currently have, that everyone was working within the bounds of the system we have put in place. They might have colluded or taken advantage of weird loopholes, but unless we find out something weird it does appear that they were technically correct in their actions.

I said 'technically correct', so I guess that just leaves us this to wrap things up:

Thursday, February 20, 2014

The Beautiful Experiment that is Twitch Plays Pokemon

Maybe you haven't heard, but the Internet recently decided it wanted to play Pokemon Red/Blue. That's not to say that the Internet decided that everyone should play Pokemon Red/Blue on their own, it is to say that the Internet decided it wanted to play Pokemon Red/Blue, as a hive mind.

Watch live video from TwitchPlaysPokemon on www.twitch.tv

Apparently, progress is being made, as this highly accurate diagram illustrates.

(from the official? @TwitchPokemon here: https://twitter.com/TwitchPokemon)

Poor ABBBBBBK (, you are missed.

Let's take a step back, shall we?

The original notion (as far as I can figure) was to do exactly what I said above - play Pokemon Red/Blue as a giant hive mind. The inputs to the game (e.g. up, down, left, right, a, b, start, etc.) were (quite brilliantly) linked to the chat window you see scrolling like mad on the right side of the screen.

Let's look at a simple specific case before the general, shall we?

If just one person was playing, they could probably beat this game in a bit more time than normal. They would want the character to move up, and instead of pressing the 'up' button, they would type the word 'up' in chat. It's more or less a text based adventure - a hybrid if you will. Instead of typing 'go up' and then having the game return text that tells you what happens, 'go up' returns a picture, a graphical user interface that is simply text driven (but not exclusively text presented).

Like I said, anyone who could beat the normal Pokemon Red/Blue could beat this version, it would just take a while longer. The problem is simply that typing 'up' is slower than pushing 'up'.

Consider the slightly more difficult case where two people play this game together. It's not that much more complicated, and realistically progress would probably still be made. If they were really good, they might actually even play the game as fast or faster than one person. Two people playing the same game isn't anything new, and some people are actually quite good at it:

Watch live video from SpeedDemosArchiveSDA on TwitchTV

The people above are in the same room, though. They're seeing the same thing and they have the same goals. They're working in sync, and they're good.

Consider a slightly more complex situation. Two people are playing, and one of them just wants to finish the game. That is the goal of the first person. A second person is playing, and that person wants to make sure the first person can't finish the game. That is the goal of the second person.

The second person doesn't have to do as much work. They can strategically wait to enter inputs at inopportune times. The first player's goal is far off, and requires a long string of correct inputs. The second player's goal is more proximal - to make the first player fail in the moment. It's often hard to go backward in a game; the march of progress over time is somewhat in favor of the first person.

In a game like Pokemon, the nickel-and-dime approach of making the character do the same thing over and over again often amounts to fighting Pokemon over and over again (or walking in circles, or consulting the great and powerful Helix; it's a bit of a mix). Fighting with the same Pokemon over and over is nothing more than level grinding, and it makes those Pokemon stronger. Have you ever spent one or two hours level grinding in Pokemon, only to find your Pokemon are super strong? How about level grinding for one or two days?

I mentioned that this two player game is a simple specific case of a more general problem. Let's make it a bit more complicated. Now instead of just two players playing - one trying to play and one trying to stop play - you open up the floodgates to the Internets. Anyone can join, and anyone can enter 'up' just as much as the next guy. At the same time, anyone can enact sabotage, or make sure that it's not time to use the SS ticket.

That went on for a few days, and again, progress was made. Think about that, for a minute, next time you wonder what the power of the Internet can do. The hive mind is triumphant, if only in bursts, and only temporarily. It's still no speed run, but at the same time it might be just as - if not more - impressive.

Think of the data being generated in that sidebar. Watch the feed for a few seconds and you realize that the chat window is like a waterfall. It is unrelenting, and constant. It is the water flowing over the stone, knowing that - while any given moment may look like chaos - time will be the great decider, and time will rule in its favor. It is a leaf on the wind, and you are able to watch how it soars.

Too soon?

Anyway, that was the first few days. It was interesting enough, and then things got real.

This system, as described, was called (perhaps quite accurately) anarchy. To move things along, democracy was enacted. Commands in the chat were aggregated over a period of time (it varied between 10 and 20 seconds), and whatever command had the most votes in that time period was entered into the game.
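The democracy rule is simple enough to sketch. This is a minimal illustration, not the stream's actual code: the function name tally_window and the set of valid commands are assumptions (only the inputs named above are included), and ties are broken by whichever command showed up first.

```python
from collections import Counter

# Only the inputs named in the post; the real list was longer ("etc").
VALID_COMMANDS = {"up", "down", "left", "right", "a", "b", "start"}


def tally_window(messages):
    """Return the most-voted valid command in one voting window, or None."""
    normalized = (m.strip().lower() for m in messages)
    votes = Counter(m for m in normalized if m in VALID_COMMANDS)
    if not votes:
        return None  # nobody typed a usable command in this window
    # most_common(1) returns [(command, count)]; ties go to the first seen.
    command, _count = votes.most_common(1)[0]
    return command


# One hypothetical 10 to 20 second window of chat.
window = ["up", "up", "start", "UP", "left", "start", "hello chat"]
print(tally_window(window))
```

Under anarchy every valid line would be passed to the game as it arrives; under democracy only the winner of each window gets through.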

This slowed things down, as checking the great Helix Stone now took tens of seconds, but it also sped things up, because the majority of the hive mind could act without worrying about sabotage from those in the minority - as long as those trying to sabotage stayed the minority.

Well, people didn't like this. I think I like the spirit of it, but see that it takes something away. Something chaotic, and strangely beautiful. Some people didn't like the fact that it stopped them from being antithetical, and some people didn't like the fact that when those who were trying to sabotage got organized they were actually better at halting progress of the game (by doing things like spamming pause in the majority against all other less organized and thus more disparate commands).

This is where things really get interesting, and where they really start to produce a very beautiful decision making process.

The Helix was consulted, and a slider was created. You can see it above the chat feed in the stream, or here:

In that image, the solid slider line is on the side of anarchy, which means the rules are as they were in the before time, the long long ago. In order to move things back to democracy - and here is where it gets especially beautiful - the votes for 'democracy' in the sidebar must reach 80% (at one point it was 75%, and might change again) of the total vote between 'democracy' and 'anarchy'.

That is why you now see in the chat window the addition of the words 'democracy' and 'anarchy'.

In addition to being able to type 'up' and make the character go up, you can now type 'democracy' and move the slider a little bit back toward that dotted line at 80%. To get back to anarchy from democracy the slider only needs to get back to the 50% mark (they worked out that this is the fairest based on how people were voting).
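The two different thresholds make the slider sticky in both directions. Here is a rough sketch of that switching rule. The function name next_mode is made up, and so is the choice to treat both thresholds as inclusive; how the stream actually windows or decays the votes is not something I know.

```python
def next_mode(mode, democracy_votes, anarchy_votes,
              to_democracy=0.80, to_anarchy=0.50):
    """Return the mode after looking at the current vote split.

    Assumed rule: anarchy flips to democracy once democracy holds at least
    80% of the vote; democracy flips back once its share falls to 50%.
    """
    total = democracy_votes + anarchy_votes
    if total == 0:
        return mode  # no votes, nothing changes
    share = democracy_votes / total
    if mode == "anarchy" and share >= to_democracy:
        return "democracy"
    if mode == "democracy" and share <= to_anarchy:
        return "anarchy"
    return mode


# A 65% democracy vote changes nothing, whichever mode is active.
print(next_mode("anarchy", 65, 35))
print(next_mode("democracy", 65, 35))
```

Anything between the two marks leaves the current mode alone, which is why some set of people always has to be watching the slider.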

As a player, your decision tree is now substantially more complicated, whether you want to play the game or disrupt it.

Collaboration between individuals - nay, between complete strangers - is demanded in ways that make Journey look like solitaire. (By the way, you should really play Journey)

There are clearly two camps - those that want to play the game and those that want to stop the game. You might be able to argue that there is a third camp that just wants to watch the world burn, but they're probably just a subset of the second camp.

There are now also subgroups that think the best way to accomplish their goals is through anarchy, and those that think the best way to accomplish their goals is through democracy. There is a constant push and pull, and some set of those people always have to be watching that slider. If you're an anarchy player, you need to occasionally spam 'anarchy' in addition to trying to play the game (or trying to stop play).

When people talk about Big Data, well...here it is.

Think what you could do with this stream of data. Think of what this stream of data is.

I tried timing it a bit by watching when a command enters the bottom of the chat and seeing how long it takes to get pushed off the top. It's about 2 seconds on average during the time I was checking, and there are 17 comments on the screen at a time.

It's a lot of back of the envelope calculating, but that means that (let's err on the low side) about 8 commands are coming in every second. This has been going on for, well, days. It probably started out slower, as I'm currently watching it with 70,000 other people. It has also had its peaks, as reports claim that the 100,000 mark has been broken a few times, and the 120,000 barrier at least once. Complaints are also in that the strain from this chat stream has caused lag on other twitch streams, to the point that twitch has moved this stream to its own dedicated server.

Let's say I'm seeing a peak, and shoot low. Let's take half what I'm seeing as a low estimate of the average, and say about 4 commands come in every second. That's 240 commands a minute. 14,400 an hour. 345,600 a day, and yes, about 2.4 million a week (which is right about where we are, currently).
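If you want to redo the envelope math yourself, here is a quick sketch. The 17 comments and 2 seconds are my rough observations from watching the chat, the rate of 4 per second is the deliberately low guess, and the helper name totals is just something made up for this example.

```python
COMMENTS_ON_SCREEN = 17  # rough count of visible chat lines
SECONDS_ON_SCREEN = 2    # rough time for a line to scroll off the top

# Observed rate while watching, in commands per second.
observed_rate = COMMENTS_ON_SCREEN / SECONDS_ON_SCREEN


def totals(rate_per_second):
    """Scale a per-second rate up to larger units of time."""
    per_minute = rate_per_second * 60
    per_hour = per_minute * 60
    per_day = per_hour * 24
    per_week = per_day * 7
    return {"minute": per_minute, "hour": per_hour,
            "day": per_day, "week": per_week}


print(f"observed: {observed_rate} commands per second")
for unit, count in totals(4).items():
    print(f"{count:,} commands per {unit}")
```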

Patterns and randomness and data, oh my.

Is it random data? Well, no. Is it patterned? Well...moreso?

Watch the chat window for, I don't know, a few seconds? People are throwing data at you in a glorious battle of signal and noise. Where is the signal? What is the signal?

Better yet, what could you do with this data without context? What could you do if you covered the left half of the screen? Or were given this string of data in a (very large) text document? What does this string of data represent?

A million monkeys at a million typewriters will eventually type the works of Shakespeare, sure. The probability is finite, but negligible. Also, you have to feed the monkeys, and they would probably get bored of their typewriters after a while. Also, you need to replace the monkeys when they die. Also, clean up after them. It has a lot of assumptions built in there.

Might this stream eventually produce Shakespeare? Well, I don't know how much 'up up down down left right left right b a' there is in King Lear (I'm guessing not much), but I'm pretty sold on the idea that they might be producing something else.

Is the data repeating? Is some representation of pi in there? e? i?

In 1995, the people running the Hubble telescope had some time on their hands (I'm simplifying, sure). They decided to do something different. Instead of pointing the Hubble at something they could see, they pointed it at a region of space that looked dark. A tiny fraction of the sky - about 1/24,000,000th of it. They thought nothing was there.

So they left the telescope on (again, simplifying) for about 10 days. This is what they found when they processed the resulting images:

It's called the Hubble Deep Field. You might say, 'wow, that's a lot of stars'. Well, you're right, and you're wrong. See those things that have Xs shining about them? There are three of them, one in each of the lower center squares and one along the left side of the center.

Those are stars. The rest of the things are galaxies (which I guess contain stars, but lots more of them). This is what's going on in 1/24,000,000th of the sky.

In a few days the Twitch Plays Pokemon will have been on for 10 days. I wonder if the hive mind of the Internet isn't somehow doing something similar to what the Hubble did. Is there anything readily apparent in the chat that scrolls by pages at a time? No. But we weren't seeing anything in that 1/24,000,000th of the sky, either. One quick slice of the data from the Hubble Deep Field is meaningless, and it's only when assembled into one singular image that things become apparent.

It's perhaps unfair to compare Twitch Plays Pokemon to the Hubble Deep Field, because a lot of people working on the Hubble probably did have pretty good guesses about what their attempt would produce. It wasn't a completely blind decision, and it didn't happen by chance.

The nice thing about a telescope is that we know how it works. The pieces of that image line up fairly nicely, and we know where each pixel belongs. This post is certainly much more philosophical than normal, but I wonder how you might start to examine the data being spewed into this magnificent social experiment that is Twitch Plays Pokemon. The thing that makes this different, I think, is that no one really claims to know how Twitch chat works normally, let alone in this very specific and very unusual case. It's less looking visually at a blank region of the sky and more looking at the quantum foam of a blank area of space.

Is it noise? Is it signal? What makes one or the other different when the signal is partially noise itself?

I also wonder if someone is keeping records of it. Twitch? Maybe? The log file would have to be getting large, to be sure, but they might have the space. One can only dream, perhaps. Do you work at Twitch? If you do, press the 'RECORD EVERYTHING' button now.

Anyway, watch it for a while. Look for the signal in the noise, and wonder if there really is one. Wonder why we as scientists and researchers didn't think of it first, and lament the fact we'd never be able to replicate it.

Such is the hive mind; and such is not the world we are slowly entering, but the world that is already around us. It is the world where tens of thousands of strangers play Pokemon together as a competitive text-based RPG and create their own mythology along the way regarding tickets and stones and false prophets.

Maybe you're feeling a little lost, and maybe you're feeling a little overwhelmed. If you find yourself a bit confused, just remember:

Image borrowed from @LordSevein

Wednesday, January 8, 2014

Let's talk about wind chill

It's that time of year again where - depending on where you live - it sometimes gets kind of cold outside. Sometimes it gets really cold. Sometimes it's windy.

This leads to a lot of news about the cold, and specifically about the wind.

I overheard a conversation the other day that went more or less like this:

Person A: "Did you hear? It's supposed to be like -40 degrees tomorrow"
Person B: (skeptically) "Really?"
Person A: "Well, you know, wind chill"

You may have taken part in conversations like this yourself. I've certainly heard my share of them.

For those of you who understand how the wind chill works, you're maybe wondering why people don't get it. Well, part of it may be that this is how it's calculated:

Where the variables to be entered are ambient temperature and wind velocity (speed).
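For reference, here is a sketch that assumes the formula in question is the National Weather Service wind chill index adopted in 2001, which takes air temperature in degrees Fahrenheit and wind speed in miles per hour. The function name wind_chill_f is made up for this example, and the range check reflects the limits the NWS publishes for the index (50 degrees or colder, wind above 3 mph).

```python
def wind_chill_f(temp_f, wind_mph):
    """NWS (2001) wind chill index, in degrees Fahrenheit.

    WC = 35.74 + 0.6215*T - 35.75*V**0.16 + 0.4275*T*V**0.16
    with T in degrees F and V in mph.
    """
    if temp_f > 50 or wind_mph <= 3:
        raise ValueError("index is only defined for T <= 50 F and V > 3 mph")
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


# Zero degrees with a 15 mph wind reads as roughly 19 below.
print(round(wind_chill_f(0, 15)))  # -19
```

Two ordinary measurements go in, and one number that merely looks like a temperature comes out.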

The result is just a number that is expressed in a different scale that people widely understand (temperature). It's very easy to just take a wind chill number at face value and interpret it as a raw temperature.

In fact, wind chill is somewhat designed to do just that. It is meant to be a perceptual scale, to give an idea of how cold the air feels to a human. It's a harder measurement problem than it might first appear.

Let's take a step back. If you've ever taken a class on meteorology (I'd highly recommend it, if you have the opportunity), you might have encountered the following question in some form or another:

>>
Bob and Sally live in the Rocky Mountains near a particular canyon that - especially in the winter - has the effect of producing exceptionally strong winds. One day during winter break Bob and Sally have run out of things to do, and are staring out the window at some light snow flurries that have just started to fall. The TV is on, and they hear the local weatherman reporting the current temperatures.

"The current air temperature is 34 degrees, but the current temperature with wind chill is only 15 degrees."

Bob suddenly has an idea to cure their boredom. "Let's put a glass of water out on the porch and then watch it freeze!"

Sally looks at the thermometer mounted to the porch and confirms what she's just heard from the weatherman - the air temperature is 34 degrees. The wind, however, is quite strong, and she has no problem believing that the temperature with wind chill might only be 15 degrees. "That won't work, water doesn't freeze above 32 degrees!"

An argument ensues. Which child is correct?
>>

Think about it for a while, because it's a fun thought experiment. I ask this question of people occasionally every winter, and far and away people tend to agree with Bob. Interestingly enough, Sally is the one who is correct. We'll come back to it.

Temperature is a fairly complex idea, but it's also pretty easy to measure objectively (the hard part was all the early work in establishing scales, etc, that is). The problem is that a person can be standing in an open field in 20 degree weather and no wind and have a completely different experience than someone standing in the same field at the same temperature with 50mph winds.

Those of you who have never been around Chicago during the winter months might only have a vague idea of why it's really called the Windy City. I've often said that I'd take much colder temperatures without wind than less cold temperatures with wind, any day. I almost just typed it without thinking it, because it's something that you just learn to say in a knee-jerk sort of way:

"The wind is great at just ripping the heat right out of you."

Does anyone out there have a freezer with, like, a glass door? I kind of want one now, but it seems like they wouldn't be great except for things like this. Put a glass of room temperature water in it and then pull up a chair.

The water isn't going to freeze right away, but it will freeze eventually. Anyone waiting for a tray of ice cubes to set knows this well.

Now, watch a few episodes of any random Food Network cooking challenge show and you'll likely see someone use a blast chiller. You put stuff in, and bam! it's frozen pretty quick. It's not instant, but certainly faster than your freezer at home. What's the difference?

Well, it's in the name. A blast chiller is just a freezer (sure, temperature can vary with different units, but temperature could be held constant across devices and the blast chiller would still work quicker), but instead of just making something cold by looking at it really hard for a while (like my freezer), it blasts cold air over the thing being chilled.

The wind chill in a freezer is non-existent, because there is no wind. The wind chill in a blast chiller is, well, more extreme than that.

The reason why people aren't going out to spend more money on blast chillers instead of freezers is because we're usually fine with waiting an hour or two for our ice cubes. Time isn't a huge factor, and a freezer will eventually get the job done.

What's happening when you put some water cubes in the freezer? You might say, "well, the freezer is making them cold."

More accurately, the heat in the water is being lost to the cold air of the freezer.

Most accurately, the freezer-water system is tending toward equilibrium.

Two things in a system at different temperatures means that the warmer thing will give up heat to the cooler thing until they're the same temperature. So, your freezer will get a little warmer, and the water cubes will (eventually) turn into ice cubes.

Now, as a thought experiment, what would happen if you made a big pot of chili, and put some of the warm leftovers (in a freezer safe container) into your freezer. Or put all the chili in there. Make some water cubes at the same time. Make all the water cubes you can. Make them with boiling water. If you put that much warm stuff in there, the system should equalize on the warm side of the freezing line, right?

Now we're getting somewhere. It should not be a surprise that if you come back a day or so later, the water cubes and the chili will all be frozen. Take that, warm stuff, says your freezer. Install some glass in the door so next time you can watch me pwn all these water cubes.

Your freezer might have had to work a little harder than normal, but the reason this worked is that your freezer isn't just a cold box (give it some thought, though - a century ago, an ice box would not have been talking to you about the l33t pwnage it just unleashed on those water cubes, as it probably would have succumbed to the heat). Your freezer has a motor, and can magically make the air inside it cooler (okay, enough asides, but most of you probably have no idea how your freezer actually works to make air colder, at which point you're viewing it as magic - it's a hard problem that you should try to understand, there's a reason that boiling water was one of the first things we were capable of doing as a species but that freezing water artificially took us until the last century).

Because your freezer has a motor, it can regulate temperature. As long as it is plugged in and has power it can work its magic against pretty much anything. Give it time and it will get things to a certain temperature.

Magic aside, this should make sense to you. You do this all the time, and probably take it for granted.

If you understand this (and maybe even if you don't), you should see why humans are able to go outside when it's 50 degrees without dying.

You see, your body has a temperature range it wants to stay at, just like your freezer. Put a whole bunch of boiling water in your freezer and it will bring it to the temperature it's set at. Eat a whole bunch of ice cream and your stomach won't freeze, your body will simply regulate itself and keep your temperature where it wants it to be (upper 90s, etc).

When you stand outside in a 20 degree field with no wind, the air around you is acting like the freezer. It's trying to freeze you, and bring you to 20 degrees. Your body, in the upper 90s, is going to lose some heat, and that heat is going to warm up the air around you. The process of you heating up the air around you and that air heating up the air around it and so on and so on is a slow one, just like your freezer trying to freeze some water cubes.

At the same time, your body is expending energy to produce heat to keep you warm. It's fighting the heat loss, and if it can produce heat faster than you're losing it, then you're perfectly fine. You're technically losing heat all the time on a 70 degree day, but you're probably not too worried about it.

Heat is lost through exposure to the air, which is why it's a good idea to a) wear clothes and b) minimize skin contact with the air. Gloves, hats, etc. Think about it, though, what do coats actually do for you? You know this, you've probably just never said it in these terms. Coats don't produce heat (well, awesome ones might), they just trap the heat that your body is producing and slow the loss of it to the air around you.

So you're standing around in a field having a heat fight with nature. Good for you. Nature is a lot like the freezer, even on a day with no wind. It has a lot more reserve of cold air than you have heat, and unless the temperature rises nature is probably going to win. The result? Human cubes. Ice human? Take your pick.

The point is, though, that it will probably take a little bit of time. You can hang out for a bit fighting the air with your heat, but at some point you'll probably realize you're losing and go inside for a cup of hot chocolate.

Now let's imagine the same situation, but with a bit of wind. Remember that pocket of air that you're warming with your body that then has to warm the air around it? That slow process? Well, that pocket of warmer air just got swept away by some wind, and replaced by air that is just as cold as the air before you started warming it. Oh, you'll just warm that, too? Oh wait, it's gone.

Standing outside fighting the cold with your body heat on a windy day is not like being in a freezer - it's like being in a blast chiller. Nature can cool you down a lot quicker with wind, because that wind is constantly replenishing the cold air around you and pulling away any air that you might have heated.

But how much can nature cool you down? Now we're back to Bob and Sally's argument.

Temperature, and particularly heating and cooling, is all about equilibrium. If we turned your refrigerator into a blast refrigerator it wouldn't start freezing your food (unless you turned down the temperature, or put stuff in the waaaay back), it would just get them to the temperature your fridge was set at, faster.

Imagine that while fighting nature the first time you decide against going in to get that cup of hot chocolate. Days later, your body is found frozen, fist still held defiantly against the cold as if in mid-shake. What temperature is it at? Well, whatever ambient is - 20 degrees in this example.

How about the second fight, with wind? Your body, found frozen days later will be what temperature? Ambient. 20 degrees. It just got there faster.

In the same way, a glass of room temperature water placed outside will eventually get to ambient temperature. It has no magic motor or circulatory system, so it's not really going to put up a fight. The question of how fast it will freeze (holding starting temperature constant) is really only up to a few factors. The main factors? Ambient air temperature, and wind speed.

A higher wind speed can cool (or heat) a thing faster, but it can't cool (or heat) a thing beyond the temperature of the air around it. It is simply a tool of the equilibrium.
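One way to see this is with Newton's law of cooling, which is a simplification I'm assuming here (it ignores evaporation and radiation). The object drifts toward ambient temperature exponentially, and wind only changes how fast. The rate constants below are invented for illustration, not measured, and the function name temperature_after is hypothetical.

```python
import math


def temperature_after(start, ambient, k, hours):
    """Newton's law of cooling: exponential approach to ambient."""
    return ambient + (start - ambient) * math.exp(-k * hours)


AMBIENT = 34.0  # porch air temperature, degrees F
START = 68.0    # room temperature glass of water
CALM_K = 0.5    # hypothetical cooling rate per hour with no wind
WINDY_K = 3.0   # hypothetical cooling rate per hour in strong wind

for hours in (0, 1, 4, 24):
    calm = temperature_after(START, AMBIENT, CALM_K, hours)
    windy = temperature_after(START, AMBIENT, WINDY_K, hours)
    print(f"{hours:>2} h  calm: {calm:5.1f}  windy: {windy:5.1f}")
```

The windy glass gets to 34 degrees sooner, but neither one ever goes below it, so under this model the water never reaches 32.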

Wind chill, then, is trying to paint a picture of how fast your body will be brought down to equilibrium. What's the equilibrium temperature, you ask? Well, if you're talking about wind chill (wind chill only works as a calculation below 50 degrees) it's probably below 98.6, so...low enough to kill you (with time).

Your body isn't completely defenseless, though. Well before nature kills you, your body will start to sacrifice parts for the whole. Do you know what needs to keep working to stop you from dying? The stuff in your torso. Do you know what doesn't need to keep working to stop you from dying? Well, pretty much everything else.

When your body stops pumping heat (via blood) out to your extremities, you've probably made some poor decisions. You're in the danger zone, and that danger is frostbite. How fast will frostbite set in? Well, that depends on two things, ambient air temperature and wind speed.

Wind chill is based on those same two factors because both of these things are tapping the same underlying quantity: how quickly is nature going to kill you. Give nature enough time and it always will.

Because this one quantity (wind chill) is based on two others (ambient and wind), we can also make a pretty cool graph of it. Thankfully, though, we don't have to, because the NOAA already did, and it's public domain. So, here you go:

Now stay warm.