Monday, February 1, 2016

When polling, don't underestimate trolling

Let's talk about elections.

Primary/caucus season officially starts today with the Iowa caucuses, and it's been a somewhat different type of primary pre-season. Put simply, not a whole lot has gone as expected. In particular, a number of candidates have maintained strong momentum when examples from all prior modern elections say they shouldn't have. Donald Trump is undeniably the greatest example of this.

In conversations about the elections over the past few months, I've been surprised how little weight people seem to be willing to give to the possibility that some finite number of individuals may in fact be trolling the polls, and perhaps even the primary. This election cycle has produced a rather unique set of circumstances that seem to make this more likely than in prior elections.

I realized a while ago that many of the questions that I had about this were in fact empirical. I am nothing if not an experimentalist, so I figured: why not try my hand at some polling? This is the first post in...(maybe ever?) in which I'm going to talk about some actual survey data that I collected specifically for this post.

Now, I'm not a polling person. I'm not going to say that I'm particularly good at it in any way, and I'm not going to claim to have conducted a poll in a way that would predict anything with any high degree of certainty. What I wanted to know was more a research question than a polling question:

What proportion of individuals would actively vote in the presidential primary for a candidate that they did not support for the general election?

I collected data from around 400 people who responded, for a very very small amount of money, to a survey I put up on Amazon's Mechanical Turk. This is by no means an accurate representation of the United States population, or even of the United States voting population. I was looking for fairly big effects, and relaxed a lot of controls that would have made this a much better examination of the issue. Feel free to nit-pick in the comments about all of them, if you'd like. This is more a proof of concept than anything particularly insightful.

I should say, I did require that all participants be located in the United States and be at least 18. I trust that Amazon checks those things, but who knows? Data was collected completely anonymously, at least on my end.

One of the questions I asked at the end of the survey was exactly addressing what I mentioned above:

"Would you ever vote for someone in a primary or caucus that you would not vote for in the general election?"

Let's jump right into it, shall we?

So, yeah. I guess somewhere between 1 in 3 and 1 in 4 individuals will openly admit to a willingness to mess around with the primary election process of their nation's government. That's...something.
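For a rough sense of how much sampling noise sits on a number like that, here is a minimal Python sketch of a 95% Wilson score interval. The sample size of exactly 400 is an assumption (I only have "around 400"), and the two proportions are just the ends of the "1 in 4 to 1 in 3" range rather than the actual tally. The function name `wilson_interval` is mine.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score interval for an observed proportion."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

N = 400  # assumed: the post only says "around 400" respondents

# Assumed proportions: the two ends of the "1 in 4 to 1 in 3" range.
for p_hat in (0.25, 1 / 3):
    low, high = wilson_interval(p_hat, N)
    print(f"observed {p_hat:.0%}: roughly {low:.1%} to {high:.1%}")
```

With a sample this size, the interval comes out to roughly four to five percentage points on either side. That only covers sampling noise. It says nothing about how unrepresentative a Mechanical Turk sample might be, or about people trolling the survey itself.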

Personally, I'm always interested in what other people think other people think, so I also asked a follow-up question:

"Roughly, what percentage of people, between 0 and 100%, do you believe would vote for someone in a primary or caucus that they would not vote for in the general election?"

I'm not sure what I would have expected this distribution to look like, but there are some interesting spikes. I should mention that these bins of the histogram are forced choices - responses were limited to 0 to 100 in steps of 5.

There's an interesting spike at 50%, which either means that people consider it a coin flip whether or not others will do this, or that 50% is a pretty good choice if you have no idea. There are similar spikes at 10% and 25%, and to a lesser degree at 75%, perhaps for similar reasons. If I had to dig a normal distribution out of this on the assumption that there were a number of noisy distributions overlaying it, I'd say it looks like it was centered around 25%. I'm not entirely sure what that means, either.

In all, I mentioned this as sort of a proof of concept, and it seems to support the claim that at least some people are actively trying to disrupt the system as it normally operates.

I did collect a short follow-up to these questions, only for the individuals who said that they would vote for someone in a primary that they wouldn't in the general election. Specifically, I asked:

"If you answered yes to the previous question (that you would vote for someone in a primary or caucus that you wouldn't vote for in the general election) why would you?"

This question was open ended, and I got some interesting responses that seemed to follow a theme:

"To mess the other party's poling"

"Strategic voting to give my preferred candidate the best chance of being elected"

"If I could, I would vote for Donald Trump. That way, it's easier for the Democratic nominee to win."

"It's unlikely, but I might vote for someone in a primary I wouldn't vote for in the general election if my vote might either: 1) increase the probability of a win in the general election by someone I favored, or 2) decrease the probability of a win in the general election by someone I opposed."

"The one reason to do this would be to help elect someone who could be more easily defeated in a general election by someone who one would want to win."

"In a multiple party primary vote I would vote for the worst candidate of the opposite party."

"If I thought the person I voted for in the primary had a good chance of losing to my preferred candidate."

"To set up an opponent who would lose in the general election"

"I would vote in a party's primary with which I do not associate myself. In this case, I would vote for the candidate I perceive to be the weakest in order for my preferred candidate from the other party to have a better percentage of winning the general election. "

"make a lesser candidate win the primary to help my candidate in the regular election"


I do have to say that my favorite response was the simple statement:

"Rand Paul"

While I expected this to some degree (the first part, not Rand Paul), I'm not sure I expected so many people in my sample to all agree with it, or even produce it in their own words.

What does this really mean, at the end of the day?

Well, it means that polling for the primary elections potentially contains more noise than is commonly modeled. This noise makes polling of the primary less predictive of polling for the general election. Support for a candidate at this stage of the game does not necessarily mean that the person giving it wants that candidate to become president. That's...kind of weird.

It's weird enough that the only place I've really seen it show up is crazy conspiracy theories about certain candidates being in the race only to help candidates from the other party. Occam's razor would say that this is almost certainly not the case. I don't think there needs to be a conspiracy for some people to just be acting in a strange but strategic way.

At the end of the day, this might explain some of the discrepancy between what everyone predicted (based on historical data) and what seems to be happening. Poor candidates aren't losing support when expected, in part because that support is based on their standing as a poor candidate. The worse the perception of them, the more of this type of support they should acquire!

Like I said. Weird.

Who knows, maybe many of the people who took my survey were in fact just trolling it, as well. There are a lot of possibilities here.

While this was the main question that I had, I did ask some other questions, as well. Some of them dealt with what the individuals in my sample looked like:

I'm not aware of any studies that have looked at the political affiliation of Mechanical Turk participants, so it's hard to say how accurate this is. It could be that I pulled a weird sample of people, or it could be that the above is pretty close.

As an extension of the above question and the main question from earlier, I also asked participants:

"If you had the opportunity to vote (or caucus) in one of the early voting states such as Iowa or New Hampshire, but could only vote in one of the party races, which party would you vote in?"

The results are slightly different from the reported party affiliation:

I also asked a question that perhaps got closest to actual polling, and is maybe a good place to finish for now:

"If given a choice between all current candidates, who would you be most likely to vote for in the 2016 General Presidential Election?"