Can Polls Be Trusted?
After the great upset that was the 2016 presidential election, many were quick to point a finger at the polls. “Oh, the polls got it all wrong!” analysts screamed, citing the large errors in Wisconsin and Michigan. But that conclusion is too simple.
Private polls, commissioned by groups such as universities and media companies, have been the most accurate form of predictive data for decades. As the data has improved, tools such as poll averages (which combine multiple polls on the same topic) and more advanced statistical models have been developed.
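To make the idea concrete, here is a minimal sketch of a poll average, combining each poll's reported margin (one candidate's lead over the other, in points) into a single number. The pollster names and margins below are invented for illustration:

```python
# Each poll reports a margin (candidate A minus candidate B, in points).
# Averaging smooths out the noise in any individual poll.
polls = [
    {"pollster": "Poll A", "margin": 4.0},  # hypothetical numbers
    {"pollster": "Poll B", "margin": 1.5},
    {"pollster": "Poll C", "margin": 3.0},
]

average_margin = sum(p["margin"] for p in polls) / len(polls)
print(f"Poll average: {average_margin:+.1f}")  # Poll average: +2.8
```

Real aggregators weight polls by recency, sample size, and pollster quality, but the core idea is this simple average.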
So why were they all so wrong this time? The truth is that polls were wrong in four or five states but largely accurate everywhere else. The final national poll average, according to Real Clear Politics, was Clinton +3.3. The most recent results show Clinton’s popular-vote lead hovering around +2. (Remember, Clinton won the popular vote but lost the electoral vote.) That is a miss of roughly 1.3 points, significantly less than in 2012, when the polls had President Obama ahead by 1 point and he won by 4, a miss of about 3 points, and nobody blamed the pollsters then.
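The arithmetic behind that comparison is worth spelling out: the national polling “miss” is the gap between the final poll average and the actual result. The figures below are the approximate numbers cited above:

```python
# Projected (final poll average) vs. actual national popular-vote lead.
poll_avg_2016, result_2016 = 3.3, 2.0  # Clinton's projected vs. actual lead
poll_avg_2012, result_2012 = 1.0, 4.0  # Obama's projected vs. actual lead

miss_2016 = abs(poll_avg_2016 - result_2016)  # about 1.3 points
miss_2012 = abs(poll_avg_2012 - result_2012)  # about 3.0 points
print(miss_2016 < miss_2012)  # True: 2016's national miss was smaller
```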
The more important point is that polls were only off in a handful of states. In the swing states of Arizona, Colorado, Florida, Virginia, New Hampshire, and North Carolina, the poll average was off by under 2 points; in most, it was nearly spot-on. Only in Nevada, Pennsylvania, and the midwestern states was there substantial error (and even in Nevada, which Clinton won, the error was in her favor).
There is a specific reason polls got the Midwest so wrong. Pennsylvania, Michigan, Ohio, Wisconsin, and, to a lesser extent, Minnesota all have one thing in common: they’re based around one or two major urban centers with high minority populations, surrounded by rural areas. The Democrats assumed high minority turnout in the cities and therefore thought they didn’t have to appeal much to the white rural voters in the rest of the state. The pollsters modeled minority turnout on the past two elections, when Barack Obama turned out minority voters at a level never before seen. That assumption fell flat, the Republicans won unprecedented vote shares in rural areas, and the misjudgment was enough to swing these states toward Donald Trump and win him the election.
The polling industry made one crucial error this time, and it will fail again every once in a while, but it remains the best system we have. As long as pollsters continue to improve their modeling, they will become more accurate, and misses like this will happen less and less often.
Which brings me to another point: my three steps for telling whether a poll is a good poll or just an unreliable outlier.
Step 1: Past Performance of the Pollster
To do this, you’ll have to do a little research. Look up how the pollster has performed in the past (FiveThirtyEight's Pollster Ratings work well for this). If the pollster is new and has never released much of anything, treat its numbers with some suspicion. And if it consistently publishes data that’s tilted toward one side, or that’s just plain inaccurate, you’ll want to take a closer look or, at the very least, account for its bias.
Step 2: Accuracy of Questions Asked
Sometimes, a pollster will get a crazy result by forgetting to name candidates in the survey, failing to account for third parties, or asking a specific set of questions biased toward one candidate or party. A quick glance at the poll’s PDF, which will list all the questions and some basic methodology, should let you know if this is the case—though, to be honest, it rarely is.
Step 3: Demographic Samples
Another way pollsters can get strange results is by asking the wrong people. An example I love is this poll of Delaware from the 2016 primary, in which the sample was 89% white, including 80% white among Democrats. The state’s population, however, is only 69% white, and the Democratic primary electorate, I’d assume, was an even lower percentage than that (although there is no definitive data). The poll, not surprisingly, produced some strange results that were later proven very wrong. Comparing the age, gender, and racial makeup of the poll’s sample with that of the state or the previous election provides ample material for fact-checking, although it’s worth noting that assuming the same demographics as past elections may be exactly how the polls messed up this time around.
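The demographic sanity check above can be sketched in a few lines: compare a poll’s sample shares with a known population baseline and flag any group that’s far off. The numbers mirror the Delaware example; the 10-point threshold is an arbitrary choice for illustration:

```python
# Compare a poll's sample demographics with a population baseline
# and flag groups whose share is off by more than a threshold.
sample = {"white": 0.89, "nonwhite": 0.11}      # poll's reported sample
population = {"white": 0.69, "nonwhite": 0.31}  # state's actual makeup

THRESHOLD = 0.10  # flag any group off by more than 10 points

for group in sample:
    gap = sample[group] - population[group]
    if abs(gap) > THRESHOLD:
        print(f"{group}: sample share off by {gap:+.0%}")
```

Here both groups get flagged: the sample is about 20 points too white, exactly the kind of mismatch that produced the strange Delaware numbers.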
If a given poll passes these three steps, it’s likely a legitimate shift and not just an outlier. But of course, viewing an aggregate of multiple polls (RealClearPolitics and HuffPost Pollster are the best-known and my personal favorites) is often more accurate than just one.
So, yes, the polls got it wrong. But polling is an evolving process, and the industry has every incentive to improve. Rather than abandon the polls, the best thing to do is trust pollsters to fix their mistakes and keep building stronger databases so they can predict 2018 and 2020 with greater accuracy.
Photo courtesy of andriano.cz/Shutterstock