Looking Beyond Polling Numbers

The 2020 election is a few days away, and I’m having fun with my friends guessing who will win which states. Here’s my final guess:

Created using 270towin.com

But people are asking me how I arrived at this guess. Did I just look at polls? Do I trust polls? What other sources did I look at?

So here’s my research method for guessing who will win the White House.

“Do you trust the polls?”
No. Of course you shouldn’t “trust” polls. Polling is a game statisticians play because once every 4 years people finally listen to them. But for most of us it’s entertainment, like watching football scores, not an actionable analysis. Unless you’re a campaign manager, why would you need to trust polls?

I think people look at polls to soothe anxiety or reduce uncertainty. But seeing things like the 2016 polls predicting Clinton’s win in Pennsylvania doesn’t exactly inspire confidence:

2016 polling results from Pennsylvania. 24 out of 26 polls conducted in the final month predicted a win for Clinton, with all but 3 stating their margin of error didn’t allow for Trump to win the state. But Trump won.

Polls in 2016 were so overwhelmingly persuasive that Hillary, convinced she would easily win, didn’t even write a concession speech. Clinton lost Pennsylvania, and with it her return to Pennsylvania Ave.

So how can polls get it so wrong? And if polls aren’t right, are they useful?

The “likely voter” problem
All polls are based on guessing who will actually show up and vote on Election Day. Most people have an opinion, but not everyone votes. For example, in 2016 only about 55% of the voting-age population in the US cast a ballot. If you’re trying to predict an election, you need a sample that reflects actual votes cast, not just average opinions.

Guessing who will show up on Election Day is a mess, and pollsters are terrible at it. Divergent opinions on who constitutes a “likely voter” are the biggest reason why polls disagree. A few pollsters publish their methodology (which is a positively thrilling read, believe you me), but it amounts to either talking to everyone and then guessing which people to ignore, or else guessing which people to ignore and then talking to everyone else.

This is how you get different polls, using the same sampling methods, publishing completely irreconcilable results. Polling results in tight races may tell you more about what pollsters believe than they do about who will win.
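To see how much the likely-voter screen alone can matter, here is a toy sketch. The respondents are made up for illustration, and the rule "count only people who rate their chance of voting at or above a cutoff" is a hypothetical simplification, not any real pollster’s method. The same sample produces opposite winners depending on where the cutoff sits.

```python
from collections import Counter

# Made-up toy sample: (candidate, self-reported likelihood of voting on a 1-10 scale).
SAMPLE = [
    ("A", 10), ("A", 9), ("A", 9), ("A", 6), ("A", 5), ("A", 4),
    ("B", 10), ("B", 10), ("B", 10), ("B", 9), ("B", 8),
]

def lead(sample, cutoff):
    """Candidate A's lead over B, in percentage points, counting only
    respondents whose self-reported likelihood is at least `cutoff`."""
    likely = [cand for cand, score in sample if score >= cutoff]
    if not likely:
        raise ValueError("no respondents pass the likely-voter screen")
    counts = Counter(likely)
    return 100 * (counts["A"] - counts["B"]) / len(likely)

# Same respondents, three different "likely voter" definitions.
for cutoff in (1, 7, 9):
    print(f"cutoff {cutoff}: A lead = {lead(SAMPLE, cutoff):+.1f} points")
```

Counting everyone gives candidate A a lead of about 9 points; a strict screen hands candidate B a double-digit lead. Nothing about the sampling changed, only the guess about who will show up.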

This is why 2016 polls incorrectly predicted a Clinton win. Pollsters made guesses about voter turnout, based on the 2008 and 2012 elections, which didn’t come true. More than 3 million fewer Democratic votes were cast in 2016 than in 2008, and Democratic turnout fell from 2012 in key states like Wisconsin and Michigan. That tipped the election.

How I read (and ignore) polls
Polls can be wrong, but pollsters are trying to be right. Their results aren’t perfect, but they are informative. For my own prediction, I used poll results to divide states into two categories:

  1. States which aren’t competitive

Polling results in states like Utah (my home state), Oklahoma, and California show an overwhelming lead for one candidate or another. Surprises happen, but the chances of Trump winning California or Biden winning Utah are dismal.

Utah polling results 2020. Trump is so far ahead that few people are willing to spend money on further polling.

Polls are useful in making a prediction, because they can narrow down your list of competitive states. Analyses like those done by The Economist illustrate the point:

The Economist, “Forecasting the US Elections — 2020”

Those polls are (almost definitely) right. Trump is going to win in Wyoming and West Virginia. Biden will take California, New York, and Hawaii. Don’t let anyone tell you different.

  2. States which are competitive

Polls give us a narrow list of a few states we are NOT certain about. Even if all polls in the past month show Biden winning in Pennsylvania, there are two problems:

  1. We don’t know the polls are right, and

  2. Pennsylvania could be the state that tips the election.

So we immediately check off all the states that are pretty certain, and make a short list of states that are both uncertain and might tip the election.

After ignoring the polls, what next?
By the time I start getting curious about something, other smart people have almost always answered my questions. Never do all the research and analysis yourself.

In this case, once again, The Economist already created a list of uncertain states, sorted by the chance of the state determining the election:

States sorted by the chance of impacting the final election results
Probable election results based on simulations and analysis by The Economist.

Biden leads in all of the top 10 tipping point states. Even if Trump does win Ohio, Georgia, Iowa, North Carolina, Arizona, Florida, and Nevada, he would still need to pick up Wisconsin, Michigan, or Pennsylvania in order to win.
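The electoral math behind that claim is easy to check. A minimal sketch, assuming every state and district not mentioned votes exactly as it did in 2016: start from the 306 electors pledged to Trump in 2016, take away the three “blue wall” states, and add Nevada (which Clinton won). Ohio, Georgia, Iowa, North Carolina, Arizona, and Florida are already inside the 306 because Trump carried them in 2016.

```python
# Electors pledged to Trump after the 2016 election (before two faithless electors).
TRUMP_2016_PLEDGED = 306
NEEDED = 270

# 2020 electoral votes (2010 census apportionment).
BLUE_WALL = {"Wisconsin": 10, "Michigan": 16, "Pennsylvania": 20}
NEVADA = 6  # won by Clinton in 2016

# Assumption: Trump holds every other 2016 state and district (including
# OH, GA, IA, NC, AZ, FL), flips Nevada, and loses all three blue-wall states.
base = TRUMP_2016_PLEDGED - sum(BLUE_WALL.values()) + NEVADA
print(f"Without WI/MI/PA: {base} electoral votes (needs {NEEDED})")

# Adding back any single blue-wall state.
for state, ev in BLUE_WALL.items():
    total = base + ev
    status = "wins" if total >= NEEDED else "falls short"
    print(f"  + {state}: {total} -> {status}")
```

Under these assumptions Trump lands at 266, four short, and any one of the three states puts him over 270. That is why those three states carry most of the prediction.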

So I don’t need to do deep research on every state to make my prediction. I only need to research 3–5 states to guess who will win, and 5–10 states if I’m feeling competitive about guessing the exact spread of electoral votes.

In other words, if I can accurately predict Wisconsin, Michigan, and Pennsylvania, my chances of accurately predicting a winner are probably better than 90%.

Predicting likely voters in Wisconsin
Current polls in Wisconsin put Biden somewhere between 1 and 17 points ahead, with an average lead of about 6 points. That is almost identical to 2016 polling, where Clinton was 6 points ahead but lost by 0.77%.

So Trump will win Wisconsin again, right?
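That question quietly assumes 2020 polls will miss in the same direction and by the same amount as 2016 polls did. A back-of-the-envelope sketch of that naive reasoning, using only the rounded figures from the previous paragraph (positive numbers mean a Democratic lead):

```python
# Figures as quoted in the text, in percentage points (positive = Democratic lead).
poll_lead_2016 = 6.0   # average 2016 Wisconsin polling lead for Clinton
result_2016 = -0.77    # Clinton actually lost by 0.77 points
poll_lead_2020 = 6.0   # average 2020 Wisconsin polling lead for Biden

# How far the 2016 polls overstated the Democratic margin.
polling_miss = poll_lead_2016 - result_2016

# Naive assumption: the 2020 polls are off by exactly the same amount.
naive_2020 = poll_lead_2020 - polling_miss

print(f"2016 polling miss: {polling_miss:.2f} points")
print(f"2020 margin if polls miss identically: {naive_2020:+.2f} points")
```

If the miss repeats exactly, Biden loses by the same 0.77 points Clinton did. The rest of this section is about why the miss may not repeat.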

Trump won Wisconsin in 2016 by ~22,000 votes, with nearly 3 million votes cast (+0.77%). But in 2012, Mitt Romney got about the same number of votes (1.4 million) and lost the state by 213,000 votes. In 2008 McCain did worse, losing by 400,000 votes. (Wikipedia)

The difference between Trump winning in Wisconsin and Romney losing wasn’t about a brilliant campaign by Trump or rallying record numbers of Republicans to the polls. Hillary simply failed to attract 200,000 voters who had voted for Obama. Pollsters adjusted their “likely voter” guesses at least partly based on turnout for the Obama elections, and their guesses were wrong.
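The turnout argument can be checked with simple arithmetic. This sketch uses only the rounded Wisconsin figures quoted above, treats Romney’s and Trump’s totals as identical at 1.4 million, and ignores third-party votes, so the result is approximate.

```python
# Rounded Wisconsin figures quoted in the text.
REPUBLICAN_VOTES = 1_400_000  # roughly the same for Romney (2012) and Trump (2016)
margin_2012 = -213_000        # Republican margin: Romney lost by ~213,000
margin_2016 = 22_000          # Republican margin: Trump won by ~22,000

# Democratic vote = Republican vote minus the Republican margin.
dem_2012 = REPUBLICAN_VOTES - margin_2012  # Obama 2012
dem_2016 = REPUBLICAN_VOTES - margin_2016  # Clinton 2016

missing_dem_voters = dem_2012 - dem_2016
print(f"Obama 2012:   ~{dem_2012:,}")
print(f"Clinton 2016: ~{dem_2016:,}")
print(f"Drop in Democratic votes: ~{missing_dem_voters:,}")
```

With the Republican vote held flat, the entire swing comes from the Democratic side: roughly 235,000 fewer Democratic votes, consistent with the “200,000 voters” figure.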

That tells me Wisconsin will be decided mostly by Democratic turnout. If I can find data to help me predict sentiment among Wisconsin Democrats, I can make a reasonable prediction.

Searching Wisconsin’s official elections page, I learned that the state doesn’t keep records of voter registration by age, party, race, or gender. Early voting shows intense interest, with more than 1.5 million ballots already cast, but that isn’t definitive. But if we look back at the Wisconsin Democratic primary, 1.55 million people voted, which is a huge turnout. If that many people had voted for Hillary in 2016, she would have carried the state.

The key indicator for me is that, because of early voting, a growing number of the people being polled have already voted. Pollsters who pick likely voters with the “how likely are you to vote” method are therefore picking up more and more people who have already cast a ballot. So when polls show Biden ahead in Wisconsin with a growing lead, that reflects, at least partly, a strong turnout that is already being counted.

So it looks like Wisconsin Democrats got the message: If they don’t want 4 more years of Trump, they need to get off Facebook and vote. And they’re showing up to the polls in force.

So I’m going to guess Wisconsin for Biden, consistent with the polls.

So who is likely to vote in 2020?

Wow. Do I even want to ask about Pennsylvania?
Pennsylvania is a different, and interesting, story. Well, I’m assuming that if you’ve made it this far, you’ll find it interesting.

PA has been a battleground state for a long time, but Democrats consistently won by a small margin for nearly 30 years.

Unlike in Wisconsin, where Democratic voters stayed home, Trump won PA because 400,000 more people voted in 2016 than in 2012, while Hillary got a similar number of votes to Obama. Of all those extra votes, about 100,000 went to Gary Johnson (Libertarian), and Trump won by only 44,000 votes. (Wikipedia)

I found this lovely little report which gave me PA voting results by county for more elections than we care about, which means we can analyze how Trump got support in 2016.

Tada! Everyone has access to this data, but few people look deeply enough to glean insights from it. (And yes I know there’s covariance between population loss and voter loss. But the results still hold, and this isn’t a statistics lecture. So I’m not going down that rabbit hole.)

When you look at voter changes between 2008 and 2016, you find two key trends:

  1. The vote was decided by changes in rural counties, not cities

This is fascinating. This data tells me Trump’s message connected with the (very real) plight of declining rural communities in PA, and Hillary’s did not. Trump picked up about half of the Democratic votes lost in rural Pennsylvania, but a whole lot of other people either moved away or simply didn’t vote.
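The county comparison boils down to grouping vote changes by area type. Here is a minimal sketch, assuming the county results have been typed into a list of records. The field names (`area`, `dem_2008`, `dem_2016`, `rep_2008`, `rep_2016`) are hypothetical, and the rural/urban label is something you would assign to each county yourself.

```python
from collections import defaultdict

def vote_change_by_area(counties, party):
    """Sum the change in a party's votes from 2008 to 2016, grouped by area.

    `counties` is an iterable of dicts with the hypothetical keys "area"
    (e.g. "rural" or "urban") plus "<party>_2008" and "<party>_2016",
    where `party` is a prefix such as "dem" or "rep".
    Returns {area: total change}; a negative total means fewer votes in 2016.
    """
    totals = defaultdict(int)
    for county in counties:
        change = county[f"{party}_2016"] - county[f"{party}_2008"]
        totals[county["area"]] += change
    return dict(totals)
```

Calling it once with `"dem"` and once with `"rep"` shows whether a drop in Democratic votes in rural counties was matched by Republican gains, or whether those votes simply vanished.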

At this point it would be tempting to look at economic numbers, see how things have changed for people in rural Pennsylvania, compare what Trump and Biden have said that might make them happy or mad, but I think that’s all hogwash. Nobody checks economic numbers before going to the polls. And while Biden or Trump might have said something, most people live in echo chambers so perfectly soundproofed that one idle comment by either of them is unlikely to sway the state.

Wait, what? That same PA government website will give me detailed voter registration statistics by county, including telling me how many people in each county changed from which party to what and is current as of Monday? Be still my heart.

Donald still resonates with rural PA, but he is hosed. PA has 500,000 more active registered Democrats than active Republicans: 47% of registered voters are Democrats, compared to 39% Republicans. Republicans have gained ~25,000 more new voters than Democrats, but that’s a drop in the bucket compared to 500,000 voters. All Biden needs to do is get Democrats in Pennsylvania off the couch, off social media, and out to the polls. In reality, Biden doesn’t even need to do that, because the…strong…feelings Donald has engendered in even the most reluctant voters will be all the push most of the usual “unlikely voter” Democrats will need to get them to cast a ballot.

His rural blitz strategy in 2016 paid off with a round-trip ticket to the Oval Office, but it only worked because it was unexpected. Democrats have the voters to win, and they know they need to win. The polls agree, and realistically some of those polls will be skewed by voter participation in previous years (i.e., 2016). So while polls may show a slim lead, realistically The Donald could lose PA by as much as 10%.

So I’m going to guess Biden takes PA.

Holy crap, do I even want to know more?
No, you really don’t. But here are even more states and even more avenues of research:

Georgia goes to Biden based on racial demographics. Georgia is expected to have a much higher turnout than usual, so high in fact that it can (probably) only happen if minority groups (which typically have lower voter participation and tend to vote Democrat) show up to the polls by the boatload. And we’re seeing that happen.

North Carolina has a huge population of “unaffiliated” voters, who are the game changers. But surprise: a large share of them are transplants from New England states. NC also has a large population of newly registered voters, many of whom are young and educated, which is also not a good sign for Trump. You can see someone else’s demographics analysis here. Biden takes North Carolina…probably.

Michigan goes to Biden. It’s a repeat of the Wisconsin story, where Trump eked out a win by 11,000 votes — with hundreds of thousands of missing voters. Like with PA, the surprise works once, but not twice. (Wikipedia)

For Florida, I guessed Trump. The state (probably) doesn’t matter, given Biden’s lead in so many other states. A friend of mine shared an article that I can’t find, so here’s a cool meme that’s about as valid as most of the evidence circulating on the Internet.

Iowa and New Hampshire don’t matter (they literally don’t) unless you’re going for points. I split the difference.

I’m just dying to see more research. What about the popular vote?
Looking at the aggregated data gathered by The Economist, I noticed that Trump has lost significant ground even in the easy Republican states he won in 2016. I’m guessing (remember, this is all just for fun) this indicates fewer total votes for him in all states, which is also a bad sign. In contrast, the “Never Trump” Democrats are fired up to vote, no matter where they are.

So I made a very scientific (pulled out of thin air) guess that Biden will take 56% of the vote.

Trump has lost significant ground in 2020 polls compared to his performance in 2016

Lastly, a word about polls, fear, and anxiety
Realistically, if you want to know who’s going to win, wait until votes are tallied on Nov. 3rd. Don’t waste energy being anxious about something you can’t know yet and can’t change. Unless you enjoy it, you don’t need to guess. It’s perfectly okay to just be informed and cast your vote.

I do enjoy this game, and I plan to enjoy the tacos my friend promised me for winning. So long as I’m doing the research, I’m happy to share.

