National Polling Accurately Nails Popular Vote
Tuesday, November 29th, 2016
This was a complex election since Hillary Clinton won the popular vote and Donald Trump won the Electoral College. In terms of forecasting, being "right" means two different things, depending on whether you were estimating the former or the latter.
Many people most likely assume that any poll at the national level is forecasting the Electoral College outcome, which is actually not the case. National horse race polls predict the national horse race -- the popular vote. Given that in two of the last five elections the popular vote winner did not win the Electoral College (and the presidency), the distinction between national horse race polls and efforts to predict the Electoral College becomes more significant.
In terms of predicting the national popular vote outcome, the national polls did remarkably well in 2016. As was the case in 2012, the Democratic candidate's popular vote margin is growing as vote counting continues in the weeks after Election Day. As of this writing, Clinton is ahead of Trump by 1.5 percentage points (48.1% to 46.6%), representing the fact that she has received over 2 million more votes than Trump. The margin could grow to two points. Clinton will, therefore, win the popular vote by a larger margin than was the case for Al Gore over George W. Bush in 2000, Richard Nixon over Hubert Humphrey in 1968 and John F. Kennedy over Richard Nixon in 1960. Clinton will have won by a greater popular vote margin than two other candidates who won the popular vote but lost the Electoral College (Gore and Grover Cleveland in 1888), as well as five other candidates who won on both measures (Nixon, Kennedy, Cleveland in 1884, James Garfield in 1880 and James Polk in 1844).
The average "gap" estimate on the national popular vote as calculated by RealClearPolitics prior to the election was a Clinton lead of 3.3 points. This means the national popular vote estimate will end up being significantly closer to the actual result than was the case in 2012, and well within the margin of error. To come within two percentage points on the gap is a remarkable polling achievement and should be applauded.
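The arithmetic behind that comparison is simple enough to sketch. The figures below are the ones quoted in this piece; the function name and the sign convention (positive means the polls overstated Clinton's lead) are illustrative assumptions, not anyone's official method.

```python
# Figures quoted in the article, in percentage points (Clinton minus Trump).
final_poll_average_gap = 3.3  # RealClearPolitics average before the election
counted_vote_gap = 1.5        # 48.1% to 46.6% as of this writing
possible_final_gap = 2.0      # "the margin could grow to two points"


def gap_error(predicted_gap, actual_gap):
    """Signed error on the gap; positive means the polls overstated Clinton's lead."""
    return round(predicted_gap - actual_gap, 1)


error_now = gap_error(final_poll_average_gap, counted_vote_gap)
error_if_two = gap_error(final_poll_average_gap, possible_final_gap)

print(f"Error on the gap with the current count: {error_now} points")
print(f"Error on the gap if the margin reaches two points: {error_if_two} points")
```

Either way, the miss on the gap stays under two points, and it shrinks as the counted margin grows.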
But, given that the Electoral College determines the winner, state-level polls are what matters for those interested in projecting the outcome of a presidential election. And projecting the Electoral College outcome using polling essentially comes down to the accuracy of polls conducted in a handful of swing states. The outcome in the vast majority of other states is predetermined in all but wave elections (the last wave election was in 1984).
This creates a paradox since state polls in swing states, in my judgment, are less reliable than national polls. This occurs for several reasons. State polls typically have smaller sample sizes, have more variable quality depending on which organization conducts the poll, are estimating an outcome that can shift more readily because the population is smaller, are often conducted farther from Election Day and are more dependent on precision in estimates of turnout by geography.
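The sample-size point can be made concrete with the textbook margin-of-error formula for a single proportion. This is a minimal sketch that assumes simple random sampling and ignores weighting and design effects; the sample sizes of 1,500 and 500 are hypothetical, chosen only to contrast a larger national poll with a smaller state poll.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for one candidate's share.

    Assumes a simple random sample; real polls with weighting have wider margins.
    """
    return 100 * z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical sample sizes, for illustration only.
national_sample = 1500
state_sample = 500

print(f"n={national_sample}: +/- {margin_of_error(national_sample):.1f} points")
print(f"n={state_sample}: +/- {margin_of_error(state_sample):.1f} points")
```

Note that this is the margin on one candidate's share. The margin on the gap between two candidates is larger, approaching twice this value when the two shares sum to nearly 100%.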
This last point is critical. The voting choice of the population of most states can vary dramatically between big city/urban areas and outstate areas. Examples in swing states include Milwaukee and Madison versus outstate Wisconsin, Detroit and Ann Arbor versus outstate Michigan, and Philadelphia and Pittsburgh versus outstate Pennsylvania. Relatively small variations in the proportion of a state poll sample from the big city versus outstate in a swing state can shift the overall horse race values enough to move the winning margin from one candidate to the other.
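To see how sensitive a statewide estimate is to the geographic mix, consider a sketch with made-up numbers. Every value here is hypothetical: a state where the urban portion of the sample favors Clinton by 30 points and the outstate portion favors Trump by 20 points. Only the urban share of the sample changes between the two runs.

```python
def statewide_margin(urban_share, urban_margin, outstate_margin):
    """Blend two regional margins (Clinton minus Trump, in points) by sample share."""
    blended = urban_share * urban_margin + (1 - urban_share) * outstate_margin
    return round(blended, 1)


# Hypothetical regional margins, for illustration only.
URBAN_MARGIN = 30.0      # Clinton +30 in the big-city portion of the sample
OUTSTATE_MARGIN = -20.0  # Trump +20 in the outstate portion

for urban_share in (0.38, 0.42):
    margin = statewide_margin(urban_share, URBAN_MARGIN, OUTSTATE_MARGIN)
    leader = "Clinton" if margin > 0 else "Trump"
    print(f"Urban share {urban_share:.0%}: {leader} +{abs(margin)}")
```

Under these assumptions, moving the urban share of the sample by four percentage points is enough to flip the apparent leader.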
The state poll averages in the key states of Wisconsin, Michigan and Pennsylvania prior to the election pointed to a Clinton win in each state. Trump won each state (albeit apparently by a very narrow margin in Michigan). The final RealClearPolitics average in Pennsylvania was Clinton +1.9. Trump won by 1.2. From a statistical perspective, this difference between poll average and outcome is within the margin of error. In Michigan, the final polling average was Clinton +3.4, and Trump at this point has a very small 0.3-point lead. Wisconsin showed the biggest deviation, with a final poll average of Clinton +6.5; Trump won by 1.0. But the four polls used in the Wisconsin average by RealClearPolitics were completed on Oct. 27, Oct. 31, Nov. 1 and Nov. 2. The election was Nov. 8, meaning that the predictions in Wisconsin were based on data about a week or more old.
In Florida, a rich electoral vote prize state, the final RCP average was Trump +0.2. Trump won by 1.3. This was a quite accurate prediction. In North Carolina, the final RCP average was Trump +1.0. Trump won by 3.8. Again, within the margin of error.
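The state-by-state comparison above can be tabulated in a few lines. The margins are the ones quoted in this piece (as of this writing); the sign convention, with Clinton leads positive and Trump leads negative, is an assumption made for the sketch.

```python
# (final poll average, result as of this writing), in points; Clinton positive, Trump negative.
states = {
    "Pennsylvania": (1.9, -1.2),
    "Michigan": (3.4, -0.3),
    "Wisconsin": (6.5, -1.0),
    "Florida": (-0.2, -1.3),
    "North Carolina": (-1.0, -3.8),
}

# Positive error means the poll average overstated Clinton's position.
errors = {
    state: round(poll - result, 1)
    for state, (poll, result) in states.items()
}

# States where the poll average pointed to the wrong winner.
wrong_winner = [
    state for state, (poll, result) in states.items()
    if (poll > 0) != (result > 0)
]

for state, error in errors.items():
    print(f"{state}: poll average off by {error} points")
print("Wrong winner indicated in:", ", ".join(wrong_winner))
```

On these figures, the averages missed in the same direction in all five states, but only in Pennsylvania, Michigan and Wisconsin did the miss change the indicated winner.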
Exacerbating and amplifying the state poll limitations in this election cycle was the proliferation of so-called election forecasting operations. Springing from the unusual success of Nate Silver's FiveThirtyEight blog in The New York Times in the 2012 cycle, a number of copycats came forward to, in essence, do the same thing this year. Although they differed from one another in some ways, they all -- underneath the pseudo-scientific jargon -- were based on aggregates of state polls in swing states. (Outcomes in red and blue states were known entities.) As the handful of swing-state polls went, so went these forecasts. These operations essentially built a giant edifice on the foundation of often-limited swing state polls.
The whole enterprise would have been better if the forecasters used the state polls to focus more on developing scenarios under differing assumptions, rather than precise "probability" models. The "90% probability that Clinton will win" forecasts were much too certain, based on the evidence on which they were built.
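One way to picture a scenario-based approach is to ask how the state leaders change if the polls are uniformly off by a given amount. The sketch below is only an illustration of that idea, not a reconstruction of any forecaster's model: it uses the final poll averages quoted in this piece, and the shift values are arbitrary assumptions.

```python
# Final poll averages quoted in the article, in points (Clinton positive, Trump negative).
poll_averages = {
    "Pennsylvania": 1.9,
    "Michigan": 3.4,
    "Wisconsin": 6.5,
    "Florida": -0.2,
    "North Carolina": -1.0,
}


def leaders_under_shift(averages, shift_toward_trump):
    """Return the indicated leader in each state if every poll overstated Clinton by the shift."""
    return {
        state: "Clinton" if average - shift_toward_trump > 0 else "Trump"
        for state, average in averages.items()
    }


# Arbitrary scenario values, for illustration only.
for shift in (0.0, 2.0, 4.0, 8.0):
    leaders = leaders_under_shift(poll_averages, shift)
    clinton_states = [state for state, leader in leaders.items() if leader == "Clinton"]
    print(f"Polls off by {shift} points: Clinton leads in {clinton_states or 'none'}")
```

Laid out this way, the forecast reads as a set of conditions ("Clinton holds Pennsylvania only if the polls are off by less than about two points") rather than as a single probability.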
To the degree that organizations want to predict the Electoral College, they are going to have to find ways to finance or encourage larger-sample, higher-quality state polls, rather than relying on the haphazard polls that happen to be conducted in the various states.
At the national level, however, the evidence clearly shows that the polls were, in fact, accurate -- confirming that even with lower response rates and other challenges, national polls using landlines and cellphones are able to project to the national population. This confirms other evidence showing that national polling continues to be accurate in terms of estimating population parameters -- including the percentage without health insurance and employment statistics.
I continue to believe there is too much emphasis on trying to predict the winner of an election, although I recognize that it is human nature to want to know which horse is ahead and is most likely to win while the race is still occurring. All of the immense time, money and effort put into forecasting before the election has little lasting impact on, or value for, policy or the direction of the democracy once we know who won.