Patrick J. Borchers is a contributing writer on Leavenworth St. You can read more from him at his blog, The Way I See It.
This is driving me batty with regard to polls and margins of error. I know that a lot of people don’t like math, but you’d think that self-professed political reporters would take some time to educate themselves on the topic.
Polling involves a universe of roughly binary samples. Are you going to vote for Trump? Yes or no are the two options (we’ll put those who are undecided to the side for the sake of simplicity).
Are you going to vote for Clinton? Again, yes or no. Binary. Two choices.
Based upon the sample size one can calculate a “confidence interval” for each. The most commonly used one is 95%, meaning that one can say with 95% certainty that the true value is between the two ends of the confidence interval.
This requires the calculation of a standard deviation. Go one standard deviation in each direction from the observed value and you can say the true value lies within that range with 68% confidence; two standard deviations gets you to about 95%, and three to about 99.7%.
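For those who do want to see the math, here is a minimal sketch of that calculation in Python. It assumes a simple random sample and a strictly yes/no question, and it uses the textbook multiplier of 1.96 standard deviations for 95% (the "two standard deviations" above is a rounding of that). The function name and the 400-voter poll are made up for illustration.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """Half-width of the confidence interval for one candidate's share.

    share       -- observed fraction saying "yes" (0.49 for 49%)
    sample_size -- number of voters polled
    z           -- standard deviations to go out (1.96 for about 95%)
    """
    # Standard deviation of an observed yes/no proportion.
    standard_deviation = math.sqrt(share * (1.0 - share) / sample_size)
    return z * standard_deviation

# Hypothetical poll: 400 voters, 49% say yes to a candidate.
moe = margin_of_error(0.49, 400)
print(f"49% plus or minus {100 * moe:.1f} points")
```

With those made-up inputs the margin comes out a little under five points, which is in line with what polls of that size typically report.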
The size of the confidence interval goes as the inverse of the square root of the number of samples. So polling 200 voters instead of 100 voters will shrink the confidence interval by about 1.4 times. This is the reason that the number sampled often seems low. National polls of the presidential race might only involve 400 or 500 likely voters. Polling 5,000 instead of 500 would shrink the confidence intervals, but not by 10 times (more like just over 3 times), and it would cost a lot more.
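The square root arithmetic is easy to check. This small Python sketch (the function name is my own) computes how many times narrower the interval gets when the sample grows, assuming everything else about the poll stays the same.

```python
import math

def shrink_factor(old_sample_size, new_sample_size):
    """How many times narrower the confidence interval becomes."""
    # Interval width is proportional to 1 / sqrt(sample size),
    # so the ratio of widths is sqrt(new / old).
    return math.sqrt(new_sample_size / old_sample_size)

doubling = shrink_factor(100, 200)    # 100 voters -> 200 voters
tenfold = shrink_factor(500, 5000)    # 500 voters -> 5,000 voters

print(f"Doubling the sample: {doubling:.2f} times narrower")
print(f"Ten times the sample: {tenfold:.2f} times narrower")
```

Doubling the sample buys a factor of about 1.41, and ten times the sample buys only about 3.16.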
If you don’t want to do the math, here’s one of many calculators available on the web.
Of course, pollsters have to avoid “sample bias.” Polling only Democrats or only Republicans in the presidential race would introduce massive sample bias. There’s also confirmation bias — if you already like a particular candidate the chances are vastly greater that you believe your candidate won the debate.
Pollsters have different ways of trying to account for bias and they don’t all agree. The LA Times/USC tracking polls have consistently shown Trump with a narrow lead while Reuters, Fox and the others show Clinton with a narrow lead. They must have different sampling methodologies, because the gap has persisted so long (well over a month) that it’s extremely unlikely to be a random occurrence.
Let’s use a real life example. NE-2 is getting some attention because of Nebraska’s district method for allocating electoral votes. Emerson ran a poll last week that showed Trump up 49-40 in the second district. Yesterday the Omaha World-Herald offered the opinion (it’s not a fact, as I’m about to discuss) that this was a “tie” because it was within the margin of error.
Without having seen all the detail (cross-tabs and such) my guess is that the so-called margin of error was plus or minus 5% on each, so Clinton could be as high as 45 and Trump as low as 44. So it’s tied, right?
Hardly. The distribution for each candidate is the bell curve with which most are familiar. It peaks in the middle (49 for Trump and 40 for Clinton) with what are called “tails” on each side. So the low side tail of Trump would just barely touch the high side tail for Clinton.
Assuming I’m right about confidence intervals, the chances of Clinton being ahead or tied are very low and there’s a better chance that Trump is actually up by over 15 points.
My point is that if a poll shows one candidate up 49-47 — and the reported margin of error is 4% — having that lead is not insignificant. The 49-47 result is the most probable value. It’s just that (in this very close scenario) there’s a reasonable chance the candidates may be tied or the one behind may actually be ahead. But there’s a much better chance that the candidate at 49 is actually ahead.
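Here is a rough Python sketch of both examples. It rests on several assumptions of mine, not on anything the pollsters published: the sample size is backed out of the reported margin of error as if it were the worst case (a 50% share at 95% confidence), the poll is a simple random sample, the bell curve is a good approximation, and the two candidates' shares come from the same respondents, so the standard deviation of the lead uses the usual formula for a difference of two shares from one sample. All function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Area under the standard bell curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sample_size_from_moe(moe, z=1.96):
    """Back out a sample size, assuming the reported margin of error
    is the worst case (a 50% share) at roughly 95% confidence."""
    return (z / moe) ** 2 * 0.25

def prob_lead_exceeds(leader, trailer, sample_size, threshold=0.0):
    """Chance the true lead is bigger than `threshold`."""
    lead = leader - trailer
    # Standard deviation of the difference between two shares
    # measured on the same respondents.
    lead_sd = math.sqrt((leader + trailer - lead ** 2) / sample_size)
    return 1.0 - normal_cdf((threshold - lead) / lead_sd)

# NE-2 example: Trump 49, Clinton 40, guessed margin of error of 5%.
n_ne2 = sample_size_from_moe(0.05)
clinton_ahead_or_tied = 1.0 - prob_lead_exceeds(0.49, 0.40, n_ne2)
trump_up_over_15 = prob_lead_exceeds(0.49, 0.40, n_ne2, threshold=0.15)

# Close-race example: 49-47 with a reported margin of error of 4%.
n_close = sample_size_from_moe(0.04)
leader_really_ahead = prob_lead_exceeds(0.49, 0.47, n_close)

print(f"NE-2, Clinton ahead or tied: {clinton_ahead_or_tied:.1%}")
print(f"NE-2, Trump up by over 15:   {trump_up_over_15:.1%}")
print(f"49-47, leader really ahead:  {leader_really_ahead:.1%}")
```

Under those assumptions the NE-2 numbers come out to roughly a 3% chance that Clinton is ahead or tied against roughly a 10% chance that Trump is up by more than 15, and in the 49-47 race the candidate at 49 is ahead about 69% of the time. Different assumptions will move the numbers, but not the conclusion.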
The pollsters, of course, have more sophisticated algorithms to account for the fact that the choices are not completely binary – undecided voters and those expressing a preference for Johnson or Stein muddy the waters a bit. Nor are the choices truly independent because a voter who says “yes” on Trump likely won’t say “yes” on Clinton too.
A fun game to play is to guess where Johnson’s and Stein’s voters will go when Johnson inevitably drops down to three or four percent and Stein to under one percent, because voters become more pragmatic as election day draws near and they realize that only Clinton or Trump will win. The pollsters seem to recognize this, as so many polls report the race with just Clinton and Trump and then with Clinton-Trump-Johnson-Stein.
Thus far the margins between Clinton and Trump stay about the same; it’s just that their totals go up. So, to take a couple of examples from last week: Reuters had it (Clinton-Trump-Johnson-Stein and then Clinton-Trump) 42-38-7-2 and 44-38 (so Clinton plus four and plus six) and PPP had it 44-40-6-1 and 49-45 (so Clinton plus four in both).
Of course, polls can be “wrong” because people can (and do) change their minds driving to the polling place, because of sample bias, or what have you. But in the last race, Real Clear Politics’ aggregation of the polls had Romney with a lead of less than a percentage point with about a week to go. Then the tide turned in the last week (I think Obama was helped by Superstorm Sandy, but pick your own theory), and on election eve the aggregation of polls showed Obama with a three point lead, which was on the money.
The point is that a three point lead meant something, no matter the blather about the margin of error.