*Patrick J. Borchers is a contributing writer on Leavenworth St. You can read more from him at his blog, The Way I See It.*

This is driving me batty with regard to polls and margins of error. I know that a lot of people don’t like math, but you’d think that self-professed political reporters would take some time to educate themselves on the topic.

Polling involves a universe of roughly binary samples. Are you going to vote for Trump? Yes or no are the two options (we’ll put those who are undecided to the side for the sake of simplicity).

Are you going to vote for Clinton? Again, yes or no. Binary. Two choices.

Based upon the sample size one can calculate a “confidence interval” for each. The most commonly used one is 95%, meaning that one can say with 95% certainty that the true value is between the two ends of the confidence interval.

This requires the calculation of a standard deviation. Go one standard deviation in each direction from the observed value and you can say the true value lies within this range to 68% confidence, two standard deviations 95%, three 99.7%.
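For anyone who wants to check those percentages, here is a minimal sketch. It assumes the sampling error follows a normal (bell curve) distribution, which is the usual approximation for polls of a few hundred people or more. The function name `coverage` is just illustrative.

```python
import math

def coverage(k: float) -> float:
    """Probability that a normally distributed value lands within
    k standard deviations of its mean (assumes a normal distribution)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k} standard deviation(s): {coverage(k):.1%}")

# The familiar 95% figure corresponds to about 1.96 standard deviations.
print(f"1.96 standard deviations: {coverage(1.96):.1%}")
```

The "two standard deviations is 95%" rule is a rounding: two full standard deviations cover slightly more than 95%, and pollsters generally use 1.96.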

The size of the confidence interval shrinks in proportion to the square root of the number of samples. So polling 200 voters instead of 100 voters will shrink the confidence interval by about 1.4 times. This is the reason that the number sampled often seems low. National polls of the presidential race might only involve 400 or 500 likely voters. Polling 5,000 would shrink the confidence intervals, though not by 10 times (more like just over 3 times), and it would cost a lot more.
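The square root scaling is easy to see with the textbook formula for a simple random sample. This is only a sketch: real pollsters weight their samples, which changes the numbers somewhat. The worst case share of 50% and the function name `margin_of_error` are my assumptions for illustration.

```python
import math

Z_95 = 1.96  # normal multiplier for a 95% confidence interval

def margin_of_error(n: int, p: float = 0.5, z: float = Z_95) -> float:
    """Half-width of the confidence interval for a share p observed in a
    simple random sample of n people (textbook formula, no weighting)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 200, 500, 5000):
    print(f"n = {n:>5}: +/- {margin_of_error(n):.1%}")

# Doubling the sample shrinks the interval by sqrt(2), not by 2.
print(margin_of_error(100) / margin_of_error(200))
# Ten times the sample shrinks it by sqrt(10), a bit over 3.
print(margin_of_error(500) / margin_of_error(5000))
```

Under these assumptions the margin runs from about 9.8 points at 100 respondents to about 4.4 points at 500 and about 1.4 points at 5,000.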

If you don’t want to do the math, here’s one of many calculators available on the web.

Of course, pollsters have to avoid “sample bias.” Polling only Democrats or only Republicans in the presidential race would introduce massive sample bias. There’s also confirmation bias — if you already like a particular candidate the chances are vastly greater that you believe your candidate won the debate.

Pollsters have different ways of trying to account for bias and they don’t all agree. The LA Times/USC tracking polls have consistently shown Trump with a narrow lead while Reuters, Fox and the others show Clinton with a narrow lead. They must have different sampling methodologies because the gap has persisted so long (well over a month) that it’s extremely unlikely to be a random occurrence.

Let’s use a real life example. NE-2 is getting some attention because of Nebraska’s district method for allocating electoral votes. Emerson ran a poll last week that showed Trump up 49-40 in the second district. Yesterday the Omaha World-Herald offered the opinion (it’s not a fact, as I’m about to discuss) that this was a “tie” because it was within the margin of error.

Without having seen all the detail (cross-tabs and such) my guess is that the so-called margin of error was plus or minus 5% on each, so Clinton could be as high as 45 and Trump as low as 44. So it’s tied, right?

Hardly. The distribution for each candidate is the bell curve with which most are familiar. It peaks in the middle (49 for Trump and 40 for Clinton) with what are called “tails” on each side. So the low side tail of Trump would just barely touch the high side tail for Clinton.

Assuming I’m right about confidence intervals, the chances of Clinton being ahead or tied are very low and there’s a better chance that Trump is actually up by over 15 points.
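Here is a rough way to put numbers on that claim. It is a sketch under several assumptions, not the pollster's method. The sample size of 384 is hypothetical, chosen because it gives roughly plus or minus 5 points at 95%. The sample is treated as a simple random one. The lead is treated as normally distributed, with both shares coming from the same set of respondents.

```python
import math

def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def lead_standard_error(p_a: float, p_b: float, n: int) -> float:
    """Standard error of the lead (p_a - p_b) when both shares are
    measured on the same simple random sample of n respondents."""
    lead = p_a - p_b
    return math.sqrt((p_a + p_b - lead ** 2) / n)

N_ASSUMED = 384  # hypothetical sample size implied by a +/-5 point margin
trump, clinton = 0.49, 0.40
lead = trump - clinton
se = lead_standard_error(trump, clinton, N_ASSUMED)

# Chance the true lead is zero or negative (Clinton tied or ahead).
p_clinton_ahead_or_tied = normal_cdf((0.0 - lead) / se)
# Chance the true lead is bigger than 15 points.
p_trump_up_over_15 = 1 - normal_cdf((0.15 - lead) / se)

print(f"Clinton tied or ahead: {p_clinton_ahead_or_tied:.1%}")
print(f"Trump up by over 15:   {p_trump_up_over_15:.1%}")
```

With these assumed inputs the first figure comes out at roughly 3% and the second at roughly 10%, which is the point: calling this a "tie" ignores where most of the probability sits.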

My point is that if a poll shows one candidate up 49-47 — and the reported margin of error is 4% — having that lead is not insignificant. The 49-47 result is the most probable value. It’s just that (in this very close scenario) there’s a reasonable chance the candidates may be tied or the one behind may actually be ahead. But there’s a much better chance that the candidate at 49 is actually ahead.
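A quick simulation makes the same point for the close case. This sketch takes 49-47 as the true split, with the remaining 4% going elsewhere, and asks how often a fresh poll would still show the 49% candidate strictly ahead. The sample size of 600 is an assumption, picked because it corresponds to roughly a 4 point margin of error. The function name and the fixed seed are illustrative.

```python
import random

def simulate_leader_share(p_a: float, p_b: float, n: int,
                          trials: int, seed: int = 0) -> float:
    """Fraction of simulated polls of n respondents in which candidate A
    finishes strictly ahead of candidate B, given true shares p_a and p_b."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    leader_ahead = 0
    for _ in range(trials):
        a = b = 0
        for _ in range(n):
            u = rng.random()
            if u < p_a:
                a += 1
            elif u < p_a + p_b:
                b += 1
            # anything else is undecided or a third-party choice
        if a > b:
            leader_ahead += 1
    return leader_ahead / trials

N_ASSUMED = 600  # hypothetical sample size for a roughly +/-4 point margin
share = simulate_leader_share(0.49, 0.47, N_ASSUMED, trials=2000)
print(f"Leader ahead in {share:.0%} of simulated polls")
```

Under these assumptions the leader comes out ahead somewhere around two times in three. That is far from a sure thing, but it is not a coin flip either.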

The pollsters, of course, have more sophisticated algorithms to account for the fact that the choices are not completely binary – undecided voters and those expressing a preference for Johnson or Stein muddy the waters a bit. Nor are the choices truly independent because a voter who says “yes” on Trump likely won’t say “yes” on Clinton too.

A fun game to play is to guess where Johnson’s and Stein’s voters will go when he inevitably drops down to three or four percent and she to under one percent, because voters become more pragmatic as election day draws near and they realize that only Clinton or Trump will win. The pollsters seem to recognize this as so many polls report the race with just Clinton and Trump and then Clinton-Trump-Johnson-Stein.

Thus far the margins between Clinton and Trump stay about the same; it’s just that their totals go up. So, to take a couple of examples from last week: Reuters had it (Clinton-Trump-Johnson-Stein and then Clinton-Trump) 42-38-7-2 and 44-38 (so Clinton plus four and plus six) and PPP had it 44-40-6-1 and 49-45 (so Clinton plus four in both).

Of course, polls can be “wrong” because people can (and do) change their minds driving to the polling place, because of sample bias, or what have you. But in the last race, Real Clear Politics’ aggregation of the polls had Romney with a lead of less than a percent with about a week to go. Then the tide turned in the last week (I think Obama was helped by Super Storm Sandy, but pick your own theory) and on election eve the aggregation of polls showed Obama with a three point lead, which was on the money.

The point is that a three point lead meant something, no matter the blather about the margin of error.

Nice piece. A couple more points: it makes a difference if the choice is binary and forced (e.g. you must be for either Trump or Clinton). In that case the sample is fully covariant; if Clinton is sampled at 3 points above the actual average, then Trump MUST be 3 points below his average. So in setting confidence limits, you should use only the sampling error for a single candidate.

Other polls are covariant or not depending on the number of candidates and how they’re distributed. This makes it very hard to combine the sampling errors. But in the other extreme case: no covariance, where opting for Clinton does not preclude you from opting for Trump, then one should take the geometric mean of the individual sampling errors to decide if the lead is beyond the 95% confidence limit.

I agree with you; I think there’s very good evidence that the covariance is high — that if you’re for Clinton, you’re not for Trump. Therefore, I would recommend comparing the lead with the sampling error only for one candidate. If the sampling error is 3 points, then a 3 point lead is at the margin of 95% probability. Under no circumstances can you just add sampling errors to come up with the sampling error for the difference.

For ‘geometric mean’, read ‘root mean square’

ProfGH, Borchers is the son of a physicist and majored in physics at ND. He used to bully us at faculty meetings because he could do long division.

Yes it’s not as simple as comparing free throw shooters whose actions are completely independent.

So what percentage will Marcus Watson shoot from the line this year? Give us the important stuff!

Stenberg was also a physicist. What’s with physicists becoming lawyers?

GH, if you’ve taken quantum mechanics and differential equations somehow the Rule Against Perpetuities just doesn’t seem all that difficult.

Bluejay, probably within 5 points of whatever he shot last year. Most guys are about as good or bad as they’re going to get by the time they start college. There are strange slumps, like the year Brody Deren shot 36% or whatever it was.

ProfGH, law school is what happens when Wigner’s friend will not tell you the answer.

Hey Pat, you came in 3rd in the primary, losing to a guy who only raised $5K. That’s how brilliant you are.

“We are all failures – at least the best of us are.”

― J.M. Barrie

Ohhh Ricky, even Sparkles spanked you. But the question is, why have you turned into one of the cowards you have always denounced for not using their real name?

Remember, as the OWH pointed out in a lengthy article, Stothert had nothing to do with Con Agra or HDR. How embarrassing for you!

Dean:

The MSM is in the tank for Clinton. And the continuing reference to the horse race generates controversy which means audience.

This is all obvious to me.

Surely you agree that it is nothing short of spectacular how the universe of the ‘MSM’ has so dramatically expanded throughout the last 12 months? A virtual ‘Big Bang’, triggered by a seemingly innocuous escalator ride.

An expansion continually fueled by each successive TrumpFest of Inanity.

Just one example of the ‘MSM’ expansion – Sept 29, Wall Street Journal –

“It will be either Mr. Trump or Mrs. Clinton—experienced, forward-looking, indomitably determined and eminently sane,”

“Her election alone is what stands between the American nation and the reign of the most unstable, proudly uninformed, psychologically unfit president ever to enter the White House.”

– the eminently conservative, Pulitzer Prize-winning Dorothy Rabinowitz, in a column headlined “Hillary-Hatred Derangement Syndrome.”

As I’m certain you’re aware, Dorothy Rabinowitz is just one of the literally dozens among the conservative intelligentsia to go on record in opposition to the breathtakingly incurious Tufted Talking Yam. (formerly referred to by his managers as the “Babe Ruth of Debate”)

A vote for Johnson here. The two-party system sucks. Why? There are more than two opinions, so we need more than two viable choices. The fact that we have NO VIABLE CHOICES IN THIS ELECTION illustrates that this “system” is broken beyond repair. I have nothing but scorn for Democrats, Republicans and indeed the American political system in its entirety – scorn and a lifetime of farts to boot.

That’s three minutes I’ll never get back. Stats…not my thing.

Just set your alarm 3 minutes earlier tomorrow and it’s all good.

Also, the polls don’t tell you what the refusal rate is. Without that, the margin of error can almost be useless.

I am told that this is a bigger problem with exit polls. The refusal rate with pre-election polls is, I’m told, enormous. However, they have algorithms to try to avoid sample bias based upon the enthusiasm level for a candidate. Voters who are highly motivated to vote for their candidate are much more likely to take a few minutes with a pollster than one who is simply voting what he or she perceives to be the least bad candidate.

This election probably will have the historically largest percentage of people voting for one of the major party candidates mainly to try to keep the other one out of office. Some indication of this is the impressive poll numbers that Johnson and Stein have shown. The Libertarians have only cracked 1% in the presidential election once and Stein’s 500K votes (about a half a percent) in 2012 were a record. The fact that Johnson has shown up in some polls in double digits and Stein as high as 5% is a strong indication of massive dissatisfaction with the major party candidates.

But they won’t wind up there. Johnson I estimate will wind up in the 3-4% range and Stein might tickle 1%. Voters get more pragmatic the closer the election draws.

With that long wind up, Trump has a decided enthusiasm advantage over HRC. He’s got more voters who are affirmatively “for” him than does HRC. But the pollsters can test for that with questions such as how likely the person is to change his/her mind, etc. and account for it to avoid sample bias. But Trump has been a true “black swan” event so the polls may prove particularly unreliable.

The most spectacular failure to take account of refusals happened back in the 80’s when Tom Bradley (the African-American mayor of L.A.) was the Democratic nominee for Governor of California. The exit polls showed that he’d win about 52-48. He lost about 52-48. Red-faced, the pollsters then noticed that they had a statistically significantly higher number of refusals than normal from whites over 50. Basically, there were a good number of people who wouldn’t vote for Bradley because he was black even (and especially) among white Democrats.

But anyway, this stuff has gotten so deadly accurate now and people just shouldn’t treat a poll that says 49-47 as a tie, because it’s probably not.