Ohio 2004: Statistical Improbability of Election Fraud

Letxa.com

Commentary, science, and debunking

OHIO 2004: STATISTICAL IMPROBABILITY OF ELECTION FRAUD

EXECUTIVE SUMMARY

Many people have made the accusation that there was fraud in the Ohio 2004 presidential election because the election results did not agree with some preliminary exit poll results obtained earlier in the day. This article observes that the election results were, in fact, statistically identical to the prediction of nearly a dozen polls conducted in the week before the election. The conclusion is that given the pre-election polls, the election, and the exit polls, the data that contradicts everything else is the exit polls. For the exit polls to have been right, not only was there massive election fraud, but over a dozen pre-election surveys would have had to have been rigged. This article will show that this is statistically impossible and logically questionable.

BACKGROUND

Ever since Ohio became the deciding state in the 2004 presidential election, some people have been trying to make Ohio out to be the "Florida" of the election. Many of these people believe Gore was robbed of the presidency in Florida in 2000, and they are now saying that Kerry was robbed of the presidency in Ohio.

PRE-ELECTION SURVEYS

As we know, polls and pre-election surveys were constant news in the months, weeks, and days prior to the 2004 election. Ohio was not the exception. We have the following polls that were published in the week prior to the election.

I am going to ignore the Columbus Dispatch poll of 10/29/04 because 1) I have yet to be able to get through to the Columbus Dispatch website to verify the data; 2) it presents a 50/50 break between Bush and Kerry, which implies that no one would vote for anyone else and no one is undecided, which seems strange; and 3) it comes down right in the middle anyway, so the best it can tell us is that it's a very close race--something we already know.

So discounting the Columbus Dispatch poll we have 14 polls in the week previous to the election of which Bush had a lead in 11 and Kerry had a lead in 3.

EXPLAINING KERRY LEADS IN TWO ZOGBY POLLS

Zogby showed Kerry +1 on 10/27 and Kerry +3 on 10/28. On each of these two days there was one other poll released with a better statistical level of accuracy that showed contradicting results.

On 10/27 Zogby showed Kerry leading 46% to Bush's 45% with a 4.1% margin of error. On that same day Strategic Vision showed Bush leading Kerry 48% to 47% with a 3% margin of error. So on this date the more accurate poll showed Bush leading by 1 point.

On 10/28 Zogby showed Kerry leading 47% to Bush's 44% with a 4.1% margin of error. On that same day Mason-Dixon polled more than twice as many people and calculated Bush with 48% and Kerry with 45% with just a 2.6% margin of error. So again on this date the significantly more accurate poll showed Bush leading by 3 points.

It should be mentioned that Zogby also showed a Bush lead in the days immediately before and immediately after these two days that showed a Kerry lead. While it is possible that Kerry enjoyed a small gain for these two days, the two more accurate polls released on those two days (Strategic Vision and Mason-Dixon) did not detect that same "bump."

Additionally these two more accurate polls that were conducted one day apart both show Bush holding steady with 48% and Kerry moving around between 47% and 45%. Between these two polls there are a total of 2301 samples that show Bush with 48% and Kerry with a combined 45.6% and a combined margin of error of 2.1%--this shows Bush ahead by more than the margin of error.

Zogby's two polls are such that even though they show a Kerry lead, the more accurate results of the other two surveys could be right and still within Zogby's 4.1% margin of error.

Thus the most likely conclusion is that Zogby's polling for these two days, while within the statistical margin of error of reality, provided a bad representation of the sentiments of Ohio voters. The other two polls with a larger sample size and a smaller margin of error seem to provide a better idea of the true position of the candidates on those days.

EXPLAINING KERRY LEADS IN GALLUP

On 10/31/04 CNN/Gallup showed Kerry with 50% and Bush with 46%, thus giving Kerry +4 with a 3% margin of error. On that very same day four other surveys were showing a Bush lead ranging from +0.9% to +4%.

If you combine the four surveys that show a Bush lead you get a total of 2979 samples which gives Bush a combined position of 49.0% and Kerry 46.8% with a margin of error of 1.8%. Once again we have Bush ahead of Kerry by more than the margin of error. Assuming the 46.8% is correct the CNN/Gallup poll is just outside its margin of error (which will happen 1 time out of 20). However, if we assume that Kerry was really at 47% (rather than 46.8%) and Bush at 48.8% (rather than 49%) then we find that all the polls (including the combined statistics of the four that favor Bush) could all be within their margin of error and thus all be statistically correct.

CONSENSUS OF THE POLLS VALIDATES ELECTION

The consensus of the polls is that Bush was ahead in the week going into the election and was gaining steam. That is, if there was any movement in the last week it appeared to be favoring Bush.

Ignoring the three polls that favored Kerry (since it has been shown that these polls were probably "outliers" and the polls that indicated a Bush lead were more probably close to correct), we can attempt to find an approximate consensus of what all the polls together would predict for the ultimate outcome of the election. Technically this isn't accurate since polls taken the day before the election should be more representative of voting the next day than polls taken a week earlier--but in this case obtaining a consensus from all the non-outlier polls of the previous week favors Kerry (since his support was stronger a week before the election than the day before the election).

The total sample size of the non-outlier polls in the week before the election is 8522. Of this sample Bush gets 48.5% and Kerry gets 45.8% with a 1.08% margin of error. Given these numbers--and even giving Kerry the benefit of his stronger position the week prior to the election--Bush is ahead by more than the margin of error.
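The margins of error quoted for the combined samples are consistent with the common rule of thumb that the 95% margin of error for a share near 50% is roughly 1/sqrt(n). The article does not say which formula was used, so the sketch below is an assumption that happens to reproduce the quoted figures; the function name is illustrative.

```python
import math


def approx_margin_of_error(sample_size: int) -> float:
    """Rule-of-thumb 95% margin of error, in percentage points.

    Assumption: the article does not state its formula; 100 / sqrt(n)
    is used here only because it matches the quoted values.
    """
    return 100.0 / math.sqrt(sample_size)


# Combined sample sizes quoted in the article.
combined_samples = {
    "Strategic Vision + Mason-Dixon (10/27 and 10/28)": 2301,
    "Four Bush-lead polls (10/31)": 2979,
    "All non-outlier polls (final week)": 8522,
}

for label, n in combined_samples.items():
    moe = approx_margin_of_error(n)
    print(f"{label}: n={n}, margin of error ~ {moe:.2f}%")
```

Under this rule the three combined samples come out at about 2.1%, 1.8%, and 1.08%, matching the figures used in the text.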

This means that based on the election surveys of the entire week before the election there is less than a 5% probability that Kerry actually had more support in Ohio than Bush-- especially when you consider we gave Kerry the benefit of the stronger support he had a week before the election even though it is clear some of that support disappeared. Conversely, there is more than a 95% probability that Bush did win the election based on those same pre-election surveys. If you consider that Bush's support was increasing while Kerry's was sliding the actual probabilities are even more in Bush's favor.

In statistics anything that happens less than 5% of the time is considered "unlikely" and, when it happens, is worth special attention due to its improbability. Thus if Kerry were to actually win that would be considered statistically interesting and worthy of additional scrutiny. That Bush won Ohio is not surprising based on all the surveys going into the election.

Additionally, assuming the polls projected Bush with 48.5% and Kerry with 46.8% that projects a Bush win by 1.7% with a margin of error of 1.08%. This means the polls predicted a Bush win by no less than 0.62% and no more than 2.78%. In reality, Bush won 50.8% and Kerry won 48.7% giving Bush the win by 2.1%. As we can see, the pre-election polls correctly predicted the margin by which Bush would win Ohio to a statistically acceptable level of accuracy.

There were no surprises in Ohio. Polls indicated that Bush should win Ohio by 0.62% to 2.78% and Bush won Ohio by 2.1%.
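The range check above is simple arithmetic and can be sketched as follows. It follows the article in applying the 1.08% margin of error directly to the predicted lead, which is a simplification; the function name is illustrative.

```python
def predicted_range(predicted_lead: float, margin_of_error: float) -> tuple:
    """Return (low, high) bounds of the predicted lead, in percentage points."""
    return (predicted_lead - margin_of_error, predicted_lead + margin_of_error)


# Figures from the article: predicted Bush lead, margin of error, actual lead.
low, high = predicted_range(1.7, 1.08)
actual_lead = 2.1

print(f"Predicted Bush lead: {low:.2f}% to {high:.2f}%")
print(f"Actual lead of {actual_lead}% inside range: {low <= actual_lead <= high}")
```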

THE IMPLICATION OF A KERRY WIN

Despite the absolute statistical improbability that Kerry actually won, there is a bigger issue. For Kerry to have won the election all these pre-election surveys would have had to have been wrong. What is the probability of that?

Here is another table of the surveys with an additional column called "Kerry Win %". This column is the statistical probability that Kerry was actually ahead even though the poll showed him as being behind.

The probability of any given survey showing the given results when Kerry was actually in the lead is shown in the far right column. For example, in the last SurveyUSA poll Bush had 49% and Kerry had 47% with a 3.5% margin of error. Given those specifications statistics tell us that there was a 28.4% chance that Kerry was actually ahead and that the poll got it wrong.
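The article does not spell out how the "Kerry Win %" values were derived. The 28.4% figure is consistent with treating the poll lead divided by the margin of error as a standard normal z-score and taking the upper tail, so the sketch below uses that approach as an assumption. A stricter treatment would convert the margin of error to a standard error first; this block only aims to reproduce the article's numbers, and the function name is illustrative.

```python
import math


def prob_trailing_candidate_ahead(lead_points: float, moe_points: float) -> float:
    """Upper-tail normal probability for z = lead / margin of error.

    Assumption: this mirrors how the article's figures appear to have
    been computed; it is not a textbook significance test.
    """
    z = lead_points / moe_points
    # P(Z > z) for a standard normal variable, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# SurveyUSA example from the text: Bush 49%, Kerry 47%, margin of error 3.5%.
survey_usa = prob_trailing_candidate_ahead(2.0, 3.5)
print(f"SurveyUSA: {survey_usa:.1%} chance Kerry was actually ahead")
```

The same function gives roughly 0.37 for a 1-point lead with a 3% margin of error and roughly 0.124 for a 3-point lead with a 2.6% margin of error, which appear to correspond to the Strategic Vision and Mason-Dixon polls discussed earlier.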

What is the probability of Kerry actually being ahead on election day when all of the above polls showed Bush ahead? Treating the polls as independent, this can be calculated by multiplying all of the values from the last column together, i.e. 0.37 * 0.125 * 0.404, etc. The result is 0.00000453381, which is 0.000453381%, or about 4.5 ten-thousandths of a percent. This means that the chances that Kerry should have won on election day, given all these polls showing Bush ahead, are 1 in 220,564! Statistically speaking this is a virtual impossibility.
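The multiplication can be sketched as below. Only the three per-poll values quoted in the text are available here, so the list is deliberately partial. The final product is taken from the article rather than recomputed, and multiplying the values assumes the polls are independent of one another.

```python
import math

# Only the three per-poll probabilities quoted in the text; the article
# multiplies one value per poll, so this list is incomplete.
quoted_probabilities = [0.37, 0.125, 0.404]
partial_product = math.prod(quoted_probabilities)


def one_in(probability: float) -> float:
    """Convert a probability into '1 in N' odds."""
    return 1.0 / probability


# Full product as reported in the article (not recomputed here).
article_product = 0.00000453381

print(f"Product of the three quoted values: {partial_product:.6f}")
print(f"Article's full product as a percentage: {article_product * 100:.9f}%")
print(f"Article's full product as odds: about 1 in {one_in(article_product):,.0f}")
```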

WHAT ABOUT FRAUD?

Widespread fraud sufficient to affect the election is statistically ruled out for three major reasons.

  1. Fraud was unnecessary. Kerry had only 1 shot in 220,564 of winning Ohio on election day. There was virtually no chance Bush would lose. Conducting widescale election fraud when you're already all but guaranteed to win the state is an unreasonable and unnecessary risk.

  2. Fraud didn't accomplish anything. We have already shown that, statistically speaking, Bush should have won Ohio by 1.7% within a 1.08% margin of error. That is to say, any Bush win between 0.62% and 2.78% is considered within the margin of error of the prediction. Bush ended up winning by 2.1%, which is slightly ahead of the prediction but well within the margin of error.

    If there was fraud in Ohio, the ultimate effect of the fraud accomplished nothing! Statistically speaking Bush received the exact number of votes in the Ohio election as were predicted by the pre-election surveys. What is the purpose of fraud if it only gets you the same number of votes you were expecting to get anyway?

  3. Fraud virtually impossible to coordinate. Even if we assume that the Republicans "weren't going to take any chances" and committed fraud "just in case," the election ended up being right in line with the pre-election polls. If there was widespread fraud of any statistical importance it is highly unlikely that the resulting election would end up matching the pre-election polls so closely.

    Think about it: If Bush was going to win 2,858,727 votes and Kerry was going to win 2,739,952 and the Republicans had "created" 120,000 phantom votes, then Bush would have won 2,978,727 (51.8%) to Kerry's 2,739,952 (47.7%), giving Bush a margin of victory of 4.1%. This far exceeds the pre-election surveys' prediction of a Bush win by 1.7%. In fact, 4.1% is more than two margins of error (2 x 1.08% = 2.16%) above the predicted 1.7%, which means that given the pre-election surveys there is less than a 1% chance that Bush would win by 4.1%. This would be very suspicious and would merit investigation.

    Instead, the election results mirrored the pre-election surveys precisely, statistically speaking. Bush's win by 2.1% was well within the margin of error of the pre-election surveys that predicted a win by 1.7%. This is not at all suspicious.

    In fact, if there was widespread Republican fraud in the election, the following would have to occur: 1) Republican voters would have to be turning out in far fewer numbers than expected. In reality Republicans turned out at a higher rate than Democrats (40% to 35%). 2) Republican fraudsters would have to know that their constituents weren't turning out in the desired number and would have to have a very accurate idea of how low the Republican turnout was going to be. 3) The Republican fraudsters would then have to communicate with the people actually participating in the fraud and tell them how many additional votes they wanted, so that Bush would win the election in a way that was consistent with the pre-election surveys. 4) The fraudsters would have to hope that all their constituents didn't show up later in the day and suddenly vote, giving Bush a margin of victory so huge that it would raise statistical eyebrows.

    This is entirely improbable. Republicans turned out in strong numbers which means no fraud was necessary to begin with. Any substantial fraud would have given Bush a margin of victory so large as to raise statistical curiosity regarding how Bush could win with, say, 4.1% when pre-election surveys only predicted 1.7% with a 1.08% margin of error. And even if we assume that someone lied and Republican turnout wasn't as strong as has been suggested, the fraudsters would have to know this during election day in order to "create" just the right number of votes in multiple counties to give Bush a win by just enough to not raise eyebrows.

    This is simply not feasible. It would require a massive fraud organization ready to act and communicate throughout the day based on information no one had until the end of the day when precincts start reporting. The chance of such a fraud organization operating undetected is virtually nonexistent.
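The phantom-vote scenario in point 3 above can be checked with a few lines of arithmetic. The vote counts are the ones quoted in the article, including the minor-candidate total it cites from CNN. The 120,000 phantom votes are the article's hypothetical, and the function name is illustrative.

```python
BUSH_VOTES = 2_858_727
KERRY_VOTES = 2_739_952
OTHER_VOTES = 26_602      # minor candidates, as cited in the article
PHANTOM_VOTES = 120_000   # the article's hypothetical, not a real figure


def shares(bush: int, kerry: int, other: int) -> tuple:
    """Return (Bush %, Kerry %, Bush lead in points) of all votes cast."""
    total = bush + kerry + other
    bush_pct = 100.0 * bush / total
    kerry_pct = 100.0 * kerry / total
    return bush_pct, kerry_pct, bush_pct - kerry_pct


actual = shares(BUSH_VOTES, KERRY_VOTES, OTHER_VOTES)
hypothetical = shares(BUSH_VOTES + PHANTOM_VOTES, KERRY_VOTES, OTHER_VOTES)

for label, (bush_pct, kerry_pct, lead) in (("Actual", actual), ("With phantom votes", hypothetical)):
    print(f"{label}: Bush {bush_pct:.1f}%, Kerry {kerry_pct:.1f}%, lead {lead:.2f} points")
```

The hypothetical lead comes out a little above four points. The article's 4.1% is the difference of the two rounded shares; subtracting the unrounded shares gives a slightly higher figure.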

EXIT POLLS: MISINTERPRETATION, HOAX, OR FRAUDULENT EXIT POLLS?

According to some sites the Ohio (and Florida) exit polls were a strange anomaly in the entire country. They write of the original exit polls: "In Ohio, Kerry had a small but noticeable lead with both male and female voters, a rare thing for him as males have tended to favor Bush in this election by a small margin. Likewise, independent voters clearly broke for Kerry, by a 21 percent margin, 60-39. This is not anywhere near the result we are seeing now...". They themselves note that Kerry was leading among males, which they recognize was a rare thing. Yet they proceed to suggest that these exit polls--which produced "rare" results--were accurate, and that since they differed from the election, the election must be fraudulent. They continue by saying "there appears to be some dishonesty somewhere" and we are led to believe that the dishonesty was with Republican officials who managed the Ohio election.

But what if the dishonesty is with the people making these allegations or in the exit polls themselves? I'm not one to suggest a "conspiracy theory" but given the statistical impossibility that Kerry was actually ahead in Ohio but lost the election the only remaining possibilities are:

  1. MISINTERPRETATION. It is possible that the exit polls were actually right but the people making these allegations are misinterpreting the exit poll data or are using raw, unadjusted polling data. The data that supposedly gave Kerry an early lead is data that was never officially released; it was unofficial data anonymously leaked to pro-Democratic blogs. It is entirely possible that the people who received this data did not know how to interpret it. If the data they received was raw poll data that had not been adjusted to correctly reflect the area's demographics, then any conclusions drawn from the data would be completely inaccurate. This is the most likely explanation.

  2. HOAX. Certain people are perpetuating an unfounded rumor regarding the meaning of the exit polls or of the raw data itself. Remember that the "raw data" was never officially released. Rather it was leaked anonymously to pro-Democratic blogs and alternative news sources. We have no way to verify that the data that was "leaked" was accurate, where it came from, and whether or not it was raw data or had already been statistically adjusted. Since there is no source that can confirm the exit poll data that the conspiracy theorists are discussing it is impossible to confirm the data. The data could have been gleaned from a pollster's website early in the day, it could have been leaked by a liberal-leaning employee of the exit poll organization, or it could be a complete fabrication. We have absolutely no way of knowing. I'd like to think that this is not the case but the truth is that very little in the "blogosphere" can be verified so it's entirely possible someone said something, someone else repeated it, and it snowballed from there with no real evidence to support it.

  3. EXIT POLL FRAUD. The final possibility is that the exit polls themselves were intentionally fraudulent. That is to say, the organization(s) conducting the exit polls wanted Kerry to win and purposefully sampled too many women, too many Democrats, etc. in order to obtain results that favored Kerry, leak them to alternative news sources, and thereby discourage Republicans who might think that Kerry had an insurmountable lead. This seems extremely unlikely but is an explanation that must be mentioned.

So which of these is most probable?

In my estimation the alternatives are listed in the order of their probability. The most probable explanation is that some well-meaning people are misinterpreting exit poll data. The next most probable explanation is that the people are not well meaning and are intentionally spreading rumors and asking questions simply to undermine Bush's second term. The least probable explanation is that the exit polls were intentionally biased--this possibility seems even less likely considering the exit polls released later in the evening on election day were far more along the lines of what we would expect.

CONCLUSION: NO SIGNIFICANT ELECTION FRAUD

Statistically speaking the only conclusion we can make is that there was no significant election fraud in Ohio in 2004. That is not to say that everything was perfect. This doesn't mean that there wasn't any fraud. But we can conclude that:

  1. Bush won Ohio with a margin consistent with the pre-election surveys.
  2. The odds that Kerry was actually ahead in Ohio but was shown to be behind by both the pre-election surveys and the actual election are 1 in 220,564. It's statistically impossible to believe that Kerry should have won on election day.
  3. If there was any fraud in Ohio it was not statistically significant and did not affect the outcome of the election.
  4. If the assertion is made that Kerry actually should have won then the person making that assertion must accept that the fraud wasn't just in the election but in all of the pre-election surveys as well. A conspiracy of this magnitude could not be concealed and it's doubtful that it could work even if it were possible to conceal it.

It is my conclusion that it was impossible for Kerry to win Ohio on election day, that Bush won by the margin that pre-election surveys were predicting, and that the current rumors that Kerry should have won based on exit polls are due to people misinterpreting data or intentionally spreading disinformation to undermine Bush's second term.

UPDATE: On December 29, 2004, a recount in Ohio was completed. The recount resulted in Kerry gaining 734 more votes and Bush gaining 449. This reduced Bush's lead by a mere 285 votes and left him with a win by 118,775. As I predicted, Bush's lead was confirmed by the recount and, as I also predicted, the recount itself was useless. There is no statistical way that anyone but Bush could have won Ohio on election day. Of course that doesn't stop Jesse Jackson and others who lack an understanding of statistics from continuing to question the election and suggest that the counting machines themselves may be inaccurate. Does it never end? Will they never accept the result?

UPDATE: On January 19, 2005, a report was released by the National Election Pool that evaluated the exit polls and any discrepancies that may have existed. It concluded, in part, that "Exit polls overstated John Kerry's share of the vote on November 2, both nationally and in many states, because more Kerry supporters participated in the survey than Bush voters". There is nothing surprising about this. "The problem is not new -- in every presidential election since 1988, exit polls have overstated support for Democrats nationally -- but the discrepancy in 2004 was more pronounced than in previous years."

IMPLICATION OF THE STEVEN F. FREEMAN ANALYSIS

Dr. Steven F. Freeman of the University of Pennsylvania came up with a statistical analysis shortly after the election in which he calculated that the odds of Bush winning (or Kerry losing) by the margins they did in three battleground states (Ohio, Pennsylvania, and Florida) were one in 250 million--essentially a statistical impossibility. Given Freeman's analysis it is reasonable to wonder how he can be right if my analysis is right.

Despite the fact that Dr. Freeman has a PhD, it doesn't appear that he is a professor of statistics or mathematics, nor that his doctorate is in either of these fields. His doctoral thesis was entitled "The problem of identity in organizational behavior and human decision processes". This is not to detract from his doctorate, but it should be noted that unless further information is forthcoming, it would appear that he has no significant edge in statistics as compared to any educated college graduate. That is to say: Don't let Freeman's PhD generate some kind of "aura" that leads you to believe that he is somehow "more right" just because he has a PhD attached to his name.

Freeman's analysis is based on specific exit poll data that suggested a significant advantage for Kerry in two of the three battleground states, the exception being Florida. Specifically, the exit poll data predicted a Bush win by 0.1% in Florida, a Kerry win by 4.2% in Ohio, and a Kerry win by 8.7% in Pennsylvania. The final election results, according to Freeman, had a Bush win by 5% in Florida, a Bush win by 2.5% in Ohio, and a Kerry win by 2.2% in Pennsylvania. Based on these differing margins, Freeman then looks at how much Bush gained in the margin, comparing what was predicted in the exit polls to what he actually won in the election. In Florida, Bush did 4.9% better than predicted, in Ohio he did 6.7% better than predicted, and in Pennsylvania he did 6.5% better than predicted--though he still lost to Kerry by 2.2%.
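The shifts quoted above are simply the difference between the election margin and the exit poll margin in each state. A minimal sketch follows, using the figures as reported in this article; the sign convention (positive for a Bush lead, negative for a Kerry lead) is my own choice for illustration.

```python
# (exit poll margin, election margin per Freeman) in percentage points.
# Positive = Bush lead, negative = Kerry lead.
margins = {
    "Florida": (0.1, 5.0),
    "Ohio": (-4.2, 2.5),
    "Pennsylvania": (-8.7, -2.2),
}

# Shift toward Bush = election margin minus exit poll margin.
shifts = {
    state: round(election - exit_poll, 1)
    for state, (exit_poll, election) in margins.items()
}

for state, shift in shifts.items():
    print(f"{state}: Bush did {shift} points better than the exit poll")
```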

Based on a statistical analysis of these numbers, Freeman calculates that there is only a 1 in 250 million chance of all three shifts happening together.

REVIEWING FREEMAN'S STATISTICAL WORK

Bush actually won Ohio by 2.1%, not by 2.5% as indicated in Freeman's paper. According to CNN as of Jan. 20, 2005, Bush won 2,858,727 votes, Kerry won 2,739,952, and two minor candidates won 26,602 votes. Out of a total of 5,625,281 votes, Bush won 50.8% and Kerry won 48.7%; thus the Bush margin was actually 2.1%, not 2.5% as indicated in Freeman's report. This is a minor difference but places the odds of Kerry receiving 48.7% at 0.1277% rather than the 0.08% suggested by Freeman (this weakens Freeman's argument).

In Florida I calculate the probability of Kerry getting only 47.1% of the vote when he was polling at 49.7% to be 0.2709% rather than the 0.2800% calculated by Freeman (this actually strengthens Freeman's argument).

In Pennsylvania I calculate the probability at 0.1763% rather than the 0.1800% calculated by Freeman (this actually strengthens Freeman's argument).

Thus using these corrected values the probability is 0.001277 * 0.002709 * 0.001763 = 0.00000000609891, or 0.000000609891%, which equates to one chance in 164 million rather than Freeman's 250 million figure. This is still an extremely improbable occurrence--but roughly 50% more likely than what Freeman suggests.
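The joint probability can be reproduced by multiplying the three per-state values, which assumes the three state results are independent. The sketch below uses only the percentages quoted in this article, converted to fractions; the dictionary and function names are illustrative.

```python
import math

# Per-state probabilities quoted in the article, as fractions rather than percentages.
corrected = {"Ohio": 0.001277, "Florida": 0.002709, "Pennsylvania": 0.001763}
freeman = {"Ohio": 0.0008, "Florida": 0.0028, "Pennsylvania": 0.0018}


def joint_probability(per_state: dict) -> float:
    """Multiply per-state probabilities (assumes independence between states)."""
    return math.prod(per_state.values())


joint_corrected = joint_probability(corrected)
joint_freeman = joint_probability(freeman)

print(f"Corrected: 1 in {1 / joint_corrected / 1e6:.0f} million")
print(f"Freeman:   1 in {1 / joint_freeman / 1e6:.0f} million")
print(f"Corrected is {joint_corrected / joint_freeman:.2f}x as likely as Freeman's figure")
```

Either way the result is on the order of one chance in a couple of hundred million, which is the point the following section relies on.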

IMPLICATIONS OF FREEMAN'S WORK

Whether you believe Freeman's "one in 250 million" figure or my "one in 164 million" figure, the practical conclusion is the same. They are both huge numbers and for all intents and purposes they might as well be the same--there is no significant difference between a one in 164 million, a one in 250 million, or a one in a billion chance. They are all so improbable as to be considered impossible.

We have three possible explanations:

  1. Pure Chance. Statistics specifically allows for the possibility of events that would be considered improbable. The statistic of 1 chance in 164 million doesn't mean it will never happen; it means it ought to happen once every 164 million times. So we could be witnessing that one oddball event. However, statistically speaking, this is virtually impossible. The odds are so heavily against it that for any logical discussion we must assume that this is not the explanation.

  2. Election Fraud. There was massive election fraud that changed the outcome so significantly that it forced such an improbable statistical anomaly to occur in three different areas of the country; and in one case (Pennsylvania), this massive fraud was engaged in even though Kerry's lead was so great that it could not be beaten even with the massive fraud. Given the nature of the electoral college, there is no logical reason to risk engaging in fraud in a presidential election in a state that you know you can't win even with the fraud. This alone is sufficient reason to question the theory of election fraud; but if you combine that with the fact that, as I demonstrated above, the Ohio election produced results completely in line with over a dozen pre-election surveys, it becomes increasingly difficult to believe that election fraud was the answer: 1) In Ohio, the election results came down exactly as pre-election polls predicted. 2) In Pennsylvania, we must believe that Republicans engaged in a massive fraud campaign and did so even though it wasn't going to make a difference. 3) In Florida, Bush was predicted to win but we are expected to believe that Republicans engaged in such huge fraud as to draw attention to themselves when they were going to win anyway and a much smaller level of fraud would have been less suspicious. 4) Although there has been a huge amount of liberal uproar about inconsistencies in Ohio before and during the election, I'm unfamiliar with any significant complaints about the process in Florida and Pennsylvania; if a similar level of fraud had been conducted in those states, I'd expect a lot more "noise" about Florida and Pennsylvania--yet it seems we only hear complaints about Ohio. All in all, the accusations of election fraud don't add up.

  3. Exit Polls Flawed. That leaves us with the final--and in my opinion, most probable--option: If there was something significantly flawed in the exit polls themselves, the exit poll data itself was bad and any conclusions drawn from those polls are equally bad. As amazingly huge as Freeman's odds might seem, they are absolutely irrelevant if the underlying data on which they are based is inaccurate. You can't reach useful conclusions based on bad data. As cited above, in January 2005, the National Election Pool (responsible for the polls) admitted they were wrong. Pollsters do this for a living and their livelihood depends on getting it right. They are not going to make excuses to prop up the president; to do so would undermine their own credibility in future exit polls. The alternative is to believe in a massive right-wing conspiracy of an even greater magnitude than that necessary to actually perpetrate the election fraud.

It seems quite apparent that Freeman made a reasonably good effort at reaching a statistical conclusion based on bad data. If we look at Ohio, we see that when we compare the pre-election polls, the election, and the exit polls, it is not the election that was the outlier, but rather the exit polls. Those involved in generating the exit polls themselves have admitted there was an error in their work. Yet, for some reason, certain people cling instead to the conspiracy theory that the GOP was able to generate literally millions of votes over several states: two that they were already expecting to win anyway, and a third where the fraud didn't lead to a win.

Any rational and logical analysis must conclude that despite Freeman's best effort and his reasonable application of statistics, he was simply basing his work off of bad numbers. The huge numbers contained in his result do not automatically indict the election; they can just as easily indict the accuracy of the exit polls. And when we consider that--at least in Ohio--the election came down exactly as a dozen pre-election polls predicted, it seems all the more likely that the problem was in the exit polls, not the election.

MODIFICATION HISTORY

I wrote this article, I believe, in mid-December 2004. As I was re-reading it, I realized that my response to Freeman's work had been left incomplete. As such, I finished the "Implication of Freeman's Work" section and added the "Executive Summary" to the top of this article on October 9, 2006. The rest of the article remains unchanged.