**Election Statistics**

**December 11, 2000**

*What's the chance that a hand recount will reverse the outcome of an
election?*

Taking the # of undervotes in the whole state, p(reversal)=15% (n=20).

Putting in more detail and taking into account each county's undervotes & proportional vote for Bush & Gore, p(reversal)<1%. (Editor's note: an unofficial recount indeed later showed that Bush won again.)

It has recently become apparent that a problem of great importance in electoral law is to evaluate whether a request for a hand recount has merit. Florida law authorizes a recount when there is a significant probability that such a recount would alter the outcome of the election. It is thus of great practical importance to have a method to determine this probability. I show here that in a simplified but fairly realistic scenario, we need only three pieces of data to determine this probability: the error rate of the machine, the # of votes favoring one candidate in the machine count, and the total # of votes.

I hypothesized that, given a constant difference in the # of votes counted for each of two candidates and a finite error rate of the counting machine, the larger the total # of votes, the greater the chance that a recount will alter the winner. Here is my demonstration and the assumptions I use to prove it.

How do we calculate the error rate of the machine? We first need to define what errors a machine can make. Let us assume there are only two candidates with real chances of winning the election. For every vote that a machine counts as a vote for candidate A, there are two possible error types: the machine can have misinterpreted a vote for candidate B as a vote for candidate A, or it can have misinterpreted an invalid vote or a vote for a third candidate (equivalent for our purposes) as a vote for candidate A. Analogously, there are two possible error types for a vote counted as a vote for B. Finally, for every vote counted as a vote for neither candidate (an invalid vote or a vote for a third candidate), there are also two error types: the machine can have misinterpreted a vote for A or a vote for B. Let us call the probabilities of these six classes of error p(B|A), p(N|A), p(A|B), p(N|B), p(A|N) & p(B|N), where A & B stand for the two leading candidates and N stands for null votes and votes for other candidates.
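These six probabilities can be pictured as a small table, one row per machine readout, giving the probability that the underlying vote was actually for A, for B, or null. A minimal sketch in Python, with purely illustrative numbers (the rates below are assumptions for the example, not estimates from any real data):

```python
# p(true vote | machine readout): one row per machine readout.
# All numbers are illustrative placeholders, not measured error rates.
p_true_given_counted = {
    # counted as A: correct, or really B (p(B|A)), or really null (p(N|A))
    'A': {'A': 0.990, 'B': 0.002, 'N': 0.008},
    # counted as B: really A (p(A|B)), correct, or really null (p(N|B))
    'B': {'A': 0.002, 'B': 0.990, 'N': 0.008},
    # counted as null: really A (p(A|N)), really B (p(B|N)), or truly null
    'N': {'A': 0.030, 'B': 0.030, 'N': 0.940},
}

# Each row is a probability distribution over the true vote, so it must sum to 1.
for counted, row in p_true_given_counted.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9
```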

In our first analysis, we will tackle the case of unbiased error in the counting machines, i.e. errors that do not intrinsically favor any candidate. We can then write p(B|A) = p(A|B), p(N|A) = p(N|B) and p(A|N) = p(B|N). Furthermore, given the design of ballots, p(B|A) & p(A|B) are much smaller than p(A|N) & p(B|N): the most common error made by machines is to count a valid vote as an invalid vote, not to mistake one candidate for another. False positives are also rare relative to false negatives, so it is a fairly good approximation to consider only p(A|N) & p(B|N) as different from zero. For simplicity, let us call p(A|N)=p(B|N)=x.

How do we calculate these two probabilities? The error rate
will vary depending on whether the votes tallied are perfect punches, dimpled
ballots, pregnant chads, etc. There is no loss of generality derived from
assuming that votes in a given county fall in a certain distribution of perfect
punches, dimpled ballots, pregnant chads, etc, and that this distribution can
be estimated given a reasonable number of votes. Once the distribution is
estimated to a satisfactory degree of accuracy, its shape is independent
of the number of votes cast: each vote is drawn at random from it. We can
then calculate the error
rate of the machine, p(A|N) & p(B|N), by comparing, *on a vote by vote basis*, the output of a machine count of a
representative (& thus sufficiently large) sample of votes drawn from this
distribution with the output of a hand count of the same votes. We assume that
a hand recount is done with sufficient care that we can consider it to be
error-free by definition, i.e. we will consider its result to be the true
measure of each vote. Alternatively, by assuming that every vote was cast for a
candidate, we can also estimate the error rate from the number of votes that
were counted as null by the machine.
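As a sketch of the vote-by-vote comparison just described, the following Python fragment estimates the error rate from paired machine and hand readings of the same ballots. The ballot labels and the toy sample are hypothetical:

```python
def estimate_error_rate(machine, hand):
    """Estimate the machine error rate: the fraction of ballots the
    (assumed error-free) hand count reads as valid ('A' or 'B') but the
    machine read as null ('N')."""
    valid = [(m, h) for m, h in zip(machine, hand) if h in ('A', 'B')]
    missed = sum(1 for m, h in valid if m == 'N')
    return missed / len(valid)

# Toy sample: 2 of the 8 valid votes were read as null by the machine.
machine = ['A', 'N', 'B', 'A', 'N', 'B', 'A', 'B']
hand    = ['A', 'A', 'B', 'A', 'B', 'B', 'A', 'B']
print(estimate_error_rate(machine, hand))  # 0.25
```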

The other two pieces of data needed -- the # of votes favoring one candidate in the machine count and the total # of votes -- are not controversial: they come from the original machine tally.

How do we then calculate the probability that a hand recount will change the winner of the election given a constant margin of votes for the victor?

Let us take each vote counted for A as a -1, each vote tallied for B as a 1, and each null vote as a 0, and calculate the probability P that the mean of all votes will change sign as a consequence of a recount. Given that each vote can be tallied correctly or incorrectly by the machine, the mean is a random variable whose expected value depends on the actual votes cast and whose variance depends on the probability of error of the machine. It is this variance that will determine P: the greater the variance, the greater P. If there is only one vote cast and it is for candidate A, the machine's readout could be A with p=(1-x) or null with p=x. The mean is thus m=(1-x)*-1, and the variance is (1-x)*(m-(-1))^2 + x*m^2. If x=0.05, m=-0.95 & v = 0.95*(0.05)^2 + 0.05*(0.95)^2 = 0.0475.
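As a quick numerical check of the one-vote case (treating x as the per-vote chance that a valid vote is read as null):

```python
# One vote cast for A: the machine reads -1 (a vote for A) with probability
# 1-x, or 0 (null) with probability x. x = 0.05 is the illustrative rate
# used in the text, not a measured value.
x = 0.05
outcomes = [(-1.0, 1 - x), (0.0, x)]  # (readout value, probability)

m = sum(value * p for value, p in outcomes)             # mean of the readout
v = sum(p * (value - m) ** 2 for value, p in outcomes)  # variance of the readout

print(round(m, 4), round(v, 4))  # -0.95 0.0475
```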

If there are two votes cast, both for A, m=(1-x)^2*-1 + 2*x*(1-x)*-0.5 = -(1-x), and the variance is (1-x)^2 * (m+1)^2 + 2*x*(1-x) * (-.5-m)^2 + x^2 * m^2. If x=0.05, m still is -0.95 but v now is 0.0238, half its one-vote value. So the variance of the *mean* decreases with the number of votes, just as the familiar standard-error argument predicts. But the quantity that decides the election is not the mean; it is the total vote margin, n times the mean, and its variance is n^2 times the variance of the mean: 0.0475 for one vote, 0.095 for two. So the variance in the vote *tally* increases with the number of votes to tally. This agrees with the previous notion that the probability of a change in the outcome increases as the number of votes increases. Intuitively, the reason is that given that the tally of each vote has a probability of error, the more votes you count, the more errors you will make, and the more different the outcome will be each time you repeat the procedure. Why does the intuitive notion not apply here that the more times we repeat a measurement, the smaller the standard error of the mean becomes? It does apply, but it does not save the machine count: the standard error of the mean shrinks only as 1/sqrt(n), while a fixed margin of victory, expressed as a mean, shrinks as 1/n. Equivalently, the standard deviation of the total margin grows as sqrt(n) while the margin itself stays constant. Having more votes to tally is not repeating the *same* measurement more times; each vote is a *different* measurement, and each one contributes its own error.
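The growth of P with the number of votes can be sketched with a normal approximation to the machine's margin. The margin d, the error rate x, and the vote counts below are all illustrative assumptions, not data from the election:

```python
from math import sqrt
from statistics import NormalDist

def p_reversal(n, d, x=0.05):
    """Normal approximation to the probability that the machine margin for A
    changes sign: d is the true margin for A out of n votes, and each valid
    vote is read as null with probability x (the unbiased error model above)."""
    mean = (1 - x) * d          # expected machine margin for A
    sd = sqrt(n * x * (1 - x))  # error variance accumulates over all n votes
    return NormalDist(mean, sd).cdf(0)

# Fixed true margin of 10 votes: the reversal probability grows with turnout.
for n in (100, 1000, 10000):
    print(n, round(p_reversal(n, d=10), 4))
```

With the margin held at 10 votes, the estimated reversal probability climbs from essentially zero at n=100 to roughly a third at n=10,000, which is the claimed effect.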

**Data**

The # of votes counted blank with punchcard readers is 5 times that with optical readers, so valid votes read as blank can be assumed to account for >= 4/5 of the number found blank (Gore filing w/ Supreme Court of FLA, p. 12 paragraph c, CNN edition).
