## Voter fraud and bayesian inference – part 3

Here’s part1 and part2.

Welcome back. In the previous posts we saw how to do inference using the beta-binomial to get probabilities for the proportion of fake ballots in an election, as well as an upper bound on the probability that the election result is incorrect. We briefly mentioned the hypergeometric distribution but did discuss it further nor use it.

Like the binomial (and beta-binomial), the hypergeometric distrbution can be used to model the number of successes in a series of sampling events with a binary outcome. The distinction is that the binomial models sampling with replacement, whereas the hypergeometric models sampling without replacement. In other words, if we are sampling from a box, the binomial applies when the sample is returned to the box before drawing more samples. The hypergeometric applies when the sample is not returned. But wait, doesn’t that mean that we’ve been doing it wrong?

When auditing ballots we keep track of those already checked, a ballot is never audited twice. Shouldn’t we then be using the hypergeometric distribution? It turns out that the binomial distribution approaches the hypergeometric distribution in the limit of a large total number of items compared to the number sampled. This fits our case, as we can only audit a limited number of ballots compared to all those cast.

As we saw in the previous post, the beta distribution is a conjugate prior for the binomial, which makes inference very easy. Unfortunately this is no the case for the hypergeometric. But because of the converging behaviour seen above, we can stick to the beta-binomial’s easy calculations without sacrificing accuracy. For the sake of completeness we will quickly show the posterior for the hypergeometric, following [1]. Incidentally this “manual calculation” is what allowed us to obtain the images above, through the javascript implementation in the jsfiddle.

Again, this is just Bayes theorem with the hypergeometric likelihood function and a uniform prior. In [1] it is also pointed out that the normalization factor can be computed directly with this expression

We use this in the jsfiddle implementation to normalize. Another thing to note is that the hypergeometric posterior is 0 at positions that are inconsistent with evidence. One cannot have less successes than have been observed, nor more than are possible given the evidence. These conditions are checked explicitly in the implementation. Finally, the jsfiddle does not contain an implementation for obtaining the upper bound on the probablity of election error, only code for the posterior is present. How about forking it and adding that yourself?

In these three posts we have used bayesian inference to calculate probabilities over proportion of fake ballots, and from there to calculate probabilities that an election result was incorrect. These probabilities could be used to achieve trust from stakeholders in that everything went well, or conversely to detect a possible fraud and invalidate an election.

I’ll finish the post by mentioning that the techniques we have seen here can be generalized beyond the special case of detecting fake ballots for plurality votes. For example, one could use bayesian inference to conduct ballot audits for the sake of checking tally correctness, not just from failures in authentication, but from errors in counting. See [2] for this kind of more general treatment

[1] http://www.wmbriggs.com/public/HGDAmstat4.pdf

[2] https://www.usenix.org/system/files/conference/evtwote12/rivest_bayes_rev_073112.pdf

In this work, because results of audits are not just binary, and because tallies are not only plurality, the authors use dirichlet distrbutions and sampling using posteriors to project possible alternative tallies.

## Voter fraud and bayesian inference – part 2

We left off the discussion with

We want to calculate the proportion of fake ballots in an election based on the results of limited audits. We have seen how the binomial and hypergeometric distributions give probabilities for the results of an audit given an assumption about the proportion of fake ballots. Bayes theorem can be used to calculate the inverse probability that we are after, once we have specified a prior.

Bayesian inference is a process that takes prior information, and adds evidence to obtain a posterior distribution. In our case this posterior distribution will be over the possible proportion of fake ballots in the set of all ballots. Let’s begin with the binomial case. What prior should we use? One answer is that, since we know nothing about the proportion of fake ballots we should be indifferent about each possibility. This translates into a uniform prior, where all proportions are equally likely. For example

P(proportion = fake ballots/ total ballots) = 1 / (total ballots + 1)

Since there are n + 1 possibilities for the number of fake ballots, we give each of them the same weight, which is 1 / (n + 1).

### Beta + Binomial = Beta-Binomial

Before plugging this into bayes, a small technical detour.  Notice how the prior is itself a probability distribution, defined over the 0.0 – 1.0 interval. That is, the minimum proportion (0.0) is no fake ballots and maximum (1.0) is all fake ballots. It turns out there is a paramateric probability distribution one can use for this interval, it’s called the Beta distribution. The Beta distribution has two parameters, alpha and beta. The case of our neutral prior we defined above is equivalent to the Beta distribution with parameters (1, 1)

P(proportion ) = 1 / (n + 1) = Beta(1, 1)

We could express other knowledge with different choices of alpha and beta. But what’s the point of using the Beta, besides having a convenient way to specify priors? The point is that the Beta distribution is a conjugate prior of the binomial distribution. This means that the posterior distribution, once having taken into account available evidence, is also a Beta distribution. Meaning that the calculation of the posterior is much easier, as inference is just a matter of mapping the values of parameters of the Beta to some other values. Here is the posterior of the Beta distribution when it is used as the prior of the binomial (this is called the beta-binomial model).

Equations taken from [1]. The first line is just Bayes theorem, but the payoff is that the last line corresponds to a beta distribution, but with different parameters. In summary

with a beta prior, bayesian inference reduces to remapping the initial parameters alpha and beta, to alpha + k and beta + n – k, where k is the number of successes and n is the number of trials. Conjugate priors are an algebraic convenience that allow obtaining analytic expressions for posteriors easily. End of detour, please refer to [1] for further details.

Armed with our use of the beta-binomial obtaining the posterior given some audit results is simple. If we audited 10 ballots and 3 of them were fake our posterior would simply be

P(proportion = p | fake audit count = 3 out of 10)

= Beta(1 + 3, 1 + 10 – 3)

= Beta(4, 8)

here’s what Beta(4, 8) looks like

note how the peak of the distribution is at 0.3, it makes sense since in the sample 3 out 10 ballots where invalid. Evidence has transformed our initial uniform prior into the distribution seen above. This meets our original objective, a way to judge how many ballots are fake in the entire set of ballots based on limited audits. But it’s not the end of the story. What we would like also is to have an estimate as to whether or not the election result is correct. As we said previously, this estimation can be used either as a guarantee that all went well or in the opposite case to detect a problem and even invalidate the results.

The probablity that an election result was correct, given uncertainty about fake ballots, depends on two things. One is the proportion of ballots that are fake, this is what we already have a calculation for. The other thing is the actual election outcome; specifically a measure of how close the result was. The reason is simple, if the election was close, a small number of invalid ballots could cast doubts on its correctness. Conversely, if the result was a landslide, the presence of fake votes has no bearing on its correctness. For our purposes we will stick with a simple example in which the election decides between two options via simple plurality.

Call the difference between the winning and losing option d

In order for the election to be wrong, there must be a minimum of d fake votes. The existence of d fake votes does not imply that the result was wrong, but d fake votes are a necessary condition. Thus a probability that the number of fake votes is greater than or equal to d represents an upper bound on probability that the election was incorrect. Call this E (for error)

(upper limit on the probability that the election was wrong)

We have P(proportion), it is the posterior we got above. How do we get P(proportion >= some constant)? Through the beta distribution’s cumulative distribution function, which is defined in general as

In order to reverse the inequality, we just need to subtract it from 1 (gives us the tail distribution). We finally have

Probability of incorrect result

= P(proportion >= d / total ballots)

= 1 – Cumulative Distribution Function of P(d / total ballots)

One final correction. Because we have sampled a number of votes with known results, we must apply our calculations to the remaining ballots.

P(E) = 1 – CDF(d – sampled ballots / total ballots – sampled ballots)

Let’s try an example, an election between option A and option B with the following numbers.

Total ballots = 1000

Audited ballots = 100

Audited fake ballots = 4

which gives

Posterior = Beta(5, 97)

d = 100

Minimum fraction of fake votes required to change result = (100 – 4) / (1000 – 10) = 0.1066

Upper bound on probability of error

= 1 – CDF(Beta(5, 97), 0.1066)

= 0.01352

In conclusion, the probability of error due to fake ballots in this election is less than or equal to 1.35%.

You can find a javascript implementation for everything we’ve seen until now in this jsfiddle. Calculations for the binomial, beta, hypergeometric and cumulative distribution function are done with the jStat javascript library. Wait, you say, what about the hypergeometric? We’ll leave that for the next post, which should be pretty short.

## Voter fraud and bayesian inference

Wikipedia says

Personation  is a term used in law for the specific kind of voter fraud where an individual votes in an election, whilst pretending to be a different elector.

when someone practices personation multiple times to cast multiple votes we are talking about ballot stuffing. In this post we will consider an election in which authentication is not 100% secure, where personation is difficult but not impossible.  Furthermore we will assume there is some available, but costly method by which  ballots can be audited to determine whether or not they were cast via personation or were in fact valid.

What makes the problem non trivial is that ballot auditing is costly and cannot in principle be performed for the entirety of the ballots cast. Hence we would like to estimate, from a limited number of audited ballots, how severe ballot stuffing was for an election. This estimation can be used either as a guarantee that all went well or in the opposite case to detect a problem and even invalidate the results.

What we need is a mathematical model that given some information about the results of an auditing processes allows us to estimate the proportion of “fake” ballots in the set of all those cast. In other words, we are talking about statistical inference; in this post will use a bayesian approach. Let’s get to work.

Imagine we have a box with all the ballots for an election, and the auditing process consists in randomly taking one out and determining whether it is valid or not, recording the result, and then repeating a limited number of times. After we have recorded the results of all the audits, we would like to know how many of ballots in the entire box are fake. Two things come to mind. First, that the result of each audit is binary, we either get FAKE or VALID. Second, that if the proportion of fake ballots in the box is p, then probability that a randomly chosen ballot is fake is p; the probability that it is valid is 1 – p.

The auditing process as a whole yields a count of fake ballots and a count of valid ballots. If we have probablity p for the result of a single audit , can we assign a probablity to the count resulting from the complete audit process? Yes, the binomial distribution and its brother the hypergeometric distribution do just that[1]. Here they are

In our example, k above corresponds to the count of fake ballots. So these distributions give us a way to calculate the probability that a specific number of fake ballots is obtained in the audit assuming a certain proportion of fake ballots in the entire election. For example, let’s say we have 100 ballots total and we know that 10 of them are fake. What is the probability that if we audited 10 ballots we would obtain a fake count of 3?

P(X = 3) = 0.057395628 (binomial)

P(X = 3) = 0.0517937053324283 (hypergeometric)

Only 5%, it is unlikely we’d find 3 fake ballots with only 10 audits, given that there are only 10 out of 100 in total.

We have a way to calculate the probability of some outcome given some assumption about the proportion of fake ballots. But remember, what we want is exactly the opposite: given a certain result for an audit, we’d like to estimate the proportion of fake ballots in the entire set. It is this inversion that makes the problem a case of bayesian inference, and our friend Bayes theorem gives us the relationship between what we have and what we want.

in our case, it translates down to

P(Proportion of fake ballots | Audit Result) = P(Audit Result | Proportion of fake ballots) * P(Proportion of fake ballots) / P(Audit Result)

What we were calculating before is P(Audit Result | Proportion of fake ballots), which we can plug into the formula, together with the other terms, to get what we want: P(Proportion of fake ballots | Audit Result). The other terms are

P(Audit Result) =

The unconditional probability that a certain audit result occurs. It can be calculated by summing over all possible proportions, like this version of Bayes theorem shows:

As seen in the bottom term. Because the bottom term is common to all values of Ai, it can be interpreted as a normalizing constant that ensures that probabilities sum to 1.

P(Proportion of fake ballots) =

The prior probability that some proportion of fake ballots occurs. This ingredient is crucial for Bayesian inference and the Bayesian approach in general. It is an estimate of  the quantity we want to calculate that is prior to any of the evidence we obtain. It can be used to encode prior knowledge about what we are calculating. If we have no knowledge, we can try to encode that in a “neutral prior”. This last point is a very deep problem in Bayesian inference, as is the general problem of choosing priors. We won’t go into in detail here.

Recap. We want to calculate the proportion of fake ballots in an election based on the results of limited audits. We have seen how the binomial and hypergeometric distributions give probabilities for the results of an audit given an assumption about the proportion of fake ballots. Bayes theorem can be used to calculate the inverse probability that we are after, once we have specified a prior. See it in action in the next post.

[1] There is an important difference, the binomial distribution models sampling with replacement, whereas the hypergeometric models sampling without replacement. In the next post we will consider this difference and its significance for our problem.