Voter fraud and bayesian inference

Wikipedia says

Personation  is a term used in law for the specific kind of voter fraud where an individual votes in an election, whilst pretending to be a different elector.

when someone practices personation multiple times to cast multiple votes we are talking about ballot stuffing. In this post we will consider an election in which authentication is not 100% secure, where personation is difficult but not impossible.  Furthermore we will assume there is some available, but costly method by which  ballots can be audited to determine whether or not they were cast via personation or were in fact valid.

What makes the problem non trivial is that ballot auditing is costly and cannot in principle be performed for the entirety of the ballots cast. Hence we would like to estimate, from a limited number of audited ballots, how severe ballot stuffing was for an election. This estimation can be used either as a guarantee that all went well or in the opposite case to detect a problem and even invalidate the results.

What we need is a mathematical model that given some information about the results of an auditing processes allows us to estimate the proportion of “fake” ballots in the set of all those cast. In other words, we are talking about statistical inference; in this post will use a bayesian approach. Let’s get to work.

Imagine we have a box with all the ballots for an election, and the auditing process consists in randomly taking one out and determining whether it is valid or not, recording the result, and then repeating a limited number of times. After we have recorded the results of all the audits, we would like to know how many of ballots in the entire box are fake. Two things come to mind. First, that the result of each audit is binary, we either get FAKE or VALID. Second, that if the proportion of fake ballots in the box is p, then probability that a randomly chosen ballot is fake is p; the probability that it is valid is 1 – p.

The auditing process as a whole yields a count of fake ballots and a count of valid ballots. If we have probablity p for the result of a single audit , can we assign a probablity to the count resulting from the complete audit process? Yes, the binomial distribution and its brother the hypergeometric distribution do just that[1]. Here they are

In our example, k above corresponds to the count of fake ballots. So these distributions give us a way to calculate the probability that a specific number of fake ballots is obtained in the audit assuming a certain proportion of fake ballots in the entire election. For example, let’s say we have 100 ballots total and we know that 10 of them are fake. What is the probability that if we audited 10 ballots we would obtain a fake count of 3?

P(X = 3) = 0.057395628 (binomial)

P(X = 3) = 0.0517937053324283 (hypergeometric)

Only 5%, it is unlikely we’d find 3 fake ballots with only 10 audits, given that there are only 10 out of 100 in total.

We have a way to calculate the probability of some outcome given some assumption about the proportion of fake ballots. But remember, what we want is exactly the opposite: given a certain result for an audit, we’d like to estimate the proportion of fake ballots in the entire set. It is this inversion that makes the problem a case of bayesian inference, and our friend Bayes theorem gives us the relationship between what we have and what we want.

in our case, it translates down to

P(Proportion of fake ballots | Audit Result) = P(Audit Result | Proportion of fake ballots) * P(Proportion of fake ballots) / P(Audit Result)

What we were calculating before is P(Audit Result | Proportion of fake ballots), which we can plug into the formula, together with the other terms, to get what we want: P(Proportion of fake ballots | Audit Result). The other terms are

P(Audit Result) =

The unconditional probability that a certain audit result occurs. It can be calculated by summing over all possible proportions, like this version of Bayes theorem shows:

As seen in the bottom term. Because the bottom term is common to all values of Ai, it can be interpreted as a normalizing constant that ensures that probabilities sum to 1.

P(Proportion of fake ballots) =

The prior probability that some proportion of fake ballots occurs. This ingredient is crucial for Bayesian inference and the Bayesian approach in general. It is an estimate of  the quantity we want to calculate that is prior to any of the evidence we obtain. It can be used to encode prior knowledge about what we are calculating. If we have no knowledge, we can try to encode that in a “neutral prior”. This last point is a very deep problem in Bayesian inference, as is the general problem of choosing priors. We won’t go into in detail here.

Recap. We want to calculate the proportion of fake ballots in an election based on the results of limited audits. We have seen how the binomial and hypergeometric distributions give probabilities for the results of an audit given an assumption about the proportion of fake ballots. Bayes theorem can be used to calculate the inverse probability that we are after, once we have specified a prior. See it in action in the next post.


[1] There is an important difference, the binomial distribution models sampling with replacement, whereas the hypergeometric models sampling without replacement. In the next post we will consider this difference and its significance for our problem.

Liquid democracy and spectral theory

In this post we will show some practical examples of how liquid democracy can be understood in terms of, and make use of results from spectral graph theory. For more background please see [Vigna2009]. Wikipedia says:

In mathematics, spectral graph theory is the study of properties of a graph in relationship to the characteristic polynomial, eigenvalues, and eigenvectors of matrices associated to the graph, such as its adjacency matrix or Laplacian matrix.

What does this have to do with liquid democracy? To answer this, let’s remember what defines liquid democracy is: system of transitive proxy voting. Proxy in that I can delegate my vote to some entity. Transitive because that entity can itself delegate the vote, and so on. Imagine a simple case with three voters, Alice, Bob, and Charlie. Alice delegates to Bob, and Bob delegates to Charlie. It is very natural to represent this as a graph, this is what we call the delegation graph

graph_decycled

Alice chooses Bob, Bob chooses Charlie

Assuming each voter starts with one vote, this would give us the following weights for each voter:

Alice = 1, Bob = 2, Charlie = 3

Because Alice and Bob have delegated their votes, Charlie will end up casting three votes, one for himself, one for Bob and one for Alice. What determines these weights is the structure of the graph; who is voting for who is encoded in the connections between vertices. In graph theory, the object that encodes this information is the adjacency matrix. Here’s the adjacency matrix for our example:

M

Where each row shows who each voter delegated to. Alice (A) delegated to Bob (B), hence the 1 in the second position of the first row. Similarly, Bob (B) delegated to Charlie, (C) as can be seen in the second row. Because Charlie did not delegate, the third row is all zeroes.

We can express the liquid tally above with these equations (1’s represent voter’s initial weights)

0*A + 0*B + 0*C + 1 = A

1*A + 0*B + 0*C + 1 = B

0*A + 1*B + 0*C  + 1 = C

Note how the 0’s and 1’s above correspond to the columns of the adjacency matrix. The above can be represented[1] in matrix form:

(A B C) * AdjacencyMatrix = (A B C)

This is an eigenvalue equation, whose eigenvector (the (A B C) row vector) corresponds to the result of the liquid tally. Note how the equation is recursive, which fits the recursive nature of transitive delegation. Vote weights are calculated in terms of vote weights themselves, the same way each delegate transmits votes that result from previous delegations.

When the adjacency matrix is used to evaluate the importance or influence of nodes in a graph in this way we are speaking of eigenvector centrality. Our example shows that calculating centrality is basically equivalent to tallying in liquid democracy. This is what makes the connection between spectral theory and liquid democracy.

Eigenvector centrality, Katz and Pagerank

Yes, that’s PageRank as in google, in case you thought this whole talk of centrality was a bit abstract, it’s what made google what it is today. Eigenvector centrality, Katz centrality, and PageRank are related methods to measure the importance of a node in a network. We won’t go into the differences between each measure, besides noting that both Katz and PageRank include an attenuation factor that decreases contributions from distant nodes, whereas eigenvector centrality does not.

In order to run some examples we will use the networkX library, which includes several functions for calculating centrality as well as many other useful features. If you want to play along with this code you will need to install the library as well as its dependencies, including matplotlib and numpy. If you’re on windows I suggest downloading the comprehensive WinPython package which includes everything you need here.

Let’s first create the graph corresponding to our example with Alice, Bob and Charlie. Here’s the code that does that

This generated the image shown earlier. Now to calculate the tally, we will run both Katz and PageRank.

which gives us

Both results match the tally we showed before. A couple of minor points above. First, the PageRank result was rescaled to make it match Katz. Second, the adjacency matrix for Katz was reversed as the networkx 1.8.1 Katz implementation is using a right eigenvector (this has been changed to left eigenvector in master).

More importantly, the alpha parameter is a damping factor. In the language of liquid democracy it modulates just how transitive delegation is by reducing contributions the further away the originate. For example, let’s change the above to alpha = 0.5:

Now Charlie receives 25% of Alice’s vote and 50% of Bob’s vote. So alpha quantifies the fraction of the vote that is effectively delegated. We can interpret then that a liquid democracy tally is a special case of Katz centrality and PageRank. In fact, liquid democracy is the limiting case of Katz and PageRank when alpha = 1.0, ie no damping (which is why you get viscous democracy in [Boldi2011]).

What about cycles?

One of the first things you have to deal with if you’ve implemented a liquid tallying algorithm is the possibility of cycles in the delegation graph, otherwise the procedure will blow up. Having detected a cycle at tally time the standard treatment is to consider votes that enter it as lost. In order to prevent that undesirable situation you can do cycle detection at vote time to warn the user that his/her vote may produce such a cycle.

What happens if we add a cycle to our example? Let’s try it

graph

and we get

The reason this happens has to do with the details of the algorithm that calculates eigenvectors; in particular the relationship between its convergence and the attenuation factor alpha[2]. The short story is this: using an attenuation factor of 1.0 on a graph with cycles may cause problems.

Just as liquid tally algorithms have to deal with cycles, so do we in order to make centrality work correctly. Fortunately there are fast algorithms to detect cycles in graphs. NetworkX offers an implementaton of an improved version of Tarjan’s strongly connected components algorithm, we will use it to define a function that removes cycles in a graph

Using this function we can obtain liquid tallies for any delegation graph correctly, using either Katz or PageRank. See the bottom of this post for the full python script demonstrating this.

Liquid democracy and degree (or branching factor)

Before we said that liquid democracy is the limiting case of Katz centrality and PageRank when alpha = 1.0. In the last section we saw another requirement besides that of alpha = 1.0: that the delegation graph must be acyclic, in other words a DAG. There is one more property that we can consider, degree.

A node’s degree is the number of (in our case, outward) connections with other nodes. In terms of delegation, it is the number of delegates that a voter chooses. Standard liquid democracy uses degree = 1, but such a requirement could in theory be relaxed. How does this fit in with Katz and PageRank? Lets construct a graph where voters may choose one or two delegates.

which gives

graph

resulting values

We see how Katz centrality does not yield a correct tally as it is not dividing outgoing weights for voters who split their delegation among two delegates, instead we get inflated weights. But the PageRank result does work, Bob’s two votes are split correctly, and the delegation proceeds normally from then on.

In summary

  • Liquid democracy is a special case of Katz centrality given
    • a damping factor alpha = 1.0
    • a directed acyclic graph of degree d = 1
  • Liquid democracy is a special case of PageRank given
    • a damping factor alpha = 1.0
    • a directed acyclic graph of degree d >= 1

That’s it for our quick tour of the relationship between liquid democracy and spectral theory. We have also seen how liquid democracy could be extended to include damping (as in [Boldi2011]), or to allow “multi delegation”.


Notes/References

[Vigna2009] Sebastiano Vigna – Spectral Ranking  http://arxiv.org/abs/0912.0238

[Page1998] The PageRank Citation Ranking: Bringing Order to the Web http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

[Boldi2011] Viscous Democracy for Social Networks http://chato.cl/papers/boldi_bonchi_castillo_vigna_2011_viscous_democracy_social_networks.pdf

[1] For simplicity, I have ignored the initial weights associated with each voter in the matrix equation. These initial weights are what makes liquid tallying equivalent to undamped Katz centrality rather than eigenvector centrality.

[2] For details as to alpha and PageRank See http://vigna.di.unimi.it/ftp/papers/PageRankFunctional.pdf section 5 Limit behaviour.

In the case of the networkX implementation of Katz centrality an alpha of 1.0 is guaranteed to converge as all eigenvalues of an acyclic graph are 0 (see http://www.emis.de/journals/JIS/VOL7/Sloane/sloane15.pdf and http://networkx.github.io/documentation/latest/reference/generated/networkx.algorithms.centrality.katz_centrality.html#networkx.algorithms.centrality.katz_centrality)

Python Script: Random liquid Tallying with Katz/PageRank

Liquid filtering

T

Over at agoravoting

We have a situation where we have to collectively choose among many options, this is scaling the solution space. It is infeasible to apply voting as is, because voters cannot consider all these options to make a judgement. So what we do is to distribute the cognitive load in a way that reflects user delegation. The problem of liquid filtering is the assignment of voters to questions according to delegation choices, in an optimal way.

Continue reading here

Parallel collections for vote processing

At AgoraVoting we recently completed a very important feature, cryptographically secure voting. Among many other things, this adds a lot of heavy number crunching to the process of carrying out elections. One of the steps in the process validates votes, using something called proofs of knowledge. I won’t go into the math details here, just note that like other domains such as 3D graphics, processing votes is embarrassingly parallel. So we carried out an experiment to see how Scala’s parallel collections can achieve parallelism for one particular task.

In this experiment, we have to parse voting records which are then transformed into collections for their validation. The technique is to first obtain a collection with all the necessary data, and then compute in parallel on it. But first, lets see what happens with sequential code, for comparison. I’ve left out the preprocessing code that first obtains the collection, here’s the compute-intensive fragment:

[scala]

ctexts.foreach( vote => {
vote.foreach( question => {

val pk_p = BigInt((question(2) \ "p").as[String])
val pk_g = BigInt((question(2) \ "g").as[String])

val commitment = BigInt((question(0) \ "commitment").as[String])
val response = BigInt((question(0) \ "response").as[String])
val challenge = BigInt((question(0) \ "challenge").as[String])
val alpha = BigInt((question(1) \ "alpha").as[String])

val toHash = alpha + "/" + commitment
val digest = MessageDigest.getInstance("SHA-256")
val hash = digest.digest(toHash.getBytes("UTF-8"))
val expected = BigInt(1, hash)

assert (challenge == expected)

val first_part = pk_g.modPow(response, pk_p)
val second_part = commitment * (alpha.modPow(challenge, pk_p)) % pk_p

assert(first_part == second_part)
})
})

[/scala]

Here’s what happens when running this code:

seq

as you can see, the cores are underutilized. This test run took 1175.254 seconds. Now let’s turn ctexts into a parallel collection before processing on it:

[scala highlight=”1″]

ctexts.par.foreach( vote => {
vote.foreach( question => {

// the same code here …

})
})

[/scala]

Yes, that’s a difference of just three characters, par converts ctexts into a parallel collection. Here’s what happens:

par

All the cores are maxed out, total time: 307.184 seconds. Not bad!