The pairwise-bradleyterry model for pairwise voting

In a previous post we discussed pairwise voting and the pairwise-beta model as a way to obtain a global ranking over candidates using bayesian inference with the beta distribution. In that post we remarked in the final paragraph that the pairwise-beta model is not perfect:

In particular, it does not exploit information about the strength of the opposing item in a pairwise comparison.

In this post we will look at a better model which addresses this particular problem, albeit at a computational cost. To begin we present a pathological case which exhibits the problem when using the pairwise-beta.

Consider the following set of pairwise ballots, where A, B, C, D, E and F are options, and A > B indicates that A is preferred to B. There are 5 ballots:

A > B

B > C

C > D

D > E

F > E

Applying the pairwise-beta algorithm method to this set of ballots yields the following output (options A-F are referred to as numbers 0-5):

which is equivalent to the following ranking:

  1. A, F
  2. B, C, D
  3. E

A and F share the first position. B, C and D share the second position. E is last.

Hopefully the problem in this ranking is apparent: the strength of the opposing option in a pairwise comparison is not affecting the global ranking. This is why option F, which only beats the last option, is ranked at the same position as A, which “transitively” beats every other option. Similarly, options B, C and D are ranked at the same level, even though presumably option B should be stronger as it beats option C which beats option D.

In other words, beating a strong option should indicate more strength than beating a weak option. Similarly, being beaten by a strong option should indicate less weakness than being beaten by a weak option.

We can resort to the Bradley-Terry [1] model to address these shortcomings. The Bradley-Terry is a probabilistic model that can be used to predict the outcome of pairwise comparisons, as well as to obtain a global ranking from them. It has the following form:

and in logit form[2]:


The parameters (p’s and lambdas) can be fit using maximum likelihood estimation. One can consider these to represent the relative strength of options and therefore give a global ranking, although strictly speaking their interpretation is rooted in probabilities of outcomes of comparisons.

In order to apply this model we can use the BradleyTerry2 R package by Turner and Firth[2], which fits the model using tabular input data. Armed with this package all we need is some extra plumbing in our Agora Voting tallying component and we’re ready to go. Let’s run it against the same ballots we did above, we get:

which is equivalent to the following ranking:

  1. A
  2. B
  3. C
  4. D
  5. F
  6. E

Notice how this ranking does away with all the problems we mentioned with the pairwise-beta result. In particular, note how option F, which above was ranked joint first, is in this case ranked fifth. This is because it beat option E, which is last, and therefore not much strength can be inferred from that comparison.

Before concluding that the pairwise-beta model is terrible, remember that the results we got here correspond to a handpicked pathological set of ballots. In general it seems reasonable to expect results from both models to converge as more data accumulates and the strength of opponents is evened out. This hypothesis seems to match that stated in work by Salganik[3], where the pairwise-beta and a more robust model are compared saying:

In the cases considered in the paper, the two estimates of the score were very similar; there was a correlation of about 0.95 in both cases.

In summary, in this and the previous post we have described two models that can be used for pairwise elections, where candidates are asked to compare options in pairs. We have seen how one of the models works well and is easy to calculate, but can potentially give unrealistic rankings when data is sparse. We then considered a second more robust model which addresses this problem, but is computationally more expensive. Further work is required to determine exactly how computationally demanding our pairwise-bradleyterry implementation is.


[1] BRADLEY, R. A. and TERRY, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons.  –

[2] Bradley-Terry Models in R: The BradleyTerry2 Package –

[3] Wiki surveys: Open and quantifiable social data collection -

The pairwise-beta model for pairwise voting

In a pairwise vote, voters are asked to repeatedly pick between pairs of options, selecting the one they favor of the two. The procedure then combines all these pairwise choices in order to establish a global ranking for all options. A pairwise vote is statistical in nature, it must infer preference data that voters have not explicitly stated in order to obtain a result.

This statistical property allows obtaining an approximate preference ordering over a large list of options without overwhelming the voter with too much work. For example, if each voter was asked to establish a preference over 50 items, they would be exhausted and participation would suffer.

The pairwise-beta is a simple bayesian method used to rank items in pairwise votes. It is based on the beta-binomial model [1], which is composed of a beta prior and a binomial likelihood. This model is very tractable: because the beta distribution is conjugate to the binomial, the posterior also has beta form and is easily obtained:


I will not present a formal justification of the pairwise-beta model for pairwise comparisons, rather I will present some intuitions that should convey how and why the model works.

The key bridge to interpret pairwise comparisons in terms of the beta-binomial model is to realise that the better-worse relation between options maps directly into the success/failure outcomes of a bernoulli trial. We can thus establish a correspondence[4] between pairwise comparisons and bernoulli trials:

  • each item i corresponds to a sequence of bernoulli trials, Bi
  • a comparison in which i wins corresponds to a success in Bi
  • a comparison in which i loses corresponds to a failure in Bi

The question we are trying to answer is

Given the proportion of comparisons in which i wins, what is the proportion of items that are globally better than i?

that reformulated in terms of our correspondences becomes

Given a sequence of bernoulli trials Bi, what is the proportion of successes/losses for i?

Which is a case of standard binomial proportion estimation[2]. As we noted before, the posterior of the beta binomial is also a beta distribution, given by


If we want a point estimate we just use the mean for this distribution which is


This gives us, for each item i, an estimation for the number of items that are better/worse than itself. This leads directly to a global ranking: the best ranked items will be those which are estimated to be better than most other items.

 In summary, the procedure is

  1.  For each item i, obtain the corresponding sequence of bernoulli trials Bi
  2. For each item i, calculate the posterior beta distribution mean given the data from 1)
  3. Create a global ranking based on the proportions for each item, as calculated in 2)

The pairwise-beta model is simple but not perfect. In particular it does not exploit information about the strength of the opposing item in a pairwise comparison. However, despite this drawback it performs well in practice. Please refer to [3] for details.





[4] The correspondence is two to one, as each comparison yields two bernoulli trials

Voter fraud and bayesian inference – part 2

We left off the discussion with

We want to calculate the proportion of fake ballots in an election based on the results of limited audits. We have seen how the binomial and hypergeometric distributions give probabilities for the results of an audit given an assumption about the proportion of fake ballots. Bayes theorem can be used to calculate the inverse probability that we are after, once we have specified a prior.

Bayesian inference is a process that takes prior information, and adds evidence to obtain a posterior distribution. In our case this posterior distribution will be over the possible proportion of fake ballots in the set of all ballots. Let’s begin with the binomial case. What prior should we use? One answer is that, since we know nothing about the proportion of fake ballots we should be indifferent about each possibility. This translates into a uniform prior, where all proportions are equally likely. For example

P(proportion = fake ballots/ total ballots) = 1 / (total ballots + 1)

Since there are n + 1 possibilities for the number of fake ballots, we give each of them the same weight, which is 1 / (n + 1).

Beta + Binomial = Beta-Binomial

Before plugging this into bayes, a small technical detour.  Notice how the prior is itself a probability distribution, defined over the 0.0 – 1.0 interval. That is, the minimum proportion (0.0) is no fake ballots and maximum (1.0) is all fake ballots. It turns out there is a paramateric probability distribution one can use for this interval, it’s called the Beta distribution. The Beta distribution has two parameters, alpha and beta. The case of our neutral prior we defined above is equivalent to the Beta distribution with parameters (1, 1)

P(proportion ) = 1 / (n + 1) = Beta(1, 1)

We could express other knowledge with different choices of alpha and beta. But what’s the point of using the Beta, besides having a convenient way to specify priors? The point is that the Beta distribution is a conjugate prior of the binomial distribution. This means that the posterior distribution, once having taken into account available evidence, is also a Beta distribution. Meaning that the calculation of the posterior is much easier, as inference is just a matter of mapping the values of parameters of the Beta to some other values. Here is the posterior of the Beta distribution when it is used as the prior of the binomial (this is called the beta-binomial model).


Equations taken from [1]. The first line is just Bayes theorem, but the payoff is that the last line corresponds to a beta distribution, but with different parameters. In summary


with a beta prior, bayesian inference reduces to remapping the initial parameters alpha and beta, to alpha + k and beta + n – k, where k is the number of successes and n is the number of trials. Conjugate priors are an algebraic convenience that allow obtaining analytic expressions for posteriors easily. End of detour, please refer to [1] for further details.

Armed with our use of the beta-binomial obtaining the posterior given some audit results is simple. If we audited 10 ballots and 3 of them were fake our posterior would simply be

P(proportion = p | fake audit count = 3 out of 10)

= Beta(1 + 3, 1 + 10 – 3)

= Beta(4, 8)

here’s what Beta(4, 8) looks like


note how the peak of the distribution is at 0.3, it makes sense since in the sample 3 out 10 ballots where invalid. Evidence has transformed our initial uniform prior into the distribution seen above. This meets our original objective, a way to judge how many ballots are fake in the entire set of ballots based on limited audits. But it’s not the end of the story. What we would like also is to have an estimate as to whether or not the election result is correct. As we said previously, this estimation can be used either as a guarantee that all went well or in the opposite case to detect a problem and even invalidate the results.


The probablity that an election result was correct, given uncertainty about fake ballots, depends on two things. One is the proportion of ballots that are fake, this is what we already have a calculation for. The other thing is the actual election outcome; specifically a measure of how close the result was. The reason is simple, if the election was close, a small number of invalid ballots could cast doubts on its correctness. Conversely, if the result was a landslide, the presence of fake votes has no bearing on its correctness. For our purposes we will stick with a simple example in which the election decides between two options via simple plurality.

Call the difference between the winning and losing option d

d = winner votes – loser votes

In order for the election to be wrong, there must be a minimum of d fake votes. The existence of d fake votes does not imply that the result was wrong, but d fake votes are a necessary condition. Thus a probability that the number of fake votes is greater than or equal to d represents an upper bound on probability that the election was incorrect. Call this E (for error)

P(proportion of fake votes >= d / total votes) = E

(upper limit on the probability that the election was wrong)

We have P(proportion), it is the posterior we got above. How do we get P(proportion >= some constant)? Through the beta distribution’s cumulative distribution function, which is defined in general as

In order to reverse the inequality, we just need to subtract it from 1 (gives us the tail distribution). We finally have

Probability of incorrect result

= P(proportion >= d / total ballots)

= 1 – Cumulative Distribution Function of P(d / total ballots)

 One final correction. Because we have sampled a number of votes with known results, we must apply our calculations to the remaining ballots.

P(E) = 1 – CDF(d – sampled ballots / total ballots – sampled ballots)

Let’s try an example, an election between option A and option B with the following numbers.

Votes for A = 550

Votes for B = 450

Total ballots = 1000

Audited ballots = 100

Audited fake ballots = 4

which gives

Posterior = Beta(5, 97)

d = 100

Minimum fraction of fake votes required to change result = (100 – 4) / (1000 – 10) = 0.1066

Upper bound on probability of error

= 1 – CDF(Beta(5, 97), 0.1066)

= 0.01352

In conclusion, the probability of error due to fake ballots in this election is less than or equal to 1.35%.


You can find a javascript implementation for everything we’ve seen until now in this jsfiddle. Calculations for the binomial, beta, hypergeometric and cumulative distribution function are done with the jStat javascript library. Wait, you say, what about the hypergeometric? We’ll leave that for the next post, which should be pretty short.


Liquid filtering


Over at agoravoting

We have a situation where we have to collectively choose among many options, this is scaling the solution space. It is infeasible to apply voting as is, because voters cannot consider all these options to make a judgement. So what we do is to distribute the cognitive load in a way that reflects user delegation. The problem of liquid filtering is the assignment of voters to questions according to delegation choices, in an optimal way.

Continue reading here

Democracy, proposal elaboration and voting

The first thing we associate with democracy is voting and elections. But the democracy-as-voting picture is limited. When voting in a referendum, for example, we choose over a predefined set of options. How and which options are included is not subjected to the vote; voting in itself says nothing of this.

So the question is, can the decision process be extended backwards to determine the option set? We could have a “pre-vote”, where we choose over a previous, wider set of predefined options to determine which of those end up in the final vote. But that just repeates the problem at an earlier stage.

The solution is to do away with the idea of a predefined set of options altogether, to allow the voters themselves to suggest any number of options in a distributed fashion. In this way, people not only contribute knowledge by selecting from a set of options, but more thoroughly by creating the options themselves, in an unrestricted way. This poses aditional challenges of course, because choosing an option out of a list is simpler than creating said options.

We can call this opening of proposal elaboration to all participants collaborative proposal elaboration (see also collaborative government). Just as voting aggregates information in the shape of individual votes, collaborative proposal elaboration aggregates information in the shape of free form contributions. Because proposals are normally described with text, these contributions take the form of editing[1]; our predefined options are thus generalized to arbitrary text documents[2].

This presents a wider view of democracy, where decision making is not just limited to voting over proposals, but includes elaborating the proposals themselves. Let’s review the process. First, we have some matter we wish to make a decision about. Then, people present and elaborate proposals in an unrestricted and distributed way. Finally, people vote on any of these freely created proposals to select one (or many).

But let’s repeat the same pattern of thinking we began with: who decides what matters are to be decided upon? Who creates the context in which our decision making process occurs? If it is up to some central authority to pose matters on which to decide, we are again limiting the scope of the decision process (although it is less limited than the case of predefined options)

The solution is the same as above: anyone can propose matters on which to decide. And the definition of “the matter” is itself a text document that is voted upon. With this final piece, we have extended our decision process to include all the stages[3]. From proposing to decide on some matter, to the elaboration of the possible proposals to resolve the matter, to the selection of the proposal that is finally accepted.


[1] Ideally, this collaborative editing of text occurs in a medium that facilitates communication and debate. Debate can be an information aggregation process for voting as well as for policy elaboration. In the latter case, there are probably specific mechanisms that aid debate in the context of editing text, such as those seen here.

[2] In contrast to predefined options, arbitrary text documents represent a much wider solution space. This added flexibility allows more precise and earlier information aggregation as well as presenting opportunities for consensus during collaborative (and realtime) editing of text.

[3] Although the creation of the voting space itself, that is, the set of people that can participate in some decision making domain, is left specified. In one scenario, voting spaces could be created by anyone, and permission to join a voting space could be granted following a vote put to all existing members. An unrestricted approach to creating any number of these voting spaces would support spontaneous community formation.