In a previous post we discussed pairwise voting and the pairwise-beta model as a way to obtain a global ranking over candidates using bayesian inference with the beta distribution. In that post we remarked in the final paragraph that the pairwise-beta model is not perfect:

In particular, it does not exploit information about the strength of the opposing item in a pairwise comparison.

In this post we will look at a better model which addresses this particular problem, albeit at a computational cost. To begin we present a pathological case which exhibits the problem when using the pairwise-beta.

Consider the following set of pairwise ballots, where A, B, C, D, E and F are options, and A > B indicates that A is preferred to B. There are 5 ballots:

A > B

B > C

C > D

D > E

F > E

Applying the pairwise-beta algorithm method to this set of ballots yields the following output (options A-F are referred to as numbers 0-5):

1 2 3 4 5 6 |
[(0, {'wins': 1, 'losses': 0, u'score': 0.6666666666666666}), (5, {'wins': 1, 'losses': 0, u'score': 0.6666666666666666}), (1, {'wins': 1, 'losses': 1, u'score': 0.5}), (2, {'wins': 1, 'losses': 1, u'score': 0.5}), (3, {'wins': 1, 'losses': 1, u'score': 0.5}), (4, {'wins': 0, 'losses': 2, u'score': 0.25})] |

which is equivalent to the following ranking:

- A, F
- B, C, D
- E

A and F share the first position. B, C and D share the second position. E is last.

Hopefully the problem in this ranking is apparent: the strength of the opposing option in a pairwise comparison is not affecting the global ranking. This is why option F, which only beats the last option, is ranked at the same position as A, which “transitively” beats every other option. Similarly, options B, C and D are ranked at the same level, even though presumably option B should be stronger as it beats option C which beats option D.

In other words, beating a strong option should indicate more strength than beating a weak option. Similarly, being beaten by a strong option should indicate less weakness than being beaten by a weak option.

We can resort to the Bradley-Terry [1] model to address these shortcomings. The Bradley-Terry is a probabilistic model that can be used to predict the outcome of pairwise comparisons, as well as to obtain a global ranking from them. It has the following form:

and in logit form[2]:

The parameters (p’s and lambdas) can be fit using maximum likelihood estimation. One can consider these to represent the relative strength of options and therefore give a global ranking, although strictly speaking their interpretation is rooted in probabilities of outcomes of comparisons.

In order to apply this model we can use the *BradleyTerry2* **R** package by Turner and Firth[2], which fits the model using tabular input data. Armed with this package all we need is some extra plumbing in our Agora Voting tallying component and we’re ready to go. Let’s run it against the same ballots we did above, we get:

1 2 3 4 5 6 |
[(0, {'score': 73.69821}), (1, {'score': 49.13214}), (2, {'score': 24.56607}), (3, {'score': 1.186101e-06}), (5, {'score': 0.0}), (4, {'score': -24.56607})] |

which is equivalent to the following ranking:

- A
- B
- C
- D
- F
- E

Notice how this ranking does away with all the problems we mentioned with the pairwise-beta result. In particular, note how option F, which above was ranked joint first, is in this case ranked fifth. This is because it beat option E, which is last, and therefore not much strength can be inferred from that comparison.

Before concluding that the pairwise-beta model is terrible, remember that the results we got here correspond to a handpicked pathological set of ballots. In general it seems reasonable to expect results from both models to converge as more data accumulates and the strength of opponents is evened out. This hypothesis seems to match that stated in work by Salganik[3], where the pairwise-beta and a more robust model are compared saying:

In the cases considered in the paper, the two estimates of the score were very similar; there was a correlation of about 0.95 in both cases.

In summary, in this and the previous post we have described two models that can be used for pairwise elections, where candidates are asked to compare options in pairs. We have seen how one of the models works well and is easy to calculate, but can potentially give unrealistic rankings when data is sparse. We then considered a second more robust model which addresses this problem, but is computationally more expensive. Further work is required to determine exactly how computationally demanding our pairwise-bradleyterry implementation is.

[1] BRADLEY, R. A. and TERRY, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons. - http://www.jstor.org/stable/2334029

[2] Bradley-Terry Models in R: The BradleyTerry2 Package – http://www.jstatsoft.org/v48/i09/paper

[3] Wiki surveys: Open and quantifiable social data collection -http://arxiv.org/pdf/1202.0500v2.pdf