Liquid democracy in action!

This is a screenshot taken from within the Agoravoting (Agoravoting.com) web application, which features a visual representation of elections. Blue nodes correspond to voters, and the green and red nodes correspond to ballot choices. Node diameters are proportional to vote weight, and links become darker the more votes they transfer.

Vote-time loop detection and transitive closure

Previously we have spoken about vote-time loop detection as a mechanism to prevent votes from being lost, since the tally in Agora will discard votes that enter a cycle.

Vote-time loop detection is a mechanism that detects cycles just as they are formed and prevents them from being established. A feedback message informs the voter that the choice is invalid because it would create a loop, and the user must make another selection. How does this detection algorithm work? Essentially in the same way as it does in the tally algorithm, except that it only checks for loops starting from delegation choices just as they are made by a user, on demand.

But this is not the entire picture, for the following reasons:

• Delegation loops do not invariably result in lost votes. If any member of the cycle votes directly, that member will channel all the delegation weight present in the loop. Still, loops are in principle undesirable.
• Instead of allowing the user to select an invalid option (see the previous point) and then requiring them to change it, the system could pre-emptively identify problem delegates before the user selects one.
• Following the point above, the vote-time loop detection algorithm would have to operate differently, as the starting point would not be a specific choice of delegate, but rather all possible choices. Using the tally-time algorithm would be computationally expensive.

This leads us to two different possibilities: pre-emptive vote-time loop detection and reactive vote-time loop detection. Reactive vote-time loop detection corresponds to our earlier notion of vote-time loop detection. When a user selects a delegate, the system runs an algorithm that, based on that selected delegate, detects whether a loop exists. That is, loop detection begins from the path User => Delegate-choice. This algorithm can proceed by walking the graph and keeping a record of already visited nodes.
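A minimal sketch of such a walk follows (plain Python; the `delegations` mapping and the function name are illustrative, not Agora's actual data model):

```python
def creates_loop(delegations, user, chosen_delegate):
    """Return True if the vote user -> chosen_delegate would close a cycle.

    delegations maps each voter to their current delegate; voters who vote
    directly (or have not voted) are simply absent from the mapping.
    """
    current = chosen_delegate
    visited = set()
    while current is not None:
        if current == user:
            return True          # we walked back to the user: a loop
        if current in visited:
            return False         # reached a pre-existing cycle elsewhere
        visited.add(current)
        current = delegations.get(current)
    return False                 # chain ended at a direct (or absent) voter

# Example: A -> B -> C already exists; C choosing A would close a loop.
delegations = {"A": "B", "B": "C"}
print(creates_loop(delegations, "C", "A"))  # True
print(creates_loop(delegations, "C", "D"))  # False
```

The walk visits at most one chain of delegates, so a single reactive check is cheap; the cost only becomes a concern when checking all possible choices for all users, which is what motivates the pre-emptive approach below.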

In pre-emptive vote-time loop detection, the system must calculate, for every user of the application, what choices of delegate would result in a delegation loop.  Thus the path begins at each possible user. Because such a calculation is expensive, it should be done “offline”, prior to the display of the delegate choices. The result of such a calculation is then stored so that it can be looked up to determine problem delegates whenever presenting the delegation vote interface.

So what does this data look like? It must contain all the delegation graph information necessary to carry out loop detection cheaply, for all possible users. In effect, this is a matter of graph reachability, and the structure we are looking for is the transitive closure of the relation defined by all existing delegate votes (that is, the delegation graph). The transitive closure of a relation can take the form of a binary adjacency matrix, where a ‘1’ represents that one vertex is reachable from another, and ‘0’ otherwise.

The transitive closure (TC) of a directed graph can be calculated using Warshall’s algorithm[3], which runs in O(n^3) time. Let’s consider how this could be implemented. First, consider that the transitive closure can change whenever the delegation graph changes, that is, whenever a delegate vote is cast, changed or removed. So a naive approach would be to calculate the TC each time this happens, and store it for use during lookup. Since the TC is an adjacency matrix, it can be easily represented as a database table with two columns.
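A direct implementation of Warshall's algorithm over a boolean adjacency matrix can be sketched as follows:

```python
def transitive_closure(adj):
    """Warshall's algorithm: adj is an n x n boolean adjacency matrix.

    Returns a new matrix where tc[i][j] is True iff j is reachable
    from i in one or more steps. Runs in O(n^3) time.
    """
    n = len(adj)
    tc = [row[:] for row in adj]  # copy; do not mutate the input
    for k in range(n):
        for i in range(n):
            if tc[i][k]:          # i reaches k ...
                for j in range(n):
                    if tc[k][j]:  # ... and k reaches j, so i reaches j
                        tc[i][j] = True
    return tc

# Chain 0 -> 1 -> 2: the closure also contains 0 -> 2.
adj = [[False, True, False],
       [False, False, True],
       [False, False, False]]
tc = transitive_closure(adj)
print(tc[0][2])  # True
```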

Lookup is also very easy, as transitive reachability is encoded directly in the adjacency matrix. To determine whether a delegate D will cause a loop if selected by user U, we simply ask whether U is reachable from D; if it is, the extra path would cause a loop. If stored in an RDBMS table, the SQL to determine whether D is problematic is a simple SELECT against this table,
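Assuming the closure is stored in a table `tc(src, dst)` (the table and column names here are illustrative), the lookup can be sketched with Python's sqlite3:

```python
import sqlite3

# Illustrative schema: one row per reachable (src, dst) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tc (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO tc VALUES (?, ?)", [(2, 1), (3, 2), (3, 1)])

U, D = 1, 2  # user U considering delegate D
rows = conn.execute(
    "SELECT 1 FROM tc WHERE src = ? AND dst = ?", (D, U)
).fetchall()
print("loop" if rows else "ok")  # D reaches U here, so this prints "loop"
```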

which would return 1 row if U is reachable and 0 rows otherwise.

But perhaps a naive approach would be inefficient. For every delegated vote cast, this would entail:

• Running Warshall’s O(n^3) algorithm
• Rewriting the stored TC table

A better solution would be to calculate the transitive closure incrementally, such that only the changed parts of the TC are recalculated. Even better, if the delegation graph is stored in a database, it would be more efficient if the calculation could be done directly in the database, without having to load the votes and then run the algorithm.

These two desirable properties bring us to the paper Maintaining Transitive Closure of Graphs in SQL, where such a method is described. As an added benefit, it is formulated in terms of plain SQL, requiring none of the specialized graph extensions[2] that some databases offer. We can implement this method as two stored procedures[1] that incrementally maintain the TC table.

As an example, we will construct the TC for the following graph

which is reflected in the following SQL session
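The original figure and SQL session are not reproduced here; the following sketch (Python with sqlite3, illustrative names) builds the closure incrementally for one graph consistent with the lookups discussed in the text, with edges 1→5, 5→2, 2→4 and 3→4. It uses the standard insertion rule for the acyclic case: when edge u→v is added, the new closure pairs combine the predecessors of u with the successors of v.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tc (src INTEGER, dst INTEGER)")

def add_edge(u, v):
    """Incrementally maintain tc when edge u -> v is inserted.

    New pairs (a, b) are those where a is u or reaches u, and b is v
    or is reached from v. Valid while the graph stays acyclic, which
    vote-time loop detection guarantees here.
    """
    sources = {u} | {s for (s,) in conn.execute(
        "SELECT src FROM tc WHERE dst = ?", (u,))}
    targets = {v} | {t for (t,) in conn.execute(
        "SELECT dst FROM tc WHERE src = ?", (v,))}
    for a in sources:
        for b in targets:
            if not conn.execute("SELECT 1 FROM tc WHERE src = ? AND dst = ?",
                                (a, b)).fetchone():
                conn.execute("INSERT INTO tc VALUES (?, ?)", (a, b))

# Illustrative delegation edges: 1 -> 5, 5 -> 2, 2 -> 4, 3 -> 4
for u, v in [(1, 5), (5, 2), (2, 4), (3, 4)]:
    add_edge(u, v)

print(sorted(conn.execute("SELECT src, dst FROM tc")))
# [(1, 2), (1, 4), (1, 5), (2, 4), (3, 4), (5, 2), (5, 4)]
```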

This session shows the edge insertions and the resulting transitive closure table. Now imagine that delegate 4 has decided to delegate his vote. What delegates would cause a loop? The answer is a straightforward lookup on the TC table for the rows whose destination is 4, which returns the source values 1, 2, 3, 5, since delegating to any of these would result in a loop.

What delegates would cause a loop for delegate 2? The same lookup with destination 2 returns the values 1, 5.

As can be seen from these examples, detecting possible loops using the TC table is very easy, provided the TC is maintained properly as delegate votes are cast, changed or removed.

[1] Following are two stored procedures based on this method, implemented for MySQL.

Removing an edge
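The stored procedures themselves are not reproduced here. As a rough stand-in for the removal case, the sketch below (plain Python, illustrative names) deletes the edge and recomputes the closure of the remaining edges with a naive fixpoint; the paper's procedure instead repairs only the "suspect" part of the closure and avoids this full recomputation.

```python
def closure(edges):
    """Transitive closure of a set of (a, b) pairs, by naive fixpoint."""
    tc = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(tc):
            for (c, d) in list(tc):
                if b == c and (a, d) not in tc:
                    tc.add((a, d))
                    changed = True
    return tc

def remove_edge(edges, u, v):
    """Remove edge u -> v and return (remaining edges, recomputed closure).

    Simplified stand-in: recomputes from scratch rather than incrementally.
    """
    edges = set(edges) - {(u, v)}
    return edges, closure(edges)

edges = {(1, 5), (5, 2), (2, 4), (3, 4)}
edges, tc = remove_edge(edges, 5, 2)
print(sorted(tc))  # [(1, 5), (2, 4), (3, 4)]
```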

Open proposal elaboration

Democracy is not just about voting, but about the wider problem of making decisions in a community. Good systems for making decisions are more likely to yield decisions that reflect the community’s preferences and exploit its collective knowledge. A decision making system, therefore, takes as input preferences and knowledge and produces as output a decision. Such a decision making system is effectively an information aggregator; high quality decisions are the result of aggregating as much information as possible: no preferences are ignored, and no valuable knowledge is left unused.

This brief description brings us to the distinction between voting and the general case of decision making. In traditional voting, voters are asked to cast a vote which represents a choice between a predefined set of already existing alternatives. The alternatives presented to the voter are fixed prior to the vote; they have been elaborated beforehand by some other agent, independent of the voters. The voters simply choose among the set presented to them.

In the terms presented above, the knowledge and preferences that voters input into the system only serve to discriminate between a predefined set of options. If the process by which the options were elaborated has not already aggregated this information and materialized it into the options themselves, that information could be lost.

In other words, voting is too coarse grained to aggregate the information in all its richness. The grain is determined by the options: if these options do not already represent the best quality decisions, the resulting decision will be suboptimal. The optimal decision implicit in the lost information is unavailable as a choice.

The rise of information technology is very relevant to this problem. It is clear, nominally speaking, that a problem of aggregating information is directly related to information technology. More specifically, the communication possibilities that the modern-day internet affords are precisely those necessary to open up the possibility of aggregating information from hundreds of thousands of sources in a much more refined way than the coarse-grained procedure of voting allows.

The key idea is this: information technology and the internet allow extending voters’ participation from just the voting phase back in time, to include the drafting/elaboration of proposals (what before were merely fixed options). In contrast to the coarse-grained input that voting allows, contributions to a proposal as it is being written are fine grained; modifications to proposals can be arbitrarily small and precise: from adding (or removing) a comma, to a sentence, to a paragraph, to an entire proposal.

We call this opening up of proposal elaboration to the general voter population Open / Distributed / Federated proposal elaboration.

It is open because anyone can contribute. It is distributed/federated because there is no central authority that drafts proposals and imposes them on the voters; proposals are drafted spontaneously, in a distributed fashion. In conclusion, Open/Distributed/Federated proposal elaboration is a decision making medium that need not suffer from the information loss that can affect democracy-as-voting. This strong focus on participation closely relates this approach to ideas such as collaborative governance[1] and participative democracy[2].

There are two sides to the coin, however. Opening up proposal elaboration to the general public opens the gates to more information, but also to too much information. The key, then, is to balance the benefits of the extra possibilities and ideas that this system allows with the extra noise and less useful contributions that can potentially flood the medium. If the signal-to-noise ratio is too low, optimal decisions can be discarded, not because the required information was absent, but because those decisions were buried under noise that makes them impossible to find.

In summary, a three stage model of information aggregation as decision making is proposed. Although we use the term stage, these three opportunities need not occur chronologically; they may occur in any order and repeatedly.

1. The first stage of decision making occurs when a set of individuals sets about collaboratively writing[3] or modifying an existing proposal. Ideally this occurs in real time, with these individuals discussing and debating the changes as they are made. Alternatively, they can exchange views or ideas through more traditional means of interaction, offline. In any case, the proposal is the result of their collaboration and contribution; information in terms of preferences and knowledge is integrated to yield the proposal.
2. The second stage is proposal cross-fertilization[6]. Whereas the focus of aggregation in the first stage was across individuals collaboratively writing a proposal, in this stage the aggregation occurs across proposals. Proposals may be forked, merged, recombined and mutated. Sections from one proposal may be imported/transcluded into another. Although the process is still carried out by individuals, proposal cross-fertilization does not only gather contributions from a subset of users collaboratively making modifications; rather, contributions are drawn from the entire proposal pool.
3. The third stage is proposal selection, which corresponds to our traditional idea of voting. Having a given pool of proposals that have evolved during the other two stages of aggregation, the voting population is asked to vote for one (or many).

Due to the problems of noise mentioned above, it may be necessary to add extra stages that act as filters. For example, a filter could discard proposals that are not supported by a threshold number of participants.

References

[1] http://en.wikipedia.org/wiki/Collaborative_governance

[3] By collaborative proposal editing we mean realtime online editing of a document by several individuals. The most common examples of this today are found in Google Drive and Etherpad.

[6] Tools that support this model are, for example, wikis[7] and version control systems[8]. The idea of cross-fertilization has been suggested in [9].

[7] http://en.wikipedia.org/wiki/Wiki

[8] http://en.wikipedia.org/wiki/Version_control

Liquid democracy: voter similarity and distance function

The set of votes cast in a liquid democracy can be used to define a vector space over voters. Voter similarity can thus be specified in terms of a distance function, or metric, over this space. The question is, how to represent voters as vectors?

The most obvious approach would be to assign one dimension to each election. The content of the vote cast in election n would be the nth component of the vector. So for example, if there are three elections, each with three options (1, 2, 3), and we had the following voters:

John chose 1 in the first election, 1 in the second election and 3 in the third election

Sally chose 2 in the first election, 1 in the second election and 1 in the third election

we would have these column vectors

John = (1, 1, 3)

Sally = (2, 1, 1)

We could then simply choose the Euclidean distance as our metric in this vector space.

So we would have

d(John, Sally) = Sqrt(1 + 0 + 4) = Sqrt(5)

But there’s a problem with this approach. If, for example, John had chosen option 2 in the third election, we would have

d(John, Sally) = Sqrt(1 + 0 + 1) = Sqrt(2)

So we would say that John and Sally are more similar as voters in the second case. But this requires the interpretation that option 2 is “semantically” closer to option 3 than option 1 is. In other words, the encoding we proposed implies similarity between options that are numerically close together. But we do not want this; a priori, options in an election can be arbitrarily close or far apart, and option numbering should impose no “semantic structure” on the options.
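This numbering artifact is easy to demonstrate with a few lines of Python (a minimal sketch; the function name is illustrative):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vote vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

john1, john2 = (1, 1, 3), (1, 1, 2)  # John's votes in the two examples
sally = (2, 1, 1)

print(euclidean(john1, sally))  # sqrt(5), about 2.236
print(euclidean(john2, sally))  # sqrt(2), about 1.414
```

The second distance is smaller only because the labels 2 and 1 happen to be numerically adjacent, not because the underlying choices are any more alike.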

What we need is an encoding where the distance between Sally and John depends only on whether the same or a different option was chosen, not on the numerical distance between the numbers assigned to each option. One way to do this[1] is to encode votes into strings that correspond to the cast vote. For example, choosing option 2 is encoded as the string “2”. Vectors are the concatenation of these individual votes. With this approach, the metric is the Hamming distance:

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. For a fixed length n, the Hamming distance is a metric on the vector space of the words of length n, as it obviously fulfills the conditions of non-negativity, identity of indiscernibles and symmetry, and it can be shown easily by complete induction that it satisfies the triangle inequality as well.

Let’s encode our voters again, in the first example

John = (“1”, “1”, “3”) = (“113”)

Sally = (“2”, “1”, “1”) = (“211”)

d(John, Sally) = 2

In the second example,

John = (“1”, “1”, “2”) = (“112”)

Sally = (“2”, “1”, “1”) = (“211”)

d(John, Sally) = 2

And, as desired, the distances in both examples are the same, 2.
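These worked examples can be checked with a short sketch:

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    assert len(s) == len(t)
    return sum(a != b for a, b in zip(s, t))

print(hamming("113", "211"))  # 2
print(hamming("112", "211"))  # 2
```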

So, what is the point of all this? Defining voter similarity can have many applications, for example

• Collaborative filtering for delegate recommendation (delegate discovery when the number of delegates is potentially very large)
• Activity and notification filtering in a social network voting context (similar to filtering based on proximity in a social graph)
• Anomaly detection (e.g. via k-nearest neighbours) for delegate recall or switching (i.e. alert users when their delegates may not be acting as expected)
• Visualization, using clusters to identify voting blocs

It should be noted, however, that all of this only applies to public voting, where vote information is available to compute similarities.

[1] Of course, other metrics can be defined; the Rajski distance, for example, has been used for this purpose.