I came accross this post that illustrates bayesian updating

One can see how the probability of success (or any binary variable) in a bernoulli process is updated as more and more outcomes are observed.

Skip to content
# Month: November 2012

## Visualizing bayesian updating

## Liquid democracy: voter similarity and distance function

## Distance to the truth

## Democracy, proposal elaboration and voting

## Type classes as a decoupling mechanism

I came accross this post that illustrates bayesian updating

One can see how the probability of success (or any binary variable) in a bernoulli process is updated as more and more outcomes are observed.

The set of votes cast in a liquid democracy can be used to define a vector space over voters. Voter similarity can thus be specified in terms of a distance function, or metric, over this space. The question is, how to represent voters as vectors?

The most obvious approach would be to assign one dimension to each election. The content of the vote cast in election *n* would be the *n*th component of the vector. So for example, if there are two elections, each with three options (1,2,3), and we had the following voters:

John chose 1 in the first election, 1 in the second election and 3 in the third election

Sally chose 2 in the first election, 1 in the second election and 1 in the third election

we would have these column vectors

John = (1, 1, 3)

Sally = (2, 1, 1)

We could them simply choose the euclidean distance as our metric in this vector space (wikpedia)

So we would have

d(John, Sally) = Sqrt( 1 + 0 + 2) = Sqrt(3)

But there’s a problem with this approach. If, for example, John had chosen option 2 in the third election, we would have

d(John, Sally) = Sqrt(1 + 0 + 1) = Sqrt(2)

So we would say that John and Sally would be more similar as voters in the second case. But this requires the interpretation that option 2 is “semantically” closer to option 3 than option 1. In other words, the encoding we proposed implies similarity between options that are numerically close together. But we do not want this; a priori, options in an election can be arbitrarily close or far apart, option numbering should impose no “semantic structure” on the options.

What we need is an encoding where the distance between Sally and John only depends on whether the same or different option was chosen, not on the numerical distance between the numbers assigned to each option. One way to do this[1] is to encode votes into *string sequences* that correspond to the cast vote. For example, choosing option 2 is encoded as *the string “2”*. Vectors are the concatenation of these individual vote. With this approach, the metric is the Hamming distance

In information theory, the **Hamming distance** between two strings of equal length is the number of positions at which the corresponding symbols are different. For a fixed length *n*, the Hamming distance is a metric on the vector space of the words of length n, as it obviously fulfills the conditions of non-negativity, identity of indiscernibles and symmetry, and it can be shown easily by complete induction that it satisfies the triangle inequality as well.

Let’s encode our voters again, in the first example

John = (“1”, “1”, “3”) = (“113”)

Sally = (“2”, “1”, “1”) = (“211”)

d(John, Sally) = 2

In the second example,

John = (“1”, “1”, “2”) = (“112”)

Sally = (“2”, “1”, “1”) = (“211”)

d(John, Sally) = 2

And, as desired, the distances in both examples are the same, 2.

So, what is the point of all this? Defining voter similarity can have many applications, for example

- Collaborative filtering for delegate recommendation (delegate discovery when the number of delegates is potentially very large)
- Activity and notification filtering in a social network voting context (similar to filtering based on proximity in a social graph)
- Anomaly detection (e.g. via k-nearest neighbours) for delegate recall or switching (i.e. alert users when their delegates may not be acting as expected)
- Visualization, using clusters to identify voting blocs

It should be noted, however, that all of this only applies to public voting, where vote informaton is available to compute similarites.

[1] Of course, other metrics can be defined. For example the Rajski distance is used here

Here’s my brief proposal in response to the debate found here regarding verisimilitude.

Theories are probability distributions over observations, and convertible to probability distributions over possible worlds. This way to describe theories is richer than a compatibility relation between theories and worlds. Theories are not just compatible or incompatible with possible worlds, they assign probabilities to them.

The notion of truth, in the sense of the *complete* truth, can be described by a probability distribution as well. In a scenaro with no indeterminacy, the true theory, call it **T**, is a degenerate case of a probability distribution: it assigns probability 1 to the actual world, and zero elsewhere. In a scenario with indeterminacy, the true theory assigns probabilities to possible worlds; this is similar to the indeterminacy present in some interpretations of quantum mechanics.

Once we have established that theories, including the true theory **T**, are probability distributions, then distance to the truth is a matter of choosing (somewhat arbitrarily) a metric on probability distributions. We can choose, for example, the Jensen–Shannon divergence, because it has the nice property of always being finite (images from wikipedia)

where

and D is the Kullback–Leibler divergence. So the distance to the truth for a theory H is

D(H) = JSD(**T**, H)

where **T**, the true theory, is given. Of course, it is more interesting to consider what we *think* is the distance to the truth, as we don’t have magical access to what the truth really is. Our estimation of what the truth is can be obtained via Bayesian inference using experimental evidence. So we could define what we think is the truth as

T = argmax(H) P(H|E)

where H are theories, and E is observed evidence. Then we would estimate the distance to the truth as the distance to the theory we think is most likely (given by argmax)

D(H, E) = JSD(T, H)

where

T = argmax(H) P(H|E)

But there is another possibility. In the above formula, we are not using all our experimental evidence. Some of the information is thrown away by taking only the most likely theory, and ignoring the rest of the probability distribution over theories that evidence establishes. Remember that what we want is to compare probability distributions over worlds. In order to integrate all the information that evidence provides, we can compare our theory against the predictive distribution over worlds that all theories contribute to, not just the most likely one. We define the predictive distribution over worlds as

Pd(W) = ∑ P(W|H)P(H|E)

where the sum ∑ is over theories H. Finally, our new estimate of distance to the truth becomes

D(H, E) = JSD(Pd, H)

where

Pd = ∑ P(W|H)P(H|E)

The first thing we associate with democracy is voting and elections. But the democracy-as-voting picture is limited. When voting in a referendum, for example, we choose over a *predefined* set of options. How and which options are included is not subjected to the vote; voting in itself says nothing of this.

So the question is, can the decision process be extended backwards to determine the option set? We could have a “pre-vote”, where we choose over a previous, wider set of predefined options to determine which of those end up in the final vote. But that just repeates the problem at an earlier stage.

The solution is to *do away with the idea of a predefined set of options altogether*, to allow the voters themselves to suggest any number of options in a distributed fashion. In this way, people not only contribute knowledge by selecting from a set of options, but more thoroughly by creating the options themselves, in an unrestricted way. This poses aditional challenges of course, because choosing an option out of a list is simpler than creating said options.

We can call this opening of proposal elaboration to all participants **collaborative proposal elaboration** (see also collaborative government). Just as voting aggregates information in the shape of individual votes, collaborative proposal elaboration aggregates information in the shape of free form contributions. Because proposals are normally described with text, these contributions take the form of editing[1]; our predefined options are thus generalized to arbitrary text documents[2].

This presents a wider view of democracy, where decision making is not just limited to voting over proposals, but includes elaborating the proposals themselves. Let’s review the process. First, we have some matter we wish to make a decision about. Then, people present and elaborate proposals in an unrestricted and distributed way. Finally, people vote on any of these freely created proposals to select one (or many).

But let’s repeat the same pattern of thinking we began with: who decides what matters are to be decided upon? Who creates the context in which our decision making process occurs? If it is up to some central authority to pose matters on which to decide, we are again limiting the scope of the decision process (although it is less limited than the case of predefined options)

The solution is the same as above: anyone can propose matters on which to decide. And the definition of “the matter” is itself a text document that is voted upon. With this final piece, we have extended our decision process to include all the stages[3]. From proposing to decide on some matter, to the elaboration of the possible proposals to resolve the matter, to the selection of the proposal that is finally accepted.

[1] Ideally, this collaborative editing of text occurs in a medium that facilitates communication and debate. Debate can be an information aggregation process for voting as well as for policy elaboration. In the latter case, there are probably specific mechanisms that aid debate in the context of editing text, such as those seen here.

[2] In contrast to predefined options, arbitrary text documents represent a much wider solution space. This added flexibility allows more precise and earlier information aggregation as well as presenting opportunities for consensus during collaborative (and realtime) editing of text.

[3] Although the creation of the voting space itself, that is, the set of people that can participate in some decision making domain, is left specified. In one scenario, voting spaces could be created by anyone, and permission to join a voting space could be granted following a vote put to all existing members. An unrestricted approach to creating any number of these voting spaces would support spontaneous community formation.

Say you have a class *T* and a method *m*. *m *represents an operation that is applicable to many types, including *T. *In good OO practice, the only information *m *needs about the types it operates on is precisely that part of the types’ nature that makes *m* applicable to them. So you create an abstraction representing that nature, and define *m* according to it. In turn, classes like *T* conform to that abstraction.

If the abstraction is an interface, we implement that interface. If the abstraction is a class, we inherit that class. Both of these mechanisms will work, they are a standard solution to the problem. But even though the use of an abstraction reduces the coupling between *T* and *m*, it still exists. If you want an existing class to be usable with *m* you have to modify its type and implementation. That class would now be aware of *m.*

Type classes are an alternate abstraction used to define *m *without altering the types that *m* operates on. Instead of specifying that *T* must have some type, the type class mechanism merely states that there must exist some code somewhere that *m* can rely on in order to operate on *T*. This code, which does not necessarily exist at *T*, is the abstraction that m depends on, but that *T* need not be aware of.

The classic example would be sorting objects, where a sorting method requires some way to compare the objects it must sort. Using the standard solution, you would create an abstraction like Comparable, and make m require the objects it sorts to have that type. In Java we have

public static <T extends Comparable<? super T>> void sort(List<T> list)

In the type class approach, however, the abstraction is separate from the type of the objects to be sorted. Instead, it requires that some code, a *Comparator*, be accessible to the sorting method. In Java,

public static <T> void sort(List<T> list, Comparator<? super T> c)

or in a Scala example, where the type class is *Ordering*

def sorted[B >: A](implicit ord: Ordering[B]): List[A]

I’ve shown examples of the type class mechanism for Java and Scala[1]. What makes type classes even more effective in Scala is that the *Ordering[B]* parameter need not be passed explicitly. This remedies the two problems that the java type class solution has. First, users of the sort operation should not have to worry about passing the *Comparator*, it is an implementation detail. Second, in Scala it is possible to vary the *Ordering* implementation to be used just by bringing in a different *Ordering* object into scope, without having to modify any of the call sites.

[1] It could be said that the type class approach by definition requires transparent passing of type class instances, so the Java code would not really qualify as an example