The most incomprehensible thing about the universe is that it is comprehensible

It’s a quote by Albert Einstein, which is where we left off last time. Comprehensible here means, for example, mathematically intelligible, regular or lawful. These are different ways of saying that it is possible to arrive at descriptions of the world that allow us to understand it and make predictions. Einstein’s point was that there is no particular reason to expect the universe to be the way it is, i.e. following elegant mathematical laws. It could just as well have been a chaotic mess, impossible to make sense of.

Regularity in the Sierpinski triangle

It’s hard to tell whether it’s even meaningful to speak of the way the universe could have been without speaking of how the universe and its characteristics arise. Indeed, one of the deepest questions in physics is, why does the universe have the laws it has? (Second only to why is there something rather than nothing?)

But imagine for the moment that the universe were in fact a messy chaos. One thing seems clear: that kind of universe would not contain life, because life is one of the most obvious examples of order and regularity (or if you like, life requires order and regularity to exist), and intelligent life is precisely the kind of life that requires the most order.

The point is that our very existence screens off the possibility of a non-regular universe. It is impossible for us to observe anything different, because we would not have existed under those circumstances. This point is known as the anthropic principle. Does it answer the question? Not really; the anthropic principle has been labeled as unscientific and metaphysical by critics. You have to be careful not to take the point too far. In this case I’m just saying that life implies a selection effect on the universe it inhabits.

But again, that does not answer the question. However, if we additionally postulate that there isn’t one universe, but many, the situation makes some sense:

Alice: Why is the universe comprehensible?

Bob: The thing is, there isn’t just one, there are many, so it turns out that some of them are comprehensible, just like in a lottery someone must end up winning.

Alice: But what about the coincidence that we landed precisely on a comprehensible one?

Bob: That’s not a coincidence, our very existence implies that the universe we are in must be orderly. We couldn’t have landed in any other one.

Alice: So it’s a combination of those two things that answers the question; the anthropic principle alone is not enough.

Bob: Yes.

Although in fact the question is still not answered because we had to postulate the existence of many universes, and we could in turn ask ourselves why that is the case. Oh well.

The unreasonable effectiveness of mathematics

Previously I mentioned a famous paper by physicist Eugene Wigner titled The Unreasonable Effectiveness of Mathematics in the Natural Sciences, where, citing Wikipedia:

Wigner observed that the mathematical structure of a physics theory often points the way to further advances in that theory and even to empirical predictions, and argued that this is not just a coincidence and therefore must reflect some larger and deeper truth about both mathematics and physics.

So how is it that a product of human thought, abstract and apparently unrelated to experience, is so surprisingly successful at describing the physical world? It seems remarkable to the point of being unreasonable.

Armed with our understanding of what we mean by coincidences and how they are explained, let’s turn to this problem. Recall that many times explanations of coincidences are nothing other than the establishment of causal links between their two elements, such that one follows from the other in an unsurprising way.

So automatically we have a recipe to try to tackle the above. There are two initial ways to proceed: either mathematics causes the world to be mathematical, or the world causes mathematics to be the way it is. Each option has to be developed to make sense.

The first case can be interpreted as stating that our cognitive apparatus, having mathematics embedded in it, causes us to interpret the world accordingly. A fair analogy would be that if you put on a pair of red glasses the world will seem red to you. Not because the world is red, but because the way you look at it makes it appear that way. So if our brains are somehow inherently mathematical, they will both produce mathematics as an a-priori field of study and interpret the physical world mathematically.

But of course, our brains turned out the way they are for some reason. Which brings us to our second possibility, that the mathematical nature of the world caused our brains to develop accordingly. After all, a brain that is not tuned to understand the environment is not much good; evolution did the tuning.

And we can have a combination of both mechanisms, whereby evolution tunes the brain, and the brain in turn selectively interprets the world according to its nature. The causal link goes both ways, although it starts in one direction.

If we briefly look at the Wikipedia article, we see that the two solutions presented here correspond to solutions 1 and 4 proposed by Richard Hamming. Not bad for our simple method!

But this is not the end of the story. If in fact the world is mathematical, or at least mathematically intelligible, and brains evolved to make sense of it, then that leaves another question. Why is the world mathematical in the first place? This is echoed in Einstein’s quote:

The most incomprehensible thing about the universe is that it is comprehensible.

I’ll leave this for my next post.

Coincidences and explanations

I was reading about a famous article by physicist Eugene Wigner titled The Unreasonable Effectiveness of Mathematics in the Natural Sciences, where, citing Wikipedia:

In the paper, Wigner observed that the mathematical structure of a physics theory often points the way to further advances in that theory and even to empirical predictions, and argued that this is not just a coincidence and therefore must reflect some larger and deeper truth about both mathematics and physics.

I’ll write about this in a later post, but for now this brings me to consider what we mean by coincidences and how we think about them.

In the above, a coincidence is remarked between two apparently independent domains, that of mathematics, and that of the structure of the world. In general, when finding striking coincidences our instinct is to reach for an explanation. Why? Because by definition a striking coincidence is basically something of a-priori very low probability, something implausible that merits investigation to “make sense of things”.

An explanation of a coincidence is a restatement of its content that raises its probability to a level such that it is no longer a striking state of affairs; the coincidence is dissolved. Example:

Bob: Have you noticed that every time the sun rises the rooster crows? What an extraordinary coincidence!

Alice: Don’t be silly, Bob, that’s not a coincidence at all; the rooster crows when it sees the sun rise. Nothing special.

Bob: Erm… true. And why did David choose me to play the part of fool in this dialogue?

Alice’s everyday response to coincidence is at heart nothing other than statistical inference, be it Bayesian or classical hypothesis testing[1]. The coincidence at face value plays the role of a hypothesis (the null hypothesis) that assigns a low probability to the event, i.e. the hypothesis of a chance occurrence between two seemingly independent things. The explanation in turn plays the role of the accepted hypothesis by virtue of assigning a high probability to what is observed.

So one could say that the way we respond and deal with coincidence is really a mundane form of how science works, where theories are presented in response to facts, and those that better fit those facts are accepted as explanations of the world.
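Alice’s inference can be sketched numerically. The following is a minimal Python sketch of the Bayesian version, not a definitive model: the prior of 0.01 for the causal explanation, the 30 observed mornings, and the per-morning probabilities are all numbers assumed purely for illustration.

```python
def posterior_explanation(prior_explanation, p_data_given_explanation, p_data_given_chance):
    """Posterior probability of the explanation, via Bayes' rule over two hypotheses."""
    prior_chance = 1.0 - prior_explanation
    evidence = (prior_explanation * p_data_given_explanation
                + prior_chance * p_data_given_chance)
    return prior_explanation * p_data_given_explanation / evidence


# Assumed data: the rooster crowed at sunrise on 30 mornings in a row.
n_mornings = 30

# "Pure coincidence" hypothesis (assumption): an unrelated rooster happens to
# crow at sunrise with probability 0.1 on any given morning.
p_chance = 0.1 ** n_mornings

# "Causal" hypothesis (assumption): a rooster that reacts to the sun crows
# at sunrise with probability 0.95 on any given morning.
p_causal = 0.95 ** n_mornings

# Even starting from a skeptical prior, the explanation ends up overwhelmingly favored.
post = posterior_explanation(0.01, p_causal, p_chance)
print(post)
```

The coincidence hypothesis assigns the observations a vanishingly small probability, so even a small prior for the explanation is enough for it to dominate the posterior.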

But how do explanations work internally? The content of an explanation is the establishment of a relationship between the two a-priori independent facts, typically through causal mechanisms. The causal link is what raises the probability of one given the other, and therefore of the joint event. In the example, the causal link is ‘the rooster crows when it sees the sun rise’.

But the links are not always direct. An interesting example comes from what in statistics is called a spurious relationship. Again, Wikipedia says:

An example of a spurious relationship can be illuminated examining a city’s ice cream sales. These sales are highest when the rate of drownings in city swimming pools is highest. To allege that ice cream sales cause drowning, or vice-versa, would be to imply a spurious relationship between the two. In reality, a heat wave may have caused both.

Although the emphasis there is on the lack of a direct causal relationship, the point regarding coincidence is the same. Prior to realizing that both facts have a common cause (the explanation is the heat wave), one would have regarded the relationship between ice cream sales and drownings as a strange coincidence.
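The common-cause structure can be checked with a small exact calculation. This is only a sketch: the probability table below (how likely a heat wave is, and how likely high ice cream sales and a drowning are with and without one) is invented for illustration, and the two effects are assumed to be independent once the heat wave is known.

```python
from itertools import product

# Assumed probabilities, for illustration only.
P_HEAT = 0.3                        # P(heat wave)
P_ICE = {True: 0.8, False: 0.2}     # P(high ice cream sales | heat wave?)
P_DROWN = {True: 0.6, False: 0.1}   # P(drowning reported | heat wave?)


def joint(heat, ice, drown):
    """Joint probability under the common-cause model: heat -> ice, heat -> drown."""
    p_h = P_HEAT if heat else 1 - P_HEAT
    p_i = P_ICE[heat] if ice else 1 - P_ICE[heat]
    p_d = P_DROWN[heat] if drown else 1 - P_DROWN[heat]
    return p_h * p_i * p_d


def prob(event):
    """Probability of an event, summing the joint over all eight outcomes."""
    return sum(joint(h, i, d)
               for h, i, d in product([True, False], repeat=3)
               if event(h, i, d))


p_drown = prob(lambda h, i, d: d)
p_drown_given_ice = prob(lambda h, i, d: d and i) / prob(lambda h, i, d: i)
p_drown_given_ice_and_heat = (prob(lambda h, i, d: d and i and h)
                              / prob(lambda h, i, d: i and h))

print(p_drown, p_drown_given_ice, p_drown_given_ice_and_heat)
```

Seeing high ice cream sales raises the probability of a drowning, which is the apparent coincidence. Once the heat wave is known, the sales add no further information: the probability of a drowning is just the one the heat wave assigns.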

In the extreme case the explanation reveals that the two facts are really two facets of the same thing. The coincidence is dissolved: any given fact must necessarily coincide with itself. Before the universal law of gravitation, it would have been regarded as extraordinary that apples falling from a tree and planets moving across the heavens followed the same behavior. But we know now that they are really different aspects of the same phenomenon.


Notes

[1] The act of explanation is, in classical statistics language, the act of rejecting the null hypothesis. In the Bayesian picture, the explanation is what is probabilistically inferred due to the higher likelihood it assigns to the facts (and its sufficient prior probability).

Universal intelligence and biased generality

When Shane Legg posted about AIQ recently I asked him to comment on Hibbard’s paper[1] where an objection is made about the apparently counter-intuitive consequences of the universal intelligence measure[2]. Hibbard notes that

Given arbitrarily small ε > 0, total credit for all but a finite number of environments is less than ε. That is, total credit for all environments greater than some level C of complexity is less than ε, whereas credit for a single simple environment will be much greater than ε. This is not the way we judge human intelligence.

In other words, an agent that succeeds in a number of very complex environments is considered less intelligent than another agent that succeeds in one very simple one, and that seems wrong.

But the important thing about the universal intelligence measure is that it’s a matter of expectation, of capacity. It is not obtained by just totaling the number of environments the agent is successful at, but rather by the number of environments weighted by the expectation that they occur (according to a Solomonoff prior). So the measure is really an indicator of the expected degree of success of an agent if all we know about the environments it may find itself in is that they are distributed according to the simplicity prior.

This ingredient of expectation, together with the assumed prior, is what necessitates a simplicity-biased generality to obtain a high score.
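The weighting can be made concrete with a toy calculation. In the universal intelligence paper[2] the measure is a sum over environments of the agent’s expected value in each one, weighted by 2 to the power of minus the environment’s Kolmogorov complexity. Kolmogorov complexity is not computable, so the sketch below simply assumes a complexity in bits for each environment, assumes one environment per complexity level, and treats success as a value of 1.0. These are illustrative simplifications, not the actual measure.

```python
def universal_score(values_by_complexity):
    """Toy simplicity-weighted score.

    values_by_complexity maps an assumed complexity (in bits) to the agent's
    value in [0, 1] for the single toy environment at that complexity.
    Each environment is weighted by 2 ** -complexity.
    """
    return sum(2.0 ** -k * v for k, v in values_by_complexity.items())


# Agent A (hypothetical): succeeds only in one very simple environment.
simple_specialist = {2: 1.0}

# Agent B (hypothetical): succeeds only in ten complex environments,
# with assumed complexities of 20 to 29 bits.
complex_specialist = {k: 1.0 for k in range(20, 30)}

score_simple = universal_score(simple_specialist)
score_complex = universal_score(complex_specialist)

print(score_simple)   # 0.25
print(score_complex)  # below 2 ** -19, however many complex environments are added
```

The weights of the complex environments form a geometric tail, so their total credit stays tiny no matter how many of them the agent masters. This is the behavior Hibbard objects to, and also what the expectation reading above predicts.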

Legg suggests that the source of the dissonance Hibbard remarks on is a hidden assumption present in our everyday intuition about intelligence. Given that in practice we measure the intelligence of generally intelligent agents (people), that is, agents that succeed in environments by virtue of generally intelligent ability rather than through narrow domain-specific ability, we take success at a complex task as strong evidence of the ability to succeed at less demanding ones. This hidden assumption turns out to be inapplicable in the case of the two agents described, and hence our intuition goes wrong.

Note how if we analyze the example above in terms of what cannot be done (i.e., the simplest environment the agent fails at), the dissonance seems to go away: it is intuitively acceptable that an agent that fails at a trivial task is stupid. This is just inverting the “less-than-or-equal-to assumption” above. In this case, the assumption that an agent cannot succeed at tasks that are more difficult than one it has failed at is again wrong, but yields results that better correspond with the universal measure.

Following a different route, Hibbard proposes (if I understood his paper correctly) a measure that screens out non-general agents by requiring them to succeed not at a single environment, but rather at a whole class of environments parametrized by complexity[3], Ef(m):

The intelligence of an agent p is measured as the greatest m such that p learns to predict all e ∈ Ef(m)

Presumably this prevents one-trick-pony agents from reaching intelligence scores that would seem unintuitive; these scores are not possible because success is necessary in all the environments of the class. I interpret that the argument can be extended to conclude that if an agent succeeds at a class of given complexity, it is guaranteed that it will also succeed at complexities below that, although I have not seen this explicitly stated in the paper.
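Here is a toy sketch of how I read this kind of measure. It is not Hibbard’s construction: the environments are just labels, the classes are hand-written and assumed to be nested (each level contains the previous one), the function name is hypothetical, and returning 0 when no class is passed is a convention chosen for the example.

```python
def hierarchy_intelligence(succeeds, environment_classes):
    """Greatest level m such that the agent succeeds at every environment in class m.

    succeeds: function taking an environment label and returning True or False.
    environment_classes: dict mapping level m to the list of environments in that class.
    Returns 0 if the agent passes no class at all (assumed convention).
    """
    passed = [m for m, envs in environment_classes.items()
              if all(succeeds(e) for e in envs)]
    return max(passed, default=0)


# Assumed nested classes: higher levels add more complex environments.
classes = {
    1: ["a", "b"],
    2: ["a", "b", "c"],
    3: ["a", "b", "c", "d"],
    4: ["a", "b", "c", "d", "e", "f"],
}

# One-trick pony (hypothetical): succeeds only at the single complex environment "f".
pony_score = hierarchy_intelligence(lambda e: e == "f", classes)

# General agent (hypothetical): succeeds at everything up to "d".
general_score = hierarchy_intelligence(lambda e: e in {"a", "b", "c", "d"}, classes)

print(pony_score, general_score)  # 0 3
```

Because every environment in a class must be handled, success at one hard environment earns nothing on its own. With nested classes like these, passing a level also implies passing all the levels below it.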

 


References/Notes

[1] Measuring Agent Intelligence via Hierarchies of Environments

[2] Universal Intelligence: A Definition of Machine Intelligence

[3] Where complexity is given by the number of steps of computation that define the evolution of the environment