## Simplicity and meta-theoretic induction

During my discussion of induction and occams razor, I have said

The key is that our intuition, in fact our intelligence in general, has a built-in simplicity bias. We strongly favor the odd number theory because it is the simplest theory that fits the facts. Hence induction, including our everyday intuitions and the scientific method, is founded upon Occam’s razor as a way to discriminate between equally supported theories.

Note that I have described Occam as a simplicity bias. I deliberately chose this word in order to convey that simplicity is a guideline that we use prior and independent of experience. In the language of probability, Occam takes the form of a prior probability distribution that favors simpler theories before any updating has occured, and is unaffected by posterior evidence.

This state of affairs does not seem satisfactory; as described above, Occam seems like an unjustified and arbitrary principle, in effect, an unsupported bias. Surely, there should be some way to anchor this widely applicable principle on something other than arbitrary choice[1]. The obvious course would be to say something like this:

Simpler theories are more likely to be true because they have been so in the past

thus grounding the principle on experience, just like any other case of inductive inference and scientific knowledge. But let’s go back to the source of the problem that this principle is trying to fix. We have two theories, S and C, which make identical, correct predictions with respect to all observations made to date. These two theories only differ in their future predictions.

And yet, in practice, we consider the predictions made by S much more likely than those made by C. Because by definition these two theories share likelihoods for observed evidence, it is only through their priors that we can assign them different probabilities. Here’s where the simplicity principle comes in. We favour theory S because it is simple, hence granting it a greater prior probability and consequently a greater posterior despite its shared likelihood with C. When asking ourselves how we justify the simplcity principle, we answer

Because simple theories have been true in the past.

So the simplicity principle acts like a meta-theory and can accrue probability through experience just like any other theory. Until now, everything seems to work, but here’s the problem. Let’s say we have two guiding principles:

Occam: Simpler theories are more likely to be true

Occam(t): Simpler theories are more likely to be true until time t

Whatever the mechanism by which the simplicity meta-theory accumulates posterior probability, so shall its peculiar brother, and in the same exact amount. When going back to our two theories, S and C, Occam will favour S while Occam(t) will favour C. Because both Occam and Occam(t) are supported by the same amount of evidence, equal priors will be assigned to S and C. The only way out of this is for Occam and Occam(t) to have different priors themselves. But this leaves us back where we started!

So in conclusion, if we try to solve the problem with

Simpler theories are more likely to be true because they have been so in the past

we are just recasting the original problem at the meta level, we end up begging the question[2] or in an infinite regress. Which should not really come as a surprise, there is no way to justify knowledge absolutely, there must be unjustified assumptions somewhere. In my point of view, Occam is such a bedrock

The way I see it, the foundations of scientific knowledge are the postulates of probability theory (as derived for example by Bernardo-Smith or Cox) together with Occam’s razor.

Please see my next post where I formalize meta-theoretic induction

[1] In essence, this is the problem of induction, which dates back to Hume who first posed it in its original form.

[2] The future is like the past because in the past, the future was like the past