**Note: I wrote this piece before the two posts presenting a simple model of learning using Bayesian inference. There is significant overlap, and conclusions are stated without complete explanations.**
I attended an informal talk on climate change recently, after which I had several discussions regarding the scientific process and the foundations of knowledge (in science).

One question was *Is scientific knowledge inductive or deductive?* Well, the scientific method requires deductive inference to establish the logical consequences of a theory in order to make predictions. But the justification of theories, the method by which a theory is temporarily accepted or discarded, is inductive. In the language of Bayes, theory confirmation/invalidation occurs by updating theory posteriors inductively (*P(H|E)*), whereas evidence conditioning on theories (*P(E|H)*) is derived deductively.
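To make this division of labor concrete, here is a minimal sketch (a hypothetical example, not from the discussion itself): each theory fixes its likelihoods *P(E|H)* deductively, and Bayes' rule turns observed evidence into updated posteriors *P(H|E)* inductively.

```python
# Toy Bayesian update: two theories about a coin (hypothetical example).
# The likelihoods P(E|H) follow deductively from each theory's definition;
# the posteriors P(H|E) are obtained inductively via Bayes' rule.

priors = {"fair": 0.5, "biased": 0.5}        # P(H)
likelihood = {"fair": 0.5, "biased": 0.9}    # P(heads | H), deduced from each theory

def update(priors, observed_heads):
    """Return posteriors P(H|E) after observing one coin flip."""
    posts = {}
    for h, p in priors.items():
        p_e_given_h = likelihood[h] if observed_heads else 1 - likelihood[h]
        posts[h] = p * p_e_given_h            # P(H) * P(E|H)
    total = sum(posts.values())               # P(E), the normalizing constant
    return {h: p / total for h, p in posts.items()}

posteriors = priors
for flip in [True, True, True]:               # three heads in a row
    posteriors = update(posteriors, flip)
```

After three heads the posterior shifts sharply toward the biased theory, even though nothing about either theory's predictions was ever deduced from the data itself.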

So, although deduction plays a part in establishing the logical consequences of theories in the form of testable predictions, the nature of the knowledge, or rather, the process by which that knowledge is gained, is fundamentally inductive.

What does this say about the foundations of knowledge in science? If scientific knowledge were deductive, we could simply say that its foundations are axiomatic. We could also talk about incompleteness and other interesting things. But if as we have stated this knowledge is inductive, what are its foundations? Is induction a valid procedure, and what are its endpoints?

This is a very deep subject; trying to go all the way to the bottom is why I have titled this post an epistemological dive. I’m not going to give it a thorough treatment here, but I’ll briefly state my position and what I argued in discussion that day.

The way I see it, the foundations of scientific knowledge are the postulates of probability theory (as derived, for example, by Bernardo-Smith or Cox) together with **Occam’s razor**. In fact, given that most people are aware of probability theory, I would say that the best single answer to the foundation of knowledge, in the sense that it is the one we are less aware of, is Occam’s razor. I will give a brief example of this, borrowed from a talk by Shane Legg on machine super intelligence.

Let’s consider a minimal example of a scientific process. An agent is placed in an environment and must form theories whose predictions correctly match the agent’s observations. Although minimal, this description accounts for the fundamental elements of science. There is one missing element, and that is a specification of how the agent forms theories, but for now we will use our own intuition, as if we were the agent.

For this minimal example we will say that the agent observes a sequence of numbers which its environment produces. Thus, the agent’s observations are the sequence, and it must form a theory which correctly describes past observations and predicts future ones. Let’s imagine this is what happens as time goes forward, beginning with

**1**

For the moment there is only one data point, so it seems impossible to form a theory in a principled way.

**1,3**

Among others, two theories could be proposed here: odd numbers, and powers of 3, with corresponding predictions of **5** and **9**:

*f(n) = 2n – 1*

*f(n) = 3^(n-1)*

the observations continue:

**1,3,5**

The powers of three theory is ruled out by its incorrect prediction of 9, while the odd number theory’s prediction of 5 was correct.

**1,3,5,7**

The odd number theory has described all observations and made correct predictions. At this point our agent would be pretty confident that the next observation will be **9**.

**1,3,5,7,57**

What?! That really threw the agent off; it was very confident that the next item would be **9**, but it turned out to be **57**. As the builder of this small universe, I’ll let you know the correct theory, call it *theory_57*:

*f(n) = 2n – 1 + 2(n-1)(n-2)(n-3)(n-4)*

which, if you check, correctly describes all the numbers in the sequence of observations. If the 5th observation had instead been **9**, the odd number theory would have been correct again, and we would have stayed with it. So depending on this 5th observation:

9 => *f(n) = 2n-1*

57 => *f(n) = 2n – 1 + 2(n-1)(n-2)(n-3)(n-4)*

Although we only list two items, the list is actually infinite because there are an infinite number of theories that correctly predict the observations up until the 4th result. In fact, and here is the key, *there are an infinite number of theories that correctly predict any number of observations*! But let us restrict the discussion to only the two seen above.
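The claim that both theories agree up to the 4th observation, and diverge only at the 5th, can be checked mechanically (a small sketch using the two formulas above):

```python
# Check the two theories from the text against the observed sequence.

def odd(n):
    """Odd number theory: f(n) = 2n - 1."""
    return 2 * n - 1

def theory_57(n):
    """theory_57: f(n) = 2n - 1 + 2(n-1)(n-2)(n-3)(n-4)."""
    return 2 * n - 1 + 2 * (n - 1) * (n - 2) * (n - 3) * (n - 4)

observations = [1, 3, 5, 7, 57]

# Both theories agree on the first four observations, because the extra
# product term in theory_57 vanishes for n = 1..4...
assert [odd(n) for n in range(1, 5)] == [theory_57(n) for n in range(1, 5)]
# ...but only theory_57 accounts for the fifth.
assert [theory_57(n) for n in range(1, 6)] == observations
assert odd(5) == 9  # the odd number theory predicted 9 instead
```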

What our intuition tells us is that no reasonable agent would believe in *theory_57* after the fourth observation, even though it is just as compatible with the data as the odd number theory. Our intuition strongly asserts that the odd number theory is the correct theory for that data. But how can we justify that on the basis of induction, if they make the same predictions (i.e., they have the same *P(E|H)*)?

The key is that our intuition, in fact our intelligence in general, has a built-in *simplicity bias*. We strongly favor the odd number theory because it is the *simplest* theory that fits the facts. Hence induction, including our everyday intuitions and the scientific method, is founded upon Occam’s razor as a way to discriminate between equally supported theories.

Without this, or some more specific bias (in machine learning we would call this an inductive bias), induction would be impossible, as there would be too many theories to entertain. *Occam’s razor is the most generally applicable bias; it is a prerequisite for any kind of induction, in science or anywhere else*.
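The simplicity bias can be made quantitative in the Bayesian language used earlier (a hypothetical sketch: complexity is crudely measured by the character length of each formula, and the Occam prior assigns *P(H) ∝ 2^(-complexity)*):

```python
# Occam-weighted comparison of two theories that fit the data equally well.
# Complexity is crudely measured by formula length (an illustrative choice).

theories = {
    "odd":       (lambda n: 2 * n - 1, "2n-1"),
    "theory_57": (lambda n: 2*n - 1 + 2*(n-1)*(n-2)*(n-3)*(n-4),
                  "2n-1+2(n-1)(n-2)(n-3)(n-4)"),
}

observations = [1, 3, 5, 7]

# Occam prior: P(H) proportional to 2^(-complexity).
weights = {name: 2.0 ** -len(formula) for name, (f, formula) in theories.items()}

# Likelihood: P(E|H) = 1 if the theory reproduces the data exactly, else 0.
# Both theories fit all four observations, so the data cannot separate them.
for name, (f, formula) in theories.items():
    fits = all(f(n) == obs for n, obs in enumerate(observations, start=1))
    weights[name] *= 1.0 if fits else 0.0

total = sum(weights.values())
posteriors = {name: w / total for name, w in weights.items()}
```

Since both theories have the same *P(E|H)* on the first four observations, the posterior is decided entirely by the prior, and the shorter odd number theory dominates, which is exactly the role Occam’s razor plays in discriminating between equally supported theories.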