In this post I formalize the discussion presented here. Recall the statement:

Simpler theories are more likely to be true because they have been so in the past

We want to formalize this statement into something that integrates into a Bayesian scheme, so that the usual inference process of updating probabilities with evidence still works. The first element we want to introduce into our model is the notion of a **meta-theory**. A meta-theory is a statement about theories, just as a theory is a statement about observations (or about the world, if you prefer realist language).

As a first approximation, we could formalize meta-theories as priors over theories. In this way a meta-theory prior, together with observations, would yield probabilities for theories through the usual updating process. This formalization is technically trivial: we just relabel priors over theories as meta-theories. But this approach does not account for the second half of the original statement:

…because they have been so in the past.

As pure priors, meta-theories would never be the object of justification. We need a way to represent a meta-theory such that it favours some theories over others *and* such that it can be *justified through observations*. In order to integrate with normal theories, meta-theories must accumulate probability via conditioning on observations, just as normal theories do.

We cannot depend on, or add, spurious observations like “this theory was right” as a naive mechanism for updating; this would split the meta level from the theory level. Evidence like “this theory was right” must be embedded in existing observations, not duplicated somewhere else as a stand-alone, ad hoc ingredient.

Finally, the notion of meta-theory introduces another concept, that of distinct theory **domains**. This concept is necessary because it is through cross-theory performance that a meta-theoretical principle can emerge; no generalization or principle would even be possible if there were not different theories to begin with. Because different theories may belong to different domains, meta-theoretic induction must account for logical dependencies across distinct domains; each theory makes explicit predictions only about its own domain.

Summing up:

Our model will consist of observations/evidence, theories and meta-theories. Theories and corresponding observations are divided into different domains; meta-theories are theories about theories, and capture inter-theoretic dependencies (see below). Meta-theories do not make explicit predictions.

Let’s begin by introducing terms:

*E_{n}*: an element of evidence for domain *n* [1]

*H_{n}*: a theory over domain *n*

*M*: a meta-theory

Observations that do not pertain to a theory’s domain will be called external evidence. An important assumption in this model is that *theories are conditionally independent of external observations given a meta-theory*. This means that a theory depends on external observations only through those observations’ effects on meta-theories [2].

We start the formalization of the model with our last remark, conditional independence of theories and external observations given a meta-theory

*P(H_{n}|E_{x},M) = P(H_{n}|M) …………………… (1)*

Additionally, any evidence is conditionally independent of a meta-theory given its corresponding theory, i.e. it is theories that make predictions, meta-theories only make predictions indirectly by supporting theories.

*P(E_{n}|M,H_{n}) = P(E_{n}|H_{n}) …………………… (2)*

Now we define how a meta-theory is updated

*P(M|E_{n}) = P(E_{n}|M) * P(M) / P(E_{n}) …………………… (3)*

this is just Bayes’ theorem. The important term is the likelihood, which by the law of total probability is

*P(E_{n}|M) = P(E_{n}|M,H_{n}) * P(H_{n}|M) + P(E_{n}|M,¬H_{n}) * P(¬H_{n}|M)*

which by conditional independence (**2**)

*P(E_{n}|M) = P(E_{n}|H_{n}) * P(H_{n}|M) + P(E_{n}|¬H_{n}) * P(¬H_{n}|M) …………………… (4)*
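As a sanity check, equations **3** and **4** can be exercised numerically. Below is a minimal Python sketch with made-up probabilities; the function names `likelihood` (equation **4**) and `update_meta` (equation **3**) are illustrative, not part of the model.

```python
# Hypothetical numbers, just to exercise equations (3) and (4).
# All variables are binary; "h_given_m" stands for P(H_n | M), etc.

def likelihood(e_given_h, e_given_not_h, h_given_m):
    """Equation (4): P(E_n|M) = P(E_n|H_n) P(H_n|M) + P(E_n|~H_n) P(~H_n|M)."""
    return e_given_h * h_given_m + e_given_not_h * (1.0 - h_given_m)

def update_meta(prior_m, e_given_h, e_given_not_h, h_given_m, h_given_not_m):
    """Equation (3): Bayes' theorem for the meta-theory."""
    like_m = likelihood(e_given_h, e_given_not_h, h_given_m)
    like_not_m = likelihood(e_given_h, e_given_not_h, h_given_not_m)
    evidence = like_m * prior_m + like_not_m * (1.0 - prior_m)  # P(E_n)
    return like_m * prior_m / evidence

# A meta-theory that favours H_n (0.8 vs 0.5) gains probability when
# evidence that H_n predicts well (0.9 vs 0.2) comes in.
posterior = update_meta(prior_m=0.5, e_given_h=0.9, e_given_not_h=0.2,
                        h_given_m=0.8, h_given_not_m=0.5)
print(round(posterior, 3))
```

With these numbers the meta-theory rises from 0.5 to about 0.58: the evidence favoured the theory that the meta-theory favoured.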

Equation **4** governs how a meta-theory is updated with new evidence *E_{n}*. Next we determine how the meta-theory fixes a theory’s prior. Again by total probability

*P(H_{n}|E_{x}) = P(H_{n}|E_{x},M) * P(M|E_{x}) + P(H_{n}|E_{x},¬M) * P(¬M|E_{x})*

which by conditional independence (**1**)

*P(H_{n}|E_{x}) = P(H_{n}|M) * P(M|E_{x}) + P(H_{n}|¬M) * P(¬M|E_{x}) …………………… (5)*
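Equation **5** is just a mixture of the two conditionals, weighted by the meta-theory’s posterior. A small sketch with hypothetical numbers (the name `theory_prior` is illustrative) shows how a higher *P(M|E_{x})* pulls the theory’s prior toward *P(H_{n}|M)*:

```python
# Equation (5): P(H_n|E_x) = P(H_n|M) P(M|E_x) + P(H_n|~M) P(~M|E_x).
# All numbers are hypothetical.

def theory_prior(h_given_m, h_given_not_m, m_given_ex):
    """Mix the two conditionals by the meta-theory's posterior weight."""
    return h_given_m * m_given_ex + h_given_not_m * (1.0 - m_given_ex)

# With P(H_n|M) = 0.8 and P(H_n|~M) = 0.5, raising P(M|E_x) from 0.5
# to 0.9 raises H_n's prior from 0.65 toward 0.8.
print(round(theory_prior(0.8, 0.5, 0.5), 3))
print(round(theory_prior(0.8, 0.5, 0.9), 3))
```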

The following picture illustrates how evidence updates a meta-theory, which in turn produces a prior. Note that evidence *E_{1}* and *E_{2}* are *external* to *H_{3}*.

Lastly, updating a theory based on matching evidence is, as usual

*P(H_{n}|E_{n}) = P(E_{n}|H_{n}) * P(H_{n}) / P(E_{n}) …………………… (6)*

Equations **3**,**4**,**5** and **6** are the machinery of the model through which evidence can be processed in sequence. See it in action in the next post.
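To make the sequencing concrete, here is a minimal end-to-end sketch in Python. All probabilities are made up, and the variable names are illustrative. *E_{1}* belongs to domain 1, so it is external evidence for *H_{2}*: it reaches *H_{2}* only through the meta-theory (equations **3**–**5**), after which *H_{2}* is updated directly by *E_{2}* (equation **6**).

```python
# An end-to-end sketch of equations (3)-(6), with made-up numbers.
# Two domains: E_1 is external to H_2, so it affects H_2 only through M.

def meta_likelihood(e_given_h, e_given_not_h, h_given_m):
    """Equation (4)."""
    return e_given_h * h_given_m + e_given_not_h * (1.0 - h_given_m)

# Fixed conditionals of the model (hypothetical values, same in both domains).
H_GIVEN_M, H_GIVEN_NOT_M = 0.8, 0.5   # P(H_n|M), P(H_n|~M)
E_GIVEN_H, E_GIVEN_NOT_H = 0.9, 0.2   # P(E_n|H_n), P(E_n|~H_n)

# Step 1: E_1 arrives; update the meta-theory (equations 3 and 4).
prior_m = 0.5
like_m = meta_likelihood(E_GIVEN_H, E_GIVEN_NOT_H, H_GIVEN_M)
like_not_m = meta_likelihood(E_GIVEN_H, E_GIVEN_NOT_H, H_GIVEN_NOT_M)
p_e1 = like_m * prior_m + like_not_m * (1.0 - prior_m)
post_m = like_m * prior_m / p_e1

# Step 2: the updated meta-theory sets H_2's prior (equation 5);
# for H_2, E_1 plays the role of the external evidence E_x.
prior_h2 = H_GIVEN_M * post_m + H_GIVEN_NOT_M * (1.0 - post_m)

# Step 3: E_2 arrives; update H_2 directly (equation 6).
p_e2 = E_GIVEN_H * prior_h2 + E_GIVEN_NOT_H * (1.0 - prior_h2)
post_h2 = E_GIVEN_H * prior_h2 / p_e2

print(round(post_m, 3), round(prior_h2, 3), round(post_h2, 3))
```

Without *E_{1}*, the meta-theory would stay at 0.5 and *H_{2}*’s prior would be 0.65; with it, the prior rises, which is the whole point of the construction: evidence from one domain justifiably shifts belief in a theory from another.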

[1] A given *E_{n}* represents a sequence of observations made for a domain *n*. So *H_{n}|E_{n}* represents induction in a single step, although in practice it would occur with successive Bayesian updates for each subelement of evidence.

[2] This characteristic is the meta analogue of conditional independence between observations given theories. In other words, just as *logical dependencies between observations are mediated by theories, inter-domain logical dependencies between theories are mediated by meta-theories*.
