Here’s my brief proposal in response to the debate found here regarding verisimilitude.

Theories are probability distributions over observations, and convertible to probability distributions over possible worlds. This way of describing theories is richer than a mere compatibility relation between theories and worlds: theories are not just compatible or incompatible with possible worlds; they assign probabilities to them.

The notion of truth, in the sense of the *complete* truth, can be described by a probability distribution as well. In a scenario with no indeterminacy, the true theory, call it **T**, is a degenerate case of a probability distribution: it assigns probability 1 to the actual world, and zero elsewhere. In a scenario with indeterminacy, the true theory assigns probabilities to possible worlds; this is similar to the indeterminacy present in some interpretations of quantum mechanics.
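The two cases can be sketched concretely. The world labels and numbers below are illustrative assumptions of mine, not anything taken from the argument itself:

```python
# A minimal toy sketch: the "complete truth" T as a probability
# distribution over a small space of possible worlds.
# (World labels and probabilities are made up for illustration.)

worlds = ["w1", "w2", "w3"]

# No indeterminacy: T is degenerate, putting probability 1 on the
# actual world (here, arbitrarily, w2) and 0 elsewhere.
T_determinate = {"w1": 0.0, "w2": 1.0, "w3": 0.0}

# With indeterminacy: T spreads probability over several worlds.
T_indeterminate = {"w1": 0.1, "w2": 0.7, "w3": 0.2}

# Both are genuine probability distributions over the same world space.
for T in (T_determinate, T_indeterminate):
    assert abs(sum(T.values()) - 1.0) < 1e-9
```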

Once we have established that theories, including the true theory **T**, are probability distributions, distance to the truth becomes a matter of choosing (somewhat arbitrarily) a metric on probability distributions. We can choose, for example, the Jensen–Shannon divergence, because it has the nice property of always being finite. It is defined as

JSD(P, Q) = ½ D(P ‖ M) + ½ D(Q ‖ M)

where

M = ½ (P + Q)

and D is the Kullback–Leibler divergence. So the distance to the truth for a theory H is

D(H) = JSD(**T**, H)

where **T**, the true theory, is given. Of course, it is more interesting to consider what we *think* is the distance to the truth, as we don’t have magical access to what the truth really is. Our estimation of what the truth is can be obtained via Bayesian inference using experimental evidence. So we could define what we think is the truth as

T = argmax(H) P(H|E)

where H are theories, and E is observed evidence. Then we would estimate the distance to the truth as the distance to the theory we think is most likely (given by argmax)

D(H, E) = JSD(T, H)

where

T = argmax(H) P(H|E)
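This estimate can be sketched in a few lines of code. Everything numeric below is a made-up toy setup: two hypothetical theories as distributions over three worlds, and an assumed posterior P(H|E); only the JSD and argmax machinery tracks the formulas above.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric and always finite."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy theories: each is a distribution P(W|H) over three possible worlds.
theories = {
    "H1": [0.7, 0.2, 0.1],
    "H2": [0.1, 0.3, 0.6],
}

# Assumed posterior P(H|E) over theories, given evidence E.
posterior = {"H1": 0.8, "H2": 0.2}

# T = argmax over H of P(H|E): the most probable theory stands in for the truth.
T_hat = theories[max(posterior, key=posterior.get)]

# Estimated distance to the truth D(H, E) = JSD(T, H) for each theory.
for name, H in theories.items():
    print(name, jsd(T_hat, H))
```

Note that the winning theory here trivially gets distance 0 from itself, which is one symptom of the information this estimator throws away.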

But there is another possibility. In the above formula, we are not using all our experimental evidence. Some of the information is thrown away by taking only the most likely theory, and ignoring the rest of the probability distribution over theories that evidence establishes. Remember that what we want is to compare probability distributions over worlds. In order to integrate all the information that evidence provides, we can compare our theory against the predictive distribution over worlds that all theories contribute to, not just the most likely one. We define the predictive distribution over worlds as

Pd(W) = ∑ P(W|H)P(H|E)

where the sum ∑ is over theories H. Finally, our new estimate of distance to the truth becomes

D(H, E) = JSD(Pd, H)

where

Pd = ∑ P(W|H)P(H|E)
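Continuing the same toy setup as before (all numbers are illustrative assumptions), the predictive distribution is just a posterior-weighted mixture of the theories' world distributions:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric and always finite."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy P(W|H) per theory and assumed posterior P(H|E), as before.
theories = {
    "H1": [0.7, 0.2, 0.1],
    "H2": [0.1, 0.3, 0.6],
}
posterior = {"H1": 0.8, "H2": 0.2}

# Predictive distribution Pd(W) = sum over H of P(W|H) P(H|E):
# every theory contributes, weighted by its posterior probability.
n_worlds = 3
Pd = [sum(theories[h][w] * posterior[h] for h in theories)
      for w in range(n_worlds)]

# New estimate of distance to the truth: D(H, E) = JSD(Pd, H).
for name, H in theories.items():
    print(name, jsd(Pd, H))
```

Unlike the argmax version, no theory is at distance exactly 0 here: even the most probable theory differs from the mixture, because the mixture retains the information carried by its rivals.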

I have a problem in trying to understand your suggestion; it is about the idea that theories can be seen as ‘probability distributions’. I think this is not the right way to understand theories (consider, e.g., a theory like Newton’s gravitation theory, Darwin’s theory of natural selection, or Wegener’s continental drift). What I do not understand is what would happen to the notion of a CONJUNCTION of two theories on your view. On the more traditional view of theories as equivalent to SETS of possible worlds (not probability distributions over them), the conjunction of two theories is simply the intersection of the two corresponding sets (i.e., those worlds that ‘satisfy’ both theories simultaneously); but on your understanding of theories, I simply cannot make sense of a proposition like “the conjunction of Newton’s second law and the law of gravity”, for I do not see what a notion like ‘the conjunction of two different probability distributions’ could mean.