Theories are probability distributions over observations, and convertible to probability distributions over possible worlds. This way of describing theories is richer than a mere compatibility relation between theories and worlds: theories are not just compatible or incompatible with possible worlds, they assign probabilities to them.
The notion of truth, in the sense of the complete truth, can be described by a probability distribution as well. In a scenario with no indeterminacy, the true theory, call it T, is a degenerate case of a probability distribution: it assigns probability 1 to the actual world and zero to every other world. In a scenario with indeterminacy, the true theory assigns probabilities to possible worlds; this is similar to the indeterminacy present in some interpretations of quantum mechanics.
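As a concrete sketch of the two cases (a hypothetical example with four possible worlds; numpy assumed):

```python
import numpy as np

# Hypothetical example: four possible worlds, labelled 0..3.
n_worlds = 4

# No indeterminacy: the true theory T is degenerate, putting
# probability 1 on the actual world and 0 elsewhere.
actual_world = 2  # assumed for illustration
T_determinate = np.zeros(n_worlds)
T_determinate[actual_world] = 1.0

# Indeterminacy: the true theory spreads probability over worlds.
T_indeterminate = np.array([0.1, 0.2, 0.6, 0.1])

# Both are probability distributions over possible worlds.
assert np.isclose(T_determinate.sum(), 1.0)
assert np.isclose(T_indeterminate.sum(), 1.0)
```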
Once we have established that theories, including the true theory T, are probability distributions, distance to the truth is a matter of choosing (somewhat arbitrarily) a metric on probability distributions. We can choose, for example, the Jensen–Shannon divergence, because it has the nice property of always being finite:
JSD(P, Q) = (1/2) D(P ‖ M) + (1/2) D(Q ‖ M)
M = (1/2)(P + Q)
where D is the Kullback–Leibler divergence. So the distance to the truth for a theory H is
D(H) = JSD(T, H)
where T, the true theory, is given. Of course, it is more interesting to consider what we think is the distance to the truth, as we don’t have magical access to what the truth really is. Our estimation of what the truth is can be obtained via Bayesian inference using experimental evidence. So we could define what we think is the truth as
T = argmax_H P(H|E)
where H are theories, and E is observed evidence. Then we would estimate the distance to the truth as the distance to the theory we think is most likely (given by argmax)
D(H, E) = JSD(T, H)
T = argmax_H P(H|E)
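The pipeline so far — a JSD implementation, a Bayesian update over theories, and the distance to the most likely theory — can be sketched as follows. The worlds, theories, prior, and evidence are all hypothetical values chosen for illustration:

```python
import numpy as np

def kl(p, q):
    """Kullback–Leibler divergence D(p ‖ q), summing only where p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen–Shannon divergence: symmetric and always finite (in [0, 1] with log base 2)."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical example: three possible worlds, three candidate theories,
# each theory a probability distribution P(W|H) over worlds.
theories = {
    "H1": np.array([0.7, 0.2, 0.1]),
    "H2": np.array([0.1, 0.8, 0.1]),
    "H3": np.array([1/3, 1/3, 1/3]),
}
prior = {name: 1/3 for name in theories}  # uniform prior P(H)

# Evidence E: a sequence of observed worlds, assumed i.i.d. for illustration.
E = [1, 1, 0, 1]

# Bayesian update: P(H|E) is proportional to P(E|H) P(H).
posterior = {name: np.prod([p_w[w] for w in E]) * prior[name]
             for name, p_w in theories.items()}
Z = sum(posterior.values())
posterior = {name: p / Z for name, p in posterior.items()}

# T = argmax_H P(H|E): the theory we think is most likely.
T = theories[max(posterior, key=posterior.get)]

# Estimated distance to the truth for each theory: D(H, E) = JSD(T, H).
for name, H in theories.items():
    print(name, jsd(T, H))
```

Note that the MAP theory itself gets distance 0 under this estimate, which is one symptom of the information this formula throws away.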
But there is another possibility. The above formula does not use all the information our experimental evidence provides: by taking only the most likely theory, we throw away the rest of the posterior distribution over theories that the evidence establishes. Remember that what we want is to compare probability distributions over worlds. To integrate all the information the evidence provides, we can compare our theory against the predictive distribution over worlds, to which all theories contribute, not just the most likely one. We define the predictive distribution over worlds as
Pd(W) = ∑ P(W|H)P(H|E)
where the sum ∑ is over theories H. Finally, our new estimate of distance to the truth becomes
D(H, E) = JSD(Pd, H)
Pd(W) = ∑ P(W|H)P(H|E)
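A minimal sketch of this predictive-distribution version, reusing the same hypothetical theories; the posterior values here are assumed for illustration rather than computed from evidence:

```python
import numpy as np

def jsd(p, q):
    """Jensen–Shannon divergence (log base 2), finite for any pair of distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical theories over three worlds, and a posterior P(H|E)
# assumed to have come from a Bayesian update on evidence E.
theories = {
    "H1": np.array([0.7, 0.2, 0.1]),
    "H2": np.array([0.1, 0.8, 0.1]),
    "H3": np.array([1/3, 1/3, 1/3]),
}
posterior = {"H1": 0.08, "H2": 0.74, "H3": 0.18}

# Predictive distribution over worlds: Pd(W) = sum over H of P(W|H) P(H|E).
Pd = sum(posterior[name] * p_w for name, p_w in theories.items())

# New estimate of distance to the truth: D(H, E) = JSD(Pd, H).
for name, H in theories.items():
    print(name, jsd(Pd, H))
```

Unlike the argmax version, no theory is at distance exactly 0 here unless it happens to coincide with the predictive distribution itself.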