A maximum-entropy, uniform probability distribution over a set of outcomes corresponds to a state of zero knowledge about the phenomenon in question. Such a distribution assigns equal probability to every result: it neither favors nor prohibits any of them, and whatever result comes about is equally compatible with it. So it seems that this maximum-entropy distribution is the worst-case scenario in terms of knowledge about a phenomenon. Indeed, in this state of knowledge, transmitting a description of the results requires the maximum amount of information on average, hence maximum entropy.
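
As a small illustration of the claim above, the following sketch computes the Shannon entropy of three distributions over four outcomes. The particular distributions and the helper name `entropy_bits` are illustrative assumptions, not part of the original argument.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical distributions over four outcomes (chosen only for illustration).
uniform = [0.25, 0.25, 0.25, 0.25]   # zero knowledge
skewed = [0.7, 0.1, 0.1, 0.1]        # some knowledge
certain = [1.0, 0.0, 0.0, 0.0]       # complete knowledge

h_uniform = entropy_bits(uniform)    # log2(4) = 2 bits, the maximum for four outcomes
h_skewed = entropy_bits(skewed)
h_certain = entropy_bits(certain)

print(h_uniform, h_skewed, h_certain)
```

The uniform distribution comes out highest, and the entropy falls as the distribution concentrates on fewer outcomes.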

However, one can in fact do worse than zero knowledge. It is worse to hold a low-entropy but incorrect belief than to hold a maximum-entropy, ignorant lack of belief. We could informally call this a state not of zero knowledge but of negative knowledge: not only do we know nothing about the phenomenon, worse still, we hold a false belief.
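
One way to make "negative knowledge" concrete is to compare how far each belief is from the truth, measured by the KL divergence from the true distribution to the belief. The sketch below assumes a hypothetical true distribution `truth` and a hypothetical confident-but-wrong belief `wrong`; both are invented for illustration.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_bits(p, q):
    """KL divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

# Assumed for illustration: the real phenomenon favors outcome 0.
truth = [0.7, 0.1, 0.1, 0.1]
# Zero knowledge: maximum entropy, no commitment.
uniform = [0.25, 0.25, 0.25, 0.25]
# "Negative knowledge": low entropy, but committed to the wrong outcome.
wrong = [0.01, 0.97, 0.01, 0.01]

h_uniform, h_wrong = entropy_bits(uniform), entropy_bits(wrong)
penalty_uniform = kl_bits(truth, uniform)  # extra bits per outcome paid by the ignorant belief
penalty_wrong = kl_bits(truth, wrong)      # extra bits per outcome paid by the false belief

print(h_uniform, h_wrong)
print(penalty_uniform, penalty_wrong)
```

Under these assumptions the false belief has much lower entropy than the uniform one, yet it pays a much larger penalty when used to describe outcomes that actually follow `truth`.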

These notions can be understood in terms of the Kullback-Leibler (KL) divergence. Starting from a low-entropy but incorrect probability distribution, Bayesian updates will generally move that distribution to one of higher entropy. Intuitively, going to higher entropy seems like a step backwards: we now need more information to describe the phenomenon than before.

The key feature of the KL divergence that corrects this wrong intuition is that, in the context of Bayesian updates, it measures the change in the quantity of information necessary to describe the phenomenon, as modeled by our updated probability distribution, *from the standpoint of the state of knowledge prior to updating*, that is, from the standpoint of our previous, non-updated distribution. So even though our new distribution has higher entropy, it is more efficient at coding (describing) the phenomenon than our previous low-entropy but incorrect distribution.
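
The following sketch works through one such update. It assumes a toy setup that is not in the original text: three hypotheses about a coin's probability of heads, a prior that is confidently wrong, two observed tails, and a true bias of 0.1. All names and numbers are hypothetical.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_bits(p, q):
    """KL divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def coding_cost_bits(true_p_heads, predicted_p_heads):
    """Expected bits per flip when flips follow true_p_heads but are coded with predicted_p_heads."""
    return -(true_p_heads * math.log2(predicted_p_heads)
             + (1 - true_p_heads) * math.log2(1 - predicted_p_heads))

# Hypotheses: the coin's probability of heads is 0.1, 0.5, or 0.9.
p_heads = [0.1, 0.5, 0.9]
# Low-entropy prior, almost certain the coin favors heads (assumed to be wrong).
prior = [0.01, 0.01, 0.98]

# Bayesian update after observing two tails in a row.
likelihood = [(1 - h) ** 2 for h in p_heads]
joint = [pr * lk for pr, lk in zip(prior, likelihood)]
evidence = sum(joint)
posterior = [j / evidence for j in joint]

# Entropy goes up, yet the information gain D(posterior || prior) is positive.
h_prior, h_posterior = entropy_bits(prior), entropy_bits(posterior)
gain = kl_bits(posterior, prior)

# Predicted probability of heads on the next flip, before and after the update.
prior_pred = sum(pr * h for pr, h in zip(prior, p_heads))
post_pred = sum(po * h for po, h in zip(posterior, p_heads))

true_p = 0.1  # assumption: the coin actually favors tails
cost_prior = coding_cost_bits(true_p, prior_pred)
cost_post = coding_cost_bits(true_p, post_pred)

print(h_prior, h_posterior, gain)
print(cost_prior, cost_post)
```

In this toy case the posterior has higher entropy than the prior, but coding future flips with the posterior's predictions costs fewer bits per flip than coding them with the prior's.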

The KL divergence measures the expected information gain of a Bayesian update. It is a non-negative quantity, and zero only when the update leaves the distribution unchanged; updating with new evidence will, on average, leave us in a more accurate state of knowledge.
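
As a rough check of the non-negativity claim, the sketch below averages the KL divergence from prior to posterior over both possible outcomes of a single coin flip, weighting each outcome by its probability under the prior. The coin hypotheses, the prior, and the function name `expected_information_gain` are hypothetical choices made for this example.

```python
import math

def kl_bits(p, q):
    """KL divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def expected_information_gain(prior, p_heads):
    """Average D(posterior || prior) over one flip, weighted by the prior predictive."""
    gain = 0.0
    for heads in (True, False):
        likelihood = [h if heads else 1.0 - h for h in p_heads]
        joint = [pr * lk for pr, lk in zip(prior, likelihood)]
        evidence = sum(joint)          # probability of this outcome under the prior
        if evidence == 0:
            continue                   # impossible outcome contributes nothing
        posterior = [j / evidence for j in joint]
        gain += evidence * kl_bits(posterior, prior)
    return gain

prior = [0.01, 0.01, 0.98]

# Hypotheses that disagree about the coin: a flip is informative.
informative = expected_information_gain(prior, [0.1, 0.5, 0.9])
# Hypotheses that all predict the same thing: a flip teaches nothing.
uninformative = expected_information_gain(prior, [0.5, 0.5, 0.5])

print(informative, uninformative)
```

The expected gain is positive when the hypotheses make different predictions and is zero, up to floating-point error, when they all predict the same thing, since the posterior then equals the prior.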