Worse than ignorant

A maximum-entropy (uniform) probability distribution over a set of outcomes corresponds to a state of zero knowledge about the phenomenon in question. Such a distribution assigns equal probability to every result: it neither favors nor prohibits any of them, and whatever result comes about is equally compatible with it. So it seems that the maximum-entropy distribution is the worst-case scenario in terms of knowledge about a phenomenon. Indeed, in this state of knowledge, transmitting a description of the results requires, on average, the maximum amount of information; hence, maximum entropy.
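As a small illustration of the entropy claim, the sketch below compares the Shannon entropy of a uniform distribution over four outcomes with that of a sharply peaked one. The two distributions and the helper name `entropy_bits` are made up for this example.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Illustrative distributions over four outcomes (assumed, not from any data).
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

h_uniform = entropy_bits(uniform)  # log2(4) = 2 bits, the maximum for four outcomes
h_peaked = entropy_bits(peaked)    # far below 2 bits

print(f"uniform: {h_uniform:.3f} bits, peaked: {h_peaked:.3f} bits")
```

The uniform case needs the full two bits per outcome, which is the sense in which ignorance is the most expensive state to describe.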

However, one can in fact do worse than zero knowledge. It is worse to hold a low-entropy but incorrect belief than to hold a maximum-entropy, ignorant lack of belief. We could informally call this a state not of zero knowledge but of negative knowledge: not only do we know nothing about the phenomenon, worse still, we hold a false belief.
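One way to make "negative knowledge" concrete is to measure the cost of describing outcomes using a given belief, that is, the cross-entropy between the true distribution and the belief. The sketch below uses made-up numbers; `truth`, `confident_wrong` and `cross_entropy_bits` are names chosen only for this example.

```python
import math

def cross_entropy_bits(truth, belief):
    """Average bits per outcome when outcomes follow `truth` but are coded using `belief`."""
    return -sum(t * math.log2(b) for t, b in zip(truth, belief) if t > 0)

# Assumed true distribution: the fourth outcome almost always happens.
truth = [0.01, 0.01, 0.01, 0.97]

# Two states of belief: ignorant (uniform) and confident but wrong.
uniform = [0.25, 0.25, 0.25, 0.25]
confident_wrong = [0.97, 0.01, 0.01, 0.01]

cost_ignorant = cross_entropy_bits(truth, uniform)
cost_wrong = cross_entropy_bits(truth, confident_wrong)

print(f"ignorant: {cost_ignorant:.3f} bits, confident but wrong: {cost_wrong:.3f} bits")
```

Under these assumptions the ignorant belief pays two bits per outcome, while the confident false belief pays considerably more, even though its own entropy is much lower.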

These notions can be understood in terms of the Kullback-Leibler (KL) divergence. Starting from a low-entropy but incorrect probability distribution, Bayesian updates will generally turn it into one of higher entropy. Intuitively, moving to higher entropy seems like a step backwards: we now appear to need more information to describe the phenomenon than before.

The key feature of the KL divergence that corrects this wrong intuition is that, in the context of Bayesian updates, it measures the change in the quantity of information necessary to describe the phenomenon, as modeled by our updated probability distribution, from the standpoint of the state of knowledge prior to updating, that is, from the standpoint of our previous, non-updated distribution. In coding terms, it is the expected number of extra bits we pay for describing outcomes with the old distribution when they actually follow the updated one. So even though our new distribution has higher entropy, it is more efficient at coding (describing) the phenomenon than our previous low-entropy but incorrect distribution.
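The following sketch works through one such update under assumed numbers: a coin whose probability of heads is one of three candidate values, a prior that is confidently wrong, and two observed heads. All names and values here are hypothetical and chosen only to illustrate the point.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_bits(p, q):
    """D_KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

# Hypothetical setup: three candidate values for the coin's P(heads).
thetas = [0.1, 0.5, 0.9]
true_index = 2  # assumption: the coin really has P(heads) = 0.9

# Low-entropy but incorrect prior: almost certain that P(heads) = 0.1.
prior = [0.98, 0.01, 0.01]

# Bayes' rule after observing two heads in a row.
likelihood = [t ** 2 for t in thetas]
unnormalized = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

h_prior = entropy_bits(prior)
h_post = entropy_bits(posterior)

# Bits needed to name the true hypothesis under each belief.
cost_prior = -math.log2(prior[true_index])
cost_post = -math.log2(posterior[true_index])

# Information gained by the update, measured against the prior.
gain = kl_bits(posterior, prior)

print(f"entropy: {h_prior:.3f} -> {h_post:.3f} bits")
print(f"cost of the truth: {cost_prior:.3f} -> {cost_post:.3f} bits")
print(f"KL(posterior || prior): {gain:.3f} bits")
```

In this toy case the entropy goes up after the update, yet the cost of describing the true hypothesis goes down and the KL divergence from the prior is positive, which is the pattern described above.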

The KL divergence measures the expected information gain of a Bayesian update. It is a non-negative quantity, equal to zero only when the update leaves the distribution unchanged; updating with new evidence will, on average, leave us in a state of knowledge that is more accurate.

Automatic selection bias in news-driven perception of technological progress

It’s a long-winded title, but the idea is simple. If we informally judge, or simply perceive, the rate of technological progress from technology-related news or developments, we automatically fall into a selection bias that overestimates the rate of overall progress. This is because news pertains to changes, not to stasis, so most of the data feeding our perception is concentrated in the areas that change rapidly. This is the selection bias. Thus, a naive judgement of technological progress assigns greater weight to rapidly changing technologies and ignores those that contribute no data: the areas of stagnation.

The obvious overrepresented example would be information technology. Some underrepresented examples could be energy and transport.
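A toy calculation can show the size of this effect. The progress rates below are invented purely for illustration, and the sketch assumes that the volume of news a field generates is proportional to its rate of change; neither is a measured quantity.

```python
# Hypothetical yearly progress rates per field (made-up numbers, illustration only).
rates = {
    "information technology": 0.30,
    "energy": 0.03,
    "transport": 0.02,
}

# Assumption: news volume is proportional to how fast a field changes.
news_volume = dict(rates)
total_news = sum(news_volume.values())

# Unweighted average: every field counts equally.
true_mean = sum(rates.values()) / len(rates)

# News-weighted average: each field counts in proportion to its share of the news.
perceived_mean = sum(rates[f] * news_volume[f] / total_news for f in rates)

print(f"average rate across fields: {true_mean:.3f}")
print(f"rate perceived through news: {perceived_mean:.3f}")
```

Under this proportionality assumption the perceived rate is the mean of the squared rates divided by the mean rate, which is never smaller than the plain average and exceeds it whenever the fields differ.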

Further thoughts on rates of progress (with a reverse selection bias!)