Continuous improvement and TDD

It’s an old message, in a 2009 talk. But it’s good to recite the arguments every now and then, especially with such a great talk. And of course there’s the bonus of the other material which is actually the focus of the talk; the history of Smalltalk and how it “died”, according to Martin, due to messy code, parochialism/arrogance, and inability to deal with real world or enterprise requirements.

But back to the reason I’m linking this video: continuous improvement. Code tends to be messy, and needs constant correction, continuous improvement, to not collapse into an unmanageable, unmaintainable, unintelligible ball of yarn. What does this mean, and why is this the case?

Well, software is complex relative to our ability to write it, so we are unlikely to get code right the first time. Secondly, code is written when not all the information is available. This compounds the first point, and makes it even less likely that we get things right initially.

But not only that: the fact that information is not available, or in fact changes as code is written, means that the code has to evolve to hit a moving target. Changes to code again lead to messy code, though for a different reason than above. It’s a matter of coherence, not of correctness.

So in this simple model, we have two main driving forces that point to messy code: the intrinsic difficulty of getting things right, and incoherence due to changing requirements. In theory, these are not insurmountable. “All” it requires is constant improvement and correction. However, constant improvement requires a lot of discipline and mental effort. So constant improvement is not carried out, and code gets messy.

The main point of test-driven development is to lower the mental effort required for continuous improvement, by addressing its biggest obstacle: fear of breaking things. So the logic goes, reduce the fear of breaking things -> reduce the mental effort for continuous improvement -> reduce the level of mess.

As I said, it’s an old message, nothing new here, but it’s worth remembering every now and then.

Worse than ignorant

A maximum entropy uniform probability distribution over some outcome corresponds to a state of zero knowledge about the phenomenon in question. Such a distribution assigns equal probability to every result; it neither favors nor prohibits any of them. Moreover, any result that comes about is equally compatible with said probability distribution. So it seems this maximum entropy probability distribution is the worst case scenario in terms of knowledge about a phenomenon. Indeed, in this state of knowledge, transmitting the description of the results requires the maximum amount of information, hence maximum entropy.

However, one can in fact do worse than zero knowledge. It is worse to have a low entropy, but incorrect, belief than to have a maximum entropy, ignorant lack of belief. We could informally call this a state not of zero knowledge, but of negative knowledge. Not only do we not know anything about a phenomenon; worse still, we have a false belief.
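To put rough numbers on this, here is a minimal sketch. The four-outcome phenomenon and all its probabilities are made up for illustration, and scoring a belief by the KL divergence from the true distribution to that belief (the extra bits paid per outcome when coding reality with a code built for the belief) is my choice of yardstick, not the only possible one.

```python
import math

def entropy(p):
    """Shannon entropy of distribution p, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits: extra bits per outcome paid for coding
    outcomes drawn from p with a code optimized for q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical four-outcome phenomenon; all numbers are invented.
truth = [0.70, 0.10, 0.10, 0.10]
ignorant = [0.25, 0.25, 0.25, 0.25]         # maximum entropy, zero knowledge
confident_wrong = [0.01, 0.97, 0.01, 0.01]  # low entropy, but incorrect

for name, belief in [("ignorant", ignorant),
                     ("confident but wrong", confident_wrong)]:
    print(f"{name}: entropy = {entropy(belief):.3f} bits, "
          f"cost vs truth = {kl_divergence(truth, belief):.3f} bits")
```

With these numbers the confident belief has the lower entropy of the two, yet it pays the larger coding penalty against the truth, which is the sense in which it is worse than ignorant.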

These notions can be well understood in terms of the Kullback-Leibler divergence. Starting from a low entropy, but incorrect, probability distribution, Bayesian updates will generally modify said distribution into one of higher entropy. Instinctively, it seems that going to higher entropy is a step backwards. We now need more information to describe the phenomenon than before.

The key feature of KL that corrects this wrong intuition is that in the context of Bayesian updates, KL divergence measures the change in the quantity of information necessary to describe the phenomenon, as modeled by our updated probability distribution, from the standpoint of the state of knowledge prior to updating, that is, from the standpoint of our previous non-updated distribution. So even though our new distribution is of higher entropy, it is more efficient at coding (describing) the phenomenon than our previous low entropy, but incorrect, distribution.

The KL divergence measures the expected information gain of a Bayesian update. It is a non-negative quantity; updating with new evidence will, on average, leave us in a state of knowledge that is more accurate.
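A small sketch of such an update, under assumptions of my own: a coin with three candidate biases, a prior that is confidently wrong about which one is right, and three observed heads. None of these numbers come from anywhere in particular.

```python
import math

def entropy(p):
    """Shannon entropy of distribution p, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bayes_update(prior, likelihoods):
    """Posterior over hypotheses: prior times likelihood, renormalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical setup: P(heads) under each of three hypotheses.
biases = [0.1, 0.5, 0.9]
# Low entropy prior, almost certain the coin rarely lands heads.
prior = [0.98, 0.01, 0.01]

posterior = prior
for _ in range(3):  # observe heads three times in a row
    posterior = bayes_update(posterior, biases)

information_gain = kl_divergence(posterior, prior)
print(f"prior entropy:     {entropy(prior):.3f} bits")
print(f"posterior entropy: {entropy(posterior):.3f} bits")
print(f"information gain:  {information_gain:.3f} bits")
```

The posterior comes out with higher entropy than the prior, but the KL divergence from prior to posterior is still positive: the update gained information even though it made us less certain.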

Automatic selection bias in news driven perception of technological progress

It’s a long-winded title, but the idea is simple. If we informally judge, or simply perceive, the rate of technological progress from technology-related news or developments, we will automatically fall into a selection bias that overestimates the rate of overall progress. This is because news pertains to changes, not to stasis. So most of the data that is the input to our perception is concentrated around those areas that change rapidly. This is the selection bias. Thus, a naive judgement of technological progress assigns greater weight to rapidly changing technologies, and ignores those not contributing data: areas of stagnation.

The obvious overrepresented example would be information technology. Some underrepresented examples could be energy and transport.
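A toy calculation makes the size of the effect concrete. The progress rates below are invented, and I am assuming that the volume of news about an area is simply proportional to its rate of change, which is a crude stand-in for how news actually works.

```python
# Hypothetical yearly rates of progress per area; the numbers are invented.
rates = {
    "information technology": 0.30,
    "energy": 0.02,
    "transport": 0.03,
}

# Actual overall rate: every area counts equally.
actual = sum(rates.values()) / len(rates)

# Perceived rate: each area is weighted by its share of the news,
# assumed here to be proportional to its rate of change.
total_news = sum(rates.values())
perceived = sum(rate * (rate / total_news) for rate in rates.values())

print(f"actual average rate: {actual:.3f}")
print(f"news-weighted rate:  {perceived:.3f}")
```

Under this weighting the perceived rate can never fall below the plain average, and the gap grows the more unevenly progress is spread across areas.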

Further thoughts on rates of progress (with a reverse selection bias!)