Worse than ignorant

A maximum entropy, uniform probability distribution over some set of outcomes corresponds to a state of zero knowledge about the phenomenon in question. Such a distribution assigns equal probability to every result, neither favoring nor prohibiting any of them; moreover, any result that comes about is equally compatible with the distribution. So this maximum entropy distribution seems like the worst-case scenario in terms of knowledge about a phenomenon. Indeed, in this state of knowledge, transmitting a description of the results requires the maximum amount of information, hence maximum entropy.
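To make the "maximum amount of information" point concrete, here is a minimal Python sketch; the eight-outcome setup and the particular distributions are illustrative assumptions of mine, not anything from the text. It computes the Shannon entropy H(p) = -Σ p_i log2 p_i; the uniform distribution attains the maximum, log2(8) = 3 bits, while a confident distribution needs far fewer bits to describe its typical results.

    import math

    def entropy_bits(p):
        # Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)

    uniform = [1.0 / 8] * 8            # zero knowledge: every outcome equally likely
    confident = [0.93] + [0.01] * 7    # low entropy: strong belief in one outcome

    print(entropy_bits(uniform))       # 3.0 bits, the maximum over 8 outcomes
    print(entropy_bits(confident))     # ~0.56 bits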

However, one can in fact do worse than zero knowledge. It is worse to hold a low-entropy but incorrect belief than to hold a maximum-entropy, ignorant lack of belief. We could informally call this state one not of zero knowledge, but of negative knowledge: not only do we not know anything about the phenomenon, worse still, we hold a false belief about it.

These notions can be understood in terms of the Kullback-Leibler (KL) divergence. Starting from a low-entropy but incorrect probability distribution, Bayesian updates will generally move that distribution toward one of higher entropy. Instinctively, going to higher entropy seems like a step backwards: we now need more information to describe the phenomenon than before.

The key feature of the KL divergence that corrects this wrong intuition is that, in the context of Bayesian updates, it measures the change in the quantity of information needed to describe the phenomenon, as modeled by our updated probability distribution, from the standpoint of the state of knowledge prior to updating, that is, from the standpoint of our previous, non-updated distribution. So even though our new distribution has higher entropy, it is more efficient at coding (describing) the phenomenon than our previous low-entropy, but incorrect, distribution.

The KL divergence measures the expected information gain of a Bayesian update. It is a non-negative quantity; updating on new evidence will, on average, leave us in a state of knowledge that is more accurate.
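A minimal sketch of the point, with illustrative numbers of my own choosing rather than anything from the text: D_KL(p || q) = Σ p_i log2(p_i / q_i) is the expected number of extra bits paid when encoding outcomes drawn from the true distribution p with a code optimized for a believed distribution q. The confidently wrong belief pays far more than the maximum-entropy one, which is the sense in which it is worse than ignorance.

    import math

    def kl_bits(p, q):
        # D_KL(p || q) = sum_i p_i * log2(p_i / q_i): extra bits paid for coding
        # outcomes of p with a code optimized for q
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    true_dist = [0.7, 0.1, 0.1, 0.1]         # the actual phenomenon
    uniform   = [0.25, 0.25, 0.25, 0.25]     # maximum entropy: zero knowledge
    wrong     = [0.01, 0.97, 0.01, 0.01]     # low entropy, but confidently wrong

    print(kl_bits(true_dist, uniform))       # ~0.64 extra bits: ignorance has a cost
    print(kl_bits(true_dist, wrong))         # ~4.6 extra bits: a false belief costs far more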

Automatic selection bias in news-driven perception of technological progress

It’s a long-winded title, but the idea is simple. If we informally judge, or simply perceive, the rate of technological progress from technology-related news and developments, we automatically fall into a selection bias that overestimates the rate of overall progress. This is because news reports change, not stasis, so most of the data feeding our perception is concentrated in the areas that change rapidly. This is the selection bias. A naive judgement of technological progress thus assigns greater weight to rapidly changing technologies and ignores the areas that contribute no data: the areas of stagnation.

The obvious overrepresented example would be information technology. Some underrepresented examples could be energy and transport.

Further thoughts on rates of progress (with a reverse selection bias!)

http://www.softmachines.org/wordpress/?p=296
http://www.softmachines.org/wordpress/?p=1027
http://www.amazon.com/Great-Stagnation-Low-Hanging-Eventually-ebook/dp/B004H0M8QS

Inner and Outer perspectives on intelligence

I was talking about how measures of intelligence could focus on the outside results or on the process internals that give rise to those results. Our first intuition would be that if intelligence is a property of behavior, then an empirical approach would focus on the outside results, and the internals could be treated as an irrelevant black box, a view reminiscent of behaviorism. If intelligence is defined in terms of outside results, then it does not matter how those outside results, i.e. the behavior, came about.

This kind of reasoning leads us to the Turing test, a black-box measure of intelligence. Now, I’m not going to go the route of the Chinese Room and make that kind of objection. However, even if I find the Chinese Room misguided, there is something to opening the black box, both practically and philosophically. Consider two implementations of the black box for an intelligent agent.

Vast Rule Set

Here, the agent whose behavior we have measured to be intelligent has a brain that is nothing but a huge list of rules mapping perceptions to actions. There is no processing. The agent takes its percept, finds the corresponding rule, and performs the action that rule points to. Because there is no processing of any kind, the rule set would be huge, as the number of possible raw, unprocessed perceptions is huge. In fact, no rule would ever be used more than once, since perceptions, considered as raw unprocessed data, never repeat.
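As a minimal sketch of this thought experiment (all names below are illustrative, not an existing system), the agent's entire "brain" is a single lookup in a table keyed by the raw percept:

    # Hypothetical rule-set agent: behavior is one table lookup, with no processing.
    def rule_set_agent(percept, rules):
        # 'rules' maps every possible raw percept directly to an action
        return rules[percept]

    rules = {
        "raw_percept_000001": "turn_left",
        "raw_percept_000002": "say_hello",
        # ...one entry per possible raw percept, an astronomically large table
    }

    print(rule_set_agent("raw_percept_000001", rules))  # "turn_left"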

Would knowing these internals make us change our mind about the agent's intelligence? I would say no, it should not. However, even if this example has no philosophical consequences for the definition of intelligence, it has practical ones. That is, if you try to build an AI this way, you will fail. In judging whether a given AI attempt is on the right track, opening the black box and looking inside is informative. And if we know that Watson is not grounding natural language knowledge in sensory experience, we suspect it is not an advance towards general intelligence, but domain-specific technology that solves a restricted goal.

Random Number Generator

What if we look inside the agent and find nothing but a random number generator? Further, let’s say that, in fact, there is nothing else but a random number generator. Is this impossible? Not really; it is just hugely, astronomically, cosmically… improbable that the agent behaved intelligently. But it did.
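A corresponding sketch of this second black box, again purely illustrative: the percept plays no role at all, and the action is drawn at random.

    import random

    # Hypothetical random agent: the percept is ignored entirely.
    def random_agent(percept, actions):
        return random.choice(actions)

    print(random_agent("raw_percept_000001", ["turn_left", "say_hello"]))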

It is a far-fetched hypothetical situation, because randomness is exactly the antithesis of intelligence, the baseline if you like. But if we accept the terms of the hypothesis, it suggests a contradiction: the behavior was intelligent, but it came out of sheer idiocy. What then?

We can resolve the contradiction by noting that although we say the behavior was intelligent, what this really means is that such behavior, in general, allows us to infer a capacity on the part of the agent. In this particular case, the description of the hypothetical situation invalidates that inference, because we know the agent will not behave intelligently in the future (or at least, it is astronomically improbable that it will).

So…

In conclusion, intelligence is a capacity of an agent. The internals by which this capacity is realized are not fundamental to the definition of intelligence. Intelligent behavior is simply behavior that allows us to infer this capacity in an agent.

Besides the matter of defining intelligence, opening the black box allows us to make predictions as to whether a given strategy for AI is on the right path, given experience of what has not worked in the past. So although the internals are not fundamentally important philosophically, they tell us whether an approach is practical.

The deal with Watson

Ferrucci gives a brief, semi-technical talk on how Watson works

The question on everyone’s mind is: what does Watson represent in terms of AI advancement? It’s a very advanced piece of natural language processing and machine learning, impressive parallel computing and engineering, and clever domain-specific (i.e. Jeopardy!) trickery. But Watson does not represent a significant step towards general AI, although it is probably general enough to apply to other question answering (QA) domains.

After all, that’s what the name of the technology suggests: DeepQA. Deeper natural language processing and semantic understanding, but not the depth of real understanding. And this takes us to the question of what exactly we mean by real understanding. The simple answer seems to be the understanding we associate with humans in tasks of reading comprehension. Of course, using this definition alone leads to the no-win trap that many AI researchers complain about: if the goal necessarily includes the human element, then by definition it is unattainable by a machine.

There are two approaches to reaching a useful (as in, not predetermined to produce failure) metric of success: those that concentrate on results, and those that concentrate on the internals that give rise to those results. More on this later.