Are intelligence and goals practically coupled?

There are many definitions of intelligence[1][2]. Most of them include as a central element the capacity to achieve goals, together with an ingredient of generality to distinguish it from narrowly applicable abilities. In these definitions the goals themselves are left unspecified: their content has no bearing on whether something is considered intelligence or not. In other words, intelligence and goals are decoupled, or orthogonal.

However, definitions are just… definitions. The only requirement for a definition to be valid is logical consistency. Whether it applies to the real world as a useful concept is another matter altogether.

This brings us to consider whether, in practice, intelligence and goals are independent or not. Not only empirically, which is a question of observing existing cases of intelligences and their associated goal content, but also physically: whether intelligence and goals are constrained to correlate in physically realizable intelligences that do not yet exist. The main constraint that a physically realizable intelligence is subject to is a limit on computational resources[3].

So, in practice, is it possible to build an intelligence with arbitrary goals? And if not, what constraints are imposed on these goals, and how do these constraints come about?

I will stop here as I think it’s not yet possible to think rigorously about these questions, although I think the questions themselves are well defined and relevant (e.g. for matters of AI safety). Here is some related reading:

Bostrom [2012] – Motivation and instrumental rationality in advanced artificial agents

Lesswrong – General purpose intelligence: arguing the Orthogonality thesis

Lesswrong – Muehlhauser-Goertzel Dialogue, Part 1

[1] Legg [2006] – A collection of definitions of intelligence

[2] I have considered intelligence from a naturalistic standpoint as an optimization process that arose in living beings to counter entropy through behavior

[3] With unlimited computational resources one could instantiate a model like AIXI where goals are not coupled and perhaps answer the question immediately


Grok

I’ve been following Jeff Hawkins’ work at Numenta for a while, although I’ve never played with their HTM technology[1]. My view is that the fastest route to AGI is through neuroscience-inspired algorithms and cognitive architectures, what Demis Hassabis calls a systems neuroscience approach. Numenta’s work does not seem to me to qualify[2] as an integral route to AGI, but it definitely looks like a promising building block, not at the systems level, but at the perception-algorithm level.

The news is that Numenta will be pushing a cloud-based platform commercially, called Grok. Its strong points stem from those things the neocortex does well, mainly dealing with time-based data robustly and autonomously (online learning), with little of the fine tuning that is involved in traditional machine learning approaches. It is aimed at prediction and anomaly detection in data streams.
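To make the task concrete, here is a minimal, hypothetical sketch of anomaly detection on a data stream. This is not Numenta’s HTM algorithm, just an incremental mean/variance tracker (Welford’s method) with a z-score threshold, but it illustrates what online learning on streaming data, with no batch retraining and no stored history, looks like.

```python
# Toy illustration of the kind of task Grok targets: flagging anomalies
# in a stream with purely online statistics. NOT Numenta's HTM algorithm;
# just Welford's running mean/variance plus a z-score threshold.

import math

class StreamAnomalyDetector:
    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # sum of squared deviations (Welford)
        self.threshold = threshold

    def update(self, x):
        """Ingest one value; return True if it looks anomalous so far."""
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Incremental update: no history of past values is kept.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = StreamAnomalyDetector(threshold=3.0)
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 50.0]  # last value is an outlier
flags = [detector.update(x) for x in stream]
print(flags)   # only the final reading is flagged
```

The detector adapts its notion of “normal” as the stream evolves, which is the essential property the post attributes to the neocortex-style approach; HTM aims for the same property on far richer temporal patterns.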

The pieces of AGI will be falling into place over the next decades, and maybe this is one of them.


[1] Hierarchical Temporal Memory including HTM Cortical Learning Algorithms

[2] Although Itamar Arel seems to have a different point of view for his related approach

Universal intelligence and biased generality

When Shane Legg posted about AIQ recently I asked him to comment on Hibbard’s paper[1] where an objection is made about the apparently counter-intuitive consequences of the universal intelligence measure[2]. Hibbard notes that

Given arbitrarily small ε > 0, total credit for all but a finite number of environments is less than ε. That is, total credit for all environments greater than some level C of complexity is less than ε, whereas credit for a single simple environment will be much greater than ε. This is not the way we judge human intelligence.

In other words, an agent that succeeds in a number of very complex environments is considered less intelligent than another agent that succeeds in a single very simple one, and that seems wrong.

But the important thing about the universal intelligence measure is that it’s a matter of expectation, of capacity. It is not obtained by simply totaling the number of environments the agent is successful at, but rather by weighting environments by the expectation that they occur (according to a Solomonoff prior). So the measure is really an indicator of the expected degree of success of an agent if all we know about the environments it may find itself in is that they are distributed according to the simplicity prior.

This ingredient of expectation, together with the assumed prior, is what necessitates a simplicity-biased generality to obtain a high score.
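A toy sketch of this weighting may help. The real measure is uncomputable (it weights each environment by 2 to the minus its Kolmogorov complexity), so here each environment simply carries an assumed complexity in bits, and each agent a hypothetical success score per environment:

```python
# Toy sketch of Legg & Hutter's universal intelligence measure:
#   Upsilon(agent) = sum over environments of 2^(-K(env)) * V(agent, env)
# K (Kolmogorov complexity) is uncomputable, so complexities here are
# assumed values in bits, and V is a hypothetical score in [0, 1].

def upsilon(scores):
    """scores: list of (complexity_bits, value) pairs."""
    return sum(2.0 ** (-k) * v for k, v in scores)

# Agent A succeeds perfectly in 100 complex environments (K = 20 bits each).
agent_a = [(20, 1.0)] * 100

# Agent B succeeds in a single simple environment (K = 2 bits).
agent_b = [(2, 1.0)]

print(upsilon(agent_a))   # 100 * 2^-20, roughly 0.0001
print(upsilon(agent_b))   # 2^-2 = 0.25
```

Under the simplicity prior the single simple environment dominates, reproducing exactly the counter-intuitive ordering Hibbard objects to: agent B outscores agent A.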

Legg suggests that the source of the dissonance Hibbard remarks on is a hidden assumption present in our everyday intuition about intelligence. Given that in practice we measure the intelligence of generally intelligent agents (people), that is, agents that succeed in environments by virtue of generally intelligent ability rather than through narrow, domain-specific ability, we take success at a complex task as strong evidence of the ability to succeed at less demanding ones. This hidden assumption turns out to be inapplicable in the case of the two agents described, and hence our intuition goes wrong.

Note how if we analyze the example above in terms of what cannot be done (i.e., the simplest environment the agent fails at), the dissonance seems to go away: it is intuitively acceptable that an agent that fails at a trivial task is stupid. This is just inverting the “less-than-or-equal-to assumption” above. In this case, the assumption that an agent cannot succeed at tasks that are more difficult than one it has failed at is again wrong, but yields results that better correspond with the universal measure.

Following a different route, Hibbard proposes (if I understood his paper correctly) a measure that screens out non-general agents by requiring them to succeed not at a single environment, but rather at a class of environments, parametrized by complexity[3], Ef(m)

The intelligence of an agent p is measured as the greatest m such that p learns to predict all e ∈ Ef(m)

Presumably this prevents one-trick-pony agents from reaching intelligence scores that would seem unintuitively high; such scores are not possible because success is required in all the environments of the class. I interpret that the argument can be extended to conclude that if an agent succeeds at a class of given complexity, it is guaranteed that it will also succeed at complexities below that, although I have not seen this explicitly stated in the paper.
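A toy sketch of the contrast, assuming per-class success is somehow given (the success predicates below are hypothetical stand-ins for actually running the agent in the environments of each class):

```python
# Toy sketch of Hibbard's measure: an agent's intelligence is the greatest
# complexity level m such that it succeeds at *all* environments in the
# class E_f(m). The success predicates here are hypothetical; a real
# implementation would evaluate the agent inside the environments.

def hibbard_score(succeeds, max_level):
    """Greatest m in 1..max_level with succeeds(m) True, else 0."""
    passed = [m for m in range(1, max_level + 1) if succeeds(m)]
    return max(passed) if passed else 0

# A broadly capable agent: clears every class up to level 5.
general = lambda m: m <= 5

# A one-trick pony: aces one complex environment but fails others at
# every level, so it never clears a full class.
narrow = lambda m: False

print(hibbard_score(general, 10))   # 5
print(hibbard_score(narrow, 10))    # 0
```

Unlike the weighted measure, the narrow agent’s single success buys it nothing here, because the quantifier ranges over the whole class.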



[1] Measuring Agent Intelligence via Hierarchies of Environments

[2] Universal Intelligence: A Definition of Machine Intelligence

[3] Where complexity is given by the number of steps of computation that define the evolution of the environment

Hints of generality in machine learning

One of the themes in AI is the progressive replacement of hand-crafted knowledge/programming with autonomous learning. Another theme is the progressive shift from narrow, domain-specific abilities to generally applicable performance. These themes are closely related: domain-specific performance is usually achieved by encoding domain-specific expert knowledge, provided by humans, into the AI itself. This encoding is fixed and static, and is potentially brittle if the agent is subjected to a domain outside its narrow region of applicability.

In this talk we see hints of generality in machine learning via the replacement of hand-crafted, tuned features (as input to a later learning phase) with a phase that autonomously learns those features. Because domain-specific features, for example for vision or speech recognition, are typically designed by subject-matter experts, learning is constrained by that manual design: one cannot use vision features on an audio problem, and so forth.

However, when machine learning is targeted at learning the features themselves (in an unsupervised scheme), the potential for generality appears. If features can be autonomously learned for various domains, the machine learning process becomes general insofar as the learned features’ performance is comparable or superior to that obtained with hand-crafted ones. This is exactly what is demonstrated here.
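As a toy illustration of the idea (not the deep feature learning from the talk): the same generic unsupervised procedure, here a bare-bones 1-D k-means standing in for a feature learner, extracts “features” (cluster centers) from two unrelated, hypothetical domains with no domain-specific code.

```python
# Toy illustration of domain-agnostic feature learning: one unsupervised
# algorithm (a minimal 1-D k-means) learns "features" (cluster centers)
# from two unrelated domains, with no hand-crafted, per-domain design.

def learn_features(samples, k=2, iters=20):
    """Cluster 1-D samples into k centers; centers act as learned features."""
    centers = [float(c) for c in sorted(samples)[:k]]   # crude deterministic init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in samples:
            i = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[i].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

# "Vision-like" domain: pixel intensities, clustered around dark and bright.
pixels = [0.1, 0.2, 0.15, 0.9, 0.85, 0.95]

# "Audio-like" domain: amplitudes, clustered around quiet and loud.
amplitudes = [5.0, 5.5, 4.5, 40.0, 42.0, 41.0]

print(learn_features(pixels))       # roughly [0.15, 0.9]
print(learn_features(amplitudes))   # roughly [5.0, 41.0]
```

The point is only structural: nothing in `learn_features` knows whether its input is pixels or sound, which is the sense in which learned features make the pipeline general.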

And, to make it even more relevant in terms of current research avenues, this is related to findings in neuroscience that suggest the general applicability of some kinds of learning in the neocortex, where for example subjects can learn to see with their auditory cortex or their somatosensory cortex (by rewiring the optic nerve). This suggests the possibility that there is a single learning algorithm at work, at least in the learning of concepts at low levels of the hierarchy close to the perceptual level, which is exactly where features reside.