Rationality trick: noticing what you want to be true

Any time you make a judgement about something, whether during private reflection or as part of some argument or debate, it is useful to stop and think: Do I have any preference as to what I would like to be true? And what is that preference? At first it sounds like a silly thing to ask yourself; what’s true is true regardless of what you would like. Which brings me to the following distinction:

  • what is true
  • what I think is true
  • what I’m arguing to be true
  • what I want to be true

Consider a simple question

What is the population of Switzerland?

In this example, what’s true is a straightforward fact you can easily look up. But that’s not the reason I used this example. What I’m pointing out is that you don’t really have a preference about what the truth is. When you introspect with what do I want to be true?, nothing comes up; you don’t care either way. Let’s contrast this with another example. Assume, for the sake of argument, that you have personal convictions in the realm of politics, and consider

Is the minimum wage beneficial or detrimental?

If you have some political affiliation, and if you’re honest with yourself, I’m pretty sure you will have an answer as to what you want to be true. And even if that’s not the case for this particular example, you can probably find a matter of policy for which the introspection yields a positive answer.

The important thing is to note the clear difference between the two examples. In the first example there was no fact of the matter as to what you want to be true; in the second there is. And although wanting something to be true has no bearing on whether it is in fact true, it absolutely does have a bearing on

  • what I think is true
  • what I argue to be true

If you have a preference as to how you’d like things to be, you can be pretty sure that your mind will distort things to match that preference. It’s those pesky cognitive biases again, and as we’ve mentioned before, the biggest and most relevant offender here is confirmation bias.

Say, for example, you have a strong opinion about an issue, strong to the point that you consider that position to be part of your identity. In this scenario, facts and arguments about the issue that run contrary to your position become an attack on who you are; they compromise your identity. And psychologically speaking, this is a big deal. Your ego will defend itself, and that includes distorting things and deceiving you, and anybody else, if necessary. This simple description does a good job of explaining some of the irrationality in politics.

So in summary, a preference for a state of reality activates biases that distort cognition to match.

This brings us back to the beginning of the post; here’s where the rationality trick comes in handy. When reflecting on some matter, it is a good exercise to ask yourself whether you have a preference as to what you would like to be true. Noticing that you have such a preference should be a warning sign and a cue to exercise more discipline and restraint, because you know there is probably a bias at work.

Lastly, I want to point out the relationship between

  • what I’m arguing to be true
  • what I want to be true

To make matters worse, arguing something to be true may well determine what you want to be true. It’s all about signaling. As soon as you establish, in a social context, that you are advocating or defending a certain position, you become bound to it: being proven wrong, like changing your opinion, signals weakness, something we are evolutionarily programmed to avoid at all costs. That’s why you rarely see someone admit to being wrong or change their mind in a debate, especially if there’s an audience.

Art, regularity, novelty and dopamine

Dopamine (Wikipedia)

When writing up the post on regularity I was looking for an image that would match the content. As you can see if you go back to that entry, the image shows a regular structure, a fragment of architecture. I was quite happy with it, but what turned out to be even more interesting was the article I lifted it from, which I hadn’t looked at until now.

It turns out that in that post the author explores some of the same ideas I’ve considered when attempting to apply concepts from information theory[1] to arrive at an interpretation of artistic experience as a learning process. The notions of monotony and complexity are also given a similar (tentative) mathematical formalization in terms of algorithmic information theory, which is how we discussed regularity in the previous entry.

Of special interest is the hypothesis that monotony (at the low-complexity end of the spectrum, see below on AIC) is not only boring but in fact distressing, due to a mismatch between what our neural system is tuned to observe and what is actually observed:

why is human neurological response actually negative? Some insight into the effect comes from the notion of Biophilia, which asserts that our evolution formed our neurological system within environments defined by a very high measure of a specific type of coherent complexity. That is, our neurological system was created (evolved) to respond directly and exquisitely to complex, fractal, hierarchical geometric environments. When placed in environments that have opposite geometrical features, therefore, we feel ill at ease.

In my brief essay on the matter I mentioned the hierarchical nature of processing in the visual cortex, which is reflected in machine learning approaches using hierarchical networks such as HTMs and deep learning. The hierarchical nature of the processing and the dual top-down and bottom-up flow of information are well established in neuroscience. However, it remains to be seen whether there is a neurological explanation for the observation that “we feel ill at ease” with, for example, monotonous geometry.

In any case, all this led me to try to extend my information-theoretic picture of artistic expression and experience with a grounding in neurological processes. In particular, how does the brain deal with regularity and novelty as they relate to learning and satisfaction/pleasure? Knowledge from neuroscience points to dopamine as related both to pleasure and reward and, critically, to learning (temporal difference learning).

I’m still thinking many of these ideas through, but I’m going to try to summarize some of them in their crude form[6] so I can return to this as a reference; please excuse the lack of polish.

Here are the main points:

* Artistic expression can be characterized in terms of its position in a spectrum whose extremes are the obvious/boring on one side and the unintelligible on the other. This spectrum corresponds to the algorithmic information content of the expression.

* Artistic experience is a learning process where regularities are extracted and predictions are made according to those regularities

* Novelty is the mismatch between prediction and observation, novelty is a departure from regularity

* Novelty, either as reward-prediction error, or as an intrinsic reward (an indirect indicator of reward) motivating exploration[2], drives dopamine activity in the brain. In the latter case, this is related to the exploration vs exploitation aspect of reinforcement learning[3] (see the sketch after this list).

* Different dopamine response profiles yield different levels of sensation-seeking in individuals[4], which may partially account for differences in taste, in terms of preferred optimum points on the AIC spectrum

* The optimum balance between insufficient and excessive complexity in the AIC spectrum parallels the requirement of balance between task difficulty and skill in flow

* Artistic satisfaction/pleasure is an interplay between novelty and regularity[5]; it may be possible to explain this within some model of reward mediated by dopamine[6]
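
As a rough illustration of how several of these points could fit together, here is a minimal sketch of a toy temporal-difference-style learner in which the teaching signal combines a reward-prediction error with a novelty bonus that fades as a stimulus becomes familiar. All names and constants are invented for illustration; this is not a claim about the actual dopaminergic circuitry or about the models in the cited papers.

```python
from collections import defaultdict

# Toy TD(0)-style value learner over discrete stimuli.
# The "dopamine-like" teaching signal is the reward-prediction error plus an
# intrinsic novelty bonus that decays as the stimulus becomes familiar.
# All constants are arbitrary illustrative choices.
values = defaultdict(float)   # learned value estimate per stimulus
seen = defaultdict(int)       # observation count per stimulus
alpha = 0.1                   # learning rate
novelty_weight = 1.0          # weight of the intrinsic novelty bonus

def observe(stimulus, reward=0.0):
    """Update the value estimate and return the combined teaching signal."""
    seen[stimulus] += 1
    novelty_bonus = novelty_weight / seen[stimulus]   # fades with familiarity
    prediction_error = reward - values[stimulus]      # reward-prediction error
    signal = prediction_error + novelty_bonus
    values[stimulus] += alpha * signal
    return signal

for s in ["A", "A", "A", "A", "B", "A", "C"]:
    print(s, round(observe(s), 3))
# Repeated "A" yields a shrinking signal (the regularity is learned and the
# novelty fades), while the first "B" and "C" produce a large signal driven
# almost entirely by the novelty bonus.
```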


Notes/References

[1] Perhaps the canonical example of this line of work is Jürgen Schmidhuber’s treatment

[2] Absolute Coding of Stimulus Novelty in the Human Substantia Nigra/VTA [2006], Pure novelty spurs the brain

[3] Exploration & Exploitation Balanced by Norepinephrine & Dopamine [2007]

[4] Midbrain Dopamine Receptor Availability Is Inversely Associated with Novelty-Seeking Traits in Humans [2008]

[5] Dopamine: generalization and bonuses [2002]

[6] In particular, it is unclear whether a dopaminergic model is applicable to standalone perception, which is the case for artistic experience, since actions and explicit rewards are absent, and the timescales may be too short to be compatible with such a model.

Towards an agent based model of democracy

Democracy as voting can be conceptualized as a social information processor, composed of a set of individual processors (the voters) and an integrator (the voting system) that together produce global decisions.

Can one speak precisely about the quality of these decisions? Can the concept of performance and error be meaningfully defined?

Let’s restrict the analysis to the simplest case, a direct democracy Yes/No choice. Assume the concept of correct vote is well defined for the individual information processors, which we can call voting agents. If all the voting agents correctly emit Yes, but the global decision is No, then it seems reasonable to say that the global decision is incorrect. This is just a limiting case of the principle of majority vote, which we can restate as

1. The globally correct decision is the one supported by the greater number of individually correct votes [5]

This establishes, pending an individual definition of correctness, a binary[1] definition of correctness for the global processor. Furthermore, if voting is exercised repeatedly for a sequence of decisions, we can define a success rate for the global processor as

2. The success rate is the fraction of correct results that the processor emits

Let’s define the vote for an agent a on an issue i as V(a, i), and the correct vote as C(a, i). So

3. An agent a’s vote on an issue i is V(a, i)

4. An agent a votes correctly on an issue i if and only if V(a, i) = C(a, i)

What 4 says is that there exists a correct vote for an agent on an issue, and that the agent may or may not emit that vote. This definition does not require an interpretation of what V and C are, merely that they exist. An interpretation can be made in the framework of decision theory, as we will see in the next post[3]. But whatever the interpretation, one can ask: given a fixed C and V, how does the performance of the system vary as a function of integrator design?
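
To make 1–4 concrete, here is a minimal sketch of the direct-vote baseline. The agents, the error model and every constant are invented for illustration; in particular, C is arbitrarily taken to be the same for all agents on a given issue, and V is modeled as C corrupted with some probability of error.

```python
import random

random.seed(0)
N_AGENTS, N_ISSUES, ERROR_RATE = 101, 200, 0.3

def C(a, i):
    # Hypothetical correct vote; here, for simplicity, identical for all agents.
    return i % 2 == 0                       # True = Yes, False = No

def V(a, i):
    # Definition 4: the agent votes correctly iff V(a, i) == C(a, i);
    # here each agent errs independently with probability ERROR_RATE.
    correct = C(a, i)
    return correct if random.random() > ERROR_RATE else not correct

def global_decision(i):
    yes = sum(V(a, i) for a in range(N_AGENTS))
    return yes > N_AGENTS / 2

def globally_correct(i):
    # Definition 1: the correct global decision is the one supported by
    # the greater number of individually correct votes.
    yes = sum(C(a, i) for a in range(N_AGENTS))
    return yes > N_AGENTS / 2

# Definition 2: success rate = fraction of issues the processor decides correctly.
hits = sum(global_decision(i) == globally_correct(i) for i in range(N_ISSUES))
print(f"success rate: {hits / N_ISSUES:.2f}")
```

With these (arbitrary) numbers the majority vote recovers the correct decision on nearly every issue even though each individual agent is wrong 30% of the time; this is essentially the Condorcet jury theorem at work, and it is exactly the kind of behavior an operational model lets us vary and observe.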

Say, for example, we compare direct to representative democracy. It may be that in a representative democracy the representatives have greater expertise and emit votes that match C for a greater fraction of voters than if those voters had voted directly. This would speak in favor of representative democracy as a better processor. Conversely, the representatives may be unaligned with the voters’ preferences, yielding worse performance.

Let’s extend 1-4 to allow for representative and liquid democracy.

In representative democracy there is no possibility of a direct vote. It can be characterized as a low-frequency, delegation-vote-only integrator. This adds to our previous analysis the notion of a delegation vote, which is distinct from a standard vote (as described by V and C). A delegation vote is a voting agent’s choice of delegate.

Define the delegation vote for an agent a as D(a), which points to a delegate agent d. Thus

3.1 (representative) A voting agent a’s vote is given by V(D(a), i)

In a liquid democracy agents can choose to vote directly or delegate their vote

3.2 (liquid) A voting agent a’s vote is given by V(a, i) or V(D(a), i)

and the delegation transitivity of liquid democracy yields

3.3 (liquid) A delegate agent d’s vote is V(d, i) or V(D(d), i)

having extended V and D to apply also to delegate agents.

Getting back to the big picture, here’s what we have. We have suggested simple definitions for performance and error in global decisions as a function of individual correctness, which remains unspecified. We need to provide interpretations and specifications for individual voting, delegation and individual correctness (V, D and C). These specifications may yield a model that is operational, whose resultant dynamics can be observed.
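
Continuing the toy model from the earlier sketch (still entirely invented, including the assumption that delegates simply have a lower error rate than ordinary voters and that delegation choices are fixed), 3.1–3.3 can be added by supplying a delegation function D and replaying the same success-rate measurement under the three integrator designs:

```python
import random
from functools import lru_cache

random.seed(1)
N_AGENTS, N_ISSUES = 101, 200
VOTER_ERROR, DELEGATE_ERROR = 0.4, 0.1
DELEGATES = range(5)                         # an arbitrary pool of delegate agents

def C(a, i):
    return i % 2 == 0                        # same hypothetical correct vote for everyone

def noisy_vote(error, a, i):
    correct = C(a, i)
    return correct if random.random() > error else not correct

def D(a):
    return a % len(DELEGATES)                # fixed, arbitrary choice of delegate

@lru_cache(maxsize=None)
def delegate_vote(d, i):
    # A delegate casts a single vote per issue, shared by all who delegate to it.
    return noisy_vote(DELEGATE_ERROR, d, i)

def cast(a, i, mode):
    if mode == "direct":                     # 3:   V(a, i)
        return noisy_vote(VOTER_ERROR, a, i)
    if mode == "representative":             # 3.1: V(D(a), i)
        return delegate_vote(D(a), i)
    # liquid (3.2): an arbitrary half of the agents vote directly, the rest
    # delegate; transitive chains (3.3) are omitted to keep the sketch short.
    if a % 2 == 0:
        return noisy_vote(VOTER_ERROR, a, i)
    return delegate_vote(D(a), i)

def success_rate(mode):
    hits = 0
    for i in range(N_ISSUES):
        yes = sum(cast(a, i, mode) for a in range(N_AGENTS))
        hits += (yes > N_AGENTS / 2) == C(0, i)
    return hits / N_ISSUES

for mode in ("direct", "representative", "liquid"):
    print(mode, success_rate(mode))
```

Which design wins depends entirely on the assumptions fed into V, C and D; the point of the model is precisely to make those assumptions explicit and measurable.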

Can democratic performance be investigated with such an agent based model? And more specifically, can such a model reveal or justify the hidden assumptions present when asserting superior performance of direct vs representative vs liquid democracy integrators?

(Continued in further posts)


References/Notes

[1] An alternative definition could incorporate degrees, for example depending on the exact numbers involved.

[2] The voting agent / delegate agent distinction is not obligatory, but is required for private voting systems

[3] Let’s say the agent has some preferences, or a utility function, about the state of the world. Given a voting decision, it must judge which outcome results in a higher expected utility. This rationally ideal choice is C. The agent, constrained by limited knowledge, limited computing resources, and cognitive errors, is modeled by V.

[4] The correctness of a delegation vote should be defined entirely in terms of the correctness of the delegate’s votes. As an extreme case, if the delegate’s votes are all correct from the point of view of the voter, then that delegation vote was correct. Conversely, the delegation vote is incorrect if none of the delegate’s votes were correct (again from the point of view of the voter). Judging from these extremes, it seems that the correctness of a delegation vote admits degrees and is similar to the success rate of 2:

5. An agent a has a delegation success rate equal to the fraction of issues i where V(D(a), i) = C(a, i)

[5] The wording is somewhat convoluted to account for unintuitive scenarios. It is possible, for example, for almost every voter to individually vote incorrectly but that the resulting global decision is correct (due to errors cancelling).

What we mean by regularity

Regular structure (Nikos A. Salingaros)

I’ve spoken before of regularity, but haven’t defined it exactly. Before going into that, let’s first consider the intuitive notion that comes to mind. By regularity we mean something exhibiting pattern, repetition, invariance. We say something is regular if it follows a rule. In fact, the word’s etymology matches this: regular is derived from the Latin regula, rule. Repetition and invariance result from the continued applicability of the rule, over time and/or space, to that which is regular. For example

1, 3, 5, 7, 9, 11, 13, 15….

we say this sequence is regular because it follows a rule. The rule is

each number is the result of adding 2 to the number before it

As per our scheme above, the rule is applicable throughout the sequence, the +2 difference repeats, and it is invariant.

This way of looking at regularity matches the language we’ve used previously when defining the key aspect of learning as the extraction of generally applicable knowledge from specific examples. In this case the specific examples would be any subset of the sequence, the general case is the sequence in its entirety, and the extracted knowledge is the rule “+2″.

We can take this further and try to formalize it by noting one consequence of the repetition characteristic: something that repeats can be described more briefly. The reason is simple: if we know the rule, we can describe the entire object[1] just by describing the rule. The rule, by continued application, will reproduce the object up to any length. Using the example above, note how the sequence can be described succinctly as

f(x) = 2x + 1

which is much shorter than the sequence (which can in fact be infinite). So we can think of the rule as a compression of the object or, from the other point of view, of the object as the expansion of the rule.
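
To restate the point in code: the rule is a constant-size description that expands into as much of the object as we ask for. A minimal sketch (the names are just illustrative):

```python
# The rule f(x) = 2x + 1 is a fixed-size description; expanding it
# reproduces the (potentially unbounded) sequence to any length.
def expand(rule, length):
    return [rule(x) for x in range(length)]

print(expand(lambda x: 2 * x + 1, 8))   # [1, 3, 5, 7, 9, 11, 13, 15]
```

Here’s another example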

Mandelbrot set (wikipedia)

In this case, the object is a fractal, which can be described graphically by a potentially infinite set of points. The level of detail is infinite in the sense that one can zoom in to arbitrary levels without loss of detail. This is why the description at a literal level (i.e. pixels) is infinitely long. However, like all fractals, the Mandelbrot set can be described compactly in mathematical terms. So we say the set is highly regular by virtue of the existence of a short description that can reproduce all its detail. Here’s the short (formal) description for the Mandelbrot set:
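
The Mandelbrot set is the set of complex numbers c for which the iteration z ← z² + c, started at z = 0, stays bounded forever. That one line of mathematics is the whole description. A minimal sketch of the corresponding membership test (the iteration cap makes it a finite approximation, and the constants are arbitrary choices):

```python
# c is in the Mandelbrot set if iterating z <- z*z + c from z = 0 never escapes.
# Capping the iterations turns this into a practical approximation.
def in_mandelbrot(c, max_iter=100, escape_radius=2.0):
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > escape_radius:
            return False
    return True

print(in_mandelbrot(0))        # True: the origin is in the set
print(in_mandelbrot(1 + 1j))   # False: escapes after a few iterations
```

A handful of lines, and yet they reproduce, point by point, an object with infinite detail.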

In mathematics, the length of the shortest such description is known as the Kolmogorov complexity or, alternatively, the Algorithmic Information Content (AIC). It is a measure of the quantity of information in an object, and it is the inverse of regularity as we have discussed it here[2]. We say that something with a comparatively low AIC exhibits regularity, as it can be compressed down to something much shorter.
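
Kolmogorov complexity is not computable in general, but an ordinary compressor gives a crude, computable upper bound, which makes the contrast easy to see. A minimal sketch using Python’s zlib (the particular strings and sizes are just illustrative):

```python
import random
import zlib

# A general-purpose compressor as a rough proxy for AIC: regular data
# compresses to far fewer bytes than random data of the same length.
regular = ("0123456789" * 1000).encode()                    # 10,000 highly regular bytes
random.seed(0)
noise = bytes(random.randrange(256) for _ in range(10000))  # 10,000 random bytes

print(len(zlib.compress(regular)))   # small: the repetition is captured by a short description
print(len(zlib.compress(noise)))     # roughly 10,000: little regularity to exploit
```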

I’ll regularly return to the concept of regularity as it is a fundamental way to look at pretty much everything, and is thus a very deep subject.


[1] For lack of a better word, I’m using the word object to refer to that which can house regularity, which is basically anything you can think of

[2] Note that this is not the only way to look at regularity, but rather one of two main formalizations. What we have seen here is the algorithmic approach to complexity (and regularity), but there is also a statistical view that is more suited to objects that do not have a fixed description, but rather a statistical one.

Loops, search and the creative process

I have described rationality as the ability to represent the world correctly (in a brain). Correct representation allows for correct predictions. Predictions in turn allow choosing those actions that will attain goals; an intelligent agent can predict the result of its actions and develop plans accordingly. It sounds complicated but it’s really a routine process. Here’s a pigeon doing something like it.

Without the ability to develop plans (and during learning stages) an agent is limited to trying every possibility and seeing what happens. But trying everything out in the real world can be slow and even dangerous. So intelligent agents simulate the outside world internally to carry out the trial and error process much more quickly and safely. This trial and error process is a loop.

A creative process can also be seen as a search containing a trial and error loop. I’m using the term creative process loosely and inclusively. It applies to any process where we are creating something with some specific intent and criterion. Call the result art if you like. It can be a painting, a computer program, a piece of music or the design for a bridge. There are many manifestations, but they can all be interpreted as a search for a solution in a large space of possibilities; finding good solutions is not trivial.

This is why trial and error is necessary: it cannot be known ahead of time what the best solution will be, nor how good a particular solution is, except by trying solutions out. During such trials the creator evaluates a solution, and this evaluation propagates information back to the search for other solutions, establishing a feedback loop.
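
As a caricature of this loop in code: generate a variation, evaluate it, and feed the evaluation back by keeping whatever scored best so far. The “design” and the scoring function below are arbitrary stand-ins, not a model of any real creative domain:

```python
import random

# Trial-and-error loop: propose, evaluate, feed the evaluation back into the search.
def score(design):
    # Arbitrary stand-in for the creator's judgement: higher is better.
    return -sum((x - 3.0) ** 2 for x in design)

random.seed(0)
design = [0.0] * 5
best = score(design)
for _ in range(1000):                                  # the loop of trials
    candidate = [x + random.gauss(0, 0.1) for x in design]
    s = score(candidate)
    if s > best:                                       # evaluation feeds back: keep improvements
        design, best = candidate, s

print(round(best, 3), [round(x, 2) for x in design])
```

Every pass through the loop is one cycle of creation and evaluation; anything that makes a pass cheaper or faster lets more of the space be explored for the same effort.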

Does this view of the creative process offer any insights as to how to best carry it out?

First, it is clear that, all other things being equal, the more possibilities are considered, the higher the quality of the final solution will be. So speeding up the feedback loop, removing barriers and delays between creation and evaluation, can dramatically improve the outcome. It turns out that the speed of the feedback loop is very sensitive to advances in technology; here are a few examples:

All these advances, and many others, can be interpreted as a tightening of the feedback loop, accelerating the creative process and thus exploring more possibilities. A straightforward example above is digital photography: by eliminating chemical processing, the time between taking a shot and judging the result is drastically reduced (along with reductions due to limited film, etc.). There are potential pitfalls: a faster feedback loop may also encourage a less disciplined and principled approach. But this is more a side effect of the advance than a property of the advance itself, which, ceteris paribus, is usually a net gain.

3D modeling

A second point can be made. When searching for solutions, the creator will inevitably be limited to a fraction of the search space, and the quality of the final solution can only be as high as the best solution present in that fraction. There’s a tradeoff: if the space considered is too large, the search will be bogged down filtering many low-quality results; but if it’s too small, the quality of the best solution present in the examined space will be bounded, producing stagnation.

So the second insight is that adding a random element to a creative process allows the search space to be expanded beyond what would be possible otherwise. This is the role of accident in art. It allows the creator to transcend his/her limits, in exchange for more labor or a possibly worse outcome.
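
Continuing the toy loop from the earlier sketch, the accident can be modeled as an occasional jump far larger than the habitual small step. The score function, the jump probability and the step sizes are all arbitrary illustrations; the point is only that the large jumps reach regions the small steps alone would effectively never visit:

```python
import math
import random

# Many local peaks; the best region is near x = 6.6, far from the starting point.
def score(x):
    return math.sin(5 * x) - 0.1 * (x - 7.0) ** 2

random.seed(0)
x = 0.0
best = score(x)
for _ in range(5000):
    if random.random() < 0.05:                 # rare "accident": a large random jump
        candidate = x + random.uniform(-10, 10)
    else:                                      # habitual small refinement
        candidate = x + random.gauss(0, 0.1)
    s = score(candidate)
    if s > best:
        x, best = candidate, s

print(round(x, 2), round(best, 3))
# With the jumps disabled, the greedy small-step search stalls on the first
# local peak it reaches; the occasional accident is what lets it escape.
```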


Notes

Additional interpretations are possible in terms of search and search spaces, for example

- The role of knowledge and expertise as a search heuristic

- Process specifics as search bias