Beauty and simplicity in physics

A theory appears beautiful or elegant [..] when it’s simple, [..] when it can be expressed very concisely. – Murray Gell-Mann

The question of whether one should use beauty as a heuristic in physics is a difficult problem with deep philosophical roots. It is hard to find appropriate assumptions even to frame the problem. As with the classic philosophical problems it is related to, falling into infinite regress[5] is always right around the corner. In the limit, if we make no assumptions, nothing can be said.

I sketch an argument, in the form of a series of claims, as to how one should think about the problem. This is then applied in two ways: first, to argue that the term “beauty” is not useful for thinking about the problem; second, to evaluate one example of a heuristic that is normally considered an instance of beauty, naturalness.

1. Science is a process in which theories are assigned relative degrees of certainty.

1.1 We can model degrees of certainty with probabilities.

1.2 Theories must be logically consistent.

1.3 Theories gain (or lose) certainty through empirical confirmation.

2. Scientific theories allow calculations that specify expected observations.

2.1 To yield calculations, these theories must be computable, in the Church-Turing sense.

2.2 These theories can be re-cast as algorithms.

3. The routine practice of science includes embedded assumptions. These assumptions are usually taken for granted and even invisible [4].

3.1 These assumptions affect the way probabilities are assigned to theories.

3.2 The way these assumptions affect probability assignments must be consistent.

We can frame the problem of beauty in physics in terms of point 3. I argue later that the word “beauty” is not the right word to describe the real problem at hand. That real problem is best described as follows.

3.3 If competing theories yield (through calculation) empirical claims that are equivalent in our domain of observation, then they cannot be assigned different probabilities from observation alone.

3.3.1 The problem of assigning different probabilities to empirically equivalent theories (up to observation) is an instance of the philosophical problems of underdetermination, and its cousin, the problem of induction.

3.3.2 The problem is not vacuous, as theories whose calculations are equivalent so far may diverge for observations not yet made.

The problem of beauty in physics can now be described as the question of whether beauty should be used as an additional criterion to answer 3.3.1. Before going forward, we identify the main classes of assumptions present and accepted in scientific practice.

4. Science includes a class of assumptions about nature’s uniformity; this is known as Uniformitarianism.

4.1 Uniformitarianism asserts, for example, that the laws of nature are invariant in space and time: we expect the laws of physics to be the same everywhere. We also expect the laws of physics to be the same in the past as in the future.

4.2 Uniformitarianism is an a priori assumption: it is not subject to confirmation as scientific theories are.

4.3 Following 3.1, uniformitarianism affects the way probabilities are assigned to theories in a way that is complementary to empirical confirmation (extra-empirical in [3]).

5. Science includes an assumption about nature’s simplicity; this is known as Occam’s Razor.

5.1 Occam’s razor asserts that simpler theories are, ceteris paribus, more likely to be true.

5.2 Occam’s razor is an a priori assumption: it is not subject to confirmation as scientific theories are.

5.3 Following 3.1, Occam’s razor affects the way probabilities are assigned to theories in a way that is complementary to empirical confirmation (extra-empirical in [3]).

It should be apparent that the claims in 4.3 and 5.3 have the same form as that required by 3.3: probability-assigning mechanisms that, given equivalently supported theories, allow probabilities to be assigned differentially.

6. Uniformitarianism and Occam’s razor are the only a priori assumptions accepted in science. They can be reduced/unified into a bias in favour of simpler theories.

6.1 Uniformitarianism and Occam’s razor can be formalized as a bias that assigns relatively more probability to simpler theories. In a Bayesian setting, this bias takes the form of a prior over theories.

6.2 Given that theories take the form of algorithms, one needs a measure of simplicity for an algorithm. An example of such a formalism is found in Solomonoff induction, also in a Bayesian setting.

6.3 Although Kolmogorov complexity is uncomputable, and the complexity depends (up to a constant overhead) on the choice of Turing machine, Solomonoff induction illustrates that the complexity of a theory is amenable to formal treatment[7]. This provides a model with which to think about the underlying philosophical issue in a principled way.
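A toy sketch may help make 6.1–6.3 concrete. Assuming theories have already been encoded as bit-string programs for some fixed universal Turing machine (the encoding itself is left unspecified here), a simplicity prior can be written down directly:

```python
# Illustrative sketch only: "theories" are assumed to be encoded as bit
# strings (programs for some fixed universal Turing machine).

def simplicity_prior(program: str) -> float:
    """Assign weight 2^-length to a program, as in Solomonoff induction.

    Over a prefix-free set of programs these weights sum to at most 1
    (Kraft inequality), so they behave as an un-normalized prior.
    """
    return 2.0 ** -len(program)

# Two empirically equivalent "theories": the shorter one gets more prior mass.
theory_a = "10110"        # 5 bits
theory_b = "1011001101"   # 10 bits
assert simplicity_prior(theory_a) > simplicity_prior(theory_b)
```

This is of course not Solomonoff induction itself (which is uncomputable); it only illustrates that “simpler theories get more prior probability” admits a precise formulation.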

6.4 Other considerations such as “testability, falsifiability, naturalness, calculability, and diversity”[5] are not epistemological assumptions, but rather desirable criteria for the efficient practice of science.

Now we can say what the problem with “beauty” is: that it is ambiguous and/or inconsistent with scientific practice.

7. The word “beauty” is not a good way to think about the problem of selecting theories with equivalent empirical support. Whenever “beauty” does not refer to simplicity it is ambiguous or inconsistent.

7.1 If beauty does not refer to simplicity, there is no objective way to formalize it, and it is, in principle, not possible to construct a prior.

7.2 If beauty does not refer to simplicity, applying it is inconsistent: it is not part of the accepted a priori assumptions used in science.

7.3 If it is a problem to use “beauty” when it does not mean simplicity, then one can drop beauty altogether and use simplicity instead.

Now I turn to a specific application of this approach: naturalness, as described by Hossenfelder [1]:

Physicists use the word “natural” in general to mean that a theory’s dimensionless parameters (ie those without units) are not much larger or much smaller than 1 [..] The belief in naturalness is usually rationalized by claiming that numbers which are very large or very small are unlikely. They appear cherry-picked for no other reason than correctly describing an observation and require further explanation.

I argue that naturalness, as defined above, could be formulated as an instance of a simplicity bias:

8. If a theory is to be encoded as an algorithm, then the parameter values it uses to make calculations must be present as data in the theory.

8.1 Relative to the choice of Universal Turing Machine (UTM), the number of parameters and their values will contribute to the length of the algorithm.

8.2 Due to Kolmogorov complexity invariance, the choice of UTM cannot arbitrarily reduce the length contributions of parameters. In particular, if the encoding of certain parameter values is efficient in one UTM, it cannot be efficient in all others. There is no free-lunch UTM, and a given number of parameters and their values contribute to algorithm length in a non-arbitrary way (up to a constant overhead).
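A minimal sketch of how parameter values contribute description length, assuming (purely for illustration) that integer parameters are written into the theory’s encoding in plain binary:

```python
# Illustrative assumption: integer parameters are encoded in plain binary.
# Real prefix-free codes add a small overhead, but the logarithmic growth
# with magnitude is the point: very large (or very finely specified) values
# lengthen the theory's encoding and so lose prior mass under a simplicity bias.

def encoding_bits(value: int) -> int:
    """Bits needed to write a positive integer in plain binary."""
    return max(1, value.bit_length())

natural = encoding_bits(3)          # a parameter of order 1
unnatural = encoding_bits(10**16)   # a very large, "unnatural" parameter
assert natural < unnatural
```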

9. If we are using simplicity priors over theories due to 4) and 5), then it is reasonable to expect that these priors over theories have logical implications as to the number and values of theory parameters.

9.1 If 4) and 5) take the form of priors over theories, then they imply priors over parameter values.

9.2 If 9.1 is the case, then simplicity-favouring priors over parameter numbers and values are not only compatible with scientific practice, they are in fact necessary. For these formulations of 4) and 5) it would be inconsistent not to employ a priori probability distributions over parameter numbers and values.

We can compare 9.1 to observations made by Hossenfelder[1][2] and Wells[3].

The major problem with finetuning arguments both in cosmology and in particle physics is the reference to probabilities without defining a probability distribution, or, if a distribution is defined, the failure to explain where the distribution itself comes from.

If naturalness has any connection to the extra-empirical judgments of theory plausibility, which is surely what naturalness discussion is after, we have no recourse but to introduce probability measures across parameters of the theory.

The main idea is that simplicity assumptions provide these probability distributions, albeit implicitly.

The claims made in 9) are not general, but rather point to the possibility of accommodating naturalness into the framework of scientific practice in a principled way. This is a philosophical argument, because I provide no concrete specifications here for 4) and 5), or for naturalness. So one cannot say whether naturalness follows in general; one can only say that it is conceivable, as in the above model, that it does.

We can imagine a scenario in which we could go from the philosophical to the technical, in a way reminiscent of E.T. Jaynes’ reasoning robot[8]. The key to this idea is the need to automate the relevant probabilistic reasoning, which forces the formalization of the method into precise mathematical instantiations. This would mean:

Step 1. Choose a Turing machine on which the robot runs

Step 2. Restrict the class of theories such that priors are computable

Step 3. Formalize the principles 4) and 5) into precise probability priors over theories

Step 4. Formalize naturalness

Step 5. Extract the logical relationship between the formalizations of Step 3 and Step 4.

This would give a concrete answer as to whether, in that scenario, naturalness follows from previous principles, and in effect, whether naturalness is consistent with and necessary for the robot’s “scientific practice”.

Note that Step 3 is related to Wells’ comment [3]:

The critical discussion above leads to a credible skepticism to extra-empirical theory assessments (SEETA). We do not have a meta-theory that provides statistical distributions to parameters of theories, which is a key problem naturalness faces.

Step 3 would, in effect, construct the meta-theory for the robot’s accepted scientific principles.

[1] Screams for Explanation: Finetuning and Naturalness in the Foundations of Physics

[2] How nature became unnatural

[3] Naturalness, Extra-Empirical Theory Assessments, and the Implications



[6] Beauty and Elegance in Physics

[7] Machine Super Intelligence

[8] Probability Theory: The Logic of Science


A toy model of information-theoretic security

In a previous post we discussed the foundations of cryptography in terms of different types of uncertainty stemming from limited information or computation. We saw an example using the caesar cipher that demonstrated how an adversary may have all the information necessary to reveal a secret if the relationship between the message space, key space and length of the ciphertext satisfies certain conditions:

Note how we were able to pick out the correct message because none of the other attempts gave meaningful results. This happens because the space of possible keys is so small that only one of them decrypts to a possible message. In technical terms, the key space and message space[2] are small enough compared to the length of the message that only one key will decrypt to a meaningful plaintext.

The situation can be coarsely classified into three cases:

  1. H(K | C) = H(K) – Perfect secrecy
  2. 0 < H(K | C) < H(K) – Information-theoretic security
  3. H(K | C) = 0 – Computational security

Information-theoretic security refers to the epistemic uncertainty of the previous entry. We can use a simple model to illustrate how the conditions governing information-theoretic privacy work, and how, when these conditions fail, a system is reduced to computational security.

Elements of our model

The XOR function

Our toy model needs to specify how messages are encrypted from some plaintext language to ciphertexts, using a key that determines the exact transformation and can be used by the intended recipient to recover the secret. We can use the simplest language possible, a two character alphabet whose messages are equivalent to binary sequences. Our encryption function will be the XOR function, which takes two binary sequences and produces a third one as its output. This choice fixes our key space to also be binary sequences. Here’s an example encryption:

1010011000 XOR 1100111010 = 0110100010

The XOR function takes as input a message (1010011000) and a key (1100111010), and produces a ciphertext (0110100010). Of course, the message in this case means nothing to us, but the process is the same: we can simply imagine that the 1010011000 above is some meaningful content like “WELL DONE YOU HAVE FOUND THE SECRET”. The important thing to note is that, just as in English, a subset of the combinations in the plaintext space (binary sequences) makes meaningful messages whereas the rest do not. This brings us to the notion of language entropy, which precisely quantifies how big the meaningful subset of messages is relative to the entire plaintext space.
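The toy encryption above can be transcribed directly into code (a sketch of the model, not production cryptography):

```python
def xor_cipher(a: str, b: str) -> str:
    """XOR two equal-length bit strings character by character."""
    assert len(a) == len(b)
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

message = "1010011000"
key = "1100111010"
ciphertext = xor_cipher(message, key)  # "0110100010"

# XOR is its own inverse, so decryption uses the same operation and key:
assert xor_cipher(ciphertext, key) == message
```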


The higher the language entropy, the bigger the blue region will be compared to the entire plaintext space. In the case of a binary language the entropy ranges from 0 to 1, since this quantity is measured in bits per character. So far our toy model has these elements:

  • Plaintext space: P = {0, 1}^n
  • Message space: M ⊂ P
  • Key space: K ⊂ {0, 1}^n
  • Ciphertext space: C ⊂ {0, 1}^n
  • Encryption function: XOR: P × K → C
  • Language entropy: H_L ∈ [0, 1]
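Under the simplifying assumption that the |M| meaningful messages are equally likely, the language entropy of this toy model is just log2|M| / n bits per character:

```python
import math

def language_entropy(num_messages: int, n: int) -> float:
    """Entropy per character when num_messages of the 2^n plaintexts are
    meaningful and equally likely (a simplification of real languages)."""
    return math.log2(num_messages) / n

# Three meaningful messages of length n = 2 give roughly the
# H = 0.8 used in the examples below.
assert abs(language_entropy(3, 2) - 0.79) < 0.01
```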

The security properties of our system depend on three parameters related to the above:

  • n: the number of characters in the plaintext
  • |K|: the size of the key space
  • H_L: the language entropy
  • R_L = 1 − H_L: the language redundancy

The last parameter, redundancy, is just a rewriting of the language entropy. The equation that describes the security in terms of these parameters is:

s_n ≥ |K| / 2^(n·R_L) − 1
This equation gives a lower bound for the expected number of spurious keys, represented by the term s_n. A spurious key, for a given ciphertext, is a key that decrypts that ciphertext into a meaningful message other than the one encrypted with the real key. In the example encryption at the beginning of the post we saw that, when trying all the keys to decrypt the ciphertext, only one of them yielded a meaningful plaintext: the ciphertext had no spurious keys. If, on the other hand, one of the keys, s, had decrypted the ciphertext into another meaningful message, then that key s would be a spurious key. An attacker trying all possible keys would obtain two candidate messages and would not know which of them was the real one. The secret would remain somewhat protected. The existence and number of expected spurious keys determines which of the three coarse categories above a cryptosystem belongs to. Looking at the spurious key equation we can see the following trends:

  • s_n increases with the size of the key space, |K|
  • s_n decreases with the size of the plaintext, n
  • s_n decreases with language redundancy, R_L
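These trends can be checked directly against the Shannon lower bound on the expected number of spurious keys, s_n ≥ |K| / 2^(n·R_L) − 1 (a sketch; the parameter values below are arbitrary):

```python
def spurious_key_bound(key_space_size: int, n: int, redundancy: float) -> float:
    """Shannon's lower bound on the expected number of spurious keys."""
    return key_space_size / (2 ** (n * redundancy)) - 1

# s_n increases with the key space size |K|:
assert spurious_key_bound(256, 8, 0.45) > spurious_key_bound(16, 8, 0.45)
# s_n decreases with the plaintext length n:
assert spurious_key_bound(256, 20, 0.45) < spurious_key_bound(256, 8, 0.45)
# s_n decreases with the redundancy R_L:
assert spurious_key_bound(256, 8, 0.9) < spurious_key_bound(256, 8, 0.45)
```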

A visual representation

Encryption representation with n = 2, H = 0.8, M = 3, K = 1

The left part of the image is a visual representation of our toy model for the parameter values given: the left axis is the plaintext space, the right axis is the ciphertext space. A point on the plot represents an encryption, in other words, a mapping from a plaintext to a ciphertext. We have used n = 2, giving a plaintext space of size 4. Of these four plaintexts, three are meaningful messages (for a language entropy of 0.8). The 3 red dots on the plot therefore correspond to the encryptions of these 3 meaningful messages. Because K = 1, there is only one ciphertext per plaintext, or visually, only one point on any given horizontal line. Compare with:

n = 2, H = 0.8, M = 3, K = 3

Here we can see 9 points, corresponding to 3 messages x 3 keys. A horizontal line thus represents all the encryptions for a given message under different keys. What about the color difference? Here’s another set of parameters:

n = 6, H = 0.8, M = 28, K = 3

With a higher plaintext length the number of meaningful messages rises to 28, resulting in a total of 28 x 3 = 84 encryptions, which show up as red and blue points in this case. Can you spot the pattern that explains this?

It’s not easy to see, but the answer lies in understanding what vertical lines mean in the representation. Points lying on the same horizontal line are different encryptions of the same message. Points lying on the same vertical line are different messages that encrypt to the same ciphertext (under different keys). As we saw before, this is exactly the situation where it is not possible for an adversary to determine the secret by trying all keys in reverse, as there is no way to tell which of the resulting messages is the original.

Blue points are ciphertexts for which there is more than one key that decrypts to a meaningful message, or equivalently, blue points are ciphertexts with one or more spurious keys.

s_n > 0 ⇒ blue point

s_n = 0 ⇒ red point
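The red/blue classification can be reproduced in code for small parameter values (a sketch; the message sets below are arbitrary subsets of the plaintext space, chosen only for illustration):

```python
from itertools import product

def xor(a: str, b: str) -> str:
    """XOR two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

def ciphertext_counts(messages, keys):
    """Map each ciphertext to its number of (message, key) preimages.

    For XOR, each key decrypts a ciphertext to a unique plaintext, so a
    count of 1 is a red point (no spurious keys) and a count greater
    than 1 is a blue point (at least one spurious key exists).
    """
    counts = {}
    for m, k in product(messages, keys):
        c = xor(m, k)
        counts[c] = counts.get(c, 0) + 1
    return counts

# n = 2, three meaningful messages, a single key: every point is red.
assert all(v == 1 for v in ciphertext_counts(["00", "01", "10"], ["11"]).values())

# With three keys, collisions appear: blue points (spurious keys) emerge.
assert any(v > 1 for v in
           ciphertext_counts(["00", "01", "10"], ["11", "00", "01"]).values())
```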

Now we can see the visual equivalent of the properties of information-theoretic security we mentioned before.

  • s_n increases with the size of the key space, |K|
Fixed n, H, increasing values of K

Visually: the proportion of blue dots increases.

  • s_n decreases with the size of the plaintext, n
Fixed K, H, increasing values of n

Visually: the proportion of red dots increases.

  • s_n decreases with language redundancy, R_L
Fixed n, K, decreasing values of H (increasing values of R)

Visually: the proportion of red dots increases.

Visualizing the categories

Besides these trends, we also spoke about three broad categories into which cryptosystems fit:

  • H(K | C) = H(K) – Perfect secrecy
n = 8, H = 0.55, K = 256

Visually: the number of blue dots per column is equal to the number of horizontal lines. This means that the adversary gains no information from the ciphertext. Note also that 2^8 = 256, which is the value of K.

  • 0 < H(K | C) < H(K) – Information-theoretic security
n = 9, H = 0.55, K = 106

Visually: there are only blue dots. Every ciphertext is partially protected in the sense that the adversary does not have enough information to reveal the secret unequivocally.

  • H(K | C) = 0 – Computational security
n = 13, H = 0.53, K = 29

Visually: there are red dots; these ciphertexts have no information-theoretic protection and depend on computational security for their confidentiality.

Try it yourself

In this post we have described some of the key ideas presented in Claude Shannon’s seminal Communication Theory of Secrecy Systems published in 1949. We have also constructed a toy model that allowed us to visualize the security properties of cryptosystems and how they vary with its main parameters. You can try playing around with the model here. If you are a teacher and find it helpful please let us know!

Privacy-preserving e-participation with nMix

In this post we describe a concrete design enabling privacy for e-participation platforms falling under the category of social information filtering systems. This is an extension of work presented in the User Anonymization for Decidim Barcelona report. Although this report was written in the context of the Decidim e-participation tool, the techniques generalize to any platform with similar characteristics and intent. Continue reading here.

AI Progress: Ladders and Rockets

I once heard a metaphor about progress that I now use when talking about AI. Don’t know where I heard it first, so can’t give credit. Update: see comment by Pat Lee below.

Imagine you want to get to the moon, but you don’t know how. Someone suggests using a ladder propped up against a tree. So a ladder is built, put in place, and the climb begins. To be scientific about it, distance to the moon is measured during the ascent. Sure enough, the distance decreases, all the way to the top of the ladder. To get closer, bigger and bigger ladders are constructed, propped up against taller and taller trees. The enterprise is making progress, as the distance to the moon diminishes with each attempt. If we continue on this way, we will surely make it to the moon.

Or you could try building a rocket. However, building the rocket does not result in any progress of the same kind being reported. You can build all the rocket parts correctly, but building any one of them won’t result in any direct progress, as defined by distance to the moon. By that metric, there is no progress at all.

The metaphor illustrates that progress towards an objective does not imply that the employed solution will scale all the way towards reaching its goal. Conversely, it also illustrates that a solution that will scale may not reveal any progress during its development. These two situations are the core of what makes progress towards AI difficult to assess. If we factor in the hype, things are even more confusing because there is a propensity to report progress on advances that may be of the ladder type rather than the rocket type.

Because we have not solved AI, there is no way to know to which type a certain advance belongs. We can have intuitions. For example, an advance that involves generality and can solve many different problems is more likely to be a rocket-type component. But not necessarily so. Most techniques in AI developed from the 1960s to the present day will never scale to full AI, and surely many of them were once thought to. And today, what can we say of the latest techniques in deep learning? Perhaps they are a rocket component that solves a small piece of the puzzle, such as low-level perception. Or maybe in 20 years’ time they will be remembered as a promising direction that ultimately failed to overcome its data efficiency issues.

But there is one general and large scale trend that we can be confident is bringing us closer to AI, if only in the vague chronological sense. This trend is the continued alignment of the economy with AI research. We can take a previous and related example to illustrate the phenomenon: the semiconductor industry. Before the information age, the allocation of resources to the development of microprocessors and related technologies was orders of magnitude smaller than today. When society entered the information age, the explosion in demand for devices resulted in a shift in resources that gave rise to the exponential advances known as Moore’s law.

Of course, a resource shift is in itself not sufficient to guarantee explosive advances; rather it is a culmination of previous enabling scientific advances that typically occur at a slower pace. However, the limiting factor in scientific progress is typically aggregate cognitive power, which is itself one of the resources being allocated. Whether we are talking about a slow accretion of knowledge, or a fast explosion of applications, the two can only occur when the right people are working on the right task. Which brings us back to the surrounding conditions these individuals are in when making decisions.

Today, we have entered another phase of the information age. A phase in which society is demanding services that project great value onto technologies that can deliver analysis, prediction and automation in data intensive domains. This has shifted huge resources towards AI research. Whether or not specific AI advances are real progress towards AI, the large scale activity resulting from a stronger economic alignment is very likely producing overall progress. And this would be true even if we restricted ourselves to enabling conditions such as increasing computational power or accumulation of training data.

We may not know which are the ladders and which are the rocket components. But we do know that as the resources in play rise the chances that somewhere some group is building rocket components become increasingly significant.

Secure online voting

What do we mean by secure voting?

This is the subject of a new post on the nvotes blog, found here. Here’s the summary:

  • The term “secure voting” is generally thought to refer to cybersecurity and resistance to cyberattacks.
  • However, cybersecurity is a general property of hardware/software that does not reflect the specific requirements of voting as a political process. The secret ballot is an established and indispensable requirement for voting.
  • Secure voting systems must support privacy as well as integrity; these two requirements stand in opposition.
  • In a system supporting privacy, no one, not even system administrators or other privileged observers can violate the secret ballot.
  • In a system supporting end-to-end verifiability, voters can ensure that their vote was cast, recorded, and counted correctly.
  • Cryptographically secure voting systems employ cryptographic technology to satisfy these two properties simultaneously. The gold standard is end-to-end verifiable voting systems.
  • A secure voting system is an end-to-end verifiable voting system that also employs general computer security principles.