Secure voting systems supporting privacy through encryption must encode ballot contents into integers before they can be encrypted[1]. This encoding step is mostly trivial. For example, imagine a yes-no-abstain ballot. One can simply apply the following mapping to yield integer plaintexts for corresponding ballots:

Yes => 1

No => 2

Abstain => 3

But things can get a bit more involved when dealing with multiple-selection ballots. These are ballots where the voter makes more than one choice. They can be either ranked ballots, where the voter specifies a preference relation over the selections, or unranked ballots where no such a preference exists. Examples of voting systems using the former are single transferable vote or instant runoff voting. Systems like approval voting or plurality at large are examples of the second type, using unranked ballots.

Imagine we are using one of these systems to elect a candidate out of a field four: Alice, Bob, Charlie, and Donna. We first apply the trivial mapping:

Alice => 1

Bob => 2

Charlie => 3

Donna => 4

But how do we encode a complete ballot, for example, a ballot with (X corresponds to marked choices)

Alice X

Bob X

Charlie O

Donna O

Unlike the yes-no-abstain ballot above, the content of the ballot corresponds to a *list* of integers: 1 and 2. We could use the following mapping

encode: Alice, Bob => 1, 2 => 12

The ballot is encoded as the number 12, resulting from the *concatenation of the string representations* of each of the integers. But what if there are more candidates, and we need to encode something like:

encode: 11, 3 => 113

That won’t work, because 113 could represent either 11 and 3, or 1 and 13.

decode: 113 => ?

We can remedy this by adding some padding such that each choice is well separated:

encode: 11, 3 => “1103″ => 1103

Then when decoding, we convert the integer to a string, split it every 2 characters, and finally obtain integers for each of the candidates:

decode: 1103 => “1103″ => “11″, “03″ => 11, 03

But there’s still a problem, what about this choice:

encode: 1, 13 => “0113″ => 113

We run into trouble here, because the string “0113″ corresponds to the integer 113; there is no mathematical difference between “0113″ and ”113″. To fix this, when decoding we can first check that the string length is a multiple of 2 (since we are using 2 chars per candidate integer), if it is not we prepend the required zeros. The encode-decode process would be

encode: 1, 13 => “0113″ => 113

decode: 113 => “113″ (prepend zero) => “0113″ => “01″, “13″ => 1, 13

I hear you complain that all this concatenation, padding, and prepending looks a bit hackish, can we do better?

Let’s go back to our first example, when we simply wanted to do

encode: Alice, Bob => 1, 2 => 12

This looked very nice and simple. Can’t we do something like this in general, without any string hackery? The first step is to go back to the definition of decimal numbers.

In these terms, the encoding 1, 2 => 12 corresponds to

(10^1) * 1 + (10^0) * 2 = 12

Here we have expressed the encoding of 1, 2 using arithmetic, no string operations involved. The ballot choices are interpreted as digits according to the mathematical definition of decimal numbers. (In fact, this is what goes on under the covers when you convert a string like “12″ into the number 12.) This gives us a a purely arithmetical description of the simple mapping we started with. Things then got complicated when we considered the possiblity of more choices (candidates) in the ballot. Let’s apply our mapping to that problematic ballot:

encode: 11, 3 => (10^1) * 11 + (10^0) * 3 = 113

Our new procedure fails the same way: the simple scheme where each digit represents one choice cannot be accommodated by the decimal digit representation of choices, and the result 113 is ambiguous. But wait, who says we have to encode according to a decimal representation? What if we were to map choices to hexadecimal digits,:

encode: 11, 3 => (10^1) * B + (10^0) * 3 = B3

And we’ve restored simplicity and correctness to our scheme. B3 encodes the choice 11, 3 with one choice per digit and no ambiguity! If the B3 looks like cheating, just remember, B3 is a *representation* of a number that in decimal format turns out to be 179. The encode-decode process could just as well be written as

encode: 11, 3 => 179

decode: 179 => 11, 3

The bottom line is we can encode lists of integers into an integer provided we use the optimal base, which is equal to the number of possible choices in the ballot plus one.

Let’s revisit our original example, with Alice, Bob, Charlie and Donna. Since we have four candidates, our base is 4 + 1 = 5. The encoding is thus:

encode: Alice, Bob => 1, 2 => (10^1) * 1 + (10^0) * 2 = 12 (base 5) = 7 (decimal)

in short:

encode: 1, 2 => 7

decode: 7 => 1, 2

Note that not only is this method simpler with no string operations or padding, but the encoded values are smaller. Compare:

encode: 1, 2 => 12

encode: 11, 3 => 1103

with

encode: 1, 2 => 7

encode: 11, 3 => 179

Which should not come as a surprise, encoding with the specified base is the most compact[2] encoding possible (proof left as excercise for the reader). A larger base wastes space encoding ballot contents that are not possible, whereas a smaller base is insufficient to encode all possible ballots.

Finally, here is a python implementation of the encoder we have proposed

def encode(choices, max_choices): radix = max_choices + 1 number = 0 for d in choices: number += int(d) number = number * radix return number / radix def decode(number, max_choices): radix = max_choices + 1 choices = deque() while(number > 0): choices.appendleft(number % radix) number = number / radix return choices

In the next post we will further discuss details as to the compactness of the encoding mentioned in [2].

[1] In the case of ElGamal encryption used in Agora Voting, the plaintext must be encoded into an element of the multiplicative subgroup G of order q of the ring Zp, where p and q are suitably chosen prime numbers. In order to do this, the plaintext must be first encoded into an integer, after which it is mapped to a member of G, and subsequently encrypted.

[2] A few caveats must be mentioned. First, we are using 1-based indices to represent choices, which means some values are unused. Second, our scheme allows encoding repeated choices, which are usually not legal ballots. For example, one can encode the choice 1,1,1 but that would be an illegal vote under all the voting systems we mentioned as examples. The fact that legal ballots are a subset of all possible encodings means that the encoding is necessarily suboptimal with respect to those requirements. We will look at an alternative scheme in the next post.