If you are a fan of football (a.k.a. soccer), you surely know that the draw for the World Cup in Russia is going to take place on December 1st, 2017.

Teams from 5 continental football federations, namely:

  • North America – CONCACAF,
  • South America – CONMEBOL,
  • Europe – UEFA,
  • Africa – CAF,
  • Asia – AFC (which includes Australia)

are split into 4 pots:

POT 1 POT 2 POT 3 POT 3
Russia Spain Denmark Serbia
Germany Peru Iceland Nigeria
Brazil Switzerland Costa Rica Australia
Portugal England Sweden Japan
Argentina Colombia Tunisia Morocco
Belgium Mexico Egypt Panama
Poland Uruguay Senegal Korea Republic
France Croatia Iran Saudi Arabia

According to FIFA: no teams from the same confederation, with the exception of UEFA, which could have up to two teams in the same group, will be drawn into the same group.

It’s worth noting that if we take any team, the probability of other teams being drawn into the same group depends on:

  • confederations to which all four teams belong
  • the order of pots in which the groups are drawn

So you can get different probabilities for group set up if you do the draw in pot 1, 2, 3, 4 order and in pot 4, 3, 2, 1 order for example. I do not know which order FIFA uses on December 1st, I’m assuming here pot 1 .. pot 4 since Russia is assigned group 1 already, so it makes sense to fill the rest of the groups with pot 1 teams.

Also, the draw done by sequential choosing a team randomly does not assign the same probability to each possible team arrangement. It is because during the draw the rules may force some team to be assigned to a particular group. Other choices may result in an invalid arrangement. I did enumerate all valid confederation arrangements and got slightly different results than the simulation.

For the simulation I wrote a short Python program that simulated the draw with all the rules validated. I computed the probabilities by running the draw simulation a hundred thousand times.

Here’s how you would use these tables.

Assume you want to check probabilities for Serbia, which is from Europe and is in pot 4. Find the “Potential groups for a team from Europe in pot 4” table. It turns out that the most probable arrangement for Serbia (28.5%) is to face an European team from pot 1, South American team from pot 2 and and an African team from pot 3. So Poland, Peru, Tunisia or Germany, Colombia, Senegal etc. are just as probable opponents for Serbia. On the other hand teams from South America from pot 1, North America from pot 2 and Asia from pot 3 have only 2% chance to be drawn into Serbia’s group – indeed there are only 2 such groups possible, with Brazil/Argentina,  Mexico, Iran.

I hope you will find it interesting in the last few days before the draw!

Potential groups for a team from Europe in pot 1 Probability
Europe Europe Africa Asia 18.62%
Europe South America Europe Asia 13.47%
Europe South America Europe Africa 9.42%
Europe South America Africa Asia 7.15%
Europe Europe Asia Africa 5.58%
Europe Europe North America Asia 5.13%
Europe Europe Africa North America 4.90%
Europe South America Africa Europe 4.76%
Europe South America Europe North America 3.58%
Europe Europe North America Africa 3.49%
Europe North America Europe Asia 3.30%
Europe Europe Asia North America 2.26%
Europe North America Europe Africa 2.25%
Europe South America North America Asia 2.14%
Europe South America Asia Africa 2.11%
Europe North America Africa Asia 2.08%
Europe South America Africa North America 1.85%
Europe South America Asia Europe 1.72%
Europe South America North America Europe 1.52%
Europe South America North America Africa 1.51%
Europe North America Africa Europe 1.27%
Europe South America Asia North America 0.76%
Europe North America Asia Africa 0.65%
Europe North America Asia Europe 0.47%
Potential groups for a team from Europe in pot 2 Probability
Europe Europe Africa Asia 27.94%
South America Europe Europe Asia 10.51%
Europe Europe Asia Africa 8.37%
Europe Europe North America Asia 7.69%
Europe Europe Africa North America 7.34%
South America Europe Europe Africa 7.25%
South America Europe Africa Asia 5.97%
Europe Europe North America Africa 5.23%
South America Europe Africa Europe 3.89%
Europe Europe Asia North America 3.39%
South America Europe Europe North America 2.73%
South America Europe North America Asia 1.80%
South America Europe Asia Africa 1.70%
South America Europe Africa North America 1.58%
South America Europe Asia Europe 1.40%
South America Europe North America Africa 1.26%
South America Europe North America Europe 1.25%
South America Europe Asia North America 0.68%
Potential groups for a team from Europe in pot 3 Probability
Europe South America Europe Asia 26.95%
Europe South America Europe Africa 18.84%
South America Europe Europe Asia 14.01%
South America Europe Europe Africa 9.67%
Europe South America Europe North America 7.15%
Europe North America Europe Asia 6.60%
Europe North America Europe Africa 4.50%
South America Europe Europe North America 3.64%
South America North America Europe Asia 3.58%
South America North America Europe Africa 2.54%
South America North America Europe Europe 2.52%
Potential groups for a team from Europe in pot 4 Probability
Europe South America Africa Europe 28.55%
South America Europe Africa Europe 15.57%
Europe South America Asia Europe 10.34%
Europe South America North America Europe 9.14%
Europe North America Africa Europe 7.62%
South America North America Europe Europe 7.56%
South America North America Africa Europe 5.71%
South America Europe Asia Europe 5.60%
South America Europe North America Europe 4.98%
Europe North America Asia Europe 2.84%
South America North America Asia Europe 2.08%
Potential groups for a team from South America in pot 1 Probability
South America Europe Europe Asia 21.02%
South America Europe Europe Africa 14.51%
South America Europe Africa Asia 11.94%
South America Europe Africa Europe 7.78%
South America Europe Europe North America 5.45%
South America North America Europe Asia 5.37%
South America North America Europe Africa 3.81%
South America North America Europe Europe 3.78%
South America Europe North America Asia 3.61%
South America Europe Asia Africa 3.41%
South America Europe Africa North America 3.16%
South America North America Africa Europe 2.86%
South America Europe Asia Europe 2.80%
South America Europe North America Africa 2.53%
South America Europe North America Europe 2.49%
South America North America Africa Asia 2.35%
South America Europe Asia North America 1.37%
South America North America Asia Europe 1.04%
South America North America Asia Africa 0.72%
Potential groups for a team from South America in pot 2 Probability
Europe South America Europe Asia 26.95%
Europe South America Europe Africa 18.84%
Europe South America Africa Asia 14.31%
Europe South America Africa Europe 9.52%
Europe South America Europe North America 7.15%
Europe South America North America Asia 4.28%
Europe South America Asia Africa 4.22%
Europe South America Africa North America 3.70%
Europe South America Asia Europe 3.45%
Europe South America North America Europe 3.05%
Europe South America North America Africa 3.02%
Europe South America Asia North America 1.51%
Potential groups for a team from North America in pot 2 Probability
Europe North America Europe Asia 19.80%
Europe North America Europe Africa 13.49%
Europe North America Africa Asia 12.50%
South America North America Europe Asia 10.75%
Europe North America Africa Europe 7.62%
South America North America Europe Africa 7.62%
South America North America Europe Europe 7.56%
South America North America Africa Europe 5.71%
South America North America Africa Asia 4.71%
Europe North America Asia Africa 3.87%
Europe North America Asia Europe 2.84%
South America North America Asia Europe 2.08%
South America North America Asia Africa 1.44%
Potential groups for a team from North America in pot 3 Probability
Europe Europe North America Asia 30.77%
Europe Europe North America Africa 20.93%
Europe South America North America Asia 12.83%
Europe South America North America Europe 9.14%
Europe South America North America Africa 9.07%
South America Europe North America Asia 7.22%
South America Europe North America Africa 5.06%
South America Europe North America Europe 4.98%
Potential groups for a team from North America in pot 4 Probability
Europe Europe Africa North America 29.37%
Europe South America Europe North America 21.45%
Europe Europe Asia North America 13.58%
Europe South America Africa North America 11.11%
South America Europe Europe North America 10.90%
South America Europe Africa North America 6.32%
Europe South America Asia North America 4.54%
South America Europe Asia North America 2.73%
Potential groups for a team from Africa in pot 3 Probability
Europe Europe Africa Asia 37.25%
Europe South America Africa Asia 14.31%
Europe Europe Africa North America 9.79%
Europe South America Africa Europe 9.52%
South America Europe Africa Asia 7.96%
South America Europe Africa Europe 5.19%
Europe North America Africa Asia 4.17%
Europe South America Africa North America 3.70%
Europe North America Africa Europe 2.54%
South America Europe Africa North America 2.11%
South America North America Africa Europe 1.90%
South America North America Africa Asia 1.57%
Potential groups for a team from Africa in pot 4 Probability
Europe South America Europe Africa 28.27%
Europe Europe Asia Africa 16.74%
South America Europe Europe Africa 14.51%
Europe Europe North America Africa 10.46%
Europe North America Europe Africa 6.74%
Europe South America Asia Africa 6.33%
Europe South America North America Africa 4.54%
South America North America Europe Africa 3.81%
South America Europe Asia Africa 3.41%
South America Europe North America Africa 2.53%
Europe North America Asia Africa 1.94%
South America North America Asia Africa 0.72%
Potential groups for a team from Asia in pot 3 Probability
Europe Europe Asia Africa 33.48%
Europe Europe Asia North America 13.58%
Europe South America Asia Africa 12.67%
Europe South America Asia Europe 10.34%
South America Europe Asia Africa 6.82%
South America Europe Asia Europe 5.60%
Europe South America Asia North America 4.54%
Europe North America Asia Africa 3.87%
Europe North America Asia Europe 2.84%
South America Europe Asia North America 2.73%
South America North America Asia Europe 2.08%
South America North America Asia Africa 1.44%
Potential groups for a team from Asia in pot 4 Probability
Europe Europe Africa Asia 27.94%
Europe South America Europe Asia 20.21%
Europe South America Africa Asia 10.73%
South America Europe Europe Asia 10.51%
Europe Europe North America Asia 7.69%
South America Europe Africa Asia 5.97%
Europe North America Europe Asia 4.95%
Europe South America North America Asia 3.21%
Europe North America Africa Asia 3.13%
South America North America Europe Asia 2.69%
South America Europe North America Asia 1.80%
South America North America Africa Asia 1.18%
Advertisements

I learned about New Merology while reading Professor Stewart’s Cabinet of Mathematical Curiosities. The original idea was created by Lee Sallows and I found the original paper here. The updated version is here. I think that either Ian Stewart or Lee Sallows have explained this better than I can, here’s just a short introduction.

So called gematria deals with assigning distinct values to letters thus giving each word a numerical value. Let’s call a number name perfect if the sum of the letter values equals the number. For example (in English):

O + N + E = 1

T + W + O = 2

This is impossible to attain with the typical letter value assignment A=1, B=2, C=3 etc. To overcome this problem Lee Sallows invented New Merology which says that the letter value can be zero or negative, or not necessarily a whole number- although in this case we deal with integers only. Letter values must be still distinct. Lee Sallows starts his exploration of the possible values with O=-2, N=2 and E=1. Then O+N+E equals 1 and if W=-3 and T=7 it necessarily follows that T+W+O=2, making both ONE and TWO perfect numbers. In his paper he finally arrives at the most perfect possible solutions for English and French. This is reportedly verified by a Pascal program.

I have originally written a Python program to generate the Perfect Number Names solutions for Polish and English. It tried to arrive at the dependencies between letters e.g. (in Polish)

D + Z + I + E + S + I + Ę + Ć = 10

D + Z + I + E + W + I + Ę + Ć = 9

(after subtracting both sides)

S – W = 1

so S=W+1, which can lead to further discovered dependencies. But the remaining part of the program was still brute force and made it quite slow. And I was thinking that there must be some better way to solve this since it was just a list of simultaneous Diophantine equations with additional constraints (distinct letter values). I abandoned this puzzle temporarily but kept it in the back on my mind.

Z3

And now Z3Prover enters the stage. It can be used for many things and may be an overkill in this particular case. Still it is very simple to use –  it allows me to provide the equations and constraints and practically instantly receive the information if these are satisfiable or not. Moreover the positive result is accompanied with a model, in this case the mapping of letters to their values!

So here are the inputs:

  1. Equations e.g. Z+E+R+O=0, O+N+E=1 etc. up to MAX_NUM
  2. Distinctness constraints: A!=B, A!=C… B!=C etc. for all A…Z
  3. Minimum: abs(A) <= MIN_VAL for all letters A…Z

The algorithm

Initially I start with only inputs 1 & 2: the list of equations including numerals up to MAX_NUM (usually less than 20) and the distinctness constraints. If these are not satisfiable, I reduce MAX_NUM until I find the list of the numerals that is satisfiable under the constraints. (Bisection would be an improvement here).

Once I establish the largest satisfiable list of perfect number names i.e. know the MAX_NUM, another search is made, this time using all inputs 1 & 2 & 3 and going upward if current MIN_VAL condition makes the constraints unsatisfiable. (MIN_VAL  is the smallest maximum absolute value of all used letters). This search must end since a solution existed in the previous algorithm step. The obvious improvement is again to use the bisection instead of the linear search and is left as an exercise for the reader.

This results in the longest possible list of perfect number names and the smallest maximum absolute value of all letters.

Language competition

Which language among Dutch, English, French, German, Italian, Polish, Portuguese and Spanish has the longest list of perfect number names? Make your guess. The output of my program is provided below with the answer.

Note: I’m treating letters with diacritics/accents as distinct from the base letters which makes sense for Polish. In Lee Sallows’ solution for French he did not distinguish between E in DEUX and É in ZÉRO, I do, even though all the character mapping makes the program harder to follow.

****************************************
DE numerals from 0 up to 16 absolute value within 12
a=11 b=-4 c=-9 d=8 e=10 f=-1 h=5 i=-12 l=2 n=3 ö=7 r=-3 s=0 t=1 u=-7 ü=4 v=9 w=12 z=-8

   n   u   l   l   =   0
  +3  -7  +2  +2   =   0

   e   i   n   s   =   1
 +10 -12  +3  +0   =   1

   z   w   e   i   =   2
  -8 +12 +10 -12   =   2

   d   r   e   i   =   3
  +8  -3 +10 -12   =   3

   v   i   e   r   =   4
  +9 -12 +10  -3   =   4

   f   ü   n   f   =   5
  -1  +4  +3  -1   =   5

   s   e   c   h   s   =   6
  +0 +10  -9  +5  +0   =   6

   s   i   e   b   e   n   =   7
  +0 -12 +10  -4 +10  +3   =   7

   a   c   h   t   =   8
 +11  -9  +5  +1   =   8

   n   e   u   n   =   9
  +3 +10  -7  +3   =   9

   z   e   h   n   =  10
  -8 +10  +5  +3   =  10

   e   l   f   =  11
 +10  +2  -1   =  11

   z   w   ö   l   f   =  12
  -8 +12  +7  +2  -1   =  12

   d   r   e   i   z   e   h   n   =  13
  +8  -3 +10 -12  -8 +10  +5  +3   =  13

   v   i   e   r   z   e   h   n   =  14
  +9 -12 +10  -3  -8 +10  +5  +3   =  14

   f   ü   n   f   z   e   h   n   =  15
  -1  +4  +3  -1  -8 +10  +5  +3   =  15

   s   e   c   h   z   e   h   n   =  16
  +0 +10  -9  +5  -8 +10  +5  +3   =  16

****************************************
NL numerals from 0 up to 13 absolute value within 12
a=4 c=-11 d=-10 e=3 é=1 f=2 g=5 h=7 i=0 j=12 l=6 n=-1 r=10 s=-8 t=8 u=-5 v=-9 w=-12 z=11

   n   u   l   =   0
  -1  -5  +6   =   0

   é   é   n   =   1
  +1  +1  -1   =   1

   t   w   e   e   =   2
  +8 -12  +3  +3   =   2

   d   i   r   e   =   3
 -10  +0 +10  +3   =   3

   v   i   e   r   =   4
  -9  +0  +3 +10   =   4

   v   i   j   f   =   5
  -9  +0 +12  +2   =   5

   z   e   s   =   6
 +11  +3  -8   =   6

   z   e   v   e   n   =   7
 +11  +3  -9  +3  -1   =   7

   a   c   h   t   =   8
  +4 -11  +7  +8   =   8

   n   e   g   e   n   =   9
  -1  +3  +5  +3  -1   =   9

   t   i   e   n   =  10
  +8  +0  +3  -1   =  10

   e   l   f   =  11
  +3  +6  +2   =  11

   t   w   a   a   l   f   =  12
  +8 -12  +4  +4  +6  +2   =  12

   d   e   r   t   i   e   n   =  13
 -10  +3 +10  +8  +0  +3  -1   =  13

****************************************
PT numerals from 0 up to 11 absolute value within 20
a=-11 c=9 d=-4 e=-6 ê=3 i=-10 m=-3 n=-8 o=5 q=17 r=-19 s=11 t=8 u=4 v=18 z=20

   z   e   r   o   =   0
 +20  -6 -19  +5   =   0

   u   m   =   1
  +4  -3   =   1

   d   o   i   s   =   2
  -4  +5 -10 +11   =   2

   t   r   ê   s   =   3
  +8 -19  +3 +11   =   3

   q   u   a   t   r   o   =   4
 +17  +4 -11  +8 -19  +5   =   4

   c   i   n   c   o   =   5
  +9 -10  -8  +9  +5   =   5

   s   e   i   s   =   6
 +11  -6 -10 +11   =   6

   s   e   t   e   =   7
 +11  -6  +8  -6   =   7

   o   i   t   o   =   8
  +5 -10  +8  +5   =   8

   n   o   v   e   =   9
  -8  +5 +18  -6   =   9

   d   e   z   =  10
  -4  -6 +20   =  10

   o   n   z   e   =  11
  +5  -8 +20  -6   =  11

****************************************
FR numerals from 0 up to 13 absolute value within 15
a=15 c=-9 d=14 e=6 é=5 f=2 h=13 i=8 n=7 o=-5 p=-2 q=-1 r=-3 s=10 t=-7 u=-6 x=-12 z=3

   z   é   r   o   =   0
  +3  +5  -3  -5   =   0

   u   n   =   1
  -6  +7   =   1

   d   e   u   x   =   2
 +14  +6  -6 -12   =   2

   t   r   o   i   s   =   3
  -7  -3  -5  +8 +10   =   3

   q   u   a   t   r   e   =   4
  -1  -6 +15  -7  -3  +6   =   4

   c   i   n   q   =   5
  -9  +8  +7  -1   =   5

   s   i   x   =   6
 +10  +8 -12   =   6

   s   e   p   t   =   7
 +10  +6  -2  -7   =   7

   h   u   i   t   =   8
 +13  -6  +8  -7   =   8

   n   e   u   f   =   9
  +7  +6  -6  +2   =   9

   d   i   x   =  10
 +14  +8 -12   =  10

   o   n   z   e   =  11
  -5  +7  +3  +6   =  11

   d   o   u   z   e   =  12
 +14  -5  -6  +3  +6   =  12

   t   r   e   i   z   e   =  13
  -7  -3  +6  +8  +3  +6   =  13

****************************************
PL numerals from 0 up to 13 absolute value within 10
a=7 c=-4 ć=1 d=-8 e=5 ę=-10 i=8 j=-1 m=-7 n=0 o=-2 p=6 r=-5 s=4 ś=-6 t=9 w=3 y=-3 z=2

   z   e   r   o   =   0
  +2  +5  -5  -2   =   0

   j   e   d   e   n   =   1
  -1  +5  -8  +5  +0   =   1

   d   w   a   =   2
  -8  +3  +7   =   2

   t   r   z   y   =   3
  +9  -5  +2  -3   =   3

   c   z   t   e   r   y   =   4
  -4  +2  +9  +5  -5  -3   =   4

   p   i   ę   ć   =   5
  +6  +8 -10  +1   =   5

   s   z   e   ś   ć   =   6
  +4  +2  +5  -6  +1   =   6

   s   i   e   d   e   m   =   7
  +4  +8  +5  -8  +5  -7   =   7

   o   s   i   e   m   =   8
  -2  +4  +8  +5  -7   =   8

   d   z   i   e   w   i   ę   ć   =   9
  -8  +2  +8  +5  +3  +8 -10  +1   =   9

   d   z   i   e   s   i   ę   ć   =  10
  -8  +2  +8  +5  +4  +8 -10  +1   =  10

   j   e   d   e   n   a   ś   c   i   e   =  11
  -1  +5  -8  +5  +0  +7  -6  -4  +8  +5   =  11

   d   w   a   n   a   ś   c   i   e   =  12
  -8  +3  +7  +0  +7  -6  -4  +8  +5   =  12

   t   r   z   y   n   a   ś   c   i   e   =  13
  +9  -5  +2  -3  +0  +7  -6  -4  +8  +5   =  13

****************************************
ES numerals from 0 up to 11 absolute value within 20
a=-9 c=20 d=4 e=7 h=10 i=-19 n=-5 o=-11 r=-16 s=9 t=3 u=17 v=-17 z=18

   c   e   r   o   =   0
 +20  +7 -16 -11   =   0

   u   n   o   =   1
 +17  -5 -11   =   1

   d   o   s   =   2
  +4 -11  +9   =   2

   t   r   e   s   =   3
  +3 -16  +7  +9   =   3

   c   u   a   t   r   o   =   4
 +20 +17  -9  +3 -16 -11   =   4

   c   i   n   c   o   =   5
 +20 -19  -5 +20 -11   =   5

   s   e   i   s   =   6
  +9  +7 -19  +9   =   6

   s   i   e   t   e   =   7
  +9 -19  +7  +3  +7   =   7

   o   c   h   o   =   8
 -11 +20 +10 -11   =   8

   n   u   e   v   e   =   9
  -5 +17  +7 -17  +7   =   9

   d   i   e   z   =  10
  +4 -19  +7 +18   =  10

   o   n   c   e   =  11
 -11  -5 +20  +7   =  11

****************************************
IT numerals from 0 up to 10 absolute value within 10
a=0 c=-8 d=-7 e=5 i=10 n=-4 o=1 q=-2 r=-5 s=-9 t=3 u=4 v=7 z=-1

   z   e   r   o   =   0
  -1  +5  -5  +1   =   0

   u   n   o   =   1
  +4  -4  +1   =   1

   d   u   e   =   2
  -7  +4  +5   =   2

   t   r   e   =   3
  +3  -5  +5   =   3

   q   u   a   t   t   r   o   =   4
  -2  +4  +0  +3  +3  -5  +1   =   4

   c   i   n   q   u   e   =   5
  -8 +10  -4  -2  +4  +5   =   5

   s   e   i   =   6
  -9  +5 +10   =   6

   s   e   t   t   e   =   7
  -9  +5  +3  +3  +5   =   7

   o   t   t   o   =   8
  +1  +3  +3  +1   =   8

   n   o   v   e   =   9
  -4  +1  +7  +5   =   9

   d   i   e   c   i   =  10
  -7 +10  +5  -8 +10   =  10

****************************************
EN numerals from 0 up to 12 absolute value within 10
e=-2 f=-6 g=0 h=-7 i=7 l=9 n=2 o=1 r=4 s=3 t=10 u=5 v=6 w=-9 x=-4 z=-3

   z   e   r   o   =   0
  -3  -2  +4  +1   =   0

   o   n   e   =   1
  +1  +2  -2   =   1

   t   w   o   =   2
 +10  -9  +1   =   2

   t   h   r   e   e   =   3
 +10  -7  +4  -2  -2   =   3

   f   o   u   r   =   4
  -6  +1  +5  +4   =   4

   f   i   v   e   =   5
  -6  +7  +6  -2   =   5

   s   i   x   =   6
  +3  +7  -4   =   6

   s   e   v   e   n   =   7
  +3  -2  +6  -2  +2   =   7

   e   i   g   h   t   =   8
  -2  +7  +0  -7 +10   =   8

   n   i   n   e   =   9
  +2  +7  +2  -2   =   9

   t   e   n   =  10
 +10  -2  +2   =  10

   e   l   e   v   e   n   =  11
  -2  +9  -2  +6  -2  +2   =  11

   t   w   e   l   v   e   =  12
 +10  -9  -2  +9  +6  -2   =  12

As you see German won with perfect number names possible for 0…16.

Python code

 

# -*- coding: UTF-8 -*-
# Copyright 2016 TKK
from z3 import sat, Int, Solver

NUMERALS = {
  "PL" : "zero,jeden,dwa,trzy,cztery,pięć,sześć,siedem,osiem,dziewięć,dziesięć,jedenaście,dwanaście,trzynaście,czternaście".split(','),
  "DE" : "null,eins,zwei,drei,vier,fünf,sechs,sieben,acht,neun,zehn,elf,zwölf,dreizehn,vierzehn,fünfzehn,sechzehn,siebzehn".split(','),
  "FR" : "zéro,un,deux,trois,quatre,cinq,six,sept,huit,neuf,dix,onze,douze,treize,quatorze,quinze,seize".split(','),
  "IT" : "zero,uno,due,tre,quattro,cinque,sei,sette,otto,nove,dieci,unidici,dodici,tredici,quattordici,quindici,sedici".split(','),
  "ES" : "cero,uno,dos,tres,cuatro,cinco,seis,siete,ocho,nueve,diez,once,doce,trece,catorce,quince,dieciséis,diecisete".split(','),
  "PT" : "zero,um,dois,três,quatro,cinco,seis,sete,oito,nove,dez,onze,doze,treze,quatorze,quinze,dezesseis".split(','),
  "NL" : "nul,één,twee,dire,vier,vijf,zes,zeven,acht,negen,tien,elf,twaalf,dertien,veertien,vijftien".split(','),
  "EN" : "zero,one,two,three,four,five,six,seven,eight,nine,ten,eleven,twelve,thirteen,fourteen".split(',')
}

CHARMAP = { 'ć': 'c_', 'ę': 'e_', 'ś': 's_', 'ö': 'o_', 'ü': 'u_', 'é': 'e_acute', 'ê': 'e_circ' }

def find_num(numerals, limit, charmap):
  letters = sorted(set(charmap.get(x, x) for x in "".join(numerals)))

  my_sym = {}
  for l in letters:
    my_sym[l] = Int(l)

  s = Solver()
  for i, w in enumerate(numerals):
    s.add(sum([ my_sym[charmap.get(c, c)] for c in w ]) == i)

  for i, let in enumerate(letters):
    if limit is not None:
      s.add(my_sym[let] >= -limit)
      s.add(my_sym[let] <= limit)

    for let2 in letters[i + 1:]:
      s.add(my_sym[let] != my_sym[let2] )

  chk = s.check()
  model = None
  if chk == sat:
    model = s.model()
  return chk, model

def find_longest_existing(numerals, charmap): 
  for lennum in range(len(numerals), 0, -1):
    ret, model = find_num(numerals[:lennum], None, charmap)
    if ret == sat:
      return lennum
  return 0

def find_best_assignment(numerals, charmap):
  for m in range(5, 100):
    ret, model = find_num(numerals, m, charmap)
    if ret == sat:
      return m, model
  return None, None

def pretty_print_solution(lang, range_max, abs_max, model, charmap):
    rev_charmap = dict(zip(charmap.values(), charmap.keys()))
    print("*" * 40)
    print(lang, "numerals from 0 up to", range_max - 1, "absolute value within", abs_max)
    #print(NUMERALS[lang][:v])
    sp = sorted((str(k), model[k].as_long()) for k in model)
    sval = dict(sp)
    print(" ".join(["{}={}".format(rev_charmap.get(p[0], p[0]), p[1]) for p in sp ]))
    print()
    
    for i, word in enumerate(NUMERALS[lang][:range_max]):
        let, num, nsum = "", "", 0
        for ch in word:
            let += "{: >4}".format(ch)
            chval = sval[CHARMAP.get(ch, ch)]
            num += "{: >+4}".format(chval)
            nsum += chval

        let += "   ={: >4}".format(i)
        num += "   ={: >4}".format(nsum)
        print(let)
        print(num)
        print()    

def main():
  for lang in NUMERALS:
    v = find_longest_existing(NUMERALS[lang], CHARMAP)
    
    ret, model = find_best_assignment(NUMERALS[lang][:v], CHARMAP)
    if ret and model:
        pretty_print_solution(lang, v, ret, model, CHARMAP)    
   
if __name__ == "__main__":
  main()

This is related to the previous post. I have started developing drop-in replacement for atoi() and atol(). The new functions maintain the compatibility with the library functions and yet are faster on the platforms and compilers I tested. The sign of a number and whitespace are handled as atoi() would, with the caveat that whitespace is hardcoded to match atoi() behavior when LCC_ALL=C. See the source code and performance charts.

 

 

This year I had the pleasure to attend Code::Dive conference in Wroclaw, Poland. One of the invited speakers was Andrei Alexandrescu, the author of the notorious “Modern C++ Design” book and co-designer of the D language who lately left Facebook to concentrate on his “pet project”. I set my sights on Andrei and didn’t miss any of his lectures – they were worth it.

Sadly, the video materials from the conference have not been officially published yet. If you search reddit, however, you will find links to non-public recordings on YT. I recommend both parts of “Writing Fast Code”. It’s an eye opening experience – Andrei shows how the mathematic properties of operations and processor’s internal architecture  can be combined together to increase the performance by several hundred percent. And the optimized functions are very simple to understand and yet useful: fast versions of integer-to-string and string-to-integer conversions.

The claimed optimizations were substantial: 2-3 times faster than the library function and up to 2 times faster than a handwritten one. However Andrei underlined that everything should be properly measured and you should not rely on intuition and experts since they are often wrong. So, I started my own verification task. First, I wrote the atoui() routine using the Horner algorithm, the equivalent of “handwritten” version in the lecture. As the next step, I added “one small trick” that Andrei showed us. It saved some comparisons and the code was visibly faster: atoui_f() on the chart.

Then I rewrote the function applying the optimizations that should better parallelize the computation done in ALU… and received something slower even than the original Horner algorithm: atoui_and(). After reducing the number of  comparisons with “one small trick”, the new version, atoui_and_f(), was still losing to the original atoui() on my old i5 CPU and MinGW gcc 4.9.2 compiler even after applying full optimizations for the architecture. Unrolling the loops – atoui_and_u() – did not improve the situation. Horner was winning.

Well, that could not be right: probably Alexandrescu used a different combination of compiler and architecture.  Luckily I have a secondary development machine which is… well… not as old as the primary one. Still I could not replicate Andrei’s results in full. But at least, for some string sizes atoui_and_u() (orange, unrolled and mathematically optimized) was minimally faster even than atoui_f() (red, the improved Horner algorithm). The chart from my MacBook is below. The function zero() is just a plain empty loop prevented from being optimized out that serves as a base value (another trick by Andrei).

image001

atoui performance

So a minor success. And it pays off to know your maths and tricks of the trade after all.

I hope to write about the implementation details if I have some time. Please leave a comment if you are interested.

Note: in Alexandrescu’s video you will find ILP, ILP+unsigned, ILP+unsigned+unrolling, these are respectively atoui_and, atoui_and_f, atoui_and_u.

I’m bringing to an end this series of mildly interesting useless factoids on the Google Translate ability to recognize the likely language judging by just the one or two initial letters. Polish language uses almost all letters of the Roman alphabet and adds nine specific ones: ą, ę, ć, ł, ń, ó, ś, ź, ż. Some of these are also used by other languages like Lithuanian, but I suspected that all would be detected on their own as being a part of a Polish word by Google Translate. Of course this an example of Central Europe-centric thinking 😉 Was I right?

It turns out that ć, ę, ł, ś, ź, ż are indeed recognized as Polish. The letter ą was marked as English, which might mean “no detection, use default”. It’s debatable whether this behaviour is correct since no valid Polish word can start with ą. On the other hand ę never happens at the beginning of a word and it was detected as Polish. What’s interesting – the letter ó was marked as probable Irish and ń as Yoruba. I understand that in those languages ó and ń are just versions of the base letter with added acute accent, whereas in Polish they have quite distinct pronunciation from o and n.

Speaking of accents here’s some bonus information: (single apostrophe) is guessed to be Afrikaans and ` (back apostrophe): Armenian.

Finally here are the changes in the proposed language to translate from when typing “ósemka” – the name of the digit 8 and one of the few words that start with ó in Polish.

ó      Irish - detected
ós     Galician - detected
óse    Irish - detected
ósem   Icelandic - detected
ósemk  Icelandic - detected
ósemka Polish - detected

 

I wrote a script that ran all two letter word prefixes against Google Translate page and stored the auto detected language. It has no real use apart from a long list of mildly interesting factoids, for example “ji” prefix is most likely to be Czech, “sy” – Afrikaans and “ta” – Maltese etc.

Here’s the visualization of the prefix to language mapping done using PIL. I have used HSV color space so that the colors would be uniformly distributed among the hues, although the less saturated ones seem to be difficult to distinguish from one another. Each color represents a language, the languages are listed at the bottom. Generation of N easily distinguishable colors seems to be an interesting problem in itself.

Note that letter combinations unlikely to prefix a word in any language seem to default to English.

Two letter prefixes and detected languages

Two letter prefixes and detected languages

Disclaimer: all data came from Google Translate.

When you use Google Translate with the “Instant translation” and “Detect language” options turned on, the language is detected as soon as you type the first letter.

For example a single letter “a” is identified as a text most probably in English and both “w” and “z” are likely to be Polish. These happen to be valid words in the relevant languages.

Here is the chart of the various languages detected in response to all the letters of the Roman alphabet.

Letter to likely language mapping chart

Letter to likely language mapping chart

Note that most letters are defaulted to English. Of course this Google Translate behaviour probably depends on the user country and language settings, so your mileage can vary.

In the next installment in this series I’ll write about the language specific characters and multi-letter prefixes.