While reading excellent Professor Stewart’s Cabinet of Mathematical Curiosities I came across the 12 coins problem (again). This problem is usually stated like this: you have 12 coins out of which a single one is a counterfeit, either lighter or heavier than the regular ones. You have access to a balance scale and need to determine which coin is fake and if it weighs more or less compared to genuine coins. You only have  three weighings to accomplish this.

I must have seen this problem before and maybe even have solved this previously, however I could not remember the details of the answer. So I started anew especially as I was seeing a way to solve this neatly.

Obviously in each of three weighings every coin might be put on the left pan, right pan, or put away. Let’s code the left pan by 1, right pan by 2 and off the scale as 0. Obviously to differentiate between the coins each needs to have a different pattern of being placed into the left, right and no pan. For three weighings there are 3^3=27 such codes available, although (0, 0, 0) or 000 is not acceptable since we need to measure each coins at least once to determine if it’s heavier or lighter. So we have 001, 002, 010, 011, 012, 020…, …, 221, 222 to choose from.

Conveniently if we assume that each of the coins is weighed exactly twice we will end up with twelve available codes:

 1: 110
 2: 120
 3: 210
 4: 220
 5: 101
 6: 102
 7: 201
 8: 202
 9: 011
10: 012
11: 021
12: 022

For example the 1st coin would be put on the left pan in both first and second weighing step and put away for the third one. If the left pan would be heavier in both 1st and 2nd attempt and there would be an equilibrium in the third one, the first coin would be a heavier counterfeit. Or would it? Equally the 4th coin might be a lighter counterfeit since it was placed in the pan opposite to the 1st coin. My assumption was wrong. The code assignment as given above is NOT a correct solution.

Given this failure I resorted to a more clumsy solution involving a decision tree. It involved weighing a different number of coins each time, which convinced me that there must not be a solution that has 4 +4 coins on the scale each time. Having completed this I looked up the answer in Professor Stewart’s book. To my astonishment he gave a neat answer basing on a coding similar to mine! How was it even possible? Oh, there was a detail I overlooked. You get more usable codes if you weigh some coins just once and some three times!

There are some questions that arise at this point.

How to find a coding that will always give a correct solution?

Is the solution given in the book the only such coding? (Up to the order and left/right symmetry).

Is there a coding that puts only 1, 2, 3, 5 or 6 coins into each pan at some weighing step?

Let’s find out the size of the problem first. We have 27-1=26 codes to choose from and we need to assign one to each coin. The number of such assignments is C(26, 12)=9657700 (C stands for combinations, binomial coefficient). This is small enough to check them all by brute force.

An assignment must fulfill some conditions to be considered a correct solution. First of all, there must be equal numbers of coins in both pans at each of three weighing steps. So if 4 codes have “1′ as the first digit, which stands for the left scale pan in the first step, the remaining chosen codes must have 4 codes starting with “2” (right pan in the first step) and the remaining four require “0”.  Function equal_counts verifies this condition.

The other important condition is the “no symmetric coins” as shown in the example of my failed coding above. If a coin is given a code of 112 (left, left, right) any other coin cannot be assigned 221 (right, right, left). Otherwise you would not be able to determine which of the two is a counterfeit and whether it is heaver or lighter. This is checked by has_symmetric_coins function.

The brute force method finds the correct codings in less than a minute on a modern laptop. After removing duplicates – rejecting left/right symmetrical solutions and these solutions where only the order of weighing steps changes – there are 26 unique solutions left. What’s interesting each of them has 4+4 coins weighed at every step; this means that e.g. weighing 5+5 in any step would destroy too much  information. An example solution is shown here. As you see the first coin is weighed only once, on the left scale pan in the third batch. On the other hand coin number 12 is weighed three times, as right, right, left. Thus if the scale would balance towards right, right and left in the three steps, the 12th coin would be a heavier fake; balance towards left, left and right would mean 12th coin is lighter, while balance towards left, left and equal (110, opposite of 220) would mean 11th coin is lighter. And so on.

 1: 001
 2: 010
 3: 011
 4: 012
 5: 100
 6: 102
 7: 121
 8: 122
 9: 202
10: 210
11: 220
12: 221

The brute force program is given below. I’m sure there must be a more sophisticated and faster algorithm. Finding it is left as an exercise for the reader.

 

import itertools

left_right_map = {}

def equal_counts(tuple_of_tuples, weigh_cnt):
    # weigh_cnt == len(tuple_of_tuples[0])
    for i in range(weigh_cnt):
        left = sum([ 1 for x in tuple_of_tuples if x[i] == 1 ])
        right = sum([ 1 for x in tuple_of_tuples if x[i] == 2 ])
        if left != right:
            return False
    return True

def has_symmetric_coins(tuple_of_tuples, weigh_cnt):
    global left_right_map
    m = [ 0, 2, 1]
    if len(left_right_map) == 0:
        for t in itertools.product([0, 1, 2], repeat=weigh_cnt):

            equivalent_t = tuple([m[x] for x in t])
            left_right_map[t] = equivalent_t
    for t in tuple_of_tuples:
        equivalent_t = left_right_map[t]
        if equivalent_t in tuple_of_tuples:
            return True
    return False

def check(coins, weigh_cnt):
    pos = [ t for t in itertools.product([0, 1, 2], repeat=weigh_cnt)
            if sum(t) > 0 and sum(t) < weigh_cnt * 2 # remove no weighing and always right cases
          ]  

    i = 0
    sol = []
    for t in itertools.combinations(pos, coins):

        i += 1
        if i % 100000 == 0:
            print(i)
        if not equal_counts(t, weigh_cnt):
            continue
        if has_symmetric_coins(t, weigh_cnt):
            continue
        sol.append(t)
        print(i, len(sol), t)
    return sol

def is_unique_solution(tot, solutions, weigh_cnt):
    lrmaps = [[0, 1, 2], [ 0, 2, 1]] # 0: no change, 1: 1->2, 2->1
    for pm in itertools.permutations(range(weigh_cnt)):
        for lrmap in lrmaps:
            eqsol = []
            for t in tot:
                equivalent_t = tuple([ lrmap[t[pm[i]]] for i in range(weigh_cnt) ])
                eqsol.append(equivalent_t)
            eqsol.sort()
            eqsol = tuple(eqsol)
            if eqsol in solutions:
                return False
    return True
        

def main():

    COINS = 12
    COUNT = 3
    solutions = check(COINS, COUNT)
    uniq = []
    for sol in solutions:
        if is_unique_solution(sol, uniq, COUNT):
            uniq.append(sol)
    print(len(uniq))
    for u in uniq:
        print(u)
        for i in range(COUNT):
            print(i + 1, ":", [t[i] for t in u].count(0), end=", ")
        print()

if __name__ == "__main__":
    main()

Just a heads up on the last things I’ve been doing – some old school graphical effects in plain old C without using floating point routines. I have even added dithering, again without floats. Something different, very refreshing.

dither

A challenge from my friend – writing a spider that will walk a certain blog and gather statistics on the comments. Sure, why not?

So here’s the idea, what else you need?

Of course Python, 3.2 in this case. Why not the latest one? I’m trying to find the sweet spot between the goodness of language’s built-in functionality and the richness of the external libraries which ten to lag lag, some have never crossed the Python3 barrier.

Packages like pip or easy_install help a lot when adding the libraries. Follow the instructions here:

http://stackoverflow.com/questions/4750806/how-to-install-pip-on-windows

After I got pip in my system I used it to install the extremely forgiving HTML parser, BeautifulSoup.

pip install beautifulsoup4

The built-in Python3 library urllib.request was used to download the HTML pages while BeautifulSoup provided tools for looking up tags. 100 lines later I had a working spider. I left it happily churning the stats from the blog website and left the room.

After a while I noticed that my computer looks like being idle. And it was. The spider has crashed because of an out of memory error. Oh well.

What was the cause? Again there was a library for that.

pip install objgraph

It turned out that Python’s GC does not work very well with BeautifulSoup’s thicket of inter-object dependencies. Every webpage parse left a solid lump of unused data in the memory until the 32-bit process ran out of the allotted 2GB.

I tried the trick with the soup.decompose(). It helped initially. But the more the tags I touched, the more memory was leaked. Soon my spider hit the 2GB wall again.

I could try to improve the GC further and hope for the best, or use the solution suggested on the web – use multiprocessing to start the parsing in a different process. In this way the memory leaks would be contained.

import multiprocessing

And several lines later I had a spider that did not leak. It parsed the pages perfectly, the only problem was that it did not send the stats to the main thread. Here’s why. The stats were a complex hierarchical dictionary and needed to go via multiprocessing.Queue to the main thread. The problem was that the data needed to be pickled in order to be put onto the queue. And the pickle module did not like my structure and threw a seemingly unrelated error. Oh well.

I could have redesigned my project or use shared memory, but chose the simplest solution ready at hand. Another library. Luckily json module liked my structures and I manually pickled them to strings.

import json

Very soon after I had a working spider that did not leak the memory and gathered the statistics. And worked a whole lot slower due to the pickling. Well, I had to live with that. After all it was only 150 lines (rewritten several times).

My friend was writing some C++ code that would require sorting an array of A with std::sort. A was a class written a long time ago and defined in its own header file enclosed in namespace n. It did not provide an operator < though. As this was just a quick and dirty proof of concept my friend did not bother to change the header file. Instead he provided an operator<(const n::A&, const n::A&) in his cpp file which seemed just as good solution as any. A simplified example follows.

#include <algorithm>

// This came from A.h
namespace n
{
struct A {};
}

bool operator<(const n::A&, const n::A&)
{
return true;
}

int main()
{
n::A a[2];

std::sort(&a[0], &a[1]);

return 0;
}

Apparently it was not since the compiler (venerable gcc 4.3, 4.4 giving the same result) spewed a lot of non-obvious error messages.

/usr/include/c++/4.3/bits/stl_algo.h: In function 'const _Tp& std::__median(const _Tp&, const _Tp&, const _Tp&) [with _Tp = n::A]':
/usr/include/c++/4.3/bits/stl_algo.h:1919:   instantiated from 'void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size) [with _RandomAccessIterator = n::A*, _Size = int]'
/usr/include/c++/4.3/bits/stl_algo.h:4783:   instantiated from 'void std::sort(_RAIter, _RAIter) [with _RAIter = n::A*]'
prog.cpp:17:   instantiated from here
/usr/include/c++/4.3/bits/stl_algo.h:93: error: no match for 'operator<' in '__a < __b'
/usr/include/c++/4.3/bits/stl_algo.h:94: error: no match for 'operator<' in '__b < __c'
/usr/include/c++/4.3/bits/stl_algo.h:96: error: no match for 'operator<' in '__a < __c'
/usr/include/c++/4.3/bits/stl_algo.h:100: error: no match for 'operator<' in '__a < __c'
/usr/include/c++/4.3/bits/stl_algo.h:102: error: no match for 'operator<' in '__b < __c'
...and so on

When asked for help I immediately provided a resolution – put the operator< in the same namespace as the class. But the next question put me off balance: – Why didn’t it work for the first time? What’s the difference?
I was experienced enough to deal with the problem but I lacked understanding about what caused it. Therefore an investigation had to be launched. The tens of compiler error lines were not helpful. At the first glance the function was provided correctly. I could use the operator< in my code even if I put the call to it into another namespace n2. So why would the sort() located in std namespace not see it?
The answer was in hidden in the first line of the file. The #include did not just bring the std::sort() but the whole bunch of function definitions, among these templated operator<s, all of those in namespace std. Just one such function in std namespace prevented the Koenig lookup for candidate functions defined outside of std and A‘s native namespace n. Once the compare operator was defined in the n namespace like the rest of the class it started being visible to the std::sort().
What was a bit surprising was the fact that the actual correctness of the code depended on the actual composition of the std namespace. Had it no templated operator<s the original code would still compile. The non-template operator<s of course were never candidates for overload resolution and did not affect it.
And now the good news: the newer gcc compilers have more useful error messages that list the actual candidates and help to get to the point… once one reads through 100s of lines! Here’s just the beginning.

In file included from /usr/include/c++/4.7/algorithm:63:0,
                 from prog.cpp:1:
/usr/include/c++/4.7/bits/stl_algo.h: In instantiation of ‘void std::__insertion_sort(_RandomAccessIterator, _RandomAccessIterator) [with _RandomAccessIterator = n::A*]’:
/usr/include/c++/4.7/bits/stl_algo.h:2237:4:   required from ‘void std::__final_insertion_sort(_RandomAccessIterator, _RandomAccessIterator) [with _RandomAccessIterator = n::A*]’
/usr/include/c++/4.7/bits/stl_algo.h:5478:4:   required from ‘void std::sort(_RAIter, _RAIter) [with _RAIter = n::A*]’
prog.cpp:17:24:   required from here
/usr/include/c++/4.7/bits/stl_algo.h:2163:4: error: no match for ‘operator<’ in ‘* __i < * __first’
/usr/include/c++/4.7/bits/stl_algo.h:2163:4: note: candidates are:
In file included from /usr/include/c++/4.7/utility:72:0,
                 from /usr/include/c++/4.7/algorithm:61,
                 from prog.cpp:1:
/usr/include/c++/4.7/bits/stl_pair.h:212:5: note: template bool std::operator<(const std::pair&, const std::pair&)
/usr/include/c++/4.7/bits/stl_pair.h:212:5: note:   template argument deduction/substitution failed:
In file included from /usr/include/c++/4.7/algorithm:63:0,
                 from prog.cpp:1:
/usr/include/c++/4.7/bits/stl_algo.h:2163:4: note:   ‘n::A’ is not derived from ‘const std::pair’
...

For the really interested here is the smallest standalone nonsensical code example that causes the same error as above.

namespace s
{
 struct Y {};

 template
 bool operator<(const Y&, const Y&)
 {
   return true;
 }

 template
 void f(T x)
 {
   (*x < *x);
 }

}

namespace n
{
 struct A {};
}

bool operator<(const n::A&, const n::A&)
{
 return true;
}

int main()
{
 n::A a;

 s::f(&a);

 return 0;
}

Books are trees. Not only because they are made of paper and paper in turn is made of trees. The structure is quite obviously resembling a tree with chapters, paragraphs, sentences and words.

Let’s have a look at J. R. R. Tolkien’s Hobbit. It’s got 19 chapters, and we can draw each one like a tree branch down to every word: the longer the word the longer the needle is.

To better see what’s going on we can color the branches – sentences if you like – according to the persons (or monsters) included there. The result is below – animated gif. Enjoy.

 

Chapters of Hobbit as tree branches

Chapters of Hobbit

Let me start with a sample of very special English literature:

Branton Hills was a small town in a rich agricultural district; and having many a possibility for growth. But, through a sort of smug satisfaction with conditions of long ago, had no thought of improving such important adjuncts as roads; putting up public buildings, nor laying out parks; in fact a dormant, slowly dying community. So satisfactory was its status that it had no form of transportation to surrounding towns but by railroad, or “old Dobbin.”

What’s so special in the above example of somewhat mundane prose? You can pause and try to guess or just continue reading.

The answer – it is missing something vital for the written English, actually what’s even capitalized in the word ‘English’ itself: the letter ‘e’. As anyone who read Poe would attest ‘e’ is the most frequent letter in this language. It’s hard to construct a sentence without ‘e’, the paragraph above looks impossible to compose – and yet one Ernest Vincent Wright wrote a whole novel  in such manner! The excerpt is from “Gadsby, a Story of over 50 000 Words Without Letter ‘E”” as it was called on its title page. The author allegedly tied down that key on his typing machine so that ‘e’ would not slip into the book  by accident.

This kind of writing is called a lipogram. Mr Wright did not gain acclaim for his prose but he sure did find some followers, even more ridiculous than himself. See for yourself at the Wikipedia article.

And now for the programming part. I was curious whether lipograms with one letter amiss happen by accident in the English prose. Project Gutenberg’s most downloaded list provided me with a small sample of 19th/20th century literary works. I suspected that Joyce’s Ulysses might render something interesting. War and Peace, although translated, was also included in the list because it was so long, increasing a chance of a lucky hit.

def lipogram(words):
    best = defaultdict(list)
    length = defaultdict(int)
    letters = [ chr(k) for k in range(ord('a'), 1 + ord('z')) ]

    words_guard = words
    words_guard.append("".join(letters))

    for k, w in enumerate(words_guard):
        for ch in letters:
            if ch not in w:
                length[ch] += 1

            else:
                if length[ch] > len(best[ch]):
                    best[ch] = words_guard[k - length[ch]: k]

                length[ch] = 0

    return best

The above is a part of a program that found the longest word sequences without selected English alphabet letters in several Project Gutenberg books. I’ll only describe the results for the famous ETAOIN SHRDLU high frequency letters.

Letter E:

Gadsby wins easily. Ulysses has a 57 word sequence that is actually a collection of acronyms. So the best accidental lipogram is “…for you.  Your word is all I want.  I wouldn’t ask such a thing ordinarily, I wouldn’t so dishonour you as to imply a doubt, but this is a…“,  29 words from Dracula by Bram Stoker.

Letter T:

This time Ulysses cheats by repeating “yes” many times and the award goes to Wuthering Heights which is more ingenious regarding repetitions:  “…sparkle and dance in a glorious jubilee.  I said his heaven would be only half alive; and he said mine would be drunk: I said I should fall asleep in his; and he said he could…”. (36 words)

Letter A:

Actually Project Gutenberg’s disclaimer text did better than most of the tested novels so I had to trim the books to the actual content.  Plenty of them have 31-33 a-lipograms. The translator of Tolstoy’s War and Peace won narrowly: “the cruelty of his rejection of her, the cruelty of his rupture with her. “If only it were possible for me to see her once more! Just once, looking into those eyes to“.

Letter O:

Joyce’s word play provided the longest although nonsensical sequence at 53. Other novels have a couple of more prosaic ones at 37-38. Jane Austen wrote in Pride and Prejudice: “…family. Mrs. Bennet had seen her eldest daughter much admired by the Netherfield party. Mr. Bingley had danced with her twice, and she had been distinguished by his sisters. Jane was as much gratified by this as her…“.

Letter I:

One cry in Moby Dick is over 50 words long and includes sentences like this “There goes three thousand dollars, men!–a bank!–a whole bank!”. An i-lipogram from War and Peace is only 46 words long  but nicely worded and with some word substitutions the translator might have extended it. “…to the commander of the corps. How do matters stand?… You know, Count, there’ll be a battle tomorrow. Out of an army of a hundred thousand we must expect at least twenty thousand wounded, and we haven’t stretchers, or bunks, or dressers, or doctors enough for six thousand. We have ten thousand carts, but we need other…”.

Letter N:

Ulysses is disqualified for repetitions and  it is Pride and Prejudice again: …to her that, by his wife at least, if not by himself, such a hope was cherished. The letter was to this effect: “MY DEAR LIZZY, “I wish you joy. If you love Mr. Darcy half as well as I do my dear Wickham, you must be very happy. It is a great comfort to have you so rich,… .

Letter S:

Three Men in a Boat (to say nothing of the dog) by Jerome K. Jerome has a 47 naturally sounding s-lipogram.

Letter H:

Apart from French language citations the longest sequence is in Adventures of Tom Sawyer, 54 words.

Letter R:

Its frequency is even lower and there are examples of naturally sounding r-lipograms around 70 words log, the longest was found in Three Men in a Boat.

Letter D:

Mellville’s Moby Dick has a 78-word d-lipogram.

Letter L:

125 word sequence in Ulysses (and it does have some sense). Close to 100 in others.

Letter U:

Jane Eyre by Charlotte Bronte contains a 123-word u-lipogram.

The less frequent letter is the longer accidental lipograms are. For ‘m’ they are over one hundred words long, for ‘y’ 200, 500 for ‘y’, thousands for ‘j’, ‘x’ and ‘q’.

In fact everything follows the letter frequency statistics and there seem to be no surprising interesting lipograms. So what’s the purpose of the whole exercise? In fact the same as for Ernest Vincent Wright’s novel: show that something can be done.

Today I’ll write about a simple board game that  exists in numerous incarnations on entertainment websites. Spartans vs goblins is played on a 3 row x 8 columns grid. One Spartan and one goblin occupy opposite squares at the end of each row. The goal of the game is to prevent your opponent from making any more moves. Spartans or Goblins move any squares right or left in the same row but cannot jump over the opponent. Only in the first move on each side the maximum number of squares is half of the available 6 squares i.e. 3.

S _ _ _ _ _ _ G
S _ _ _ _ _ _ G
S _ _ _ _ _ _ G

The opponent loses if s/he has all figures in the last column while mine are in adjacent squares and it is his/her time to move. More generally, a situation where in each row Spartans and goblins are adjacent means the current player can move only backward. Since the opponent can mimic the moves, s/he will eventually strangle the first player  in the last row. Finally, it does not require a Kasparov to guess that if the distances between the figures are 0, 1 and 1, the player taking the move will lose too. After any forward move the situation will be 0, 0 and 1, by distances, and the opponent will close the gap in the remaining row.

_ _ _ S _ G _ _
_ _ S G _ _ _ _
_ _ _ _ S _ G _

These facts lead to the following observations. Only forward moves are meaningful because as shown above after a move back the situation may be restored by the opposing player. Therefore all that matters for recording the state of the game  are distances between the figures in each row. Obviously the rows are interchangeable so their order is not important.

Thus Spartans and goblins are equivalent to a game where you have three stacks of tokens or matches and can take any number (first move excepted) of tokens from any stack, while the goal of the game is to take the last token remaining. Games of this kind have been extensively covered in the literature on games theory. For the sake of completeness I will show how to determine the winner and the winning strategy anyway.

The recursive approach is given below. Function check() returns true if the position is a sure win for the player that has just made a move. If any opponent’s move from the current position may reach a new, winning, position, the current one is a sure lose, otherwise it is a wining position. Note that in order to reduce the number of states to consider the distances are sorted. The additional column in the tuple is the number of moves yet to move with the half-squares limitation, the beginning state is (6, 6, 6, 2). This program lists the winning positions, one needs to move to any possible one in order to win.

class Checker:
    def __init__(self, tracks, squares):
       self.tracks = tracks
       self.squares = squares
       self.cache = {}
    def check(self, pos = None, movecount = 0):
       if pos == None:
           pos = (self.squares,) * self.tracks
           pos = pos + (2,) # store state
       if pos in self.cache:
           return self.cache[pos]
       if (sum(pos[0:-1]) == 0):
           self.cache[pos] = True
           return True

       m = self.squares
       if movecount <= 1:
           m = m // 2

       save = True
   
       for i in range(self.tracks):

           for j in range(1, min(pos[i], m) + 1):
               new = list(pos[:])
               new[i] -= j
               new[-1] = max(0, pos[-1] - 1) # change state
               new = tuple(sorted(new[0:-1]) + [new[-1]])
               if self.check(new, movecount + 1):
                   save = False   


       self.cache[pos] = save
       return save

def main():
    simple = Checker(3, 6)
    print(("2nd" if simple.check() else "1st") + " player wins" )

    for k in sorted(simple.cache.keys()):
       if simple.cache[k]:
           print(k, simple.cache[k])

if __name__ == "__main__":
    main()

Just as easily one can find the winning positions from the bottom, (0, 0, 0) state by marking all that can reach it as undesirable, the ones that lead only to undesirables as desirable etc.

Follow

Get every new post delivered to your Inbox.