Spotted this one at programmingpraxis.com (highly recommended). Assuming you have one permutation find the next one in the lexicographical order, for example 123->132->213… I knew that such an algorithm existed, but I did not remember the details. After a minute with a pencil everything came together and I was able to post my solutions in Python and C++. In fact one of the other posters pointed out to me that this algorithm has been first found centuries ago and has been frequently rediscovered.

Python solution (of course there is a library and an oneliner for that)

C++ solution (the Standard Library has a function for that too)

#include <iostream>
#include <string>
#include <algorithm>
 
bool make_next_perm(std::string& p)
{
    int n = p.length(), i = n - 1, j = i, k = i;
    while (i && p[i - 1] > p[i]) --i;
    if (!i) return false;
    while (p[j] <= p[i - 1]) --j;
    std::swap(p[i - 1], p[j]);
    while (i < k) std::swap(p[i++], p[k--]);
    return true;
}
 
int main()
{
    std::string s("12345");
 
    do {
        std::cout << s << '\n';
    } while(make_next_perm(s));
 
    return 0;
}


There are many answers to today’s exercise on Programming Praxis which is to remove from a string characters given in another string. I liked most the Perl one by Paul G. where he used substitution to get the result in one line.

In python I implemented the obvious list comprehension solution and for comparison the substitution trick. Even counting the import statement the latter approach is a bit shorter (to be fair it has no error checking and can be fooled). That’s thinking out of the box!

See the source here: http://codepad.org/ABuodKwx

Content Aware Image Resizing is a concept that says that instead of making the pictures narrower to fit the web page, some unimportant parts of the pictures may be removed. Important image features stay as they are. When it was all the hype I wrote my version too. Recently I found the images and made animated images.

Addition: a draft animated GIF wich should be visible in any browser.

Birds in flight getting narrowed

Birds in flight - GIF animation

 

Content Aware Image Resizing

Ducks. The lines on the right are the png animation artifact. (Animation visible on select browsers including Firefox after clicking the image)

Back then I learned Python and PIL writing this program, so it was not only fun but very useful too.

Meantime in the park

Ducks on water (animation visible in select browsers after clicking the image)

From the long list of my ideas. Represent an image as a linear combination of fractal images. Gives you unlimited zoom depth. I guess someone has already patented this though. And of course working out the math to find the combination giving the best match is really tough. It’s worth pursuing though when I have more free time.

I'm pretty sure that's a landscape somewhere in Dorset

Now that I have the 'fractal', all I need is to find the original image for it!

Here is my solution to an interview question posed here.In short, find a value in a m by n matrix where each individual column and each individual row is sorted in ascending order. The algorithm should require O(m + n) steps.

My idea is very simple. Cut off the first or last column or row if the searched number cannot be in it. And repeat recursively until your resulting new matrix narrows down to one point (the desired number) or one of the dimensions is 0 – which means the search has failed.

The resulting algorithm is quite complicated because of many checks. It still has the O(m + n) complexity since in every step either a row or a column is cut off.

See the source here

Or below:

M = [ [  1,  5,  7,  9 ],
      [  4,  6, 10, 15 ],
      [  8, 11, 12, 19 ],
      [ 14, 16, 18, 21 ] ]

def find(m, v):
    if len(m) > 0 and len(m[0]) > 0:
        return find_r(m, v, 0, 0, len(m) - 1, len(m[0]) - 1)
    return None

def find_r(m, v, r1, c1, r2, c2):
    if r1 > r2 or c1 > c2:
        return None
    if not (m[r1][c1] <= v <= m[r1][c2]):
        return find_r(m, v, r1 + 1, c1, r2, c2)
    if not (m[r2][c1] <= v <= m[r2][c2]):
        return find_r(m, v, r1, c1, r2 - 1, c2)
    if not (m[r1][c1] <= v <= m[r2][c1]):
        return find_r(m, v, r1, c1 + 1, r2, c2)
    if not (m[r1][c2] <= v <= m[r2][c2]):
        return find_r(m, v, r1, c1, r2, c2 - 1)
    return (r1, c1)

print(find(M, 11))
print(find(M, 13))

The problem is stated in the following way: compute the number of Roman numerals where no letter appears twice. For example XCIV and XVI are OK, XXI is not.

The range includes I through MDCLXVI which includes every Roman ‘digit’ once. Obviously by picking any number of those ‘digits’ in this order we get a Roman numeral fulfilling the conditions. There can be 2^7 – 1 such numerals, excluding the trivial case where none was picked. However 127 is only the lower bound since this formula does not cover subtractions like CM, XL, IV.

To continue we need to establish the rules for the subtractions. According to Wikipedia the only allowed cases when a smaller ‘digit’ can precede a larger one are:

  • C before M, D
  • X before C, L
  • I before X, V

Therefore IV, IX, XLI, CMXLIV are all fine. VL, XD are not. Also subtractions from the letter/’digit’ 10 times larger preclude the use of the jumped ‘digit’. XCL, IXV are incorrect.

To generate all the numerals without repetitions we will start with M. There are two choices: either we use M in your numeral, or not. Go down recursively via D. It starts getting interesting after reaching C. Now there are not two, but four choices:

  • Skip it,
  • use it in a direct way,
  • use it as a prefix to the previous ‘digit’ (D) if we chose it previously
  • use it as a prefix to the ‘digit’ two steps back  (M) if we chose it previously

With X there is another condition: it can only precede C if C is itself used in the direct way. Same for I which can form IX but not IXC.

The algorithm enumerating all Roman numbers is now completely formed. We will use a 7 element array to store the way the letters MDCLXVI are used. The possible values are:

  • not used
  • subtracting from the digit one step back
  • subtracting from the digit two steps back
  •  used directly for the face value

The two ‘subtracting’ cases need only be differentiated if we want to actually print the Roman numerals. As the task is to compute the number of numerals without repetitions we will only keep the distinction between a letter used ‘positively’ and ‘negatively’.

We’ll call our function find() with the parameters position=0 and array of chosen letters = 0, 0, 0, 0, 0, 0, 0 referring to M, D, C, L, X, V, I. The pseudocode is as follows:

find(position, chosen)
if position == 7 then no more choices, return 1;
mark chosen[position] as used directly; call find(position + 1, chosen) and store the result as the count
mark chosen[position] as unused; call find(position + 1, chosen) and add the result to the count
if position refers to C, X, I (preceding letters)  then {
 if the letter two steps back was used directly and the previous letter was not used then mark chosen[position] as used indirectly; call find(position + 1, chosen) and add the result to the count
 if the letter one step back was used directly then mark chosen[position] as used indirectly; call find(position + 1, chosen) and add the result to the count
}
return count

The C version somewhat obfuscated and with certain optimizations (no need to call find twice for the indirect case) can be found here:

http://codepad.org/D1Sesi4Q

or here:

#include <stdio.h>
#define RN 7
#define BK 1
#define USE 2
#define recurse(x) { s[p]=(x); count += times * find(p + 1, s); }

typedef unsigned int uint;

uint find(uint p, uint* s)
{
  uint count = 0;
  if (p == RN) return 1;
  uint times = 1;
  recurse(USE);
  recurse(0);
  if (p > 0 && p % 2 == 0)
  {
    times = (uint)(s[p - 2] == USE && !s[p - 1]) + (uint)(s[p - 1] == USE);
    if (times) recurse(BK);
  }
  return count;
}

int main(int argc, char** argv)
{
  uint s[RN] = {0, 0, 0, 0, 0, 0, 0};
  uint count = find(0, s);
  printf("%d\n", count - 1);
  return 0;
}

The exercise taken from programmingpraxis is to generate all anagrams from a phrase, e.g “hey man” -> “may hen” etc. The words must come from a dictionary. My solution has some optimizations in place. The main one: multiple words that are anagrams are treated as one and trivially expanded after the main part of the algorithm completes.

import re
from collections import defaultdict
import functools
import time
import itertools
 
PAT_SMALL = re.compile(r"[a-z]+$")

def get_letter_count_dict(word):
    d = defaultdict(int)
    for ch in word:
        d[ch] += 1
    return d

def filt(test_word, against):
    t = get_letter_count_dict(test_word)
    for ch in t:
        if ch not in against or t[ch] > against[ch]:
            return False
    return True

def filtered_read(fname, letter_count, length_min):
    f = open(fname)
    words = [ w.rstrip() for w in f.readlines() ]
    f.close()
    words = [ w for w in words if len(w) >= length_min and PAT_SMALL.match(w) and filt(w, letter_count) ]

    print("Using filtered dictionary with %d lowercase words" % len(words))
    return words

def gen_anagrams(word_letter_count, n, words_and_dicts, start_index):
    retv= []
    k = start_index
    n_words = len(words_and_dicts)
    while k < n_words:
        w1 = words_and_dicts[k][0]
        if len(w1) > n:
            k += 1
            break
        new_let_count = words_and_dicts[k][1]
        diff_let_count = word_letter_count.copy()
        good = True
        left = n
        
        for ch in new_let_count:
            if ch in diff_let_count:
                letters_taken = new_let_count[ch]
                diff_let_count[ch] -= letters_taken
                if diff_let_count[ch] < 0:
                    good = False
                    break
                left -= letters_taken
            else:
                good = False
                break
                
        if good:
            if left == 0:
                retv = retv + [ [w1,] ]
            else:
                leftover_sol = gen_anagrams(diff_let_count, left, words_and_dicts, k)
                for t in leftover_sol:
                    retv.append([ w1 ] + t)
        k += 1

    return retv

def expand(solutions, sorted2word):
    i = 1
    for sol in solutions:
        for mapped in itertools.product(*(sorted2word[anagram] for anagram in sol)):
            print(i, mapped)
            i += 1

def find_anagrams(word, min_length = 3, do_expand = False):
    ss = "".join(sorted([c for c in word if 'a' <= c <= 'z' ]))
    ss_d = get_letter_count_dict(ss)
    n = len(ss)

    words = filtered_read("wordlists/mwords/74550com.mon", ss_d, min_length)

    # operate on equivalence classes of all anagrams
    # create structures for reverse expansion
    sorted2word = defaultdict(list)
    wordclass = set()
    for w in words:
        sortedletters = "".join(sorted(list(w)))
        sorted2word[sortedletters] += [ w ]
        wordclass.add(sortedletters)

    # order by length ascending, this is used in gen_anagrams
    class_dicts = sorted([ (len(w), w, get_letter_count_dict(w)) for w in wordclass if len(w) <= n  ])
    class_dicts = [ (trip[1], trip[2]) for trip in class_dicts ]

    start_time = time.clock()
    r = gen_anagrams(ss_d, n, class_dicts, 0)
    stop_time = time.clock()
    print("%d unique solutions found in %5.2f seconds" % (len(r), stop_time - start_time))
    if do_expand:
        expand(r, sorted2word)

if __name__ == "__main__":
    find_anagrams("programming praxis", do_expand = True)

Today a perfect brain teaser for a Friday night. Let’s find how many paths of a certain length a chess knight on a phone keypad has.

To find the solution I had to write down the adjacency list first. Instead of doing that mundane task manually, I wrote a small program.

from itertools import product
from collections import defaultdict

def isOK(x, y):
    return (0 <= x <= 2 and 0 <= y <= 2 or
            x == 1 and y == 3)

def generate_adjacency_dict():
    k = "123 456 789 x0x".split()
    jump = defaultdict(list)
    delta = list(product([2, -2], [1, -1])) + list(product([1, -1], [2, -2]))

    for y, row in enumerate(k):
        for x, ch in enumerate(row):
            if isOK(x, y):
                for d in delta:
                    if isOK(x + d[0], y + d[1]):
                        jump[ord(ch) - ord('0')].append(
                            ord(k[y + d[1]][x + d[0]]) - ord('0'))
    return jump

if __name__ == '__main__':
    print(generate_adjacency_dict())

The actual algorithm is based on the following observation: once we reach a phone key it is not important how we did that. We simply need to remember the total number of ways we had of reaching that key. So in each step one keeps the count of paths that end at a key. In the new iteration every reachable key will contain a sum of path counts of the keys that the knight jumped from.

from collections import defaultdict

def count_knight_paths(maxlevel):
    jump = {0: [6, 4], 1: [6, 8], 2: [9, 7],
            3: [4, 8], 4: [9, 3, 0], 6: [7, 1, 0],
            7: [6, 2], 8: [3, 1], 9: [4, 2]}
    c = { 1 : 1 }
    for level in range(maxlevel):
        cprim = defaultdict(int)
        for key in c:
            for jumped in jump[key]:
                cprim[jumped] += c[key]
        c = cprim
    return sum(c.values())

if __name__ == "__main__":
    for i in range(10):
        print(i + 1, count_knight_paths(i))

Some pictures of simple l-systems.

A ringZoom in

The algorithm I mentioned yesterday was the ancient Sieve of Eratoshenes. It is a widely known method for finding small primes. I chose it for my test since it involves heavy memory usage, moreover I wanted the parallel version to access shared memory. In contrast to the Sieve my previous test used little memory and the pool members did not share any. Please refer to the link above for the description of the classical algorithm; in short it involves eliminating compound numbers by marking multiples of newly found primes in a large array with every item referring to one integer.

How to split the work among the pool members? My first idea was to have each subprocess mark the multiples of a different prime. This would require either synchronizing them on each read and write to/from the shared memory, or some complicated design to avoid that.

What seemed simpler to me was to walk to the next unmarked cell (i.e. to the next prime) in the main process. The subprocesses would mark the multiples of this prime in separate chunks of the big array, In this way they would not need to be synchronized when doing the writing. Actually this depends on the hardware and the array implementation, but I made an assumption it would work in my case.

When I solved the problem of computing the ranges I ran into a problem: the pooled jobs would start and complete, but the result array would be unchanged. I stared at the screen astonished. I resorted even to debugging. My debug session lasted exactly two steps. The line “from multiprocessing import…” apparently made the program hang up. Tracing was more successful. When I ascertained that the subprocesses write to the array, but it is again empty at each new execution, my hidden suspicions were confirmed. The shared memory was not really shared and each subprocess used a different copy.

I tried passing the multiprocessing.Array as a parameter to Process, which happened to work, but was too slow to be practical in my case. Thanks to StackOverflow, however, I found the best solution – who would bother to read the fine manual anyway… See below how the pool initializer is used to share a global variable from the main process with the subprocesses. And that’s all.

import time
from multiprocessing import Pool, Array

def fill_part(part, parts, n, prime):
    global num

    # compute the range
    left = (n - 2 * prime + 1)
    k = left % parts
    if k > 0:
        left += parts - k
    # now left % parts == 0
        
    part_len = left // parts
    begin = (n - left) + part * part_len
    end = begin + part_len

    m = begin % prime
    i = begin
    if m > 0:
        i = begin + prime - m
    if i <= prime:
        i = 2 * prime

    # do the actual work

    while i < end:
        num[i] = 1
        i += prime

    return 0

def fill_part_tuple(t):
    return fill_part(t[0], t[1], t[2], t[3])

def initProcess(share):
    global num
    num = share

def era_dist(n):
    num = Array('L', n, lock = False)

    parts = 4
    pool = Pool(initializer = initProcess, initargs=(num,), processes = parts)
    
    num[0] = 1
    num[1] = 1
    i = 2

    while i * i < n:
        if num[i] == 0:
            params = ( (k, parts, n, i) for k in range(parts) )
            x = pool.map(fill_part_tuple, params)
        i += 1

    return num

def find_last(num, v):
    i = len(num) - 1
    while i > 0:
        if num[i] == v:
            return i
        i -= 1
    return None

if __name__ == '__main__':
    c = time.clock()
    b = era_dist(100000000)
    print(time.clock() - c)
    print(find_last(b, 0))