## Programming Example inspired by a Usenet problem

On a 5x5 chessboard with 12 black knights and 12 white knights in a certain position, in how few moves can you reverse all the colors?

This problem will be hard if you've never done anything like it, but with practice such problems become straightforward. The special difficulty here is that a large data structure is required. I had to make it fit into 30 megabytes; slightly simpler techniques would have sufficed for this particular problem on a larger machine.

I'll give the source code, followed by a discussion of the data structure, followed by the solution discovered by the software. This is my original hacked-up code, foibles and all. I also produced a slightly improved version in which the hash handling code can be used for other applications.

```

/* Usage:
* a.out #a #b   -- find the midpoint of the quickest path from a to b.
*        (each position is denoted by the decimal value of
*         its internal representation.)
* a.out #a      -- same but b defaults to the color-reversal of a.
* a.out         -- same but a defaults to the position posted on Usenet.
*/

typedef unsigned short Entry;

struct hashtab {
int     asize, usize;
Entry   *tab;
} Htab;

#define KEYWORD(entry)  (*(entry))
#define EPEND           0x4000
#define OPEND           0x8000
#define PURE(key)       ((key) & ~(EPEND | OPEND))

#define POS2LO(pos)     ((pos) & 0xffff)
#define POS2HI(pos)     ((pos) >> 16)
#define LH2POS(lo, hi)  ((hi) << 16 | (lo))

#define ECELL(pos)      (0x1f & (pos))
#define BIT(x)          (1 << 5 + (x))
#define INVERTPOS(pos)  ((pos) ^ 0x3fffffe0 ^ BIT(ECELL(pos)))

int     Pointersmoved;

Entry *hashsub(table, tsize, key)
Entry *table;
{
int     ix, tkey;

for (ix = (key ^ key >> 7) % tsize;
(tkey = PURE(KEYWORD(table + ix)))
&& tkey != key; )
if (++ix == tsize)
ix = 0;
return table + ix;
}

Entry *tseek(tabp, key)
struct hashtab *tabp;
{
int     kw, newsize, ix;
Entry   *ep, *newp, *malloc();

if (tabp->usize + (tabp->usize >> 3) >= tabp->asize) {
/* Grow the hash table */
newsize = tabp->asize * 2 + 23;
newp = malloc(newsize * sizeof(Entry));
for (ix = 0; ix < newsize; ix++)
KEYWORD(newp + ix) = 0;
for (ix = 0; ix < tabp->asize; ix++)
/* Copy the old hash table */
if (kw = KEYWORD(tabp->tab + ix))
*hashsub(newp, newsize, PURE(kw)) = tabp->tab[ix];
free(tabp->tab);
tabp->asize = newsize;
tabp->tab = newp;
Pointersmoved = 1;
}
return hashsub(tabp->tab, tabp->asize, key);
}

Entry *tlook(tabp, pos)
struct hashtab *tabp;
{
return tseek(tabp + POS2LO(pos), POS2HI(pos));
}

struct hashtab *tabp;
{
Entry   *hp;

tabp += POS2LO(pos);
hp = tseek(tabp, POS2HI(pos));
if (KEYWORD(hp))
return 0;
tabp->usize += 1;
KEYWORD(hp) = POS2HI(pos) | flags;
return 1;
}

main(argc, argv)
char    **argv;
{
int     i, mov, npos, depth, opos, tno, kw, side;
int     Startpos = 1072447500, Endpos = 0;
Entry   *hp, *tp;

genmap();
if (argc > 1)
Startpos = atoi(argv);
if (argc == 3)
printf("Start pos = %d Target = %d\n", Startpos,
Endpos ? Endpos : INVERTPOS(Startpos));
/* We have 11 levels of nesting; so let's not indent the
* source code except on the levels where we actually use
* squiggly braces. */
/* Level 1 -- Iterate through generations */
for (depth = 1; ; depth++)
/* Level 2 -- Alternate work on Startpos and Endpos */
for (side = 0; side < 1 + (!!Endpos); side++)
/* Level 3 -- Iterate through tables looking for pending positions */
for (tno = 0; tno < 65536; tno++)
/* Level 4 -- Iterate within table */
for (Pointersmoved = 0, tp = Htab[side][tno].tab;
tp < Htab[side][tno].tab + Htab[side][tno].asize;
tp++)
/* Level 5 -- Process only positions of the proper generation */
if ((kw = KEYWORD(tp)) & (depth & 1 ? OPEND : EPEND)) {
KEYWORD(tp) = PURE(kw);
opos = LH2POS(tno, KEYWORD(tp));
/* Level 6 -- Iterate through 8 knight's move directions */
for (mov = 0; mov < 8; mov++)
/* Level 7 -- Ignore off-board moves */
if (npos = makemove(opos, mov))
/* Level 8 -- Ignore positions already in database */
if (tadd(Htab[side], npos, depth & 1 ? EPEND : OPEND)) {
if (Endpos)
hp = tlook(Htab[! side], npos);
else
hp = tlook(Htab[side], INVERTPOS(npos));
/* Level 9 -- Detect success */
if (KEYWORD(hp)) {
int     x, y;
printf("!!! Solved !!!\n");
printf("%d\n", npos);
/* Levels 10,11 -- Display the board */
for (x = 0; x < 5; x++)
for (y = 0; y < 6; y++)
printf("%c", y == 5
? '\n' : npos & BIT(x*5 + y)
?  'X' : x*5 + y == ECELL(npos)
?  '.' : 'O');
exit(0);
}
}
if (Pointersmoved) {
/* Fall back if `tp' might be obsolete */
--tno;
break;
}
}
}

int     momap;

makemove(pos, mov)
{
int     hop = ECELL(pos);
int     nhop = momap[hop][mov];
return
nhop == -1 ? 0
: (pos & 0x3fffffe0 | nhop)
^ (pos & BIT(nhop) ? BIT(nhop) | BIT(hop) : 0);
}

int     mpars = {
1, 2, 2, 1, 2, -1, 1, -2, -1, -2, -2, -1, -2, 1, -1, 2,
};

genmap()
{
int     x, y, m, nx, ny;

for (x = 0; x < 5; x++)
for (y = 0; y < 5; y++)
for (m = 0; m < 8; m++) {
nx = x + mpars[m];
ny = y + mpars[m];
if (nx < 0 || nx > 4 || ny < 0 || ny > 4)
momap[5*x + y][m] = -1;
else
momap[5*x + y][m] = 5*nx + ny;
}
}
```

Comments on the use of memory.

On a 5x5 chessboard with 24 knights, the problem is to transpose one position to another in fewest moves. The fun is figuring how to solve the problem quickly and how to fold the solution method into finite memory.

I used the brute force method others mentioned; to go from A to B in fewest moves we place the positions reachable from either A or B in memory, generation by generation, stopping when a match is found: the match is the midpoint in the winning move sequence. Either two separate tables are used for A and B, or a common table with an extra bit in each entry to indicate whether it is for A or B. (In our specific problem there is an equivalence between A and B, so a single table is generated for A, and applied also for B with an equivalence mapping routine.)

We go forward from A, and backward from B, but the non-capturing knight's moves happen to be exactly the same forward as backwards. Chess is somewhat unusual in that most of the moves have the same prescription forwards as backwards. Separate moving routines would be needed if the pieces were pawns, or if captures (or castling) could occur.

We'd like to record the generation number of each position encountered; this would take 5 bits. (We'll run out of time before we get up to generation 32.) If that's too much memory, a degraded technique is possible which makes do with just 2 bits. In the 2-bit approach, instead of denoting generation G from the set {1, 2, ..., 30, 31, No}, we restrict it to the set {less_than_K, K, K+1, No} and modify the bits as K increases. (If "No" can be otherwise distinguished a trit would suffice instead of 2 bits, but this is of limited interest on Earthly computers.)

The simplest representation of a position uses 30 bits, namely 5 = ceil(log(25)) bits to locate the single empty cell, 24 bits to denote the knights' colors, and 1 unused placeholding bit for the empty cell's knight-color. (We could get rid of the placeholding bit easily, but why bother until we're desperate?) It seemed kind of neat that 30 bits for the position plus 2 bits for the position's "value" fit precisely into a 32-bit word but, as we will see, that turned out to be unimportant.

The remaining problem is to ensure the table(s) are smaller than the physical memory limit (they can be larger with virtual memory, if they're used sparsely with good locality). My machine has "only" 32 megabytes. The two usual techniques for such a large table are bitmap and hashtable.

With the bitmap we need just 2 bits for each possible position (or 3 bits without the A/B equivalence trick, or 5 or 6 bits if we want to record generation number). As someone else pointed out there are only 25 * C(24,12) positions, or about 68 million. This would (barely) fit in my memory, but converting combinations to integers is clumsy. A simpler approach would assume at least 25*2^24 positions, or 419 million, far too much for my computer. (Someone on Usenet made the nice observation that, due to a special property unique to knight's moves, one bit can be eliminated: the odd/even sense of generation number can be inferred from the location of the empty cell, since the sum of its x-y coordinates will strictly alternate odd/even/odd/even/etc. with every move. This trick means a single bit per position suffices, and thus a bitmap would fit comfortably on a 64-megabyte machine. My machine is smaller so I chose an approach other than bitmap.)

The other approach is hash table. Now we need H+2 bits per entry instead of 2, where H is the size of the hash key. Instead of 68 million or 419 million entries we need only 5.9 million (that turns out to be the number of distinct positions generated prior to solution), or rather more actually to get good hash table performance. If the position is its own hash key, H = 30 and each entry needs H+2 = 32 bits. This seemed neat, and I coded it this way without knowing how large the hash table actually needed to be.

Although 5.9 million times 32 bits fits into my computer memory, it's a very tight squeeze and I encountered poor hashing and/or page thrashing. Perhaps I could have tweaked it to work, but there was a simple ``Plan B'' to fall back on.

It struck me as a straightforward work-around for the memory shortage when I coded it -- a combination of the advantages of bitmap and hash table -- but I don't recall it being discussed in any textbook. I won't discuss it in any detail either: the technique and its advantages seem straightforward.

I used 65536 hash tables instead of one. Now 16 of the position bits are used as a table-selector and only H=14 bits are needed for the hash key. Thus the table entries are 2 bytes instead of 4 bytes. (Earlier I mentioned that the serendipity 30+2 = 32 wasn't important. If a position required 31 bits, I'd have simply increased the number of hash tables to 2^17.) With this approach it was necessary to dynamically reallocate the hash tables. With today's large memories it's been several years since the last time I had to do *that*. (``One size fits all'' hash tables wouldn't work because the distribution of bit-codes for the relevant positions is highly non-uniform. A ``good hash function'' would have addressed this problem but lazy-old-me makes do with mediocre hashing.)

(In this approach some of the position-index bits are used for a hash-key, and some are used directly as part of a memory address. Thus the approach can be viewed as a hybrid between the two basic methods, hash table and bit map.)

Once the midpoint of the solution sequence is found, the generation numbers would allow the rest of the solution to be readily located. But I was using the 2-bit approach, and didn't want to spend further effort on automation (I certainly didn't have a client paying for fancy graphics or a turnkey interface). Setting B equal to the newly discovered solution midpoint, I just reran the program, iteratively printing successive midpoints. It took only 15 seconds of computer time to solve the problem, ignoring wastage during the debug cycles :-)

Here's an outline of the solution:

 Before Move 1 OOOOO OOOOX OO...XX OXXXX XXXXX After Move 5 O...OOO OOOXX OOXXX OXOXO XXXXX After Move 9 OXOXO OO...XX OOOXO OXOXO XXXXX After Move 14 XXOX... OOOXX OOOXO OXOOO XXXXX After Move 18 XXOXX OOOXX ...OOXO OXOOO XXXXO After Move 23 XXOXX OOOXX X...OXO XXOOO XOOXO After Move 27 XXOXX OXOXX XXO...O XXOOO OOOXO After Move 32 XXXXX OXOXO XXXO... XXOOO OOOOO After Move 36 XXXXX XXXXO XX...OO XOOOO OOOOO