## Solving the Ginnow's Sieve Puzzle

Carl Ginnow published an interesting solitaire game on the rec.puzzles message board. Its solution has several instructive points. The rules are as follows:

1. Start with a list of the whole numbers from 1 to N, say N = 101.
2. Select a number from the list that has at least two factors in the list (one of the factors may be the number itself).
3. Remove all factors of the selected number from the list.
4. Repeat steps 2, 3 until all of the numbers in the list have only one factor remaining in the list.
Your score in this game is the sum of the numbers selected in step 2. Which numbers should you select, and in which order, to maximize this sum?

I call Ginnow's game the ``Ginnow Sieve'' because of a (very) vague resemblance to the Sieve of Eratosthenes. (In Eratosthenes's sieve you cross out the multiples of selected numbers; in Ginnow's sieve you cross out their divisors. In the former the numbers are selected automatically and are the primes; in the latter you select them as you wish, hoping for a good final score.) Ginnow's sieve is a rather fun solitaire, and I recommend you experiment with it a bit before reading further. This solitaire also seems like it might be a nice recreation for youngsters to practice their multiplication tables and learn about prime numbers.

For me the biggest fascination was finding a computer algorithm to develop the optimal move sequence.

Such an algorithm will involve trial-and-error (the ``brute force'' idea of trying essentially every possibility) or problem ``knowledge'' (heuristics restricting the form a solution might take) or both. Ginnow's sieve will require both. A simple example of problem knowledge is the fact that for each selected number K in step 2, one of the factors still remaining in the list and immediately then crossed out in step 3 will be of the form K/p where p is prime. This fact is easily confirmed, and can be left as an exercise to make sure you understand the rules.

In other words, if K is the product of, say, 4 primes, like K = 60 = 2*2*3*5, then one of its remaining factors will be the product of 3 primes: call that the principal factor. K = 60 has many factors (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60) but only three possible principal factors (12, 20, 30), and at least one of those must be present in the list in order for 60 to qualify for selection.

Other examples of knowledge specific to this game are:

• The best first move is always the largest prime p <= N.
• The starting sequence (a, b, ... c, d, e) will yield the same optimal score as the starting sequence (a, b, ... c, e, d) as long as both are legal.
• If a < b < c < d are all still on the list, and a | b, b | d, c | d; then b is always a better selection than d. (`` a | b '' means ``a is a factor of b.'')
• After any starting sequence an upper bound can be found for the final score. That starting sequence can be rejected unless that bound exceeds the best previous score.
These heuristics are used to reject portions of the game tree, and thus reduce the size of the trial-and-error search. Another heuristic, which increases the search but reduces execution time, is to suppress some of these heuristics (because their application is time-consuming) when the search is already deep in the tree.

When this problem was presented on Usenet, another expert programmer and I had fun competing in pursuit of its solution, using heuristics like those just given. But progress was slow. Because of the exponential nature of the search, a program that takes only a few seconds when N=50, might take several minutes when N=65, or many hours when N=80. Then suddenly I found a totally different approach, that got much faster speeds. My new approach was to first solve a different game altogether!

I'll call this other game ``The Ginnow-like Game.'' It may not seem ``like'' Ginnow's original game at all, but the similarity is undeniable: its solution led directly to the solution of Ginnow's Sieve. The rules of ``The Ginnow-like Game'' are as follows: Make a list of ordered pairs (a,b) in which b is always a principal factor of a. (``Principle factor'' was defined above.) Each a or b must be drawn from the set {1,2,3,...,N} and each number in this set must be represented at most once among all the pairs (a,b). The sum of the a's is your score.

Perhaps an example is in order to demonstrate the relationship between these two games. When N=18 the optimal score in Ginnow's Sieve is 111 which is obtained with the move sequence (17, 9, 15, 10, 14, 18, 12, 16). With one exception no deviation is permitted in the ordering of these moves (the exception is 14, which can be delayed). The optimal solution to ``The Ginnow-like Game'' consists of the eight ordered pairs (17,1), (9,3), (15,5), (10,2), (14,7), (18,6), (12,4), (16,8). I have listed these in the same order as the solution for Ginnow's Sieve to make the relationship unmistakable, but unlike the Sieve, this game is defined so the order here is irrelevant. It is the fact that we need not try different orderings that makes solution of this second game very rapid.

The fact that move ordering is essential in Ginnow's Sieve, and irrelevant in the other game, makes the two games very different and it may seem quite peculiar that they have the same solution. Let's look at another example, N=21. The optimal move sequence in the Sieve is (19, 9, 21, 15, 14, 18, 12, 20, 16) for a score of 144. Again no deviation is permitted in the ordering (except now 15 can be delayed as long as it precedes 20), and again there is just one way to map these nine moves to the nine ordered pairs required in the other game.

A simple analysis shows that 145 is an upper bound on the score in either game when N=21. Only one of the large primes (11, 13, 17, 19) can possibly be used and it will take 1 as its ``principal factor.'' This leaves 16 numbers, or 8 number/factor pairs. Adding the largest eight of these numbers, and 19, gives 145. The solution above, 144, comes within one point of this, the deviation being that 9 is used as a selection and 10 as a principal factor (for 20).

But wait!! What about (19,1), (21,7), (14,2), (10,5), (15,3), (20,4), (12,6), (18,9), (16,8)? This is a valid solution to ``The Ginnow-like Game'' and achieves the bounding score of 145. These moves are shown in an order which almost works for the Ginnow Sieve, but not quite: 21 must precede 14 since its factor 7 divides 14; 14 must similarly precede 10, and 10 must precede 15. But 15 must precede 21 since we're using 3 as 15's principal factor and 3 divides 21. This creates a cycle in the partial ordering: 21 < 14 < 10 < 15 < 21, which means this solution to the Ginnow-like Game is worthless in the Ginnow's Sieve.

What to make of this? I said I solved the one game by treating it as the other, but the games are not the same after all! The workaround is simply to modify the rules of the Ginnow-like Game to outlaw a factoring cycle like the example just given. If these factoring cycles were common we'd be back where we started, needing to worry about move ordering, but it turns out they are comparatively rare. (In fact N=21 is the smallest N where the Ginnow-like Game has an optimal solution different from the optimal Sieve.)

I did adopt one technique which means the algorithm's answers cannot be guaranteed. Inspecting the results of the exhaustive searches performed earlier, one sees that all the largish numbers occurred in the optimal sequences, except when this was clearly impossible. With this observation in mind, the software inputs a threshold, called Mustt in the source code, above which an attempt is made to force inclusion.

The above discussion should be adequate to understand the essential algorithm and to see why I felt this problem was ``neat.'' Now I'll offer some comments on the details of coding.

In the software I maintain various lists, and the partial ordering relation, as bit maps. The associated macros are of general utility.

Certain operations on the bit masks are table driven; for example one megabyte of memory is allocated just so that the sum of the k largest entries in any 16-bit bitmask can be found quickly. This sped up the program only a very tiny bit, but was included anyway since the source code is intended to be a potpourri of example techniques. Not too long ago, a one-megabyte table would have been regarded as preposterously large, but it would often be innocuous on today's systems. Note that changing a single character near the front of the source code will change the bitmask word size from 16 bits to 8 bits (and the 1 megabyte table becomes just 2 kilobytes).

As with most depth-first searches, the search path is maintained in a stack. It is often easy to build that stack automatically as part of the procedure call stack; that disguises a little bit of complexity when each function examines at most one prior node. Here we'd then have to follow backpointers to print a solution, so it's more convenient to use a static array.

Some moves are made by trial and error and some are forced by heuristic. I chose to push the play stack only on the trial-and-error moves, a dubious decision since it added considerable complexity just to save a little time and memory. I arbitrarily left room for 11 heuristic moves after each trial move; sometimes more than 11 heuristic moves are made; a brief search on the string NUMASS should acquaint you quickly with these details. I could have made the program simpler by giving each move its own stack entry, but it isn't so very bad as is, and performance is often important as well as code simplicity.

Some of the so-called ``forced moves'' may be rejected by the ordering rule, and this isn't detected until after we begin to update the data structure. Observe that these updates are partitioned so that any move rejection occurs before any irreversible change is made. This is a common idea, although here, again, there would have been no need for care if we simply pushed the play stack on every move.

Another feature of this code, which may seem atypical to some readers, is that the data structure contains a lot of redundancy: The fields rfcount, revbef, remain, and numremain, for example, could all be reconstructed from other fields. Such redundant data is a ``foible'' in many kinds of software, and is a common source of bugs, e.g. in window handling code. But in this kind of software, where speed is the key, you want the data in a form for immediate application, even if that means storing it redundantly in two or more forms.

```

/*
* Usage: gsieve [Numn [Mustt [Bestscore]]]
* The first argument is N, the initial size of the sieve;
* The second argument is the heuristic threshold beyond which
*      the algorithm attempts to force number selection;
* The third argument allows a known lower bound on the optimal
*      score to be specified, adding a little speed.
*
* Default values for the arguments are:
*      Numn = 120 (the interesting sieve,
*              since it defies the "large p*p" heuristic);
*      Mustt = Numn/3 + 16 (a choice which provides fast execution
*              and seems to yield optimality when Numn <= 120);
*      Bestscore = 0.
*/
#define MAXN    140             /* largest Numn allowed */
#define ALLOCN  (MAXN + 1)

typedef unsigned short  u_short;
typedef unsigned char   u_char;
typedef u_short Smallint;       /* Actually u_char is enough */

/* Choose bitmask processing unit size */
#if     1
typedef u_short Bmword;
#define LOGBPB  4               /* log of number of bits in Bmword */
#else
typedef u_char  Bmword;
#define LOGBPB  3               /* log of number of bits in Bmword */
#endif

#define NUMBPB  (1 << LOGBPB)
#define BMSIZ   ((MAXN + NUMBPB) / NUMBPB)
#define TESTBIT(sp, n)  ((sp)[(n) >> LOGBPB] & 1 << ((n) & (NUMBPB - 1)))
#define CLRBIT(sp, n)   ((sp)[(n) >> LOGBPB] &= ~(1 << ((n) & (NUMBPB - 1))))
#define SETBIT(sp, n)   ((sp)[(n) >> LOGBPB] |= 1 << ((n) & (NUMBPB - 1)))

#define MAXASS  12
#define MAXNFAC 4       /* 3 is Enough for Numn < 210 */

struct node {
Smallint        mfacs[ALLOCN][MAXNFAC];
Smallint        gfac[ALLOCN];
int             score;
int             numremain;
struct  revfac {
int     rfcnt;
} revfac[ALLOCN / 2 + 1];
int     numass;
struct assign {
Smallint user, using;
} ass[MAXASS];
} Stack[ALLOCN / 2];

#define ISFACTOR(a, b)  (((b) / (a)) * (a) == (b))

int     Numn, Mustt, Bestscore;

CLRALL(sp)
{
int     wix;

for (wix = 0; wix < BMSIZ; wix++)
sp[wix] = 0;
}

main(argc, argv)
char    **argv;
{
struct node     *p;
int     num;

/* setlinebuf(stdout); */
Numn = argc > 1 ? atoi(argv) : 120;
if (Numn > MAXN) {
printf("Maximum N is %d\n", MAXN);
exit(1);
}
Mustt = argc > 2 ? atoi(argv) : Numn / 3 + 16;
Bestscore = argc > 3 ? atoi(argv) : 0;
setbcounts();
setprimes(Primes);
p = Stack + 0;
for (num = 1; num <= Numn; num++) {
setflist(p->mfacs[num], p, num);
SETBIT(p->remain, num);
p->numremain += 1;
}
solve(p);
exit(0);
}

solve(p)
struct node     *p;
{
int     ended, num, ax, ur, ug;
struct node     *q;

ended = 1;
if (p == Stack) {
/*
* Speed up solution with some forced beginning moves:
* Largest prime; 121 when N >= 121.
* When 98 <= N < 121, try both 49 and (25,98).
*/
p = p;
p.numass = 0;
assignit(p+1, biggestbit(Primes), 1);
if (98 <= Numn && Numn < 121) {
p += 1;
p = p;
p.numass = 0;
assignit(p + 1, 49, 7);
gimmes(p+1);
solve(p + 1);
p = p;
p.numass = 0;
assignit(p + 1, 25, 5);
assignit(p + 1, 98, 49);
} else if (121 <= Numn) {
p += 1;
p = p;
p.numass = 0;
assignit(p + 1, 121, 11);
}
gimmes(p+1);
solve(p+1);
return;
}
if (p->numass == MAXASS) {
p = p;
/* There may be more gimme's; we'll spend
*   a stack entry just for them.
* Nullify the first ass entry
*   to indicate they are all gi'me's.
*/
p.ass.user = 0;
p.ass.using = 0;
p.numass = 1;
gimmes(p+1);
solve(p+1);
return;
}
while ((ur = biggestbit(p->remain)) && ! p->mfacs[ur]) {
CLRBIT(p->remain, ur);
p->numremain -= 1;
}
if (p->score + mkmax(p->remain, p->numremain / 2) <= Bestscore)
return;
while (ur) {
for (num = 0; num < MAXNFAC && (ug = p->mfacs[ur][num]);
num++) {
p = p;
p.numass = 0;
if (assignit(p+1, ur, ug)) {
ended = 0;
gimmes(p+1);
solve(p+1);
}
}
if (ur-- >= Mustt)
break;
}
if (ended && p->score > Bestscore) {
Bestscore = p->score;
printf("Bestscore = %d :", Bestscore);
for (q = Stack; q <= p; q++)
for (ax = ! q->ass.user; ax < q->numass; ax++)
printf(" (%d,%d)%s",
q->ass[ax].user,
q->ass[ax].using,
ax ? "*" : "");
printf("\n");
}
return;
}

/* Make automatic moves */
gimmes(p)
struct node     *p;
{
int     ug, ur, oldn;

for (oldn = -1; oldn != p->numass; ) {
oldn = p->numass;
/*
* (ur,ug) is an automatic move if
*      ug is a relevant factor of only ur,
*   and ug has no relevant factors
*   and ur is larger than Numn/2
*/
for (ug = 2; ug <= (Numn / 2) + 1; ug++) if (1
&& p->revfac[ug].rfcnt == 1
&& p->mfacs[ug] == 0) {
ur = biggestbit(p->revfac[ug].rf);
if (ur + ur > Numn) {
assignit(p, ur, ug);
if (p->numass == MAXASS)
return;
}
}
/*
* (ur,ug) is an automatic move if
*      ur has only ug as a relevant factor
*  and ur is at least as large as Mustt)
* (This (might) work only because
* we scan the ur's high-to-low.
*/
for (ur = Numn; ur > 2; ur--) if ((ug = p->mfacs[ur])
&& p->mfacs[ur] == 0
&& ur >= Mustt) {
assignit(p, ur, ug);
if (p->numass == MAXASS)
return;
}
}
}

/* Make the list of maximum factors */
setflist(mf, p, n)
Smallint        mf[];
struct node *p;
{
int     prim, fac;

for (prim = 2; prim < n; prim++) if (TESTBIT(Primes, prim)
&& ISFACTOR(prim, n)) {
*mf++ = fac = n / prim;
SETBIT(p->revfac[fac].rf, n);
p->revfac[fac].rfcnt += 1;
}
}

deletemf(p, ur, ug)
struct node *p;
{
int     tix, t;
Smallint        *sp;

sp = p->mfacs[ur];
tix = sp == ug ? 0 :
sp == ug ? 1 :
sp == ug ? 2 :
sp == ug ? 3 : 4;
/* We'll keep the list in sorted order for heuristic */
while (tix < MAXNFAC - 1 && (t = sp[tix + 1]))
sp[tix++] = t;
sp[tix] = 0;
if (tix == 0 && ur + ur > Numn && TESTBIT(p->remain, ur)) {
CLRBIT(p->remain, ur);
p->numremain -= 1;
}
}

forbid(p, ur, ug)
struct node *p;
{
if (ur > Numn)
return;
if (TESTBIT(p->revfac[ug].rf, ur)) {
CLRBIT(p->revfac[ug].rf, ur);
p->revfac[ug].rfcnt -= 1;
deletemf(p, ur, ug);
}
}

relpos(ur1, ug1, ur2, ug2)
{
if (ISFACTOR(ug1, ur2))
return 1;       /* ur1 must be used before ur2 */
if (ISFACTOR(ug2, ur1))
return -1;      /* ur2 must be used before ur1 */
return 0;
}

/*
* Assign ur to use the factor ug;
* update node p appropriately.
*
* If this is the first assignment in p, it is a trial-and-error,
* and when backed out, p will be rebuilt.
* If this is not the first assignment, it is a "gimme" (strategy-forced)
* move, and won't/can't be backed out.
* However, we will back it out immediately (before doing any damage)
* if it causes an illegal ordering cycle.  (Only the cycling info of ur
* will be affected but that's ok --- it won't be used until recleared.)
*/
assignit(p, ur, ug)
struct node *p;
{
int     i, j, k, rel;
struct node *q;

CLRALL(p->before[ur]);
CLRALL(p->revbef[ur]);
for (j = 2; j <= Numn; j++) if (TESTBIT(p->ralloc, j)) {
rel = relpos(ur, ug, j, p->gfac[j]);
if (rel == 1) {
SETBIT(p->before[ur], j);
for (i = 0; i < BMSIZ; i++)
p->before[ur][i] |= p->before[j][i];
} else if (rel == -1) {
SETBIT(p->revbef[ur], j);
for (i = 0; i < BMSIZ; i++)
p->revbef[ur][i] |= p->revbef[j][i];
}
}
for (i = 0; i < BMSIZ; i++)
if (p->revbef[ur][i] & p->before[ur][i])
return 0;
/* Point of no return ... update unrepealable data */
for (j = 2; j <= Numn; j++) if (TESTBIT(p->revbef[ur], j)) {
SETBIT(p->before[j], ur);
}
for (j = 2; j <= Numn; j++) if (TESTBIT(p->before[ur], j)) {
SETBIT(p->revbef[j], ur);
for (k = 2; k <= Numn; k++) if (TESTBIT(p->revbef[ur], k)) {
SETBIT(p->before[k], j);
SETBIT(p->revbef[j], k);
}
}
p->score += ur;
SETBIT(p->ralloc, ur);
CLRBIT(p->remain, ur);
p->numremain -= 1;
if (TESTBIT(p->remain, ug)) {
CLRBIT(p->remain, ug);
p->numremain -= 1;
}
p->ass[p->numass].user = ur;
p->ass[p->numass].using = ug;
p->gfac[ur] = ug;

/* Delete other connections (k, ur) */
for (j = 0; j < MAXNFAC && (k = p->mfacs[ur][j]); j++) if (k != ug) {
CLRBIT(p->revfac[k].rf, ur);
p->revfac[k].rfcnt -= 1;
}
for (j = 0; j < MAXNFAC; j++) {
p->mfacs[ur][j] = 0;
}

/* Delete other connections (ug, j) */
for (j = ug + ug; j <= Numn; j++) if (j != ur
&& TESTBIT(p->revfac[ug].rf, j)) {
deletemf(p, j, ug);
}
p->revfac[ug].rfcnt = 0;
CLRALL(p->revfac[ug].rf);

/* Delete connections (ur, j) */
for (j = ur + ur; j <= Numn; j++) if (TESTBIT(p->revfac[ur].rf, j)) {
deletemf(p, j, ur);
}
if (ur + ur <= Numn) {
p->revfac[ur].rfcnt = 0;
CLRALL(p->revfac[ur].rf);
}

/* Delete connections (k, ug) */
for (j = 0; j < MAXNFAC && (k = p->mfacs[ug][j]); j++) {
CLRBIT(p->revfac[k].rf, ug);
p->revfac[k].rfcnt -= 1;
}
for (j = 0; j < MAXNFAC; j++)
p->mfacs[ug][j] = 0;
p->numass += 1;
return 1;
}

setprimes(pout)
{
int     num, fac;

for (num = 2; num <= Numn; num++) {
for (fac = 2; fac * fac <= num; fac++)
if (ISFACTOR(fac, num))
break;
if (fac * fac > num)
SETBIT(pout, num);
}
}

#define NUMBW   (1 << NUMBPB)
u_char  Bigbit[NUMBW];
u_char  Numbit[NUMBW];
u_char  Sumbit[NUMBW][NUMBPB];

setbcounts()
{
long    bwval;
int     bix, cnt, sum;

for (bwval = 0; bwval < NUMBW; bwval++) {
for (cnt = sum = 0, bix = NUMBPB - 1;
bix >= 0; bix--) if (bwval & 1 << bix) {
if (cnt == 0)
Bigbit[bwval] = bix;
sum += bix;
Sumbit[bwval][cnt] = sum;
++cnt;
}
Numbit[bwval] = cnt;
}
}

/* Return the sum of the k largest elements of bm */
mkmax(bm, k)
{
int     wix, m;
register tot;
Bmword  b;

tot = 0;
for (wix = BMSIZ - 1; k; wix--, k -= m) {
m = Numbit[b = bm[wix]];
if (m > k) {
tot += (wix << LOGBPB) * k;
tot += Sumbit[b][k - 1];
break;
} else {
tot += (wix << LOGBPB) * m;
tot += Sumbit[b][m - 1];
}
}
}

biggestbit(sp)
{
int     wix, dat, res;
Bmword  x;

for (wix = BMSIZ - 1; wix >= 0; wix--)
if (x = sp[wix])
return (wix << LOGBPB) + Bigbit[x];
/* -1 would be a more appropriate exit code for general
* use; but our problem specifics allow a more convenient 0.
*/
return 0;
}
```

``I found a new approach to this problem which improved search speed. I map each selected number to the required factor *without regard for move order.* This means redundant equivalent orderings aren't processed. Another key idea is to restrict consideration of the factors of K to the form K/p where p is prime.

``Since plays are no longer made in order, the software quickly finds, for N=101, that 32 and 48 must be sacrificed to get 64 and 96. These plays will be made near the very end, so an ordinary move-ordered search might waste time trying to take 32 or 48 earlier.

``Ordinary move-ordered search software might be adapted to support some of this behavior, but with difficulty. The main problem with my approach is that an apparently valid move-factor mapping may have a cycle in its required partial-ordering that invalidates it as a solution. We keep track of the partial order and reject any selection which causes such a cycle. Perhaps surprisingly, such cycles turn out to be fairly uncommon: otherwise this new method would be worthless.

``Whereas speed required introducing complicated devices into the earlier searches, the new method has no special heuristic or strategy except a single threshold, M, passed on the command line; the software forces numbers above M into the solution when it can. At present I cannot guarantee my solutions when M < N, but experimentally I found that there always seemed to be relatively small values of M for which the search completed quickly and found the same solution provided by Hugo's more exhaustive search program.

``It was noted earlier that taking the biggest (prime^2) was almost always the best second move (taking the largest prime is the best first move), but N=8 and N=20 were exceptions. Assuming my results are indeed optimal, there is one more exception, at N=120, where I get a score of 4593 with 113 25 85 115 95 119 91 77 69 111 93 87 75 45 63 105 81 117 99 74 118 106 94 86 82 50 70 98 42 78 52 116 92 104 66 88 110 100 76 114 68 102 56 112 84 80 64 60 120 96 90 72 108.

``This is 21 points better than the best solution I found using 49, which uses the same numbers, except (25,45) is replaced with just (49). This Third Exception is probably the final one, since for larger N there is always p*p > N/2.

``This optimal N=120 solution was found whenever 55 <= M <= 64. (I didn't complete the time-consuming runs for larger M.) When M=55, the search took 53 seconds. Searching from scratch took 143 seconds when M=63, but only 31 seconds when configured just to confirm the known solution. For N=82, a challenge with earlier searching, M=40 was adequate to find the optimal 2187 score, and took less than a second. I'm sure there is room to improve these speeds.''