Copyright © 2003 by James D. Allen

Introduction to Expert Programming (in C)

Lesson 3: Simplicity (continued)

For this lesson, we discuss five further modes of simplicity, with instructive examples.

3.1Simplicity means rejecting unneeded requirements.
Example: Exclusionary semaphores without locking
3.2Simplicity is how to cope with complexity.
Example: ISO 7-layer network model
3.3Simplicity is not a substitute for study.
Example: Hash tables
3.4Simplicity is the inner beauty of elegance.
Example: Bloom filter
3.5Simplicity doesn't imply laziness.
Exercise: Knights on a 5x5 board

As promised we've given more definitions of simplicity. Here's another one:

--- Simplicity: ---
Not always easy to create or even define.
Easy to recognize and appreciate.

S i m p l i c i t y     m e a n s     r e j e c t i n g     u n n e e d e d     r e q u i r e m e n t s   .

Example 3.1: Locking semaphores

A typical `exclusive access' method might comprise the four state transitions:

If Alice and Becky both attempt to acquire the same lock simultaneously, this can get messy. When the underlying processing is done by a uniprocessor (or when the command software otherwise forces serialization) the following code is typically used:

        if (*thisLock == 0) {
                disab_intupt(); /* should be `oldm = setint(DISABLE);  */
                if (*thisLock == 0)
                        *thisLock = belongBecky;
                enabl_intupt();    /* should be `(void)setint(oldm);' */
        if (*thisLock == belongBecky) {
                succ = use_the_reserved_resource();
                *thisLock = 0;
        } else {
                succ = 0;

With multiprocessor hardware, the code fragment does not suffice and a (reliable) `memory compare and swap' primitive may be used instead. Many systems (e.g. from the 1970's), however, did not provide such a primitive.

A simpler method is available in many cases which requires no special locking primitive. For example when the object being locked is a request/response bucket and the two processes (Alice and Becky) have a fixed server-client relationship, two simple state transitions suffice:

(In practice the A/B variable is a single bit, called the `IO Done' bit in Unix kernels, where this has been a standard idea for at least two dozen years.)

This simpler approach would have benefited an 1981-vintage `intelligent database machine' (specifically its Cpu --> DiskCtrl protocol) but Head of Software rejected the idea (saying ``I don't know how to do that'') when it was proposed at my last pre-consultant job. (But Dr. Robert probably knew something I didn't, as he was a startup multimillionaire long before the Internet made it fashionable.)

The above code manipulated the processor's ``disable interrupt'' mask. This requires care.

        disab_intupt();    /* should be `oldm = setint(DISABLE);  */
#if   DEBUG
        /* `if(intreg() != DISABLE) print(DEBUGINFO);' */
        enabl_intupt();    /* should be `(void)setint(oldm);' */

What happens if giggle_and_garble() (a subroutine provided by George) in the above fragment also does disint(); ... enint() instead of the reliable form that remembers the old mask? Don't blame George if you waste hours with the logic analyzer to find his bug; instead consider yourself shamed if you forgot the if (intreg() != DISABLE) ... as shown.

Such simple debug aids are often overlooked. Is an in-line test so much extra to type? Come on!

S i m p l i c i t y     i s     h o w     t o     c o p e     w i t h     c o m p l e x i t y     .

Example 3.2: ISO 7-layer model, e.g. NFS application

You may have seen this picture before:

7Application(NFS) < . . . . . . . > 7 remote
6Presentation(XDR) < . . . . . . . > 6 remote
5Session(RPC) < . . . . . . . > 5 remote
4Transport(UDP) < . . . . . . . > 4 remote
3Network(Internet Protocol) < . . . . . . . > 3 remote
2Data Link(CSMA/CD) < . . . . . . . > 2 remote
1Physical(Ethernet) <--------> 1 remote

The Network File System (NFS) application used in Unix and other systems conforms to an ISO/OIS computer networking model. As shown, several layers are defined, serving high-level to low-level needs. Logically the Layer 5 (Session handling) code on one machine communicates with the Layer 5 code on the remote machine, but of course they do not communicate directly. Similarly Layer 4 communicates (indirectly) with Layer 4 and so on. The actual `baton' of software control is received from Layer 6 by Layer 5 and then passed on to the Layer 4 code, then Layer 3 and so on, across the Ethernet, and then the `baton' ascends up through the layers on the other side. (Thus Layer N serves Layer N+1 and is served by Layer N-1.) This modular approach means a designer can pick and choose from various protocols and methods; the combination shown in parentheses (NFS - XDR - RPC - UDP - IP - etc.) are the particular methods used in the Network File System application.

If this sounds frightfully complicated rather than simple, think of it this way:

The total system requirements for a transparent file system across a remote connection are complicated. By diagnosing the complication, and extracting just the right (complicating) factor, and handling it properly, maximum simplicity is achieved.

Before leaving this ISO example let us note a different, but also marvelous, simplicity which exists in the example. Like the other simplicity, the idea is to cope with a complexity. Specifically, errors and resource bottlenecks can present complexity. The 7-layer NFS application copes with that complexity by ... ignoring it entirely!! ... by obliviously discarding any incoming message which appears flawed or inconvenient!

The Layer 4 service UDP is unreliable -- When an application uses UDP transport and the network software is unable to obtain a required message buffer or exclusionary lock, the software simply discards the UDP packet without comment. This behavior may seem buggy or at best undesired, but it's not. When (moderate) packet loss is tolerated and no acknowledgement signals are sent, network usage is more efficient and code can be streamlined. Of course the Layer 5 service (RPC) knows it is using an unreliable Layer 4 and signals with its Layer 5 partner to provoke any required retransmissions.

Many people are surprised to hear that a popular Internet transport method is designed to be unreliable but it's true. What is the key advantage of the unreliability? ... You guessed it: simplicity.

Another example of complex software which would survive a bug is the disk allocation policy in the Fast File System of Berkeley Unix. If the allocation heuristic gets confused (as it might when the system was shut down uncleanly and `fsck' not run), allocation will fall back to a simple brute-force method.

Next time your software needs a complicated heuristic, see if you can code it in such a way that any logic flaw would only impact performance, not integrity.

S i m p l i c i t y     i s     n o t     a     s u b s t i t u t e     f o r     s t u d y     .

Example 3.3: Hash-function Indexing (RAM emulation)

Hash tables are a neat and useful idea you should be very familiar with. Write a hash table handler if you haven't already. To keep the exercise simple: If you are a beginner, don't bother to support deletions. (If hash tables and Bloom filters are too easy for you, investigate the beautiful sparse distributed memories.)

Among four important extensions to Hash Tables are (listed in decreasing order of coding difficulty)

Dynamic expansion should be self-explanatory, and quadratic rehash is a nice but straightforward trick to reduce repeating collisions. I think multi-compartment hash tables -- which are an important way to save space, combining the benefits of bitmaps and hash tables -- are ignored by the other text books, so I give an example with discussion in the suggested answer to Exercise 3.5 below. Even if you're ``only'' an intermediate programmer, the coding is only slightly harder than for an ordinary hash table -- give it a try !!

My solution to Exercise 3.5 (below) includes a general-purpose hash table handler with all four of the extensions mentioned.

Hashed Indexing may seem out of place in a lesson on simplicity (After all, isn't it rather a unique and even tricky invention? -- If you can patent the XOR cursor or the double-click, an hash table patent would be worth a fortune!) but you may need hash tables for some of the other exercises. Hash tables are of great importance, so any self-taught aspiring programmer who is still unfamiliar with them should correct that lack now.

Moreover hash functions form the basis of the `Bloom Spell-checker' we consider next. No one has the aptitude for programming expertise who doesn't shiver in delight when first introduced to the sublime Elegance of the Bloom Filter.

And at the end of the day, the muses of Elegance and Simplicity are usually shown to be twin sisters.

S i m p l i c i t y     i s     t h e     i n n e r     b e a u t y     o f     e l e g a n c e     .

Example 3.4: Bloom Filter

The Bloom Filter is a (large) memory organization for spell checking based on hashed indexing. This highly unusual method has been written up frequently, e.g. in Dobb's Journal, but if you don't know it yet, read slowly and ponder the hints. Many of the best simple ideas in computer programming have already been discovered, but can you reinvent the marvelous Bloom apparatus from a few hints?

... How much memory is needed for a spelling checker? A simple bit array which answers the binary question `Is XYZ a word?' would work nicely, except that's over 26^7 = 8 billion bits just for 7-letter words. This memory will be 100% reliable (words are always called words, nonwords called nonwords) -- what memory would be needed if a small but positive error rate were allowed? Hint 1: The Bloom Filter is unreliable (as was UDP Transport).

Calculating the memory usage for the case of a 100% reliable spell checker is a good exercise. Such methods may use K.h bits, where K is the dictionary size (number of correctly spelled words) and h is the entropy congealed by answering the spelling question (e.g. if mispellings are random and number 2^h-1).

This question -- how much memory do I need to reflect given relational information -- arises frequently, and is discussed in Appendix M.

With the Bloom filter, a completely different idea determines the total memory used. An optimized Bloom filter uses total memory of (1.4427 K H) bits where K is the number of properly spelled words and S = 2^-H is the desired error rate for false positives. 1.4427 = log22.71828; in other words the optimal Bloom filter uses 1 nat of memory for each attempt to write a hash bit.

Equivalently, the storage cost per valid word is given in nats by (log S) (log 2)^-2. (See derivation in Appendix M.)

Even if you are a beginner you may want to take the time to code a spell-checking program based on the Bloom filter.

Whenever a word is added to the spell-checking dictionary, simply set H bits in the large memory. (Some of the bits may already have been set.) The H bit indexes are just H different hash functions. When testing a word, read the same H bits and report an error if any is zero. Information theoreticians may be unsurprised to learn that optimality is achieved when there an equal number of zeros and ones in the Bloom memory.

This lesson contains several implicit exercises (write a hash table handler with some of the features mentioned), write a spell checker using the `Bloom filter' memory. These are pretty easy problems.

S i m p l i c i t y     d o e s n ' t     i m p l y     l a z i n e s s     .

Exercise 3.5: Knights on 5x5 board (Hard Problem)

I'm afraid this programming exercise will be too difficult for beginners (the exercises in Lesson 5 are easier.) Still I recommend it for anyone who aspires to be an expert programmer. It is an excellent practice exercise for C coding, algorithm design, and time/space optimization.

The reference answer contains the discussion of large lookup tables promised above. While your computer memory is probably large enough so that any lookup method will work, if the problem seems too easy, make the challenge to squeeze it to run with minimal memory.

On a 5x5 chessboard with 12 black knights and 12 white knights in the position shown at the right, in how few moves can you reverse all the colors? Each move is an ordinary chess knight's move (necessarily non-capturing).

Please tell me how long it takes you to code and debug your solution. Even you masters should invest an hour or two and polish your skills. Note what the memory requirement is, to find the solution. Does it use more than 20 megabytes? For an extra challenge get it below twenty megabytes!


In my program to solve the 24 knights problem, I use hash tables with the extensions we discussed, and can find the solution with less than twenty megabytes:

Copyright © 2003 by James D. Allen

Please send me some e-mail.

Please visit my family tree.