My career was as an inventor (with thirty-two U.S. patents), but one of the inventions of which I'm most proud was never patented. Let me tell you about it! To set some context: I'd been working with the 145 (along with other IBM mainframes) for two years, and with AMS's "Model A" memory attachment to the 145, which was the second attachment AMS had designed for the 145. Our story begins when the Systems Engineering Director decides to build a lower-cost UMS box, which will replace the attachments to several IBM mainframes (using the same storage cards, cabinet, and even backplane wiring(!) for all models). Though the Model A was successful, everything was to be redesigned for this third (UMS) model of 145 memory. (Volumes were high; the company was making money.)
Of all the many computers I've worked with, IBM's 370/145 was the most fun to microprogram or to troubleshoot. Most of the machine was organized as 36-bit registers or 36-bit buses. (By 36-bit, I really mean 32 bits plus parity.) Only one part of the machine was wider than 36 bits: the output from memory (Storage Data Bus Out, or SDBO), which was 72 bits (one doubleword). (SDBI, Storage Data Bus In, was only 36 bits: it took two write cycles to replace a doubleword.)
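To make "32 bits plus parity" concrete, here is a minimal C sketch that packs a 32-bit word into a 36-bit bus value with one parity bit per byte. It assumes odd parity (the usual IBM System/360 and 370 convention); the bit layout and the names pack36 and check36 are hypothetical, not the 145's actual wiring.

```c
#include <assert.h>
#include <stdint.h>

/* Odd parity: return the bit that makes the total number of 1s in
   (byte + parity bit) odd. */
static unsigned odd_parity(uint8_t byte)
{
    unsigned ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (byte >> i) & 1u;
    return (ones % 2 == 0) ? 1u : 0u;
}

/* Pack a 32-bit data word into a 36-bit bus value (held in the low
   36 bits of a uint64_t). Hypothetical layout: four 9-bit groups,
   each = 8 data bits followed by that byte's parity bit, with the
   most significant byte in the highest group. */
static uint64_t pack36(uint32_t word)
{
    uint64_t bus = 0;
    for (int b = 3; b >= 0; b--) {
        uint8_t byte = (uint8_t)(word >> (8 * b));
        bus = (bus << 9) | ((uint64_t)byte << 1) | odd_parity(byte);
    }
    assert(bus >> 36 == 0); /* never wider than 36 bits */
    return bus;
}

/* Return 1 if every byte of a 36-bit bus value has correct odd parity. */
static int check36(uint64_t bus)
{
    for (int b = 0; b < 4; b++) {
        unsigned group = (unsigned)(bus >> (9 * b)) & 0x1FFu;
        if (odd_parity((uint8_t)(group >> 1)) != (group & 1u))
            return 0;
    }
    return 1;
}
```

A single flipped bit anywhere in a 9-bit group makes check36 fail, which is all that per-byte parity can promise.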
The same memory array was used both for the 370 main memory and for "control storage" (the system microprogram storage), although the latter was disjoint from and inaccessible to 370 Assembler code. (The "true" machine language was 3145 microcode.) Ignoring certain special diagnostic modes, every cycle either began by fetching a microinstruction or ("Storage-1" cycle) by accessing main memory or control storage as data. Thus, as you will see in the diagram below, SDBO could be directed either to the microinstruction register (C-register) or to the "SDBO assembler", which was the main data bus in the machine. (You'll see that the D-register, the latched output of the ALU, could also be directed to that main bus.)
For this webpage, we're interested only in the data output by main memory; the C-register and D-register are mentioned in the diagrams just for reference. As you will see in the diagram, the data from memory passes through a multiplexor (to choose between "internal" memory and memory located in an external cabinet), and is then clocked into a 72-bit register. The register is followed by another multiplexor: all the buses in the machine are 36 bits, so only half of the SDBO can be used. Finally, a third multiplexor chooses between SDBO and the D-register to create the "SDBO Assembly" which was the input bus to most of the registers of the machine.
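The path just described can be modeled one cycle at a time. This is a hypothetical C sketch, not IBM's logic: the struct and function names are invented, each 36-bit word lives in the low bits of a uint64_t, and select_d stands in for a D-register select line that isn't named on this page. It treats the signals whose names begin with "-" as active-low, as those names suggest.

```c
#include <assert.h>
#include <stdint.h>

/* One 72-bit doubleword, modeled as two 36-bit words (low bits of uint64_t). */
typedef struct { uint64_t even, odd; } dword72;

/* Control lines as voltage levels (1 = high, 0 = low). A leading '-' in
   the IBM signal name means the function is active when the line is LOW. */
typedef struct {
    int gate_internal_n; /* "-Gate Internal Storage": low picks internal memory */
    int clock_sdbo;      /* "+Clock SDBO (open window)": high loads the register */
    int select_even_n;   /* "-Select Even Data": low picks the even word */
    int select_d;        /* hypothetical name: high picks the D-register instead */
} controls;

/* The 72-bit SDBO register keeps its contents between clock windows. */
typedef struct { dword72 sdbo; } sdbo_register;

/* One cycle of the path: returns the 36-bit "SDBO Assembly" bus value. */
static uint64_t sdbo_assembly(sdbo_register *reg, dword72 internal,
                              dword72 external, uint64_t d_reg, controls c)
{
    assert(reg != 0);

    /* Mux 1: internal memory vs. memory in the external cabinet. */
    dword72 mem = c.gate_internal_n ? external : internal;

    /* The register latches only when its clock window is open. */
    if (c.clock_sdbo)
        reg->sdbo = mem;

    /* Mux 2: the buses are 36 bits wide, so only one half can be used. */
    uint64_t half = c.select_even_n ? reg->sdbo.odd : reg->sdbo.even;

    /* Mux 3: SDBO half vs. the latched ALU result in the D-register. */
    return c.select_d ? d_reg : half;
}
```

Because the register only loads in an open window, a cycle with clock_sdbo low still sees whatever doubleword was latched earlier.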
The discussion on this page is clarified by four images, which I've combined into a single animation. All of the bus lines shown in the diagram are 36-bit. (The 72-bit buses are depicted as two buses.) In addition, three control signals are shown.
The smallest 145s fit into a single cabinet with two chassis: a chassis for the CPU and a chassis for memory. But this limited the machine to slightly less than a quarter of a megabyte(*). Perhaps all machines were eventually upgraded to larger sizes: IBM had dramatically underestimated customer demand for memory.
The first frame in the animated diagram shows this base case, all in black.
* - No, "megabyte" isn't a misprint. This was in the mid 1970's.
For larger memories, IBM would install a second cabinet. (Customers with certain IO channel or controller configurations would also get the second cabinet whether or not they needed the extra memory.)
The second frame in the animated diagram shows this case, with the 72 data wires from the second cabinet shown in blue.
The control storage was always located in the Internal Memory. Low memory (containing special system memory like the Location-80 Timer) was always located in the external memory cabinet if such a cabinet was present.
I didn't work for IBM. I worked for AMS/Intersil which provided add-on memory for IBM customers. There were three reasons why a customer might buy from us instead of from IBM:
(But once my invention was implemented, the fourth configuration described below, with the blue wires omitted, was preferred even when no IBM second memory cabinet was present.)
Four different configurations are presented in this animated GIF. This diagram depicts the delivery of main memory output to the main internal bus of the 370/145 computer. Three steering signals are used: "-Gate Internal Storage", "+Clock SDBO (open window)", and "-Select Even Data". Note that in the most complex case, data will be routed from the green (AMS) wires if and only if all three control signals are in the High (positive) state. Selection signals for several "obvious" multiplexors are not shown.
The interesting case was when a customer who'd already installed external IBM memory in the second cabinet wanted to upgrade further via AMS. The third frame in the animated diagram shows this case. Another multiplexor is needed, choosing between the 72 wires (shown in blue) from IBM's external memory, and the 72 wires (shown in green) from the AMS cabinet. This configuration was called "shared port."
There is also a fourth frame in the animated diagram; it is the subject of this webpage.
It takes just 18 chips (MECL "quad mux" chips) to multiplex the 144 signals, half from IBM's 2nd cabinet, half from AMS's cabinet, down to 72 signals for presentation to the original IBM circuitry shown in black. This is only a small amount of circuitry, fitting easily on a smallish circuit board. However, two hundred and sixteen (216) wires are connected to those multiplexors. Each wire was one of the expensive trilead cables used in IBM's 370 line. Physically routing and mounting so many cables posed a problem; in fact AMS addressed that problem by installing a fourth cabinet just for the shared port! (There was a little bit more to the shared port than I discuss here. For example, in addition to the 72 data bits, there were about 13 error signalling wires that also had to be multiplexed.)
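The chip and cable counts above follow from simple arithmetic. Here is a minimal C sketch, assuming each MECL quad-mux package holds four 2-input multiplexors and that every mux input and output is its own trilead cable. The roughly 13 error-signalling wires are left out, and the function names are hypothetical.

```c
#include <assert.h>

/* MECL "quad mux" package: four 2-input multiplexors per chip. */
#define MUXES_PER_CHIP 4

/* Cables needed when every input and every output of an n_bits-wide,
   k-input multiplexor is a separate trilead cable. */
static int mux_cables(int n_bits, int k_inputs)
{
    assert(n_bits > 0 && k_inputs >= 2);
    return n_bits * (k_inputs + 1);
}

/* Quad-mux packages for an n_bits-wide 2-input multiplexor, rounded up. */
static int quad_mux_chips(int n_bits)
{
    assert(n_bits > 0);
    return (n_bits + MUXES_PER_CHIP - 1) / MUXES_PER_CHIP;
}
```

mux_cables(72, 2) gives the 216 cables (144 in, 72 out) and quad_mux_chips(72) the 18 chips; the same formula with 36 bits gives the 108 cables mentioned for the later design.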
Installing a whole cabinet, with its own power supply and fan IIRC, added complexity and expense. Adding insult to injury, the add-on maintenance personnel were often required to reroute the 144 IBM wires to restore the machine to a virgin state when IBM wanted to do maintenance.
I was just the diagnostic programmer, with no formal education in circuit design. But I had a wonderful manager named Dick Rasmussen who asked me to design circuits just so I'd "be challenged and not want to look for another job." I enjoyed diagnostic programming, but circuit design was also fun. One day, as the UMS project was underway and I had a couple of pre-UMS board designs under my belt, I went to him and complained sadly, "All the other engineers are designing circuits; why didn't you ask me?" He invited me to work on a new shared port, and I returned to my office grumbling to myself -- what's the fun in drawing up a 72-bit multiplexor?
I loved the Model 145 and had pored through its logic diagrams the way others might have pored through porn! I'd learned far more about the machine than I needed for my diagnostic programming work. Soon after getting my assignment, a lightbulb flashed in my head. I don't recall any thought processes; I think one moment my mind was blank, and the next I had a novel idea.
The fourth frame in the animated diagram shows the data routing I came up with. You'll see that there's still a total of 72 multiplexors, but 36 of them could be placed on our Data/ECC card with no cables, just printed-circuit traces. (The actual circuit comprised 36 three-input multiplexors, rather than the 72 two-input multiplexors shown here, IIRC.) We'd still need 108 cables for the main multiplexor, but those would fit in our cabinet easily. (216 cables might have fit anyway -- I'm proud of the idea for its "cleverness", not for its cost savings -- although, since trilead cables cost several dollars each(!), there were cost savings.)
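The fourth configuration's routing can be sketched the same way. This hypothetical C model follows the two-input version in the diagram, not the three-input circuit actually built. It assumes AMS's requested word is chosen on the Data/ECC card and always presented on IBM's odd-word input, while IBM's external even word is cabled straight through; the names and the want_odd flag are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t even, odd; } dword72; /* two 36-bit words */

/* On the AMS Data/ECC card (printed-circuit traces, no cables):
   pick the requested word of AMS's own doubleword. */
static uint64_t ams_word(dword72 ams, int want_odd)
{
    return want_odd ? ams.odd : ams.even;
}

/* What IBM's external-storage input sees in the fourth configuration. */
static dword72 external_port(dword72 ibm_ext, dword72 ams,
                             int ams_selected, int want_odd)
{
    dword72 port;
    port.even = ibm_ext.even;    /* even word: straight through, no extra mux */
    port.odd = ams_selected      /* odd word: the 36 cabled port-sharing muxes */
                   ? ams_word(ams, want_odd)
                   : ibm_ext.odd;
    return port;
}
```

Only IBM's odd word passes through a cabled multiplexor here, which is why the cable count drops to 36 muxes times 3 wires.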
As an added bonus, you'll note that the AMS data no longer gets delayed by the Timer Multiplexor (discussed below). Moreover, the only IBM data that gets delayed by our port-sharing multiplexor is the odd word -- the 36 bits which are not delayed by the Timer Multiplexor and are therefore faster than necessary. (In the older shared port, as you can see, IBM's external data is all delayed by an extra 13 nanoseconds or so -- it's lucky it worked at all!)
The ALU output actually is latched into the Z register; Z's contents are then passed to the D-register. In this way the ALU/Z apparatus can operate on the cycle N+1 inputs, while the storage of the cycle N result is delayed. Special hardware detects when N's output is N+1's input, and a bypass is created, gating D to A and/or B. Recall that the 16-bit ALU cycles twice for 32-bit adds/subtracts. For logical ops (bitwise and, or), only 8-bit ALU ops are permitted! (The same byte is sent to both 8-bit halves of the ALU's inputs, with results compared as an integrity test.)
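Three details of this paragraph lend themselves to a small C sketch: the D-register bypass, a 32-bit add done as two passes through a 16-bit ALU, and an 8-bit logical op checked by duplicating the byte across both ALU halves. The function names, the register-number interface, and the fault_mask parameter (used only to imitate a hardware error) are hypothetical; this illustrates the ideas, not the 3145's actual microarchitecture.

```c
#include <assert.h>
#include <stdint.h>

/* Bypass: cycle N's result is still in D (not yet written back), so if
   cycle N+1 reads the register N wrote, take D instead of local storage. */
static uint32_t read_operand(const uint32_t local_storage[], int src,
                             int prev_dest, uint32_t d_reg)
{
    return (src == prev_dest) ? d_reg : local_storage[src];
}

/* 32-bit add on a 16-bit ALU: low halves first, carry into the second pass.
   (The hardware does both passes within one lengthened cycle.) */
static uint32_t add32_on_alu16(uint32_t a, uint32_t b)
{
    uint32_t lo = (a & 0xFFFFu) + (b & 0xFFFFu);      /* pass 1 */
    uint32_t hi = (a >> 16) + (b >> 16) + (lo >> 16); /* pass 2, with carry */
    return ((hi & 0xFFFFu) << 16) | (lo & 0xFFFFu);
}

/* 8-bit AND with duplicated inputs: the same byte goes to both halves of
   the 16-bit ALU and the two result bytes must agree. fault_mask lets a
   test flip result bits to mimic a hardware error. Returns -1 on mismatch. */
static int and8_checked(uint8_t x, uint8_t y, uint16_t fault_mask)
{
    uint16_t a = (uint16_t)((x << 8) | x);
    uint16_t b = (uint16_t)((y << 8) | y);
    uint16_t r = (uint16_t)((a & b) ^ fault_mask);
    uint8_t upper = (uint8_t)(r >> 8), lower = (uint8_t)(r & 0xFFu);
    return (upper == lower) ? lower : -1;
}
```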
For backward compatibility with the System/360, the 145 featured a timer in bytes 80-83 of low memory. In a one-cabinet system, it is only for this case that the alternate input to the "-Gate Internal Storage" multiplexor will be selected. The timer is of no further interest in this story, except for its relevance to worst-case propagation delay. (There is an interesting story involving a machine in Atlanta, Georgia, and the 3145 Interval Timer Interrupt latch logic.)
Yes, the 370/145 (and similarly the 370/135) always fetched 72 bits but used at most 36 of them. (The full doubleword is fetched so that a consistent, low-cost ECC can be applied.)
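Why would a doubleword make the ECC cheaper? The standard Hamming bound, sketched below in C, shows one reason: a single-error-correcting, double-error-detecting (SEC-DED) code needs 7 check bits for 32 data bits but only 8 for 64, and 64 + 8 is exactly 72. This is the generic textbook calculation, not necessarily the exact code the 145 used; the function name is hypothetical.

```c
#include <assert.h>

/* Check bits for a SEC-DED Hamming-style code over k data bits:
   the smallest r with 2^r >= k + r + 1 corrects single errors,
   and one extra overall-parity bit adds double-error detection. */
static int secded_check_bits(int k)
{
    assert(k > 0);
    int r = 0;
    while ((1 << r) < k + r + 1)
        r++;
    return r + 1;
}
```

Protecting each 32-bit word separately would cost 7 check bits per word (about 22% overhead), versus 8 bits per doubleword (12.5%).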
BUT there is one case where the 145 is able to use all 72 bits. Because the Instruction Buffer (with up to 12 bytes IIRC) is semi-autonomous, special hardware uses up to 8 bytes (the entire doubleword) on its memory fetches. (NOTA BENE: If the first half ("even word") of that doubleword contains an S370 instruction that writes into that same prefetched doubleword, the 145 must know to refetch the instruction word!)
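The refetch condition in the NOTA BENE amounts to an overlap test between a store and the prefetched doubleword. Here is a minimal C sketch under stated assumptions: byte addresses, 8-byte-aligned doublewords, and 24-bit addresses so the sum never wraps. The function name is hypothetical, and this is not the 145's actual detection logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: does a store of `len` bytes at `store_addr` touch the
   doubleword the instruction buffer prefetched from `prefetch_addr`?
   If so, the prefetched instruction bytes are stale and must be refetched. */
static int store_hits_prefetch(uint32_t store_addr, uint32_t len,
                               uint32_t prefetch_addr)
{
    assert(len > 0);
    uint32_t dw_start = prefetch_addr & ~7u; /* align down to 8 bytes */
    uint32_t dw_end = dw_start + 8;          /* exclusive */
    return store_addr < dw_end && store_addr + len > dw_start;
}
```

A store that straddles the start of the doubleword still counts as a hit, while one ending exactly at its first byte does not.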
Although the 3145 documentation doesn't bother to give it a name, the cycle after a Storage-2 cycle is somewhat special. In particular, that's when the odd word just fetched may be gated to the Instruction Buffer. Another detail is that the SDBO Pre-assembler (not labeled in the diagram, but shown as input to I-bfr and co-source with the D-reg) needs to have correct parity during a Storage-3 cycle.
A Storage-3 cycle can never also be a Storage-2 cycle. It might be a Storage-1 cycle, a different microinstruction (Branch, BAL/RTN, Move, Arithmetic), a Share-1 cycle, a Correction cycle (used only when the control storage access to a microinstruction has an ECC error), or one of several diagnostic cases. (Share-1 is the same as Storage-1 except that the microinstruction is generated by hardware rather than fetched from control storage.) Storage/Share is the only two-cycle microinstruction; an Arithmetic Word that specifies 32-bit arithmetic has to drive the 16-bit ALU twice, but this is done by lengthening the single cycle.
During a Storage instruction, there may be up to three writes of local storage. Local storage holds the 3145's general-purpose registers (including the 16+16 words known to 370 Assembler programmers, as well as 32 words reserved for channels and microcode). The need for three writes is due to the Storage Word microinstruction implementing the C code:
R_read = *R_address++; R_count++;
(The register selections don't come directly from the microinstruction, but indirectly if the microinstruction specifies its indirection method.) IIRC, the address is written in the middle of Storage-2, the read data late in Storage-2, and the count in the middle of Storage-3. (During the Storage-1 cycle, the D-register result from the preceding Arithmetic or Move Word microinstruction is written into Local Storage, or into Expanded LS or one of the so-called External Registers.)
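Here is a sequential C model of the Storage Word microinstruction, with its three local-storage writes in the order given above. It assumes memory is an array of words and that R_address holds a word index (so the increment is by one word). The function name and LS_WORDS are hypothetical, and the case where the read and address registers coincide is deliberately not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local storage: 16+16 words visible to 370 programs plus 32 for channels
   and microcode, per the text. */
#define LS_WORDS 64

/* One Storage Word microinstruction: R_read = *R_address++; R_count++;
   Memory is modeled as an array of words and R_address as a word index
   (an assumption; the real machine deals in byte addresses). */
static void storage_word(uint32_t ls[LS_WORDS], const uint32_t *memory,
                         size_t mem_words, int r_read, int r_address,
                         int r_count)
{
    assert(r_read != r_address); /* same-register case not modeled */
    uint32_t addr = ls[r_address];
    assert(addr < mem_words);
    uint32_t data = memory[addr]; /* main-storage read */

    ls[r_address] = addr + 1;      /* write 1: middle of Storage-2 */
    ls[r_read] = data;             /* write 2: late in Storage-2 */
    ls[r_count] = ls[r_count] + 1; /* write 3: middle of Storage-3 */
}
```

Calling it repeatedly walks through memory one word at a time while the count register tallies the words read, which is what a channel or string loop in microcode would want.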
Decrementing the address instead of incrementing it is also permitted, but IIRC this happens only on a Share cycle for a tape device reading backwards!
But for our purpose the significance of a Storage-3 cycle is that, if we (the AMS UMS-model memory) are selected, we must maintain high voltage on the normally low -Gate Internal Storage, maintain high voltage on the open-window enable to the SDBO register (not normally clocked by IBM on a Storage-3 cycle) and maintain high voltage on -Select Even Data, since AMS data is always provided on the Odd Word bus. (Actually, this forcing is useful in Storage-2. The IBM logic already forces Odd Word for Storage-3, at least in the relevant cases.)
And note that forcing a high level on a signal wire is easy with the MECL-like IBM logic: multiple drivers can be hard-wired together with copper to yield a "Wire-Or." If an overriding Low (Negative) level had been needed on any of the desired control signals, a much less serendipitous solution would have been needed, perhaps even X-Acto knife cuts of IBM PC-board traces! ::whack::
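The wired-OR trick can be modeled in a few lines of C. This hypothetical sketch adds one AMS driver to each of the three steering lines; when AMS memory is selected, all three lines go high, which is the condition the diagram caption gives for routing data from the green wires. The port-sharing multiplexor's own select is left out, and the names are illustrative. Nothing in the model lets a driver pull a line low against another, which is exactly the limitation described above.

```c
#include <assert.h>

/* Wired-OR: ECL outputs tied together; the line is high if any driver
   pulls it high. No driver can pull it low against another. */
static int wired_or(int a, int b)
{
    return a || b;
}

typedef struct {
    int gate_internal_n; /* "-Gate Internal Storage" */
    int clock_sdbo;      /* "+Clock SDBO (open window)" */
    int select_even_n;   /* "-Select Even Data" */
} steering;

/* Hypothetical: the AMS board adds one extra driver to each of the three
   lines and drives all three high whenever AMS memory is selected. */
static steering with_ams_drivers(steering ibm, int ams_selected)
{
    steering s;
    s.gate_internal_n = wired_or(ibm.gate_internal_n, ams_selected);
    s.clock_sdbo = wired_or(ibm.clock_sdbo, ams_selected);
    s.select_even_n = wired_or(ibm.select_even_n, ams_selected);
    return s;
}

/* All three lines high: the condition for the green (AMS) route. */
static int routes_ams(steering s)
{
    return s.gate_internal_n && s.clock_sdbo && s.select_even_n;
}
```

When AMS is not selected, the extra drivers sit low and IBM's own levels pass through unchanged.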
This 36-bit attachment of main memory to the 3145 mainframe computer worked successfully the first time it was tried. I daresay I'd not have considered this approach if I hadn't (unnecessarily) personally studied the 3145 in detail. There was another connection (not shown, to avoid clutter) between the SDBO register and local storage; I had to know that this was used only in diagnostic modes where Storage-2 and Storage-3 would not interfere. Call it all a fluke if you like. What made it like magic for me was that this all appeared, suddenly, in my mind, not long after I'd walked out of my manager's office despondently, despondently because I'd asked for a circuit design project to be assigned to me and had been given only the boring "shared port." (In fact, I also ended up being principal designer on the Data/ECC board for UMS-145, which was also great fun!)