Backup of: http://sky.relative-path.com/zx/floating_bus.html
The Definitive Programmer’s Guide to Using the Floating Bus Trick on the ZX Spectrum; or, How to Keep Your Bus Afloat
This document is a draft. As time passes, and as more questions arise (and as I see fit), I will expand and revise it. So, watch this space.
- Introduction
In the context of the ZX Spectrum, the floating bus trick refers to exploiting a hardware quirk of these machines, where a value being read by the ULA from the data bus can also be read by the CPU. (For a more detailed description of the floating bus in general, I encourage you to search the Internet for this term.) As one of the ULA’s primary purposes is to form the video signal for the TV, the values it reads from RAM are those that form the bitmap of a byte on the screen and its color overlay (or attributes, as they are referred to). Since only one device can access RAM at a time, and the TV raster beam cannot be suspended, the ULA holds the CPU by suspending its clock, reads a value from RAM, then resumes the CPU clock. This phenomenon is commonly known as memory contention. While the display file on the Spectrum is only ~7KB in size, contention is applied to a full 16K page of RAM (or pages on 128K machines).
The ULA uses a regular pattern for fetching values from the display file so as not to hold the CPU for the entire duration of a frame. The pattern is as follows: (a) when the ULA is drawing the border, no contention is applied, and the CPU operates at full speed; (b) when the ULA is drawing the main screen area, it suspends the CPU for 4 T states while performing four reads—bitmap, attribute, next bitmap, next attribute—then releases the CPU again for 4 T states, then continues with the next four bytes, and so on and so forth, until it reaches the end of the line and proceeds drawing the border again.
The value being read by the ULA from the display file can also be read by the CPU using the IN instruction and polling any port that is not physically attached to any peripheral. Since on the Spectrum, port addresses are only partially decoded, it is enough to specify the LSB of the port address. What’s more, during the idling interval (i.e. when the ULA is drawing the border or during the 4 T states in between the bitmap/attribute byte fetch), all eight lines of the bus are properly held at a logical “1” thus returning the value of 0xFF.
Thus, by reading from a port, one can tell whether the ULA is drawing the border/idling, or drawing the screen.
- Practical use
While the above might seem overly technical to some of you, the main point to take home is that since we can peek at the value the ULA is currently reading (or, more appropriately, the value currently sitting on the data bus), we can also know what area of the screen is being drawn by the TV electron beam. Well, in theory.
In practice, we only know if the value on the bus is 0xFF or something else. This might seem like very little information (what if our screen is filled with 0xFF values in both the bitmap and attribute areas?), but by carefully preparing the screen and refining the timing of our port reading technique, we can get very precise.
- Going on a tangent
Why is this useful to begin with? If you want a steady frame rate in your game and no flickering, you must have some form of synchronization available to you. Otherwise, the part of your code that erases an area of the screen before drawing new sprites might run just as the electron beam sweeps across that same area. If you’re a little unlucky, that area will flicker occasionally. If you’re really unlucky, however, it might remain blank most or all of the time.
The most straightforward method is to rely on the interrupt for synchronization. Interrupts are generated by the ULA at the end of each frame. This is useful, as we can start executing our redrawing routines right after an interrupt occurs. This will give us the time to update the screen while the electron beam is drawing the top border. If the top of the screen is occupied by a more or less static image (a score board, status display, fancy ornament, etc.)—even better, we have some extra time to finish updating the screen below it. But what if we want to use the top of the screen and not all of our drawing routines finish executing by the time the ULA finishes drawing the top border?
One method to employ in such a case is “drawing behind the beam.” The concept is very simple: you wait for an interrupt, then wait some more (say, via a wait loop) for the ULA to finish drawing the top border, then you start drawing or redrawing your screen. Chances are you’ll always stay behind the beam, and no flicker will occur.
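As a rough illustration of the wait described above, here is a minimal sketch. It assumes interrupts are enabled, and the delay count of 540 is only a placeholder of mine: each pass through the loop takes 26 T states, and the right figure depends on the machine and on how long the interrupt handler runs, so it has to be tuned by hand.

```z80
        halt                ;wait for the frame interrupt
        ld bc,540           ;placeholder delay count (an assumption; tune it)
1$      dec bc              ;[6]count down
        ld a,b              ;[4]
        or c                ;[4]has BC reached zero?
        jr nz,1$            ;[12]no? keep waiting
        ;the beam should now be at or near the top of the screen area;
        ;start the drawing routines here
```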
While virtually foolproof, this method is incredibly wasteful. Here’s where the floating bus trick comes into play.
- There and back again
Imagine you could catch the electron beam at any point in time, and sync your drawing routines to it. Instead of wasting precious CPU cycles between an interrupt and the beginning of the display area, you could use them for some other calculations, then chase the beam as you see fit. Well, you can. If you replace your idle loop with a loop that reads the data bus, then as soon as the value read changes from 0xFF to something else, you know that the beam has left the top border and is now in the screen area. That’s very useful, but can be taken even further.
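The simple border-to-screen check just described might look like the sketch below. It is not taken from any particular game. It assumes a 48K/128K/+2 machine, interrupts enabled, code placed in non-contended memory, and the conventional port 0x40FF. Note that it cannot tell a display byte that happens to be 0xFF from an idle bus.

```z80
        halt                ;wait for the frame interrupt
        ;...useful work that fits within the top border goes here...
1$      ld a,$40            ;[7]MSB of port addr into A
        in a,($ff)          ;[11]read port 0x40FF into A
        inc a               ;[4]0xFF + 1 = 0, so Z is set while the bus idles
        jr z,1$             ;[12]still 0xFF? keep polling
        ;the ULA is now fetching display bytes: we are in the screen area
```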
How about syncing to an arbitrary position on the screen? What if you could start executing your screen updates even before a frame began, say, at the bottom of the screen? That would give you yet more time—the entire bottom border (at least) plus the top border. By Grabthar’s Hammer, what a savings!
Remember that reading the data bus while the ULA is fetching a bitmap or attribute byte will return that very byte. Reading the bitmap byte doesn’t seem all that useful—we’ll get too many false positives, but the attribute byte is a whole different ballgame.
If we fill an area of the screen with a unique attribute value—a value that doesn’t appear anywhere else on the screen—we can then read the bus in our loop and only exit it when the value read matches the expected one.
- A working example
Before we begin, I need to warn you about an important (albeit, probably, obvious at this point) caveat. If we arbitrarily read from a port at any time, we’ll end up fetching one of three values: 0xFF, during the idling period (i.e. while the ULA is drawing the border or in between the bitmap/attribute fetch cycle); the bitmap byte (which can also be 0xFF); or the attribute byte (which can also be 0xFF, if we’re crazy enough to set it to white ink on white paper, BRIGHT 1, FLASH 1). We can make sure the attributes are unique and not 0xFF, but we can’t very well control the bitmap byte. It can be anything; that’s the idea of drawing, after all. For this reason, we must ensure that we only read the attribute byte. How, though?
This task is accomplished by carefully timing our loop. Several things must be taken into account here, including the fact that each instruction takes a fixed number of T states. Moreover, our code must sit in non-contended memory, so as not to throw our timing off. There are many ways to skin a cat, and your preferred method of reading from a port might be different from my example below. In my loop, I chose DEC HL as a padding instruction. It takes 6 T states and doesn’t do anything destructive as far as this particular piece of code is concerned, other than ensuring that we always read the attribute byte and not the bitmap byte.
Any port that is not physically connected to a peripheral device will do. It’s “traditional” to use port 0xFF on the Spectrum. Because of partial port address decoding, it’s also not particularly important what the MSB is (however, contention affects the timing). I chose port 0x40FF in my example.
Assemble the following code anywhere in non-contended RAM.
ld a,9 ;INK 1; PAPER 1; BRIGHT 0; FLASH 0
ld b,32
ld hl,$5a40 ;draw a 32 char-wide strip
1$ ld (hl),a
inc l
djnz 1$
fl_bus
ld de,$940 ;attr into D, MSB of port addr into E
2$ dec hl ;[6]padding instruction
ld a,e ;[4]MSB of port addr into A
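in a,($ff) ;[11]read port 0x40FF into A
cp d ;[4]is it D (i.e. INK 1; PAPER 1; BRIGHT 0; FLASH 0)?
jp nz,2$ ;[10]no? keep trying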
ld a,1 ;border blue
out (254),a
ld b,128 ;wait
djnz $
ld a,7 ;border white
out (254),a
jr fl_bus
When you run it, you should see something like this:
[Screenshot: Floating bus trick on 48K/128K/+2]
Note the missing line in the left border before the screen area. That’s to be expected—we don’t start changing the border color before we fetch the first attribute byte.
If you’re still not convinced that we’re only latching onto the attribute byte, the example below will definitively demonstrate that it is in fact so. We’ll fill the entire bitmap area of the display file with the same value as our attribute byte—9.
ld a,9 ;INK 1; PAPER 1; BRIGHT 0; FLASH 0
ld b,32
ld hl,$5a40 ;draw a 32 char-wide strip
1$ ld (hl),a
inc l
djnz 1$
ld bc,6143
ld hl,$4000
ld de,$4001
ld a,9 ;fill pattern
ld (hl),a
ldir
fl_bus
ld de,$940 ;attr into D, MSB of port addr into E
2$ dec hl ;[6]padding instruction
ld a,e ;[4]MSB of port addr into A
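in a,($ff) ;[11]read port 0x40FF into A
cp d ;[4]is it D (i.e. INK 1; PAPER 1; BRIGHT 0; FLASH 0)?
jp nz,2$ ;[10]no? keep trying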
ld a,1 ;border blue
out (254),a
ld b,128 ;wait
djnz $
ld a,7 ;border white
out (254),a
jr fl_bus
Assemble and run it again and you’ll see that the code still only catches the attribute byte:
[Screenshot: Floating bus trick on 48K/128K/+2]
So far, I haven’t told you anything you (or Spectrum developers at large) didn’t know. Now let’s get into the mystery of why the above will work on the 48K/128K/+2 machines, but not on the +2A/+3.
- It only took us 30 years
The existence of the floating bus (and the way to exploit it in games) was presumably discovered by accident by the late, great Joffa Smith around 1986. The earliest examples of games that used this method (although in a more primitive way: waiting for the value to change from 0xFF, the border-area value, to something else) were Cobra and Terra Cresta, both of which came out in late 1986. Very few games have taken advantage of this technique since then. My understanding is that it was discovered a little too late. By that time Amstrad had already acquired the Spectrum line of products from Sinclair, and soon afterward the ZX Spectrum +2A was released. Its redesigned hardware introduced several incompatibilities for the software that was written for the original machines (including the +2, which architecturally was, for all intents and purposes, a 128K with a slightly different ROM). The floating bus fell victim to one such redesign, as it was gone from the +2A and +3.
A couple of titles were rereleased with the floating bus loop replaced with a simple idle wait loop, sometimes with no ill effect. Cobra, however, suffered from a great deal of flickering, much to Joffa’s chagrin. All in all, the technique was abandoned, since it could not be used on all Spectrums. Or could it?
That was the consensus for almost thirty years. In early 2016, Cesar Hernandez, the author of the ZEsarUX Spectrum emulator, made an interesting discovery. According to his simple test (written in BASIC), something of a floating bus was indeed present on the +2A/+3, albeit with a few differences from the previous models. Namely, it occurred on different ports, when paging was enabled, and the returned value always had Bit 0 set. Some speculation followed, but nobody felt compelled enough to take it any further.
Fast forward to the fall of 2017, when yours truly stumbled upon Cesar’s post. Coincidentally, at that time I was struggling with shoehorning my drawing routines for A Yankee in Iraq into the top border. No matter how hard I tried, there still was a narrow strip at the top of the screen where flicker could occur sometimes. Much like the rest of the community, I was under the impression that the floating bus trick was a peculiar quirk that only existed on earlier Sinclair machines and as such was of no use if you wanted your game to run on all Spectrums.
Cesar’s post made me very curious. What if . . . ? That’d be great news, I thought. But perhaps it’s wishful thinking, since no one has used it since then. Curiosity and stubbornness got the better of me, and I began my own investigation. The rest is history. Suffice it to say that with the help of a few volunteers and some additional tests by Hikaru (his were much more scientific than mine), we managed to solve the Great Floating Bus Mystery on the +2A/+3. It only took us thirty years. In the process, we also helped Mark Woodmass make the necessary changes to his SpecEmu, which became the first Spectrum emulator to fully support the floating bus on all Spectrums. A Yankee in Iraq became the first Spectrum game to take advantage of the floating bus trick across the entire Spectrum line of computers.
Why the historical tangent, I hear you ask? Because as of this writing, SpecEmu by Mark Woodmass and Spectramine by weiv are the only two emulators I’m aware of that you can use to test your games. Both are Windows-only. I was only able to perfect the technique by purchasing an actual Spectrum +2A, so unless you own one or trust the aforementioned emulators completely, I strongly suggest you follow my instructions to the letter.
- Enough chit-chat. Let’s dig in
The floating bus on a +2A/+3 differs from previous Spectrums in a few key aspects outlined below:
  - It is only found on ports of the form 1 + 4n that lie below 0x1000, i.e. n = 0 to 1023 (that is, ports 1, 5, 9, 13 . . . 4093).
  - The bus always returns 0xFF if bit 5 of port 32765 is set (i.e. paging is disabled), so it won’t work in 48K mode.
  - Otherwise, the value returned is the value currently read by the ULA ORed with 1 (i.e. Bit 0 is always set). In practical terms, this means that we can’t use even values for the INK attribute; so no black, red, green, or yellow.
  - During idling intervals (i.e. when the ULA is drawing the border or in between fetching the bitmap/attribute byte), the bus latches onto the last value that was written to, or read from, contended memory, and not strictly 0xFF. This is crucial to keep in mind, but I’ll explain it in detail below.
Quite a laundry list, if you ask me. That’s probably another reason why it was not discovered until later. Commercial software authors were not insane enough to spend extra time and effort going on a wild goose chase investigating it to such an extent. Leave it to a couple of incessant Spectrum enthusiasts in their forties to tackle the details.
So, how do we modify our code above to work on a +2A/+3, then? Let’s go through the checklist above, one item at a time. First, the port address. That’s easy enough. For no particular reason other than this author’s whim, we’ll choose port 4093 (0xFFD). Check. Make sure paging is enabled. Well, as long as we don’t disable it explicitly (and the user doesn’t decide to switch to 48K BASIC mode), we’re good. Upon power-up, the Spectrum will have paging enabled. Check. We already made sure that Bit 0 is set in the value that we’re expecting (INK 1). Check. Now comes the tricky bit.
Two conditions must be met simultaneously. One is that, as in the previous example, the timing of our loop has to be just right so we don’t accidentally fetch the bitmap or idling bus value. The other is that we can no longer rely on the idling value always being 0xFF. It can be any byte (ORed with 1, remember) that was last read from, or written to, contended memory. So as not to leave it to chance, we must “preload” the bus ourselves with a value that is different from our sync value. After many days of experimentation in September of 2017 (and only after I had direct access to an actual +2A), I arrived at the simplest padding instruction: LD A,(NNNN), where NNNN is any address in contended memory. Again, on a whim, I chose the first attribute byte of the display file, 0x5800.
Let’s rewire our example.
ld a,9 ;INK 1; PAPER 1; BRIGHT 0; FLASH 0
ld b,32
ld hl,$5a40 ;draw a 32 char-wide strip
1$ ld (hl),a
inc l
djnz 1$
fl_bus
ld de,$90f ;attr into D, MSB of port addr into E
2$ ld a,($5800) ;[13]point to contended memory and fetch a “blanking” attr
ld a,e ;[4]MSB of port addr into A
in a,($fd) ;[11]read port 0xFFD into A
cp d ;[4]is it D (i.e. INK 1; PAPER 1; BRIGHT 0; FLASH 0)?
jp nz,2$ ;[10]no? keep trying
ld a,1 ;border blue
out (254),a
ld b,128 ;wait
djnz $
ld a,7 ;border white
out (254),a
jr fl_bus
If all goes well, after assembling and running this listing, you should see a picture similar to the one in our first example. There are some timing differences between +2A/+3 machines and the previous models. The former apply contention only if the MREQ signal is low (therefore making I/O access non-contended), whereas the latter apply it every time the ULA needs to access any area of contended RAM. This explains why, say, games that use the multicolor trick must use different patterns for each machine: one for the 48K, one for the 128K/+2 (because of a different clock speed), and one for the +2A/+3 (because contention patterns are different). But that’s an aside that is not related to this article.
Second draft: April 18, 2018 by Ast A. Moore (added a bitmap fill pattern)
First draft: April 17, 2018 by Ast A. Moore