A Brief z80 Assembly Tutorial

Chapter 6

Let's start with a small (and quite pointless) optimization. The findplayer function, which we'll be calling once after loading a level, can be optimized quite a bit.

Optimizing findplayer

; findplayer - scans map and returns the player's position
; no parameters, destroys a, bc, returns position in hl
findplayer:                
        ld hl, map - 1
findloop:
        inc hl
        ld a, 1
        sub (hl)
        jr nz, findloop
        ld bc, 65536 - map
        add hl, bc
        ret

First, we can use CP instead of SUB. CP performs the same subtraction and sets the flags the same way, but throws the result away, so A keeps its value and we don't need to reload it on every iteration. This speeds the loop up, but the size stays the same, since the LD A simply moves outside the loop.
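
To make the difference concrete, here is a tiny Python model of the two instructions. It is not z80 code: the function names are made up, and only the Z flag is modeled. Both subtract the memory byte from A and set Z the same way, but only SUB writes the result back, which is why the SUB version had to reload A on every pass.

```python
# Hypothetical Python model of SUB (HL) vs CP (HL); only the Z flag is modeled.


def sub(a: int, m: int) -> tuple[int, bool]:
    """SUB m: A = A - m with 8-bit wraparound; Z is set if the result is zero."""
    result = (a - m) & 0xFF
    return result, result == 0


def cp(a: int, m: int) -> tuple[int, bool]:
    """CP m: same subtraction and Z flag as SUB, but A is left unchanged."""
    result = (a - m) & 0xFF
    return a, result == 0


if __name__ == "__main__":
    a = 1
    for tile in (0, 2, 1):          # the last tile is the player (value 1)
        a, found = cp(a, tile)      # A is still 1 afterwards, no reload needed
        print(tile, a, found)
```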

; findplayer - scans map and returns the player's position
; no parameters, destroys a, bc, returns position in hl
findplayer:                
        ld hl, map - 1
        ld a, 1
findloop:
        inc hl
        cp (hl)
        jr nz, findloop
        ld bc, 65536 - map
        add hl, bc
        ret

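Second, there is a single z80 instruction that does pretty much what our loop does, CPIR (compare, increment, repeat): it compares A with (HL), increments HL, decrements BC, and repeats until it finds a match or BC reaches zero. So we might as well use it: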

; findplayer - scans map and returns the player's position
; no parameters, destroys a, bc, returns position in hl
findplayer:                
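        ld hl, map          ; CPIR compares first and increments after
        ld a, 1             ; the player tile
        ld bc, 16*12        ; map size, bounds the search
        cpir
        ld bc, 65535 - map  ; hl = hl - map - 1
        add hl, bc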
        ret

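Note that the constant added after the search is now one smaller (65535 - map instead of 65536 - map): CPIR increments HL after each comparison, so when it stops, HL points one byte past the match. Using CPIR also has the added bonus of being bounded: BC holds the map size, so if the map doesn't contain a byte with the value 1, we won't keep scanning the memory after it (although the returned position is meaningless in that case).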
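
A rough Python model of CPIR and the CPIR-based findplayer may help with the off-by-one. It is a sketch, not an emulator: memory is a 64K bytearray, the function names are hypothetical, only the Z flag is modeled, and HL is assumed to start at the first byte of the map.

```python
MAP_SIZE = 16 * 12


def cpir(mem: bytearray, a: int, hl: int, bc: int) -> tuple[int, int, bool]:
    """Model of CPIR: compare A with (HL), then HL += 1 and BC -= 1,
    repeating until a match is found or BC reaches zero.
    Returns (HL, BC, found); `found` stands in for the Z flag."""
    while True:
        found = mem[hl] == a
        hl = (hl + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if found or bc == 0:
            return hl, bc, found


def findplayer(mem: bytearray, map_addr: int) -> int:
    """Mirror of the CPIR version of findplayer, starting HL at the map."""
    hl, _, _ = cpir(mem, 1, map_addr, MAP_SIZE)
    # ld bc, 65535 - map / add hl, bc: the 16-bit add wraps around, giving
    # HL - map - 1, which undoes CPIR's increment after the match.
    return (hl + 65535 - map_addr) & 0xFFFF


if __name__ == "__main__":
    mem = bytearray(65536)
    mem[0x8000 + 37] = 1          # player at map offset 37 (map assumed at 0x8000)
    print(findplayer(mem, 0x8000))
```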

With that out of the way, let's make a first stab at physics.

A First Stab at Physics

First, the changes to the main loop: we'll add a call to the physics function after the map drawing loop, with its own border color so we can see how much of the frame it takes:

        ld a, 4
        out (0xfe), a

        call physics

        ld a, 0
        out (0xfe), a
        
        halt

And then we have the new physics function. Let's go through it in chunks again.

; physics, calculates physics of non-player controlled items
; no parameters, destroys a, bc, ix (eventually others)        
physics:
        ld ix, map
        ld b, 16*11

Here we load the start address of the map into the IX register. Note that there are a bunch of limitations when it comes to the IX (and IY) registers; you can't do "LD IX, HL", for example. If you need to move a register value to IX/IY, you have to go through the stack (PUSH HL followed by POP IX), or get creative, for instance by setting IX to zero and adding another register pair to it. (There's no "ADD IX, HL" though; only BC, DE, IX and SP can be added to IX.) The IX/IY instructions are also, overall, slower and take more bytes, so it's usually better to use something else.

An additional note about IY and the Spectrum: if you're using any of the ROM routines (which we won't be), it's best to leave IY alone, as the ROM routines expect IY to point at the system variables (address 0x5C3A).

We won't scan the last line of our map, because there's nowhere to drop from there; that's why the loop counter B starts at 16*11 instead of 16*12.

physicsloop:
        ld a, (ix)
        cp 5
        jr z, physics_gem
        cp 6
        jr z, physics_stone    
physicsdone:
        inc ix
        dec b
        ret z
        jp physicsloop

All of this should look pretty familiar by now. We load the tile into register A, check whether it's 5 or 6 and jump to the appropriate handler, then increment our index register, decrement our loop counter and return if it hits zero; otherwise we keep looping. The only really new thing here is that the RET instruction can be predicated (conditional), and so can CALL.

There's no real reason why I couldn't have written this as "JP NZ, physicsloop" followed by a non-predicated RET. When the loop continues, RET Z takes 5 clocks (not taken) followed by 10 clocks for the JP, while JP NZ takes 10 clocks whether the jump is taken or not. On the final iteration, the predicated RET Z takes 11 clocks when it returns, whereas the alternative costs 10 clocks for the untaken JP NZ plus 10 for the plain RET. So overall, doing it this way is 5 clocks worse per iteration (and 9 clocks better on exit); both versions take four bytes. But it's here for demonstration purposes.
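
The clock arithmetic is easy to check with a few lines of Python. This is a sketch: the constants are the documented z80 T-state counts for these instructions, and the function name is made up. It totals the cost of the loop tail (everything after DEC B) for both arrangements.

```python
# Z80 T-state costs of the instructions in the loop tail
RET_CC_TAKEN = 11      # RET Z when Z is set (returns)
RET_CC_NOT_TAKEN = 5   # RET Z when Z is clear (falls through)
RET = 10               # non-predicated RET
JP = 10                # non-predicated JP nn
JP_CC = 10             # JP NZ, nn costs 10 whether taken or not


def tail_cost(iterations: int, ret_z_first: bool) -> int:
    """T-states spent in the loop tail over a whole loop of `iterations` passes."""
    if ret_z_first:
        # RET Z / JP physicsloop: RET Z falls through on all but the last pass
        return (iterations - 1) * (RET_CC_NOT_TAKEN + JP) + RET_CC_TAKEN
    # JP NZ, physicsloop / RET: JP NZ jumps on all but the last pass
    return (iterations - 1) * JP_CC + (JP_CC + RET)


if __name__ == "__main__":
    for n in (176, 16):
        print(n, tail_cost(n, True), tail_cost(n, False))
```

For the 176-tile loop that comes to 2636 versus 1770 T-states, a difference of 866, which is small next to the roughly 70,000 T-states in a Spectrum frame.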

physics_gem:
        ld a, (ix+16)
        cp 0
        jr nz, physicsdone
        ld (ix), 0x80
        ld (ix+16), 0x85
        jr physicsdone    
physics_stone:
        ld a, (ix+16)
        cp 0
        jr nz, physicsdone
        ld (ix), 0x80
        ld (ix+16), 0x86
        jr physicsdone

The stone and gem physics are identical for now. Here we can see why we wanted to use the index register: you can add an offset to it, so (ix+16) refers to the tile directly below the current one. The offset is a signed byte, so it can range from -128 to +127.
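
For the curious, here is how the displacement byte is interpreted, as a short Python sketch (the function name is hypothetical): raw values 0x80 to 0xFF count as negative, so (ix+16) reaches the tile below the current one, while an offset of -16 would reach the tile above.

```python
def index_address(ix: int, d: int) -> int:
    """Address accessed by (IX+d); d is the raw displacement byte from the opcode."""
    offset = d - 256 if d >= 0x80 else d   # 0x80..0xFF encode -128..-1
    return (ix + offset) & 0xFFFF          # 16-bit wraparound


if __name__ == "__main__":
    print(hex(index_address(0x8000, 16)))    # tile below
    print(hex(index_address(0x8000, 0xF0)))  # -16: tile above
```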

We check whether the tile below is zero; if not, we don't do anything. If it is zero, we set the current tile to zero with the dirty bit on (0x80), and fill the tile below with the gem or stone, also with the dirty bit set (0x85 or 0x86).

Note that the dirty bit gives us the added advantage that a tile won't be moved more than once per pass: when the scan reaches the tile below, it reads 0x85 or 0x86 rather than 5 or 6, so neither comparison matches.
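
The pass described above can be modeled in Python to check the dirty-bit argument. This is a sketch under the same assumptions as the assembly (16x12 map, 5 = gem, 6 = stone, 0 = empty, 0x80 = dirty bit); the function names are made up, and `clear_dirty` is a hypothetical stand-in for the map drawing code.

```python
WIDTH, HEIGHT = 16, 12
EMPTY, GEM, STONE = 0, 5, 6
DIRTY = 0x80


def physics_pass(tiles: list[int]) -> None:
    """One top-down pass over every row except the bottom one, like `physics`."""
    for i in range(WIDTH * (HEIGHT - 1)):          # ld b, 16*11
        # Dirty tiles (0x85, 0x86) don't compare equal to 5 or 6
        if tiles[i] in (GEM, STONE) and tiles[i + WIDTH] == EMPTY:
            tiles[i + WIDTH] = tiles[i] | DIRTY    # ld (ix+16), 0x85 / 0x86
            tiles[i] = EMPTY | DIRTY               # ld (ix), 0x80


def clear_dirty(tiles: list[int]) -> None:
    """Stand-in for the map drawing code, which clears the dirty bits."""
    for i, t in enumerate(tiles):
        tiles[i] = t & 0x7F


if __name__ == "__main__":
    tiles = [EMPTY] * (WIDTH * HEIGHT)
    tiles[3] = STONE
    physics_pass(tiles)
    print(tiles[3], tiles[3 + WIDTH], tiles[3 + 2 * WIDTH])
```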

The physics processing takes a chunk of our frame time, but less than I feared. I honestly expected this approach to be so slow that it would force me to keep a list of the physics objects, but I'm glad this much simpler solution is viable.

And when we try it out, we find that our rocks and gems drop down as expected:

The physics of both rocks and gems needs to be a bit more complex eventually. Both rocks and gems should slide to the side if they're resting on top of rocks or gems. Rocks can also make the player go splat, but only if they're already falling, which means that rocks actually have two states!

We may want to avoid running the physics on every frame, or run only part of it on every frame, to make things fall slower. Scanning one horizontal line of the map per frame would make the physics run at about 4.5Hz (50Hz divided by the 11 lines we scan, or about 4.2Hz if all 12 lines were scanned), which might look nice, and would let the player outrun the rocks if need be. So let's try that out.
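
The rates quoted above are just the frame rate divided by the number of lines scanned; a quick Python check, assuming the 50Hz frame interrupt that HALT waits for (the function name is made up):

```python
FRAME_RATE_HZ = 50   # Spectrum frame interrupt rate (PAL), assumed here


def physics_rate_hz(lines_scanned: int) -> float:
    """How often each line is processed when one line is scanned per frame."""
    return FRAME_RATE_HZ / lines_scanned


if __name__ == "__main__":
    print(f"{physics_rate_hz(11):.2f} Hz")   # all lines except the bottom one
    print(f"{physics_rate_hz(12):.2f} Hz")   # all 12 lines
```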

Slower Physics

First, we need a new variable to track which line we're scanning:

physofs:
        dw 160      ; 16-bit offset of the line to scan, stored low byte first
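
Note that LD BC,(nn) loads C from address nn and B from nn+1; the z80 is little-endian, so a 16-bit variable read this way must be stored low byte first. For 160 that's the bytes 160, 0, which is what `dw 160` emits; the bytes 0, 160 would load as 0xA000 (40960). A small Python model of the load (the function name is hypothetical):

```python
import struct


def ld_bc_indirect(mem: bytes, addr: int) -> int:
    """Model of LD BC,(nn): C is loaded from nn and B from nn+1."""
    c, b = mem[addr], mem[addr + 1]
    return (b << 8) | c


if __name__ == "__main__":
    print(ld_bc_indirect(bytes([160, 0]), 0))   # layout produced by `dw 160`
    print(ld_bc_indirect(bytes([0, 160]), 0))   # layout produced by `db 0, 160`
    print(struct.pack("<H", 160))               # 160 as little-endian 16-bit
```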

We want to scan the map bottom-up; otherwise we'd be updating falling objects several times on every cycle. Hence we start from offset 160 (10*16), the start of the second-to-last line. We can't rely on the dirty bits to stop tiles from moving several times, because the map drawing function clears the dirty bits between frames.
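
A small Python simulation shows what goes wrong with a top-down scan. It is a sketch, assuming one line scanned per frame, dirty bits cleared after every frame (as the map drawing does), and only stones falling; the function names are made up. A stone starting in the top row falls through the whole map in one cycle when scanned top-down, but moves a single row when scanned bottom-up.

```python
WIDTH, HEIGHT = 16, 12
EMPTY, STONE, DIRTY = 0, 6, 0x80


def scan_line(tiles: list[int], row: int) -> None:
    """Physics for one map row (0..HEIGHT-2), as in the per-line version."""
    for i in range(row * WIDTH, (row + 1) * WIDTH):
        if tiles[i] == STONE and tiles[i + WIDTH] == EMPTY:
            tiles[i + WIDTH] = STONE | DIRTY
            tiles[i] = EMPTY | DIRTY


def run_cycle(row_order: list[int]) -> int:
    """Drop a stone from the top row, scan one row per frame in the given
    order, clearing dirty bits after each frame. Returns the stone's final row."""
    tiles = [EMPTY] * (WIDTH * HEIGHT)
    tiles[0] = STONE
    for row in row_order:
        scan_line(tiles, row)
        tiles = [t & 0x7F for t in tiles]   # the redraw clears the dirty bits
    return tiles.index(STONE) // WIDTH


bottom_up = list(range(HEIGHT - 2, -1, -1))   # rows 10, 9, ..., 0
top_down = list(range(HEIGHT - 1))            # rows 0, 1, ..., 10

if __name__ == "__main__":
    print("bottom-up:", run_cycle(bottom_up))
    print("top-down:", run_cycle(top_down))
```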

Like before, we skip the scan of the bottom-most row of tiles.

physics:
        ld ix, map
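        ld bc, (physofs)    ; bc = offset of the line to scan this frame
        ld hl, -16
        add hl, bc          ; hl = offset for the next frame
        ld a, l
        cp -16              ; did we go below zero?
        jr nz, physics_notlooped
        ld hl, 160          ; if so, wrap around to the second-to-last line
physics_notlooped:
        ld (physofs), hl
        add ix, bc          ; ix = map + this frame's offset
        ld b, 16            ; scan one line of 16 tiles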
physicsloop:

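The start of the physics function changes to take this into account. We load the current offset into BC and use it for this frame's scan (ADD IX, BC), while computing the offset for the next frame: we subtract 16, and if the result went below zero (its low byte is then 0xF0, i.e. -16), we reset it to 160 before storing it back into physofs. Checking only the low byte is enough, since none of the valid offsets from 0 to 160 ends in 0xF0. The loop has also shrunk to 16 iterations.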
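
The offset sequence can be checked with a Python model of the code above (the function names are hypothetical; the masking mimics the 16-bit HL register). Note that the offset used for a frame is the old value in BC, not the newly stored one.

```python
def next_physofs(physofs: int) -> int:
    """HL = physofs - 16; wrap to 160 if the low byte is 0xF0 (cp -16)."""
    hl = (physofs - 16) & 0xFFFF
    if hl & 0xFF == 0xF0:
        hl = 160
    return hl


def offsets(frames: int, start: int = 160) -> list[int]:
    """Offsets scanned on successive frames (the value ADD IX, BC uses)."""
    out, physofs = [], start
    for _ in range(frames):
        out.append(physofs)              # this frame scans the old value
        physofs = next_physofs(physofs)  # stored for the next frame
    return out


if __name__ == "__main__":
    print(offsets(12))
```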

As a result, the physics time shrinks significantly, and rocks (and gems) fall at a slower pace:

This chapter's version of the source is available here.

The raw binary is now up to 1692 bytes, up from 1603, so our simple physics (together with the small optimization to findplayer) added a mere 89 bytes.

Any comments etc. can be emailed to me.