A Brief z80 Assembly Tutorial

Chapter 7

Let's again start off with a little optimization. I started to wonder why, exactly, the map drawing loop (which scans through the map, skipping work where possible) takes so much more time than the physics loop (which also scans through the map, skipping work where possible).

The map drawing loop pushes and pops a couple of registers regardless of whether we're calling drawtile. Let's not.

Making Maploop Faster

maploop:
        ld a, (hl)
        bit 7, a
        jr z, skipdraw
        push hl
        push bc
        call drawtile
        ld a, 2
        out (0xfe), a
        pop bc
        pop hl

skipdraw:        
        dec hl
        dec b
        jp nz, maploop

The change simply moves the PUSHes after the JR Z and the POPs before skipdraw. To save a couple more clock cycles, the inner maploop jump also changed from JR (which takes 12 clocks when it jumps, which is most of the time here) to JP (always 10 clocks). This is an example of size-versus-speed optimization: the JR to JP change traded 1 byte of RAM for 360 fewer clock cycles per frame. The PUSH/POP change saved 4032 clocks per frame at no storage cost.

The result made everything so fast that we're idling in the top border again. Now, if we just had a register free, we could replace the BIT instruction (8 clocks) with a CP against a register (4 clocks). CP against a constant takes 7 clocks, nearly as much as the BIT does.
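
As a rough sketch of what the register version could look like, assuming register D were free and preloaded with 0x80 outside the loop (in our code it isn't):

        ld d, 0x80      ; assumption: D is free and set up once before the loop
maploop:
        ld a, (hl)
        cp d            ; 4 clocks; carry is set when a < 0x80, i.e. bit 7 is clear
        jr c, skipdraw  ; no dirty bit, nothing to draw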

But hey, we can shift..

maploop:
        ld a, (hl)
        rla
        jr nc, skipdraw
        push hl
        push bc

        srl a    

RLA rotates the bits of register A to the left through the carry flag and takes only 4 clocks, making it one of the cheapest operations. The old top bit lands in the carry, which is what the JR NC tests. The carry may hold 0 or 1 before this instruction, so the bottom bit of A is garbage at this point, but that doesn't matter.

Later on, where we were using AND to mask off the top bit, we simply use SRL on the A register instead. This unsigned shift to the right moves our bits back to their proper place and also gets rid of the bottom garbage bit. It's a tiny improvement, but like Depeche Mode sings, everything counts in large amounts. Or on 8-bit platforms.
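
To see that the round trip works, here is a small trace with a hypothetical tile value of 0x86 (tile 6 with the dirty bit set). In the comments, c stands for whatever the carry happened to hold beforehand:

        ld a, 0x86      ; a = 10000110
        rla             ; a = 0000110c, carry = 1 (the old bit 7)
        srl a           ; a = 00000110 = 6, garbage bit c shifted out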

That saved 768 clocks per frame. I think we're good for now.

Pushing Rocks

One bug in the making is that if the player presses up and right at the same time, they will move diagonally. We don't want to allow this for fear of generating odd bugs, so we'll add early outs to player movement: once one direction is accepted, we don't consider the rest, like so:

        ld a, (movekey)
        bit 0, a
        jr z, notup
        ld bc, -16
        add hl, bc
        jp moved    

The added JP after each direction does the job. The "moved" label is in the exact same place as the "notright" label, but we'll use a new label for clarity.

We add the check for pushing the stone after all the (currently) legal moves:

        jr z, moveok ; Open exit is fine
        cp 6
        jr z, movestone ; Check if we can push a stone
        jr movedone

And then we get to the stone movement proper.

movestone:
        ld a, (playerpos)
        sub l ; playerpos - target
        ; Let's limit stone pushing to horizontal movement
        ; -1 = 11111111, 1 = 00000001, 16 = 00010000, -16 = 11110000
        ; checking bit 0 is enough
        bit 0, a
        jr z, movedone

We start off by calculating the delta between the player's previous position and the desired target position. The result is either 1, -1, 16 or -16. We'll rule out vertical movement, although we may remove this check later, as pushing a deadly rock upwards sounds like fun. Since bit 0 is set only for 1 and -1, checking that single bit is enough to tell horizontal from vertical movement.

        ld bc, hl  ; back up target position
        ld l, a    ; l = 1/-1
        ld a, c    ; a = target position
        sub l      ; a -= 1/-1
        ld l, a    ; hl = other side of target position

Next, we need to calculate the position on the other side of the rock. This involves juggling registers due to us wanting to preserve the target position and the fact that SUB only works on the register A.
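
As a worked example, suppose (hypothetically) that the player stands at offset 0x24 and pushes right, so the target is HL = 0x0025 and A holds the delta 0x24 - 0x25 = 0xff, that is, -1. The comments show what each register should then contain, assuming the offsets fit in 8 bits so that H stays 0:

        ld bc, hl  ; bc = 0x0025 (sjasmplus shorthand for ld b, h / ld c, l)
        ld l, a    ; l = 0xff
        ld a, c    ; a = 0x25
        sub l      ; a = 0x25 - 0xff = 0x26 (mod 256)
        ld l, a    ; hl = 0x0026, the tile on the far side of the rock

Pushing left from the same spot works the same way: the target is 0x23, the delta is 1, and the far side comes out as 0x23 - 1 = 0x22.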

        ld de, map
        add hl, de
        ld a, (hl)
        cp 0
        jr nz, movedone ; other side wasn't empty
canmovestone:
        ld (hl), 0x86 ; store stone in empty slot
        ld hl, bc  ; restore move target

After that we're in familiar territory again. We add the address of our map to the offset, fetch the tile and find out whether there's empty space on the other side of the rock. If not, we're done. Note that we don't need to restore HL because we won't be moving anywhere.

If we find the empty space, we'll fill it with the stone, restore HL and let the movement happen, which will overwrite the old position of the stone.

Diagonal Falling Physics

We want the gems and rocks to roll off each other diagonally if they can. If there's space on the left and left-down tiles, we'll move rocks and gems there. This means we'll be rewriting most of the physics function.

physicsloop:
        ld a, (ix)
        cp 5
        jr z, physics_drop
        cp 6
        jr z, physics_drop

Off the bat, the rocks and gems now share the same physics code. From the physics point of view, the only real difference between rocks and gems is that moving stones will splat the player. We can (eventually) treat moving stones as a completely separate tile type, so gems and stones are identical here.

physics_drop:
        ld a, (ix+16)
        cp 5
        jr z, dropdiagonal_left
        cp 6
        jr z, dropdiagonal_left
        cp 0
        jr nz, physicsdone
        ld a, (ix)
        or 0x80
        ld (ix), 0x80
        ld (ix+16), a
        jr physicsdone

If the tile below is rock or gem, we'll try to move diagonally. We'll be deterministic and always try left first, then right. It might be visually more pleasing to vary this, but gameplay wise it's better to be predictable.

If there's empty space below, we'll fetch whatever tile we were moving, OR the dirty bit on it, and store it in the new place, and mark the current tile as empty.

dropdiagonal_left:
        ld a, ixl   ; Note: undocumented instruction
        and 15
        jr z, dropdiagonal_right ; Can't move, border
        ld a, (ix-1)
        cp 0
        jr nz, dropdiagonal_right ; can't move, stuff on the left
        ld a, (ix+15)
        cp 0
        jr nz, dropdiagonal_right ; can't move, stuff at target
        ld a, (ix)
        or 0x80
        ld (ix), 0x80
        ld (ix+15), a
        jr physicsdone

With the diagonal check we first have to see if we're on the border of the screen and disallow movement if we are. There's no official opcode for getting the low 8 bits of IX, but luckily there are plenty of undocumented opcodes that have been in use for so long that they're pretty safe to use. Here, we're using one: "LD A, IXL". If we didn't use this one, we'd have to get IX into some register pair we can access, such as BC, but there's no direct opcode for that either, so we'd have to go through the stack.
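
For comparison, the documented route through the stack might look something like this. It is only a sketch: it overwrites BC, which may then need saving somewhere, and it costs 29 clocks where LD A, IXL takes 8:

        push ix         ; 15 clocks
        pop bc          ; 10 clocks, bc = ix
        ld a, c         ; 4 clocks, a = low 8 bits of ix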

So let's just use the undocumented opcode. Conveniently, sjasmplus supports those directly, so we don't need to type in cryptic strings of hex codes.

After the border check (which is identical to our player movement one) we check if there's empty space directly to the left, before checking the left-and-down position. If all these checks pass, we grab the tile we're moving, slap in the dirty bit and put it in its new place, clearing the original one.

dropdiagonal_right:
        ld a, ixl
        and 15
        cp 15
        jr z, physicsdone ; can't move, border

The diagonal right check is nearly identical, with three differences: the right border check requires an additional CP, as with player movement; we bail out to physicsdone, since there's no other place to go; and all the target offsets point to the right instead of the left.
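
Going by that description, the rest of the routine would look roughly like the following. It is sketched here by mirroring dropdiagonal_left rather than quoted verbatim, with +1 and +17 as the assumed offsets for the right and right-down tiles:

        ld a, (ix+1)
        cp 0
        jr nz, physicsdone ; can't move, stuff on the right
        ld a, (ix+17)
        cp 0
        jr nz, physicsdone ; can't move, stuff at target
        ld a, (ix)
        or 0x80            ; set the dirty bit on the moving tile
        ld (ix), 0x80      ; old position becomes dirty empty space
        ld (ix+17), a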

Things are coming together rather nicely. Soon we'll be needing more levels.

This chapter's version of the source is available here.

Size check! We're at 1797 bytes. No need to panic.

Any comments etc. can be emailed to me.