Let's again start off with a little optimization. I started to wonder why, exactly, is the map drawing loop (which scans through the map, skipping work where possible) taking so much more time than the physics loop (which scans through the map, skipping work where possible)?
The map drawing loop is pushing and popping a couple of registers regardless of whether we're calling drawtile. Let's not.
    maploop:
        ld a, (hl)
        bit 7, a
        jr z, skipdraw
        push hl
        push bc

        call drawtile
        ld a, 2
        out (0xfe), a
        pop bc
        pop hl
    skipdraw:
        dec hl
        dec b
        jp nz, maploop
The change simply moves the PUSHes after the JR Z and the POPs before skipdraw. To save a couple more clock cycles, the inner maploop jump also changed from JR (which takes 12 clocks when it jumps, which is most of the time here) to JP (10 clocks always). This is an example of size-versus-speed optimization: the JR to JP change traded 1 byte of RAM for 360 fewer clock cycles per frame. The PUSH/POP change saved 4032 clocks per frame for no storage cost.
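To see where the savings come from, here is the loop again with the documented Z80 clock counts in the comments. It's only a sketch: the border colour write and the code between the pushes and the call are left out, and the drawtile stub and the trailing RET are placeholders so that the fragment assembles on its own.

```asm
maploop:
    ld a, (hl)          ; 7
    bit 7, a            ; 8
    jr z, skipdraw      ; 12 when taken, 7 when not
    push hl             ; 11
    push bc             ; 11
    call drawtile       ; 17, plus whatever drawtile itself costs
    pop bc              ; 10
    pop hl              ; 10
skipdraw:
    dec hl              ; 6
    dec b               ; 4
    jp nz, maploop      ; 10 either way (jr nz would be 12 taken, 7 not)
    ret                 ; placeholder end of fragment

drawtile:               ; placeholder stub, not the real routine
    ret
```

A clean tile used to pay 11 + 11 + 10 + 10 = 42 clocks for the two push/pop pairs and now pays nothing, so the 4032 clocks quoted above is what 96 skipped tiles per frame would add up to. The JP saves 2 clocks on every jump that is taken.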
The result made everything so fast that we're idling in the top border again. Now, if we just had a register free we could replace the BIT instruction with CP against a register, which would take fewer clocks. CP against a constant takes nearly as long as the BIT does (7 clocks against 8), so that wouldn't buy us much.
But hey, we can shift..
    maploop:
        ld a, (hl)
        rla
        jr nc, skipdraw
        push hl
        push bc
        srl a
RLA rotates bits of register A to the left through the carry flag, and only takes 4 clocks, being one of the cheapest operations. We might have 0 or 1 in the carry before this instruction, so our bottom bit is garbage at this point, but that doesn't matter.
Later on, where we were using AND to mask off the top bit, we simply use SRL on the A register instead. It does a logical (unsigned) shift to the right, which moves our bits back to their proper place and also gets rid of the bottom garbage bit. It's a tiny improvement, but like Depeche Mode sings, everything counts in large amounts. Or on 8 bit platforms.
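A worked example may make the bit shuffling easier to follow. It assumes a tile byte of 0x86 (dirty bit set, tile number 6); the example_skip label is made up for this sketch.

```asm
    ld a, 0x86          ; 1000 0110: dirty bit set, tile number 6
    rla                 ; carry = 1 (the old bit 7), A = 0000 110c,
                        ; where c is whatever the carry held before
    jr nc, example_skip ; carry is set, so we fall through to the draw path
    srl a               ; A = 0000 0110 = 6, the garbage bit c is shifted out
example_skip:
    ret                 ; placeholder end of fragment
```

For a clean tile such as 0x06 the RLA leaves the carry clear, the JR NC is taken, and the mangled value in A is simply thrown away.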
That saved 768 clocks per frame. I think we're good for now.
One bug in the making is that if the player presses both up and right at the same time, they will move diagonally. We don't want to allow this for fear of generating odd bugs, so we'll add early outs to the player movement: once one direction is accepted, we don't consider the rest, like so:

        ld a, (movekey)
        bit 0, a
        jr z, notup
        ld bc, -16
        add hl, bc
        jp moved
The added JP after each direction does the job. The "moved" label is in exactly the same place as the "notright" label, but we'll use a new label for clarity.
We add the check for pushing the stone after all the (currently) legal moves:
        jr z, moveok        ; Open exit is fine
        cp 6
        jr z, movestone     ; Check if we can push a stone
        jr movedone
And then we get to the stone movement proper.
    movestone:
        ld a, (playerpos)
        sub l               ; playerpos - target
        ; Let's limit stone pushing to horizontal movement
        ; -1 = 11111111, 1 = 00000001, 16 = 00010000, -16 = 11110000
        ; checking bit 0 is enough
        bit 0, a
        jr z, movedone
We start off by calculating the delta of player's previous position and the desired target position. The result is either 1, -1, 16 or -16. We'll rule out the vertical movement, although we may remove this check later as pushing a deadly rock upwards sounds like fun. We get away with checking a single bit to see whether this is vertical or horizontal movement.
        ld bc, hl           ; back up target position
        ld l, a             ; l = 1/-1
        ld a, c             ; a = target position
        sub l               ; a -= 1/-1
        ld l, a             ; hl = other side of target position
Next, we need to calculate the position on the other side of the rock. This involves juggling registers due to us wanting to preserve the target position and the fact that SUB only works on the register A.
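To make the juggling concrete, here it is traced with made-up numbers: the player at offset 0x25 pushing right into a stone at 0x26. The immediate loads stand in for the real playerpos and target values, and LD BC, HL is written out as its two real instructions.

```asm
    ld hl, 0x0026       ; target offset: the stone the player walks into
    ld a, 0x25          ; stands in for ld a, (playerpos)
    sub l               ; A = 0x25 - 0x26 = 0xff, i.e. -1
    ld b, h
    ld c, l             ; BC = 0x0026, target backed up
    ld l, a             ; L = 0xff
    ld a, c             ; A = 0x26
    sub l               ; A = 0x26 - 0xff = 0x27 (wraps around in 8 bits)
    ld l, a             ; HL = 0x0027, the tile on the far side of the stone
    ret                 ; placeholder end of fragment
```

Pushing left works the same way: with the player at 0x27 the delta is 1, and 0x26 - 1 gives 0x25.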
        ld de, map
        add hl, de
        ld a, (hl)
        cp 0
        jr nz, movedone     ; other side wasn't empty
    canmovestone:
        ld (hl), 0x86       ; store stone in empty slot
        ld hl, bc           ; restore move target
After that we're in familiar territory again. We add the address of our map to the offset, fetch the tile and find out whether there's empty space on the other side of the rock. If not, we're done. Note that we don't need to restore HL because we won't be moving anywhere.
If we find the empty space, we'll fill it with the stone, restore HL and let the movement happen, which will overwrite the old position of the stone.
We want the gems and rocks to roll off each other diagonally if they can. If there's space on the left and left-down tiles, we'll move rocks and gems there. This means we'll be rewriting most of the physics function.
    physicsloop:
        ld a, (ix)
        cp 5
        jr z, physics_drop
        cp 6
        jr z, physics_drop

Off the bat, the rocks and gems now share the same physics code. From the physics point of view, the only real difference between rocks and gems is that moving stones will splat the player. We can (eventually) treat moving stones as a completely separate tile type, and thus gems and stones are identical here.

    physics_drop:
        ld a, (ix+16)
        cp 5
        jr z, dropdiagonal_left
        cp 6
        jr z, dropdiagonal_left
        cp 0
        jr nz, physicsdone
        ld a, (ix)
        or 0x80
        ld (ix), 0x80
        ld (ix+16), a
        jr physicsdone
If the tile below is rock or gem, we'll try to move diagonally. We'll be deterministic and always try left first, then right. It might be visually more pleasing to vary this, but gameplay wise it's better to be predictable.
If there's empty space below, we'll fetch whatever tile we were moving, OR the dirty bit on it, and store it in the new place, and mark the current tile as empty.
    dropdiagonal_left:
        ld a, ixl                   ; Note: undocumented instruction
        and 15
        jr z, dropdiagonal_right    ; Can't move, border
        ld a, (ix-1)
        cp 0
        jr nz, dropdiagonal_right   ; can't move, stuff on the left
        ld a, (ix+15)
        cp 0
        jr nz, dropdiagonal_right   ; can't move, stuff at target
        ld a, (ix)
        or 0x80
        ld (ix), 0x80
        ld (ix+15), a
        jr physicsdone

With the diagonal check we first have to see if we're on the border of the screen and disallow movement if we are. There's no official opcode for getting the low 8 bits of IX, but luckily there are plenty of undocumented opcodes that have been in use for so long that they're pretty safe to use. Here, we're using one: "LD A, IXL". If we didn't use this one, we'd have to get IX into some register we can access, such as BC, but there's no direct opcode for that either, so we'd have to go through the stack...
So let's just use the undocumented opcode. Conveniently, sjasmplus supports those directly, so we don't need to type in cryptic strings of hex codes.
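For comparison, the route that sticks to documented opcodes would look something like this. It's a sketch only, and it assumes BC is free to clobber at this point, which may well not be true in the real physics loop.

```asm
    push ix             ; 15 clocks
    pop bc              ; 10 clocks, BC = IX (the old BC is lost)
    ld a, c             ; 4 clocks, A = low byte of IX
    and 15              ; column within the 16-wide row, as before
    ret                 ; placeholder end of fragment
```

That is 29 clocks to get the low byte into A, against 8 for LD A, IXL.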
After the border check (which is identical to our player movement one) we check if there's empty space directly to the left, before checking the left-and-down position. If all these checks pass, we grab the tile we're moving, slap in the dirty bit and put it in its new place, clearing the original one.
    dropdiagonal_right:
        ld a, ixl
        and 15
        cp 15
        jr z, physicsdone   ; can't move, border

The diagonal right check is pretty much identical, with three differences: the right border check requires that additional CP, like with player movement; we bail out to physicsdone, as we know there's no other place to go; and all the target offsets move to the right instead of left.
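Going by that description, the whole routine would look roughly like the left-hand version with the offsets mirrored. This is a reconstruction from the text rather than a verbatim copy of the chapter's source, and the physicsdone stub is only there so the fragment assembles on its own.

```asm
dropdiagonal_right:
    ld a, ixl           ; undocumented instruction, as before
    and 15
    cp 15
    jr z, physicsdone   ; can't move, border
    ld a, (ix+1)
    cp 0
    jr nz, physicsdone  ; can't move, stuff on the right
    ld a, (ix+17)
    cp 0
    jr nz, physicsdone  ; can't move, stuff at target
    ld a, (ix)
    or 0x80             ; set the dirty bit on the tile we're moving
    ld (ix), 0x80       ; old position becomes empty (and dirty)
    ld (ix+17), a       ; one row down, one column to the right
physicsdone:            ; placeholder: the real label continues the loop
    ret
```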
Things are coming together rather nicely. Soon we'll be needing more levels.
This chapter's version of the source is available here.
Size check! We're at 1797 bytes. No need to panic.
Any comments etc. can be emailed to me.