Anyone familiar with the SPC700?

I'm writing an APU library for the SNES sound chip. Got it to reset and read back the init bytes, then upload a block of code, but that's it. I'm stuck.

Anyone familiar enough to lend a hand?

In case anyone has done some SNES hacking, here is specifically what happens:

I reset the SPC700, wait for 0xAA, 0xBB on port0/1. Check.

I send the destination RAM address to port2/3, 0x01 to port1, and 0xCC to port0, then wait for the 0xCC to be echoed back on port0. Check.
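For anyone following along, the two steps so far can be sketched in C roughly as below. This is only a sketch: it assumes the four APU ports are visible as four consecutive bytes behind a pointer, and the names apu_wait, apu_begin and the tries timeout are made up for illustration. If your ports are reached some other way (parallel port, emulator hooks), swap the array accesses for your own read/write calls.

```c
#include <stdint.h>

/* Assumption: ports[0..3] map to APU port0..port3. */

/* Poll one port until it reads `expected`; give up after `tries` polls.
   Returns 1 on success, 0 on timeout. */
static int apu_wait(volatile uint8_t *ports, int port,
                    uint8_t expected, long tries)
{
    while (tries-- > 0) {
        if (ports[port] == expected)
            return 1;
    }
    return 0;
}

/* Wait for the 0xAA/0xBB signature after reset, then announce a
   transfer to `dest` and wait for 0xCC to be echoed on port0. */
static int apu_begin(volatile uint8_t *ports, uint16_t dest, long tries)
{
    if (!apu_wait(ports, 0, 0xAA, tries) ||
        !apu_wait(ports, 1, 0xBB, tries))
        return 0;

    ports[2] = (uint8_t)(dest & 0xFF);  /* destination, low byte  */
    ports[3] = (uint8_t)(dest >> 8);    /* destination, high byte */
    ports[1] = 0x01;                    /* non-zero: data follows */
    ports[0] = 0xCC;                    /* start signal           */

    return apu_wait(ports, 0, 0xCC, tries);
}
```

A call like apu_begin(ports, 0xFFA0, 100000) would correspond to the two "Check" steps above; the timeout is there so a dead APU shows up as a 0 return instead of a hang.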

I have a 10-byte assembly program I wrote to increment a counter, echo it to port0, then jump back to the IPL ROM:

00: mov y, #00  ; 0x8d 0x00
loop:
02: mov $f4, y  ; 0xcc 0xf4
04: inc y       ; 0xfc
05: bne loop    ; 0xd0 0xfb
07: jmp !ipl    ; 0x5f 0xc9 0xff

I store this in a character array, then send it to 0xffa0. I wait for 0x09 on port0, which is the IPL ROM's byte counter echoing the index of the last byte. Check.

Finally, I write the address of my code (0xffa0) to port2/3 and 0x00 to port1, add 2 to the counter (0x0B -- and I've tried higher values for fun), and send that to port0. Then I wait for it to be echoed on port0. Check.
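The data transfer and the final "go" step, as described above, might look something like this in C. Same caveats as any sketch: the ports are assumed to sit behind a four-byte pointer, the function names are hypothetical, and it only handles blocks shorter than 254 bytes so the one-byte counter (and counter + 2) never wraps.

```c
#include <stddef.h>
#include <stdint.h>

/* Poll port0 until it echoes `value`; 1 on success, 0 on timeout. */
static int echoed(volatile uint8_t *ports, uint8_t value, long tries)
{
    while (tries-- > 0) {
        if (ports[0] == value)
            return 1;
    }
    return 0;
}

/* Send `len` bytes: each byte goes to port1, its index to port0,
   and the index must be echoed before the next byte is sent. */
static int apu_send_block(volatile uint8_t *ports, const uint8_t *data,
                          size_t len, long tries)
{
    if (len == 0 || len >= 254)
        return 0;                       /* outside this sketch's limits */
    for (size_t i = 0; i < len; i++) {
        ports[1] = data[i];
        ports[0] = (uint8_t)i;
        if (!echoed(ports, (uint8_t)i, tries))
            return 0;
    }
    return 1;
}

/* Ask for a jump to `entry`: address to port2/3, 0x00 to port1,
   then last index + 2 to port0, and wait for that to be echoed. */
static int apu_execute(volatile uint8_t *ports, uint16_t entry,
                       uint8_t last_index, long tries)
{
    uint8_t kick = (uint8_t)(last_index + 2);

    ports[2] = (uint8_t)(entry & 0xFF);
    ports[3] = (uint8_t)(entry >> 8);
    ports[1] = 0x00;                    /* zero: execute, no more data */
    ports[0] = kick;
    return echoed(ports, kick, tries);
}
```

For the 10-byte program the last index is 0x09, so apu_execute(ports, 0xFFA0, 0x09, tries) writes 0x0B to port0, matching the value mentioned above.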

At this point, the SPC should be executing the code I've just dropped at 0xffa0. But no, instead I just continue to read 0x0B on port0. To make things even more interesting, I tried sending more data -- without restarting the protocol, just writing another byte to port1 and incrementing port0. To my surprise, it worked. It really shouldn't have worked. The APU should have jumped out of the communications routine by then.

I'm starting to wonder if my APU is bad. The disassembly of the IPL ROM shows "cmp y, port0" followed by "bpl transfer", followed by the jmp to the address written to port2/3. As I read it, if the byte counter minus the value I write to port0 leaves the N flag set (bit 7 of the 8-bit result high, aka negative), the "bpl" is not taken and execution falls through to the jmp. 0x0A minus 0x0B is 0xFF, which has bit 7 set, so it should fail the "bpl" test and therefore jmp. But it either doesn't, or ends up branching back to the transfer loop somehow.
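The flag arithmetic is easy to sanity-check off the hardware. The snippet below is just a sketch of the N flag part of "cmp y, port0" as described above (bit 7 of the 8-bit difference); the function names are made up, and it models nothing else about the SPC700.

```c
#include <stdint.h>

/* N flag after "cmp y, port0": bit 7 of the 8-bit result y - port0. */
static int cmp_sets_n(uint8_t y, uint8_t port0)
{
    uint8_t diff = (uint8_t)(y - port0);
    return (diff & 0x80) != 0;
}

/* "bpl" branches only while N is clear, so the transfer loop is
   left (and the jmp reached) exactly when N is set. */
static int leaves_transfer_loop(uint8_t y, uint8_t port0)
{
    return cmp_sets_n(y, port0);
}
```

With y = 0x0A and port0 = 0x0B the difference is 0xFF, so leaves_transfer_loop returns 1, which agrees with the reading of the disassembly above.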

Never mind... I found it. There's a bug in the disassembly I was referencing to learn the opcodes.

mov $f4, y is actually 0xcb 0xf4. The 0xcc opcode is a three-byte instruction that takes a 16-bit address as its destination, so the inc y opcode (0xfc) became the high byte of that address.
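For reference, here is the program as a C array with the 0xcb opcode, plus a small check that shows how the 0xcc version swallows the inc y. The length table covers only the six opcodes mentioned in this thread and is not a general SPC700 decoder; counter_program, insn_len and count_insns are names invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* The counter program using the direct-page store (0xCB). */
static const uint8_t counter_program[] = {
    0x8D, 0x00,        /* 00: mov y, #$00          */
    0xCB, 0xF4,        /* 02: mov $f4, y   (loop)  */
    0xFC,              /* 04: inc y                */
    0xD0, 0xFB,        /* 05: bne loop  (-5)       */
    0x5F, 0xC9, 0xFF   /* 07: jmp !$ffc9 (IPL ROM) */
};

/* Instruction length for the opcodes seen here; 0 means "unknown". */
static size_t insn_len(uint8_t opcode)
{
    switch (opcode) {
    case 0xFC:                       /* inc y           */
        return 1;
    case 0x8D:                       /* mov y, #imm     */
    case 0xCB:                       /* mov dp, y       */
    case 0xD0:                       /* bne rel         */
        return 2;
    case 0xCC:                       /* mov !abs, y     */
    case 0x5F:                       /* jmp !abs        */
        return 3;
    default:
        return 0;
    }
}

/* Walk the bytes and count how many whole instructions they decode to. */
static size_t count_insns(const uint8_t *code, size_t len)
{
    size_t pc = 0, n = 0;
    while (pc < len) {
        size_t step = insn_len(code[pc]);
        if (step == 0 || pc + step > len)
            break;
        pc += step;
        n++;
    }
    return n;
}
```

Run over the same ten bytes with 0xCC in place of 0xCB, count_insns finds four instructions instead of five: the store becomes mov !$fcf4, y and the inc y never executes.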