You are only waiting 20 µs, which is not long for the slave to do all that work. The real budget is even smaller: you are timing from the start of the transmission, but the slave doesn't get its interrupt until the end of the transmission, so the time it takes to transmit a byte comes off the 20 µs. The 3-4 µs interrupt latency comes off it as well.
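To put rough numbers on that, here is a small sketch of the arithmetic. The SPI clock isn't given, so the 4 MHz and 1 MHz figures below are only assumptions for illustration, and the function names are made up for this example. The 20 µs delay and the 4 µs worst-case latency are the figures from above.

```c
/* Time to shift one 8-bit byte over SPI, in microseconds.
   spi_clock_hz is an assumed value; plug in your actual SPI clock. */
double byte_time_us(double spi_clock_hz) {
    return 8.0 * 1e6 / spi_clock_hz;
}

/* Time the slave really has left to do its work, in microseconds:
   the master's delay, minus the byte transfer time (the slave's
   interrupt only fires at the end of the byte), minus the
   interrupt latency. */
double slave_budget_us(double delay_us, double spi_clock_hz,
                       double irq_latency_us) {
    return delay_us - byte_time_us(spi_clock_hz) - irq_latency_us;
}
```

With those assumed numbers, `slave_budget_us(20.0, 4e6, 4.0)` comes to 14 µs, and at a 1 MHz SPI clock it drops to 8 µs, so a slow clock can eat most of the delay.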
I would try increasing that delay, even to 1 ms, just to rule it out as a factor.
Rob