Now that I look at the graphic again, it looks like it is 1 µs per division. So a byte is clocked out in 2 µs, is that right? That's too fast to monitor; I had to slow down my master somewhat when testing that code. And it isn't the code's fault.
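To put numbers on that, here is a rough sketch of the arithmetic. It assumes 8 bits per byte with no gaps between bits, and a 16 MHz CPU clock on the receiving end. Neither is stated above, so adjust them to match your setup. The function names are just for illustration.

```python
# Rough timing arithmetic for a byte clocked out in 2 us.
# Assumptions (not from the post above): 8 bits per byte with no gaps
# between bits, and a hypothetical 16 MHz CPU clock on the receiver.

BYTE_TIME_S = 2e-6      # one byte per 2 us, as read off the graphic
BITS_PER_BYTE = 8
F_CPU_HZ = 16_000_000   # assumed clock speed, adjust for your board


def bit_clock_hz(byte_time_s, bits=BITS_PER_BYTE):
    """Bit clock implied by the time taken to shift out one byte."""
    return bits / byte_time_s


def cycles_per_byte(byte_time_s, f_cpu_hz=F_CPU_HZ):
    """CPU cycles available to deal with each byte at the given clock speed."""
    return byte_time_s * f_cpu_hz


if __name__ == "__main__":
    print(f"Bit clock: {bit_clock_hz(BYTE_TIME_S) / 1e6:.1f} MHz")
    print(f"CPU cycles per byte: {cycles_per_byte(BYTE_TIME_S):.0f}")
```

Under those assumptions, that works out to a bit clock of about 4 MHz and roughly 32 CPU cycles per byte, which leaves very little room to do anything with each byte as it arrives.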
Unless you can slow down the AVR (perhaps you can), you might need faster hardware. For example, an FPGA board might be able to capture it fast enough, in bursts.