
Topic: How quickly I can change pin states on Arduino?

westfw

Quote
I don't see what this is to do with the shift registers

Load the inputs of 8 (or 6) parallel shift registers with one port write, then toggle the clock with the next write.
That gives 6x the throughput of sending a single bit at a time.
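
A minimal sketch of the idea above, assuming six shift registers whose data inputs sit on bits 0-5 of one port and which share a clock line. The name `pack_bit_slice` and the pin layout are hypothetical, not from the post. On an AVR the returned byte would be written to the data port (e.g. `PORTD = slice;`) and the shared clock pulsed with the next write.

```c
#include <stdint.h>

#define NUM_REGISTERS 6  /* assumption: six shift registers on port bits 0..5 */

/* Build the byte for one clock tick: bit k of the result is bit `bit`
 * of data[k], so a single port write feeds all registers at once. */
uint8_t pack_bit_slice(const uint8_t data[NUM_REGISTERS], uint8_t bit)
{
    uint8_t slice = 0;
    for (uint8_t k = 0; k < NUM_REGISTERS; k++) {
        if (data[k] & (uint8_t)(1u << bit)) {
            slice |= (uint8_t)(1u << k);
        }
    }
    return slice;
}
```

Packing at run time costs far more cycles than the port write itself, so in practice the slices would probably be prepared ahead of time and only streamed out in the fast loop.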

Nick Gammon

In my post above about sending data to VGA I managed to get a byte out in 6 cycles:

Code: [Select]
while (i--)
    PORTD = * messagePtr++;


Generated code:

Code: [Select]
  while (i--)
    PORTD = * messagePtr++;
(2) 194: 89 91        ld r24, Y+
(1) 196: 8b b9        out 0x0b, r24 ; 11
(1) 198: 91 50        subi r25, 0x01 ; 1
(2) 19a: e0 f7        brcc .-8      ; 0x194

-------
6 cycles in loop = 375 ns at 16 MHz
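
The arithmetic behind that figure, as a small sketch. It assumes a 16 MHz CPU clock, which is what 6 cycles = 375 ns implies (62.5 ns per cycle). `CPU_HZ` and `cycles_to_ps` are illustrative names only.

```c
#include <stdint.h>

/* Assumption: 16 MHz CPU clock, implied by 6 cycles = 375 ns. */
#define CPU_HZ 16000000UL

/* Duration of `cycles` CPU cycles in picoseconds. Integer picoseconds
 * keep the 62.5 ns cycle time exact (62500 ps). */
uint32_t cycles_to_ps(uint32_t cycles)
{
    return cycles * (1000000000UL / (CPU_HZ / 1000UL));
}
```

With these assumptions, `cycles_to_ps(6)` gives 375000 ps, i.e. the 375 ns quoted for the loop.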


If you unrolled the loop I suppose you could get a byte out in 3 cycles (you wouldn't need to subtract 1 from i, nor do a branch).
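
A rough sketch of what the unrolled version could look like for a fixed block of eight bytes. `send8_unrolled` is a hypothetical name, and the `fake_portd` stand-in is only there so the snippet also compiles off-target; on an AVR, `PORTD` comes from `<avr/io.h>`. Whether each statement really compiles to just the `ld` / `out` pair depends on the compiler and its flags, so check the disassembly.

```c
#include <stdint.h>

/* On an AVR, PORTD comes from <avr/io.h>. This stand-in (an assumption
 * for illustration) lets the sketch compile on a host as well. */
#ifndef PORTD
static volatile uint8_t fake_portd;
#define PORTD fake_portd
#endif

/* Write exactly eight bytes with no loop counter and no branch. */
void send8_unrolled(const uint8_t *p)
{
    PORTD = *p++;
    PORTD = *p++;
    PORTD = *p++;
    PORTD = *p++;
    PORTD = *p++;
    PORTD = *p++;
    PORTD = *p++;
    PORTD = *p++;
}
```

The trade-off is code size: every byte sent costs one more copy of the statement in flash.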
Please post technical questions on the forum, not by personal message. Thanks!

More info:
http://www.gammon.com.au/electronics

TanHadron

If you unrolled the loop and put the data in immediate mode, you could get two cycles:

Code: [Select]

  PORTD = 0x24;
  PORTD = 0x35;
...

(1)  ldi   r24, 0x24
(1)  out  0x0b, r24
(1)  ldi   r24, 0x35
(1)  out  0x0b, r24
...

JarkkoL

#18
Jun 22, 2013, 05:54 am Last Edit: Jun 22, 2013, 06:00 am by JarkkoL Reason: 1

Quote
Code: [Select]
  while (i--)
    PORTD = * messagePtr++;
(2) 194: 89 91        ld r24, Y+
(1) 196: 8b b9        out 0x0b, r24 ; 11
(1) 198: 91 50        subi r25, 0x01 ; 1
(2) 19a: e0 f7        brcc .-8      ; 0x194

-------
6 cycles in loop = 375 ns at 16 MHz


Nice to see gcc optimizes the loop so well that there's no need for inline asm. I didn't know there was an instruction that does a load with post-increment. For shift registers you need to add two out instructions to clock the data into the registers, so the loop comes to 8 cycles. However, since I might need to route the data through several microcontrollers that redirect it to shift registers, maybe it's possible to optimize the clocking (e.g. pass data on both the rising and falling clock edges).
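
Roughly what I mean by the 8-cycle loop, as a sketch. It assumes the data goes out on `PORTD`, the shift clock is the only thing on `PORTB` (bit 0), and the registers latch on the rising edge. `stream_clocked`, `CLK_HIGH`, `CLK_LOW` and the `fake_*` stand-ins are made-up names for illustration. The two extra writes only cost one cycle each if the compiler keeps both clock values in registers.

```c
#include <stdint.h>

/* On an AVR, PORTB and PORTD come from <avr/io.h>. These stand-ins
 * (assumptions for illustration) let the sketch compile on a host. */
#ifndef PORTD
static volatile uint8_t fake_portd;
#define PORTD fake_portd
#endif
#ifndef PORTB
static volatile uint8_t fake_portb;
#define PORTB fake_portb
#endif

#define CLK_HIGH 0x01  /* assumption: shift clock on PORTB bit 0 */
#define CLK_LOW  0x00  /* whole-port write: nothing else may use PORTB */

/* Stream n bytes, pulsing the shift clock once per byte. */
void stream_clocked(const uint8_t *p, uint8_t n)
{
    while (n--) {
        PORTD = *p++;      /* present the data byte */
        PORTB = CLK_HIGH;  /* rising edge: registers take the data */
        PORTB = CLK_LOW;   /* return the clock low for the next byte */
    }
}
```

At 8 cycles per byte that would be 500 ns at 16 MHz. Clocking on both edges would drop one of the two clock writes per byte.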


Quote
If you unrolled the loop and put the data in immediate mode, you could get two cycles:

I need this data to be read from memory because it's supposed to be streamed in via USB or something.

Nick Gammon


Quote
Nice to see gcc optimizes the loop so well that there's no need for inline asm. I didn't know there was an instruction that does a load with post-increment.


This is one of the reasons I recommend against using asm unless you absolutely have to (which is practically never).

The compiler generates good code, and unless you are very, very familiar with the underlying hardware (as the compiler-writers happen to be) you may choose sub-optimal ways of solving the problem.

By all means disassemble the output and see what is generated. That can give hints about ways of optimizing (for example) how you store data in arrays. But ultimately you practically never need to out-guess the compiler.