How quickly can I change pin states on Arduino?

JarkkoL · June 21, 2013, 4:38am

Hi,

I'm thinking of implementing a project which would require quite a lot of data to be transferred to shift registers, and I'm wondering what performance limitations I can expect from the Arduino so I can plan the project properly. Instead of using digitalWrite(), which sets the state of pins individually, my understanding is that I can write the state directly to the PORTA/B/C/D registers, which sets the state of all 8 pins in a port at once. If I understood correctly, a memory write on the ATmega takes 2 cycles, so at 16MHz that would mean I can change the state of 8 pins at 8MHz, is that right?

Thanks, Jarkko

Grumpy_Mike · June 21, 2013, 4:51am

Well, you can write at that rate, but you also have to do other things, like fetch the data, and that will slow you down.
For transferring data to a shift register, look at using the SPI interface. I think that will run at 1MHz.

graynomad · June 21, 2013, 5:06am

change the state of 8 pins at 8MHz, is that right?

Only if the writes are inline; if you have a loop you have to add the JMP time.

Also, I would assume you can use the IN and OUT instructions; they execute in 1 cycle.

But you are getting data from somewhere and having to shift the bits etc., and that will slow things down again.

Depending on the nature of the transfers you can organise the data in RAM such that all the clocking and data information is decoded into an array, then blat that array as fast as possible to a port.
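
A minimal host-side sketch of the pre-decoding idea above. It assumes a single shift register whose data line sits on bit 0 and whose clock sits on bit 1 of the port; those pin assignments and the names `DATA_BIT`, `CLOCK_BIT` and `predecode` are hypothetical, not from the thread. Each data bit becomes two port values, so the output loop only has to copy bytes to the port.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pin assignment within one 8-bit port.
constexpr uint8_t DATA_BIT  = 0x01;  // serial data line (assumed bit 0)
constexpr uint8_t CLOCK_BIT = 0x02;  // shift clock line (assumed bit 1)

// Expand each data byte (MSB first) into two port values per bit:
// first the data level with the clock low, then the same level with
// the clock high, so the rising edge shifts the bit in.
std::vector<uint8_t> predecode(const std::vector<uint8_t>& bytes) {
    std::vector<uint8_t> out;
    out.reserve(bytes.size() * 16);
    for (uint8_t b : bytes) {
        for (int bit = 7; bit >= 0; --bit) {
            uint8_t level = ((b >> bit) & 1u) ? DATA_BIT : uint8_t{0};
            out.push_back(level);                                    // data set, clock low
            out.push_back(static_cast<uint8_t>(level | CLOCK_BIT));  // clock high
        }
    }
    return out;
}
```

The trade-off is RAM: every data byte expands to 16 port values, which adds up quickly on a chip with 2 KB of SRAM.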

Is there any reason for not using the SPI hardware?

Rob

CrossRoads · June 21, 2013, 5:22am

SPI is quickest - 8 MHz clock, can send out a byte about every 1 uS if you do it right.
This test spat out 41 bytes in 46uS I think, 58uS for the loop, pulling data from an array to send out.

simpletestR2.ino (6.13 KB)

nickgammon · June 21, 2013, 5:23am

JarkkoL:
I'm thinking of implementing a project which would require quite a lot of data to be transferred to shift registers

Read this post about clocking out pixel data to a VGA monitor:

I don't think you can get much faster than that. Using SPI (a shift register, effectively) it clocks out one bit every two clock cycles (ie. every 125 nS), however there is a one clock cycle gap between bytes.

In the 17 clock cycles that the hardware is clocking out that byte (8 bits) you just have time to load up the next byte from the array in memory where you have them waiting.

westfw · June 21, 2013, 6:37am

there is a one clock cycle gap between bytes.

I've heard that you can eliminate this by using "the usart in SPI mode", on those devices that have the more broadly "universal" USART.
(I think that includes the 328. It's definitely getting attention on the Xmega chips, which have many USARTs AND DMA; people over on avrfreaks are doing "interesting stuff.")

nickgammon · June 21, 2013, 8:02am

Interesting point. I don't believe my testing confirmed that, but I'm willing to try again. It was as if (strangely) the hardware needed one extra clock cycle to "recover" from sending that byte.

nickgammon · June 21, 2013, 8:03am

DMA would be another kettle of fish, of course. I don't recall seeing reference to it on the Atmega chip lines.

system · June 21, 2013, 1:27pm

JarkkoL:
require quite a lot of data to be transferred to shift registers

How much data are you talking about here? Where is it coming from?

My limited experience with shift registers is as serial in / parallel out devices i.e. one shift register per pin. Are you writing to multiple shift registers simultaneously?

CrossRoads · June 21, 2013, 2:25pm

Did we lose the OP?

graynomad · June 21, 2013, 4:21pm

Looks like it.

Rob

westfw · June 21, 2013, 7:03pm

Back to the original subject.

I can change the state of 8 pins at 8MHz

Theoretically. Note that the ATmega328 on the Uno only has one IO port (PORTD) with a full 8 pins, and those pins include the serial rx/tx pins.
If you want to maintain serial connectivity as well, you end up with several possibilities for writing 6 pins at 8MHz...

CrossRoads · June 21, 2013, 7:24pm

I suppose if you had:

A:
// keep Rx/Tx (bits 1:0) as they are, drive an alternating pattern on bits 7:2
PORTD = (PORTD & B00000011) | B10101000;
PORTD = (PORTD & B00000011) | B01010100;
goto A;

you could flip all but the Rx/Tx bits pretty quick.
Hard to do much else tho. Anything like pulling the data from an array would slow it down:
// clears all but Rx/Tx lets bits 7:2 be set
PORTD = (PORTD & B00000011) | (dataArray[0] & B11111100); // set the 6 bits per whats in the array
PORTD = (PORTD & B00000011) | (dataArray[1] & B11111100); // set the 6 bits per whats in the array
PORTD = (PORTD & B00000011) | (dataArray[2] & B11111100); // set the 6 bits per whats in the array
PORTD = (PORTD & B00000011) | (dataArray[3] & B11111100); // set the 6 bits per whats in the array
PORTD = (PORTD & B00000011) | (dataArray[4] & B11111100); // set the 6 bits per whats in the array
PORTD = (PORTD & B00000011) | (dataArray[5] & B11111100); // set the 6 bits per whats in the array
PORTD = (PORTD & B00000011) | (dataArray[6] & B11111100); // set the 6 bits per whats in the array
PORTD = (PORTD & B00000011) | (dataArray[7] & B11111100); // set the 6 bits per whats in the array

Don't do this in a for loop, that adds 12uS with each jump to the next for value.
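
The masking used in those lines can be checked on a PC with a plain function. This is a sketch only: `mergeKeepSerial` and `SERIAL_MASK` are hypothetical names, and the function models the expression rather than touching any real port.

```cpp
#include <cstdint>

constexpr uint8_t SERIAL_MASK = 0x03;  // PD0 (Rx) and PD1 (Tx)

// Combine the current port value with new data so that bits 1:0 are
// preserved and bits 7:2 are taken from `data`.
constexpr uint8_t mergeKeepSerial(uint8_t current, uint8_t data) {
    return static_cast<uint8_t>((current & SERIAL_MASK) | (data & ~SERIAL_MASK));
}
```

For example, `mergeKeepSerial(0x02, 0xA8)` gives `0xAA`: the Tx bit survives and the upper six bits follow the data. On the real chip this read-modify-write costs more than a single `out`, which is part of why it cannot reach the 8MHz figure.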

nickgammon · June 21, 2013, 11:40pm

westfw:
Theoretically. Note that the ATmega328 on the Uno only has one IO port (PORTD) with a full 8 pins, and those pins include the serial rx/tx pins.
If you want to maintain serial connectivity as well, you end up with several possibilities for writing 6 pins at 8MHz...

In any case I don't see what this is to do with the shift registers the OP mentioned.

JarkkoL · June 21, 2013, 11:55pm

Thanks for the replies and sorry for the delay!

Graynomad:
Is there any reason for not using the SPI hardware?

I didn't know about SPI, and thanks for pointing it out. If I got it right, there's a single SPI on the Uno, and it's able to push data at 8Mbits/s. I believe the transfer() call to SPI is asynchronous, so that data loads & looping can be done in parallel on the CPU while SPI feeds the data to the pins?

PeterH:
How much data are you talking about here? Where is it coming from? My limited experience with shift registers is as serial in / parallel out devices i.e. one shift register per pin. Are you writing to multiple shift registers simultaneously?

I'm looking to control ~300 RGB LEDs, where each RGB channel takes 8-bit PWM data. This is a rotating LED cylinder with a radius of ~15cm, and I'm looking for ~5mm resolution, so the cylindrical "display" size is 28200 pixels. I would like to have a refresh rate of 20fps, so that would require 28200 * 20 * 3 * 8 bits/sec = ~13.5Mbits/sec of data pushed from the main board. Of course the specs are adjustable to whatever is feasible to implement, but the higher the better. While I would like to push the data directly to the shift registers, I think I need additional microcontrollers in between, because the cylinder probably needs to rotate faster than the actual update rate to maintain a steady image (e.g. 100fps).
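
The bandwidth estimate is pixels times frame rate times colour channels times bits per channel. A small sketch of that arithmetic follows, with the 8 Mbit/s SPI figure quoted earlier in the thread used for comparison; the names are illustrative only.

```cpp
#include <cstdint>

constexpr uint32_t PIXELS = 28200;         // cylindrical display size
constexpr uint32_t FPS = 20;               // target refresh rate
constexpr uint32_t CHANNELS = 3;           // R, G, B
constexpr uint32_t BITS_PER_CHANNEL = 8;   // 8-bit PWM value per channel

// Raw payload rate the main board has to push out, in bits per second.
constexpr uint32_t requiredBitsPerSec() {
    return PIXELS * FPS * CHANNELS * BITS_PER_CHANNEL;
}

// Nominal SPI rate mentioned in the thread (8 MHz clock).
constexpr uint32_t SPI_BITS_PER_SEC = 8000000;

// True if a single SPI link at the nominal rate could carry the stream.
constexpr bool fitsSingleSpi() {
    return requiredBitsPerSec() <= SPI_BITS_PER_SEC;
}
```

With these numbers the requirement comes to 13,536,000 bits/sec, so a single 8 Mbit/s SPI link would not be enough at the stated resolution and frame rate.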

westfw:
Theoretically. Note that the ATmega328 on the Uno only has one IO port (PORTD) with a full 8 pins, and those pins include the serial rx/tx pins.

I checked that it should take 8 cycles to update 8 bits of data from an array to PORTD. If I unroll the loop a few times, I could shave off a couple of cycles for a higher transfer rate:

read data from memory to a register (2 cycles)
out the register to port D (1 cycle)
out 1 to clock pin (1 cycle)
out 0 to clock pin (1 cycle)
increment data address counter (1 cycle)
compare and loop (2 cycles)
So at 16MHz, 8 parallel bits every 8 cycles should give 16Mbits/sec transfer speed.
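
The cycle budget in that list can be tallied mechanically. This is a sketch of the estimate only, using the cycle counts as given in the post (they are not measured), with hypothetical names throughout.

```cpp
// One entry per step of the loop, with the cycle count quoted in the post.
struct Step {
    const char* what;
    int cycles;
};

const Step LOOP_STEPS[] = {
    {"load byte from memory",  2},
    {"out byte to PORTD",      1},
    {"out clock pin high",     1},
    {"out clock pin low",      1},
    {"increment address",      1},
    {"compare and branch",     2},
};

// Total cycles for one pass through the loop.
int cyclesPerIteration() {
    int total = 0;
    for (const Step& s : LOOP_STEPS) total += s.cycles;
    return total;
}

// Aggregate bits/sec when each pass presents `bitsPerIteration` bits in parallel.
double bitsPerSec(double cpuHz, int bitsPerIteration) {
    return cpuHz * bitsPerIteration / cyclesPerIteration();
}
```

Calling `bitsPerSec(16e6, 8)` reproduces the 16Mbits/sec figure; passing 6 instead of 8 gives the rate when only six PORTD pins are usable.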

Jarkko

westfw · June 22, 2013, 12:00am

I don't see what this is to do with the shift registers

Load the input of 8 (6) parallel shift registers with one write, fiddle the clock with the next write.
6x the throughput of single-bit-at-a-time...
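
Driving several shift registers in parallel means the data has to be rearranged so that each port write carries one bit for every register. A rough sketch of that rearrangement, assuming register r has its data input wired to port bit r and that bits go out MSB first (both are assumptions, as is the name `transpose`):

```cpp
#include <array>
#include <cstdint>

// regBytes[r] is the next byte destined for shift register r.
// Returns 8 port values: portVals[k] holds bit (7 - k) of every
// register's byte, with register r's bit placed at port bit r.
std::array<uint8_t, 8> transpose(const std::array<uint8_t, 8>& regBytes) {
    std::array<uint8_t, 8> portVals{};
    for (int k = 0; k < 8; ++k) {
        const int bit = 7 - k;  // MSB first
        uint8_t v = 0;
        for (int r = 0; r < 8; ++r) {
            if ((regBytes[r] >> bit) & 1u) {
                v |= static_cast<uint8_t>(1u << r);
            }
        }
        portVals[k] = v;
    }
    return portVals;
}
```

The rearranging itself costs CPU time, so it pays off most when the data can be prepared ahead of the output loop.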

nickgammon · June 22, 2013, 12:34am

In my post above about sending data to VGA I managed to get a byte out in 6 cycles:

while (i--)
    PORTD = * messagePtr++;

Generated code:

  while (i--)
    PORTD = * messagePtr++;
(2) 194:	89 91       	ld	r24, Y+
(1) 196:	8b b9       	out	0x0b, r24	; 11
(1) 198:	91 50       	subi	r25, 0x01	; 1
(2) 19a:	e0 f7       	brcc	.-8      	; 0x194 

-------
6 cycles in loop = 375 nS

If you unrolled the loop I suppose you could get a byte out in 3 cycles (you wouldn't need to subtract 1 from i, nor do a branch).

TanHadron · June 22, 2013, 12:52am

If you unrolled the loop and put the data in immediate mode, you could get two cycles:

  PORTD = 0x24;
  PORTD = 0x35;
...

(1)  ldi   r24, 0x24
(1)  out  0x0b, r24
(1)  ldi   r24, 0x35
(1)  out  0x0b, r24
...

JarkkoL · June 22, 2013, 3:54am

  while (i--)
    PORTD = * messagePtr++;
(2) 194:	89 91       	ld	r24, Y+
(1) 196:	8b b9       	out	0x0b, r24	; 11
(1) 198:	91 50       	subi	r25, 0x01	; 1
(2) 19a:	e0 f7       	brcc	.-8      	; 0x194 

-------
6 cycles in loop = 375 nS

Nice to see gcc optimizes the loop so well and there's no need for inline asm. I didn't know there was an instruction that does both the load and the post-increment. For shift registers you need to add two out instructions there to clock the data into the register, so it comes to 8 cycles. However, since I might need to route the data through several microcontrollers which redirect it to shift registers, maybe it's possible to optimize the clock ticking (e.g. pass data on both the rising and falling clock edges).

TanHadron:
If you unrolled the loop and put the data in immediate mode, you could get two cycles:

I need this data to be read from memory because it's supposed to be streamed in via USB or something.

nickgammon · June 22, 2013, 4:38am

JarkkoL:
Nice to see gcc optimizes the loop so well and there's no need for inline asm. I didn't know there was an instruction that does both the load and the post-increment.

This is one of the reasons I recommend against using asm unless you absolutely have to (which is practically never).

The compiler generates good code, and unless you are very, very familiar with the underlying hardware (as the compiler-writers happen to be) you may choose sub-optimal ways of solving the problem.

By all means disassemble and see what is generated. That can give hints about ways of optimizing (for example) how you store data in arrays. But ultimately you practically never need to out-guess the compiler.
