Maximum pin toggle speed

OK, I'm totally new to Arduino, and I don't have a debugger, compiler or any other tool for the moment, so I just play with my Macintosh, an ATmega1280 board and the official Arduino 0015 software downloaded from the Arduino website.

As you may have seen in the other thread, I'm working on very fast PWM for my power electronics inverters...

The poor man's method I use to measure real CPU cycles is as follows:

#define NOP __asm__("nop\n\t")

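int N = 0;          // number of loop iterations, changed with '+' and '-'
// long N=0;
int val;            // last character read from the serial monitor
unsigned long time, time1, time2;

void setup()
{
  Serial.begin(9600);
}

void loop() {
  if (Serial.available()) {
    val = Serial.read();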
    if (val == '+') {
      N += 1000;
    } 
    if (val == '-') {
      N -= 1000;
      if (N<0) N = 0;
    }
  }
  
  time1 = micros();
  
  for (int i=0; i < N; i++){
//    __asm__("nop\n\t");
    PORTD |= B1000;
    PORTD &= B11110111;
//   if (UCSR0A & _BV(RXC0)) {
//    }
  }
  
  time2 = micros();

  Serial.print("N: ");
  time=time2-time1;
  Serial.print(time);
  Serial.print(" / ");
  Serial.println(N);
  delay(1000);
}

So with a big N, the result converges to the average time per iteration of whatever set of instructions I have inside for(i=0;i<N;i++). For example, a printed output of 444/1000 means 444 µs for 1000 iterations, i.e. 444 ns per iteration, or roughly 7 cycles at 16MHz.
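The conversion from the printed micros() total to cycles per iteration can be written as a small helper. This is a host-side sketch assuming a known CPU clock (16MHz here); `cycles_per_iteration` is a hypothetical name, not part of the Arduino core.

```c
/* Convert a micros() total measured over n loop iterations into CPU
 * cycles per iteration, for a CPU clock of f_cpu_hz.
 * Hypothetical helper, for host-side arithmetic only. */
double cycles_per_iteration(unsigned long total_us, unsigned long n,
                            unsigned long f_cpu_hz)
{
    double ns_per_iter = (double)total_us * 1000.0 / (double)n;
    return ns_per_iter * (double)f_cpu_hz / 1e9;
}
```

For example, cycles_per_iteration(444, 1000, 16000000UL) is about 7.1 cycles. Keep in mind that micros() only has a 4 µs resolution at 16MHz, which is why a large N is needed.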

This is how I observed that a NOP really was 1 cycle, that PORTD |= B1000 (setting bit 3 of PORTD) was 1 cycle, ..., and that the overhead of managing for(i=0;i<N;i++) itself was 6 cycles with an int i and 10 cycles with a long i.

About my project, here is what I'm doing to generate high-speed PWM:

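#define NOP __asm__("nop\n\t")   // one-cycle no-op

// Loop counter and delay parameters, updated from the serial monitor.
// The initial values below are placeholders, not values from this post.
int i;
int charge_on = 10, charge_off = 10, extract_on = 10, extract_off = 10;

void setup()
{
  Serial.begin(9600);
  DDRH |= B10000;   // PH4 (Mega pin 7) as output for opto-coupler #1
  DDRA |= B1;       // PA0 (Mega pin 22) as output for opto-coupler #2
}

void loop()
{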
 cli();  // turn off interrupts
 while (true) {
// Turns ON coil charging opto-coupler #1
    PORTH |= B10000;
    for(i=0;i<charge_on;i++) NOP;

// Turns OFF coil charging opto-coupler #1
    PORTH &= B11101111;
    for(i=0;i<charge_off;i++) NOP;
  
// Turns ON coil FE extracting opto-coupler #2
    PORTA |= B1;
    for(i=0;i<extract_on;i++) NOP;

// Turns OFF coil FE extracting opto-coupler #2
    PORTA &= B11111110;
    for(i=0;i<extract_off;i++) NOP;

    if (UCSR0A & _BV(RXC0)) { // check uart  (register name changes per port)
      break;  // looks like there is data.  Break out of loop to handle it
    }
  } // end of time critical loop
  sei();  // interrupts back on
  delay(10); // wait for some characters to arrive
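  while (Serial.available()) {
// Macintosh serial monitor parameter management to update the 4 loop values
// (this code must consume each character with Serial.read(), otherwise this loop never exits)
  }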
}

This is why I need to know the while(true) overhead: to compensate for it when computing the charge_on, charge_off, extract_on and extract_off values, so as to get a precise duty cycle. Note that I have already compensated by including 1 cycle for each PORTx write, 2 cycles for the USB/serial RX check if (UCSR0A & _BV(RXC0)), and 6+1 cycles per iteration of each local for(i=0;i<...) NOP loop.
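The compensation described above amounts to a cycle budget for one pass of the loop. Here is a host-side sketch of that bookkeeping; the struct and function names are hypothetical, and every per-construct cost is a parameter rather than a fixed claim, because those costs depend on the compiler version and on the loop-counter type.

```c
/* Cycle costs of each construct in the time-critical loop. All values are
 * inputs: measure them or read them from the avr-objdump listing. */
struct cycle_costs {
    unsigned port_write;  /* one PORTx |= or &= statement           */
    unsigned loop_setup;  /* entering one for(...) NOP loop          */
    unsigned loop_iter;   /* one pass of for(...) NOP, NOP included  */
    unsigned rx_check;    /* if (UCSR0A & _BV(RXC0)) with no data    */
    unsigned loop_back;   /* jump back to the top of while(true)     */
};

/* Total cycles for one pass through the four-phase while(true) loop. */
unsigned long frame_cycles(const struct cycle_costs *c,
                           unsigned charge_on, unsigned charge_off,
                           unsigned extract_on, unsigned extract_off)
{
    unsigned long passes = (unsigned long)charge_on + charge_off
                         + extract_on + extract_off;
    return 4UL * c->port_write     /* two set + two clear writes */
         + 4UL * c->loop_setup     /* four delay loops           */
         + passes * c->loop_iter
         + c->rx_check + c->loop_back;
}

/* Approximate high time of opto-coupler #1: the charge_on delay loop
 * plus the write that clears the pin (where the edge falls inside a
 * multi-cycle instruction is ignored). */
unsigned long charge_high_cycles(const struct cycle_costs *c,
                                 unsigned charge_on)
{
    return c->loop_setup + (unsigned long)charge_on * c->loop_iter
         + c->port_write;
}
```

Dividing the clock frequency by frame_cycles() gives the frame rate, and charge_high_cycles() / frame_cycles() gives the duty cycle of opto-coupler #1 under the chosen costs.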

I don't have a debugger, compiler or any other tool for the moment

Sure you do; some of them are hiding inside the arduino distribution. Off in /hardware/tools/avr/bin/*, you'll find a bunch of the standard gcc tools:

BillW-MacOSX-2<1011> ls /Applications/arduino/arduino-0014/hardware/tools/avr/bin/
avarice*      avr-gcc*      avr-gprof*      avr-project*      ice-insight*
avr-addr2line*      avr-gcc-3.4.6*      avr-help*      avr-ranlib*      kill-avarice*
avr-ar*            avr-gcc-4.3.2*      avr-info*      avr-readelf*      libusb-config*
avr-as*            avr-gcc-select*      avr-ld*            avr-size*      make*
avr-c++*      avr-gccbug*      avr-man*      avr-strings*      simulavr*
avr-c++filt*      avr-gcov*      avr-nm*            avr-strip*      simulavr-disp*
avr-cpp*      avr-gdb*      avr-objcopy*      avrdude*      simulavr-vcd*
avr-g++*      avr-gdbtui*      avr-objdump*      ice-gdb*      start-avarice*

I'm not entirely sure which ones work without special hardware support (JTAG ICE, etc.) that isn't on the Arduino, but I make pretty extensive use of avr-size and avr-objdump (which does disassembly). After you download a sketch to your Arduino, you'll have an "applet" subdirectory in the sketch directory that contains standard-format binaries and other build files:

BillW-MacOSX-2<1012>  pwd
/Users/billw/Documents/Arduino/-test-/Blink_double/applet
BillW-MacOSX-2<1013> ls
Blink_double.cpp      Print.cpp.o            wiring_analog.c.o
Blink_double.cpp.o      WInterrupts.c.o            wiring_digital.c.o
Blink_double.eep      WMath.cpp.o            wiring_pulse.c.o
Blink_double.elf*      core.a                  wiring_shift.c.o
Blink_double.hex      pins_arduino.c.o
HardwareSerial.cpp.o      wiring.c.o

I tried your test code with N=1000 on my duemilanove (and other numbers with similar results):

  time1 = micros();
  
  for (int i=0; i < N; i++){
    PORTD |= B1000;
    PORTD &= B11110111;
  }
  
  time2 = micros();

As shown, I get "628/1000" or 10.048 cycles per loop.
If I remove the first PORTD line, I get "504/1000" or 8.064 cycles per loop, which sure looks like 2 cycles for the bit set to me...

Oops, you're entirely right :slight_smile:

I had benchmarked with only:

  for (int i=0; i < N; i++){
    PORTD = B1000;
    PORTD = B11110111;
  }

Many thanks, you helped me find a bug :wink:

So, about my code, and in particular after the last PORT write (extract_off): how many cycles will the while(true) use?

Note that I TDMA-frame my pulses, so I'm not using while() to toggle; instead I tune the parameters with cycle-offset corrections to get the proper timing.

It's worth noting that if 2 pins belong to the same port, we can set or clear both outputs with a single port write, so they change in the same cycle.

Again many thanks for all your support on this thread & the other one about scaning USB activity in 2 cycles.

I've just used:

        avr-objdump -S mysketch.elf > mysketch.lss

in the applet directory of my sketch to produce a nice assembly listing. (The '>' output redirection works the same way in the Windows command prompt and in a Unix shell such as the Mac OS X Terminal.)
Selfonlypath, I think you'll find it useful to pick through the .lss file. It should save you having to time everything to find your best solution. I have some sympathy with you doing cycle-counting - I've been cycle-counting video code recently to ensure jitter-free video output.

Re: using PIND to toggle bit-states.
It doesn't leave the cleanest of aftertastes. I think this method has 2 downsides: a) it's the opposite of self-documenting code (self-obfuscating?), and b) it's not available on all Arduino hardware (e.g. the ATmega8). Hopefully the very smart people who develop avr-gcc will work out a way of optimising PORTD ^= 0b00001000 to the PIND equivalent one day (but I can see that being rather tricky to do).
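On parts that support it (such as the ATmega168/328/1280), writing a 1 to a bit of PINx toggles the matching PORTx bit, so on the board PIND = _BV(PD3); has the same effect as PORTD ^= 0b00001000 in a single write. Since that needs real hardware, the sketch below is only a host-side model of the documented behaviour; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Host-side model of one AVR I/O port (no real hardware access). */
struct avr_port {
    uint8_t port;   /* the PORTx output latch */
};

/* Models "PINx = mask;" on parts with pin toggling: each 1 bit in mask
 * flips the matching PORTx bit, each 0 bit leaves it unchanged. The
 * resulting latch value is what "PORTx ^= mask;" would produce. */
void write_pin_register(struct avr_port *p, uint8_t mask)
{
    p->port ^= mask;
}
```

The point of the hardware version is that the toggle happens in one write instead of a read-modify-write sequence, which is why it is attractive for cycle counting despite the readability cost mentioned above.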

hello westfw and condemned,

OK, from my Mac I can see and navigate to /Applications/arduino-0015/hardware/tools/avr/bin/avr-objdump, but how do I run avr-objdump -S on my sketch.elf, which is stored in another directory?

About max toggling: does using goto instead of while(true) produce the same assembly code and the same overhead / duty-cycle time on each pass?

  while (true) {
    PORTD |= B1000;
    PORTD &= B11110111;
  }

// versus:

m:  PORTD |= B1000;
    PORTD &= B11110111;
    goto m;

Caution ahead: GOTO usage. ;D

Lefty

The "while (1)" compiles to a single jmp instruction at the end of loop (2 cycles.) You can't do any better than that on an AVR.

The FOR loop looks pretty optimal for a counted loop as well. The AVR doesn't have any of those fancy "decrement and loop if non-zero" instructions, so the choice of looping code is pretty limited. It looks like you can save ONE cycle in an int-based for loop by counting down to zero instead of up to N, since counting up has to load a non-zero compare value into a register to do the "double precision" (16-bit) compare.
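The two loop shapes can be compared side by side. On a host this only shows that both forms run the body n times; any one-cycle saving is visible only in the avr-objdump listing, where the count-down exit test is against zero and needs no loaded constant. The function names are hypothetical.

```c
/* Count-up form: the exit test compares i against n on every pass. */
unsigned count_up(unsigned n)
{
    unsigned passes = 0;
    for (unsigned i = 0; i < n; i++)
        passes++;               /* stands in for the loop body (e.g. NOP) */
    return passes;
}

/* Count-down form: the exit test compares against zero instead. */
unsigned count_down(unsigned n)
{
    unsigned passes = 0;
    for (unsigned i = n; i != 0; i--)
        passes++;
    return passes;
}
```

Use the count-down form only when the body does not need i counting upwards; otherwise the saving is lost to extra arithmetic in the body.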

Have you looked at the video output code some people are doing (http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1240539968 - "teleMate shield")? While the end application is quite different than yours, the low level coding issues - coming up with a high-accuracy and high-speed bit stream on an AVR pin - are very similar!

Hey westfw,

I'm still having difficulties running AVR MacPack:
http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1245125011

So can you confirm this math model of how many extra cycles are used (overhead, surroundings, ...)? For the moment I can only inspect by benchmarking or cycle-counting, as I explained yesterday :cry:

Q1: Does while(true) use only 2 cycles?

Q2: Does for(i=0;i<n;i++) use 2 cycles to initialize, then 6 cycles per pass to either jump back or leave the loop once completed?

Q3: About your suggestion of a decrementing loop, can you confirm that initialization will also take 2 cycles (starting from i=0 might not take the same number of cycles as starting from a non-zero i)?

Sorry for all these questions, but I feel you understand what I'm trying to do: have a precise math model of every cycle and every instruction to generate high-speed multi-channel PWM, which requires reverse-engineering to set up the parameters of for(), while(), ...

Q1: Does while(true) use only 2 cycles?

Yes.

Q2: Does for(i=0;i<n;i++) use 2 cycles to initialize, then 6 cycles per pass to either jump back or leave the loop once completed?

Hmm. I haven't been paying much attention to initialization. It looks like it has additional dependencies on whether n is constant or variable, and on its exact value if it's a constant? I still think you should use delay_loop_1() for your inner timing loops (or just strings of NOPs, depending on just how fast you need to be). (An example of delay_loop_1() is posted in the other thread.) Each loop computation and jump is 7 cycles (for a 16-bit loop variable) EXCEPT for the last one, which drops out the bottom (6 cycles).
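Those per-pass figures give a formula for the loop-control overhead of n passes: 7 cycles for each increment/compare/branch that jumps back and 6 for the last one that falls through, i.e. 7n - 1 cycles for n >= 1. Here is a host-side sketch with initialization deliberately left out, since it depends on whether n is a constant; the function name is hypothetical and the figures apply only to the compiler version measured here.

```c
/* Loop-control overhead in cycles for n >= 1 passes of a for loop with a
 * 16-bit counter, using the measured figures quoted above: 7 cycles per
 * increment/compare/branch that jumps back, 6 for the final one that
 * falls through. Initialization is NOT included. */
unsigned long for_loop_overhead(unsigned long n)
{
    if (n == 0)
        return 0;   /* not modeled: depends on how the compiler enters the loop */
    return 7UL * (n - 1) + 6UL;
}
```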

Q3: About your suggestion of a decrementing loop, can you confirm that initialization will also take 2 cycles (starting from i=0 might not take the same number of cycles as starting from a non-zero i)?

Good point. I think initialization is the same for zero or non-zero, since there is no carry involved. The end-of-loop test for zero is shorter because there is a "known zero" register to compare against, but for a non-zero 16-bit constant it has to load a register (or at any rate, the sample code DID load a register) to do the equivalent of "compare with borrow". (Hmm. Why didn't it move that register load outside the actual loop, eh? Perhaps because Arduino compilation gives the compiler the "optimize for size" switch and they're the same size?)

I feel you understand what I'm trying to do: have a precise math model of every cycle and every instruction to generate high-speed multi-channel PWM, which requires reverse-engineering to set up the parameters of for(), while(), ...

Yes, but you SHOULDN'T count on the compiler producing the same code from version to version. That's why there are those inline asm functions like delay_loop_1(), designed to look like C code but actually using a CONSTANT assembler structure for this sort of timing. If you really need things accurate down to single cycles, you should bite the bullet and write pure assembler, very carefully. If you need things down to (say) +/- 4 cycles, I'd feel pretty confident using the delay macros inside C constructs. If you can withstand +/- 10 cycles, you can probably get away with pure C as long as you pay attention each time the compiler changes...

Many many many thanks westfw.

I've tried and benchmarked delay_loop_1 and delay_loop_2 per your explanation in the other thread. They work as you predicted: 3N cycles for delay_loop_1 and 4N cycles for delay_loop_2.

I don't know if it matters, but it seems delay_loop_1(0) and delay_loop_2(0) don't work: they give an extremely long delay. That is OK for my application, which forces a non-zero value so as not to blow up my MOSFET.
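The long delay for a count of 0 is expected: avr-libc documents that _delay_loop_1() (8-bit counter) and _delay_loop_2() (16-bit counter) in util/delay_basic.h treat 0 as 256 and 65536 iterations respectively. Here is a host-side sketch of the resulting cycle counts, using the 3 and 4 cycles per iteration measured above and ignoring call and setup overhead; the function names are hypothetical.

```c
#include <stdint.h>

/* Approximate cycles spent in _delay_loop_1(count): 3 per iteration.
 * The 8-bit counter wraps, so count == 0 means 256 iterations. */
unsigned long delay_loop_1_cycles(uint8_t count)
{
    unsigned long iterations = (count == 0) ? 256UL : count;
    return 3UL * iterations;
}

/* Approximate cycles spent in _delay_loop_2(count): 4 per iteration.
 * The 16-bit counter wraps, so count == 0 means 65536 iterations. */
unsigned long delay_loop_2_cycles(uint16_t count)
{
    unsigned long iterations = (count == 0) ? 65536UL : count;
    return 4UL * iterations;
}
```

So a zero count in the PWM loop would hold an output on for 768 or 262144 cycles instead of none, which is why forcing a non-zero value matters.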

It is really interesting to note that for(i=0;i<n;i++) NOP gives an average of 7 cycles per iteration if i is UNSIGNED, but 8 cycles per iteration if i is SIGNED, as you mentioned.

Anyway, delay_loop_x(n) gives the same running time whether n is unsigned or signed, and I really save a lot of cycles, so I'll be able to get higher PWM precision than with for(i=0;i<n;i++) NOP.

Amicalement, Albert

Code:
cli();
while (1) {
PORTD |= 0x8;
PORTD &= ~0x8;
}

on the same board runs at 2.667MHz. (This does produce the minimal sbi/cbi/rjmp loop that you'd expect, BTW.)
(so that's about a 20x penalty for the arduino library code; sounds about right: the overhead of abstracting IO to "pin number" is pretty substantial: a subroutine call, lookup table to get the port, another lookup table to get the bit, a third to check whether analogWrite is in use, and then less efficient instructions to access the port "indirectly")
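The 2.667MHz figure follows from the instruction timings: on these AVRs sbi and cbi take 2 cycles each and rjmp takes 2, so one pass of the loop, which is one full output period, is 6 cycles, and 16MHz/6 is about 2.667MHz. Here is a host-side sketch of that arithmetic (hypothetical helper name):

```c
/* Output frequency of a software toggle loop in which one pass through
 * the loop produces one full period on the pin. */
double toggle_freq_hz(double f_cpu_hz, unsigned cycles_per_pass)
{
    return f_cpu_hz / (double)cycles_per_pass;
}
```

For the sbi/cbi/rjmp loop, toggle_freq_hz(16e6, 6) gives about 2.667MHz; any extra instruction in the loop lowers the frequency accordingly.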

I don't know if this is of any interest, but I've lately figured out how to generate fast PWM and phase-correct PWM using the 8-bit and 16-bit timers, thus freeing the CPU for my project.

I could be wrong, but if you set a timer in fast PWM mode with TOP = OCRnA = 2 and OCRnB = 1, you can reach a maximum frequency of 16MHz/(1+TOP), i.e. 5.333MHz. There also seems to be a specific auto-toggle mode in the timer that provides a 50% PWM at 8MHz.
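These frequencies follow the datasheet formulas for the timer: f = f_clk/(N*(1+TOP)) in fast PWM mode, and f = f_clk/(2*N*(1+OCR1A)) when OC1A toggles on each compare match, with N the prescaler. Here is a host-side sketch of that arithmetic; the function names are hypothetical.

```c
/* prescale is the clock-select divider N (1, 8, 64, 256 or 1024). */

/* Fast PWM with TOP = OCRnA: one PWM period per TOP+1 timer counts. */
double fast_pwm_hz(double f_clk_hz, unsigned prescale, unsigned top)
{
    return f_clk_hz / ((double)prescale * (1.0 + top));
}

/* OCnA in toggle mode: the pin flips once per timer period,
 * so a full output period takes two timer periods. */
double toggle_oc_hz(double f_clk_hz, unsigned prescale, unsigned top)
{
    return f_clk_hz / (2.0 * prescale * (1.0 + top));
}
```

With a 16MHz clock and no prescaler, fast_pwm_hz() gives 5.333MHz for TOP = 2 and 2.667MHz for TOP = 5, and toggle_oc_hz() gives 8MHz for OCR1A = 0.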

The advantage is ultra-fast PWM and a freed-up CPU.

The disadvantage is the limited number of timers on the Duemilanove (a few more on the Mega), so if you need fast toggling on many pins, the software method quoted above is the best.

If you have a sample sketch, I'll be happy to throw the output into my scope and measure it to make sure!

Here is the sketch for a Mega board or a Duemilanove (I don't know which you have, but I have both); it should generate 5.333MHz on pin 12 of the Mega:

#include <util/delay_basic.h>

int outputPsuB = 12;  // Timer1-B

void setup()
{
// outputs via timer1
  pinMode(outputPsuB, OUTPUT);  // select Pin as ch-B

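  TCCR1A = B00100011; // COM1B = 10 (non-inverting PWM on OC1B), WGM11:10 = 11
  TCCR1B = B11001;    // WGM13:12 = 11 (mode 15: fast PWM, TOP = OCR1A), CS1 = 001 (no prescaler)
  OCR1A = 2;          // TOP = 2: 16MHz/(1+2) = 5.333 MHz
  OCR1B = 1;          // duty = (OCR1B+1)/(TOP+1) = 2/3; exactly 50% is not possible with a 3-count period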
}

void loop()
{
// do what ever you want with full 100% CPU
}

If it works on your scope, you might then try OCR1A=5: you should get your initial case of 2.667MHz, and you can choose different pulse widths via OCR1B values from 0 to OCR1A!

This other sketch should give 8MHz toggling on pin 11 (Mega), but only with a fixed 50% pulse width:

#include <util/delay_basic.h>

int outputPsuA = 11;  // Timer1-A

void setup()
{
// outputs via timer1
  pinMode(outputPsuA, OUTPUT);  // select Pin as ch-A

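  TCCR1A = B01000011; // COM1A = 01 (toggle OC1A on compare match), WGM11:10 = 11
  TCCR1B = B11001;    // WGM13:12 = 11 (mode 15, TOP = OCR1A), CS1 = 001 (no prescaler)
  OCR1A = 0;          // TOP = 0: OC1A toggles every clock, 16MHz/2 = 8 MHz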
}

void loop()
{
// do what ever you want with full 100% CPU
}

There are many other possibilities if you have a Mega board.

Using digitalWrite() on a non-PWM pin (4 instead of 3) runs about 148.4kHz instead of 106.8kHz:

I recently wrote a speed optimized digitalWrite() for Teensyduino. It runs that loop at 223 kHz on non-PWM pins and 195 kHz on PWM pins.

If anyone's interested in porting it back to the Arduino core, it's available and open source. Just run the installer, then look for it in pins_teensy.c inside the teensy_serial or teensy_hid directories.


Ah. The timer output to arduino pin mapping is different on Mega vs Diecimila; the signal shows up on pin 10 of the Diecimila. Once I corrected for that I did indeed get 5.33MHz on pin 10. Not a square wave, though; high for twice as long as it is low.


Glad it worked on your side. :wink:

Yes, the pins are a bit different on the Diecimila/Duemilanove and on the Mega, even though the core sketch I gave you works on both.

I suggest you try my second sketch to see the ultimate case of 8 MHz, on pin 9 of the Diecimila instead of pin 11 of the Mega.

Here is a very useful document from Arduino forum member mem: http://spreadsheets.google.com/pub?key=rtHw_R6eVL140KS9_G8GPkA&gid=0
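The Timer1 output pins discussed here can be kept in a small lookup table. This is a host-side sketch; the entries are the Timer1 compare-output pins (OC1A, OC1B) of the ATmega168/328 boards and of the ATmega1280 Mega, and the struct, table and function names are hypothetical.

```c
#include <string.h>

/* Arduino pin numbers of the Timer1 compare outputs per board. */
struct timer1_pins {
    const char *board;
    int oc1a;   /* toggle-mode output in the 8MHz sketch */
    int oc1b;   /* PWM output in the 5.333MHz sketch     */
};

static const struct timer1_pins TIMER1_PINS[] = {
    { "Diecimila/Duemilanove", 9, 10 },
    { "Mega",                 11, 12 },
};

/* Returns the entry for a board name, or NULL if the board is unknown. */
const struct timer1_pins *find_timer1_pins(const char *board)
{
    for (size_t k = 0; k < sizeof TIMER1_PINS / sizeof TIMER1_PINS[0]; k++)
        if (strcmp(TIMER1_PINS[k].board, board) == 0)
            return &TIMER1_PINS[k];
    return NULL;
}
```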

Not a square wave, though; high for twice as long as it is low.

I'm sure you know this already, but to avoid confusion: the timers really do run at the correct frequencies; what happens is that the Arduino and/or ATmega output drivers are not fast enough to generate sharp pulse edges. You might want to add an external high-speed driver IC; then you'll get a nice waveform on your scope.

The first example really does generate a non-square wave. It ought to be obvious: 5.33 = 16/3, so it's a 3-cycle waveform. Lacking any sort of clock multiplier, that's going to be two cycles in one state and one in the other state... (or, you're counting from 0 to 2 and flipping the bit at 1, so depending on which edges are involved, we have one state for 0,1 and the other for 2, or one state for 0, and the other for 1,2...)
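That reasoning can be written as a formula, assuming the non-inverting fast PWM mode of the first sketch: the OC1B pin is set at BOTTOM and cleared on the compare match, so it is high for OCR1B+1 of every TOP+1 counts. Here is a host-side sketch (hypothetical function name):

```c
/* Fraction of the period that OCnB is high in non-inverting fast PWM
 * (COMnB = 10): set at BOTTOM, cleared on compare match, so high for
 * OCRnB + 1 of the TOP + 1 counts. */
double fast_pwm_duty(unsigned ocr_b, unsigned top)
{
    return (ocr_b + 1.0) / (top + 1.0);
}
```

With TOP = 2 and OCR1B = 1 this gives 2/3, matching the "high for twice as long as it is low" trace; an exact 50% needs an even number of counts, e.g. OCR1A = 5 with OCR1B = 2 at 2.667MHz.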

The second example did generate an 8MHz square wave (on pin 10 of the Diecimila, I think; I didn't check both 9 and 10). Interpret the code as "count from 0 to 0 and toggle the output each time you finish", i.e. toggle every cycle, so output = clock/2.

The timer-generated output signals are quite a bit less interesting than the software-controlled ones, IMO. Although people seem to have used such things to help drive LCD displays in a jitter-free fashion... Hmm.

About the first example: since the code I gave you uses fast PWM, which reaches higher frequencies but is less jitter-stable, you might then want to try:

  • PWM, phase and frequency correct
  • PWM, phase correct

For fast PWM with OCR1A = TOP = 2, the timer1 frequency = 16MHz/(1+TOP) = 16MHz/3 = 5.333MHz.

For phase-correct and phase-and-frequency-correct PWM, the timer1 frequency is 16MHz/(2*TOP). Since your initial software-controlled code was producing 2.667MHz, you could choose OCR1A = TOP = 2 to get 4MHz, and set OCR1B = 1 to get a 50% PWM. The phase (and frequency) correct modes do not provide the maximum frequency, but they do provide very low jitter thanks to their up-down counting construction.
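The phase-correct figures can be checked the same way: the counter runs up to TOP and back down, so a period is 2*TOP counts, and a non-inverting output (cleared on the up-count match, set on the down-count match) is high for 2*OCR1B of them. Here is a host-side sketch with hypothetical function names; top must be at least 1.

```c
/* Phase-correct PWM with TOP = OCRnA: up to TOP and back down,
 * so one period is 2*TOP timer counts. Requires top >= 1. */
double phase_correct_hz(double f_clk_hz, unsigned prescale, unsigned top)
{
    return f_clk_hz / (2.0 * prescale * top);
}

/* Non-inverting output: high while the counter is below OCRnB,
 * i.e. 2*OCRnB of every 2*TOP counts. */
double phase_correct_duty(unsigned ocr_b, unsigned top)
{
    return (double)ocr_b / (double)top;
}
```

With TOP = 2 and OCR1B = 1 at 16MHz, this gives 4MHz at exactly 50%, unlike the 2/3 duty of fast PWM with the same register values.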

Anyway, we're clearly pushing the limits of the ATmega's output toggling frequency :wink:

In any case, you need good, fast external drivers, otherwise signal distortion will occur.

Could you post the code? Please?

The basic code I am using, from the Fade example, has a base frequency of about 500Hz. To simplify and reduce the size of a lot of the inductors it would be nice to work at 50kHz.
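As a sketch of the arithmetic only, assuming Timer1 in the fast PWM mode with TOP = OCR1A used in the first sketch above (so the PWM comes out on OC1B), the TOP value for a target frequency is f_clk/(N*f) - 1; for 50kHz at 16MHz with no prescaler that is 319. The function name is hypothetical.

```c
/* TOP (OCR1A) for a fast-PWM target frequency:
 *   f = f_clk / (N * (1 + TOP))  =>  TOP = f_clk / (N * f) - 1
 * Returns -1 if the result is outside the 16-bit Timer1 range or below
 * 1 (a PWM waveform needs at least two counts per period). The 8-bit
 * timers would need TOP <= 255 instead. */
long fast_pwm_top(double f_clk_hz, unsigned prescale, double f_target_hz)
{
    double top = f_clk_hz / ((double)prescale * f_target_hz) - 1.0;
    if (top < 1.0 || top > 65535.0)
        return -1;
    return (long)(top + 0.5);   /* nearest integer TOP */
}
```

At TOP = 319 the duty cycle is then set by OCR1B in 320 steps.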

bconley at circuitsvilleeng dot com