High frequency PWM

Hello,

I need very high-frequency PWM, with two PWM outputs chained in time, to drive high-power MOSFETs in power-electronics inverters.

#define NOP __asm__("nop\n\t")

int outputPin1 = 22; // PORTA - bit 0 (PA0)
int outputPin2 = 7; // PORTH - bit 4 (PH4)
int i, charge_on, charge_off, extract_on, extract_off;
int duty_charge, duty_blank, duty_extract;
long freq;

void setup()
{
  Serial.begin(9600);
  pinMode(outputPin1, OUTPUT);
  pinMode(outputPin2, OUTPUT);
  freq=23000; // Hz
  duty_charge=55; // %
  duty_blank=5; // %
  duty_extract=35; // %
  frame();
}

void loop()
{
// Turns ON coil charging opto-coupler #1
  PORTH |= B10000;
  for(i=0;i<charge_on;i++) NOP;

// Turns OFF coil charging opto-coupler #1
  PORTH &= B11101111;
  for(i=0;i<charge_off;i++) NOP;
  
// Turns ON coil FE extracting opto-coupler #2
  PORTA |= B1;
  for(i=0;i<extract_on;i++) NOP;

// Turns OFF coil FE extracting opto-coupler #2
  PORTA &= B11111110;
  for(i=0;i<extract_off;i++) NOP;
}

void frame()
{
float fperiod, fduty_charge, fduty_blank, fduty_extract;
int period;

  fperiod=2272727.0 / ((float) freq);
  period=(int) (fperiod+0.5);
  fduty_charge=fperiod*((float) duty_charge)/100.0;
  fduty_blank=fperiod*((float) duty_blank)/100.0;
  fduty_extract=fperiod*((float) duty_extract)/100.0;
  charge_on=(int) (fduty_charge+0.5);
  charge_off=(int) (fduty_blank+0.5);
  extract_on=(int) (fduty_extract+0.5);
  extract_off=period-charge_on-charge_off-extract_on;
}

Once the frequency gets high and the duty cycle low, the timing is no longer precise because of the overhead of for(i=0;i<N;i++).

For large values of N, I measure an average of 699 ns per iteration of the NOP loop if i is a long, or 444 ns per iteration if i is an int, which is very far from the theoretical 62.5 ns of a single NOP.

When N is small, the time per iteration gets much higher because of the fixed overhead of setting up for(i=0;i<…).
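As a quick sanity check on the figures above, the measured times can be converted into clock cycles. This is a minimal host-side sketch (plain C++, not Arduino code); the function names are made up for illustration, and it assumes the 16 MHz clock implied by the 62.5 ns figure.

```cpp
// Duration of one CPU clock cycle in nanoseconds.
double cycle_ns(double f_cpu_hz) {
    return 1e9 / f_cpu_hz;
}

// Convert a measured time per loop iteration (in ns) into CPU cycles.
double ns_to_cycles(double measured_ns, double f_cpu_hz) {
    return measured_ns / cycle_ns(f_cpu_hz);
}
```

At 16 MHz, 444 ns works out to roughly 7.1 cycles per iteration with an int counter and 699 ns to roughly 11.2 cycles with a long counter, so only one of those cycles is the NOP itself.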

Can you teach me or show me much faster code (I guess in assembler, but I'm new to AVR)? What I need is to control from software, through a variable, how many NOPs are executed.

Thank you, Albert

Yes, writing critical timing code gets pretty complex. Your NOP takes 1/16 microsecond, but the surrounding for loop is significantly more substantial and uses slower instructions, not to mention being subject to C's weird ideas about how to do intermediate math and to changes in the way the compiler optimizes things in general. You should look at the "precise" cycle-accurate delays in the include/util/delay_basic.h file:

_delay_loop_1(uint8_t N)     // apparently runs for 3*N cycles plus setup overhead
_delay_loop_2(uint16_t M)    // apparently runs for 4*M cycles (plus setup overhead)

delay.h from the same directory is probably not directly useful to you, since its macros are designed to take floating point CONSTANTS as arguments, but it uses the macros from delay_basic.h at its core, and you might get some hints about converting arbitrary numbers of microseconds into appropriate arguments for the more primitive macros...

Some of the more complex issues were discussed in this thread on max pin toggle speed: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1230286016

Where do I find delay_basic.h ?

How can I compile and link _delay_loop_1(N) successfully in my Arduino loop() ?

I've done some benchmarking with my board, and it seems for(i=0;i<n;i++) costs about 6 cycles of overhead per iteration if i and n are int; with a NOP inside, that would be 7 cycles. But maybe my measuring method is not so good!

Do you have any idea on how much overhead _delay_loop_1(N) would use ?

/hardware/tools/avr/avr/include/util/delay_basic.h

It’s part of the standard avrgcc support, so you can just put
#include <util/delay_basic.h>
at the top of your sketch.

They’re implemented as inline functions written in assembler, and I’m not sure what the overhead of getting into them is. The loops themselves are minimal:

delay_loop_1: dec reg
    brne delay_loop_1

delay_loop_2: sbiw reg, 1
    brne delay_loop_2
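To see where the "3 per iteration" and "4 per iteration" figures come from, the two loops can be modelled on a PC. This is only a sketch with invented function names; it assumes the classic (non-XMEGA) AVR timings of 1 cycle for dec, 2 cycles for sbiw, and 2 cycles for a taken brne versus 1 for the final, untaken one.

```cpp
#include <cstdint>

// Cycles spent inside "dec / brne" for n passes (valid for n >= 1 only;
// a count of 0 is a special case on the real hardware and is not modelled).
uint32_t delay_loop_1_cycles(uint32_t n) {
    if (n == 0) return 0;
    return n * 1 + (n - 1) * 2 + 1;   // n decs, n-1 taken branches, 1 untaken = 3n - 1
}

// Same model for "sbiw / brne", where sbiw costs 2 cycles.
uint32_t delay_loop_2_cycles(uint32_t n) {
    if (n == 0) return 0;
    return n * 2 + (n - 1) * 2 + 1;   // = 4n - 1
}
```

If the compiler loads the counter register with a single one-cycle instruction, that cycle fills the gap left by the last branch and the totals become exactly 3n and 4n.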

If you don't mind, could you just write the code, from the sketch's point of view, to call say _delay_loop_1(N) with N being an int variable in my sketch?

Same question if calling delay_loop_2(N)

Thank you in advance

Like this? (The code would look the same with delay_loop_2, it’d just be slightly slower.)
It’s supposed to make a nice square wave, 17 cycles on, 17 cycles off. (hmm. I suppose I should have aimed for 16 cycles each on and off, a nice 500kHz square wave… Oh well.) (note that the missing cycle on the last jump that isn’t taken is nicely filled in by the initialization of the asm loop variable.)

#include <util/delay_basic.h>
#define NOP __asm__("nop\n\t")
/*
 * The overhead for a "for" loop (with 16 bit int) in avrgcc is 7 cycles.
 * Two for a 16bit add, three for a 16bit compare (against a constant > 255), and 2 for the jump
 *
 * for an 8bit for loop, there is one cycle for add, one for compare, and two for jump
 */
#define FOR_LOOP_OVERHEAD16 7
#define FOR_LOOP_OVERHEAD8 4

void setup()
{
  Serial.begin(9600);
}

void loop() {

  for (byte i=0; i < 100; i++){
    PORTD |= B1000;       // 2 cycles
    _delay_loop_1(5);     // 15 cycles
    PORTD &= B11110111;   // 2 cycles
    _delay_loop_1(3);     // 9 cycles
    NOP;                // 2 cycles of nops
    NOP;
  }                       // 4 cycles of "for" overhead

  delay(1000);
}
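The cycle comments in the sketch above can be added up to check the resulting frequency. The following is a small host-side calculation, not part of the sketch; the constant and function names are illustrative, and the 16 MHz clock is an assumption based on the 1/16 microsecond NOP mentioned earlier.

```cpp
// Cycle budget of one pass through the "for" body, using the per-line
// counts given in the sketch's own comments.
const int kHighCycles = 15 + 2;          // _delay_loop_1(5), then the 2-cycle write that ends the high phase
const int kLowCycles  = 9 + 2 + 4 + 2;   // _delay_loop_1(3), two NOPs, "for" overhead, next 2-cycle write

// Output frequency in Hz for a given CPU clock and period in cycles.
double square_wave_hz(double f_cpu_hz, int period_cycles) {
    return f_cpu_hz / period_cycles;
}
```

With 17 + 17 = 34 cycles per period this gives about 470.6 kHz at 16 MHz; a 16 + 16 cycle period would give exactly 500 kHz.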

It disassembles as:

 c0:   35 e0           ldi     r19, 0x05       ; 5  (loading of constants into registers)
  c2:   23 e0           ldi     r18, 0x03       ; 3
  for (byte i=0; i < 100; i++){
    PORTD |= B1000;       // 2 cycles
  c4:   5b 9a           sbi     0x0b, 3 ; 11
  c6:   83 2f           mov     r24, r19
  c8:   8a 95           dec     r24
  ca:   f1 f7           brne    .-4             ; 0xc8 <loop+0xa>
    _delay_loop_1(5);     // 15 cycles
    PORTD &= B11110111;   // 2 cycles
  cc:   5b 98           cbi     0x0b, 3 ; 11
  ce:   82 2f           mov     r24, r18
  d0:   8a 95           dec     r24
  d2:   f1 f7           brne    .-4             ; 0xd0 <loop+0x12>
    _delay_loop_1(3);     // 9 cycles
    NOP;                // 2 cycles of nops
  d4:   00 00           nop
    NOP;
  d6:   00 00           nop
  d8:   9f 5f           subi    r25, 0xFF       ; 255
  da:   94 36           cpi     r25, 0x64       ; 100
  dc:   99 f7           brne    .-26            ; 0xc4 <loop+0x6>
 }                       // 4 cycles of "for" overhead

OK, I've been able to disassemble my sketch.elf, focusing in particular on loop(), while(), _delay_loop_2, …, using avr-objdump on my Macintosh.

Here is my sketch

void loop()
{
  cli(); // turn off interrupts

  while (true) { // start critical loop
// Turns ON coil charging opto-coupler #1 
    PORTH |= B10000;
    _delay_loop_2(charge_on);

// Turns OFF coil charging opto-coupler #1 
    PORTH &= B11101111; 
    _delay_loop_2(charge_off);
  
// Turns ON coil FE extracting opto-coupler #2 
    PORTE |= B1000;
    _delay_loop_2(extract_on);

// Turns OFF coil FE extracting opto-coupler #2 
    PORTE &= B11110111;
    _delay_loop_2(extract_off);

// check usb port and if there is data.  Break out of loop to handle it
   if (UCSR0A & _BV(RXC0)) break; 
  } // end of time critical loop

  sei();  // interrupts back on

// gets new parameter from USB Serial Monitor connected to Macintosh then update on board
  change_param();
}

Now the extract of the disassembly:

void loop()
{
  cli(); // turn off interrupts
     73c:      f8 94             cli

  while (true) { // start critical loop
// Turns ON coil charging opto-coupler #1 
    PORTH |= B10000;
    _delay_loop_2(charge_on);
     73e:      e0 91 fe 02       lds      r30, 0x02FE
     742:      f0 91 ff 02       lds      r31, 0x02FF

// Turns OFF coil charging opto-coupler #1 
    PORTH &= B11101111; 
    _delay_loop_2(charge_off);
     746:      60 91 00 03       lds      r22, 0x0300
     74a:      70 91 01 03       lds      r23, 0x0301
  
// Turns ON coil FE extracting opto-coupler #2 
    PORTE |= B1000;
    _delay_loop_2(extract_on);
     74e:      40 91 02 03       lds      r20, 0x0302
     752:      50 91 03 03       lds      r21, 0x0303

// Turns OFF coil FE extracting opto-coupler #2 
    PORTE &= B11110111;
    _delay_loop_2(extract_off);
     756:      20 91 04 03       lds      r18, 0x0304
     75a:      30 91 05 03       lds      r19, 0x0305
{
  cli(); // turn off interrupts

  while (true) { // start critical loop
// Turns ON coil charging opto-coupler #1 
    PORTH |= B10000;
     75e:      80 91 02 01       lds      r24, 0x0102
     762:      80 61             ori      r24, 0x10      ; 16
     764:      80 93 02 01       sts      0x0102, r24
      __asm__ volatile (
            "1: sbiw %0,1" "\n\t"
            "brne 1b"
            : "=w" (__count)
            : "0" (__count)
      );
     768:      cf 01             movw      r24, r30
     76a:      01 97             sbiw      r24, 0x01      ; 1
     76c:      f1 f7             brne      .-4            ; 0x76a <loop+0x2e>
    _delay_loop_2(charge_on);

// Turns OFF coil charging opto-coupler #1 
    PORTH &= B11101111; 
     76e:      80 91 02 01       lds      r24, 0x0102
     772:      8f 7e             andi      r24, 0xEF      ; 239
     774:      80 93 02 01       sts      0x0102, r24
     778:      cb 01             movw      r24, r22
     77a:      01 97             sbiw      r24, 0x01      ; 1
     77c:      f1 f7             brne      .-4            ; 0x77a <loop+0x3e>
    _delay_loop_2(charge_off);
  
// Turns ON coil FE extracting opto-coupler #2 
    PORTE |= B1000;
     77e:      73 9a             sbi      0x0e, 3      ; 14
     780:      ca 01             movw      r24, r20
     782:      01 97             sbiw      r24, 0x01      ; 1
     784:      f1 f7             brne      .-4            ; 0x782 <loop+0x46>
    _delay_loop_2(extract_on);

// Turns OFF coil FE extracting opto-coupler #2 
    PORTE &= B11110111;
     786:      73 98             cbi      0x0e, 3      ; 14
     788:      c9 01             movw      r24, r18
     78a:      01 97             sbiw      r24, 0x01      ; 1
     78c:      f1 f7             brne      .-4            ; 0x78a <loop+0x4e>
    _delay_loop_2(extract_off);

// check usb port and if there is data.  Break out of loop to handle it
   if (UCSR0A & _BV(RXC0)) break; 
     78e:      80 91 c0 00       lds      r24, 0x00C0
     792:      87 ff             sbrs      r24, 7
     794:      e4 cf             rjmp      .-56           ; 0x75e <loop+0x22>
  } // end of time critical loop

  sei();  // interrupts back on
     796:      78 94             sei

// gets new parameter from USB Serial Monitor connected to Macintosh then update on board
  change_param();
     798:      0e 94 da 02       call      0x5b4      ; 0x5b4 <_Z12change_paramv>
}
     79c:      08 95             ret

I’m a bit confused (first time) how to read this result, in particular how to count number of cycles…

For example, it says

// Turns ON coil charging opto-coupler #1 
    PORTH |= B10000;
    _delay_loop_2(charge_on);
     73e:      e0 91 fe 02       lds      r30, 0x02FE
     742:      f0 91 ff 02       lds      r31, 0x02FF

then later

// Turns ON coil charging opto-coupler #1 
    PORTH |= B10000;
     75e:      80 91 02 01       lds      r24, 0x0102
     762:      80 61             ori      r24, 0x10      ; 16
     764:      80 93 02 01       sts      0x0102, r24
      __asm__ volatile (
            "1: sbiw %0,1" "\n\t"
            "brne 1b"
            : "=w" (__count)
            : "0" (__count)
      );
     768:      cf 01             movw      r24, r30
     76a:      01 97             sbiw      r24, 0x01      ; 1
     76c:      f1 f7             brne      .-4            ; 0x76a <loop+0x2e>
    _delay_loop_2(charge_on);

Should I interpret a part with initialization of each 4 loops then again actual looping ?

Would it be possible for you to comment on different cycles, how much overhead is used,… in particular it seems but i could be wrong that I don’t have exactly same overhead cycles for my main 4 switch on-off procedures,… does that mean I have to insert NOP to balance ?

Should I interpret a part with initialization of each 4 loops then again actual looping ?

Yes. It's doing the "expensive" load of the delay variables into registers outside of your main loop, so that they can be transferred into the working registers used by delay_loop_2() using single-cycle instructions ("movw")

Would it be possible for you to comment on different cycles, how much overhead is used,... in particular it seems but i could be wrong that I don't have exactly same overhead cycles for my main 4 switch on-off procedures

Yep. PORTE is within the range of IO addresses that can be modified by the "set bit in IO register" instructions (2 cycles), but PORTH isn't and has to use the "LDS, ORI, STS" sequence instead (5 cycles.) One of the wonderful "gotchas" of the AVR architecture that isn't emphasized while they're busy claiming architectural elegance ("some of the instructions only work on some of operands you'd think!")
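The rule behind this can be written down as a small lookup. This is a hedged host-side sketch with an invented function name, not a statement of what the compiler will always emit: it assumes a classic (non-XMEGA) AVR, where sbi/cbi reach I/O addresses 0x00 to 0x1F, in/out reach 0x00 to 0x3F, and data address = I/O address + 0x20. In the disassembly above, PORTE is at I/O address 0x0E (data address 0x2E) and PORTH is at data address 0x0102.

```cpp
#include <cstdint>

// Rough cost, in cycles, of setting or clearing one bit of an I/O register,
// keyed by the register's data-space address. Expects an I/O register
// address (0x20 and up), not a CPU register.
int bit_write_cycles(uint16_t data_addr) {
    if (data_addr >= 0x20 && data_addr <= 0x3F) return 2;  // sbi / cbi
    if (data_addr >= 0x40 && data_addr <= 0x5F) return 3;  // in, ori/andi, out
    return 5;                                              // lds, ori/andi, sts
}
```

So any two ports whose addresses fall in the first range cost the same 2 cycles per write, which is what makes the stages symmetric.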

does that mean I have to insert NOP to balance ?

That's probably the easiest way to do it. You could also move your bits to the same (or similarly handled) IO ports...

Yep. PORTE is within the range of IO addresses that can be modified by the "set bit in IO register" instructions (2 cycles), but PORTH isn't and has to use the "LDS, ORI, STS" sequence instead (5 cycles.) One of the wonderful "gotchas" of the AVR architecture that isn't emphasized while they're busy claiming architectural elegance ("some of the instructions only work on some of operands you'd think!")

aie aie aie, so it is not symmetric... tough :'(

That's probably the easiest way to do it. You could also move your bits to the same (or similarly handled) IO ports...

OK I see and should be easy since I've bought arduino mega so I should find a combination of I/O and PORTX...

P.S. I will not insert NOP because I'm looking to increase PWM high speed precision

Say I use the same port X for both on-off switches:

// Turns ON coil charging opto-coupler #1 
    PORTX |= B10000;
    _delay_loop_2(charge_on);

// Turns OFF coil charging opto-coupler #1 
    PORTX &= B11101111; 
    _delay_loop_2(charge_off);
  
// Turns ON coil FE extracting opto-coupler #2 
    PORTX |= B1000;
    _delay_loop_2(extract_on);

// Turns OFF coil FE extracting opto-coupler #2 
    PORTX &= B11110111;
    _delay_loop_2(extract_off);

Do you know if there will be NO cross pin TTL glitch output ? What I mean: turning bit 4 ON and later OFF should hopefully NOT create a glitch on bit 3 of the same port; otherwise this could cause strange behaviour in the power electronics, or a MOSFET short circuit, since each MOSFET is gated by one TTL output.

P.S. The high PWM MOSFET gating sequence I look for: (1,0), (0,0), (0,1), (0,0) then loop where (a,b) is for MOSFET(a) and MOSFET(b).
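The logical side of that sequence can be checked on a PC before touching the hardware. This is only an illustrative sketch with made-up names; it uses bits 4 and 3 as in the PORTX example above, and it says nothing about electrical glitches on the pins, only about the order of the bit patterns.

```cpp
#include <cstdint>

// Bit positions of the two gate signals on the shared port.
const uint8_t kGateA = 1u << 4;
const uint8_t kGateB = 1u << 3;

// The four-step sequence (1,0), (0,0), (0,1), (0,0) as port bit patterns.
const uint8_t kSequence[4] = { kGateA, 0, kGateB, 0 };

// True if no step drives both gates at once and every active step is
// followed by a blanking step with both gates off.
bool sequence_is_safe() {
    for (int i = 0; i < 4; i++) {
        uint8_t now  = kSequence[i];
        uint8_t next = kSequence[(i + 1) % 4];
        if ((now & kGateA) && (now & kGateB)) return false;  // both on together
        if (now != 0 && next != 0) return false;             // no dead time in between
    }
    return true;
}
```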

Many thx for everything

Do you know if there will be NO cross pin TTL glitch output ?

I would have expected there to be HUGE amounts of complaining if such glitches did occur.

P.S. The high PWM MOSFET gating sequence I look for: (1,0), (0,0), (0,1), (0,0) then loop where (a,b) is for MOSFET(a) and MOSFET(b).

AVRs are pretty widely used as brushless motor electronic speed controllers in the remote control community, where they'll be used to sense/drive three-phase motors at several tens of thousands of RPM. You might want to look up some of those for hints; I suspect that it's at least the same class of problem. (Alas, it seems to be pretty difficult to separate the completed, working projects from the projects that are barely started.) One developed project is the "Speedy-BL" (http://www.speedy-bl.de/speedybl-e.htm). How's your German?

OK, I've found this new PORT allocation (A & E), so it seems the on-off switching cycles are now balanced (cbi and sbi). Lucky me, I spent a bit more money right away and bought a Mega board as my very first Arduino, so I have many I/O combinations.

void loop()
{
  cli(); // turn off interrupts
     73c:      f8 94             cli

  while (true) { // start critical loop
// Turns ON coil charging opto-coupler #1 
    PORTA |= B1;
    _delay_loop_2(charge_on);
     73e:      e0 91 fe 02       lds      r30, 0x02FE
     742:      f0 91 ff 02       lds      r31, 0x02FF

// Turns OFF coil charging opto-coupler #1 
    PORTA &= B11111110; 
    _delay_loop_2(charge_off);
     746:      60 91 00 03       lds      r22, 0x0300
     74a:      70 91 01 03       lds      r23, 0x0301
  
// Turns ON coil FE extracting opto-coupler #2 
    PORTE |= B1000;
    _delay_loop_2(extract_on);
     74e:      40 91 02 03       lds      r20, 0x0302
     752:      50 91 03 03       lds      r21, 0x0303

// Turns OFF coil FE extracting opto-coupler #2 
    PORTE &= B11110111;
    _delay_loop_2(extract_off);
     756:      20 91 04 03       lds      r18, 0x0304
     75a:      30 91 05 03       lds      r19, 0x0305
{
  cli(); // turn off interrupts

  while (true) { // start critical loop
// Turns ON coil charging opto-coupler #1 
    PORTA |= B1;
     75e:      10 9a             sbi      0x02, 0      ; 2
      __asm__ volatile (
            "1: sbiw %0,1" "\n\t"
            "brne 1b"
            : "=w" (__count)
            : "0" (__count)
      );
     760:      cf 01             movw      r24, r30
     762:      01 97             sbiw      r24, 0x01      ; 1
     764:      f1 f7             brne      .-4            ; 0x762 <loop+0x26>
    _delay_loop_2(charge_on);

// Turns OFF coil charging opto-coupler #1 
    PORTA &= B11111110; 
     766:      10 98             cbi      0x02, 0      ; 2
     768:      cb 01             movw      r24, r22
     76a:      01 97             sbiw      r24, 0x01      ; 1
     76c:      f1 f7             brne      .-4            ; 0x76a <loop+0x2e>
    _delay_loop_2(charge_off);
  
// Turns ON coil FE extracting opto-coupler #2 
    PORTE |= B1000;
     76e:      73 9a             sbi      0x0e, 3      ; 14
     770:      ca 01             movw      r24, r20
     772:      01 97             sbiw      r24, 0x01      ; 1
     774:      f1 f7             brne      .-4            ; 0x772 <loop+0x36>
    _delay_loop_2(extract_on);

// Turns OFF coil FE extracting opto-coupler #2 
    PORTE &= B11110111;
     776:      73 98             cbi      0x0e, 3      ; 14
     778:      c9 01             movw      r24, r18
     77a:      01 97             sbiw      r24, 0x01      ; 1
     77c:      f1 f7             brne      .-4            ; 0x77a <loop+0x3e>
    _delay_loop_2(extract_off);

// check usb port and if there is data.  Break out of loop to handle it
   if (UCSR0A & _BV(RXC0)) break; 
     77e:      80 91 c0 00       lds      r24, 0x00C0
     782:      87 ff             sbrs      r24, 7
     784:      ec cf             rjmp      .-40           ; 0x75e <loop+0x22>
  } // end of time critical loop

  sei();  // interrupts back on
     786:      78 94             sei

// gets new parameter from USB Serial Monitor connected to Macintosh then update on board
  change_param();
     788:      0e 94 da 02       call      0x5b4      ; 0x5b4 <_Z12change_paramv>
}
     78c:      08 95             ret

What does this code mean? Does it take extra cycles, and why does it not appear again in the other, similar stages? Can you confirm that I now always have 2 cycles for sbi or cbi in my 4 stages?

      __asm__ volatile (
            "1: sbiw %0,1" "\n\t"
            "brne 1b"
            : "=w" (__count)
            : "0" (__count)
      );

How’s your German?

Well, I can speak 4 languages but not German :-X

__asm__ volatile (
            "1: sbiw %0,1" "\n\t"
            "brne 1b"
            : "=w" (__count)
            : "0" (__count)
      );

This is the “source code” for _delay_loop_2(); gcc inline asm statements have a complex syntax that I haven’t begun to fully understand. “sbiw” is “subtract immediate (a constant) from a word-length register pair”. %0 means “a parameter that is more fully described outside the quotes”, and in this case I THINK it has the magic characters that mean “a temporary 16bit register initialized to the value that the user specified.” “brne” is “branch if not equal to zero.”

I don’t know why this appears only once in the disassembly. The code that it produces:

     760:      cf 01             movw      r24, r30
     762:      01 97             sbiw      r24, 0x01      ; 1
     764:      f1 f7             brne      .-4            ; 0x762 <loop+0x26>

shows up just as many times as you’d expect.

There are a couple things you might be able to do, even in C, that might speed up your inner loop by a few additional cycles. Most “obviously”, move the bitwise calculations outside of the loop:

  register byte c_on, c_off, e_on, e_off;
  c_on = PORTH | B10000;
  c_off = PORTH & ~B10000;
  e_on = PORTE | B1000;      // bit 3, the PORTE bit used in the original loop
  e_off = PORTE & ~B1000;

  while (true) { // start critical loop
// Turns ON coil charging opto-coupler #1
    PORTH = c_on;
    _delay_loop_2(charge_on);

// Turns OFF coil charging opto-coupler #1
    PORTH = c_off;
    _delay_loop_2(charge_off);
  
// Turns ON coil FE extracting opto-coupler #2
    PORTE = e_on;
    _delay_loop_2(extract_on);

// Turns OFF coil FE extracting opto-coupler #2
    PORTE = e_off;
    _delay_loop_2(extract_off);
    :

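That SHOULD replace the 2 or 5 cycle sequences with 1 or 2 cycle OUT or STS instructions (depending on port address), at least until you run out of registers for the extra values… Note that writing the whole byte also overwrites any other bits of the port that changed after the values were computed. (not actually tested…)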

I’m gonna play & run what you suggested (using register in C context) to see if I can gain extra cycle precision on my project.

P.S. Please note that I'm heading towards controlling 6 pins, i.e. 2 MOSFETs per phase for a 3-phase inverter, even though for the moment I use my board with only 2 MOSFETs on a single phase, to see whether the Arduino Mega is a good candidate for high-frequency PWM from a C language point of view.

Just downloaded this file so I can count the cycles of my code myself and not ask so many questions here :) http://www.atmel.com/dyn/resources/prod_documents/DOC0856.PDF

what is #clocks XMEGA versus #clocks ?

Take CBI or SBI: 2 clocks and 1 XMEGA clock ???

About BRNE, it says #clocks 1/2, is it 2 if branching up and 1 if continuing down ?

Is it correct to count the total like this, with N being the number of iterations?

     760:      cf 01             movw      r24, r30
     762:      01 97             sbiw      r24, 0x01      ; 1
     764:      f1 f7             brne      .-4            ; 0x762 <loop+0x26>

1 cycle + N * 2 cycles + (N-1) * 2 cycles + 1 cycle

so _delay_loop_2(N) uses overall 2 cycles + 4N cycles - 2 cycles = 4N cycles ?

About this other part of my code:

// check usb port and if there is data.  Break out of loop to handle it
   if (UCSR0A & _BV(RXC0)) break;
     77e:      80 91 c0 00       lds      r24, 0x00C0
     782:      87 ff             sbrs      r24, 7
     784:      ec cf             rjmp      .-40           ; 0x75e <loop+0x22>

if condition is false, how many cycles does it use ?

I ask because when I benchmarked it, before I knew how to disassemble, I got 2 cycles, but it seems to be 3 cycles, so my method must have been wrong!

What I really need to know on my code how many cycles to check any RX USB activity then loop on the while(true) if no RX, I would say 5 cycles, am I right ?

I apologize if the following has already been mentioned...

For the time critical parts of your code, you will have to disable interrupts. Something like this...

uint8_t SaveSREG = SREG;   /* save the current interrupt state */
cli();                     /* disable interrupts */

/* Time critical code goes here */

SREG = SaveSREG;           /* restore the previous interrupt state */

what is #clocks XMEGA versus #clocks ?

XMEGA is a processor family... http://www.atmel.com/products/AVR/default_xmega.asp

I have no idea which Arduino processors qualify as XMEGA. Given the features listed, I suspect none are XMEGA.

About BRNE, it says #clocks 1/2, is it 2 if branching up and 1 if continuing down ?

Usually the numbers are BRANCH NOT TAKEN / BRANCH TAKEN but I don't know for certain. So, BRNE uses 1 clock cycle if the branch is not taken (if the PC is not changed) and 2 cycles if the branch is taken.

Good luck, Brian

  if (UCSR0A & _BV(RXC0)) break;

77e:      80 91 c0 00      lds      r24, 0x00C0
    782:      87 ff            sbrs      r24, 7
    784:      ec cf            rjmp      .-40          ; 0x75e <loop+0x22>


if condition is false, how many cycles does it use ?

Two cycles for the lds instruction (essentially, just because it’s two words long), one cycle for the sbrs instruction to not skip, two cycles for the rjmp back to the beginning of the loop.

XMEGA is a new cpu family with (mostly) compatible instruction set. There are currently no arduinos that contain XMEGA cpus. (they’re still pretty hard to come by. But they’ll run 32MHz…)

1 cycle + N * 2 cycles + (N-1) * 2 cycles + 1 cycle

That looks right.

so _delay_loop_2(N) uses overall 2 cycles + 4N cycles - 2 cycles = 4N cycles ?

Yes, but only as long as the initialization only takes one cycle. I’m not sure what circumstances would cause it to take more, but this is something left to the compiler rather than something that is explicitly specified by the macro itself.
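Putting the pieces together, the length of one full pass through the four-stage loop can be estimated on a PC. This is a rough sketch under the assumptions discussed in this thread (2-cycle sbi/cbi port writes, 4N cycles per _delay_loop_2(N) including its movw, and 5 cycles for the serial check plus the jump back); the names are invented for illustration.

```cpp
#include <cstdint>

const uint32_t kPortWriteCycles = 2;  // sbi / cbi on a bit-addressable port
const uint32_t kCheckAndLoop    = 5;  // lds + sbrs (no skip) + rjmp

// Cycles for one full pass of the four-stage loop. All counts must be >= 1.
uint32_t loop_period_cycles(uint16_t charge_on, uint16_t charge_off,
                            uint16_t extract_on, uint16_t extract_off) {
    uint32_t delays = 4u * ((uint32_t)charge_on + charge_off + extract_on + extract_off);
    return delays + 4u * kPortWriteCycles + kCheckAndLoop;
}
```

For example, counts of 95, 8, 60 and 7 give 693 cycles per pass, which is about 23.1 kHz at 16 MHz, and the shortest possible pass (all counts 1) is 29 cycles.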

What I really need to know on my code how many cycles to check any RX USB activity then loop on the while(true) if no RX, I would say 5 cycles, am I right ?

I think so. You could consider putting the check at a different place in the loop so that the check overhead and the loop overhead aren’t right next to each other; it would just mean making sure you do the right things after you exit the loop to put things into a safe state before you change values… Right now you “waste” 5 cycles testing and looping with nothing turned on, which is fine if you want to spend at least 5 cycles in that state anyway. But you COULD reorder the loop so that the test overhead happens while extract_on is happening, and the loop jump happens while charge_on is happening:

  while (1) {
    CHARGE_OFF();
    EXTRACT_ON();    // the first time through we'll extract without having charged.  So what?
    if (UARTDATA)
      break;
    EXTRACT_OFF();
    CHARGE_ON();
    }
  EXTRACT_OFF();  // make sure we turn off extract before we modify values.

Yes, but do you agree that this reordering will not save any cycles?

Another question, about _delay_loop_1(N), which uses 3 cycles per iteration (3N in total) but only for 0<N<256, versus _delay_loop_2(N), which uses 4 cycles per iteration (4N in total) for 0<N<65536.

My project still explores on-off switching strategies for my MOSFETs over a wide frequency range, controlled from the Serial Monitor on my Macintosh, so I'm obliged to keep the _delay_loop_2(N) function because I sometimes have N>255.

Would you know a fast C algorithm taking a few extra cycles to call Q times delay_loop_1(K) then call once _delay_loop_1(R) with N=K*Q+R. Theoretically, I could save 25% cycles for same N but I need a low cycle count overhead management algorithm to call exactly the correct number of times _delay_loop_1().

do you agree that this reordering will not save any cycles?

Yes. It just lets you even out where the uncontrollable pauses are…

Would you know a fast C algorithm taking a few extra cycles to call Q times delay_loop_1(K) then call once _delay_loop_1(R) with N=K*Q+R.

I can’t think of anything. I’m not sure I see the point; if you use _delay_loop_2(), your maximum time through your loop is (very large), and the minimum would be:
4 cycles for each delay loop (16 cycles)
2 cycles for each port write (8 cycles)
3 cycles for UART check
2 cycles for loop overhead
for a total of 29 cycles. Using _delay_loop_1() instead only shaves a single cycle off of the minimum of each delay, and dramatically decreases the maximum. Are the desired delays fully independent of each other? You COULD always do something like:

if (con <= 255 && coff <= 255 && eon <= 255 && eoff <=255) {
  loop_using_delay_1((byte)con, (byte)coff, (byte)eon, (byte)eoff);
} else {
  loop_using_delay_2(con, coff, eon, eoff);
}

Since you have (comparatively) a large code space, you could write special purpose functions for the very fast loops and jump to the appropriate versions:

void loop_1_1_1_1() {
  while (1) {
    PORTA |= ABIT;
    NOP;
    PORTA &= ~ABIT;
    NOP;
    PORTE |= EBIT;
    NOP;
    PORTE &= ~EBIT;
    if (UARTCHECK)
       break;
    }
}
void loop_2_2_2_2() {
  while (1) {
    PORTA |= ABIT;
    NOP; NOP;
    PORTA &= ~ABIT;
    NOP; NOP;
    PORTE |= EBIT;
    NOP; NOP;
    PORTE &= ~EBIT;
    if (UARTCHECK)
       break;
    }
}

void loop() {
    :
  if (con <= 255 && coff <= 255 && eon <= 255 && eoff <=255) {
    if  (con == 1 && coff == 1 && eon == 1 && eoff == 1) {
      loop_1_1_1_1();
    } else if (con == 2 && coff == 2 && eon == 2 && eoff == 2) {
      loop_2_2_2_2();
    } else {
      loop_using_delay_1((byte)con, (byte)coff, (byte)eon, (byte)eoff);
    }
  } else {
    loop_using_delay_2(con, coff, eon, eoff);
  }

Math says you can implement all the possible versions with no more than 66 functions, most of which could be “easily” generated by a short meta-program or editor macro… (and that’s assuming that Eoff needs separate noops after it gets below 5 cycles…) (all the versions with all delays less than 3, I mean.)

To clarify what I'm trying to do, here is the code that computes the loop parameters (at initialization, and again later for resonance tuning). I initialize with 4 specific duty cycles; later the keyboard may send 4 other duty cycles to refine the tuning, so the loop parameters are modified while the board is running (no compile & upload).

The issue is the timing precision, in cycles, with which each on-off transition can be placed: _delay_loop_2() has a step of 4 cycles (+/- 2 cycles), as opposed to the finer 3-cycle step (+/- 1.5 cycles) of _delay_loop_1().

I'm interested in the highest precision, but for some duty cycle parameters this means delaying more than 255 loop iterations, which requires either several calls to _delay_loop_1() or a mix of one call to _delay_loop_1() and one call to _delay_loop_2()…

P.S. I don’t know if this is an AVR or an STK Arduino bug, but _delay_loop_1(0) as well as _delay_loop_2(0) do not work, so an extra check is required :’(
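This is probably not a bug. Both loops decrement the counter before testing it, so a count of 0 wraps around and behaves like the largest possible count; the avr-libc documentation describes passing 0 as the way to get 256 (or 65536) iterations. The host-side model below, with illustrative function names, mimics that behaviour.

```cpp
#include <cstdint>

// Model of "dec r / brne": the counter is decremented before it is tested,
// so starting from 0 it wraps to 255 and the loop runs 256 times.
uint32_t iterations_8bit(uint8_t count) {
    uint32_t n = 0;
    do {
        count--;           // dec
        n++;
    } while (count != 0);  // brne
    return n;
}

// Same model for the 16-bit "sbiw / brne" loop.
uint32_t iterations_16bit(uint16_t count) {
    uint32_t n = 0;
    do {
        count--;           // sbiw r, 1
        n++;
    } while (count != 0);  // brne
    return n;
}
```

That is why a computed count of 0 has to be rejected, or handled separately, before calling either function.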

#define CPU_CLOCK 16000000.0
#define OVERHEAD_OUT 2.0 // two cycle to set ON or OFF the port driving MOSFET
#define OVERHEAD_LOOP 4.0 // 4 cycles to each iteration of delay_loop_2
#define OVERHEAD_USB_WHILE 5.0 // 3 cycles for USB RX check and 2 cycles for each while(true)

... SNIP ...

  fperiod=CPU_CLOCK / float(freq * 100);

  fduty=fperiod*float(duty_charge_on);
  fduty = (fduty - OVERHEAD_OUT) / OVERHEAD_LOOP;
  charge_on=int(fduty+0.5);
  if(charge_on < 1){
    Serial.println("Error: digital Charge ON");
    return;
  }
  fduty=fperiod*float(duty_charge_off);
  fduty = (fduty - OVERHEAD_OUT) / OVERHEAD_LOOP;
  charge_off=int(fduty+0.5);
  if(charge_off < 1){
    Serial.println("Error: digital Charge OFF");
    return;
  }
  fduty=fperiod*float(duty_extract_on);
  fduty = (fduty - OVERHEAD_OUT) / OVERHEAD_LOOP;
  extract_on=int(fduty+0.5);
  if(extract_on < 1){
    Serial.println("Error: digital Extract ON");
    return;
  }
  fduty=fperiod*float(duty_extract_off);
  fduty = (fduty - OVERHEAD_OUT - OVERHEAD_USB_WHILE) / OVERHEAD_LOOP;
  extract_off=int(fduty+0.5);
  if(extract_off < 1){
    Serial.println("Error: digital Extract OFF");
    return;
  }
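The four near-identical blocks above could be folded into one helper. This is a hedged sketch written as plain host-side C++ so it can be tried on a PC; stage_count and the constant names are invented for illustration, and the helper returns 0 instead of printing an error so the caller decides how to report it.

```cpp
// Constants taken from the sketch above.
const double CPU_CLOCK_HZ     = 16000000.0;
const double CYCLES_PER_COUNT = 4.0;   // one _delay_loop_2 iteration

// Convert one duty-cycle segment into a _delay_loop_2 count.
// overhead_cycles is the fixed cost of the segment (the port write, plus the
// serial check and jump for the last one). Returns 0 if the segment is too
// short to realise.
int stage_count(long freq_hz, double duty_percent, double overhead_cycles) {
    double period_cycles = CPU_CLOCK_HZ / (double)freq_hz;
    double segment = period_cycles * duty_percent / 100.0;
    int count = (int)((segment - overhead_cycles) / CYCLES_PER_COUNT + 0.5);
    return count < 1 ? 0 : count;
}
```

At 23 kHz with duty cycles of 55 %, 5 %, 35 % and 5 %, and overheads of 2, 2, 2 and 7 cycles, this yields counts of 95, 8, 60 and 7.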