Topic: how to generate an asm program listing

dpharris

Wouldn't this be a good option to add to the preferences page?   Enquiring minds like to know what machine code they are generating.  Very instructional, and at times essential. 

David
Dr. David Harris
OpenLCB Dev Team

optimistx


...  Enquiring minds like to know what machine code they are generating.  Very instructional, and at times essential. 
...

Yes. With the assembly listing one could tune short programs to work efficiently. When we use a timer running at the 16 MHz CPU frequency, we can even count CPU cycles easily. It is also fascinating to see how the compiler optimizes the code.
I wrote a timing exercise:
Code:
/*
How to count machine cycles of any instructions.
No external circuits needed. Tested with Arduino Uno 16 MHz
and Arduino IDE 1.0.5
Written by optimistx, who takes no responsibility for this.
You may use this code as you like.
*/
void setup(){
// define the variables in the test instructions as volatile
// to prevent the optimizer from removing the instructions
  byte volatile testbyte = 123;
  byte volatile ibyte = 0;
  int volatile i = 0;
  long int volatile j = 0;
  double volatile f = 1.0;
 
  byte t0,t1,t2,t3;

  Serial.begin(115200);
// timer registers to initial values as in the atmega328 datasheet
// arduino ide software had changed some
  TIMSK2 = 0; // initial value, disables overflow interrupt
  TCCR2A = 0; // only timer2 op, no pwm (arduino changed to pwm, was B00000001)
  TCCR2B = 0; // Stop Timer2, no prescaler. arduino had set prescaler 64
  TCNT2=0; // arduino had timer2 counting
  TIFR2 = 0; // should be initial value 0
  bitWrite(TIFR2, TOV2, 1); // TOV2 will be cleared to zero when writing one
   // (strange, but so the datasheet says and it worked so)

  noInterrupts();
  bitWrite(TCCR2B, CS20, 1); // Start Timer2
  t0 = TCNT2; // 1 cycle; takes then 2 cycles to store
  asm("nop\n");//1 cycle
  asm("nop\n");//1
  asm("nop\n");//1
  asm("nop\n");//1
  t1 = TCNT2; // total of 7 cycles here;
  asm("nop\n");
  asm("nop\n");
  asm("nop\n");
  asm("nop\n");
  t2 = TCNT2; // 13 = 7 + 2 + 1+1+1+1
  // test any instruction(s) between lines below or write your own
  // uncomment any example line below to run it
  // -------------------------------------
  testbyte = t2;  //  2 cycles with volatile testbyte
 
  //asm("nop\n"); // 1 cycle, else program error
 
  //ibyte = ibyte + t2; // 5 cycles with volatile ibyte, nonvolatile t2
 
  //i = i + t2; // 12 cycles with volatile i (integer), nonvolatile t2
 
  //j = j + 123456L; // 20 cycles with volatile j (long integer)
 
  //f = f + 1.0; // 100 or 101 cycles with volatile f (floating point )
 
  //micros(); // 47 or 48 cycles = 3 microseconds
 
  //millis();// 21 or 22 cycles
 
  //for (byte ii = 0;ii < 10;ii++){ibyte = ibyte + ii;} // 90 cycles
 
  //for (int ii = 0; ii < 10;ii++){i = i + ii;} //  161 cycles
 
  //ibyte = bitRead(TIFR2, TOV2); // 4 cycles
 
  //Serial.print('x'); // 143 cycles. interrupts are off!
  // -----------------------------------------
  t3 = TCNT2;
  interrupts();
  TCCR2B = 0; //Stop Timer2
 
 
  if(bitRead(TIFR2, TOV2) == 0){ // if no overflow of timer2
    if((t0 != 1) || (t1 != 7) || (t2 != 13)){
      Serial.print(t0, DEC); Serial.print('\t');
      Serial.print(t1, DEC); Serial.print('\t');
      Serial.print(t2, DEC); Serial.print('\t');
      Serial.print(t3, DEC); Serial.println();
      Serial.println(F("The above should be 1\t 7\t 13\t ..."));
      Serial.println(F("The program might give wrong results"));
    } 
    Serial.print(F("The test instruction(s) took "));
   
    Serial.print (t3 - t2 - 2, DEC);
    Serial.print (F(" cycles  ( 62.5 ns each, if 16 MHz CPU) "));
    Serial.println();
    if(t3-t2-2 <= 0){
      Serial.println(F("You may uncomment any example line(s),"));
      Serial.println(F("or add your own code. Then reload"));
    }
  } else {
    Serial.print(F("Timer2 overflow occurred, too much to do in 255 cycles"));
  }
}

void loop(){
}
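
The arithmetic behind the printed result can be checked off-target. The following is a minimal host-side sketch in plain C++ (not Arduino code), assuming the 16 MHz clock and the 2-cycle read overhead that the sketch above subtracts. The names `measured_cycles`, `cycles_to_ns` and `check_cycle_math` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 16 MHz CPU clock, as on the Arduino Uno used in the post.
constexpr double kCpuHz = 16000000.0;

// Cycles spent in the code under test, given two TCNT2 snapshots.
// The sketch subtracts 2 cycles for reading and storing the counter.
// Only valid if Timer2 did not overflow between the two reads.
constexpr int measured_cycles(std::uint8_t t2, std::uint8_t t3) {
    return static_cast<int>(t3) - static_cast<int>(t2) - 2;
}

// Convert a cycle count to nanoseconds (62.5 ns per cycle at 16 MHz).
constexpr double cycles_to_ns(int cycles) {
    return cycles * 1e9 / kCpuHz;
}

// Hypothetical self-check using figures quoted in the sketch's comments.
inline void check_cycle_math() {
    assert(measured_cycles(13, 17) == 2);  // a 2-cycle test instruction
    assert(cycles_to_ns(1) == 62.5);       // one cycle at 16 MHz
    assert(cycles_to_ns(48) == 3000.0);    // micros(): 48 cycles = 3 us
}
```

Doing the subtraction in `int` rather than `byte` is what lets a result below zero show up as negative instead of wrapping around.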

westfw

So, with 1.5.x, it should be trivial to cause assembly listings to be generated at compile time, and not too difficult to cause a disassembly at the end of the build process.  (when it works, I tend to find the disassembly more useful than the compiler-produced output.)

optimistx


...
(when it works, I tend to find the disassembly more useful than the compiler-produced output.)


More useful? When in doubt about the code produced, I would (also) trust the disassembly more. Or do you see other reasons for being more useful?
---
When trying to optimize interrupt service routines to be as fast as possible, it is nice to see how the compiler is smart enough to save and restore only those registers that are really needed.
E.g. incrementing a 4-byte volatile timer variable can be done in about 39 cycles of 62.5 ns each. If an overflow interrupt happens every 256 cycles, there is reason to think about how to code it: 39/256 is about 15 % of all CPU cycles. However, premature optimization is a source of many unnecessary and complicated code sequences. But ah, so interesting!
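
The 15 % figure follows directly from the two cycle counts. Here is a small host-side C++ sketch of that arithmetic, assuming the 39-cycle handler cost and the 256-cycle overflow period quoted above. `isr_load` and `check_isr_load` are hypothetical helper names, not part of any Arduino library.

```cpp
#include <cassert>
#include <cmath>

// Fraction of CPU time spent in an interrupt handler that costs
// `isr_cycles` and fires once every `period_cycles` CPU cycles.
constexpr double isr_load(double isr_cycles, double period_cycles) {
    return isr_cycles / period_cycles;
}

// Hypothetical self-check with the numbers from the post.
inline void check_isr_load() {
    // 39 cycles per overflow, one overflow every 256 cycles (no prescaler).
    const double load = isr_load(39.0, 256.0);
    assert(std::fabs(load - 0.15234375) < 1e-12);  // about 15 %
}
```

With a prescaler of 8 the overflow period would be 2048 cycles, and the same handler would take about 1.9 % of the CPU.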

westfw

Quote
do you see other reasons

The assembler listing from the compiler:
1) is full of debugging info and "noise"
2) is pre-link, which means some optimization might not have been done ("relax"?), and absolute jump/call destinations are not filled in. Also, it still contains the unused functions that "gc-sections" would later omit.
3) doesn't have the full program including libraries.
