How to generate an asm program listing

I need an assembler program listing for my Arduino project. I've looked in the build folder, but I can find only the object code there, along with the .hex files and other support files, of course.

Does anyone know how to modify the Windows Arduino 1.0.1 IDE to output assembler listings for all compiled source files?

I don't think there is a way to change the avr-gcc switches without the Java source code and rebuilding at least one of the .jar files in the IDE.

The avr-gcc tools that come with Arduino include programs for examining compiled binaries. Read about avr-objdump (which can disassemble them) and avr-readelf.

You can learn to use avr-gcc directly. That way you can provide it with any command-line switches you want. The easiest way is to turn on verbose logging for the build and look at the command lines that Arduino generates.

Can you tell me how to turn on verbose logging?

It's in Preferences, unless you are using a pre-1.0 IDE.

I am still using 0021, but am attempting to move to 1.0.1.
I have an urgent problem to solve and no time to move to 1.0.1 before that is done.

So, I am stuck here for now.

Hold down the Shift key when compiling/uploading to get verbose output on pre-1.0 IDEs.

johnwasser:
You can learn to use avr-gcc directly. That way you can provide it with any command-line switches you want. The easiest way is to turn on verbose logging for the build and look at the command lines that Arduino generates.

In addition, if you are invoking avr-gcc directly, there are the following options:

  • -save-temps -- when compiling, leave .ii and .s files around (.ii is the C++ source after the preprocessor, .s is the assembler file)
  • -Wa,-al -- tell the assembler to generate a listing file to standard output
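To try both options on a single file, here is a minimal sketch. It assumes avr-g++ is on the PATH, and it uses a hypothetical file name, listing_demo.cpp. The -mmcu and -Os values are also assumptions, so copy the real flags from the IDE's verbose output. The -Wa,-adhln=FILE form is a variant of -Wa,-al with a few extra letters: h interleaves the source (this needs -g), d drops debugging directives, and n omits forms processing. The =FILE part writes the listing to a file instead of standard output.

```cpp
// listing_demo.cpp: hypothetical example file.
//
// Example command line (flags are assumptions; copy the real ones
// from the IDE's verbose output):
//   avr-g++ -c -g -Os -mmcu=atmega328p -save-temps -Wa,-adhln=listing_demo.lst listing_demo.cpp
//
// -save-temps keeps listing_demo.ii (preprocessed C++) and
//   listing_demo.s (the compiler's assembler output).
// -Wa,-adhln=listing_demo.lst makes the assembler write a listing with
//   assembly (l) and interleaved source (h, needs -g), without debugging
//   directives (d) or forms processing (n), into listing_demo.lst.
#include <stdint.h>

// A small function whose generated code is easy to spot in the listing.
uint8_t sumBytes(const uint8_t *data, uint8_t len) {
    uint8_t total = 0;
    for (uint8_t i = 0; i < len; i++) {
        total += data[i];  // 8-bit sum, wraps modulo 256
    }
    return total;
}
```

The .s file is the compiler's own output. The .lst file adds section offsets and the encoded instruction bytes. Both are produced before linking.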

MichaelMeissner:
-save-temps -- when compiling, leave .ii and .s files around (.ii is the C++ source after the preprocessor, .s is the assembler file)

Thank you! I need that!

Yes, I added -save-temps many, many years ago (probably about 22 years or so), as it made tracking down problems in large builds much easier: you could just look at the .s file, or feed the .i/.ii file back into the compiler to debug it.

In the last 2-3 years there have been some warts, from when we moved the preprocessor into the compiler and when I added support for turning on particular processor options for individual functions, but overall -save-temps is useful.

Would an objdump listing do, or are you looking specifically for a .asm file?

Sample objdump of digitalRead, in all its 'I never knew it was that big' glory:

int digitalRead(uint8_t pin)
{
	uint8_t timer = digitalPinToTimer(pin);
    1342:	68 2f       	mov	r22, r24
    1344:	70 e0       	ldi	r23, 0x00	; 0
    1346:	cb 01       	movw	r24, r22
    1348:	82 55       	subi	r24, 0x52	; 82
    134a:	9f 4f       	sbci	r25, 0xFF	; 255
    134c:	fc 01       	movw	r30, r24
    134e:	24 91       	lpm	r18, Z+
	uint8_t bit = digitalPinToBitMask(pin);
    1350:	cb 01       	movw	r24, r22
    1352:	86 56       	subi	r24, 0x66	; 102
    1354:	9f 4f       	sbci	r25, 0xFF	; 255
    1356:	fc 01       	movw	r30, r24
    1358:	44 91       	lpm	r20, Z+
	uint8_t port = digitalPinToPort(pin);
    135a:	6a 57       	subi	r22, 0x7A	; 122
    135c:	7f 4f       	sbci	r23, 0xFF	; 255
    135e:	fb 01       	movw	r30, r22
    1360:	94 91       	lpm	r25, Z+

	if (port == NOT_A_PIN) return LOW;
    1362:	99 23       	and	r25, r25
    1364:	19 f4       	brne	.+6      	; 0x136c <digitalRead+0x2a>
    1366:	20 e0       	ldi	r18, 0x00	; 0
    1368:	30 e0       	ldi	r19, 0x00	; 0
    136a:	3c c0       	rjmp	.+120    	; 0x13e4 <digitalRead+0xa2>

	// If the pin that support PWM output, we need to turn it off
	// before getting a digital reading.
	if (timer != NOT_ON_TIMER) turnOffPWM(timer);
    136c:	22 23       	and	r18, r18
    136e:	51 f1       	breq	.+84     	; 0x13c4 <digitalRead+0x82>
//
//static inline void turnOffPWM(uint8_t timer) __attribute__ ((always_inline));
//static inline void turnOffPWM(uint8_t timer)
static void turnOffPWM(uint8_t timer)
{
	switch (timer)
    1370:	23 30       	cpi	r18, 0x03	; 3
    1372:	71 f0       	breq	.+28     	; 0x1390 <digitalRead+0x4e>
    1374:	24 30       	cpi	r18, 0x04	; 4
    1376:	28 f4       	brcc	.+10     	; 0x1382 <digitalRead+0x40>
    1378:	21 30       	cpi	r18, 0x01	; 1
    137a:	a1 f0       	breq	.+40     	; 0x13a4 <digitalRead+0x62>
    137c:	22 30       	cpi	r18, 0x02	; 2
    137e:	11 f5       	brne	.+68     	; 0x13c4 <digitalRead+0x82>
    1380:	14 c0       	rjmp	.+40     	; 0x13aa <digitalRead+0x68>
    1382:	26 30       	cpi	r18, 0x06	; 6
    1384:	b1 f0       	breq	.+44     	; 0x13b2 <digitalRead+0x70>
    1386:	27 30       	cpi	r18, 0x07	; 7
    1388:	c1 f0       	breq	.+48     	; 0x13ba <digitalRead+0x78>
    138a:	24 30       	cpi	r18, 0x04	; 4
    138c:	d9 f4       	brne	.+54     	; 0x13c4 <digitalRead+0x82>
    138e:	04 c0       	rjmp	.+8      	; 0x1398 <digitalRead+0x56>
	{
		#if defined(TCCR1A) && defined(COM1A1)
		case TIMER1A:   cbi(TCCR1A, COM1A1);    break;
    1390:	80 91 80 00 	lds	r24, 0x0080
    1394:	8f 77       	andi	r24, 0x7F	; 127
    1396:	03 c0       	rjmp	.+6      	; 0x139e <digitalRead+0x5c>
		#endif
		#if defined(TCCR1A) && defined(COM1B1)
		case TIMER1B:   cbi(TCCR1A, COM1B1);    break;
    1398:	80 91 80 00 	lds	r24, 0x0080
    139c:	8f 7d       	andi	r24, 0xDF	; 223
    139e:	80 93 80 00 	sts	0x0080, r24
    13a2:	10 c0       	rjmp	.+32     	; 0x13c4 <digitalRead+0x82>
		#if defined(TCCR2) && defined(COM21)
		case  TIMER2:   cbi(TCCR2, COM21);      break;
		#endif
		
		#if defined(TCCR0A) && defined(COM0A1)
		case  TIMER0A:  cbi(TCCR0A, COM0A1);    break;
    13a4:	84 b5       	in	r24, 0x24	; 36
    13a6:	8f 77       	andi	r24, 0x7F	; 127
    13a8:	02 c0       	rjmp	.+4      	; 0x13ae <digitalRead+0x6c>
		#endif
		
		#if defined(TIMER0B) && defined(COM0B1)
		case  TIMER0B:  cbi(TCCR0A, COM0B1);    break;
    13aa:	84 b5       	in	r24, 0x24	; 36
    13ac:	8f 7d       	andi	r24, 0xDF	; 223
    13ae:	84 bd       	out	0x24, r24	; 36
    13b0:	09 c0       	rjmp	.+18     	; 0x13c4 <digitalRead+0x82>
		#endif
		#if defined(TCCR2A) && defined(COM2A1)
		case  TIMER2A:  cbi(TCCR2A, COM2A1);    break;
    13b2:	80 91 b0 00 	lds	r24, 0x00B0
    13b6:	8f 77       	andi	r24, 0x7F	; 127
    13b8:	03 c0       	rjmp	.+6      	; 0x13c0 <digitalRead+0x7e>
		#endif
		#if defined(TCCR2A) && defined(COM2B1)
		case  TIMER2B:  cbi(TCCR2A, COM2B1);    break;
    13ba:	80 91 b0 00 	lds	r24, 0x00B0
    13be:	8f 7d       	andi	r24, 0xDF	; 223
    13c0:	80 93 b0 00 	sts	0x00B0, r24

	// If the pin that support PWM output, we need to turn it off
	// before getting a digital reading.
	if (timer != NOT_ON_TIMER) turnOffPWM(timer);

	if (*portInputRegister(port) & bit) return HIGH;
    13c4:	89 2f       	mov	r24, r25
    13c6:	90 e0       	ldi	r25, 0x00	; 0
    13c8:	88 0f       	add	r24, r24
    13ca:	99 1f       	adc	r25, r25
    13cc:	84 58       	subi	r24, 0x84	; 132
    13ce:	9f 4f       	sbci	r25, 0xFF	; 255
    13d0:	fc 01       	movw	r30, r24
    13d2:	a5 91       	lpm	r26, Z+
    13d4:	b4 91       	lpm	r27, Z+
    13d6:	8c 91       	ld	r24, X
    13d8:	20 e0       	ldi	r18, 0x00	; 0
    13da:	30 e0       	ldi	r19, 0x00	; 0
    13dc:	84 23       	and	r24, r20
    13de:	11 f0       	breq	.+4      	; 0x13e4 <digitalRead+0xa2>
    13e0:	21 e0       	ldi	r18, 0x01	; 1
    13e2:	30 e0       	ldi	r19, 0x00	; 0
	return LOW;
}
    13e4:	c9 01       	movw	r24, r18
    13e6:	08 95       	ret

Duane B

rcarduino.blogspot.com

It looks to me like objdump will do what I need.

Thanks

I really would like to be able to run the compiler manually on a file or two, so thanks for the compiler switches.
I'm not interested in running a makefile on Windows, except for the experience gained.

So, where do I find objdump? A leading question indeed, as there are many copies on my PC. Why so many? Who knows?

There are copies in C:\WinAvr-20100110, C:\WinAVR-20090313 and C:\Program Files (x86)\Atmel\AVR Tools\AVR Toolchain\bin, and then there is the one inside the Arduino installation, down in E:\Arduino Projects\Arduino\arduino-0021\arduino-0021\hardware\tools\avr.

Hi,
I use the one from the Arduino install with the -S switch (intermix source with disassembly) and run it against the .cpp.elf file. Here is a sample of my command line:

C:\arduino-1.0-windows\arduino-1.0\hardware\tools\avr\bin\avr-objdump -S C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\build5704327072959987835.tmp\RCChannelsL293D.cpp.elf > RCChannels.txt

Duane B

When using Arduino IDE 1.0.5, I do not get the nice assembler listing intermingled with the C (or C++) code of the sketch. I experimented with this:

avr-objdump -S -l -C -h -t myprogram.cpp.elf > mylisting.txt

The listing contains the line numbers of my original sketch, but not the contents of those lines.

Of course, by putting the original sketch and the listing side by side one can find the correspondence. Is there a simple trick to fix this?

(I turned on verbose output to find the location of the *.elf file.)

My command for Blink.ino was:

"C:\Program Files\Arduino\hardware\tools\avr\bin\avr-objdump" -S -l -C -h -t  C:\DOCUME~1\ANANAB~1\LOCALS~1\Temp\build4286924063808951866.tmp\Blink.cpp.elf > Blink.txt

and here is the part of the output file Blink.txt that covers loop() in Blink.ino:

00000100 <loop>:
loop():
C:\Program Files\Arduino/Blink.ino:20
 100:	80 91 00 01 	lds	r24, 0x0100
 104:	61 e0       	ldi	r22, 0x01	; 1
 106:	0e 94 b8 01 	call	0x370	; 0x370 <digitalWrite>
C:\Program Files\Arduino/Blink.ino:21
 10a:	68 ee       	ldi	r22, 0xE8	; 232
 10c:	73 e0       	ldi	r23, 0x03	; 3
 10e:	80 e0       	ldi	r24, 0x00	; 0
 110:	90 e0       	ldi	r25, 0x00	; 0
 112:	0e 94 e5 00 	call	0x1ca	; 0x1ca <delay>
C:\Program Files\Arduino/Blink.ino:22
 116:	80 91 00 01 	lds	r24, 0x0100
 11a:	60 e0       	ldi	r22, 0x00	; 0
 11c:	0e 94 b8 01 	call	0x370	; 0x370 <digitalWrite>
C:\Program Files\Arduino/Blink.ino:23
 120:	68 ee       	ldi	r22, 0xE8	; 232
 122:	73 e0       	ldi	r23, 0x03	; 3
 124:	80 e0       	ldi	r24, 0x00	; 0
 126:	90 e0       	ldi	r25, 0x00	; 0
 128:	0e 94 e5 00 	call	0x1ca	; 0x1ca <delay>
C:\Program Files\Arduino/Blink.ino:24
 12c:	08 95       	ret

Blink.ino:

/*
  Blink
  Turns on an LED on for one second, then off for one second, repeatedly.
 
  This example code is in the public domain.
 */
 
// Pin 13 has an LED connected on most Arduino boards.
// give it a name:
int led = 13;

// the setup routine runs once when you press reset:
void setup() {                
  // initialize the digital pin as an output.
  pinMode(led, OUTPUT);     
}

// the loop routine runs over and over again forever:
void loop() {
  digitalWrite(led, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);               // wait for a second
  digitalWrite(led, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);               // wait for a second
}

Add "-I yourSketchDir" to the objdump command.

westfw:
Add "-I yourSketchDir" to the obj dump cmd.

Thanks, that improved the situation somewhat. There is a keyword from the source line in the assembler listing, but not everything; e.g., the function arguments are not there. Now loop() and setup() from Blink.ino look like this:

00000100 <loop>:
loop():
C:\Program Files\Arduino/Blink.ino:20
 100:	80 91 00 01 	lds	r24, 0x0100
 104:	61 e0       	ldi	r22, 0x01	; 1
 106:	0e 94 b8 01 	call	0x370	; 0x370 <digitalWrite>
C:\Program Files\Arduino/Blink.ino:21
 10a:	68 ee       	ldi	r22, 0xE8	; 232
 10c:	73 e0       	ldi	r23, 0x03	; 3
 10e:	80 e0       	ldi	r24, 0x00	; 0
 110:	90 e0       	ldi	r25, 0x00	; 0
 112:	0e 94 e5 00 	call	0x1ca	; 0x1ca <delay>
C:\Program Files\Arduino/Blink.ino:22
 116:	80 91 00 01 	lds	r24, 0x0100
 11a:	60 e0       	ldi	r22, 0x00	; 0
 11c:	0e 94 b8 01 	call	0x370	; 0x370 <digitalWrite>
C:\Program Files\Arduino/Blink.ino:23
 120:	68 ee       	ldi	r22, 0xE8	; 232
 122:	73 e0       	ldi	r23, 0x03	; 3
 124:	80 e0       	ldi	r24, 0x00	; 0
 126:	90 e0       	ldi	r25, 0x00	; 0
 128:	0e 94 e5 00 	call	0x1ca	; 0x1ca <delay>
C:\Program Files\Arduino/Blink.ino:24
 12c:	08 95       	ret

0000012e <setup>:
setup():
C:\Program Files\Arduino/Blink.ino:15
 12e:	80 91 00 01 	lds	r24, 0x0100
 132:	61 e0       	ldi	r22, 0x01	; 1
 134:	0e 94 79 01 	call	0x2f2	; 0x2f2 <pinMode>
C:\Program Files\Arduino/Blink.ino:16
 138:	08 95       	ret

Is this the expected result or can it still be improved?

Edit (blushing...): no, the assembler listings are the same! Did I do something wrong?
The Arduino library routines are listed well, but the main sketch is not. For example, delay() looks nice:

void delay(unsigned long ms)
{
	uint16_t start = (uint16_t)micros();

	while (ms > 0) {
 270:	21 15       	cp	r18, r1
 272:	31 05       	cpc	r19, r1
 274:	41 05       	cpc	r20, r1
 276:	51 05       	cpc	r21, r1
 278:	71 f6       	brne	.-100    	; 0x216 <delay+0x4c>
C:\Program Files\Arduino\hardware\arduino\cores\arduino/wiring.c:119
		if (((uint16_t)micros() - start) >= 1000) {
			ms--;
			start += 1000;
		}
	}
}
 27a:	08 95       	ret

Perhaps the main sketch should be modified to look like a library?

Edit: Now it works with the -I switch. I do not know what was wrong earlier. The info at
http://forum.arduino.cc/index.php?topic=160587.5;wap2
made me try harder, and it worked. Thanks westfw!

Wouldn't this be a good option to add to the preferences page? Enquiring minds like to know what machine code they are generating. Very instructional, and at times essential.

David

dpharris:
... Enquiring minds like to know what machine code they are generating. Very instructional, and at times essential.
...

Yes. With the assembly listing one can tune short programs to run efficiently. With a timer clocked at the 16 MHz CPU frequency we can even count CPU cycles easily. It is also fascinating to see how the compiler optimizes the code.
I wrote a timing exercise:

/* 
How to count machine cycles of any instructions.
No external circuits needed. Tested with Arduino Uno 16 MHz
and Arduino IDE 1.0.5
Written by optimistx, who takes no responsibility for this.
You may use this code as you like.
*/
void setup(){
// define the variables in the test instructions as volatile
// to prevent the optimizer from removing the instructions
  byte volatile testbyte = 123;
  byte volatile ibyte = 0;
  int volatile i = 0;
  long int volatile j = 0;
  double volatile f = 1.0;
  
  byte t0,t1,t2,t3; 

  Serial.begin(115200);
// timer registers to initial values as in the atmega328 datasheet 
// arduino ide software had changed some 
  TIMSK2 = 0; // initial value, disables overflow interrupt
  TCCR2A = 0; // only timer2 op, no pwm (arduino changed to pwm, was B00000001) 
  TCCR2B = 0; // Stop Timer2, no prescaler. arduino had set prescaler 64
  TCNT2=0; // arduino had timer2 counting
  TIFR2 = 0; // should be initial value 0
  bitWrite(TIFR2, TOV2, 1); // TOV2 will be cleared to zero when writing one
   // (strange, but so the datasheet says and it worked so)

  noInterrupts();
  bitWrite(TCCR2B, CS20, 1); // Start Timer2
  t0 = TCNT2; // 1 cycle; takes then 2 cycles to store
  asm("nop\n");//1 cycle 
  asm("nop\n");//1
  asm("nop\n");//1
  asm("nop\n");//1
  t1 = TCNT2; // total of 7 cycles here; 
  asm("nop\n");
  asm("nop\n");
  asm("nop\n");
  asm("nop\n");
  t2 = TCNT2; // 13 = 7 + 2 + 1+1+1+1
  // test any instruction(s) between lines below or write your own
  // uncomment any example line below to run it
  // -------------------------------------
  testbyte = t2;  //  2 cycles with volatile testbyte
  
  //asm("nop\n"); // 1 cycle, else program error
  
  //ibyte = ibyte + t2; // 5 cycles with volatile ibyte, nonvolatile t2 
  
  //i = i + t2; // 12 cycles with volatile i (integer), nonvolatile t2
  
  //j = j + 123456L; // 20 cycles with volatile j (long integer)
  
  //f = f + 1.0; // 100 or 101 cycles with volatile f (floating point )
  
  //micros(); // 47 or 48 cycles = 3 microseconds
  
  //millis();// 21 or 22 cycles
  
  //for (byte ii = 0;ii < 10;ii++){ibyte = ibyte + ii;} // 90 cycles
  
  //for (int ii = 0; ii < 10;ii++){i = i + ii;} //  161 cycles
  
  //ibyte = bitRead(TIFR2, TOV2); // 4 cycles
  
  //Serial.print('x'); // 143 cycles. interrupts are off!
  // -----------------------------------------
  t3 = TCNT2; 
  interrupts();
  TCCR2B = 0; //Stop Timer2 
  
  
  if(bitRead(TIFR2, TOV2) == 0){ // if no overflow of timer2
    if((t0 != 1) || (t1 != 7) || (t2 != 13)){
      Serial.print(t0, DEC); Serial.print('\t');
      Serial.print(t1, DEC); Serial.print('\t');
      Serial.print(t2, DEC); Serial.print('\t');
      Serial.print(t3, DEC); Serial.println();
      Serial.println(F("The above should be 1\t 7\t 13\t ..."));
      Serial.println(F("The program might give wrong results"));
    }  
    Serial.print(F("The test instruction(s) took "));
    
    Serial.print (t3 - t2 - 2, DEC);
    Serial.print (F(" cycles  ( 62.5 ns each, if 16 MHz CPU) "));
    Serial.println();
    if(t3-t2-2 <= 0){
      Serial.println(F("You may uncomment any example line(s),"));
      Serial.println(F("or add your own code. Then reload"));
    }
  } else {
    Serial.print(F("Timer2 overflow occurred, too much to do in 255 cycles"));
  }
}

void loop(){
}
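The sketch's arithmetic can be checked on a PC. The helper names cyclesFromTimer and cyclesToNanoseconds in this host-side sketch are hypothetical and not Arduino functions. The sketch assumes two things, as the sketch's comments do: Timer2 runs unprescaled at the 16 MHz CPU clock, and each read of TCNT2 costs 2 cycles. The calibration reads give t1 - t0 = 7 - 1 = 6, which is 4 nops + 2. That is why the sketch reports t3 - t2 - 2.

```cpp
#include <stdint.h>

// Hypothetical helper: cycles spent in the code under test, given the two
// TCNT2 snapshots taken around it. Subtracts the 2-cycle cost of the
// second TCNT2 read, as the Arduino sketch does with (t3 - t2 - 2).
int cyclesFromTimer(uint8_t t2, uint8_t t3) {
    return static_cast<int>(t3) - static_cast<int>(t2) - 2;
}

// At 16 MHz one CPU cycle lasts 1 / 16e6 s = 62.5 ns.
double cyclesToNanoseconds(int cycles, double cpuHz = 16e6) {
    return cycles * 1e9 / cpuHz;
}
```

Take the sketch's default test line, testbyte = t2, which the sketch documents as 2 cycles. With t2 = 13 one would expect t3 = 17, so cyclesFromTimer(13, 17) gives 2. The 48-cycle micros() result converts to 3000 ns, the 3 microseconds noted in the sketch.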

So, with 1.5.x, whose build commands are defined in platform.txt, it should be trivial to have assembly listings generated at compile time, and not too difficult to run a disassembly at the end of the build process. (When it works, I tend to find the disassembly more useful than the compiler-produced output.)

westfw:
...
(when it works, I tend to find the disassembly more useful than the compiler-produced output.)

More useful? When in doubt about the code produced, I would (also) trust the disassembly more. Or do you see other reasons why it is more useful?

When trying to optimize interrupt service routines to be as fast as possible, it is nice to see how the compiler is smart enough to save/restore only those registers which are really needed.
For example, incrementing a 4-byte volatile timer variable in an ISR can take about 39 cycles of 62.5 ns each. If an overflow interrupt happens every 256 cycles, it is worth thinking about how to code it: 39/256 is about 15% of all CPU cycles. However, premature optimization is a source of many unnecessary and complicated code sequences. But ah, so interesting!
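Here is a quick check of that percentage as a host-side C++ sketch. The helpers isrLoad and isrTimeNs are hypothetical. The fraction of CPU time an interrupt handler consumes is its cycle count divided by the number of cycles between interrupts. The 16 MHz clock is the same assumption as above.

```cpp
#include <stdint.h>

// Hypothetical helper: fraction of all CPU cycles used by an interrupt
// handler that costs isrCycles and fires once every periodCycles cycles.
double isrLoad(uint32_t isrCycles, uint32_t periodCycles) {
    return static_cast<double>(isrCycles) / static_cast<double>(periodCycles);
}

// Time spent per interrupt, in nanoseconds, for a given CPU clock.
double isrTimeNs(uint32_t isrCycles, double cpuHz = 16e6) {
    return isrCycles * 1e9 / cpuHz;
}
```

isrLoad(39, 256) is 0.15234375, i.e. about 15%. Each such interrupt takes isrTimeNs(39) = 2437.5 ns, roughly 2.4 microseconds.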

do you see other reasons

The assembler listing from the compiler:

  1. is full of debugging info and "noise"
  2. is pre-link, which means some optimizations might not have been done yet (e.g. linker relaxation, "relax", which can turn call/jmp into the shorter rcall/rjmp), and absolute jump/call destinations are not filled in. It also still contains the unused functions that "gc-sections" would omit.
  3. doesn't contain the full program, including the libraries.