Understanding the .hex file

Recently, I've been attempting to write a simulator for my Arduino Uno w/ ATMega328p.

I've done all the hard work of implementing the ATMega instruction set and defining arrays for the Data and Program Memory. All that seems to be working fine.

I load programs from files in the Intel .hex format, the same kind the Arduino IDE creates when you compile your code. The .hex files I create by hand work just fine. I assume that when the microcontroller powers up, it sets the program counter to 0x00, and then reads instructions from the Program Memory beginning at 0x00. Since my .hex file begins at 0x00, this works just fine.

However, when I load a .hex file created by the Arduino IDE, the assembler instructions created don't make much sense. It appears to be doing several ADD register instructions, and then will randomly RJMP (relative jumps) to an address near 0x7000. There's no code down at Program Memory address 0x7000. The program is a very small "Hello World!" style program, just to make sure it's working before I go any further.

Can anyone describe what's going on in the Arduino .hex files? There seems to be quite a bit of Assembler overhead involved in declaring the setup() and loop() methods. I can't see any other reason for why there's so much data in the .hex file. Is there something funny that the Arduino does with the .hex to accommodate using the boot loader?

Well, with a bootloader-equipped processor, the chip starts at reset or power-up by running the bootloader code (located in high flash memory), which quickly checks whether there is an active upload request from the IDE. If there is not, and a prior sketch is loaded in the chip, it jumps to it.

Lefty

So then, it's likely that the first several instructions in the .hex file (and subsequently Program Memory 0x00 ... up to say 0xA0) might be instructions to jump to some high address (eg: 0x7000) where the boot loader is, run some code there, and then to jump back to some lower address (eg: 0x1000) or so to where the program actually begins?

BKnight760:
So then, it's likely that the first several instructions in the .hex file (and subsequently Program Memory 0x00 ... up to say 0xA0) might be instructions to jump to some high address (eg: 0x7000) where the boot loader is, run some code there, and then to jump back to some lower address (eg: 0x1000) or so to where the program actually begins?

No
The sketch should not care whether or not there exists a bootloader, so it should not jump up as high as 0x7000, and there's nothing in the core code that indicates that should be happening. Jumping to the bootloader is taken care of by the fuses and reset circuitry, and jumping to user-sketch is done by the bootloader code via calling a function pointer to address 0.

I suggest you use a disassembler to translate your hex code back into assembly code. Do this for both the hex file you wrote by hand and the one generated by the C++ compiler.

This might just be a simple endianness, address alignment, or other small problem

For example, the Sparkfun website has a tutorial for getting started with loading programs onto the microcontroller. They supply a .zip file with a .c file and the compiled .hex file (I've added spaces to indicate the formatting described below):

:10 0000 00 0C9434000C944F00 0C944F000C944F00 4F
:10 0010 00 0C944F000C944F00 0C944F000C944F00 24
:10 0020 00 0C944F000C944F00 0C944F000C944F00 14
:10 0030 00 0C944F000C944F00 0C944F000C944F00 04
:10 0040 00 0C944F000C944F00 0C944F000C944F00 F4
:10 0050 00 0C944F000C944F00 0C944F000C944F00 E4
:10 0060 00 0C944F000C944F00 11241FBECFEFD4E0 2E
:10 0070 00 DEBFCDBF11E0A0E0 B1E0E8EFF0E002C0 EC
:10 0080 00 05900D92A030B107 D9F711E0A0E0B1E0 E2
:10 0090 00 01C01D92A030B107 E1F70C9467000C94 E9
:10 00A0 00 00008FEF84B987B9 8EEF8AB9089501C0 37
:10 00B0 00 0197009759F020E0 0000000000000000 C8
:10 00C0 00 000000002F5F2A35 99F3F6CF08958FEF D7
:10 00D0 00 84B987B98EEF8AB9 8FEF88B985B98BB9 A2
:10 00E0 00 84EF91E00E945700 18B815B81BB884EF 50
:08 00F0 00 91E00E945700F0CF DF
:00 0000 01 FF

The format of the hex file is (Intel HEX - Wikipedia):

Start code, one character, an ASCII colon ':'.

Byte count, two hex digits, a number of bytes (hex digit pairs) in the data field. 16 (0x10) or 32 (0x20) bytes of data are the usual compromise values between line length and address overhead.

Address, four hex digits, a 16-bit address of the beginning of the memory position for the data. Limited to 64 kilobytes, the limit is worked around by specifying higher bits via additional record types. This address is big endian.

Record type, two hex digits, 00 to 05, defining the type of the data field.

Data, a sequence of n bytes of the data themselves, represented by 2n hex digits.

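Checksum, two hex digits, the two's complement of the least significant byte of the sum of all preceding bytes in the record (byte count, address, record type, and data).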

[:][Byte Count][Address][Record Type][Data][Checksum]

So, the following line of the hex file can be split like this:

[:][10] [0060] [00] [0C944F000C944F0011241FBECFEFD4E0] [2E]
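As a rough illustration of the record layout above, here is a minimal Python sketch that splits one record into its fields and checks the checksum. The function name `parse_record` is just a name picked for this example, and the record is the 0x0060 line from the file above with the spaces removed.

```python
def parse_record(line):
    """Split one Intel HEX record into its fields and verify the checksum."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])          # every pair of hex digits is one byte
    count = raw[0]                         # number of data bytes
    if len(raw) != count + 5:              # count + address(2) + type + checksum
        raise ValueError("byte count does not match record length")
    address = (raw[1] << 8) | raw[2]       # the address field is big-endian
    rtype = raw[3]
    data = raw[4:4 + count]
    checksum = raw[4 + count]
    # All bytes of a valid record, checksum included, sum to 0 modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("bad checksum")
    return count, address, rtype, data, checksum


count, address, rtype, data, checksum = parse_record(
    ":100060000C944F000C944F0011241FBECFEFD4E02E")
print(count, hex(address), rtype, data.hex(), hex(checksum))
```

Note that this only recovers the raw bytes; how those bytes pair up into 16-bit instructions is a separate question.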

Instructions are 16-bits, or 4 hex digits.
The RJMP opcode is described in the ATMEGA datasheet as : 1100 kkkk kkkk kkkk
0b'1100 = 0xC
So, an instruction that has 0xC as the first digit, implies that it is an RJMP instruction:

The data from the .hex file can be parsed into the following instructions:
0C94
4F00
0C94
4F00
1124
1FBE
CFEF <--- This is the RJMP 0xFEF instruction. (0xFEF = decimal 4079)
D4E0

According to the addresses specified in the .hex file, the actual program is only 0x00F8 bytes long. I'm unsure why we're jumping 4079 bytes, when our program does not exist there.
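The program length can be read off the records themselves. The following is only a sketch: `load_image` is a made-up helper, it skips checksum verification, and it ignores every record type other than data (00) and end-of-file (01). It copies the data bytes into a 32 KB array standing in for the ATmega328P flash and reports the first address past the last byte written, using the final two records of the file above.

```python
def load_image(lines, size=32 * 1024):
    """Copy data records into a flash image; return (image, end address)."""
    flash = bytearray([0xFF] * size)       # erased flash reads as 0xFF
    end = 0
    for line in lines:
        raw = bytes.fromhex(line.strip()[1:])
        count = raw[0]
        address = (raw[1] << 8) | raw[2]
        rtype = raw[3]
        if rtype == 0x01:                  # end-of-file record
            break
        if rtype != 0x00:                  # other record types are ignored here
            continue
        flash[address:address + count] = raw[4:4 + count]
        end = max(end, address + count)
    return flash, end


records = [":0800F00091E00E945700F0CFDF", ":00000001FF"]
flash, end = load_image(records)
print(hex(end))  # 0xf8
```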


Frank, thank you for your reply.

Is there a program that will dis-assemble these .hex files? Could you point me to one?

I will gladly run the disassembly and check what actual assembler instructions are being used. Perhaps that will provide some clarity.

You could look at the disassembly of the sketch - could save some time.

You're right, no need to disassemble if the original assembly listing still exists somewhere. Any idea what it's named?

I've found the /Temp/build.../ directory where the code is compiled, but don't have anything that looks like assembly.

I have:

blink_1MHz.cpp <--- the original source code.
blink_1MHz.cpp.eep <--- just a single EOF record for the .hex file.
blink_1MHz.cpp.elf <--- Bunch of garbled text (Looks like it could be a raw hex file)
blink_1MHz.cpp.hex <--- .hex file as shown earlier.
blink_1MHz.cpp.o <--- object file

There's a few more .o files listed there, but I assume the object files aren't what I'm looking for. I was expecting to find a .s or a .lss file, but didn't see one.

ok, I found how to disassemble the .cpp.elf file using avr-objdump -S command from /hardware/tools/avr/bin/

That worked and now I have the following assembler:

C:\blink_1MHz.cpp.elf:     file format elf32-avr


Disassembly of section .text:

00000000 <__vectors>:
    }
   
    return(0);
}

void ioinit (void)
   0:	0c 94 34 00 	jmp	0x68	; 0x68 <__ctors_end>
   4:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
   8:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
   c:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  10:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  14:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  18:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  1c:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  20:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  24:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  28:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  2c:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  30:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  34:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  38:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  3c:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  40:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  44:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  48:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  4c:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  50:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  54:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  58:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  5c:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  60:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>
  64:	0c 94 51 00 	jmp	0xa2	; 0xa2 <__bad_interrupt>

00000068 <__ctors_end>:
  68:	11 24       	eor	r1, r1
  6a:	1f be       	out	0x3f, r1	; 63
  6c:	cf ef       	ldi	r28, 0xFF	; 255
  6e:	d8 e0       	ldi	r29, 0x08	; 8
  70:	de bf       	out	0x3e, r29	; 62
  72:	cd bf       	out	0x3d, r28	; 61

00000074 <__do_copy_data>:
  74:	11 e0       	ldi	r17, 0x01	; 1
  76:	a0 e0       	ldi	r26, 0x00	; 0
  78:	b1 e0       	ldi	r27, 0x01	; 1
  7a:	e2 e0       	ldi	r30, 0x02	; 2
  7c:	f1 e0       	ldi	r31, 0x01	; 1
  7e:	02 c0       	rjmp	.+4      	; 0x84 <.do_copy_data_start>

00000080 <.do_copy_data_loop>:
  80:	05 90       	lpm	r0, Z+
  82:	0d 92       	st	X+, r0

00000084 <.do_copy_data_start>:
  84:	a0 30       	cpi	r26, 0x00	; 0
  86:	b1 07       	cpc	r27, r17
  88:	d9 f7       	brne	.-10     	; 0x80 <.do_copy_data_loop>

0000008a <__do_clear_bss>:
  8a:	11 e0       	ldi	r17, 0x01	; 1
  8c:	a0 e0       	ldi	r26, 0x00	; 0
  8e:	b1 e0       	ldi	r27, 0x01	; 1
  90:	01 c0       	rjmp	.+2      	; 0x94 <.do_clear_bss_start>

00000092 <.do_clear_bss_loop>:
  92:	1d 92       	st	X+, r1

00000094 <.do_clear_bss_start>:
  94:	a0 30       	cpi	r26, 0x00	; 0
  96:	b1 07       	cpc	r27, r17
  98:	e1 f7       	brne	.-8      	; 0x92 <.do_clear_bss_loop>
  9a:	0e 94 53 00 	call	0xa6	; 0xa6 <main>
  9e:	0c 94 7f 00 	jmp	0xfe	; 0xfe <_exit>

000000a2 <__bad_interrupt>:
  a2:	0c 94 00 00 	jmp	0	; 0x0 <__vectors>

000000a6 <main>:
{
    //1 = output, 0 = input
    DDRB = 0b11111111; //All outputs
  a6:	8f ef       	ldi	r24, 0xFF	; 255
  a8:	84 b9       	out	0x04, r24	; 4
    DDRC = 0b11111111; //All outputs
  aa:	87 b9       	out	0x07, r24	; 7
    DDRD = 0b11111110; //PORTD (RX on PD0)
  ac:	8e ef       	ldi	r24, 0xFE	; 254
  ae:	8a b9       	out	0x0a, r24	; 10
{
    ioinit(); //Setup IO pins and defaults

    while(1)
    {
		PORTC = 0xFF;
  b0:	3f ef       	ldi	r19, 0xFF	; 255
  b2:	38 b9       	out	0x08, r19	; 8
		PORTB = 0xFF;
  b4:	35 b9       	out	0x05, r19	; 5
		PORTD = 0xFF;
  b6:	3b b9       	out	0x0b, r19	; 11
  b8:	84 ef       	ldi	r24, 0xF4	; 244
  ba:	91 e0       	ldi	r25, 0x01	; 1
  bc:	0b c0       	rjmp	.+22     	; 0xd4 <main+0x2e>
	...
//General short delays
void delay_ms(uint16_t x)
{
  uint8_t y, z;
  for ( ; x > 0 ; x--){
    for ( y = 0 ; y < 90 ; y++){
  ca:	2f 5f       	subi	r18, 0xFF	; 255
  cc:	2a 35       	cpi	r18, 0x5A	; 90
  ce:	b9 f7       	brne	.-18     	; 0xbe <main+0x18>

//General short delays
void delay_ms(uint16_t x)
{
  uint8_t y, z;
  for ( ; x > 0 ; x--){
  d0:	01 97       	sbiw	r24, 0x01	; 1
  d2:	11 f0       	breq	.+4      	; 0xd8 <main+0x32>
  d4:	20 e0       	ldi	r18, 0x00	; 0
  d6:	f3 cf       	rjmp	.-26     	; 0xbe <main+0x18>
		PORTC = 0xFF;
		PORTB = 0xFF;
		PORTD = 0xFF;
		delay_ms(500);

		PORTC = 0x00;
  d8:	18 b8       	out	0x08, r1	; 8
		PORTB = 0x00;
  da:	15 b8       	out	0x05, r1	; 5
		PORTD = 0x00;
  dc:	1b b8       	out	0x0b, r1	; 11
  de:	84 ef       	ldi	r24, 0xF4	; 244
  e0:	91 e0       	ldi	r25, 0x01	; 1
  e2:	0b c0       	rjmp	.+22     	; 0xfa <main+0x54>
	...
//General short delays
void delay_ms(uint16_t x)
{
  uint8_t y, z;
  for ( ; x > 0 ; x--){
    for ( y = 0 ; y < 90 ; y++){
  f0:	2f 5f       	subi	r18, 0xFF	; 255
  f2:	2a 35       	cpi	r18, 0x5A	; 90
  f4:	b9 f7       	brne	.-18     	; 0xe4 <main+0x3e>

//General short delays
void delay_ms(uint16_t x)
{
  uint8_t y, z;
  for ( ; x > 0 ; x--){
  f6:	01 97       	sbiw	r24, 0x01	; 1
  f8:	e1 f2       	breq	.-72     	; 0xb2 <main+0xc>
  fa:	20 e0       	ldi	r18, 0x00	; 0
  fc:	f3 cf       	rjmp	.-26     	; 0xe4 <main+0x3e>

000000fe <_exit>:
  fe:	f8 94       	cli

00000100 <__stop_program>:
 100:	ff cf       	rjmp	.-2      	; 0x100 <__stop_program>

From my somewhat crude understanding of this assembly,
We start out at 0x00, which has the JMP 0x68 instruction.
Immediately we jump to 0x68. It looks like most of this is overhead associated with setting up the timer.
Eventually we make it down to 0x9A where we do a jump to 0xA6 .
This is where our actual program starts.

I can follow it through from there.

So what happened to the CFEF instruction that was in the .hex file? Am I using the wrong instruction set?

In the assembler, I see it listed as:

  6c:	cf ef       	ldi	r28, 0xFF	; 255

Why is this coming up as a load immediate instead of an RJMP?!

LDI's opcode is

1110 KKKK dddd KKKK

0b'1110 = 0xE

Shouldn't this instruction start with E, not C?

I posted this question over on AVRFreaks.net and found the answer I was looking for, so I thought it would be prudent to post the answer here as well for people who are seeing the same issue:

http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&p=821010#821010

Turns out the bytes of the instruction are being reversed.

0x0C94  -->  0x940C

It appears that the second word for the double-word instruction is also reversed.

0x3400  -->  0x0034

Then 0x0034 is shifted left by one bit, which gives 0x68. This is done because instructions are 16 bits wide, so a JMP target always lies on an even byte boundary and the last binary digit of its byte address is always zero. By storing the bit-shifted number (the word address) instead, they save a bit and can address twice as much space.

That's how the compiler is generating the JMP 0x68 assembler instruction.
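For anyone following along, the steps above can be sketched in Python. This assumes each 16-bit word is stored low byte first and uses the JMP encoding from the AVR instruction set manual (1001 010k kkkk 110k followed by a 16-bit word of k bits); `decode_jmp` is a hypothetical name for this example.

```python
def decode_jmp(b0, b1, b2, b3):
    """Return the byte address targeted by a 4-byte JMP as stored in the .hex file."""
    w1 = b0 | (b1 << 8)                    # first word, low byte first
    w2 = b2 | (b3 << 8)                    # second word, low byte first
    if w1 & 0xFE0E != 0x940C:              # 1001 010k kkkk 110k
        raise ValueError("not a JMP")
    # The upper 6 address bits live in bits 8..4 and bit 0 of the first word.
    high = ((w1 >> 3) & 0x3E) | (w1 & 1)
    word_addr = (high << 16) | w2
    return word_addr << 1                  # word address -> byte address


print(hex(decode_jmp(0x0C, 0x94, 0x34, 0x00)))  # 0x68
```

On the ATmega328P the upper address bits are always zero, so the second word alone gives the word address.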

I'm still not quite sure how to tell if the bytes have been reversed or not. Is that just for double-word instructions? Is there some documentation I missed that indicates this is how it works?

AVR instructions are 16 bits wide (a few, such as JMP and CALL, take a second 16-bit word). The Intel .hex format is an encoding of an 8-bit byte stream. In order to make a byte stream into 16-bit quantities, you have to decide (arbitrarily) whether you're going to put the low-order byte in the lower address (which sort of makes sense, and is called "little endian"), or whether you're going to put the high-order byte in the lower address (which reads like Arabic numerals, and is called "big endian"). It sounds like your simulator was written assuming that the .hex files were big-endian, but all the normal AVR tools generate little-endian files. See "IEN137: On Holy Wars and a Plea for Peace."
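To make that concrete, here is a small Python sketch (the helper names `words_le` and `decode_ldi` are invented for this example) that pairs up the bytes low byte first and then decodes the result with the LDI pattern 1110 KKKK dddd KKKK quoted earlier in the thread. The input is the CF EF D4 E0 byte sequence from the SparkFun .hex file.

```python
def words_le(data):
    """Pair up a byte stream into 16-bit words, low byte at the lower address."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data) - 1, 2)]


def decode_ldi(word):
    """Decode LDI Rd, K (1110 KKKK dddd KKKK) into (register number, constant)."""
    if word & 0xF000 != 0xE000:
        raise ValueError("not an LDI")
    rd = 16 + ((word >> 4) & 0x0F)               # LDI only reaches r16..r31
    k = ((word >> 4) & 0xF0) | (word & 0x0F)     # high and low nibbles of K
    return rd, k


words = words_le(bytes.fromhex("CFEFD4E0"))
print([hex(w) for w in words])   # ['0xefcf', '0xe0d4']
print(decode_ldi(words[0]))      # (28, 255)
```

Read this way, CF EF is the word 0xEFCF, which starts with E and decodes to ldi r28, 0xFF, matching the avr-objdump output.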

westfw,

Thank you for your reply. You are exactly right. The AVR GCC creates "Little-Endian" (low-byte at the lower address) hex files. I made a quick change to my program and everything started working as expected.
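With the byte order sorted out, the rjmp lines in the avr-objdump listing decode as expected too. A minimal sketch, assuming the 1100 kkkk kkkk kkkk encoding from the datasheet, where k is a signed 12-bit offset counted in words and relative to the instruction after the RJMP (`rjmp_target` is a hypothetical helper name):

```python
def rjmp_target(pc, opcode):
    """Byte address an RJMP jumps to.

    pc is the byte address of the RJMP itself; opcode is the 16-bit word
    after the two bytes have been combined low byte first.
    """
    if opcode & 0xF000 != 0xC000:
        raise ValueError("not an RJMP")
    k = opcode & 0x0FFF
    if k & 0x800:                 # sign-extend the 12-bit offset
        k -= 0x1000
    return pc + 2 + 2 * k         # offset is in words, relative to the next instruction


print(hex(rjmp_target(0x7E, 0xC002)))   # bytes 02 c0 at 0x7e -> 0x84
print(hex(rjmp_target(0xD6, 0xCFF3)))   # bytes f3 cf at 0xd6 -> 0xbe
```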

Is the data in the program memory on the ATMega328 also stored Little-Endian, or does the programmer swap the bytes as it uploads the sketch? I assume the latter, as the ATMega328 datasheet shows the instructions in "Big-Endian" format.

Is the data in the program memory on the ATMega328 also stored Little-Endian

The AVR is consistently little-endian. The relevant quote is in the AVR Instruction Set Reference manual, under the description of the LPM (load program memory) instruction:

Constant byte address is specified by the Z-register contents. The 15 MSBs select word address. For LPM, the LSB selects low byte if cleared (LSB = 0) or high byte if set (LSB = 1).
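For a simulator, the quoted rule might look like the following Python sketch. It assumes flash is modelled as an array of 16-bit words (a design choice for this example, not something the manual requires), and `lpm` is a made-up function name.

```python
def lpm(flash_words, z):
    """Return the byte LPM would load for a given Z-register value."""
    word = flash_words[z >> 1]     # the 15 MSBs of Z select the word
    if z & 1:
        return (word >> 8) & 0xFF  # LSB set: high byte
    return word & 0xFF             # LSB clear: low byte


# First two words of the program above: JMP 0x68 is 0x940C followed by 0x0034.
flash_words = [0x940C, 0x0034]
print([hex(lpm(flash_words, z)) for z in range(4)])
# ['0xc', '0x94', '0x34', '0x0'], the same byte order as in the .hex file
```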

(SPM gets "special", and can only write 16-bits at a time.)
16 bit data in RAM is also little-endian, although I think the LDS instruction is the only place it shows up. Registers too (16 bit pointers like Z have the low bits of the address in the low-numbered register.)
The way numbers are "shown" has little to do with the way that they are stored in memory. x86 and VAX are famous, consistently little-endian architectures. 68k is a famous big-endian architecture. A number of modern RISC architectures (PPC, MIPS, ARM) allow either model of data access, sometimes based on a per-process status bit (one of the reasons that the G4 Macs could do a near-credible emulation of x86 machines was because they could run the x86 emulation processes in little-endian mode even though the core Mac software was big-endian).

It's something you become intimately familiar with when you write networking code, since the "problem" is replicated when it comes to the order of data transmission...