Use 'C' Pointer To Flash Memory In Inline ASM [Solved]

I am currently working on a space- and cycle-constrained project on an Arduino Uno R3. I am having difficulty finding the proper way to load a 16-bit pointer to flash (PROGMEM) into the Z register. An example follows:

#include "Arduino.h"
#include <avr/pgmspace.h>

const uint8_t tbl[256] PROGMEM = {
    0x80,0x83,0x86,0x89,0x8C,0x8F,0x92,0x95
};

int main(void) {
  asm (
    "init:                \n" // Initialize routine.
    " LDI  ZL,lo8(%[tbl]) \n" //Load low byte of pointer to table.
    " LDI  ZH,hi8(%[tbl]) \n" //Load high byte of pointer to table.
  :
  : [tbl] "r" (tbl)  
  : 
  );
}

The build fails with a linker error similar to:

/home/rah/Arduino/sketch_feb12a/sketch_feb12a.ino:16: undefined reference to `r24'
/home/rah/Arduino/sketch_feb12a/sketch_feb12a.ino:16: undefined reference to `r24'

Assistance would be appreciated.

The microcontrollers from the AVR family can not read data from Flash memory and they can not execute code from ram memory.
A pointer to data is always a pointer to ram memory and a pointer (label) to code is always a pointer to Flash memory.

Why do you want to write assembly code ?
What are you going to do with that pointer ? Are you calling one of the pgm_read...() functions ?

AFAIK PROGMEM is the key word that enables data to be stored in Flash to be read later on.

I found this topic and tried to mix a C sketch with an .S file:

The .ino file:

extern "C" uint8_t read_array(uint8_t);

void setup() {
  Serial.begin(250000);
}

void loop() {
 
  for (uint8_t Index = 0; Index < 6 ; Index++)
  {
    read_array(Index);
    Serial.println(read_array(Index));
    delay(500);
  }
}

Then the .S file (note the uppercase S), which you can name as you like, e.g. Essay.S, placed in the same folder as the .ino file:

#include   "avr/io.h"

.global read_array
; input:    r24 = array index, r1 = 0
; output:   r24 = array value
; clobbers: r30, r31
read_array:
ldi r30, lo8(my_array)  ; load Z = address of my_array
ldi r31, hi8(my_array)  ; ...high byte also
add r30, r24            ; add the array index
adc r31, r1             ; ...and add 0 to propagate the carry
lpm r24, Z
ret

.global my_array
.type   my_array, @object
my_array:
.byte 12, 34, 56, 78, 80, 90

Koepel:
The microcontrollers from the AVR family can not read data from Flash memory and they can not execute code from ram memory. A pointer to data is always a pointer to ram memory and a pointer (label) to code is always a pointer to Flash memory.

Thank you for your response.

It's unclear to me if this is correct. I am referencing the following documentation:

"ATmega328P 8-bit AVR Microcontroller with 32K Bytes In-System Programmable Flash"
http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-7810-Automotive-Microcontrollers-ATmega328P_Datasheet.pdf

Section 7.2:

Constant tables can be allocated within the entire program memory address space (see the LPM – load program memory instruction description).

AppNote for the LPM instruction:

"AVR108: Setup and Use of the LPM Instruction"

Koepel:
Why do you want to write assembly code ?
What are you going to do with that pointer ? Are you calling one of the pgm_read...() functions ?

As I wrote in the OP I am currently working on a space and cycle constrained project. I require an exact cycle count in the main loop for verification. In my experience ASM is the preferred way to repeatedly and consistently yield cycle count verification independent of 'C' compiler version and optimization level.

Per the edit to my post (apologies, it's unclear how I posted it before completion) I require the address to the Program Memory where a table (per reference to section 7.2 above) is stored. The PROGMEM mechanism seems like a convenient way to let the C compiler and linker do the work of putting the table where it fits in flash without resorting to linker scripts.

Less work is good. However, I'm missing pieces of how the PROGMEM mechanism and gcc's inline asm syntax could work together. Thus the OP.

I am not using the pgm_* family of functions. Given the cycle count requirement it's preferable to do that in ASM.

For example, according to:

"AVR Instruction Set Manual"

the LPM instruction takes 3 cycles.

I will revisit the source code for the pgm_* functions again and see if I can find an answer there.

Appreciatively,

-Richard

rahealy:
As I wrote in the OP I am currently working on a space and cycle constrained project.

My first reaction to that problem is to throw better hardware at it. More powerful processors with more resources are cheap. Programmers' time is expensive ..... unless you're banging out thousands of units per month.

gfvalvo:
My first reaction to that problem is to throw better hardware at it. More powerful processors with more resources are cheap. Programmers' time is expensive ..... unless you're banging out thousands of units per month.

Hi,
For my motivation please see:

I am currently jobless and homeless. The rules of neoliberal capitalism are not in play for this one. :smiley:

Appreciatively,

-Richard

Might be of help: Arduino Inline Assembly Tutorial (Tables)

The LPM instruction is the way to go. I didn't know you were going that low level :wink:
The pgm_read...() functions are implemented with the LPM instruction.
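
For anyone who does not need hand-written assembly, here is a minimal sketch of a table lookup through pgm_read_byte(), which avr-libc implements with LPM. The helper name table_lookup(), the 8-entry table size, and the non-AVR fallback macros are assumptions added for illustration and off-target testing; they do not come from the posts above.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback (assumption, only so the sketch can be built off-target). */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

/* Same eight values as the table in the original question. */
static const uint8_t tbl[8] PROGMEM = {
    0x80, 0x83, 0x86, 0x89, 0x8C, 0x8F, 0x92, 0x95
};

/* table_lookup() is a hypothetical helper name.
 * The index is masked so it always stays inside the 8-entry table. */
uint8_t table_lookup(uint8_t index)
{
    return pgm_read_byte(&tbl[index & 7]);
}
```

On an AVR target the cycle count of this version depends on what the compiler generates around the LPM, so it would still need to be checked in the disassembly.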

Just in case this is your first rodeo, when you start mixing asm and C, don’t overlook register usage. You’re fine with LPM as it just clobbers Z or R30 & 31. The GCC reference is linked below, the section under registers is helpful when spinning the AVR’s gears without the compiler's knowledge.

https://gcc.gnu.org/wiki/avr-gcc#Register_Layout

rahealy:
Less work is good.

In which case you are approaching the problem backwards. It is easily within the realm of possibility that the compiler generated code will be as good or better than anything you can hand assemble. But, given your necessity for repeatable builds...

  • Get your program working as well as you can in C
  • Dump the ELF into an assembly listing
  • Get the dumped assembly to build and run as well as you can
  • You now have baseline assembly
  • Hand tune the timing / review the execution paths for timing
  • You now have assembly that meets your criteria

In other words, get the C compiler to do the bulk of the work.

That process is how I built this.

  : [tbl] "r" (tbl)

“r” is not the proper way to pass a 16-bit constant from C to asm. Find the “cookbook” for inline asm in avr-libc.

Thank you for your response.
Check. Check. Check. Did all that including rigging a simavr + GDB test harness before uploading to hardware.

I feel my question arises from a lack of specifics that either aren't readily apparent or (much more likely) that I've missed in the objdump output and 'C' source.

At some point in the learning process I feel one should accede to one's current limitations both in terms of patience and time and ask for help. The more esoteric questions are sometimes the most interesting. Why should I have all the fun?

When I do find the answer I'll definitely follow up. :smiley:

Appreciatively,

-Richard

WattsThat:
Just in case this is your first rodeo, when you start mixing asm and C, don’t overlook register usage. You’re fine with LPM as it just clobbers Z or R30 & 31. The GCC reference is linked below, the section under registers is helpful when spinning the AVR’s gears without the compiler's knowledge.

avr-gcc - GCC Wiki

Thank you for the links.

One place I got into trouble with GCC was the 'C' ABI that reserves r18 and above for 'C' things. The workaround was to make sure all inline ASM was in the main loop, runs last after all 'C' initialization, and to declare all uint8_t variables used in the main loop as global and 'register'.

Polling hardware is currently well under the minimum cycle overhead incurred by an interrupt so hardware interrupts are not necessary at this time. This simplifies register and stack concerns considerably.

I haven't gotten as far as observing where the 'C' compiler puts larger global arrays in SRAM (1024 bytes for example). Since the ASM in the main loop is the last and only code to run and doesn't use the stack I don't think it will matter if the 'C' compiler puts the array on the stack or heap but it's on my list of things to check once I get there.

Appreciatively,

-Richard

Here's the cookbook I referenced:
https://www.nongnu.org/avr-libc/user-manual/inline_asm.html

You want your code to look like:

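int main(void) {
  asm (
    "init:                \n" // Initialize routine.
    // The compiler has already loaded the address of tbl into Z (r31:r30).
    // %[tbl] expands to r30, so this copies Z into r25:r24.
    " movw r24, %[tbl]    \n"
  :
  : [tbl] "z" (tbl)
  : "r24", "r25"              // registers this asm overwrites
  );
}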

The "z" constraint will cause the compiler to put the designated thing in the "z" register for you.
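
A minimal sketch of how the "z" constraint can be combined with an output operand, so the compiler both loads Z and knows which register receives the byte. The names read_flash_byte(), demo_tbl and demo_first_byte() are hypothetical, and the non-AVR branch is an assumption added only so the sketch can be built off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback (assumption): no separate flash address space. */
#define PROGMEM
#endif

static const uint8_t demo_tbl[8] PROGMEM = {
    0x80, 0x83, 0x86, 0x89, 0x8C, 0x8F, 0x92, 0x95
};

/* read_flash_byte() is a hypothetical name.
 * "z" asks the compiler to place p in r31:r30 before the asm runs;
 * "=r" lets it choose the destination register for the loaded byte. */
static inline uint8_t read_flash_byte(const uint8_t *p)
{
#ifdef __AVR__
    uint8_t value;
    __asm__ __volatile__ (
        "lpm %[val], Z \n\t"
        : [val] "=r" (value)
        : [ptr] "z" (p)
    );
    return value;
#else
    return *p;   /* plain read when not building for AVR */
#endif
}

/* Returns the first table entry. */
uint8_t demo_first_byte(void)
{
    return read_flash_byte(demo_tbl);
}
```

Because the result is an output operand, no register is hard-coded and no clobber list is needed; the trade-off is that the exact register is the compiler's choice, which may matter if the surrounding asm expects the value in r24.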

Thank you everyone for your help!

First, the 'z' constraint is the key if you want to let the compiler load the 'Z' register for you. Second, in gcc's inline asm, writing '%[tbl]' with an "r" constraint does not give you the address of the table: the operand expands to the name of the register the compiler picked (r24 here), so lo8(%[tbl]) becomes lo8(r24) and the linker reports an undefined reference to `r24'.

Here is a small example. The first '#if' block lets the compiler load the Z register. The second loads it explicitly with LDI, using lo8()/hi8() on the symbol name. The objdump output for both methods follows.

#include "Arduino.h"

//
//sintbl[]
// First 8 bytes of a 256 byte sine table
//
const uint8_t sintbl[8] PROGMEM = {
    0x80,0x83,0x86,0x89,0x8C,0x8F,0x92,0x95
};

//
//main()
// Load first byte of sin table into r24, increment Z. NOP's exist to
// help find in objdump.
//
int main(void) {
#if 1
  asm (
    "compiler_lod_z: \n"
    " NOP            \n"
    " LPM r24, Z+    \n"
    " NOP            \n"
    :
    : [tbl] "z" (sintbl)
    :
  );
#else
  asm (
    "assembler_lod_z:      \n"
    " NOP                  \n"
    " LDI r30, lo8(sintbl) \n" 
    " LDI r31, hi8(sintbl) \n"
    " LPM r24, Z+          \n"
    " NOP                  \n"
    :
    :
    :
  );
#endif
}

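For both examples the table is stored at address 0x0068 (objdump disassembles the table bytes as if they were instructions, so the mnemonics below are meaningless):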

00000068 <__trampolines_end>:
  68:	80 83       	st	Z, r24
  6a:	86 89       	ldd	r24, Z+22	; 0x16
  6c:	8c 8f       	std	Y+28, r24	; 0x1c
  6e:	92 95       	swap	r25

First example (r30,31 are the 'Z' register):

00000088 <main>:
  88:	e8 e6       	ldi	r30, 0x68	; 104
  8a:	f0 e0       	ldi	r31, 0x00	; 0
0000008c <compiler_lod_z>:
  8c:	00 00       	nop
  8e:	85 91       	lpm	r24, Z+
  90:	00 00       	nop

Second example (r30,31 are the 'Z' register):

00000088 <main>:
  88:	00 00       	nop
  8a:	e8 e6       	ldi	r30, 0x68	; 104
  8c:	f0 e0       	ldi	r31, 0x00	; 0
  8e:	85 91       	lpm	r24, Z+
  90:	00 00       	nop

I'm not sure if this behavior will persist but now at least I know what I should be looking for. :slight_smile:
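
One reason the behavior might not persist: in the first example the asm changes Z (via Z+) and r24 without telling the compiler. A minimal sketch of a variation that declares Z as a read-write operand follows. The names lpm_postinc() and read_first_two() are hypothetical, and the non-AVR branch is an assumption added only so the sketch can be built off-target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback (assumption): no separate flash address space. */
#define PROGMEM
#endif

static const uint8_t sintbl[8] PROGMEM = {
    0x80, 0x83, 0x86, 0x89, 0x8C, 0x8F, 0x92, 0x95
};

/* lpm_postinc() is a hypothetical name.
 * "+z" marks the pointer as both input and output, so the compiler
 * knows that "lpm ..., Z+" advances it by one byte. */
static inline uint8_t lpm_postinc(const uint8_t **pp)
{
#ifdef __AVR__
    const uint8_t *p = *pp;
    uint8_t value;
    __asm__ __volatile__ (
        "lpm %[val], Z+ \n\t"
        : [val] "=r" (value), [ptr] "+z" (p)
    );
    *pp = p;
    return value;
#else
    return *(*pp)++;   /* read the byte, then advance the pointer */
#endif
}

/* Reads the first two table bytes; the first byte is the low half. */
uint16_t read_first_two(void)
{
    const uint8_t *p = sintbl;
    uint8_t lo = lpm_postinc(&p);
    uint8_t hi = lpm_postinc(&p);
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

The generated code should still be checked with objdump, since the compiler is free to pick the destination register and to schedule the pointer load wherever it likes.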

Again, thank you for all your help!

Appreciatively,

-Richard
