Compiled inline assembler questions

For my project (a state analyzer built on the Due) I need some assembler code. Looking at the compiled result raised some questions.
In the attached PDF document, my code and the result of compilation (assembler) are shown side by side, with corresponding lines on the same row, left and right. The assembler code uses the syntax for the SAM3X, which is the controller on the Due board.
MyInlineAssembler.pdf (192.3 KB)
Now my questions:

  1. The compiler refuses to use registers r10 and r11. Do these registers have special meanings that prevent their use?

  2. Register r9 should be loaded with the constant value 0x00800000. The compiler first loads r4 with this value and then copies r4 to r9. Why not load r9 directly?

  3. Register r3 holds the address of I/O register PIOC_SODR. R3 is used only for this purpose. But the compiler first copies r3 to fp and then uses fp as address for str and ldr statements. Is it not allowed to use r3 as address, e.g. str r9, [r3] ?

Since you have a whole function written in asm, why not just put it in a .S file so the C compiler doesn’t get a chance to change it?

R10 would normally be a callee-saved register…
Thumb is weird wrt using high-numbered registers, but I think it should be possible on the Due’s Cortex-M3.

@SupArdu
See Anas Kuzechie's how-to: Seek 3m04s

AFAIK, he's wrong - the .S and .ino files do NOT need to have the same name; in fact, you can have multiple .S files.

I have other comments on the assembler, if OP ever returns...
(here it is in code tags, BTW:)

boolean getData(uint8_t* ptdata, uint8_t* ptend) {
  uint32_t st_value=0x80000; // from EPM3064
  boolean retval=true;
  do {
    asm volatile (
      "mov r10, %[PIODSODR] \n\t" // address of PIOD_SODR
      "mov r11, %[PIOCSODR] \n\t" // address of PIOC_SODR
      "mov r9, %[VTICPIN] \n\t" // value for TICPIN, 0x00800000
      "mov r8, (0x400) \n\t" // value for FLAG_CLEAR, 0x400
      "mov r6, %2 \n\t" // ptdata
      "mov r7, %3 \n\t" // ptend
      "str r9, [r11, #0] \n\t" // set TICPIN (bit23 of PortC)
      "1: \n\t" // wait for FLAG
      "ldr r2, [r11, #12] \n\t" // input PORTC from REG_PIOC_PDSR
      "lsls r1, r2, #22 \n\t" // FLAG is bit 9 of register R2
      "bpl 1b \n\t" // jump back to "1:" if FLAG=0
      "lsr r2, r2, #1 \n\t" // shift right r2
      "strb r2, [r6], #1 \n\t" // store shifted r2 to memory
      // confirm read:
      "str r8, [r10, #4] \n\t" // FLAG_CLEAR low, REG_PIOD_CODR
      "str r8, [r10, #0] \n\t" // FLAG_CLEAR high, REG_PIOD_SODR
      // check error flag
      "lsls r1, r2, #13 \n\t" // ERROR_FLAG now (after lsr 1) is bit18 of R2
      "bmi 2f \n\t" // break, end read loop
      "str r9, [r11, #4] \n\t" // clear TICPIN, REG_PIOC_CODR
      "cmp r7, r6 \n\t" // ptdata == ptend?
      "bne 1b \n\t" // jump back to "1:" for next data
      "2: \n\t"
      "nop"
      :
      : [PIOCSODR] "r" (0x400e1230), // (0x400e1230) REG_PIOC_SODR
        [PIODSODR] "r" (0x400e1430), // (0x400e1430) REG_PIOD_SODR
        "r" (ptdata),
        "r" (ptend),
        [VTICPIN] "r" (0x00800000) // value to set/clear TICPIN
      : "cc", "memory"
      );
    // check ERROR_FLAG (force break)
    if (st_value & ERROR_FLAG) {
      break; // end while loop
    }
  } while (ptdata < ptend);
  return(retval);
}
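For readers following the shift-and-branch tests in the listing, here is a plain C++ sketch of the same bit checks. It assumes the bit positions given in the comments (FLAG in bit 9 and ERROR_FLAG in bit 19 of the raw port value, which matches st_value = 0x80000). The function names are made up for illustration and are not part of the OP's code.

```cpp
#include <stdint.h>

// "lsls r1, r2, #22" moves bit 9 into bit 31 (the N flag);
// "bpl 1b" then loops for as long as that bit is 0.
bool flagIsSet(uint32_t pdsr) {
  return ((pdsr << 22) & 0x80000000u) != 0;
}

// "lsr r2, r2, #1" followed by "lsls r1, r2, #13" moves the original
// bit 19 (bit 18 after the shift) into bit 31; "bmi 2f" exits if it is 1.
bool errorIsSet(uint32_t pdsr) {
  uint32_t shifted = pdsr >> 1;
  return ((shifted << 13) & 0x80000000u) != 0;
}

// "strb" stores only the low 8 bits of the shifted value,
// i.e. bits 1..8 of the raw port value.
uint8_t dataByte(uint32_t pdsr) {
  return (uint8_t)(pdsr >> 1);
}
```

Shifting the bit of interest into bit 31 lets a single flag-setting instruction feed the conditional branch, without a separate mask and compare.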

Thank you. I learn something new every day.

@westfw Do you use a macro on your assembler code or put it in a "inline mangler" to insert the double-quotes and <CR><TAB> to create the inline code?

Nice to see my first language (asm) is still being used by some. I started out on IBM 1401, then IBM 360 assembler. I also learned Intel assembler. I have no plans to learn another assembler at age 82 but I do appreciate seeing it used.

The code around the inline assembler part comes from my first tests with C only. For my state analyzer project with the Due I tested the maximum achievable speed of the while loop and saw only 1.3 MHz. Now with assembler 4 MHz is possible!

In the initial C-only version of getData(), ptdata was incremented until it reached ptend. In this assembler version I forced a return by setting st_value so that the break at the end is taken. This of course is not the final solution :innocent:

BTW: I was astonished by the compiler's optimization. The resulting assembler code for getData() contains nothing about the definition or use of the variables st_value and retval. Indeed, if the code around the assembler part were deleted, it would still work.

Hmm. When I've written inline assembler, it's usually pretty short bits of code, and I think I usually just manually insert them, or use some minor editor magic (query replace to add the \n\t or a keyboard macro.) (helps to have a better editor, like EMACS!)

The code I put above was copy/pasted from the OP's .pdf file.


Comments on the code:
st_value and retval are never modified.

ptdata and ptend should already be in (low) registers, as per EABI (or would be, if this were a .S function actually called.)

Using "r" as an argument type for the inline asm should mean that PIOCSODR, PIODSODR, ptdata, ptend, and VTICPIN are already in registers; you shouldn't have to move them into different explicit registers. Theoretically, anyway. I'm not sure what happens if you run out of registers.
This allows the compiler to do register allocation for you, which (in theory) is a big win of using inline asm vs a .S file.
It'd look like:

      "ldr r2, [%[PIOCSODR], #12] \n\t" // input PORTC from REG_PIOC_PDSR

Using high registers (r8-r11) means those will assemble to 32-bit Thumb-2 instructions instead of (maybe) 16-bit instructions, which one would THINK would have a performance impact, especially since they're in the inner loop. But that may be "complicated" and have more to do with flash instruction fetches/caching than instruction timing.


Isn't that duplicated by the C do..while loop you have around the asm?


I've re-written (I think. Untested, obviously) your inline asm, and the inner loop now looks like:

   801b0:       68ea            ldr     r2, [r5, #12]
   801b2:       0591            lsls    r1, r2, #22
   801b4:       d5fc            bpl.n   801b0 <getData2(unsigned char*, unsigned char*)+0x10>
   801b6:       ea4f 0252       mov.w   r2, r2, lsr #1
   801ba:       f800 2b01       strb.w  r2, [r0], #1
   801be:       6063            str     r3, [r4, #4]
   801c0:       6023            str     r3, [r4, #0]
   801c2:       0351            lsls    r1, r2, #13
   801c4:       d402            bmi.n   801cc <getData2(unsigned char*, unsigned char*)+0x2c>
   801c6:       606a            str     r2, [r5, #4]
   801c8:       4281            cmp     r1, r0
   801ca:       d1f1            bne.n   801b0 <getData2(unsigned char*, unsigned char*)+0x10>

Do you want to see it (or should you)? Or would you rather play with it some more yourself? :slight_smile:

st_value and retval are never modified.

Yes, never modified, but st_value is used in an if statement. As mentioned in post #7, these are relics of my initial C-only version.

The assembler part is my first attempt at inline assembler. I derived it from examples in documents I was pointed to after asking in forums, without really understanding how the compiler handles inline assembler. All the examples were for AVR and x86 assembler.

I analyzed the assembler the compiler had generated from my C-only version and wanted to speed it up by avoiding repeatedly loading addresses and values to registers within the inner loop. My idea was first to store all addresses and values needed in the inner loop to registers and then use the registers in the inner loop because register accesses are faster than memory accesses.

I appreciate your effort to explain inline assembler backgrounds and to write a new version. Therefore I'd like to see your initial inline assembler, e.g. how r5 and r4 are filled.
I learned that ptdata and ptend are handed over in r0 and r1. In your assembler r1 (=ptend) is modified in the second statement (lsls ...) but it should be preserved for comparison with incremented r0 at the end.
I hope I can get a very fast inner loop with your hints and examples.
And now I have a better understanding how to use inline assembler, thank you very much.

This still has the semi-unused variables, and the redundant check of ptdata in the C code...

boolean getData2(uint8_t* ptdata, uint8_t* ptend) {
  uint32_t st_value = 0x80000; // from EPM3064
  boolean retval = true;
  do {
    asm volatile (
      "str %[VTICPIN], [%[PIOCSODR], #0] \n\t" // set TICPIN (bit23 of PortC)
      "1: \n\t" // wait for FLAG
      "ldr r2, [%[PIOCSODR], #12] \n\t" // input PORTC from REG_PIOC_PDSR
      "lsls r1, r2, #22 \n\t" // FLAG is bit 9 of register R2
      "bpl 1b \n\t" // jump back to "1:" if FLAG=0
      "lsr r2, r2, #1 \n\t" // shift right r2
      "strb r2, [%[PTDAT]], #1 \n\t" // store shifted r2 to memory
      // confirm read:
      "str %[FLAGCLR], [%[PIODSODR], #4] \n\t" // FLAG_CLEAR low, REG_PIOD_CODR
      "str %[FLAGCLR], [%[PIODSODR], #0] \n\t" // FLAG_CLEAR high, REG_PIOD_SODR
      // check error flag
      "lsls r1, r2, #13 \n\t" // ERROR_FLAG now (after lsr 1) is bit18 of R2
      "bmi 2f \n\t" // break, end read loop
      "str %[VTICPIN], [%[PIOCSODR], #4] \n\t" // clear TICPIN, REG_PIOC_CODR
      "cmp %[PTEND], %[PTDAT] \n\t" // ptdata == ptend?
      "bne 1b \n\t" // jump back to "1:" for next data
      "2: \n\t"
      "nop"
      :
      : [PIOCSODR] "r" (0x400e1230), // (0x400e1230) REG_PIOC_SODR
      [PIODSODR] "r" (0x400e1430), // (0x400e1430) REG_PIOD_SODR
      [PTDAT] "r" (ptdata),
      [PTEND] "r" (ptend),
      [VTICPIN] "r" (0x00800000), // value to set/clear TICPIN
      [FLAGCLR] "r" (0x400)
      : "cc", "memory"
    );
    // check ERROR_FLAG (force break)
    if (st_value & ERROR_FLAG) {
      break; // end while loop
    }
  } while (ptdata < ptend);
  return (retval);
}

And the full assembler:

000801a0 <getData2(unsigned char*, unsigned char*)>:

boolean getData2(uint8_t* ptdata, uint8_t* ptend) {
   801a0:       b430            push    {r4, r5}
      [PTDAT] "r" (ptdata),
      [PTEND] "r" (ptend),
      [VTICPIN] "r" (0x00800000), // value to set/clear TICPIN
      [FLAGCLR] "r" (0x400)
      : "cc", "memory"
    );
   801a2:       4d0d            ldr     r5, [pc, #52]   ; (801d8 <getData2(unsigned char*, unsigned char*)+0x38>)
   801a4:       4c0d            ldr     r4, [pc, #52]   ; (801dc <getData2(unsigned char*, unsigned char*)+0x3c>)
   801a6:       f44f 0200       mov.w   r2, #8388608    ; 0x800000
   801aa:       f44f 6380       mov.w   r3, #1024       ; 0x400
   801ae:       602a            str     r2, [r5, #0]
   801b0:       68ea            ldr     r2, [r5, #12]
   801b2:       0591            lsls    r1, r2, #22
   801b4:       d5fc            bpl.n   801b0 <getData2(unsigned char*, unsigned char*)+0x10>
   801b6:       ea4f 0252       mov.w   r2, r2, lsr #1
   801ba:       f800 2b01       strb.w  r2, [r0], #1
   801be:       6063            str     r3, [r4, #4]
   801c0:       6023            str     r3, [r4, #0]
   801c2:       0351            lsls    r1, r2, #13
   801c4:       d402            bmi.n   801cc <getData2(unsigned char*, unsigned char*)+0x2c>
   801c6:       606a            str     r2, [r5, #4]
   801c8:       4281            cmp     r1, r0
   801ca:       d1f1            bne.n   801b0 <getData2(unsigned char*, unsigned char*)+0x10>
   801cc:       bf00            nop
  do {
   801ce:       4288            cmp     r0, r1
   801d0:       d3ed            bcc.n   801ae <getData2(unsigned char*, unsigned char*)+0xe>
    if (st_value & ERROR_FLAG) {
      break; // end while loop
    }
  } while (ptdata < ptend);
  return (retval);
}
   801d2:       2001            movs    r0, #1
   801d4:       bc30            pop     {r4, r5}
   801d6:       4770            bx      lr
   801d8:       400e1230        .word   0x400e1230
   801dc:       400e1430        .word   0x400e1430

That's fine. It helped me a lot.

Now I have a (hopefully) final question:
Within the loop there is a check for an error flag. If the error flag is set then the loop is ended (break). I'd like to have a function getData() with only the assembler part in it. And the return value shall be the difference between registers r0 and r1, i.e. when all data have been read this difference will be 0 and when there was an error before the difference will be greater than 0.

How can I force the compiler to generate a return value without having a C statement "return()"?
Is it sufficient to load r0 at the end with the difference?
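One way to think about the behaviour described above is a plain C++ model of the loop in which the return value is simply computed in C++. This is only a sketch: getDataModel and its pointer parameters are hypothetical names, the register offsets and constants are copied from the listings in this thread, and it says nothing about the speed the hand-written assembler achieves.

```cpp
#include <stdint.h>
#include <stddef.h>

// Word offsets relative to PIO_SODR, matching the byte offsets
// #0, #4 and #12 used in the listings.
enum { SODR = 0, CODR = 1, PDSR = 3 };

// Hypothetical model of the assembler loop; pioc and piod point at the
// SODR register of the respective port.
ptrdiff_t getDataModel(volatile uint32_t* pioc, volatile uint32_t* piod,
                       uint8_t* ptdata, uint8_t* ptend) {
  const uint32_t TICPIN     = 0x00800000u;  // bit 23 of port C
  const uint32_t FLAG_CLEAR = 0x400u;
  const uint32_t FLAG       = 1u << 9;
  const uint32_t ERROR_FLAG = 0x80000u;     // bit 19 of the raw value

  pioc[SODR] = TICPIN;                      // set TICPIN once, before label 1
  do {
    uint32_t in;
    do {
      in = pioc[PDSR];                      // read port C
    } while (!(in & FLAG));                 // wait for FLAG
    *ptdata++ = (uint8_t)(in >> 1);         // store the shifted byte
    piod[CODR] = FLAG_CLEAR;                // FLAG_CLEAR low
    piod[SODR] = FLAG_CLEAR;                // FLAG_CLEAR high
    if (in & ERROR_FLAG) break;             // leave the loop on error
    pioc[CODR] = TICPIN;                    // clear TICPIN
  } while (ptdata != ptend);

  return ptend - ptdata;                    // 0 when the buffer was filled
}
```

On the Due the two pointers would presumably be (volatile uint32_t*)0x400e1230 and (volatile uint32_t*)0x400e1430. Passing them in as parameters also lets the logic be exercised against ordinary arrays on a PC.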

Hi, I just tried filling PTDAT at the end and it works!

My function now looks like this:

boolean getData(uint8_t* ptdata, uint8_t* ptend) {

    asm volatile (
      "str %[VTICPIN], [%[PIOCSODR], #0] \n\t" // set TICPIN (bit23 of PortC)
      "1: \n\t" // wait for FLAG
      "ldr r2, [%[PIOCSODR], #12] \n\t" // input PORTC from REG_PIOC_PDSR
      "lsls r3, r2, #22 \n\t" // FLAG is bit 9 of register R2
      "bpl 1b \n\t" // jump back to "1:" if FLAG=0
      "lsr r2, r2, #1 \n\t" // shift right r2
      "strb r2, [%[PTDAT]], #1 \n\t" // store shifted r2 to memory
      // confirm read:
      "str %[FLAGCLR], [%[PIODSODR], #4] \n\t" // FLAG_CLEAR low, REG_PIOD_CODR
      "str %[FLAGCLR], [%[PIODSODR], #0] \n\t" // FLAG_CLEAR high, REG_PIOD_SODR
      // check error flag
      "lsls r3, r2, #13 \n\t" // ERROR_FLAG now (after lsr 1) is bit18 of R2
      "bmi 2f \n\t" // break, end read loop
      "str %[VTICPIN], [%[PIOCSODR], #4] \n\t" // clear TICPIN, REG_PIOC_CODR
      "cmp %[PTEND], %[PTDAT] \n\t" // ptdata == ptend?
      "bne 1b \n\t" // jump back to "1:" for next data
      "2: \n\t"
      "subs %[PTDAT], %[PTEND], %[PTDAT]"
      :
      : [PIOCSODR] "r" (0x400e1230), // (0x400e1230) REG_PIOC_SODR
        [PIODSODR] "r" (0x400e1430), // (0x400e1430) REG_PIOD_SODR
        [PTDAT] "r" (ptdata),
        [PTEND] "r" (ptend),
        [VTICPIN] "r" (0x00800000), // value to set/clear TICPIN
        [FLAGCLR] "r" (0x400)
      : "cc", "memory", "r2", "r3"
    );

}

I made some changes to preserve r1 until the end and to prevent the compiler from using r2 for the value 0x800000, because r2 holds the content of the PIO register. See the last line:

: "cc", "memory", "r2", "r3"

The function returns the difference between PTEND and PTDAT in PTDAT.
From the final assembler I saw that the return value is transferred in register r0. I also saw that ptdata is handed over to the function in r0.
Only with this knowledge it works.
I guess a better, because more transparent, solution would be
"subs r0, %[PTEND], %[PTDAT]"
which also works.

Ah. I didn't touch your "clobber" line. I'm not sure it's needed if you let the compiler allocate all of the registers (which I didn't do for r1 and r2, I guess.)

I would have been inclined to make [PTDAT]/ptdata an output of the asm() statement, and calculated the return value in C (return ptend-ptdata), except I'm not quite sure how to set it

And I still think it would have been "cleaner" implemented in a .S file.
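For what it's worth, GCC's extended asm syntax marks an operand that is both read and written with "+r" in the output list, and "&" (early clobber) keeps it out of the registers used for the inputs. Below is a minimal sketch on an unrelated toy loop, not the OP's function. packShifted and its parameters are hypothetical, the Thumb-2 path has not been run on a Due, and the #else branch is a portable fallback so that the file also compiles off-target.

```cpp
#include <stdint.h>
#include <stddef.h>

// Hypothetical helper: stores (word >> 1) as one byte per source word
// until dst reaches end, then returns the number of bytes left unwritten.
ptrdiff_t packShifted(const uint32_t* src, uint8_t* dst, uint8_t* end) {
  if (dst == end) return 0;             // the asm loop runs at least once
#if defined(__thumb2__)
  uint32_t tmp;                         // scratch register picked by the compiler
  asm volatile (
    "1: \n\t"
    "ldr %[TMP], [%[SRC]], #4 \n\t"     // load word, post-increment SRC
    "lsr %[TMP], %[TMP], #1 \n\t"       // shift right by one
    "strb %[TMP], [%[DST]], #1 \n\t"    // store low byte, post-increment DST
    "cmp %[DST], %[END] \n\t"           // reached the end?
    "bne 1b \n\t"
    : [DST] "+&r" (dst),                // read and written by the asm
      [SRC] "+&r" (src),                // read and written by the asm
      [TMP] "=&r" (tmp)                 // write-only scratch
    : [END] "r" (end)
    : "cc", "memory"
  );
#else
  // Portable fallback with the same effect (assumption: used off-target only).
  while (dst != end) *dst++ = (uint8_t)(*src++ >> 1);
#endif
  return end - dst;                     // dst holds the updated pointer here
}
```

Because dst is an output operand, the C++ code after the asm sees the advanced pointer, so the return value can be computed with an ordinary return statement and no register numbers need to be assumed. The scratch value lives in an "=&r" output instead of a hard-coded r2 with a clobber entry.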

I would have been inclined to make [PTDAT]/ptdata an output of the asm() statement

I agree with making [PTDAT] an output because the assembler changes its value. But when I tried it, nothing worked at all. The assembler generated by the compiler shows that the variable ptdata is now passed in r1 instead of r0, and r0 is used to hold the PIOD address.

My old code, which works fine, relies on knowing that ptdata and ptend are passed in r0 and r1. Nothing in the code states this.
I guess there are rules in CMSIS or somewhere else defining how parameters are passed to and from functions. I don't know these rules.

No matter, I have a working solution and I'm happy with it.
