Inline-assembler, efficiency, and frame pointer

I have been trying for a while to optimize an expensive loop that polls a sensor and processes the readings. I am now quite close to a fully working solution. Replacing some small inner loops with inline assembler was the breakthrough.

My main problem now is that all assembler instructions with an immediate operand (ldi, cpi, subi, ...) only work on the 16 upper registers (r16 to r31), which run out pretty fast when using 16-bit variables.

I also noticed that r28:r29 are occupied by the so-called "frame pointer", which points to variables on the stack. But as I avoided all such variables, the resulting .s file contains no access to the frame pointer anywhere in my function. Nonetheless, the compiler refuses to use these two registers, so they are wasted.

Is there any flag, pseudo comment, or other trick to persuade the compiler not to reserve r28:r29 as the frame pointer?

void setup( void )
{
  asm volatile
  (
    "ldi   r28, 0x00"                                 "\n\t"
    "ldi   r29, 0x00"                                 "\n\t"
    : // Outputs
    : // Inputs
    : // Clobbers
  );
}

void loop( void )
{
}
Binary sketch size: 448 bytes (of a 32,256 byte maximum)

Works for me.

For me, too! Great, thanks!

Trying for a while to optimize an expensive sensor-requesting and processing loop.

As I like optimizing, can you post the C++ version of the code (plus the timing requirement)?
Maybe it can be optimized enough without use of assembler.

Thanks for your offer.

Actually, I guess it will be difficult without an exact explanation of what's going on. But OK, here is the content of a small inner loop:

void lin_to_inv_quasi_log( unsigned short v )
{
  byte e = highByte( v );

  if ( e == 0 ) {
    e = lowByte( v );
    if ( e >= 128 ) {            // Most frequently used
      e >>= 1;
      e += (64-32);
    } else if ( e >= 32 ) {      // 2nd most
      e -= 32;
    } else {
      e = 0;                     // Least frequently used
    }
  } else if ( e == 1 ) {
    e = (byte)(v>>2);
    e += (128-32);               // 3rd most
  } else if ( e == 2 ) {
    e = (byte)(v>>3);
    e += (192-32);
  } else {
    e = 255;
  }
  while ( !( UCSR0A & (1<<UDRE0) ) );  // Wait until the UART data register is empty
  UDR0 = e;                            // Send the result byte
}

Wouldn't a lookup be quicker?

A really fast lookup table would have one entry per possible value of v, i.e. 64 kBytes. If I added "if ( v > 759 ) e = 255; else e = lookup_table[v];" this would reduce the size to 760 bytes (indices 0 to 759), which would be OK.
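A minimal sketch of that reduced table, assuming it lives in SRAM and is filled once at start-up. The names quasi_log_ref, quasi_log_table, quasi_log_table_init and quasi_log_lookup are made up for illustration, and the reference function is the same arithmetic as lin_to_inv_quasi_log() but returns the byte instead of writing it to UDR0, so the mapping can also be checked on a PC:

```cpp
#include <stdint.h>

// Branchy reference mapping: same arithmetic as lin_to_inv_quasi_log(),
// but the result is returned instead of being sent over the UART.
static uint8_t quasi_log_ref(uint16_t v)
{
  uint8_t hi = (uint8_t)(v >> 8);
  uint8_t e;

  if (hi == 0) {
    e = (uint8_t)v;
    if (e >= 128)     e = (uint8_t)((e >> 1) + (64 - 32));
    else if (e >= 32) e = (uint8_t)(e - 32);
    else              e = 0;
  } else if (hi == 1) {
    e = (uint8_t)((uint8_t)(v >> 2) + (128 - 32));
  } else if (hi == 2) {
    e = (uint8_t)((uint8_t)(v >> 3) + (192 - 32));
  } else {
    e = 255;
  }
  return e;
}

// Hypothetical table covering v = 0..759; every larger v maps to 255.
static const uint16_t QUASI_LOG_TABLE_SIZE = 760;
static uint8_t quasi_log_table[QUASI_LOG_TABLE_SIZE];

// Fill the table once, e.g. from setup().
static void quasi_log_table_init(void)
{
  for (uint16_t v = 0; v < QUASI_LOG_TABLE_SIZE; ++v) {
    quasi_log_table[v] = quasi_log_ref(v);
  }
}

// One range check, then a plain indexed load.
static uint8_t quasi_log_lookup(uint16_t v)
{
  return (v > 759) ? 255 : quasi_log_table[v];
}
```

Whether this beats the branchy version on the AVR depends on the code the compiler generates for the range check and the indexed load, so it is worth checking the .s file.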

But then the question of which assembler code is faster comes up.

For the most frequently used branch, the assembler version of my solution needs only 2 cpi, 2 br.., 1 lsr, 1 subi = 6 CPU cycles.

The small table-lookup solution above takes ... uh ... 1 cpi (high byte only, which means the table has to cover v up to 767, i.e. 768 entries), 1 br.., one 16-bit add of the base address in Z onto v (AVR has no register-to-register word add, so this is an add/adc pair; v sits in a pointer register pair and is not needed afterwards), and 1 ld r,v (2 cycles from SRAM).
That's 6 CPU cycles.
Benefits: 1) every value takes the same time to compute, 2) very simple code, 3) any function could easily be implemented
Shortcomings: 1) much more memory occupied, 2) an initialization function for the lookup table is necessary, 3) an additional index register is occupied