function optimization - wind direction - open for ideas

Some time ago I wrote a usefull function for compass heading to a compass string (part of an yet unfinished library). It is quite straightforward.

char direction[17][4] = {"N","NNE","NE","ENE","E","ESE","SE","SSE","S","SSW","SW","WSW","W","WNW","NW","NNW","N"}  // 68 bytes
char * degrees2direction(int degrees)
{
   degrees = degrees % 360;
   int idx = (degrees*10 + 112)/225;
   return direction[idx];
}

As there is quite some repetition in the direction array it is possible to code the array more efficiently. E.g. "NE" is part of "NNE", etc. The array

char s[] = "NNENESSESESSWSWNNWNW";

contains all possible directions.

Any ideas how to code this simple function more efficiently? => goal is to minimize the footprint of array AND code

TIA,

Does the function need to tolerate negative angles?

This is just a random thought.
By packing the data into the smallest space, you can get away with 21 bytes used in the array rather than 68

#define L_N  0
#define L_E  1
#define L_S  2
#define L_W  3
#define MAKE_NAME( count, a, b, c )  ( ( ( char ) count << 6 ) || ( ( char ) a << 4 ) || ( ( char ) b << 2 ) || ( ( char ) c ) )

char name[] = { 'N', 'E', 'S', 'W' };

char data[] = { 
  MAKE_NAME( 1, L_N, 0, 0 ),        
  MAKE_NAME( 3, L_N, L_N, L_E ),
  MAKE_NAME( 2, L_N, L_E, 0 ),
  MAKE_NAME( 3, L_E, L_N, L_E ),
  ...
  MAKE_NAME( 3, L_N, L_N, L_W ),
  MAKE_NAME( 1, L_N, 0, 0 )
 };
//Once idx is worked out
char result[] = { 0, 0, 0, 0 }; //EDIT: left out last 0 for null terminator, added!
char *ptr = result;

switch( ( data[ idx ] >> 6 ) & 0x3 ){
  case 3:
     *ptr++ = name[ data[ idx ] & 0x3 ];
    
  case 2:
    *ptr++ = name[ ( data[ idx ] >> 2 ) & 0x3 ];

  case 1:
    *ptr = name[ ( data[ idx ] >> 4 ) & 0x3 ];
}

This is an untested rush due to lack of time, I have a suspicion it retrieves the name backwards, if so just reverse the packing macro

@coding badly
It is primary meant to get readings from a digital compass [0…359] in a lib. So I prefer to remove the % operator to.

Negative values are handled quite easily, see code below. -400 as example.

int normalize(int deg)   // -400
{
  bool neg = false;
  if (deg < 0)   // true
  {
    neg = true;   // true
    deg = - deg;   // 400
  }
  deg %= 360;    // 40
  if (neg) deg = 360 - deg;  // 320 
  return deg;  //320
}

What I am searching at is some code that is just smaller.

My current ideas: (non of them elaborated yet)

  1. Condensed array
    The array char s = “NNENESSESESSWSWNNWNW”; is smaller but need 17 pairs of (startpos, length)
    Total 21 + 34 = 53 bytes for data. (15 bytes less)
    => notice the pattern in the string s = 4x pattern “AABAB” for every quadrant. This indicates that folding the array is an option and the code should determine the quadrant. I could try this folding first on the original code. I should work there too.

  2. Condensed array II
    similar to 1 but with \0 's in the array, so one only need to store the 17 start positions.
    The array char s = “NNNEENE*”; ==> 10 chars * 4 quadrants => 40
    Total: 40 + 17 = 57 bytes. a few more bytes but the code is expected to be simpler than (1)

  3. bitfields,

  • There are 4 possible direction chars (NEWS) can be coded in 2 bits
  • the longest direction is 3 chars ==> 3x2 bit = 6 bit.
  • length is 1,2,3, can be coded in 2 bits.

So any direction can be coded in a byte. - assume char d[4] = { ‘N’,‘E’, ‘S’, W’};
[01000000] = length=1 char=0; => d[0] = N
[03010001] = length=3 char= 1 0 1 => d[1] d[0] d1] ==> ENE
etc
This coding scheme compresses the strings effectively a factor 4. Which is interesting for logging / communication.

Total: To code all 17 strings I need 17 bytes + 4 chars == 21 bytes. + what the compiler makes of the bitfields…

It is just one of those open ends I am working on…

@pYro_65 Great idea, equals the bitfields idea above, but you have elaborated it quite a bit AND it does not depend on the compiler generated bitfield code ( which is probably similar)

Furthermore I like the unpacking code, the loop solved with the fall through in the switch!

Thanks,

No worries, switch statements can be very useful in certain circumstances.

Be careful.

c99 - Bit Fields.
An implementation may allocate any addressable storage unit large enough to hold a bitfield.
If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit.
If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined.
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.
The alignment of the addressable storage unit is unspecified.

Also note that the standard says that a “bit-field shall have a type that is a qualified or unqualified
version of int, unsigned int, or signed int”, so having a bit-field in a char type is non-standard.

As the arduino has a 1 byte boundary, should be fine for addressing.

Not too sure what you mean by

elaborated it quite a bit

All of it is necessary, MAKE_NAME boils down to a compile-time literal

Also #defines are a good thing, using them in place of literal values helps keep your frequently changed variables in one spot ( may never change ),

typing mistakes are avoided by reusing the one #define for many locations, as a literal will compile but a undefined ( misspelled ) #define will cause errors.

I was wander, why 17 elements, do you need repeating "N"? 16 is "golden" number :) String better to save in PROGMEM. And math could be replace with map:

{ degrees = degrees % 360; int idx = map (degrees, 0, 359, 0, 15); return direction[idx]; }

  • not tested.

Hi, here is another method (it gives the same result)

char direction1[] = {'N','E','S','S','W','N'};
char direction2[][3] = {"NE","SE","SW","NW"};

char *degrees2direction1(int degrees)
{
   static char ret[4];
   
   degrees = degrees % 360;
   byte idx = (degrees*10 + 112)/225;

   byte c1 = idx/3;
   byte c2 = idx/4;
   
   if( idx == 11 )
   {
      c1++;
   }

   switch( idx % 4 )
   {
      case 0:
         sprintf(ret, "%c", direction1[c1]);
         break;
      case 1:
      case 3:
         sprintf(ret, "%c%s", direction1[c1], direction2[c2]);
         break;
      case 2:
         sprintf(ret, "%s", direction2[c2]);
         break;
   }

   return ret;
}

@Magician

I was wander, why 17 elements, do you need repeating “N”? 16 is “golden” number

Because 0…22,5 == N and 338,5…360 == N

But I could go with 16 elements if I rotated 22.5 degrees CW before doing the lookup … interesting thought

{
degrees = degrees % 360;
int idx = map (degrees, 0, 359, 0, 15);
return direction[idx];
}

the map function internally truncates causing 359 map on WNW iso N
That is why I added 112 and divided by 225 to get rounding every 22.5 degree -
OK that is not exact either but within less than half a degree and kept the math within 16 bits.

Thanks for the inspiration for the 22.5 shift/rotate,

update –
PROGMEM, → would not decrease the footprint sec but would keep the RAM free, good idea too.

@pYro_65

Not too sure what you mean by Quote elaborated it quite a bit

I meant your code was allmost compiler ready, ("worked out quite far", is that a better phrase?, English is my 2nd language after C, eh Dutch)

@ea123 A new approach: concatenate "2 levels" partial strings. Did not think of that yet! Very good!

Could this technique be extended to 3 levels? The switch would get 8 cases I guess.....

@ea123 A new approach: concatenate "2 levels" partial strings. Did not think of that yet! Very good!

Could this technique be extended to 3 levels? The switch would get 8 cases I guess.....

if you write the sequence of possible directions: N N-NE NE E-NE E E-SE SE S-SE ...

you will se that there are two "levels" of strings: the initial letter ('N','S'...) and a pair ("NE", "SE",..). The sequence is not very simmetric because of the "N-xx" repetition, for this I added the condition

   if( idx == 11 )
   {
      c1++;
   }

probably by shifting the input angle as suggested in a post this can be eliminated.

Because 0..22,5 == N and 338,5..360 == N

But I could go with 16 elements if I rotated 22.5 degrees CW before doing the lookup .... interesting thought

Probably, you mean 0..11,25 == N and 348,75 == N . You are right, it's easier to add 17-th element to have deal with offset, than try to rotate cycle in negative area. Is it crucial to keep integer math, why not int idx = ((float)degrees + 11.25) / 22.5 ;

Probably, you mean 0..11,25 == N and 348,75 == N .

You are 100% right, it is +- 11.25 not 22.5, but you got the idea why I added the extra entry.

You are right, it's easier to add 17-th element to have deal with offset, than try to rotate cycle in negative area.

rotate is quite easy - see code below. The advantage of 17th element is that the var named degrees are real degrees, not "transformed" ones. (easier to maintain/debug)

char direction[16][4] = {"N", "NNE","NE","ENE","E","ESE","SE","SSE","S","SSW","SW","WSW","W","WNW","NW","NNW"}  // 64 bytes
char * degrees2direction(float degrees)
{
  degrees += 11.25;
  degrees = degrees % 360;
  int idx = (degrees+11.25)/22.5;
  return direction[idx];
}

It s not cruciual to have int math, but I like the smaller footprint and the speed of floatless code. Most digital compasses have an accuracy of ~1 degree, so ints are quite natural imho. I could do all internal math in 1/10 of a degree (as I did with the index formula) as the values will still fit in an int. But yes, floats would be more accurate.

goal is to minimize the footprint of array AND code

Does that include hidden / helper functions?

For example, because your function uses modulus, division, and multiplication, helper functions are included in the program. The helpers add about 150 bytes of code. However, if other code causes them to be included, they are essentially free for your function.

Good question CB,

I may assume that the division and multiplication is available as not much sketches can do without. modulo is less often used, so that one might be counted as extra - but I wanted to remove that anyway, see somewhere above -

Think it should include just all as in the end there is limited flash/ram, so for now given that the function has the signature char *windrose(int degrees)

which version adds the least to

void setup() { Serial.begin(9600); Serial.println(windrose(100)); }

loop(){};

There are some advantages if the caller provides storage…

char *windrose(int degrees, char buffer[4])   
{
  // stuff
  return( &buffer[0] );
}

Acceptable?

Acceptable?

Any idea is acceptable, I am allways open for ideas, does not mean it will get into it.

In the line of your idea, some people might say “N, NE, E, SE, S, SW, W, NW” or even NESW is good enough for me. that would mean additional params, ==> next version :slight_smile:

Currently I’m busy implementing the variations I have - 7 so far,

  1. first to get them right <<<< currently mostly here >
  2. get them fast & small << and a bit time is spend here >

fun!

No need for passing a buffer, ( it will reside in stack space anyway unless global or static ).

uint32_t &windrose( int degrees )   //EDIT: left out reference in return
  {
    uint32_t u_Return;
    char *buffer = ( char* ) &u_Return;
    
    //do stuff with buffer
    return u_Return;
  } 

void loop()
  {
    //By assigning the return value to a 'constant reference' the lifetime of the temporary ( return value ) is extended to that of the constant reference
    const uint32_t &u_Data = windrose( 5 );
    char *buffer = ( char* ) &u_Data;   

    return;
  }

This method prevents formal parameters being pushed and popped off the stack in the function call, and the data space held by the constant reference is the actual function return value space so no temporaries are copied either.

By putting the buffer and 32-bit value in a union you can return and access a single object.

union WindDir{
  uint32_t u_Data;
  char c_Buffer[4];
}

For speed, if you use the 'idx % 4' version use the formula 'numerator & (denominator - 1)' where denominator is 2^n ( 1, 2, 4, 8,... )

Here’s something to play with:

char *dir2s(char *b,unsigned char d)
{  unsigned char *p = b,
      x = "\x01\x43\x12\x47\x05\x67\x1A\x6B\x09"
          "\xEB\x3A\xEF\x0D\xCF\x32\xC3\x01"[d],
      n = 3 & x;
   while (n--) *p++ = "NESW"[3 & (x >>= 2)];
   *p = '\0';
   return b;
}

b is a pointer to where you want the direction placed, and d is your 0-16 direction number. If you’d like a short (non-Arduino) test program:

#include <stdio.h>
int main(void)
{  unsigned char b[4],i;
   for (i=0; i<17; i++) puts(dir2s(b,i));
   return 0;
}

(tested, test program added)

Thank you all for your ideas/remarks/optimizations etc. Most inspiring to see such different approaches for such a “simple” function. Learned new tricks and rediscovered old ones!

(Nice short one Morris! added it last minute)

I added all the algorithms (some refactored/rewritten a bit to fit the test) to the testprogram below. Also added 2 other ideas (6&7).
I did not include all optimizations proposed, esp. those about allocating buffers by the calling code.

The sketch tests for speed and correctness. For the speed I do 360 calls so every code path is used several times. For correctness I compare the output with my original code. I have not tested for size yet, maybe later today / tommorow.

Windrose testcode is attached as zip.

output of the program

Windrose speedtest 0.1

7240 	 20
7372 	 20
9216 	 25
21804 	 60
9956 	 27
8004 	 22
7280 	 20
8488 	 23

check output
0: 	 N 	 N 	 N 	 N 	 N 	 N 	 N 	 N
...
359: 	 N 	 N 	 N 	 N 	 N 	 N 	 N 	 N

(all disclaimers apply :slight_smile: -some code can be optimized more

Some analysis:

  • 360 calls to windrose() differ between 7240 and 21804 micros ==> 20 to 60 micros per call. (approx factor 3)
  • The 21804 uses sprintf() which is expensive and can easily be rewritten without sprintf ==> 9956 micros
  • I expected windrose2() which uses the packed array to be faster especially as Morris code which also uses a packed array is ~10% faster.

Again thanks,

windrose.zip (1.69 KB)