performance of map() function - an analysis

// just to share

Today I was playing with the map function, including looking at effects of integer overflow and how to prevent them.

If one uses large values the internal (integer) math can overflow causing serious trouble

 long x = map(i, 0, 1000000, 1000000, 100000000);

A solution to this is doing the math in float and loose some precision. Because float math is considered slower than integer math I decided to do some timing and I found out that a local written float mapf() function was 10x as fast as the map function!

The reason for this is that as the parameters are all known at compile time the compiler optimized almost all math away. (fastest math = no math). Same is true for a local written long mapl(), much faster than the library map() function, because it can be optimized aggressively.

This means that if one uses the map function a lot and one needs speed, there are dozens of microseconds to be gained per call if the mapping ranges are known compile time. The sketch below shows the differences

uint32_t start;
uint32_t stop;
long d[10];

void setup() 
{
  Serial.begin(115200);
  Serial.println("Start ");

  for (long i = 1; i <= 1000000; i*=10)
  {
    start = micros();
    long x = map(i, 0, 1000000, 1000000, 100000000);
    d[0] = micros() - start;

    start = micros();
    long a = mapl(i, 0, 1000000, 1000000, 100000000);
    d[1] = micros() - start;

    start = micros();
    long z = mapf(i, 0, 1000000, 1000000, 100000000);
    d[2] = micros() - start;

    Serial.print(i);
    Serial.print("\t");
    Serial.print(x);
    Serial.print("\t");
    Serial.print(a);
    Serial.print("\t");
    Serial.println(z);

    for (int i=0; i<3 ; i++)
    {
      Serial.print('\t');
      Serial.print(d[i]);
    }
    Serial.println();

  }
}

long mapl(long x, long in_min, long in_max, long out_min, long out_max)
{
  return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

float  mapf(long x, float in_min, float in_max, float out_min, float out_max)
{
  return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

void loop(){}

output (every second line is the timing.

1	1000099	1000099	1000099
	56	0	4  
10	1000990	1000990	1000990
	60	4	4
100	1001310	1001310	1009900   //   long overflow !!!
	60	4	4
1000	1000215	1000215	1099000
	56	4	8
10000	997863	997863	1990000
	56	4	4
100000	1000100	1000100	10900000
	60	8	4
1000000	1001003	1001003	100000000
	56	0	4

Note:

  • the locally defined mapl has also long overflow
  • the locally defined mapf() gives the correct answer
    (more precisely in the right order of magnitude as floats have less significant digits than long)
  • performance gain factor 10+ due to local definition

So back to the original question, how to prevent integer overflow in the mapping function?

  • use a local defined mapf() function to prevent the overflow
  • use compile time known ranges to squeeze out performance.

During the tests I found a variant of the map() function having an interesting interface and performance.

long mapf(long x, long px,  long py, float slope)
{
  return (x - px) * slope + 0.5  + py;  // includes rounding
}

px, py describes any point on the line (preferably 0, py or px, 0) and slope describes the slope/steepness of the line

the advantage of this map function is it has no (expensive) integer division but a float multiply.
first order measurement showed it is 10-15% faster than the normal map function. (but footprint is bigger due to float!)

update: not tested is a version where px and py are floats too

Couldn't you just use a template? This way instead of having multiple functions, you just have one that changes its type.

Good idea, not tried

I tried a similar experiment with the Arduino Due and got different results. Perhaps my code organization is to blame, but I thought I'd share anyhow.

Here's my main code:

#include "FastMap.h"

uint32_t start;
uint32_t stop;

long native_time;
long custom_time;

void setup() 
{
  delay(4000);

  Serial.begin(9600);
  Serial.println("Timing map vs map_uint16");

  start = micros();
  for (uint32_t i = 0; i <= 800000; i++)
  {
    uint16_t x = map(i%4095, 0, 4095, 500, 1000);
  }
  native_time = micros() - start;

  start = micros();
  for (uint32_t i = 0; i <= 800000; i++)
  {
    uint16_t x = map_uint16(i%4095, 0, 4095, 500, 1000);
  }    
  custom_time = micros() - start;

  Serial.print("map execution time: ");
  Serial.print(native_time);
  Serial.println();

  Serial.print("map_uint16 execution time: ");
  Serial.print(custom_time);

}


void loop(){}

FastMap.cpp is:

#include "Arduino.h"
#include "FastMap.h"

uint16_t map_uint16(uint16_t x, uint16_t in_min, uint16_t in_max, uint16_t out_min, uint16_t out_max)
{
  return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

FastMap.h is:

#include "Arduino.h"

uint16_t map_uint16(uint16_t x, uint16_t in_min, uint16_t in_max, uint16_t out_min, uint16_t out_max);

The results were:

Timing map vs map_uint16
map execution time: 581895
map_uint16 execution time: 553289

Interesting stuff.
Bret

How much of a time difference is it, if you make x in uint16_t x = map(i%4095, 0, 4095, 500, 1000); long or 32bit?
The original map function is type long and yours is type int, but your putting the original map value into a int, so it must be converted.

I wonder how much time it takes to convert from long to int?

HazardsMind:
I wonder how much time it takes to convert from long to int?

think the compiler can do this really fast (2 instructions?) as it just have to ignore 16 highest bits?

Ok, but what about if its cycled 800000 times? Its got to add up to something.

yep a few seconds

To celebrate the IDE upgrade to compiler version 4.8.1, I attempted my first C++11 style function.

This will do the maths using either signed long long, or unsigned long long

template < typename T, typename U > struct IsSameType{ enum { Value = false }; };
template < typename T > struct             IsSameType< T, T > { enum { Value = true }; };

template< typename T, typename U, typename V, typename X, typename Y > 
  auto map( const T &x, U &&imin, V &&imax, X &&omin, Y &&omax ) -> T
    {
      typedef decltype( IsSameType< T, decltype( ( signed ) T() ) >::Value ? 1LL : 1ULL ) cll;
      return (cll)(x - imin) * (omax - omin) / (imax - imin) + omin;
    }

Here are the results with mine added onto the end.

Start
1 1000099 1000099 1000099 1000099
60 4 4 4
10 1000990 1000990 1000990 1000990
56 0 0 4
100 1001310 1001310 1009900 1009900
52 4 4 4
1000 1000215 1000215 1099000 1099000
52 4 4 4
10000 997863 997863 1990000 1990000
64 4 4 4
100000 1000100 1000100 10900000 10900000
64 4 0 0
1000000 1001003 1001003 100000000 100000000
56 4 4 4

However once the functions are not allowed to inline themselves:

Start
1 1000099 1000099 1000099 1000099
64 64 84 52
10 1000990 1000990 1000990 1000990
52 56 80 68
100 1001310 1001310 1009900 1009900
56 56 76 64
1000 1000215 1000215 1099000 1099000
56 56 80 80
10000 997863 997863 1990000 1990000
68 56 84 80
100000 1000100 1000100 10900000 10900000
64 56 80 84
1000000 1001003 1001003 100000000 100000000
64 52 80 96

Could probably squeeze more efficiency out of the template version by only using 'long long' when the inputs are 32 bits or more.

Also one important note, when using integer literals, you must put a suffix if the number is greater than 16bits, templates will treat them as an int and truncate the value.