Topic: faster printing of floats by divmod10() and others

I've now gone for code-size optimising. A couple of ideas:

Code:
  if ( notation == DEF ) {
    if ( (number > 8388608.0) || (number < 0.000001) ) {
      // as we head into SCI / ENG world add digits for output
      notation = SCI;
      digits += 6;
    }
  }
I've done this because 8388608 (2^23) is the value above which we have to move to E notation if we only want to print accurate numbers, i.e. 8388609 actually prints as 8388610 :-(
This still shows the number wrong, but printing it in E form lets the user 'have a guess' that the number might not be exactly as shown.
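
For reference, the threshold can be checked on a desktop compiler. The sketch below is only an illustration, assuming IEEE-754 single precision (which is what float is on AVR); the helper name ulp_at is made up here and is not part of the Print class.

```cpp
#include <cmath>

// Gap between x and the next representable float above it.
// Assumes IEEE-754 single precision (24-bit significand).
float ulp_at(float x) {
    return std::nextafter(x, INFINITY) - x;
}
```

ulp_at(4194304.0f) is 0.5, ulp_at(8388608.0f) is 1.0 and ulp_at(16777216.0f) is 2.0, so from 2^23 upwards a float has no fractional bits left. Integers up to 2^24 are still stored exactly, so 8388609 itself is representable.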

I've enum'd the DEF, SCI and ENG values; by passing these we get a free compiler sanity check on the values. Not perfect, but useful IMO. (It also saves having to take a local copy in Enotation.)

I've also changed printNumber to take a third argument (the number of digits we must print). A simple
#define  NO_LEADING_ZERO 0  allows sensible-looking code for existing uses of the function. (For your example program the compiler only has to reload r16 twice with the extra arg being passed, plus one extra time when we use it to get leading zeros.)

Code:
  // Extract the integer part of the number and print it, watching how many digits we have
  uint32_t int_part = number;
  uint8_t tmp = prn_cnt;
  prn_cnt += printNumber(int_part, DEC, NO_LEADING_ZERO);

  // see if we are going to be printing too many digits, we can save time doing the decimal half.
  uint8_t digits_available = 7 - ( prn_cnt - tmp );
  if ( digits > digits_available ) digits = digits_available;

  if (digits > 0) {
    prn_cnt += write('.');

    double remainder = number - int_part;

    // make an unsigned long of the decimal part, of a certain length, i.e. with leading zeros!
    uint32_t rem = remainder * remMult[digits - 1];
    prn_cnt += printNumber(rem, DEC, digits);
  }
The above is the partial code that now handles printing of the float; the faster divmod10_asm is used in only one place, the printNumber routine.
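
The same idea as a desktop-testable sketch, returning a std::string instead of calling write(). The contents of remMult are an assumption (the table itself is never shown; here remMult[d - 1] is taken to be 10^d), and padded() and format_float() are illustrative names only.

```cpp
#include <cstdint>
#include <string>

// Assumed contents of remMult: remMult[d - 1] == 10^d.
static const uint32_t remMult[] = {10u, 100u, 1000u, 10000u, 100000u, 1000000u};

// Zero-pad an unsigned value to at least `width` characters.
std::string padded(uint32_t value, uint8_t width) {
    std::string s = std::to_string(value);
    if (s.size() < width) s.insert(0, width - s.size(), '0');
    return s;
}

// Format a non-negative float as "<int>.<frac>", truncating (no rounding)
// and capping the output at seven digits in total.
std::string format_float(float number, uint8_t digits) {
    uint32_t int_part = (uint32_t)number;
    std::string out = std::to_string(int_part);

    // Guarded subtraction: never lets the digit budget wrap below zero.
    uint8_t used = (uint8_t)out.size();
    uint8_t available = used < 7 ? (uint8_t)(7 - used) : 0;
    if (digits > available) digits = available;

    if (digits > 0) {
        float remainder = number - (float)int_part;
        uint32_t rem = (uint32_t)(remainder * remMult[digits - 1]);
        out += '.';
        out += padded(rem, digits);   // keeps leading zeros, e.g. "0625"
    }
    return out;
}
```

For example, format_float(3.0625f, 4) gives "3.0625" and format_float(1234567.0f, 2) gives "1234567". One difference from the partial code: `7 - ( prn_cnt - tmp )` in a uint8_t wraps around if more than seven integer digits were printed, so the sketch clamps the budget to zero instead.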

Code:
size_t Print::printNumber(uint32_t num, uint8_t base, uint8_t leading_zeros) {
  char buf[33];
  char *str = &buf[sizeof(buf) - 1];
  *str = '\0';

  uint8_t mod, tmp;
  int8_t extra_digits = leading_zeros;

  do {
#ifdef USE_STIMMER_OPTIMIZATION
    if ( base == DEC )
    {
      divmod10_asm32(num, mod, tmp);
      *--str = mod + '0';
      extra_digits -= 1;
    }
    else
#endif
    {
      *--str = '0' + num % base;
      if ( *str > '9' ) *str += 7;
      num /= base;
      extra_digits -= 1;
    }
  } while (num);

  for ( ; extra_digits > 0; extra_digits-- ) *--str = '0';
  return write(str);
}

I'm happy with not having optimised versions of print for HEX, OCT and BIN, and this shrinks the added code nicely.
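
For anyone wanting to try the loop off-target, here is a portable stand-in. It does not reproduce divmod10_asm32 (that source is not in this post); it swaps in the well-known multiply-and-shift division by 10, and to_decimal() is an illustrative name that mirrors printNumber(num, DEC, leading_zeros) but returns a std::string.

```cpp
#include <cstdint>
#include <string>

// Divide by 10 with one multiply and a shift.
// 0xCCCCCCCD is ceil(2^35 / 10); the quotient is exact for every uint32_t.
inline void divmod10(uint32_t in, uint32_t &div, uint8_t &mod) {
    div = (uint32_t)(((uint64_t)in * 0xCCCCCCCDULL) >> 35);
    mod = (uint8_t)(in - div * 10u);
}

// Decimal conversion, zero-padded on the left to at least min_digits.
std::string to_decimal(uint32_t num, uint8_t min_digits) {
    char buf[11];                          // 10 digits + terminator
    char *str = &buf[sizeof(buf) - 1];
    *str = '\0';
    if (min_digits > 10) min_digits = 10;  // never write past the buffer

    uint8_t written = 0;
    do {
        uint8_t mod;
        divmod10(num, num, mod);           // num becomes num / 10
        *--str = (char)('0' + mod);
        ++written;
    } while (num);

    for (; written < min_digits; ++written) *--str = '0';
    return std::string(str);
}
```

to_decimal(42, 5) yields "00042", and passing 0 as min_digits plays the role of NO_LEADING_ZERO. The 64-bit multiply is fine on a PC but would be costly on an 8-bit AVR, which is the reason for a hand-written routine there.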

So it's slightly slower than the version you've posted, but with a code shrink. Numbers printed before the decimal point are always right, and rounding is cut off after a total of seven digits have been printed (ignoring sign, exponent etc.).

Note: previously I've changed a few vars from int down to uint8_t, to allow the compiler to use a single register; base is an example.

Also, you need to use smaller vars where possible, i.e. using int instead of (u)int8_t means lots of extra code for checking and using them.
exponent++ and exponent-- are another example: quite often gcc 4.3.2 will extend (u)int8_t vars to 16 bits for these, and thus waste time / CPU cycles. exponent -= 1; is faster and smaller because the compiler produces code for a single register rather than a pair.
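
The widening comes from C/C++ integer promotion, which is easy to show on any compiler. A small sketch follows (wrap_add is an illustrative name; it needs C++11 for static_assert and decltype). Whether a particular gcc version then manages to narrow the generated code back to one register is a separate question that this does not measure.

```cpp
#include <cstdint>
#include <type_traits>

// Arithmetic on a uint8_t is carried out in int (integer promotion);
// on AVR an int is 16 bits, i.e. a register pair.
uint8_t e = 200;
static_assert(std::is_same<decltype(e + 1), int>::value,
              "uint8_t + 1 is computed as an int");

// The result only becomes 8-bit again when it is stored back.
uint8_t wrap_add(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);   // a + b is an int; the cast truncates it
}
```

So wrap_add(200, 100) returns 44 (300 truncated to 8 bits), while the intermediate value 300 lived in an int.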



Oops, a few edits for spelling and clearer reading, and here are my times on a UNO:

10737.41
1.0182
107.37

Time=448
per char incl .\r\n : 17.23   <----- corrected in the code for only printing 26 not 28 chars.
done



« Last Edit: July 29, 2013, 10:49:07 am by darryl » Logged

--
 Darryl


@Darryl,
good points you make. (+1)

Code:
  if ( notation == DEF ) {
    if ( (number > 8388608.0) || (number < 0.000001) ) {
      // as we head into SCI / ENG world add digits for output
      notation = SCI;
      digits += 6;
    }
  }
1) For this behaviour I would like to add a new format, e.g. DEFSCI, so that the DEF behaviour stays 100% backwards compatible.
2) The lower limit should be higher, 0.01, as printing with 2 decimals is, I think, the most used.
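
A rough sketch of how the two points could fit together. The enum layout, the function name resolve_notation and the treatment of zero are assumptions made for illustration; only the 8388608.0 limit (from the code above) and the 0.01 limit (from point 2) come from the discussion.

```cpp
// Hypothetical enum; DEFSCI is the name proposed in point 1.
enum Notation { DEF, SCI, ENG, DEFSCI };

// number is assumed non-negative (sign already handled by the caller).
// Assumption: an exact zero stays in plain notation.
Notation resolve_notation(Notation requested, float number) {
    if (requested != DEFSCI) return requested;        // DEF, SCI, ENG unchanged
    if (number > 8388608.0f) return SCI;              // too big for plain digits
    if (number != 0.0f && number < 0.01f) return SCI; // too small for 2 decimals
    return DEF;
}
```

With this shape, print(x, DEF) keeps its old output for every value, and only callers that ask for DEFSCI get the automatic switch.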


Can you post your print.h/.cpp so I can give it a try here?  

(My time=536, your time=448; I want to understand why yours is so much faster.)
« Last Edit: July 29, 2013, 12:46:15 pm by robtillaart » Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)


My latest optimizations are in the low-level write() functions, especially write(const char *str). This may interfere with overridden implementations of them [block device vs char device].

Some other optimizations:

14 bytes smaller (not measurable in speed)
Code:
size_t Print::println(void)
{
    write('\r');
    write('\n');
    return 2;
    // size_t n = print('\r');
    // n += print('\n');
    // return n;
}




8 bytes smaller (3 µs faster)
Code:
size_t Print::write(const uint8_t *buffer, size_t size)
{
    size_t n = 0;
    for (; n < size; n++) {
        write(*buffer++);
    }
    return n;
    // size_t n = 0;
    // while (size--) {
        // n += write(*buffer++);
    // }
    // return n;
}

Other print functions can be optimized in similar ways: doing the repeated addition, n += write(...);, in a loop costs extra if the size is known in advance.
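
The two loop shapes side by side, on a stand-in sink so they can be compared on a PC. Sink and its members are hypothetical and not the Arduino Print class; the `full` flag exists only to show where the variants differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Minimal stand-in for a Print-like sink.
struct Sink {
    bool full = false;      // when true, write() refuses the byte
    std::string out;

    size_t write(uint8_t c) {
        if (full) return 0;
        out += (char)c;
        return 1;
    }

    // Stock shape: accumulate each write()'s return value.
    size_t write_counting(const uint8_t *buffer, size_t size) {
        size_t n = 0;
        while (size--) n += write(*buffer++);
        return n;
    }

    // Optimized shape: the loop counter doubles as the result.
    size_t write_fixed(const uint8_t *buffer, size_t size) {
        size_t n = 0;
        for (; n < size; n++) write(*buffer++);
        return n;
    }
};
```

Both return the same count as long as every write() succeeds. If the sink rejects bytes, write_counting reports what was actually written while write_fixed still reports size, which is the behaviour change being traded for the smaller code.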
Logged

Rob Tillaart



Quote:
1) For this behaviour I would like to add a new format e.g. DEFSCI so that the DEF behavior is 100% backwards compatible.
2) the lower limit should be higher 0.01 as printing with 2 decimals is I think most used.
Can you post your print.h/.cpp so I can give it a try here?
(My time=536 yours time=448, want to understand why yours is so much faster?)


Sure, make changes as you see fit... I've attached my files. You will notice that I've gone over the core library and changed certain call parameters to reduce the size being used, int down to 8 bits for example.
Another speed-up is using bool instead of the stock-defined boolean, which by default is a typedef of unsigned char (uint8_t). Duh!
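
What the typedef changes semantically can be seen in a few lines. The sketch only shows the difference in stored values (as_bool and as_boolean are illustrative names); any speed gain depends on the code the compiler generates and is not demonstrated here.

```cpp
#include <cstdint>

typedef uint8_t boolean;   // the stock definition described above

// A bool is normalised to 0 or 1 on conversion...
bool as_bool(uint8_t v) { return v; }

// ...whereas the typedef just keeps whatever byte it was given.
boolean as_boolean(uint8_t v) { return v; }
```

as_bool(4) and as_bool(2) compare equal (both are true), while as_boolean(4) and as_boolean(2) do not, so the compiler cannot treat a boolean as a plain flag the way it can a bool.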

I use a macro-based hardware serial, and for writing I don't buffer. I do however run my serial port at 1,000,000 baud, so that is a probable speed-up, although with the default TX & RX buffering I think I remember the buffers being 32 bytes in size. (I use a Mega 2560 frequently and wanted the RAM, so I explicitly define each serial port I want live, and I only buffer RX.) So perhaps my faster baud rate (with busy polling to write the next byte) doesn't actually gain much over the buffered TX version. I know my hardware serial replacement saves over 500 bytes.



* Print.cpp (5.8 KB - downloaded 11 times.)
* Print.h (6.63 KB - downloaded 11 times.)
* HardwareSerial.h (9.69 KB - downloaded 14 times.)
« Last Edit: July 29, 2013, 02:13:18 pm by darryl » Logged

--
 Darryl


Thanks for sharing, will look at it later this week; it was a (32-bit) long day ;)
In the 0.22 version the params were all uint8_t where possible.

Your additional mods explain quite a bit!

You should also have a look at the Teensy hardware serial code, it has some optimizations too. (www.pjrc.com)

Update: if you want to reduce size more, you could stop counting the chars printed and return just a 0 or 1 (so you can still use it as a boolean). Breaking, but smaller ;)




« Last Edit: July 29, 2013, 02:26:04 pm by robtillaart » Logged

Rob Tillaart


Quote:
Thanks for sharing, will look at it later this week, was a (32bit) long day ;)
In the 0.22 version the params were all uint8_t where possible.
Your additional mods explains quite a bit!
You should also have a look at the Teensy hardware serial code, it has some optimizations too. (www.pjrc.com)
update: if you want reduce size more you could not count the chars printed, just a 0 or 1 (so you can still use it as boolean) Breaking but smaller ;)


Yes, over the years I've peered at most bits of code commonly out there. This serial I'm happy with; it's a good compromise for me on speed and size, and it doesn't eat up RAM when I only want to use one or two serial ports.

I had thought of returning a boolean from the print calls, but as they are passed back in a register, it's not much use. The bool vars inside sections of code often end up in the zero or T flag, sometimes the carry flag. I get a bit over the top at times, and should really code in assembler! ;-)

Take this bit of code in wiring.c (the micros() call):
Code:
// help the compiler generate some sensible code for the ((m<<8) + t)
__asm__ volatile (
"mov %D0, %C0" "\n\t"
"mov %C0, %B0" "\n\t"
"mov %B0, %A0" "\n\t"
"mov %A0, %[lo_byte]" "\n\t"
: "=r" (m)
: "0" (m), [lo_byte] "r" (t) );
return m * ( 64 / clockCyclesPerMicrosecond() );

I couldn't get the compiler to come up with sensible code for shifting the value in m left 8 bits, so I had to give it a helping hand.
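
In plain C++ the four mov instructions amount to the following. This is a readable equivalent for checking the arithmetic, not a replacement for the asm; the function names are made up here, and cycles_per_us stands in for clockCyclesPerMicrosecond().

```cpp
#include <cstdint>

// Each byte of m moves up one position and t becomes the low byte,
// i.e. (m << 8) + t for a 32-bit m (the old top byte is discarded).
uint32_t shift_in_byte(uint32_t m, uint8_t t) {
    return (m << 8) | t;
}

// The scaling from the last line of the snippet: timer ticks to microseconds.
uint32_t micros_from_ticks(uint32_t overflows, uint8_t timer, uint8_t cycles_per_us) {
    return shift_in_byte(overflows, timer) * (64u / cycles_per_us);
}
```

At 16 cycles per microsecond the factor is 4, so one full overflow (256 ticks) comes out as 1024 microseconds.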




Logged

--
 Darryl


for size sake ...

Code:
  if (notation != DEF) {
    prn_cnt += write('e');

    if (exponent >= 0) {
      // the print below here, will do the minus sign print for us
      prn_cnt += write('+');
    }

    prn_cnt += print(exponent, DEC);
  }
could be
Code:
  if (notation != DEF) {
    prn_cnt += write('e');
    prn_cnt += print(exponent, DEC);
  }
as the + is implicit and therefore optional ;)
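
The two styles as one small desktop sketch; exponent_suffix and its explicit_plus flag are illustrative names, not part of either Print.cpp.

```cpp
#include <string>

// Build the exponent suffix. explicit_plus selects between "e+5" and "e5";
// negative exponents always carry their own '-'.
std::string exponent_suffix(int exponent, bool explicit_plus) {
    std::string s = "e";
    if (explicit_plus && exponent >= 0) s += '+';
    s += std::to_string(exponent);
    return s;
}
```

exponent_suffix(5, true) gives "e+5" and exponent_suffix(5, false) gives "e5"; both give "e-3" for -3, so the extra branch only costs code on the non-negative side.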
Logged

Rob Tillaart


Quote:
for size sake ...
....
as the + is implicit and therfor optional ;)

Yes. Back when you first pushed the suggestion for SCI & ENG support, a few messages talked about the output format of the exponent part, upper- or lower-case E etc. I decided I liked the + going between the lower-case e and the actual value of the exponent best. On small LCD displays it's nicer to read, I think.

Guess I should pass on my thanks, as I use your stats library and the running average quite a lot ;-)
Logged

--
 Darryl


What is the latest fastest best optimized bug free print.cpp and print.h ?
Thnx.
Logged


Quote:
What is the latest fastest best optimized bug free print.cpp and print.h ?
Thnx.

I have attached my latest print.cpp and print.h. It is substantially faster than the default for almost all datatypes (except char).
I do not claim it is the fastest/best optimized or bug free. I have been using this version since this thread started, in fact a bit longer.
In this time I have encountered a few issues and they are all fixed; most are discussed in this thread.
In the last month I did not encounter any new issues, so I would call it a stable beta (customer trial ready).

Besides the performance it also includes SCIentific notation for small and large floats, so any value a 32-bit float can represent is supported.
Check print.h and uncomment the appropriate section:
Code:
// uncomment if you want: int64 support, scientific notation and overflow testing
// #define PRINT_LONGLONG
// #define PRINT_SCIENTIFIC
// #define PRINT_NAN_INF

Please test the performance before and after, so you get an indication of the gain.
Please post unexpected things on this thread, I will check almost on a daily basis so I can reproduce/fix asap.


* Print.cpp (18.08 KB - downloaded 10 times.)
* Print.h (3.04 KB - downloaded 9 times.)
Logged

Rob Tillaart


Thanks!
I did some brief testing of what I have found here, while printing into a .CSV file on an SD card. See the typical CSV record below; this runs under NilRtos using a FIFO, so what is measured is the elapsed time for file.print() from the FIFO_struct to the SD card's buffer, for 9 floats and 3 integers:

Code:
,1377866878,667,0.1243,70.69,74.39,78.26,80.73,82.29,82.51,87.89,76.89,1720

Original file.print: ~7ms
Darryl file.print:   ~5ms
Rob(the latest) file.print: ~3.1ms
SdFat's file.printField: ~1.7ms
« Last Edit: August 30, 2013, 10:21:37 am by pito » Logged


Note: Darryl did not implement all ideas discussed as he did not want the class to grow too much.
Logged

Rob Tillaart


The SdFat printField buffers the number plus the field separator or the CR/LF for end of line.  There is a high overhead for each call when writing to an SD.
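
The buffering idea can be sketched like this. It is not SdFat's code: CountingFile and print_field are hypothetical names, and the call counter only stands in for the per-call overhead of a real SD write.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical sink that counts calls, standing in for an SD file object.
struct CountingFile {
    std::string data;
    size_t calls = 0;
    size_t write(const char *buf, size_t len) {
        data.append(buf, len);
        ++calls;
        return len;
    }
};

// Format the value and its terminator into one buffer, then write once.
size_t print_field(CountingFile &f, unsigned long value, char term) {
    char buf[12];   // room for 10 digits + terminator char + NUL
    int len = std::snprintf(buf, sizeof(buf), "%lu%c", value, term);
    if (len < 0 || (size_t)len >= sizeof(buf)) return 0;   // did not fit
    return f.write(buf, (size_t)len);
}
```

Printing 667 with ',' and then 1720 with '\n' produces "667,1720\n" in two write calls, where a separate print of the number and of the separator would need four.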

I will soon post a version of SdFat that is even faster using ideas in this forum topic.
Logged


@rob: it seems ENG does not work when SCI is not enabled (even from the source I can see that ENG is a part of SCI) :)
Maybe, for clarity, I would do:
Code:
#define PRINT_LONGLONG
#define PRINT_SCIENTIFIC_AND_ENGINEERING
#define PRINT_NAN_INF
Logged


Good point, would it make sense to enable them separately?

(don't know if that's easy in the code as these two (SCI/ENG) are intertwined)
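
One way to picture the dependency, assuming ENG is derived from the SCI exponent (the attached source is not reproduced here, so this is a guess at the shape, and to_engineering is an illustrative name):

```cpp
// Split a decimal (SCI) exponent into an engineering exponent, which is a
// multiple of 3, and the number of places the mantissa shifts left to match.
void to_engineering(int sci_exp, int &eng_exp, int &shift) {
    shift = sci_exp % 3;
    if (shift < 0) shift += 3;   // C++ % keeps the sign of the dividend
    eng_exp = sci_exp - shift;
}
```

1.2e4 becomes 12e3 (shift 1) and 1.2e-4 becomes 120e-6 (shift 2). If the code has this shape, ENG is only a small step on top of the SCI normalisation, which would make a standalone ENG switch pull in most of the SCI code anyway.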
Logged

Rob Tillaart
