The following function is smaller and faster than the current version of Print::printNumber(). It uses one more byte of RAM buffer, 33 bytes vs 32 bytes for the current version. It uses about 100 fewer bytes of flash.
It is faster since it replaces the mod operator with a multiply operator. It uses one write for the entire formatted string. This speeds up the SD library a great deal.
I have achieved an SD formatted write rate of over 18,000 bytes/second for println with four digit numbers. I replaced FILE_WRITE with (O_WRITE | O_CREAT) in the SD.open() call.
The current SD library and Print only achieve 77 bytes/second when files are opened with FILE_WRITE. The current Print/SD library achieves about half the above rate when (O_WRITE | O_CREAT) is used in the open call.
Here is the suggested replacement:
void Print::printNumber(unsigned long n, uint8_t base) {
  char buf[8 * sizeof(long) + 1]; // Assumes 8-bit chars.
  char *str = &buf[sizeof(buf) - 1];
  *str = '\0';
  do {
    uint32_t m = n;
    n /= base;
    char c = m - base * n;
    *--str = c < 10 ? c + '0' : c + 'A' - 10;
  } while(n);
  write(str);
}
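The remainder trick in the loop can be checked on its own. The following is a minimal host-side sketch, not part of the Arduino core; the helper name remainderByMultiply is hypothetical, and it assumes base is at least 1 so the division is defined.

```cpp
#include <cstdint>

// Hypothetical helper: computes m % base without using the % operator.
// One division yields the quotient; a multiply and a subtract recover
// the remainder, as printNumber() does with "m - base * n".
// Assumes base >= 1.
uint8_t remainderByMultiply(uint32_t m, uint8_t base) {
    uint32_t q = m / base;                        // the only division
    return static_cast<uint8_t>(m - base * q);    // same value as m % base
}
```

On a CPU without hardware divide, a separate % would presumably cost a second call into the division routine, which is why a single multiply is cheaper here.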
But there is still an optimization to be found in
char buf[8 * sizeof(long) + 1]; // Assumes 8-bit chars.
The array size could depend on the base. Note that the full size is only needed when base = 2:
base = 2 => 32 + 1 chars.
base = 10 => 10 + 1 chars.
I used a buf size of 33 since base 2 is defined as the symbol BIN in Print.h.
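The per-base sizes quoted above can be reproduced by counting digits of the largest 32-bit value. This is only an illustrative sketch; digitsNeeded is a hypothetical name, and it assumes a 32-bit unsigned long (as on AVR) and a base of at least 2.

```cpp
#include <cstdint>

// Hypothetical helper: number of digits needed to print the largest
// 32-bit unsigned value in the given base (terminator not included).
// Assumes base >= 2; base 0 or 1 would never terminate or would divide by zero.
uint8_t digitsNeeded(uint8_t base) {
    uint32_t n = UINT32_MAX;
    uint8_t digits = 0;
    do {
        n /= base;   // drop one digit
        ++digits;
    } while (n);
    return digits;
}
```

This gives 32 for base 2, 10 for base 10 and 8 for base 16. Because a plain stack array needs a compile-time size, the worst case (base 2) still decides the buffer size unless the function is restructured.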
The biggest problem for the current function and this function is a call with base = 1. They both write over memory. This function is never called with base = 0.
Maybe the end condition should be while (n && str != buf). This would produce 32 zero characters for this error. A base greater than 36 doesn't make sense but just produces a strange string.
I think the base 1 case just needs to avoid crashing and give an indication of a bug.
Printing the string "00000000000000000000000000" is a big clue.
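The suggested end condition can be tried on a PC. This is a sketch under assumptions: formatNumberGuarded is a hypothetical name, it returns a std::string instead of calling write(), and it uses uint32_t so the buffer stays at 33 bytes on hosts where long is 64 bits. It does not handle base = 0, which would still divide by zero.

```cpp
#include <cstdint>
#include <string>

// Hypothetical host-side variant of the proposed printNumber() loop with
// the extra "str != buf" guard, so a bad base cannot run past the buffer.
std::string formatNumberGuarded(uint32_t n, uint8_t base) {
    char buf[8 * sizeof(uint32_t) + 1];   // 32 digits plus terminator
    char *str = &buf[sizeof(buf) - 1];
    *str = '\0';
    do {
        uint32_t m = n;
        n /= base;
        char c = static_cast<char>(m - base * n);   // remainder via multiply
        *--str = c < 10 ? c + '0' : c + 'A' - 10;
    } while (n && str != buf);            // stop when the buffer is full
    return std::string(str);
}
```

With base = 1 and a nonzero value, n never shrinks and every remainder is 0, so the loop fills the buffer with 32 '0' characters and then stops. Valid bases are unaffected because they never need more than 32 digits.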
On compiler optimization: you need to use large sketches to see if a change is worth doing. The compiler does a pretty global optimization of a class. Here are some examples for Print.
I used a large sketch with a lot of print/println calls to estimate the 100 byte savings for the new printNumber(). The results were: old function 7492 bytes, new function 7388 bytes. This was consistent with several other large sketches.
Things are not so clear with a very simple sketch. For this sketch:
How often is the base a variable instead of constant? Have you run across any situations where the base is a variable?
Every time printNumber is called. The snippet I posted was intended to be included in printNumber; I should have made that more explicit.
void Print::printNumber(unsigned long n, uint8_t base) {
  if ((base < 2) || (base > 36)) // base too small or too big
  {
    write("BaseBug");
    return;
  }
  char buf[8 * sizeof(long) + 1]; // Assumes 8-bit chars.
  char *str = &buf[sizeof(buf) - 1];
  *str = '\0';
  do {
    uint32_t m = n;
    n /= base;
    char c = m - base * n;
    *--str = c < 10 ? c + '0' : c + 'A' - 10;
  } while(n);
  write(str);
}
Print is a core class of Arduino and every class derived from it uses this code. printNumber is a private member of this class, so it will not be called directly from user code.
To be found at ~\arduino-0021\hardware\arduino\cores\arduino
printNumber( value, 10 );
uint8_t base = 10;
printNumber( value, base );
Both calls end up with a local variable called base inside printNumber, and that variable can be tested.
Grr. Pisses me off that the division routine almost certainly already computed the remainder, and then threw it away, requiring us to compute it again!
I have not dumped the assembly listing to see why.
I ran into this recently. It's because signed divide is implemented as a sign fixup function that calls unsigned divide. If your sketch doesn't use the signed divide anywhere else, your binary will be left with only the unsigned divide function (additionally saving the size of the sign-fixup wrapper.)
Grr. Pisses me off that the division routine almost certainly already computed the remainder, and then threw it away, requiring us to compute it again!
Maybe the compiler optimizes it?
#include <stdlib.h>
void setup() {
}
void loop() {
  unsigned long a, b, q, r;
  a = 12345;
  b = 1000;
  ldiv_t d = ldiv(a, b);
  q = d.quot;
  r = d.rem;
}
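To show how ldiv() could fit the digit loop, here is a minimal host-side sketch. The helper name nextDigit is hypothetical. Note that ldiv() takes signed long arguments, so on a 32-bit-long target it cannot cover the upper half of the unsigned long range, and whether it is actually smaller or faster on AVR is not established here.

```cpp
#include <cstdlib>

// Hypothetical helper: peels the lowest digit off n in the given base,
// using one ldiv() call to get quotient and remainder together.
// Assumes n >= 0 and 2 <= base <= 36.
char nextDigit(long &n, long base) {
    std::ldiv_t d = std::ldiv(n, base);
    n = d.quot;                                   // remaining higher digits
    char c = static_cast<char>(d.rem);            // value of the digit removed
    return c < 10 ? c + '0' : c + 'A' - 10;       // map to '0'-'9', 'A'-'Z'
}
```

Calling it repeatedly until n reaches 0 yields the digits from least to most significant, the same order in which printNumber() fills its buffer from the end.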