String concatenation for Serial.print

krupski:
I know that "boldly asserted is half proven", but I would love to see some proof (links, whatever) to back up those 3 assertions (because you are wrong on all 3 counts).

Well, he did show three example programs using the different techniques, and the version with the giant string, and the one with sprintf both used about 50% more ram, 100% more flash, and executed more slowly. What sort of proof are you looking for?

krupski:
I know that "boldly asserted is half proven", but I would love to see some proof (links, whatever) to back up those 3 assertions (because you are wrong on all 3 counts).

Asking for proof and claiming he's wrong... with no proof. -dev showed the code and results. What more proof do you want? What proof have you other than your general distaste for the Arduino community? Does your version with printf produce smaller or faster code? Prove it.

There are cases where pre-buffering print() output might be useful. AFAIK, neither Ethernet.print() nor USBSerial.print() does anything intelligent in terms of avoiding the "one message per print statement" problem, and you could potentially improve throughput quite a lot. Normal HardwareSerial.print() would gain very little, though; in addition to there being no smarts for "aggregating" small output requests, there also aren't any smarts for optimizing large requests.

Hi all,
As the originator of this thread, I am overwhelmed with the magnitude of replies to a question I thought was a simple one.
It looks almost like opening a Pandora's box.
Nevertheless, reading through all the replies and discussions I think I have the answer to my question, namely that concatenating print values is not the best way to go with Arduino, or at least it will not yield any advantage.

I will keep trying and testing my specific app, but this issue is of secondary importance to me at this time.
Thanks to all contributors who made me somewhat smarter......

AFAIK...

USBSerial (aka Serial on a Leonardo) ultimately sends the bytes one at a time, so there is no advantage.

If the OP were using a W5100 (they're not), and if TCP "fragmentation" were an issue, simply derive a class from EthernetClient. Basically, the selected answer in that link adds the same RAM buffer used by the sprintf approach (I would also override the other virtual write method). It would use extra RAM like the sprintf technique, but would not have the other sprintf disadvantages.
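As a rough host-side sketch of that idea (not the code from the linked answer): the hypothetical `ByteSink` base below stands in for a Print-style class with two virtual `write` methods, and the hypothetical `BufferedSink` collects bytes in a fixed RAM buffer and hands them on in one call. The class names, the 64-byte size, and the `CountingSink` test double are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a Print-style output class.
struct ByteSink {
  virtual ~ByteSink() {}
  virtual size_t write(uint8_t b) = 0;
  // Default block write: one virtual call per byte.
  virtual size_t write(const uint8_t* data, size_t len) {
    size_t n = 0;
    while (n < len && write(data[n]) == 1) ++n;
    return n;
  }
};

// Test double: every write call counts as one "packet" on the wire.
struct CountingSink : ByteSink {
  std::string received;
  int packets = 0;
  size_t write(uint8_t b) override {
    received.push_back(static_cast<char>(b));
    ++packets;
    return 1;
  }
  size_t write(const uint8_t* data, size_t len) override {
    received.append(reinterpret_cast<const char*>(data), len);
    ++packets;
    return len;
  }
};

// Collects bytes in RAM and forwards them in as few calls as possible.
class BufferedSink : public ByteSink {
public:
  explicit BufferedSink(ByteSink& out) : out_(out) {}
  size_t write(uint8_t b) override {
    if (used_ == sizeof(buf_)) flush();  // buffer full: send what we have
    buf_[used_++] = b;
    return 1;
  }
  size_t write(const uint8_t* data, size_t len) override {
    for (size_t i = 0; i < len; ++i) write(data[i]);
    return len;
  }
  // Hands the buffered bytes to the underlying sink in a single call.
  void flush() {
    if (used_ > 0) out_.write(buf_, used_);
    used_ = 0;
  }
private:
  ByteSink& out_;
  uint8_t buf_[64];
  size_t used_ = 0;
};
```

The cost is the extra 64 bytes of RAM and the need to call `flush()` at the end of each message; in exchange, several small writes reach the underlying sink as one block.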

Normal HardwareSerial.print() would gain very little, though; in addition to there being no smarts for "aggregating" small output requests, there also aren't any smarts for optimizing large requests.

There is nothing to gain. HardwareSerial puts each byte into an output buffer, and that buffer is emptied by transmit interrupts (in the background). This "output stream pump" is primed by the first character, earlier in that method.

For skeptics, this is how the Arduino can continue doing other work (in the foreground) while those characters are gradually pumped out.
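A minimal host-side sketch of that producer/consumer arrangement, assuming a hypothetical `TxRing` class (the names and the 16-byte size are illustrative and not taken from the Arduino core): `write()` plays the role of the foreground code, and `onTxReady()` stands in for the transmit interrupt that drains one byte at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical model of an interrupt-driven transmit ring buffer.
// Not the Arduino core's code: names and sizes are illustrative only.
class TxRing {
public:
  // Foreground side: queue one byte. Returns false if the buffer is full
  // (this model gives up; a real driver would typically wait for space).
  bool write(uint8_t b) {
    size_t next = (head_ + 1) % kSize;
    if (next == tail_) return false;   // full
    buf_[head_] = b;
    head_ = next;
    return true;
  }

  // "Interrupt" side: called whenever the UART can take another byte.
  // Moves one byte from the buffer to the wire, if any is waiting.
  bool onTxReady() {
    if (tail_ == head_) return false;  // empty, nothing to send
    wire_.push_back(static_cast<char>(buf_[tail_]));
    tail_ = (tail_ + 1) % kSize;
    return true;
  }

  // Number of bytes queued but not yet sent.
  size_t pending() const { return (head_ + kSize - tail_) % kSize; }
  // Everything that has "left the chip" so far.
  const std::string& wire() const { return wire_; }

private:
  static constexpr size_t kSize = 16;
  uint8_t buf_[kSize] = {};
  size_t head_ = 0;   // next slot to fill
  size_t tail_ = 0;   // next slot to send
  std::string wire_;
};
```

The foreground only pays for copying a byte into `buf_`; how quickly the bytes actually leave is decided by how often `onTxReady()` fires, which on real hardware is set by the baud rate.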

@samtal, glad to help!

USBSerial (aka Serial on a Leonardo) ultimately sends the bytes one at a time, so there is no advantage.

It's not the Send8() calls that are the problem; they just copy data from the user arguments to the USB buffer. The problem is the ReleaseTX() call a few lines later, in conjunction with the block on USB_SendSpace() before the byte loop. ReleaseTX() is what releases the data (in the USB buffer) so that the USB hardware can send it. But there are only two USB buffers, so once you've queued them to the USB hardware to send, the code has to sit around and wait for the USB transaction(s) to complete. Because of the way USB works, this is relatively slow (~1ms).
I've attached a test program that does 5000 individual single-byte Serial.print() calls or 100 50-byte Serial.print() calls, and times how long each takes. For a Nano (with HardwareSerial, at 115200bps) the times are nearly identical: 420 vs 425 ms, which is about what you'd expect (5000 bytes / 11520 bytes per second = .434 s).
On a Leonardo (using USB Serial), you get 135 vs 15ms. The "big" prints result in much better performance.
(now those actual numbers on Leonardo don't quite fit my "1 message per ms" explanation, so that wasn't quite right. However...)

void setup() {
  Serial.begin(115200);
  while (!Serial)
    ;
}

void loop() {
  uint32_t startt, endt;

  delay(1000);
  startt = millis();
  for (int i = 0; i < 5000; i++) {
    Serial.print(" ");
  }
  endt = millis();
  Serial.println();
  Serial.print("Write 5000 individual bytes in ");
  Serial.print(endt - startt);
  Serial.println(" milliseconds. \n");
  Serial.flush();

  startt = millis();
  for (int i = 0; i < 100; i++) {
    Serial.print("                      25                       50\r");
  }
  Serial.flush();

  endt = millis();
  Serial.println();
  Serial.print("Write 100 50 byte chunks in ");
  Serial.print(endt - startt);
  Serial.println(" milliseconds. \n");
  Serial.flush();

  while (Serial.read() < 0)
    ;
}
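The HardwareSerial figure can be sanity-checked with a little arithmetic. The helper below is a hypothetical host-side sketch, assuming the usual 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte); it is not part of the attached test program.

```cpp
#include <cstdint>

// Expected time to clock `bytes` bytes out of a UART, in milliseconds.
// Assumes 8N1 framing: 10 bits on the wire per byte (illustrative helper).
double expectedTxMillis(uint32_t bytes, uint32_t baud) {
  const double bitsPerByte = 10.0;
  return bytes * bitsPerByte * 1000.0 / baud;
}
```

`expectedTxMillis(5000, 115200)` comes out at roughly 434 ms, close to the 420 and 425 ms measured on the Nano. The wire speed is the limit there, so grouping the bytes into bigger prints cannot help much.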

Interesting! Some advantage, then. Thanks for tracing the rest of the call stack. :slight_smile:

Makes me wonder why they didn't use a ring buffer like everything else. But if I had a nickel for every time I wondered that...

packetizing interactive byte-stream IO is a relatively complex problem; ripe for many tradeoffs...

Delta_G:
Asking for proof and claiming he's wrong... with no proof. -dev showed the code and results. What more proof do you want? What proof have you other than your general distaste for the Arduino community? Does your version with printf produce smaller or faster code? Prove it.

I have no "distaste" for the Arduino community. Why on earth would I get on this forum and spend 95% of my time here answering questions or explaining how something works?

As far as my "assertions", let me explain each one:

(from -dev):
There is no reason to build one giant string and then print it all out at once. [1] There is no overhead associated with Serial.print that you can avoid by building one giant string. [2] Printing each piece by itself is actually much more efficient:

  • [3] The Arduino can be printing the first part of the message while it formats the next parts.

#1: There is overhead associated with every and any call of a C function. Parameters have to be placed on the stack, as well as return address and other info. Calling a function 10 times with a small chunk of data is most certainly slower than calling the function with all the data at once (because you avoid the overhead of calling the function 9 times more than necessary).

#2: Wrong - same reason as #1.

#3: This statement is misleading. Although the Arduino hardware serial code does use a ring buffer and interrupts, the statement made by -dev makes it sound as though serial data will be output from the ring buffer even if the CPU is busy elsewhere (for example stuck in a blocking delay(nnn) call or in the middle of parsing the format specifiers of another string).

Now, I suppose I could write a simple sketch and show actual numbers, but I felt that there was no need since everyone knows (I assume) that calling a C function involves "behind the scenes" activity which obviously consumes more time if the function is called repeatedly.

I can do this if you wish.......

Delta_G:
Does your version with printf produce smaller or faster code? Prove it.

Forgot to address this part.

There is no such thing as "my version" of printf. I simply enable the use of existing functionality that the AVR-GCC compiler provides.

This saves me the headaches of using multiple "Serial.print" calls and makes for easier reading of the source code itself.

My IDE also has a checkbox which allows me at edit/compile time to enable or disable floating point support. Yes I know that it uses some resources, but when the sketch is compiled and floating point only adds 1.5K to my 24K sketch and it all fits with room to spare in an UNO or a MEGA, why should I beat my head against the wall to use "dtostrf" and all the extra work that involves?

Now, I don't know if enabling native floating point uses less or more resources than dtostrf and its required buffer, but the difference can't be all that much and since I have room to spare anyway, why not?

My big gripe is with the DEVELOPERS of the Arduino IDE system. Sure, I understand that they want to conserve resources... I get it. But why not place an OPTION in Preferences to enable or disable certain features?

Want floating point? Just tick the checkbox.
Want to use printf? Just tick the checkbox.
Need every last bit of memory? Un-tick the checkbox.

But, GIVE THE USERS THE CHOICE AND CONTROL!!!

We can turn line numbering on or off, we can choose to auto-rename a sketch from .PDE to .INO, we can do a lot of other [sarcasm] really important [/sarcasm] things in Preferences.

But, control the things that EVERYONE HERE asks about? Nope, can't do it. WHY?

If you and I had a dollar for every time someone asked why trying to print a number only results in a question mark or how to get dtostrf working, we could afford to have others write our code! :slight_smile:

(although what fun would that be?)

samtal:
Hi all,
As the originator of this thread, I am overwhelmed with the magnitude of replies to a question I thought was a simple one.
It looks almost like opening a Pandora's box.

Please don't worry about anything. You didn't open a "Pandora's box", nor did you ask anything wrong.

In fact, the "Pandora's box" is mostly my fault, for I get quite passionate about seeing users have problems with their programming which result from the fact that many simple, standard features and functions in C are disabled by the Arduino developers.

Now, I understand the need to minimize memory and resource usage in a small microcontroller environment, and for many cases, the disabled functions will indeed save some space.

What I gripe about is not that these features are disabled but that there is no built in ability for a user to simply "click" the feature on or off.

I've modified my Arduino IDE to provide these options (as well as a few others) so I know that programming-wise, it's trivial. Any Arduino IDE developer could add these options to the newest IDE in less than a day (probably before lunch time).

In lieu of being able to optionally enable these features, there are "alternatives", for example using the function "dtostrf" to take a floating point number and convert it into a user supplied buffer, enabling the user to print fractional numbers.

Unfortunately, the sequence of events (as most everyone here has seen dozens to hundreds of times), goes like this:

(1) User writes a program to display a temperature.
(2) To his surprise, any temperature is displayed as a question mark ('?').
(3) User checks and re-checks his code. Darn it looks fine!!!
(4) User checks the C documentation online and finds that, yes indeed he's doing it right.
(5) User then logs on here and asks about the problem.
(6) User gets a flurry of responses ranging from "It can't be done" to "It's disabled to save memory" to "Use the dtostrf function".
(7) Ah-ha! One positive reply... user CAN do it with dtostrf.
(8) User goes online looking in C documentation for how to use dtostrf.
(9) Darn! Can't seem to find anything. Back to the Arduino forum.
(10) User finds out that dtostrf is an AVR specific function.
(11) User looks at the Arduino documentation. Nothing found. ARGHHHHHH!!!
(12) User checks online for the docs. AH! Success!
(13) User reads the information.....

[b]The dtostrf() function converts the double value passed in val
into an ASCII representation that will be stored under s. The
caller is responsible for providing sufficient storage in s.

Conversion is done in the format "[-]d.ddd". The minimum field
width of the output string (including the possible '.' and the
possible sign for negative values) is given in width, and prec
determines the number of digits after the decimal sign.

width is signed value, negative for left adjustment.

The dtostrf() function returns the pointer to the converted string s.

[/b]

Yeah OK.... um what the heck does THAT mean?

(14) Another post to this forum yields a few terse examples which user tries.
(15) After a few tries, the user figures out what "provide sufficient storage" means... :slight_smile:
(16) .....it goes on and on.......

OR!!!!!!!!! The user SHOULD be able to simply click the "Enable floating point" checkbox in Preferences, then do this:

printf ("The temperature is %5.1f degrees C\n", temperature);

...and have his sketch JUST WORK.

In fact, the above "scenario" I went through myself when I first started with the Arduino. I was not a noob... I had years of previous experience in programming assembler in Motorola and Intel, as well as programming in C and C++.

I was rather ticked off to find out how much time I wasted with that "floating point problem" and it ticks me off to see others go through the same thing simply because nobody will take a 1/2 hour and add a few options to the IDE that people ACTUALLY NEED.

That's when I get all revved up and write stuff like this.

So, please don't feel as though you did anything wrong or started any problems. You did not, and I sure hope you will continue to ask us anything you need - please feel free.

-- Roger

Of course, floating point support for Serial.print() DID get added. In a relatively nonStandard way that is not as powerful as printf(), but is just about as big.

krupski:
#1: There is overhead associated with every and any call of a C function.

I would probably agree that 10 calls with 2 arguments takes longer than 1 call with 12 arguments. I'm not sure, because there is some increased overhead due to the varargs calling convention of sprintf.

Regardless, the measurements show that the piece-wise overhead is much, much less than the time used by the other techniques (String and sprintf). Most of it has to do with the run-time interpretation of the format string.

#2: Wrong - same reason as #1.

The numbers seem to show that piece-wise is most efficient. I don't know how to discuss something with you if you don't look at the objective numbers. If you don't understand the measurement sketches, just ask a question.

#3: This statement is misleading... -dev makes it sound as though serial data will be output from the ring buffer even if the CPU is busy elsewhere (for example stuck in a blocking delay(nnn) call or in the middle of parsing the format specifiers of another string).

Serial data will be output from the ring buffer even if the CPU is doing a delay or formatting another piece. That's how interrupts work. I marvel that you don't know this, even after I provided links to the source. Continuing to argue the point fits the definition of Willful Ignorance.

Now, I suppose I could write a simple sketch and show actual numbers

I can do this if you wish.......

Yes, please. This is the essence of forum discussion. If you disagree, you have to support the argument with something that everybody can reproduce. Subjective ranting does not nullify the objective measurements. Show us your actual numbers, and maybe we'll all start using sprintf.

There is no such thing as "my version" of printf.

I didn't say "of", I said "with". The OP was asking about function overhead and efficiency. Does your version of code using printf instead of a chain of print calls result in more compact or more efficient code? Does it meet the requirements of the OP? Or does it just make for less typing, which wasn't what he was asking for?

westfw:
Of course, floating point support for Serial.print() DID get added. In a relatively nonStandard way that is not as powerful as printf(), but is just about as big.

What I really mean is floating point AND standard in/out/err streams... enabled or disabled by the user, via a checkbox in Preferences.

Another thing that you may or may not know is that there are TWO separate "modules" connected with AVR-GCC floating point support.

One handles printf, fprintf, etc... and the other handles scanf, fscanf, etc...

The second one (the scanf support) is a resource hog and (as far as I know) provides very little benefit to the programmer. The first one (printf) adds only about 1.5K to a sketch and is the one most people would probably use.

So, in my Preferences, I have two "floating point" checkboxes... one for printf (which I use all the time) and one for scanf (which I have never used so far).

-dev:
Yes, please [run the test sketches]. This is the essence of forum discussion. If you disagree, you have to support the argument with something that everybody can reproduce. Subjective ranting does not nullify the objective measurements. Show us your actual numbers, and maybe we'll all start using sprintf.

OK, not sure what I'm supposed to be seeing here, but this is the output of your first sketch (only change I made was to set the serial baud rate to 115200 'cause that's what I always use).

[b]The value is currently 0x57 units
274us
The value is currently 0x05 units
240us
The value is currently 0xB3 units
240us
The value is currently 0x60 units
242us
The value is currently 0x0E units
242us[/b]

Second test sketch results (at 115200 baud):

[b]The value is currently 0x57 units
324us
The value is currently 0x05 units
326us
The value is currently 0xB3 units
328us
The value is currently 0x61 units
326us
The value is currently 0x0F units
326us
The value is currently 0xBD units
326us
[/b]

Third test sketch:

[b]The value is currently 0x57 units
206us
The value is currently 0x05 units
206us
The value is currently 0xB3 units
208us
The value is currently 0x60 units
206us
The value is currently 0x0E units
210us
The value is currently 0xBC units
210us[/b]

Does this look right? I have no idea what I'm supposed to be seeing.......

Delta_G:
I didn't say "of", I said "with". The OP was asking about function overhead and efficiency. Does your version of code using printf instead of a chain of print calls result in more compact or more efficient code? Does it meet the requirements of the OP? Or does it just make for less typing, which wasn't what he was asking for?

Oh I see. I guess I misunderstood your question initially.

Using printf requires setting up small (3 or 4 line) functions to read and write the device (such as Serial or LCD, etc...) then "connect" them to the standard input/output/error streams using the "fdevopen()" function.

Of course, this adds a little bit to both the sketch size (i.e. flash) and sram usage.

Is the resulting code "more compact"? Most probably not. Is the code "more efficient"? What does that mean? In order to answer that question, I would have to make a test sketch that printed the same thing using one method and the other method and compare execution speed, resource use, etc....

And, is this all there is to the concept of "efficiency"? What about the fact that I can write, debug and finish the code a lot quicker because I don't have to fight multiple print calls? IMHO, that also counts as "efficiency".

I didn't do this because I don't care one bit if my program takes 300 microseconds to run or 310 microseconds, nor do I care if the program ends up being 22K or 25K in size.

But I DO care about being able to write standard code and have it work the way that I expect it to... usually the first time... as opposed to using a whole bunch of "print" calls and then going back and editing tiny glitches (such as a missing space between a message and a variable display).

I don't understand why everyone is so concerned about microseconds and a few extra K of flash?

If the resulting sketch grew too large to fit then, yeah, fine cut some corners to make it fit. No objections from me.

But when the compiled program takes 22K and I've got almost 256K to load it into, I simply don't care about optimizing down to the last byte. In fact, I may use -O3 instead of -Os... just to drive everyone crazy! :slight_smile:

There is overhead associated with every and any call of a C function. Parameters have to be placed on the stack, as well as return address and other info. Calling a function 10 times with a small chunk of data is most certainly slower than calling the function with all the data at once

Maybe. Most avr-gcc function calls use registers for passing the arguments, up to the point where there are too many arguments to fit in the registers allocated to that purpose. So calling a function with three arguments several times may in fact be faster than calling a function with six arguments (JUST due to function-call overhead.) And then both Serial.print() and printf() (every stdio-based hack I've seen for Arduino) end up calling Serial.write() one byte at a time, anyway. But Serial.write() on AVR arduinos is "light weight" compared to a "real computer" where it might be an operating system call with hundreds of cycles of additional overhead. (but see also my previous message WRT USB and TCP.) So it gets really complicated trying to micro-optimize this sort of thing.

I don't understand why everyone is so concerned about microseconds and a few extra K of flash?

One problem is that avr-libc has highly optimized code that implements both stdio "streams" (which are not actually file-system based), and floating point, and floating point output (via __ftoa_engine; recently discussed in another thread.) So adding printf() adds maybe 1.5k, and adding the floating point version of printf maybe another 2k, and 4k on a chip with 32k of flash isn't really very painful. (but remember that the first Arduino only had 6k of flash...)
However, other processor architectures aren't as lucky; you don't really appreciate avr-libc until you're forced to use something else. newlib-nano (used on most 32bit chips) has something like 12k in the integer-only printf(), and 30k+ if you add floating point by using plain newlib instead (which also adds a lot of bloat to stdio/etc.) That may still be irrelevant on a Due with 512k of flash, but there are smaller ARMs...
So it's not just the actual behavior on AVR that causes printf() to be avoided, but all of its "reputation" earned on other platforms...

(hmm: incorrect/useless call to printf() in _exit() · Issue #47 · arduino/ArduinoCore-sam · GitHub)

Any Arduino IDE developer could add these options to the newest IDE in less than a day

Adding options is a whole can of worms. Atmel Studio has options for this sort of thing; it's a bewildering score or so of panels with strange names that it carefully documents as being equivalent to incomprehensible gcc options :frowning:

I found it a pain to type multiple Serial.print()'s for debug output so came up with 'Sprint'.

It just does a real cut-down printf():

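//
// Example of 'Sprint' - a cut-down printf()
// - saves time typing in debug output for Serial.print
// TonyWilk

#include <stdarg.h>  // va_list, va_start, va_arg, va_end

//-----------------------------
// Serial.print helper function
// - a real cut-down printf()
// - handles %d, %f, %h (hex), %c and %s; any other character after '%' is dropped
//
void Sprint( const char *fmt, ... )
{
  char c;
  va_list args;
  va_start( args, fmt );
  while( (c=*fmt) != 0 ){
    switch( c )
    {
    case '%':
      c= *(++fmt);
      switch( c )
      {
      case 'd': Serial.print( va_arg(args,int) ); break;
      case 'f': Serial.print( va_arg(args,double) ); break;
      case 'h': Serial.print( va_arg(args,int), HEX ); break;
      case 'c': Serial.print( (char)va_arg(args,int) ); break;  // char arrives promoted to int
      case 's': Serial.print( va_arg(args, const char *) ); break;
      case 0:   --fmt; break;   // format ended right after '%': step back so the loop stops at the terminator
      default:  break;
      }
      break;
    case '\\':                  // a literal backslash in the string, e.g. written as "\\n" in the source
      c= *(++fmt);
      if( c == 0 )
        --fmt;                  // format ended right after the backslash
      else if( c == 'n' )
        Serial.println();
      else
        Serial.print( c );
      break; 
    default:
      Serial.print( c );
      break;
    }
    ++fmt;
  }
  va_end( args );
}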

void setup() {
  // put your setup code here, to run once:
  Serial.begin(19200);
  Serial.println("boot...");
  const char *astring= "test string";  // string literals are const in C++

  Sprint("This is an example...\n");
  Sprint("int:%d, float:%f, hex:0x%h, char: '%c', string:\"%s\".\nThe end\n", 
          42, 123.45, 0xFACE, 'x', astring );
}

void loop() {
  // put your main code here, to run repeatedly:

}

Efficient? Well, it's a lot easier to type in "int:%d, float:%f, hex:0x%h\n" than the equivalent in separate statements, and it doesn't need yet another buffer (like printf() would).

Anyway, I find it handy.

Yours,
TonyWilk

P.S. dunno how portable this is across all Arduinos, I dimly remember something about the types used with va_arg causing me some bother. I only run Arduino Pro Mini (ATmega328).
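One common source of that kind of bother (the post does not say which types were involved, so this is a guess) is the default argument promotions: in a variadic call, `char` and `short` arrive as `int`, and `float` arrives as `double`, so `va_arg` has to ask for the promoted type. A minimal host-side sketch, with a hypothetical `sumArgs` helper:

```cpp
#include <cstdarg>

// Hypothetical helper: walks a list of type codes and adds up one variadic
// argument per code, using the *promoted* type in every va_arg call.
//   'c' -> passed as char,            read as int
//   'i' -> passed as int,             read as int
//   'f' -> passed as float or double, read as double
double sumArgs(const char* codes, ...) {
  double total = 0.0;
  va_list args;
  va_start(args, codes);
  for (const char* p = codes; *p != '\0'; ++p) {
    switch (*p) {
      case 'c': total += va_arg(args, int); break;     // char is promoted to int
      case 'i': total += va_arg(args, int); break;
      case 'f': total += va_arg(args, double); break;  // float is promoted to double
      default: break;  // unknown code: skip without consuming an argument
    }
  }
  va_end(args);
  return total;
}
```

The size of `int` also matters: on AVR it is 16 bits, so a `long` argument has to be read with `va_arg(args, long)`, and a format handler written for one board can misread arguments on another if the types do not line up.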

krupski:
I didn't do this because I don't care one bit if my program takes 300 microseconds to run or 310 microseconds, nor do I care if the program ends up being 22K or 25K in size.

But I DO care about being able to write standard code and have it work the way that I expect it to... usually the first time... as opposed to using a whole bunch of "print" calls and then going back and editing tiny glitches (such as a missing space between a message and a variable display).

I don't understand why everyone is so concerned about microseconds and a few extra K of flash?

You seem awfully concerned with yourself here. The OP was asking about code size and execution efficiency. That's the only reason I was suggesting that they be used as the metric in this case.