Minimizing code size by removing println() in favor of print('\n')

Came across this "optimization": remove all println() from Print.cpp in favor of using \n explicitly when printing.

A simple test (IDE 0022, Windows 7 64-bit) shows:

void setup()
{
  Serial.begin(9600);
  Serial.println("start");
}

void loop(){}

Binary sketch size: 1972 bytes (of a 30720 byte maximum)

void setup()
{
  Serial.begin(9600);
  Serial.print("start\n");
}

void loop(){}

Binary sketch size: 1490 bytes (of a 30720 byte maximum)

1972 - 1490 = 482. That's a lot of bytes for a \n.

Hard to believe they took 482 bytes to output a newline, but there you go ...

Time to dive in deeper ...

Snippets needed for println() to print that one string ...

void Print::println(const char c[])
{
  print(c);
  println();
}

void Print::print(const char str[])  
{
  write(str);
}

void Print::println(void)
{
  print('\r');
  print('\n');  
}

void Print::print(char c, int base)
{
  print((long) c, base);
}

void Print::print(long n, int base)
{
  if (base == 0) {
    write(n);
  } else if (base == 10) {
    if (n < 0) {
      print('-');
      n = -n;
    }
    printNumber(n, 10);
  } else {
    printNumber(n, base);
  }
}

Otherwise only this one would be needed:

void Print::print(const char str[])  
{
  write(str);
}

If I counted correctly:
println("start") ==> 6 calls to print(..) and 3 calls to write(..)
print("start\n"); ==> 1 call to print(..) and 1 to write(..)
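To sanity-check the call count, here is a small desktop C++ mock of just the overloads quoted above. CountingPrint is a made-up name, the default base of 0 (BYTE) for print(char) is an assumption based on 0022's Print.h, and write(const char*) is tallied as a single call even though the real one loops over single-byte writes.

```cpp
#include <cassert>
#include <string>

// Desktop stand-in for the quoted 0022 Print overloads (hypothetical class).
struct CountingPrint {
  int printCalls = 0;
  int writeCalls = 0;
  std::string out;

  void write(unsigned char c) { ++writeCalls; out += static_cast<char>(c); }
  void write(const char *s)   { ++writeCalls; out += s; }

  void print(const char str[]) { ++printCalls; write(str); }

  // Assumption: base defaults to 0 (BYTE), as in 0022's Print.h.
  void print(char c, int base = 0) {
    ++printCalls;
    print(static_cast<long>(c), base);
  }

  void print(long n, int base) {
    ++printCalls;
    if (base == 0) write(static_cast<unsigned char>(n));
    // Number formatting branches omitted: not reached with base 0.
  }

  void println()               { print('\r'); print('\n'); }
  void println(const char c[]) { print(c); println(); }
};
```

In this mock println("start") tallies 5 calls to print(..) and 3 to write(..), or 7 and 3 if the two println() overloads are counted as well, against 1 and 1 for print("start\n"). Same ballpark as the count above.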

A q&d performance test shows no difference in speed.

Conclusion: keep in mind when running out of memory.

Hard to believe they took 482 bytes to output a newline, but there you go ...

It doesn't take 482 bytes to output a newline, but there needs to be an overloaded println() variation for every overloaded print() variation. Together, they add up.

It shouldn't need all those variations, Paul, if it doesn't use them. However I believe I see the problem ...

First for a fair test the "shorter" one should read:

void setup()
{
  Serial.begin(9600);
  Serial.print("start\r\n");
}

void loop(){}

That's because println (as you see above) outputs \r and \n.

Binary sketch size: 1480 bytes (of a 32256 byte maximum)

Now as for the rest, there is an implementation bug, namely this:

void Print::println(void)
{
  print('\r');
  print('\n');  
}

That should read:

void Print::println(void)
{
  write('\r');
  write('\n');  
}

Why? Because it is calling:

void Print::print(char c, int base)

Which casts c to long and then calls:

void Print::print(long n, int base)

Which calls:

void Print::printNumber(unsigned long n, uint8_t base)

A whole lot of effort (and time) to convert a newline into a decimal number, base 10, and then cast it back to what it was, and then print it.

Compare to this:

void myprintln ()
  {
  Serial.write ('\r');
  Serial.write ('\n');
  }

void myprintln (const char c[])
  {
  Serial.print (c);
  myprintln();  
  }
  
void setup()
{
  Serial.begin(9600);
  myprintln("start");
}

void loop(){}

Binary sketch size: 1512 bytes (of a 32256 byte maximum)

Only 32 more bytes now, by using a version of println that writes rather than prints.

So I seriously think the implementors of the Print library should change println to write, not print.

Even better, combine into a single write:

void myprintln ()
  {
  Serial.write ("\r\n");
  }

void myprintln (const char c[])
  {
  Serial.print (c);
  myprintln();  
  }
  
void setup()
{
  Serial.begin(9600);
  myprintln("start");
}

void loop(){}

Now size is:

Binary sketch size: 1500 bytes (of a 32256 byte maximum)

Only 20 more bytes. That's not too bad, after all we got another function out of it (println) that we didn't use before.

The 1.0 version is much the same:

size_t Print::println(void)
{
  size_t n = print('\r');
  n += print('\n');
  return n;
}

That could be:

size_t Print::println(void)
{
   return write ("\r\n");
}

Less complex too.
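A quick way to convince yourself that the two variants behave the same is a desktop mock. This is only a sketch: StringSink is a hypothetical stand-in for a 1.0-style Print, and it assumes write() returns the number of bytes written, as the size_t signatures above suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical desktop stand-in for a 1.0-style Print.
struct StringSink {
  std::string out;

  std::size_t write(unsigned char c) {
    out += static_cast<char>(c);
    return 1;
  }

  std::size_t write(const char *s) {
    std::size_t n = 0;
    while (*s) n += write(static_cast<unsigned char>(*s++));
    return n;
  }

  // Variant 1: two single-byte writes.
  std::size_t printlnTwoWrites() {
    std::size_t n = write('\r');
    n += write('\n');
    return n;
  }

  // Variant 2: one string write.
  std::size_t printlnOneWrite() { return write("\r\n"); }
};
```

Both variants emit the same two bytes and return 2, so swapping one for the other should not change what callers see.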

Hi Nick,

I knew you would dive into it ;) You came up with the same refactor I saw, write("\r\n");

However, it makes no real difference in speed when I used it with Serial, as the communication overhead is so much larger...

For the fair test, you are right, it should be \r\n. However, most people will not use the \r, as receiving terminals will often add it automagically.

--- update ---
Reported as an issue (the link now points to the Google Code Archive).

Nick,
Would it make sense to put the "\r\n" string in PROGMEM? That would save 3 or 4 bytes of RAM.
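For reference, a minimal sketch of the old-fashioned way to keep "\r\n" in program memory. It assumes avr/pgmspace.h on the AVR side; FakeSerial and printlnFromFlash are made-up names, and the #else branch is only a desktop shim so the code can be tried off-target.

```cpp
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
#include <cassert>
#include <string>
// Desktop shims: no separate flash address space off-target.
#define PROGMEM
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
// Hypothetical stand-in for Serial.
struct FakeSerial {
  std::string out;
  void write(unsigned char c) { out += static_cast<char>(c); }
};
#endif

// On AVR this array stays in flash instead of being copied to RAM at startup.
const char CRLF[] PROGMEM = "\r\n";

// Copies the flash string, byte by byte, to anything with a write(byte) member.
// Returns the number of bytes sent.
template <typename Out>
size_t printlnFromFlash(Out &out) {
  size_t n = 0;
  for (const char *p = CRLF; ; ++p) {
    unsigned char c = pgm_read_byte(p);
    if (c == 0) break;
    out.write(c);
    ++n;
  }
  return n;
}
```

On the board this would be called as printlnFromFlash(Serial). Whether the extra read loop costs more flash than the 3 bytes of RAM it saves would need measuring.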

Teensyduino has a heavily optimized Print (and many other Arduino functions). Here's the code I wrote for Print::println():

size_t Print::println(void)
{
        uint8_t buf[2]={'\r', '\n'};
        return write(buf, 2);
}

Compare:

void myprintln ()
  {
  Serial.write ("\r\n");
  }

void myprintln (const char c[])
  {
  Serial.print (c);
  myprintln();  
  }
  
void setup()
{
  Serial.begin(9600);
  myprintln("start");
}

void loop(){}

Under version 0022, compiles as:

Binary sketch size: 1500 bytes (of a 32256 byte maximum)

Under version 1.0, compiles as:

Binary sketch size: 3480 bytes (of a 258048 byte maximum)

Forget saving 3 bytes with program memory! Save 1980 bytes by sticking to version 0022! Wow, just wow. That's 6% of memory lost just by upgrading.

However changing myprintln to:

void myprintln ()
  {
  Serial.print (F("\r\n"));
  }

Changes the code size to:

Binary sketch size: 3534 bytes (of a 258048 byte maximum)

So a slight saving in RAM, yes, although the sketch itself comes out 54 bytes larger than the 3480 above.

Under version 0022 changing myprintln to:

void myprintln ()
  {
  uint8_t buf[2]={'\r', '\n'};
  Serial.write(buf, sizeof buf);
  }

Increased code from 1520 to 1528, strangely enough.

Binary sketch size: 1528 bytes (of a 32256 byte maximum)

Yes, but 1.0 also magically increased your maximum size from 32256 to 258048. That's a pretty impressive upgrade!!!

Or perhaps you tested 0022 on Uno and 1.0 on Mega? The code sizes between boards aren't really comparable, since lots of extra code gets included to support Mega's 4 serial ports, extra timers and so on.

Lol, you are right. I thought the figure on the right looked a bit strange.

Setting IDE 1.0 to Uno (where I usually have it, to be fair to me) I got:

Binary sketch size: 1928 bytes (of a 32256 byte maximum)

Only 428 bytes more for upgrading to version 1.0.

So, here's some advice ... if you are short of memory, stick to 0022 of the IDE. Unfortunately you lose the F() macro then. Ah well, there is always the old-fashioned way of doing program memory. Or you could probably retro-fit the macro.

Teensyduino is where the F() macro started.... of course also with input from Mikal Hart and Brian Cook, and then it was contributed to Arduino. Teensyduino supports it on 0022 and 0023. Even if you're not using Teensy, you could install Teensyduino and copy-n-paste bits from Teensy's Print.cpp to get a F() that works in 0022.

Sounds easy enough.

In addition to the F-macro, I highly recommend Teensyduino for the IDE enhancements.

If you just want to add the F-macro to version 0022, these two diffs should help...

http://code.google.com/p/arduino-tiny/source/diff?spec=svn66&r=58&format=side&path=/trunk/hardware/tiny/cores/tiny/Print.h&old_path=/trunk/hardware/tiny/cores/tiny/Print.h&old=44

http://code.google.com/p/arduino-tiny/source/diff?spec=svn66&r=58&format=side&path=/trunk/hardware/tiny/cores/tiny/Print.cpp&old_path=/trunk/hardware/tiny/cores/tiny/Print.cpp&old=8
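For a rough idea of what such a retro-fit involves, here is a sketch modelled on the F() definition in Arduino 1.0 (written from memory, so treat the details as assumptions and check them against the diffs above). printFlash and FakeSerial are hypothetical names, and a free function is used instead of a Print::print overload so nothing in the core has to be edited.

```cpp
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
#include <cassert>
#include <string>
// Desktop shims so the sketch builds off-target.
#define PSTR(s) (s)
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
// Hypothetical stand-in for Serial.
struct FakeSerial {
  std::string out;
  void write(unsigned char c) { out += static_cast<char>(c); }
};
#endif

#ifndef F
// Opaque type, never defined: it only tags pointers into flash so they
// cannot be confused with ordinary RAM strings.
class __FlashStringHelper;
#define F(string_literal) \
  (reinterpret_cast<const __FlashStringHelper *>(PSTR(string_literal)))
#endif

// Prints a flash-resident string one byte at a time; returns the byte count.
template <typename Out>
size_t printFlash(Out &out, const __FlashStringHelper *fs) {
  const char *p = reinterpret_cast<const char *>(fs);
  size_t n = 0;
  for (;; ++p) {
    unsigned char c = pgm_read_byte(p);
    if (c == 0) break;
    out.write(c);
    ++n;
  }
  return n;
}
```

On the board the call would look like printFlash(Serial, F("\r\n")).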

Teensyduino has a heavily optimized Print

Tell the truth - did you actually optimize it, or did you just avoid bloating it? :)
(It's been a bit depressing to watch Serial grow and grow with nearly every release... Despite contributions that would improve things.)

westfw:
Tell the truth - did you actually optimize it, or did you just avoid bloating it? :)

Teensyduino's "Serial" isn't HardwareSerial at all. It's completely different code for USB virtual serial. There is a highly optimized Serial.write(buf, size) function which does block copy directly to USB packet buffers using 2 instructions per byte. It's optimized for speed, not minimal code size.

Teensyduino's Print has many optimizations that try to maximize use of write(buf, size), rather than writing 1 byte at a time. Recently Arduino's Print class has started implementing some of these, but in many places it still writes 1 byte at a time. With HardwareSerial, it doesn't matter, since write(buf, size) is just a loop which repetitively calls the single byte write. But with Teensyduino's Serial, and with Ethernet and the SD card library, using block writes is much faster. These Print optimizations are separate from optimizations in the code which actually implements available/read/write I/O. For streams that use block copy, it makes a huge improvement in performance.
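The difference between the two kinds of stream can be sketched with a desktop mock. ByteStream, BlockStream, printlnBlock and the lowLevelOps counter are all hypothetical names for illustration; this is not Teensyduino's or Arduino's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Default behaviour: the block write just loops over the single-byte write.
struct ByteStream {
  std::string out;
  int lowLevelOps = 0;  // how many times the "hardware" was touched

  virtual ~ByteStream() {}

  virtual std::size_t write(std::uint8_t b) {
    ++lowLevelOps;
    out += static_cast<char>(b);
    return 1;
  }

  virtual std::size_t write(const std::uint8_t *buf, std::size_t size) {
    std::size_t n = 0;
    while (size--) n += write(*buf++);
    return n;
  }
};

// Packet-oriented stream (think USB or Ethernet): one operation per block.
struct BlockStream : ByteStream {
  using ByteStream::write;
  std::size_t write(const std::uint8_t *buf, std::size_t size) override {
    ++lowLevelOps;
    out.append(reinterpret_cast<const char *>(buf), size);
    return size;
  }
};

// println() written in the block-friendly style: a single write(buf, size).
std::size_t printlnBlock(ByteStream &s) {
  const std::uint8_t buf[2] = {'\r', '\n'};
  return s.write(buf, 2);
}
```

With the byte-oriented stream the block call still costs one low-level operation per byte, so nothing is gained or lost. With the block-oriented stream the same call costs a single operation, which is where the speed-up for longer buffers would come from.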

End-to-end speed depends on many software factors, including the software on the PC side, but many people have reported easily achieving 300 kbytes/sec (yes bytes, not bits), and speeds in the 800 kbyte/sec range are possible.

(It's been a bit depressing to watch Serial grow and grow with nearly every release... Despite contributions that would improve things.)

Yes, Arduino's HardwareSerial is horribly inefficient. The use of indirect addressing for all the I/O registers and constants is terribly inefficient on AVR hardware. Somebody obviously felt 1 copy of the code, no matter how complex and inefficient, would be better than a separate copy for each port. From a maintenance perspective, maybe it is, but the trade-off is slow performance and unnecessary compiled code size.

At least 1.0.1 changes the index variables to unsigned, so the interrupt won't use the math library to implement the modulus operator! That's actually a huge improvement in interrupt latency.
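To illustrate why signedness matters here, a small sketch. BUFFER_SIZE = 128 is an illustrative power-of-two value, not taken from any particular release, and the function names are made up.

```cpp
#include <cassert>

const int BUFFER_SIZE = 128;  // illustrative; must be a power of two

// Signed index: the language requires -1 % 128 == -1, so a plain bit mask
// would give the wrong answer for negative values. The compiler needs
// fix-up code or a general division routine to get this right.
int nextSigned(int i) { return (i + 1) % BUFFER_SIZE; }

// Unsigned index: for a power-of-two size, % is equivalent to & (size - 1),
// which an optimizing compiler can emit as a single AND.
unsigned int nextUnsigned(unsigned int i) { return (i + 1) % BUFFER_SIZE; }
unsigned int nextMasked(unsigned int i)   { return (i + 1) & (BUFFER_SIZE - 1); }
```

For every unsigned input nextUnsigned and nextMasked agree, so the cheap form is always valid. For the signed version that shortcut is not available, which matches the library call described above.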

Teensyduino also has a HardwareSerial which is heavily optimized, but it only needs to support a single hardware serial port. If there were 2 or more, I'd make copies. It's similar to the pre-0015 version Arduino had, but it has a number of small optimizations which have never appeared in any version of Arduino.

All this code I've published is open source. If anyone really cared, it could be ported back to Arduino, or at least mined for ideas to separately optimize the Arduino version.

Also, there is no reason to use 16-bit indexes for the head/tail values in the ring_buffer. You can save several hundred bytes of code space if you chop them down to 8-bit unsigned values. The 16-bit indexes aren't safe anyway, because the code doesn't mask interrupts to ensure atomicity when comparing them, so you might as well make them 8-bit values and pick up the extra speed and code space.
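A minimal sketch of what 8-bit indexes could look like. Ring, ringPut, ringGet and RING_SIZE = 64 are hypothetical, and this is not the actual HardwareSerial code. The point is that on an 8-bit AVR a one-byte index is read or written in a single instruction, so an interrupt cannot catch it half-updated the way it can with a two-byte index.

```cpp
#include <cassert>
#include <cstdint>

const std::uint8_t RING_SIZE = 64;  // illustrative; power of two, at most 128

struct Ring {
  volatile std::uint8_t head = 0;  // written by the producer (interrupt)
  volatile std::uint8_t tail = 0;  // written by the consumer (main code)
  unsigned char data[RING_SIZE];
};

// Producer side: returns false and drops the byte when the buffer is full.
bool ringPut(Ring &r, unsigned char c) {
  std::uint8_t next = static_cast<std::uint8_t>((r.head + 1) & (RING_SIZE - 1));
  if (next == r.tail) return false;
  r.data[r.head] = c;
  r.head = next;
  return true;
}

// Consumer side: returns -1 when the buffer is empty.
int ringGet(Ring &r) {
  if (r.head == r.tail) return -1;
  unsigned char c = r.data[r.tail];
  r.tail = static_cast<std::uint8_t>((r.tail + 1) & (RING_SIZE - 1));
  return c;
}
```

One slot is always left empty to tell full from empty, so this buffer holds at most RING_SIZE - 1 bytes.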

--- bill