Topic: minimizing codesize by removing println() in favor of print('\n');

Quote
Teensyduino has a heavily optimized Print
Tell the truth - did you actually optimize it, or did you just avoid bloating it?  :-)
(It's been a bit depressing to watch Serial grow and grow with nearly every release...  Despite contributions that would improve things.)
Logged

Quote
Tell the truth - did you actually optimize it, or did you just avoid bloating it?  :-)

Teensyduino's "Serial" isn't HardwareSerial at all.  It's completely different code for USB virtual serial.  There is a highly optimized Serial.write(buf, size) function which does block copy directly to USB packet buffers using 2 instructions per byte.  It's optimized for speed, not minimal code size.

Teensyduino's Print has many optimizations that try to maximize use of write(buf, size), rather than writing 1 byte at a time.  Recently Arduino's Print class has started implementing some of these, but in many places it still writes 1 byte at a time.  With HardwareSerial, it doesn't matter, since write(buf, size) is just a loop which repeatedly calls the single byte write.  But with Teensyduino's Serial, and with Ethernet and the SD card library, using block writes is much faster.  These Print optimizations are separate from optimizations in the code which actually implements available/read/write I/O.  For streams that use block copy, they make a huge improvement in performance.
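
To make the byte-at-a-time versus block-write difference concrete, here is a minimal host-side sketch.  PrintSketch and BlockStream are hypothetical stand-ins written for this illustration; they are not the Arduino or Teensyduino sources, and the 64 byte buffer only pretends to be a USB packet buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a Print-style base class (not the real Arduino code).
class PrintSketch {
public:
    virtual ~PrintSketch() {}
    virtual size_t write(uint8_t b) = 0;

    // Default block write: just a loop over the single byte write,
    // which is what the post says HardwareSerial effectively does.
    virtual size_t write(const uint8_t *buf, size_t size) {
        size_t n = 0;
        while (size--) n += write(*buf++);
        return n;
    }

    // Byte-at-a-time printing: one virtual call per character.
    size_t printSlow(const char *s) {
        size_t n = 0;
        while (*s) n += write(static_cast<uint8_t>(*s++));
        return n;
    }

    // Block-oriented printing: hand the whole string to write(buf, size) once.
    size_t printFast(const char *s) {
        return write(reinterpret_cast<const uint8_t *>(s), std::strlen(s));
    }
};

// Hypothetical stream whose block write is a real block copy.
class BlockStream : public PrintSketch {
public:
    uint8_t buffer[64];      // pretend packet buffer
    size_t used = 0;
    unsigned byteCalls = 0;  // how often the 1 byte path ran
    unsigned blockCalls = 0; // how often the block path ran

    size_t write(uint8_t b) override {
        ++byteCalls;
        if (used >= sizeof(buffer)) return 0;
        buffer[used++] = b;
        return 1;
    }

    size_t write(const uint8_t *buf, size_t size) override {
        ++blockCalls;
        size_t room = sizeof(buffer) - used;
        if (size > room) size = room;
        std::memcpy(buffer + used, buf, size);
        used += size;
        return size;
    }
};
```

Printing "hello" through printSlow() costs five single byte calls, while printFast() reaches the stream as one block call.  A stream that only implements the single byte write sees no difference between the two, which matches the HardwareSerial remark above.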

End-to-end speed depends on many software factors, including the software on the PC side, but many people have reported easily achieving 300 kbytes/sec (yes bytes, not bits), and speeds in the 800 kbyte/sec range are possible.


Quote
(It's been a bit depressing to watch Serial grow and grow with nearly every release...  Despite contributions that would improve things.)

Yes, Arduino's HardwareSerial is horribly inefficient.  The use of indirect addressing for all the I/O registers and constants is terribly inefficient on AVR hardware.  Somebody obviously felt 1 copy of the code, no matter how complex and inefficient, would be better than a separate copy for each port.  From a maintenance perspective, maybe it is, but the trade-off is slow performance and unnecessary compiled code size.
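
A rough sketch of the trade-off, with made-up names: SharedPort reaches its register through a pointer stored in the object (one copy of the code for all ports), while FixedPort bakes the register into the type (one copy of the code per port).  The fakeUDR0/fakeUDR1 variables are plain globals standing in for real I/O registers so this compiles on a PC.  On a real AVR the fixed-address form lets the compiler use direct in/out or lds/sts instructions, whereas the pointer form has to load a pointer register first; how much that costs in practice depends on the compiler and flags.

```cpp
#include <cstdint>

// Stand-ins for hardware data registers (assumption: real code would use
// the device header's register definitions instead).
volatile uint8_t fakeUDR0 = 0;
volatile uint8_t fakeUDR1 = 0;

// Style A: one shared implementation, register reached indirectly
// through a pointer held in each object.
class SharedPort {
public:
    explicit SharedPort(volatile uint8_t *udr) : udr_(udr) {}
    void send(uint8_t b) { *udr_ = b; }  // indirect store via pointer
private:
    volatile uint8_t *udr_;
};

// Style B: the register is a compile-time parameter, so each port gets
// its own copy of the code with the address known at compile time.
template <volatile uint8_t &Udr>
struct FixedPort {
    static void send(uint8_t b) { Udr = b; }  // direct store
};
```

Usage would look like SharedPort port0(&fakeUDR0); port0.send('A'); versus FixedPort<fakeUDR1>::send('B');.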

At least 1.0.1 changes the index variables to unsigned, so the interrupt won't use the math library to implement the modulus operator!  That's actually a huge improvement in interrupt latency.
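
Why signedness matters here, as a small sketch (the buffer size of 128 and the function names are assumptions for illustration, not taken from the Arduino source).  For an unsigned index and a power-of-two size, the modulus is exactly a bit mask, so the compiler is free to emit a single AND.  For a signed index, C++ defines the remainder of a negative value as negative, so a plain mask would give a different answer and the compiler has to generate more code.

```cpp
#include <cstdint>

// Hypothetical buffer size; the mask trick requires a power of two.
constexpr unsigned kBufferSize = 128;

// Unsigned modulus by a power of two...
inline unsigned nextIndexUnsigned(unsigned i) {
    return (i + 1) % kBufferSize;
}

// ...always equals this mask, so it can compile down to an AND.
inline unsigned nextIndexMasked(unsigned i) {
    return (i + 1) & (kBufferSize - 1);
}

// Signed modulus: the result takes the sign of the dividend,
// e.g. -1 % 128 is -1, so it cannot be replaced by a plain mask.
inline int nextIndexSigned(int i) {
    return (i + 1) % static_cast<int>(kBufferSize);
}
```

Whether a given avr-gcc version actually calls a library routine for the signed case depends on the compiler and optimization settings; the post above reports that the pre-1.0.1 interrupt code did.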

Teensyduino also has a HardwareSerial which is heavily optimized, but it only needs to support a single hardware serial port.  If there were 2 or more, I'd make copies.  It's similar to the pre-0015 version Arduino had, but it has a number of small optimizations which have never appeared in any version of Arduino.

All this code I've published is open source.  If anyone really cared, it could be ported back to Arduino, or at least mined for ideas to separately optimize the Arduino version.
« Last Edit: April 14, 2012, 12:28:45 pm by Paul Stoffregen » Logged



Quote
Yes, Arduino's HardwareSerial is horribly inefficient.  The use of indirect addressing for all the I/O registers and constants is terribly inefficient on AVR hardware.  Somebody obviously felt 1 copy of the code, no matter how complex and inefficient, would be better than a separate copy for each port.  From a maintenance perspective, maybe it is, but the trade-off is slow performance and unnecessary compiled code size.

At least 1.0.1 changes the index variables to unsigned, so the interrupt won't use the math library to implement the modulus operator!  That's actually a huge improvement in interrupt latency.

Also, there is no reason to use 16 bit indexes for the head/tail values in the ring_buffer.
You can save several hundred bytes of code space if you chop them down to 8 bit unsigned values.
The 16 bit indexes aren't safe anyway, because the code doesn't mask interrupts to ensure
atomicity when comparing them,
so you might as well make them 8 bit values and pick up the extra speed and code space.
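
Something along these lines, as a sketch only (RingBuffer8, rbPut and rbGet are names made up here, not the Arduino ring_buffer code).  With uint8_t head/tail, each index is read or written as a single byte on an 8 bit AVR, so an interrupt can't catch it half updated; a 16 bit index takes two byte accesses and would need interrupts masked around it.

```cpp
#include <cstdint>

// Hypothetical size: a power of two that fits in 8 bits.
constexpr uint8_t kRingSize = 64;

struct RingBuffer8 {
    volatile uint8_t head = 0;  // written by the producer (e.g. the RX interrupt)
    volatile uint8_t tail = 0;  // written by the consumer (e.g. read())
    uint8_t data[kRingSize];
};

// Producer side: store one byte, or return false if the buffer is full.
// One slot is always left empty so that head == tail means "empty".
inline bool rbPut(RingBuffer8 &rb, uint8_t c) {
    uint8_t next = static_cast<uint8_t>((rb.head + 1) % kRingSize);
    if (next == rb.tail) return false;
    rb.data[rb.head] = c;
    rb.head = next;
    return true;
}

// Consumer side: return the next byte, or -1 if the buffer is empty.
inline int rbGet(RingBuffer8 &rb) {
    if (rb.head == rb.tail) return -1;
    uint8_t c = rb.data[rb.tail];
    rb.tail = static_cast<uint8_t>((rb.tail + 1) % kRingSize);
    return c;
}
```

This assumes one producer and one consumer, and it holds at most kRingSize - 1 bytes.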


--- bill
Logged
