Go Down

Topic: Improved HardwareSerial (Read 9 times) previous topic - next topic

Kiril

Hi All,

I am sure many people have seen the shortcomings of HardwareSerial and came up with their own versions. So here's yet another one. The public interface is the same as the original, your sketches should work with the new one just as with the old one.

Changes/improvements:

* Fully asynchronous operation, uses both RX and TX buffers
* Buffer sizes are individually configurable
* Tried to optimize the code for size and memory usage

The library only supports Arduinos with ATmega168 or ATmega328. I don't have the Mega, or an old arduino with ATmega8, so I cannot test the lib for these MCUs.

I'd be happy if you take a look at the code and feed my back a little.

I can modify it to work with all arduinos if there's interest. I'd be especially happy to see some of the improvements in the official lib -- the TX buffer can speed things up a lot.


Code: [Select]
/*
 HardwareSerial.h - Improved HardwareSerial library for Arduino
 Copyright (c) 2009 Kiril Zyapkov.  All right reserved.

 This library is free software; you can redistribute it and/or
 modify it under the terms of the GNU Lesser General Public
 License as published by the Free Software Foundation; either
 version 2.1 of the License, or (at your option) any later version.

 This library is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 Lesser General Public License for more details.

 You should have received a copy of the GNU Lesser General Public
 License along with this library; if not, write to the Free Software
 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
*/

#ifndef HardwareSerial_h
#define HardwareSerial_h

#include <inttypes.h>

#include "Print.h"

// Define the amount of tolerance upon which x2 will be enabled, in percent
#define BAUD_TOL 2

// Both buffers must be powers of 2, up to 128 bytes
// Keep in mind the memory limits of the MCU!!
#define USART_RX_BUFFER_SIZE   16
#define USART_TX_BUFFER_SIZE   32

// There were other ways of implementing the ring buffers that I thought of,
// that avoid defining 2 separate struct's for the same thing:
//
//  * allocating memory for the buffers using malloc() at runtime
//      not sure how 'dynamic' memory management will work on the micro
//  * using separate variables, not packing the buffers into structs
//      structs are much neater
//  * defining a single macro for buffer size
//      both buffers would need to have the same size
struct rx_ring_buffer {
   uint8_t buffer[USART_RX_BUFFER_SIZE];
   volatile uint8_t head;
   volatile uint8_t tail;
};

struct tx_ring_buffer {
   uint8_t buffer[USART_TX_BUFFER_SIZE];
   volatile uint8_t head;
   volatile uint8_t tail;
};


class HardwareSerial : public Print
{
 private:
   rx_ring_buffer *_rx_buffer;
   tx_ring_buffer *_tx_buffer;

 public:
   HardwareSerial(rx_ring_buffer *, tx_ring_buffer *);
   void begin(unsigned long);
   void write(uint8_t);
   void flush(void);
   uint8_t available(void);
   uint8_t read(void);

};

extern HardwareSerial Serial;

#endif



Code: [Select]
/*
 HardwareSerial.cpp - Improved HardwareSerial library for Arduino
 Copyright (c) 2009 Kiril Zyapkov.  All right reserved.

 This library is free software; you can redistribute it and/or
 modify it under the terms of the GNU Lesser General Public
 License as published by the Free Software Foundation; either
 version 2.1 of the License, or (at your option) any later version.

 This library is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 Lesser General Public License for more details.

 You should have received a copy of the GNU Lesser General Public
 License along with this library; if not, write to the Free Software
 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 
*/
#include <avr/io.h>
#include <avr/interrupt.h>

#include "HardwareSerial.h"

#define USART_RX_BUFFER_MASK (USART_RX_BUFFER_SIZE-1)
#define USART_TX_BUFFER_MASK (USART_TX_BUFFER_SIZE-1)

#if ( USART_RX_BUFFER_SIZE & USART_RX_BUFFER_MASK )
#    error RX buffer size is not a power of 2
#endif
#if ( USART_TX_BUFFER_SIZE & USART_TX_BUFFER_MASK )
#    error TX buffer size is not a power of 2
#endif

#ifndef BAUD_TOL
#   define BAUD_TOL 2
#endif

// both buffers are static global vars in this file. interrupt handlers
// need to access them
static rx_ring_buffer rx_buffer;
static tx_ring_buffer tx_buffer;

/**
* Initialize the USART module with the BAUD rate predefined
* in HardwareSerial.h
*
* This may implement passing addresses of registers, like the HardwareSerial
* class from the Arduino library, but that's not necessary at this point.
*
*/
HardwareSerial::HardwareSerial(rx_ring_buffer *rx_buffer_ptr, tx_ring_buffer *tx_buffer_ptr) {

   _rx_buffer = rx_buffer_ptr;
   _tx_buffer = tx_buffer_ptr;

   // Allocate memory for the rx and tx buffers
   // That is, if the structs contain pointers to the actual buffer
   // arrays. I'm going for defining 2 separate struct's and allocating them
   // statically.
//    _rx_buffer->buffer = (uint8_t *)malloc(USART_RX_BUFFER_SIZE);
//    _tx_buffer->buffer = (uint8_t *)malloc(USART_TX_BUFFER_SIZE);
}

void HardwareSerial::begin(unsigned long baud) {
   // taken from <util/setbaud.h>
   uint8_t use2x = 0;
   uint16_t ubbr =  (F_CPU + 8UL * baud) / (16UL * baud) - 1UL;
   if ( (100 * (F_CPU)) > (16 * (ubbr + 1) * (100 * ubbr + ubbr * BAUD_TOL)) ) {
       use2x = 1;
       ubbr = (F_CPU + 4UL * baud) / (8UL * baud) - 1UL;
   }

   UBRR0L = ubbr & 0xff;
   UBRR0H = ubbr >> 8;
   if (use2x) {
       UCSR0A |= (1 << U2X0);
   } else {
       UCSR0A &= ~(1 << U2X0);
   }

   // Flush buffers
   _tx_buffer->head = _tx_buffer->tail = 0;
   _rx_buffer->head = _rx_buffer->tail = 0;

   UCSR0B |= (1<<TXEN0);  // Enable Transmitter
   UCSR0B |= (1<<RXEN0);  // Enable Reciever
   UCSR0B |= (1<<RXCIE0); // Enable Rx Complete Interrupt
}

void HardwareSerial::write(uint8_t data) {
   // Calculate new head position
   uint8_t tmp_head = (_tx_buffer->head + 1) & USART_TX_BUFFER_MASK;

   // Block until there's room in the buffer
   // XXX: this may block forever if someone externally disabled the transmitter
   //      or the DRE interrupt and there's data in the buffer. Careful!
   while (tmp_head == _tx_buffer->tail);

   // Advance the head, store the data
   _tx_buffer->buffer[tmp_head] = data;
   _tx_buffer->head = tmp_head;

   UCSR0B |= (1<<UDRIE0); // Enable Data Register Empty interrupt
}

void HardwareSerial::flush(void) {
   // Not sure if disabling interrupts is needed here, but let's be on
   // the safe side
   // disable interrupts
   uint8_t oldSREG = SREG;
   cli();
   _rx_buffer->head = _rx_buffer->tail;
   // Re-enable interrupts
   SREG = oldSREG;
}

/**
* Returns the number of bytes in the RX buffer
*
*/
uint8_t HardwareSerial::available(void) {
   //    return (_rx_buffer->head + USART_RX_BUFFER_SIZE - _rx_buffer->tail) % USART_RX_BUFFER_SIZE;
   // stupid compiler ...
   uint16_t tmp = (_rx_buffer->head + USART_RX_BUFFER_SIZE - _rx_buffer->tail);
   tmp %= USART_RX_BUFFER_SIZE;
   return (uint8_t)tmp;
}

/**
* Returns a byte from the RX buffer, or NULL if none are available
*/
uint8_t HardwareSerial::read(void) {
   uint8_t tmp_tail, tmp_data;

   // disable interrupts
   uint8_t oldSREG = SREG;
   cli();
   if (_rx_buffer->head == _rx_buffer->tail) {
       // Better that than block the code waiting for data. Users should call
       // available() first
       tmp_data = -1;
   } else {
       tmp_tail = (_rx_buffer->tail +1) & USART_RX_BUFFER_MASK;
       tmp_data = _rx_buffer->buffer[tmp_tail];
       _rx_buffer->tail = tmp_tail;
   }
   // Re-enable interrupts
   SREG = oldSREG;

   return tmp_data;
}

/**
* Receive handler
*/
ISR(USART_RX_vect) {
   uint8_t data = UDR0;
   uint8_t tmp_head = (rx_buffer.head + 1) & USART_RX_BUFFER_MASK;
   if (tmp_head == rx_buffer.tail) {
       // buffer overflow! for now, the strategy is to discard a byte off
       // the queue, so that the fresh data can be written. Probably an overflow
       // flag should be raised somewhere ...
       // http://c2.com/cgi/wiki?CircularBuffer
       rx_buffer.tail = (rx_buffer.tail + 1) & USART_RX_BUFFER_MASK;
   }
   rx_buffer.buffer[tmp_head] = data;
   rx_buffer.head = tmp_head;
}

/**
* Data Register Empty Handler
*/
ISR(USART_UDRE_vect) {
   if (tx_buffer.head == tx_buffer.tail) {
       // Buffer is empty, disable the interrupt
       UCSR0B &= ~(1<<UDRIE0);
   } else {
       tx_buffer.tail = (tx_buffer.tail + 1) & USART_TX_BUFFER_MASK;
       UDR0 = tx_buffer.buffer[tx_buffer.tail];
   }
}

HardwareSerial Serial(&rx_buffer, &tx_buffer);


To give it a try, replace hardware/cores/arduino/HardwareSerial.{cpp,h} with the ones listed above. Keep backups of the originals!

mircho

Very clean, nice implementation!
Thanks for the effort!

mekon83

Very nice! Make sure you send an email to the developer's forum, developers@arduino.cc
I kinda gave up trying to improve the libraries because nobody reacted to my original improvement, and the new HardwareSerial was just so awful that I realized that Arduino developers don't care about efficiency at all =)

mellis

Yes, please email proposed changes to the developers list.  In particular, an explanation of why the optimization is important helps.  In general, we tend to focus on simplicity and ease-of-use, so patches that just effect performance tend to be lower priority, unless there's a compelling reason otherwise.

kg4wsv

Quote
I realized that Arduino developers don't care about efficiency at all =)

Some things are more important  than efficiency.  Correct operation and reliability are two.  Portability (across arduino processors and clock speeds) is another.

-j


mekon83

@kg4wsv, @mellis: I completely understand that there are compromises. But I don't understand why one of the developers can't change the type of rx_buffer_head and rx_buffer_tail from int to unsigned int to avoid integer division in the interrupt handler. This makes the handler to execute in 4us instead of 15us (16MHz clock). The whole serial library was refactored anyway. The slow handler causes problems with fast connections and interferes when precise timing is needed.

I'm not proposing any hardcore optimization to save a cycle or two. I think it's reasonable to have a simple, effective code in the core libraries so that people don't have to worry how it works and can just use it. It's not possible now, one has to hack the library to get a good performance.

And I don't understand why a more radical optimization isn't a priority. Arduino runs only on AVR processors, one has to change the libraries quite a bit to be able to run it on a different hardware.

Kiril

Quote
Correct operation and reliability are two.


I've tested as much as I could, and there are hundreds of users here -- these can easily be achieved with a combined effort.

Quote
Portability (across arduino processors and clock speeds) is another.


Portability was intentionally left out, as I have no means of testing it. Again, if there's a will, there's a way.

mekon83 is right.

Did anyone try the code at all? Can you do some benchmarks/tests?

kg4wsv

The arduino family is more varied than you apparently think.  There are subtle differences between the ATmega8, the ATmega168, the ATmega328, and the ATmega1280 (on the arduino MEGA). Besides those architectural differences, there are two clock speed configurations: 8MHz internal oscillator (ala Lillypad) and the 16MHz external crystal.

Changes to the core are not made (and should not be made) without some careful deliberation and testing to maintain reliability and compatibility.  E.g., there has been a reasonably lengthy discussion on the developer's list on the subject of changing the size of the RX buffer.  That's a lot less earth-shattering than the changes you propose.

No one is saying your changes are bad, but I only recall seeing two mentions of UART performance being a problem, and your code hasn't even ran on all platforms yet.  It's not quite ready for the arduino 0016 core.

-j


Kiril

#8
May 19, 2009, 02:06 pm Last Edit: May 19, 2009, 02:23 pm by lz3060 Reason: 1
0016?! Didn't even cross my mind!

What I am proposing is to think about introducing some of the improvements/features, mainly the TX buffer. This alone is too much of a change to even consider it feasible for the upcoming release, but let's start a discussion about it -- maybe we can have it by 0020, no?

What about the other changes? head and tail in the ring_buffer are still int-s? Is anyone using buffers larger than 128 bytes? Even if so, why not 'unsigned int'? mekon83 is the person who pointed out these issues first, and his implementation was much faster with minimal changes in the code that would have been completely unnoticeable to almost all users. Sure, most of them wouldn't have noticed the improved performance either ...

As to portability: my version should run fine at any clock speeds given BAUDs in a reasonable range, and can be made to run on all MCUs. Once again, the latter was intentionally left out.

mekon83

@kg4wsv: Thanks, but I read the libraries and the datasheets. All you need to change are the port addresses (to account for different chips) and the computation of baud rate (to account for different clocks). I know that every change needs testing. It should've been done in Arduino 0001. Obviously, it wasn't.

gabebear

As far as async writing:

I don't know about adding an outgoing buffer inside the HardwareSerial class, I hate giving up that memory and also giving up the time it takes to copy the data into the buffer. I think the memory should be allocated by the user.

I purpose a few new methods:
  • startAsync(uint8_t *ptr, uint8_t length) - write a length of data. If a transmition is in progress it blocks until the old data is sent.
  • waitAsync(uint8_t *ptr) - wait for any data to finish being sent
  • killAsync(uint8_t *ptr) - stop sending the data

startAsync() will send the data asynchronously, and is matched with a waitAsync() or killAsync(). The memory must not be modified and must stay in scope between these calls. This won't fit all use-cases, but should fit many. It would be extremely light weight.




This is almost a non-issue if you set the baud rate really high. The normal write() function shouldn't block until 3 bytes are waiting to be sent(1 in transmit buffer, 1 in shift buffer, 1 in software). Both the FTDI adapter and 16Mhz Arduino support 2Mbaud. Each byte takes 10 bits to transmit ( startbit + 8 databits + stopbit ), so for a 16Mhz Arduino:
9600 baud = 1666.6 cpu cycles / byte
2000000 baud = 80 cpu cycles / byte
8000000 baud = 20 cpu cycles / byte

8Mbaud is only supported in synchronous mode, and most serial ports aren't capable of hitting that speed(the FTDI adapter can't).




The only problem with 2Mbaud is that Windows and Java don't support high baud rates well... which is were we should be focusing effort first.

On Windows you can use the FTDI libraries to get high baud rates on FTDI hardware http://www.ftdichip.com/Documents/ProgramGuides.htm

On OSX, just use IOKit/ioctl http://www.arduino.cc/playground/Interfacing/Cocoa

rglenn

gabebear, I think you're assuming all serial interactions are going to be done via USB. I, for one, want a serial logging link from my robot back to my PC via XBee, and would prefer if the robot wasn't spending all it's time waiting to send back the next logging character.

I think that for such applications, even a 32 byte buffer would make a HUGE difference - as long as my logging messages were kept shorter than that, I could send one (copied to the buffer in, what, 40 cycles? Maybe 70?) and then go about my business.

As for startAsync etc... we lose any kind of nice formatting, type conversion, etc. that we'd get with Print functions. This is a huge cost in terms of ease of use.

gabebear

Hmm, I'm not a big fan of the Print class. Its virtual nature, crazy amount of internal casts, and slow number converter make me want to just ditch it.

That said, it is easy to fix your objection. Make "startAsync" an object hanging off the "Serial" object that descends from Print. Then you could have:

  • Serial.startAsync.print(...) - write some data. If a transmission is in progress it blocks until the old data is sent.
  • Serial.waitAsync(uint8_t *ptr) - wait for any data to finish being sent
  • Serial.killAsync(uint8_t *ptr) - stop sending the data





I was wrong in my above message, the 9600 baud takes 16666.6 cycles, not 1666.6.

Kiril

@rglenn: I agree. Adding the *Async() functions is not a good idea.

What's wrong with making the transmitter asynchronous? Copying data into the buffer is a small price to pay compared to the other benefits.

To me, both RX and TX should be asynchronous, the RX buffer size should be decreased (128 bytes is waaay more than needed in 99% of the cases), TX buffer should be 32 or 64 bytes by default.

If altering the existing HardwareSerial is such a big problem, why don't you guys just make an alternative and add it to the distribution? Given the choice, user's will pick what's best for them. Fact is, the current "official" implementation does not work for me, and many others, I believe.

kg4wsv

Quote
Hmm, I'm not a big fan of the Print class.

I became a big fan of the Print class when I was writing some code for the MAX7456 OSD.  I finally realized I just needed to code a write() function and add a bit of code to keep up with X and Y cursor positions, then I automagically inherited a bunch out output functions for free.

Quote
If altering the existing HardwareSerial is such a big problem, why don't you guys just make an alternative and add it to the distribution?


The problem with this is the fact that the HardwareSerial distributed has stuff preallocated and predefined.  There is no way to use an alternate HWS without hacking the distribution's core.  Even worse, I see no way to change this without breaking a lot of existing code which assumes this initialization has already taken place.

-j

Go Up