Roll your own software serial

Bear with me a while please while I explain. This is quite complex and I see no simple explanation.

I am building a project for which I’d like to use a 1284P. The alternative is a 2560, which would take the application in its stride (I already have that working using a hardware serial port). However, the 1284P's through-hole package and form factor are a good match for the requirement, except for one thing. The 1284P has only 2 serial ports and I need 2 1/2, the 1/2 being a receive-only port at 9600 baud, 8 bits of data, no parity and one stop bit, like the NMEA sentences you get from many GPS units.

No flexibility is required, however the implementation must be reliable. Dropped bits and bytes are unacceptable. Software serial (or one of the alternatives) seemed the obvious answer; however, all of them were incompatible with other parts of the project, particularly the pin change interrupts (PCI) I am using for the user interface. I was getting interrupt vector collisions between the PCI and the software serial library, reported as fatal compiler errors.

I decided I would take a look at a specific purpose implementation on a test 1280 I have to hand for such purposes. If it works on the 1280 I can make it work on a 1284P. I intended to use a pin change interrupt to detect the falling edge of the start bit and then Timer1 to sample the serial signal at appropriate intervals with an ISR.

The code is attached. I apologize in advance for the number of commented out lines and some debug lines, but at least the thinking can be seen.

It is supposed to work like this:

The bit period for 9600 baud is 104.17 uS. I decided that I would take only one sample per bit. The application does not have long serial wires, and the security against noise provided by, say, “best of 3” sampling per bit wasn’t justified, at least not to get the thing working. Since the timer must be controlled by an integer I used a 52uS tick, so two ticks give a 104uS bit period. This is an error to be sure (about -0.16%) but it is not accumulated over more than one byte, since the algorithm resets during the stop bit.
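The arithmetic above can be checked with a few lines of host-side C++. This is only a sketch: the function names are made up for illustration, and the 16 MHz clock and prescaler of 8 used in the compare-value example are assumptions, not values taken from the attached code.

```cpp
#include <cstdint>

// Figures from the post: 9600 baud, a 52 us timer tick, two ticks per bit.
constexpr double kBaud = 9600.0;
constexpr double kTickUs = 52.0;
constexpr int kTicksPerBit = 2;

// Ideal bit period in microseconds (about 104.17 us).
constexpr double idealBitUs() { return 1e6 / kBaud; }

// Bit period actually produced by the timer (104 us).
constexpr double actualBitUs() { return kTickUs * kTicksPerBit; }

// Relative error of the timer-derived bit period, in percent.
constexpr double bitErrorPercent() {
    return (actualBitUs() - idealBitUs()) / idealBitUs() * 100.0;
}

// How far the sample point has drifted after a number of bit periods.
// Negative means the samples are creeping earlier than ideal.
constexpr double driftUs(int bitPeriods) {
    return bitPeriods * (actualBitUs() - idealBitUs());
}

// Compare value for a timer in CTC mode, where the period is (value + 1)
// timer clocks. The clock frequency and prescaler are passed in because
// they are assumptions here.
constexpr uint16_t compareValue(double fCpuHz, unsigned prescaler, double tickUs) {
    return static_cast<uint16_t>(fCpuHz / prescaler * tickUs / 1e6 + 0.5) - 1;
}
```

Over a whole 10-bit frame the drift comes to less than 2uS, which is small against a 104uS bit.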

Serial communication was developed in the era when communication devices were mechanical and it will take some convincing that a non-accumulated error of less than 1% has an impact.

For my test set up I am using A15, bit 7 on PortK of the 1280 as my serial input pin.

The whole thing is a sort of state machine, controlled by the variable receive_sequence with the following states:

const byte awaiting_start = 0;
const byte start_found = 1;
const byte bit0 = 2;
const byte bit1 = 3;
const byte bit2 = 4;
const byte bit3 = 5;
const byte bit4 = 6;
const byte bit5 = 7;
const byte bit6 = 8;
const byte bit7 = 9;
const byte stop_expected = 10;

We start at awaiting_start:
The PCI ISR routine is looking at the input serial signal and when it sees a high-to-low transition which could be a start bit it restarts Timer1 (set at 52uS). It also turns off the PCI by setting the mask register to 0, thus eliminating noise interrupts. The other important thing that happens is that the state is advanced to start_found. At this point the PCI routine has done its job and the next part of the process is under the charge of timer interrupts from Timer1.

The routine read_bits implements the state machine. In the start_found state it samples the serial stream some 30uS after the edge. (Yes, it should be 52uS. There are many posts about stopping and starting Timer1 and the associated timing errors, and I read them assiduously to try to get to the bottom of this, but to no avail at this point.) In any case 30uS is not unacceptable, it being just a bit early in the bit timing. 52uS would be great but 30 will do, as it puts the bit stream sample well towards the middle of the start bit. If a low is found then the serial bit receive process follows. If a high is found then we stop the timer, re-enable the PCI and return to the awaiting_start state to wait for a real start bit.

Then follows the reading of the data bits at 104uS intervals using the bit0 to bit7 states which control flow through the read_bits Timer1 ISR. The byte is built up in byte_received until we get to bit 7 and then we start to look for a stop bit. Data bits are sampled every 104uS and a weighting byte is used to build the received character.

stop_expected: At this point we re-enable the PCI for the next start bit and the serial stream is sampled. If a low is found (not a stop bit) the byte received is discarded and we do what we need to do to return to awaiting_start. If the stop bit looks good the timer is stopped (it has done its job for now), byte_received_status is set to flag that a character has arrived, and the character is copied into final_character. A circular buffer, serial_buffer, with its pointer and character count, is also updated.
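For comparison, here is one common way to arrange a buffer shared between an ISR and loop. It is a sketch, not the attached code: instead of a shared character count it uses a head index written only by the producer and a tail index written only by the consumer, so neither side does a read-modify-write on a variable the other side also writes. The class name ByteRing and its methods are made up for illustration.

```cpp
#include <cstdint>

// Single-producer (ISR), single-consumer (loop) byte ring.
// N must be a power of two; the usable capacity is N - 1 bytes.
template <uint8_t N>
class ByteRing {
    static_assert(N > 1 && (N & (N - 1)) == 0, "N must be a power of two");

    volatile uint8_t head_ = 0;  // next write slot, written only by the producer
    volatile uint8_t tail_ = 0;  // next read slot, written only by the consumer
    volatile uint8_t data_[N];

public:
    // Called from the receive ISR. Returns false (byte dropped) when full.
    bool push(uint8_t b) {
        const uint8_t next = static_cast<uint8_t>((head_ + 1) & (N - 1));
        if (next == tail_) return false;
        data_[head_] = b;
        head_ = next;  // publish only after the data byte is stored
        return true;
    }

    // Called from loop. Returns false when the ring is empty.
    bool pop(uint8_t& out) {
        if (tail_ == head_) return false;
        out = data_[tail_];
        tail_ = static_cast<uint8_t>((tail_ + 1) & (N - 1));
        return true;
    }

    // Number of bytes currently waiting.
    uint8_t size() const {
        return static_cast<uint8_t>((head_ - tail_) & (N - 1));
    }
};
```

With byte-sized indices each read or write is a single access on an 8-bit AVR, which is why this layout is often used without disabling interrupts. Whether a shared count plays any part in the dropped characters here is not something the sketch can tell you.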

And now we wait until another start bit arrives and we do the whole thing again.
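For readers without the attachment, here is a rough host-side model of the sequence described above. It is not the attached sketch: the bit0 to bit7 states are folded into one Data state with an index, the names (SoftRx, onFallingEdge, onHalfBitTick, playFrame) are made up for illustration, and the hardware steps (masking the PCI, starting and stopping Timer1) are left out. It assumes the first tick lands mid start bit and that data is sent least significant bit first, as 8N1 serial does.

```cpp
#include <cstdint>

enum class RxState : uint8_t { AwaitingStart, StartFound, Data, StopExpected };

struct SoftRx {
    RxState state = RxState::AwaitingStart;
    uint8_t bitIndex = 0;   // which data bit comes next (0..7)
    uint8_t tickCount = 0;  // half-bit ticks since the last sample
    uint8_t shift = 0;      // byte being assembled
    uint8_t lastByte = 0;   // last complete byte
    bool byteReady = false;

    // Stands in for the pin change ISR seeing a falling edge.
    void onFallingEdge() {
        if (state != RxState::AwaitingStart) return;
        state = RxState::StartFound;
        tickCount = 0;
    }

    // Stands in for the timer ISR; 'level' is the pin level at this tick.
    void onHalfBitTick(bool level) {
        switch (state) {
        case RxState::AwaitingStart:
            return;
        case RxState::StartFound:
            if (level) {  // line went back high: noise, not a start bit
                state = RxState::AwaitingStart;
            } else {
                state = RxState::Data;
                bitIndex = 0;
                shift = 0;
                tickCount = 0;
            }
            return;
        case RxState::Data:
            if (++tickCount < 2) return;  // skip the tick on the bit boundary
            tickCount = 0;
            if (level) shift |= static_cast<uint8_t>(1u << bitIndex);
            if (++bitIndex == 8) state = RxState::StopExpected;
            return;
        case RxState::StopExpected:
            if (++tickCount < 2) return;
            tickCount = 0;
            if (level) {  // a valid stop bit is high
                lastByte = shift;
                byteReady = true;
            }
            state = RxState::AwaitingStart;  // bad stop bit: byte discarded
            return;
        }
    }
};

// Host-side harness: plays one frame into the model, one level per half-bit
// tick, and reports whether a byte came out.
bool playFrame(SoftRx& rx, uint8_t value, bool stopLevel = true) {
    bool bits[10];
    bits[0] = false;  // start bit
    for (int i = 0; i < 8; ++i) bits[1 + i] = ((value >> i) & 1u) != 0;
    bits[9] = stopLevel;
    rx.byteReady = false;
    rx.onFallingEdge();
    for (int tick = 1; tick <= 19; ++tick) rx.onHalfBitTick(bits[tick / 2]);
    return rx.byteReady;
}
```

Playing the letter A (0x41) through playFrame should leave 0x41 in lastByte, and a frame with a low stop bit should be discarded.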

In the process of debugging I was using my oscilloscope to show the incoming character and a digital write on pin 13 to show where the sample was being made. See attached photo. For each step of the state machine, pin 13 was taken high and immediately low, giving a spike on pin 13 at about the time the sample is taken. For test purposes Serial1 was programmed to send a letter A into A15. The character received is on the top trace of my scope and the timing of the sampling is on the bottom trace. Sorry my X cursors are not in a more sensible place. An ideal spot would have been the 30uS timing sample in the start bit, my bad. Also apologies for a screen photo taken by phone. I do have Rigol’s app but I am no fan of it.

This photo clearly shows that there is a sample in the right spots, more or less. (I found that by reducing the period of Timer1 down to 26uS and fiddling about with the state machine counter I could get this timing pretty well precise, but it made no difference to performance and it meant that there was only 26uS for stuff to happen in read_bits.)

Now this code works with a decent real life character stream. I am using a GPS module as a source of characters and as long as all we do is read the GPS with the software serial and echo the stream to the console there are no dropped bits or errors. I have checked this for thousands of characters, both by eye as it streams past and more rigorously with analysis tools. I can do this because I have the read side of Serial1 eavesdropping on the GPS source. So I have a known good source and the one under test. loop captures samples of the stream which are dumped to the console for analysis.

Beauty, I thought, just feed the stream to an implementation of TinyGPS+ and we are home and hosed with a robust, if inflexible, serial using software. But no, TinyGPS+ reports checksum errors when fed from the software source, and a check of the buffers you see in loop (bytes_ss and bytes_ser1, which record characters) shows that characters are dropped. I am seeing about 1 checksum error per second from TinyGPS+.
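For anyone wanting to check captured sentences independently of TinyGPS+, the NMEA checksum is the XOR of every character between the $ and the *, written as two hex digits after the *. A minimal sketch follows; the function names are made up for illustration and this is not how TinyGPS+ itself is written.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// XOR of all bytes in body[0..len).
uint8_t nmeaChecksum(const char* body, size_t len) {
    uint8_t sum = 0;
    for (size_t i = 0; i < len; ++i) sum ^= static_cast<uint8_t>(body[i]);
    return sum;
}

// True if s looks like "$...*HH" and HH matches the XOR of the characters
// between '$' and '*'. Anything after the two hex digits (CR/LF) is ignored.
bool nmeaSentenceValid(const char* s) {
    if (s == nullptr || s[0] != '$') return false;
    const char* star = std::strchr(s, '*');
    if (star == nullptr) return false;
    const int hi = hexValue(star[1]);
    if (hi < 0) return false;  // also catches a string that ends at '*'
    const int lo = hexValue(star[2]);
    if (lo < 0) return false;
    const uint8_t expected = static_cast<uint8_t>((hi << 4) | lo);
    return nmeaChecksum(s + 1, static_cast<size_t>(star - (s + 1))) == expected;
}
```

A single dropped or corrupted character anywhere in the sentence body will normally change the XOR, which is why the checksum count is a sensitive indicator of lost bytes.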

Now the point of this post. Why is this so? It is a mystery to me that the load in loop has an impact. TinyGPS+ will have a load sure, but I don’t think it turns the interrupt off. Sure the process is completely sensitive to interrupts. Too many other interrupts, slow interrupt routines, or turning the interrupt off altogether will be catastrophic. But I can’t see that.

Why does the feed to GPS fail when the feed to the console is fine?

I wrote this Yet Another Software Serial a few years ago. It may be of interest.

Also there are alternatives to SoftwareSerial called NeoSwSerial and AltSoftSerial - they might meet your need without having to do any programming


Your problem may be that other interrupts are causing significant delay in execution of your sw UART isr.

You might want to have a look at sw_uart.c in evofw3, which has to deal with 38400 baud.

Note that the ISR timing is sensitive to other ISRs that run for too long compared to the bit rate so HW uarts need to be decoupled - see tty.c
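To put a rough number on "too long compared to the bit rate" for the 9600 baud case in this thread, the sketch below works out how late a single sample can be taken and still land inside the intended bit. It assumes one sample per bit, samples stepped at 104uS from the first sample in the start bit, and a sender running at exactly 9600 baud. The function name is made up for illustration.

```cpp
// Ideal bit period at 9600 baud and the 104 us step between samples.
constexpr double kIdealBitUs = 1e6 / 9600.0;
constexpr double kSampleStepUs = 104.0;

// Frame bit 0 is the start bit, 1..8 the data bits, 9 the stop bit.
// firstSampleOffsetUs is how long after the start edge the start bit is sampled.
// Returns how many microseconds late the sample of 'frameBit' can be before it
// reads the following bit instead.
constexpr double latencyMarginUs(int frameBit, double firstSampleOffsetUs) {
    return (frameBit + 1) * kIdealBitUs
         - (firstSampleOffsetUs + frameBit * kSampleStepUs);
}
```

With the first sample at 52uS the margin stays a little over 50uS across the frame, and with the first sample at 30uS it is roughly 75uS. Any other ISR, or any stretch of code with interrupts disabled, that holds off the sampling interrupt for longer than that at the wrong moment would be enough to corrupt a bit.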

except for one thing. The 1284P has only 2 serial ports and I need 2 1/2, the 1/2 being a receive port at 9600 baud, 8 bits of data, no parity and one stop bit.

Just a thought ...

Could you offload what ever you are doing on one of the serial ports to I2C or SPI ?


Thanks for all that. Most helpful.

PeterP - I agree other interrupts could be troublesome if they are slow to execute or too frequent. Interrupt routines are best short and sharp. In this case though the code I posted is all there is. Unless I've missed one the only interrupts I can think of are the ones used by Arduino itself and my timer and PCI routines. However I mustn't let blindness to a potential problem in that space stop me finding it. I have just downloaded evofw3 and will take a look to see what it can teach me. Looks helpful. As you will see below I tried YASS first.

Robin2 - I did think about offloading the serial load to another chip, being either I2C or SPI to serial. In truth though it would be easier to just replace the 1284P with a 2560 rather than mess about with another device, especially a surface mounted one. The additional device I want to drive is an embedded MP3 player which only has an interface for push-buttons and a serial interface. The rest of the project is littered with SPI and I2C devices and one more serial device being a GPS receiver. For its age it is truly amazing how many devices are "old fashioned serial".

I downloaded YASS and had it working decoding NMEA in under half an hour on a Uno. No dropped bits and TinyGPS+ reported zero checksum errors. I had a squiz at the code and figured I could modify the receive side for a 1284P with a bit of jiggery pokery here and there. I wasn't interested in the transmit side and have not modified it. In fact I have commented out all the transmit code.

A good bout of staring at data sheets for the 328 and the 1284 convinced me that the code associated with Timer2 was good to go as it was. A bit more staring at the pins available and how I want to use them convinced me that the best interrupt to use was interrupt 2, which is Arduino pin 2 or PB2 on the 1284P. Interrupts 0 and 1 share pins with RXD1 and TXD1. How else would you assign them?

That meant that I had to change the code around the interrupt and the new pin to work as required. A bit more staring at datasheets and developing a map of the affected registers showed that it was possible with a few subtle changes. I like the scalability of the ATMEGA architecture, it makes such changes easier.

To make a "longer than it should be story" short, I have had the code running on a test 1284P for about an hour now. Apart from my usual fits of stupidity in missing things the process worked pretty well. I changed all the registers all right, but left the isr vector at 1. Nice crash that was. Then I forgot that the new pin was on port B, then forgot it was in a different bit position. It is truly remarkable how little data you receive when you are sampling the wrong pin.

However with a bit of debugging (and staring at the code) with the built in led it worked. I didn't even turn the scope on. As I write this I am receiving NMEA sentences into TinyGPS+ without checksum errors.

So thanks for a nice bit of code and drawing my attention to it. It might well serve my purposes. Next step is to try it out in the full project. That will be a test, there are additional interrupts and loop is pretty busy.

While I see that the original code is specifically aimed at the Uno, it is pretty generic really and is so written. Can I suggest a few additional comments on statements that need attention to change the interrupt pin, input port and bit position? It would help the blind like myself who missed them the first time.

Thanks again for the assistance to the both of you.

While I see that the original code is specifically aimed at the Uno, it is pretty generic really and is so written. Can I suggest a few additional comments on statements that need attention to change the interrupt pin, input port and bit position.

You may add a Reply to the YASS Thread if you wish.


Just so you're aware in case you have to come back to it.

Evofw3 is effectively a serial interface between a host PC (8N1 @115200 baud) and a wireless (868MHz) serial datastream (8N1 @38400 baud).

The Arduino NANO platform that had originally been used has its only HW UART attached to the USB port going to the host PC, with no second UART for the radio stream, hence the need for a SW UART (this is why I read your post in the first place).

The essential difference I think that exists for evofw3 is that there are multiple sources that are not physically connected to the Arduino device and they all have slightly different clock rates. That doesn't sound like a factor for you.

However the current (and previous versions that tried to sample on a clock in a similar way to you) all suffered at the hands of interrupt latency caused by other interrupts associated with handling the host serial interface.

That's why evofw3:tty.c (the host interface) handles interrupts and the rx/tx data the way it does - with minimal ISR activity with interrupts disabled. Most of the odd looking stuff in sw_uart.c is really because of the clock variation it has to deal with.

I think you need to inject some data into one of the HW serial ports on your test system and just run a bit bucket in loop to get rid of it. That'll help you see whether it's just the ISR latency or something else about TinyGPS+ that's causing your problems.
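A bit bucket can be as small as the sketch below. It is written as a template so it compiles on a PC as well as on the Arduino: it only relies on the port having available() and read(), which Arduino's hardware serial objects provide. The names drainPort and FakePort are made up for illustration, and FakePort exists only so the function can be tried off-target.

```cpp
#include <cstddef>

// Read and discard everything currently waiting on a serial-like port.
// Returns the number of bytes thrown away.
template <typename Port>
size_t drainPort(Port& port) {
    size_t discarded = 0;
    while (port.available() > 0) {
        (void)port.read();  // the value is deliberately ignored
        ++discarded;
    }
    return discarded;
}

// Host-side stand-in for a serial port holding some pending bytes.
struct FakePort {
    int pending;
    int available() const { return pending; }
    int read() {
        if (pending <= 0) return -1;  // same "nothing to read" value Arduino uses
        --pending;
        return 'x';
    }
};
```

On the test board the call would be something like drainPort(Serial2) from loop, with the choice of port being an assumption here. The point of the test is that the hardware UART's receive interrupt is then firing alongside the software receiver while loop itself does almost nothing.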

That ISR latency and the complications it causes is why I'm trying to port to an Atmega32u4 which has a HW UART available for the radio interface.

Thanks PeterP for the clarification. It looks like most of those issues do not apply in my case. Use of a hardware UART is undoubtedly a better answer. UARTs have been around for decades now and they do a great job. I will return to my software serial when I have a bit of time. Finding interrupt latency is never fun. It is hard to reproduce and hard to trace to the root cause. I may have to pull out my logic analyser and re-learn yet again how to use it. I am running out of scope channels for dropped bits. I use that tool so infrequently but I fear it may be the only way to track this down. Even worse, if YASS works in the project my "logic analyser phobia" may overcome me and the exact reason for my software serial fail may remain one of life's great mysteries. What really bugs me is that the processes used by Robin2's code and my own are really quite similar. However I am not one to re-invent the wheel. Thanks for your assistance. Much appreciated.

Robin2 that YASS thread is a bottler as we say here. I finally read it all. Even a comment or two from the great Nick Gammon, a countryman of mine as well. I miss Nick on this forum, no bad intent towards any others but I have always found Nick's posts most helpful. Informed, incisive and insightful.

You will find my post re the porting on the YASS thread. It is meant to educate and entertain as well as show how to migrate YASS receive to another processor. I am very keen to persuade users to read datasheets, a task worthy of Sisyphus sometimes. We can but try.

Thanks again for your code and your assistance.

Well, the jury is back. And the short answer is that it is a fail. I thought it would be right thing to close this loop. There might be a bit of information here that someone might find useful.

The project has the following existing interrupts:

  • a 1Hz interrupt from a real time clock. The interrupt code does nothing more than set a byte sized semaphore for action in loop. This is on a pin change interrupt on port A. I would far prefer a conventional interrupt but I am right out of them.
  • a 100Hz interrupt derived from the (50Hz) mains power used to manage a staged shutdown in the event of a power failure. The project is fed from a 12V AC wallwart and the pulses come from an EL814 opto-isolator on the 12VAC. Again this interrupt is minimal. All it does is reset a byte sized counter to the number of milliseconds in a mains half cycle plus a couple. This counter is decremented in the 1mS timer interrupt and if it reaches zero then a mains half cycle has been missed and a byte sized semaphore is set to trigger loop to initiate the managed shut down/power saving processes. This 100Hz is fed to a PCI on port B. Again I would far prefer a conventional interrupt.
  • The HMI buttons (5 of them) feed PCI on port A. This again is pretty minimal. Every time a button bounces it resets a byte sized counter to a "de-bounce time" value (100 currently). The 1mS timer below counts the counter down and when it finally reaches zero the bit pattern on Port A is scanned and turned into button pressed flags for loop to deal with.
  • a 1mS interrupt which takes care of all of the project's time critical actions. I tried setting an output high at the beginning of this loop and low at the end and it is clear from the scope trace that the time spent in that ISR is much shorter than 1ms, however the time varies significantly. The pulses on the scope do jitter about. Not just scope trigger issues.
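The countdown scheme used by the mains and button interrupts in the list above can be sketched as follows. This is a host-side model, not the project code: the names are made up for illustration, and the reload value of 12 (a 10mS half cycle plus a couple) is an assumption. The debounce value of 100 is the one quoted in the list.

```cpp
#include <cstdint>

constexpr uint8_t kMainsReloadMs = 12;  // assumed: 10 ms half cycle plus two
constexpr uint8_t kDebounceMs = 100;    // "de-bounce time" value from the post

struct TickState {
    volatile uint8_t mainsCountdown = 0;
    volatile uint8_t debounceCountdown = 0;
    volatile bool mainsLost = false;     // semaphore for loop
    volatile bool buttonsReady = false;  // semaphore for loop
};

// Stands in for the 100 Hz mains PCI: all it does is reload the counter.
void onMainsEdge(TickState& s) { s.mainsCountdown = kMainsReloadMs; }

// Stands in for the button PCI: every bounce restarts the debounce window.
void onButtonEdge(TickState& s) { s.debounceCountdown = kDebounceMs; }

// Stands in for the 1 ms timer ISR: counts both counters down and raises a
// flag when one of them reaches zero.
void onMillisecondTick(TickState& s) {
    uint8_t m = s.mainsCountdown;
    if (m > 0) {
        --m;
        s.mainsCountdown = m;
        if (m == 0) s.mainsLost = true;  // a half cycle went missing
    }
    uint8_t d = s.debounceCountdown;
    if (d > 0) {
        --d;
        s.debounceCountdown = d;
        if (d == 0) s.buttonsReady = true;  // inputs quiet long enough to scan
    }
}
```

Each ISR stays short, as the post says, so the cost that matters for a software UART is less the length of any one of them than how often several of them line up with a sampling instant.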

These interrupts are critical to the processes of the project. If the 100Hz is absent the project sits there waiting for it to come good. Essentially it sits in low power mode. The 1Hz signal is used to trigger most events. 1Hz is plenty fast enough for this application.

Actually there is a much longer story here which I won't detail and is the reason why this post has been so delayed. Initially I put the 100Hz on Port A as well and was dumbfounded when the HMI buttons didn't work. Much research and reading of datasheets informed me that there was an issue with fast changing inputs on PCI. Hence the move to Port B where it works (or appears to). If it's worth it I might post my test code, but there are posts already about troublesome fast interrupts on PCI. I am undecided. To test adequately requires a source of pulses whose frequency you can alter, so that you can feed one interrupt and see when the slow interrupts start to fail. Not everyone has such facilities.

As noted I am uncomfortable with the usage of PCI for this kind of interrupt, but I have 2 options, PCI or none.

So there is a good bit of interrupt activity, the 1mS being awfully close to the 10 bit (1 byte) timing at 9600 baud, ~1.042 mS. A bit too much activity on the 1mS combined with a coincident interrupt somewhere else might just be enough.

I am feeding the GPS into INT2 which is deliberately stressful. TinyGPS+ reports a checksum failure rate at between 7 and 8 an hour based on nearly 24 hours of running. It isn't affecting the GPS fix, there must be sufficient redundancy. The project is reporting solid GPS on the project's HMI display, but the detail of checksum failures is real.

I am saddened by this, I was absolutely sure it was going to work based on my testing. But it is what it is.

Thanks for all your help. I have already started on re-designing the PCB for a 2560 processor. Plenty of proper interrupts and more real serial ports than I need.


Thanks for all your help. I have already started on re-designing the PCB for a 2560 processor. Plenty of proper interrupts and more real serial ports than I need.

Thanks for the update.

That sounds like a practical solution even if it is time consuming to implement.