A lightweight serial output implementation (15 assembler instructions)

I've written a serial UART based on AVR305, which is a lot smaller than the Serial class or even TinyDebugSerial (~400 bytes smaller than TinyDebugSerial).

Like TinyDebugSerial, all interrupts are blocked during output. I may write another implementation using timer-clocked USI that will allow interrupts to run.

You probably should mention the trade-off / risk with the reduced code version.

it is a fixed baud rate?
can it be used in parallel with Serial Class?
?
?
?
(more questions will pop up?)

robtillaart:
it is a fixed baud rate?

Yes. 115200.

can it be used in parallel with Serial Class?

Yes. It's a bit-bang serial like SoftwareSerial.

(~400 bytes smaller than TinyDebugSerial).

That seems a bit unlikely, since the comparable parts of TinyDebugSerial are only about 130 bytes...
(TinyDebugSerial works with the Print class, which is an additional significant chunk of code. But comparing serOut() to Print+TinyDebugSerial seems a bit unfair.)

Although... Why IS TinyDebugSerial+Print 400 bytes? I thought the compiler was better at excluding unused methods from a sketch?
TinyDebugSerial+Print ends up including at least TWO "write" methods from Print when compiled for tiny85, but only includes ONE such method when compiled for Uno.

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  Serial.write('T');
}

void loop() {
}

# Compiled for tiny85
avr-nm -n *elf|grep Print
00000264 T _ZN5Print5writeEPKc
00000294 T _ZN5Print5writeEPKhj

# Compiled for Uno
avr-nm -n *elf|grep Print
00000524 T _ZN5Print5writeEPKhj

can it be used in parallel with Serial Class?

Yes. It's a bit-bang serial like SoftwareSerial.

I don't think that that's what he meant. Your code doesn't "inherit" from Stream, so it doesn't work with Print. Instead, it's a simple C-style replacement for Serial.write(char) and Serial.write(char *)

You COULD shoe-horn your smaller bitbang code (I like the double-complementing of the byte being output; very sneaky!) into TinyDebugSerial, and the savings would be relatively significant (probably about 100 bytes). It would be an educational experience!

TinyDebugSerial+Print ends up including at least TWO "write" methods from Print when compiled for tiny85, but only includes ONE such method when compiled for Uno.

ah. Tiny has a significantly different (older) version of Print, which has virtual methods for both

    virtual void write(const char *str);
    virtual void write(const uint8_t *buffer, size_t size);

While 1.5.2 on Uno has

    size_t write(const char *str) {
      if (str == NULL) return 0;
      return write((const uint8_t *)str, strlen(str));
    }
    virtual size_t write(const uint8_t *buffer, size_t size);

I think.

Thanks for sharing ralphd
I tried it on an ATtiny13 @9.6 MHz.
It used under 300 bytes.

I amended this:

#elif F_CPU == 9600000L
  #warning Using 9.6Mhz CPU timing 
  #define TXDELAY 25

You probably should mention the trade-off / risk with the reduced code version.

Would you please explain what drawbacks the code has

Erni:

#elif F_CPU == 9600000L
  #warning Using 9.6Mhz CPU timing
  #define TXDELAY 25

Ideally I'd use a macro, but my macro writing isn't as good as my C and assembly.
However looking at Bill's optiboot code (line 743), it should be relatively easy:
#define UART_B_VALUE (((F_CPU/BAUD_RATE)-20)/6)

I'll test out something like that and update the code. I also want to make it use flash-based strings (type f_str) to save a few more bytes of code and overhead of copying the string to RAM.



You probably should mention the trade-off / risk with the reduced code version.

Would you please explain what drawbacks the code has

As others have pointed out, it doesn't implement the Stream interface like TinyDebugSerial does.

westfw:
You COULD shoe-horn your smaller bitbang code (I like the double-complementing of the byte being output; very sneaky!) into TinyDebugSerial, and the savings would be relatively significant (probably about 100 bytes). It would be an educational experience!

I can't take credit for the double-complementing trick; that comes from AVR305. I was able to knock a few instructions off the AVR305 implementation like removing the redundant sec after the com instruction (which is also in optiboot's soft uart).
Here's the avr305 sample code for reference:
http://read.pudn.com/downloads76/sourcecode/embed/287010/AVRembeded_sourecode/avr305.asm__.htm
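
The double-complement trick can be modelled on a host in C. This is only a sketch based on a reading of the AVR305 listing, not ralphd's code; the function name `frame_levels` is made up for illustration. The byte is inverted once, the carry is set to produce the start bit, and the pin is driven low whenever the carry is set. Because the right shift feeds zeros in from the top, the carry is clear once the data bits are gone, which yields the stop bit with no extra code.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 10 line levels of one 8N1 frame.
   Bit i of the result is the pin level during bit time i (0 = first). */
uint16_t frame_levels(uint8_t byte)
{
    uint8_t tx = (uint8_t)~byte;  /* "com": invert the data byte */
    uint8_t carry = 1;            /* "sec": carry set -> start bit */
    uint16_t levels = 0;

    for (int i = 0; i < 10; i++) {
        /* carry set -> drive the pin low, carry clear -> drive it high */
        if (!carry)
            levels |= (uint16_t)(1u << i);
        carry = tx & 1u;          /* "lsr": LSB falls into carry ... */
        tx >>= 1;                 /* ... and a zero shifts in at the top */
    }
    return levels;
}
```

For 'T' (0x54) this returns 0x2A8: a low start bit, the eight data bits LSB first, and a high stop bit.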

My version has more jitter than the AVR305 code; up to 2 cycles of jitter per bit vs up to 1 cycle for the AVR305 code.

westfw:

(~400 bytes smaller than TinyDebugSerial).

That seems a bit unlikely, since the comparable parts of TinyDebugSerial are only about 130 bytes...
(TinyDebugSerial works with the Print class, which is an additional significant chunk of code. But comparing serOut() to Print+TinyDebugSerial seems a bit unfair.)

Agreed; the blame for the code size certainly doesn't all fall on TinyDebugSerial. To be fair, I'm really impressed with the TinyDebugSerial code, in particular the way templates are used.

ralphd:

Erni:

#elif F_CPU == 9600000L

#warning Using 9.6Mhz CPU timing
 #define TXDELAY 25

Ideally I'd use a macro, but my macro writing isn't as good as my C and assembly.
However looking at Bill's optiboot code (line 743), it should be relatively easy:
#define UART_B_VALUE (((F_CPU/BAUD_RATE)-20)/6)

I'll test out something like that and update the code. I also want to make it use flash-based strings (type f_str) to save a few more bytes of code and overhead of copying the string to RAM.

It seems macros in assembler files aren't so easy. I tried this:

#define BAUD_RATE 115200L
#ifdef F_CPU
  #define TXDELAY (((F_CPU/BAUD_RATE)-8)/3)
#else
  #error CPU frequency F_CPU undefined
#endif

But get this error:

\Arduino\libraries\BasicSerial\BasicSerial.S:44: Error: missing ')'

My guess is the assembler does macro substitution but not macro evaluation. Not sure how to make it work...
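
Whatever the cause of the assembler error, the arithmetic itself can be checked on a host. A minimal C sketch, assuming the same integer formula as the macro above; `txdelay` is a name made up here, not something from BasicSerial:

```c
#include <stdint.h>

/* Hypothetical helper mirroring the macro from the post:
   TXDELAY = ((F_CPU / BAUD_RATE) - 8) / 3, with integer division. */
uint32_t txdelay(uint32_t f_cpu, uint32_t baud)
{
    return ((f_cpu / baud) - 8u) / 3u;
}
```

At 115200 baud this gives 25 for 9.6 MHz (the value Erni picked by hand), 20 for 8 MHz and 43 for 16 MHz.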

westfw:
ah. Tiny has a significantly different (older) version of Print, which has virtual methods for both

Yeah. Sorry about that. I got tired of maintaining synchronization with Print (shiny things are distracting and TinyDebugKnockBang is far more shiny).

Erni:

You probably should mention the trade-off / risk with the reduced code version.

Would you please explain what drawbacks the code has

Trade-off: The TinyDebugSerial baud rate can be changed at run-time. That's part of the difference in code size.

Risk: This is from memory, so the details may not be correct... The timing does not quite work out. The typical processor speeds (1 MHz, 8 MHz, 16 MHz) are usually not evenly divisible by the typical baud rates (115200, 38400, 9600), so a bit time is rarely a whole number of clock cycles. At a processor speed of 8 MHz and a baud rate of 115200, a simple bit-bang (like the one in AVR305) has an accumulated half-bit-time error in the final bit. If the receiver is lower quality or the processor's oscillator is too far out of tune, this can result in (very) unreliable communications. Certain bit patterns make a problem more likely to occur. That is why Atmel chose a baud rate of 38400 instead of something more common like 9600 or 115200. The bit time is almost perfect when the processor is running at 1 MHz. TinyDebugSerial compensates for the problem in two ways: a small delay is added after the first five bits are sent, and the stop bit is extended (I think it's 1.5 bit times). The extreme case is 1 MHz + 115200 baud. To keep the timing as accurate as possible I unrolled the loop. In other words, TinyDebugSerial should be close to the correct bit times at the expense of code size.

My guess is the assembler does macro substitution but not macro evaluation.

The assembler does not like "L" in numbers, so #define BAUD_RATE 115200L won't work in math expressions.

westfw:

My guess is the assembler does macro substitution but not macro evaluation.

The assembler does not like "L" in numbers, so #define BAUD_RATE 115200L won't work in math expressions.

Changing it to 115200 without the L still fails; this time I'm guessing the problem is F_CPU (defined elsewhere as 16000000L).
Any other ideas?

Thanks for the explanation. Now the TinyDebugSerial code makes more sense. I could tell it was based on AVR305, but had more logic in it. I tried sending improvement suggestions a week ago to the arduino.tiny@gmail address but got no reply. I don't know if that's because you don't use that address or aren't maintaining it any more. Here's what I sent:

TinyDebugSerial is an impressive piece of code, especially the template tricks to minimize the generated code size. I was able to
knock a few bytes off the assembly code in BangOneByte as follows:
instead of:
"rjmp L%=ntop" "\n\t"
"L%=btop: "
"nop" "\n\t" // ---> 7
"nop" "\n\t" //
"nop" "\n\t" //
"nop" "\n\t" //
"nop" "\n\t" //
"nop" "\n\t" //
"nop" "\n\t" //

use:
"rjmp L%=ntop" "\n\t"
"L%=delay4cycle: "
"ret" "\n\t" //
"L%=btop: "
"rcall L%=delay4cycle" "\n\t" // ---> 7

and instead of
"brcc L%=bnoe" "\n\t" //
"nop" "\n\t" //
"nop" "\n\t" //
"L%=bnoe: "
use:
"brcc L%=bnoe" "\n\t" //
"rjmp L%=bnoe" "\n\t" // 2
"L%=bnoe: "

My version is off by 14 cycles after 9 bits, or 1/5th of a bit time. Changing the 8 MHz delay to 20 from 21 is actually better by one cycle (13 cycles total). That's still better timing than a USART; 2.1% error vs 3.7% for the USART.
http://www.wormfood.net/avrbaudcalc.php?postbitrate=9600&postclock=8&u2xmode=1
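
The 14 and 13 cycle figures can be reproduced with a few lines of host-side C. This is a sketch under an assumed timing model, not a measurement: each bit is taken to cost 3 * TXDELAY + 8 CPU cycles, which is inferred from the shape of the TXDELAY formula in this thread. The function name is made up for illustration.

```c
#include <stdint.h>

/* Assumed timing model (inferred, not measured):
   each bit takes 3 * txdelay + 8 CPU cycles. */
int32_t accumulated_error_cycles(uint32_t f_cpu, uint32_t baud,
                                 uint32_t txdelay, uint32_t bits)
{
    uint32_t actual = bits * (3u * txdelay + 8u);
    /* ideal position of the last edge, rounded to the nearest cycle */
    uint32_t ideal = (bits * f_cpu + baud / 2u) / baud;
    return (int32_t)actual - (int32_t)ideal;
}
```

At 8 MHz and 115200 baud, nine ideal bit times are exactly 625 cycles. Under this model a delay of 21 ends 14 cycles late and a delay of 20 ends 13 cycles early.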

I'm currently testing USI clocked off of Counter0 compare match, which should allow me to match the bit times to the nearest cycle. It'll be more code since I have to reverse the bit order before sending, but it won't block interrupts.
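
The reversal is needed because the USI shifts its data register out MSB first, while a UART frame carries the data bits LSB first. A minimal C sketch of one common way to reverse a byte (swap nibbles, then bit pairs, then single bits); `reverse_bits` is a hypothetical name, and on an AVR a lookup table or a shift loop in assembly might be smaller:

```c
#include <stdint.h>

/* Reverse the bit order of one byte in three swap steps. */
uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)((b >> 4) | (b << 4));                   /* swap nibbles   */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2)); /* swap bit pairs */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1)); /* swap neighbours */
    return b;
}
```

For example, 0x54 (01010100) becomes 0x2A (00101010).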

ralphd:
Thanks for the explanation.

You are welcome. Now that I found my Excel workbook I can remember a few more details. The timing is not adjusted after five bits but after each bit. The timing should never be wrong by more than 0.5 processor clock ticks (when the clock is >= 8 MHz).
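
One way to keep every edge within half a clock tick is to round each cumulative edge position to the nearest cycle and let the individual bit lengths vary by one cycle. The C sketch below illustrates that general idea only; it is not taken from TinyDebugSerial, and both function names are made up.

```c
#include <stdint.h>

/* Cycle count (from the start of the frame) at which bit k should end,
   rounded to the nearest CPU cycle. */
uint32_t edge_cycle(uint32_t f_cpu, uint32_t baud, uint32_t k)
{
    return (k * f_cpu + baud / 2u) / baud;
}

/* Length of bit k (1-based) in CPU cycles, chosen so that every edge
   lands on its rounded ideal position. */
uint32_t bit_length(uint32_t f_cpu, uint32_t baud, uint32_t k)
{
    return edge_cycle(f_cpu, baud, k) - edge_cycle(f_cpu, baud, k - 1u);
}
```

At 8 MHz and 115200 baud the first four bits come out as 69, 70, 69 and 70 cycles, and the ninth edge lands on cycle 625.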

I tried sending improvement suggestions a week ago to the arduino.tiny@gmail address but got no reply.

I got the suggestion. Thank you. I sincerely apologize for not responding. When I have time I will merge your changes.

And thank you for the kind words.

ralphd:
My version is off by 14 cycles after 9 bits, or 1/5th of a bit time. Changing the 8 MHz delay to 20 from 21 is actually better by one cycle (13 cycles total).

Assuming I did the math correctly 19 loops gets the error to 2.08% / 0.1872 bit time. (I get -2.24% / -0.2016 bit time at 20 loops).

That's still better timing than a USART; 2.1% error vs 3.7% for the USART.

It is. Which raises an interesting question. Is there a need for a smaller code sized TinyDebugSerial?

I'm currently testing USI clocked off of Counter0 compare match, which should allow me to match the bit times to the nearest cycle. It'll be more code since I have to reverse the bit order before sending, but it won't block interrupts.

But is it worth the effort? (Or are you primarily doing it for your own edification?)