I've written a serial UART based on AVR305 that is a lot smaller than the Serial class, or even TinyDebugSerial (~400 bytes smaller than TinyDebugSerial).
Like TinyDebugSerial, all interrupts are blocked during output. I may write another implementation using timer-clocked USI that will allow interrupts to run.
That seems a bit unlikely, since the comparable parts of TinyDebugSerial are only about 130 bytes...
(TinyDebugSerial works with the Print class, which is an additional significant chunk of code. But comparing serOut() to Print+TinyDebugSerial seems a bit unfair.)
Although... Why IS TinyDebugSerial+Print 400 bytes? I thought the compiler was better at excluding unused methods from a sketch.
TinyDebugSerial+Print ends up including at least TWO "write" methods from Print when compiled for tiny85, but only includes ONE such method when compiled for Uno.
void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  Serial.write('T');
}

void loop() {
  // empty; an Arduino sketch needs loop() to link
}
# Compiled for tiny85
avr-nm -n *elf|grep Print
00000264 T _ZN5Print5writeEPKc
00000294 T _ZN5Print5writeEPKhj
# Compiled for Uno
avr-nm -n *elf|grep Print
00000524 T _ZN5Print5writeEPKhj
I don't think that that's what he meant. Your code doesn't "inherit" from Stream, so it doesn't work with Print. Instead, it's a simple C-style replacement for Serial.write(char) and Serial.write(char *).
You COULD shoe-horn your smaller bitbang code (I like the double-complementing of the byte being output; very sneaky!) into TinyDebugSerial, and the savings would be relatively significant (probably about 100 bytes). It would be an educational experience!
> TinyDebugSerial+Print ends up including at least TWO "write" methods from Print when compiled for tiny85, but only includes ONE such method when compiled for Uno.
ah. Tiny has a significantly different (older) version of Print, which has virtual methods for both
#warning Using 9.6Mhz CPU timing
#define TXDELAY 25
Ideally I'd use a macro, but my macro writing isn't as good as my C and assembly.
However looking at Bill's optiboot code (line 743), it should be relatively easy:
#define UART_B_VALUE (((F_CPU/BAUD_RATE)-20)/6)
I'll test out something like that and update the code. I also want to make it use flash-based strings (type f_str) to save a few more bytes of code and overhead of copying the string to RAM.
> You probably should mention the trade-off / risk with the reduced code version.
Would you please explain what drawbacks the code has?
As others have pointed out, it doesn't implement the Stream interface like TinyDebugSerial does.
westfw:
You COULD shoe-horn your smaller bitbang code (I like the double-complementing of the byte being output; very sneaky!) into TinyDebugSerial, and the savings would be relatively significant (probably about 100 bytes). It would be an educational experience!
I can't take credit for the double-complementing trick; that comes from AVR305. I was able to knock a few instructions off the AVR305 implementation like removing the redundant sec after the com instruction (which is also in optiboot's soft uart).
Here's the avr305 sample code for reference: http://read.pudn.com/downloads76/sourcecode/embed/287010/AVRembeded_sourecode/avr305.asm__.htm
My version has more jitter than the AVR305 code; up to 2 cycles of jitter per bit vs up to 1 cycle for the AVR305 code.
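For readers who have not seen the double-complementing trick, here is a host-side sketch in C of how I understand an AVR305-style transmit loop to frame a byte. It is a simulation, not the AVR assembly: `frame_bits` and `FRAME_BITS` are hypothetical names, and the carry flag is modeled with an ordinary variable. The byte is complemented once up front, and each bit is complemented a second time when the carry picks the pin level. Because the shift feeds in zeros, the stop bit comes out high with no special case. On AVR the `com` instruction itself sets the carry flag, which is why a `sec` right after it is redundant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_BITS 10 /* 1 start + 8 data + 1 stop (hypothetical constant) */

/* Fill out[] with the line level (0 = low, 1 = high) for each bit slot. */
size_t frame_bits(uint8_t byte, uint8_t out[FRAME_BITS])
{
    assert(out != NULL);

    uint8_t inv = (uint8_t)~byte; /* first complement, like COM */
    unsigned carry = 1;           /* COM leaves carry set: this becomes the start bit */

    for (size_t i = 0; i < FRAME_BITS; i++) {
        out[i] = carry ? 0u : 1u; /* second complement: carry set means drive low */
        carry = inv & 1u;         /* like LSR: the low bit moves into carry */
        inv >>= 1;                /* zeros shift in, so later slots read as high */
    }
    return FRAME_BITS;
}
```

Calling `frame_bits('T', out)` fills `out` with a low start bit, the eight data bits of 0x54 in LSB-first order, and a high stop bit.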
> That seems a bit unlikely, since the comparable parts of TinyDebugSerial are only about 130 bytes...
> (TinyDebugSerial works with the Print class, which is an additional significant chunk of code. But comparing serOut() to Print+TinyDebugSerial seems a bit unfair.)
Agreed; the blame for the code size certainly doesn't all fall on TinyDebugSerial. To be fair, I'm really impressed with the TinyDebugSerial code, in particular the way templates are used.
> #warning Using 9.6Mhz CPU timing
> #define TXDELAY 25
> Ideally I'd use a macro, but my macro writing isn't as good as my C and assembly.
> However looking at Bill's optiboot code (line 743), it should be relatively easy: #define UART_B_VALUE (((F_CPU/BAUD_RATE)-20)/6)
> I'll test out something like that and update the code. I also want to make it use flash-based strings (type f_str) to save a few more bytes of code and overhead of copying the string to RAM.
It seems macros in assembler files aren't so easy. I tried this:
#define BAUD_RATE 115200L
#ifdef F_CPU
#define TXDELAY (((F_CPU/BAUD_RATE)-8)/3)
#else
#error CPU frequency F_CPU undefined
#endif
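As a sanity check on the arithmetic in that macro, here is a small host-side C sketch that evaluates the same expression as a function. `txdelay_for` is a hypothetical helper, not part of the posted code, and it says nothing about how the assembler's preprocessor treats the macro; it only shows what the integer division yields.

```c
#include <assert.h>

/* Same arithmetic as the TXDELAY macro: ((F_CPU / BAUD_RATE) - 8) / 3,
 * with C integer division truncating at each step. */
long txdelay_for(long f_cpu, long baud)
{
    assert(f_cpu > 0 && baud > 0);
    return ((f_cpu / baud) - 8) / 3;
}
```

With a baud rate of 115200, `txdelay_for(9600000L, 115200L)` gives 25, the same value as the hard-coded 9.6 MHz `TXDELAY`, and `txdelay_for(8000000L, 115200L)` gives 20.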
> You probably should mention the trade-off / risk with the reduced code version.
> Would you please explain what drawbacks the code has?
Trade-off: The TinyDebugSerial baud rate can be changed at run-time. That's part of the difference in code size.
Risk: This is from memory, so the details may not be correct... The timing does not quite work out. The typical processor speeds (1 MHz, 8 MHz, 16 MHz) are usually not evenly divisible by the typical baud rates (115200, 38400, 9600), so a bit time is rarely a whole number of clock cycles. At a processor speed of 8 MHz and a baud rate of 115200, a simple bit-bang (like the one in AVR305) has an accumulated half-bit-time error in the final bit. If the receiver is lower quality or the processor's oscillator is too far out of tune, this can result in (very) unreliable communications. Certain bit patterns make a problem more likely to occur. That is why Atmel chose a baud rate of 38400 instead of something more common like 9600 or 115200: the bit time is almost perfect when the processor is running at 1 MHz.
TinyDebugSerial compensates for the problem in two ways: a small delay is added after the first five bits are sent, and the stop bit is extended (I think it's 1.5 bit times). The extreme case is 1 MHz + 115200 baud. To keep the timing as accurate as possible I unrolled the loop. In other words, TinyDebugSerial should be close to the correct bit times at the expense of code size.
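To put rough numbers on the divisibility point, here is a host-side C sketch that computes the nearest whole number of clock cycles per bit and the resulting bit-time error. It assumes a best case in which the transmit loop can hit any whole cycle count; a real delay loop usually moves in coarser steps, so actual errors can be larger. Both function names are hypothetical.

```c
#include <assert.h>

/* Nearest whole number of CPU cycles in one bit time. */
long nearest_cycles_per_bit(long f_cpu, long baud)
{
    assert(f_cpu > 0 && baud > 0);
    return (f_cpu + baud / 2) / baud; /* integer round-to-nearest */
}

/* Bit-time error in percent when each bit lasts the nearest whole cycle count.
 * Positive means the bits are too long, negative means too short. */
double bit_time_error_pct(long f_cpu, long baud)
{
    double ideal = (double)f_cpu / (double)baud;
    double actual = (double)nearest_cycles_per_bit(f_cpu, baud);
    return (actual / ideal - 1.0) * 100.0;
}
```

At 1 MHz and 38400 baud this gives 26 cycles and about -0.16%, which fits the "almost perfect" remark. At 8 MHz and 115200 it gives 69 cycles and about -0.64%, and at 1 MHz and 115200 it gives 9 cycles and about +3.68%.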
Thanks for the explanation. Now the TinyDebugSerial code makes more sense. I could tell it was based on AVR305, but had more logic in it. I tried sending improvement suggestions a week ago to the arduino.tiny@gmail address but got no reply. I don't know if that's because you don't use that address or aren't maintaining it any more. Here's what I sent:
TinyDebugSerial is an impressive piece of code, especially the template tricks to minimize the generated code size. I was able to knock a few bytes off the assembly code in BangOneByte as follows:
instead of:
"rjmp L%=ntop" "\n\t"
"L%=btop: "
"nop" "\n\t" // ---> 7
"nop" "\n\t" //
"nop" "\n\t" //
"nop" "\n\t" //
"nop" "\n\t" //
"nop" "\n\t" //
"nop" "\n\t" //
I'm currently testing USI clocked off of Counter0 compare match, which should allow me to match the bit times to the nearest cycle. It'll be more code since I have to reverse the bit order before sending, but it won't block interrupts.
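The bit-order reversal is needed because the USI shift register sends the most significant bit first, while a UART frame carries the least significant bit first. A minimal sketch of one common way to reverse a byte in C follows; `reverse_bits` is a hypothetical name, and the code being tested in the thread may well do this differently (for example in assembly or with a lookup table).

```c
#include <stdint.h>

/* Reverse the bit order of a byte by swapping nibbles, then bit pairs,
 * then adjacent bits. */
uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}
```

For example, `reverse_bits(0x54)` returns 0x2A, and `reverse_bits(0x01)` returns 0x80.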
You are welcome. Now that I found my Excel workbook I can remember a few more details. The timing is not adjusted after five bits but after each bit. The timing should never be wrong by more than 0.5 processor clock ticks (when the clock is >= 8 MHz).
> I tried sending improvement suggestions a week ago to the arduino.tiny@gmail address but got no reply.
I got the suggestion. Thank you. I sincerely apologize for not responding. When I have time I will merge your changes.
ralphd:
My version is off by 14 cycles after 9 bits, or 1/5th of a bit-time. Changing the 8 MHz delay from 21 to 20 is actually better by one cycle (13 cycles total).
Assuming I did the math correctly, 19 loops gets the error to 2.08% / 0.1872 bit time. (I get -2.24% / -0.2016 bit time at 20 loops).
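The cycle counts above can be checked with a short host-side C sketch. It rests on an assumption that is not stated outright in the thread: that one bit takes 3 * TXDELAY + 8 cycles, which is what the `(((F_CPU/BAUD_RATE)-8)/3)` macro posted earlier implies. The function names are hypothetical.

```c
#include <assert.h>

/* ASSUMPTION: one bit takes 3 * TXDELAY + 8 CPU cycles, inferred from the
 * TXDELAY macro in the thread, not from the actual assembly. */
long cycles_per_bit(long txdelay)
{
    assert(txdelay >= 0);
    return 3 * txdelay + 8;
}

/* Accumulated timing error in CPU cycles after nbits bits.
 * Positive means the transmitter is running slow. */
double accumulated_error_cycles(long f_cpu, long baud, long txdelay, int nbits)
{
    assert(f_cpu > 0 && baud > 0 && nbits > 0);
    double ideal = (double)f_cpu / (double)baud;
    return (double)nbits * ((double)cycles_per_bit(txdelay) - ideal);
}
```

Under that assumption, at 8 MHz and 115200 baud a delay of 21 accumulates about +14 cycles over 9 bits and a delay of 20 about -13 cycles, which agrees with the figures in the quoted post. The per-bit errors work out to roughly 2.24% and 2.08% in magnitude, the same magnitudes as in the reply, although the loop counts and signs there may follow a different convention.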
That's still better timing than a USART; 2.1% error vs 3.7% for the USART.
It is. Which raises an interesting question. Is there a need for a TinyDebugSerial with a smaller code size?
> I'm currently testing USI clocked off of Counter0 compare match, which should allow me to match the bit times to the nearest cycle. It'll be more code since I have to reverse the bit order before sending, but it won't block interrupts.
But is it worth the effort? (Or are you primarily doing it for your own edification?)