What is the most efficient way of using Serial

Hello there!

I'm researching the execution speed (how many clock cycles each one takes) of different programming solutions in the Arduino framework. Today I analyzed the Serial functions and I was quite surprised that Serial.write() is so inefficient.

The piece of code I ran to test the serial buffer write times (Timer 2 is set to a prescaler of 1):

void writeToSerialBuffer()
{

  Serial.print(F("\nSERIAL BUFFER WRITE TIME:"));
  Serial.println(F("\nTest messages start"));
  Serial.flush();
  
  TCNT2 = 0;
  Serial.print("Hello world!\n");
  toSerial[0] = (TCNT2 - read_timer_delay_toArray[0]);

  TCNT2 = 0;
  Serial.println("Hello world!");
  toSerial[1] = (TCNT2 - read_timer_delay_toArray[0]);

  TCNT2 = 0;
  Serial.write(charArr, 14);
  toSerial[2] = (TCNT2 - read_timer_delay_toArray[0]);

  TCNT2 = 0;
  Serial.write(charArr, sizeof(charArr));
  toSerial[3] = (TCNT2 - read_timer_delay_toArray[0]);
  
  Serial.flush();
  Serial.println(F("\nTest messages end"));
  
  Serial.print(F("\n\n Serial.print():\t"));
  Serial.print(toSerial[0]);
  Serial.print(F("\n Serial.println():\t"));
  Serial.print(toSerial[1]);
  Serial.print(F("\n Serial.write() (known message length):\t"));
  Serial.print(toSerial[2]);
  Serial.print(F("\n Serial.write() (calculated message length):\t"));
  Serial.println(toSerial[3]);
  Serial.flush();
}

And the output is:

SERIAL BUFFER WRITE TIME:
Test messages start
Hello world!
Hello world!
Hello world!

Test messages end


 Serial.print():	36
 Serial.println():	88
 Serial.write() (known message length):	133
 Serial.write() (calculated message length):	133

Why do you think it is happening? Shouldn't Serial.write() be more efficient?

What would be the most efficient way to write into the serial buffer? Even register level stuff is ok for me but that kind of inefficiency I cannot tolerate.

So just to say again. I want a super-efficient solution to write into the serial output buffer and then start transmitting the data.

What would be the most efficient way to write into the serial buffer? Even register level stuff is ok for me.

I think you have answered your own question.

This looks like an XY problem to me. Serial is slow at best. Why do you need this "efficiency"?

Did you try changing the order of the tests?

ferihun:
Why do you think it is happening? Shouldn't Serial.write() be more efficient?

Try putting a substantial delay() between each test to allow time for the serial data to be sent, or do a Serial.flush() between tests.

And, for the future please post a complete program.

...R

My experiment on a 16 MHz UNO shows that the execution time of Serial.write(0x41); is about 4.75 us and that of Serial.write(myData, sizeof(myData)); is 102.93 us.

Test Sketch:

char myData[] = "Hello world!\n";
void setup()
{
  Serial.begin(9600);
  TCCR1A = 0;
  TCCR1B = 0;
  TCNT1 = 0;
  TCCR1B = 0x01;  //TC1 ON with prescaler-1
  Serial.write(0x41);
  //Serial.write(myData, sizeof(myData));
  TCCR1B =  0;     //TC1 is OFF
  Serial.println();
  Serial.println(TCNT1);  //read counts
}

void loop()
{
  
}

PerryBebbington:
I think you have answered your own question.

I don't think he has answered anything. It is a flawed test since Serial uses interrupts in the background to actually clock the data out. The very first test will always be fast since there is nothing in the buffer and the call returns quickly, before the data is actually sent out. Future calls may block/delay waiting for room in the buffer.
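To put a number on "before the data is actually sent out", here is a rough sketch of how many CPU cycles pass while one byte is on the wire. It assumes the usual 8N1 framing (10 bits per byte); the helper name is made up for illustration and is not part of the Arduino core.

```cpp
#include <stdint.h>

// Hypothetical helper (not an Arduino API): CPU cycles that elapse while the
// UART shifts out one byte, assuming 8N1 framing, i.e. 1 start bit +
// 8 data bits + 1 stop bit = 10 bits on the wire. Result is rounded down.
constexpr uint32_t cyclesPerByte(uint32_t cpuHz, uint32_t baud)
{
  return static_cast<uint32_t>((static_cast<uint64_t>(cpuHz) * 10u) / baud);
}

// 16 MHz clock at 115200 baud: about 1388 cycles per byte.
static_assert(cyclesPerByte(16000000UL, 115200UL) == 1388, "115200 baud");
// 16 MHz clock at 9600 baud: about 16666 cycles per byte.
static_assert(cyclesPerByte(16000000UL, 9600UL) == 16666, "9600 baud");
```

So at 115200 baud a 13 or 14 byte message needs very roughly 18000 to 19500 cycles to leave the chip. A call that returns after a few hundred cycles has only queued the data; as long as the buffer has room, the rest happens in the interrupt handler afterwards.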

aarg:
This looks like an XY problem to me. Serial is slow at best. Why do you need this "efficiency"?

I need to pack the data efficiently into the buffer. Then the Serial hardware will do its thing afterwards. I need an efficient way because the goal of my research is to find the most efficient code that accomplishes the said function.

I have tried putting a Serial.flush() between the different test cases, and this is how the code looks now (irrelevant variables and functions are left out because they're another few hundred lines; I double-checked that all the relevant stuff is here):

uint8_t read_timer_delay = 0;
uint8_t read_timer_delay_toArray[2];

uint8_t charArr[] = "Hello world!\n";
uint8_t toSerial[4];

void test_pinMode();
void test_digital_outputs();
void measure_timer_read_time();
void test_digital_inputs();
void writeToSerialBuffer();

void setup()
{
  TIMSK0 &= ~_BV(TOIE0); // disable timer0 overflow interrupt (millis() function)
  
  //set timer 2 prescaler to 1
  TCCR2B = TCCR2B & 0b11111001; //clear CS22 and CS21
  TCCR2B = TCCR2B | 0b00000001; //set CS20: timer2 runs at clk/1 (no prescaling)

  DDRB &= ~(1<<(INPUT_SOURCE_PIN)); //Set PORTB4 as input

  Serial.begin(115200); //open serial port at desired speed
  
  measure_timer_read_time();

  test_digital_outputs();
  Serial.print(F("\n\n"));

  test_pinMode();
  Serial.print(F("\n\n"));

  test_digital_inputs();
  Serial.print(F("\n\n"));

  writeToSerialBuffer();
  
}

void loop() {
  // put your main code here, to run repeatedly:
}



void measure_timer_read_time()
{
  TCNT2 = 0; //Setting timer 2 value to 0
  read_timer_delay = TCNT2; //Read out timer 2 value

  TCNT2 = 0;
  read_timer_delay_toArray[0] = TCNT2;

  //writing the data through Serial port
  Serial.print(F("Time to read timer register: "));
  Serial.println(read_timer_delay);
  Serial.flush(); //wait until the serial buffer is empty (to prevent overflow)

  Serial.print(F("Time to read timer register into array: "));
  Serial.println(read_timer_delay_toArray[0]);
  Serial.flush(); //wait until the serial buffer is empty (to prevent overflow)
}

void writeToSerialBuffer()
{

  Serial.print(F("\nSERIAL BUFFER WRITE TIME:"));
  Serial.println(F("\nTest messages start"));
  Serial.flush();
  
  TCNT2 = 0;
  Serial.print("Hello world!\n");
  toSerial[0] = (TCNT2 - read_timer_delay_toArray[0]);
  Serial.flush();

  TCNT2 = 0;
  Serial.println("Hello world!");
  toSerial[1] = (TCNT2 - read_timer_delay_toArray[0]);
  Serial.flush();

  TCNT2 = 0;
  Serial.write(charArr, 14);
  toSerial[2] = (TCNT2 - read_timer_delay_toArray[0]);
  Serial.flush();

  TCNT2 = 0;
  Serial.write(charArr, sizeof(charArr));
  toSerial[3] = (TCNT2 - read_timer_delay_toArray[0]);
  Serial.flush();
  
  Serial.println(F("\nTest messages end"));
  
  Serial.print(F("\n\n Serial.print():\t"));
  Serial.print(toSerial[0]);
  Serial.print(F("\n Serial.println():\t"));
  Serial.print(toSerial[1]);
  Serial.print(F("\n Serial.write() (known message length):\t"));
  Serial.print(toSerial[2]);
  Serial.print(F("\n Serial.write() (calculated message length):\t"));
  Serial.println(toSerial[3]);
  Serial.flush();
}

Also the output looks very interesting as "println" now takes many more cycles to complete:

Time to read timer register: 1
Time to read timer register into array: 1




SERIAL BUFFER WRITE TIME:
Test messages start
Hello world!
Hello world!
Hello world!
Hello world!

Test messages end


 Serial.print():	36
 Serial.println():	231
 Serial.write() (known message length):	115
 Serial.write() (calculated message length):	115

Edit: I copied the output in 2 pieces and I left out 1 "Hello world!". I corrected the error.

Altering the order of test didn't show any difference in the results.

Do you think there is a way to write to the serial buffer at a register level? I'd be interested in it.
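One caveat about these readings: TCNT2 is an 8-bit counter, so with a prescaler of 1 it can only tell apart durations of up to 255 cycles. Anything longer shows up modulo 256. Below is a minimal sketch of that effect, assuming the counter starts at 0, simply counts up once per CPU cycle and wraps; the helper names are invented for illustration, and this is plain C++ that also runs on a PC.

```cpp
#include <stdint.h>

// Hypothetical helper: what an 8-bit counter (such as TCNT2) would show after
// `elapsedCycles` ticks, assuming it starts at 0, counts up once per CPU
// cycle and wraps from 255 back to 0.
constexpr uint8_t reading8bit(uint32_t elapsedCycles)
{
  return static_cast<uint8_t>(elapsedCycles & 0xFFu);
}

// Hypothetical helper: the same elapsed time seen by a 16-bit counter
// (such as TCNT1), which wraps only after 65535.
constexpr uint16_t reading16bit(uint32_t elapsedCycles)
{
  return static_cast<uint16_t>(elapsedCycles & 0xFFFFu);
}

// 231 cycles and 231 + 256 = 487 cycles look identical to the 8-bit counter.
static_assert(reading8bit(231) == reading8bit(487), "8-bit readings alias");
// The 16-bit counter still tells them apart.
static_assert(reading16bit(487) == 487, "16-bit reading is unambiguous here");
```

Under that assumption a reading of 231 could mean 231, 487, 743, ... cycles. Measuring with a 16-bit timer such as Timer 1 would push the ambiguity out to 65536 cycles.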

Do you think there is a way to write to the serial buffer at a register level? I

You have the source.

ferihun:
Do you think there is a way to write to the serial buffer at a register level? I'd be interested in it.

On the Arduino platform, the Serial.write() method goes through the Serial FIFO buffer. At the register level, there is the UDR0 = dataByte assignment, which (I think) puts the data directly into the TX register.
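A minimal sketch of what such a register-level write could look like on an ATmega328P, assuming the UART has already been configured (for example by Serial.begin()). The function names are invented for illustration. The #else branch only provides stand-ins so the same code compiles on a PC; those are not real registers.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>   // real UCSR0A, UDR0 and UDRE0 on an ATmega328P
#else
// Stand-ins so the sketch also compiles on a PC. These are NOT real
// registers: the fake status register always reports "data register empty".
static const uint8_t UDRE0 = 5;
static volatile uint8_t UCSR0A = (1 << UDRE0);
static volatile uint8_t UDR0 = 0;
#endif

// Hypothetical helper: polled transmit of one byte, bypassing the Serial
// buffer entirely.
static inline void uartPutByte(uint8_t b)
{
  while (!(UCSR0A & (1 << UDRE0))) {
    // busy-wait until the transmit data register can take another byte
  }
  UDR0 = b;
}

// Hypothetical helper: send a block of bytes one after another.
static inline void uartPutBytes(const uint8_t *data, uint8_t len)
{
  for (uint8_t i = 0; i < len; ++i) {
    uartPutByte(data[i]);
  }
}
```

Note the trade-off: there is no buffering here, so once the data register is occupied each further byte blocks until the previous one has moved on. It is probably also unwise to mix this with Serial.print() while Serial still has queued data, since both would be writing to the same register.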

ferihun:
I need to pack the data efficiently into the buffer. Then the Serial hardware will do its thing afterwards. I need an efficient way because the goal of my research is to find the most efficient code that accomplishes the said function.

Do you expect to write code more efficient than Serial.write()????

I'm not sure I understand what the point of your "research" is.

Power_Broker:
Do you expect to write code more efficient than Serial.write()????

I'm not sure I understand what the point of your "research" is.

Just a little piece of information:
"digitalWrite(13, HIGH)" takes 67 clock cycles to complete.
"PORTB |= 1<<(portPinNumber)" takes 2 and accomplishes the same thing.

One thing I've learned is that the Arduino commands are easy to use but as a consequence of this they are very inefficient.

Buy faster hardware. Arduino type stuff is really inexpensive. You could waste weeks trying to write data to the serial port with less overhead. Spend $22 on a Teensy or something even more powerful.

ferihun:
Just a little piece of information:
"digitalWrite(13, HIGH)" takes 67 clock cycles to complete.
"PORTB |= 1<<(portPinNumber)" takes 2 and accomplishes the same thing.

In my 16 MHz UNO,

digitalWrite(13, HIGH); takes 41 clock cycles to complete.

PORTB |= 1<<(PORTB5); takes 4 clock cycles to complete.

What is your setup that gives results different from a 16 MHz UNO/NANO (Old Bootloader)?

ferihun:
Just a little piece of information:
"digitalWrite(13, HIGH)" takes 67 clock cycles to complete.
"PORTB |= 1<<(portPinNumber)" takes 2 and accomplishes the same thing.

With respect, what has that got to do with Serial.write()? The reason digitalWrite() is slow is that it needs to figure out at runtime which I/O pin to use. Serial.write() does not have to do anything like that.

There is digitalWriteFast library that requires the I/O pins to be known at compile time and it is not that much slower than PORTB but it is a lot more convenient.

...R
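To illustrate the runtime lookup, here is a simplified model of the pin-to-port translation that digitalWrite() has to do on every call. It assumes the standard Uno/Nano pin layout. The names are invented for this sketch; the real core uses lookup tables in flash plus extra checks, which are left out here.

```cpp
#include <stdint.h>

// Hypothetical model of the Uno/Nano (ATmega328P) layout:
//   pins 0-7           -> PORTD bits 0-7
//   pins 8-13          -> PORTB bits 0-5
//   pins 14-19 (A0-A5) -> PORTC bits 0-5
enum Port : uint8_t { PORT_B, PORT_C, PORT_D, PORT_NONE };

// Which port a given Arduino pin number lives on.
constexpr Port pinToPort(uint8_t pin)
{
  return pin < 8 ? PORT_D : pin < 14 ? PORT_B : pin < 20 ? PORT_C : PORT_NONE;
}

// Bit mask of that pin within its port (0 for an unknown pin).
constexpr uint8_t pinToBitMask(uint8_t pin)
{
  return pin < 8  ? static_cast<uint8_t>(1u << pin)
       : pin < 14 ? static_cast<uint8_t>(1u << (pin - 8))
       : pin < 20 ? static_cast<uint8_t>(1u << (pin - 14))
       : 0;
}

// Pin 13 is bit 5 of PORTB, i.e. the 0b00100000 mask used in the thread.
static_assert(pinToPort(13) == PORT_B, "pin 13 is on PORTB");
static_assert(pinToBitMask(13) == 0x20, "pin 13 is bit 5");
```

With a literal pin number the compiler can fold all of this into a constant, which is roughly what a compile-time library relies on. When the pin is only known at runtime, the comparisons and the shift have to be executed on every call.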

ferihun:
Just a little piece of information:
"digitalWrite(13, HIGH)" takes 67 clock cycles to complete.
"PORTB |= 1<<(portPinNumber)" takes 2 and accomplishes the same thing.

And if portPinNumber is a variable . . ?

wildbill:
Buy faster hardware. Arduino type stuff is really inexpensive. You could waste weeks trying to write data to the serial port with less overhead. Spend $22 on a Teensy or something even more powerful.

I could buy faster hardware. But the general attitude to "simply buy faster hardware" is generally harmful to the environment and is out of the scope of my research. Optimizing things is time consuming but sometimes it's worth it. I could also spend weeks to dive into the depths of what the Serial.write() command exactly does.

TheMemberFormerlyKnownAsAWOL:
And if portPinNumber is a variable . . ?

Still 2 clock cycles. I tested it because I expected a different result but it's the same.

GolamMostafa:
In my 16 MHz UNO,

digitalWrite(13, HIGH); takes 41 clock cycles to complete.

PORTB |= 1<<(PORTB5); takes 4 clock cycles to complete.

What is your setup that gives results different from a 16 MHz UNO?

I tested it on an Arduino nano with the old bootloader. digitalWrite returns 67/65 (HIGH/LOW), all the port manipulations return a value of 2 in both HIGH and LOW cases.
Here are the pieces of code where I test them:

//timer 2 is set up and runs with a prescaler of 1

#define PORT_PIN_NUMBER 5

uint8_t portPinNumber = 5;

TCNT2 = 0;
digitalWrite(13, HIGH); //test instruction
outputDW[0] = (TCNT2 - read_timer_delay_toArray[0]); //error correction writing result into variable

TCNT2 = 0;
PORTB = PORTB | 0b00100000; //test instruction
outputPM[0] = (TCNT2 - read_timer_delay_toArray[0]); //error correction writing result into variable

TCNT2 = 0;
PORTB |= 1<<(PORT_PIN_NUMBER); //test instruction
outputPM[2] = (TCNT2 - read_timer_delay_toArray[0]);

TCNT2 = 0;
PORTB |= 1<<(portPinNumber); //test instruction
outputPM[4] = (TCNT2 - read_timer_delay_toArray[0]);

Robin2:
With respect, what has that got to do with Serial.write()? The reason digitalWrite() is slow is that it needs to figure out at runtime which I/O pin to use. Serial.write() does not have to do anything like that.

There is digitalWriteFast library that requires the I/O pins to be known at compile time and it is not that much slower than PORTB but it is a lot more convenient.

...R

Yes, it figures that out at runtime, but with Serial.write() it's different, because it essentially does the same thing as Serial.print(): it puts some data into the transmit buffer. But Serial.print() uses a string as input which in theory is harder to handle than an array of bytes with a known size which you just have to copy into the buffer and tell the TX hardware to start transmitting. Since I used the same amount of data as test material in both cases, I've come to the conclusion that there is some weird optimization problem with the Serial.write() command, as by its function it should take less time to complete the operation.

. I tested it because I expected a different result but it's the same.

Test it again.
With more rigour.

ferihun:
But Serial.print() uses a string as input which in theory is harder to handle than an array of bytes with a known size which you just have to copy into the buffer and tell the TX hardware to start transmitting.

Not true. print() is an overloaded function, and the handler (and its efficiency) depends on the data type that is passed to it.
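A small model of how the overloads differ for the calls used in the test. This is a simplified stand-in that only counts bytes, not the real Print class; the struct name is made up. It follows the behaviour of the AVR core as I understand it: the C-string overload measures the text with strlen(), and println() appends "\r\n".

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Simplified stand-in for the Arduino Print class (a model, not the real
// implementation): it only counts the bytes each call would queue.
struct CountingPrint {
  size_t total = 0;

  // Single byte: the primitive everything else ends up calling.
  size_t write(uint8_t) { ++total; return 1; }

  // Raw block: sends exactly `size` bytes, including any '\0' inside it.
  size_t write(const uint8_t *buf, size_t size)
  {
    size_t n = 0;
    while (size--) n += write(*buf++);
    return n;
  }

  // C string: the length is found with strlen(), so the terminator is not sent.
  size_t print(const char *s)
  {
    return write(reinterpret_cast<const uint8_t *>(s), strlen(s));
  }

  // println() sends the text and then "\r\n".
  size_t println(const char *s)
  {
    return print(s) + print("\r\n");
  }
};
```

By this count the test cases do not queue the same amount of data: print("Hello world!\n") queues 13 bytes, while println("Hello world!") and write(charArr, 14) each queue 14, the latter including the terminating '\0' of the array.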

ferihun:
I could buy faster hardware. But the general attitude to "simply buy faster hardware" is generally harmful to the environment and is out of the scope of my research. Optimizing things is time consuming but sometimes it's worth it.

What is this "research"? I still don't see the usefulness of what you're trying to accomplish here. Is this "research" in support of an application where you need ultra-lightning-fast serial communication? If not, then what you're trying to do is a purely academic parlor trick and wasting our time trying to answer your questions.

ferihun:
I could also spend weeks to dive into the depths of what the Serial.write() command exactly does.

It takes weeks to understand this?

size_t HardwareSerial::write(uint8_t c)
{
  _written = true;
  // If the buffer and the data register is empty, just write the byte
  // to the data register and be done. This shortcut helps
  // significantly improve the effective datarate at high (>
  // 500kbit/s) bitrates, where interrupt overhead becomes a slowdown.
  if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(*_ucsra, UDRE0)) {
    // If TXC is cleared before writing UDR and the previous byte
    // completes before writing to UDR, TXC will be set but a byte
    // is still being transmitted causing flush() to return too soon.
    // So writing UDR must happen first.
    // Writing UDR and clearing TC must be done atomically, otherwise
    // interrupts might delay the TXC clear so the byte written to UDR
    // is transmitted (setting TXC) before clearing TXC. Then TXC will
    // be cleared when no bytes are left, causing flush() to hang
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
      *_udr = c;
#ifdef MPCM0
      *_ucsra = ((*_ucsra) & ((1 << U2X0) | (1 << MPCM0))) | (1 << TXC0);
#else
      *_ucsra = ((*_ucsra) & ((1 << U2X0) | (1 << TXC0)));
#endif
    }
    return 1;
  }
  tx_buffer_index_t i = (_tx_buffer_head + 1) % SERIAL_TX_BUFFER_SIZE;
	
  // If the output buffer is full, there's nothing for it other than to 
  // wait for the interrupt handler to empty it a bit
  while (i == _tx_buffer_tail) {
    if (bit_is_clear(SREG, SREG_I)) {
      // Interrupts are disabled, so we'll have to poll the data
      // register empty flag ourselves. If it is set, pretend an
      // interrupt has happened and call the handler to free up
      // space for us.
      if(bit_is_set(*_ucsra, UDRE0))
        _tx_udr_empty_irq();
    } else {
      // nop, the interrupt handler will free up space for us
    }
  }

  _tx_buffer[_tx_buffer_head] = c;

  // make atomic to prevent execution of ISR between setting the
  // head pointer and setting the interrupt flag resulting in buffer
  // retransmission
  ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
    _tx_buffer_head = i;
    sbi(*_ucsrb, UDRIE0);
  }
  
  return 1;
}

(Overloads excepted)
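The head/tail arithmetic in that function can be tried out on a PC with a stripped-down model. This is only a sketch: the struct and its member names are invented, the buffer size of 64 is an assumption about SERIAL_TX_BUFFER_SIZE on an Uno-class board, and the ISR, the direct-to-UDR shortcut and the blocking wait are all left out.

```cpp
#include <stdint.h>

// Simplified host-side model of the transmit ring buffer indexing used in
// HardwareSerial::write(). Illustrative only.
struct TxRing {
  static const uint8_t SIZE = 64;   // assumed SERIAL_TX_BUFFER_SIZE
  uint8_t data[SIZE];
  uint8_t head = 0;                 // next slot to fill (producer side)
  uint8_t tail = 0;                 // next slot to send (consumer side)

  // Returns false where the real write() would spin waiting for the ISR.
  bool push(uint8_t c)
  {
    uint8_t next = (head + 1) % SIZE;
    if (next == tail) return false; // full: one slot always stays unused
    data[head] = c;
    head = next;
    return true;
  }

  // Stand-in for the "data register empty" ISR taking one byte out.
  bool pop(uint8_t &c)
  {
    if (head == tail) return false; // empty
    c = data[tail];
    tail = (tail + 1) % SIZE;
    return true;
  }
};
```

With a 64-byte buffer, 63 bytes can be queued before push() reports full. That is the point at which the real write() would start to block until the interrupt handler has freed a slot.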