UNO R4 USB Serial.print() is always blocking

While tracking down some missing cycles, I found that on the R4 the Serial.print, println, etc. functions block until all the data is flushed. Here is a simple example that gives drastically different results on the R3 vs. the R4. The low serial speed is only there to exaggerate the issue for the example.

unsigned long total_print_time=0;
unsigned long total_flush_time=0;

void setup() {
  // put your setup code here, to run once:
  Serial.begin(4800);
  Serial.println("BEGIN");
  Serial.flush();
}

void loop() {
  // put your main code here, to run repeatedly:
  {
    unsigned long ts = micros();
    Serial.print(total_print_time >> 10);
    Serial.print('\t');
    Serial.println(total_flush_time >> 10);
    total_print_time += (long)(micros()-ts);
  }
  {
    unsigned long ts = micros();
    Serial.flush();
    total_flush_time += (long)(micros()-ts);
  }
  if(Serial.available() > 0)
  {
    Serial.read();
    total_print_time=0;
    total_flush_time=0;
    Serial.println("READY");
    while( Serial.available() < 1) ;
    Serial.read();
  }
}

On the R3 most of the time spent waiting for the data is in the flush (~58:1), which makes sense:

99 5828
100 5846
100 5866
100 5887

On the R4 it is reversed (~74:1), pointing to the R4 USB Serial.print functions always blocking for the data:

7390 99
7390 100
7390 100
7391 100
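For anyone checking the ratios: both columns are accumulated micros() totals shifted right by 10, so they are in units of 1024 µs (roughly milliseconds). A small desktop sketch of that arithmetic, using the last line of each run above; the helper names are just made up for illustration:

```cpp
#include <cstdint>

// The test sketch prints accumulated microseconds shifted right by 10,
// i.e. in units of 1024 us (roughly milliseconds).
constexpr uint32_t toPrintedUnits(uint32_t microsTotal) {
    return microsTotal >> 10;
}

// Ratio of the larger column to the smaller one (hypothetical helper).
constexpr double ratio(uint32_t larger, uint32_t smaller) {
    return static_cast<double>(larger) / static_cast<double>(smaller);
}

// R3, last line: flush 5887 vs print 100  -> ratio(5887, 100)
// R4, last line: print 7391 vs flush 100  -> ratio(7391, 100)
```

The R3 figure comes out a little under 59 and the R4 figure a little under 74, which is where the rough ~58:1 and ~74:1 come from.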

If someone has a fix to make the R4 nonblocking I would love to see it, but I also just wanted to put this up here as it was difficult to find any definitive statements about the R4 Minima.

This may be related, but it mostly talks about the onboard UART Serial1, not the USB Serial.

https://forum.arduino.cc/t/re-how-about-dma-serial/1346539

Well, not all. The R3 and R4 implementations are completely different. The underlying pure virtual Print::write(uint8_t) is how each byte is written. So in the IDE, you can Go to Definition on either function name

  Serial.write(static_cast<uint8_t>(9));
  Serial.flush();

With R4 WiFi, it takes me to cores/arduino/Serial.cpp, but on Minima it only goes to Serial.h for some reason. Anyway, they're

and

The difference between empty and complete is at the top of the file

So effectively, the print will block until the buffer is empty and the last byte is on its way; and then flush waits for that to be done.

There doesn't seem to be any way to change the behavior.

Unless it has changed in the last year or so, the hardware Serial code on the UNO R4 was not good: none of the SerialX writes were buffered, at least at the Arduino level. That is, a write waited until it could put the last byte into the hardware buffer (it has been a while, but I think it is two bytes long) and then returned.

When I last looked at the code, it allocated a TX buffer as part of the Serial objects but did not use it. There was also a bug: if you called another Serial output function quickly enough that the previous byte had not yet moved from the output register to the shift register, the new write would overwrite the last byte of the previous write. Hopefully that one has been fixed.
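To illustrate what buffering the writes would mean, here is a minimal ring buffer sketch in plain C++. It is not the ArduinoCore-renesas code and not what either pull request does; the class name and layout are hypothetical. The idea is only that write() would queue bytes and return at once, while the transmit interrupt drains them later.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size transmit ring buffer (illustration only).
// One slot is kept empty, so the usable capacity is N - 1 bytes.
// A real driver shared with an interrupt would also need volatile or
// atomic indices; that is left out here to keep the sketch short.
template <std::size_t N>
class TxRing {
public:
    // Queue one byte without waiting. Returns false when the buffer is
    // full, which is the only point where a driver would have to block.
    bool push(uint8_t b) {
        const std::size_t next = (head_ + 1) % N;
        if (next == tail_) return false;  // full
        buf_[head_] = b;
        head_ = next;
        return true;
    }

    // Take the oldest byte. A driver would call this from the
    // "transmit register empty" interrupt.
    bool pop(uint8_t& out) {
        if (tail_ == head_) return false;  // nothing queued
        out = buf_[tail_];
        tail_ = (tail_ + 1) % N;
        return true;
    }

    std::size_t size() const { return (head_ + N - tail_) % N; }
    bool empty() const { return head_ == tail_; }

private:
    uint8_t buf_[N] = {};
    std::size_t head_ = 0;  // next slot to write
    std::size_t tail_ = 0;  // next slot to read
};
```

With something like this behind write(), a print only has to wait once the buffer fills up, and flush() becomes the place where you wait for empty() plus the last byte leaving the hardware.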

At one point I had a Pull Request to change this code to buffer it:
Serial: Tx use FIFO plus fixes by KurtE · Pull Request #90 · arduino/ArduinoCore-renesas
I closed it out after about a year with no comments or reviews, and I moved on to something else (Giga). Since then there was another PR:

Modify UART Class to Make Use of the txBuffer by delta-G · Pull Request #304 · arduino/ArduinoCore-renesas

Which looks like it has not been touched in nearly two years now.

Note: About USB, it depends on which UNO you are using. The Minima uses USB code; the WiFi uses a hardware UART to talk (blocking) to the ESP32, which does the USB transfer.

For the standard R4 Minima, Serial.cpp is actually the code for the UART serial ports, normally SerialN where N is a number. The base Serial without a number is set to be SerialUSB in Arduino.h.

It uses CDC in cores/arduino/usb/SerialUSB.cpp, and its write calls flush if any data is put in the queue.

particularly this call to tud_cdc_write_flush whenever there are sent bytes.

A workaround for some situations:

  1. Replace the calls to flush, tud_cdc_write_flush(), in _SerialUSB::write with calls to tud_task()
  2. Put some Serial.flush() calls in your code when you know you have some time for processing or if you need the full print flushed to the port. Waiting for a serial read of user input may be a great time to flush the Tx.
  3. You can also sprinkle tud_task() around, maybe before all the calls in SerialUSB that check connected(), or put it in a low-priority timer interrupt, which is how I think it is intended to be used.

This got me to about parity (1:1) time between the prints and the flushes in the original example.

Or you could go one step further and just use the tud_ functions directly
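A rough sketch of the "write only what fits" idea behind using the tud_ functions directly. The helper and the FakePort stand-in are hypothetical so it can be tried on a desktop. On the Minima, my assumption (untested) is that an adapter would forward writeAvailable() to tud_cdc_write_available() and write() to tud_cdc_write(), and something would still have to call tud_cdc_write_flush() and tud_task() later.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the core: hand the port only as many
// bytes as it has room for right now and return that count, so the
// caller never waits. Any remainder is the caller's job to retry later.
template <typename Port>
std::size_t writeWhatFits(Port& port, const uint8_t* data, std::size_t len) {
    const std::size_t room = port.writeAvailable();
    const std::size_t n = std::min(room, len);
    if (n > 0) {
        port.write(data, n);
    }
    return n;
}

// Desktop stand-in for the USB CDC transmit FIFO (assumption: the real
// adapter would wrap tud_cdc_write_available() and tud_cdc_write()).
struct FakePort {
    explicit FakePort(std::size_t cap) : capacity(cap), used(0) {}

    std::size_t writeAvailable() const { return capacity - used; }
    void write(const uint8_t*, std::size_t n) { used += n; }

    std::size_t capacity;  // total FIFO size in bytes
    std::size_t used;      // bytes currently queued
};
```

The return value tells you how much was actually queued, so the sketch can keep a cursor into its own message and push the rest on the next pass through loop() instead of stalling.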

Or you could go even further and try to rewrite _SerialUSB to use the tud_ functions and an interrupt, closer to how the R3 does it. Care has to be taken with tud_task, though, as it has no timeout on how long it can run.