Max command processing using SerialPort

I have a Mega 2560 R3 of the brand Arduino and GWS. I'm using SerialPort to communicate through USB (CDC) with an application on my computer. As a test, I wrote the below sketch with the Arduino IDE and a console based application in C# (using Visual Studio 2019).

The PC application sends the command "0123456789\n", and the Arduino responds with the acknowledgement "A\n". The PC application reports how many command/ACK pairs are processed per second.

This is the Arduino Sketch:

void setup()
{
  Serial.begin(115200);  
}


void loop()
{
  String sCommandRx = Serial.readStringUntil('\n');
  if (sCommandRx == "0123456789")
  {
    Serial.print("A\n"); // send acknowledge
    sCommandRx = ""; // although not necessary, this is just to be sure it's empty
  }
}

This is the C# code:

using System;
using System.Diagnostics;
using System.IO.Ports;


namespace BasicComTest
{
    class Program
    {
        public static SerialPort _serialPort;


        static void Main(string[] args)
        {
            _serialPort = new SerialPort();
            _serialPort.BaudRate = 115200;
            _serialPort.PortName = "COM4";
            _serialPort.Handshake = Handshake.None;
            _serialPort.NewLine = "\n";


            _serialPort.ReadTimeout = 2000;
            _serialPort.WriteTimeout = 2000;


            _serialPort.Open();


            _serialPort.WriteLine("");


            Stopwatch sw = new Stopwatch();
            sw.Start();


            ulong count = 0;


            while (!Console.KeyAvailable)
            {
                _serialPort.WriteLine($"0123456789");
                string ack = _serialPort.ReadLine();


                count++;
                if (count == 1000)
                {
                    sw.Stop();
                    Debug.WriteLine($"1000 commands in {sw.ElapsedMilliseconds} ms - {1000000/sw.ElapsedMilliseconds} commands/sec");
                    count = 0;
                    sw.Restart();
                }
            }


            _serialPort.Close();
        }
    }
}

What is strange is that for baud rates of 38400, 57600, 115200, 500000 and so on, the result sticks at 244 commands/second. Why is that? If we estimate that each character takes 10 bits (8 data bits plus start and stop), then 115200 baud means 11520 characters/second. My send and receive actions happen in sequence and total 11 + 2 = 13 characters, so I would expect something close to 11520/13 = 886 commands/second. 244 is way off, and I can't believe there is that much overhead.
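The estimate above can be written as a small helper. This is only a sketch of the arithmetic, assuming 8-N-1 framing (10 bits per character) and strictly sequential command/ACK exchanges; the name max_cycles_per_second is made up for illustration and appears nowhere in the sketch or the C# program.

```cpp
#include <cstdint>

// Bits on the wire per character for 8-N-1 framing: 1 start + 8 data + 1 stop.
constexpr std::uint32_t kBitsPerChar = 10;

// Upper bound on command/ACK round trips per second when the command and the
// acknowledge travel strictly one after the other. The result is truncated
// to a whole number of round trips.
constexpr std::uint32_t max_cycles_per_second(std::uint32_t baud,
                                              std::uint32_t chars_per_cycle) {
    return baud / (kBitsPerChar * chars_per_cycle);
}
```

With 13 characters per round trip this gives 886 at 115200 baud and 295 at 38400 baud, which is the pure wire-time limit and ignores any processing or USB latency.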

Does this have to do with USB CDC behavior? I have read somewhere that the polling interval is 1 msec. But even that limit should give me more than 244 commands/second. If one polling interval is used to send my command and the next one to get the ACK, I would still have 500 commands/second.
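The polling argument can be sketched the same way. This is a hypothetical model, not a description of how the USB stack actually schedules transfers: it simply assumes every command/ACK exchange occupies a whole number of fixed-length polling intervals. The function name and the default interval of 1000 microseconds are assumptions for illustration.

```cpp
#include <cstdint>

// Round trips per second if each command/ACK exchange takes a whole number of
// fixed polling intervals (interval length given in microseconds).
constexpr std::uint32_t cycles_per_second_for_intervals(std::uint32_t intervals_per_cycle,
                                                        std::uint32_t interval_us = 1000) {
    return 1000000u / (intervals_per_cycle * interval_us);
}
```

Under this model two intervals per exchange give 500 commands/second and four intervals give 250, so a measured 244 would correspond to roughly four intervals per exchange rather than two.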

Please don't comment on the fact that I used "readStringUntil()". I also used the code below to read the string:

while (Serial.available()) {
//if (Serial.available() > 0) { // I also used this alternative, but all with same results
  char inChar = (char)Serial.read();
  if (inChar == '\n')
  {
    sCommandRx = sCommand; // sCommandRx is used by the main loop (we cut off '\n')
    sCommand = ""; // arm for next command
    bCommandRx = true; // signal reception of a new command (this is then used in main loop)
  }
  else
  {
    sCommand += inChar;
  }
}

I even used the above in an interrupt based manner using SerialEvent.

But every approach gives me exactly the same magical result of 244 commands/second.

2 questions:

  • Does anybody have an idea why this speed limit occurs?
  • Does anybody have an idea how to solve it?

Drop the String type and replace it with plain C char arrays. Dynamic memory management cannot work reliably on the smaller Arduinos with little RAM.
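As a rough illustration of that advice, here is a minimal sketch of a fixed-size line assembler that never touches the heap. The class name LineBuffer and its methods are invented for this example, and dropping bytes that do not fit is just one possible overflow policy.

```cpp
#include <stddef.h>
#include <string.h>

// Fixed-capacity line assembler: the buffer lives inside the object, so there
// is no dynamic allocation and no heap fragmentation.
template <size_t Capacity>
class LineBuffer {
public:
    // Feed one received byte. Returns true when a complete line (ended by
    // '\n') is available through line(); the terminator is not stored.
    bool push(char c) {
        if (complete_) {                // previous line was consumed, start over
            length_ = 0;
            complete_ = false;
        }
        if (c == '\n') {
            data_[length_] = '\0';
            complete_ = true;
            return true;
        }
        if (length_ < Capacity - 1) {   // bytes that do not fit are dropped
            data_[length_++] = c;
        }
        return false;
    }

    // NUL-terminated text of the most recently completed line.
    const char* line() const { return data_; }

    // True only if a complete line is pending and it matches expected.
    bool equals(const char* expected) const {
        return complete_ && strcmp(data_, expected) == 0;
    }

private:
    char data_[Capacity] = {};
    size_t length_ = 0;
    bool complete_ = false;
};
```

In a sketch one would presumably declare something like LineBuffer<16> and call push((char)Serial.read()) while Serial.available() > 0, then compare with equals("0123456789") before sending the acknowledge.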

hansbilliet:
I even used the above in an interrupt based manner using SerialEvent.

On an Arduino SerialEvent is not interrupt based. It just checks if Serial.available() > 0 at the end of every iteration of loop().

I'm not familiar with C# so I can't comment on your PC program. I use Python with Linux.

Have a look at the examples in Serial Input Basics - simple reliable non-blocking ways to receive data.

I have not tested what you have done but I vaguely recall testing throughput a few years ago and found it to reflect the baud rate fairly closely.

On the other hand, with 244 commands per second the Arduino only has 4 millisecs to do anything useful with each command so a faster throughput may not be of much practical value.

...R

Try bypassing the Arduino code and use the serial interrupt directly instead. This will set up 500 kbps 8-N-1 and reply as fast as it can:

volatile uint8_t rcvd = 0;


ISR(USART0_RX_vect)
{
	// received data
	rcvd = UDR0;
	
	// reply with "A\n" after receiving "\n"
	if(rcvd == '\n')
	{
		UDR0 = 'A';
		while((UCSR0A & 1<<TXC0)==0);
		UCSR0A |= 1<<TXC0;
		
		UDR0 = '\n';
		while((UCSR0A & 1<<TXC0)==0);
		UCSR0A |= 1<<TXC0;
	}
}

void setup()
{
	// set baud rate (500 Kbps)
	UBRR0H = 0;
	UBRR0L = 3;

	// USART initialization with 16MHz system clock, 8-none-1, RX interrupt
	UCSR0A = 1<<U2X0 | 0<<MPCM0;
	UCSR0B = 1<<RXCIE0 | 0<<TXCIE0 | 0<<UDRIE0 | 1<<RXEN0 | 1<<TXEN0 | 0<<UCSZ02;
	UCSR0C = 0<<UMSEL01 | 0<<UMSEL00 | 0<<UPM01 | 0<<UPM00 | 0<<USBS0 | 1<<UCSZ01 | 1<<UCSZ00 | 0<<UCPOL0;
}

void loop(){
}

hzrnbgy:
Try bypassing the Arduino code and use the serial interrupt directly instead. This will set up 500 kbps 8-N-1 and reply as fast as it can

Out of curiosity, do you know why that is faster than the Arduino Serial code?

...R

I don't. That's why I'm asking if he can try it and see what the result is. What I do know is that the code above is bare-bones enough to do what he's testing and to show whether the bloatware that is the Arduino code has any significant effect.

hzrnbgy:
That's why I'm asking if he can try it and see what the result is. What I do know is that the code above is bare-bones enough to do what he's testing and to show whether the bloatware that is the Arduino code has any significant effect.

It seems to me a little unreasonable to expect a newbie to be a guinea pig for the sort of trial you have proposed.

There is very little "bloatware" in an Arduino serial interrupt - have a look at the source code. And an Arduino can work so much faster than serial data can arrive (even at a high baud rate) that I doubt that the OP's problem is the fault of the Arduino.

...R

Don’t worry, that guinea pig can handle it quite well :) I’m not that much of a newbie, and I have experience with microcontrollers.

I did the test with the code you proposed, and I get exactly double the speed! I am now at 488 Tx/Rx sets per second, where it used to be 244. I even tried the whole range from 38400 baud up to 2000000 baud (changing UBRR according to the datasheet), and I see the same effect: it clips at around 57600 baud (although 57600 itself gave errors, not sure why).
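For reference, the UBRR values can be derived rather than looked up. The sketch below uses the asynchronous-mode relation from the AVR datasheet, baud = f_cpu / (divisor * (UBRR + 1)) with a divisor of 16 in normal mode and 8 when U2X is set. The function names are made up for illustration, and the rounding to the nearest register value is a choice of this example.

```cpp
#include <cstdint>

// Nearest UBRR register value for the requested baud rate in asynchronous
// mode. double_speed corresponds to the U2X bit being set.
std::uint32_t ubrr_for_baud(std::uint32_t f_cpu, std::uint32_t baud, bool double_speed) {
    const std::uint32_t divisor = (double_speed ? 8u : 16u) * baud;
    const std::uint32_t quotient = (f_cpu + divisor / 2) / divisor;  // rounded division
    return quotient == 0 ? 0 : quotient - 1;  // clamp instead of wrapping below zero
}

// Baud rate the hardware actually produces for a given UBRR value.
std::uint32_t actual_baud(std::uint32_t f_cpu, std::uint32_t ubrr, bool double_speed) {
    return f_cpu / ((double_speed ? 8u : 16u) * (ubrr + 1));
}
```

At 16 MHz with U2X set this yields UBRR = 3 for 500000 baud and UBRR = 0 for 2000000 baud, matching the register values used in this thread. For 115200 baud it yields UBRR = 16, which really runs at 117647 baud (about 2.1% fast), so not every standard rate is hit exactly.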

But it gets even stranger. At 38400, I get 325 cycles per second. But that should be lower! 38400 at 8-none-1 should be 3840 chars/second. Sending “0123456789\n” and receiving “A\n” is 13 characters in total, which means 3840 / 13 = 295 Tx/Rx sets per second. But I get 325!

I double-checked and cross-checked my C# code, but it’s all synchronous WriteLine/ReadLine, the core of which looks like this (even somebody not familiar with C# should get this):

_serialPort.WriteLine($"0123456789");
string ack = _serialPort.ReadLine();
if (ack != "A")
    Debug.WriteLine("ACK error");

It gets even more interesting. The code snippet provided by @hzrnbgy sends “A\n” directly from the ISR. But I made a second version that hands a received command over to “loop()” so that it can evaluate the command and only acknowledge valid ones. Because the ISR that Serial needs is already taken (vector __vector_25), I can’t use Serial.println, so I just copied the code provided by @hzrnbgy to send the acknowledge.

volatile uint8_t rcvd = 0;

String sCommandRx;
String sCommand;
bool bCommandRx;

ISR(USART0_RX_vect)
{
  // received data
  rcvd = UDR0;

  if (rcvd == '\n')
  {
    sCommandRx = sCommand;
    bCommandRx = true;
    sCommand = "";
  }
  else
    sCommand += (char)rcvd;
}

void setup()
{
  UBRR0H = 0;
  UBRR0L = 0;

  // USART initialization with 16MHz system clock, 8-none-1, RX interrupt
  UCSR0A = 1<<U2X0 | 0<<MPCM0;
  UCSR0B = 1<<RXCIE0 | 0<<TXCIE0 | 0<<UDRIE0 | 1<<RXEN0 | 1<<TXEN0 | 0<<UCSZ02;
  UCSR0C = 0<<UMSEL01 | 0<<UMSEL00 | 0<<UPM01 | 0<<UPM00 | 0<<USBS0 | 1<<UCSZ01 | 1<<UCSZ00 | 0<<UCPOL0;

  sCommand = "";
}

void loop()
{
  if (bCommandRx)
  {
    if (sCommandRx == "0123456789")
    {
      UDR0 = 'A';
      while((UCSR0A & 1<<TXC0)==0);
      UCSR0A |= 1<<TXC0;
      
      UDR0 = '\n';
      while((UCSR0A & 1<<TXC0)==0);
      UCSR0A |= 1<<TXC0;
    }

    bCommandRx = false;
  }
}

The above again gives me exactly 244 Tx/Rx sets per second, or half of what was achieved with the @hzrnbgy code. It can’t be a coincidence that the speed is exactly half of the earlier 488 Tx/Rx sets per second.

So… what’s the verdict? I’m really curious whether somebody can give me a clue.

@Robin2 - I do agree that 244 Tx/Rx sets is a high number already. And yes, if I’m going to do some processing in between, I might not even reach that speed (and this is also why I built an ACK system). But I don’t want to lose processing time for a reason that I don’t understand, so I want to find out whether I’m doing something wrong.

Just an additional remark (edit): Some people could question whether the issue is caused by my C# code. That’s a good reflex of course. But I really doubt it is, for 2 reasons:

  • The 244 and 488 results were obtained with exactly the same C# code
  • I originally wrote much more complex C# code based on async/await (asynchronous communication) and got exactly the same number (244) - the code provided here contains only the bare necessities, precisely to rule out any mistakes

_serialPort.WriteLine($"0123456789");
string ack = _serialPort.ReadLine();
if (ack != "A")
    Debug.WriteLine("ACK error");

I'm not an expert on C#, but is it possible to declare the ack variable outside the loop so you don't have the overhead of initializing it on every iteration?

It's odd that it maxes out at 488 even when you increase the baud rate to 2Mbps since theoretically, that would be four times the speed.

The next logical step, to eliminate any doubts on the C# side, is to let two Arduinos battle it out. It would not be hard to code an Arduino-side transmitter that simulates the C# program. I bet you can get more than 488 ACKs/second, since at 500000 bps your test should be able to handle up to ~3800 ACKs/second. I guess this boils down to the underlying OS your C# program is running on...

I can guarantee you that the C# compiler is clever enough not to declare the string variable each time, but I am considering every suggestion. Even when I remove the check completely, the result is exactly the same.

But if the C# code were the cause of this problem (which I also consider one of the possibilities), then I don't understand why changing your suggested Arduino code into the second version, which evaluates the received command in loop(), makes the speed drop from 488 to 244. I didn't change a single bit in my C# code for that.

The main issue that I have with these Arduinos is that there is no easy way to run code and debug it. I see that the Arduino board has an ICSP connector, and I have a PICkit 4 that works with the MPLAB compiler, but I'm not sure whether I can do anything with that. (Somebody said I'm a newbie, so I guess they are right :)).

I was also thinking about your suggestion to connect 2 Arduinos (I have 2). But Serial0 is connected via USB, right? Can I bypass that? What do you think of this setup (open to suggestions):

  • Arduino 1 is the "unit under test", Arduino 2 is used to simulate my C# code
  • I connect Serial0 of Arduino 1 to Serial1 of Arduino 2 (is this possible, given that Serial0 is normally driven via USB? Can this be bypassed?)
  • I connect Arduino 2 via USB to the Serial Monitor built into the Arduino IDE - I use this to report results

Would that work?

Since you have two Megas, you can use Serial1 on both and reserve Serial0/USB for output to your terminal to display statistics.

Just cross the TX of Serial1 (Arduino 1) to the RX of Serial1 (Arduino 2) and vice versa.

Then

That should be a good test whether OS plays a role in capping your serial speed (which I honestly believe is the case)

Working on it already. I was still going to use Serial0 (should be possible, I looked it up - I won't connect it to USB). The reason is that the fact that it works on Serial1 doesn't guarantee that it would on Serial0.

You can expect some results within... less than an hour?

I'm willing to bet my 10 Golden fishes that the OS plays a part in all of this

Don’t forget Gnd to Gnd also.

The results are in! @hzrnbgy, your Golden fishes can relax :) :) :)

I did tests at 115200, 500000 and 2000000 baud, and the results are within expectations. Results are Tx/Rx cycles per second (sending "0123456789\n" and receiving "A\n" = 13 characters).

  • 115200 - calculated 886 - measured 806
  • 500000 - calculated 3846 - measured 2958
  • 2000000 - calculated 15384 - measured 6060

The difference between calculated and measured speed is due to some overhead. I measured this to be about 100 msec for each batch of 1000 Tx/Rx cycles. If I take this overhead into account, the difference between calculated and measured becomes relatively small:

  • 115200 - calculated 814 - measured 806
  • 500000 - calculated 2777 - measured 2958
  • 2000000 - calculated 6060 - measured 6060
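The corrected figures in the list above follow from adding the measured overhead to the pure wire time. A minimal sketch of that calculation, assuming 10 bits per character and a fixed overhead per batch; the function name and parameter names are made up for illustration.

```cpp
// Cycles per second once a fixed overhead per batch is added to the time the
// characters spend on the wire (10 bits per character, 8-N-1 framing).
double cycles_per_second_with_overhead(double baud, double chars_per_cycle,
                                       double batch_size, double overhead_s) {
    const double wire_time_s = batch_size * chars_per_cycle * 10.0 / baud;
    return batch_size / (wire_time_s + overhead_s);
}
```

For batches of 1000 cycles of 13 characters with 0.1 s overhead, this gives about 814 at 115200 baud, 2777 at 500000 baud and 6060 at 2000000 baud.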

So, the winner is... the Arduino!

That means the inefficiency is on the side of my C# code, although I don't think there is anything wrong with the code itself. I suspect that the interfacing through USB is causing trouble, or that the SerialPort class I'm using introduces some inefficiency (I have already read a lot of bad things about this SerialPort class).

Well... this was an exciting investigation. Thanks a lot for all your contributions. My special thanks to @hzrnbgy, who gave me some interesting insights into using ISRs.

Late to the party; I ran your original code from the opening post on a Mega and got the same results.

Next I ran it on a Nano clone with a CH340 and got around 340 commands/sec.
Next I used a SparkFun RedBoard with FTDI and got only 62 commands/sec.

Lastly I tried a SparkFun Pro Micro and an Arduino Leonardo; your C# code crashed with a read timeout exception (even with a 5 second timeout value). I don't have time to investigate that but will keep it in mind when developing. But running it in Serial Monitor worked (got A back) :o

PS
Compiler used was VS2017

Not necessarily entirely the C# code, but rather how an operating system operates. If you bypass the OS and write bare-metal code directly on the x86 platform, you should get the same result.

It does go to show that even a 16 MHz microcontroller can beat a multi-gigahertz machine in some specific applications.

@sterretje - thanks a lot for your efforts as well - it kind of confirms that I was not dreaming :)

@hzrnbgy - It doesn't stop here for me. I really bet (I don't have Golden fishes unfortunately) that USB has something to do with it. USB 2.0 HID devices have a polling rate of 1 msec (if I'm not mistaken). I experimented with some PIC microcontrollers, and built my own USB stack on my PC to communicate with them, and I hit a limit of 500 commands/sec. With your fastest ISR approach, using SerialPort in C#, I could get 488 commands/second, which is slightly slower but pretty close. There might of course be some "conversion lag" between serial and USB, creating some de-synchronization and hence missing some USB frames.

Then there is still the puzzle of why your native code gave me 488 commands/second, while putting the "A\n" answer in loop() gave me only half that (244). I have a theory about it, which is that putting the answer in the ISR made it react so fast that the data could still "piggyback" on the same frame. If I put the answer in loop(), it missed the train and took the next 1 msec frame.

Anyway, these are just WILD GUESSES for now, so I could be COMPLETELY wrong of course. I will post this on Stack Overflow and see if somebody can come up with an answer.

If I have any more feedback, I will post it here. Could always be handy for some other users.

hansbilliet:
USB 2.0 HID devices have a polling rate of 1 msec (if I'm not mistaken).

I had forgotten about that but I got stung with it several years ago when I was trying to replace an old PC parallel port connection with a USB connection and an FTDI module.

If the message contains at least 64 bytes (IIRC) the USB system will send it immediately - otherwise it will wait for the 1ms timeout.

One way to verify whether this is the problem is to write the equivalent code to send commands from one Arduino Mega to another using the Serial1 connection.

...R

I did do the test using 2 Arduinos - you can read the results in this thread. And yes, connecting 2 Arduinos back to back works perfectly. So I guess USB is the cause of some issues.

I just did a test with sending 70 characters (+ '\n'), and it gives me exactly the same result of 244 Tx/Rx cycles per second. This means that sending 10+1 characters or 70+1 characters doesn't make a difference. So that almost proves that somewhere in the chain, something is waiting for a buffer to be filled, and then starts transmitting.

Still, thinking more about it, I don't completely understand why I get 244 Tx/Rx sets per second. I can understand that my transmission waits for one frame of 1 msec and the ACK travels on the next frame. That should still give me about 500 Tx/Rx sets per second. The overhead in the Arduino code could make it miss a frame, but I measured it before and it's about 100 msec for 1000 Tx/Rx sets, so only 0.1 msec per Tx/Rx set - hard to miss more than one frame with that.
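Turning the measured rates around makes the gap easier to see. The helper below only inverts the rate; its name is made up for illustration and it assumes nothing about where the time is spent.

```cpp
// Average duration of one command/ACK round trip, in microseconds, implied by
// a measured rate in round trips per second.
double round_trip_us(double cycles_per_second) {
    return 1e6 / cycles_per_second;
}
```

A rate of 244 per second corresponds to roughly 4.1 msec per round trip and 488 per second to roughly 2.05 msec, so the slower case behaves as if about four 1 msec frames pass per exchange rather than the two that the reasoning above would predict.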

Anyway, if interested, I posted it on Stack Overflow: