Performance of SafeRingBufferN

Not sure if this is the best place to ask... Better as issue? Better in different category...

Quick overview - trying another version of SerialX speedup that does not require any changes to FSP, and makes usage of the ring buffer that is already built into the UART class.

arduino::SafeRingBufferN<SERIAL_BUFFER_SIZE> txBuffer;

The speed of this does not feel right, for example here is a logic analyzer output, showing one write operation for a ping operation to Robotis servos:

The message is 14 bytes long. The Channel 6 data shows the time that the code is within the UART::Write(buffer, cnt) function, which with the timing markers on the right hand side is almost: 50us. Channel 7 I do a toggle when I am about to call the lower layer to start the transfer. From the point until the return from the write took a little over 4ms

So this code:

size_t  UART::write(const uint8_t* c, size_t len) {
  if(init_ok && (len > 0)) {
    DBGDigitalWrite(A5, HIGH);
    size_t cb_left = len;

    while (cb_left) {
      // see if we have room in the FIFO to store this character.
      bool tx_active_on_entry = tx_fsi_active;

      // Put as much of data into txBuffer as will fit
      while(cb_left && !txBuffer.isFull()) {
        txBuffer.store_char(*c++);
        cb_left--;
      }

      // if the UART was not active at the start we will start it.
      if (!tx_active_on_entry) {
        size_t cb = 0;
        while (cb < sizeof(tx_fsi_buffer)) {
          int ch = txBuffer.read_char();
          if (ch == -1) break;
          tx_fsi_buffer[cb++] = ch;            
        }
        tx_fsi_active = true;
        DBGDigitalToggle(A2);
        R_SCI_UART_Write(&(uart_ctrl), tx_fsi_buffer, cb);

Tooks about 45us, or about the time the UART took to output 4.5 characters at 1mbs.
Just feels like it should be faster. Not sure if it is because the Safe part, disables interrupts to do almost every operation?

Investigating. May try the unsafe version to see if it makes any differences.

'cb_left' will always be true inside a while block that test and finds it true. Therefore you don't need to test it again, any place inside the block.

cb_left is decremented within the inner loop. The inner loop should in vast majority of cases wind itself to 0 and exit. The other case though is if the write fills the FIFO queue, at which point we exit the inner loop with isFull and hopefully do stuff to either startup a transfer or allow the transfer to continue, which at some point the ISR will call our callback, which will extract data from this FIFO, and then our inner loop will be entered again... until all of the data can be queued.

Quick update: I changed: to the simple fifo

    //arduino::SafeRingBufferN<SERIAL_BUFFER_SIZE> txBuffer;
    arduino::RingBufferN<SERIAL_BUFFER_SIZE> txBuffer;


And the time went from 50 to 17.
So yes a lot of time spent in the Safe stuff.

Question is, is it needed? Should not be terribly needed if the underlying ring buffer is setup to properly handle single producer and consumer.

Will look more at that code...

Please post your entire sketch.

Note: the code is not from sketch but WIP change to core code.
I pushed up the changes in a new branch of my Fork:
KurtE/ArduinoCore-renesas at SerialX_use_write_fifo (github.com)

Now the sketch I am trying now is:
slightly modified for debug purposes version of the DynixelShield example scan_dynamixels

/*******************************************************************************
* Copyright 2016 ROBOTIS CO., LTD.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*******************************************************************************/

#include <DynamixelShield.h>

#if defined(ARDUINO_AVR_UNO) || defined(ARDUINO_AVR_MEGA2560)
  #include <SoftwareSerial.h>
  SoftwareSerial soft_serial(7, 8); // DYNAMIXELShield UART RX/TX
  #define DEBUG_SERIAL soft_serial
#elif defined(ARDUINO_SAM_DUE) || defined(ARDUINO_SAM_ZERO)
  #define DEBUG_SERIAL SerialUSB
#else
  #define DEBUG_SERIAL Serial
#endif

//#define MAX_BAUD  5
//const int32_t buad[MAX_BAUD] = {57600, 115200, 1000000, 2000000, 3000000};
#define MAX_BAUD  1
const int32_t buad[MAX_BAUD] = {1000000};

Dynamixel2Arduino dxl(DXL_SERIAL, DXL_DIR_PIN);

//This namespace is required to use Control table item names
using namespace ControlTableItem;

void setup() {
  // For Uno, Nano, Mini, and Mega, use UART port of DYNAMIXEL Shield to debug.
  DEBUG_SERIAL.begin(115200);   //set debugging port baudrate to 115200bps
  while(!DEBUG_SERIAL);         //Wait until the serial port is opened
#if 1
  pinMode(A2, OUTPUT);
  pinMode(A3, OUTPUT);
  pinMode(A4, OUTPUT);
  pinMode(A5, OUTPUT);
  for (int i = 0; i < 5; i++) {
    digitalWrite(A2, HIGH);
    digitalWrite(A3, HIGH);
    digitalWrite(A4, HIGH);
    digitalWrite(A5, HIGH);
    digitalWrite(A5, LOW);
    digitalWrite(A4, LOW);
    digitalWrite(A3, LOW);
    digitalWrite(A2, LOW);
  }
#endif
}

void loop() {
   int8_t index = 0;
  int8_t found_dynamixel = 0;

  for(int8_t protocol = 1; protocol < 3; protocol++) {
    // Set Port Protocol Version. This has to match with DYNAMIXEL protocol version.
    dxl.setPortProtocolVersion((float)protocol);
    DEBUG_SERIAL.print("SCAN PROTOCOL ");
    DEBUG_SERIAL.println(protocol);
    delay(10); // give a break between to easier find 
    for(index = 0; index < MAX_BAUD; index++) {
      // Set Port baudrate.
      DEBUG_SERIAL.print("SCAN BAUDRATE ");
      DEBUG_SERIAL.println(buad[index]);
      dxl.begin(buad[index]);
      //for(int id = 0; id < DXL_BROADCAST_ID; id++) {
      for(int id = 0; id < 25/*DXL_BROADCAST_ID*/; id++) {
        //iterate until all ID in each buadrate is scanned.
        if(dxl.ping(id)) {
          DEBUG_SERIAL.print("ID : ");
          DEBUG_SERIAL.print(id);
          DEBUG_SERIAL.print(", Model Number: ");
          DEBUG_SERIAL.println(dxl.getModelNumber(id));
          found_dynamixel++;
        }
      }
    }
  }
  
  DEBUG_SERIAL.print("Total ");
  DEBUG_SERIAL.print(found_dynamixel);
  DEBUG_SERIAL.println(" DYNAMIXEL(s) found!");

  DEBUG_SERIAL.println("\n\n*** Press any key to run again ***");

  while (DEBUG_SERIAL.read() == -1) ;
  while (DEBUG_SERIAL.read() != -1) ;

}

Requires library or two from robotis which is in the library manager.

EDIT: - Note I have not tested some of my other types of tests yet, like ones which output long Serial outputs, or multiple calls to Serial output... That need to exercise some of the code in calback. I suspect I will need to add some additional hack in there to handle case where the callback is called after the ISR put the last character of current buffer and then calls us, and the write tries (or tried before) to place first character of new buffer in output register, even though it was not empty...

But that is more for the actual SerialX code updates thread.

Mainly started this thread to talk about the SafeRingBufferN class and see if or should some of it be updated

Please explain. We can hardly analyze complex code if it isn't complete. Perhaps you can find someone here who already has a lot of experience with this particular code... if not, we would have to analyze it from the ground up. For that, we need a reliable and complete snapshot of the code. Asking us to assemble it from a base plus a WIP, is really a tall order unless you are floating it as a paid job.

1 Like

Sorry, unsure of best place to ask this... Maybe better as Issue against the core code? @ptillisch, what do you suggest?

Where, hopefully those who work on the core code, will understand and maybe suggest what maybe should be changed. But here is more background:

The Serial (UARTn) code is in Serial.h and Serial.cpp within cores\arduino
The currently released board versions already have two of the SafeRingBufferN objects, which are templated objects.

    arduino::SafeRingBufferN<SERIAL_BUFFER_SIZE> rxBuffer;
    arduino::SafeRingBufferN<SERIAL_BUFFER_SIZE> txBuffer;

The code for this object is contained in SafeRingBuffer.h, which is also contained within the same project directory. The SafeRingBufferN is a sublass of the templated class RingBufferN, which is contained in the Arduino API project, but if you have the full clone of the core project including subprojects, AND the links are correct. (Several of them on WIndows, need to be patched). But then on my machine are logically in the code in the file:
cores\arduino\api\RingBuffer.h

The performance hit: Almost all, if not all of the methods in the SafeRingBuffer.h look like:

template <int N>
void SafeRingBufferN<N>::store_char(uint8_t c) {
  synchronized {
    RingBufferN<N>::store_char(c);
  }
}

Now if you look in sync.h you will find:

class __Guard {
public:
	__Guard() : primask(__get_PRIMASK()), loops(1) {
		__disable_irq();
	}
	~__Guard() {
		if (primask == 0) {
			__enable_irq();
			// http://infocenter.arm.com/help/topic/com.arm.doc.dai0321a/BIHBFEIB.html
			__ISB();
		}
	}
	uint32_t enter() { return loops--; }
private:
	uint32_t primask;
	uint32_t loops;
};

#define synchronized for (__Guard __guard; __guard.enter(); )

#endif

So everyone of those calls, creates and destroys an object, and disables and if appropriate reenables interrupts and the global level.

So wondering if that is the best approach to use with Serial? Do we need this level of protection? Without this? Is the code safe enough for most cases where one logical thread (main code) does SerialX.write() to specfic SerialX and the Interrupt handler for that Serial port is the only Consummer.

Note: The RX side of the Serial objects is implemented. in the currently released code. On the TX side it is not. The TX side only outputs one chracter at a time, and waits until the register is empty or FIFO not full on those USARTS that hava fifo.

Hope that helps.

Follow on to the previous posts, about maybe try without using the Safe version. It looks like the RingBufferN is not safe. even for a simple single Producer and simple Consumer setup.

Breaking down the calls, I mentioned:

      while(cb_left && !txBuffer.isFull()) {
        txBuffer.store_char(*c++);
        cb_left--;
      }

The above code has two calls, that go through the __Guard code. Oops actually not as isFull() does not appear to wrapped so probably calling through RingBuffer. Not sure if by design, or simply missed. But suggestion might be, that store_char return something like a bool, so you could imply check the return value...

But the main issue, is that the two methods read_char, and store_char unnecessarily update the same internal variable _numElems, which is not atomic.

The code could easily be modified to remove that problem.

That is store_char only updates_iHead and read_char only updates _iTail

When you need to know how many characters are currently in the buffer, it can be easily computed something like:

	if (head >= tail) return (head - tail);
	else return = N + head - tail;

I am not showing all of the glue, but hopefully enough to make sense.

If the code has a Github repository, and you are not the author, open an issue on it.

Thanks,
arduino/ArduinoCore-renesas (github.com)
and
arduino/ArduinoCore-API: Hardware independent layer of the Arduino cores defining the official API (github.com)

In order to make all relevant information available to any who are interested in this subject, I'll share a link to the issue @KurtE submitted to the repository:

1 Like