Random function on R4 terribly slow

A common use of Arduinos is electronic games. To test such games, it makes sense to replace user inputs with random data using the built-in random function. This works pretty well with the Arduino UNO and Arduino MEGA2560. Surprisingly, the latter is 20 percent faster, which I don't fully understand, but it won't give me sleepless nights.

When I switched the project to the UNO R4 Minima or WiFi, at first it looked like I had bricked the MCU. Only after an endless wait was some data printed on the Serial Monitor. So I wrote an MCVE to narrow things down. (I had to make use of the calculated random data; otherwise the compiler would optimize it out and remove all the random calls.)

As I have been working with the R3 for nearly 20 years and with the R4 for nearly two years, I am more than surprised; I am really shocked.

Can a product be considered “compatible” if this feature is 1000 times slower on R4 than on R3? I could easily use code written by Donald E. Knuth, but built-in functions should work.

This is my MCVE, tested with various Arduinos (originals and clones) using both IDEs (1.x and 2.x) and various versions of Windows.

It looks like whoever wrote the library with the random function should repeat his homework twice.

/*
  "Minimal, complete and verifiable example" (MCVE)
  R3:        9 milliseconds
  Minima: 9295 milliseconds
  WiFi:   9581 milliseconds
*/

void setup() {
  Serial.begin(115200);
  delay(2000);
  while (!Serial);
  delay(2000);
  Serial.println(__FILE__);
  const int N = 100;
  int x[N];
  // begin:
  long t1 = millis();
  for (int i = 0; i < N; i++)
    x[i] = random(7);
  long t2 = millis();
  // end
  Serial.println("add");
  long s = 0;
  for (int i = 0; i < N; i++)
    s = s + x[i];
  Serial.println(s);
  Serial.println(t2 - t1);
}

void loop() {}

This is weird indeed.

Just to be sure it's not an optimisation issue, what does this code print on the R4 and the R3?

/*
  "Minimal, complete and verifiable example" (MCVE)
  R3:        9 milliseconds
  Minima: 9295 milliseconds
  WiFi:   9581 milliseconds
*/

void setup() {
  Serial.begin(115200);
  delay(2000);
  while (!Serial);
  delay(2000);
  Serial.println(__FILE__);
  const int N = 100;
  volatile int x[N];
  // begin:
  volatile unsigned long t1 = millis();
  for (int i = 0; i < N; i++)
    x[i] = random(7);
  volatile unsigned long t2 = millis();
  // end
  Serial.println("add");
  long s = 0;
  for (int i = 0; i < N; i++)
    s = s + x[i];
  Serial.println(s);
  Serial.println(t2 - t1);
}

void loop() {}

Had that same problem long ago, which made me write

This library uses a call to random() to fill a 32-bit buffer.
It has functions to get n bits from this buffer.
If the buffer holds fewer than n bits, it is refilled.

You could use a similar trick to get your random values.

In your case, extract 3 random bits; if the value is in the range 0..6, use it, otherwise extract new bits and try again.

Give it a try.
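
The bit-buffer idea above can be sketched roughly as follows. This is not the library mentioned in this post, just a minimal illustration under assumptions: the names BitBuffer, getBits, random7 and slowSource are made up for the example, and slowSource is a stand-in built from rand() so that the code also runs on a PC. On the board, the source would be whatever slow call delivers 32 random bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not the library from this thread): caches 32 random
// bits from one call to a slow source and hands them out a few at a time.
class BitBuffer {
public:
  using Source = uint32_t (*)();
  explicit BitBuffer(Source source) : source_(source) {}

  // Return n random bits (1 <= n <= 32) as an unsigned value.
  uint32_t getBits(uint8_t n) {
    assert(n >= 1 && n <= 32);
    if (n > available_) {  // too few bits left: drop them and refill
      buffer_ = source_();
      available_ = 32;
      ++refills_;
    }
    uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    uint32_t result = buffer_ & mask;
    buffer_ = (n == 32) ? 0u : (buffer_ >> n);
    available_ -= n;
    return result;
  }

  // Uniform value in 0..6: take 3 bits, reject 7, retry with 3 fresh bits.
  uint8_t random7() {
    uint32_t v;
    do {
      v = getBits(3);
    } while (v > 6);
    return static_cast<uint8_t>(v);
  }

  // Number of calls made to the slow source so far.
  unsigned long refills() const { return refills_; }

private:
  Source source_;
  uint32_t buffer_ = 0;
  uint8_t available_ = 0;
  unsigned long refills_ = 0;
};

// Stand-in for the slow generator (assumption for this sketch).
// rand() only guarantees 15 random bits, so three calls are combined.
uint32_t slowSource() {
  uint32_t hi = static_cast<uint32_t>(rand() & 0x7FFF);
  uint32_t mid = static_cast<uint32_t>(rand() & 0x7FFF);
  uint32_t lo = static_cast<uint32_t>(rand() & 0x3);
  return (hi << 17) | (mid << 2) | lo;
}
```

Usage would be something like `BitBuffer bits(slowSource);` followed by `x[i] = bits.random7();` in the loop. Each refill yields ten 3-bit draws, so the slow source is called roughly once per nine results instead of once per result.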

I wonder if this is related to the True RNG hardware module in the RA4M1. Specs for this seem scant.

However, instead of random() try the C standard function rand().


You are completely right, but you must admit that the Arduino random(long min, long max) is kind of a luxury. Unfortunately, https://docs.arduino.cc/language-reference/funktionen/random-numbers/random/ dated 17.05.2024, does not give you any warning when using the R4. They really should.
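
For anyone who wants the convenience of a range while trying rand(), here is a minimal sketch of a wrapper. The name fastRandom is made up for this example, and whether rand() really avoids the slow path on the R4 is the suggestion from this thread, not something this snippet proves. It uses a plain modulo, so very large ranges get a slight bias.

```cpp
#include <cstdlib>

// Hypothetical stand-ins for Arduino's random(max) and random(min, max),
// built on the C standard rand(). Results lie in [0, howbig) and
// [howsmall, howbig) respectively.
long fastRandom(long howbig) {
  if (howbig <= 0) {
    return 0;  // nothing sensible to return for an empty range
  }
  return rand() % howbig;
}

long fastRandom(long howsmall, long howbig) {
  if (howsmall >= howbig) {
    return howsmall;  // empty range: hand back the lower bound
  }
  return fastRandom(howbig - howsmall) + howsmall;
}
```

In the MCVE, `x[i] = random(7);` would become `x[i] = fastRandom(7);`, with srand() called once in setup() if a different sequence per run is wanted.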

Seems likely. It may be generating bits from some internal breakdown noise hardware and doing a lot of processing to ensure it meets cryptographic specifications.

Often those are used just to seed a faster algorithmic random number generator.


You can see the hardware/true random number generator being used for random() in ArduinoCore-renesas/cores/arduino/WMath.cpp at main · arduino/ArduinoCore-renesas · GitHub. I agree the behaviour/feature should be documented. I had a play with this to see if it's rate limited. It's not, it's just slow all the time. My R4 Minima takes 115ms and my R4 WiFi is mysteriously quicker at 98ms. I wonder if this varies per chip based on some sort of entropy checking...

You can disable the use of the TRNG by calling randomSeed with a value. An alternative is to call random() without a value but I'm not sure how pukka that's considered in Arduino world.

Not so mysterious: the bits of WiFi, field strength, etc. can give extra entropy, assuming it is harvested. However, I would have expected more of a performance gain; typical WiFi is beyond 1 Mbit/second, so harvesting the bits could be faster.

Do you see different timing on the R4 WiFi

  • if there is a lot of traffic?
  • if there is low traffic?
  • if the WiFi is switched off?

just curious...

I'd assumed the hardware just used an electrical source of noise. If there's some entropy gathering from multiple sources then it would involve chatting to the ESP32-S3 which does WiFi.

I'm not using WiFi, I'd hope the radio isn't enabled for that case for Arduino programs.

I got 10561 ms. This is really too much. Worse than a C64 running BASIC.

That’s very poor indeed

I get about 98 ms, pretty consistently. (~97900 us, using micros())

The Arduino code for R4 boards reads:

static long trng()
{
  uint32_t value[4];
  if (HW_SCE_McuSpecificInit() != FSP_SUCCESS)
    return -1;
  HW_SCE_RNG_Read(value);
  return (long)value[0] >= 0 ? value[0] : -value[0];
}

Apparently almost ALL of this time is spent in HW_SCE_McuSpecificInit(), which doesn't remember that it has already been initialized! (sigh. Cursed vendor libraries!)

If the init is done only once, as in the test sketch, the time goes down to 20 us. That's still about 5x slower than the PRNG algorithm, but it's certainly better!

Test sketch:

extern "C" {
  fsp_err_t HW_SCE_McuSpecificInit(void);
  fsp_err_t HW_SCE_RNG_Read(uint32_t * OutData_Text);
};

void setup() {
  Serial.begin(115200);
  while (!Serial)
    ;
  randomSeed(1235);
}

uint32_t startTime, openTime, getTime, closeTime, endTime;

static long trng() {
  uint32_t value[4];
  static bool SCE_inited = false;
  if (!SCE_inited) {
    if (HW_SCE_McuSpecificInit() != FSP_SUCCESS) {
      Serial.println("SCE Init failed");
      return -1;
    }
    SCE_inited = true;
  }
  openTime = micros();
  HW_SCE_RNG_Read(value);
  return (long)value[0] >= 0 ? value[0] : -value[0];
}

void loop() {

  delay(2000);
  Serial.println(__FILE__);
  startTime = micros();
  int x = random(10000);
  endTime = micros();
  Serial.print("simple Micros: ");
  Serial.println(endTime - startTime);
  startTime = micros();
  x = trng();
  endTime = micros();
  Serial.print("trng SCE Init: ");
  Serial.println(openTime - startTime);
  Serial.print("trng total: ");
  Serial.println(endTime - startTime);
}

In order to make relevant information available to any who are interested in this subject, I'll share a link to the formal report @westfw submitted to the "Arduino UNO R4 Boards" platform developers:

Thanks @westfw!

I submitted a bug against the Renesas FSP tree as well.
Their "Init" function should be smarter than that, especially since they don't document enough to let user code check the status of the SCE.


Good catch!
In your test sketch you could add a delay(100) to be sure the Serial output has been flushed.
Interrupts can affect timing measurements (on AVR they certainly do).

That’s what Serial.flush(); is for. Using a delay might get you to wait for longer than needed or not enough depending on what was in the outgoing buffer.


There is already a 2-second delay() after each set of output (actually, at the beginning of the loop, but it'll still happen between the output and the next set of timing...)
