Arduino Compiler for Mega and DUE

Hello everyone,

I am working on a security protocol implementation on Arduino boards. The protocol operates on 64-bit, 128-bit, and 256-bit variables (e.g., nonces, hash results, …). I am using the Arduino Mega (8-bit, 16 MHz, 8 KB SRAM) and the Arduino DUE (32-bit, 84 MHz, 96 KB SRAM). In terms of correctness, the protocol runs and terminates correctly on both boards. However, I have noticed some inconsistencies:

  1. The sketches that use larger variables (128-bit or 256-bit) compile to smaller binaries (around 17 KB) than the one that operates on 64-bit variables (around 31 KB) on the Arduino Mega. The same thing is observed on the DUE, but the difference is small (19 KB for 128-bit, 19 KB for 256-bit, and 20 KB for 64-bit).

  2. As a consequence of the previous point, the 128-bit and 256-bit versions run faster than the 64-bit version.

  3. The code that uses 128-bit variables, which runs perfectly on the Arduino Mega, gets stuck at the beginning when run on the Arduino DUE.

I have left the compiler optimization level as is (i.e., -Os). If I use another optimization option (e.g., -O1, -O2, or -O3), the versions with larger variables still compile to smaller binaries, whereas the 64-bit version compiles to a much larger one (around 190 KB).

Does anybody have an idea of what is going on? I cannot interpret these results, particularly the execution times. Any information about the issues pointed out above would be greatly appreciated.

I have attached a plot of the execution time of the different phases of the protocol for the different variable sizes (the solid bars are for a serial communication rate of 190 Kbps and the ones with diagonal lines are for 250 Kbps). Regardless of the communication rate, you can see the inconsistency: as the variable sizes increase, the protocol becomes faster.

Cheers,

Plot.png

The compilers for the AVR and ARM Arduinos don't offer native 128-bit variables. So it depends on your implementation, which you're hiding from us.

Thanks pylon for your reply.

A variable of 128 bits could be as simple as a static array of bytes declared as:

byte var[16];

and initialized as:

byte var[16] = {0x55,0x88,0x66,0xf5,0x12,0x37,0xcc,0x2d, 0x55,0x88,0x66,0xf5,0x12,0x37,0xcc,0x2d};

So basically, a classic way of using variables, which I assume does not cause any issue.
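For reference, here is a compilable version of that pattern together with a typical byte-wise operation (a sketch only; `byte` is Arduino's alias for `uint8_t`, reproduced here so the snippet builds off-board, and `xor128` is an illustrative helper, not the poster's code):

```cpp
#include <cstdint>

typedef uint8_t byte;  // the Arduino core provides this alias

// A 128-bit "variable" as a plain 16-byte array
byte var[16] = {0x55, 0x88, 0x66, 0xf5, 0x12, 0x37, 0xcc, 0x2d,
                0x55, 0x88, 0x66, 0xf5, 0x12, 0x37, 0xcc, 0x2d};

// XOR two 128-bit values byte by byte -- the kind of operation a
// protocol implementation typically performs on nonces and hashes
void xor128(byte *out, const byte *a, const byte *b) {
    for (int i = 0; i < 16; ++i)
        out[i] = a[i] ^ b[i];
}
```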


So the difference between the three versions is only a few defines?

Even then, the optimizer may use different strategies depending on the values involved. This would be difficult to analyze even if we could see the code, and it's impossible in your case, as you're still hiding the code from us.

If all of your large variables are byte arrays, I don't see why the implementations would be of significantly different size. Are you sure you are not trying to use uint64_t (a.k.a. 'unsigned long long' on the MEGA) anywhere? That would likely bring in a library since it is running on an 8-bit processor.
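To illustrate the distinction: arithmetic done directly on the byte array never touches `unsigned long long`, so it cannot pull in the compiler's 64-bit helper routines. A minimal sketch of a 64-bit counter increment done that way (`inc64` is a hypothetical helper for illustration, not the poster's code):

```cpp
#include <cstdint>

typedef uint8_t byte;  // Arduino's alias for an unsigned 8-bit value

// Increment a little-endian 64-bit counter stored as 8 bytes,
// propagating the carry by hand instead of using uint64_t.
void inc64(byte ctr[8]) {
    for (int i = 0; i < 8; ++i) {
        if (++ctr[i] != 0)  // no carry out of this byte: done
            break;
    }
}
```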

Hello everyone,

Thanks for your inputs,

I managed to find where the issue came from. I was using the Blake2 hash function library, and I made the mistake of using blake2s for the 128-bit and 256-bit code while using blake2b (which is optimized for 64-bit CPUs) for the 64-bit code. I have replaced blake2b with blake2s everywhere, and things have become more consistent.

Nevertheless, the random number generator provided by the same cryptographic library takes much longer to generate a 16-byte number (around 178000 ms on the Arduino Mega) than to generate an 8-byte or 32-byte number (6050 ms on the Arduino Mega). It is strange, but it could be related to pylon's point about 128-bit variables.

Thanks all for your responses!

The avr-gcc support for "native" 64bit integers is "known" to be "not very good." If you have code that is using "int64_t" data types for 64bit stuff, but has its own (byte-array-based) math package for 128bits+, I wouldn't be surprised that the 128bit version comes out faster.

You could probably demonstrate this with some simple tests on a couple of your "core" functions.
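Such a test could compare native `uint64_t` addition against the same addition done on an 8-byte array. A host-side sketch of the two variants (on the Mega you would wrap each one in `micros()` calls and loop a few thousand times; the function names are illustrative):

```cpp
#include <cstdint>

typedef uint8_t byte;

// 64-bit addition using the compiler's native support; on avr-gcc
// this relies on multi-byte helper code for the 8-bit CPU
uint64_t add_native(uint64_t a, uint64_t b) {
    return a + b;
}

// The same addition on little-endian 8-byte arrays with a manual
// carry, using only small operations the AVR handles directly
void add_bytes(byte out[8], const byte a[8], const byte b[8]) {
    unsigned carry = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned s = (unsigned)a[i] + b[i] + carry;
        out[i] = (byte)s;
        carry = s >> 8;
    }
}
```

Timing both versions over many iterations would show directly whether the native 64-bit path is the slow one.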

Thanks westfw!

Thus far I have not had any issue with the 64-bit variables, which I declare as byte arrays of 8 bytes. The same holds for the 256-bit variables, declared as byte arrays of 32 bytes. The issue is with the byte arrays of 16 bytes: I was confused that generating a random number on 128 bits takes much longer than on 64 bits or 256 bits.

I am not sure what causes that, but nothing is mentioned in the library documentation (Arduino Cryptography Library: RNGClass Class Reference).

Regards,

If the issue is generating a random number: the SAM3X TRNG peripheral provides a 32-bit random value (4 bytes) every 1 µs. Therefore, filling a 16-byte array with 4 consecutive random words should take ~4 µs, a 32-byte array ~8 µs, and a 64-byte array ~16 µs.

I managed to find where the issue came from. I was using the Blake2 hash function library, and I made the mistake of using blake2s for the 128-bit and 256-bit code while using blake2b (which is optimized for 64-bit CPUs) for the 64-bit code. I have replaced blake2b with blake2s everywhere, and things have become more consistent.

Thanks for wasting our time.

pylon,

You are welcome.