# Arduino Due vs. Intel 2.8GHz single core

In the forum thread "Arduino Due vs. Nano performance" I compared the performance of the Arduino Due and the Arduino Nano on a simple piece of code (incrementing a volatile variable). The runtime factors were 9 for a 16-bit variable and 18 for a 32-bit variable.

Recently I remembered (unrelated to Arduino) the "minimal magic 3x3 prime square".

I found it back in 1982 with a Sinclair ZX81 program; R. Ondrejka had already found it in 1979:

``````
 47| 29|101|
113| 59|  5|
 17| 89| 71|
``````
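To make the "magic prime square" property concrete, here is a minimal sketch in standard C++ (not the original ZX81 or C program). It relies on the general fact that a 3x3 magic square with center p has every line through the center summing to 3*p, so the whole square follows from p and two top-row entries a and b. The names `build_square` and `is_prime_magic_square` are hypothetical helpers introduced for illustration.

```cpp
#include <array>

using Square = std::array<std::array<int, 3>, 3>;

// Trial-division primality test; adequate for the small values used here.
bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Build a 3x3 magic square from its center p and the top-row entries a, b.
// Opposite cells around the center always sum to 2*p.
Square build_square(int p, int a, int b) {
    int c = 3 * p - (a + b);    // top row: a + b + c = 3*p
    int d = 2 * a + b - 2 * p;  // right column: c + d + (2*p - a) = 3*p
    return {{{a, b, c},
             {2 * p - d, p, d},
             {2 * p - c, 2 * p - b, 2 * p - a}}};
}

// True if all rows, columns and both diagonals share one sum
// and all nine entries are distinct primes.
bool is_prime_magic_square(const Square& s) {
    int target = s[0][0] + s[0][1] + s[0][2];
    for (int i = 0; i < 3; ++i) {
        if (s[i][0] + s[i][1] + s[i][2] != target) return false;
        if (s[0][i] + s[1][i] + s[2][i] != target) return false;
    }
    if (s[0][0] + s[1][1] + s[2][2] != target) return false;
    if (s[0][2] + s[1][1] + s[2][0] != target) return false;
    for (int i = 0; i < 9; ++i) {
        int v = s[i / 3][i % 3];
        if (!is_prime(v)) return false;
        for (int j = 0; j < i; ++j)
            if (s[j / 3][j % 3] == v) return false;
    }
    return true;
}
```

Calling `build_square(59, 47, 29)` reproduces the square shown above, with every line summing to 177 = 3 * 59.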

An optimized C program computed the minimal solution in 6μs on a single 2.8GHz Intel core.

The Sinclair ZX81 CPU had a 3.25MHz clock, i.e. 0.31μs per clock cycle.

Today I wondered how much time the optimized search program would take on an Arduino Due.

The posted C program used a 64-bit type, but I also had a 32-bit version with the same performance, and that is the one I ported. The port was not difficult: measuring microseconds is in fact much easier on Arduino than on Linux, although formatted printing is worse. This is the Serial Monitor output generated by the Arduino Due:

``````
 47| 29|101|
113| 59|  5|
 17| 89| 71|

548us
``````
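For comparison with the single `micros()` call that produced the figure above, a desktop build has to pick a clock explicitly. The following is a minimal sketch assuming C++11 `std::chrono`; it is not the timing code of the original Linux program, and the names `micros_now` and `time_us` are hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical desktop stand-in for Arduino's micros():
// microseconds elapsed since the first call, read from a monotonic clock.
std::uint64_t micros_now() {
    using clock = std::chrono::steady_clock;
    static const clock::time_point start = clock::now();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::microseconds>(
            clock::now() - start).count());
}

// Run a callable once and return its wall-clock duration in microseconds.
template <typename F>
std::uint64_t time_us(F&& work) {
    std::uint64_t t0 = micros_now();
    work();
    return micros_now() - t0;
}
```

For a run as short as a few microseconds, a single measurement like this is close to the clock resolution, so it is probably better to repeat the work many times and divide the total.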

So for this non-trivial code the single 2.8GHz Intel core is less than 92x faster than the Arduino Due (548μs / 6μs ≈ 91.3)!

The Arduino preprocessor could not deal with a multi-line #define, so I had to put the forall_odd_primes_less_than() macro on a single line. This is the complete code (also attached):

``````
// Bitmap of the primes below 128: bit i, counted from the most significant
// bit of B[i/32], is set when i is prime.
uint32_t B[]={0x35145105,0x4510414,0x11411040,0x45144001};

#define Prime(i) ((B[(i)>>5] & (0x80000000UL >> ((i)%32))) != 0)

#define forall_odd_primes_less_than(p, m, block)  for((p)=3; (p)<(m); (p)+=2) if (Prime((p))) block

void setup() {
  Serial.begin(9600);

  // p is the center of the square; a, b, c form the top row.
  uint8_t p,a,b,c,d;
  unsigned long t0 = micros();

  forall_odd_primes_less_than(p, 64,
    forall_odd_primes_less_than(a, p,
      if (Prime(2*p-a))
      {
        forall_odd_primes_less_than(b, p,
          if ( (b!=a) && Prime(2*p-b) )
          {
            c= 3*p - (a+b);

            if ( (c<2*p) && (2*p-c!=a) && (2*p-c!=b) && Prime(c) && Prime(2*p-c) )
            {
              if (2*a+b>2*p)
              {
                d = 2*a + b - 2*p;   // 3*p - (3*p-(a+b)) - (2*p-a)

                if ( (d!=a) && (d!=b) && (d!=2*p-c) && Prime(d) && Prime(2*p-d) )
                {
                  unsigned long t1 = micros();
                  print3(a); print3(b); print3(c); Serial.println();
                  print3(2*p-d); print3(p); print3(d); Serial.println();
                  print3(2*p-c); print3(2*p-b); print3(2*p-a); Serial.println();
                  Serial.println();
                  Serial.print(t1-t0);
                  Serial.println("us");
                }
              }
            }
          }
        )
      }
    )
  )
}

void loop() {}

// Print n right-aligned in a field of width 3, followed by "|".
void print3(int n) {
  if (n<10) Serial.print("  ");
  else if (n<100) Serial.print(" ");
  Serial.print(n);
  Serial.print("|");
}
``````
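The four constants in `B[]` are easier to trust once regenerated. The sketch below, in standard C++ rather than Arduino code, rebuilds the 128-bit table with a sieve of Eratosthenes and also shows how the loop macro might look with backslash continuations. The multi-line layout is an assumption, since the post only shows the single-line version, and `sieve_bitmap` and `count_odd_primes_below` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Same table as in the sketch: bit i (counted from the most significant
// bit of B[i / 32]) is set when i is prime, for 0 <= i < 128.
static const std::uint32_t B[4] = {0x35145105, 0x4510414, 0x11411040, 0x45144001};

#define Prime(i) ((B[(i) >> 5] & (0x80000000UL >> ((i) % 32))) != 0)

// Assumed multi-line form of the loop macro, one continuation per line.
#define forall_odd_primes_less_than(p, m, block) \
    for ((p) = 3; (p) < (m); (p) += 2)           \
        if (Prime((p)))                          \
            block

// Rebuild the bitmap with a sieve of Eratosthenes over 0..127.
void sieve_bitmap(std::uint32_t out[4]) {
    bool composite[128] = {false};
    for (int i = 0; i < 4; ++i) out[i] = 0;
    for (int n = 2; n < 128; ++n) {
        if (composite[n]) continue;
        out[n >> 5] |= 0x80000000u >> (n % 32);  // mark n as prime
        for (int k = n * n; k < 128; k += n) composite[k] = true;
    }
}

// Count the odd primes below m, iterating them the way the sketch does.
int count_odd_primes_below(int m) {
    assert(m <= 128);  // the table only covers 0..127
    int p, count = 0;
    forall_odd_primes_less_than(p, m, { ++count; })
    return count;
}
```

The sieve output matches all four words of `B[]`, and the outer loop bound of 64 in the sketch corresponds to 17 odd primes as candidates for the center.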

Hermann.

sketch_apr05b.ino (1.45 KB)

I just realized that the Due compilation was done with -Os, so I changed that to -O3 in "~/.arduino15/packages/arduino/hardware/sam/1.6.4/platform.txt":

``````
$ diff platform.txt.orig platform.txt
22c22
< compiler.c.flags=-c -g -Os {compiler.warning_flags} -ffunction-sections -fdata-sections -nostdlib --param max-inline-insns-single=500 -Dprintf=iprintf -MMD
---
> compiler.c.flags=-c -g -O3 {compiler.warning_flags} -ffunction-sections -fdata-sections -nostdlib --param max-inline-insns-single=500 -Dprintf=iprintf -MMD
24c24
< compiler.c.elf.flags=-Os -Wl,--gc-sections
---
> compiler.c.elf.flags=-O3 -Wl,--gc-sections
27c27
< compiler.cpp.flags=-c -g -Os {compiler.warning_flags} -ffunction-sections -fdata-sections -nostdlib -fno-threadsafe-statics --param max-inline-insns-single=500 -fno-rtti -fno-exceptions -Dprintf=iprintf -MMD
---
> compiler.cpp.flags=-c -g -O3 {compiler.warning_flags} -ffunction-sections -fdata-sections -nostdlib -fno-threadsafe-statics --param max-inline-insns-single=500 -fno-rtti -fno-exceptions -Dprintf=iprintf -MMD
$
``````

Sketch size increased from 11,624 bytes to 13,548 bytes, but that is still only 2% of the 512KB of flash available on the Due.

Runtime decreased from 548μs to 494μs, so now 83 × T_Intel_2.8GHz > T_Arduino_Due (83 × 6μs = 498μs > 494μs):

``````
 47| 29|101|
113| 59|  5|
 17| 89| 71|

494us
``````
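The factors quoted in this thread (92 and 83) are integer upper bounds on the runtime ratios relative to the 6μs Intel run. A small sketch of that arithmetic in standard C++, with hypothetical helper names `ceil_ratio` and `percent_saved`:

```cpp
// Smallest integer k with k * fast_us >= slow_us.
constexpr unsigned ceil_ratio(unsigned slow_us, unsigned fast_us) {
    return (slow_us + fast_us - 1) / fast_us;
}

// Runtime saved, in percent, when going from before_us to after_us.
constexpr double percent_saved(unsigned before_us, unsigned after_us) {
    return 100.0 * (static_cast<double>(before_us) - static_cast<double>(after_us))
           / static_cast<double>(before_us);
}

// Figures from the posts: 6us on the Intel core, 548us (-Os) and 494us (-O3) on the Due.
static_assert(ceil_ratio(548, 6) == 92, "-Os build: less than 92x slower");
static_assert(ceil_ratio(494, 6) == 83, "-O3 build: less than 83x slower");
```

By the same arithmetic, switching from -Os to -O3 saved a little under 10% of the runtime on the Due.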

Hermann.