
Topic: Arduino Due vs. Intel 2.8GHz single core

HermannSW

Apr 05, 2016, 02:26 am Last Edit: Apr 05, 2016, 02:39 am by HermannSW
In the forum thread "Arduino Due vs. Nano performance" I compared the performance of the Arduino Due and the Arduino Nano on a simple piece of code (incrementing a volatile variable). The runtime factors were 9 for a 16-bit variable and 18 for a 32-bit variable.

Recently, and unrelated to Arduino, I was reminded of the "minimal magic 3x3 prime square":
https://twitter.com/HermannSW/status/709843062789414912

I found it back in 1982 with a Sinclair ZX81 program; R. Ondrejka had already found it in 1979:
Code: [Select]

 47| 29|101|
113| 59|  5|
 17| 89| 71|
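
To make "magic prime square" concrete, here is a small sketch of a checker in plain C for a desktop compiler. The names is_prime(), magic_prime_sum() and SQUARE are my own and do not come from the original program; the checker only tests a given square, it does not search for one.

```c
#include <stdbool.h>

/* Trial-division primality test; adequate for the small values used here. */
static bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

/* Returns the common line sum if sq is a 3x3 magic square of nine distinct
   primes (all rows, columns and both diagonals agree), or -1 otherwise. */
static int magic_prime_sum(const int sq[3][3]) {
    int target = sq[0][0] + sq[0][1] + sq[0][2];
    int diag = 0, anti = 0;
    for (int i = 0; i < 3; i++) {
        int row = 0, col = 0;
        for (int j = 0; j < 3; j++) {
            if (!is_prime(sq[i][j])) return -1;
            /* compare against every later cell to reject repeated entries */
            for (int k = i * 3 + j + 1; k < 9; k++)
                if (sq[k / 3][k % 3] == sq[i][j]) return -1;
            row += sq[i][j];
            col += sq[j][i];
        }
        if (row != target || col != target) return -1;
        diag += sq[i][i];
        anti += sq[i][2 - i];
    }
    return (diag == target && anti == target) ? target : -1;
}

/* The square shown above. */
static const int SQUARE[3][3] = {
    { 47, 29, 101},
    {113, 59,   5},
    { 17, 89,  71},
};
```

For the square above, magic_prime_sum(SQUARE) returns 177, which is three times the centre value 59.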


An optimized C program computed the minimal solution in 6μs:
https://twitter.com/HermannSW/status/709893851020972032

The Sinclair ZX81 CPU had a 3.25MHz clock, i.e. 0.31μs per clock cycle:
https://twitter.com/HermannSW/status/709895870246985728


Today I wondered how long the optimized search program would take on an Arduino Due.

The posted C program used a 64-bit type, but I also had a 32-bit version with the same performance. I ported that version, which was not difficult. In fact, measuring microseconds is much easier on Arduino than on Linux, although formatted printing is worse. This is the Serial Monitor output generated by the Arduino Due:
Code: [Select]

 47| 29|101|
113| 59|  5|
 17| 89| 71|

548us


So for this non-trivial code the Intel 2.8GHz single core is less than 92x faster than the Arduino Due (548μs vs. 6μs)!
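
The factor of 92 follows from the two measured times alone. A minimal sketch in plain C; the helper name speedup_upper_bound() is made up for this illustration:

```c
/* Smallest integer k with k * t_fast_us > t_slow_us,
   i.e. the fast machine is less than k times faster. */
static unsigned speedup_upper_bound(unsigned t_slow_us, unsigned t_fast_us) {
    return t_slow_us / t_fast_us + 1;
}
```

With the numbers from this post, speedup_upper_bound(548, 6) gives 92, because 92 x 6μs = 552μs is the first multiple of 6μs above 548μs. The 6μs figure is itself rounded, so the bound is only as precise as that measurement.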

The Arduino preprocessor could not deal with a multi-line #define, so I had to put the forall_odd_primes_less_than() macro on a single line. This is the complete code (also attached).
Code: [Select]
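// Bitmap of the primes below 128: bit i, counted from the most significant bit of B[0], is set if i is prime.
uint32_t B[]={0x35145105,0x4510414,0x11411040,0x45144001};

// True if i (0..127) is marked prime in the bitmap.
#define Prime(i) ((B[(i)>>5] & (0x80000000UL >> ((i)%32))) != 0)

// Runs block for every odd prime p with 3 <= p < m. Kept on a single line for the Arduino preprocessor.
#define forall_odd_primes_less_than(p, m, block)  for((p)=3; (p)<(m); (p)+=2) if (Prime((p))) block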

void setup() {
  Serial.begin(9600);
  
  uint8_t p,a,b,c,d;
  unsigned long t0 = micros();

  forall_odd_primes_less_than(p, 64,
    forall_odd_primes_less_than(a, p,
      if Prime(2*p-a)
      {
        forall_odd_primes_less_than(b, p,
          if ( (b!=a) && Prime(2*p-b) )
          {
            c= 3*p - (a+b);

            if ( (c<2*p) && (2*p-c!=a) && (2*p-c!=b) && Prime(c) && Prime(2*p-c) )
            {
              if (2*a+b>2*p)
              {
                d = 2*a + b - 2*p;   // 3*p - (3*p-(a+b)) - (2*p-a)

                if ( (d!=a) && (d!=b) && (d!=2*p-c) && Prime(d) && Prime(2*p-d) )
                {
                  unsigned long t1 = micros();
                  print3(a); print3(b); print3(c); Serial.println();
                  print3(2*p-d); print3(p); print3(d); Serial.println();
                  print3(2*p-c); print3(2*p-b); print3(2*p-a); Serial.println();
                  Serial.println();
                  Serial.print(t1-t0);
                  Serial.println("us");
                }
              }
            }
          }
        )
      }
    )
  )
}

void loop() {}

void print3(int n) {
  if (n<10) Serial.print("  ");
  else if (n<100) Serial.print(" ");
  Serial.print(n);
  Serial.print("|");
}
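
The four constants in B[] are easier to trust once you see how they can be regenerated. The following is a sketch in plain C for a desktop compiler, not part of the sketch above; build_prime_bitmap() and bitmap_matches_sieve() are names I made up. It rebuilds the bitmap with a sieve of Eratosthenes, using the same bit order as Prime(), and compares the result with the constants.

```c
#include <stdint.h>
#include <string.h>

/* Constants and macro copied from the sketch above. */
static const uint32_t B[] = {0x35145105, 0x4510414, 0x11411040, 0x45144001};
#define Prime(i) ((B[(i) >> 5] & (0x80000000UL >> ((i) % 32))) != 0)

/* Fill out[0..3] with a prime bitmap for 0..127 using a sieve of
   Eratosthenes, most significant bit first within each 32-bit word. */
static void build_prime_bitmap(uint32_t out[4]) {
    uint8_t composite[128] = {0};
    memset(out, 0, 4 * sizeof out[0]);
    for (unsigned n = 2; n < 128; n++) {
        if (composite[n]) continue;              /* n was crossed out earlier */
        out[n >> 5] |= 0x80000000UL >> (n % 32); /* mark n as prime */
        for (unsigned m = n * n; m < 128; m += n) composite[m] = 1;
    }
}

/* Returns 1 if the sieve reproduces the four constants in B, else 0. */
static int bitmap_matches_sieve(void) {
    uint32_t fresh[4];
    build_prime_bitmap(fresh);
    return memcmp(fresh, B, sizeof fresh) == 0;
}
```

bitmap_matches_sieve() returns 1 for the constants above. The bitmap covers 0..127, which is enough here because the search only looks at p below 64 and tests values below 2*p.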


Hermann.
https://forum.arduino.cc/index.php?topic=462107.msg3236016#msg3236016
http://stamm-wilbrandt.de/en/Raspberry_camera.html

HermannSW

I just realized that the Due compilation was done with -Os, so I changed that to -O3 in "~/.arduino15/packages/arduino/hardware/sam/1.6.4/platform.txt":
Code: [Select]
$ diff platform.txt.orig platform.txt
22c22
< compiler.c.flags=-c -g -Os {compiler.warning_flags} -ffunction-sections -fdata-sections -nostdlib --param max-inline-insns-single=500 -Dprintf=iprintf -MMD
---
> compiler.c.flags=-c -g -O3 {compiler.warning_flags} -ffunction-sections -fdata-sections -nostdlib --param max-inline-insns-single=500 -Dprintf=iprintf -MMD
24c24
< compiler.c.elf.flags=-Os -Wl,--gc-sections
---
> compiler.c.elf.flags=-O3 -Wl,--gc-sections
27c27
< compiler.cpp.flags=-c -g -Os {compiler.warning_flags} -ffunction-sections -fdata-sections -nostdlib -fno-threadsafe-statics --param max-inline-insns-single=500 -fno-rtti -fno-exceptions -Dprintf=iprintf -MMD
---
> compiler.cpp.flags=-c -g -O3 {compiler.warning_flags} -ffunction-sections -fdata-sections -nostdlib -fno-threadsafe-statics --param max-inline-insns-single=500 -fno-rtti -fno-exceptions -Dprintf=iprintf -MMD
$
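
A possible alternative that leaves platform.txt untouched is GCC's optimize pragma, placed at the top of the source file. This is only a sketch and an assumption on my part: I have not measured whether it reproduces the -O3 timing, and it affects only functions defined after the pragma in that one file, so the Arduino core and libraries are still built with the flags from platform.txt. The function count_odd_below() is a hypothetical stand-in for the real hot loop.

```c
/* Ask GCC to compile the functions defined below this line at -O3,
   even if -Os was given on the command line. */
#pragma GCC optimize ("O3")

/* Hypothetical stand-in for the sketch's hot loop: counts the odd
   numbers p with 1 <= p < m. */
static unsigned count_odd_below(unsigned m) {
    unsigned n = 0;
    for (unsigned p = 1; p < m; p += 2) n++;
    return n;
}
```

Editing platform.txt, as in the diff above, changes the flags for every sketch and for the core itself, so the two approaches are not equivalent.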


The sketch size increased from 11,624 bytes to 13,548 bytes, which is still only 2% of the 512KB of flash available on the Due.

The runtime decreased from 548μs to 494μs, so now 83 x T_Intel_2.8GHz > T_Arduino_Due:
Code: [Select]

 47| 29|101|
113| 59|  5|
 17| 89| 71|

494us
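
Wall-clock time mixes clock rate and work per cycle, so it can help to convert both measurements to clock cycles. A rough sketch in plain C; the helper name cycles() is made up, and the 84MHz clock of the Due is an assumption not stated in this thread:

```c
#include <stdint.h>

/* Elapsed clock cycles for a run of t_us microseconds at f_mhz megahertz
   (microseconds times cycles per microsecond). */
static uint32_t cycles(uint32_t t_us, uint32_t f_mhz) {
    return t_us * f_mhz;
}
```

With these inputs, cycles(494, 84) is 41496 for the Due and cycles(6, 2800) is 16800 for the Intel core. Since the 6μs figure is rounded, the Intel count is approximate, but per clock cycle the gap looks closer to 2.5x than to 83x.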


Hermann.
