At the end of last year I played a lot with my 2.2" LCD display. Most graphics libraries were slow, but one, named "ILI9341_due", was very fast. It was much faster than the others even on non-Due boards, and a YouTube video showed it to be faster still with the Due's DMA and 32-bit processor. That was the first trigger for me to order a Due; another was the 96KB of RAM, 48 times that of the Uno/Nano.
I learned a lot from that library and then wrote my own screenshotToFat() for the ILI9341_due library:
https://forum.arduino.cc/index.php?topic=357013.0
There I used a 960-byte array (out of the Nano's 2KB of memory) to store a single 320-pixel display line before writing it to SD. Of course, having 96KB allows that solution to be sped up drastically as well.
Another reason was that I want to do video processing with a $7 Jtron OV7670 300KP VGA camera, and the memory and speed of the Due should help a lot compared to the Nano/Uno.
I first ordered a $12 Due clone from China and was not able to get it to work. To be sure of getting a working one, I ordered another two $12 Dues from a different supplier. Later I realized that I had to install the Due board in the Arduino IDE first, and now I have three Arduino Dues working well.
I was able to get two 15V motors to run at 14.37m/s or 51.7km/h in my Motor Test Station:
And the 9.8m/s motors were able to move an Uno-driven robot at 3.1m/s in a straight line:
Looking at the Due board, it may become my new robot platform by just attaching 2 motors and half a table tennis ball.
OK, before working with the camera I wanted to get a feeling for Due vs. Nano performance.
I wrote a small test program that just executes a lot of "--k;" statements inside a while loop:
// loop with 2^14=16384 decrements of volatile variable "k"
while (a<m) { D(D(D(D(D(D(D(D(D(D(D(D(D(D(--k;)))))))))))))) }
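To double-check that 14 nested uses of the duplication macro really expand to 2^14 statements, here is a minimal host-side sketch (plain C++, not Arduino code). The helper name count_decrements is made up for illustration, and "volatile" is dropped because only the count matters here, not the timing.

```cpp
#include <cassert>

// Same duplication macro as in the sketch: emits its argument twice.
#define D(stmts) stmts stmts

// Hypothetical helper: runs the 14-fold nested macro exactly once and
// returns how many "--k;" statements were executed.
long count_decrements() {
    long k = 0;
    D(D(D(D(D(D(D(D(D(D(D(D(D(D(--k;))))))))))))))
    assert(k < 0);  // at least one decrement must have happened
    return -k;      // k went from 0 down to -(number of decrements)
}
```

Each nesting level doubles the statement count, so one pass through the while loop body performs 16384 decrements before the loop condition is checked again.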
And I triggered an interrupt service routine from an external real-time clock chip at different frequencies:
// ISR: take new measurement
void fcnt(void) { if (a<m) A[a++]=k; }
Because variables "a" and "k" are modified/accessed from both the while loop and the ISR, both have to be declared "volatile". This has a good side effect for measurements: on each single access the variable has to be read from memory, modified, and stored to memory again, without the compiler optimizing any of that away.
I played with the RTC chip last year and knew how to get 1024/4096/8192/32768 Hz frequencies to trigger Arduino interrupts:
The maximal number of "--k"s between two interrupts was 13572, so I put the next higher power of 2 (2^14=16384) decrement operations inside the while loop (using the handy "D" statement duplication macro) in order to get some runs that do not see the overhead of end-of-while-loop processing.
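The step from 13572 to 16384 can be sketched as a small host-side calculation. The function names next_pow2 and nesting_depth are my own, chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= n (valid for 0 < n <= 2^31).
std::uint32_t next_pow2(std::uint32_t n) {
    assert(n > 0 && n <= (std::uint32_t{1} << 31));
    std::uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Number of nested D(...) levels needed to emit at least n statements,
// since every level doubles the statement count.
unsigned nesting_depth(std::uint32_t n) {
    assert(n > 0 && n <= (std::uint32_t{1} << 31));
    unsigned depth = 0;
    while ((std::uint32_t{1} << depth) < n) ++depth;
    return depth;
}
```

For the measured maximum of 13572, next_pow2 gives 16384 and nesting_depth gives 14, which matches the 14 nested D's in the loop.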
The Arduino Nano's processor is the 8-bit ATmega328 (its spec has 660 pages). The Due has a 32-bit AT91SAM3X8E processor (its spec has 1459 pages).
The type of variable "k" (int/long) makes a difference on the Nano, but not on the Due (int and long are both 32-bit on the Due).
OK, here is the basis for the analysis:
Let's start with the Nano and an "int" type variable "k". Column A lists the different test frequencies used. Column B lists the maximal number of decrements reported between the 100 interrupts triggered. Column C is the product of A and B and shows the total number of decrements per second. The numbers are quite different for different frequencies, and the reason is the different number of interrupts per second, each with its own overhead. In E2 the cost of a single interrupt, measured in decrements, is computed from the 24572 additional interrupts between C2 and C3 (11.66); it is then added, multiplied by the respective frequency, to give the adjusted values in column F. These values are nearly identical and give the number of decrements of a volatile int variable per second. Dividing 16,000,000 (the Nano's CPU frequency) by these values gives values around 10.
Now I did a cross check and enabled verbose compilation output in the Arduino IDE in order to see the exact command line used to compile the sketch. I removed the "-o blah.o" part and added "-S" to get the generated assembly. I did this again with just one more "--k;", and the only difference between the two assembler files is this:
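The spreadsheet adjustment can be written down as a small host-side sketch. It assumes the simple model that every interrupt costs a fixed number of "decrement equivalents"; the struct and function names are made up for illustration, and the numbers in the usage note are invented, not my measurements.

```cpp
#include <cassert>

// One spreadsheet row: interrupt frequency (column A) and the maximal
// number of decrements seen between two interrupts (column B).
struct Row {
    double freq_hz;
    double max_decrements;
};

// Column C: decrements per second as measured, interrupt overhead not included.
double raw_rate(const Row& r) { return r.freq_hz * r.max_decrements; }

// Column E: cost of one interrupt in units of decrements, derived from
// two rows measured at different frequencies.
double overhead_per_interrupt(const Row& lo, const Row& hi) {
    assert(hi.freq_hz != lo.freq_hz);
    return (raw_rate(lo) - raw_rate(hi)) / (hi.freq_hz - lo.freq_hz);
}

// Column F: raw rate plus the decrements "lost" to interrupts each second.
double adjusted_rate(const Row& r, double overhead) {
    return raw_rate(r) + r.freq_hz * overhead;
}
```

With the invented rows {1024, 1500} and {4096, 366}, the overhead comes out as 12 decrements per interrupt and both adjusted rates as 1,548,288 decrements per second. Dividing the CPU frequency by such an adjusted rate gives the cycles per decrement.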
> lds r24,k
> lds r25,k+1
> sbiw r24,1
> sts k+1,r25
> sts k,r24
int 2+2+2+2+2=10
Looking up the cycles per assembler instruction in the spec confirms that a single "--k;" takes 10 clock cycles.
I did the same evaluation for a long type variable k on the Nano (rows 7-10), and the values in column H are all very near 20. Generating the assembly as above showed this diff for a single "--k;":
> lds r24,k
> lds r25,k+1
> lds r26,k+2
> lds r27,k+3
> sbiw r24,1
> sbc r26,__zero_reg__
> sbc r27,__zero_reg__
> sts k,r24
> sts k+1,r25
> sts k+2,r26
> sts k+3,r27
long 2+2+2+2+2+1+1+2+2+2+2=20
Again, looking up the instructions in the spec confirms 20 clock cycles.
Finally I determined the numbers for the Arduino Due as well (rows 12-15). There are no cycle counts listed in the AT91SAM3X8E spec; I assume the reason is the processor's 3-stage pipeline. So column H (84,000,000 divided by the adjusted decrement count) shows roughly 6 clock cycles per "--k", but maybe the individual instructions have counts with a bigger sum. This is the assembly diff for the Due:
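The two cycle sums can be reproduced with a tiny lookup, sketched here as host-side C++. The table only covers the four AVR instructions that appear in the diffs above (lds, sts and sbiw at 2 cycles, sbc at 1 cycle); the function names are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Clock cycles for the AVR instructions used in the two diffs.
// Only these four mnemonics are covered; anything else trips the assert.
int cycles_for(const std::string& mnemonic) {
    if (mnemonic == "lds" || mnemonic == "sts" || mnemonic == "sbiw") return 2;
    if (mnemonic == "sbc") return 1;
    assert(false && "mnemonic not covered by this sketch");
    return 0;
}

// Sum of the cycle counts for an instruction sequence.
int total_cycles(const std::vector<std::string>& sequence) {
    int sum = 0;
    for (const std::string& m : sequence) sum += cycles_for(m);
    return sum;
}
```

Feeding in the int sequence (lds, lds, sbiw, sts, sts) gives 10, and the long sequence (four lds, sbiw, two sbc, four sts) gives 20.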
> ldr r2, [r3]
> subs r2, r2, #1
> str r2, [r3]
Column I shows the processing time of a single decrement in microseconds in order to make the values comparable. The Nano time for long (1.25μs) is double the time for int (0.63μs), and even the int time is still a factor of 9 higher than that of the Due (0.07μs).
This whole comparison was done for a decrement operation on a volatile variable, and factors other than 9 (int) or 18 (long) are likely for other operations on Nano vs. Due, but it confirms at least that the Due is "much" faster than the Nano/Uno.
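The column I values follow directly from cycles divided by clock frequency. A minimal host-side sketch, where the function name micros_per_op is made up and the 6 cycles for the Due is the rough measured value from above, not a datasheet figure:

```cpp
#include <cassert>

// Time for one operation in microseconds, given its cost in clock cycles
// and the CPU clock frequency in Hz.
double micros_per_op(double cycles, double cpu_hz) {
    assert(cpu_hz > 0.0);
    return cycles / cpu_hz * 1e6;
}
```

micros_per_op(10, 16e6) is 0.625μs (Nano, int), micros_per_op(20, 16e6) is 1.25μs (Nano, long), and micros_per_op(6, 84e6) is about 0.071μs (Due), so the unrounded ratios are 8.75 and 17.5.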
Here is the whole sketch for completeness:
// needed for RTC.set(_, _), see bottom as well
#include <Wire.h>
#include <Time.h>
#include <DS1307RTC.h>
// from http://www.pjrc.com/teensy/td_libs_DS1307RTC.html
// duplicate passed statements
#define D(stmts) stmts stmts
typedef long num;
// typedef int num;
const int m=100; // measurement count
num A[m]; // measurements
volatile int a=0; // next measurement index
volatile num k=10000000; // counter decremented in the while loop, sampled by the ISR
// ISR: take new measurement
void fcnt(void) { if (a<m) A[a++]=k; }
void setup(void) {
  Serial.begin(57600);
  /*
     enable 1kHz SQW square wave (is disabled on DS3231 powerup)
     http://datasheets.maximintegrated.com/en/ds/DS3231.pdf#page=13
  */
  RTC.set(0x0E, 0x08);
  attachInterrupt(digitalPinToInterrupt(2), fcnt, FALLING);
  // loop with 2^14=16384 decrements of volatile variable "k"
  while (a<m) { D(D(D(D(D(D(D(D(D(D(D(D(D(D(--k;)))))))))))))) }
  // output number of variable decrements between two interrupts
  for(k=1; k<a; ++k) Serial.println(A[k-1]-A[k]);
}
void loop(void) { }
/*
added to DS1307RTC.h:
static bool set(uint8_t reg, uint8_t val);
and to DS1307RTC.cpp:
bool DS1307RTC::set(uint8_t reg, uint8_t val)
{
  Wire.beginTransmission(DS1307_CTRL_ID);
  Wire.write(reg); // reset register pointer
  Wire.write(val);
  if (Wire.endTransmission() != 0) {
    return false;
  }
  return true;
}
*/
Hermann.