Arduino Due vs. Nano performance

At the end of last year I played a lot with my 2.2" LCD display. Most graphics libraries were slow, but one, named "ILI9341_due", was very fast. It was much faster than the others even on non-Due boards, and videos on YouTube showed it to be faster still with the Due's DMA and 32-bit CPU. This was the first trigger for me to order a Due; another was the 48x larger RAM (96 KB) compared to Uno/Nano.

I learned a lot from that library and then wrote my own screenshotToFat() for ILI9341_due library:
https://forum.arduino.cc/index.php?topic=357013.0

There I used a 960-byte array (out of the Nano's 2 KB of memory) to store a single 320-pixel display line before writing it to SD. Of course, having 96 KB allows a drastic speedup of that solution as well.

Another reason was that I want to do video processing with a $7 Jtron OV7670 300KP VGA camera, and the memory and speed of the Due should help a lot compared to Nano/Uno.

I first ordered a $12 Due clone from China and was not able to get it to work. To be sure of getting a working one, I ordered another two $12 Dues from a different supplier. Later I realized that I had to install the Due board in the Arduino IDE first, and now I have three Arduino Dues working well :wink:

I was able to get two 15V motors run at 14.37m/s or 51.7km/h in my Motor Test Station:

And the 9.8m/s motors were able to move a Uno driven robot with 3.1m/s linearly:

Looking at the Due board, it may become my new robot platform just by attaching 2 motors and half a table tennis ball.

OK, before working with the camera I wanted to get a feeling of Due vs. Nano performance.

I wrote a small test program that does nothing but execute a lot of "--k;" statements inside a while loop:

// loop with 2^14=16384 decrements of volatile variable "k" per pass
  while (a<m)  { D(D(D(D(D(D(D(D(D(D(D(D(D(D(--k;)))))))))))))) }

And I did trigger an Interrupt Service routine from external real time chip at different frequencies:

// ISR: take new measurement
void fcnt(void)  { if (a<m)  A[a++]=k; }

The variables "a" and "k" are accessed from both the while loop and the ISR, which requires declaring both as "volatile". This has a good side effect for the measurements: the variable has to be read from memory, modified and stored to memory again each single time, without these accesses being optimized away.

I did play with the RTC chip last year and knew how to get 1024/4096/8192/32768 Hz frequencies to trigger Arduino interrupts:

The maximal number of "--k"s between two interrupts was 13572, so I put the next higher power of 2 (2^14=16384) decrement operations inside the while loop (using the handy "D" statement duplication macro) in order to get some runs that do not see the overhead of end-of-while-loop processing.
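A minimal host-side sketch of how the nested "D" macro unrolls, assuming an ordinary C++ compiler instead of the Arduino toolchain. The function name decrementsPerPass is made up for illustration; the macro and the nesting depth are the ones from the sketch.

```cpp
#include <cassert>

// Same duplication macro as in the sketch: emits its argument twice.
#define D(stmts) stmts stmts

// Executes one pass of the unrolled loop body and returns how many
// "--k;" statements ran. 14 nested D() calls give 2^14 copies.
long decrementsPerPass() {
  volatile long k = 0;
  D(D(D(D(D(D(D(D(D(D(D(D(D(D(--k;))))))))))))))
  long n = -k;      // k went from 0 down to -n
  assert(n > 0);    // at least one decrement must have been emitted
  return n;
}
```

Each additional level of D() doubles the number of statements, so 14 levels yield 16384 decrements per pass through the while loop.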

The Arduino Nano's processor is the 8-bit ATmega328, here is its 660-page spec. The Due has a 32-bit AT91SAM3X8E processor, here is its 1459-page spec.

The type of variable "k" (int/long) does make a difference for the Nano, but not for the Due (int and long on Due are 32bit).

OK, here is the basis for the analysis:

Let's start with the Nano and "int" type variable "k". Column A lists the different test frequencies used. Column B lists the maximal number of decrements reported between the 100 interrupts triggered. Column C is the product of A and B and shows the total number of decrements per second. The numbers are quite different for different frequencies, and the reason for that is the different number of interrupts, each with its overhead, per second. In E2 the cost of a single interrupt, expressed in decrements, is computed from the 24572 additional interrupts between C2 and C3 (11.66), and added with the correct factor to give the adjusted values in column F. These values are nearly identical, and give the number of decrements of a volatile int variable per second. Dividing 16,000,000 (Nano CPU frequency) by these values gives values around 10.
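The spreadsheet arithmetic can be sketched as follows. This is my reading of the columns, not the original spreadsheet formulas: the struct Sample and the function names are hypothetical, and the model assumes every interrupt costs the same fixed number of decrements.

```cpp
#include <cassert>

// One spreadsheet row: interrupt frequency (column A) and the
// maximal decrement count between two interrupts (column B).
struct Sample {
  double hz;
  double maxCount;
};

// Cost of one interrupt in units of decrements (the E2 value):
// the drop in decrements per second divided by the extra interrupts.
double interruptCost(Sample lo, Sample hi) {
  assert(hi.hz != lo.hz);
  double rateLo = lo.hz * lo.maxCount;  // column C for the lower frequency
  double rateHi = hi.hz * hi.maxCount;  // column C for the higher frequency
  return (rateLo - rateHi) / (hi.hz - lo.hz);
}

// Adjusted decrements per second (column F): add back the overhead
// of all interrupts that happened during one second.
double adjustedRate(Sample s, double cost) {
  return s.hz * (s.maxCount + cost);
}

// Clock cycles per decrement (column H).
double cyclesPerDecrement(double cpuHz, double adjRate) {
  assert(adjRate > 0);
  return cpuHz / adjRate;
}
```

With two rows of the table as input, interruptCost corresponds to cell E2, adjustedRate to column F, and cyclesPerDecrement with 16000000 as cpuHz to the "around 10" result.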

Next I did a cross check and enabled verbose compilation output in the Arduino IDE in order to see the exact command line used to compile the sketch. I removed the "-o blah.o" part and added "-S" to get the generated assembly. I did this a second time with one additional "--k;", and the only difference between the two assembler files is this:

>       lds r24,k
>       lds r25,k+1
>       sbiw r24,1
>       sts k+1,r25
>       sts k,r24
int 2+2+2+2+2=10

Looking up the cycles per assembler statement in spec confirms that a single "--k;" does take 10 clock cycles.

I did the same evaluation for long type variable k on Nano (rows 7-10), and the values in column H are all very near to 20. Doing the same assembler generation as above showed this diff for a single "--k;":

>       lds r24,k
>       lds r25,k+1
>       lds r26,k+2
>       lds r27,k+3
>       sbiw r24,1
>       sbc r26,__zero_reg__
>       sbc r27,__zero_reg__
>       sts k,r24
>       sts k+1,r25
>       sts k+2,r26
>       sts k+3,r27
long 2+2+2+2+2+1+1+2+2+2+2=20

Again looking up the commands in spec confirms 20 clock cycles.

Finally I determined the numbers for the Arduino Due as well (rows 12-15). There are no cycle counts listed in the AT91SAM3X8E spec; I assume the reason is the 3-stage pipeline of the processor. Column H (84,000,000 divided by the adjusted decrement count) shows roughly 6 clock cycles per "--k", but the individual instructions may have cycle counts with a bigger sum. This is the assembly diff for the Due:

>       ldr     r2, [r3]
>       subs    r2, r2, #1
>       str     r2, [r3]

Column I shows the processing time of a single decrement in microseconds in order to make the values comparable. The Nano time for long (1.25μs) is double the time for int (0.63μs), and that is still a factor 9 higher than that of a Due (0.07μs).
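The conversion behind column I is just cycles divided by clock frequency. A small sketch, with microsPerDecrement as a made-up helper name and the cycle counts (10, 20, roughly 6) taken from the text above:

```cpp
#include <cassert>

// Time of one decrement in microseconds, given its cost in clock
// cycles and the CPU clock in Hz.
double microsPerDecrement(double cycles, double cpuHz) {
  assert(cpuHz > 0);
  return cycles / cpuHz * 1e6;
}
```

Under these inputs, 10 cycles at 16 MHz give 0.625 μs, 20 cycles give 1.25 μs, and 6 cycles at 84 MHz give about 0.071 μs, so the int case on the Nano comes out roughly 9 times slower than the Due.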

This whole comparison was done for a decrement operation on a volatile variable, and other operations will likely show factors different from 9 (int) or 18 (long) for Nano vs. Due, but this confirms at least that Due is "much" faster than Nano/Uno.

Here is the whole sketch for completeness:

// needed for RTC.set(_, _), see bottom as well
#include <Wire.h>
#include <Time.h>
#include <DS1307RTC.h>
// from http://www.pjrc.com/teensy/td_libs_DS1307RTC.html


// duplicate passed statements
#define D(stmts) stmts stmts

typedef long num;
// typedef int num;

const int m=100;         // measurement count
num A[m];                // measurements
volatile int a=0;        // next measurement index
volatile num k=10000000; // counter decremented in the while loop

// ISR: take new measurement
void fcnt(void)  { if (a<m)  A[a++]=k; }


void setup(void) {
  Serial.begin(57600);
  
/*
   enable 1kHz SQW square wave (is disabled on DS3231 powerup)
   http://datasheets.maximintegrated.com/en/ds/DS3231.pdf#page=13
*/
  RTC.set(0x0E, 0x08); 
  
  attachInterrupt(digitalPinToInterrupt(2), fcnt, FALLING);

  // loop with 2^14=16384 decrements of volatile variable "k" per pass
  while (a<m)  { D(D(D(D(D(D(D(D(D(D(D(D(D(D(--k;)))))))))))))) } 
    
  // output number of variable decrements between two interrupts
  for(k=1; k<a; ++k)  Serial.println(A[k-1]-A[k]); 
}

void loop(void) { }


/*
added to DS1307RTC.h:
    static bool set(uint8_t reg, uint8_t val);

and to DS1307RTC.cpp:
bool DS1307RTC::set(uint8_t reg, uint8_t val)
{
  Wire.beginTransmission(DS1307_CTRL_ID);
  Wire.write(reg); // reset register pointer  
  Wire.write(val) ;   
  if (Wire.endTransmission() != 0) {
    return false;
  }
  return true;
}
*/

Hermann.

Interesting that it shows the Sam to be faster (in this testcase) than the AT328 per clock cycle.

If you're interested, this is how one may overclock the Due to 100mhz+

#if 1
   #define SYS_BOARD_PLLAR (CKGR_PLLAR_ONE | CKGR_PLLAR_MULA(18UL) | CKGR_PLLAR_PLLACOUNT(0x3fUL) | CKGR_PLLAR_DIVA(1UL))
    #define SYS_BOARD_MCKR ( PMC_MCKR_PRES_CLK_2 | PMC_MCKR_CSS_PLLA_CLK)
           
    /* Set FWS according to SYS_BOARD_MCKR configuration */
    EFC0->EEFC_FMR = EEFC_FMR_FWS(4); //4 waitstate flash access
    EFC1->EEFC_FMR = EEFC_FMR_FWS(4);

    /* Initialize PLLA to 114MHz */
    PMC->CKGR_PLLAR = SYS_BOARD_PLLAR;
    while (!(PMC->PMC_SR & PMC_SR_LOCKA)) {}

    PMC->PMC_MCKR = SYS_BOARD_MCKR;
    while (!(PMC->PMC_SR & PMC_SR_MCKRDY)) {}
#endif

Taken from AvrFreaks

"but this confirms at least that Due is "much" faster than Nano/Uno."

Seems like a ridiculous amount of time and effort to prove something that is pretty blatantly obvious from simply looking at a few high-level specs for the two chips. Both are RISC processors, so to first order, simply comparing clock rates will give you the worst-case comparison: 84 MHz / 16 MHz => Due is ~5X faster at a minimum. Next, Due is 32-bit, while the AVR is 8-bit. That, combined with the Due's hardware multiplier and barrel shifter, is good for another ~2X on most code, making the Due ~10X faster than an AVR.

Regards,
Ray L.

Ray,
Rather than simply stating so, Hermann has actually quantified that difference, or theoretical advantage, as it were.

Ray,

first, I did all the experiments for myself, and I wanted to get a real understanding of what is going on. In the process I learned how to get the assembly code. And I had no idea why column C (Hz*max) varied so much until I thought of the cost of the "Hz" many interrupts per second. Normally Arduino programs can be reproduced down to microseconds, so I am still not perfectly OK with the "roughly" 10 or "roughly" 20 cycles measured. But for getting a basic understanding the experiments were good enough.

If this is all clear to you that is fine, but I posted it here because some of the techniques may be useful for others as well. I did want to know "a" (decrement of int/long volatile variable) factor exactly and not only ~2 or ~5.

I do not have an oscilloscope at home, but 10 years ago I had some at the IBM Böblingen lab and did analysis of side channel attacks on smartcards (we developed smartcard OSes at that time). It was beautiful to be able to see (crypto) computations repeat identically down to nanoseconds, and I really like doing something similar with Arduino down to microseconds (or nanoseconds on the Due). Smartcards have a hardware random number generator, and turning it on and off gives a huge peak on an oscilloscope measuring the card reader voltage. I added these peaks (temporarily) at different locations in our code and was able to debug code on the scope. I will definitely try some Arduino based scope sketches in the future.

Okio:
... If you're interested, this is how one may overclock the Due to 100mhz+ ...

Thanks for the pointer, but I will not do that right now. With the camera I will most likely not use full VGA resolution but rather QVGA or even QQVGA (160x120), because such a frame fits into Due memory completely even with 2 bytes per pixel (38400 bytes). Once I have that under control, I will try to see what overclocking the Due might achieve fps-wise (e.g. for camera->TFT).

Okio:
Interesting that it shows the Sam to be faster (in this testcase) than the AT328 per clock cycle.

If you're interested, this is how one may overclock the Due to 100mhz+

Taken from [AvrFreaks](http://www.avrfreaks.net/forum/running-due-1mbaud-and-25m-baud-solved?page=all)

Ok, I will bite. I edited the 'system_sam3xa.c' file as outlined above, it doesn't appear to make ANY difference.... What am I missing?

4.8.3-2014q1/bin/arm-none-eabi-g++ -c -g -Os -w -ffunction-sections -fdata-sections -nostdlib -fno-threadsafe-statics --param max-inline-insns-single=500 -fno-rtti -fno-exceptions -Dprintf=iprintf -MMD -mcpu=cortex-m3 -DF_CPU=84000000L -DARDUINO=10605 -DARDUINO_SAM_DUE -DARDUINO_ARCH_SAM -D__SAM3X8E__ -mthumb -DUSB_VID=0x2341 -DUSB_PID=0x003e -DUSBCON -DUSB_MANUFACTURER="Unknown" -DUSB_PRODUCT="Arduino Due" -IC:\Users\Graham\AppData\Roaming\Arduino15\packages\arduino\hardware\sam\1.6.4\system/libsam

Does the 'F_CPU' need changing or is it pretty meaningless too?

I would love to get this working but since there is so little on google besides the same guy posting here and on sparkfun with no explanation or examples, I remain unconvinced it does anything......

Regards,

Graham

Recursive search in "~/.arduino15" directory

$ find ~/.arduino15 -type f -exec grep -nH SYS_BOARD_PLLAR {} \;

finds many files; it seems that

~/.arduino15/packages/arduino/hardware/sam/1.6.4/system/CMSIS/Device/ATMEL/sam3xa/source/system_sam3xa.c

is the Due one.

The main difference is "CKGR_PLLAR_MULA(18UL)" instead of "CKGR_PLLAR_MULA(0xdUL)" in definition of SYS_BOARD_PLLAR in that file.

From .arduino15/packages/arduino/hardware/sam/1.6.4/system/CMSIS/Device/ATMEL/sam3xa/include/component/component_pmc.h:

...
#define CKGR_PLLAR_MULA_Pos 16
#define CKGR_PLLAR_MULA_Msk (0x7ffu << CKGR_PLLAR_MULA_Pos) /**< \brief (CKGR_PLLAR) PLLA Multiplier */
#define CKGR_PLLAR_MULA(value) ((CKGR_PLLAR_MULA_Msk & ((value) << CKGR_PLLAR_MULA_Pos)))
...

Perhaps the changes from Okio need to be done in system_sam3xa.c ...

From http://www.atmel.com/images/atmel-11057-32-bit-cortex-m3-microcontroller-sam3x-sam3a_datasheet.pdf#page=519:
PLLACK is the output of the Divider and 96 to 192 MHz programmable PLL (PLLA).
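How the MULA value maps to the CPU clock can be sketched like this. It assumes the Due's 12 MHz main crystal as PLLA input and the datasheet relation PLLA output = input * (MULA + 1) / DIVA, followed by the master clock prescaler (2 for PMC_MCKR_PRES_CLK_2). The function names are hypothetical and not part of libsam.

```cpp
#include <cassert>

// Assumption: 12 MHz main crystal feeding PLLA, as on the Arduino Due.
constexpr double kMainClockHz = 12e6;

// PLLA output frequency for the given MULA and DIVA register fields.
double pllaHz(unsigned mula, unsigned diva) {
  assert(diva > 0);  // DIVA = 0 would disable the PLL
  return kMainClockHz * (mula + 1) / diva;
}

// Master (CPU) clock after the PMC_MCKR prescaler.
double masterClockHz(unsigned mula, unsigned diva, unsigned prescaler) {
  assert(prescaler > 0);
  return pllaHz(mula, diva) / prescaler;
}

// Range quoted from the datasheet for the PLLA output.
bool pllaInDatasheetRange(double hz) {
  return hz >= 96e6 && hz <= 192e6;
}
```

Under these assumptions the stock MULA of 0xd gives 168 MHz at the PLL and 84 MHz at the CPU, while MULA 18 gives 228 MHz and 114 MHz. The 228 MHz PLL output lies above the 96 to 192 MHz range quoted above, which is what makes this an overclock.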

HermannSW:

~/.arduino15/packages/arduino/hardware/sam/1.6.4/system/CMSIS/Device/ATMEL/sam3xa/source/system_sam3xa.c

is the Due one.

The main difference is "CKGR_PLLAR_MULA(18UL)" instead of "CKGR_PLLAR_MULA(0xdUL)" in definition of SYS_BOARD_PLLAR in that file.

HermannSW:
Perhaps the changes from Okio need to be done in system_sam3xa.c ...

ghlawrence2000:
Ok, I will bite. I edited the 'system_sam3xa.c' file as outlined above, it doesn't appear to make ANY difference.... What am I missing?

.......

Regards,

Graham

Sorry that I overlooked your hint.

I did a complete build with verbose compile messages.
There was exactly one string "sam3x" in the whole output:
.arduino15/packages/arduino/hardware/sam/1.6.4/variants/arduino_due_x/libsam_sam3x8e_gcc_rel.a

From file libsam_sam3x8e_gcc_rel.a.txt in same directory:

...
system_sam3xa.o:
00000000 D SystemCoreClock
00000000 T SystemCoreClockUpdate
00000000 T SystemInit
00000000 T system_init_flash

...

And "SystemInit()" is exactly the function with the changes proposed by Okio.
And the .txt file confirms that sam3xa is indeed used.

So it seems you just have to trigger recompilation of "libsam_sam3x8e_gcc_rel.a" in order to see effects for any program compiled with the Arduino IDE.

I have not figured out yet how to rebuild "libsam_sam3x8e_gcc_rel.a", if you can provide instructions I could try Okio's changes and see whether the measured numbers go up (indicating overclocking to work).

Hermann.

Ok, so I am getting a little out of my depth here....

Hermann, are you familiar with or have you used 'Make' before?

I have 'arm-none-eabi-gcc' in my environment path, I am using Windows7 btw.

I have opened a command prompt in :- 'C:\Users\Graham\AppData\Roaming\Arduino15\packages\arduino\hardware\sam\1.6.4\variants\arduino_due_x\build_gcc'.

When I issue 'Make' this is what I get :-

------------------------------------------------------------------------------------
--- Making variant arduino_due_x
------------------------------------------------------------------------------------
-------------------------
--- Preparing variant arduino_due_x files in debug_arduino_due_x ../../../cores/arduino
-------------------------
make[1]: [create_output] Error 1 (ignored)
------------------------------------------------------------------------------------
/bin/sh: /arm-none-eabi-gcc: No such file or directory
make[1]: *** [debug_arduino_due_x/variant.o] Error 127
make: *** [arduino_due_x] Error 2

So I thought I was on the right track, and would try to be a little more clever and tried :-

'Make -IC:\Users\Graham\AppData\Roaming\Arduino15\packages\arduino\tools\arm-none-eabi-gcc'

But I get exactly the same error... Any ideas?

You are right, it does need 'libsam_sam3x8e_gcc_rel.a' to be rebuilt as I inserted a '#pragma message()' command into 'system_sam3xa.c' and that file is never compiled.

As I said, I am a little out of my depth here....

Regards,

Graham

Edit: I am going round in circles and starting to get annoyed........... is this more likely to work under the dreaded *nix? I would be even more out of my depth with *nix but at least the environment would be set up correctly... Some EXPERT help would be nice.....

Even more annoyed now, having spent past 3 hours trying to setup Ubuntu, I have failed to achieve anything which may contribute to fixing this.

Sleepy time....

Graham

Hi Graham,

I have been running only Linux for 9 years and don't miss anything (from Windows).

I saw compile errors similar to yours. The reason is that the environment variable ARM_GCC_TOOLCHAIN was not set, and therefore "/arm-none-eabi-gcc" was called in the root directory, where it does not reside.

Another recursive grep led me to the directory where the library can be rebuilt easily. These are the steps:

  • set environment variable ARM_GCC_TOOLCHAIN=~/.arduino15/packages/arduino/tools/arm-none-eabi-gcc/4.8.3-2014q1/bin/
  • cd ~/.arduino15/packages/arduino/hardware/sam/1.6.4/system/libsam/build_gcc
  • make clean
  • make libsam_sam3x8e_gcc_rel.a

I see a huge number of "packed attribute" warnings, but these files of interest get created (newly after "make clean"):

  • ../../../cores/arduino/libsam_sam3x8e_gcc_rel.a
  • ../../../cores/arduino/libsam_sam3x8e_gcc_rel.a.txt
  • ../../../system/libsam/build_gcc/release_sam3x8e/system_sam3xa.o

The order of the ".o" files in the ".txt" file is totally different from that in the "libsam_sam3x8e_gcc_rel.a.txt" used by the Arduino IDE. I did a "sed paragraph sort" (http://sed.sourceforge.net/sed1line.txt) on both files, which have roughly 550 lines each. The diff is only 200 lines (and not 1100), but that is still too much for me from a risk standpoint.

The good news is that the 3rd file of interest gets rebuilt, and its entries are identical in the Arduino IDE lib and the newly created lib:

system_sam3xa.o:
00000000 D SystemCoreClock
00000000 T SystemCoreClockUpdate
00000000 T SystemInit
00000000 T system_init_flash

What I will try next is to

  • make a copy of the old Arduino IDE lib
  • remove system_sam3xa.o from the lib
  • add the newly created 3rd file of interest to the lib

That should be the least risky solution; the tool to modify the lib is arm-none-eabi-ar.

Hermann.

OK, replacing "system_sam3xa.o" in "libsam_sam3x8e_gcc_rel.a" was easy.
And more importantly, flashing my program from first posting in this thread worked.
And it produced the same measurements as before (because I did not do any of Okio's changes yet).

$ cd ~/.arduino15/packages/arduino/hardware/sam/1.6.4/variants/arduino_due_x
$ ll
total 188
drwxrwxr-x. 4 stammw stammw   4096 Jan  9 11:17 build_gcc
drwxrwxr-x. 4 stammw stammw   4096 Apr 23  2015 debug_scripts
-rw-rw-r--. 1 stammw stammw 116884 Jan  9 12:49 libsam_sam3x8e_gcc_rel.a
-rw-rw-r--. 1 stammw stammw  15179 Jan  9 12:26 libsam_sam3x8e_gcc_rel.a.txt
drwxrwxr-x. 4 stammw stammw   4096 Apr 23  2015 linker_scripts
drwxrwxr-x. 2 stammw stammw   4096 Jan  9 12:36 old
-rw-rw-r--. 1 stammw stammw    823 Apr 23  2015 pins_arduino.h
-rw-rw-r--. 1 stammw stammw  23248 Apr 23  2015 variant.cpp
-rw-rw-r--. 1 stammw stammw   8263 Apr 23  2015 variant.h
$ export PATH=$PATH:~/.arduino15/packages/arduino/tools/arm-none-eabi-gcc/4.8.3-2014q1/bin
$ arm-none-eabi-ar -t libsam_sam3x8e_gcc_rel.a | grep system
system_sam3xa.o
$ arm-none-eabi-ar -x libsam_sam3x8e_gcc_rel.a system_sam3xa.o
$ ll system_sam3xa.o 
-rw-r--r--. 1 stammw stammw 1848 Jan  9 12:50 system_sam3xa.o
$ arm-none-eabi-ar -d libsam_sam3x8e_gcc_rel.a system_sam3xa.o
$ cp ~/.arduino15/packages/arduino/hardware/sam/1.6.4/system/libsam/build_gcc/release_sam3x8e/system_sam3xa.o .
$ ll system_sam3xa.o 
-rw-r--r--. 1 stammw stammw 1892 Jan  9 12:52 system_sam3xa.o
$ arm-none-eabi-ar -t libsam_sam3x8e_gcc_rel.a | grep system
$ arm-none-eabi-ar -q libsam_sam3x8e_gcc_rel.a system_sam3xa.o
$ arm-none-eabi-ar -t libsam_sam3x8e_gcc_rel.a | grep system
system_sam3xa.o
$

I did another test: I removed "system_sam3xa.o" from "libsam_sam3x8e_gcc_rel.a" and uploaded the "Blink" example. The upload worked fine, but nothing happened (because "SystemInit()" was missing). After re-adding "system_sam3xa.o" to "libsam_sam3x8e_gcc_rel.a", the Blink example does blink after uploading again.

I will stop here. Below diff shows the only real changes from Okio.
I did rebuild "system_sam3xa.o" and replaced the one in "libsam_sam3x8e_gcc_rel.a".
Now uploading simple Blink example does not blink.

We need Okio to clarify what he did, and where and how.

$ diff system_sam3xa.c.orig system_sam3xa.c
31c31
< 							| CKGR_PLLAR_MULA(0xdUL) \
---
> 							| CKGR_PLLAR_MULA(18UL) \
79c79
< 	PMC->PMC_MCKR = (SYS_BOARD_MCKR & ~PMC_MCKR_CSS_Msk) | PMC_MCKR_CSS_MAIN_CLK;
---
> 	PMC->PMC_MCKR = SYS_BOARD_MCKR;
$

Hermann.

I looked into the Sparkfun posting and found instructions that made overclocking work:
https://forum.sparkfun.com/viewtopic.php?f=42&t=38327#p187184

Okio's lines just need to be inserted as the first lines in setup(), nothing more.
Serial communication does not work anymore, though.
I measured 108MHz by manually counting blinks of Blink example using stopwatch.

Yes, I realised doing it in setup 'something' happens because not only is serial messed up but among other things micros/millis are messed up also. I was after doing this 'properly'... with all features working.

I am still doing battle with Ubuntu even having followed your instruction about setting the environment variable....

Did you try make in the variants/arduino_due_x/build_gcc folder?

Graham

By the way, if you want to fix your serial while still using the setup() method, this will do the trick :-

#define SYS_BOARD_PLLAR (CKGR_PLLAR_ONE | CKGR_PLLAR_MULA(18UL) | CKGR_PLLAR_PLLACOUNT(0x3fUL) | CKGR_PLLAR_DIVA(1UL))
#define SYS_BOARD_MCKR ( PMC_MCKR_PRES_CLK_2 | PMC_MCKR_CSS_PLLA_CLK)
//Set FWS according to SYS_BOARD_MCKR configuration 
EFC0->EEFC_FMR = EEFC_FMR_FWS(4); //4 waitstate flash access
EFC1->EEFC_FMR = EEFC_FMR_FWS(4);
// Initialize PLLA to 114MHz
PMC->CKGR_PLLAR = SYS_BOARD_PLLAR;
while (!(PMC->PMC_SR & PMC_SR_LOCKA)) {}
PMC->PMC_MCKR = SYS_BOARD_MCKR;
while (!(PMC->PMC_SR & PMC_SR_MCKRDY)) {} 
SystemCoreClockUpdate();

Regards,

Graham

PS, it doesn't fix SPI speed problems...

Thank you Graham, "SystemCoreClockUpdate()" call is so cool and fixes Serial completely!
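A plausible explanation for the Serial fix, sketched under an assumption: that the core derives the UART baud divisor as (SystemCoreClock / baud) >> 4, so a stale SystemCoreClock of 84 MHz yields the wrong divisor once the chip really runs at 114 MHz. The helper names below are hypothetical and only model that arithmetic on a host machine.

```cpp
#include <cassert>
#include <cstdint>

// Divisor the core would program, based on the clock it believes in.
// Assumption: divisor = (clock / baud) / 16 in integer arithmetic.
uint32_t uartDivisor(uint32_t assumedClockHz, uint32_t baud) {
  assert(baud > 0);
  return (assumedClockHz / baud) >> 4;
}

// Baud rate that actually appears on the wire for a given divisor
// when the chip runs at realClockHz.
double actualBaud(uint32_t realClockHz, uint32_t divisor) {
  assert(divisor > 0);
  return static_cast<double>(realClockHz) / (16.0 * divisor);
}
```

In this model, a divisor computed for 84 MHz at 57600 baud produces roughly 78300 baud on a 114 MHz chip, which no terminal set to 57600 will decode. After SystemCoreClockUpdate() the divisor is computed from 114 MHz and the resulting rate is within about 1% of 57600 again.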

I just ran my sketch from first posting in this thread.
It did measure maximum of 13572 with 1024Hz interrupt frequency @84MHz Due speed.
Now a maximum of 17281 (decrements of volatile long variable) gets reported!

So the speedup is real, and the program works as if Due was run normally @84MHz, just quicker.
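The two maxima allow a rough estimate of the effective clock. This sketch assumes the cycle count per decrement and the interrupt overhead stay the same after overclocking, which need not hold exactly (for example with different flash wait states). The function name estimatedClockHz is made up.

```cpp
#include <cassert>

// Scale the known base clock by the ratio of decrement counts
// measured at the same interrupt frequency.
double estimatedClockHz(double baseClockHz, double baseCount, double newCount) {
  assert(baseCount > 0);
  return baseClockHz * newCount / baseCount;
}
```

With 13572 decrements at 84 MHz and 17281 after the change, the estimate comes out at about 107 MHz, close to the 108 MHz obtained with the stopwatch.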

Btw, how to configure for 96MHz, and a spec reference on how the MULA value (18/15/13) translates to the Due clock frequency, can be found in this posting:
http://forum.arduino.cc/index.php?topic=241503.msg2556836#msg2556836