Measuring Speed of Light with Arduinos and no moving parts

OK, this seems to be all I have for now, next steps on the weekend:

  • try with 192MHz for better time precision (after more stability/reliability experiments!)
  • compare measurements with same sensor for longer distance,
    but closer together for having less brightness difference

If you have any recommendations or ideas on how to increase measurement precision for above described MSoL experiment, please respond here.

Hermann.

Although this is not rocket science, here are some ideas I would try in order to carry out this measurement:

I/
1/ Hook a first light sensor to ADC channel 7(A0)

II/
1/ Code DAC0 to trigger the laser beam for 200 ns and at the same time start an
ADC PDC DMA to record the output of ADC channel 7. Set the ADC to 10-bit resolution instead of the default 12 bits, because the conversion is faster.

To wait for a precise number X of ticks, you can use X times the NOP instruction, because the NOP does nothing and waits exactly for 1 tick.

Calibrate your ADC PDC DMA to know how many words are affected by the high level of the light sensor for 200 ns and detect the level for a rising edge above the noise. Now you have a ratio : number of Words for 200 ns

2/ Hook a second laser sensor to ADC channel 6 (A1), same model as the first one.

Code DAC0 to trigger the laser beam for 200 ns and at the same time start an
ADC PDC DMA to record the outputs of ADC channels 7 and 6.

III/
See how many words are affected by the rising edge of sensor 1 and how many words are affected by the rising edge of sensor 2 in the ADC PDC DMA recording, subtract the 2 numbers.

IV/
Swap laser sensors 1 and 2 and do II/2 again. Then take the average of the 2 measurements.

With the previous calibration of the ADC PDC DMA, you know exactly the corresponding time between the rising edge of sensor 1 and the rising edge of sensor 2 (which should be around 50 ns).

With this method, you don't have to bother with the laser turn-on delay and/or the sensor response time.

With this method, you get the maximum precision thanks to the super fast PDC DMA.
There is an even faster recording method using the peripheral-to-memory Advanced High-performance Bus DMA (AHB DMA), but it seems to be very tricky to use.

To increase the MCU speed, call Clock_System with the maximum value you can just before you trigger DAC0, and restore 84 MHz with Clock_System(13) just after the 200 ns recording. Of course Clock_System should be called with the same value for calibration and for the final experiment.

Thanks, I will try.

I never did "ADC PDC DMA", found this thread and hope it is related:
http://forum.arduino.cc/index.php?topic=205096.0

Code DAC0 to trigger the laser beam for 200 ns

The laser transmitter I have is a 650nm 5V laser. And I tested that it does not work with 3.3V (beam is nearly not existent). I did run "03 Analog Fading" demo on DAC0 and the scope says that not even 3V are reached:

Is this 3V laser transmitter what is needed for working with DAC0?
http://www.aliexpress.com/item/Free-Shipping-2pcs-Brass-metal-shell-laser-transmitter-650NM-3v-laser-head/1964552437.html

If I understand correctly your method to read input avoids use of ISR, at least for the initial steps. How is the >200 clock cycles overhead avoided when doing the measurements for two events later (on same Arduino Due?)?

Hermann.

I read the instructions again and now I see that step III does the measurements without any interrupt involved.

So instead of using a 3V laser I do not (yet) have I will try to trigger my 5V laser from Arduino Due for 200ns via the bidirectional level converter.

The only thing left for me is to understand how "ADC PDC DMA" works. "See how many words are affected by the rising edge of sensor 1" and "DMA" seems to mean that the measurements get stored in (consecutive) memory cells, sounds really interesting.

Hermann.

Hello HermannSW

For PDC DMA, see for example the thread "speed of analogRead - Arduino Due", from reply 8 onwards.

In your case, you will not use circular buffers, only one DMA buffer and no next DMA buffer (0), so the sample rate should be better. 1 M samples per second is clearly not enough; I think 10 M samples per second is the minimum for this experiment.

The recording process could then be this:

Record one analog input at a time for maximum speed:
You place a sensor at the location of sensor 1, you hook this sensor on A0, start your program and record the rising edge with only A0 enabled. You record N1 words before the rising edge.

Then you place the same sensor at the location of sensor 2, you hook this sensor on A0 , start your program and record the rising edge with only A0 enabled. You record N2 words before the rising edge.

The 200 ns pulse length is only an example for calibration; it could just as well be 2000 ns. Calibration only establishes how many words are recorded for a given pulse length, and the two are proportional.

Use for calibration a buffer size sufficient to cover the pulse length. I think word aligned buffers should accelerate the process.

So you get N3 words between the rising and the falling edges of a 2000 ns pulse.
( (N2 – N1)/N3) * 2000 = T ns

With the distance D = (lasermirror2sensor2) – (lasermirror1sensor1)
D/T is the speed of light
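
The arithmetic above can be sketched on a PC as follows. This is only an illustration of the formula, not part of any sketch in this thread; the function names `delay_ns` and `speed_of_light_mps` are made up for the example.

```cpp
#include <cassert>

// Delay T in ns between the two rising edges, from the word counts above.
// n1, n2:   words recorded before the rising edge at sensor positions 1 and 2
// n3:       words between rising and falling edge of the calibration pulse
// pulse_ns: length of the calibration pulse in ns (2000 in the formula above)
double delay_ns(long n1, long n2, long n3, double pulse_ns) {
    assert(n3 > 0);
    return static_cast<double>(n2 - n1) / n3 * pulse_ns;
}

// Speed of light in m/s, with extra_path_m being the path difference D in metres.
double speed_of_light_mps(long n1, long n2, long n3,
                          double pulse_ns, double extra_path_m) {
    const double t_ns = delay_ns(n1, n2, n3, pulse_ns);
    assert(t_ns > 0.0);
    return extra_path_m / (t_ns * 1e-9);
}
```

With made-up counts N1 = 100, N2 = 103, N3 = 120 for a 2000 ns pulse and D = 15 m, `delay_ns` gives T = 50 ns and `speed_of_light_mps` gives 3e8 m/s.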

Since your laser must be supplied with 5 volts, the DAC will not do the job because its output is in the range of 1/6 * 3.3V minimum and 5/6 * 3.3V maximum. A high level on a PIO and a logical level shifter should do the job.

In case ADC PDC DMA sampling is not sufficient, I think inline assembly code will be necessary to have a precise count of clock cycles inside interrupt functions.

If I were to try this, I'd consider using SPI DMA.
I googled quickly, and I think this DMA can go up to 42 MHz. That means you can send data to a pin (the MOSI pin of the SPI) at 42 MHz and receive data from another pin (the MISO pin of the SPI) at the same rate.

So if you sent the signal 1111111111 (with the MOSI pin connected to the laser, with a transistor in between, I assume)
and received 0000111111 (with the MISO pin connected to the sensor, and its signal strong enough to be registered as a 1),
then the signal was under way for about one 10,000,000th of a second (4 bit times at 42 MHz is roughly 95 ns).

You'd first have to calibrate this by placing them right next to each other and seeing how much delay you get with no distance in between (since I assume the laser needs time to fire up, and the sensor needs time to respond).
And then you can start doing experiments with a very fine resolution :).
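
To make the bit counting concrete, here is a small PC-side sketch of how a received buffer could be evaluated. It assumes the bytes arrive MSB first and that the SPI clock really is 42 MHz; the helper names are made up, and no actual Due SPI or DMA call is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of 0 bits before the first 1 bit in an MSB-first byte stream,
// or -1 if no 1 bit was received at all.
int leading_zero_bits(const uint8_t* rx, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        for (int bit = 7; bit >= 0; --bit) {
            if (rx[i] & (1u << bit)) {
                return static_cast<int>(i * 8 + (7 - bit));
            }
        }
    }
    return -1;
}

// Convert a number of bit times to nanoseconds at the given SPI clock (MHz).
double bits_to_ns(int bits, double spi_mhz) {
    assert(bits >= 0 && spi_mhz > 0.0);
    return bits * 1000.0 / spi_mhz;
}
```

For a received buffer starting with 0x0F this reports 4 zero bits, which at 42 MHz is about 95 ns. The calibration delay measured at zero distance would be subtracted from that.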

Thanks racemaniac, perhaps I will have to come back to this.

Thanks ard_newbie for the pointer to the thread needed. I took stimmer's sketch from this posting in the thread as the basis for my experiment (I have used stimmer's DueVGA library a lot):
http://forum.arduino.cc/index.php?topic=137635.msg1137618#msg1137618

The version with the differences I made is attached.

Let me discuss the differences I made.

First I added some variables, a macro for doubling statements passed in and D21 macro. D21 increments a variable 21 times which takes exactly 21 clock cycles. Compiler cannot optimize away (I always use -O3) the statements because the variable is declared volatile:

2a3,11
> // C.23 = D7
> Pio *p = digitalPinToPort(7);
> uint32_t b7 = digitalPinToBitMask(7);
> 
> uint32_t volatile cnt=0;
> 
> #define D(stmts) stmts; stmts;
> #define D21  D(D(D(D(cnt++)))) D(D(cnt++)) cnt++;
>

Next I added a buffer and made a new buffer cycling. The effect is that buf[0] gets filled once and will not be touched anymore. This allowed me to do my experiments without having to understand how to stop the analog reads:

12c21
< uint16_t buf[4][256];   // 4 buffers of 256 readings
---
> uint16_t buf[5][256];   // 4 buffers of 256 readings, 5th special
17c26
<   bufn=(bufn+1)&3;
---
>   bufn=(bufn&3)+1;      // cycle 0123412341...

I measured the clock cycles that a complete buffer fill with 256 converted analog values takes; it was less than 36000 clock cycles or 428μs(!).

I don't need native USB data output but just some debug output. I did connect pin D7 with A0, so set D7 to LOW initially:

24,25c33,36
<  SerialUSB.begin(0);
<  while(!SerialUSB);
---
>  Serial.begin(57600);
>  while(!Serial);
>  pinMode(7,OUTPUT);
>  p->PIO_CODR = b7; // digitalWrite(7,LOW);

Here comes the area where I did all my experiments. First 512 clock cycles are spent to be at the point the first analog value has been converted. Then each D21 uses 21 clock cycles and lets D7 on low for one more analog conversion. After setting D7 to HIGH and waiting for buf[0] being completely read, the first 10 values are written to Serial:

41a53,66
> 
>  D(D(D(D(D(D(D(D(D(cnt++)))))))))  // wait until 1st analog capture
>  D21 D21 D21 D21 
> // D21 D21 D21 D21
>  
>  p->PIO_SODR = b7; // digitalWrite(7,HIGH);
>  
>  while(obufn==bufn); // wait for buffer to be full
>  obufn=(obufn&3)+1;    
>  for(uint32_t i=0; i<10; ++i) {
>    Serial.print(buf[0][i]);
>    Serial.print(" ");
>  }
>  Serial.println();

The last diff is just omitting sending back read data to Serial:

46,47c71
<  SerialUSB.write((uint8_t *)buf[obufn],512); // send it - 512 bytes = 256 uint16_t
<  obufn=(obufn+1)&3;    
---
>  obufn=(obufn&3)+1;

This is example output after pressing Reset button several times:

4 4 2 5 2 4095 4095 4095 4095 4095 
3 4 2 5 2 4095 4095 4095 4095 4095 
3 3 2 5 2 4095 4095 4095 4095 4095 
3 3 2 4 3 4095 4095 4095 4095 4095 
4 4 3 4 2 4095 4095 4095 4095 4095 
3 4 2 4 2 4095 4095 4095 4095 4095 
3 3 2 4 1 4095 4095 4095 4095 4095 
3 3 2 4 2 4095 4095 4095 4095 4095 
3 3 2 4 2 4095 4095 4095 4095 4095

The 21 clock cycles are absolutely reliable, I tested once with 38 D21 statements.

The rising edge is really steep, I reduced last D21 to 20, 19, ... increments. I saw "... 2 5 4081 4095 ...", which basically means that the rising edge takes just one ADC conversion.

So far so good, but 21 clock cycles per completed analog read means 21*11.9ns=250ns. Light travels 75m in that time ...

But I remember that you told me to reduce ADC resolution from 12 to 10 bits, will be the next thing to do.

The ADC frequency is set to maximum which is ADC_FREQ_MAX = 20,000,000.

12bit->10bit and overclocking might help to reduce the time needed for a single analog conversion, but maybe not by a factor of 5 to fit into my home.
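
For anyone who wants to redo this kind of estimate with other numbers, the conversion from clock cycles to light path can be sketched like this (plain C++ for a PC, function names made up for the example):

```cpp
#include <cassert>

constexpr double kSpeedOfLightMps = 299792458.0;  // exact by SI definition

// Duration of `cycles` CPU clock cycles in nanoseconds at `cpu_mhz`.
double cycles_to_ns(double cycles, double cpu_mhz) {
    assert(cpu_mhz > 0.0);
    return cycles * 1000.0 / cpu_mhz;
}

// Distance in metres that light travels in `ns` nanoseconds.
double light_distance_m(double ns) {
    return kSpeedOfLightMps * ns * 1e-9;
}
```

`cycles_to_ns(21, 84.0)` is 250 ns, and `light_distance_m` of that is just under 75 m. Reaching a factor of 5 less would need about 50 ns per sample.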

Is Due chip capable of doing digital DMA reads? Instead of reading an analog pin and converting it in 21 clock cycles, just reading a digital pin and say write it in 2 clock cycles with DMA to Arduino Due's memory? That would be the solution, since my laser sensor already does the conversion to 0/1 on the module, and I used that signal in the experiment described.

Hermann.

sketch_jun11b.ino (1.71 KB)

HermannSW:
Is Due chip capable of doing digital DMA reads? Instead of reading an analog pin and converting it in 21 clock cycles, just reading a digital pin and say write it in 2 clock cycles with DMA to Arduino Due's memory? That would be the solution, since my laser sensor already does the conversion to 0/1 on the module, and I used that signal in the experiment described.

Hermann.

That's why I said to use SPI DMA :)
it'll do exactly that for you :)

The main reason I'd use SPI is that you no longer have to worry about any other processor overhead. The SPI will send a bit to a pin, and receive a bit from a pin, 42 million times per second. You want something very predictable and very well synced, and SPI will do that for you. With any other method I'd expect a lot of noise or delays to take into account, since your code takes time to execute, the things you use have latencies, ... certainly when writing this in C :).

Thanks, I will go that route, especially since I now tested 10bit ADC versus 12bit:

39c39
<  ADC->ADC_MR |=0x80; // free running
---
>  ADC->ADC_MR |=0x84; // free running, lowres (10bit)

With 10bit one analog read cycle takes the exact same 21 clock cycles as it does with 12bit.

Do you have a pointer to sample code on how to read a digital pin with 42MHz?
In the experiment the Due would turn the laser on, and then, within very few clock cycles (of 84MHz), the first digital pin will go from LOW to HIGH, and then, 4-11 clock cycles later, the other digital pin will go from LOW to HIGH.

So lets start with just monitoring a single laser sensor pin:

That sensor OUT pin is either LOW or HIGH, do you mean I should connect it to MISO pin of SPI?
You said MOSI pin, where should that get connected to (sensor module only has VCC, GND and OUT)?

Hermann.

HermannSW:
Thanks, I will go that route, especially since I now tested 10bit ADC versus 12bit:

39c39
<  ADC->ADC_MR |=0x80; // free running
---
>  ADC->ADC_MR |=0x84; // free running, lowres (10bit)

With 10bit one analog read cycle takes the exact same 21 clock cycles as it does with 12bit.


Do you have a pointer to sample code on how to read a digital pin with 42MHz?
In the experiment the Due would turn the laser on, and then, within very few clock cycles (of 84MHz), the first digital pin will go from LOW to HIGH, and then, 4-11 clock cycles later, the other digital pin will go from LOW to HIGH.

So lets start with just monitoring a single laser sensor pin:
![](https://stamm-wilbrandt.de/en/forum/laser.sensor.module.png)

That sensor OUT pin is either LOW or HIGH, do you mean I should connect it to MISO pin of SPI?
You said MOSI pin, where should that get connected to (sensor module only has VCC, GND and OUT)?

Hermann.

SPI is a serial protocol; it usually uses 4 pins:
CLK: clock pulse, which sets the frequency of the serial data. Whenever the clock goes from high to low, or low to high (it's configurable), a bit is sent and received on the data lines
CS: chip select, a pin that is asserted (usually driven low) when a device has to listen to the SPI
MOSI: Master Out, Slave In. This is the pin on which your Due sends data to the SPI device
MISO: Master In, Slave Out. This is the pin on which your Due receives data from an SPI device.

Now of course we don't have an SPI device in this case, so we ignore the CLK pin and the CS pin, and kind of abuse this port for another purpose. What does this mean? We now have 2 pins, MISO and MOSI, which we have very high speed control over. The MISO pin will be sampled by the microcontroller at 42 MHz, and that data will be written to memory by the DMA. And for the MOSI pin you can set data at the same speed, so you can toggle your laser very precisely, and that data also comes from your memory, from where the DMA reads it.
So in this case you'd want to connect your laser to the MOSI pin (with whatever you need in between so that the pin isn't overloaded).
And the output pin of your sensor indeed goes to the MISO pin of the SPI, as this is the pin that gets read continuously.

How to do this on the Due: you'll have to start googling. There are libraries that use DMA SPI on the Due, so you should be able to find info and examples. Personally I've been using another microcontroller and know the DMA from that one, but I have no experience with the Due yet.

Hello there

The SPI solution proposed by racemaniac with SPI AHB DMA is clearly the fastest method , although Peripheral to Memory AHB DMA is a bit tricky to implement. It seems that by chance a library exists (with 1 master and 2 slaves in your case ?).

I noticed 2 details in your trials with ADC.
1/ You tried the 10 bits resolution, but the correct implementation is this one (datasheet 43.7.2):
ADC->ADC_MR = ADC_MR_FREERUN_ON | ADC_MR_LOWRES_BITS_10;
With ADC_MR_FREERUN_ON | ADC_MR_LOWRES_BITS_10 = 0x80 | 0x10 = 0x90
I don't know if it changes something but should be tried.
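
To show the difference between the two values, here is a tiny PC-side check. It only uses the two mask values quoted above (0x80 and 0x10); the struct and function names are made up for the example and are not part of the Due headers.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kFreerunOn = 0x80;  // FREERUN mask, value as quoted above
constexpr uint32_t kLowres10  = 0x10;  // LOWRES (10 bit) mask, value as quoted above

struct AdcMode {
    bool freerun;
    bool lowres;
};

// Decode the two fields discussed here from a mode-register value.
AdcMode decode_mode(uint32_t adc_mr) {
    return AdcMode{(adc_mr & kFreerunOn) != 0, (adc_mr & kLowres10) != 0};
}

// Quick self-check of the two values that appear in this thread.
void check_thread_values() {
    assert(decode_mode(0x90).freerun && decode_mode(0x90).lowres);   // 0x80 | 0x10
    assert(decode_mode(0x84).freerun && !decode_mode(0x84).lowres);  // LOWRES bit not set
}
```

So with 0x84 the free-running bit is set, but the bit at 0x10 is not, which means that value does not select the 10-bit mode.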

2/ And if I understand your experiment, you observed a 21 clock cycle offset after the rising edge, which seems logical because a 12-bit conversion takes 20 clock cycles, plus one for the PDC to store the result, and this rising edge comes (512 + 6) half words after the PDC starts, right?

Suppose this was done with your sensor at the location of sensor 1. When you place your sensor at the location of sensor 2 and repeat the experiment, you will observe a rising edge after (512 + 6 + X) half words.

So when you subtract the 2 results, the offset cancels out, including the 21 clock cycles, leaving only the X half words. And thanks to the calibration, you know the corresponding time.

Thanks for correcting me on 1/, I did retest with 0x90 vs 0x80 and there was no difference in timing. I did measure from "adc_init()" until after the 1st block was converted "while(obufn==bufn);", 35822 clock cycles in both cases.

Let me clarify on 2/, I did not test with laser light yet, as said before I just have a connection from D7 to A0 for the sketch I used. I did work on this yesterday afternoon in a cafe where I had to wait for 2 hours :wink:

I realized that the German saying "Wer mißt misst Mist" (roughly "Who measures, measures crap") partially applies to my reported measurements. D21 did consist of 21 "cnt++" statements, that is right. But that did not translate to 21 clock cycles but to 126, because the compiler turns each "cnt++" into this:

6 clock cycles
ldr     r7, [r3]
adds    r7, r7, #1
str     r7, [r3]

I verified that by using 100 "t=SysTick->VAL" statements as D21, they had same effect and took 125 clock cycles:

1.25 clock cycles
ldr     r3, [r2, #8]

I remembered that somebody said "nop" would take exactly 1 clock cycle, but even that is not true! A big sequence of 888 nops takes 1000 clock cycles:

9/8=1.125 clock cycles
nop

Because 5/4 seemed easier than 9/8 I did measurements based on 100 "t=SysTick->VAL"s in a D21, see bottom for diff to previous posted sketch (as said you can verify with only having a Due and a D7-A0 connector cable yourself) and the new sketch attached as well.

Summary from the new measurements (checking both the clock timing and the generated assembler is important; to get the assembler, copy the IDE compile command for the sketch, remove the trailing "-o ..." and add "-S"): a single "D21", being the time needed for one full ADC conversion, is 125/126 clock cycles. That is loooong for measuring speed of light at home.

Let me add some more measurements I did. While an ADC cycle takes 125 clock cycles, in a sketch with an absolutely reproducible setup such as the D7-A0 connection I was able to get a picture of the rising edge at 1.25 clock cycle or 15ns timing resolution!

I did that by adding a single 1.25 clock cycle statement D1 before setting D7 to HIGH until I found the rising edge. So I had to run many sketches, and I did run each several times to see that the reported values are reproducible and stable. This was a sample line reported:

4 2 2 2 1007 4095 4095 4095 4095 4095

So setting D7 to HIGH with a single port statement "p->PIO_SODR = b7;", the rising edge takes 15ns even if looking at the big step from 1007 to 3916 only.

I have created a small laser transmitter and sensor testbed to measure the effect of real laser instead of setting a pin to HIGH on the Due:

Last let me say that I have not investigated DMA SPI with 42MHz yet, the 2 clock cycle resolution of measurements for a single bit sound promising.

What I found is a very simple method to record all 32 bits of a port (port C in the sketch below) every 3 clock cycles, but only 6 times in total. While 6 timestamps does not sound like much, the timeframe of (6-1)*3*11.9ns=178.5ns should be more than enough for measuring speed of light at home, because light travels 54m(!) in that time.
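
The size of such a capture window can be worked out for other sample counts and clock rates with a small helper like this (PC-side C++, hypothetical function name, assuming the spacing between reads is constant):

```cpp
#include <cassert>

// Time between the first and the last of `samples` port reads, in ns,
// when one read happens every `cycles_per_sample` CPU clock cycles.
double capture_window_ns(int samples, int cycles_per_sample, double cpu_mhz) {
    assert(samples >= 1 && cycles_per_sample > 0 && cpu_mhz > 0.0);
    return (samples - 1) * cycles_per_sample * 1000.0 / cpu_mhz;
}
```

`capture_window_ns(6, 3, 84.0)` is about 178.6 ns; the 178.5 ns figure comes from rounding one clock cycle to 11.9 ns first.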

This is the simple script, no DMA, just storing port value into local(!) variables, and "-O3" compilation:

// C.23 = D7
Pio *p = digitalPinToPort(7);

void setup() {
  uint32_t a1,a2,a3,a4,a5,a6,a7=0;  // a7 stays 0 while its read below is commented out
  uint32_t t0,t1,b7 = digitalPinToBitMask(7);
  Serial.begin(57600);
  while(!Serial){}
  pinMode(7, INPUT);
  t0=SysTick->VAL;  // 1
  a1 = p->PIO_PDSR; // 7 
  a2 = p->PIO_PDSR; // 10
  a3 = p->PIO_PDSR; // 13  
  a4 = p->PIO_PDSR; // 16
  a5 = p->PIO_PDSR; // 19
  a6 = p->PIO_PDSR; // 22
/*  
  a7 = p->PIO_PDSR; // 26
*/
  t1=SysTick->VAL;
  Serial.println( ((t0<t1)?84000+t0:t0)-t1 );
  Serial.print(a1,HEX); Serial.print(",");
  Serial.print(a2,HEX); Serial.print(",");
  Serial.print(a3,HEX); Serial.print(",");
  Serial.print(a4,HEX); Serial.print(",");
  Serial.print(a5,HEX); Serial.print(",");
  Serial.print(a6,HEX); Serial.print(",");
  Serial.print(a7,HEX);
  Serial.println();
}

void loop() {}

Just to demo this program I connected D7 (now a digital INPUT pin) to GND first and ran the sketch, then connected it to the 3.3V pin and ran the sketch again. The bit flip can be seen in the Serial Monitor output, 7F7... vs 7FF...:

22
7F7FFFFE,7F7FFFFE,7F7FFFFE,7F7FFFFE,7F7FFFFE,7F7FFFFE,0
22
7FFFFFFE,7FFFFFFE,7FFFFFFE,7FFFFFFE,7FFFFFFE,7FFFFFFE,0

So if one can arrange for an experiment to happen within 15 clock cycles, one gets all 32 digital bits of the port (port C here) at 3 clock cycle resolution! I hope measurements with the laser in the testbed described above will allow me to do exactly that, with 1 or 2 laser sensors.

Hermann.

$ diff sketch_jun11b.ino.posted sketch_jun11b/sketch_jun11b.ino 
7a8
> uint32_t t;
10c11,17
< #define D21  D(D(D(D(cnt++)))) D(D(cnt++)) cnt++;
---
> #define D1 t=SysTick->VAL;
> #define D2 D(D1)
> #define D4 D(D2)
> #define D8 D(D4)
> #define D16 D(D8)
> #define D32 D(D16)
> #define D21 D(D32 D16 D2)
32a40
>  uint32_t t0,t1;
55c63,64
<  D21 D21 D21 D21 
---
>  t0=SysTick->VAL;
>  D21 D21 D21 D21
56a66
>  t1=SysTick->VAL;
66a77
>  Serial.println( ((t0<t1)?84000+t0:t0)-t1 );
$

sketch_jun11b.ino (1.9 KB)

Hello Hermann

Congratulations ! You find the courage to work on the speed of light even when at the café :))))

I did some trials to duplicate your experiment with the NOP (see code below). Once the wait states are adjusted for read and write operations, and interrupts are disabled (optional), all is well!

uint32_t t;
void setup() {
/* Avoid all interruptions which take a lot of time by
  "pushing" registers during prologue and "poping" registers during epilogue 
  on the stack.

  Avoid any call to a function for the same reasons if you don't want overhead
*/ 
 WDT->WDT_MR = WDT_MR_WDDIS;            // Watchdog disable
 
 EFC0->EEFC_FMR &=~ EEFC_FMR_FWS_Msk;
 EFC0->EEFC_FMR |= EEFC_FMR_FWS(3);     // After initialization, you have 4+1 wait states
                                        // here 3+1 wait states for read and write operations
                                        // the minimum without hanging
                                        // will give 889 clock cycles for 888 Nop !!
                                        // The main reason why you get 1000 clock cycles for 888 Nop
__asm__ __volatile__(
  ".macro NOPX                  \n\t"
  ".rept 1000                   \n\t"   // adjust the number here
  "NOP                          \n\t"
  ".endr                        \n\t"   // End of Repeat
  ".endm                        \n\t"   // End of macro
  );
}

void loop() {
 //noInterrupts(); 
 uint32_t t0=SysTick->VAL;
__asm__ __volatile__("NOPX"); // The assembler expands the macro to 1000 NOPs, this is not a call
 uint32_t t1=SysTick->VAL; 
 //interrupts();
 // SysTick counts down and reloads every millisecond. t is unsigned, so a "t<0" test
 // can never fire; the wraparound is detected by comparing the two readings instead.
 t = (t0>=t1) ? t0-t1 : t0 + SystemCoreClock/1000 - t1;
 Serial.begin(250000);
 Serial.print(" 1000 NOP take exactly "); Serial.print(t-1);Serial.println(" Clock Cycles");
 
 /* With the noInterrupts() and interrupts() lines uncommented, you get 1001 cycles for 1000 NOP;
 with these lines commented out, you get 1003 cycles for 1000 NOP
 */
delay(5000);

}
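
A note on the cycle counting: SysTick->VAL counts down and starts again from the top once per millisecond, so the elapsed ticks have to be computed with the wraparound in mind. A minimal PC-side sketch of that calculation, with a made-up function name and the reload period passed in as a parameter (84000 ticks for a 1 ms period at 84 MHz):

```cpp
#include <cassert>
#include <cstdint>

// Ticks elapsed between two readings of a down-counting timer that
// restarts every `period_ticks` ticks. Only valid if less than one
// full period passed between the two readings.
uint32_t elapsed_ticks(uint32_t start, uint32_t end, uint32_t period_ticks) {
    assert(start < period_ticks && end < period_ticks);
    // Down-counter: normally start >= end. If end is larger, the counter
    // has wrapped around once in between.
    return (start >= end) ? start - end : start + period_ticks - end;
}
```

For example, readings of 200 and then 83900 with a period of 84000 mean that 300 ticks have passed, not a huge unsigned number.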

Your last solution seems interesting, but the timing tricky to tune. The beauty of DMA controller is that it works while the CPU can do other things (see the benchmark in :
AT07685: CPU Usage Demonstration using DMAC Application NOTE, page 29 , Table 8.1)

The CPU is idle 96% of the time during which the DMAC is active !!

You could, for example, enable DMAC to register sensor 1, then wait (the CPU) exactly 100 clock cycles, then disable DMAC, then analyze the stored half words.

Every technical solution is a trade_off as used to say the great Confucius.

Thanks for that sketch. I was able to see 1003 sometimes, but when I commented out the noInterrupts()/interrupts() it increased to 1007 and did not go down to 1000. But reducing "4+1" to "3+1" wait states is really dangerous. Most of the time your sketch output nothing and just hung, and pressing the Reset button did not get it to work. So I "buy" the noInterrupts(), but not the wait state reduction.

I played with the laser testbed shown above. I was able to capture the falling edge (I turned the laser on and immediately off) after 34μs plus some clock cycles several times. That really surprised me because I did not expect such precise reproducibility because of the "delayMicroseconds(34)" statements.

Now I worked on capturing the rising edge, which happens 62(87) clock cycles @192(84) MHz CPU clock after the laser pin gets set high and then low in the next statement [btw, I had to add an L9110S motor controller to the game; it allows the 3.3V Due to drive the 5V laser transmitter I have. I have already ordered new module-less 3.3V laser transmitters]. With the sketch attached and shown at the bottom of this posting I was able to reliably capture the rising edge. And I did so with 192MHz overclocking, and I can capture 8 port B values with 3 clock cycles delta times. So the time from one of the 8 port values per line to the next is only (1000/192)*3 = 15.6ns, just a little bit more than a "normal" (84MHz) Arduino Due clock cycle. Light only travels 4.68m in that time -- that is good for my 2*9=18m measuring distance in kitchen+living room. "FD..." has pin 2 (B.25) low, "FF..." has that pin high:

---------
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30 
FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 87 30

The whole measurement is repeated in loop(), with 192MHz for measurements and 84MHz for serial output. The different number of NOPs needed to hit the rising edge of the laser sensor is proof that the clocking really changes. Lines 7 and 13 show that sometimes the rising edge is detected one measurement later. But overall it is a very precise and reproducible measurement.
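
Finding the edge in such a line of captures can also be done in code instead of by eye. The following is a PC-side sketch with made-up function names; it assumes the sensor is on bit 25 of the captured port word and that the reads really are 3 clock cycles apart.

```cpp
#include <cassert>
#include <cstdint>

// Index of the first capture in which `bit` is set, or -1 if it never is.
int first_sample_with_bit(const uint32_t* samples, int count, int bit) {
    assert(bit >= 0 && bit < 32);
    const uint32_t mask = uint32_t{1} << bit;
    for (int i = 0; i < count; ++i) {
        if (samples[i] & mask) {
            return i;
        }
    }
    return -1;
}

// Time of capture `index` relative to the first capture, in ns.
double sample_offset_ns(int index, int cycles_per_sample, double cpu_mhz) {
    assert(index >= 0 && cpu_mhz > 0.0);
    return index * cycles_per_sample * 1000.0 / cpu_mhz;
}
```

For the usual output line (FD, FD, FF, ...) this returns index 2, and for lines 7 and 13 it returns index 3. With two sensors on two different bits of the same port, the difference of the two indices times the capture spacing would be the measured delay.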

One fact I find interesting is that although the laser is turned on and immediately off a few clock cycles later, the laser pulse gets emitted for 34μs, as the falling edge mentioned above shows. This seems to be caused by the laser transmitter itself, or by the L9110S.

While I did the first experiments in a darkened, but still lit room, I worked today in complete darkness (only laptop screen and keyboard lit, both not in the direction of the laser sensor). This seems to be the reason for the reproducibility of the experiment although the laser pulse lasts only 34μs (light travels 10.2km in that time).

Here is the current sketch with 15.6ns timing resolution port captures:

// B.25 = D2
Pio *p = digitalPinToPort(2);

// C.23 = D7
Pio *q = digitalPinToPort(7);
uint32_t b7 = digitalPinToBitMask(7);

void setup() {
  pinMode(2, INPUT);    // laser sensor
  pinMode(6, OUTPUT);   // "motor" direction  (L9110S) 
  pinMode(7, OUTPUT);   // laser transmitter  (L9110S)

  digitalWrite(6, LOW); // "motor" fixed direction
  digitalWrite(7, LOW); // laser off
  delay(200);

  Serial.begin(57600);
  while (!Serial){}
  Serial.println(".");
  Serial.println("---------");

/*
 WDT->WDT_MR = WDT_MR_WDDIS;      
 // Watchdog disable
 
 EFC0->EEFC_FMR &=~ EEFC_FMR_FWS_Msk;
 EFC0->EEFC_FMR |= EEFC_FMR_FWS(3); 
*/ 
}

void loop() {
  uint32_t a1,a2,a3,a4,a5,a6,a7,a8;
  uint32_t t0,t1,t2;

  Clock_System(31);  // set 192MHz CPU clock
  noInterrupts(); 
    
  q->PIO_SODR = b7;  // turn laser on ...
  q->PIO_CODR = b7;  // ... and directly off again
  t0=SysTick->VAL;

  // add some short delay to catch rising edge of laser sensor
  //
  asm volatile(".rept 68\n\tNOP\n\t.endr");  // for 192MHz
//asm volatile(".rept 50\n\tNOP\n\t.endr");  // for 84MHz
  t1=SysTick->VAL;
  
  // capture port B every 3 clock cycles
  //
  a1 = p->PIO_PDSR;
  a2 = p->PIO_PDSR;
  a3 = p->PIO_PDSR;
  a4 = p->PIO_PDSR;
  a5 = p->PIO_PDSR;
  a6 = p->PIO_PDSR;
  a7 = p->PIO_PDSR;
  a8 = p->PIO_PDSR;
  t2=SysTick->VAL;
  
  interrupts();
  Clock_System(13);  // set 84MHz CPU clock
  
  Serial.print(a1,HEX); Serial.print(" ");
  Serial.print(a2,HEX); Serial.print(" ");
  Serial.print(a3,HEX); Serial.print(" ");
  Serial.print(a4,HEX); Serial.print(" ");
  Serial.print(a5,HEX); Serial.print(" ");
  Serial.print(a6,HEX); Serial.print(" ");
  Serial.print(a7,HEX); Serial.print(" ");
  Serial.print(a8,HEX); Serial.print(" ");
  Serial.print( ticks_diff(t0,t1) ); Serial.print(" ");
  Serial.print( ticks_diff(t1,t2) ); Serial.print(" ");
  Serial.println() ;

  delay(1000);
}


// only for durations <1ms
//
uint32_t ticks_diff(uint32_t t0, uint32_t t1) {
  return ((t0 < t1) ? 84000 + t0 : t0) - t1;
}


// set system clock to  (1 + clock_mul)*6 MHz
//
void Clock_System(uint8_t clock_mul) {

#define SYS_BOARD_PLLAR     (CKGR_PLLAR_ONE \
                            | CKGR_PLLAR_MULA(clock_mul) \
                            | CKGR_PLLAR_PLLACOUNT(0x3fUL) \
                            | CKGR_PLLAR_DIVA(0x1UL))
#define SYS_BOARD_MCKR      ( PMC_MCKR_PRES_CLK_2|PMC_MCKR_CSS_PLLA_CLK)   

/* Initialize PLLA to X MHz */
PMC->CKGR_PLLAR = SYS_BOARD_PLLAR;
while (!(PMC->PMC_SR & PMC_SR_LOCKA));
 /* Setting up prescaler */
PMC->PMC_MCKR = SYS_BOARD_MCKR;
while (!(PMC->PMC_SR & PMC_SR_MCKRDY));

 /* Update SystemCoreClock */
SystemCoreClockUpdate();  

/* For your experiment you don't need to re-synchronize UART 
 so don't serial print when your uc is running at 236 M Hz */
}
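
The "(1 + clock_mul)*6 MHz" rule in the comment above can be checked with a few lines on a PC. This sketch assumes the 12 MHz crystal of the Due together with the DIVA = 1 and prescaler = 2 settings used in Clock_System; the function name is made up.

```cpp
#include <cassert>

// Master clock in MHz for a given PLLA multiplier field value (MULA).
// Assumptions: 12 MHz crystal, DIVA = 1, master clock prescaler = 2.
double master_clock_mhz(unsigned mula, double xtal_mhz = 12.0,
                        unsigned diva = 1, unsigned prescaler = 2) {
    assert(diva > 0 && prescaler > 0);
    // The PLL multiplies by (MULA + 1) and divides by DIVA,
    // then the prescaler divides the PLL output again.
    return xtal_mhz * (mula + 1) / diva / prescaler;
}
```

`master_clock_mhz(13)` gives 84 and `master_clock_mhz(31)` gives 192, matching the two Clock_System calls in loop().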

Hermann.

sketch_jun14b.ino (2.72 KB)

Not completely done, did make some progress and walked some dead ends (Intel).

I built a new compact Measure Speed of Light setup from the parts of the original Ikea Ivar experiment at top of this thread:

I placed a mirror just in front and got the sensor from the last experiment mentioned above lit. What really surprised me is that despite the new setup, new laser sensors, new laser transmitter and another Due, the unchanged sketch hits the rising edge!

This was the first time I captured the rising edges of both laser sensors, with mirrors at similar distances (F9…➫FD… FD…➫FF…). There are 4-5 FDs, but there should be 0 or 1:

Next I realized that overclocking Arduino Due above 102MHz does not really make sense, see "Overclocking Due over 102MHz gives nothing" :(. I did the runs with 192MHz and assumed the 3 clock cycles between measurements had dropped from 35.7ns to 15.6ns -- it seems they only dropped to 28ns based on the other posting's measurements:

Setting up the mirror placements was always a very fragile thing, and my big hands did not make it easier. I thought about how to fine tune the mirror inclination and figured a screw would be needed. Finally I used Lego bricks with a screw and now have a really robust mirror fine tune setup:

Today I entered the "Intel dead end" for better precision timing. I realized that a very old 1.6GHz Thinkpad T42 has a parallel port [I built a complete water cutting machine control (with 4000 bar pressure) over parallel port back in 1987!]. And the parapin lib immediately worked under Linux:

Two consecutively executed RDTSC commands capturing the 64bit Intel CPU clock counter show a difference of 44, which is 27.5ns. Even worse, without the parapin lib a single inb() reading the 8 lines of the parallel port (compared to 32 lines for an Arduino Due port read) takes >1000 clock cycles (or 625ns), while Due takes only 28ns for reading 32 lines!

I thought that the operating system might be the reason for such a slow port read, and found the nice article "Hello, World on the Bare Metal". Based on that I extended the little Hello World assembler program to measure and display the number of clock cycles in Intel real mode (the program is stored in and executed from the master boot record of a USB stick, maximal 510 bytes!). The 000EE95F or similar values I measured on repeated reboots of the T42 were really dissatisfying, compared to only 44 under Linux (programs, Makefile, README can be found in the zip of the tweet):

Summary: Intel processors run at a few GHz, but they are unusable for reading the parallel port with 1-2 digit nanoseconds between measurements, which Arduino Due easily can do (35.7ns) even without overclocking! So no more Intel, I will continue working with Arduino Due.

Hermann.

I played a lot with the (5$) Raspberry Pi Zero recently, and especially its 1ns precision (1GHz) cycle counter:
https://www.raspberrypi.org/forums/viewtopic.php?p=1017550#p1017550

While 1ns precision sounded promising, a read from GPIO needs 50ns in the best case, and that is worse than the 3 clock cycles per read described above for the (12$) Arduino Due, 3*(1000/84)=35.7ns per read.

Hermann.

The (8$) NanoPi Neo with 0.83ns precision (1.2GHz quad core) is even worse than Pi Zero (92.5ns between GPIO reads):
http://www.friendlyarm.com/Forum/viewtopic.php?f=47&t=273&p=832&sid=cf73f9f4abc39f7f42f18be32c25e705#p832

This posting showed that Arduino Due can even do 28Msps (million samples per second) until its RAM is completely filled (3 clock cycles between consecutive GPIO reads for tens of thousands of reads): 32CH for 714μs or 8CH for 3ms.

Summary: Arduino Due is best for high Msps GPIO reads and outperforms Raspberry Pi Zero as well as NanoPi Neo.
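
As a cross-check of the numbers in this post, here is a small sketch with made-up helper names. The buffer sizes of 80000 and 84000 bytes are my assumption, chosen because they reproduce the 714μs and 3ms figures and fit into the Due's 96KB SRAM; the real sketch may use different sizes.

```cpp
#include <cassert>
#include <cstddef>

// Sample rate in million samples per second for a given read interval.
inline double msps_from_ns(double ns_between_reads) {
    assert(ns_between_reads > 0);
    return 1000.0 / ns_between_reads;
}

// How long a capture buffer lasts, in microseconds.
// bytes_per_sample is 4 for a full 32 channel port read, 1 for 8 channels.
inline double capture_us(std::size_t buffer_bytes,
                         std::size_t bytes_per_sample,
                         double msps) {
    assert(bytes_per_sample > 0 && msps > 0);
    return (buffer_bytes / bytes_per_sample) / msps;
}
```

msps_from_ns gives 28 for the Due (35.7ns), 20 for the Pi Zero (50ns) and about 10.8 for the NanoPi Neo (92.5ns). capture_us(80000, 4, 28) is about 714 and capture_us(84000, 1, 28) is 3000.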

The timing of the measurements is really precise, I did show that before.

This is the repeated pattern from today's measurement:

...
FDFFFFFF FDFFFFFF FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 84 30 
FDFFFFFF FDFFFFFF FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 84 30 
FDFFFFFF FDFFFFFF FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 84 30 
FDFFFFFF FDFFFFFF FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 84 30 
FDFFFFFF FDFFFFFF FDFFFFFF FDFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 84 30 
...
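
FDFFFFFF and FFFFFFFF differ only in bit 25 (mask 0x02000000), so the rising edge is the first sample where that bit is set. A minimal sketch for locating it in a captured buffer follows; the function name is made up, and it assumes the words are printed as plain 32 bit hex values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Index of the first sample where the masked bit goes from 0 to 1,
// or -1 if the bit never rises within the buffer.
inline int first_rising_edge(const uint32_t* samples, std::size_t count,
                             uint32_t mask) {
    assert(mask != 0);
    for (std::size_t i = 1; i < count; ++i) {
        if (!(samples[i - 1] & mask) && (samples[i] & mask)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

For one row of the pattern above (four FDFFFFFF followed by four FFFFFFFF) and mask 0x02000000 this returns index 4.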

The border floated a bit, and I knew that it is dependent on light (therefore I do the measurements in near darkness normally). Today I changed the cabling in order to connect my 100MSps logic analyzer and that affected the timings as well:

This is how the logic analyzer sees the experiment, channel 0 is the signal sent to laser diode, channel 1 the signal received from the laser sensor:

I used the Saleae Logic software export function for the first time. I captured 6 seconds, and since the sketch repeats the experiment every second, the data reflects that:

Time[s], Channel 0, Channel 1
0.000000000000000, 0, 0
0.079313560000000, 1, 0
0.079313590000000, 0, 0
0.079314630000000, 0, 1
0.079334550000000, 0, 0
1.079328010000000, 1, 0
1.079328040000000, 0, 0
1.079329090000000, 0, 1
1.079348900000000, 0, 0
2.079342460000000, 1, 0
2.079342500000000, 0, 0
2.079343550000000, 0, 1
2.079363500000000, 0, 0
3.079356920000000, 1, 0
3.079356950000000, 0, 0
3.079358000000000, 0, 1
3.079377870000000, 0, 0
4.079371370000000, 1, 0
4.079371400000000, 0, 0
4.079372450000000, 0, 1
4.079392270000000, 0, 0
5.079385920000000, 1, 0
5.079385950000000, 0, 0
5.079386990000000, 0, 1
5.079406860000000, 0, 0

These are the interesting differences: timing starts (t0=SysTick->VAL) after the laser diode has been turned off, and t1=SysTick->VAL is taken just before the laser sensor receives the light beam for the first time:

Time[s], Channel 0, Channel 1
...
0.079313590000000, 0, 0
0.079314630000000, 0, 1
...
1.079328040000000, 0, 0
1.079329090000000, 0, 1
...
2.079342500000000, 0, 0
2.079343550000000, 0, 1
...
3.079356950000000, 0, 0
3.079358000000000, 0, 1
...
4.079371400000000, 0, 0
4.079372450000000, 0, 1
...
5.079385950000000, 0, 0
5.079386990000000, 0, 1
...

Only the 1st and 6th show a difference of 1040ns (or 1.04μs), the others show 1050ns difference (100MSps resolution is 10ns).
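
Instead of subtracting by hand, the differences can be pulled from the export automatically. This is only a sketch: the function name is made up, and it assumes the export keeps the header row and the "Time[s], Channel 0, Channel 1" column layout shown above.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>
#include <vector>

// Delays in ns between each falling edge of channel 0 and the next
// rising edge of channel 1. Rows that do not parse are skipped.
inline std::vector<long> edge_delays_ns(const std::string& csv) {
    std::vector<long> delays;
    std::istringstream in(csv);
    std::string line;
    std::getline(in, line);   // skip the header row
    int prev0 = 0, prev1 = 0;
    double fall_time = -1.0;  // time of the last channel 0 falling edge
    while (std::getline(in, line)) {
        double t;
        int ch0, ch1;
        char comma;
        std::istringstream row(line);
        if (!(row >> t >> comma >> ch0 >> comma >> ch1)) continue;
        if (prev0 == 1 && ch0 == 0) fall_time = t;
        if (prev1 == 0 && ch1 == 1 && fall_time >= 0.0) {
            assert(t >= fall_time);  // timestamps must be increasing
            delays.push_back(std::lround((t - fall_time) * 1e9));
            fall_time = -1.0;
        }
        prev0 = ch0;
        prev1 = ch1;
    }
    return delays;
}
```

Fed with the first two seconds of the export above, this returns 1040 and 1050.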

t2=SysTick->VAL is 84+30=114 clock ticks after turning the laser diode signal off, so the rising edge of channel 1 in the logic analyzer is somewhere between 84+4*3=96 and 114-4*3=102 clock ticks.

Hmmm, 1050*84/1000=88.2 and 1040*84/1000=87.36, which indicates 88 clock ticks between falling edge of channel 0 and rising edge of channel 1 ...

Hermann.

Hi HermannSW,
An interesting project you're working on, and nice to know how the big gun CPUs are poor at I/O, though I'm not sure if it's the CPU at fault or the southbridge chip.
As you seem to have tried several different CPU/MCU combinations I wonder if you had tried a PIC MCU?
Specifically a PIC chip that has a Charge-Time Measurement Unit (CTMU) built in, the CTMU can be used to measure time resolution to within a nanosecond (see this).
I cheated with my project here and used an external timing chip (ACAM GP22) to do the timing for me. I had considered trying the same experiment using light instead of electricity and even got as far as buying a light splitter but never continued with the test (bigger and better fish to fry).

Keep up the good work.