
Topic: 2 Mbps data stream to computer

bobloblaw651

Jul 08, 2015, 09:39 pm Last Edit: Jul 08, 2015, 09:46 pm by bobloblaw651
Hi all,

I want to use the Due to get the state of one digital I/O pin at a sampling rate of 2 MHz. I then want to pass along this state (either a 1 or 0) to a computer via any available method, and have the computer process it and write it to a text file.

So far, I have tried using the following code for digitalReadDirect in a while loop to collect 32 bits and then push it out to the Serial Monitor on the computer.

Code: [Select]

inline int digitalReadDirect(int pin){
  return (g_APinDescription[pin].pPort -> PIO_PDSR & g_APinDescription[pin].ulPin);
}

int state;
int cnt = 0;
uint32_t inPin = 22;
unsigned long sending;
unsigned long time1;
unsigned long time2;
unsigned long time3;

void setup() {
pinMode(inPin,INPUT);
Serial.begin(250000, SERIAL_8N1);
}

void loop() {
while(1)
{
  time1 = micros();
  while (cnt<32)
  {
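    // digitalReadDirect() returns the masked port value (0 or the pin's bit), not 0/1,
    // so test for non-zero instead of comparing with HIGH.
    sending |= ((uint32_t)(digitalReadDirect(inPin) != 0)) << (31-cnt);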
    cnt++;
  }
  //digitalWriteDirect(outPin, state);
  time2 = micros();
  Serial.print(sending);
  cnt = 0;
  sending = 0;
  time3 = time2 - time1;
  Serial.println(time3);
  delay(1000);
}
}


However, it seems that digitalReadDirect takes around 1 microsecond to execute, which means the maximum achievable rate for data collection is about 1 MHz. Is there any faster way to read the state of the pin?

Additionally, I am realizing that writing the data to the Serial Monitor will take time so data will be lost while the Arduino is writing the 4 bytes. The data is an output from a sensor and I can't lose any bits.

Any guidance or help would be greatly appreciated. Thanks.

earx

When you don't have the right interfaces to dump this to your PC continuously, you can record snapshots (of max 100 kilobytes) and dump those to the PC afterwards. You can use the SAM3X's DMA Controller (DMAC) to read out the PIO pins. 2 megasamples/s is probably well within the limits of the DMAC's capacity, but it will require some hours of programming to get all of this to work.

LMI1

Have you thought about assembly programming? A modern C program is fast, but you may get better results with some assembler and C.

MorganS

"May get better results" is not exactly a promising direction to invest a week's worth of effort. Just converting C code to assembler is what the compiler does, and it does it very well. To improve the throughput significantly, you need a better algorithm.

Serial is using the programming port. This is limited by the serial speed between the SAM3X and 16U2 chips on the Due board. SerialUSB is much better: it uses the native port, which is attached directly to the SAM3X. The native port supports USB 2.0 High Speed, nominally 480 Mbps. You won't get that speed without some work, but maybe the Arduino SerialUSB library can get the speed you need without modification.

The second thing to change in your sketch is the output method: You are using Serial.print() which will convert the 32-bit integer into (up to) 10 ASCII characters. Serial.write() will write an 8-bit byte directly. Change the main loop to only record 8 bits of data and then SerialUSB.write() will probably be able to send at the speed you need. The other end will need to record this binary data and then turn it back into whatever real output you need.
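To put rough numbers on the two points above, here is a small sketch of the arithmetic in plain C++ (it runs on a PC, not on the Due). The helper names are made up for illustration. It assumes 8N1 framing, meaning 10 bits on the wire per payload byte, and that Serial.print() emits an unsigned value as unpadded decimal digits.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Bytes per second needed to carry one bit per sample, packed 8 samples per byte.
constexpr uint32_t requiredBytesPerSecond(uint32_t sampleRateHz) {
    return sampleRateHz / 8u;
}

// Payload bytes per second on a UART with 8N1 framing
// (assumption: 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte).
constexpr uint32_t uartBytesPerSecond(uint32_t baud) {
    return baud / 10u;
}

// Number of characters needed to print a 32-bit value as decimal text,
// which is what a print-style call sends instead of the 4 raw bytes.
inline std::size_t decimalLength(uint32_t value) {
    char buf[16];
    int n = std::snprintf(buf, sizeof buf, "%lu", static_cast<unsigned long>(value));
    return static_cast<std::size_t>(n);
}
```

Under these assumptions, a 2 MHz one-bit stream packed 8 samples per byte needs 250,000 bytes/s, while 250000 baud carries at most 25,000 bytes/s, so the programming port falls short by a factor of ten even before the print overhead is counted.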
"The problem is in the code you didn't post."

AdderD

If you're looking for real speed first do what MorganS said. Then, find out what the compiler is doing with this:

Code: [Select]

inline int digitalReadDirect(int pin){
  return (g_APinDescription[pin].pPort -> PIO_PDSR & g_APinDescription[pin].ulPin);
}


The chip runs at 84 MHz, so it gets roughly 84 clock cycles per microsecond, but the above could naively be turned into many assembler instructions, so perhaps you're right about it managing only about 1 million reads per second. If not optimized away, the function first grabs an entry out of g_APinDescription. That entry is a data structure, so it then grabs the pPort entry in that structure. Wouldn't you know, another lookup: it gets the PIO_PDSR entry within the structure that pPort points to. Now you have the raw value returned by that port (all 32 pins). It then does the lookup all over again for the same entry in g_APinDescription and gets the pin mask (in reality the optimizer is nearly certain to optimize this lookup away since it already did it). That pin mask is bitwise ANDed with the raw port value to get the state of that one pin.

I don't know how the optimizer will handle all of that, but you're probably still looking at a few lookups and the bitwise AND. Instead, since you're always using the same pin, just cache the memory location of

Code: [Select]

g_APinDescription[pin].pPort -> PIO_PDSR


Then, you can read that memory location whenever you want for just a single memory access. Also cache the bit mask and do it directly. Now it's a memory access followed by a bitwise operation. If the compiler is really sneaky it might get almost that good but I doubt it can quite get there because the function is still geared toward being generic. Thus, you should get better performance by caching and hard coding for your specific pin. But, look at the assembly dump of your compiled code first because modern compilers tend to be pretty tricky. Maybe it does better than I think.

bobloblaw651

Thanks for your help Collin. Caching the mask location sped up the rate a bit, but I couldn't figure out how to implement your first suggestion.

I don't exactly understand how to cache the location of the port in memory to be able to read it in a single access.

My initial thought was just to set a uint32_t equal to g_APinDescription[pin].pPort -> PIO_PDSR in the setup portion of the code, but then I realized this would just store one value of it.

What is the proper way to do this? Thanks

AdderD

An easy approach is to use a pointer:

Code: [Select]

// volatile: this is a hardware register, so every read must really hit the port
const volatile uint32_t *someVariable = &g_APinDescription[pin].pPort->PIO_PDSR;


Then you can get the value any time you want with:
Code: [Select]

otherVariable = *someVariable;
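Putting the cached pointer and the cached mask together, a sampling loop might look roughly like the sketch below. It is written so it can run on a PC: fakePdsr is a hypothetical stand-in for the real register, and sampleByte and kPinMask are illustrative names, not part of the Arduino core. On the Due you would pass &g_APinDescription[pin].pPort->PIO_PDSR and g_APinDescription[pin].ulPin instead.

```cpp
#include <cstdint>

// Hypothetical stand-in for the PIO_PDSR register so the sketch runs off-target.
volatile uint32_t fakePdsr = 0;

// Hypothetical pin mask (one bit of the 32-bit port); on the Due this would
// come from g_APinDescription[pin].ulPin.
constexpr uint32_t kPinMask = 1u << 26;

// Read the pin 8 times through the cached pointer and pack the results
// into one byte, first sample in the most significant bit.
inline uint8_t sampleByte(const volatile uint32_t *port, uint32_t mask) {
    uint8_t packed = 0;
    for (int bit = 7; bit >= 0; --bit) {
        uint32_t raw = *port;  // one volatile load per sample
        if (raw & mask) {
            packed = static_cast<uint8_t>(packed | (1u << bit));
        }
    }
    return packed;
}
```

Packing 8 samples per byte also matches the earlier suggestion of sending raw bytes with write() rather than printing text.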

bobloblaw651

Thank you so much, your advice helped get the sampling rate into the 4 Mbps range. I really appreciate your guidance.

Now, though, I would like to implement delays in between samples to get the sampling frequency to exactly the rate I need. Do you know of any libraries or macros that would give me nanosecond resolution for delays?

Also, it seems that SerialUSB.write() is taking a while to output the data when using the serial monitor. Do you know of any code/programs to approach the theoretical max of USB 2.0 speeds?

Thanks so much for your help.

AdderD

Well, the processor runs at 84 MHz, so one clock cycle is about 11.9 ns. You can thus get delays in multiples of roughly 11.9 ns by placing extra instructions in your code. The smallest unit is a NOP, which usually takes one cycle on this core, although ARM does not guarantee NOP timing. Another easy approach is to declare a variable "volatile" and then create a loop or do some increment operations on it. An increment on a volatile variable will not be optimized away even if you don't do anything with the result, but note that variable++; is a load, an add and a store, so it costs several cycles rather than one. If you do a couple of those you will delay a little bit. You'll have to play with it to get a feel for what works.
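As a rough worked example of the cycle budget (plain C++ that runs on a PC; the function names are illustrative, and the cycle count of the sampling code itself is something you would have to measure, not something this sketch knows):

```cpp
#include <cstdint>

// Core clock of the Due's SAM3X, as stated in this thread.
constexpr uint32_t kCpuHz = 84000000u;

// Duration of one clock cycle in nanoseconds.
constexpr double nsPerCycle(uint32_t cpuHz) {
    return 1e9 / static_cast<double>(cpuHz);
}

// Clock cycles available per sample at a given sampling rate.
constexpr uint32_t cyclesPerSample(uint32_t cpuHz, uint32_t sampleHz) {
    return cpuHz / sampleHz;
}

// Cycles of padding needed once the sampling code itself has used
// bodyCycles; zero if the body already exceeds the budget.
constexpr uint32_t paddingCycles(uint32_t budget, uint32_t bodyCycles) {
    return budget > bodyCycles ? budget - bodyCycles : 0u;
}
```

At 84 MHz and 2 MHz sampling the budget is 42 cycles per sample, so if the read-and-pack code took, say, 10 cycles, about 32 cycles of padding would be needed.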

Unfortunately, USB port access is a black art and the computer is in control so you might get delays that are beyond your control. But, I've gotten it to work fairly quickly even with SerialUSB.write.

bobloblaw651

A quick followup question:

Is there a function or method that would allow me to find out how many ticks a line of code takes to execute? I want to know exactly how many NOPs to add to be able to sample at 2 MHz (a period of 500 ns).

The code I am interested in finding the number of ticks for is:

dataBit = *port;
dataByte |= (!!(dataBit & mask))<<7;

Thanks so much for all of your help
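One possible way to count cycles, offered as a sketch and not as a tested recipe: read the Cortex-M3 SysTick down-counter (SysTick->VAL in CMSIS) immediately before and after the code of interest and take the difference. This assumes the Arduino core clocks SysTick from the 84 MHz core clock with a 1 ms reload, so that SysTick->LOAD would be 83999, which is worth verifying on your board. The wraparound arithmetic is the only subtle part and can be checked on a PC; elapsedTicks and netTicks are illustrative names.

```cpp
#include <cstdint>

// SysTick counts DOWN from `reload` to 0 and then wraps back to `reload`.
// Given two readings of the counter, return how many ticks elapsed,
// assuming less than one full counter period passed between them.
constexpr uint32_t elapsedTicks(uint32_t startVal, uint32_t endVal, uint32_t reload) {
    return (startVal >= endVal) ? (startVal - endVal)
                                : (startVal + (reload + 1u) - endVal);
}

// Remove the cost of taking the two readings themselves, which you would
// measure once with nothing between the reads.
constexpr uint32_t netTicks(uint32_t measured, uint32_t overhead) {
    return measured > overhead ? measured - overhead : 0u;
}
```

On the Due the call would look something like elapsedTicks(start, end, SysTick->LOAD), with start and end both read from SysTick->VAL. Interrupts and flash wait states can perturb a single measurement, so repeating it a few times is a sensible precaution.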

bobloblaw651

Jul 29, 2015, 09:02 pm Last Edit: Jul 29, 2015, 09:45 pm by bobloblaw651
A followup question: I have been trying to rewrite the code directly in assembly to gain however much efficiency that will give me, but I'm having a little trouble converting the code to assembly.

I have been trying to more or less transfer it from the assembly dump of the original C code, but I have yet to figure out how to properly store the memory location of the port status.

Here's what I have so far:

Code: [Select]
volatile uint32_t *port = &(g_APinDescription[22].pPort -> PIO_PDSR);
byte sending;
uint32_t mask = g_APinDescription[22].ulPin;


void setup() {
  // put your setup code here, to run once:
pinMode(22, INPUT);
Serial.begin(250000);
}

void loop() {
  // put your main code here, to run repeatedly:
  __asm__ volatile( "LDR R0, [%0]\n\t" ::"r"(&mask));
 
  __asm__ volatile( "LDR R6, [%0]\n\t" ::"r"(&sending));

  __asm__ volatile( "LDR R2, %0\n\t" ::"m"(g_APinDescription[22].pPort -> PIO_PDSR));
  while(1)
  {
    __asm__ volatile( "LDR R1, [R2]\n\t");
    __asm__ volatile( "TST R1, R0\n\t");
    __asm__ volatile( "ITE EQ\n\t");
    __asm__ volatile( "ORREQ R6, #0\n\t");
    __asm__ volatile( "ORRNE R6, #128\n\t");
.
.
.
.
    __asm__ volatile( "STRB R6, [%0]\n\t" ::"r"(&sending));
    Serial.println(sending, BIN);
    __asm__ volatile( "MOV R6, #0\n\t");
    sending = 0;
  }
}


But it's not working the same way as the original C code from the first post.

I have tested all of the elements of this code and the one that is causing the issue is the line:
Code: [Select]
  __asm__ volatile( "LDR R2, %0\n\t" ::"m"(g_APinDescription[22].pPort -> PIO_PDSR));

How do I properly reference this to have the same functionality as the original C code?

Thanks for your help

Reference: http://stackoverflow.com/questions/28761599/iar-inline-assembly-using-global-c-variable
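A likely explanation, hedged because it has not been run on a Due: with the "m" constraint, %0 refers to the register's memory location, so LDR R2, %0 loads the port's current value into R2, and the later LDR R1, [R2] then treats that value as an address. Passing the address itself in a register operand avoids this. The sketch below uses GCC extended asm with compiler-chosen registers, since values placed in hard-coded registers are not guaranteed to survive between separate asm statements. loadWord and topBitFromPin are illustrative names, and the non-ARM branch exists only so the sketch can be tried on a PC.

```cpp
#include <cstdint>

// Load one 32-bit word from `addr`. On 32-bit ARM this issues an explicit LDR;
// elsewhere it falls back to a plain volatile read so the sketch still runs.
inline uint32_t loadWord(const volatile uint32_t *addr) {
#if defined(__arm__)
    uint32_t value;
    // %1 is the ADDRESS, held in a register the compiler picks ("r").
    // %0 is the output register ("=r"). No registers are hard-coded.
    __asm__ volatile("ldr %0, [%1]" : "=r"(value) : "r"(addr) : "memory");
    return value;
#else
    return *addr;
#endif
}

// Returns 0x80 if the masked bit is set, else 0, mirroring the
// TST / ORR sequence in the post above.
inline uint8_t topBitFromPin(const volatile uint32_t *port, uint32_t mask) {
    return (loadWord(port) & mask) ? 0x80 : 0x00;
}
```

On the Due, the port argument would be the same pointer already declared in the post, that is &(g_APinDescription[22].pPort -> PIO_PDSR).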
