Hi,
I want to be able to read values from a ring buffer of bytes as fast as possible.
I think the absolute fastest is to simply increment the lo byte of a pointer to a 256-byte buffer
(and somehow arrange for the buffer to be on a 256-byte boundary).
However, this wastes a lot of RAM since I only really need 16 bytes or so. (I did this, and now I need more RAM.)
The next fastest I've found is to arrange a pointer to a 16-byte buffer aligned to a 32-byte boundary, increment the lo byte of the pointer and clear bit 4.
i.e.
union _bp {
    volatile uint8_t *ptr;  // we set this pointer into sampleBuffer at a 32-byte boundary
    struct _hl {
        uint8_t lo;         // and use this as the index by incrementing then clearing bit 4
        uint8_t hi;
    } hilo;
} samplePtr;

ISR(TIMER2_COMPA_vect)
{
    val = *samplePtr.ptr;        // extract value from ring buffer
    samplePtr.hilo.lo++;         // directly increment pointer lo byte
    samplePtr.hilo.lo &= 0xEF;   // clear bit 4
...
...currently I allocate a 48-byte buffer, search it for a suitably aligned entry point and set 'samplePtr' to that, which wastes 32 bytes.
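For what it's worth, here is a minimal sketch of that search plus the wrap step, written as plain C so it can be checked on a PC. 'find_entry_32' and 'step_ring16' are names made up for illustration and are not in my actual code; the integer arithmetic stands in for the union trick so the sketch doesn't depend on pointer size or byte order.

```c
#include <assert.h>
#include <stdint.h>

/* 48 bytes always contains a 16-byte run that starts on a 32-byte boundary. */
static volatile uint8_t sampleBuffer[48];

/* Hypothetical helper: first address at or after buf on a 32-byte boundary.
   Asserts that 16 bytes starting there still fit inside the buffer. */
static volatile uint8_t *find_entry_32(volatile uint8_t *buf, uintptr_t size)
{
    uintptr_t addr = ((uintptr_t)buf + 31u) & ~(uintptr_t)31u;
    assert(addr + 16u <= (uintptr_t)buf + size);
    return (volatile uint8_t *)addr;
}

/* Hypothetical helper: the same step the ISR performs (increment the lo
   byte, then clear bit 4), done on the integer value of the address. */
static volatile uint8_t *step_ring16(volatile uint8_t *p)
{
    uintptr_t addr = (uintptr_t)p;
    uint8_t lo = (uint8_t)addr;   /* low byte of the address */
    lo++;                         /* 0x0F -> 0x10 at the end of the ring */
    lo &= 0xEF;                   /* clearing bit 4 wraps back to offset 0 */
    return (volatile uint8_t *)((addr & ~(uintptr_t)0xFF) | lo);
}
```

Because the entry point has its low five bits clear, the increment can never carry out of the lo byte, which is why the hi byte never needs touching. Worst case the entry point is 31 bytes into the buffer, so 31 + 16 = 47 bytes would be enough.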
So... a couple of questions:
- Is there any faster way of implementing such a ring buffer?
- Is there some compiler directive/pragma/whatever which would align a 16-byte buffer to a 32-byte boundary?
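The sort of thing I have in mind for the second question is GCC's 'aligned' variable attribute. This is only a sketch, assuming avr-gcc and its linker script honor the request for an array in RAM (I'd check the .map file to be sure); 'sampleBuffer16' and 'ring_base' are hypothetical names, and the assert is just a sanity check for trying it on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang variable attribute: ask for a 16-byte array placed on a
   32-byte boundary, so no oversized buffer or runtime search is needed. */
static volatile uint8_t sampleBuffer16[16] __attribute__((aligned(32)));

/* Hypothetical helper: return the ring base after checking that the
   toolchain really delivered the requested alignment. */
static volatile uint8_t *ring_base(void)
{
    assert(((uintptr_t)sampleBuffer16 & 31u) == 0);
    return sampleBuffer16;
}
```

If that holds on the target, 'samplePtr.ptr' could be set straight to the array and the RAM cost would drop to the 16 bytes plus whatever padding the linker inserts in front of it.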
Yours,
TonyWilk
P.S.
If anyone's interested, this is why...
I'm writing code for an Arduino Pro Mini (ATmega328) which has to update a bunch of serially-addressed LEDs and output audio at a 16 kHz sample rate.
The LEDs are a pain because the data rate is stupidly high and the chain of LEDs resets if there's more than about a 6 µs gap in the bit train. The audio out is a pain because even small delays in sample output generate an audible 'click'.
So, the LEDs are driven in mainline code and the PWM updates are done by the Timer2 ISR.
The ISR is pared down to the absolute minimum: get a sample from a buffer, stuff it in the PWM register.
It is this indexing into the sample buffer which is critical.