I'm working on a synthesizer project where I'm using the DDS method (digital direct synthesis) to produce a very high frequency resolution. Here's a description of a similar project: http://interface.khm.de/index.php/lab/experiments/arduino-dds-sinewave-generator/
Everything worked fine till I wanted to use a 128byte wabetable instad of a 256byte one. For the new wavetable I hade to make a bitshift to the right of 25 bits, because I needed just 7bits (128steps) to step through the whole wavetable. But now, the ISR needs to much time, and the whole aplication gets very slow. That meens that the main loop get started too infrequently. I think I's because of the bitshifting that takes too much time, but why is it a problem with bitshifting 25bits and is no problem with 24bits.
I'm very pleasd about any suggestion!
thanx lukas
volatile byte icnt;
volatile unsigned long phaccu = 0; // pahse accumulator in 32Bit
volatile unsigned long tword_m = 0; // dds tuning word m in 32Bit
ISR(TIMER2_OVF_vect) {
phaccu = phaccu + tword_m;
//icnt=phaccu >> 24; // use upper 8 bits for phase accu as frequency information for 256byte wavetable
icnt=phaccu >> 25; // use upper 7 bits for phase accu as frequency information for 128byte wavetable
//OCR2A=(float) pgm_read_byte_near(sine256 + icnt);
OCR2A=(float) pgm_read_byte_near(sine128 + icnt);
if(count++ == 125) { //timing for the main loop
divide = true; //main loop starts when "divide == true"
count=0;
}
}
is this line - OCR2A=(float) pgm_read_byte_near(sine128 + icnt); - not the problem?
An explicit convertion to float may cost time too : line - OCR2A=(float) pgm_read_byte_near(sine128 + icnt); -
As OCR2A is an "integer" type the question is if the conversion to float is needed.
the line - icnt=phaccu >> 25; - might be sped up --- see code -- by mapping the var also to a byte array in an union so only 1 shift is needed iso 25
you might give it a try...
union t
{
byte x[4];
unsigned long phaccu;
};
volatile byte icnt;
volatile t temp;
temp.phaccu = 0;
volatile unsigned long tword_m = 0; // dds tuning word m in 32Bit
ISR(TIMER2_OVF_vect) {
phaccu = phaccu + tword_m;
//icnt=phaccu >> 24; // use upper 8 bits for phase accu as frequency information for 256byte wavetable
//icnt=phaccu >> 25; // use upper 7 bits for phase accu as frequency information for 128byte wavetable
icnt = temp.x[0] >> 1; // icnt=phaccu >> 25
//OCR2A=(float) pgm_read_byte_near(sine256 + icnt);
OCR2A=(float) pgm_read_byte_near(sine128 + icnt);
if(count++ == 125) { //timing for the main loop
divide = true; //main loop starts when "divide == true"
count=0;
}
}
And as robtillaart already mentioned, you've got a lot of work ahead of you to speedup the code if you plan to do high frequency work as you mentioned. Some things to consider:
The extra 128 bytes of a 256 table is negligible if the code is already debugged and working - you may wish to go back to it and optimize after you're seeing positive results (the number 256 also gives you some machine code optimizations you may need later, like no masking of BYTEs, saving an instruction or two now and then).
Float on the Arduino is bad - a huge amount of instructions, and if you can avoid it until absolutely necessary, you'll see a big difference in speed.
but why is it a problem with bitshifting 25bits and is no problem with 24bits.
24 bits is exactly 3 bytes , for computers a "round" number, which can be optimized (by a similar assembly trick as I proposed)
the long is 4 bytes and shifting 24 means just "copy" the first byte of the 4
25 bits is 3 1/8 byte, not easy optimizable ... so probably it isn't. (I hope my handcoded optimization works.)
union t
{
byte s[4];
unsigned long x;
} volatile temp;
void setup()
{
Serial.begin(115200);
temp.x = 0xAA555555;
volatile byte y = 0;
unsigned long start = millis();
for (long i=0; i< 100000L; i++)
{
y = temp.x >> 24;
}
Serial.println(millis() - start);
start = millis();
for (long i=0; i< 100000L; i++)
{
y = temp.s[0];
}
Serial.println(millis() - start);
start = millis();
for (long i=0; i< 100000L; i++)
{
y = temp.x >> 25;
}
Serial.println(millis() - start);
start = millis();
for (long i=0; i< 100000L; i++)
{
y = temp.s[0] >> 1;
}
Serial.println(millis() - start);
}
void loop()
{}
Output
168
107
1245
126
This means the optimization of the shift 25 will probably work in your code too - factor 10! that is more than I expected.
Furthermore the shift 24 can be faster too; from 168 -> 107 = ~35% faster
byte y: 25; a bit with the size of 25 bits ??? that is quite good compression scheme I understand it is a filler
The test missing is a test to see shifting from 1-31 [ OK we did 1 allready]
With a variable (add this to prev sketch
for (int a=0; a <32; a++)
{
start = millis();
for (long i=0; i< 100000L; i++)
{
y = temp.x >> a;
}
Serial.println(millis() - start);
}
176 - 0
221
264
308
352
396 - 5
441
484
529
572
615 - 10
661
704
749
793
835 - 15
880
923
968
1013
1057 - 20
1100
1143
1188
1233
1277 -25
1321
1365
1408
1452
1496 - 30
1540
Average = 858
for shift 25 : use a constant: 1246 use a variable: 1277
tried to optimize with the var.
for (int a=20; a <32; a++)
{
start = millis();
for (long i=0; i< 100000L; i++)
{
if (a < 25) // should be 24 in fact !! can also be optimized.
y = temp.x >> a;
else
y = temp.s[0] >> (a-24);
}
Serial.println(millis() - start);
}
An explicit convertion to float may cost time too : line - OCR2A=(float) pgm_read_byte_near(sine128 + icnt); -
As OCR2A is an "integer" type the question is if the conversion to float is needed.
Unfortunately, I think I have to do this, because the whole code looks like this:
adsr is a float varable to controle the volume of the wavetable. If anyone has a suggestion for optimizing, I would be very happy!
robtillaart, thanx a lot for this detailed optimizing reserch! But i don't know how i can deal with the byte-array in my case. How can I do this line:
phaccu = phaccu + tword_m;
if "tword_m" is a float and "phaccu" is a byte-array, without a lot of other bitshifting.
The purpos of this line is that the "phaccu" get filled constantly with the step of "tword_m" and when it's over the 32bits, it starts from the beginning.
(sorry for my terrible english, I hope you can understand what I mean!)
if "tword_m" is a float and "phaccu" is a byte-array, without a lot of other bitshifting.
The purpos of this line is that the "phaccu" get filled constantly with the step of "tword_m" and when it's over the 32bits, it starts from the beginning.
In your original code, tword_m is not a float. If tword_m is an array step size, it must be an integer type.
Hmm this sounds very obvious! I realy dont need so many digits after the comma But I'm sorry, what are fixed-point variables again! I'm quite a newbee in programming!
You can't add set a whole array to anything in one statement.
Yea, thats exactly my problem.
The two main operations i have to do witch phaccu and tword_m are these:
phaccu = phaccu + tword_m;
icnt=phaccu >> 25;
robtillaart made the suggestion to do the bitshifting with phaccu as a byte array instead of an unsigned long. That's why I'm looking for a way to do the addition of a "unsigned long" with a byte-array. And for me, that seams to get more time-consuming than the bitshift of 25.
robtillaart, thanx a lot for this detailed optimizing reserch! But i don't know how i can deal with the byte-array in my case. How can I do this line:
Here a rewrite of your original routine, I added the "union trick", leaving your code intact except for the calculation of icnt so minimal change. If phaccu is not used outside this routine it can be even a tiny bit faster.
Please check if it is fast enough to solve your original problem.
volatile byte icnt;
// added to speed up shift
union t
{
byte b[4];
unsigned long l;
} volatile temp;
unsigned long phaccu = 0; // phase accumulator in 32Bit
volatile unsigned long tword_m = 0; // dds tuning word m in 32Bit
ISR(TIMER2_OVF_vect)
{
phaccu = phaccu + tword_m;
temp.l = phaccu; // work copy
// icnt=phaccu >> 24; // use upper 8 bits for phase accu as frequency information for 256byte wavetable
// icnt = temp.b[0];
// icnt=phaccu >> 25; // use upper 7 bits for phase accu as frequency information for 128byte wavetable
icnt = temp.b[0] >> 1;
//OCR2A=(float) pgm_read_byte_near(sine256 + icnt);
OCR2A = pgm_read_byte_near(sine128 + icnt);
if (count++ == 125) { //timing for the main loop
divide = true; //main loop starts when "divide == true"
count=0;
}
}
OCR2A=(float) pgm_read_byte_near(sine128 + icnt) * adsr;
adsr is a float varable to controle the volume of the wavetable. If anyone has a suggestion for optimizing, I would be very happy!
What is the range of adsr ?
Assumption based on discussion so far adsr is a float that represents the volume and it goes from 0.0 - 1.0 you can use a byte instead, lets name it adsrBYTE range 0..255. If the volume is read from analogRead(pin) you have a value from 0..1023 which needed to be divided by 4 to get adsrBYTE; OCR2A = pgm_read_byte_near(sine128 + icnt) * adsrBYTE / 256;