
Topic: More speed from ATMEGA328 Internal Clock

twing207

May 11, 2017, 04:19 pm Last Edit: May 12, 2017, 05:18 am by twing207
I have a board that is designed and produced. It was designed to use an ATmega328 with the internal clock. There is no space on the board to place an external clock. After completing these and testing them I have found that 8 MHz is too slow to complete my main loop actions seamlessly. The main loop involves shifting data out to 20 registers. Is there any way to speed this up without going back to the drawing board on the PCB design? Can another AVR with a faster internal clock be substituted as a drop-in replacement? Can the internal clock be made to run faster or doubled to achieve 16 MHz? Or can code optimization double the speed of my display loop?

The application is a matrix display that has 50 columns and 7 rows. The registers are the display buffers and represent the current state of the display. Because of the multiplexing in the matrix, only 1/5 of the pixels are illuminated at any given time. In order to trick the eyes into seeing all of them lit at the same time, the speed needs to be faster. The display flickers with the internal clock. If I connect an UNO board instead of using the on-board ATMEGA328 (QFP), it works well. So the difference between 8 MHz and 16 MHz is visible. If I can double the speed of the display loop, then that is a solution. Below is the loop, for reference, latchPin = 8, clockPin = 12, dataPin = 11.


jremington

Quote
Is there any way to speed this up without going back to the drawing board on the PCB design?
You can't speed up the clock, but you might be able to rewrite the code for more speed.

septillion

#2
May 11, 2017, 04:36 pm Last Edit: May 11, 2017, 04:36 pm by septillion
I see a lot of shiftOut(). You could speed that up A LOT by using the hardware SPI.

And you could reduce your code A LOT by using arrays (more) ;)
Use fricking code tags!!!!
I want x => I would like x, I need help => I would like help, Need fast => Go and pay someone to do the job...

NEW Library to make fading leds a piece of cake
https://github.com/septillion-git/FadeLed

twing207

I looked into hardware SPI, but I am afraid my PCB has the wrong pins utilized for it? My registers are on latchPin = 8, clockPin = 12, dataPin = 11. Hardware SPI on the Atmega328 is MOSI = 11, MISO = 12, SCK = 13. Can this be made to work?

septillion

Nope (or maybe), that's why you prototype ;) shiftOut() is all in software and just not fast.

And the maybe is, make wire links to fix your error ;)

twing207

Yea, that's what I thought. The initial prototype worked, but it was using an Uno as the controller and therefore had a 16 MHz clock. Now we have a production run of boards that are the way they are. Too many to manually correct with 'wires'. I am hoping software optimization might swoop in and save the day. Any ways to make the software faster would be helpful. If I can eliminate half the instructions, or double the speed another way, I should be good to go.

septillion

#6
May 11, 2017, 05:29 pm Last Edit: May 11, 2017, 05:30 pm by septillion
Yeah, that's kind of stupid... Make a prototype but change it without testing for the product...

And bodge wires were very normal up until like 10 years ago, even in big production runs ;)

But without the use of arrays, changing the code alone is already a big task, damn. And without the full code it's even impossible.

The only small improvement I can think of is making a port manipulation based shiftOut().

MrMark

The atmega328 has provisions for calibrating the 8 MHz internal oscillator and typically can be tuned over a rather large range.  Figure 1-1 in the application note linked below suggests it can be tuned to something like 14 MHz at the high end.  This, of course, will have to be accounted for in the setup for any peripherals you might be using.

Atmel-2555-Internal-RC-Oscillator-Calibration-for-tinyAVR-and-megaAVR-Devices_ApplicationNote_AVR053.pdf

twing207

Quote
But without the use of arrays, changing the code alone is already a big task, damn. And without the full code it's even impossible.

The only small improvement I can think of is making a port manipulation based shiftOut().
How do you mean without the full code? I have the full code. I can make any changes necessary.... How slow is pin lookup anyway?

septillion


twing207

Quote
The atmega328 has provisions for calibrating the 8 MHz internal oscillator and typically can be tuned over a rather large range.  Figure 1-1 in the application note linked below suggests it can be tuned to something like 14 MHz at the high end.  This, of course, will have to be accounted for in the setup for any peripherals you might be using.

Atmel-2555-Internal-RC-Oscillator-Calibration-for-tinyAVR-and-megaAVR-Devices_ApplicationNote_AVR053.pdf
Hmm this is interesting, when you say accounted for in peripherals. The shift registers are slave devices and get the clock from the ATMEGA. The only other peripheral I need is UART at 9600 baud. Can that still work with a high OSCCAL value?

Quote
Yeah, but we don't...
I am attaching it here.

MrMark

Quote
Hmm this is interesting, when you say accounted for in peripherals. The shift registers are slave devices and get the clock from the ATMEGA. The only other peripheral I need is UART at 9600 baud. Can that still work with a high OSCCAL value?
As I understand it, Arduino has a compile-time parameter (F_CPU) that the peripheral libraries use to derive parameters for peripheral setup.  I've not used this myself, so perhaps we'll get a responder with the relevant experience to comment on the procedure and limitations.

Typically this would be used to configure a custom board with a non-standard oscillator and the bootloader would have to be corrected as well.  I think the model here is that you are still at a nominal 8 MHz for the internal oscillator in the bootloader, and the calibration register would be modified in the run time code, so the bootloader is ok without modification.  This presupposes that peripheral (re)configuration happens as part of the run time code which may not be the case.
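
To make the UART question concrete, here is a minimal sketch of the baud-rate arithmetic. It assumes the asynchronous normal-speed formula from the ATmega328 datasheet, baud = f_osc / (16 * (UBRR + 1)); the Arduino core may configure the UART differently (for example in double-speed mode), and the 14 MHz figure is only the rough upper end quoted above, so the real oscillator frequency would have to be measured. The function names `ubrrFor` and `actualBaud` are made up for this illustration.

```cpp
#include <stdint.h>

// UBRR divider for asynchronous normal-speed mode, rounded to the nearest value.
// Assumed formula: baud = f_osc / (16 * (UBRR + 1)).
constexpr uint32_t ubrrFor(uint32_t cpuHz, uint32_t baud) {
  return (cpuHz + 8u * baud) / (16u * baud) - 1u;
}

// Baud rate the UART really produces for a given clock and divider (rounded down).
constexpr uint32_t actualBaud(uint32_t cpuHz, uint32_t ubrr) {
  return cpuHz / (16u * (ubrr + 1u));
}
```

With these assumptions, a divider computed for 8 MHz (51) gives about 9615 baud at 8 MHz but about 16800 baud if the oscillator really runs at 14 MHz, so the divider has to be derived from the tuned frequency (90 for 14 MHz) rather than from the nominal one.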


Smajdalf

#12
May 11, 2017, 09:06 pm Last Edit: May 11, 2017, 09:12 pm by Smajdalf
I am not sure how good the optimization in Arduino is or what code you are using for shiftOut. But if it is the same as mine:
Code: [Select]

void shiftOut(uint8_t dataPin, uint8_t clockPin, uint8_t bitOrder, uint8_t val) {
  uint8_t i;

  for (i = 0; i < 8; i++) {
    if (bitOrder == LSBFIRST)
      digitalWrite(dataPin, !!(val & (1 << i)));
    else
      digitalWrite(dataPin, !!(val & (1 << (7 - i))));

    digitalWrite(clockPin, HIGH);
    digitalWrite(clockPin, LOW);
  }
}

and if it is the part slowing your code most you should be able to make it much (10 times?) faster. I would remove the if statement and toggle pins by writing to registers. Also I am not sure if the compiler is able to optimize the "val&(1<<(7-i))" statement.
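
As a rough sketch of what dropping the if statement and writing to the port register could look like: the helper names below are hypothetical, the masks assume the standard Uno/ATmega328 pin mapping (D8 = PB0, D11 = PB3, D12 = PB4), and the latch handling assumes a 74HC595-style register, which the thread does not confirm. The port is passed by reference so the same code can be checked on a PC with an ordinary variable.

```cpp
#include <stdint.h>

// Bit masks for the wiring described in the thread, assuming the standard
// Uno/ATmega328 mapping (D8 = PB0, D11 = PB3, D12 = PB4), all on PORTB.
const uint8_t LATCH_MASK = 1 << 0;  // latchPin = 8
const uint8_t DATA_MASK  = 1 << 3;  // dataPin  = 11
const uint8_t CLOCK_MASK = 1 << 4;  // clockPin = 12

// Hypothetical helper (not from the thread): shift one byte out LSB first by
// writing the port register directly instead of calling digitalWrite().
static inline void fastShiftOutLsbFirst(volatile uint8_t &port, uint8_t val) {
  for (uint8_t i = 0; i < 8; i++) {
    if (val & 1) {
      port = (uint8_t)(port | DATA_MASK);    // data line high
    } else {
      port = (uint8_t)(port & ~DATA_MASK);   // data line low
    }
    port = (uint8_t)(port | CLOCK_MASK);     // rising clock edge shifts the bit in
    port = (uint8_t)(port & ~CLOCK_MASK);    // clock back low
    val >>= 1;                               // next bit moves into position 0
  }
}

// Hypothetical helper: send a whole buffer, holding the latch low while
// shifting and raising it afterwards (assumed 74HC595-style latch).
static inline void writeRegisters(volatile uint8_t &port, const uint8_t *buf, uint8_t count) {
  port = (uint8_t)(port & ~LATCH_MASK);
  for (uint8_t n = 0; n < count; n++) {
    fastShiftOutLsbFirst(port, buf[n]);
  }
  port = (uint8_t)(port | LATCH_MASK);
}
```

On the board this would be called as `writeRegisters(PORTB, buffer, 20)` once the three pins are configured as outputs. Whether the compiler reduces each write to a single instruction depends on inlining, so it is worth checking the generated code before relying on a particular cycle count.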
It MUST be possible to shift out 1 bit in 20 CK, that is 160 CK per shift register, 3200 CK to shift out everything. If you sacrifice 50% of CPU time for shifting, you will get a refresh rate of roughly 1 kHz (8 MHz × 50% / 3200 CK = 1250 Hz).

EDIT: 10CK per bit is probably too optimistic, fixed to realistic values.
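
The estimate can be reproduced with a few lines of arithmetic. This is only a sketch using the numbers from this post (20 clocks per bit, 20 registers, 50% of CPU time); the function names are invented for the example, and how a reload rate translates into visible flicker depends on how many reloads one full frame needs, which the thread does not show.

```cpp
#include <stdint.h>

// CPU clocks needed to reload every register once (8 bits per register).
constexpr uint32_t clocksPerReload(uint32_t clocksPerBit, uint32_t registers) {
  return clocksPerBit * 8u * registers;
}

// Full reloads per second when cpuPercent of the CPU time is spent shifting.
constexpr uint32_t reloadsPerSecond(uint32_t cpuHz, uint32_t clocks, uint32_t cpuPercent) {
  return (cpuHz / 100u) * cpuPercent / clocks;
}
```

For example, `reloadsPerSecond(8000000, clocksPerReload(20, 20), 50)` evaluates to 1250, which is the "1 kHz" ballpark given above.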
How to insert images: https://forum.arduino.cc/index.php?topic=519037.0

septillion

Yeah, that's what I said.

And now that you mention the "val&(1<<(7-i))" statement, be sure to use LSB first; that's way faster because you can shift as you go. Just flip the logic/variables you use to match. That would speed things up a bit. Not SPI levels, but faster :)

CrossRoads

#14
May 11, 2017, 10:15 pm Last Edit: May 11, 2017, 10:16 pm by CrossRoads
"I looked into hardware SPI, but I am afraid my PCB has the wrong pins utilized for it? My registers are on latchPin = 8, clockPin = 12, dataPin = 11. Hardware SPI on the Atmega328 is MOSI = 11, MISO = 12, SCK = 13. "

Pity, another waste of the fast internal hardware.
Did you know that this takes just 17 clocks?
Code: [Select]

#define nop asm volatile ("nop")  // assumed helper macro; the post uses nop without defining it
SPDR = dataArray[x]; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; // wait out the transfer


That's just over 1 µs at 16 MHz, just over 2 µs at 8 MHz.
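
For comparison with the bit-banged estimate earlier in the thread, the same figures can be worked out for all 20 registers. This is plain arithmetic on the 17-clock figure quoted above, not a measurement, and `clocksToNs` is a name made up for this sketch.

```cpp
#include <stdint.h>

// Duration of a number of CPU clocks in nanoseconds (rounded down).
constexpr uint32_t clocksToNs(uint32_t clocks, uint32_t cpuHz) {
  return (uint32_t)((uint64_t)clocks * 1000000000ULL / cpuHz);
}
```

One byte at 17 clocks comes to 2125 ns at 8 MHz, and 20 bytes (340 clocks) to 42500 ns, compared with the 3200 clocks estimated for the software approach.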
Designing & building electrical circuits for over 25 years.  Screw Shield for Mega/Due/Uno,  Bobuino with ATMega1284P, & other '328P & '1284P creations & offerings at  my website.
