I have a board that has already been designed and produced. It was designed to use an ATmega328 running on its internal clock, and there is no space on the board for an external clock source. After building and testing the boards, I found that 8 MHz is too slow to complete my main loop seamlessly. The main loop involves shifting data out to 20 registers. Is there any way to speed this up without going back to the drawing board on the PCB design? Can another AVR with a faster internal clock be substituted as a drop-in replacement? Can the internal clock be made to run faster, or doubled to reach 16 MHz? Or can code optimization double the speed of my display loop?
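To make the code-optimization option concrete, here is a minimal sketch of the kind of change I have in mind: writing to the port register directly instead of going through the Arduino pin functions. It is an illustration under assumptions, not my actual loop. It assumes the pins used on this board (latch 8, clock 12, data 11), which correspond to PB0, PB4 and PB3 on the ATmega328, that the pins are already configured as outputs, and that bits go out MSB first. The names `shiftOutFast` and `writeFrame` are hypothetical, and the non-AVR branch is only a stand-in so the code can be compiled off-target.

```cpp
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>            // real PORTB register on the ATmega328
#else
static uint8_t PORTB = 0;      // host-side stand-in so the sketch compiles off-target
#endif

// Assumed mapping (Arduino pin -> port B bit): 8 -> PB0, 11 -> PB3, 12 -> PB4.
static const uint8_t LATCH_BIT = 0;
static const uint8_t DATA_BIT  = 3;
static const uint8_t CLOCK_BIT = 4;

// Shift one byte out, MSB first, using direct port writes.
static inline void shiftOutFast(uint8_t value) {
    for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
        if (value & mask) {
            PORTB |= (1 << DATA_BIT);
        } else {
            PORTB &= ~(1 << DATA_BIT);
        }
        PORTB |= (1 << CLOCK_BIT);    // rising edge clocks the bit into the register
        PORTB &= ~(1 << CLOCK_BIT);
    }
}

// Send a whole buffer (e.g. 20 bytes for 20 registers), then latch once.
static void writeFrame(const uint8_t *buf, uint8_t len) {
    PORTB &= ~(1 << LATCH_BIT);       // hold outputs while shifting
    for (uint8_t i = 0; i < len; ++i) {
        shiftOutFast(buf[i]);
    }
    PORTB |= (1 << LATCH_BIT);        // rising edge transfers the data to the outputs
}
```

With something like this, the display code would call `writeFrame(buffer, 20)` once per refresh step. Whether the saving is enough to remove the flicker at 8 MHz would have to be measured on the board.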
The application is a matrix display with 50 columns and 7 rows. The registers are the display buffers and represent the current state of the display. Because of the multiplexing in the matrix, only 1/5 of the pixels are illuminated at any given time. To trick the eye into seeing them all lit at once, the refresh needs to be faster. The display flickers with the internal clock. If I connect an UNO board instead of using the on-board ATmega328 (QFP), it works well, so the difference between 8 MHz and 16 MHz is visible. If I can double the speed of the display loop, then that is a solution. Below is the loop for reference; latchPin = 8, clockPin = 12, dataPin = 11.