More speed from ATMEGA328 Internal Clock

Yeah, that's what I thought. The initial prototype worked, but it used an Uno as the controller and therefore ran from a 16 MHz clock. Now we have a production run of boards that run the ATmega328 from its internal clock, and there are too many of them to correct manually with 'wires'. I am hoping software optimization can swoop in and save the day, so any ways to make the software faster would be helpful. If I can eliminate half the instructions, or double the speed some other way, I should be good to go.
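
To put rough numbers on "half the instructions", here is a minimal sketch of the cycle budget at each clock. It assumes the internal oscillator runs at its nominal 8 MHz with no clock prescaler applied, which is an assumption and not something measured on these boards. The helper name `cycles_in_window` and the 10 µs window in the usage note are made up for illustration.

```c
#include <stdint.h>

/* Clock rates in hertz. The 16 MHz figure is the Uno prototype's clock;
   the 8 MHz figure is an ASSUMPTION (nominal internal RC oscillator,
   no prescaler), not a measured value. */
#define UNO_CLOCK_HZ      16000000u
#define INTERNAL_CLOCK_HZ  8000000u

/* Hypothetical helper: how many CPU cycles fit into a time window of
   `window_us` microseconds at a clock of `clock_hz` hertz.
   The multiply is done in 64 bits so it cannot overflow. */
static uint32_t cycles_in_window(uint32_t clock_hz, uint32_t window_us)
{
    return (uint32_t)(((uint64_t)clock_hz * window_us) / 1000000u);
}
```

With a hypothetical 10 µs deadline, `cycles_in_window(UNO_CLOCK_HZ, 10)` gives 160 cycles and `cycles_in_window(INTERNAL_CLOCK_HZ, 10)` gives 80. Under that assumption, any code path that used more than half of its window on the Uno would have to shed the difference to meet the same deadline on the production boards.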