The datasheet shows operating parameters at 3.3 V, but are there any downsides? Many pins on the ATmega328 are going into a 3.3 V FPGA, and I don't want a boatload of level translators.
Software is being developed on an UNO R3, but the target hardware is an ATmega328 programmed with the HEX file from the UNO.
See section 28.4 of the datasheet (very little info is provided; I've seen more somewhere, but it might have been an errata sheet). The lower the supply voltage, the lower the maximum frequency at which you can run the CPU. For 3.3 V, I think you'll need to get down to around 10 MHz. If you're using the UART, TWI, SPI, etc., you'll need to look at the effects of the CPU clock change.
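To put rough numbers on the clock-change effect: a HEX file built for one clock but run on another has its timing scaled by the ratio of the two clocks. The sketch below assumes the firmware derives its baud rates and delays from a compile-time clock constant (as the UNO build does for 16 MHz); the function names are made up for illustration.

```c
/* Hypothetical helpers: timing of firmware built for f_build_hz
 * when the chip actually runs at f_actual_hz. */

/* Baud rate actually produced on the wire. */
unsigned long long scaled_baud(unsigned long long nominal_baud,
                               unsigned long long f_build_hz,
                               unsigned long long f_actual_hz)
{
    /* The divider is fixed at build time, so baud scales with the clock. */
    return nominal_baud * f_actual_hz / f_build_hz;
}

/* Real duration of a delay that the code believes is nominal_ms long. */
unsigned long long scaled_delay_ms(unsigned long long nominal_ms,
                                   unsigned long long f_build_hz,
                                   unsigned long long f_actual_hz)
{
    /* A slower clock stretches every timer tick by the same ratio. */
    return nominal_ms * f_build_hz / f_actual_hz;
}
```

For example, a 16 MHz build run at 8 MHz would send "9600 baud" at 4800 and turn a 1000 ms delay into 2000 ms, so the code should be rebuilt for the target clock.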
Typically, when running at 3.3 V, a frequency of 8 MHz is used. One example is the 3.3 V Pro Mini, which is "retired", but variants are still available from places like Adafruit, and clones are on Amazon.
The main use was lower power, but it could be a solution to your problem if the lower performance is tolerable.
You can modify an Arduino UNO (or copy) to run at 3v3. I have one on the bench that I modified. The basics are here:
You would be outside the manufacturer's specs running 3v3 at 16 MHz. Assuming the slope is linear in the speed grade graph in the datasheet, the maximum recommended clock for 3v3 is around 13.333 MHz.
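For anyone who wants to check that figure, here is a minimal sketch of the interpolation. The two breakpoints (10 MHz at 2.7 V and 20 MHz at 4.5 V) are my reading of the ATmega328P speed grade graph and should be verified against your datasheet revision; the function name is made up for illustration.

```c
/* Assumed breakpoints from the datasheet speed grade graph; verify them. */
#define V_LOW   2.7   /* volts */
#define F_LOW  10.0   /* MHz allowed at V_LOW */
#define V_HIGH  4.5   /* volts */
#define F_HIGH 20.0   /* MHz allowed at V_HIGH and above */

/* Hypothetical helper: maximum clock in MHz for a supply voltage in volts.
 * Returns -1.0 below V_LOW, which this sketch does not cover. */
double max_clock_mhz(double vcc)
{
    if (vcc >= V_HIGH)
        return F_HIGH;
    if (vcc < V_LOW)
        return -1.0;
    /* Straight line between the two breakpoints. */
    return F_LOW + (vcc - V_LOW) * (F_HIGH - F_LOW) / (V_HIGH - V_LOW);
}
```

Calling `max_clock_mhz(3.3)` gives roughly 13.33 MHz, matching the estimate above.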
That sits comfortably above the "magic" 11.0592 MHz crystal frequency, which divides down evenly for lots of UART baud rates.
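To see why 11.0592 MHz is "magic", here is a rough sketch of the baud-rate divider arithmetic for the AVR USART in normal-speed mode (U2X = 0), where baud = f_cpu / (16 * (UBRR + 1)). The helper names are made up for illustration.

```c
/* Hypothetical helper: nearest UBRR value for a clock (Hz) and target baud,
 * assuming normal-speed mode (U2X = 0). */
unsigned long ubrr_for(unsigned long f_cpu_hz, unsigned long baud)
{
    /* Integer rounding of f_cpu / (16 * baud), then subtract 1. */
    return (f_cpu_hz + 8UL * baud) / (16UL * baud) - 1UL;
}

/* Hypothetical helper: baud rate error in percent for that UBRR value. */
double baud_error_pct(unsigned long f_cpu_hz, unsigned long baud)
{
    unsigned long ubrr = ubrr_for(f_cpu_hz, baud);
    double actual = (double)f_cpu_hz / (16.0 * (double)(ubrr + 1UL));
    return (actual / (double)baud - 1.0) * 100.0;
}
```

At 11.0592 MHz and 115200 baud the divider comes out as exactly 6 (UBRR = 5) with zero error, while 8 MHz at the same baud rate lands on UBRR = 3 with an error of about 8.5%, which is generally too far off for reliable communication.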