Power consumption of ATMEGA32U4 during sleep Power Down higher than expected

Well... not a fix for your issue specifically, but I ended up changing MCUs.

Figuring out how to set all the registers for the best power saving on the 32U4 was well beyond my pay grade, but I did achieve success on a SAMD21, specifically a Seeeduino XIAO. The standby (aka sleep) current is in the mid-20s of µA, with the only hardware change being removal of the power LED. Perhaps even more interesting is CPU clock scaling, which can be changed on the fly when not sleeping: current is about 12 mA when running at 48 MHz, down to about 1.8 mA at 4 MHz (clock divisor of 12). Ex:

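    // Divide generic clock generator 0 (the main/CPU clock) by 12: 48 MHz -> 4 MHz
    GCLK->GENDIV.reg = GCLK_GENDIV_DIV(12) |
                       GCLK_GENDIV_ID(0);
    while (GCLK->STATUS.bit.SYNCBUSY);  // wait for the register write to synchronize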

The only downside I can see is that nothing else in the code compensates for the actual F_CPU, so things like delay(1) actually delay 12 ms when using a clock divisor of 12. That breaks timing-critical things, including USB, so you need to disconnect USB before messing with the clock.
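If you only need rough delays while the clock is divided, you can scale the argument yourself. Here is a minimal sketch of that arithmetic in plain C. The helper names (`effective_hz`, `stretched_delay_ms`, `compensated_delay_arg`) are hypothetical, not part of the Arduino core, and it assumes every timing function stretches by exactly the divisor, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; none of these exist in the Arduino core. */

/* Effective CPU frequency after applying the clock divisor. */
static uint32_t effective_hz(uint32_t base_hz, uint32_t divisor)
{
    assert(divisor != 0);
    return base_hz / divisor;
}

/* Real time that elapses when delay(requested_ms) runs on the divided clock. */
static uint32_t stretched_delay_ms(uint32_t requested_ms, uint32_t divisor)
{
    assert(divisor != 0);
    return requested_ms * divisor;
}

/* Value to pass to delay() so that roughly wanted_ms of real time elapses.
   Rounds to the nearest step; the resolution is one divisor's worth of ms. */
static uint32_t compensated_delay_arg(uint32_t wanted_ms, uint32_t divisor)
{
    assert(divisor != 0);
    return (wanted_ms + divisor / 2) / divisor;
}
```

With a divisor of 12, `delay(compensated_delay_arg(120, 12))` would ask for 10 ticks and wait about 120 ms of real time. Anything shorter than about half the divisor rounds down to zero, so this is only good for coarse timing.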

That might not be the answer you were looking for but in case it helps someone...