Q: Protocol/communication alternatives for wireless telemetry and remote control?

What are viable options and alternatives you can think of to accomplish this kind of wireless communication for telemetry and remote control?

Goal:
Interface a Raspberry Pi controller with sensors/remotes. The main constraints are on the ATmega side: sketch size and RAM usage. The RPi will send commands to, and receive notifications and telemetry from, these ATmegas.

Brief background:
I aim to build sensors based on the ATmega (currently an Arduino Pro Mini) and the nRF24L01+. Some will also have remotely controllable devices connected, such as lights.
They will communicate with a Raspberry Pi running Python (preferably), which interfaces with its own nRF24L01+ over hardware SPI.
I'm using AES-256 in CBC mode for encryption and SipHash-2-4 as a MAC to protect the data over the air from eavesdropping, MITM and replay attacks. (This may sound like overkill, but it has been fairly straightforward and simple.)

The nRF24L01+ can send at most 32 bytes per transmission, so my initial 32-byte packet will contain an 8-byte start sequence (which includes the number of encrypted blocks), a 16-byte initialization vector for AES, and an 8-byte MAC over the first 24 bytes.
After this, the encrypted blocks will arrive as 2x16 bytes per transmission, ending with a MAC. Alternatively, the blocks may be interleaved with MACs so that I can start decrypting without waiting for the whole sequence, which reduces the need to buffer data. I will only decrypt data that has a valid MAC.
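
To make the header layout above concrete, here is a minimal Python sketch for the Raspberry Pi side. Several details are assumptions for illustration only: the 8-byte start sequence is split into a hypothetical 7-byte marker (`MAGIC`) plus a 1-byte block count, and keyed BLAKE2b truncated to 8 bytes stands in for SipHash-2-4, since the Python standard library does not expose SipHash.

```python
import hashlib
import hmac
import struct

MAGIC = b"TLMSTRT"                 # hypothetical 7-byte start marker
HEADER = struct.Struct("<7sB16s")  # marker, block count, AES IV = 24 bytes
MAC_LEN = 8
PACKET_LEN = 32                    # nRF24L01+ maximum payload


def mac8(key, data):
    """8-byte MAC. Stand-in for SipHash-2-4 (assumption, not the real MAC)."""
    return hashlib.blake2b(data, key=key, digest_size=MAC_LEN).digest()


def build_header(key, n_blocks, iv):
    """Pack start sequence + IV, then append the MAC over those 24 bytes."""
    body = HEADER.pack(MAGIC, n_blocks, iv)
    return body + mac8(key, body)


def parse_header(key, packet):
    """Verify the MAC first; only then interpret the header fields."""
    if len(packet) != PACKET_LEN:
        raise ValueError("header packet must be 32 bytes")
    body, tag = packet[:HEADER.size], packet[HEADER.size:]
    if not hmac.compare_digest(tag, mac8(key, body)):
        raise ValueError("bad MAC")
    magic, n_blocks, iv = HEADER.unpack(body)
    if magic != MAGIC:
        raise ValueError("bad start marker")
    return n_blocks, iv
```

The same fixed offsets (7 + 1 + 16 + 8 bytes) could be mirrored on the ATmega without a parser, which is the main reason to keep the header rigid.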

Constraints:
My largest sketch so far, with most debugging code disabled, takes up roughly 19 of 30 kB program storage and 1.5 of 2 kB RAM. It is mostly done, and it already makes as many of the intended calls to external libraries as possible to give a complete picture, although I have not yet implemented everything in detail.
Preferably, it should be possible to process the incoming data as a stream, 16 or 32 bytes at a time. If I can avoid parsing everything at once, I need less buffer space. (RAM is precious on the ATmega.)
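
One way to picture the streaming requirement is the interleaved-MAC variant: each frame carries one 16-byte ciphertext block plus its own 8-byte MAC, so the receiver never holds more than one frame. The sketch below is only an illustration under assumptions that are not part of my current design: the 16 + 8 byte frame layout, chaining each MAC over the previous tag (to catch reordered frames), and keyed BLAKE2b as a stand-in for SipHash-2-4. Decryption itself is left out.

```python
import hashlib
import hmac

BLOCK_LEN = 16                     # one AES block
MAC_LEN = 8
FRAME_LEN = BLOCK_LEN + MAC_LEN    # hypothetical frame: block + MAC


def mac8(key, data):
    """8-byte MAC. Stand-in for SipHash-2-4 (assumption)."""
    return hashlib.blake2b(data, key=key, digest_size=MAC_LEN).digest()


def make_frames(key, header_tag, blocks):
    """Sender side: tag each block, chaining on the previous tag."""
    prev = header_tag
    for block in blocks:
        tag = mac8(key, prev + block)
        yield block + tag
        prev = tag


def verified_blocks(key, header_tag, frames):
    """Receiver side: yield one verified block at a time.

    Only the current frame and the previous tag are kept in memory.
    """
    prev = header_tag
    for frame in frames:
        if len(frame) != FRAME_LEN:
            raise ValueError("unexpected frame length")
        block, tag = frame[:BLOCK_LEN], frame[BLOCK_LEN:]
        if not hmac.compare_digest(tag, mac8(key, prev + block)):
            raise ValueError("bad MAC, refusing to decrypt")
        yield block                # safe to hand to the AES-CBC decryptor
        prev = tag
```

The trade-off is that this layout uses only 24 of the 32 available bytes per transmission, in exchange for a receive buffer of a single frame.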

Current alternatives:

  • Google Protocol Buffers using nanopb - adds roughly 7 kB program storage for encode/decode of simple sensor data.
    Benchmarks indicate that the stack usage is potentially too high for me, although those benchmarks reportedly use a very large message with all field types represented.
    The overhead is high, but this is otherwise a very tempting and flexible option.

  • Adafruit_Sensor framework, or something similar, with simple (and rigid) data structure.
    Will require a little bit more effort to design and implement for both ATmega and Python on Raspberry Pi.
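
To illustrate what the second, rigid option could look like on the Python side, here is a minimal sketch using the standard `struct` module. The field names and the 16-byte layout are hypothetical, chosen only so that one record fills exactly one AES block; they do not come from Adafruit_Sensor.

```python
import struct
from collections import namedtuple

# Hypothetical fixed record, little-endian to match the AVR:
#   uint8  node_id
#   uint8  msg_type
#   uint16 seq
#   int32  value_centi   (reading scaled by 100, avoids floats)
#   uint32 uptime_s
#   4 padding bytes      (fills the record to one 16-byte AES block)
RECORD = struct.Struct("<BBHiI4x")

Reading = namedtuple("Reading", "node_id msg_type seq value_centi uptime_s")


def encode(reading):
    """Pack a Reading into its 16-byte wire form."""
    return RECORD.pack(*reading)


def decode(payload):
    """Unpack 16 bytes back into a Reading."""
    return Reading._make(RECORD.unpack(payload))
```

On the ATmega the counterpart would be a plain C struct with the same field order, so encoding and decoding there amount to a memory copy instead of a parser.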

(See initial question at the top. :))