In the thread Faster Pin I/O on Zero? (post #39 by OMac), @OMac posted two useful functions, digitalWrite_fast and digitalRead_fast, in response to a question from @wholder. I measured them: the pin toggle rate goes up from 350 kHz to 1.4 MHz.
The thread is closed, so I can't reply there, but my solution might be interesting for others as well. Here is my version in C++, which makes the code much more readable and brings the toggle rate up to 3.4 MHz (see the last code block at the bottom):
Left as an exercise for the reader: integrate the pinMode call.
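
For anyone who wants a starting point for that exercise, here is a rough sketch of one way it could go. It is not the code from this post: `FastOutputPin`, `MockPortGroup` and all method names are made up for illustration, and the mock stands in for a SAMD21 port group so the logic can be tried on a desktop compiler. On the board, the four mock methods would presumably become writes to the `DIRSET`, `OUTSET`, `OUTCLR` and `OUTTGL` registers of the pin's port group.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for one SAMD21 port group.
// On real hardware these would be memory-mapped register writes.
struct MockPortGroup {
    uint32_t dir = 0;  // direction bits, 1 = output
    uint32_t out = 0;  // output latch

    void dirset(uint32_t mask) { dir |= mask; }   // like writing DIRSET
    void outset(uint32_t mask) { out |= mask; }   // like writing OUTSET
    void outclr(uint32_t mask) { out &= ~mask; }  // like writing OUTCLR
    void outtgl(uint32_t mask) { out ^= mask; }   // like writing OUTTGL
};

// Hypothetical pin wrapper. begin() takes over the job of
// pinMode(pin, OUTPUT), so the sketch needs no separate pinMode call.
template <typename Port>
class FastOutputPin {
public:
    FastOutputPin(Port& port, uint8_t bit)
        : port_(port), mask_(uint32_t(1) << bit) {}

    // Call from setup(): configures the pin as an output.
    void begin() { port_.dirset(mask_); }

    void high()   { port_.outset(mask_); }
    void low()    { port_.outclr(mask_); }
    void toggle() { port_.outtgl(mask_); }
    void write(bool value) { value ? high() : low(); }

private:
    Port& port_;
    const uint32_t mask_;  // precomputed once, so each write is a single store
};
```

The direction setup sits in `begin()` rather than in the constructor on purpose: constructors of global objects run before the Arduino core has initialised the hardware, so calling `begin()` from `setup()` is the safer place.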