It does not work without a line of code between the digitalWrite calls; it looks like there is then no change at all on the line.
Are you sure? This is what a digitalWrite(HIGH) followed by a digitalWrite(LOW) looks like on my Arduino without any assembler nops. The pulse width is 4.5 microseconds (much longer than needed but, as you say, fast enough for most applications).
The shorter pulse that follows was created using direct port I/O, and that one would need nops to increase the pulse width (from 126 nanoseconds to around 1 microsecond).
The horizontal scale used above is 1 microsecond per division.
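For anyone who wants to reproduce the two pulses, here is a rough sketch of the kind of code involved. It is not the exact code behind the capture: it assumes a 16 MHz ATmega328P board (such as an Uno) where digital pin 13 is bit PB5 of PORTB, and the helper name extraNops is made up for this example. The helper just works out how many one-cycle nops would stretch the 126 ns direct-port pulse to roughly 1 microsecond.

```cpp
// Assumptions: 16 MHz ATmega328P (e.g. Uno), pin 13 == PORTB bit 5.
// extraNops() is a hypothetical helper, not part of the Arduino API.

// Number of single-cycle NOPs needed to stretch a pulse from currentNs
// to at least targetNs at the given CPU clock (rounded up).
constexpr unsigned long extraNops(unsigned long cpuHz,
                                  unsigned long currentNs,
                                  unsigned long targetNs) {
  return targetNs <= currentNs
             ? 0UL
             : ((targetNs - currentNs) * (cpuHz / 1000000UL) + 999UL) / 1000UL;
}

// At 16 MHz one cycle is 62.5 ns, so 126 ns -> ~1 us needs 14 NOPs.
static_assert(extraNops(16000000UL, 126, 1000) == 14, "expected 14 NOPs");

#if defined(__AVR__)  // hardware part only builds for AVR targets
#include <Arduino.h>

void setup() {
  pinMode(13, OUTPUT);
}

void loop() {
  // Pulse 1: plain digitalWrite calls, no padding in between.
  digitalWrite(13, HIGH);
  digitalWrite(13, LOW);

  // Pulse 2: direct port I/O, padded with 14 NOPs (7 + 7).
  PORTB |= _BV(PB5);   // pin 13 high
  __asm__ __volatile__("nop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\t");
  __asm__ __volatile__("nop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\t");
  PORTB &= ~_BV(PB5);  // pin 13 low

  delay(1);            // gap so each pair of pulses is easy to trigger on
}
#endif
```

Removing the two nop lines should give back the short direct-port pulse; the exact widths you see on a scope will depend on the board, clock and compiler settings.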