Execution time for a single-cycle machine instruction is 62.5 ns on a 16 MHz board. The time between two digitalWrite() calls is on the order of several microseconds, which is "years" away from your "a few ns" requirement.
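The arithmetic behind those figures is simple: one clock period is the reciprocal of the clock frequency. A minimal host-side sketch of the conversion (the helper names are hypothetical, and the 4 µs gap used below is only an illustrative figure for two digitalWrite() calls, not a measurement):

```c
#include <assert.h>

/* Hypothetical helper: length of one clock period in nanoseconds. */
double ns_per_cycle(double clock_hz)
{
    assert(clock_hz > 0.0);
    return 1e9 / clock_hz;
}

/* Hypothetical helper: how many clock cycles fit into an interval given in ns. */
double cycles_in(double interval_ns, double clock_hz)
{
    return interval_ns / ns_per_cycle(clock_hz);
}
```

At 16 MHz, ns_per_cycle(16e6) gives 62.5 ns. An assumed 4 µs gap between two digitalWrite() calls is cycles_in(4000.0, 16e6) = 64 cycles, while a budget of a few ns is less than a single cycle.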
If you connect the two (or more) pins to the same ATmega output port (in direct I/O, one port addresses up to 8 I/O pins), you can switch them on or off at the same time with a single instruction. The delay on a high-to-low or low-to-high transition is then set by the rise/fall times specified in the ATmega datasheet.
Example:
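// make digital 2 and digital 3 outputs first (or call pinMode(..., OUTPUT) once in setup())
DDRD |= _BV(DDD2) | _BV(DDD3);
// switch on digital 2 and digital 3 (force high)
PORTD |= _BV(PORTD2) | _BV(PORTD3);
// switch off digital 2 and digital 3 (force low)
PORTD &= ~(_BV(PORTD2) | _BV(PORTD3));
// toggle digital 2 and digital 3 (on the ATmega328P, writing 1s to PIND flips the matching PORTD bits)
PIND = _BV(PIND2) | _BV(PIND3);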
Writing to the port register in one parallel operation, as you suggest, is definitely going to be better than digitalWrite(), where successive calls are many clock cycles apart. Assuming you pick pins on the same port register, they will transition on the same clock edge.
Any remaining difference then comes down to differences between the transistors in the output stages of the I/O pins, which will also be affected by differences in how each pin is loaded.
No, not an H-bridge.
I have to trigger two different scientific instruments with a maximum jitter of a few ns.
I can't use a single pin because I have to alternate between triggering only one instrument and triggering both.
When you say that I have to pick ports on the same register to get transitions on the same clock cycle edge, what exactly do you mean?
Do you have an idea of the possible jitter if I do that?
To BenF:
I can't find this information in the datasheet, can you help me?
It tells you which pins are on which port registers.
For example, PORTD maps to Arduino digital pins 0 to 7.
When you write an 8-bit value to one of these registers, it drives high every output whose bit in that byte is 1 (and low every output whose bit is 0), and it does that simultaneously (as much as anything is simultaneous).
But if you pick pins on different port registers, you lose this benefit, because each register needs its own write instruction.
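To check whether two pins can be switched with one write, you need to know which port and bit each one uses. A minimal host-side sketch, assuming an Arduino Uno (ATmega328P: digital 0-7 on PORTD, 8-13 on PORTB bits 0-5, A0-A5 as digital 14-19 on PORTC bits 0-5); other boards use different mappings, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Port registers on the ATmega328P (Arduino Uno). */
typedef enum { PORT_B, PORT_C, PORT_D, PORT_NONE } avr_port;

/* Hypothetical helper: map an Uno digital pin number (0-19, where 14-19 are
   A0-A5) to its port register, storing the bit number in *bit. */
avr_port uno_pin_port(int pin, int *bit)
{
    assert(bit != 0);
    if (pin >= 0 && pin <= 7)   { *bit = pin;      return PORT_D; }
    if (pin >= 8 && pin <= 13)  { *bit = pin - 8;  return PORT_B; }
    if (pin >= 14 && pin <= 19) { *bit = pin - 14; return PORT_C; }
    *bit = -1;
    return PORT_NONE;
}

/* True when both pins sit on the same port register, so a single write
   can change them on the same clock edge. */
bool same_port(int pin_a, int pin_b)
{
    int bit_a, bit_b;
    avr_port a = uno_pin_port(pin_a, &bit_a);
    avr_port b = uno_pin_port(pin_b, &bit_b);
    return a != PORT_NONE && a == b;
}
```

For example, digital 2 and 3 share PORTD, while digital 7 (PORTD) and 8 (PORTB) do not.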
Well, I didn't find the propagation delay of the I/O output stage in the datasheet either, but is that really your concern? A scope would give you the answer.
As for simultaneous switching, pins on the same port (the same microcontroller register) will switch on the same edge of the same clock cycle, so there is literally no instruction delay between them.
I've used bitWrite() to speed up writing to digital pins for things like SPI, where digitalWrite() is too slow. I've also used the PORT registers directly to send a whole value out to PORTB or PORTD, which I learned when working with PICs.
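A rough host-side sketch of bit-banging one SPI byte with bitWrite()-style operations on a port. Assumptions: a plain uint8_t stands in for PORTB, the BIT_* macros are stand-ins modelled on Arduino's bitSet/bitClear/bitWrite so this compiles on a PC, and the MOSI/SCK bit positions are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for Arduino's bitSet/bitClear/bitWrite (use Arduino.h's on a board). */
#define BIT_SET(value, bit)      ((value) |= (uint8_t)(1u << (bit)))
#define BIT_CLEAR(value, bit)    ((value) &= (uint8_t)~(1u << (bit)))
#define BIT_WRITE(value, bit, b) ((b) ? BIT_SET(value, bit) : BIT_CLEAR(value, bit))

/* Hypothetical pin assignment on the simulated port byte. */
enum { MOSI_BIT = 3, SCK_BIT = 5 };

/* Shift one byte out MSB first, SPI mode 0 style: set MOSI while SCK is low,
   then raise SCK so the receiver samples it. mosi_seen records the MOSI
   level at each rising edge so the result can be checked. */
void shift_out_msb(volatile uint8_t *port, uint8_t data, uint8_t mosi_seen[8])
{
    for (int i = 0; i < 8; i++) {
        int bit = (data >> (7 - i)) & 1;
        BIT_WRITE(*port, MOSI_BIT, bit);
        BIT_SET(*port, SCK_BIT);      /* rising edge: data is sampled */
        mosi_seen[i] = (uint8_t)((*port >> MOSI_BIT) & 1);
        BIT_CLEAR(*port, SCK_BIT);    /* falling edge */
    }
}
```

Shifting 0xA5 (binary 10100101) produces the MOSI sequence 1, 0, 1, 0, 0, 1, 0, 1 and leaves SCK low.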
How often do you have to trigger your target instruments? You don't necessarily need a shift register to toggle the outputs, but by loading the register first and then toggling the latch with /CS, you may be able to reduce the delay. Look at the datasheets for the shift registers you're thinking of using and check their propagation delay.
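The point of "stuffing the register and then toggling the latch" is that the outputs do not change while bits are shifted in; they all update together on the latch pulse. A simplified host-side model, assuming a 74HC595-style part (serial shift stage plus output latch); real chips add setup/hold and propagation timing that this ignores, and the names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified serial-in, parallel-out shift register with an output latch. */
typedef struct {
    uint8_t shift;   /* internal shift stage, filled one bit per clock */
    uint8_t outputs; /* latched values actually seen on the output pins */
} shift_reg;

/* One shift-clock pulse: move the stage left and take data_bit in at bit 0. */
void sr_clock_in(shift_reg *sr, int data_bit)
{
    sr->shift = (uint8_t)((sr->shift << 1) | (data_bit & 1));
}

/* Latch pulse: every output takes its new value at the same moment. */
void sr_latch(shift_reg *sr)
{
    sr->outputs = sr->shift;
}

/* Stuff a whole byte, MSB first, without touching the outputs. */
void sr_load(shift_reg *sr, uint8_t value)
{
    for (int i = 7; i >= 0; i--)
        sr_clock_in(sr, (value >> i) & 1);
}
```

After sr_load(&sr, 0x18) the outputs still hold their old value; only sr_latch(&sr) makes both bits appear at once.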
I would just use port manipulation if you don't have too many outputs. I'd write a function, pass the required pins to it, and have it do the port manipulation for me. You could also stuff a byte with the values, e.g. 00011000 for bits 3 and 4 (counting from bit 0 on the right), then OR the port with it to turn the required pins on, or AND the port with its ones' complement to turn them off. Because the unused bits in the mask are zeroes, you won't disturb the states of the other pins on the port. Lots of options here.
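A minimal sketch of that helper-function approach, run on a plain uint8_t standing in for a port register so it can be tested on a PC. The function names are hypothetical; on an AVR you would presumably pass &PORTD, which is a volatile uint8_t lvalue in avr-libc:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a bit mask from a list of bit numbers (0-7). */
uint8_t make_mask(const int *bits, int count)
{
    uint8_t mask = 0;
    for (int i = 0; i < count; i++) {
        assert(bits[i] >= 0 && bits[i] <= 7);
        mask |= (uint8_t)(1u << bits[i]);
    }
    return mask;
}

/* Turn the masked pins on; zero bits in the mask leave other pins alone. */
void pins_on(volatile uint8_t *port, uint8_t mask)
{
    *port |= mask;
}

/* Turn the masked pins off by ANDing with the ones' complement of the mask. */
void pins_off(volatile uint8_t *port, uint8_t mask)
{
    *port &= (uint8_t)~mask;
}
```

Bits 3 and 4 give the mask 00011000 (0x18); starting from 0x81, pins_on() yields 0x99 and pins_off() restores 0x81, leaving bits 0 and 7 untouched. Each call is still a single read-modify-write of the whole port, so the masked pins change together.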