40kHz is one cycle every 25 µsecs, which is fast by Arduino standards. digitalWrite() takes about 8 µsecs IIRC, but digitalWriteFast() is nearly as quick as direct port manipulation: just a few instructions, so maybe 0.25 µsecs.
It would be easy to write a short test program to measure how long the instructions take. Just time 10,000 or 20,000 calls using micros() and divide the elapsed time by the number of calls.
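Something along these lines should do as a rough sketch of that test. The pin number, the call count, the baud rate and the helper name microsPerCall() are just my assumptions for illustration, and the result includes a little loop overhead, so treat the figure as an upper bound. The Arduino part is wrapped in #ifdef ARDUINO so the arithmetic can also be compiled and checked on a PC.

```cpp
#include <stdint.h>

// Hypothetical helper: average time per call in microseconds, given two
// micros() readings. Unsigned subtraction still gives the right elapsed
// time if micros() rolled over once between the two readings.
float microsPerCall(uint32_t startMicros, uint32_t endMicros, uint32_t calls) {
  uint32_t elapsed = endMicros - startMicros;
  return (float)elapsed / (float)calls;
}

#ifdef ARDUINO
const uint8_t TEST_PIN = 8;          // assumption: any free digital pin
const uint32_t NUM_CALLS = 20000UL;  // 10,000 or 20,000 as suggested above

void setup() {
  Serial.begin(115200);
  pinMode(TEST_PIN, OUTPUT);

  uint32_t start = micros();
  for (uint32_t i = 0; i < NUM_CALLS; i++) {
    digitalWrite(TEST_PIN, HIGH);    // the instruction being timed
  }
  uint32_t end = micros();

  Serial.print("digitalWrite() µsecs per call (incl. loop overhead): ");
  Serial.println(microsPerCall(start, end, NUM_CALLS));
}

void loop() {}
#endif
```

To compare, swap the digitalWrite() line for digitalWriteFast() or a direct port write and run it again. For the very fast versions the loop overhead will be a big part of the total, so it may be worth timing an empty loop as well and subtracting that.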
...R