Timers are a bit complex, yes (though that's also why they are so flexible). You can pulse the output directly in code using delayMicroseconds(), like this:
void loop()
{
    digitalWrite(.., HIGH);
    delayMicroseconds(12);
    digitalWrite(..., LOW);
    delayMicroseconds(13);  // 12us + 13us = 25us, one cycle at 40kHz.
}
Except that at these speeds, the time spent in the digitalWrite() calls means the output will toggle somewhat slower than 40kHz. You can try tuning the delay values to compensate; values of 6 and 7 seem to work best.
Or take a more robust approach that tracks elapsed time with micros(), like this:
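unsigned long last_time;

void setup()
{
    pinMode(3, OUTPUT);     // pin 3 must be an output for digitalWrite() to drive it
    last_time = micros();   // initialise here, after the core has started its timers
}

void loop()
{
    while (micros() - last_time < 12)   // wait out the 12us low phase
    {}
    digitalWrite(3, HIGH);
    last_time += 12;
    while (micros() - last_time < 13)   // wait out the 13us high phase
    {}
    digitalWrite(3, LOW);
    last_time += 13;
}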
However, this produces a jittery output, as micros() takes some time to execute and, on 16 MHz AVR boards such as the Uno, only has a resolution of 4us.
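The effect of a coarse clock on this kind of loop can be illustrated off-board. The following is a rough model in plain C++ for a PC, not Arduino code: quantized_micros and simulate_edges are made-up names, the 4us step is an assumption matching the micros() resolution documented for 16 MHz AVR boards, and the model ignores the execution time of micros() and digitalWrite() entirely.

```cpp
#include <vector>

// Hypothetical stand-in for micros(): the returned value only advances
// in steps of resolution_us.
static unsigned long quantized_micros(unsigned long true_us, unsigned long resolution_us)
{
    return (true_us / resolution_us) * resolution_us;
}

// Runs the same 12us / 13us wait logic as the micros() loop against the
// quantized clock and returns the true time (in us) of each output edge.
// Even-numbered edges are rising, odd-numbered edges are falling.
std::vector<unsigned long> simulate_edges(int edge_count, unsigned long resolution_us)
{
    std::vector<unsigned long> edges;
    unsigned long now = 0;         // idealized true time, 1us per poll
    unsigned long last_time = 0;
    for (int i = 0; i < edge_count; ++i) {
        unsigned long wait = (i % 2 == 0) ? 12 : 13;
        while (quantized_micros(now, resolution_us) - last_time < wait) {
            ++now;                 // busy-wait, as in the original loop
        }
        edges.push_back(now);      // the pin would toggle here
        last_time += wait;
    }
    return edges;
}
```

With a 4us step the edges land at 12, 28, 40, 52, 64, 76, 88, 100, 112, 128us, so in every four cycles one is 28us long and three are 24us long, averaging the intended 25us. With a 1us step every cycle is exactly 25us. Real hardware adds call overhead on top of this.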