Multi-channel PWM outputs (square wave), what code?

What kind of code would be best for multi-channel PWM? This is for a 2ms pulse, technically 40% duty cycle at 200Hz (5ms period) - I only care that the pulses are 2.000ms long, I don't really care when they occur. So far I've seen things like the Fast digital I/O Library, the Timer2_Counter Library, direct port manipulation (a bit alien to me), etc. I ask because I've read that digitalWrite takes 4-5.8us to actually switch the pin state, and that micros only has a resolution of 4us, sometimes rounding up or down, resulting in 8-16us of variance. I'm looking for less than 2us max.

I'm designing a microcontroller that's used to test multiple high-speed solenoid valves at a time (multiple output pins). Specifically it's used to match large batches into matched sets. To do so, I'll be sending a square wave to each in sequence (turn one on & off, then repeat with the next) with a 2ms pulse. My goal is to get the pulses to be w/in 0.1% or better but w/o some of the low-level code I've seen. That means a variance of 1us. I don't care exactly WHEN the pulse happens, I only care how long the signal is HIGH.

This is for a 2ms pulse, technically 40% duty cycle at 200Hz

Technically, that would be "rectangular", not "square".

If you're worried digitalWrite is too slow, you could use direct port manipulation.

digitalWrite() and digitalWriteFast() won't be accurate enough. You need to use a timer library or address the timers directly to get that level of precision.
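To put numbers on why a hardware timer can reach that precision, here is a rough worked calculation. It assumes a 16MHz clock and a 16-bit timer such as Timer1 on an ATmega328P; neither was confirmed by the original poster, and the names pulseTicks and fitsTimer16 are made up for illustration. With prescaler 1 each tick is 62.5ns, so a 2ms pulse is 32000 ticks and the 1us tolerance is 16 ticks wide.

```cpp
#include <stdint.h>

// Assumption: 16 MHz system clock (typical Uno/Nano), not confirmed in the thread.
const uint32_t CPU_HZ = 16000000UL;

// Number of timer ticks in a pulse of pulseUs microseconds at the given prescaler.
uint32_t pulseTicks(uint32_t pulseUs, uint32_t prescaler) {
  return (uint32_t)((uint64_t)pulseUs * CPU_HZ / ((uint64_t)prescaler * 1000000ULL));
}

// True if the tick count can be held in a 16-bit compare register.
bool fitsTimer16(uint32_t ticks) {
  return ticks <= 65535UL;
}
```

For example, pulseTicks(2000, 1) gives 32000, which fits in 16 bits, while a full 5ms period at prescaler 1 (80000 ticks) does not and would need a larger prescaler.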

It looks like direct port manipulation will work fine. It is a relatively simple output sequence, only one output HIGH at a time and setting it LOW before changing the next output.
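For anyone else who finds port manipulation alien, a minimal sketch of the idea follows. The helper names pinHigh and pinLow are hypothetical, and the port is passed in by reference so the same code works on a real register (for example PORTD on an Uno, which is an assumption about the board) or on a plain variable for testing.

```cpp
#include <stdint.h>

// Set one bit of an 8-bit port register HIGH, leaving the other bits unchanged.
inline void pinHigh(volatile uint8_t &port, uint8_t bit) {
  port |= (uint8_t)(1u << bit);
}

// Clear one bit of an 8-bit port register (LOW), leaving the other bits unchanged.
inline void pinLow(volatile uint8_t &port, uint8_t bit) {
  port &= (uint8_t)~(1u << bit);
}
```

On an Uno, digital pin 2 is bit 2 of PORTD, so one pulse would look like pinHigh(PORTD, 2); followed by the delay, then pinLow(PORTD, 2); after the pin has been made an output with DDRD |= 0x04;. Repeating that with the next bit gives the one-output-at-a-time sequence described above.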

Will the delayMicroseconds() function work well enough? I've read it's accurate when the pause is over 3us, and I plan to use delays between 500us and 3000us. I also found a _delay_us() function/library that would probably get me closest, I just haven't used it before.

In the general case, delayMicroseconds() won’t work well enough. For this purpose, it is probably good enough.

It is accurate for very small delays because it uses a different method to time short delays. For longer delays it’s reliant on a timer with only 4us resolution on the 16MHz Arduinos. So delayMicroseconds(500) and delayMicroseconds(501) will usually have the same duration. (But not always - they could be 4us apart too.)
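As a rough illustration of what a 4us time base does to a requested interval, the sketch below models only the rounding. It assumes a counter that ticks once every 4us, as micros() does on 16MHz AVR boards; the function names are invented for this example, and it ignores where in a tick the measurement starts, which is what can add the extra 4us mentioned above.

```cpp
#include <stdint.h>

// Assumption: the time base advances once every 4 us.
const uint32_t TICK_US = 4;

// Whole ticks that fit in the requested interval; any remainder is lost.
uint32_t wholeTicks(uint32_t requestedUs) {
  return requestedUs / TICK_US;
}

// The interval that this tick count actually represents, in microseconds.
uint32_t representableUs(uint32_t requestedUs) {
  return wholeTicks(requestedUs) * TICK_US;
}
```

Under this model, requests of 500, 501, 502 and 503 all come out as 500, and the next distinct value is 504.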

_delay_us() sounds bad. With a name like that it’s not meant to be used in user programs. You should be calling another function, which may actually call this one.

All my delays will be divisible by 4, so I presume that may help if I try using delayMicroseconds().

The _delay_us() comes from the link below. The problem is I can’t find much about this “Atomic.h” beyond the code at the bottom.

Both guys at the bottom said they were able to use it to tune a very accurate pulse. If I have to, I can tune a variety of the target pulses I’m looking for. I’m more concerned about the on-time length than the time the output is off.