A tight loop toggling a single pin with direct port manipulation can reach about 500 kHz, IIRC.
So you have a 10:1 theoretical advantage, but if you need to drive three pins and do clever control stuff as well, it might be a struggle.
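For what it's worth, here is a rough sketch of what "direct port manipulation" for three pins could look like. It assumes an Uno-class board where the three outputs sit on PORTB bits 0..2 (digital pins 8, 9 and 10); that pin choice and the names init_pins, write_three_pins and PIN_MASK are mine, not anything from your setup. The non-AVR branch is only there so the logic can be checked on a PC.

```cpp
#include <assert.h>
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>            // real DDRB / PORTB registers
#else
// Host-side stand-ins so the logic can be compiled and checked off-target.
static volatile uint8_t DDRB = 0;
static volatile uint8_t PORTB = 0;
#endif

// Assumption: the three outputs are PORTB bits 0..2 (pins 8, 9, 10 on an Uno).
static const uint8_t PIN_MASK = 0x07;

// Make the three pins outputs without disturbing the rest of the port.
void init_pins() {
    DDRB |= PIN_MASK;
}

// Update all three pins with one register store; other PORTB bits are preserved.
// The assert is a debug aid only; build with NDEBUG to drop it from a tight loop.
void write_three_pins(uint8_t bits) {
    assert((bits & ~PIN_MASK) == 0);
    PORTB = (uint8_t)((PORTB & ~PIN_MASK) | bits);
}
```

The point is that all three pins change in the same store, so three pins cost about the same as one. It is the control logic around the write that eats into the budget.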
I'm looking to create as sharp a pulse as possible
"Sharp" is not exactly an engineering term, but I'm sure the rise and fall times of a digital pin will be fast enough.
You still need to answer Coding Badly's questions. For example, if 1:256 is fine enough granularity for the duty cycle, some form of table lookup may be the way to go.
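To make the table-lookup idea concrete, here is one possible reading of it, as a sketch only: precompute the high time for each of the 256 duty steps once, so the fast loop just reads an array instead of multiplying and dividing. PERIOD_TICKS is a made-up figure and the function names are hypothetical; substitute whatever your timing turns out to be.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical figure: one pulse period, measured in timer ticks.
static const uint16_t PERIOD_TICKS = 1000;

// on_ticks[d] is the high time for duty step d, i.e. d/256 of the period.
static uint16_t on_ticks[256];
static bool table_ready = false;

// Fill the table once at startup; 32-bit math avoids overflow in d * PERIOD_TICKS.
void build_duty_table() {
    for (uint16_t d = 0; d < 256; ++d) {
        on_ticks[d] = (uint16_t)(((uint32_t)d * PERIOD_TICKS) / 256u);
    }
    table_ready = true;
}

// High time for one period: a single array read in the fast path.
uint16_t high_ticks(uint8_t duty) {
    assert(table_ready);   // debug aid: the table must be built first
    return on_ticks[duty];
}

// Low time is whatever remains of the period.
uint16_t low_ticks(uint8_t duty) {
    return (uint16_t)(PERIOD_TICKS - high_ticks(duty));
}
```

Note that a table like this takes 512 bytes of RAM, which is a fair chunk of an Uno's 2 KB, so it may be worth moving to flash if space gets tight.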
Duty cycle as low as possible.
Does this mean that the duty cycle is not variable, just a constant-width pulse?