I have actually looked it up. Please correct me if I am wrong:
As far as I can tell, printf is basically a wrapper around vfprintf, and vfprintf itself uses both assembly and __ftoa_engine.
Which means I am back where I started.
I do not have much experience with AVR assembly, and given the limited IDE support for assembly I gave up on that route immediately.
I read more about Dragon and Grisu, and I think they are a lot more complex than what we need in Arduino. They introduce new data types, arrays of up to 35 uint32_t (for a 64-bit float). The result is very precise, but I do not think we can afford that much processing for dtostrf. The solution in the second link is slightly better because it is based on itoa: I would have less work to do to come up with a new dtostrf, and the bigger part of it has already been tested.
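To make the itoa-based idea concrete, here is a rough sketch of what I have in mind. It is not the code from the second link and not the avr-libc implementation. The names simple_ftoa and u32_to_dec are made up for illustration, and the function takes no width argument, so it is not a drop-in replacement for dtostrf. It assumes the magnitude fits in a uint32_t, caps the precision at 6 digits, and rounds half up. The "nan" and "ovf" outputs are just my choice for values it cannot handle.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write v in decimal, zero-padded on the left to at
   least min_digits. Returns a pointer to the terminating NUL. On AVR,
   ultoa() from avr-libc could do the unpadded part of this job. */
static char *u32_to_dec(uint32_t v, uint8_t min_digits, char *out)
{
    char tmp[10];                 /* a uint32_t has at most 10 decimal digits */
    uint8_t n = 0;

    do {                          /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n < min_digits && n < sizeof tmp)
        tmp[n++] = '0';           /* leading zeros, e.g. the "05" in 1.05 */
    while (n > 0)
        *out++ = tmp[--n];        /* reverse into the output buffer */
    *out = '\0';
    return out;
}

/* Hypothetical sketch: fixed-point formatting built on integer conversion.
   Assumes out has room for sign + 10 digits + '.' + 6 digits + NUL. */
char *simple_ftoa(float val, uint8_t prec, char *out)
{
    char *p = out;
    uint32_t scale = 1, ipart, fpart;
    uint8_t i;

    if (val != val) {             /* NaN is the only value not equal to itself */
        strcpy(out, "nan");
        return out;
    }
    if (val < 0.0f) {
        *p++ = '-';
        val = -val;
    }
    if (val > 4294967040.0f) {    /* largest float below 2^32; also catches inf */
        strcpy(p, "ovf");
        return out;
    }
    if (prec > 6)
        prec = 6;                 /* keep scale and fpart inside uint32_t */
    for (i = 0; i < prec; i++)
        scale *= 10;

    ipart = (uint32_t)val;        /* truncates toward zero */
    fpart = (uint32_t)((val - (float)ipart) * (float)scale + 0.5f);
    if (fpart >= scale) {         /* rounding carried into the integer part */
        fpart -= scale;
        ipart++;
    }

    p = u32_to_dec(ipart, 1, p);
    if (prec > 0) {
        *p++ = '.';
        u32_to_dec(fpart, prec, p);
    }
    return out;
}
```

With a 20-byte buffer, simple_ftoa(12.25f, 2, buf) should give "12.25". There is no scientific notation and no big-number arithmetic, which is the trade-off I am suggesting: less precision than Dragon or Grisu, but only 32-bit integer math plus a handful of float operations.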
Honestly, deep inside I wish I could use __ftoa_engine(), because then everything would be super easy. sigh.