Arduino onboard USART-to-USB bridge baud rate mismatch

This question is aimed only at those who have been part of the community long enough
AND been involved enough to have significant knowledge of the topic. Please don't bother answering
if you are not such a person. Feel free to read the post for the reasoning behind this...

Background:
I am developing a DC motor driver using an Arduino Uno as a baseboard, with the
(self-designed) motor driver board stacked on top of it. The Arduino receives
control strings of 4 bytes over UART (I will use serial, UART and USART interchangeably, knowing that they
are technically different things). For a start I am using the onboard USB-to-serial bridge that
ships with the Arduino. The real thing will use an Uno board, but for development I stepped up to the Mega2560
for its JTAG debugging capabilities, for which I have a debugger (AVR Dragon), whereas I have nothing for the debugWIRE interface of the Uno. Later I will backport the code to the lower-end chip.

How the problem showed itself (Symptoms) and analysis:
Sending bytes over serial works at any baud rate. Receiving technically works at every baud rate too, but echoing back the received bytes shows that
something goes wrong (on the Uno at 115200 baud; no problems show up at 9600 baud). After switching to the Mega2560 the problem persists. Some digging with the JTAG debugger shows
that the seemingly wrong bytes in the echo are exactly what the processor received, so the send routine is not at fault (the processor transmits what it believes it received); rather, the bytes
are received wrong. Some more digging with a logic analyser shows that the bytes are sent correctly by the onboard USB-to-serial bridge (measured on the TXD and RXD pins of the Mega2560)
and that the processor apparently interprets the (single) stop bit as the most significant data bit, which leads to the garbage data. Since garbage data is
usually the result of mismatched baud rates, I looked at the bit timing more carefully. This revealed that the bit timing of the RX data (coming from the onboard bridge) is
a) too fast (several percent)
b) also inconsistent (ranging from 8.3 us to 8.7 us for 115200 baud)
The code on the main processor side is tested, and the register contents of the USART are consistent so far and do not change unexpectedly, which leads to the assumption that this code is not
at fault. In particular, the content of the UBRR0 register is right (I tested various baud rates between 9600 and 115200; the slower ones work, the faster ones don't). The register contents
are always within +-1 of the setting recommended in the datasheet for the given baud rate and crystal (I suspect the inaccuracies come from the integer division in the UBRR-calculating macro, which is a straight copy from the datasheet as well).
Changing the register to the recommended value where it is not spot on does help for several baud rates, but interestingly for the 115200 baud the UBRR value is spot on at 0x0008 for a 16MHz crystal as recommended but yielding transmission problems.
Thus the fault is most likely in the ATmega16U2 that serves as the USART-to-USB bridge. Interestingly, with an external bridge specifically designed for that purpose (in my case the CP2102), the effective baud rate at 115200 is marginally slower than with the onboard bridge, and everything works just fine.
Regardless of the baud rate, the external bridge yields no transmission problems. Since 115200 is usually not an issue for an Arduino, even with the onboard bridge, I am wondering what goes wrong inside that poor ATmega16U2 to make the baud rates vary so significantly.
This is also something that neither I nor any of my work colleagues have ever encountered before, which makes matters stranger rather than clearer.
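
An off-by-one in UBRR is the kind of deviation that truncating integer division can produce. Below is a minimal host-side sketch in Python (not AVR code; the function names are made up for illustration, and it assumes the normal-mode formula from the AVR datasheet, i.e. U2X = 0) that compares a truncating UBRR calculation with a rounding one for a 16 MHz clock:

```python
F_CPU = 16_000_000  # Hz, assumed crystal frequency (Uno / Mega2560)


def ubrr_truncated(baud, f_cpu=F_CPU):
    """Datasheet-style macro FOSC/16/BAUD-1, evaluated with integer division."""
    return f_cpu // 16 // baud - 1


def ubrr_rounded(baud, f_cpu=F_CPU):
    """Same formula, but rounded to the nearest integer divisor."""
    return (f_cpu + 8 * baud) // (16 * baud) - 1


def actual_baud(ubrr, f_cpu=F_CPU):
    """Baud rate the USART really generates in normal mode (U2X = 0)."""
    return f_cpu / (16 * (ubrr + 1))


if __name__ == "__main__":
    for baud in (9600, 19200, 28800, 38400, 57600, 115200):
        for name, ubrr in (("truncated", ubrr_truncated(baud)),
                           ("rounded", ubrr_rounded(baud))):
            real = actual_baud(ubrr)
            err = (real / baud - 1) * 100
            print(f"{baud:>6} {name:<9} UBRR={ubrr:<3} "
                  f"actual={real:9.1f} error={err:+.1f}%")
```

For 115200 baud the truncating form gives 7 (125000 baud, about 8.5% fast), while the rounding form gives 8 (about 111111 baud, roughly 3.5% slow). Even the best available divisor is therefore noticeably off the nominal rate.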

Now you should be able to guess why this question targets this specific subset of the Arduino community:
Has such behaviour ever been encountered in the many years Arduino has been around? Is anything along those lines known about the specific firmware of the bridge on my board? (I read the flash of that IC into a hex file; you will find its SHA256 below.)
I don't know where to find the changelogs for the USART-to-USB bridge firmware (and honestly I am not willing to read through years of backlogs without knowing whether there is anything to find in the first place, or what exactly to
look for), so I thought I would rather ask whether anybody here recalls such a situation.

SHA256 of the USB to serial bridge firmware:
e8bfe5ce9253c8841ef1ca10a76eda051f1a8ef9f0fc1df9bff46fead917256a

Some words to end:
I am a student in embedded systems programming and have been writing code for over half of my life.
The code for the main processor is tested. Since it works flawlessly with the external bridge, I explicitly do not suspect the problem there, so I don't see any necessity to post it...

Message to the moderators:
Should you find a place within this forum that is better suited for this question, feel free to move it there.

It appears you are being hit by a stack-up of tolerances. I ran into this problem many years ago with some early micros. What I did as a short-term fix was to always send with two or more stop bits and receive with 0.5 or 1 if I could. This compensated for some of the skew. Also, your timing has to be within 1/2 bit time or it will not work. Also look at the divider that is used to determine the USART timing. It most likely is not an exact divide for the baud rate chosen. A simple adjustment of this value may solve your problem. The lower the baud rate, the more room you have for timing errors. This response is to help you get started in solving your problem, not to solve it for you.
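
The half-bit limit can be made concrete with an idealised model. This is a sketch under stated assumptions, not a description of the AVR receiver: it assumes the receiver synchronises on the start edge and takes a single sample in the middle of each bit, whereas real hardware oversamples and majority-votes, which leaves somewhat less margin. The function names are hypothetical.

```python
def sampled_bit_positions(rx_baud, tx_baud, bits_per_frame=10):
    """Where each receiver sample lands, measured in transmitter bit times.

    Index 0 is the start bit, 1..8 are the data bits (LSB first) and 9 is
    the stop bit. A perfectly matched receiver samples at 0.5, 1.5, ..., 9.5.
    """
    ratio = tx_baud / rx_baud  # > 1 means the receiver clock is the slower one
    return [(n + 0.5) * ratio for n in range(bits_per_frame)]


def frame_ok(rx_baud, tx_baud, bits_per_frame=10):
    """True if every sample still falls inside the bit it is meant for."""
    positions = sampled_bit_positions(rx_baud, tx_baud, bits_per_frame)
    return all(n < pos < n + 1 for n, pos in enumerate(positions))


if __name__ == "__main__":
    for rx, tx in ((115200, 115200), (115200, 117647), (111111, 117647)):
        last = sampled_bit_positions(rx, tx)[-1]
        print(f"rx={rx} tx={tx} stop bit sampled at {last:.2f} "
              f"-> {'ok' if frame_ok(rx, tx) else 'corrupted'}")
```

In this model the stop-bit sample drifts by 9.5 times the relative mismatch, so the two ends may differ by only about 5% in total. With a receiver at about 111111 baud and a transmitter at about 117647 baud (a 5.9% mismatch), the sample meant for the most significant data bit already lands on the edge of the stop bit.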
Good Luck & Have Fun!
Gil

for the 115200 baud the UBRR value is spot on at 0x0008 for a 16MHz crystal as recommended but yielding transmission problems.

As recommended, perhaps, but not "spot on."
You're using non-Arduino software, right?
You should be aware that the Arduino software sets the U2X bit, so the correct UBRR value is 16. This results in a "closer" 115200bps bit rate (2.1% off), which is why it was done. If you use 8 without U2X set, you'll get 3.7% error IN THE OTHER DIRECTION. The combined error is more than the ~5% that async serial can easily tolerate.
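
The arithmetic behind these figures can be checked with a short Python sketch. It assumes a 16 MHz clock on both chips and the standard AVR baud rate formula (divisor 8 with U2X set, 16 without); the helper names are made up for illustration.

```python
F_CPU = 16_000_000   # Hz, assumed clock of both the bridge and the target
NOMINAL = 115_200    # requested baud rate


def avr_baud(ubrr, u2x, f_cpu=F_CPU):
    """Actual AVR USART baud rate: divisor is 8 with U2X set, 16 otherwise."""
    return f_cpu / ((8 if u2x else 16) * (ubrr + 1))


def percent_off(actual, reference):
    """Signed deviation of `actual` from `reference`, in percent."""
    return (actual / reference - 1.0) * 100.0


bridge = avr_baud(16, u2x=True)   # Arduino-style setting (U2X set, UBRR = 16)
custom = avr_baud(8, u2x=False)   # datasheet value without U2X (UBRR = 8)

print(f"bridge: {bridge:9.1f} baud, {percent_off(bridge, NOMINAL):+.2f}% "
      f"vs nominal, bit time {1e6 / bridge:.2f} us")
print(f"custom: {custom:9.1f} baud, {percent_off(custom, NOMINAL):+.2f}% "
      f"vs nominal, bit time {1e6 / custom:.2f} us")
print(f"bridge vs custom: {percent_off(bridge, custom):+.2f}%")
```

The U2X setting comes out at about 117647 baud (8.5 us per bit, which sits inside the 8.3 to 8.7 us range measured in the original post), and the non-U2X setting at about 111111 baud (9.0 us per bit). Relative to nominal, the latter is about 3.5% slow; measuring the nominal rate against the actual one instead gives about 3.7%. Either way the two ends differ from each other by nearly 5.9%.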

The 16u2 chip used for USB/serial conversion on the (genuine) MEGA board has the same USART hardware, the same clock rate, and the same baud rate generator as the main processor, and its firmware makes the same U2X decision as the Arduino libraries. So if you use them, the speeds should match exactly and users won't notice a problem. But if you're writing software without the Arduino core libraries, you have to be careful.

(There's an interesting related hack WRT 57.6kbps. Very old Arduino core code would try both with and without U2X and pick whichever was best, which was computationally ridiculous for most common uses.
When they switched to "always use U2X", they discovered that the resulting bit rate was quite far off and didn't work with some existing USB converters. So: see ArduinoCore-avr/cores/arduino/HardwareSerial.cpp at master · arduino/ArduinoCore-avr · GitHub)
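
As a rough illustration of that special case, here is a Python model of how the core picks the divisor. It is paraphrased from memory of `HardwareSerial::begin()`, so treat it as an assumption and check it against the linked source file; the function name `core_baud_setting` is made up.

```python
def core_baud_setting(baud, f_cpu=16_000_000):
    """Return (ubrr, u2x) roughly the way the Arduino AVR core selects them.

    Assumption: this mirrors the logic in HardwareSerial.cpp as remembered;
    the linked file is the authoritative version.
    """
    # Default: double-speed mode (U2X set).
    ubrr = (f_cpu // 4 // baud - 1) // 2
    u2x = True
    # Fallback to normal-speed mode: 57600 on 16 MHz boards is hardcoded,
    # and divisors that do not fit the 12-bit UBRR register also end up here.
    if (f_cpu == 16_000_000 and baud == 57600) or ubrr > 4095:
        u2x = False
        ubrr = (f_cpu // 8 // baud - 1) // 2
    return ubrr, u2x


if __name__ == "__main__":
    for baud in (300, 9600, 57600, 115200):
        print(baud, core_baud_setting(baud))
```

In this model 115200 baud yields UBRR = 16 with U2X set, while 57600 baud yields UBRR = 16 with U2X cleared. What matters for a clean link is that both ends make the same choice.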

PS: keep in mind that the source code for both the Arduino core and the 16u2 firmware is free open-source software, and it is online and browsable.
Arduino Core Serial Driver
Arduino 16u2 USB/Serial Firmware