@dsebastian
I'd like to reemphasize the distinction between atomically modifying the value of a variable and atomically writing/reading a variable's value at a given point in time.
As I mentioned earlier, these issues are different.
When using 8 bit variables there will never be an issue of reading a partially written variable, since there is only a single byte access.
i.e. with 8 bit variables you avoid the case where the ISR reads one byte of a 16 bit (two byte) variable while the foreground is in the middle of updating it, which results in reading a corrupted value.
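To make the partial-read case concrete, here is a minimal host-side sketch (plain C++, not actual AVR code) that models a 16 bit variable as two separate bytes, the way an 8 bit CPU has to store it. The names Shared16, isrRead and tornReadDuringStore are made up for illustration, and the low-byte-first store order is an assumption; the real order depends on the generated code.

```cpp
#include <stdint.h>

// Hypothetical model of a 16 bit variable on an 8 bit CPU: two separate bytes.
struct Shared16 {
    uint8_t lo;
    uint8_t hi;
};

// What an ISR would get if it read both bytes at this instant.
static uint16_t isrRead(const Shared16 &v) {
    return (uint16_t)(((uint16_t)v.hi << 8) | v.lo);
}

// The foreground stores newValue one byte at a time. Returns the value an ISR
// would have read if it had fired between the two byte stores.
static uint16_t tornReadDuringStore(Shared16 &v, uint16_t newValue) {
    v.lo = (uint8_t)(newValue & 0xFF);   // first byte store (assumed order)
    uint16_t seen = isrRead(v);          // simulated interrupt lands here
    v.hi = (uint8_t)(newValue >> 8);     // second byte store
    return seen;
}
```

Starting from 0x00FF and storing 0x0100, the simulated ISR reads 0x0000, which is neither the old value nor the new one.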
That said, an ISR can still read an 8 bit value that is invalid because the foreground was in the middle of updating it, since the update operation can be multiple interruptible instructions involving multiple reads and/or writes of the variable.
Depending on how the code is written, the variable can be re-written to memory multiple times during the update process/calculation, which gives the ISR the opportunity to read interim invalid values.
Example: suppose your index values can only be between 0 and 15.
The interrupt could occur right at the wrap point, when the foreground has bumped the value from 15 to 16 but the mask has not yet forced the index back to zero.
Depending on several factors, the generated code may update the variable in RAM for each part of the calculation,
i.e. the variable in RAM momentarily changes to 16 before it changes to zero.
The likelihood of this happening increases if you use volatile, since that tells the compiler that every access to the variable in the source must be a real read or write of RAM rather than a value held in a register.
Example:
When volatile is not used and you compile for an AVR with the Arduino IDE, these two generate identical code when readIdx is a uint8_t:
readIdx = ((readIdx+1) & BUFFER_MASK);
and
readIdx++;
readIdx &= BUFFER_MASK;
Turn on volatile and the second one will update readIdx in RAM after the increment but before the mask, as well as after the mask.
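A rough way to see the difference without a debugger is to simulate the interrupt by hand. This is only a host-side sketch: simulatedIsr and largestSeen are hypothetical helpers, BUFFER_MASK is assumed to be 0x0F for a 16 entry buffer, and a real interrupt can of course land on any instruction boundary, not just where the call is placed.

```cpp
#include <stdint.h>

static const uint8_t BUFFER_MASK = 0x0F;   // assumed: indexes 0..15
static volatile uint8_t readIdx;
static uint8_t largestSeen;                // largest value the simulated ISR observed

// Stand-in for an ISR that reads the shared index (hypothetical helper).
static void simulatedIsr() {
    uint8_t v = readIdx;
    if (v > largestSeen) largestSeen = v;
}

// Two statements: with volatile, the unmasked value is stored to memory
// after the increment, so an interrupt landing there can observe it.
static void advanceTwoStatements() {
    readIdx++;
    simulatedIsr();          // pretend the interrupt fires between the statements
    readIdx &= BUFFER_MASK;
}

// One expression: the only store to readIdx is the already-masked result.
static void advanceOneExpression() {
    readIdx = (readIdx + 1) & BUFFER_MASK;
    simulatedIsr();          // any interrupt after the store sees the masked value
}
```

With readIdx starting at 15, the two-statement form lets the simulated ISR see 16, while the single-expression form only ever exposes 15 or 0.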
This is just a simple example demonstrating that how code is written can make a difference. As the code gets bigger and more complex it can get harder to detect such changes and interactions.
For example, the optimizations being done are so complex that sometimes making a very small change in a function, or even in an unrelated function, can significantly alter the generated code, including register usage in a function.
In these situations blocking the other thread during an interruptible update is required. i.e. the foreground should mask interrupts while messing with a variable that the ISR "sees".
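A minimal sketch of what masking interrupts around the update can look like. On an AVR it uses SREG and cli() from avr-libc; the irqSave/irqRestore wrappers, the host stand-ins in the #else branch, and BUFFER_MASK being 0x0F are my assumptions for illustration, not code from this thread.

```cpp
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
// Save the status register (which holds the global interrupt flag),
// then disable interrupts.
static inline uint8_t irqSave() { uint8_t s = SREG; cli(); return s; }
// Put the interrupt flag back the way it was, rather than blindly enabling.
static inline void irqRestore(uint8_t s) { SREG = s; }
#else
// Host stand-ins so the sketch can be compiled and tried off-target.
static bool irqEnabled = true;
static inline uint8_t irqSave() { uint8_t s = irqEnabled ? 1 : 0; irqEnabled = false; return s; }
static inline void irqRestore(uint8_t s) { irqEnabled = (s != 0); }
#endif

static const uint8_t BUFFER_MASK = 0x0F;   // assumed: indexes 0..15
static volatile uint8_t readIdx;

// Foreground update: the ISR cannot run between the increment and the mask,
// so it can never observe the interim value 16.
static void advanceReadIdx() {
    uint8_t saved = irqSave();
    readIdx++;
    readIdx &= BUFFER_MASK;
    irqRestore(saved);
}
```

Restoring the saved state instead of unconditionally re-enabling interrupts keeps the function safe to call from code that already has interrupts off. avr-libc's ATOMIC_BLOCK(ATOMIC_RESTORESTATE) in <util/atomic.h> packages the same idea.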
There are cases (most cases) where using volatile is absolutely required.
There are some cases where you can get away without it, and in fact don't want it, in order to improve performance.
There are other cases where reading the variable into a temporary variable can avoid the multiple RAM access issues.
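The temporary variable approach might look something like this. It is only a sketch: advanceViaTemporary is a made-up name and BUFFER_MASK is assumed to be 0x0F. It removes the interim values, but it does not make the whole read-modify-write atomic, so it only helps when the ISR just reads the variable; if the ISR also writes it, you still need to mask interrupts.

```cpp
#include <stdint.h>

static const uint8_t BUFFER_MASK = 0x0F;   // assumed: indexes 0..15
static volatile uint8_t readIdx;

// Do the whole calculation on a local copy so the shared variable is
// read once and written once, and only ever holds a finished value.
static void advanceViaTemporary() {
    uint8_t next = readIdx;                      // single read of the shared variable
    next = (uint8_t)((next + 1) & BUFFER_MASK);  // all work happens on the local copy
    readIdx = next;                              // single write of the final value
}
```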
Once you start to do abnormal things like not using volatile for variables shared between threads (like foreground and ISRs), you really have to know what you are doing, so often it is best to just avoid doing that.
It comes down to understanding threads and how they interact, what operations are interruptible and how to avoid unwanted interruptions.
This is a very complex topic that often takes quite a bit of experience to fully grasp and develop techniques to handle various situations, and even then, people with lots of experience can make mistakes and create subtle bugs that can be hard to track down.
I've been playing with this kind of stuff for 40+ years. I've experienced these kinds of situations and issues across many different processors and s/w environments.
I've gained enough expertise to know when it may make sense to try to play compiler tricks/games and when it doesn't.
One thing for sure is that if you are going to play at this level, you are going to have to look at the assembler code generated by the compiler.
If you can't do that, then stay away from trying to play tricky code games.
I would recommend that you do things the safest, most robust way until you know that it can't work due to some performance issue in the code.
Often issues related to queuing / buffer overflows are not related to the speed of the code itself but some overhead in some other portion of the system.
Over the years, I have seen WAY more bugs and problems when ISR threads are involved from people not properly using volatile variables, not doing atomic locking like ISR blocking to do updates, or trying to write "cute" or "fancy" code they perceive to be faster than simply sticking with simpler more robust code.
--- bill