[SOLVED] Is an assignment to an 8-bit variable on an 8-bit processor always atomic?

I have a more general question on how code is executed for 8-bit variables on an 8-bit processor. The reason behind it is that I have implemented a ring buffer with indices for reading and writing, say:

byte readIdx;    // Index into buffer for reading
byte writeIdx;   // Index into buffer for writing

So both are just 8 bits long. It is my understanding that on an 8-bit processor, like the ATmega328P, reading or writing them is always atomic. Well, is it?
That is, since my function WriteToBuffer() manipulates writeIdx only, and ReadFromBuffer() manipulates readIdx only, I always get a consistent pair of (readIdx, writeIdx) when they are read in either WriteToBuffer() or ReadFromBuffer() (which is needed to detect buffer overflow or new data in the buffer).

Now consider following line for updating readIdx in ReadFromBuffer():

   readIdx = ((readIdx + 1) & BUFFER_MASK);     // with BUFFER_MASK = 2^n - 1, n <= 8

Well, is this atomic with respect to the result of the calculation? What does the compiler make out of this? Will it be something like (I don't know how to write it as assembler code)

// Variant 1
// read this as C++ code reconstructed from the assembly for readIdx = ((readIdx + 1) & BUFFER_MASK);
byte temp = ((readIdx + 1) & BUFFER_MASK);  // i.e. new idx is stored temporarily
readIdx = temp;                             // atomic assignment of the new index value


// Variant 2
// read this as C++ code reconstructed from the assembly for readIdx = ((readIdx + 1) & BUFFER_MASK);

readIdx++;                // increment happens directly in the variable...
readIdx &= BUFFER_MASK;   // ...and the mask is applied only afterwards
I.e. readIdx may hold, for a short period of time, the (invalid) value BUFFER_MASK + 1.

So, what is correct? I have no idea how to test this, because I don't know how to force the compiler to assign to readIdx just once, when the final result has been calculated. And if the compiler does not, an invalid pair of (readIdx, writeIdx) (due to an invalid value of readIdx) will be created only quite seldom, and thus be pretty hard to detect.

And, even if I create code using a temporary variable, like temp above, I don't know whether the compiler optimizes the two assignments back into one, because temp is never used again ...

Your support is appreciated.

well, copying a memory location in SRAM to another location in SRAM might not be atomic. It depends on the processor: some require that the source is loaded into a register and the register is then written to the destination. So you can get 2 operations (or more, if you need to load the source or destination address into an address register). Other processors have direct memory access and instructions to copy one byte from one address to another.

for your question, it's more like Variant 1, as the memory cell for readIdx is never incremented. You get transient memory allocated (or registers dedicated) to perform the maths, and then the result is moved into the destination.

feels like there's an XY problem at the back of this. What's your use case, and why do you ask?

Reading or writing bytes is atomic, but reading and writing back is not.

Matter of fact, I thought I had described my problem properly.... let me try to improve:

  1. I have created a ring buffer to receive messages on the I2C bus. That is, an ISR is writing into the buffer.

  2. Buffer Data is processed in loop().

  3. I guess I need a buffer because both controllers work independently. The I2C master does not care whether the slave has processed the data or not. From my application I can choose a feasible buffer size to minimize the risk of buffer overflow to an acceptable level (most likely to be caused by operator misuse only, which is, well, me).

  4. The deployed algorithm is described here: FIFO – Mikrocontroller.net - sorry, the article is in German. The method used is described under "2n-Ringpuffer - die schnellste Lösung" ("2^n ring buffer - the fastest solution"). Basically it says: update writeIdx in the function that does the writing, i.e. the ISR in my case, and update readIdx in the function that does the reading, i.e. loop(). But always evaluate BOTH indices, either for detecting that a new message has been received (done in loop()) or that a buffer overflow has occurred (done in the ISR).

  5. Managing writeIdx in the ISR is not a problem. No other function will disturb the writing process.

  6. The update of readIdx can be interrupted by the ISR, so there is a risk of invalid data in the ISR.

My question is about managing the risk of invalid data in the ISR.

that's enough to say that you should use volatile variables and critical-section protection if the loop can be interrupted while you process the buffer.


Hm, what needs to be done to not fall into this (reading and writing) pit? Sounds like there should be some other code lines in between, e.g.

void loop() {
  // just do it, even though it is not yet known whether the calculated value of readIdxNext will be used
  byte readIdxNext = ((readIdx + 1) & BUFFER_MASK);

  // do all calculations in loop() with lots of variables and functions, so that the compiler stores
  // the complete result of readIdxNext in RAM (and not intermediate results)

  // oops, a new message has been received and evaluated completely
  readIdx = readIdxNext;
}

Like this? If so, what is a sufficient "distance in code" between reading and writing?

you need to suspend interrupts, and possibly copy the data and work from the copy, if the loop can be interrupted

What is wrong with the code in the loop() of the Slave? Is it too slow? Did you use too many delay() calls?
What is wrong with the Master? Is it sending too much data?

A buffer does not solve all problems, because a buffer can overflow as well. You need to be able to detect when there is too much incoming data. An error counter that is 100% reliable is the information that you need, in my opinion.

Did you know that the Wire library uses blocking functions? Wire.endTransmission() waits until the I2C bus session has completely finished. That means that during the I2C activity the Slave Arduino has time to run other code (but not if the Slave is an Arduino Uno).
Did you know that the Arduino Wire library for the Uno at 16 MHz is demanding when it is set to Slave mode? When a Master pounds the Slave with I2C sessions, the Slave has no time to breathe.
Did you know... well, there are many other things that make the use of a buffer less trivial than it seems.

To expand on @DrDiettrich's answer,
No, none of your examples are atomic operations, since they are not doing simple byte reads or byte writes to memory.
They all read memory into a register, modify the register, then write the register back to memory.
The AVR is a RISC processor with a very simple instruction set; pretty much every operation is a separate instruction, and every instruction boundary is interruptible.

But it sounds like you are more concerned about seeing corrupt/intermediate values than about actual atomicity. THAT is different from atomicity.

Assuming you are looking at the index values in the foreground and in the ISR, you also have some issues related to how the compiler may choose to cache a variable in a register,
i.e. you may have to use the volatile keyword to ensure that the foreground actually stores its value back into actual memory for the ISR to "see".

The easiest/simplest answer is to declare your variables volatile and make sure your foreground code locks out the ISR during the updates.
It is simple and guaranteed to work.
If you are using the AVR platform, avr-libc even contains atomicity wrapper macros (in util/atomic.h) that can ensure atomicity for you.

My suggestion would be to start with that, and once that is working, look to the next level if performance is an issue.

Is using volatile variables and ensuring fully atomic updates the only solution?
(note: these are two separate things)
No, there are definitely ways to write the code that do not need volatile when using 8-bit variables. There are sometimes techniques that can ensure that the lack of atomic updates does not create an issue of seeing intermediate values, depending on the code.
In fact, not using volatile can be quite useful, given the faster code that is often generated, and it is often easy to write code that does not need it when using 8-bit variables; however, atomic locking during updates is often still required.
i.e. in many cases volatile is not needed in the Arduino environment when globals are shared between foreground code and ISR code, provided the foreground blocks the ISR during the variable update. But it can depend on how the code is written. This is often the case for queuing-type operations, since the worst that happens is that the writer may temporarily see a buffer that looks a bit fuller than it really is.
The only reason to forgo volatile and strict atomicity would be some serious timing requirements that can't be met using the normal/standard approach.
But doing it differently requires some advanced programming techniques and more in-depth knowledge of how the specific processor works, what the compiler outputs, and more specifically how the code in the system works,
i.e. how/when the index variables are used.
Things like: are they used in loops, in ISRs, or in code called from functions, etc...

The only way you could switch to these more advanced techniques is if you are actually not worried about the atomicity of the update operation, but rather about never seeing "corrupted" or out-of-bounds values.
That is something different.

And if working at this level, you need to be looking at the compiler output to make sure you are getting code that will work.
If using the AVR, you will want to create a .lss file, which contains the assembler output.
If you are using linux, I can provide you a wrapper script that you can install that will generate .lss and .sym files when doing builds from the IDE so you can see the assembler output and symbol table.
If you are using Windows, bummer, that OS is just too wimpy and doesn't have the needed tool set capabilities as it wasn't designed for development like unix was.

--- bill

there are so many "if"s to avoid the volatile keyword that, unless it's really really needed, this should not even be considered.

I don't think it really produces slower code (the ISR will need to read from memory and write back - the compiler has no business keeping the value in a register), and if in the loop you have a critical section where the only thing you do is create a copy of that variable and then work on the copy, then there is no extra access penalty.

It's a matter of priority. In a ring buffer the writer updates writeIdx and the reader only reads it (for testing purposes). Similarly, the reader updates readIdx while the writer only reads it.

As long as each idx is a single byte, everything should be fine without further synchronization. If the writer fails to push a byte only because it reads readIdx before the reader updates it, then the buffer is too small for reliable operation.

The reader only has to finish reading the next byte before incrementing readIdx. Like:
byte justRead = buf[readIdx]; readIdx++;

Ok, there are several instructions involved in the update of the index in my buffer - fine with that.
BUT: the very final operation (writing from register to RAM) is, I would assume, just one(!) instruction, and thus atomic in the sense that it cannot be interrupted. And here it is relevant that the index is just 8 bits long on an 8-bit processor. That's why I wrote "always atomic":

  1. If the ISR has been started prior to the update of the index (done in loop()), the ISR may use an outdated value, which is equivalent - in my application - to a buffer that is more filled. I'm fine with that.

  2. If the ISR has been started after the update of the index (done in loop()), all fine.

  3. If the ISR is started during the update of the index (done in loop()), data is inconsistent, provided the compiler does the update in RAM (from which the ISR will take the value when it reads it for the very first time; and, by the way, I create a local copy in the ISR, i.e. reading from RAM will occur just once). However, that requires that the formula below is executed in registers/cache and NOT in RAM (so, in other words, my question is how this gets compiled):

((readIdx + 1) & BUFFER_MASK)

If the index were 16 bits long, the write into RAM could not be done in one instruction on the AVR (ATmega328), by its hardware nature. So there is for sure the probability that the ISR will read some strange value (e.g. lower byte updated, upper byte not yet, or vice versa).

Hm, yes, no, probably, don't know. Atomicity is a measure against seeing corrupted values, isn't it?

... and, well, a Windows PC

the simplest solution is the use of a 16- or 32-bit controller, which also has enough RAM for a 16-bit index.

No delay at all, don't worry :wink:
Is the Slave too slow, or is the Master sending too much data?
The Master is sending commands, triggered by user input, to the slave. So only a maniac would be expected to create a whole bunch of commands in a short time frame. On the other side, the slave is deemed faster than the master in most cases.
Matter of fact, the scenario is that the user/master requests power up and power down at a high rate (just pushing a button), while the slave needs some time to actually do that. For me it is fine to run into buffer overflow after some on/off/on/off/on/off/on/off/on/off inputs, because it is misuse. All stuff can be restored by cutting main power.

Well, the slave is an Uno (or, will be a ATmega328p on custom board)

Ah, no, sorry for leading the topic in the wrong direction. 8 bits are absolutely fine for my application, with respect to the buffer and all other things.

Just the amount of time needed to properly debounce the button is going to limit how often that can occur. You could also implement feedback from the slave to verify that an action has been completed before sending the next command.


I'd like to re-emphasize the distinction between atomically modifying a variable's value and atomically writing/reading a variable's value at a given point in time.
As I mentioned earlier these issues are different.

When using 8-bit variables there will never be an issue of reading a partially written variable, since there is only a single byte access,
i.e. with 8-bit variables you avoid the issue of reading the first byte while the foreground is in the middle of updating the 16 bits (two bytes), which causes reading a corrupted value.
That said, an ISR can still read an 8-bit value that is invalid because the foreground was in the middle of updating it, since the update operation can be multiple interruptible instructions involving multiple reads and/or writes of the variable.
Depending on the code and how it is written, the variable may be re-written to memory multiple times during the update process/calculation, which offers the potential for the ISR to read interim invalid values.

Example: suppose your index values can only be between 0 and 15.
The interrupt could occur right at the wrap point, when the foreground has bumped the value from 15 to 16 but the mask has not yet forced the index back to zero.

Depending on several factors, the generated code may update the variable in RAM for each part of the calculation,
i.e. the variable in RAM momentarily changes to 16 before it changes to zero.

The likelihood of this happening increases if you use volatile, since that tells the compiler to push the contents back to RAM as much as possible.

When volatile is not used, and compiling for an AVR using the Arduino IDE, these two generate identical code when readIdx is a uint8_t:

	readIdx = ((readIdx+1) & BUFFER_MASK);


	readIdx++;
	readIdx &= BUFFER_MASK;

Turn on volatile, and the second one will update readIdx after the increment but before the mask, as well as after the mask.

This is just a simple example demonstrating that how code is written can make a difference. As the code gets bigger and more complex, it gets harder to detect such changes and interactions.
For example, the optimizations being done are so complex that sometimes making a very small unrelated change in a function, or even in an unrelated function, can significantly alter the generated code, including the register usage within a function.

In these situations, blocking the other thread during an interruptible update is required, i.e. the foreground should mask interrupts while messing with a variable that the ISR "sees".

There are cases (most cases) where using volatile is absolutely required.
There are some cases where you can get away without it, and in fact don't want it, to help improve performance.
In other cases, reading the variable into a temporary variable can avoid the multiple-RAM-access issues.

Once you start to do abnormal things like not using volatile for variables shared between threads (like foreground and ISRs), you really have to know what you are doing, so often it is best just to avoid doing that.

It comes down to understanding threads and how they interact, which operations are interruptible, and how to avoid unwanted interruptions.
This is a very complex topic that often takes quite a bit of experience to fully grasp and develop techniques to handle various situations, and even then, people with lots of experience can make mistakes and create subtle bugs that can be hard to track down.

I've been playing with this kind of stuff for 40+ years. I've experienced these kinds of situations and issues across many different processors and s/w environments.
I've gained enough expertise to know when it may make sense to try to play compiler tricks/games and when it doesn't.

One thing for sure is that if you are going to play at this level, you are going to have to look at the assembler code generated by the compiler.
If you can't do that, then stay away from trying to play tricky code games.

I would recommend that you do things the safest, most robust way, until you know it can't work due to some performance issue in the code.
Often issues related to queuing / buffer overflows are not related to the speed of the code itself, but to overhead in some other portion of the system.

Over the years, I have seen WAY more bugs and problems when ISR threads are involved from people not properly using volatile variables, not doing atomic locking (like ISR blocking) for updates, or trying to write "cute" or "fancy" code they perceive to be faster, than from simply sticking with simpler, more robust code.

--- bill


That means buggy code.

No. It means reading from memory whenever the value is required, not holding it in registers.

Incredible. In detail: the second version forces multiple updates of the variable, while the first one cannot have more than one. Optimization may reduce the number of updates, but that's another topic.

Sorry for my rude wording, fixed it now :frowning:

Both, actually. If you modified the value, write it back ASAP. If you need it, read it from memory.

I don’t get the idiotic remark.

What I showed was a very simplistic example,
showing some effects even with simple code.
When code gets more complex, it isn't always obvious what really happens, like how a variable will be updated back into RAM or h/w registers.
And what actually happens, or when physical RAM is read/written, can change depending on other code.

In fact, years ago there were some big arguments over how to implement the ATOMIC_BLOCK macros in avr-libc, since they weren't always working.
It came down to some very complex and obscure interpretations of the standards over the use of scoping, loop unrolling, and volatile.
I can't fully remember, but I think it may even have required a tweak to the compiler to make it work for some tricky edge cases involving optimizations related to loop unrolling.
If you look in the atomic.h header file you will see the use of:

 __asm__ volatile ("" ::: "memory");

volatile works both ways: it forces a read from memory when the variable is accessed, and it also forces flushes back to RAM after updates.
Without volatile, the compiler is allowed to hold a global variable's value in a register until a function call is made or the function returns.
Or it can reorder things to try to improve code size or speed.

In fact, if the code looped forever in a while loop, the compiler would typically never write the value back to the global variable's RAM, since from what it has seen (lack of volatile), nothing else is using that variable, so there is no need to ever write it back.

--- bill