How many clock cycles does digitalRead/Write take?

Table Covering Execution Times of a Selected Number of Arduino High Level Instructions

exetime.doc (48 KB)

Figure-1: Internal structure of digital port-pin of ATmega328

I don't know if the port diagrams have ever been updated to reflect the PIND/toggle hack that has been added.

(3) I have experimentally verified that the execution of the instruction PIND = 0b00000100 does toggle the output of the latch FF2 (Fig-1). This toggling, in fact, affects the internal pull-up resistor associated with PIND2-pin.

The datasheet says that writing a one to a PIND bit toggles the bit value of PORTD, and also says that writing a one to a PORTD bit enables the pullup resistor for that bit (when the pin is an input), so this follows.

Is this one of your "suggested quiz questions to enhance your understanding", or are you seriously asking? (It would be nice if you'd make that clear in future postings...)

(4) Which one of the following is a recommended way to affect the internal pull-up of PIND-2 pin?
(ii) pinMode(2, INPUT_PULLUP);

That one, please. It satisfies "most obvious" as well as "most portable"...

(i) pinMode(2, INPUT);

Doesn't enable the pullup

(iii) PIND = 0b00000100;

Toggles the pullup setting, rather than setting it. **On some AVRs.** Not portable, not clear, and wrong.

(iv) DDRD = 0b11111011; //PIND2 is input
PORTD = 0b00000100; //enables internal pull-up
bitClear(MCUCR, 4); // Global PUD (pull-up disable bit).

Not portable. Probably fastest... The global PUD (pull-up disable) bit is already clear by default.

(v) bitWrite(PORTB, 2, !bitRead(PORTB, 2)); //read-modify-write (toggling)

Looks OK for toggling a bit. If the result is that the bit is a 1, and the port is an input, then it will enable the pullup.

Not mentioned:

pinMode(2, INPUT);
digitalWrite(2, HIGH);

This is essentially the way that people first discovered they could set pullups on Arduino pins, back before INPUT_PULLUP existed. It's horrible and it needs to die. The idea that you can set the pullup by writing to the port is very AVR-specific. The code added to digitalWrite() on non-AVR platforms to create "backward compatibility" is gross and slows down a function that is already often insulted for its poor performance. :frowning: (Most of the more advanced processors have a SEPARATE register for controlling pullups/pulldowns.)

Figure-1: Internal structure of digital port-pin of ATmega328

Hah! You need a newer datasheet. The following is from the 02/09 version, and I’ve highlighted the “toggle” circuitry…

[Image: port-pin diagram from the 02/09 datasheet, with the toggle circuitry highlighted]

westfw:
No, it doesn't. Careful analysis might lead you to THINK that it does (It's a RMW statement of the whole port, right? So bits that read as ones will toggle the PORTD values!) But the compiler carefully generates a SBI or CBI instruction, which apparently doesn't behave this way. (Try it! Sketch attached.)

On the other hand, "PIND |= variableBitMask;" will probably behave as you describe. And "PINH |= 0b100;" or even "PINB |= 0b101;" will as well.

I've gone over this before multiple times in other threads.

When &= and |= are used with a single bit mask, the compiler optimizes this to a single bit set or clear. This is not logically the same operation. It doesn't cause trouble when used in normal memory, but for bits implemented as strobes (that execute a function when written to) it causes unexpected behavior.

With a single bit set or cleared in the mask, the statement is optimized to the single-bit instruction. Try toggling two pins in the same statement and it won't do that. It compiles to the full read-modify-write sequence. I've tested this.

The optimization is also only possible when the mask value is a constant value that can be deduced at compile time. If the compiler can't figure out the value (because you're using a non-const variable for it or something), it will have to create the RMW sequence as well.

Relying on an obscure optimization to produce the code you intend when you use the wrong statement is crazy.
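To make the difference concrete, here is a minimal host-side sketch of the behaviour described above. It is not AVR code: `PortModel` and the three helper functions are hypothetical names invented for this illustration, and the model assumes every pin is an output with nothing external driving it, so a read of PINx returns the PORTx latch.

```cpp
#include <cstdint>

// Host-side model of one AVR port with the PINx write-to-toggle feature.
// Assumption: all pins are outputs and nothing external overrides them,
// so reading PINx returns the same bits that are latched in PORTx.
struct PortModel {
    uint8_t port = 0;  // the PORTx output latch

    uint8_t readPin() const { return port; }

    // Writing a 1 to a PINx bit toggles the matching PORTx bit.
    void writePin(uint8_t value) { port ^= value; }
};

// Models "PINx = mask;": a plain write toggles exactly the bits set in mask.
inline void plainWrite(PortModel& p, uint8_t mask) { p.writePin(mask); }

// Models "PINx |= mask;" when it is compiled as a full read-modify-write.
// Every pin that currently reads 1 is written back as 1, so it toggles too.
inline void readModifyWrite(PortModel& p, uint8_t mask) {
    uint8_t tmp = p.readPin();
    tmp |= mask;
    p.writePin(tmp);
}

// Models "PINx |= (1 << bit);" when it is optimized to a single-bit
// instruction: only that one bit is written, so only that bit toggles.
inline void singleBitSet(PortModel& p, uint8_t bit) {
    p.writePin(static_cast<uint8_t>(1u << bit));
}
```

Starting from a latch value of 0b00110000 and a mask of 0b00000100, the plain write and the single-bit version both leave 0b00110100, while the full read-modify-write leaves 0b00000100 because bits 4 and 5 were toggled off as a side effect.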

I was a bit confused at the very first sight of PIND = 0b00000100, wondering how we can write data into a port which has been configured to work as an input.

Registers are not configured as inputs, pins are. Registers are just memory locations; they can all be read and written the same as any other. The datasheet will specify whether those writes have any effect at all.

@westfw

(1) Figure-1 of my post #18.

The original diagram belongs to Atmega32 not Atmega328. Why was it of Atmega32 and not of ATmega328?

Figure-1: Structure of digital IO port of Atmega32

Figure-2: Structure of digital IO port of Atmega328

Just recently (Feb/17), I migrated from ATmega32 territory into the territory of the ATmega328. Posting the IO diagram of the ATmega32 without noting the key differences between these two processors, blindly believing that the ATmega328's would be the same, is a grave and unpardonable mistake on my part. Advocating this kind of unscrutinized information might lead the audience to lose interest in following other sound posts. I must acknowledge your prompt action in notifying me.

The ATmega32 did not possess the new feature of toggling a bit through manipulation of the PIND register; I was reading the IO port of the ATmega32 considering it as the IO port of the ATmega328; I was also reading the post of MarkT on the role of the PIND register for toggling a bit; all this led me to test a few instructions using an Arduino UNO, of course, to enhance my level of understanding. What I claimed to have understood proved to be futile, and I am attempting to re-learn. As I learn more, I become more ignorant; as I write more, I write more than what I know. Still, I am surviving; it is due to my very luck that I always fix up 'things of lives' this way or that way! Let me tell you one small story:

In 1996, I was developing the 'Single Stepping Routine' for the Monitor Program of an 8086 Microprocessor Trainer. The routine was not working, and I could not find the fault. One of the instructions, or WORD PTR [bp + 00h], 0100h, was being coded by MASM as 83 4E 00 01 instead of 81 4E 00 00 01 (found by direct hand coding using the template). It was just my luck which whispered to me: why don't you code the instruction by hand and see the difference?
//------------------------------------------------------------------------------------------------------------
Now, about Global PUD Bit of MCUCR Register

(1) Default value of PUD bit = 0

(2) In Fig-2, PUD bit has been conditioned by G1 (AND gate)

(3) G1 has a bubble (inversion) at the reception point of PUD bit

(4) So, LL (0) for PUD will enable G1, which will in turn enable internal-pull up. (Assume that the port is configured as input and LH is written into the corresponding PORTxn bit.)

(5) Therefore, by default (after reset), PUD = 0 and the internal pull-ups are globally enabled.
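The gate logic in points (1) to (5) can be written down as a small truth function. This is only a host-side sketch of the condition, not code that touches real registers; `pullupActive` and `kPudBit` are names made up for the example, and the register values are passed in as plain bytes.

```cpp
#include <cstdint>

// Position of the PUD bit in MCUCR, matching bitClear(MCUCR, 4) used earlier.
constexpr uint8_t kPudBit = 4;

// Returns true when the internal pull-up of one pin is switched on.
// ddr, port and mcucr are snapshots of DDRx, PORTx and MCUCR; bit is the
// pin's bit number within the port (0..7).
constexpr bool pullupActive(uint8_t ddr, uint8_t port, uint8_t mcucr, uint8_t bit) {
    return ((ddr >> bit) & 1u) == 0            // pin is configured as an input
        && ((port >> bit) & 1u) == 1           // a 1 was written to PORTxn
        && ((mcucr >> kPudBit) & 1u) == 0;     // global pull-up disable is clear
}
```

With the values from option (iv), DDRD = 0b11111011 and PORTD = 0b00000100, the function returns true for bit 2 as long as MCUCR bit 4 is 0, and false as soon as that bit is set.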

Knowing that, why did I execute the instruction bitClear(MCUCR, 4)? It is due to my poor understanding of the job of the 'System program = Boot Loader'. What does it do after power-up reset? There is no clear documentation about it. I have seen, while playing with interrupts, that the Boot Loader enables the Global Interrupt (I bit) of SREG; however, this bit was originally put at 0-state by the ATmega reset process. In order to be sure that the pull-ups are not globally disabled, I executed bitClear(MCUCR, 4).

//------------------------------------------------------------------------------------------------------------

Query: Does ATmega32 allow the process of toggling a bit through the manipulation of PIND register? If the answer is YES, then I have another query: What is the necessity of having feed-back connection in the 'toggling feature' area of the ATmega328?
//------------------------------------------------------------------------------------------------------------------

@Jiggy-Ninja

I hope that the great wall of your patience and perseverance will favor you to listen what I have heard from ATmega Programming School.

(1) The MCU (ATmega328) has a physical pin (say, Pin-6).

(2) The pin is associated with a signal (PD4).

(3) The signal has a full name: Bit-4 of Port-D register.

(4) The pin can be programmed to receive data from external input device and deliver it to the input of an internal register. It is an input port.

(5) The internal register is named PIND register.

(6) The pin is physically connected with the input side of PIND register.

(7) The PIND register is a 'Read only' register.

(8) The following sayings are all equivalent:
(i) Reading data from Pin-6
(ii) Reading data from PD4 port-line
(iii) Reading data from PIND register.

(9) However, in programming the valid sayings are:
(i) in r16, PIND
(ii) byte x = PIND
(iii) lds r16, $0029
(iv) boolean n = digitalRead(4);
(v) boolean n = bitRead(PIND, 4);

I started learning things in 1974 under tremendous stress of exam/quiz/pass/fail; began another learning in 1978 in a Fertilizer Factory; next learning in 1982 with Schlumberger Wire Line Ltd.; next learning in 1997 with a University; next learning in 2017 with Arduino Forum; next learning in the Grave; still, I must continue learning to remove every bit of imperfections within myself in order to achieve 'The Nirvana or Janna'. What is Nirvana or Janna? It is a process of 'Full Merging with Nothingness' or 'Full Merging with Oneness.'

//----------------------------------------------------------------------------------------------------------------

(4) The pin can be programmed to receive data from external input device and deliver it to the input of an internal register. It is an input port.

I think this is where you're getting confused.

From the perspective of a program, registers are just memory addresses. A program statement can be written to write to or read from any memory address. What those particular reads and writes do is up to the hardware, and will be documented in the datasheet. It just so happens that when Atmel was designing the processor they decided that reading the PINx registers reads the values of the input buffers, whereas writing to the PINx registers performs an action (toggles the corresponding PORTx bit). It was a design choice on their part to do this, and they could have just as easily decided to do something else. Apparently, on older AVR processors like the ATmega32 mentioned above, the PINx registers are read-only; writes have no effect. Other architectures can have different behavior. On PICs, for example, writing to the input register (PORTx) is transparently redirected to the output register (LATx).

What does it do after power up reset? there is no clear documentation about it. I have seen, while playing with interrupts, that the Boot Loader enables the Global Interrupt (I bit) of SREG;

The source code of the Arduino bootloaders is freely available. As far as I can determine, Optiboot does not turn on interrupts. They're turned on in the core's init() function, which is part of the uploaded sketch.

It was a design choice on their part to do this, and they could have just as easily decided to do something else.

If I have understood your informative write-up reasonably, my standings:

(1) PIND Register could be thought as containing two separate registers with single address (like Control/Status Register of LCD):

(i) Read-only PIND Register
When we perform a read operation (byte x = PIND;) on the PIND Register, the logic values of the actual physical pins are stored into the variable x.

(ii) Write-only PIND Register
When we perform a write operation (PIND = 0b00000100;) on the PIND Register, the operand (0b00000100) affects the logic value of the normal (non-inverting) output of the PORTD2 flip-flop. The effect appears as a toggling of the PORTD2 bit (if it was LH before the operation, it will now turn to LL, and vice versa). The Conceptual Diagram (let us forget the electrical racing) is:

Figure-1: Role of PD2 port-line as a single bus communication means between ATmega328 and AM2320

(2) I have conceptually drawn Fig-1 in favor of possible justification of your comment that the ‘Atmel people made it a design choice’. It makes sense in the following perspectives:

(a) If we decide to engage the PD2-line as a single bus communication means as per Fig-1, the AM2320 (Humidity-Temperature Sensor) protocol demands that the PD2-line has to undergo transitions like this:

Zstate (20us) --> O/PSrcL (1000us) --> O/PSrcH (20us) --> Zstate (20us) --> I/PSinkL (80us) --> I/PSinkH (80us) --> I/PSinkL/HData (4800us) --> I/PSinkL (50us) --> Zstate (20us)

Zstate = High impedance Z-state

O/PSrcL : Output Sourcing Logic Low

O/PSrcH : Output Sourcing Logic High

I/PSinkL : Input Sinking Logic Low

I/PSinkH : Input Sinking Logic High

I/PSinkL/HData : Input Sinking 40-bit data (bit-0: 50us low followed by 26us high; bit-1: 50us low followed by 70us high), consisting of <Humidity - 2x8; Temperature - 2x8; CheckSum - 8>.

(b) In Step-2(a), we observe that the PD2 line changes its states (toggles) as well as its direction during the acquisition period of one data frame. Toggling could be done in many ways, as you have speculated; however, there could possibly be benefits to the particular choice of engaging the ‘Write-only PIND’ register for this purpose.

(3) The justified conclusions could only be drawn by making a working system based on Fig-1 (or any other alternative) and then measuring its performance (in terms of execution cycles) under various models of controlling instructions.
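As a starting point for such an experiment, the data part of the frame in (2)(a) can be decoded on paper before any pin is touched. The following is a host-side sketch only: the function and type names are invented for the example, the 48us threshold is simply the midpoint between the 26us and 70us high times, and two things are assumptions rather than facts taken from this thread: that bits arrive most significant bit first, and that the checksum byte is the low 8 bits of the sum of the four data bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Midpoint between the 26 us (bit 0) and 70 us (bit 1) high times.
constexpr uint16_t kBitThresholdUs = 48;

struct Reading {
    uint16_t humidityRaw;     // first two bytes of the frame
    uint16_t temperatureRaw;  // next two bytes of the frame
};

// Decodes 40 measured high-pulse widths (microseconds) into a Reading.
// Returns false when the checksum does not match.
// Assumptions: MSB-first bit order; checksum = low byte of the byte sum.
inline bool decodeFrame(const std::array<uint16_t, 40>& highUs, Reading& out) {
    std::array<uint8_t, 5> bytes{};
    for (std::size_t i = 0; i < highUs.size(); ++i) {
        const uint8_t bit = highUs[i] > kBitThresholdUs ? 1 : 0;
        bytes[i / 8] = static_cast<uint8_t>((bytes[i / 8] << 1) | bit);
    }
    const uint8_t sum =
        static_cast<uint8_t>(bytes[0] + bytes[1] + bytes[2] + bytes[3]);
    if (sum != bytes[4]) {
        return false;
    }
    out.humidityRaw = static_cast<uint16_t>((bytes[0] << 8) | bytes[1]);
    out.temperatureRaw = static_cast<uint16_t>((bytes[2] << 8) | bytes[3]);
    return true;
}

// Helper for experiments: the high times an ideal sensor would produce
// for five given bytes (70 us for a 1, 26 us for a 0).
inline std::array<uint16_t, 40> idealHighTimes(const std::array<uint8_t, 5>& bytes) {
    std::array<uint16_t, 40> highUs{};
    for (std::size_t i = 0; i < highUs.size(); ++i) {
        const bool one = ((bytes[i / 8] >> (7 - i % 8)) & 1u) != 0;
        highUs[i] = one ? 70 : 26;
    }
    return highUs;
}
```

On real hardware the 40 high times would come from whatever pin-reading method is being compared, which keeps the decoding identical across the different "models of controlling instructions" and leaves only the pin access to be timed.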

Thank you for taking time to explain the issue.

That horse

(x) is really dead.
() has already bolted.
() is a different colour.
() was a gift.
() can reach the other side of the stream perfectly well.
() won’t drink.
() is not a unicycle.
() is a horse, of course, of course.
() is wooden.
() is not really very high
() is behind the cart

(1) PIND Register could be thought as containing two separate registers with single address (like Control/Status Register of LCD):

That's probably the best way to think of it.

That horse

(x) is really dead.

No it's not, as evidenced by the fact that it keeps getting posted.

sherzaad:
thank for your replies.

I'm actually looking for time taken for a digitalRead/Write in clock cycles not microseconds as I will be transferring my code from a UNO (16MHz) to a pro mini (8MHz).

As my code is time critical, knowing how many clock cycles these routines take is essential for me.

Any help would be much appreciated.

But since your code is time critical it really IS about how long the functions take.
And when going from 16MHz to 8MHz the math is simple. It will take EXACTLY twice as long.
(Clock cycles will be exactly the same since the instructions will be exactly the same on the two boards, but at 8MHz the execution time will be twice as long as at 16MHz)
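The cycles-to-time arithmetic above fits in a one-line helper. The function name is made up for this illustration, and the 48-cycle figure used below is just the digitalWrite() number quoted elsewhere in this thread, not a guaranteed value.

```cpp
#include <cstdint>

// Converts a cycle count into microseconds for a given CPU clock in Hz.
constexpr double cyclesToMicros(uint32_t cycles, uint32_t cpuHz) {
    return static_cast<double>(cycles) * 1e6 / static_cast<double>(cpuHz);
}
```

For 48 cycles this gives 3 microseconds at 16 MHz and 6 microseconds at 8 MHz: same cycle count, twice the time.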

The actual answer is that it is impossible to predict the number of cycles or the time for the digitalRead() & digitalWrite() functions with any accuracy.
This is because it can vary substantially depending on the types of parameters used and the version of the compiler, along with the other code in the sketch.

The reason is a combination of factors that together produce some pretty lousy code for manipulating i/o pins, and substantially different code depending on which optimizations the compiler can apply; that in turn varies with the parameters handed to digitalRead()/digitalWrite() and with the version of the compiler.
The API semantics for functions like digitalWrite(pin, value) are HORRIBLE, particularly when the pin parameter can be a naked constant like 1, 2, etc.
If naked constants were disallowed in favor of predefined constants or defines, and the API used semantics of setting the pin high and setting the pin low rather than passing in a parameter, things could be much faster.

Also there are MANY things that the IDE bundled AVR core code could be doing, even with the existing API semantics that could dramatically speed up the code.

The naked constants really hurt, especially with the existing poor code implementation used in the IDE bundled core code. The IDE bundled core code will always do run time table lookups for many pieces of data, some of which could have been precalculated at compile time had things been done differently.
Table lookups are from data in flash, and the AVR is so wimpy that it can't access flash with ordinary loads; it has to go through the special pgm_read_*() accessors, which adds additional overhead and slows things down.

Making things even more complex is the compiler optimizations.
The compiler can decide to do some pretty crazy and intense optimizations, which can have a dramatic effect on the code generated for digitalRead()/digitalWrite(), and therefore on the number of cycles it takes to run and on its timing.
For example in some cases it can decide to inline everything and totally eliminate function calls.
(The digitalRead()/digitalWrite() would be inlined right into the sketch code)
This will normally be done on very simple code like some simple code in loop() that manipulates a pin, but I have seen it also do it in the middle of some fairly complex code.
But how/when it decides to inline and unroll loops is not really predictable.
Because of this, it is simply not possible to predict with any accuracy how many cycles or how much time digital i/o functions will take.

On a 16MHz AVR, I have seen it vary between 3.5us and 6us by just changing the compiler.
And I've also seen it jump from 3.5us up to 6us by making a tiny change to some unrelated code.

So there really is no way to predict how long these functions take anymore as they are no longer a constant.
If you really care about cycles and need consistency, you can't really use C/C++ anymore, or at least not any of the Arduino core API functions.

--- bill

That's probably the best way to think of it.

Getting wonderful hopes in the great 'turmoil of debates' to go ahead with the implementation of the idea of Fig-1 of post #26.

(1) PIND Register could be thought as containing two separate registers with single address...

That's probably the best way to think of it.

I agree. We do a lot of explaining that "ports" are like memory locations that are visible to the outside world, which is sort-of true, especially for the cases where that is used as the explanation. But in fact it's trivial from a hardware design point of view to set things up so that a write triggers completely different hardware than a read (like the UART data register, for example: the UART receiver hardware that you read when you read this register is separate from the UART transmit hardware that activates when you write to it).

The whole "pin toggle" feature is of questionable value in the AVRs, IMO, given that it's only implemented on some chips, is only really fast on some ports, and has potential side effects. It might be more valuable on some of the ARM chip GPIO ports, where it has been given a dedicated register...

So there really is no way to predict how long these functions take anymore as they are no longer a constant. If you really care about cycles and need consistency, you can't really use C/C++ anymore, or at least not any of the Arduino core API functions.

But we do still need some values which are Technically Justified under the given platform of IDE 1.8.0 and Arduino UNO R3.

The Table of post #20 has documented some experimental values for the execution times of a few selected Arduino instructions under the Platform of: IDE 1.8.0 and Arduino UNO R3. Are they Technically Justified? Can they be used for some kind of purposes?

Let us make a similar Table of post #20 under the Platform of Atmel Studio Assembly and ATmega328 and see the differences in the execution times. Doing this kind of things could be an excellent cryptic job but of little practical usage!

We need to live with the real world where things happen under agreed protocols and for the time being we have agreed to work under the protocols of IDE 1.8.0 and Arduino UNO R3.

(like the UART data register, for example - the uart receiver hardware that you read when you read this register is separate from the uart transmit hardware that activates when you write to it...

Perfect statement, sabbash!; it agrees with the following diagram which belongs to the USART Module of ATmega328.

(1) In Fig-1, we observe that there is only one register, UDR0 (USART Data Register 0). It is composed of two registers: a read-only UDR0 and a write-only UDR0.

(2) When we perform the instruction lds r16, UDR0, data comes from the Receiver (Rx Register). (UDR0 lies in the extended I/O space of the ATmega328, so lds/sts are needed rather than in/out.) In Arduino we don't have an equivalent instruction for lds r16, UDR0; what we have is byte x = Serial.read();, which reads data from the Arduino-defined Serial Data Buffer.

(3) When we perform the instruction sts UDR0, r16, data is written into the Transmit Register (Tx Register). In Arduino we have an equivalent instruction for it: Serial.write(arg);.

Figure-1: Structure of the USART Module of ATmega328

GolamMostafa:
But we do still need some values which are Technically Justified under the given platform of IDE 1.8.0 and Arduino UNO R3.

The Table of post #20 has documented some experimental values for the execution times of a few selected Arduino instructions under the Platform of: IDE 1.8.0 and Arduino UNO R3. Are they Technically Justified? Can they be used for some kind of purposes?

That table in post #20 is a collection of a few arduino core functions but it is mostly cycles and timing for code that is stepping outside of arduino and using AVR specific raw port i/o macros.
Just because the Arduino IDE can build some code for an AVR based board, does not mean that the code inside the sketch is actually Arduino code.
i.e. there is a difference between "Arduino" the IDE and "Arduino" the code.

"Arduino" is not a true language and therefore has no actual instructions. Arduino uses C++, so things like analogRead(), digitalRead() and digitalWrite(), which come bundled with the IDE in the AVR core library, are functions, and inside those functions other functions and/or macros are sometimes called.

Because the Arduino core digital pin i/o functions are C/C++ level code, and due to the optimizations that gcc is now doing, there is no way to accurately predict the number of instructions generated for those functions, since the compiler and linker can do massively different optimizations on the final generated code.
The optimizations can vary substantially depending on the other code used in the sketch.
The code that can affect the code generation of an arduino digital i/o function does not even need to be in immediate proximity to the use of the function.

Therefore any cycle count or timing that involves Arduino pin i/o functions can never be accurate if only a single number is provided, because the code, and therefore its cycles and execution time, can vary substantially.

In that table there are many cycle times and timings for code that is NOT arduino code.
For example anything that uses raw AVR port i/o is not arduino code. Code that is using things like PORTx, DDRx, PINx, TCNT, OCRx, etc... is using macros that are supplied with the avr-gcc compiler that are AVR specific raw port i/o. While the cycles and timing of doing raw AVR port i/o is generally consistent, it is definitely stepping outside of Arduino and is therefore not portable.

The thread has wandered around quite a bit and has deviated from the OPs original questions related to using Arduino functions.
The original poster stated he was interested in portable code. From post #8

sherzaad:
thank you all for your suggestions.

I am aware that direct port manipulation can be faster but I want my code to be as x-platform as possible (for Arduinos, that is) and therefore would prefer to use digitalRead and digitalWrite.

So far I understand from this thread that digitalWrite is 48 clock cycles...

what about digitalRead? any ideas...

My point being that if you stick to using the actual Arduino core i/o functions (which is what ensures portability), then not only do the i/o functions tend to be substantially slower than other potential alternatives, but there is no way to accurately predict the cycle times and therefore timing when using arduino functions.
It would be possible to provide a range of timings but that range can vary quite a bit so anybody using the timings must take that into consideration since it is not really possible to predict when the compiler will do certain types of optimizations.

An alternative is to use Teensy AVR boards, which supply their own better written AVR core.
If that is done, even when using the standard digitalWrite()/digitalRead() most i/o operations will generate single instructions which are 2 clocks - when the parameters are constants.
The developers of the AVR core that comes with the Arduino IDE chose quite some time ago to reject these types of compile time macro optimizations and therefore when using AVR boards like an UNO that use the AVR core that comes with the IDE, pin i/o operations are typically 40x-50x or more slower than when using the Teensy core with compile time constants.
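The general idea behind that kind of compile-time optimization can be sketched on a host machine. This is NOT the Teensy core's actual implementation; `simPortB`, `simPortD`, `pinInfo`, `writePinConst` and `writePinRuntime` are hypothetical names, and the two globals merely stand in for PORTB and PORTD. The only hardware fact used is the Uno pin mapping (pins 0 to 7 on port D, pins 8 to 13 on port B).

```cpp
#include <cstdint>

// Simulated output latches standing in for PORTB and PORTD (host-side model).
uint8_t simPortB = 0;
uint8_t simPortD = 0;

struct PinInfo {
    uint8_t* port;  // which latch the pin lives in
    uint8_t mask;   // bit mask of the pin within that latch
};

// Uno-style mapping: pins 0..7 -> port D bits 0..7, pins 8..13 -> port B bits 0..5.
constexpr PinInfo pinInfo(uint8_t pin) {
    return pin < 8
        ? PinInfo{&simPortD, static_cast<uint8_t>(1u << pin)}
        : PinInfo{&simPortB, static_cast<uint8_t>(1u << (pin - 8))};
}

// Pin known at compile time: the lookup is folded away by the compiler,
// leaving a single bit set or bit clear on one fixed register.
template <uint8_t Pin>
inline void writePinConst(bool high) {
    static_assert(Pin < 14, "pin out of range for this mapping");
    constexpr PinInfo info = pinInfo(Pin);
    if (high) {
        *info.port |= info.mask;
    } else {
        *info.port &= static_cast<uint8_t>(~info.mask);
    }
}

// Pin only known at run time: the lookup has to happen on every call,
// which is the situation the bundled AVR core is always in.
inline void writePinRuntime(uint8_t pin, bool high) {
    const PinInfo info = pinInfo(pin);
    if (high) {
        *info.port |= info.mask;
    } else {
        *info.port &= static_cast<uint8_t>(~info.mask);
    }
}
```

Both functions produce the same register contents; the difference is only in how much work is left for run time. On a real AVR the constant-pin form is what can collapse into a single SBI or CBI instruction.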

GolamMostafa:
Let us make a similar Table of post #20 under the Platform of Atmel Studio Assembly and ATmega328 and see the differences in the execution times. Doing this kind of things could be an excellent cryptic job but of little practical usage!

Creating a table with assembly instructions would be a waste of time.
The cycle times for AVR instructions can be found in the AVR instruction documentation.
There is no need to write code to measure them.
When using C, the cycles for any code that does RAW port i/o can be looked up in the AVR documentation, assuming you know how the C statement will be converted into AVR instructions.

GolamMostafa:
We need to live with the real world where things happen under agreed protocols and for the time being we have agreed to work under the protocols of IDE 1.8.0 and Arduino UNO R3.

EXACTLY. And that is why I brought up that when using actual Arduino core functions, the timings are not guaranteed to be consistent given the way the compiler can choose to optimize the code.
And the variation can be quite substantial.

And there can also be quite a variation due to how the core code is written.
An example of this can be seen by comparing the IDE supplied AVR core code vs the core code that comes with Teensyduino for the Teensy AVR boards.

So what you will see in real-world use is that even though a Leonardo and a Teensy board may use the same exact processor, same Arduino IDE, same compiler, same exact code that uses the portable Arduino digitalWrite()/digitalRead() functions, pin i/o on the Teensy can be as much as 50x faster than on the Leonardo.
The difference is that the Teensy pin i/o core code simply does things better.
This is a great example of why you can't really provide accurate cycle counts or timings for anything other than assembly code.

--- bill

What to say in reply except extending warm congratulation for taking time in presenting marvelous materials which deserve more and more readings.

It is indeed an interesting explanation. It is extremely valuable to know how such things work. However, in practice it is easier to be satisfied with a rough estimate of the time the function will take, and if it seems unsatisfactory, abandon it completely and use assembly as has been mentioned. It would be futile to agonize over the exact number of cycles the function uses in an actual time critical application, for this reason.

aarg:
...it is easier to be satisfied with a rough estimate of the time the function will take...

"Worst case" is usually more useful in these situations. :wink:

I have built multiplexed LED displays that use digitalWrite(). My worst case was any delay that would cause a visible flicker, and there was none. That was easy and didn't require much thinking.

I sometimes use the "contractor's rule of thumb" and just double the nominal case if I don't know for sure.

It's mentioned above, the only way to ensure a definite response time is to use assembly. It will also be faster.

bperrybap

+1 for your words
