Switching clock from internal 1 MHz to internal 8 MHz does not speed things up

Hello

I’m currently learning about clocks and timing, using a stock 5V Uno Rev3 (16 MHz) and an ICSP-programmed ATmega328P.

I hope someone can enlighten me on these, as I am rather baffled by what I see.

Stock UNO clock and instruction duration

With the stock Uno 16 MHz crystal, and assuming one instruction per cycle, each instruction should take 62.5 ns. Using direct port manipulation, I see (on a Rigol scope) that each port write takes 125 ns...
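The expected figures follow from one clock period being 1/F_CPU (62.5 ns at 16 MHz, 125 ns at 8 MHz, 1 µs at 1 MHz), multiplied by the number of cycles an instruction needs. A minimal sketch of that arithmetic in plain C++ (it runs on a PC, not on the AVR; `period_ps` and `instruction_ps` are made-up helper names), working in picoseconds so everything stays an integer:

```cpp
#include <cstdint>

// Length of one CPU clock period in picoseconds (1 s = 1e12 ps).
constexpr std::uint64_t period_ps(std::uint64_t f_cpu_hz) {
  return 1000000000000ULL / f_cpu_hz;
}

// Duration of an instruction that needs `cycles` clock cycles.
// Example: a 2-cycle sbi at 16 MHz -> 2 * 62500 ps = 125 ns.
constexpr std::uint64_t instruction_ps(std::uint64_t f_cpu_hz, unsigned cycles) {
  return period_ps(f_cpu_hz) * cycles;
}
```

For example, `instruction_ps(1000000, 2)` gives 2,000,000 ps, i.e. the 2 µs per port write seen later on the bare chip at 1 MHz.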

QUESTION A: why does a single instruction seem to take 2 clock cycles?

Using Timer1 with no prescaler, I time my code as shown in the code below.

What I see and understand:

  • bitSet/bitClear should be 1 cycle (directly OR-ing/AND-ing a memory address)
  • bitRead should be 3 cycles (copying the memory address, AND-ing, shifting)
  • in total 12 * 1 cycle + 2 * 3 cycles = 18 cycles
  • but it prints 38 on the serial monitor
  • I guess that 38 = 18 * 2 + 2
  • the "*2" could be because each instruction takes 2 cycles (as per Question A above)
  • the "+2" could come from stopping the timer at the end (1 instruction, 2 cycles, Question A)

QUESTION B: where can I find the "readable" assembly code to verify my timing analysis?

Bare 328P chip (ICSP/SPI programmed)

I read in boards.txt that this option programs CKDIV8 and sets CKSEL=0b0010 (internal 8 MHz RC oscillator):

atmega328.menu.clock.internal1=Internal 1 MHz
atmega328.menu.clock.internal1.bootloader.low_fuses=0x62
atmega328.menu.clock.internal1.bootloader.high_fuses=0xdb
atmega328.menu.clock.internal1.bootloader.extended_fuses=0xfd
atmega328.menu.clock.internal1.build.f_cpu=1000000L
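To see what the low fuse byte means bit by bit: on the ATmega328P a fuse bit reads 0 when it is programmed, CKDIV8 is bit 7 of the low fuse and CKSEL3:0 are bits 3:0. A small host-side C++ sketch decoding the boards.txt values (helper names are hypothetical, and it only models the internal-RC case, CKSEL = 0b0010):

```cpp
#include <cstdint>

// ATmega328P low fuse (0 = programmed):
//   bit 7 CKDIV8, bit 6 CKOUT, bits 5:4 SUT1:0, bits 3:0 CKSEL3:0
constexpr bool ckdiv8_programmed(std::uint8_t lfuse) {
  return (lfuse & 0x80) == 0;
}

constexpr std::uint8_t cksel(std::uint8_t lfuse) {
  return lfuse & 0x0F;
}

// Clock at reset, assuming CKSEL = 0b0010 (calibrated internal 8 MHz RC).
// A programmed CKDIV8 makes the system clock prescaler start at /8.
constexpr std::uint32_t internal_rc_clock_hz(std::uint8_t lfuse) {
  return ckdiv8_programmed(lfuse) ? 8000000UL / 8 : 8000000UL;
}
```

With 0x62 this yields 1 MHz and with 0xE2 it yields 8 MHz: the two menu entries differ only in bit 7.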

In this configuration, the CPU clock is 1 MHz, and each instruction should take 1 µs.

Using direct port manipulation, I see that each port write takes 2 µs...
So Question A still applies.
I also have a test LED blinking using delay(): 1 s on, 1 s off.
In that configuration, it blinks right on time.

Then I tried to burn the code again using "Tools / Clock / Internal 8 MHz" (instead of 1 MHz).

According to boards.txt, it should unprogram CKDIV8 and keep CKSEL as internal 8 MHz RC:

atmega328.menu.clock.internal8=Internal 8 MHz
atmega328.menu.clock.internal8.bootloader.low_fuses=0xe2
atmega328.menu.clock.internal8.bootloader.high_fuses=0xdb
atmega328.menu.clock.internal8.bootloader.extended_fuses=0xfd
atmega328.menu.clock.internal8.build.f_cpu=8000000L

After that, I see that with direct port manipulation each port write still takes 2 µs!

QUESTION C: why does it seem like CKDIV8 is not deactivated? I should be running at 8 MHz!

And I see the LED blink 8x slower (8 s on, 8 s off)... so delay() is misbehaving. That leads me to believe that f_cpu (passed to the sketch as F_CPU at compile time) is taken into account in the code even if the hardware clock is wrong.

QUESTION D: how can I fix this "software" side effect (delay() and such)?

Thanks in advance for your feedback, and have a nice day!

Below is the test code I used for this experiment.


#define DURATION_MS 1000

void setup() {
  Serial.begin(9600);
  // scope output on UNO D9
  bitSet(DDRB, PB1);
  // dummy input on UNO D10
  bitClear(DDRB, PB2);
  // led output on UNO D13
  bitSet(DDRB, PB5);
}

uint8_t foo;
uint8_t bar;
uint16_t count;

void loop() { 

  // start timer
  TCCR1A = 0;
  TCCR1B = 0;
  TCCR1C = 0;
  TCNT1H = 0;
  TCNT1L = 0;
  TIFR1 = _BV(TOV1);  // TOV1 is cleared by writing a logic one to it; writing 0 has no effect
  TCCR1B = 1;

  // scope timing
  bitSet(PORTB, PB1);
  bitClear(PORTB, PB1);
  bitSet(PORTB, PB1);
  bitClear(PORTB, PB1);

  // reading input
  foo = bitRead(PINB, PB2);

  // scope timing
  bitSet(PORTB, PB1);
  bitClear(PORTB, PB1);
  bitSet(PORTB, PB1);
  bitClear(PORTB, PB1);

  // reading input
  bar = bitRead(PINB, PB2);

  // scope timing
  bitSet(PORTB, PB1);
  bitClear(PORTB, PB1);
  bitSet(PORTB, PB1);
  bitClear(PORTB, PB1);

  // stop timer
  TCCR1B = 0;
  count = TCNT1;  // 16-bit access: the compiler reads TCNT1L first (latching TCNT1H), as the datasheet requires
  Serial.print("tov1=");
  Serial.print(bitRead(TIFR1, TOV1));
  TIFR1 = _BV(TOV1);  // clear the overflow flag by writing a 1 to it
  Serial.print(" count=");
  Serial.println(count);
  
  // dummy code so that reads are not "optimized away" by the compiler
  foo += bar;

  // visual delay duration check
  bitSet(PORTB, PB5);
  delay(DURATION_MS);
  bitClear(PORTB, PB5);
  delay(DURATION_MS);
}

Some instructions take two cycles, some take three, and some (the skip instructions, for example) can take one, two, or three.

I think it would be at least 2 cycles: 1 cycle to read and another cycle to write. The AND/OR operation might get folded into either the read or the write cycle, I guess.

There is a way to update the port in a single cycle on ATmega chips: you can write to PINB. Where you write a 0 bit, the corresponding bit in PORTB is unchanged, but where you write a 1 bit, the bit in PORTB is toggled (0 to 1 or vice versa).
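The PINx toggle rule amounts to an XOR of PORTx with the value written. A host-side C++ model of that behaviour (on the chip itself the write would be something like `PINB = _BV(PB1);`; the function and constant names here are illustrative only):

```cpp
#include <cstdint>

constexpr std::uint8_t kPB1 = 1u << 1;  // scope pin (Uno D9)
constexpr std::uint8_t kPB5 = 1u << 5;  // LED pin (Uno D13)

// Model of writing `written` to PINB: each 1 bit toggles the matching
// PORTB bit, each 0 bit leaves it unchanged.
constexpr std::uint8_t port_after_pin_write(std::uint8_t portb, std::uint8_t written) {
  return static_cast<std::uint8_t>(portb ^ written);
}
```

Because only the bits written as 1 change, the LED bit on PB5 survives a toggle of PB1, unlike a plain `PORTB = value` write.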

See: How can one view the assembly code output from the Arduino IDE?

Sounds like you forgot to click "burn bootloader" before you uploaded the sketch. So it's still running at 1MHz but because the sketch was compiled with the expectation it would be running at 8MHz, delay() takes 8 times longer.
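The 8x stretch follows from the core's timing code (delay(), millis()) converting milliseconds to clock ticks with F_CPU, the frequency the sketch was compiled for. A simplified host-side C++ model, assuming every delay scales with the ratio of compiled to actual clock (`observed_delay_ms` is a hypothetical helper):

```cpp
#include <cstdint>

// Delay actually observed when a sketch built for f_cpu_compiled_hz
// runs on a chip clocked at f_cpu_actual_hz (simple proportional model).
constexpr std::uint64_t observed_delay_ms(std::uint64_t requested_ms,
                                          std::uint64_t f_cpu_compiled_hz,
                                          std::uint64_t f_cpu_actual_hz) {
  return requested_ms * f_cpu_compiled_hz / f_cpu_actual_hz;
}
```

delay(1000) built for 8 MHz but running at 1 MHz gives 8000 ms; the reverse mismatch (built for 1 MHz, running at 8 MHz) gives 125 ms.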

Try changing:

// scope timing
bitSet(PORTB, PB1);
bitClear(PORTB, PB1);
bitSet(PORTB, PB1);
bitClear(PORTB, PB1);

to:

// scope timing
PORTB = B00000010;
PORTB = B00000000;
PORTB = B00000010;
PORTB = B00000000;
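One side effect of this change: `PORTB = B00000010;` replaces the whole port byte, so every other PORTB bit (the LED on PB5, or the pull-up enable for the input on PB2) is overwritten too, while sbi/cbi touch a single bit. It happens to be harmless in this test sketch, because PB5 is low while the measurement runs. A host-side C++ model of the two write styles (names are illustrative):

```cpp
#include <cstdint>

constexpr std::uint8_t kPB1 = 1u << 1;  // scope pin (Uno D9)
constexpr std::uint8_t kPB2 = 1u << 2;  // input pin (Uno D10); PORTB bit = pull-up enable
constexpr std::uint8_t kPB5 = 1u << 5;  // LED pin (Uno D13)

// bitSet(PORTB, bit) -> sbi: only the masked bit changes.
constexpr std::uint8_t after_bit_set(std::uint8_t portb, std::uint8_t mask) {
  return static_cast<std::uint8_t>(portb | mask);
}

// PORTB = value -> out: the previous contents are simply replaced.
constexpr std::uint8_t after_full_write(std::uint8_t /*portb*/, std::uint8_t value) {
  return value;
}
```

If other pins on the port must be preserved, the single-cycle PINB toggle or precomputed full-port values that include those bits avoid the problem.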

Since you seem to be using ICSP to program the device, you still need to set the fuses, for example using AVRDUDE.
If you burn a bootloader, the fuses are set as a by-product of that activity. However, you get into a mess if you then use ICSP to load the code, because the BOOTRST fuse will be inappropriately set.

Why write B00000000 to PORTB? All that does is add the delay of an instruction that basically does nothing. Sorry, I read that wrong; I was thinking you were writing to PINB to toggle the pin.

Use MCUdude's MiniCore, set the fuses by burning the bootloader with the bootloader selection set to "no bootloader", then load the sketch via ICSP.


That is also a possible solution, using MiniCore instead of the equivalent Arduino board package.

However, it appears that the OP is on an exercise to dig deep into the system internals, so setting the fuses with AVRDUDE could add to the learning experience.

Of course, we know the history of this, but the idea of having to "burn a bootloader" can initially seem strange to someone who does not actually want a bootloader installed.

The number of cycles depends on the instruction. See the AVR Instruction set manual

Thanks for your quick feedback!

For questions A and B:

You all rock. Indeed, what I thought was a single-cycle instruction is a single instruction that takes 2 cycles. Switching from bitSet/bitClear to writing the whole port showed the expected 1 cycle of 62.5 ns.

Today I learned! After skimming the instruction set manual, I learned that the same instruction can have different timings on different AVR cores (the manual gives separate cycle counts for cores such as AVRe, AVRxm and AVRxt).

And in this case, using avr-objdump -S sketch.elf, I can verify that the count I got from Timer1 proved I was indeed misunderstanding something, and the timer was... correct!

Indeed, for the AVRe instruction set, (2+2+2+2)*3 + (1+1+1+1+2)*2 + 2 = 38 clock cycles:

  // scope timing (x3)
  bitSet(PORTB, PB1);
 682:	29 9a       	sbi	0x05, 1	; 5 --> AVRe: 2 cycles 
  bitClear(PORTB, PB1);
 684:	29 98       	cbi	0x05, 1	; 5 --> AVRe: 2 cycles 
  bitSet(PORTB, PB1);
 686:	29 9a       	sbi	0x05, 1	; 5 --> AVRe: 2 cycles 
  bitClear(PORTB, PB1);
 688:	29 98       	cbi	0x05, 1	; 5 --> AVRe: 2 cycles 

  // reading input (x2)
  foo = bitRead(PINB, PB2);
 68a:	83 b1       	in	r24, 0x03	; 3 --> AVRe: 1 cycle 
 68c:	82 fb       	bst	r24, 2 ; --> AVRe: 1 cycle 
 68e:	88 27       	eor	r24, r24 ;  --> AVRe: 1 cycle 
 690:	80 f9       	bld	r24, 0 ; --> AVRe: 1 cycle 
 692:	80 93 27 01 	sts	0x0127, r24	; 0x800127 <foo> --> AVRe: 2 cycles 

  // stop timer (x1)
  TCCR1B = 0;
 6b2:	10 92 81 00 	sts	0x0081, r1	; 0x800081 <__DATA_REGION_ORIGIN__+0x21> ;  --> AVRe: 2 cycles 
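The same sum can be written as a compile-time check, using the AVRe cycle counts annotated in the listing (a host-side C++ sketch; it glosses over exactly which cycles of the starting and stopping `sts` the timer sees, and simply reproduces the formula that matches the measured 38):

```cpp
// AVRe cycle counts for the instructions in the listing above.
constexpr unsigned kSbiCbi = 2;                                  // sbi / cbi
constexpr unsigned kIn = 1, kBst = 1, kEor = 1, kBld = 1, kSts = 2;

// Four port writes per "scope timing" group; there are three groups.
constexpr unsigned kScopeGroup = 4 * kSbiCbi;
// in / bst / eor / bld / sts for each bitRead() stored in a global; there are two.
constexpr unsigned kReadGroup = kIn + kBst + kEor + kBld + kSts;
// The sts that writes TCCR1B = 0 and stops the timer.
constexpr unsigned kStopTimer = kSts;

constexpr unsigned kTotalCycles = 3 * kScopeGroup + 2 * kReadGroup + kStopTimer;
```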

Now, for questions C and D (namely, “fuses as side effects of something”):

Thanks to you, I just learned that fuses are not written every time the chip is programmed (as I thought!), and that setting them requires either burning a bootloader (which sets the fuses as a side effect) or using avrdude.

I admit I have not yet understood the part about the BOOTRST fuse.

If I change the fuses (if a change is needed) with avrdude *after* I upload my sketch via ICSP, would that prevent the problem you just described?

If not, what is the solution? "Clear" the BOOTRST fuse and re-upload via ICSP? Something else?

The only thing I knew about the bootloader is that it is used to self-program over serial.

By deduction, I know I need it on any "development board", but I rarely need that feature when plugging a pre-loaded MCU into a target PCB, so I think I can manage without one for the time being when pre-loading via ICSP...

... if it has no negative side effects, of course (BOOTRST maybe?)

So next, I'll play with avrdude and fuses, and hopefully report back with success.

As @david_2018 pointed out, use MiniCore. If you go to the website, all of your questions will be answered.


@jim-p I will, for sure. The documentation seems quite complete too, which is great 🙂

In the meantime, I managed to read the fuses with avrdude (nicely documented too):

C:\Users\xxxxx\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17\bin\avrdude.exe "-CC:\Users\xxxxx\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17\etc\avrdude.conf" -n -v -V -p atmega328p -c stk500v1 -P COM5 -b 19200 -U lfuse:r:-:i -U hfuse:r:-:i -U efuse:r:-:i
...
avrdude.exe: safemode: lfuse reads as 62
avrdude.exe: safemode: hfuse reads as D9
avrdude.exe: safemode: efuse reads as FF
avrdude.exe: safemode: Fuses OK (E:FF, H:D9, L:62)

Which confirms that the 328P still has its factory settings (internal 8 MHz, CKDIV8, etc.).

Be careful: some fuse settings cannot be undone via ICSP. Reverting them requires special high-voltage programming.

I know disabling the reset pin to use it as an I/O is one. It cannot be changed back via ICSP because reset is disabled (duh).

Yes, thanks for the warning.

I read the datasheet and saw that the high fuse is the “dangerous” one, with SPIEN and RSTDISBL. After further reading, though, there is a note for SPIEN that says “The SPIEN Fuse is not accessible in serial programming mode”, so I guess we are safe, as it can only be changed in what the datasheet calls “parallel programming” (something I will surely have fun with later, because I’m a sucker for completeness 🙂).

So only RSTDISBL (in the high fuse) is actually dangerous in serial (SPI) programming mode.
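A pre-flight check along these lines could be run on a high fuse value before writing it with avrdude. It is a host-side C++ sketch with hypothetical names, assuming the ATmega328P high-fuse layout from the datasheet. It also includes DWEN: when that bit is programmed, debugWIRE takes over the reset pin and ISP stops working until debugWIRE is disabled again.

```cpp
#include <cstdint>

// ATmega328P high fuse (0 = programmed):
//   bit 7 RSTDISBL, bit 6 DWEN, bit 5 SPIEN, bit 4 WDTON,
//   bit 3 EESAVE, bits 2:1 BOOTSZ1:0, bit 0 BOOTRST
constexpr bool programmed(std::uint8_t fuse, unsigned bit) {
  return ((fuse >> bit) & 1u) == 0;
}

// True if the value keeps ISP usable: reset pin kept (RSTDISBL unprogrammed),
// debugWIRE off (DWEN unprogrammed), serial programming on (SPIEN programmed).
constexpr bool hfuse_keeps_isp(std::uint8_t hfuse) {
  return !programmed(hfuse, 7) && !programmed(hfuse, 6) && programmed(hfuse, 5);
}
```

The values seen in this thread (0xDB from boards.txt, 0xD9 read back from the chip) both pass; a value with bit 7 cleared, such as 0x59, would not.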

Finally, I changed “just” the low fuse, after calculating it with AVR® Fuse Calculator – The Engbedded Blog: from the factory default lfuse=0x62 to “the same thing” but with CKDIV8 “unprogrammed” for a “real” 8 MHz internal clock, i.e. lfuse=0xE2, using this avrdude command:

C:\Users\xxxxx\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17\bin\avrdude.exe "-CC:\Users\xxxxx\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17/etc/avrdude.conf" -v -V -p atmega328p -c stk500v1 -P COM5 -b 19200 -U lfuse:w:0xE2:m
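Since “unprogramming” a fuse bit means setting it to 1, going from 0x62 to 0xE2 is just setting bit 7 (CKDIV8) and keeping every other bit. A host-side C++ sketch of that bit manipulation (helper names are illustrative):

```cpp
#include <cstdint>

constexpr std::uint8_t kCkdiv8Bit = 0x80;  // bit 7 of the low fuse

// Unprogram CKDIV8 (set the bit to 1): clock no longer divided by 8 at reset.
constexpr std::uint8_t unprogram_ckdiv8(std::uint8_t lfuse) {
  return static_cast<std::uint8_t>(lfuse | kCkdiv8Bit);
}

// Program CKDIV8 again (clear the bit to 0), back to the factory /8 setting.
constexpr std::uint8_t program_ckdiv8(std::uint8_t lfuse) {
  return static_cast<std::uint8_t>(lfuse & ~kCkdiv8Bit);
}
```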

After doing this, the chip I had previously programmed with a sketch built for the 1 MHz clock (Arduino IDE option) actually ran 8x faster 🙂 That is, until I reprogrammed it using the 8 MHz IDE option, which updates F_CPU to match the new clock settings in the fuse.

As a final note, the scope now shows that single-cycle PORTx= writes take 125 ns, and two-cycle bitSet/bitClear take 250 ns, which is all fine and dandy now that the chip actually runs at 8 MHz.

So that sums it up for me. This has been fun, thanks a LOT for your help and advice 👍

Now on to MiniCore, to see what it’s about. Have a great day!

I'm not surprised that this was not understood because I oversimplified the explanation.

The first thing to say is that, by correct use of the minicore board package, as now mentioned a couple of times, you are not confronted with this problem.

However, since you seem to be digging deep into this, a longer explanation is warranted.
If you use the Arduino board package for the Uno and use the "Burn Bootloader" option, the fuses are set up for the standard things like the external clock, but also for the bootloader (boot section size/start address, etc.). This is all fine if you then upload the sketch over a USB cable/serial using the bootloader.

If, however, you use ICSP to upload a sketch, the whole flash area is erased (including the bootloader area) and the sketch is uploaded, but the fuses are still set to assume a bootloader. Now, when the system starts, it starts at the beginning of the bootloader area in high memory (because the BOOTRST fuse is programmed by the "Burn Bootloader" function). However, there is no bootloader there, just erased flash (every byte reads 0xFF). In practice, execution runs through these erased words without doing anything until the top of the memory area is reached, then wraps around to the beginning of flash, where the sketch is located, and the sketch starts. That all (normally) works fine.

But if your sketch is so large that it extends into the area that would have been occupied by the bootloader, strange things will happen on a system start, because execution will jump into a random position in your sketch. Hence, if you upload the sketch by ICSP (i.e. not via the bootloader), it is important that the BOOTRST fuse is not programmed.
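Where execution starts after reset can be worked out from the high fuse: BOOTSZ1:0 (bits 2:1) give the boot section size and BOOTRST (bit 0, 0 = programmed) decides whether reset jumps there instead of to address 0. A host-side C++ sketch for the ATmega328P (16K words of flash; names are hypothetical). For reference, the Uno's boards.txt uses hfuse 0xDE, which programs BOOTRST with a 256-word boot section, so reset goes to word 0x3F00 (byte address 0x7E00):

```cpp
#include <cstdint>

constexpr std::uint16_t kFlashWords = 0x4000;  // 32 KB = 16K words

// Boot section size in words: BOOTSZ 11 -> 256, 10 -> 512, 01 -> 1024, 00 -> 2048.
constexpr std::uint16_t boot_section_words(std::uint8_t hfuse) {
  return static_cast<std::uint16_t>(256u << (3u - ((hfuse >> 1) & 0x3u)));
}

// Word address where execution starts after reset:
// start of the boot section if BOOTRST is programmed, otherwise 0.
constexpr std::uint16_t reset_word_address(std::uint8_t hfuse) {
  return (hfuse & 0x01) == 0
             ? static_cast<std::uint16_t>(kFlashWords - boot_section_words(hfuse))
             : 0;
}
```

With the factory value 0xD9 (or the 0xDB from the internal-clock entries), BOOTRST is unprogrammed and reset goes straight to the sketch at address 0.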

You can see all the fuse settings here:

Setting the fuses is normally a one-time activity unless there is a serious change like moving from an external crystal to the internal oscillator or something similar. Usually, the fuses are set before the first sketch is uploaded.

Do you understand how the processor works? If the data being used is contained in the instruction, then the operation can occur as the instruction is executed. BUT if the data referenced must be fetched from memory before the instruction can be executed, that requires at least one more cycle. If the data is referenced indirectly, e.g. indexed, then two or more cycles are needed.

Nope, I cannot say I have a deep understanding of how a processor actually works: all 4 instructions below seemed to me to “hold” their data in the instruction (the address for port B, and either the bit to change or the register to write from), which led me to think they would have the same “latency”.

My intuition was wrong, so… every day you learn!

  bitSet(PORTB, PB1);
 1de:	c1 9a       	sbi	0x18, 1	; 24 --> 2 cycles
  bitClear(PORTB, PB1);
 1e0:	c1 98       	cbi	0x18, 1	; 24 --> 2 cycles
  PORTB = B00000010;
 1e2:	c8 bb       	out	0x18, r28	; 24 --> 1 cycle
  PORTB = B00000000;
 1e4:	18 ba       	out	0x18, r1	; 24 --> 1 cycle

Your detailed explanation makes a lot of sense, so thank you for taking the time to clear this up 👍 And this afternoon I did the opposite: I changed the fuse after uploading the sketch, as I wanted to see whether the change would take effect in real time.

It seemed to, but I could not tell whether that was due to an “automatic reset” after programming (from avrdude, or an automatic MCU reset) or to a real-time change of the “runtime setting” of the clock prescaler 🤨

PS: I have duly noted that MiniCore exists. It’s on a Post-it for next weekend!

That register value must be fetched before the operation can be done.