SF Bay Area (USA)
Offline
Faraday Member
Karma: 78
Posts: 5453
Strongly opinionated, but not official!
« on: December 26, 2008, 05:06:56 am » |
Hmm. This was asked over on AVRFreaks, and it's FREQUENTLY a Frequently Asked Question about CPUs/etc, though I don't recall ever seeing it asked here. Since I actually did the experiment, I'll post the answer anyway!
    while (1) { digitalWrite(3, 1); digitalWrite(3, 0); }
produces a 106.8kHz square wave on digital pin 3 in Arduino 0010, 0011, and 0012. It would probably be foolish to count on exactly that speed, though; library functions are subject to change.
    cli(); while (1) { PORTD |= 0x8; PORTD &= ~0x8; }
on the same board runs at 2.667MHz. (This does produce the minimal sbi/cbi/rjmp loop that you'd expect, BTW.)
So that's about a 25x penalty (2.667MHz vs 106.8kHz) for the Arduino library code, which sounds about right: the overhead of abstracting I/O to a "pin number" is pretty substantial. There's a subroutine call, a lookup table to get the port, another lookup table to get the bit, a third to check whether analogWrite is in use, and then less efficient instructions to access the port "indirectly".
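To make the lookup overhead described above concrete, here is a minimal host-side sketch in C. It is not the Arduino library source: the tables, the simulated port array `sim_ports`, the counter `sim_pwm_off_calls`, and the function name `sketch_digital_write` are all made up for illustration, and the pin mapping assumes an ATmega168-style board (pins 0-7 on port D, 8-13 on port B, PWM possible on 3, 5, 6, 9, 10, 11).

```c
#include <stdint.h>

/* Simulated I/O registers; on a real AVR these would be PORTB and PORTD. */
enum { SIM_PORT_B = 0, SIM_PORT_D = 1 };
static uint8_t sim_ports[2];

/* Counts how often the (hypothetical) "turn PWM off" path would be taken. */
static unsigned sim_pwm_off_calls;

/* Hypothetical lookup tables for pins 0..13. */
static const uint8_t pin_to_port[14] = {
    SIM_PORT_D, SIM_PORT_D, SIM_PORT_D, SIM_PORT_D,
    SIM_PORT_D, SIM_PORT_D, SIM_PORT_D, SIM_PORT_D,
    SIM_PORT_B, SIM_PORT_B, SIM_PORT_B, SIM_PORT_B,
    SIM_PORT_B, SIM_PORT_B
};
static const uint8_t pin_to_mask[14] = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20
};
/* Nonzero means a timer MIGHT drive this pin (it is PWM-capable). */
static const uint8_t pin_to_timer[14] = {
    0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0
};

/* Every call pays for three table lookups, a branch, and an indirect
   read-modify-write, instead of a single sbi/cbi instruction. */
void sketch_digital_write(uint8_t pin, uint8_t value)
{
    if (pin >= 14)
        return;

    uint8_t timer = pin_to_timer[pin];
    uint8_t mask  = pin_to_mask[pin];
    uint8_t port  = pin_to_port[pin];

    if (timer != 0)
        sim_pwm_off_calls++;          /* stands in for disengaging PWM */

    if (value)
        sim_ports[port] |= mask;
    else
        sim_ports[port] &= (uint8_t)~mask;
}
```

Calling `sketch_digital_write(3, 1)` sets bit 3 of the simulated port D and takes the PWM path once, while pin 4 skips that path, which is the kind of per-pin difference discussed in this thread.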
Connecticut, US
Offline
Edison Member
Karma: 1
Posts: 1036
Whatduino
« Reply #1 on: December 26, 2008, 09:46:49 am » |
Speaking of the hefty digitalWrite() overhead, I noticed two things when I went poking into that code.
One, some pins are slower than others, because they have PWM timers that have to be disengaged.
Two, the "function" that turns off those timers has a really ugly chain of if/if/if/if statements that should really be a switch or at least if/else if/else if/else statements. I did a timing analysis like you, westfw, and found that the gcc compiler really does optimize the code the same way since it's all forced inline, and comparing against static const data. But it bugs me to see code that relies on the optimizer to rephrase so drastically to do the right thing-- one accidental tweak or compiler update and boom, digitalWrite() could be twice as slow, because the compiler decides to implement what is written, not what might possibly be implied.
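The difference between the two styles, taken literally, can be sketched on a host machine. This is an illustration only: the timer IDs, the `comparisons` counter, and both function names are hypothetical and not taken from the Arduino source. It counts how many comparisons each form performs when nothing is optimized away.

```c
/* Hypothetical timer identifiers, for illustration only. */
enum sketch_timer { T0A = 1, T0B, T1A, T1B, T2A, T2B };

static unsigned comparisons;

/* Compare and count, so the cost of each style is visible. */
static int match(int timer, int candidate)
{
    comparisons++;
    return timer == candidate;
}

/* As written with independent ifs: every test runs, even after a hit. */
int off_independent_ifs(int timer)
{
    int hit = 0;
    if (match(timer, T0A)) hit = T0A;
    if (match(timer, T0B)) hit = T0B;
    if (match(timer, T1A)) hit = T1A;
    if (match(timer, T1B)) hit = T1B;
    if (match(timer, T2A)) hit = T2A;
    if (match(timer, T2B)) hit = T2B;
    return hit;
}

/* With else-if: testing stops at the first match. */
int off_else_if_chain(int timer)
{
    int hit = 0;
    if (match(timer, T0A)) hit = T0A;
    else if (match(timer, T0B)) hit = T0B;
    else if (match(timer, T1A)) hit = T1A;
    else if (match(timer, T1B)) hit = T1B;
    else if (match(timer, T2A)) hit = T2A;
    else if (match(timer, T2B)) hit = T2B;
    return hit;
}
```

Both functions return the same result, but for `T0A` the first form makes six comparisons and the second makes one. An optimizer may well collapse the first form when the argument is a compile-time constant; the else-if form simply does not depend on that.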
« Last Edit: December 26, 2008, 09:48:21 am by halley »
0
Offline
God Member
Karma: 0
Posts: 507
« Reply #2 on: December 26, 2008, 01:58:23 pm » |
That's interesting.
re: 2.667MHz example. Was that at 8MHz? I have been operating under the assumption that a 16MHz ATmega did 16,000,000 instructions per second, which would be 5.333 million loops per second with only 3 instructions.
Also, the output isn't exactly "square": you would need another SBI (or NOP) to hold the pin high for the same time as low, and the loop would be a bit slower if half on/half off was a requirement.
« Last Edit: December 26, 2008, 02:01:01 pm by dcb »
London
Offline
Faraday Member
Karma: 6
Posts: 6226
Have fun!
« Reply #3 on: December 26, 2008, 02:49:30 pm » |
I would have thought it would do 4 million loops per second at 16MHz. The while statement will be optimized to a relative jump, which is a two-clock instruction, so the whole loop should be 4 clock cycles long.
« Last Edit: December 26, 2008, 02:49:46 pm by mem »
SF Bay Area (USA)
Offline
Faraday Member
Karma: 78
Posts: 5453
Strongly opinionated, but not official!
« Reply #4 on: December 26, 2008, 04:55:20 pm » |
"some pins are slower than others, because they have PWM timers that have to be disengaged."
Yes. The code reads:
    timer = digitalPinToTimer(pin);
      :
    if (timer != NOT_ON_TIMER) turnOffPWM(timer);
I had assumed digitalPinToTimer() would be false if analogWrite wasn't active, but it's actually a ROM-based function that says which timer MIGHT be associated with the pin. Pin 3, used in my example, IS a PWM output...
The examples are with a 16MHz MDC Bare Bones Board, as shown by the little frequency readout on my Tek TDS210 scope; you're right that that's not what I'd expect based on instruction timings. This may require more investigation!
SF Bay Area (USA)
Offline
Faraday Member
Karma: 78
Posts: 5453
Strongly opinionated, but not official!
« Reply #5 on: December 26, 2008, 05:00:25 pm » |
Ah. SBI and CBI are 2-cycle instructions (I guess that makes sense, since they're read-modify-write), as is the rjmp, so the loop is 6 cycles and 16MHz / 6 = 2.66666MHz is exactly as expected after all!
This implies that I can make a faster loop using OUT and pre-loaded registers...
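The arithmetic behind these numbers can be written down as a tiny C helper. The function name `loop_freq_hz` and the enum are illustrative only; the cycle counts are the ones quoted in this thread for a classic ATmega (SBI, CBI and RJMP at 2 cycles each, OUT and NOP at 1).

```c
/* Cycle counts as quoted in the thread for the ATmega168. */
enum { CYC_SBI = 2, CYC_CBI = 2, CYC_RJMP = 2, CYC_OUT = 1, CYC_NOP = 1 };

/* Output frequency in Hz (truncated) for a loop that produces one full
   high/low period every cycles_per_period CPU cycles. */
unsigned long loop_freq_hz(unsigned long f_cpu_hz, unsigned cycles_per_period)
{
    return f_cpu_hz / cycles_per_period;
}
```

For example, `loop_freq_hz(16000000UL, CYC_SBI + CYC_CBI + CYC_RJMP)` gives 2666666, while an OUT/OUT/RJMP loop (1 + 1 + 2 cycles) would give 4000000.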
SF Bay Area (USA)
Offline
Faraday Member
Karma: 78
Posts: 5453
Strongly opinionated, but not official!
« Reply #6 on: December 26, 2008, 10:10:02 pm » |
    while (1) { PORTD = ones; PORTD = zeros; }
gives me 4MHz (and noticeably not a "square wave"). Adding a nop makes it more square but changes the max frequency to about 3.2MHz. Adding two nops makes for very square, but back to 2.667MHz.
Using digitalWrite() on a non-PWM pin (4 instead of 3) runs at about 148.4kHz instead of 106.8kHz:
    while (1) { digitalWrite(4, 1); digitalWrite(4, 0); }
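Going the other way, a measured frequency can be turned back into an approximate cycle count per loop iteration. This is a rough sketch: `cycles_per_period` is a made-up helper name, and it assumes the 16MHz clock mentioned earlier in the thread and that the scope readout is accurate.

```c
/* Approximate CPU cycles per output period, rounded to the nearest cycle. */
unsigned long cycles_per_period(unsigned long f_cpu_hz, unsigned long measured_hz)
{
    return (f_cpu_hz + measured_hz / 2) / measured_hz;
}
```

Under those assumptions, 106.8kHz on pin 3 works out to about 150 cycles per loop (roughly 75 per digitalWrite() call, loop overhead included), and 148.4kHz on pin 4 to about 108 cycles per loop (roughly 54 per call). The gap of about 21 cycles per call would then be the cost of the PWM handling on the PWM-capable pin.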
0
Offline
God Member
Karma: 0
Posts: 507
« Reply #7 on: December 27, 2008, 01:04:59 am » |
Ok, it makes sense now. Here's a square wave version (resulting assembler confirmed). It should be 8 cycles per loop, which would be a 2MHz output on a 16MHz CPU (1MHz on an 8MHz CPU):
    cli(); while (1) { PORTD |= 0x8; PORTD |= 0x8; PORTD &= ~0x8; }
Is changing fuses allowed? It seems you can also program the CKOUT fuse and get the system clock echoed on CLKO (digital 8), which could be useful, I reckon. That would be a 16MHz toggle speed, and it is conceivable that you could control it with an external gate with high precision.
SF Bay Area (USA)
Offline
Faraday Member
Karma: 78
Posts: 5453
Strongly opinionated, but not official!
« Reply #8 on: December 27, 2008, 04:00:37 am » |
You can usually pull the system clock off one of the oscillator pins (XTAL2 is the output of an inverter-based circuit), especially if you're willing to add a gate to square things off. I guess it depends on whether the fuse bits are set for the "low power" oscillator or the "full swing" oscillator. See Sections 7.2 through 7.4 of the ATmega168 data sheet.
France
Offline
Sr. Member
Karma: 0
Posts: 254
« Reply #9 on: June 13, 2009, 10:04:27 pm » |
You can save one cycle from
    cli(); while (1) { PORTD |= 0x8; PORTD |= 0x8; PORTD &= ~0x8; }
and get the higher square wave frequency with
    cli(); while (1) { PORTD |= B1000; PORTD &= B11110111; }
« Last Edit: June 13, 2009, 10:06:55 pm by selfonlypath »
0
Offline
Faraday Member
Karma: 6
Posts: 2504
« Reply #10 on: June 14, 2009, 12:32:57 pm » |
The newer devices (168/328, IIRC) have an instruction to toggle a pin - too bad I can't recall the name of it right now... That would reduce it by one more instruction.
-j
Oxford (England)
Offline
Jr. Member
Karma: 0
Posts: 58
« Reply #11 on: June 14, 2009, 01:14:29 pm » |
I think the new 'toggle' functionality for pins works by writing a 1 to the pin's bit in its PIN register, e.g. to toggle bit 3 of port B, you can do:
PINB = B1000 ;
[yup - M88 datasheet, section 13.1 : "writing a logic one to a bit in the PINx Register, will result in a toggle in the corresponding bit in the Data Register"]
This does NOT work on Mega8's.
I think this means a 3 cycle loop is possible (1 cycle for the OUT instruction, 2 cycles for the RJMP). The pin toggles once per pass, so a full period is 6 cycles, and that results in a 2.666MHz square signal.
In theory you can beat this by filling the memory space with that OUT instruction and letting the program counter roll over at the end of flash, but there are problems with this solution, despite a theoretical 8MHz output...
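Since a toggle flips the pin once per pass, one full period takes two passes. A small C helper makes that explicit; `toggle_freq_hz` is a hypothetical name, and the 16MHz clock is the one assumed throughout this thread.

```c
/* Square-wave frequency in Hz (truncated) when the pin is toggled once
   every cycles_per_toggle CPU cycles: two toggles make one full period. */
unsigned long toggle_freq_hz(unsigned long f_cpu_hz, unsigned cycles_per_toggle)
{
    return f_cpu_hz / (2UL * cycles_per_toggle);
}
```

With OUT + RJMP at 3 cycles per toggle this gives 2666666 at 16MHz, and back-to-back OUT instructions at 1 cycle per toggle give the theoretical 8000000.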
France
Offline
Sr. Member
Karma: 0
Posts: 254
« Reply #12 on: June 14, 2009, 01:35:43 pm » |
Could someone tell me exactly how many cycles the while(true) itself uses, I mean the overhead of the loop surrounding the writes in
    cli(); while (true) { PORTD |= B1000; PORTD &= B11110111; }
For example, each pass through the loop will take 2 cycles to execute both PORTD writes (the toggling), but how many cycles will the while(true) itself take on each pass, whatever the instructions inside the loop?
« Last Edit: June 14, 2009, 01:36:21 pm by selfonlypath »
London, England
Offline
Edison Member
Karma: 3
Posts: 1026
Go! Go! Arduinoooo !!!
« Reply #13 on: June 14, 2009, 01:37:19 pm » |
That's an inexcusable loss of speed. The compiler should be converting the digitalWrite code into the same ASM that you get by doing it in raw C. This should be fixed in the next IDE if possible.
Oxford (England)
Offline
Jr. Member
Karma: 0
Posts: 58
« Reply #14 on: June 14, 2009, 02:31:58 pm » |
@selfonlypath: Your code (or at least a variant using PORTB on my M8) compiles to:
    sbi 0x18, 3
    cbi 0x18, 3
    rjmp .-6
Each of these instructions is 2 cycles, hence this is a 6 cycle loop. Note that it's not uniform, though: the output will be on for 2 cycles and off for 4 cycles.
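The on/off split can be checked with a small host-side model in C. This is a sketch under a stated assumption: the pin is taken to change level at the end of the instruction that writes it. The type and function names (`struct insn`, `high_cycles`, `total_cycles`) and the two loop tables are made up for illustration.

```c
#include <stddef.h>

enum pin_effect { PIN_NONE, PIN_SET, PIN_CLEAR };

struct insn {
    unsigned cycles;          /* execution time in CPU cycles */
    enum pin_effect effect;   /* what the instruction does to the pin */
};

/* sbi / cbi / rjmp, as in the listing above. */
static const struct insn loop_sbi_cbi[] = {
    { 2, PIN_SET }, { 2, PIN_CLEAR }, { 2, PIN_NONE }
};

/* sbi / sbi / cbi / rjmp, the "square" variant from earlier in the thread. */
static const struct insn loop_sbi_sbi_cbi[] = {
    { 2, PIN_SET }, { 2, PIN_SET }, { 2, PIN_CLEAR }, { 2, PIN_NONE }
};

/* Total cycles for one pass through the loop. */
unsigned total_cycles(const struct insn *loop, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += loop[i].cycles;
    return total;
}

/* Cycles per pass during which the pin is high, in steady state.
   Assumption: the pin changes at the END of the writing instruction. */
unsigned high_cycles(const struct insn *loop, size_t n)
{
    int level = 0;
    unsigned high = 0;
    /* Run two passes so the second one starts from the steady-state level. */
    for (int pass = 0; pass < 2; pass++) {
        high = 0;
        for (size_t i = 0; i < n; i++) {
            if (level)
                high += loop[i].cycles;
            if (loop[i].effect == PIN_SET)
                level = 1;
            else if (loop[i].effect == PIN_CLEAR)
                level = 0;
        }
    }
    return high;
}
```

With this model, the sbi/cbi/rjmp loop is high for 2 of 6 cycles, matching the 2 on / 4 off split above, and the sbi/sbi/cbi/rjmp loop is high for 4 of 8 cycles, i.e. a 50% duty cycle.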