Producing a high frequency block wave.

First of all: this is more of an academic than a practical problem! It’s just one of those little things that are fun to figure out and that teach a bit more about the internals of the processor.

Yesterday I was tinkering a bit, and I was looking to create a high frequency block wave, target frequency around 1 MHz. This is of course trivial to do with a 555 timer, but as I was too lazy to dig up a 555 and wire it up I decided to use a 16 MHz Arduino board, to see how far I could push it.

First attempt: digitalWrite() calls with a short delay between them.

void setup() {
  pinMode(11, OUTPUT);
}

void loop() {
  digitalWrite(11, HIGH);
  delayMicroseconds(1);
  digitalWrite(11, LOW);
  delayMicroseconds(1);
}

That gave a nice block wave but it was too low a frequency, about 300 kHz.

Next I replaced the digitalWrite() calls with direct port register writes:

void setup() {
  pinMode(11, OUTPUT);
}

void loop() {
  delayMicroseconds(1);
  PORTB |= (1 << PB3);
  delayMicroseconds(1);
  PORTB &= ~(1 << PB3);
}

This increased the frequency dramatically, and I was getting about 2 MHz (a 0.5 µs period)! Progress.

But also it didn’t give me a nice waveform any more. The pins simply can not follow this fast, probably too much stray capacitance to overcome. Also I noticed that the peak goes from low to high and back to low in about 0.2µs, after which it stays low for about 0.3 µs before going high again.

This taught me a few things:

  • delayMicroseconds() won’t do a 1 us delay; the minimum appears to be 4. I know the micros() counter increments in steps of 4, but I didn’t see anything about this in the delayMicroseconds() documentation. A delay of 10 us is commonly used, though (but of course no one measures whether it’s really 10 us, not 8 or 12).
  • digitalRead() and digitalWrite() are slow, taking about 3 us to complete (this was the half wave I saw when stripping out the delays).
  • a direct port call takes two clock cycles (0.125µs)
  • the looping of loop() takes 0.5 - 2 * 0.125 = 0.25µs - four clock cycles.
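The cycle arithmetic in the list above can be checked on a PC. This is only a host-side sketch in plain C++ (not Arduino code); the names kCpuHz, microsToCycles and loopOverheadCycles are made up for illustration, and the 16 MHz clock and 2-cycle port write are simply the figures quoted above.

```cpp
// Host-side arithmetic only; compiles with any C++14 compiler.
constexpr double kCpuHz = 16000000.0;  // assumed 16 MHz board clock

// Convert a duration in microseconds to CPU clock cycles.
constexpr double microsToCycles(double micros, double cpuHz = kCpuHz) {
  return micros * cpuHz / 1000000.0;
}

// Cycles left over for the loop itself once the port writes are subtracted.
constexpr double loopOverheadCycles(double periodMicros, int portWrites,
                                    int cyclesPerWrite, double cpuHz = kCpuHz) {
  return microsToCycles(periodMicros, cpuHz) - portWrites * cyclesPerWrite;
}

// Compile-time checks restating the figures from the list above.
static_assert(microsToCycles(0.125) == 2.0, "one port write is 2 cycles");
static_assert(loopOverheadCycles(0.5, 2, 2) == 4.0, "loop() costs 4 cycles");
```

If the file compiles, the measured 0.5 µs period and the 2-cycle port write really do leave four cycles for loop().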

So… improvements:

void setup() {
  pinMode(11, OUTPUT);
}

void loop() {
  while (true) {
    PORTB &= ~(1 << PB3);
    PORTB |= (1 << PB3);
  }
}

The while(true) loop brings the frequency a bit further up, now the period is just under 0.4µs. So three clock cycles to operate this loop, that’s three instructions, so I don’t think that can be improved on any further.

Now I still have the problem of a much shorter high than low. The high lasts 2 cycles, the low 2+3 = 5 cycles. What I actually want is 8 cycles high and 8 cycles low for 1 MHz. Or 6 and 6 for about 1.33 MHz (16 MHz / 12 cycles). Or 5 and 5 (1.6 MHz), if possible.
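For picking cycle counts, the relation is just clock frequency divided by the total number of cycles per period. A small host-side sketch (plain C++, not Arduino code; squareWaveHz and dutyCycle are hypothetical helper names, and 16 MHz is assumed):

```cpp
// Host-side arithmetic only; compiles with any C++14 compiler.

// Output frequency of a bit-banged square wave, given cycles spent high and low.
constexpr double squareWaveHz(int highCycles, int lowCycles,
                              double cpuHz = 16000000.0) {
  return cpuHz / (highCycles + lowCycles);
}

// Fraction of the period that the pin is high.
constexpr double dutyCycle(int highCycles, int lowCycles) {
  return static_cast<double>(highCycles) / (highCycles + lowCycles);
}

static_assert(squareWaveHz(8, 8) == 1000000.0, "8 + 8 cycles gives 1 MHz");
static_assert(squareWaveHz(5, 5) == 1600000.0, "5 + 5 cycles gives 1.6 MHz");
static_assert(squareWaveHz(6, 6) > 1333333.0 && squareWaveHz(6, 6) < 1333334.0,
              "6 + 6 cycles gives about 1.33 MHz");
static_assert(dutyCycle(8, 8) == 0.5, "equal high and low is 50% duty");
```

With 2-cycle port writes, only even cycle counts can be built from port writes alone, so an odd count like 5 and 5 would need a 1-cycle filler instruction.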

Let’s try 8 cycles. 4x a port call for high, 2 times + loop() for low.

void setup() {
  pinMode(11, OUTPUT);
}

void loop() {
  PORTB |= (1 << PB3);
  PORTB |= (1 << PB3);
  PORTB |= (1 << PB3);
  PORTB |= (1 << PB3);
  PORTB &= ~(1 << PB3);
  PORTB &= ~(1 << PB3);
}

1 MHz - beautiful :) Strip two lines and it’s at 1.5 MHz. Nice. And a pretty nice waveform. That looks like the limit, unless there is a command that takes one cycle (and doesn’t do anything really), or some built in hardware that can do this by itself that I’m not aware of.

Check out

  PINB = (1 << PB3);

It toggles the bit with one instruction.

Look at using a hardware Timer to generate a 50% PWM signal.

I thought it would be one instruction, but the measurement of the output shows me it takes two clock cycles, and normally one cycle is one instruction on these processors.

Just checked again, and the waveform of the last sketch I posted really has a period of 1 µs. That's 16 clock cycles (16 MHz Pro Mini): 8 cycles for the four port calls that set the pin high, and another 8 for the two port calls that set it low plus the loop() overhead.

Riva:
Look at using a hardware Timer to generate a 50% PWM signal.

I was thinking about that - but the only technique I know is timer + interrupt, and the interrupt overhead is also 2-3 clock cycles.

A quick test doing a loop() with a huge number of port calls (to get rid of the loop() overhead itself) gave me something that resembles a sawtooth wave, at a period of 250 ns (so four clock cycles). Definitely over the speed limit for an Arduino pin to produce an actual square wave: it takes too long to get the pin to change potential.

I really should try to figure out how to get images off my scope...

This taught me two things:
delayMicroseconds() won't do a 1 us delay, the minimum appears to be 4. I know the micros() counter increments in steps of 4, but didn't see anything in the delayMicroseconds() documentation

From the reference

Notes and Warnings
This function works very accurately in the range 3 microseconds and up. We cannot assure that delayMicroseconds will perform precisely for smaller delay-times.

328P datasheet:

18.2.2.
Toggling the Pin
Writing a '1' to PINxn toggles the value of PORTxn, independent on the value of DDRxn. The SBI instruction can be used to toggle one single bit in a port.

The SBI instruction takes 2 clocks, but you can write the whole PINx register with OUT in 1 cycle; any bits you leave 0 don't get toggled.
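To make the toggle rule from the datasheet concrete, here is a host-side model in plain C++. It is not real register access: writePinRegister and kPb3Mask are made-up names, and the function only mimics the documented behaviour that each 1 bit written to PINx flips the matching PORTx bit.

```cpp
#include <cstdint>

// Host-side model only; compiles with any C++14 compiler.

// Returns the new PORTx value after `value` is written to PINx:
// every bit set in `value` toggles, every 0 bit leaves PORTx alone.
constexpr std::uint8_t writePinRegister(std::uint8_t port, std::uint8_t value) {
  return static_cast<std::uint8_t>(port ^ value);
}

constexpr std::uint8_t kPb3Mask = 1u << 3;  // PB3, Arduino pin 11 on an Uno

static_assert(writePinRegister(0x00, kPb3Mask) == 0x08, "first write sets PB3");
static_assert(writePinRegister(0x08, kPb3Mask) == 0x00, "second write clears it");
static_assert(writePinRegister(0xA5, kPb3Mask) == 0xAD, "other bits are untouched");
```

On the real chip the equivalent line is the PINB = (1 << PB3); shown earlier in the thread; repeating it gives a 50% duty cycle without separate set and clear writes.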

I learned it from Nick Gammon in his Arduino-VGA blog.

wvmarle:
I was thinking about that - but the only technique I know is timer + interrupt, and the interrupt overhead is also 2-3 clock cycles.

I don't have an Arduino or scope to hand to test but I think the below code snippet will output a 1MHz square wave on pin 3 of an UNO

void setup()
{
  pinMode(3, OUTPUT);
  TCCR2A = _BV(COM2A1) | _BV(COM2B1) | _BV(WGM21) | _BV(WGM20);
  TCCR2B = _BV(CS21);
  OCR2B = 63;
}

void loop(){}

TolpuddleSartre:
From the reference

Thanks.
It appears though that the complete instruction is optimised out somehow, as I do not see a difference in the wave with or without the delayMicroseconds(1) lines.

Riva:
I don't have an Arduino or scope to hand to test but I think the below code snippet will output a 1MHz square wave on pin 3 of an UNO

That code produces a 128 µs period, and maybe 20% duty cycle. Going to try and understand it in more detail, see how far it can be pushed :)

Don't use delayMicroseconds(), it relies on the micros() counter that is granular to 4.

You can likely save time by looking into the Arduino Playground for articles on timing.

GoForSmoke:
Don't use delayMicroseconds(), it relies on the micros() counter that is granular to 4.

..except . . . it doesn't.

Why would you state such a thing?

One error in the code: it should be the CS20 bit instead of CS21 that’s set, to get prescaler 1 instead of prescaler 8. That raises the frequency by a factor of 8, so I’m now seeing a 16 µs period.

Then setting OCR2B to 128 gives a 50% duty cycle. Apparently the full period is 256 clock cycles.
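These measurements line up with the fast PWM formula from the ATmega328P datasheet, f = f_clk / (N * (1 + TOP)), where N is the prescaler and TOP is 255 in mode 3. A host-side sketch in plain C++ (fastPwmHz and fastPwmDuty are hypothetical helper names; the duty formula assumes non-inverting mode, where the output is high for OCR2B + 1 timer clocks):

```cpp
// Host-side arithmetic only; compiles with any C++14 compiler.

// Fast PWM output frequency for a given prescaler and TOP value.
constexpr double fastPwmHz(double cpuHz, int prescaler, int top) {
  return cpuHz / (prescaler * (top + 1.0));
}

// Duty cycle in non-inverting fast PWM (assumed: high for ocr + 1 timer clocks).
constexpr double fastPwmDuty(int ocr, int top) {
  return (ocr + 1.0) / (top + 1.0);
}

static_assert(fastPwmHz(16e6, 8, 255) == 7812.5, "prescaler 8: 128 us period");
static_assert(fastPwmHz(16e6, 1, 255) == 62500.0, "prescaler 1: 16 us period");
static_assert(fastPwmDuty(63, 255) == 0.25, "OCR2B = 63 gives 25%");
static_assert(fastPwmDuty(127, 255) == 0.5, "OCR2B = 127 gives exactly 50%");
```

Under the same assumption, OCR2B = 128 comes out at 129/256, a hair over 50%, which would be hard to tell apart on a scope.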

Some more reading told me that the period is indeed 256 cycles in fast PWM mode 3, but that TOP can be set to OCR2A when fast PWM mode 7 is chosen. But I can’t seem to get it to actually stop counting at the value stored in OCR2A.

void setup()
{
  pinMode(3, OUTPUT);
  TCCR2A = _BV(COM2A1) |                            // Fast PWM mode: Clear OC2A on Compare Match, set OC2A at BOTTOM
           _BV(COM2B1) |                            // Fast PWM mode: Clear OC2B on Compare Match, set OC2B at BOTTOM, (non-inverting mode).
           _BV(WGM22) | _BV(WGM21) | _BV(WGM20);    // Fast PWM mode 7.
           
  TCCR2B = _BV(CS20);                               // Prescaler 1
  OCR2B = 64;                                       // Sets HIGH time.
  OCR2A = 128;                                      // Should set period length; doesn't seem to do anything.
}

void loop() {}

wvmarle:
But I can’t seem to get it to actually stop counting at the value stored in OCR2A.

The counter does not stop counting when it hits OCR2A; it “overflows” to 0 and starts again from the beginning. But I guess that was only a poor choice of words. Your real problem is in
TCCR2A = _BV(COM2A1) |_BV(COM2B1) | _BV(WGM22) | _BV(WGM21) | _BV(WGM20);
WGM22 bit is in TCCR2B register!

wvmarle:
A quick test doing a loop() with a huge number of port calls (to get rid of the loop() overhead itself) gave me something that resembles a sawtooth wave, at a period of 250 ns (so four clock cycles). Definitely over the speed limit for an Arduino pin to produce an actual square wave: it takes too long to get the pin to change potential.

Are you SURE? I believe Arduino should be able to handle 10MHz signal easily (i.e. when driving SPI clock) - I would expect your scope is pushed to its limits, not Arduino.

wvmarle:
That code produces a 128 µs period, and maybe 20% duty cycle. Going to try and understand it in more detail, see how far it can be pushed :)

Headslap moment regarding the duty cycle. The OCR2B = 63; line should read OCR2B = 127; for 50% duty cycle.
Also I noticed I had not set the other registers correctly. Will have to wait until I'm next to the tools to test it.

Smajdalf:
WGM22 bit is in TCCR2B register!

Right… Missed that part. Now it works perfectly - up to 8 MHz, higher is not possible.

I’ve scoped very good looking block waves at 1.5 MHz before, that worked fine, but now this 8 MHz signal doesn’t go down well. Probably indeed a scope limit. Putting the signal through a buffer also doesn’t help at all, same result. The scope does seem to sample fast enough, the signal looks like a cap charging/discharging so that’s probably the issue. It’s a DSO Quad, should do 36 MS/s, or even 72 MS/s.

void setup()
{
  pinMode(3, OUTPUT);
  TCCR2A = _BV(COM2A1) |                            // Fast PWM mode: Clear OC2A on Compare Match, set OC2A at BOTTOM
           _BV(COM2B1) |                            // Fast PWM mode: Clear OC2B on Compare Match, set OC2B at BOTTOM, (non-inverting mode).
           _BV(WGM21) | _BV(WGM20);                 // Fast PWM mode (low two WGM bits).

  TCCR2B = _BV(CS20) |                              // Prescaler 1
           _BV(WGM22);                              // WGM22 lives in TCCR2B; selects fast PWM mode 7 (TOP = OCR2A).

  OCR2B = 1;                                        // Sets HIGH time.
  OCR2A = 2;                                        // Sets period length (TOP).
}

void loop() {}

Arduino can handle clock speeds of up to f/2 so 8 MHz for a 16 MHz clock. This code gives that frequency. The counters can also count pulses at up to that speed, I’ve done about 1.5 MHz so far. Fast enough.

That was a fun exercise exploring the various limits :)

With
OCR2B = 1; OCR2A = 2;
I would expect the counter to count 0 … 1 … 2 … 0 … 1 … And so frequency will be 16/3 MHz with 2/3 duty? To get 50% duty @ 8 MHz you need
OCR2B = 0; OCR2A = 1;
or not?

As I understand it, the compare is indeed an overflow (yes my wording was wrong). The moment it compares, it's reset.

Let's say TCNT2 = 0. PWM signal is high - start condition.

Clock pulse:
TCNT2 is increased to 1. OCR2B compares: PWM pin is set low.

Clock pulse:
TCNT2 is increased to 2. OCR2A compares, back to starting condition: TCNT2 is set to 0, PWM pin is set high.

The counter never really reaches 2 - the moment it reaches that number it's reset within the same clock pulse.

At least, that's how I understand it to work. Please correct me if I'm wrong.

The section describing CTC mode is not clear on this point, but the timing diagrams show TCNT reaching TOP (and staying there for a full timer clock cycle) before being cleared to BOTTOM.
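If that reading of the timing diagrams is right, the two OCR settings discussed above can be compared with a tick-by-tick model. This is a host-side sketch in plain C++, not a hardware test: PwmShape, simulateFastPwm and outputHz are made-up names, and the model assumes TCNT2 spends one full timer clock on every value from 0 up to and including TOP, with the non-inverting output high while TCNT2 <= OCR2B.

```cpp
// Host-side model only; compiles with any C++14 compiler.

struct PwmShape {
  int periodTicks;  // timer clocks per output period
  int highTicks;    // timer clocks with the output high
};

// Step TCNT2 through one period (0..top inclusive) and count the high ticks.
// Assumption: output is set at BOTTOM and goes low on the tick after the match.
constexpr PwmShape simulateFastPwm(int ocr, int top) {
  PwmShape shape{0, 0};
  for (int tcnt = 0; tcnt <= top; ++tcnt) {
    ++shape.periodTicks;
    if (tcnt <= ocr) ++shape.highTicks;
  }
  return shape;
}

// Output frequency for a given timer clock (16 MHz with prescaler 1).
constexpr double outputHz(PwmShape shape, double timerHz) {
  return timerHz / shape.periodTicks;
}

// OCR2B = 1, OCR2A = 2: three ticks per period, two of them high.
static_assert(simulateFastPwm(1, 2).periodTicks == 3, "period is 3 ticks");
static_assert(simulateFastPwm(1, 2).highTicks == 2, "2/3 duty");
// OCR2B = 0, OCR2A = 1: two ticks per period, one of them high.
static_assert(simulateFastPwm(0, 1).periodTicks == 2, "period is 2 ticks");
static_assert(simulateFastPwm(0, 1).highTicks == 1, "50% duty");
static_assert(outputHz(simulateFastPwm(0, 1), 16e6) == 8e6, "8 MHz");
```

Under this model, OCR2B = 1 with OCR2A = 2 gives 16/3 MHz at 2/3 duty, and OCR2B = 0 with OCR2A = 1 gives 8 MHz at 50%. A scope measurement of the period would settle which reading is correct.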

wvmarle:
. . . Now it works perfectly - up to 8 MHz, higher is not possible. . . .

One needs to burn fuses, so outside of the normal Arduino ecosystem, but the Atmega328 has the ability to output the system clock (normally 16 MHz) on the CLKO pin. CLKO is an alternate function of PB0/Arduino digital pin 8.