Why has this block of code slowed down?

So I have this block of code:

for (long i = 76800; i; i--) {
  PORTC = 0x6;
  PORTD = dHi;
  PORTB = bHi;
  PORTC = 0x2;
  PORTC = 0x6;
  PORTD = dLo;
  PORTB = bLo;
  PORTC = 0x2;
}

This code used to run in less than a millisecond. Now it takes ~68 milliseconds (measured by calling Serial.println(millis()) before and after it). The code colors all 76800 pixels of an LCD screen. The lag is clearly visible; I can see ~10 fps. The LCD can run at 70 fps, and I've seen it run that fast. The LCD is a 2.8" Adafruit display driven by an ILI9341, if you want to take a look. I'm using an Arduino Uno.
At one point I forgot to wire the reset from the LCD to the Arduino's reset, but I'm pretty sure that this problem was happening a bit before that, since I would not have unwired it. I have attached the library (the slow code is in void ezILI9341::oneColor). I would also attach a video I took earlier of the LCD running too fast for itself, but the video is more than 2MB.

I would appreciate it if someone were to install the library (run it like this: ezILI9341 lcd;) and test the functions

Serial.begin(2000000);
lcd.begin();
lcd.memoryControl(0, 0, 1, 0, 1, 0);
Serial.println(micros());
lcd.drawRectangle(0, 0, 320, 240, 0x87F0);
Serial.println(micros());

in void setup(). There is no need for proper wiring; I'd just like to know other Arduinos' speed. Of course, I understand that many of you may be wary of malware, and I respect that.

I'm sorry if I haven't formatted my code the way the Arduino forum requests; I'm used to StackOverflow.

ezILI9341.zip (1.33 KB)

Eight instructions @ 62.5ns x 76800 = 38.4ms.
And that's not including loop overhead, or the fact that I haven't factored in other instructions.

I can't imagine it ever taking anything like 1ms.
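As a rough sketch of that arithmetic (the function name is made up for illustration, and it assumes exactly one cycle per instruction at 16 MHz with zero loop overhead, so it is a lower bound rather than a prediction):

```cpp
#include <cstdint>

// Clock period of a 16 MHz AVR in picoseconds (62.5 ns), kept as an
// integer so the arithmetic stays exact.
constexpr std::uint64_t kCyclePs = 62500;

// Lower bound on loop time in microseconds. Assumption: every instruction
// takes exactly one cycle and the loop itself costs nothing.
constexpr std::uint64_t lowerBoundMicros(std::uint64_t iterations,
                                         std::uint64_t instructionsPerIteration) {
    return iterations * instructionsPerIteration * kCyclePs / 1000000;
}
```

lowerBoundMicros(76800, 8) gives 38400 (38.4ms), and even a single instruction per iteration gives 4800 (4.8ms).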

which Arduino?

I'm using an Arduino Uno.

oops
coffee time, will be back soon :)

I think you're not the only one needing coffee.

Maybe I need more coffee too, but what does that loop actually do other than write the same thing to the same ports lots of times? Nothing references the loop counter, so how does the next pixel get selected?

nothing references the loop counter so how does the next pixel get selected?

Many controllers auto-increment the write address after each pixel.
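To illustrate the idea, here is a toy model of a controller whose write position advances by itself. The class and function names are hypothetical and this is not the ILI9341's actual register interface; it only shows why the loop can fill the screen without ever using its counter as a coordinate.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a display controller with an auto-incrementing write pointer.
// Illustration only; assumes a non-empty buffer.
class AutoIncrementFramebuffer {
public:
    AutoIncrementFramebuffer(std::size_t width, std::size_t height)
        : pixels_(width * height, 0), cursor_(0) {}

    // Store one 16-bit pixel at the current position, then advance to the
    // next one, wrapping to the start after the last pixel.
    void writePixel(std::uint16_t color) {
        pixels_[cursor_] = color;
        cursor_ = (cursor_ + 1) % pixels_.size();
    }

    std::size_t cursor() const { return cursor_; }
    std::uint16_t at(std::size_t index) const { return pixels_[index]; }

private:
    std::vector<std::uint16_t> pixels_;
    std::size_t cursor_;
};

// Fill `count` pixels without ever passing a coordinate; the counter only
// decides how many writes happen, as in the loop from the question.
inline void fill(AutoIncrementFramebuffer& fb, std::size_t count, std::uint16_t color) {
    for (std::size_t i = count; i; i--) {
        fb.writePixel(color);
    }
}
```

Calling fill on a 320 by 240 buffer with a count of 76800 touches every pixel exactly once and leaves the cursor back at position 0.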

TheMemberFormerlyKnownAsAWOL:
Eight instructions @ 62.5ns x 76800 = 38.4ms.
And that's not including loop overhead, or the fact that I haven't factored in other instructions.

I can't imagine it ever taking anything like 1ms.

Ok, maybe it wasn't a millisecond. Either way, that block is still taking 68.388ms and it's visually slower. I have a joystick that controls a little square and it is so much slower than I remember.

68ms does not seem unrealistic for 76800 iterations.

That’s about 14 clock cycles per iteration. The 8 port writes are likely one cycle each (maybe more, with 4 loads for the Hi and Lo bytes if they are not in registers), and the subtraction on 4 bytes plus a compare and a branch is easily another 5 cycles.

Are you sure you did not change anything else?
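A quick way to check that estimate against the measurement, as a sketch only: the function names are made up, and the 14-cycle figure is an assumed total (8 single-cycle port writes plus roughly 6 cycles for decrementing a 4-byte counter and branching). What the compiler actually emits may differ.

```cpp
#include <cstdint>

constexpr std::uint64_t kCyclesPerMicro = 16;  // 16 MHz Uno clock

// Average cycles per iteration implied by a measured duration, scaled by
// 100 to keep two decimal places in integer arithmetic (truncated).
constexpr std::uint64_t cyclesPerIterationX100(std::uint64_t measuredMicros,
                                               std::uint64_t iterations) {
    return measuredMicros * kCyclesPerMicro * 100 / iterations;
}

// Predicted duration in microseconds for a given per-iteration cycle count.
constexpr std::uint64_t predictedMicros(std::uint64_t iterations,
                                        std::uint64_t cyclesPerIteration) {
    return iterations * cyclesPerIteration / kCyclesPerMicro;
}
```

cyclesPerIterationX100(68388, 76800) gives 1424, i.e. about 14.2 cycles per iteration, and predictedMicros(76800, 14) gives 67200 (67.2ms), which is close to the 68.388ms reported.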

J-M-L:
68ms does not seem unrealistic for 76800 iterations.

That’s about 14 clock cycles per iteration.

But why would it not go at 16 MHz?

RohitRojo:
But why would it not go at 16 MHz?

What do you mean?

The fastest instructions are single cycle, aka one instruction every 62.5ns.

Even if your loop contained only a single instruction, that would take nearly 5ms, even with zero loop overhead.

Do some simple arithmetic.

Are you sure you're not remembering the code running faster for a smaller update window?
By my rough calculation, you could manage a 32x32 window in a millisecond.
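For anyone who wants to redo that rough calculation, a minimal sketch (the function name is hypothetical, and the per-pixel cycle count is an assumption you pass in, e.g. 14 for the loop as posted):

```cpp
#include <cstdint>

// Largest square window (side length in pixels) that can be filled within
// budgetMicros at 16 MHz, given an assumed per-pixel cost in cycles.
inline std::uint32_t maxSquareSide(std::uint64_t budgetMicros,
                                   std::uint64_t cyclesPerPixel) {
    const std::uint64_t pixels = budgetMicros * 16 / cyclesPerPixel;
    std::uint32_t side = 0;
    // Grow the side while the next larger square still fits the pixel budget.
    while (static_cast<std::uint64_t>(side + 1) * (side + 1) <= pixels) {
        ++side;
    }
    return side;
}
```

With 14 cycles per pixel, maxSquareSide(1000, 14) gives 33, so a 32x32 window (1024 pixels, about 0.9ms) fits in a millisecond.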

What happens if you make the loop overhead a much smaller percentage of the total time? BTW, your code doesn't even compile since dHi, bHi, dLo, and bLo are all undefined.

  for (long i = 4800; i; i--) {
    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

    PORTC = 0x6;
    PORTD = dHi;
    PORTB = bHi;
    PORTC = 0x2;
    PORTC = 0x6;
    PORTD = dLo;
    PORTB = bLo;
    PORTC = 0x2;

  }

Even unrolling the loop isn't going to get anywhere near the claimed rate.

TheMemberFormerlyKnownAsAWOL:
Even unrolling the loop isn't going to get anywhere near the claimed rate.

Indeed, OP could have 614,400 lines of code (76800 times the 8 PORT instructions) and it would still take the 38.4ms you listed in answer #1.

J-M-L:
Indeed, OP could have 614,400 lines of code (76800 times the 8 PORT instructions) and it would still take the 38.4ms you listed in answer #1.

And the compiler will probably optimize the code and re-establish the loop anyway.

RohitRojo:
I have a joystick that controls a little square and it is so much slower than I remember.

Is this maybe relevant to my earlier comment about a smaller update window?

What happens if you count up instead of down?

for (long i = 0; i<76800; i++) {
  PORTC = 0x6; // 0b00000110
  PORTD = dHi;
  PORTB = bHi;
  PORTC = 0x2; // 0b00000010
  PORTC = 0x6;
  PORTD = dLo;
  PORTB = bLo;
  PORTC = 0x2;
}

J-M-L:
Indeed, OP could have 614,400 lines of code (76800 times the 8 PORT instructions) and it would still take the 38.4ms you listed in answer #1.

True, but as you continue to unroll the loop, I'd expect the total time to (asymptotically) approach that value in the limit. In fact, it should get close rather quickly. If it doesn't, then something else is going on.
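Here is a sketch of what I'd expect, under assumed numbers: 8 cycles per pixel for the port writes and about 6 cycles of loop overhead per iteration. Both figures and the function name are illustrative, not measured.

```cpp
#include <cstdint>

// Estimated fill time in microseconds at 16 MHz when the pixel-writing body
// is repeated `unroll` times per loop iteration.
// Assumption: `unroll` divides `pixels` evenly.
inline std::uint64_t unrolledMicros(std::uint64_t pixels, std::uint64_t unroll,
                                    std::uint64_t bodyCycles,
                                    std::uint64_t overheadCycles) {
    const std::uint64_t iterations = pixels / unroll;
    // The body cost is paid once per pixel, the overhead once per iteration.
    const std::uint64_t totalCycles =
        pixels * bodyCycles + iterations * overheadCycles;
    return totalCycles / 16;
}
```

With those assumptions, unrolledMicros(76800, 1, 8, 6) gives 67200, unrolling 16 times gives 40200, and fully unrolling gives 38400, so most of the gain shows up early.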

gfvalvo:
BTW, your code doesn't even compile since dHi, bHi, dLo, and bLo are all undefined.

Don't worry about my actual code, it's compiling fine. It's just slow.

TheMemberFormerlyKnownAsAWOL:
What do you mean?

The fastest instructions are single cycle, aka one instruction every 62.5ns.

Even if your loop contained only a single instruction, that would take nearly 5ms, even with zero loop overhead.

Do some simple arithmetic.

Are you sure you're not remembering the code running faster for a smaller update window?
By my rough calculation, you could manage a 32x32 window in a millisecond.

Yeah, maybe it wasn't a millisecond. But it was pretty fast.
Here it is going slow now
Here it is going fast before
You can see that the weird color mix now moves down the square much more slowly than before. The frame rate of my phone camera is probably lower than the screen's, but I don't know if that should matter.