Why is the array handled so slowly?

I have the code below.
It is also available on Wokwi: Slow array handling

#include <digitalWriteFast.h>

class MyClass {

private:
	int pins[5] = { 2, 3, 4, 5, 6};

public:

	// 48 microseconds:
	void slowWrite() {
		for (int i = 0; i < 5; i++) {
			digitalWriteFast(pins[i], 1);
			digitalWriteFast(pins[i], 0);
		}
	}

	// 4 microseconds:
	void fastWrite() {
		digitalWriteFast(2, 1);
		digitalWriteFast(2, 0);

		digitalWriteFast(3, 1);
		digitalWriteFast(3, 0);

		digitalWriteFast(4, 1);
		digitalWriteFast(4, 0);

		digitalWriteFast(5, 1);
		digitalWriteFast(5, 0);

		digitalWriteFast(6, 1);
		digitalWriteFast(6, 0);
	}
};

void setup() {

	Serial.begin(1000000);
	while (!Serial) {}

	unsigned long microsec = 0;
	MyClass test;

	// It's just a test
	pinModeFast(2, OUTPUT);
	pinModeFast(3, OUTPUT);
	pinModeFast(4, OUTPUT);
	pinModeFast(5, OUTPUT);
	pinModeFast(6, OUTPUT);

	while (1) {

		microsec = micros();

		test.slowWrite();

		Serial.println(micros() - microsec);

	}
}

// Required by the Arduino core even though setup() never returns
void loop() {
}

Welcome to the forum

As your topic does not relate directly to the installation or operation of the IDE it has been moved to the Programming Questions category of the forum

You should give a more comprehensive description of what "slow" means because that is always relative.

Try printing less often. At 1 Mb/s you can print about 100,000 characters per second. Your while loop probably runs faster than that, so the serial buffer fills up, after which printing blocks and slows down the whole process.

If you want to time it, call slowWrite() e.g. 1000 times, time that and print the result.
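A minimal sketch of that idea, written so that it also compiles off the board. The name averageDuration and its parameters are made up for illustration, and nowFn stands in for micros(). Averaging matters here because, as far as I know, micros() on a 16 MHz Uno only has a resolution of 4 us, so a single 4 us reading sits right at that limit.

```cpp
#include <cstdint>

// Hypothetical helper: average duration of one call to work(), in clock ticks.
// nowFn is a stand-in for micros(); any callable returning a 32-bit tick count works.
template <typename NowFn, typename Work>
uint32_t averageDuration(NowFn nowFn, Work work, uint32_t repeats) {
    uint32_t start = nowFn();
    for (uint32_t i = 0; i < repeats; i++) {
        work();
    }
    // Unsigned subtraction stays correct across one wrap-around of the counter.
    return (nowFn() - start) / repeats;
}
```

On the board this could be called as averageDuration(micros, [&] { test.slowWrite(); }, 1000). The loop inside the helper adds a little overhead of its own, so treat the result as an upper bound.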

That does not matter.
The problem is with the array. If I write the calls out one by one inside a function, it takes only 4 microseconds.
But with the array loop it takes 48.

I apologise; I missed the times that you mentioned in the functions.

Consider what the code has to do when you use a for loop and an array

  • it needs to read the values from the array 10 times which will involve calculation of the memory location of the value
  • it needs to increment the loop variable
  • it needs to check 5 times whether the for loop value has been reached
  • it needs to jump back in the code

That is much more than the inline code needs to do

  • no calculation of memory locations
  • no incrementing of a variable
  • no testing the value of a variable
  • no jumping back in the code

I would be interested in knowing whether reversing the for loop to count down towards zero changes the speed of execution

The problem is not only the array but primarily the for loop. On each iteration you need to increment the variable, compare it with the limit, and jump from the end of the loop back to the beginning. This all takes time, comparable to accessing the pins. It is normal for the loop to be 2-3 times slower than writing to the pins directly.

However, the difference between 4 and 48us is too large for that. I think you didn't measure the time quite correctly. Try repeating the test several times and taking the average:

microsec = micros();
// run test 12 times without loop
test.slowWrite(); test.slowWrite(); test.slowWrite(); test.slowWrite();
test.slowWrite(); test.slowWrite(); test.slowWrite(); test.slowWrite();
test.slowWrite(); test.slowWrite(); test.slowWrite(); test.slowWrite();

Serial.println((micros() - microsec)/12);

A little more research.

This is in the library

#if !defined(digitalWriteFast)
#  if (defined(__AVR__) || defined(ARDUINO_ARCH_AVR)) && defined(__digitalPinToPortReg)
#    if defined(THROW_ERROR_IF_NOT_FAST)
#define digitalWriteFast(P, V) \
if (__builtin_constant_p(P)) { \
  BIT_WRITE(*__digitalPinToPortReg(P), __digitalPinToBit(P), (V)); \
} else { \
    NonConstantsUsedForDigitalWriteFast(); \
}
#    else
#define digitalWriteFast(P, V) \
if (__builtin_constant_p(P)) { \
  BIT_WRITE(*__digitalPinToPortReg(P), __digitalPinToBit(P), (V)); \
} else { \
	Serial.println("using digitalWrite");\
  digitalWrite((P), (V)); \
}
#    endif // defined(THROW_ERROR_IF_NOT_FAST)
#  else
#define digitalWriteFast digitalWrite
#  endif
#endif

I have added the Serial.println().

The library falls back to the normal digitalWrite() when you use an array. I did not look too deeply into the reason why; it seems to be because you pass a variable and not a constant. I'll leave it up to the C/C++ specialists to explain more.

digitalWriteFast() is only able to use the "fast" technique when the pin number is known at compile time. Obviously, that's not the case when used with an array in a loop.
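To see that decision in isolation, here is a small host-side sketch (GCC or Clang assumed) that mimics the shape of the library macro shown earlier. WRITE_PIN and the two counters are invented for this illustration and are not part of digitalWriteFast. The array is marked volatile only to guarantee that the host compiler cannot fold the values, which is roughly the situation with a non-const member array read in a loop.

```cpp
#include <cstdint>

// Hypothetical counters, for illustration only.
int fastPathCalls = 0;
int slowPathCalls = 0;

// Same structure as the library macro: choose a branch depending on whether
// the pin argument is a compile-time constant.
#define WRITE_PIN(P, V)                  \
    do {                                 \
        (void)(V);                       \
        if (__builtin_constant_p(P)) {   \
            fastPathCalls++;             \
        } else {                         \
            slowPathCalls++;             \
        }                                \
    } while (0)

// The pin is a literal, so the "fast" branch is taken.
void writeLiteral() {
    WRITE_PIN(2, 1);
}

// The pin is read from memory at run time, so the fallback branch is taken.
void writeFromArray(const volatile uint8_t *pins, int count) {
    for (int i = 0; i < count; i++) {
        WRITE_PIN(pins[i], 1);
    }
}
```

With an optimising compiler and a const array the result can differ, because __builtin_constant_p reports what the compiler manages to prove at that optimisation level.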

Do you know an alternative to the loop?

Yes, the 4us code you posted.

If you have a loop over a small number of things, but you are short on time, the technique is to do 'xactly what your faster version does.

It has a name, loop unrolling, and it is one of a handful of optimisations that it can still make sense to do by hand.

a7

I concur. And at the same time do not use variables for indexes, but use the actual numbers so they can be compiled into your code.
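One way to keep the pin list in a single place and still hand the compiler literal pin numbers is to let templates do the unrolling. This is only a sketch under assumptions: writeConstantPin, pulseAll and writeLog are made-up names, and the body merely records calls so that it runs on a PC. On the board the marked line would call digitalWriteFast(Pin, value); I would expect the library to treat a template argument as a constant, but I have not verified that.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Host-side stand-in for the pin write: it only records (pin, value).
std::vector<std::pair<int, int>> writeLog;

template <uint8_t Pin>
void writeConstantPin(uint8_t value) {
    // Pin is a template argument, so it is known at compile time.
    // On the board, digitalWriteFast(Pin, value) would go here.
    writeLog.push_back(std::make_pair(int(Pin), int(value)));
}

template <uint8_t... Pins>
void pulseAll() {
    // C++11 pack expansion inside a braced list: one HIGH/LOW pair per pin,
    // generated at compile time and evaluated left to right.
    int expand[] = { 0, (writeConstantPin<Pins>(1), writeConstantPin<Pins>(0), 0)... };
    (void)expand;
}
```

Calling pulseAll<2, 3, 4, 5, 6>() then produces the same sequence of writes as the hand-written fast version, and changing the wiring means editing one line.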

I was about to say that.
Anyhow, in short: none of these steps takes very long on its own, but all of the delays stack up into a longer one.

We can knock away a few of the things that slow it down, and retain the convenience of the pin array for changing the wiring:

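private:
	// 0 marks the end of the list, so pin 0 itself cannot be used here
	byte pins[6] = { 2, 3, 4, 5, 6, 0 };

public:

	void slowWrite() {
		byte *foo = pins;
		byte xx;
		// the assignment is intended: stop at the 0 terminator
		while ((xx = *foo) != 0) {
			digitalWrite(xx, 1);
			digitalWrite(xx, 0);
			foo++;
		}
	}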

One could reasonably expect the compiler to evaluate *foo only once, so the xx variable could be dropped. I used digitalWrite() since digitalWriteFast() reverts to it anyway.

In the last few years, however, I would say that any time I've tried to outwit the compiler I have failed, and ended up with more obscure code for my trouble.

a7

THX for using wokwi and posting the link.

Wokwi made it easy to test, and I can report that I saw no speed difference.

I am not surprised.

a7

So the code below sets 5 pins HIGH then LOW on an Uno, in a loop, using direct port manipulation, in about 12-16us. Not bad considering the loop and other overhead required.

#include "Arduino.h"

class MyClass {
	struct PinStruct {
		const uint8_t pinNumber;
		volatile uint8_t *portReg = nullptr;
		uint8_t setMask = 0;
		uint8_t resetMask = 0;

		PinStruct(uint8_t p) : pinNumber(p) {
		}
	};

	PinStruct pins[5] = {2, 3, 4, 8, 9};

public:
	void begin() {
		for (auto &pin : pins) {
			pinMode(pin.pinNumber, OUTPUT);
			pin.portReg = portOutputRegister(digitalPinToPort(pin.pinNumber));
			pin.setMask = digitalPinToBitMask(pin.pinNumber);
			pin.resetMask = ~pin.setMask;

			/*
			 Serial.println(pin.pinNumber);
			 Serial.println(reinterpret_cast<uintptr_t>(pin.portReg), HEX);
			 Serial.println(pin.setMask, HEX);
			 Serial.println(pin.resetMask, HEX);
			 Serial.println();
			 */
		}
	}

	void fastWrite() {
		for (auto &pin : pins) {
			noInterrupts();
			*(pin.portReg) |= pin.setMask;
			interrupts();
			noInterrupts();
			*(pin.portReg) &= pin.resetMask;
			interrupts();
		}
	}
};

void setup() {
	Serial.begin(115200);
	delay(2000);

	MyClass test;
	test.begin();

	while (1) {
		uint32_t startTime = micros();
		test.fastWrite();
		uint32_t duration = micros() - startTime;

		Serial.print("Duration: ");
		Serial.print(duration);
		Serial.println("us");

		delay(200);
	}
}

void loop() {

}

Thank you so much, everyone. It works!

Why are the noInterrupts() and interrupts() calls needed?

I guess |= and &= are not atomic, so there's a possibility, albeit very small, that an interrupt routine could change the port bits between when this code reads them and updates them?

Does digitalWrite() also do this?

How long do interrupts() and noInterrupts() take?

Would it be better to remove these 2 lines when they are consecutive?

			interrupts();
			noInterrupts();

Right, so when the update writes back to the port it could clobber the changes made within the ISR code.
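The clobbering can be replayed step by step on a PC. The sketch below is a simulation under assumptions, not AVR code: fakePort stands in for a port register, and the "ISR" is just a line placed at the point where a real interrupt might fire. It assumes the |= is carried out as separate load, OR and store steps, which is what I would expect when the register is reached through a pointer with a run-time mask.

```cpp
#include <cstdint>

// Simulated 8-bit port register (stand-in for a real output port).
uint8_t fakePort = 0;

// Unprotected update: the simulated ISR fires between the read and the write back.
uint8_t interruptedSetBit(uint8_t mask, uint8_t isrMask) {
    uint8_t copy = fakePort;   // 1. read the port
    fakePort |= isrMask;       //    simulated ISR sets a different bit
    copy |= mask;              // 2. modify the now stale copy
    fakePort = copy;           // 3. write back: the ISR's bit is lost
    return fakePort;
}

// Protected update: with interrupts disabled, the pending ISR can only run
// after the write back, so both changes survive.
uint8_t protectedSetBit(uint8_t mask, uint8_t isrMask) {
    uint8_t copy = fakePort;   // read, modify and write happen without interruption
    copy |= mask;
    fakePort = copy;
    fakePort |= isrMask;       // simulated ISR runs once interrupts are enabled again
    return fakePort;
}
```

Starting from an all-zero port, interruptedSetBit(0x04, 0x10) leaves 0x04 in the register, whereas protectedSetBit(0x04, 0x10) leaves 0x14.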

Yes, see the source code for digitalWrite() in wiring_digital.c.

From Arduino.h:

#define interrupts() sei()
#define noInterrupts() cli()

I think those are single-cycle instructions, so 62.5ns each on a 16MHz AVR?
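The arithmetic behind that figure, as a small sketch. The function names are made up, and the one-cycle cost per instruction is the assumption stated above rather than something measured here.

```cpp
// Duration of one CPU clock cycle in nanoseconds for a given clock frequency.
constexpr double cycleNanoseconds(double clockHz) {
    return 1e9 / clockHz;
}

// Cost of one cli()/sei() pair, assuming one cycle for each instruction.
constexpr double interruptTogglePairNanoseconds(double clockHz) {
    return 2 * cycleNanoseconds(clockHz);
}
```

For a 16 MHz clock this gives 62.5 ns per cycle and 125 ns per pair, which is small next to the 12-16us measured for the whole loop.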

Yes, but I had assumed that the code from @r-istvan was just a demo and the real application needed an equivalent to digitalWrite(), not just toggling an output pin. If the latter is required, there are faster ways of doing it.
