digitalWriteFast and checking __builtin_constant_p(V)

I was wondering if anyone had some insight into why the val param is checked for being a compile-time constant in the digitalWriteFast macros?

It seems like the compile-time check limits optimization of code that toggles the value between high and low. The value is only used once, in bitWrite, to determine whether a bit should be set or cleared. If a constant value is passed in, the compiler will do its thing; otherwise, if the value is a variable, the code needs only a simple comparison to decide whether to set or clear the bit, without the added overhead of the standard digitalWrite.

It seems like a solid performance trade-off to remove this compile-time check and only check that the pin number is constant. Is there any issue with interrupts here that I'm overlooking?

Making this change allows the following code:

inline void SSD1306fast::spiwrite(uint8_t val) {
	uint8_t i;

	for (i = 0; i < 8; i++) {
		if (!(val & (1 << (7 - i)))) {
			digitalWriteFast(sid, LOW);
		} else {
			digitalWriteFast(sid, HIGH);
		}

		digitalWriteFast(sclk, HIGH);
		__asm__ __volatile__ ("nop \n\tnop \n\t");
		digitalWriteFast(sclk, LOW);
	}
}

to be written like this and still be optimized by the compiler:

inline void SSD1306fast::spiwrite(uint8_t val) {
	uint8_t i;

	for (i = 0; i < 8; i++) {
		digitalWriteFast(sid, !!(val & (1 << (7 - i))));
		digitalWriteFast(sclk, HIGH);
		__asm__ __volatile__ ("nop \n\tnop \n\t");
		digitalWriteFast(sclk, LOW);
	}
}

craiglink: I was wondering if anyone had some insight into why the val param is checked for being a compile-time constant in the digitalWriteFast macros?

The reason is very likely an historical artifact coupled with a failure to reanalyze the new implementation.

Paul Stoffregen's version (which I believe is the original; it has certainly become the benchmark) has separate handlers for each of the four cases: pin constant + value constant, pin constant + value variable, pin variable + value constant, and both variable. That implementation consumes a fairly large amount of Flash. On the Teensy boards, which have plenty of Flash, it isn't a problem. On more modest processors like the 328 or 168, it can be a burden. In addition, the implementation is somewhat complicated (but very clever), making it a little difficult to port.

The digital*Fast implementations that I've seen try to keep the Flash consumption about the same as the standard core and the details as simple as possible. Which means the focus is on the constant+constant case. The variable+variable case always resorts to the standard core functions. This leaves the two variable+constant cases dangling. The conservative approach, the approach typically taken, is to also use the standard core functions for the variable+constant cases.
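The conservative constant+constant gate can be sketched like this. This is an illustrative host-compilable mock, not any library's actual code: fake_port stands in for an AVR PORT register, and digitalWrite_core stands in for the standard core function.

```c
#include <stdint.h>

/* Mock register standing in for an AVR PORT register (assumption:
   real implementations write PORTB/PORTD etc. directly). */
static volatile uint8_t fake_port = 0;

/* Stand-in for the standard core digitalWrite fallback. */
static void digitalWrite_core(uint8_t pin, uint8_t val) {
    if (val) fake_port |= (uint8_t)(1u << pin);
    else     fake_port &= (uint8_t)~(1u << pin);
}

/* Conservative gate: take the fast path only when BOTH arguments are
   compile-time constants; any variable argument falls back to the
   core function, exactly as described above. */
#define digitalWriteFast(pin, val)                                      \
    do {                                                                \
        if (__builtin_constant_p(pin) && __builtin_constant_p(val)) {   \
            if (val) fake_port |= (uint8_t)(1u << (pin));               \
            else     fake_port &= (uint8_t)~(1u << (pin));              \
        } else {                                                        \
            digitalWrite_core((pin), (val));                            \
        }                                                               \
    } while (0)
```

With this gate, `digitalWriteFast(13, HIGH)` compiles to a single bit operation, while `digitalWriteFast(13, someVar)` goes through the slow fallback even though the pin is constant, which is exactly the case the question is about.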

It seems like the compile-time check limits optimization of code that toggles the value between high and low.

It does. If you're especially curious, somewhere on the "old" forum is a discussion about the gory details.

If a constant value is passed in, the compiler will do its thing; otherwise, if the value is a variable, the code needs only a simple comparison to decide whether to set or clear the bit, without the added overhead of the standard digitalWrite.

Exactly. A few more machine instructions just to perform the comparison.

It seems like a solid performance trade-off to remove this compile-time check and only check that the pin number is constant.

Not only does that improve performance, it also reduces the total amount of code generated. There is no reason to perform a constant check on the value.
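The relaxed version gates only on the pin. Again a host-compilable sketch with mocked names (fake_port and digitalWrite_core are illustrative stand-ins, not real library identifiers):

```c
#include <stdint.h>

/* Mock register standing in for an AVR PORT register (assumption). */
static volatile uint8_t fake_port = 0;

/* Stand-in for the standard core digitalWrite fallback. */
static void digitalWrite_core(uint8_t pin, uint8_t val) {
    if (val) fake_port |= (uint8_t)(1u << pin);
    else     fake_port &= (uint8_t)~(1u << pin);
}

/* Relaxed gate: only the pin must be a compile-time constant. When
   val is a variable, the cost is one runtime comparison plus a direct
   bit operation, instead of the full core-function call. */
#define digitalWriteFast(pin, val)                          \
    do {                                                    \
        if (__builtin_constant_p(pin)) {                    \
            if (val) fake_port |= (uint8_t)(1u << (pin));   \
            else     fake_port &= (uint8_t)~(1u << (pin));  \
        } else {                                            \
            digitalWrite_core((pin), (val));                \
        }                                                   \
    } while (0)
```

When val is also constant, the compiler folds the `if (val)` branch away, so the constant+constant case generates the same single instruction as before; nothing is lost by dropping the check on val.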

Is there any issue with interrupts here that I'm overlooking?

No.