digitalWriteFast, digitalReadFast, pinModeFast etc.

jrraines wrote:

Earlier in this thread, on 5/16, Paul said:

I'd like to point out this macro approach is not my preferred coding style for this optimization. I did it this way because David said he didn't like the verbose inline function approach I originally posted. Indeed that way is much larger, but it has the advantage of not touching any pin when an illegal pin number is given as the input. This macro, as currently implemented, will always write to some pin even if the pin number is too large.
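Paul's point about illegal pin numbers can be shown on a PC by simulating the port registers with ordinary variables. This is a minimal sketch, not code from the thread: port_b/port_c/port_d, SIM_WRITE_MACRO and sim_write_checked are hypothetical names, and the valid range 0-19 is an assumption for a 168/328 board. With the macro, pin 20 (not a valid Arduino pin) still ends up setting bit 6 of port C; the checked inline function writes nothing and reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated 8-bit port registers (hypothetical stand-ins for PORTB,
   PORTC and PORTD) so the sketch can run on a PC. */
static uint8_t port_b, port_c, port_d;

/* Macro style, following the "Standard Arduino Pins" map: every pin
   number above 13 falls through to port C, whether it exists or not. */
#define SIM_PORT_REG(P) \
  (((P) >= 0 && (P) <= 7) ? &port_d : (((P) >= 8 && (P) <= 13) ? &port_b : &port_c))
#define SIM_PIN_BIT(P) \
  (((P) >= 0 && (P) <= 7) ? (P) : (((P) >= 8 && (P) <= 13) ? (P) - 8 : (P) - 14))
#define SIM_WRITE_MACRO(P, V)                                          \
  ((V) ? (void)(*SIM_PORT_REG(P) |= (uint8_t)(1u << SIM_PIN_BIT(P)))   \
       : (void)(*SIM_PORT_REG(P) &= (uint8_t)~(1u << SIM_PIN_BIT(P))))

/* Inline-function style: pins outside 0-19 (the range assumed here)
   are rejected before any register is touched. */
static inline bool sim_write_checked(int pin, bool value)
{
    uint8_t *reg;
    int bit;

    if (pin >= 0 && pin <= 7)        { reg = &port_d; bit = pin; }
    else if (pin >= 8 && pin <= 13)  { reg = &port_b; bit = pin - 8; }
    else if (pin >= 14 && pin <= 19) { reg = &port_c; bit = pin - 14; }
    else return false;  /* illegal pin: nothing is written */

    if (value) *reg |= (uint8_t)(1u << bit);
    else       *reg &= (uint8_t)~(1u << bit);
    return true;
}
```

The trade-off is the one Paul describes: the checked version is more source code, but with constant arguments an optimizing compiler should reduce it to the same single register write.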

This is only your second post and you seem to know a trick for getting rid of ';' in macros that I do not. I still feel very much like a newcomer to a lot of this stuff.
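One widely used idiom for making a multi-statement macro take exactly one trailing ';' (not necessarily the trick meant here) is to wrap the body in do { ... } while (0). With plain braces, `if (x) MACRO(...); else ...` fails to compile, because the ';' after the closing '}' ends the if before the else. A minimal sketch; port_d and WRITE_PIN_D are hypothetical names, with port_d standing in for PORTD:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t port_d;  /* simulated PORTD (hypothetical) */

/* do { ... } while (0) makes the whole body a single statement that
   needs exactly one ';' after it, so it nests safely in if/else. */
#define WRITE_PIN_D(P, V)                        \
  do {                                           \
    if (V) port_d |= (uint8_t)(1u << (P));       \
    else   port_d &= (uint8_t)~(1u << (P));      \
  } while (0)

static void set_or_clear(int on)
{
    if (on)
        WRITE_PIN_D(2, 1);  /* this ';' is consumed by the while (0) */
    else
        WRITE_PIN_D(2, 0);
}
```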

Well. I've been writing C code for a long time, but only playing with the Arduino for a little bit. And I cop to joining the thread late and not having read all the messages.

It surprises me that the inline function "is much larger". I assume he means the generated code is bigger. My experience with gcc is that inline functions are quite efficient, and the little test code I just tried confirmed that avr-gcc is not different. Of course, digitalWriteFast.h is much more complex than my test. Maybe I'll try to convert it to inline functions later.

Also, I think the macros that use if statements can also use the ?: conditional expression. Is that also for better generated code?

Again, sorry if this has all been gone through before.

I assume he means the generated code is bigger.

Naw. The source code is where the difference lies. They both reduce to a single (or a few depending on the circumstances) machine instruction. It's a matter of taste, maintenance, and functionality. My preference is for inline-functions for one reason: I HATE debugging macro related bugs.

Of course, digitalWriteFast.h is much more complex than my test. Maybe I'll try to convert it to inline functions later.

I suggest starting with Paul Stoffregen's work...
http://www.pjrc.com/teensy/teensyduino.html

Also, I think the macros that use if statements can also use the ?: conditional expression.

That's very likely true. AlphaBeta uses conditional expressions frequently and he hasn't complained of any problems.

Is that also for better generated code?

I don't think there is any difference in the generated code. "What's in an "if"? That which we call a condition. By any other name would branch just the same."

I didn't think you could meaningfully use __builtin_constant_p in inline functions?

The only place I used the ?: operator was in digitalReadFast. I need it there because it yields a value I can assign to a variable. Bill Westfield had to explain that to me; see http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1266983837/1

westfw wrote:

I didn't think you could meaningfully use __builtin_constant_p in inline functions?

That's a good point, but it turns out that it does work. I was too lazy to look up the documentation to see whether it's defined behavior though. I just tried this:

static int
g(int n)
{
   return n * 10;
}

static inline int
f(int n)
{
   if (__builtin_constant_p(n))
      return n + 10;
   return g(n);
}

int
main(int argc, char **argv)
{
   return f(10);
}

and got this (with avr-gcc -O -S):

...
main:
/* prologue: frame size=0 */
        ldi r28,lo8(__stack - 0)
        ldi r29,hi8(__stack - 0)
        out __SP_H__,r29
        out __SP_L__,r28
/* prologue end (size=4) */
        ldi r24,lo8(20)
        ldi r25,hi8(20)
/* epilogue: frame size=0 */
        rjmp exit
...

Not the most elegant test program but I was in a bit of a hurry.

jrraines wrote:

The only place I used the ?: operator was in digitalReadFast. I need it there because it yields a value I can assign to a variable. Bill Westfield had to explain that to me; see http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1266983837/1

I see.

The advantage of using ?: in a macro instead of an if statement is that it keeps the expansion an expression. Since the macro call looks like a function call, which is syntactically an expression, it's better when the expansion is also an expression. This also means any trailing semicolon behaves correctly.
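A small host-side illustration of a read macro whose whole expansion is one ?: expression, in the same way digitalPinToPINReg() picks a register. This is a sketch under assumptions: pin_d/pin_b are hypothetical stand-ins for PIND/PINB, and only pins 0-15 are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated input registers (hypothetical stand-ins for PIND and PINB):
   pin 2 (PD2) and pin 8 (PB0) read HIGH, everything else LOW. */
static uint8_t pin_d = 0x04, pin_b = 0x01;

/* The expansion is a single expression: ?: selects the register, and
   (P) & 7 gives the bit within that port for pins 0-15. It yields a
   value, can be assigned or compared, and a trailing ';' just ends
   the statement. */
#define READ_PIN(P) ((*((P) <= 7 ? &pin_d : &pin_b) >> ((P) & 7)) & 1)

static int button_pressed(void)
{
    /* Usable directly as a condition, with if/else and no braces. */
    if (READ_PIN(2))
        return 1;
    else
        return 0;
}
```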

Sometimes, you need to cast one of the branches into a void, when the other branch is a function that returns void. For example, in (c) ? f() : g(), if f() returns void but g() doesn't, then write it as (c) ? f() : (void) g(). (Yes, then the whole expression returns void and can only be used in a statement context, for side effects only, which is also the correct behavior.)
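The void-cast rule can be sketched like this; log_event, bump and LOG_OR_BUMP are hypothetical names. Without the cast, the conditional mixes a void operand with an int operand, which is ill-formed in C++ (the language sketches are compiled as) and not valid ISO C either.

```c
#include <assert.h>

static int log_count, counter;

static void log_event(void) { log_count++; }      /* returns void */
static int  bump(void)      { return ++counter; } /* returns int  */

/* Casting the int branch to void makes both operands void, so the
   whole expression is void and can only be used as a statement. */
#define LOG_OR_BUMP(c) ((c) ? log_event() : (void) bump())
```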

So, basically, ?: is better than an if statement in a macro, if it's possible. Now, gcc also has the ({...}) statement-expression syntax, so pretty much any macro can be written as an expression. On the other hand, why not just use an inline function, which doesn't evaluate its arguments more than once?
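The double-evaluation problem, and the two ways around it mentioned above, can be seen with a counter. A sketch; next_pin and the IS_PORTD_PIN names are hypothetical, and ({ ... }) is a GCC extension (clang accepts it too).

```c
#include <assert.h>

static int reads;  /* how many times next_pin() ran */

/* Hypothetical helper with a side effect. */
static int next_pin(void) { reads++; return 5; }

/* Plain macro: the argument text is pasted in twice, so a call passed
   as P runs twice (5 >= 0 is true, so && evaluates the second test). */
#define IS_PORTD_PIN_MACRO(P) ((P) >= 0 && (P) <= 7)

/* GCC statement expression: the argument is copied into a local once;
   the value of the ({ ... }) is its last expression. */
#define IS_PORTD_PIN_ONCE(P) ({ int p_ = (P); p_ >= 0 && p_ <= 7; })

/* Inline function: the argument is evaluated exactly once. */
static inline int is_portd_pin(int p) { return p >= 0 && p <= 7; }
```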

That's certainly MUCH prettier (and a lot easier to read) than the macro version!

Several times I have thought I should put digitalWriteFast onto the Playground section of the site. When I have looked at doing this, I can't figure out what category it would go in. This is, of course, a reflection of Paul Stoffregen's point, that it belongs in the core, not in a library.

Nonetheless, it seems to me it should be on the Playground. I am open to suggestions as to what the category should be.

New category: "Advanced Optimizations, including alternative Core functionality."

There has been discussion of a lot of stuff in the forums that could go there, including (for example) hints on just how to go about DOING optimization...

I did some experiments with the msp430 implementation of the Arduino core I've been working on (with a digitalWrite() macro that uses __builtin_constant_p), and I found that using an inline function caused the results to be "overly and mysteriously dependent on the optimization flags given to the compiler." I think I'll stick with the macros...

I made a very small change to support the Mega2560. I've tested it on the Mega1280 and the Uno; a 2560 is in the mail. The download also includes versions of the test code for both the Uno and the Mega. The Mega version actually runs through the tests 3 times in sequence on each pass through loop(). It is special in that it demonstrates that the Arduino gcc toolchain can actually handle program code that gets that big.

i.e. Binary sketch size: 87296 bytes (of a 126976 byte maximum)

(Sketches that contain a huge amount of PROGMEM data and then code that gets stored higher up don't work at this time see: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1274821710 )

The download is on the Google Code Archive.

Thanks for maintaining this, jrraines. I'd like to include it in the Arduino software soon, as a replacement for the current digitalWrite() (at least, and probably pinMode() and digitalRead(), too). Any reason not to? Or any reason why it should stay a separate function?

I'm certainly in favor of adding it to the core.

There has been much discussion of pros and cons by people who are wiser than I am.

This only handles the case where the pin numbers are known at compile time, of course. So you can't really get rid of the existing versions: they are still needed for subroutines that get pin numbers passed in, or loops that use different pin numbers on successive passes. But this is just a big macro that falls back to the regular function in those cases, so it will never make code longer or slow down runtime speed.
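The fallback can be sketched on a PC with a counter on the slow path. This is a simplified model, not the library code: port_d, slow_digital_write and FAST_WRITE are hypothetical names, only pins 0-7 are modelled, and the volatile variable is there to guarantee the compiler cannot treat the pin number as a constant.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t port_d;  /* simulated PORTD */
static int slow_calls;  /* how often the run-time path ran */

/* Stand-in for the core's digitalWrite(): handles any pin at run time. */
static void slow_digital_write(int pin, int value)
{
    slow_calls++;
    if (value) port_d |= (uint8_t)(1u << pin);
    else       port_d &= (uint8_t)~(1u << pin);
}

/* Same shape as digitalWriteFast: a direct register write when both
   arguments are compile-time constants, the normal function otherwise. */
#define FAST_WRITE(P, V)                                        \
  do {                                                          \
    if (__builtin_constant_p(P) && __builtin_constant_p(V)) {   \
      if (V) port_d |= (uint8_t)(1u << (P));                    \
      else   port_d &= (uint8_t)~(1u << (P));                   \
    } else {                                                    \
      slow_digital_write((P), (V));                             \
    }                                                           \
  } while (0)

static volatile int runtime_pin = 3;  /* value unknown to the compiler */
```

A literal pin such as 2 takes the direct path at any optimization level; a pin read at run time takes the slow path, which is why the existing functions can't simply be removed.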

I made a lot of progress Friday night and early Saturday morning and have had a confusing and frustrating time since.

I actually have/had something working to move this stuff into the core in wiring.h and wiring_digital.c. I emailed it to Paul Stoffregen hoping to get him to review it and haven't heard back.

I struggled all day Saturday trying to get a version of digitalWriteFast that would cooperate with both the current versions of wiring.h/wiring_digital.c and my modified ones.

My Mega2560 came and it does not work with what I posted a few days ago. It seems to me that I should just be able to use the same version of the test code as I use for the Mega1280, select Mega2560 from the boards menu and it should have just worked. But there are at least some errors in the test program on almost every pin pair. AND THE CODE SIZE FOR THE 2560 IS 20000 bytes smaller! I was very puzzled.

Here is the 1280 disassembly (which works):

analogWrite(2,254);
     2dc:      82 e0             ldi      r24, 0x02      ; 2
     2de:      6e ef             ldi      r22, 0xFE      ; 254
     2e0:      70 e0             ldi      r23, 0x00      ; 0
     2e2:      0e 94 14 a3       call      0x14628      ; 0x14628 <analogWrite>
pinModeFast(2,INPUT);
     2e6:      6c 98             cbi      0x0d, 4      ; 13
digitalWriteFast(2,HIGH); 
     2e8:      80 91 90 00       lds      r24, 0x0090
     2ec:      8f 7d             andi      r24, 0xDF      ; 223
     2ee:      80 93 90 00       sts      0x0090, r24
     2f2:      74 9a             sbi      0x0e, 4      ; 14
pinModeFast(5,OUTPUT);
     2f4:      6b 9a             sbi      0x0d, 3      ; 13
digitalWriteFast(5,LOW);
     2f6:      80 91 90 00       lds      r24, 0x0090
     2fa:      8f 77             andi      r24, 0x7F      ; 127
     2fc:      80 93 90 00       sts      0x0090, r24
     300:      73 98             cbi      0x0e, 3      ; 14
delay(1);
     302:      61 e0             ldi      r22, 0x01      ; 1
     304:      70 e0             ldi      r23, 0x00      ; 0
     306:      80 e0             ldi      r24, 0x00      ; 0
     308:      90 e0             ldi      r25, 0x00      ; 0
     30a:      0e 94 59 a2       call      0x144b2      ; 0x144b2 <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     30e:      80 91 90 00       lds      r24, 0x0090
     312:      8f 7d             andi      r24, 0xDF      ; 223
     314:      80 93 90 00       sts      0x0090, r24
     318:      64 9b             sbis      0x0c, 4      ; 12
     31a:      07 c0             rjmp      .+14           ; 0x32a <loop+0x5e>
     31c:      82 e0             ldi      r24, 0x02      ; 2
     31e:      90 e0             ldi      r25, 0x00      ; 0
     320:      65 e0             ldi      r22, 0x05      ; 5
     322:      70 e0             ldi      r23, 0x00      ; 0
     324:      41 e0             ldi      r20, 0x01      ; 1
     326:      50 e0             ldi      r21, 0x00      ; 0
     328:      9a df             rcall      .-204          ; 0x25e <_Z5erroriii>

analogWrite(5,254);
     32a:      85 e0             ldi      r24, 0x05      ; 5
     32c:      6e ef             ldi      r22, 0xFE      ; 254
     32e:      70 e0             ldi      r23, 0x00      ; 0
     330:      0e 94 14 a3       call      0x14628      ; 0x14628 <analogWrite>
pinModeFast(2,INPUT);
     334:      6c 98             cbi      0x0d, 4      ; 13
digitalWriteFast(2,HIGH); 
     336:      80 91 90 00       lds      r24, 0x0090
     33a:      8f 7d             andi      r24, 0xDF      ; 223
     33c:      80 93 90 00       sts      0x0090, r24
     340:      74 9a             sbi      0x0e, 4      ; 14
pinModeFast(5,OUTPUT);
     342:      6b 9a             sbi      0x0d, 3      ; 13
digitalWriteFast(5,LOW);
     344:      80 91 90 00       lds      r24, 0x0090
     348:      8f 77             andi      r24, 0x7F      ; 127
     34a:      80 93 90 00       sts      0x0090, r24
     34e:      73 98             cbi      0x0e, 3      ; 14
delay(1);
     350:      61 e0             ldi      r22, 0x01      ; 1
     352:      70 e0             ldi      r23, 0x00      ; 0
     354:      80 e0             ldi      r24, 0x00      ; 0
     356:      90 e0             ldi      r25, 0x00      ; 0
     358:      0e 94 59 a2       call      0x144b2      ; 0x144b2 <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     35c:      80 91 90 00       lds      r24, 0x0090
     360:      8f 7d             andi      r24, 0xDF      ; 223
     362:      80 93 90 00       sts      0x0090, r24
     366:      64 9b             sbis      0x0c, 4      ; 12
     368:      07 c0             rjmp      .+14           ; 0x378 <loop+0xac>
     36a:      82 e0             ldi      r24, 0x02      ; 2
     36c:      90 e0             ldi      r25, 0x00      ; 0
     36e:      65 e0             ldi      r22, 0x05      ; 5
     370:      70 e0             ldi      r23, 0x00      ; 0
     372:      41 e0             ldi      r20, 0x01      ; 1
     374:      50 e0             ldi      r21, 0x00      ; 0
     376:      73 df             rcall      .-282          ; 0x25e <_Z5erroriii>

Here is the corresponding 2560 disassembly for 2 of the cases that fail:

analogWrite(2,254);
     2e0:      82 e0             ldi      r24, 0x02      ; 2
     2e2:      6e ef             ldi      r22, 0xFE      ; 254
     2e4:      70 e0             ldi      r23, 0x00      ; 0
     2e6:      0e 94 c1 7b       call      0xf782      ; 0xf782 <analogWrite>
pinModeFast(2,INPUT);
     2ea:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast(2,HIGH); 
     2ec:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast(5,OUTPUT);
     2ee:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast(5,LOW);
     2f0:      84 b5             in      r24, 0x24      ; 36
     2f2:      8f 7d             andi      r24, 0xDF      ; 223
     2f4:      84 bd             out      0x24, r24      ; 36
     2f6:      5d 98             cbi      0x0b, 5      ; 11
delay(1);
     2f8:      61 e0             ldi      r22, 0x01      ; 1
     2fa:      70 e0             ldi      r23, 0x00      ; 0
     2fc:      80 e0             ldi      r24, 0x00      ; 0
     2fe:      90 e0             ldi      r25, 0x00      ; 0
     300:      0e 94 06 7b       call      0xf60c      ; 0xf60c <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     304:      4a 9b             sbis      0x09, 2      ; 9
     306:      07 c0             rjmp      .+14           ; 0x316 <loop+0x46>
     308:      82 e0             ldi      r24, 0x02      ; 2
     30a:      90 e0             ldi      r25, 0x00      ; 0
     30c:      65 e0             ldi      r22, 0x05      ; 5
     30e:      70 e0             ldi      r23, 0x00      ; 0
     310:      41 e0             ldi      r20, 0x01      ; 1
     312:      50 e0             ldi      r21, 0x00      ; 0
     314:      a6 df             rcall      .-180          ; 0x262 <_Z5erroriii>

analogWrite(5,254);
     316:      85 e0             ldi      r24, 0x05      ; 5
     318:      6e ef             ldi      r22, 0xFE      ; 254
     31a:      70 e0             ldi      r23, 0x00      ; 0
     31c:      0e 94 c1 7b       call      0xf782      ; 0xf782 <analogWrite>
pinModeFast(2,INPUT);
     320:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast(2,HIGH); 
     322:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast(5,OUTPUT);
     324:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast(5,LOW);
     326:      84 b5             in      r24, 0x24      ; 36
     328:      8f 7d             andi      r24, 0xDF      ; 223
     32a:      84 bd             out      0x24, r24      ; 36
     32c:      5d 98             cbi      0x0b, 5      ; 11
delay(1);
     32e:      61 e0             ldi      r22, 0x01      ; 1
     330:      70 e0             ldi      r23, 0x00      ; 0
     332:      80 e0             ldi      r24, 0x00      ; 0
     334:      90 e0             ldi      r25, 0x00      ; 0
     336:      0e 94 06 7b       call      0xf60c      ; 0xf60c <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     33a:      4a 9b             sbis      0x09, 2      ; 9
     33c:      07 c0             rjmp      .+14           ; 0x34c <loop+0x7c>
     33e:      82 e0             ldi      r24, 0x02      ; 2
     340:      90 e0             ldi      r25, 0x00      ; 0
     342:      65 e0             ldi      r22, 0x05      ; 5
     344:      70 e0             ldi      r23, 0x00      ; 0
     346:      41 e0             ldi      r20, 0x01      ; 1
     348:      50 e0             ldi      r21, 0x00      ; 0
     34a:      8b df             rcall      .-234          ; 0x262 <_Z5erroriii>

And here is the disassembly for the Uno (which has different pin/port correspondences than the 2560; this code also works):

analogWrite(2,254);
     194:      82 e0             ldi      r24, 0x02      ; 2
     196:      6e ef             ldi      r22, 0xFE      ; 254
     198:      70 e0             ldi      r23, 0x00      ; 0
     19a:      0e 94 87 14       call      0x290e      ; 0x290e <analogWrite>
pinModeFast(2,INPUT);
     19e:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast(2,HIGH); 
     1a0:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast(5,OUTPUT);
     1a2:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast(5,LOW);
     1a4:      84 b5             in      r24, 0x24      ; 36
     1a6:      8f 7d             andi      r24, 0xDF      ; 223
     1a8:      84 bd             out      0x24, r24      ; 36
     1aa:      5d 98             cbi      0x0b, 5      ; 11
delay(1);
     1ac:      61 e0             ldi      r22, 0x01      ; 1
     1ae:      70 e0             ldi      r23, 0x00      ; 0
     1b0:      80 e0             ldi      r24, 0x00      ; 0
     1b2:      90 e0             ldi      r25, 0x00      ; 0
     1b4:      0e 94 f3 13       call      0x27e6      ; 0x27e6 <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     1b8:      4a 9b             sbis      0x09, 2      ; 9
     1ba:      07 c0             rjmp      .+14           ; 0x1ca <loop+0x46>
     1bc:      82 e0             ldi      r24, 0x02      ; 2
     1be:      90 e0             ldi      r25, 0x00      ; 0
     1c0:      65 e0             ldi      r22, 0x05      ; 5
     1c2:      70 e0             ldi      r23, 0x00      ; 0
     1c4:      41 e0             ldi      r20, 0x01      ; 1
     1c6:      50 e0             ldi      r21, 0x00      ; 0
     1c8:      a6 df             rcall      .-180          ; 0x116 <_Z5erroriii>

analogWrite(5,254);
     1ca:      85 e0             ldi      r24, 0x05      ; 5
     1cc:      6e ef             ldi      r22, 0xFE      ; 254
     1ce:      70 e0             ldi      r23, 0x00      ; 0
     1d0:      0e 94 87 14       call      0x290e      ; 0x290e <analogWrite>
pinModeFast(2,INPUT);
     1d4:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast(2,HIGH); 
     1d6:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast(5,OUTPUT);
     1d8:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast(5,LOW);
     1da:      84 b5             in      r24, 0x24      ; 36
     1dc:      8f 7d             andi      r24, 0xDF      ; 223
     1de:      84 bd             out      0x24, r24      ; 36
     1e0:      5d 98             cbi      0x0b, 5      ; 11
delay(1);
     1e2:      61 e0             ldi      r22, 0x01      ; 1
     1e4:      70 e0             ldi      r23, 0x00      ; 0
     1e6:      80 e0             ldi      r24, 0x00      ; 0
     1e8:      90 e0             ldi      r25, 0x00      ; 0
     1ea:      0e 94 f3 13       call      0x27e6      ; 0x27e6 <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     1ee:      4a 9b             sbis      0x09, 2      ; 9
     1f0:      07 c0             rjmp      .+14           ; 0x200 <loop+0x7c>
     1f2:      82 e0             ldi      r24, 0x02      ; 2
     1f4:      90 e0             ldi      r25, 0x00      ; 0
     1f6:      65 e0             ldi      r22, 0x05      ; 5
     1f8:      70 e0             ldi      r23, 0x00      ; 0
     1fa:      41 e0             ldi      r20, 0x01      ; 1
     1fc:      50 e0             ldi      r21, 0x00      ; 0
     1fe:      8b df             rcall      .-234          ; 0x116 <_Z5erroriii>

So it's pretty clear what's going on: it's picking the pin/port mapping for a 328-based Arduino. To be continued...
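The disassembly can be checked against the two pin maps in digitalWriteFast.h. This host-side sketch evaluates both maps for pins 2 and 5, using port letters in place of register addresses (STD_PORT, STD_BIT, MEGA_PORT and MEGA_BIT are hypothetical names; the conditions are copied from the header below). The standard map gives PD2 and PD5, which matches the `sbi 0x0b, 2` / `cbi 0x0b, 5` (PORTD) in the failing 2560 build; the Mega map gives PE4 and PE3, which matches the `sbi 0x0e, 4` / `cbi 0x0e, 3` (PORTE) in the working 1280 build.

```c
#include <assert.h>

/* "Standard Arduino Pins" map, port letters instead of &PORTx. */
#define STD_PORT(P) (((P) >= 0 && (P) <= 7) ? 'D' : (((P) >= 8 && (P) <= 13) ? 'B' : 'C'))
#define STD_BIT(P)  (((P) >= 0 && (P) <= 7) ? (P) : (((P) >= 8 && (P) <= 13) ? (P) - 8 : (P) - 14))

/* "Arduino Mega Pins" map, same conditions as digitalPinToPortReg(). */
#define MEGA_PORT(P) \
(((P) >= 22 && (P) <= 29) ? 'A' : \
((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? 'B' : \
(((P) >= 30 && (P) <= 37) ? 'C' : \
((((P) >= 18 && (P) <= 21) || (P) == 38) ? 'D' : \
((((P) >= 0 && (P) <= 3) || (P) == 5) ? 'E' : \
(((P) >= 54 && (P) <= 61) ? 'F' : \
((((P) >= 39 && (P) <= 41) || (P) == 4) ? 'G' : \
((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? 'H' : \
(((P) == 14 || (P) == 15) ? 'J' : \
(((P) >= 62 && (P) <= 69) ? 'K' : 'L'))))))))))

/* Copied from the Mega digitalPinToBit(). */
#define MEGA_BIT(P) \
(((P) >=  7 && (P) <=  9) ? (P) - 3 : \
(((P) >= 10 && (P) <= 13) ? (P) - 6 : \
(((P) >= 22 && (P) <= 29) ? (P) - 22 : \
(((P) >= 30 && (P) <= 37) ? 37 - (P) : \
(((P) >= 39 && (P) <= 41) ? 41 - (P) : \
(((P) >= 42 && (P) <= 49) ? 49 - (P) : \
(((P) >= 50 && (P) <= 53) ? 53 - (P) : \
(((P) >= 54 && (P) <= 61) ? (P) - 54 : \
(((P) >= 62 && (P) <= 69) ? (P) - 62 : \
(((P) == 0 || (P) == 15 || (P) == 17 || (P) == 21) ? 0 : \
(((P) == 1 || (P) == 14 || (P) == 16 || (P) == 20) ? 1 : \
(((P) == 19) ? 2 : \
(((P) == 5 || (P) == 6 || (P) == 18) ? 3 : \
(((P) == 2) ? 4 : \
(((P) == 3 || (P) == 4) ? 5 : 7)))))))))))))))
```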

It seems like the problem has to be in the second line of the code below, but I cannot see it. (I have only recently added the comments on the #endif statements, so you can consider them suspect.)

#if !defined(digitalPinToPortReg)
#if !( defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__) )

// Standard Arduino Pins
#define digitalPinToPortReg(P) \
(((P) >= 0 && (P) <= 7) ? &PORTD : (((P) >= 8 && (P) <= 13) ? &PORTB : &PORTC))
#define digitalPinToDDRReg(P) \
(((P) >= 0 && (P) <= 7) ? &DDRD : (((P) >= 8 && (P) <= 13) ? &DDRB : &DDRC))
#define digitalPinToPINReg(P) \
(((P) >= 0 && (P) <= 7) ? &PIND : (((P) >= 8 && (P) <= 13) ? &PINB : &PINC))
#define digitalPinToBit(P) \
(((P) >= 0 && (P) <= 7) ? (P) : (((P) >= 8 && (P) <= 13) ? (P) - 8 : (P) - 14))

#if defined(__AVR_ATmega8__)
// 3 PWM
#define digitalPinToTimer(P) \
(((P) ==  9 || (P) == 10) ? &TCCR1A : (((P) == 11) ? &TCCR2 : 0))
#define digitalPinToTimerBit(P) \
(((P) ==  9) ? COM1A1 : (((P) == 10) ? COM1B1 : COM21))
#else  //168,328

// 6 PWM
#define digitalPinToTimer(P) \
(((P) ==  6 || (P) ==  5) ? &TCCR0A : \
        (((P) ==  9 || (P) == 10) ? &TCCR1A : \
        (((P) == 11 || (P) ==  3) ? &TCCR2A : 0)))
#define digitalPinToTimerBit(P) \
(((P) ==  6) ? COM0A1 : (((P) ==  5) ? COM0B1 : \
        (((P) ==  9) ? COM1A1 : (((P) == 10) ? COM1B1 : \
        (((P) == 11) ? COM2A1 : COM2B1)))))
#endif  //defined(__AVR_ATmega8__)

#else
// Arduino Mega Pins
#define digitalPinToPortReg(P) \
(((P) >= 22 && (P) <= 29) ? &PORTA : \
        ((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &PORTB : \
        (((P) >= 30 && (P) <= 37) ? &PORTC : \
        ((((P) >= 18 && (P) <= 21) || (P) == 38) ? &PORTD : \
        ((((P) >= 0 && (P) <= 3) || (P) == 5) ? &PORTE : \
        (((P) >= 54 && (P) <= 61) ? &PORTF : \
        ((((P) >= 39 && (P) <= 41) || (P) == 4) ? &PORTG : \
        ((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &PORTH : \
        (((P) == 14 || (P) == 15) ? &PORTJ : \
        (((P) >= 62 && (P) <= 69) ? &PORTK : &PORTL))))))))))

#define digitalPinToDDRReg(P) \
(((P) >= 22 && (P) <= 29) ? &DDRA : \
        ((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &DDRB : \
        (((P) >= 30 && (P) <= 37) ? &DDRC : \
        ((((P) >= 18 && (P) <= 21) || (P) == 38) ? &DDRD : \
        ((((P) >= 0 && (P) <= 3) || (P) == 5) ? &DDRE : \
        (((P) >= 54 && (P) <= 61) ? &DDRF : \
        ((((P) >= 39 && (P) <= 41) || (P) == 4) ? &DDRG : \
        ((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &DDRH : \
        (((P) == 14 || (P) == 15) ? &DDRJ : \
        (((P) >= 62 && (P) <= 69) ? &DDRK : &DDRL))))))))))

#define digitalPinToPINReg(P) \
(((P) >= 22 && (P) <= 29) ? &PINA : \
        ((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &PINB : \
        (((P) >= 30 && (P) <= 37) ? &PINC : \
        ((((P) >= 18 && (P) <= 21) || (P) == 38) ? &PIND : \
        ((((P) >= 0 && (P) <= 3) || (P) == 5) ? &PINE : \
        (((P) >= 54 && (P) <= 61) ? &PINF : \
        ((((P) >= 39 && (P) <= 41) || (P) == 4) ? &PING : \
        ((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &PINH : \
        (((P) == 14 || (P) == 15) ? &PINJ : \
        (((P) >= 62 && (P) <= 69) ? &PINK : &PINL))))))))))

#define digitalPinToBit(P) \
(((P) >=  7 && (P) <=  9) ? (P) - 3 : \
        (((P) >= 10 && (P) <= 13) ? (P) - 6 : \
        (((P) >= 22 && (P) <= 29) ? (P) - 22 : \
        (((P) >= 30 && (P) <= 37) ? 37 - (P) : \
        (((P) >= 39 && (P) <= 41) ? 41 - (P) : \
        (((P) >= 42 && (P) <= 49) ? 49 - (P) : \
        (((P) >= 50 && (P) <= 53) ? 53 - (P) : \
        (((P) >= 54 && (P) <= 61) ? (P) - 54 : \
        (((P) >= 62 && (P) <= 69) ? (P) - 62 : \
        (((P) == 0 || (P) == 15 || (P) == 17 || (P) == 21) ? 0 : \
        (((P) == 1 || (P) == 14 || (P) == 16 || (P) == 20) ? 1 : \
        (((P) == 19) ? 2 : \
        (((P) == 5 || (P) == 6 || (P) == 18) ? 3 : \
        (((P) == 2) ? 4 : \
        (((P) == 3 || (P) == 4) ? 5 : 7)))))))))))))))

// 15 PWM
#define digitalPinToTimer(P) \
(((P) == 13 || (P) ==  4) ? &TCCR0A : \
        (((P) == 11 || (P) == 12) ? &TCCR1A : \
        (((P) == 10 || (P) ==  9) ? &TCCR2A : \
        (((P) ==  5 || (P) ==  2 || (P) ==  3) ? &TCCR3A : \
        (((P) ==  6 || (P) ==  7 || (P) ==  8) ? &TCCR4A : \
        (((P) == 46 || (P) == 45 || (P) == 44) ? &TCCR5A : 0))))))
#define digitalPinToTimerBit(P) \
(((P) == 13) ? COM0A1 : (((P) ==  4) ? COM0B1 : \
        (((P) == 11) ? COM1A1 : (((P) == 12) ? COM1B1 : \
        (((P) == 10) ? COM2A1 : (((P) ==  9) ? COM2B1 : \
        (((P) ==  5) ? COM3A1 : (((P) ==  2) ? COM3B1 : (((P) ==  3) ? COM3C1 : \
        (((P) ==  6) ? COM4A1 : (((P) ==  7) ? COM4B1 : (((P) ==  8) ? COM4C1 : \
        (((P) == 46) ? COM5A1 : (((P) == 45) ? COM5B1 : COM5C1))))))))))))))

#endif  //mega
#endif  //#if !defined(digitalPinToPortReg)

#ifndef __digitalWrite
      #ifndef digitalWriteFast
            #define digitalWriteFast(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
                                    if (digitalPinToTimer(P)) \
                                                bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
                                    bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
                        } else { \
                                    digitalWrite((P), (V)); \
                        }
      #endif  //#ifndef digitalWriteFast

      #ifndef digitalWriteFast2
            #define digitalWriteFast2(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
            bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
            } else { \
            digitalWrite((P), (V)); \
            }
      #endif  //#ifndef digitalWriteFast2


      #else
            #define digitalWriteFast digitalWrite
            #define digitalWriteFast2 digitalWrite
#endif //#ifndef __digitalWrite

#ifndef __pinMode
      #ifndef pinModeFast
            #define pinModeFast(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
                                    bitWrite(*digitalPinToDDRReg(P), digitalPinToBit(P), (V)); \
                        } else {  \
                                    pinMode((P), (V)); \
                        } 
      #endif  //#ifndef pinModeFast

      #if !defined(pinModeFast2)
            #define pinModeFast2(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
            if (digitalPinToTimer(P)) \
            bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
            bitWrite(*digitalPinToDDRReg(P), digitalPinToBit(P), (V)); \
            } else {  \
            pinMode((P), (V)); \
            } 
      #endif

      #else
            #define pinModeFast pinMode
            #define pinModeFast2 pinMode
#endif //#ifndef __pinMode

#ifndef __digitalRead
      #ifndef digitalReadFast
            #define digitalReadFast(P) ( (int) __digitalReadFast__((P)) )
            #define __digitalReadFast__(P ) \
            (__builtin_constant_p(P) ) ? ( \
                                    digitalPinToTimer(P) ? ( \
                                             bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)) ,  \
                                                       bitRead(*digitalPinToPINReg(P), digitalPinToBit(P))) : \
                                      bitRead(*digitalPinToPINReg(P), digitalPinToBit(P)))  : \
                                    digitalRead((P))
      #endif   //#if !defined(digitalReadFast)

      #if !defined(digitalReadFast2)
            #define digitalReadFast2(P) ( (int) __digitalReadFast2__((P)) )
            #define __digitalReadFast2__(P ) \
            (__builtin_constant_p(P) ) ? ( \
            ( bitRead(*digitalPinToPINReg(P), digitalPinToBit(P))) ) : \
            digitalRead((P))
      #endif

      #else
            #define digitalReadFast digitalRead
            #define digitalReadFast2 digitalRead
#endif  //#ifndef __digitalRead

Is __AVR_ATmega2560__ not defined? That seems to be the only way for the Mega2560 code to be the same as the non-Mega code.

That symbol is used throughout the core now, for instance in WProgram.h and pins_arduino. The !( ) form is not used in the places I looked, though.

What I tried this morning:
I went back to Arduino-18 and updated digitalWriteFast there (I'd already loaded Mark Sproul's 2560 bootloader etc. there). No difference from Arduino 21.

I added a few lines just above the part of the sketch I disassembled, mimicking to some extent the logic that doesn't seem to work in digitalWriteFast (indeed, I pasted the key line in from digitalWriteFast.h):

#if !( defined(__AVR_ATmega2560__) ||defined(__AVR_ATmega1280__) )
Serial.println("This must be the UNO.");
#else
#if defined(__AVR_ATmega2560__)
Serial.println("This is the Mega2560.");
#endif
#if defined(__AVR_ATmega1280__)
Serial.println("This is the Mega1280.");
#endif
#endif  //#if !( defined(__AVR_ATmega2560__) ||defined(__AVR_ATmega1280__) )
//=============================the output from progprog.py goes below===================
analogWrite(2,254);
pinModeFast(2,INPUT);
digitalWriteFast(2,HIGH); 
pinModeFast(5,OUTPUT);
digitalWriteFast(5,LOW);
delay(1);
if((int) digitalReadFast(2) != LOW) error(2,5,1);

What printed was, correctly, “This is the Mega2560.”
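That output shows what the sketch sees; a further check is to have the header report which branch it compiled. This is a hypothetical debugging aid, not code from the thread: DWF_PIN_MAP and dwf_pin_map() are invented names. It mirrors the header's conditions, including the outer #if !defined(digitalPinToPortReg) guard, so a pre-existing definition would also show up. On a PC neither MCU symbol is defined, so it reports the standard map.

```c
#include <assert.h>
#include <string.h>

/* Same decisions as digitalWriteFast.h, each recording its own name. */
#if defined(digitalPinToPortReg)
#define DWF_PIN_MAP "pre-existing digitalPinToPortReg"
#elif defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__)
#define DWF_PIN_MAP "Mega"
#else
#define DWF_PIN_MAP "standard"
#endif

/* In a sketch this could be printed with Serial.println(DWF_PIN_MAP). */
static const char *dwf_pin_map(void) { return DWF_PIN_MAP; }
```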

I added a line at the top of my sketch:
#define __AVR_ATmega2560__

That did not make a difference.

I changed that line to
#define __AVR_ATmega1280__
and still selected the 2560 in the Boards menu.
Now the test program runs on the 2560, reports no errors, and is the appropriate size. This is, of course, not an actual solution.

I feel befuddled.

Is there some way an #undef could be snuck in between?!

Did you try:

#if !defined(__AVR_ATmega2560__) && !defined(__AVR_ATmega1280__)

?

Doing it with ! && ! didn't work either.

I then also tried reversing the order of the code, so that the Mega section came first. When I did that, I pasted the line I think is at fault directly from pins_arduino. Still didn't work.

I feel like I must be looking in the wrong place but I don't see where else it could be.

#if !defined(digitalPinToPortReg)
#if defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__)
// Arduino Mega Pins
#define digitalPinToPortReg(P) \
(((P) >= 22 && (P) <= 29) ? &PORTA : \
((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &PORTB : \
(((P) >= 30 && (P) <= 37) ? &PORTC : \
((((P) >= 18 && (P) <= 21) || (P) == 38) ? &PORTD : \
((((P) >= 0 && (P) <= 3) || (P) == 5) ? &PORTE : \
(((P) >= 54 && (P) <= 61) ? &PORTF : \
((((P) >= 39 && (P) <= 41) || (P) == 4) ? &PORTG : \
((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &PORTH : \
(((P) == 14 || (P) == 15) ? &PORTJ : \
(((P) >= 62 && (P) <= 69) ? &PORTK : &PORTL))))))))))

#define digitalPinToDDRReg(P) \
(((P) >= 22 && (P) <= 29) ? &DDRA : \
((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &DDRB : \
(((P) >= 30 && (P) <= 37) ? &DDRC : \
((((P) >= 18 && (P) <= 21) || (P) == 38) ? &DDRD : \
((((P) >= 0 && (P) <= 3) || (P) == 5) ? &DDRE : \
(((P) >= 54 && (P) <= 61) ? &DDRF : \
((((P) >= 39 && (P) <= 41) || (P) == 4) ? &DDRG : \
((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &DDRH : \
(((P) == 14 || (P) == 15) ? &DDRJ : \
(((P) >= 62 && (P) <= 69) ? &DDRK : &DDRL))))))))))

#define digitalPinToPINReg(P) \
(((P) >= 22 && (P) <= 29) ? &PINA : \
((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &PINB : \
(((P) >= 30 && (P) <= 37) ? &PINC : \
((((P) >= 18 && (P) <= 21) || (P) == 38) ? &PIND : \
((((P) >= 0 && (P) <= 3) || (P) == 5) ? &PINE : \
(((P) >= 54 && (P) <= 61) ? &PINF : \
((((P) >= 39 && (P) <= 41) || (P) == 4) ? &PING : \
((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &PINH : \
(((P) == 14 || (P) == 15) ? &PINJ : \
(((P) >= 62 && (P) <= 69) ? &PINK : &PINL))))))))))

#define digitalPinToBit(P) \
(((P) >=  7 && (P) <=  9) ? (P) - 3 : \
(((P) >= 10 && (P) <= 13) ? (P) - 6 : \
(((P) >= 22 && (P) <= 29) ? (P) - 22 : \
(((P) >= 30 && (P) <= 37) ? 37 - (P) : \
(((P) >= 39 && (P) <= 41) ? 41 - (P) : \
(((P) >= 42 && (P) <= 49) ? 49 - (P) : \
(((P) >= 50 && (P) <= 53) ? 53 - (P) : \
(((P) >= 54 && (P) <= 61) ? (P) - 54 : \
(((P) >= 62 && (P) <= 69) ? (P) - 62 : \
(((P) == 0 || (P) == 15 || (P) == 17 || (P) == 21) ? 0 : \
(((P) == 1 || (P) == 14 || (P) == 16 || (P) == 20) ? 1 : \
(((P) == 19) ? 2 : \
(((P) == 5 || (P) == 6 || (P) == 18) ? 3 : \
(((P) == 2) ? 4 : \
(((P) == 3 || (P) == 4) ? 5 : 7)))))))))))))))

// 15 PWM
#define digitalPinToTimer(P) \
(((P) == 13 || (P) ==  4) ? &TCCR0A : \
(((P) == 11 || (P) == 12) ? &TCCR1A : \
(((P) == 10 || (P) ==  9) ? &TCCR2A : \
(((P) ==  5 || (P) ==  2 || (P) ==  3) ? &TCCR3A : \
(((P) ==  6 || (P) ==  7 || (P) ==  8) ? &TCCR4A : \
(((P) == 46 || (P) == 45 || (P) == 44) ? &TCCR5A : 0))))))
#define digitalPinToTimerBit(P) \
(((P) == 13) ? COM0A1 : (((P) ==  4) ? COM0B1 : \
(((P) == 11) ? COM1A1 : (((P) == 12) ? COM1B1 : \
(((P) == 10) ? COM2A1 : (((P) ==  9) ? COM2B1 : \
(((P) ==  5) ? COM3A1 : (((P) ==  2) ? COM3B1 : (((P) ==  3) ? COM3C1 : \
(((P) ==  6) ? COM4A1 : (((P) ==  7) ? COM4B1 : (((P) ==  8) ? COM4C1 : \
(((P) == 46) ? COM5A1 : (((P) == 45) ? COM5B1 : COM5C1))))))))))))))

#else

// Standard Arduino Pins
#define digitalPinToPortReg(P) \
(((P) >= 0 && (P) <= 7) ? &PORTD : (((P) >= 8 && (P) <= 13) ? &PORTB : &PORTC))
#define digitalPinToDDRReg(P) \
(((P) >= 0 && (P) <= 7) ? &DDRD : (((P) >= 8 && (P) <= 13) ? &DDRB : &DDRC))
#define digitalPinToPINReg(P) \
(((P) >= 0 && (P) <= 7) ? &PIND : (((P) >= 8 && (P) <= 13) ? &PINB : &PINC))
#define digitalPinToBit(P) \
(((P) >= 0 && (P) <= 7) ? (P) : (((P) >= 8 && (P) <= 13) ? (P) - 8 : (P) - 14))

#if defined(__AVR_ATmega8__)
// 3 PWM
#define digitalPinToTimer(P) \
(((P) ==  9 || (P) == 10) ? &TCCR1A : (((P) == 11) ? &TCCR2 : 0))
#define digitalPinToTimerBit(P) \
(((P) ==  9) ? COM1A1 : (((P) == 10) ? COM1B1 : COM21))
#else  //168,328

// 6 PWM
#define digitalPinToTimer(P) \
(((P) ==  6 || (P) ==  5) ? &TCCR0A : \
(((P) ==  9 || (P) == 10) ? &TCCR1A : \
(((P) == 11 || (P) ==  3) ? &TCCR2A : 0)))
#define digitalPinToTimerBit(P) \
(((P) ==  6) ? COM0A1 : (((P) ==  5) ? COM0B1 : \
(((P) ==  9) ? COM1A1 : (((P) == 10) ? COM1B1 : \
(((P) == 11) ? COM2A1 : COM2B1)))))
#endif  //defined(__AVR_ATmega8__)


#endif  //mega
#endif  //#if !defined(digitalPinToPortReg)
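One way to check the Mega table without hardware is to compile the same conditions on a PC. The sketch below is a hypothetical host-side copy. `MEGA_PIN_TO_PORT`, `MEGA_PIN_TO_BIT` and `print_mega_map` are names made up for this example. The macros return a port letter instead of `&PORTx`, so they build without `<avr/io.h>`. Their conditions are copied from `digitalPinToPortReg` and `digitalPinToBit` above, so the printout can be compared line by line against the Mega pin map.

```c
#include <stdio.h>

/* Hypothetical host-side copy of the Mega port lookup: same conditions as
 * digitalPinToPortReg(), but yields the port letter instead of &PORTx. */
#define MEGA_PIN_TO_PORT(P) \
(((P) >= 22 && (P) <= 29) ? 'A' : \
((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? 'B' : \
(((P) >= 30 && (P) <= 37) ? 'C' : \
((((P) >= 18 && (P) <= 21) || (P) == 38) ? 'D' : \
((((P) >= 0 && (P) <= 3) || (P) == 5) ? 'E' : \
(((P) >= 54 && (P) <= 61) ? 'F' : \
((((P) >= 39 && (P) <= 41) || (P) == 4) ? 'G' : \
((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? 'H' : \
(((P) == 14 || (P) == 15) ? 'J' : \
(((P) >= 62 && (P) <= 69) ? 'K' : 'L'))))))))))

/* Hypothetical host-side copy of the Mega digitalPinToBit(). */
#define MEGA_PIN_TO_BIT(P) \
(((P) >=  7 && (P) <=  9) ? (P) - 3 : \
(((P) >= 10 && (P) <= 13) ? (P) - 6 : \
(((P) >= 22 && (P) <= 29) ? (P) - 22 : \
(((P) >= 30 && (P) <= 37) ? 37 - (P) : \
(((P) >= 39 && (P) <= 41) ? 41 - (P) : \
(((P) >= 42 && (P) <= 49) ? 49 - (P) : \
(((P) >= 50 && (P) <= 53) ? 53 - (P) : \
(((P) >= 54 && (P) <= 61) ? (P) - 54 : \
(((P) >= 62 && (P) <= 69) ? (P) - 62 : \
(((P) == 0 || (P) == 15 || (P) == 17 || (P) == 21) ? 0 : \
(((P) == 1 || (P) == 14 || (P) == 16 || (P) == 20) ? 1 : \
(((P) == 19) ? 2 : \
(((P) == 5 || (P) == 6 || (P) == 18) ? 3 : \
(((P) == 2) ? 4 : \
(((P) == 3 || (P) == 4) ? 5 : 7)))))))))))))))

/* Print the pin -> port/bit mapping for every Mega pin, e.g. "pin 13 -> PB7". */
void print_mega_map(void)
{
    for (int p = 0; p <= 69; p++)
        printf("pin %2d -> P%c%d\n", p, MEGA_PIN_TO_PORT(p), MEGA_PIN_TO_BIT(p));
}
```

For example, pin 13 comes out as port B, bit 7, and pin 38 comes out as port D, bit 7. The same fall-through also means that an out-of-range number such as 70 maps to port L, bit 7. That is the bit pin 42 uses, so a bad constant pin number quietly drives a different pin.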

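#ifndef __digitalWrite
#ifndef digitalWriteFast
// Constant pin and value: turn off PWM on the pin, then write the port bit directly.
// Otherwise fall back to the core digitalWrite(). The do { } while (0) wrapper makes
// the macro a single statement, so it is safe in an unbraced if/else.
#define digitalWriteFast(P, V) \
do { \
  if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
    if (digitalPinToTimer(P)) \
      bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
    bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
  } else { \
    digitalWrite((P), (V)); \
  } \
} while (0)
#endif  //#ifndef digitalWriteFast

#ifndef digitalWriteFast2
// Same as digitalWriteFast, but does not turn off PWM first.
#define digitalWriteFast2(P, V) \
do { \
  if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
    bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
  } else { \
    digitalWrite((P), (V)); \
  } \
} while (0)
#endif  //#ifndef digitalWriteFast2

#else
#define digitalWriteFast(P, V)  digitalWrite((P), (V))
#define digitalWriteFast2(P, V) digitalWrite((P), (V))
#endif //#ifndef __digitalWrite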
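A bare `if (...) { ... } else { ... }` macro body is risky when the macro is used as the body of an unbraced `if`. The `;` after the call becomes an extra empty statement, so a following `else` has no `if` left to attach to. Wrapping the body in `do { ... } while (0)` turns it into one statement that takes exactly one trailing semicolon. The host-side sketch below uses a hypothetical `FAST_WRITE` macro, a plain variable `fake_port` in place of a port register, and a counting `slow_write()` in place of `digitalWrite()`. It is only meant to show the pattern, not to replace the header.

```c
#include <stdint.h>

/* Hypothetical stand-ins so this builds on a PC:
 * fake_port plays the role of PORTx, slow_write() the role of digitalWrite(). */
volatile uint8_t fake_port = 0;
int slow_calls = 0;

void slow_write(int pin, int val)
{
    (void)pin;
    (void)val;
    slow_calls++;  /* count how often the slow path was taken */
}

/* Same dispatch idea as digitalWriteFast, wrapped in do { } while (0). */
#define FAST_WRITE(P, V) \
do { \
    if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
        if (V) \
            fake_port |= (uint8_t)(1u << (P)); \
        else \
            fake_port &= (uint8_t)~(1u << (P)); \
    } else { \
        slow_write((P), (V)); \
    } \
} while (0)

/* Unbraced if/else around the macro: this compiles because of do/while(0). */
void set_led(int on)
{
    if (on)
        FAST_WRITE(3, 1);
    else
        FAST_WRITE(3, 0);
}

/* A volatile pin number is never a compile-time constant, so this takes the slow path. */
void write_runtime_pin(int pin, int val)
{
    volatile int p = pin;
    FAST_WRITE(p, val);
}
```

With literal arguments, as in `set_led`, the fast branch writes `fake_port` directly. When the pin number is only known at run time, the call goes through `slow_write()`, just as the header falls back to `digitalWrite()`.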

#ifndef __pinMode
#ifndef pinModeFast
// Constant pin and mode: set the DDR bit directly; otherwise call pinMode().
// Note: unlike digitalWriteFast, this version does NOT turn off PWM;
// pinModeFast2 is the one that does.
#define pinModeFast(P, V) \
do { \
  if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
    bitWrite(*digitalPinToDDRReg(P), digitalPinToBit(P), (V)); \
  } else { \
    pinMode((P), (V)); \
  } \
} while (0)
#endif  //#ifndef pinModeFast

#if !defined(pinModeFast2)
#define pinModeFast2(P, V) \
do { \
  if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
    if (digitalPinToTimer(P)) \
      bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
    bitWrite(*digitalPinToDDRReg(P), digitalPinToBit(P), (V)); \
  } else { \
    pinMode((P), (V)); \
  } \
} while (0)
#endif  //#if !defined(pinModeFast2)

#else
#define pinModeFast(P, V)  pinMode((P), (V))
#define pinModeFast2(P, V) pinMode((P), (V))
#endif //#ifndef __pinMode

#ifndef __digitalRead
#ifndef digitalReadFast
// The body of __digitalReadFast__ is wrapped in parentheses so that the
// (int) cast in digitalReadFast applies to the whole result, not just to
// the __builtin_constant_p(P) test.
#define digitalReadFast(P) ( (int) __digitalReadFast__((P)) )
#define __digitalReadFast__(P) \
( __builtin_constant_p(P) ? \
  ( digitalPinToTimer(P) ? \
    ( bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)), \
      bitRead(*digitalPinToPINReg(P), digitalPinToBit(P)) ) : \
    bitRead(*digitalPinToPINReg(P), digitalPinToBit(P)) ) : \
  digitalRead((P)) )
#endif  //#ifndef digitalReadFast

#if !defined(digitalReadFast2)
// Same as digitalReadFast, but does not turn off PWM first.
#define digitalReadFast2(P) ( (int) __digitalReadFast2__((P)) )
#define __digitalReadFast2__(P) \
( __builtin_constant_p(P) ? \
  bitRead(*digitalPinToPINReg(P), digitalPinToBit(P)) : \
  digitalRead((P)) )
#endif  //#if !defined(digitalReadFast2)

#else
#define digitalReadFast(P)  digitalRead((P))
#define digitalReadFast2(P) digitalRead((P))
#endif  //#ifndef __digitalRead

It's almost enough to make one 'superstitious'. I definitely need to fully understand this one!
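One more detail in the listing may be worth checking. `digitalReadFast(P)` expands to `( (int) __digitalReadFast__((P)) )`. If the body of `__digitalReadFast__` is a bare `cond ? a : b` with no outer parentheses, the cast binds only to the condition. Here both branches are already `int`, so the result is the same either way. The pattern does bite as soon as the branch types differ. The hypothetical `PICK_BARE` and `PICK_WRAPPED` macros below make the difference visible on a PC by using `double` branches.

```c
/* Hypothetical macros: identical except for the outer parentheses. */
#define PICK_BARE(c)    (c) ? 2.5 : 0.5
#define PICK_WRAPPED(c) ((c) ? 2.5 : 0.5)

/* Expands to (int) (c) ? 2.5 : 0.5, so the cast applies only to (c). */
double cast_bare(int c)
{
    return (int) PICK_BARE(c);
}

/* Expands to (int) ((c) ? 2.5 : 0.5), so the cast truncates the result. */
double cast_wrapped(int c)
{
    return (int) PICK_WRAPPED(c);
}
```

`cast_bare(1)` returns 2.5 because the cast never reaches the result, while `cast_wrapped(1)` returns 2.0.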