digitalWriteFast, digitalReadFast, pinModeFast etc

That's certainly MUCH prettier (and a lot easier to read) than the macro version!

Several times I have thought I should put digitalWriteFast onto the Playground section of the site. When I have looked at doing this, I can't figure out what category it would go in. This is, of course, a reflection of Paul Stoffregen's point, that it belongs in the core, not in a library.

Nonetheless, it seems to me it should be on the Playground. I am open to suggestions as to what the category should be.

New category: "Advanced Optimizations, including alternative Core functionality."

There has been discussion of a lot of stuff in the forums that could go there, including (for example) hints on just how to go about DOING optimization...

I did some experiments with the msp430 implementation the Arduino core I've been working on (with a digitalWrite() macro that uses __builtin_constant_p), and I found that using an inline function caused the results to be "overly and mysteriously dependent on the optimization flags given to the compiler." I think I'll stick with the macros...

I made a very small change to support the Mega2560. I've tested it on the Mega1280 and the UNO. A 2560 is in the mail. I also included in the download versions of the test code for both the Uno and for the Mega. The version for the Mega actually runs through the tests 3 times in sequence for one time through loop(). It is special in that it demonstrates that the arduino gcc actually can handle program code that gets to a size that big.

i.e. Binary sketch size: 87296 bytes (of a 126976 byte maximum)

(Sketches that contain a huge amount of PROGMEM data and then code that gets stored higher up don't work at this time see: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1274821710 )

The download is at Google Code Archive - Long-term storage for Google Code Project Hosting.

Thanks for maintaining this, jrraines. I'd like to include it in the Arduino software, soon, as a replacement for the current digitalRead() (at least and probably pinMode() and digitalRead(), too). Any reason not to? Or any reason why it should stay a separate function?

I'm certainly in favor of adding it to the core.

There has been much discussion of pros and cons by people who are wiser than I am.

This just handles the case where the pin numbers are defined at compile time, of course. So you can't really get rid of the existing versions to handle cases like subroutines that get pin numbers passed in or loops with different pin numbers on successive passes through the code. But this is just a big macro so it will never make code longer or slow down runtime speed.

I made a lot of progress Friday night and early Saturday morning and have had a confusing and frustrating time since.

I actually have/had something working to move this stuff into the core in wiring.h and wiring_digital.c I emailed it to Paul Stoffregen hoping to get him to review it and haven't heard back.

I struggled all day Saturday trying to get a version of digitalWriteFast that would cooperate with both the current versions of wiring.h/wiring_digital.c and my modified ones.

My Mega2560 came and it does not work with what I posted a few days ago. It seems to me that I should just be able to use the same version of the test code as I use for the Mega1280, select Mega2560 from the boards menu and it should have just worked. But there are at least some errors in the test program on almost every pin pair. AND THE CODE SIZE FOR THE 2560 IS 20000 bytes smaller! I was very puzzled.

Here is the 1280 disassembly (which works):

analogWrite(2,254);
     2dc:      82 e0             ldi      r24, 0x02      ; 2
     2de:      6e ef             ldi      r22, 0xFE      ; 254
     2e0:      70 e0             ldi      r23, 0x00      ; 0
     2e2:      0e 94 14 a3       call      0x14628      ; 0x14628 <analogWrite>
pinModeFast(2,INPUT);
     2e6:      6c 98             cbi      0x0d, 4      ; 13
digitalWriteFast(2,HIGH); 
     2e8:      80 91 90 00       lds      r24, 0x0090
     2ec:      8f 7d             andi      r24, 0xDF      ; 223
     2ee:      80 93 90 00       sts      0x0090, r24
     2f2:      74 9a             sbi      0x0e, 4      ; 14
pinModeFast(5,OUTPUT);
     2f4:      6b 9a             sbi      0x0d, 3      ; 13
digitalWriteFast(5,LOW);
     2f6:      80 91 90 00       lds      r24, 0x0090
     2fa:      8f 77             andi      r24, 0x7F      ; 127
     2fc:      80 93 90 00       sts      0x0090, r24
     300:      73 98             cbi      0x0e, 3      ; 14
delay(1);
     302:      61 e0             ldi      r22, 0x01      ; 1
     304:      70 e0             ldi      r23, 0x00      ; 0
     306:      80 e0             ldi      r24, 0x00      ; 0
     308:      90 e0             ldi      r25, 0x00      ; 0
     30a:      0e 94 59 a2       call      0x144b2      ; 0x144b2 <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     30e:      80 91 90 00       lds      r24, 0x0090
     312:      8f 7d             andi      r24, 0xDF      ; 223
     314:      80 93 90 00       sts      0x0090, r24
     318:      64 9b             sbis      0x0c, 4      ; 12
     31a:      07 c0             rjmp      .+14           ; 0x32a <loop+0x5e>
     31c:      82 e0             ldi      r24, 0x02      ; 2
     31e:      90 e0             ldi      r25, 0x00      ; 0
     320:      65 e0             ldi      r22, 0x05      ; 5
     322:      70 e0             ldi      r23, 0x00      ; 0
     324:      41 e0             ldi      r20, 0x01      ; 1
     326:      50 e0             ldi      r21, 0x00      ; 0
     328:      9a df             rcall      .-204          ; 0x25e <_Z5erroriii>

analogWrite(5,254);
     32a:      85 e0             ldi      r24, 0x05      ; 5
     32c:      6e ef             ldi      r22, 0xFE      ; 254
     32e:      70 e0             ldi      r23, 0x00      ; 0
     330:      0e 94 14 a3       call      0x14628      ; 0x14628 <analogWrite>
pinModeFast(2,INPUT);
     334:      6c 98             cbi      0x0d, 4      ; 13
digitalWriteFast(2,HIGH); 
     336:      80 91 90 00       lds      r24, 0x0090
     33a:      8f 7d             andi      r24, 0xDF      ; 223
     33c:      80 93 90 00       sts      0x0090, r24
     340:      74 9a             sbi      0x0e, 4      ; 14
pinModeFast(5,OUTPUT);
     342:      6b 9a             sbi      0x0d, 3      ; 13
digitalWriteFast(5,LOW);
     344:      80 91 90 00       lds      r24, 0x0090
     348:      8f 77             andi      r24, 0x7F      ; 127
     34a:      80 93 90 00       sts      0x0090, r24
     34e:      73 98             cbi      0x0e, 3      ; 14
delay(1);
     350:      61 e0             ldi      r22, 0x01      ; 1
     352:      70 e0             ldi      r23, 0x00      ; 0
     354:      80 e0             ldi      r24, 0x00      ; 0
     356:      90 e0             ldi      r25, 0x00      ; 0
     358:      0e 94 59 a2       call      0x144b2      ; 0x144b2 <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     35c:      80 91 90 00       lds      r24, 0x0090
     360:      8f 7d             andi      r24, 0xDF      ; 223
     362:      80 93 90 00       sts      0x0090, r24
     366:      64 9b             sbis      0x0c, 4      ; 12
     368:      07 c0             rjmp      .+14           ; 0x378 <loop+0xac>
     36a:      82 e0             ldi      r24, 0x02      ; 2
     36c:      90 e0             ldi      r25, 0x00      ; 0
     36e:      65 e0             ldi      r22, 0x05      ; 5
     370:      70 e0             ldi      r23, 0x00      ; 0
     372:      41 e0             ldi      r20, 0x01      ; 1
     374:      50 e0             ldi      r21, 0x00      ; 0
     376:      73 df             rcall      .-282          ; 0x25e <_Z5erroriii>

here is corresponding 2560 disassembly from 2 of the cases that fail:

analogWrite(2,254);
     2e0:      82 e0             ldi      r24, 0x02      ; 2
     2e2:      6e ef             ldi      r22, 0xFE      ; 254
     2e4:      70 e0             ldi      r23, 0x00      ; 0
     2e6:      0e 94 c1 7b       call      0xf782      ; 0xf782 <analogWrite>
pinModeFast(2,INPUT);
     2ea:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast(2,HIGH); 
     2ec:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast(5,OUTPUT);
     2ee:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast(5,LOW);
     2f0:      84 b5             in      r24, 0x24      ; 36
     2f2:      8f 7d             andi      r24, 0xDF      ; 223
     2f4:      84 bd             out      0x24, r24      ; 36
     2f6:      5d 98             cbi      0x0b, 5      ; 11
delay(1);
     2f8:      61 e0             ldi      r22, 0x01      ; 1
     2fa:      70 e0             ldi      r23, 0x00      ; 0
     2fc:      80 e0             ldi      r24, 0x00      ; 0
     2fe:      90 e0             ldi      r25, 0x00      ; 0
     300:      0e 94 06 7b       call      0xf60c      ; 0xf60c <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     304:      4a 9b             sbis      0x09, 2      ; 9
     306:      07 c0             rjmp      .+14           ; 0x316 <loop+0x46>
     308:      82 e0             ldi      r24, 0x02      ; 2
     30a:      90 e0             ldi      r25, 0x00      ; 0
     30c:      65 e0             ldi      r22, 0x05      ; 5
     30e:      70 e0             ldi      r23, 0x00      ; 0
     310:      41 e0             ldi      r20, 0x01      ; 1
     312:      50 e0             ldi      r21, 0x00      ; 0
     314:      a6 df             rcall      .-180          ; 0x262 <_Z5erroriii>

analogWrite(5,254);
     316:      85 e0             ldi      r24, 0x05      ; 5
     318:      6e ef             ldi      r22, 0xFE      ; 254
     31a:      70 e0             ldi      r23, 0x00      ; 0
     31c:      0e 94 c1 7b       call      0xf782      ; 0xf782 <analogWrite>
pinModeFast(2,INPUT);
     320:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast(2,HIGH); 
     322:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast(5,OUTPUT);
     324:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast(5,LOW);
     326:      84 b5             in      r24, 0x24      ; 36
     328:      8f 7d             andi      r24, 0xDF      ; 223
     32a:      84 bd             out      0x24, r24      ; 36
     32c:      5d 98             cbi      0x0b, 5      ; 11
delay(1);
     32e:      61 e0             ldi      r22, 0x01      ; 1
     330:      70 e0             ldi      r23, 0x00      ; 0
     332:      80 e0             ldi      r24, 0x00      ; 0
     334:      90 e0             ldi      r25, 0x00      ; 0
     336:      0e 94 06 7b       call      0xf60c      ; 0xf60c <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     33a:      4a 9b             sbis      0x09, 2      ; 9
     33c:      07 c0             rjmp      .+14           ; 0x34c <loop+0x7c>
     33e:      82 e0             ldi      r24, 0x02      ; 2
     340:      90 e0             ldi      r25, 0x00      ; 0
     342:      65 e0             ldi      r22, 0x05      ; 5
     344:      70 e0             ldi      r23, 0x00      ; 0
     346:      41 e0             ldi      r20, 0x01      ; 1
     348:      50 e0             ldi      r21, 0x00      ; 0
     34a:      8b df             rcall      .-234          ; 0x262 <_Z5erroriii>

and here is disassembly for the uno (which has different pin/port correspondences than the 2560; this code also works):

analogWrite(2,254);
     194:      82 e0             ldi      r24, 0x02      ; 2
     196:      6e ef             ldi      r22, 0xFE      ; 254
     198:      70 e0             ldi      r23, 0x00      ; 0
     19a:      0e 94 87 14       call      0x290e      ; 0x290e <analogWrite>
pinModeFast(2,INPUT);
     19e:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast(2,HIGH); 
     1a0:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast(5,OUTPUT);
     1a2:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast(5,LOW);
     1a4:      84 b5             in      r24, 0x24      ; 36
     1a6:      8f 7d             andi      r24, 0xDF      ; 223
     1a8:      84 bd             out      0x24, r24      ; 36
     1aa:      5d 98             cbi      0x0b, 5      ; 11
delay(1);
     1ac:      61 e0             ldi      r22, 0x01      ; 1
     1ae:      70 e0             ldi      r23, 0x00      ; 0
     1b0:      80 e0             ldi      r24, 0x00      ; 0
     1b2:      90 e0             ldi      r25, 0x00      ; 0
     1b4:      0e 94 f3 13       call      0x27e6      ; 0x27e6 <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     1b8:      4a 9b             sbis      0x09, 2      ; 9
     1ba:      07 c0             rjmp      .+14           ; 0x1ca <loop+0x46>
     1bc:      82 e0             ldi      r24, 0x02      ; 2
     1be:      90 e0             ldi      r25, 0x00      ; 0
     1c0:      65 e0             ldi      r22, 0x05      ; 5
     1c2:      70 e0             ldi      r23, 0x00      ; 0
     1c4:      41 e0             ldi      r20, 0x01      ; 1
     1c6:      50 e0             ldi      r21, 0x00      ; 0
     1c8:      a6 df             rcall      .-180          ; 0x116 <_Z5erroriii>

analogWrite(5,254);
     1ca:      85 e0             ldi      r24, 0x05      ; 5
     1cc:      6e ef             ldi      r22, 0xFE      ; 254
     1ce:      70 e0             ldi      r23, 0x00      ; 0
     1d0:      0e 94 87 14       call      0x290e      ; 0x290e <analogWrite>
pinModeFast(2,INPUT);
     1d4:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast(2,HIGH); 
     1d6:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast(5,OUTPUT);
     1d8:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast(5,LOW);
     1da:      84 b5             in      r24, 0x24      ; 36
     1dc:      8f 7d             andi      r24, 0xDF      ; 223
     1de:      84 bd             out      0x24, r24      ; 36
     1e0:      5d 98             cbi      0x0b, 5      ; 11
delay(1);
     1e2:      61 e0             ldi      r22, 0x01      ; 1
     1e4:      70 e0             ldi      r23, 0x00      ; 0
     1e6:      80 e0             ldi      r24, 0x00      ; 0
     1e8:      90 e0             ldi      r25, 0x00      ; 0
     1ea:      0e 94 f3 13       call      0x27e6      ; 0x27e6 <delay>
if((int) digitalReadFast(2) != LOW) error(2,5,1);
     1ee:      4a 9b             sbis      0x09, 2      ; 9
     1f0:      07 c0             rjmp      .+14           ; 0x200 <loop+0x7c>
     1f2:      82 e0             ldi      r24, 0x02      ; 2
     1f4:      90 e0             ldi      r25, 0x00      ; 0
     1f6:      65 e0             ldi      r22, 0x05      ; 5
     1f8:      70 e0             ldi      r23, 0x00      ; 0
     1fa:      41 e0             ldi      r20, 0x01      ; 1
     1fc:      50 e0             ldi      r21, 0x00      ; 0
     1fe:      8b df             rcall      .-234          ; 0x116 <_Z5erroriii>

So its pretty clear what's going on. Its picking the pin/port for a 328 Arduino. To be continued...

it seems like the problem has to be in the second line of code below but I cannot see it (I have recently added the comments on the #endif statements, you can consider them suspect):

#if !defined(digitalPinToPortReg)
#if !( defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__) )

// Standard Arduino Pins
#define digitalPinToPortReg(P) \
(((P) >= 0 && (P) <= 7) ? &PORTD : (((P) >= 8 && (P) <= 13) ? &PORTB : &PORTC))
#define digitalPinToDDRReg(P) \
(((P) >= 0 && (P) <= 7) ? &DDRD : (((P) >= 8 && (P) <= 13) ? &DDRB : &DDRC))
#define digitalPinToPINReg(P) \
(((P) >= 0 && (P) <= 7) ? &PIND : (((P) >= 8 && (P) <= 13) ? &PINB : &PINC))
#define digitalPinToBit(P) \
(((P) >= 0 && (P) <= 7) ? (P) : (((P) >= 8 && (P) <= 13) ? (P) - 8 : (P) - 14))

#if defined(__AVR_ATmega8__)
// 3 PWM
#define digitalPinToTimer(P) \
(((P) ==  9 || (P) == 10) ? &TCCR1A : (((P) == 11) ? &TCCR2 : 0))
#define digitalPinToTimerBit(P) \
(((P) ==  9) ? COM1A1 : (((P) == 10) ? COM1B1 : COM21))
#else  //168,328

// 6 PWM
#define digitalPinToTimer(P) \
(((P) ==  6 || (P) ==  5) ? &TCCR0A : \
        (((P) ==  9 || (P) == 10) ? &TCCR1A : \
        (((P) == 11 || (P) ==  3) ? &TCCR2A : 0)))
#define digitalPinToTimerBit(P) \
(((P) ==  6) ? COM0A1 : (((P) ==  5) ? COM0B1 : \
        (((P) ==  9) ? COM1A1 : (((P) == 10) ? COM1B1 : \
        (((P) == 11) ? COM2A1 : COM2B1)))))
#endif  //defined(__AVR_ATmega8__)

#else
// Arduino Mega Pins
#define digitalPinToPortReg(P) \
(((P) >= 22 && (P) <= 29) ? &PORTA : \
        ((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &PORTB : \
        (((P) >= 30 && (P) <= 37) ? &PORTC : \
        ((((P) >= 18 && (P) <= 21) || (P) == 38) ? &PORTD : \
        ((((P) >= 0 && (P) <= 3) || (P) == 5) ? &PORTE : \
        (((P) >= 54 && (P) <= 61) ? &PORTF : \
        ((((P) >= 39 && (P) <= 41) || (P) == 4) ? &PORTG : \
        ((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &PORTH : \
        (((P) == 14 || (P) == 15) ? &PORTJ : \
        (((P) >= 62 && (P) <= 69) ? &PORTK : &PORTL))))))))))

#define digitalPinToDDRReg(P) \
(((P) >= 22 && (P) <= 29) ? &DDRA : \
        ((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &DDRB : \
        (((P) >= 30 && (P) <= 37) ? &DDRC : \
        ((((P) >= 18 && (P) <= 21) || (P) == 38) ? &DDRD : \
        ((((P) >= 0 && (P) <= 3) || (P) == 5) ? &DDRE : \
        (((P) >= 54 && (P) <= 61) ? &DDRF : \
        ((((P) >= 39 && (P) <= 41) || (P) == 4) ? &DDRG : \
        ((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &DDRH : \
        (((P) == 14 || (P) == 15) ? &DDRJ : \
        (((P) >= 62 && (P) <= 69) ? &DDRK : &DDRL))))))))))

#define digitalPinToPINReg(P) \
(((P) >= 22 && (P) <= 29) ? &PINA : \
        ((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &PINB : \
        (((P) >= 30 && (P) <= 37) ? &PINC : \
        ((((P) >= 18 && (P) <= 21) || (P) == 38) ? &PIND : \
        ((((P) >= 0 && (P) <= 3) || (P) == 5) ? &PINE : \
        (((P) >= 54 && (P) <= 61) ? &PINF : \
        ((((P) >= 39 && (P) <= 41) || (P) == 4) ? &PING : \
        ((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &PINH : \
        (((P) == 14 || (P) == 15) ? &PINJ : \
        (((P) >= 62 && (P) <= 69) ? &PINK : &PINL))))))))))

#define digitalPinToBit(P) \
(((P) >=  7 && (P) <=  9) ? (P) - 3 : \
        (((P) >= 10 && (P) <= 13) ? (P) - 6 : \
        (((P) >= 22 && (P) <= 29) ? (P) - 22 : \
        (((P) >= 30 && (P) <= 37) ? 37 - (P) : \
        (((P) >= 39 && (P) <= 41) ? 41 - (P) : \
        (((P) >= 42 && (P) <= 49) ? 49 - (P) : \
        (((P) >= 50 && (P) <= 53) ? 53 - (P) : \
        (((P) >= 54 && (P) <= 61) ? (P) - 54 : \
        (((P) >= 62 && (P) <= 69) ? (P) - 62 : \
        (((P) == 0 || (P) == 15 || (P) == 17 || (P) == 21) ? 0 : \
        (((P) == 1 || (P) == 14 || (P) == 16 || (P) == 20) ? 1 : \
        (((P) == 19) ? 2 : \
        (((P) == 5 || (P) == 6 || (P) == 18) ? 3 : \
        (((P) == 2) ? 4 : \
        (((P) == 3 || (P) == 4) ? 5 : 7)))))))))))))))

// 15 PWM
#define digitalPinToTimer(P) \
(((P) == 13 || (P) ==  4) ? &TCCR0A : \
        (((P) == 11 || (P) == 12) ? &TCCR1A : \
        (((P) == 10 || (P) ==  9) ? &TCCR2A : \
        (((P) ==  5 || (P) ==  2 || (P) ==  3) ? &TCCR3A : \
        (((P) ==  6 || (P) ==  7 || (P) ==  8) ? &TCCR4A : \
        (((P) == 46 || (P) == 45 || (P) == 44) ? &TCCR5A : 0))))))
#define digitalPinToTimerBit(P) \
(((P) == 13) ? COM0A1 : (((P) ==  4) ? COM0B1 : \
        (((P) == 11) ? COM1A1 : (((P) == 12) ? COM1B1 : \
        (((P) == 10) ? COM2A1 : (((P) ==  9) ? COM2B1 : \
        (((P) ==  5) ? COM3A1 : (((P) ==  2) ? COM3B1 : (((P) ==  3) ? COM3C1 : \
        (((P) ==  6) ? COM4A1 : (((P) ==  7) ? COM4B1 : (((P) ==  8) ? COM4C1 : \
        (((P) == 46) ? COM5A1 : (((P) == 45) ? COM5B1 : COM5C1))))))))))))))

#endif  //mega
#endif  //#if !defined(digitalPinToPortReg)

#ifndef __digitalWrite
      #ifndef digitalWriteFast
            #define digitalWriteFast(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
                                    if (digitalPinToTimer(P)) \
                                                bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
                                    bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
                        } else { \
                                    digitalWrite((P), (V)); \
                        }
      #endif  //#ifndef digitalWriteFast

      #ifndef digitalWriteFast2
            #define digitalWriteFast2(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
            bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
            } else { \
            digitalWrite((P), (V)); \
            }
      #endif  //#ifndef digitalWriteFast2


      #else 
            #define digitalWriteFast(  digitalWrite(
            #define digitalWriteFast2(  digitalWrite(
#endif //#ifndef __digitalWrite

#ifndef __pinMode
      #ifndef pinModeFast
            #define pinModeFast(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
                                    bitWrite(*digitalPinToDDRReg(P), digitalPinToBit(P), (V)); \
                        } else {  \
                                    pinMode((P), (V)); \
                        } 
      #endif  //#ifndef pinModeFast

      #if !defined(pinModeFast2)
            #define pinModeFast2(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
            if (digitalPinToTimer(P)) \
            bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
            bitWrite(*digitalPinToDDRReg(P), digitalPinToBit(P), (V)); \
            } else {  \
            pinMode((P), (V)); \
            } 
      #endif

      #else
            #define pinModeFast(  pinMode(
            #define pinModeFast2( pinMode(
#endif //#ifndef __pinMode

#ifndef __digitalRead
      #ifndef digitalReadFast
            #define digitalReadFast(P) ( (int) __digitalReadFast__((P)) )
            #define __digitalReadFast__(P ) \
            (__builtin_constant_p(P) ) ? ( \
                                    digitalPinToTimer(P) ? ( \
                                             bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)) ,  \
                                                       bitRead(*digitalPinToPINReg(P), digitalPinToBit(P))) : \
                                      bitRead(*digitalPinToPINReg(P), digitalPinToBit(P)))  : \
                                    digitalRead((P))
      #endif   //#if !defined(digitalReadFast)

      #if !defined(digitalReadFast2)
            #define digitalReadFast2(P) ( (int) __digitalReadFast2__((P)) )
            #define __digitalReadFast2__(P ) \
            (__builtin_constant_p(P) ) ? ( \
            ( bitRead(*digitalPinToPINReg(P), digitalPinToBit(P))) ) : \
            digitalRead((P))
      #endif

      #else
            #define digitalReadFast( digitalRead(
            #define digitalReadFast2( digitalRead(
#endif  //#ifndef __digitalRead

AVR_ATmega2560 is not defined? That seems to be the only way for the mega256 code to be the same as the not-mega code.

That symbol is used throughout the core now, for instance in WProgram.h, pins_arduino. The !( ) are not used in the places I looked, though.

What I tried this morning:
I went back to Arduino-18, updated digitalWriteFast there. I'd already loaded Mark Sproul's 2560 bootloader etc. there. No difference from Arduino 21.

I added a few lines just above the part of the sketch I disassembled, mimicking to some extent the stuff that doesn't seem to work in digitalWriteFast (indeed I pasted the key line in from digitalWritefast.h):

#if !( defined(__AVR_ATmega2560__) ||defined(__AVR_ATmega1280__) )
Serial.println("This must be the UNO.");
#else
#if defined(__AVR_ATmega2560__)
Serial.println("This is the Mega2560.");
#endif
#if defined(__AVR_ATmega1280__)
Serial.println("This is the Mega1280.");
#endif
#endif  //#if !( defined(__AVR_ATmega2560__) ||defined(__AVR_ATmega1280__) )
//=============================the output from progprog.py goes below===================
analogWrite(2,254);
pinModeFast(2,INPUT);
digitalWriteFast(2,HIGH); 
pinModeFast(5,OUTPUT);
digitalWriteFast(5,LOW);
delay(1);
if((int) digitalReadFast(2) != LOW) error(2,5,1);

What printed was correctly “This is the Mega2560.”

I added a line at the top of my sketch
#define AVR_ATmega2560

that did not make a difference.

I changed that line to
#define AVR_ATmega1280
In the Boards menu I still selected the 2560.
now the test program runs on the 2560, reports no errors and is the appropriate size. This is, of course, not an actual solution.

I feel befuddled.

Is there some way an #undef could be snuck in between ?!?

Did you try:

#if !defined(__AVR_ATmega2560__) && !defined(__AVR_ATmega1280__)

?

doing it with ! && ! didn't work either.

I then tried reversing the order of the code, too, so that the mega stuff came first. When I did that I pasted the line I think is at fault in directly from arduino_pins. Still didn't work.

I feel like I must be looking in the wrong place but I don't see where else it could be.

#if !defined(digitalPinToPortReg)
#if defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__)
// Arduino Mega Pins
#define digitalPinToPortReg(P) \
(((P) >= 22 && (P) <= 29) ? &PORTA : \
((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &PORTB : \
(((P) >= 30 && (P) <= 37) ? &PORTC : \
((((P) >= 18 && (P) <= 21) || (P) == 38) ? &PORTD : \
((((P) >= 0 && (P) <= 3) || (P) == 5) ? &PORTE : \
(((P) >= 54 && (P) <= 61) ? &PORTF : \
((((P) >= 39 && (P) <= 41) || (P) == 4) ? &PORTG : \
((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &PORTH : \
(((P) == 14 || (P) == 15) ? &PORTJ : \
(((P) >= 62 && (P) <= 69) ? &PORTK : &PORTL))))))))))

#define digitalPinToDDRReg(P) \
(((P) >= 22 && (P) <= 29) ? &DDRA : \
((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &DDRB : \
(((P) >= 30 && (P) <= 37) ? &DDRC : \
((((P) >= 18 && (P) <= 21) || (P) == 38) ? &DDRD : \
((((P) >= 0 && (P) <= 3) || (P) == 5) ? &DDRE : \
(((P) >= 54 && (P) <= 61) ? &DDRF : \
((((P) >= 39 && (P) <= 41) || (P) == 4) ? &DDRG : \
((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &DDRH : \
(((P) == 14 || (P) == 15) ? &DDRJ : \
(((P) >= 62 && (P) <= 69) ? &DDRK : &DDRL))))))))))

#define digitalPinToPINReg(P) \
(((P) >= 22 && (P) <= 29) ? &PINA : \
((((P) >= 10 && (P) <= 13) || ((P) >= 50 && (P) <= 53)) ? &PINB : \
(((P) >= 30 && (P) <= 37) ? &PINC : \
((((P) >= 18 && (P) <= 21) || (P) == 38) ? &PIND : \
((((P) >= 0 && (P) <= 3) || (P) == 5) ? &PINE : \
(((P) >= 54 && (P) <= 61) ? &PINF : \
((((P) >= 39 && (P) <= 41) || (P) == 4) ? &PING : \
((((P) >= 6 && (P) <= 9) || (P) == 16 || (P) == 17) ? &PINH : \
(((P) == 14 || (P) == 15) ? &PINJ : \
(((P) >= 62 && (P) <= 69) ? &PINK : &PINL))))))))))

#define digitalPinToBit(P) \
(((P) >=  7 && (P) <=  9) ? (P) - 3 : \
(((P) >= 10 && (P) <= 13) ? (P) - 6 : \
(((P) >= 22 && (P) <= 29) ? (P) - 22 : \
(((P) >= 30 && (P) <= 37) ? 37 - (P) : \
(((P) >= 39 && (P) <= 41) ? 41 - (P) : \
(((P) >= 42 && (P) <= 49) ? 49 - (P) : \
(((P) >= 50 && (P) <= 53) ? 53 - (P) : \
(((P) >= 54 && (P) <= 61) ? (P) - 54 : \
(((P) >= 62 && (P) <= 69) ? (P) - 62 : \
(((P) == 0 || (P) == 15 || (P) == 17 || (P) == 21) ? 0 : \
(((P) == 1 || (P) == 14 || (P) == 16 || (P) == 20) ? 1 : \
(((P) == 19) ? 2 : \
(((P) == 5 || (P) == 6 || (P) == 18) ? 3 : \
(((P) == 2) ? 4 : \
(((P) == 3 || (P) == 4) ? 5 : 7)))))))))))))))

// 15 PWM
#define digitalPinToTimer(P) \
(((P) == 13 || (P) ==  4) ? &TCCR0A : \
(((P) == 11 || (P) == 12) ? &TCCR1A : \
(((P) == 10 || (P) ==  9) ? &TCCR2A : \
(((P) ==  5 || (P) ==  2 || (P) ==  3) ? &TCCR3A : \
(((P) ==  6 || (P) ==  7 || (P) ==  8) ? &TCCR4A : \
(((P) == 46 || (P) == 45 || (P) == 44) ? &TCCR5A : 0))))))
#define digitalPinToTimerBit(P) \
(((P) == 13) ? COM0A1 : (((P) ==  4) ? COM0B1 : \
(((P) == 11) ? COM1A1 : (((P) == 12) ? COM1B1 : \
(((P) == 10) ? COM2A1 : (((P) ==  9) ? COM2B1 : \
(((P) ==  5) ? COM3A1 : (((P) ==  2) ? COM3B1 : (((P) ==  3) ? COM3C1 : \
(((P) ==  6) ? COM4A1 : (((P) ==  7) ? COM4B1 : (((P) ==  8) ? COM4C1 : \
(((P) == 46) ? COM5A1 : (((P) == 45) ? COM5B1 : COM5C1))))))))))))))

#else

// Standard Arduino Pins
#define digitalPinToPortReg(P) \
(((P) >= 0 && (P) <= 7) ? &PORTD : (((P) >= 8 && (P) <= 13) ? &PORTB : &PORTC))
#define digitalPinToDDRReg(P) \
(((P) >= 0 && (P) <= 7) ? &DDRD : (((P) >= 8 && (P) <= 13) ? &DDRB : &DDRC))
#define digitalPinToPINReg(P) \
(((P) >= 0 && (P) <= 7) ? &PIND : (((P) >= 8 && (P) <= 13) ? &PINB : &PINC))
#define digitalPinToBit(P) \
(((P) >= 0 && (P) <= 7) ? (P) : (((P) >= 8 && (P) <= 13) ? (P) - 8 : (P) - 14))

#if defined(__AVR_ATmega8__)
// 3 PWM
#define digitalPinToTimer(P) \
(((P) ==  9 || (P) == 10) ? &TCCR1A : (((P) == 11) ? &TCCR2 : 0))
#define digitalPinToTimerBit(P) \
(((P) ==  9) ? COM1A1 : (((P) == 10) ? COM1B1 : COM21))
#else  //168,328

// 6 PWM
#define digitalPinToTimer(P) \
(((P) ==  6 || (P) ==  5) ? &TCCR0A : \
        (((P) ==  9 || (P) == 10) ? &TCCR1A : \
        (((P) == 11 || (P) ==  3) ? &TCCR2A : 0)))
#define digitalPinToTimerBit(P) \
(((P) ==  6) ? COM0A1 : (((P) ==  5) ? COM0B1 : \
        (((P) ==  9) ? COM1A1 : (((P) == 10) ? COM1B1 : \
        (((P) == 11) ? COM2A1 : COM2B1)))))
#endif  //defined(__AVR_ATmega8__)


#endif  //mega
#endif  //#if !defined(digitalPinToPortReg)

#ifndef __digitalWrite
      #ifndef digitalWriteFast
            #define digitalWriteFast(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
                                    if (digitalPinToTimer(P)) \
                                                bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
                                    bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
                        } else { \
                                    digitalWrite((P), (V)); \
                        }
      #endif  //#ifndef digitalWriteFast

      #ifndef digitalWriteFast2
            #define digitalWriteFast2(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
            bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
            } else { \
            digitalWrite((P), (V)); \
            }
      #endif  //#ifndef digitalWriteFast2


      #else 
            #define digitalWriteFast(  digitalWrite(
            #define digitalWriteFast2(  digitalWrite(
#endif //#ifndef __digitalWrite

#ifndef __pinMode
      #ifndef pinModeFast
            #define pinModeFast(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
                                    bitWrite(*digitalPinToDDRReg(P), digitalPinToBit(P), (V)); \
                        } else {  \
                                    pinMode((P), (V)); \
                        } 
      #endif  //#ifndef pinModeFast

      #if !defined(pinModeFast2)
            #define pinModeFast2(P, V) \
            if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
            if (digitalPinToTimer(P)) \
            bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
            bitWrite(*digitalPinToDDRReg(P), digitalPinToBit(P), (V)); \
            } else {  \
            pinMode((P), (V)); \
            } 
      #endif

      #else
            #define pinModeFast(  pinMode(
            #define pinModeFast2( pinMode(
#endif //#ifndef __pinMode

#ifndef __digitalRead
      #ifndef digitalReadFast
            #define digitalReadFast(P) ( (int) __digitalReadFast__((P)) )
            #define __digitalReadFast__(P ) \
            (__builtin_constant_p(P) ) ? ( \
                                    digitalPinToTimer(P) ? ( \
                                             bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)) ,  \
                                                       bitRead(*digitalPinToPINReg(P), digitalPinToBit(P))) : \
                                      bitRead(*digitalPinToPINReg(P), digitalPinToBit(P)))  : \
                                    digitalRead((P))
      #endif   //#if !defined(digitalReadFast)

      #if !defined(digitalReadFast2)
            #define digitalReadFast2(P) ( (int) __digitalReadFast2__((P)) )
            #define __digitalReadFast2__(P ) \
            (__builtin_constant_p(P) ) ? ( \
            ( bitRead(*digitalPinToPINReg(P), digitalPinToBit(P))) ) : \
            digitalRead((P))
      #endif

      #else
            #define digitalReadFast( digitalRead(
            #define digitalReadFast2( digitalRead(
#endif  //#ifndef __digitalRead

its almost enough to make one 'superstitious'. Defnitely need to fully understand this one!

Solved.

I had an extra copy of the old version of digitalWriteFast.h in another folder within the Library folder. That one was getting used instead of the one that I was modifying. All 3 boards work with the new digitalWriteFast now, when I use the current (arduino 21) wiring.h and wiring_digital.c

When I try to change to the version of those core files that I built a few days ago, digitalWriteFast does not cooperate with them. I will struggle with that at least a little longer before I repost code.

It isn't absolutely critical that I get the two things to work together but it would be nice.

Paul Stoffregen pointed out that these macros are not currently safe from issue #146 on the Megas. Conceptually what I have to add is something like

void inline atomicWrite( uint16_t address, unint8_t p, unint8_t v) {
  if (address < 0x20)     bitWrite(*address, __digitalPinToBit(p),v);
  else  {
    uint8_t register saveSreg = SREG;     
        cli;                                  
        bitWrite(*address, __digitalPinToBit(p), v);  
        SREG=saveSreg;
  }
}

#define digitalWrite(P, V) \
do {
  if (__builtin_constant_p(P) && __builtin_constant_p(V))   atomicWrite(digitalPinToPortReg(P),P,V);
    else  __digitalWrite((P), (V));         \
 }while (0)

Last night the compiler and I couldn't agree on the exact syntax. the Xcode editor I was using wouldn't help balance the last few }'s.

Surprisingly, this morning the Arduino IDE editor agrees with how I thought the }'s balanced. I broke it up by splitting out atomicWrite this morning but it seems obvious a cast of some sort is missing and I won't test it now.

Handling issue 146 will further diverge performance on ports above F, which already required more code. It struck me overnight that this MIGHT be a reason for continued existence of digitalWriteFast beyond implementation of faster digitalWrite in the core. After all, the vast majority of code I write does not turn the interrupt on. Worth some thought.

It also strikes me that I should probably create test code for whether issue 146 is solved. Maybe I can just add Tone to my current test programs, or something like that. I need something that will generate interrupts 10-1000 times per second, I think. If anyone understands how to write that, or can point me to a forum item I would appreciate that.

Handling issue 146 will further diverge performance on ports above F, which already required more code. It struck me overnight that this MIGHT be a reason for continued existence of digitalWriteFast beyond implementation of faster digitalWrite in the core. After all, the vast majority of code I write does not turn the interrupt on. Worth some thought.

It also strikes me that I should probably create test code for whether issue 146 is solved. Maybe I can just add Tone to my current test programs, or something like that. I need something that will generate interrupts 10-1000 times per second, I think. If anyone understands how to write that, or can point me to a forum item I would appreciate that.

Ah... issue #146. (I was the one that asked Paul to post this bug)

There should be no need to write & run test code to see if this is resolved.
Either the foreground bit set/clear operations are atomic or they
aren't. A simple code inspection can determine this.
(Ideally, look at the assembler output, it will be obvious)

I recommend code inspection over test code because it is super easy and because with test code there is no guarantee that the exact timing scenario will ever exist to create a problem.

Keep in mind that ensuring that the i/o operation is atomic
is not about protecting bits set/cleared in registers by the users foreground (loop) code from being interrupted by interrupt code, it is about protecting the bits set/cleared by the other code done in interrupt routines from the users foreground (loop) code.

It does not matter if the user does not have any
code that runs as an interrupt. It will be his foreground (loop) code
that clobbers the bits set/cleared/used by the interrupt routine.
The users foreground (loop) code i/o operations will always complete and work flawlessly with or without the atomic code masking

And so Ironically, the code that runs at interrupt level like the servo library code does not need to have the additional code overhead of masking/clearing interrupts to make the operation "atomic" since the code is running at interrupt level, it is by definition atomic but
the foreground (loop) code running must have it, if they share the
same i/o register.

And that is the dilemma, sometimes you need the extra code overhead
to ensure the operation is atomic and sometimes you don't need
it to still have atomicity, and yet other times, you may not need atomicity at all.
The problem is the only time you can be ensured that you don't need
it, is if the code is part of an interrupt routine.

In my view, ALL the i/o operations need to be atomic.
Anything else, doesn't really work, and eventually will cause
problems.
Yes there are situations where you can be guaranteed that the
additional code to mask/restore interrupt masks isn't needed but
those are only in interrupts routines which are typically library code
and usually not done by the normal/casual arduino user.

If it were desirable to support some thinner/faster i/o routines to be used
by ISR routines, then that is definitely possible
but again, this goes back to my original suggestion of using
layering and _ (underbar) functions/macros so that the code writer can pick and choose what level of support his code needs.

--- bill

I did finally post a significant update to this:
http://code.google.com/p/digitalwritefast/downloads/list

In the original version of this library I supplied 2 versions of each command with different behavior around turning off PWM (the analogWrite facility). This version does not turn off PWM--that feature is planned to be removed from future versions of the standard commands.

In this version I only supply 1 version of the commands

Much more importantly this version is interrupt safe.

There are several good descriptions of the need for interrupt safe digitalWrite and pinMode commands. I struck me that I have a disassembly listing of the commands that certainly can be used to describe the situation in a different, not necessarily better, way:

For Ports F and below, a single instruction can be used to set or clear an individual bit that corresponds to the digitalWrite or PinMode setting you want to change. It will look like this:

pinModeFast(61,OUTPUT);
    390a:      87 9a             sbi      0x10, 7  ; 16  set a bit of port to 1
digitalWriteFast(61,LOW);
    390c:      8f 98             cbi      0x11, 7  ; 17  clear a bit of port to 0

The cbi and sbi instructions are inherently interrupt safe.

For Ports above F, the microprocessor must read in all the 8 pins in a Port, modify and rewrite all 8 pins. It would look like this if (it were done in an interrupt unsafe fashion):

digitalWriteFast(64,HIGH); 
    38fe:      80 91 08 01       lds      r24, 0x0108  ;read the 8 bits of port into register 24
    3902:      84 60             ori      r24, 0x04    ; set a bit in register 24 to 1
    3904:      80 93 08 01       sts      0x0108, r24  ; write register 24 back to the port

The cbi and sbi instructions cannot address the ports higher than F.

The problem arises when the port is modified by other means in the microsecond between it being read into the register, modified and rewritten. Most commonly this seems to come up with the Servo library, but any code that has an interrupt routine that itself does a digitalWrite or pinMode might cause the issue.

What happens is that the 'lds r24, 0x0108' would read the current information from the port. Then it might happen that an interrupt would divert the microprocessor to handle another task. Imagine that 0x108 is modified during processing of the interrupt. Then the microprocessor returns to your program, which proceeds with 'ori r24, 0x04' setting the bit you wanted, and 'sts 0x0108, r24' destroying whatever the interrupt routine did. The interrupt could come between the ori and the sts as well, of course. The problems are notoriously hard to debug; the exact timing of the interrupt cannot be replicated and the bits of code that interact are likely to be located far from each other--perhaps even in a library, whose source code you have never read!

This version of the library correctly generates interrupt safe code that looks like this:

digitalWriteFast(64,HIGH); 
    38fa:      9f b7             in      r25, 0x3f      ; 63  read the interrupt status
    38fc:      f8 94             cli                  ; turn off interrupts
    38fe:      80 91 08 01       lds      r24, 0x0108
    3902:       84 60             ori      r24, 0x04      ; 4
    3904:       80 93 08 01       sts      0x0108, r24
    3908:       9f bf             out      0x3f, r25      ; 63  restore the interrupt status

This is needed for digitalWriteFast and pinModeFast. The standard digitalWrite was corrected to cover this situation in Arduino 19.

Skimming through this thread, all of the examples presented have used only the constants HIGH and LOW as the values being written using digitalWriteFast. This does correspond with the Reference documentation for digitalWrite as well as many of the example sketches but there are some examples (eg, BlinkWithoutDelay) and likely much existing code that use variables instead. In fact, digitalWrite currently maps any non-zero value into a write of '1'. I would like to see that behavior maintained. How will the new "fast" routines handle these cases?

pinModeFast(2,INPUT);
     1a0:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast(2,200); 
     1a2:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast(5,OUTPUT);
     1a4:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast(5,LOW);
pinModeFast2(2,INPUT);
     1c6:      52 98             cbi      0x0a, 2      ; 10
digitalWriteFast2(2,HIGH);
     1c8:      5a 9a             sbi      0x0b, 2      ; 11
pinModeFast2(5,OUTPUT);
     1ca:      55 9a             sbi      0x0a, 5      ; 10
digitalWriteFast2(5,LOW);
     1cc:      5d 98             cbi      0x0b, 5      ; 11