1 word: Portability.
If you want a library or sketch code to run across many different platforms or even different boards, then the easiest way to do that is to use an abstraction layer.
Abstraction layers are not necessarily bad; it is just that the way the digital i/o routines were done on Arduino forces a very suboptimal implementation.
Consider what happens if you use raw port i/o on an AVR as you mentioned.
Some libraries have done that, and typically they are not portable and it is VERY difficult to change which pins are used because the code didn't use any sort of abstraction, even internally.
So even moving from one AVR based board to another can be extremely painful, since the specific PORT and bit assigned to an Arduino pin # like digital pin 10 is not the same on different boards.
This makes things really messy for shields that need to use a specific digital pin #.
But say you solve that through some extensive macros or defines; then there is the issue of moving between processors.
If you hard code things for AVR it obviously immediately fails on any other processor.
One thing that would have really helped is if Arduino didn't use naked constants for its digital pin numbers.
If it had used names/defines like D10, D13, etc... then those values could have encoded information that could have made things MUCH faster since there would be no need to do a lookup to get from the naked constant to the i/o port/bit information.
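To make that idea concrete, here is a rough sketch of pin names that carry their own port/bit information. The encoding, the make_pin() helper, pin_write() and the sim_port array are all made up for illustration (the array stands in for the real PORTx registers so the code builds on a PC); the D2/D10/D13 mapping shown is the Uno-style one.

```cpp
#include <assert.h>
#include <stdint.h>

// Host stand-in for the PORTB, PORTC and PORTD output registers (assumption:
// on a real AVR these would be the actual I/O registers).
volatile uint8_t sim_port[3] = {0, 0, 0};
enum : uint8_t { PORT_B = 0, PORT_C = 1, PORT_D = 2 };

// Hypothetical encoding: high byte = port index, low byte = bit number.
constexpr uint16_t make_pin(uint8_t port, uint8_t bit) {
    return static_cast<uint16_t>((port << 8) | bit);
}

// Uno-style mapping: D2 is PD2, D10 is PB2, D13 is PB5.
constexpr uint16_t D2  = make_pin(PORT_D, 2);
constexpr uint16_t D10 = make_pin(PORT_B, 2);
constexpr uint16_t D13 = make_pin(PORT_B, 5);

// Decoding is a shift and a mask. With a constant pin name the compiler can
// fold all of it, so no table lookup is needed.
inline void pin_write(uint16_t pin, bool high) {
    assert((pin >> 8) < 3);
    volatile uint8_t &reg = sim_port[pin >> 8];
    const uint8_t mask = static_cast<uint8_t>(1u << (pin & 0x07));
    if (high) {
        reg = static_cast<uint8_t>(reg | mask);
    } else {
        reg = static_cast<uint8_t>(reg & ~mask);
    }
}
```

A sketch would then call pin_write(D13, true) instead of passing a naked 13.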
There are things that can be done to make the i/o much faster and still be portable, but it is complicated.
From my working with Arduino over the years, it is obvious to me that the original Wiring & Arduino team was not very technical or familiar with gcc and embedded development techniques.
There are many things that would and should have been done differently. And many of them would not affect the end user API.
I have a library called openGLCD that does do raw port i/o so that it can do bit flipping 100s of times faster than using the standard Arduino digital i/o routines.
But it is MANY thousands of lines of code and difficult to maintain for all the boards and processors.
The avrio header is pretty magical. It creates a digitalWrite() type interface for the application but generates the fewest possible instructions for doing multi bit pin i/o.
If the user uses appropriate pins, it can even do 8 bit port i/o.
avrio is 1700 lines of code that often ends up generating a single AVR instruction.
It also has to have a fallback mode, so that on a board or processor that does not have specific support the code can still function (albeit slower).
Creating a system that supports this type of i/o that allows users to still be able to configure their pins is very complex and difficult to implement.
What makes doing raw port i/o so complex and painful on the AVR is that I don't think that the chip designers really understood that C compatibility is pretty critical.
They flubbed big time in two areas:
- a Harvard architecture that doesn't have direct access to flash on the data side
(no other modern RISC processor does this, not ARM, and not PIC32, etc...)
- registers require using bit set/clear instructions for atomic updates
(many other processors and i/o devices have separate bit set/clear registers instead)
It is that last one that really kills things when combined with the way the Wiring/Arduino guys defined their API.
The combination of the way the AVR does atomic i/o and the way Wiring/Arduino defined its digital i/o API is what creates the perfect storm of crappy i/o performance.
When doing things like digitalWrite() a bit gets set/cleared in a register.
It is VERY important that this operation be atomic to avoid register corruption.
The way the AVR was designed, the only way to do that is to use the special bit set/clr instructions.
C has no knowledge of such low level processor specific instructions.
To work around this the avr-gcc developers hacked the gcc compiler to make some optimizations under certain conditions:
- memory address is known at compile time
- bit is known at compile time.
If those conditions are met, then the compiler will emit sbi/cbi instructions.
So if you do something like
PORTD |= 0x20;
The compiler will notice two things:
- the memory address of PORTD is known
- the bit is known
So it will generate an sbi instruction.
The problem is that arduino supports runtime pin configuration.
This means that neither is guaranteed to be known at compile time.
i.e. you could do digitalWrite() using variables for pin and value.
Even worse, since naked constants are allowed for pin numbers, you have to convert that number into the needed information.
And in their implementation the lookup table is in flash, which cannot be directly accessed on the AVR, so you have to call functions just to get to the table data.
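For reference, the runtime lookup being described looks roughly like this. It is a rough sketch only: the table names and contents are made up (a cut-down Uno-style map), not copied from the Arduino core, and the #else branch fakes PROGMEM and pgm_read_byte() so it also builds on a PC. On a real AVR those two come from <avr/pgmspace.h>.

```cpp
#include <assert.h>
#include <stdint.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
#else
// Host stand-ins (assumption): off-target, flash and RAM share one address space.
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

// Hypothetical tables, kept in flash on AVR. Port index: 0 = PORTB, 1 = PORTD.
const uint8_t kNumPins = 14;
const uint8_t pin_to_port[kNumPins] PROGMEM = {
    1, 1, 1, 1, 1, 1, 1, 1,   // D0..D7  -> PORTD
    0, 0, 0, 0, 0, 0          // D8..D13 -> PORTB
};
const uint8_t pin_to_mask[kNumPins] PROGMEM = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20
};

// Every runtime digitalWrite()-style call pays for these two flash reads
// before it can touch the port.
inline uint8_t lookup_port(uint8_t pin) {
    assert(pin < kNumPins);
    return pgm_read_byte(&pin_to_port[pin]);
}
inline uint8_t lookup_mask(uint8_t pin) {
    assert(pin < kNumPins);
    return pgm_read_byte(&pin_to_mask[pin]);
}
```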
AND even worse.... when cbi/sbi instructions are not used, it is painful to ensure atomicity.
You have to mask interrupts to ensure atomicity.
So now instead of a single cbi/sbi, you have to:
- save the interrupt state,
- mask interrupts
- read the i/o register into a tmp register
- or/and the bit mask into the tmp register
- write the tmp register back to the i/o register
- restore the interrupt state
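In C++ terms that sequence comes out roughly like the following. This is only a sketch that builds on a PC: sim_SREG, sim_PORT and sim_cli() are made-up stand-ins for the real SREG, a PORTx register and cli() on the AVR, and atomic_set_bits() is a hypothetical name.

```cpp
#include <assert.h>
#include <stdint.h>

// Host stand-ins (assumptions): on an AVR these would be SREG, a PORTx
// register and cli() from <avr/interrupt.h>.
volatile uint8_t sim_SREG = 0x80;  // bit 7 = global interrupt enable
volatile uint8_t sim_PORT = 0x00;
inline void sim_cli() { sim_SREG = static_cast<uint8_t>(sim_SREG & 0x7F); }

// Set bits in a register when the address or mask is only known at runtime.
void atomic_set_bits(volatile uint8_t &reg, uint8_t mask) {
    assert(mask != 0);
    const uint8_t saved = sim_SREG;  // save the interrupt state
    sim_cli();                       // mask interrupts
    uint8_t tmp = reg;               // read the i/o register
    tmp |= mask;                     // OR in the bit mask
    reg = tmp;                       // write it back
    sim_SREG = saved;                // restore the saved interrupt state
}
```

Restoring the saved state, rather than blindly re-enabling interrupts, keeps the routine safe to call when interrupts were already off, for example from inside an ISR.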
There are ways to make it better, like detecting whether the parameters are compile time constants and doing all the lookups at compile time. But the Arduino team has rejected it.
The Teensy versions of the digital i/o routines do this and that is why they are 50x faster than the standard AVR versions.
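A rough sketch of the compile time constant trick, using gcc's __builtin_constant_p(). This is not the Teensy code, just the general shape of the idea; the function names and sim_PORTB/sim_PORTD (stand-ins for the real registers so it builds on a PC) are made up, and the pin map is the Uno-style one.

```cpp
#include <assert.h>
#include <stdint.h>

// Host stand-ins for the real PORTB / PORTD registers (assumption).
volatile uint8_t sim_PORTB = 0, sim_PORTD = 0;

// Slow path: pin only known at runtime, so work out port and mask on the fly.
void write_runtime(uint8_t pin, bool high) {
    assert(pin < 14);
    volatile uint8_t &reg = (pin < 8) ? sim_PORTD : sim_PORTB;
    const uint8_t mask = static_cast<uint8_t>(1u << (pin & 7));
    reg = high ? static_cast<uint8_t>(reg | mask)
               : static_cast<uint8_t>(reg & ~mask);
}

// Fast path: when the optimizer can prove both arguments are constants, each
// branch folds down to one constant-mask update of a known register (the form
// avr-gcc can turn into sbi/cbi on a real AVR).
static inline __attribute__((always_inline))
void write_fast(uint8_t pin, bool high) {
    if (__builtin_constant_p(pin) && __builtin_constant_p(high)) {
        if (pin == 13) {  // D13 is PB5
            if (high) sim_PORTB = sim_PORTB | 0x20;
            else      sim_PORTB = sim_PORTB & 0xDF;
            return;
        }
        if (pin == 2) {   // D2 is PD2
            if (high) sim_PORTD = sim_PORTD | 0x04;
            else      sim_PORTD = sim_PORTD & 0xFB;
            return;
        }
        // A full version would carry one branch per pin.
    }
    write_runtime(pin, high);  // everything else takes the slow path
}
```

Both paths give the same result; only the generated code differs. Note that gcc only reports an inlined function argument as constant when optimization is enabled, so an unoptimized build quietly takes the slow path.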
You could even do some special declarations to allow the compiler to use the data table at compile time so that if the pin numbers are constants the code would be looked up by the compiler at compile time vs at runtime using the flash access routines to get to the data.
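In C++ one way to sketch that idea is a constexpr table, which the compiler is allowed to read while compiling. Again this is just an illustration: PinInfo, kPinTable, pin_info() and write_pin() are made-up names, sim_port stands in for the real registers so it builds on a PC, and on a real AVR the runtime fallback would still need its table reachable through the flash access routines.

```cpp
#include <assert.h>
#include <stdint.h>

struct PinInfo { uint8_t port; uint8_t mask; };  // port: 0 = PORTB, 1 = PORTD

// Uno-style map, visible to the compiler as a constant expression.
constexpr PinInfo kPinTable[] = {
    {1, 0x01}, {1, 0x02}, {1, 0x04}, {1, 0x08},
    {1, 0x10}, {1, 0x20}, {1, 0x40}, {1, 0x80},   // D0..D7  -> PORTD
    {0, 0x01}, {0, 0x02}, {0, 0x04}, {0, 0x08},
    {0, 0x10}, {0, 0x20}                          // D8..D13 -> PORTB
};
constexpr uint8_t kNumPins = sizeof(kPinTable) / sizeof(kPinTable[0]);

constexpr PinInfo pin_info(uint8_t pin) { return kPinTable[pin]; }

// This lookup happens while compiling; nothing is read at runtime.
static_assert(pin_info(13).port == 0 && pin_info(13).mask == 0x20, "D13 is PB5");

volatile uint8_t sim_port[2] = {0, 0};  // host stand-ins for PORTB, PORTD

// Constant pin: port and mask are compile time constants in the body.
template <uint8_t Pin>
inline void write_pin(bool high) {
    static_assert(Pin < kNumPins, "no such pin");
    constexpr PinInfo info = pin_info(Pin);
    volatile uint8_t &reg = sim_port[info.port];
    reg = high ? static_cast<uint8_t>(reg | info.mask)
               : static_cast<uint8_t>(reg & ~info.mask);
}

// Runtime pin: same table, but the lookup is done while running.
inline void write_pin(uint8_t pin, bool high) {
    assert(pin < kNumPins);
    const PinInfo info = kPinTable[pin];
    volatile uint8_t &reg = sim_port[info.port];
    reg = high ? static_cast<uint8_t>(reg | info.mask)
               : static_cast<uint8_t>(reg & ~info.mask);
}
```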
But again, the arduino team has been unwilling to look at these types of enhancements.
Interesting side story, the very first day I started looking at Arduino (8 years ago) the first thing I looked at was the digitalWrite() code as I was about to convert a glcd library to use Arduino. I immediately saw an atomicity issue.
They were doing |= and &= operations on registers with runtime data which meant that the operation was not atomic.
It took over a year to convince them that this was an issue and to fix it.
It wasn't an issue of not having the code for the fix, as that was provided.
They simply did not understand the issue. They didn't understand that |= is interruptible on a RISC processor.
Even when shown the exact assembler instructions generated by the compiler, along with an explanation of the exact time sequence of events that can create the register corruption, it still was not sinking in.
The only way it got fixed was that several people put the code fix in their version of the AVR core code and it "magically" fixed a long term issue when using the servo library.
It does manifest itself in an odd way, in that the foreground code is causing corruption because of an atomic update operation being done in an ISR.
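The time sequence can be replayed on a PC with a small simulation. This is only a sketch: sim_PORTB, sim_isr() and racy_set() are made-up names, and the "interrupt" is just a function call placed exactly where a real one could land between the load and the store.

```cpp
#include <assert.h>
#include <stdint.h>

volatile uint8_t sim_PORTB = 0x00;  // host stand-in for a port register

// Stand-in for an ISR that sets bit 1 (say, a servo pulse pin).
inline void sim_isr() { sim_PORTB = static_cast<uint8_t>(sim_PORTB | 0x02); }

// What a non-atomic "port |= mask" breaks down into, with the option of an
// interrupt firing between the load and the store.
void racy_set(uint8_t mask, bool isr_fires) {
    uint8_t tmp = sim_PORTB;       // load the register
    if (isr_fires) sim_isr();      // the ISR updates the same register here
    tmp |= mask;                   // modify the now stale copy
    sim_PORTB = tmp;               // store: the ISR's change is overwritten
}

void demo_lost_update() {
    sim_PORTB = 0x00;
    racy_set(0x20, true);
    // 0x22 was intended (both bits set), but bit 1 from the ISR is gone.
    assert(sim_PORTB == 0x20);
}
```

The ISR did nothing wrong here; it is the foreground store that throws its update away.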
To this day, I still don't think they ever understood the issue.
The net result was that having to add the ISR blocking/restoration around the port updates added additional overhead to i/o operations that were already not that great.
--- bill