8 Bit Bus / Byte Bashing / 5MB Read-Write

I want to minimize latency and maximize actual throughput to some IO, without too much overhead or complexity. The primary option is SPI with DMA, but since the target IO is only accessed in blocks a few bytes long (~3 to 5), it seems this option would create more overhead and complexity than it saves. The next option is some “Byte Bashing”, i.e. making a basic 8-bit parallel port.

After looking through the data sheet, the Arduino Zero schematic, and lots of posts, I have made a list of pins and how they correlate to the Zero's “GPIO”. Looking at the list, there seem to be two solid bytes of IO, plus a handful of pins for other functions.

I have modified port.h so that PORT_OUT_Type and PORT_IN_Type each have an extra union member of four uint8_t's, allowing access to each byte. According to the data sheet, these registers can be accessed and modified in bytes. I then read/write these bytes. It seems to work, generating about 4-5M read/write cycles per second. The program does seem to copy the data from byte B to byte C. (I'm extremely short on jumpers at the moment.) The LED toggles when I short the appropriate input.

So, any opinions on this? Is there a way to see the assembly code, to make sure it is really doing byte access and not some crazy tricks? Is there a faster/simpler way to do this?

Pin List

A0 - PORT_PA02
AREF - PORT_PA03
A3 - PORT_PA04
A4 - PORT_PA05
D8 - PORT_PA06
D9 - PORT_PA07

D4 - PORT_PA08
D3 - PORT_PA09
D1 - PORT_PA10
D0 - PORT_PA11
MISO - PORT_PA12
ATN - PORT_PA13
D2 - PORT_PA14
D5 - PORT_PA15

D11 - PORT_PA16
D13 - PORT_PA17
D10 - PORT_PA18
D12 - PORT_PA19
D6 - PORT_PA20
D7 - PORT_PA21
SDA - PORT_PA22 (note: pull-up resistor)
SCL - PORT_PA23 (note: pull-up resistor)

A5 - PORT_PB02
A1 - PORT_PB08
A2 - PORT_PB09
MOSI - PORT_PB10
SCK - PORT_PB11
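Because each port's 32-bit register is little-endian, pin PAn sits in byte lane n / 8 at bit n % 8, which is why PA08 to PA15 form byte B and PA16 to PA23 form byte C in the list above. A small host-side sketch of that arithmetic (the helper and enum names are hypothetical, not part of CMSIS or the Arduino core):

```c
#include <assert.h>

/* Hypothetical names for a few pins from the list above (port A bit numbers). */
enum {
    PIN_NUM_D4  = 8,   /* PA08 */
    PIN_NUM_D5  = 15,  /* PA15 */
    PIN_NUM_D11 = 16,  /* PA16 */
    PIN_NUM_D7  = 21   /* PA21 */
};

/* Byte lane of PORT pin n: 0 = A, 1 = B, 2 = C, 3 = D. */
static unsigned pin_lane(unsigned pin)
{
    assert(pin < 32u);
    return pin / 8u;
}

/* Bit position of PORT pin n within its byte lane. */
static unsigned pin_bit_in_lane(unsigned pin)
{
    assert(pin < 32u);
    return pin % 8u;
}
```

So D4 (PA08) is bit 0 of byte B, D5 (PA15) is bit 7 of byte B, and D7 (PA21) is bit 5 of byte C.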

..\packages\arduino\tools\CMSIS\4.0.0-atmel\Device\ATMEL\samd21\include\component\port.h

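typedef union {
  struct {
    uint32_t OUT:32;     /*!< bit:  0..31  Port Data Output Value */
  } bit;                 /*!< Structure used for bit access */
  struct {               /* added: little-endian byte lanes */
    uint8_t A;           /*   bits  0..7  (PA00..PA07) */
    uint8_t B;           /*   bits  8..15 (PA08..PA15) */
    uint8_t C;           /*   bits 16..23 (PA16..PA23) */
    uint8_t D;           /*   bits 24..31 (PA24..PA31) */
  } my_byte;             /*!< Structure used for byte access */
  uint32_t reg;          /*!< Type used for register access */
} PORT_OUT_Type;

typedef union {
  struct {
    uint32_t IN:32;      /*!< bit:  0..31  Port Data Input Value */
  } bit;                 /*!< Structure used for bit access */
  struct {               /* added: little-endian byte lanes */
    uint8_t A;           /*   bits  0..7  (PA00..PA07) */
    uint8_t B;           /*   bits  8..15 (PA08..PA15) */
    uint8_t C;           /*   bits 16..23 (PA16..PA23) */
    uint8_t D;           /*   bits 24..31 (PA24..PA31) */
  } my_byte;             /*!< Structure used for byte access */
  uint32_t reg;          /*!< Type used for register access */
} PORT_IN_Type;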
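If you would rather not edit port.h, one option is to take the address of the existing 32-bit `reg` member and index it as bytes. This is a sketch, assuming a little-endian core (as the SAMD21's Cortex-M0+ is) and that the PORT accepts 8-bit accesses as the data sheet says. The macro name PORT_BYTE is hypothetical. The check below runs it on an ordinary variable so the lane arithmetic can be verified off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: byte lane 'lane' (0..3) of a 32-bit register as an
   lvalue. Lane 0 = bits 0..7, ..., lane 3 = bits 24..31 on a little-endian
   core. The cast also drops const, so only read through it when 'reg'
   is a read-only register such as IN. */
#define PORT_BYTE(reg, lane) (((volatile uint8_t *)&(reg))[(lane)])
```

On the board, the copy in the loop could then be written as `PORT_BYTE(PORT->Group[PORTA].OUT.reg, 2) = PORT_BYTE(PORT->Group[PORTA].IN.reg, 1);` with the stock port.h.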

/*
  Byte Bashing
*/

// the setup function runs once when you press reset or power the board
void setup()
{
  // initialize serial communication at 9600 bits per second:
  Serial.begin(9600);

  // Port A byte B (PA08..PA15): inputs with pull-ups
  pinMode(4, INPUT_PULLUP);     // PA08
  pinMode(3, INPUT_PULLUP);     // PA09
  pinMode(1, INPUT_PULLUP);     // PA10
  pinMode(0, INPUT_PULLUP);     // PA11
  pinMode(MISO, INPUT_PULLUP);  // PA12
  pinMode(ATN, INPUT_PULLUP);   // PA13
  pinMode(2, INPUT_PULLUP);     // PA14
  pinMode(5, INPUT_PULLUP);     // PA15

  // Port A byte C (PA16..PA23): outputs
  pinMode(11, OUTPUT);          // PA16
  pinMode(13, OUTPUT);          // PA17 (LED)
  pinMode(10, OUTPUT);          // PA18
  pinMode(12, OUTPUT);          // PA19
  pinMode(6, OUTPUT);           // PA20
  pinMode(7, OUTPUT);           // PA21
  // pinMode(SDA, OUTPUT);      // PA22 (note: pull-up resistor)
  // pinMode(SCL, OUTPUT);      // PA23 (note: pull-up resistor)
}

// the loop function runs over and over again forever
void loop()
{
  // 1,000,000 iterations x 10 unrolled copies = 10 million byte transfers
  for (int i = 0; i < 1000000; i++)
  {
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
    PORT->Group[PORTA].OUT.my_byte.C = PORT->Group[PORTA].IN.my_byte.B;
  }
  Serial.println("10 Million!");
}

It's possible to use register manipulation in your Arduino sketch to increase speed, however it's at the expense of portability between processors.

The SAMD21 has two ports, A and B. The SAMD21G variant used on the Arduino Zero uses most pins of port A, but only a few of port B. In the following examples a 0 suffix after the register's name defines the register for port A, while a 1 defines the register for port B.

Here's an example of the blink sketch using register manipulation:

void setup() {
  // put your setup code here, to run once:
  REG_PORT_DIRSET0 = PORT_PA17;   // Set the direction of the port pin PA17 to an output
}

void loop() {
  // put your main code here, to run repeatedly:
  REG_PORT_OUTSET0 = PORT_PA17;     // Switch the output to 1 or HIGH
  delay(1000);
  REG_PORT_OUTCLR0 = PORT_PA17;     // Switch the output to 0 or LOW
  delay(1000);
}

Here's an example of the same Blink sketch using the toggle register:

void setup() {
  // put your setup code here, to run once:
  REG_PORT_DIRSET0 = PORT_PA17;   // Set the direction of the port pin PA17 to an output
}

void loop() {
  // put your main code here, to run repeatedly:
  REG_PORT_OUTTGL0 = PORT_PA17;     // Toggle the output HIGH and LOW
  delay(1000);
}
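The OUTSET, OUTCLR and OUTTGL registers only affect the bits written as 1, which is why a single store of PORT_PA17 is enough and the other pins are left alone. The host-side model below uses a hypothetical struct standing in for the real OUT register (it is not the CMSIS type) to show what each store does:

```c
#include <assert.h>
#include <stdint.h>

#define MASK_PA17 (1ul << 17)  /* hypothetical name; same bit as pin PA17 */

/* Hypothetical stand-in for one port group's OUT register. */
typedef struct { uint32_t out; } sim_port;

/* A write to OUTSET/OUTCLR/OUTTGL changes only the bits that are 1 in the mask. */
static void sim_outset(sim_port *p, uint32_t mask) { p->out |= mask; }
static void sim_outclr(sim_port *p, uint32_t mask) { p->out &= ~mask; }
static void sim_outtgl(sim_port *p, uint32_t mask) { p->out ^= mask; }
```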

To check whether a given pin, say digital pin 12, is high:

if (REG_PORT_IN0 & PORT_PA19)  // if (digitalRead(12) == HIGH)

Alternatively, to check whether it's low:

if (!(REG_PORT_IN0 & PORT_PA19)) // if (digitalRead(12) == LOW)
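Both checks come down to AND-ing the IN value with the pin's mask and testing the result for non-zero (high) or zero (low). A host-testable sketch that takes the IN value as a parameter; on the board you would pass REG_PORT_IN0 as `in`. The helper and mask names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK_PA19 (1ul << 19)  /* hypothetical name; digital pin 12 per the pin list */

/* AND with the pin's mask isolates its bit: non-zero means HIGH. */
static bool pin_high(uint32_t in, uint32_t mask) { return (in & mask) != 0u; }
static bool pin_low(uint32_t in, uint32_t mask)  { return (in & mask) == 0u; }
```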

The register definitions are in a number of Atmel files, which on my Windows machine are currently located at: C:\User\AppData\Local\Arduino15\packages\arduino\tools\CMSIS\4.0.0-atmel\Device\ATMEL\SAMD21\include...

...following this there are two directories named "component" and "instance". The "port.h" file in the "component" directory defines the structure of each register. With this file you can use the format:

PORT->Group[PORTA].DIRSET.reg = PORT_PA17;

The "port.h" file in the "instance" directory defines the registers' absolute addresses, which gives an alternative but equivalent format, used in the examples above:

REG_PORT_DIRSET0 = PORT_PA17;
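Both forms reach the same address because each port group is a fixed block of 32-bit registers starting at the PORT base address 0x41004400, with port B's group 0x80 bytes after port A's. This matches the REG_PORT_OUT0 address of 0x41004410 quoted later in this thread. A host-side check of that arithmetic, using a hypothetical struct that mirrors only the first nine registers of a group:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PORT_BASE  0x41004400u  /* PORT base address on the SAMD21 */
#define SIM_GROUP_SIZE 0x80u        /* bytes between the port A and port B groups */

/* Hypothetical mirror of the start of one PORT group's register block. */
typedef struct {
    uint32_t DIR, DIRCLR, DIRSET, DIRTGL;   /* offsets 0x00..0x0C */
    uint32_t OUT, OUTCLR, OUTSET, OUTTGL;   /* offsets 0x10..0x1C */
    uint32_t IN;                            /* offset  0x20 */
} sim_port_group;

/* Absolute address of a register in group 0 (port A) or 1 (port B). */
static uint32_t reg_addr(unsigned group, size_t offset)
{
    return SIM_PORT_BASE + group * SIM_GROUP_SIZE + (uint32_t)offset;
}
```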

For your application, where you require a byte's worth of outputs to change at the same time, it might be better to use the SAMD21's OUT registers: REG_PORT_OUT0 (port A) or REG_PORT_OUT1 (port B), rather than the OUTSET and OUTCLR registers used in the examples above.
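With OUT, a whole byte lane can be updated in one 32-bit store by masking the new byte into the current value; an 8-bit store to that lane achieves the same result without the read. A host-side sketch of that read-modify-write arithmetic (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Replace byte lane 'lane' (0..3) of a 32-bit OUT value with 'value',
   leaving the other 24 bits unchanged. */
static uint32_t set_out_lane(uint32_t out, unsigned lane, uint8_t value)
{
    assert(lane < 4u);
    uint32_t shift = 8u * lane;
    uint32_t mask = 0xFFu << shift;
    return (out & ~mask) | ((uint32_t)value << shift);
}
```

On the board this would be used as `REG_PORT_OUT0 = set_out_lane(REG_PORT_OUT0, 2, b);`. Unlike the 8-bit store, it is a read followed by a write, so if an interrupt changes another OUT bit in between, that change could be overwritten.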

Thanks for the response. I am definitely throwing portability at the IO level out the window with this. I will modularize things in software and hardware to make moving to new and improved Arduino boards possible. (Zero Mega please, or Due 2.0.)

The statement

PORT->Group[PORTA].OUT.my_byte.C

I believe this is what you suggest. All I have done is break the single 32-bit register up into four 8-bit registers, which I can write to. I am a little surprised that the read/write cycle takes ~10-12 clock cycles, and I would like to see the ASM code that the compiler generates. Also, any suggestions on how to accomplish this without editing port.h, or on how to speed things up, are welcome.
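As a rough check of that cycle count, assuming the Zero's 48 MHz core clock, the cycles per transfer are the clock rate divided by the measured transfer rate:

```c
#include <assert.h>

/* Core clock cycles spent per transfer at a measured transfers-per-second rate. */
static double cycles_per_transfer(double cpu_hz, double transfers_per_s)
{
    assert(transfers_per_s > 0.0);
    return cpu_hz / transfers_per_s;
}
```

At 4M transfers per second this gives 12 cycles, and at 5M it gives 9.6, so the 4-5M figure measured earlier is consistent with the ~10-12 cycle estimate. The measurement also includes the loop overhead, spread over the 10 unrolled copies.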

Reading your post more...
From:
C:\xxx\AppData\Local\Arduino15\packages\arduino\tools\CMSIS\4.0.0-atmel\Device\ATMEL\samd21\include\instance

port.h

the following is defined:
#define REG_PORT_OUT0 (*(RwReg *)0x41004410U) /**< \brief (PORT) Data Output Value 0 */

Is there a way to define or locate "RwByteReg" or similar? And define REG_PORT_OUT0_A/B/C/D?

I think I have what I want, though it seems to run at the same speed:

#define REG_PORT_OUT0_A (*(RwReg8 *)0x41004410U)
#define REG_PORT_OUT0_B (*(RwReg8 *)0x41004411U)
#define REG_PORT_OUT0_C (*(RwReg8 *)0x41004412U)
#define REG_PORT_OUT0_D (*(RwReg8 *)0x41004413U)

#define REG_PORT_IN0_A (*(RoReg8 *)0x41004420U)
#define REG_PORT_IN0_B (*(RoReg8 *)0x41004421U)
#define REG_PORT_IN0_C (*(RoReg8 *)0x41004422U)
#define REG_PORT_IN0_D (*(RoReg8 *)0x41004423U)

And

REG_PORT_OUT0_C = REG_PORT_IN0_B;

This copies input byte B to output byte C on port A.
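Since byte B holds PA08 to PA15 and byte C holds PA16 to PA23, this copy maps bit n of B to bit n of C. From the pin list, D4 (PA08) drives D11 (PA16), D3 (PA09) drives D13 (PA17, the LED), and so on up to D5 (PA15) driving SCL (PA23). A host-side model of the same transfer on 32-bit IN and OUT values (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Model of REG_PORT_OUT0_C = REG_PORT_IN0_B on full 32-bit values:
   take bits 8..15 of IN and place them in bits 16..23 of OUT. */
static uint32_t copy_in_b_to_out_c(uint32_t in, uint32_t out)
{
    uint32_t b = (in >> 8) & 0xFFu;
    return (out & ~0x00FF0000u) | (b << 16);
}
```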