Arduino nano, but faster

Hello everyone

I thought this would be a question everyone asks, but it seems I was wrong again. I mainly use the Arduino UNO or NANO with 5V logic levels. I have realized that my projects often include time-sensitive things like steppers, PWM, servos, pulse encoders etc., and I think I would really benefit from a faster processor (or otherwise I have to learn how interrupts work).

There are so many alternatives, but I don't want to lose what I have learned. Is there a close-enough, plug-and-play alternative to the basic NANO that is faster?

Programming is not really my thing, but I keep wondering where GRBL gets all the time to execute all the correct steps. It must be some sort of sorcery :wink:

There are two ways to solve your problem. One is to get a faster processor. The other is to write more efficient code. This is a very common issue for the whole IT industry. Writing code takes a lot of time and costs a lot of money, even in countries that do not pay their programmers so well. Writing efficient code takes expert programmers, and they get paid more. Hardware is relatively cheap. Hence the decision is easily made, and programmers that are expert enough to write efficient code are a dying breed.

But in the Arduino world, people code as a hobby and don't attach such a high value to the time they spend coding.

Writing efficient code often means making use of features that are specific to a platform, such as AVR microcontrollers, and accessing the hardware directly rather than via an abstraction interface. For example, the Arduino function digitalWrite() is much slower than accessing the "PORTx" hardware register directly. But if you access the port directly, then your code cannot be easily moved to another platform later, because the underlying hardware is different, whereas digitalWrite() will probably work without alteration.

If you want a faster version of the Nano, you will almost certainly have to make the move from 5V logic to 3.3V. There is lots of choice. Teensy 3.x, Maple Mini, esp8266 for example. For maximum Arduino compatibility, it may be wisest to choose one based on the microcontroller used in the newer Arduino models "Zero" and "M0" which is the SAMD21 chip. Both AdaFruit and Sparkfun sell Nano-like boards based on this chip. There are also SAMD21 boards on eBay branded "Wemos". But the brand name "Wemos" appears to have been stolen, because these boards do not appear on Wemos' official website. They are not cheaper than some of the AdaFruit/Sparkfun offerings, so I can't see any reason to recommend them.

Thanks for the quick reply.
This is only a hobby, so I might not go for direct access; that kind of code becomes complicated to read, at least for me. It was good to point out how slow digitalRead() and digitalWrite() are. I just had a light bulb moment and figured out where I lose so much time in my current project, which was the reason I asked for more speed.

It seems my stepper pulsing subroutine was written by an idiot. Well, it's not that bad, but it should be split into three different subroutines. It reads all the user buttons on every half step, when only one button is actually required.

But this might be a suitable time to start checking out more powerful alternatives anyway; there are more projects to come :slight_smile:

In general, digitalRead() and digitalWrite() are notoriously slow. But, they provide a consistent interface to hardware regardless of processor type. This is one of the strengths of the Arduino platform but of course any strength is also a weakness.

You can speed up I/O routines significantly by using direct hardware port reads and writes using the predefined DDRn PORTn PINn registers. The strength is speed, the weakness is the loss of processor/board portability.

Fortunately, the Uno and Nano use the same ATmega328 processor so it’s not an issue for those two boards and what you’re doing now, but it would require a rewrite to use the code on a Mega (ATmega2560) or even worse, a non-AVR part like a Due or Zero.
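
As a minimal sketch of the read-modify-write pattern behind direct port access, the following uses a plain variable, fakePortD, as a stand-in for a hardware register so that it can run on a PC. The names fakePortD, setBits and clearBits are made up for this illustration; on an ATmega328 the pointer would refer to a real register such as PORTD instead.

```cpp
#include <stdint.h>

// Stand-in for an 8-bit output register such as PORTD.
// Assumption: this is ordinary RAM here, not real hardware.
volatile uint8_t fakePortD = 0;

// Drive every pin selected by mask HIGH, leaving the other bits untouched.
inline void setBits(volatile uint8_t *port, uint8_t mask) {
  *port = *port | mask;
}

// Drive every pin selected by mask LOW, leaving the other bits untouched.
inline void clearBits(volatile uint8_t *port, uint8_t mask) {
  *port = *port & static_cast<uint8_t>(~mask);
}
```

On an Uno or Nano, Arduino pin 2 is bit 2 of port D, so the equivalent of digitalWrite(2, HIGH) would be setBits(&PORTD, _BV(PD2)), with pinMode() still used once in setup().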

There is a compromise solution to speeding up code that uses digitalWrite().

myPort = portOutputRegister(digitalPinToPort(myPin));
myPinBit = digitalPinToBitMask(myPin);

myPort |= myPinBit; // equivalent to digitalWrite(myPin, HIGH)
myPort &= ~myPinBit; // equivalent to digitalWrite(myPin, LOW)

This approach should give some protection when moving to a different platform (chip). But I wonder how much? I might do some experimenting. I would hope it would work on atmega328, atmega2560 as a minimum. Would it also work on ATtiny45/85? On SAMD21? On esp8266?

Even if these functions/macros are available, you would need to be careful about the data types of myPort and myPinBit because although byte might be fine on avr chips, I suspect they will need to be something larger on 32-bit platforms.

EDIT: I found out that for the Arduino core for esp8266, there are the following definitions in Arduino.h:

#define digitalPinToPort(pin)       (0)
#define digitalPinToBitMask(pin)    (1UL << (pin))
#define digitalPinToTimer(pin)      (0)
#define portOutputRegister(port)    ((volatile uint32_t*) GPO)
#define portInputRegister(port)     ((volatile uint32_t*) GPI)
#define portModeRegister(port)      ((volatile uint32_t*) GPE)

So that looks hopeful. I guess from the "1UL" that myPinBit would need to be uint32_t and myPort would need to be uint32_t*. Those types would be less efficient on avr, so some compiler directives might be needed to define the appropriate types for the chip, unless they too already exist...
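
One possible way to handle the differing widths is a pair of type aliases chosen by the preprocessor. This is only a sketch: port_reg_t, port_mask_t, CachedPin, pinHigh and pinLow are names invented here, and the assumption is that every non-AVR core uses 32-bit port registers. That matches the esp8266 definitions quoted above but should be checked for any other core. `__AVR__` is the macro that avr-gcc predefines.

```cpp
#include <stdint.h>

#if defined(__AVR__)
typedef uint8_t port_reg_t;    // AVR ports are 8 bits wide
typedef uint8_t port_mask_t;
#else
typedef uint32_t port_reg_t;   // assumption: 32-bit registers elsewhere
typedef uint32_t port_mask_t;
#endif

// A pin cached as "pointer to its output register" plus "bit mask".
struct CachedPin {
  volatile port_reg_t *reg;
  port_mask_t mask;
};

// Set the pin's bit in the register.
inline void pinHigh(const CachedPin &p) {
  *p.reg = *p.reg | p.mask;
}

// Clear the pin's bit in the register.
inline void pinLow(const CachedPin &p) {
  *p.reg = *p.reg & static_cast<port_reg_t>(~p.mask);
}
```

In a sketch, a CachedPin would be filled in setup() from portOutputRegister(digitalPinToPort(pin)) and digitalPinToBitMask(pin), so the rest of the code never mentions the register width.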

Unfortunately, the SAMD processors are not as much faster than an AVR as you might hope.
While the clock rate is higher, the "special instructions" for writing to pins are gone, and it takes the ARM three or four instructions to do what would have been a single instruction on AVR. I did some experiments recently, and found digitalWrite() to be about 3x faster than on an AVR (310kHz max toggle speed). The best all-out-effort pin toggle speed was also about 3x that of an AVR (12MHz), but it's a bit less "general" in some sense. You'd have to carefully craft and benchmark specific code cases. (For example, the 12MHz toggle code on ARM uses two registers, while the 4MHz AVR code uses none. And there are only (sort-of) 8 registers on ARM CM0.)

https://forums.adafruit.com/viewtopic.php?f=57&t=133497#p668317

Got around to doing some testing.

My test code:

#define PIN 2

uint8_t myPort;
uint8_t myPinBit;

void setup() {
  pinMode(PIN, OUTPUT);
  Serial.begin(115200);
  myPort = portOutputRegister(digitalPinToPort(PIN));
  myPinBit = digitalPinToBitMask(PIN);
}

void loop() {
  unsigned long startTime = micros();
  for (long i = 0; i < 100000UL; i++) {
    digitalWrite(PIN, HIGH);
    digitalWrite(PIN, LOW);
  }
  float digWriteSpeed = (micros() - startTime)/200000.00;
  Serial.print("digitalWrite() ");
  Serial.print(digWriteSpeed);
  
  startTime = micros();
  for (long i = 0; i < 100000UL; i++) {
    myPort |= myPinBit;
    myPort &= ~myPinBit;
  }
  float portManipSpeed = (micros() - startTime)/200000.00;
  Serial.print("us, Port Manipulation ");
  Serial.print(portManipSpeed);
  Serial.print("us which is ");
  Serial.print(digWriteSpeed/portManipSpeed);
  Serial.println(" times faster");
  
}

Arduino Nano 3 (atmega328 @ 16MHz):
digitalWrite() 3.59us, Port Manipulation 0.28us which is 12.65 times faster

Arduino Pro Micro (atmega32u4 @ 16MHz):
digitalWrite() 3.67us, Port Manipulation 0.28us which is 12.89 times faster

For esp, I had to make a couple of changes to the sketch. I knew I would need to use 32 bit rather than 8, but upon compiling, I realised I also had to use pointers.

#define PIN D2

volatile uint32_t *myPort;
uint32_t myPinBit;
...
    *myPort |= myPinBit;
    *myPort &= ~myPinBit;

Wemos Mini (esp8266 @80MHz):
digitalWrite() 0.46us, Port Manipulation 0.28us which is 1.62 times faster

Wemos Mini (esp8266 @160MHz):
digitalWrite() 0.23us, Port Manipulation 0.21us which is 1.11 times faster

Strangely, at 160MHz, the port manipulation time did not halve as I expected. I increased the loops to 1,000,000, but got the same result.

I then went back to the Nano to see if the pointer version worked OK:

#define PIN 2

uint8_t *myPort;
uint8_t myPinBit;

...

    *myPort |= myPinBit;
    *myPort &= ~myPinBit;

It did work, and the result was:
digitalWrite() 3.59us, Port Manipulation 0.41us which is 8.76 times faster

So using the pointer slowed down the direct port manipulation, but it works with and without using a pointer.

Using the "volatile" keyword slowed it down a little further:
digitalWrite() 3.59us, Port Manipulation 0.54us which is 6.70 times faster
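
A possible explanation for these differences, offered as a sketch rather than a verified diagnosis: on AVR, portOutputRegister() yields a pointer to the register, so a plain uint8_t myPort can only hold a number, and myPort |= myPinBit then changes that variable in RAM instead of the pin. The pointer version is the one that reaches the register, and "volatile" additionally forces every read and write to really happen, which costs a little time but is what hardware registers need. The example below imitates this on a PC with an ordinary variable, fakeRegister, standing in for the port. All names in it are invented for the illustration.

```cpp
#include <stdint.h>

// Stand-in for a hardware output register.
// Assumption: this is ordinary RAM here, not real hardware.
volatile uint8_t fakeRegister = 0;

// Mimics the non-pointer variant: it works on a private copy of a number,
// so the register itself never changes. Returns the modified copy.
uint8_t setOnCopy(uint8_t copy, uint8_t mask) {
  copy |= mask;
  return copy;
}

// Mimics the pointer variant: the write goes through the pointer
// and lands in the register.
void setViaPointer(volatile uint8_t *reg, uint8_t mask) {
  *reg = *reg | mask;
}
```

Checking the pin itself with an LED, a meter or a scope would confirm whether a given variant really toggles the output.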

Thanks, this seems to be the way to go for now. And now comes the big BUT: would it be possible to write an example that uses more than one pin or port? There are probably many hobbyists like me: monkey see, monkey do. With this example I can directly use one, but how do I define things if I use more than one? As always, I did some googling, but Google chose not to give me an answer and just kept saying "these are not the direct access codes you are looking for".

Well, you say "this seems to be the way to go", but my experiments showed me that using these functions doesn't give as smooth an upgrade path to faster processors as I had hoped. And as westfw pointed out, they won't necessarily give your code a big boost when you move to a faster processor. And indeed I found that a single port manipulation took 0.28us on a 16MHz Nano and 0.28us on an 80MHz esp.

So to answer your question, I'll assume you will be sticking with AVR processors for now, because I don't have a good answer otherwise.

Even on different processors from the avr family, Arduino Pin X is not necessarily the same bit on the same port. Worse still, Arduino Pins X & Y might be on the same port on one avr processor but on different ports on another avr processor.

To deal with those situations, your code needs to use the functions portOutputRegister(), digitalPinToPort() and digitalPinToBitMask() for each Arduino pin it uses, and store the results in separate variables. For example:

#define CLK 2
#define DATA 3
#define LATCH 4

// portOutputRegister() returns a pointer to the port register, so the port
// variables are pointers; the bit masks are plain bytes on AVR.
volatile uint8_t *clkPort, *dataPort, *latchPort;
uint8_t clkBit, dataBit, latchBit;

void setup() {
  clkPort = portOutputRegister(digitalPinToPort(CLK));
  clkBit = digitalPinToBitMask(CLK);
  dataPort = portOutputRegister(digitalPinToPort(DATA));
  dataBit = digitalPinToBitMask(DATA);
  latchPort = portOutputRegister(digitalPinToPort(LATCH));
  latchBit = digitalPinToBitMask(LATCH);
}

and so on.
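
To show how cached values like these might then be used, here is a small sketch that runs on a PC. FastPin, high, low and pulse are names invented for this illustration, and fakePortB and fakePortD are ordinary variables standing in for two hardware ports. The wiring is hypothetical. In a real AVR sketch the pointer and mask would come from portOutputRegister(digitalPinToPort(pin)) and digitalPinToBitMask(pin).

```cpp
#include <stdint.h>

// Stand-ins for two 8-bit output ports (assumption: ordinary RAM, not hardware).
volatile uint8_t fakePortB = 0;
volatile uint8_t fakePortD = 0;

// One pin, cached as a pointer to its port register plus its bit mask.
struct FastPin {
  volatile uint8_t *port;
  uint8_t mask;
};

inline void high(const FastPin &p) { *p.port = *p.port | p.mask; }
inline void low(const FastPin &p)  { *p.port = *p.port & static_cast<uint8_t>(~p.mask); }

// Pulse a pin HIGH then LOW, as for a clock or latch line.
inline void pulse(const FastPin &p) { high(p); low(p); }

// Hypothetical wiring: CLK and DATA share one port, LATCH sits on another.
const FastPin clkPin   = { &fakePortD, 1 << 2 };
const FastPin dataPin  = { &fakePortD, 1 << 3 };
const FastPin latchPin = { &fakePortB, 1 << 0 };
```

Because each pin carries its own port pointer, the calling code looks the same whether two pins share a port or not, which is the situation described above for different AVR chips.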

Thanks, this clears up how to multiply things. So I missed this Nano vs. ESP comparison entirely, weird. But in the Nano case, faster port manipulation frees up time to do other things. I wish I had known this when I was experimenting with radio communication on those cheap 355MHz radios with stepper motor controls. Well, it was a disaster and a big mess ^2 :slight_smile:

Very interesting comparisons.

PaulRB:
There are also SAMD21 boards on eBay branded "Wemos". But the brand name "Wemos" appears to have been stolen, because these boards do not appear on Wemos' official website. They are not cheaper than some of the AdaFruit/Sparkfun offerings, so I can't see any reason to recommend them.

Just as an FYI, WeMos doesn't sell any micro controllers that aren't based on the ESP8266 or ESP32 chip.

If it doesn't come from http://wemos.cc, don't buy it.

You ask: "Programming is not really my thing, but I really keep wondering where GRBL gets all the time to execute all correct steps, must be some sort of sorcery :wink: ".

All the hard work is done on the PC feeding the controller on your machine. The Arduino controller program just executes the commands sent to it. Those commands are not the "G" and "M" codes, but the resulting output from the interpreter. There may be dozens of commands for a single "G" code line.

But, all the commands are optimized for that particular movement.

Paul

I have seen this board (Arduino ARM). I have a project with space constraints that uses a specific port expander shield, so I need the same pinout and more memory. For future evolution it also gives me some additional computing power. I think it can work well.

https://wiki.protoneer.co.nz/NANO-ARM

Yes, I have a NanoARM in my collection of boards. It seems a good board, but the support on the Protoneer forum could be better. I tried to get the board into a very low power mode and was not as successful as I had hoped. Similar boards from other designers achieve much lower power, and I never figured out how. It could be the way the Arduino core works. The NanoARM uses the Arduino Zero core. Perhaps those very low power boards have their own core which helps achieve that. But perhaps you are not concerned with low power operation.

The NanoARM is pin compatible with Nano to a degree but inevitably there are some differences. The obvious one is that the NanoARM is 3.3V and the Nano is 5V. NanoARM is not 5V tolerant. So while it might be mechanically compatible with your "shield" whatever that is, the voltages may not be compatible.

Another difference is the SPI port. But I think I did manage to get SPI working on the same pins as Nano with some extra configuration code. But it certainly did not work "out of the box". The same may be true of i2c, I have not tried that.

Just to let you know, waking up old, dead topics is somewhat frowned upon in the forum. There is a warning message on the page of old topics about that. Better to start a new topic and put a link to the old topic in your post.


On the NanoARM, the i2c pins are where A6 & A7 are on the Nano. On the Nano, they are A4 & A5.

Another annoying difference is that Serial.print() does not print to the serial monitor. You must use SerialUSB.print() instead. This is also true of Arduino Zero.