Controlling ports directly for fast digitalRead and digitalWrite, and replacing delays with a timer to further reduce program execution time

So, my objective is to control a SCARA arm, but the issue is that both arms need to move simultaneously. (θ1 and θ2 will eventually differ, but for now simultaneous rotation through the same angle is okay.)
After searching the forums, I learned that we can manipulate the ports directly to reduce the delay caused by digitalRead and digitalWrite, with the added advantage of switching several pins simultaneously. I also came across using a timer (driven by the crystal oscillator) in place of delays.

But I don't understand how. Is there a tutorial for that?
This is the program I wrote from what I understand so far.

/*
Stepper Motor Tester v3 @Kunal Panchal (MicroStepping)
Board - Arduino Uno R3
*/
//Config
int Pulse_Delay = 100;
int cwStep = 10000;
int ccwStep = 10000;
int Microstep_Resolution = 16;

void setup() {
  DDRB |= 0x1f; //Equivalent to setting pin 12,11,10,9,8 as output !
  DDRD |= 0xfc; //Equivalent to setting pin 7,6,5,4,3,2 as output !
  /*
  //how to put input_pullup ?
  DDRB |= 1<<DDB2;
  PORTB |= 1<<DDB2; //send pin high for pullup.... ?
  DDRB |= 1<<DDB2;
  //well it's not working though
  */
  pinMode(2, INPUT);  //above code as example for this as input_pullup ?
  pinMode(A0, INPUT_PULLUP); //if correct same logic apply to analog pin ?
  pinMode(A1, INPUT_PULLUP);


  //MicroStepping logic
  if (Microstep_Resolution == 32){
    PORTB |= 0x07;
    PORTD |= 0x38; 
    /* 
    Example for PORTB |= 0x07;
    digitalWrite(8, HIGH);
    digitalWrite(9, HIGH);
    digitalWrite(10, HIGH);
    */
  }
  if (Microstep_Resolution == 16){
    PORTB |= 0x04;
    PORTD |= 0x20;
    /*
    Example for PORTB |= 0x04;
    digitalWrite(8, LOW);
    digitalWrite(9, LOW);
    digitalWrite(10, HIGH);
    */
  }
  if (Microstep_Resolution == 8){
    PORTB |= 0x03;
    PORTD |= 0x18;
    /*
    Example for PORTB |= 0x03;
    digitalWrite(8, HIGH);
    digitalWrite(9, HIGH);
    digitalWrite(10, LOW);
    */
  }
  if (Microstep_Resolution == 4){
    PORTB |= 0x02;
    PORTD |= 0x10;
    /*
    Example for PORTB |= 0x02;
    digitalWrite(8, LOW);
    digitalWrite(9, HIGH);
    digitalWrite(10, LOW);
    */
  }
  if (Microstep_Resolution == 2){
    PORTB |= 0x01;
    PORTD |= 0x08;
    /*
    Example for PORTB |= 0x01;
    digitalWrite(8, HIGH);
    digitalWrite(9, LOW);
    digitalWrite(10, LOW);
    */
  }
  if (Microstep_Resolution == 1){
    PORTB &= 0;
    PORTD &= 0;
    /*
    digitalWrite(8, LOW);
    digitalWrite(9, LOW);
    digitalWrite(10, LOW);
    */
  }

}
void RUN_all (){
  PORTB |= 0x08; //pulse 1
  PORTD |= 0x40; //pulse 2
  delayMicroseconds(Pulse_Delay);
  PORTB &= 0xf7; //pulse 1
  PORTD &= 0xbf;  //pulse 2
  delayMicroseconds(Pulse_Delay);
}
void RUN_1 (){
  PORTB |= 0x08; //pulse 1
  delayMicroseconds(Pulse_Delay);
  PORTB &= 0xf7; //pulse 1
  delayMicroseconds(Pulse_Delay);
}
void RUN_2 (){
  PORTD |= 0x40; //pulse 2
  delayMicroseconds(Pulse_Delay);
  PORTD &= 0xbf;  //pulse 2
  delayMicroseconds(Pulse_Delay);
}
void loop(){
  if (digitalRead(2)==HIGH){ 
    /*
    Is there way to digital read using port like
    if (PORTD == (1<<PD2)){"command"} //well ig its completly wrong ?
    */
    if (digitalRead(A0)==HIGH && digitalRead(A1)==LOW){
      PORTB |= 0X10;  //Direction Signal 1
      for (int i=0; i<=cwStep; i++) 
      RUN_1();
      PORTB &= 0xEF;  //Direction Signal 1
      delay(1000);
      for (int i=0; i<=ccwStep; i++)
      RUN_1();
      delay(1000);
    }
    if (digitalRead(A0)==LOW && digitalRead(A1)==HIGH){
      PORTD |= 0x80; //Direction Signal 2
      for (int i=0; i<=cwStep; i++) 
      RUN_2();
      PORTD &= 0x7f; //Direction Signal 2
      delay(1000);
      for (int i=0; i<=ccwStep; i++)
      RUN_2();
      delay(1000);
    }
    if (digitalRead(A0)==HIGH && digitalRead(A1)==HIGH){
      PORTB |= 0X10;  //Direction Signal 1
      PORTD |= 0x80; //Direction Signal 2
      for (int i=0; i<=cwStep; i++) //simultaneous operation same θ
      RUN_all();
      PORTB &= 0xEF;  //Direction Signal 1
      PORTD &= 0x7f; //Direction Signal 2
      delay(1000);
      for (int i=0; i<=ccwStep; i++) //simultaneous operation same θ
      RUN_all();
      delay(1000);
    }
    if (digitalRead(A0)==LOW && digitalRead(A1)==LOW){
      PORTB |= 0X10;  //Direction Signal 1
      PORTD |= 0x80; //Direction Signal 2
      for (int i=0; i<=cwStep; i++) //simultaneous operation same θ
      RUN_all();
      PORTB &= 0xEF;  //Direction Signal 1
      PORTD &= 0x7f; //Direction Signal 2
      delay(1000);
      for (int i=0; i<=ccwStep; i++) //simultaneous operation same θ
      RUN_all();
      delay(1000);
    }
  }
}

So, what I want help with:

  1. Direct port manipulation to set a pin as INPUT, OUTPUT, or INPUT_PULLUP. I know about DDRx but don't know how it works; I have seen different variations in code.
  2. Direct port manipulation to set a pin HIGH or LOW. So I suppose "|=" for HIGH and "&=" for LOW? And why doesn't it disturb my earlier "|=" (HIGH)?
  3. Direct port manipulation to read a pin. I don't understand anything about reading with PORTx.
  4. Is it different for analog pins? And what about analogRead and analogWrite as direct PORT manipulation? (Just curious, not that important for my project.)
  5. And how can I replace delays with a timer?

I attached an image of the circuit.

You know, when "we" say that digitalWrite() is "very slow", we mean that it takes something like 10 microseconds, instead of the 0.25us that it could take using "direct port writes." 10 us is still pretty fast compared to most physical movement; have you tried your sketch with normal digitalWrite() and digitalRead() calls? "Premature optimization is the root of much evil."

In particular, there is no point whatsoever in using direct port manipulation on the DDR registers in one-time setup(), when you really don't care how fast it goes. The normal pinMode() commands work fine in conjunction with PORTB writes later on, if you end up needing them.

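  • Direct port manipulation to set a pin HIGH or LOW. So I suppose "|=" for HIGH and "&=" for LOW? And why doesn't it disturb my earlier "|=" (HIGH)?

|= for set, &= ~ for clear (followed by a bit mask.) Both change only the bits named in the mask and leave the rest of the register alone, which is why a later |= doesn't disturb pins you set earlier.
Or you know, you could also just search out one of the many existing "digitalWriteFast" implementations...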
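To make the |= and &= ~ idiom concrete, here is a minimal sketch that runs on a PC. The variable fakePortB is only a stand-in for PORTB, and the helper names setBits and clearBits are made up for illustration; the masks are the pulse and direction bits already used in the sketch above.

```cpp
#include <cstdint>

// PC stand-in for an AVR output register; on the Uno this would be PORTB.
static volatile uint8_t fakePortB = 0;

constexpr uint8_t STEP_M1 = 1u << 3; // 0x08, Uno pin 11 (pulse 1 in the sketch)
constexpr uint8_t DIR_M1  = 1u << 4; // 0x10, Uno pin 12 (direction 1 in the sketch)

// OR with the mask: masked bits become 1, all other bits keep their value.
inline void setBits(volatile uint8_t &reg, uint8_t mask) {
    reg = static_cast<uint8_t>(reg | mask);
}

// AND with the inverted mask: masked bits become 0, all other bits keep their value.
inline void clearBits(volatile uint8_t &reg, uint8_t mask) {
    reg = static_cast<uint8_t>(reg & ~mask);
}
```

Setting DIR_M1 and then STEP_M1 leaves both bits high (0x18), and clearing STEP_M1 afterwards leaves only the direction bit (0x10). On the real board the same two helpers could be called with PORTB as the first argument.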

  • Direct port manipulation to read a pin. I don't understand anything about reading with PORTx.

For read, you'd reference PINx. PINx & bitmask isolates a single bit - zero means the pin was low, non-zero means it was high.
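A small PC-runnable sketch of reading with PINx, plus the register writes that correspond to INPUT_PULLUP (direction bit cleared, PORT bit set). The fake* variables only imitate DDRC, PORTC and PINC, and makeInputPullup and pinIsHigh are hypothetical helper names, not Arduino functions.

```cpp
#include <cstdint>

// PC stand-ins for the AVR registers of port C (A0 = bit 0, A1 = bit 1 on the Uno).
static volatile uint8_t fakeDDRC  = 0xFF; // direction: 1 = output, 0 = input
static volatile uint8_t fakePORTC = 0x00; // output value, or pull-up enable for inputs
static volatile uint8_t fakePINC  = 0x00; // pin levels as seen by the chip

constexpr uint8_t A0_MASK = 1u << 0;
constexpr uint8_t A1_MASK = 1u << 1;

// Same effect as pinMode(pin, INPUT_PULLUP): clear the DDR bit, set the PORT bit.
inline void makeInputPullup(volatile uint8_t &ddr, volatile uint8_t &port, uint8_t mask) {
    ddr  = static_cast<uint8_t>(ddr & ~mask);
    port = static_cast<uint8_t>(port | mask);
}

// Isolate one bit of the input register: non-zero means the pin is high.
inline bool pinIsHigh(const volatile uint8_t &pinReg, uint8_t mask) {
    return (pinReg & mask) != 0;
}
```

On the board, a test such as pinIsHigh(PINC, A0_MASK) && !pinIsHigh(PINC, A1_MASK) would stand in for digitalRead(A0)==HIGH && digitalRead(A1)==LOW.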

  • Is it different for analog pins? And what about analogRead and analogWrite as direct PORT manipulation?

Yes, it would be significantly different. analogRead() involves interacting with the ADC peripheral, and analogWrite() involves interacting with the TIMER peripherals.

Hello kunalppanchal

Take a look at the BlinkWithoutDelay example, which can be found in the IDE.

As the mother of all timers used in the Arduino ecosystem, this example provides the basic code for designing your own timers.
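The core of that pattern can be sketched as a tiny class that is told the current time and answers whether an interval has elapsed. IntervalTimer is a made-up name for illustration, not part of the Arduino API; on the board the argument to due() would come from millis() or micros().

```cpp
#include <cstdint>

// Hypothetical helper showing the BlinkWithoutDelay idea: remember when
// something last happened and compare against the current time, instead of
// stopping the whole program with delay().
class IntervalTimer {
public:
    explicit IntervalTimer(uint32_t interval, uint32_t start = 0)
        : interval_(interval), previous_(start) {}

    // Returns true once the interval has elapsed since the last true result.
    // Unsigned subtraction keeps working when the time counter rolls over.
    bool due(uint32_t now) {
        if (static_cast<uint32_t>(now - previous_) >= interval_) {
            previous_ = now;
            return true;
        }
        return false;
    }

private:
    uint32_t interval_;
    uint32_t previous_;
};
```

In loop() this would be used as something like if (stepTimer.due(micros())) { /* toggle the step pin */ }, so the rest of loop() keeps running between pulses.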

Have a nice day and enjoy coding in C++.

On classic AVRs with 16MHz system clock, digitalReadFast() can be made to run in 0.125us. On modern AVRs half that (because in AVRxt, CBI and SBI are single clock).

DDRx |= (1<<bit); sets that bit using an atomic single word instruction.

DDRx &= ~(1<<bit) clears that bit.

The same is true of PORTx - 1's are high and 0's are lows.

For reads, PINx & (1<<bit) in the test of an if or while statement will compile to an SBIC, skipping the next instruction if the bit is not set (hence the pin is low).

On modern (post-2016 revolution) AVRs, you replace PORTx etc. with VPORTx.DIR, VPORTx.OUT, and VPORTx.IN. Setting and clearing bits like that on modern AVRs is an atomic SINGLE CLOCK operation. At 20 MHz, that means a digitalWriteFast() where both arguments are constant and known at compile time is a single-clock atomic instruction. You can write the VPORT registers yourself, or use the digitalWriteFast() and similar functions provided by most third-party hardware packages for those devices. digitalWriteFast(constant pin, val ? HIGH : LOW) is slightly faster on modern AVRs (at least the ones my cores ship with).

What is the maximum step pulse frequency that you need?

Depending on that maximum frequency, you can use the stepper library MobaTools to drive multiple steppers.

If in your case two stepper motors have to move in a step-by-step coordinated way, this is usually done by applying the Bresenham algorithm.

best regards Stefan

Thank you Everyone.
@westfw

  1. Yes, I get it: setup doesn't need DDRx and I could use the standard pinMode.
    It just makes the code shorter when assigning multiple pins as output or input. I was just wondering how INPUT_PULLUP is assigned with the DDRx method, but I will stick with pinMode.

  2. Well, I didn't know about digitalWriteFast; I will look into it. It seems to be as fast as the PORTx method, but much simpler.

  3. I still don't understand how you read with PINx. I am not that good at coding, so it would help if you could share an example.

  4. I am staying away from analogRead() and analogWrite() then XD.

@paulpaulson
Yes, I know about millis() and micros() but didn't know there was an example in the IDE.
Using it as a reference, I will try to change my delay() code to millis().
So, what exactly is delay() blocking? Does avoiding delay() increase the speed of my program/pulses?
I also came across the TimerOne library and the timer/counter registers (TCCRxx) as replacements for delay().

@DrAzzy
Well, I didn't understand the VPORTx part, but I will search for that.

For reads, PINx & (1<<bit)

How do I implement that? Sorry, I'm not that good at this.

DDRx |= (1<<bit);
DDRx &= ~(1<<bit);

For "bit" I have seen a lot of code using names like "DDBx" or "PBx", or directly using the bit number as in DDRB |= (1<<7); that's why I was confused for a while.

@StefanL38
OK, I will look into MobaTools and the Bresenham algorithm.

The thing is, at work I use a different controller that has a 9.83 MHz clock but can output a pulse frequency of 2 MHz. It only has 5 I/O lines, for Start, Pulse, DIR, EL+ and EL- (yes, single axis, and that's the problem).
So I was wondering why a 16 MHz Arduino gives a lower pulse frequency.
Are digitalWrite() and delay() making it slow?

I have another stepper driver that controls heavy stepper motors (10 A peak) with 1/256 microstepping and accepts an input pulse frequency of up to 1 MHz. So is there a way to output pulses at a high frequency (100-200 kHz works too)? (Not that necessary; I can still go with 1/16.)
But sometimes I need to rotate my SCARA arms by as little as 0.2º, so higher microstepping is preferable.

This might be off topic, but how is Marlin able to send simultaneous pulse trains to different drivers with different rotations and directions?

Hello kunalppanchal

Take a look here to learn more:

Why You Shouldn’t Always Use the Arduino Delay Function | Random Nerd Tutorials.

I have never looked into the Marlin firmware's special code that uses all the tricks, but I'm pretty sure it uses the Bresenham algorithm.
This means that for each coordinated movement of 2 or 3 axes, the distances between start point and end point are calculated.
The axis with the biggest distance becomes the leading axis,
which means the leading axis changes between X, Y and Z depending on which axis has the biggest distance.
Then there is one single loop that creates the pulses for all two or three axes.
The slower axes get their step pulses at a lower frequency, calculated from the slope between the axes (dx/dy or dy/dx, always depending on which axis has the biggest distance).
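The single-loop idea described above can be sketched for two motors as follows. This is only an illustration under my own assumptions, not Marlin's or MobaTools' actual code: planMove and StepFlags are made-up names, and the result is collected in a vector so it can be inspected on a PC. On an Uno you would pulse the step pins inside the loop instead of storing anything.

```cpp
#include <cstdint>
#include <vector>

// One entry per loop tick: which motors get a step pulse on that tick.
struct StepFlags {
    bool a;
    bool b;
};

// Spread stepsA and stepsB evenly over max(stepsA, stepsB) ticks using
// integer accumulators (Bresenham-style, no floating point).
std::vector<StepFlags> planMove(uint32_t stepsA, uint32_t stepsB) {
    const uint32_t lead = (stepsA > stepsB) ? stepsA : stepsB; // leading axis
    std::vector<StepFlags> ticks;
    ticks.reserve(lead);

    uint64_t accA = lead / 2; // start half way so the steps are centred
    uint64_t accB = lead / 2;
    for (uint32_t i = 0; i < lead; ++i) {
        StepFlags f{false, false};
        accA += stepsA;
        if (accA >= lead) { accA -= lead; f.a = true; }
        accB += stepsB;
        if (accB >= lead) { accB -= lead; f.b = true; }
        ticks.push_back(f);
    }
    return ticks;
}
```

For planMove(5, 2) the leading motor steps on every one of the 5 ticks, while the slower motor steps only on the second and fourth tick, so both finish their move at the same time.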

Squeezing 100 kHz step pulses for two axes out of an Arduino is a hard task.

If you need stepper pulses on two axes synchronised step by step at 200 kHz, change to a different microcontroller such as a Teensy 4.1 (600 MHz clock, 32-bit).

best regards Stefan

In terms of time cost (sorted high to low), do we get this?

  • 1000000 μs: delay(1000)
  • 100 μs: Pulse_Delay = 100
  • 10 μs: digitalWrite()
  • 0.25 μs: direct port manipulation

So, programming the NPM FMC32 controller was a kind of assembly programming, which is why it outputs a very high frequency. It directly modifies registers to perform or schedule tasks.

So I thought about doing the same with the Arduino. I didn't know about bare-metal programming; I was just searching for alternative ways to program without using the Arduino functions.

/*
Stepper Motor Tester v3.1 @Kunal Panchal (Dual Axis) (MicroStepping)
Board - Arduino Uno R3
*/
//Config
const float Pulse_Delay = 0.5; //to change pulse frequency
const unsigned long cwStep = 51200;//No. of steps Clockwise.
const unsigned long ccwStep = 51200; //No. of steps CounterClockwise.
const int Microstep_Resolution = 16; // 1,2,4,8,16

void RUN_all (){
  PORTB |= 0x08; //pulse 1
  PORTD |= 0x40; //pulse 2
  _delay_us(Pulse_Delay);
  PORTB &= 0xf7; //pulse 1
  PORTD &= 0xbf;  //pulse 2
  _delay_us(Pulse_Delay);
}
void RUN_1 (){
  PORTB |= 0x08; //pulse 1
  _delay_us(Pulse_Delay);
  PORTB &= 0xf7; //pulse 1
  _delay_us(Pulse_Delay);
}
void RUN_2 (){
  PORTD |= 0x40; //pulse 2
  _delay_us(Pulse_Delay);
  PORTD &= 0xbf;  //pulse 2
  _delay_us(Pulse_Delay);
}

int main(void){
  DDRB |= 0x1f; //Equivalent to setting pin 12,11,10,9,8 as output ! (Drive 1)
  DDRD |= 0xf8; //Equivalent to setting pin 7,6,5,4,3 as output !  (Drive 2)
  DDRD &= 0xfb; //Equivalent to setting pin 2 as input ! (start)
  DDRC &= 0xfc; //Equivalent to setting pin A0,A1 as input ! (drive ENA)
  PORTC |= 0x03; //Equivalent to setting pin A0,A1 as pullup !

  //MicroStepping logic {M1,M2,M3} for A4988/DRV8825 stepper driver
  if (Microstep_Resolution == 32){ //{1,1,1}|{0,1,1}|{1,0,1}
    PORTB |= 0x07;
    PORTD |= 0x38;
  }
  if (Microstep_Resolution == 16){ //{0,0,1}
    PORTB |= 0x04;
    PORTD |= 0x20;
  }
  if (Microstep_Resolution == 8){ //{1,1,0}
    PORTB |= 0x03;
    PORTD |= 0x18;
  }
  if (Microstep_Resolution == 4){ //{0,1,0}
    PORTB |= 0x02;
    PORTD |= 0x10;
  }
  if (Microstep_Resolution == 2){ //{1,0,0}
    PORTB |= 0x01;
    PORTD |= 0x08;
  }
  if (Microstep_Resolution == 1){ //{0,0,0}
    PORTB &= 0;
    PORTD &= 0;
  }
    // Loop forever
    while (1)
    {
      _delay_ms(5); //code breaks if I don't use a delay here: A0, A1 read high initially and sometimes go random
      if (PIND & 1<<2){ // start button HIGH
        if (PINC & 1<<0 && (PINC & 1<<1)<1){ //drive 1 enable (A0==HIGH && A1==LOW)
          PORTB |= 0X10;  //Direction Signal 1
          for (unsigned long i=0; i<=cwStep; i++) // first one (-_-)
          //for (unsigned long cwStep = 51200; cwStep>0; cwStep--) //instead comparing with other variable ! 
          //which is better, probably first one?
          RUN_1();
          PORTB &= 0xEF;  //Direction Signal 1
          _delay_ms(1000);
          for (unsigned long i=0; i<=ccwStep; i++)
          RUN_1();
          _delay_ms(1000);
        }
        if ((PINC & 1<<0)<1 && PINC & 1<<1){//drive 2 enable (A0==LOW && A1==HIGH)
          PORTD |= 0x80; //Direction Signal 2
          for (unsigned long i=0; i<=cwStep; i++)
          RUN_2();
          PORTD &= 0x7f; //Direction Signal 2
          _delay_ms(1000);
          for (unsigned long i=0; i<=ccwStep; i++)
          RUN_2();
          _delay_ms(1000);
        }
        // initial signal error for this condition
        if (PINC & 1<<0 && PINC & 1<<1){ //Both High (A0==HIGH && A1==HIGH)
          PORTB |= 0X10;  //Direction Signal 1
          PORTD |= 0x80; //Direction Signal 2
          for (unsigned long i=0; i<=cwStep; i++) 
          RUN_all(); //simultaneous operation same θ
          PORTB &= 0xEF;  //Direction Signal 1
          PORTD &= 0x7f; //Direction Signal 2
          _delay_ms(1000);
          for (unsigned long i=0; i<=ccwStep; i++)
          RUN_all();
          _delay_ms(1000);
        }
      }
    }
}

And this is what I ended up with after going through the forums.
It produces much higher pulse rates for dual-axis operation than digitalWrite/Read. (Now I need to work on different rotation angles.)

  1. Any optimization for my code?
  2. While programming the NPM controller I came across jump functions:
  • JMP (unconditional jump to a specific line of code/label.)
  • JNZ (jump to a specific line of code/label until the register value becomes 0.)
    {We have to specify the register value, i.e. 0-255, just like an analog pin in Arduino!
    Each time the label runs, the register value is decremented (value - 1).}
  • JPI (jump to a specific line of code/label when an interrupt is generated.)
    And just out of curiosity, is it possible to program an Arduino in assembly language?
    Edit: Well, I looked into different forums and people say it's not worth it, and I am not skilled enough to search through datasheets and ATMEL docs.

@StefanL38
So, considering a single pin (as output and nothing else), how fast could I pulse it? (Using 3.3 V/GND as the signal for DIR/ENA.)
And I also have an ESP32 (240 MHz). Would it work for this? I haven't really used it much.

The MobaTools documentation says 30 kHz maximum stepper frequency with ESP32.

The maximum pin toggle frequency, in a tight loop, for a 16MHz AVR (ATmega328 generation) is about 2.7MHz

But that is a VERY tight loop (3 instructions) with no decision making at all involved, and adding instructions cuts the speed pretty quickly.
To get 100kHz, you'd have 160 clock cycles to play with. You can probably implement the inner loop of a Bresenham algorithm in less than that, using direct port writes, but it would be "an interesting challenge."
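The cycle budget above is just a division, sketched here so other clock and step rates can be tried. Both function names are made up for illustration. The second function is my own addition: it gives the upper bound that two fixed delays per pulse (as in RUN_1() with Pulse_Delay) put on the step rate, ignoring all loop overhead.

```cpp
#include <cstdint>

// CPU clock cycles available per step pulse at a given step frequency.
constexpr uint32_t cyclesPerStep(uint32_t cpuHz, uint32_t stepHz) {
    return cpuHz / stepHz;
}

// Best-case step frequency when each pulse spends highUs high and lowUs low
// in a blocking delay (real code is slower because of loop overhead).
constexpr uint32_t maxStepHz(uint32_t highUs, uint32_t lowUs) {
    return 1000000u / (highUs + lowUs);
}

static_assert(cyclesPerStep(16000000u, 100000u) == 160, "100 kHz on a 16 MHz AVR");
static_assert(maxStepHz(100, 100) == 5000, "Pulse_Delay = 100 caps the rate at 5 kHz");
```

So with Pulse_Delay = 100 the delays alone limit the sketch to about 5 kHz, long before digitalWrite() versus direct port writes makes a difference.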

void RUN_all (){
  PORTB |= 0x08; //pulse 1
  PORTD |= 0x40; //pulse 2
  _delay_us(Pulse_Delay);
  PORTB &= 0xf7; //pulse 1
  PORTD &= 0xbf;  //pulse 2

Ugh. Please at least assign some names, to make the code much more readable:

#define M1_PORT PORTB
#define M2_PORT PORTD
static const uint8_t STEP_M1 = 0x08;
static const uint8_t STEP_M2 = 0x40;

void RUN_all (){
  M1_PORT |= STEP_M1; //pulse 1
  M2_PORT |= STEP_M2; //pulse 2
  _delay_us(Pulse_Delay);
  M1_PORT &= ~STEP_M1; //pulse 1
  M2_PORT &= ~STEP_M2;  //pulse 2

(Note that the bitwise inversion happens at compile time, so using ~STEP_M1 is just as fast as inverting the constant yourself.)

Any optimization for my code?

Implement the Bresenham algorithm first. Because you'll end up calculating a start point, direction, and fractional increment/decrement, and then the inner loop is a pretty simple "move on axis, maybe move on the other axis, increment values"
There's a simple Bresenham implementation here, along with a pointer to an explanation. (ok, it was for graphics, so it's SLIGHTLY different.)

[ JMP, JNZ, etc]
And just for curiosity is it possible to program Arduino by using assembly language?
Edit: Well, I looked into different forums and people say it's not worth it

Of course you can program an Arduino in assembly language. But yeah - usually the C compiler does a VERY good job of optimizing C source code and putting in the proper JMP (or BRanch, for an AVR), and any gain you might get from using assembly language is quite small.
