ESP8266 direct register reading

I've been trying to get a rotary encoder working properly with a NodeMCU, and after a lot of debouncing and struggle, I settled on an algorithm which mostly works (it annoyingly increments twice on most motions, but I can live with that).
Now I want to replace my digitalRead calls with a direct read of both encoder pins (GPIO 5 and 4, i.e. D1 and D2). According to my search, I should see all the pin states at 0x60000318, but when I print *((volatile uint32_t *) 0x60000318) I'm getting numbers which don't correspond to the pin states as far as I can tell. I know that pin A and B are equal only on increments, but I don't see 2 bits which are consistently equal on increments but different on decrements.
Here are some values I get, for example:
increments
3221319681
3221319729
3758321713
3758321665
decrements
3221450769
3221450785
3758321681
3758321697

For reference, this is the code:

int PinA = 5;
int PinB = 4;
int Counter = 0;
int PinALastState = LOW;
int pinAState = LOW;
static volatile boolean rotating = false;  // written in the ISR, so it must be volatile

void setup() {
  pinMode (PinA, INPUT_PULLUP);
  pinMode (PinB, INPUT_PULLUP);
  PinALastState = digitalRead(PinA);
  pinAState = PinALastState;
  attachInterrupt(digitalPinToInterrupt(PinA), rotEncoder, CHANGE);
  Serial.begin (9600);
}

void loop() {
  while (rotating)
  {
    pinAState = digitalRead(PinA);
    if (pinAState != PinALastState)
    {
      if (digitalRead(PinB) == pinAState) {
        Counter++;
      } else {
        Counter--;
        if (Counter == 0) Counter = 0;  // has no effect as written
      }
      
      Serial.println(Counter);
    }
    PinALastState = pinAState;

    rotating = false;
  }
}

// ISRs on the ESP8266 should live in IRAM, hence ICACHE_RAM_ATTR.
void ICACHE_RAM_ATTR rotEncoder() {
  rotating = true;
}

For comparison, this is how the ESP8266 core implements digitalRead():

extern int ICACHE_RAM_ATTR __digitalRead(uint8_t pin) {
  if(pin < 16){
    return GPIP(pin);
  } else if(pin == 16){
    return GP16I & 0x01;
  }
  return 0;
}

and, from packages/esp8266/hardware/esp8266/2.5.0/cores/esp8266/esp8266_peri.h:

#define ESP8266_REG(addr) *((volatile uint32_t *)(0x60000000+(addr)))
#define GPI    ESP8266_REG(0x318) //GPIO_IN RO (Read Input Level)
#define GPIP(p) ((GPI & (1 << ((p) & 0xF))) != 0)
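
Applying the same masking to the values posted earlier suggests the register was read correctly after all: once everything except bits 5 and 4 is ignored, the two bits are equal in every "increments" sample and different in every "decrements" sample. The sketch below is a host-side check, not ESP8266 code; pinLevel and pinsEqual are names made up for this illustration, and on the device the snapshot would come from a single read of GPI.

```cpp
#include <cstdint>

// GPIO numbers from the post: pin A on GPIO5, pin B on GPIO4.
constexpr uint32_t kPinABit = 5;
constexpr uint32_t kPinBBit = 4;

// Level of one GPIO inside a 32-bit snapshot of the input register.
constexpr bool pinLevel(uint32_t snapshot, uint32_t bit) {
  return ((snapshot >> bit) & 1u) != 0;
}

// True when A and B have the same level within one snapshot.
constexpr bool pinsEqual(uint32_t snapshot) {
  return pinLevel(snapshot, kPinABit) == pinLevel(snapshot, kPinBBit);
}

// "increments" samples from the post: A == B in each of them.
static_assert(pinsEqual(3221319681u), "0xC0017001");
static_assert(pinsEqual(3221319729u), "0xC0017031");
static_assert(pinsEqual(3758321713u), "0xE0037031");
static_assert(pinsEqual(3758321665u), "0xE0037001");

// "decrements" samples from the post: A != B in each of them.
static_assert(!pinsEqual(3221450769u), "0xC0037011");
static_assert(!pinsEqual(3221450785u), "0xC0037021");
static_assert(!pinsEqual(3758321681u), "0xE0037011");
static_assert(!pinsEqual(3758321697u), "0xE0037021");
```

Because both levels are taken from the same 32-bit value, pin B cannot change between the read of A and the read of B. The higher bits that make the decimal numbers look unrelated are not the encoder pins and can be masked off.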

Do you really need added speed?

While you can write code to do direct port reads, I wouldn't worry much about direct port i/o on the ESP part, as you won't see the big gains from switching from the standard API to direct i/o that you can see on an AVR.
The ESP core doesn't have the poor implementation that causes the overhead in digitalRead()/digitalWrite() in the AVR core.
The ESP core avoids much of what causes that overhead, such as the slow table lookups used in the AVR core.
Not to mention that the ESP is much faster in general.

--- bill

The PJRC Encoder Library supports direct reads on many processors, including ESP8266. However, it does suffer from contact bounce and double-counting with certain encoder types.

I "borrowed" the PJRC direct read code and implemented a state machine solution that takes care of contact bounce and supports both full and half quadrature-cycle-per-detent encoder types. Thus, no double count issues. Take a look and use anything you find useful: GitHub - gfvalvo/NewEncoder: Rotary Encoder Library
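
To make the state machine idea concrete, here is a minimal table-driven quadrature decoder. It is a generic sketch of the technique and not the NewEncoder code; the class name QuadratureDecoder, the stepsPerDetent parameter and the sign convention are all assumptions made for this illustration. Each sample of A and B is combined with the previous one to index a 16-entry table, so a bounce between two neighbouring states adds +1 and then -1 and cancels out, and a detent is only reported after the full number of valid steps.

```cpp
#include <cstdint>

class QuadratureDecoder {
 public:
  // stepsPerDetent: 4 for full-cycle-per-detent encoders, 2 for half-cycle.
  // a and b are the idle levels (both high with pull-ups).
  explicit QuadratureDecoder(int stepsPerDetent, bool a = true, bool b = true)
      : stepsPerDetent_(stepsPerDetent), state_(pack(a, b)) {}

  // Feed one sample of A and B taken together.
  // Returns -1 or +1 when a full detent has been completed, otherwise 0.
  int update(bool a, bool b) {
    // Indexed by (previous state << 2) | new state. Valid single-bit
    // transitions give +1 or -1; no change or a two-bit jump gives 0.
    static const int8_t kTable[16] = {0, -1, 1, 0, 1, 0, 0, -1,
                                      -1, 0, 0, 1, 0, 1, -1, 0};
    const uint8_t next = pack(a, b);
    accum_ += kTable[(state_ << 2) | next];
    state_ = next;
    if (accum_ >= stepsPerDetent_) { accum_ = 0; return 1; }
    if (accum_ <= -stepsPerDetent_) { accum_ = 0; return -1; }
    return 0;
  }

 private:
  static uint8_t pack(bool a, bool b) { return (a ? 2 : 0) | (b ? 1 : 0); }

  int stepsPerDetent_;
  uint8_t state_;
  int accum_ = 0;
};
```

Which rotation direction counts as positive depends on how A and B are wired, so the sign may need to be flipped. A production library would also resynchronise at the detent position, which this sketch leaves out.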

gfvalvo,
That code is really more of an indirect port i/o read than a direct port i/o read, as it uses a port address and bit mask pre-calculated at run time rather than addresses and bits known at compile time.
On the AVR, using direct i/o where the port address and bit are hard-coded constants at compile time makes a big difference, as the compiler will use special AVR instructions to test the bits. It is a workaround for h/w limitations of the AVR processor.
That optimization is not possible with indirect port i/o since the address and bit position are not known at compile time.
I.e. on an AVR you can get single-instruction i/o if these are known constants at compile time. With indirect port i/o, the code has to fetch the address of the port, read the port, fetch the bitmask, AND the bitmask with the value read, and then test the result of the AND operation.
While indirect port i/o is still much faster than digitalRead() with the Arduino supplied AVR core, due to its poor implementation, it is many times slower than true raw port i/o on the AVR.
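
The distinction between the two styles can be sketched in portable C++ as follows. This is only an illustration: fakePort is a hypothetical stand-in for a memory-mapped input register, and readDirect, IndirectPin and makePin are names invented here. Both variants return the same result; they differ only in whether the compiler knows the register and bit at compile time, and the sketch makes no claim about how much faster either one is on a given chip.

```cpp
#include <cstdint>

// Hypothetical stand-in for a memory-mapped input register.
volatile uint32_t fakePort = 0;

// "Direct" read: the register and the bit number are compile-time constants,
// so the compiler is free to pick the cheapest instruction sequence.
template <uint32_t Bit>
inline bool readDirect() {
  return (fakePort & (1u << Bit)) != 0;
}

// "Indirect" read: the register pointer and mask are worked out once at run
// time and then fetched from memory on every read.
struct IndirectPin {
  volatile uint32_t *reg;
  uint32_t mask;
  bool read() const { return (*reg & mask) != 0; }
};

inline IndirectPin makePin(volatile uint32_t *reg, uint8_t bit) {
  return IndirectPin{reg, 1u << bit};
}
```

A library that has to run on many processors tends to use the second form, because the same read() works regardless of how pins map to registers on each target.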

I'm curious: have you done any benchmarking or logic analyzer traces on the ESP parts to see how much faster, if at all, this indirect port i/o is than the supplied digitalRead()/digitalWrite() functions?
The ESP part is very different from the AVR and some other processors in that it only has a single IO port for all the pins.
This, combined with the ESP not needing pin mapping tables, makes the Arduino digitalRead()/digitalWrite() API work much more efficiently on the ESP part than on the AVR and on some other processors, which have to do lookups to map an Arduino pin # to a register address and bit mask before they can do the i/o read/write operation.

--- bill

The use of the pre-calculated pointer and mask makes sense in the context of this application (even for ESP8266) because it parallels the technique required for the code to run on other processors.

Thank you gfvalvo, really nice solution. When I tried an admittedly less intricate solution of my own, using a delay for debouncing, I ended up playing around with delays of 1-5 ms with no perfect result from any of them. 2 ms was fine unless I turned the encoder faster. But I'll give yours a try later.

bperrybap, I'm not really looking for speed. I'm mostly concerned about the value of pinB changing between the time I read pinA and the time I read pinB as my program gets heavier when I add WiFi and MQTT. I wanted to read both values simultaneously.
On the speed side of things, though, I can note that my encoder results aren't consistent. As rotation speed increases I do see increments of 1 instead of 2. I hide them well with divisions and rounding, but I still don't like unknowns. Of course I have no benchmarks to prove direct pin reads are faster than digital reads on the ESP8266; I can barely find information online about the process itself, let alone speed tests.

Thank you all for the help