unsigned long as byte container?

Anyone able to see why this goes wrong? Platform is Mega2560.

My attempt to use an unsigned long as a byte container, to serve as an index, fails as if there were a sign-bit problem with uint16_t.

uint8_t keybitmap[4];
uint32_t keyindex = 0;

..fill keybitmap[] with confirmed valid unsigned numbers 1,2,4,8,16,32,64,128

//uint32_t as 4 byte container
keyindex |= keybitmap[3] << 24;
keyindex |= keybitmap[2] << 16;
keyindex |= keybitmap[1] << 8; 
keyindex |= keybitmap[0];

Fails at bit 16

Serial.print("0x");
Serial.println(keyindex,HEX);
//output from a loop moving a single bit through each byte; fails at bit 16:
	0x1
	0x2
	0x4
	0x8
	0x10
	0x20
	0x40
	0x80
	0x100
	0x200
	0x400
	0x800
	0x1000
	0x2000
	0x4000
	0xFFFF8000
	0x0 from here on

Is what you are trying to implement a union?

union
{
  uint8_t bitmap[4];
  uint32_t index;
} key;

key.index = 0;
key.bitmap[3] = 0xC0;

Serial.println(key.index, HEX);
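
For readers following along, here is a small sketch of what the union overlay does to the bytes. It uses memcpy instead of a union so that it is well-defined C++ everywhere; the function names are made up for illustration. It assumes a little-endian target (AVR with avr-gcc is one), where bitmap[3] lands in the most significant byte, so the union example above would print C0000000.

```cpp
#include <stdint.h>
#include <string.h>

// Hypothetical helper: reports whether the lowest-addressed byte of a
// uint32_t is its least significant byte (true on AVR and x86).
bool is_little_endian() {
    const uint32_t probe = 1;
    uint8_t first = 0;
    memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical helper: overlays four bytes onto a uint32_t the same way
// the union does, i.e. by copying raw memory without any shifting.
uint32_t overlay_bytes(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
    const uint8_t bitmap[4] = { b0, b1, b2, b3 };
    uint32_t index = 0;
    memcpy(&index, bitmap, sizeof index);
    return index;
}
```

The overlay costs no shifts at all, but the byte order of the result depends on the target, which matters if the index is ever compared against hand-written constants.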

I don't really know what you are doing. Have you looked at using a union?

Not actually. I'm trying to use as few µC cycles as possible to speed up keypad validation.
Rather than make an indexed table, I thought I could use the four array values to create a unique index.

So you want to scan a keypad as fast as possible?
what size is the keypad?

My problem is more that uint32_t acts as if it were uint16_t.

I have a working keypad scanner that is reasonably fast. It is the validation of multiple keys that gave me speed problems. Hence the need for an index (table).

void scankeys() {
/* read hardware port */
	keyindex = 0;
    for (uint8_t c=0; c<8; c++) {
        DDRL = 1 << c; 
        for (uint8_t r=0; r<4; r++) { 
            BIT_WRITE(keybitmap[r], c, !BIT_READ(PINB, r+4));			
        }
    }
	// create unique index for fast validation
	keyindex |= keybitmap[3] << 24;
	keyindex |= keybitmap[2] << 16;
	keyindex |= keybitmap[1] << 8; 
	keyindex |= keybitmap[0]; 
}
keyindex |= keybitmap[2] << 16;

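You're doing sixteen-bit integer arithmetic: on AVR an int is 16 bits, and the uint8_t operand is promoted to int before the shift.
Any left shift by sixteen or more bits will result in zero (strictly speaking it is undefined), and 0x80 << 8 sets the sign bit, which is why 0x8000 sign-extends to 0xFFFF8000 when it is OR-ed into the uint32_t.

Try casting the byte value to uint32_t before shifting.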
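
A sketch of the cast, for anyone landing here later. The first function is only a desktop model of what the 16-bit int arithmetic produced in the output above (it is not how the compiler is specified to behave for shifts of 16 or more, which are undefined); the second is the packing with the cast applied before each shift. Both function names are invented for this example.

```cpp
#include <stdint.h>

// Model of `byte << n` on a target with 16-bit int, matching the output
// posted above: the result is cut to 16 bits, treated as a signed int,
// then sign-extended when widened to 32 bits.
uint32_t shift_as_16bit_int(uint8_t b, uint8_t n) {
    const uint16_t truncated = static_cast<uint16_t>(static_cast<uint32_t>(b) << n);
    const int16_t as_int = static_cast<int16_t>(truncated);
    return static_cast<uint32_t>(static_cast<int32_t>(as_int));
}

// Packing with the cast done first, so every shift happens in 32 bits.
uint32_t pack_bytes(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0) {
    return (static_cast<uint32_t>(b3) << 24) |
           (static_cast<uint32_t>(b2) << 16) |
           (static_cast<uint32_t>(b1) << 8)  |
            static_cast<uint32_t>(b0);
}
```

In the original sketch this corresponds to writing keyindex |= (uint32_t)keybitmap[1] << 8; and likewise for the other bytes.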

Thank you TolpuddleSartre. That did it.

what is the current timing of the scankeys call ?

Now that I have seen the code, I think its speed can be squeezed a bit more.

robtillaart:
what is the current timing of the scankeys call ?

Now that I have seen the code, I think it can be squeezed a bit more.

Yes please, optimising is welcome. What are you thinking?

StillNotWorking:
Yes please, optimising is welcome. What are you thinking?

What is current timing?

(be back in < 5 minutes)

I have not found a good way to measure it. The main program runs from a CTC timer interrupt.

Loop unrolled and removed some shifts

void scankeys()
{
  DDRL = 0x01;
  for (uint8_t c = 0; c < 8; c++)
  {
    BIT_WRITE(keybitmap[0], c, !BIT_READ(PINB, 4));
    BIT_WRITE(keybitmap[1], c, !BIT_READ(PINB, 5));
    BIT_WRITE(keybitmap[2], c, !BIT_READ(PINB, 6));
    BIT_WRITE(keybitmap[3], c, !BIT_READ(PINB, 7));
    DDRL <<= 1;
  }
  // create unique index for fast validation
  keyindex = keybitmap[3];
  keyindex <<= 8;
  keyindex |= keybitmap[2];
  keyindex <<= 8;
  keyindex |= keybitmap[1];
  keyindex <<= 8;
  keyindex |= keybitmap[0];
}

I think the for (uint8_t c …) loop can be faster when you count from 7 → 0, since a compare with zero is practically free.

But first, time this one.

Do you have the timing of the original already?

StillNotWorking:
I have not found a good way to measure it. The main program runs from a CTC timer interrupt.

uint32_t start = micros();
scankeys();
uint32_t stop = micros();
Serial.println(stop - start);  // elapsed time in microseconds
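
One caveat with a single measurement: on 16 MHz boards micros() has a resolution of 4 microseconds, so a short function is better timed over many calls and averaged. Below is a rough sketch of that idea; average_micros and its parameters are names made up here, and the clock is passed in so the same code also compiles off-target.

```cpp
#include <stdint.h>

// Hypothetical helper: runs fn() `iterations` times and returns the mean
// duration per call in microseconds. `clock_us` stands in for Arduino's
// micros(); any callable returning a 32-bit microsecond count works.
template <typename Fn, typename Clock>
float average_micros(Fn fn, Clock clock_us, uint16_t iterations) {
    const uint32_t start = clock_us();
    for (uint16_t i = 0; i < iterations; ++i) {
        fn();
    }
    const uint32_t stop = clock_us();
    // Unsigned subtraction stays correct across one rollover of the counter.
    return static_cast<float>(stop - start) / iterations;
}
```

On the board this would be called as average_micros(scankeys, micros, 1000). The figure includes loop overhead and any interrupts that fire during the run, so treat it as a comparison tool rather than an exact cycle count.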

...code

Interesting. I will give it a try, run both through avr-objdump, and see how much they differ.

Keep us posted! (Note: longer code can be faster!)

Big thank you! This is insane, to the point that I wondered if I had made a mistake. The test code is running standalone with default settings for the m2560.

My initial code as posted needs 136 micros.
Rob's code takes 96 micros, about 29% less time (roughly 1.4 times as fast). I didn't realize the loop setup and simple math were this costly.

But then I set it up as suggested and changed the loop counter to count down to zero. Now the micros test often ends at 4 micros (it varies between 4 and 8 micros). I'm not smart enough to understand what's happening here???

Running both code snippets one after the other produces identical outputs for the actual port scan, so both appear to work as intended.

void scankeysB(){
  startB = micros();
  
  for (uint8_t c = 7; c = 0; c--)
  {
    BIT_WRITE(keybitmap[3], c, !BIT_READ(PINB, 7));
    BIT_WRITE(keybitmap[2], c, !BIT_READ(PINB, 6));
    BIT_WRITE(keybitmap[1], c, !BIT_READ(PINB, 5));
    BIT_WRITE(keybitmap[0], c, !BIT_READ(PINB, 4));
    DDRL <<= 1;
  }
  // create unique index for fast validation
  keyindex = keybitmap[3];
  keyindex <<= 8;
  keyindex |= keybitmap[2];
  keyindex <<= 8;
  keyindex |= keybitmap[1];
  keyindex <<= 8;
  keyindex |= keybitmap[0];
  stopB = micros();
  Serial.println(keyindex);
}
for (uint8_t c = 7; c = 0; c--)

c = 0 ?
Oh dear, oh dear.
That's always false isn't it?
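
To spell out why the timing collapsed to 4 micros: `c = 0` is an assignment, its value is 0, and 0 is false, so the loop body never runs and only the packing code is being timed. The sketch below counts iterations for the loop as posted and for one common down-counting idiom with an unsigned counter; both function names are invented for the example.

```cpp
#include <stdint.h>

// The loop as posted: the condition assigns 0 to c and then tests it,
// so the body is skipped entirely. The extra parentheses only silence
// the compiler's "assignment used as condition" warning.
uint8_t iterations_with_assignment() {
    uint8_t count = 0;
    for (uint8_t c = 7; (c = 0); c--) {
        count++;
    }
    return count;
}

// One way to count an unsigned variable down to zero: test first, then
// decrement, so the body sees c = 7, 6, ..., 0 and the loop stops there.
uint8_t iterations_counting_down() {
    uint8_t count = 0;
    for (uint8_t c = 8; c-- > 0;) {
        count++;
    }
    return count;
}
```

The first function returns 0 and the second returns 8. Note that `c >= 0` would not work as a condition either, because an unsigned counter can never be negative.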

void scankeysB(){
  startB = micros();
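  DDRL  = 0x80;  // start at column 7 so the driven column matches bit index c
  for (uint8_t c = 7; c !=255; c--)  // c wraps from 0 to 255, which ends the loop
  {
    BIT_WRITE(keybitmap[3], c, !BIT_READ(PINB, 7));
    BIT_WRITE(keybitmap[2], c, !BIT_READ(PINB, 6));
    BIT_WRITE(keybitmap[1], c, !BIT_READ(PINB, 5));
    BIT_WRITE(keybitmap[0], c, !BIT_READ(PINB, 4));
    DDRL >>= 1;  // move to the next lower column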
  }
  // create unique index for fast validation
  keyindex = keybitmap[3];
  keyindex <<= 8;
  keyindex |= keybitmap[2];
  keyindex <<= 8;
  keyindex |= keybitmap[1];
  keyindex <<= 8;
  keyindex |= keybitmap[0];
  stopB = micros();
  Serial.println(keyindex);
}

It might look strange, but give it a try.

Why faster?

Most time is gained by less shifting (exercise: count how many fewer shifts there are per call).

Unrolling the loop gives compile-time fixed indices for the array members,
which removes an address calculation per access.
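
The "less shifting" point can be pushed one step further with a walking mask: AVR has no barrel shifter, so 1 << c with a run-time c costs a small loop, while shifting a mask by one position per iteration is a single step. The sketch below is only an illustration under assumptions: it does not use the BIT_WRITE / BIT_READ macros from this thread (their definitions were not posted), and RowReader and fake_rows are hypothetical stand-ins for driving DDRL and reading PINB.

```cpp
#include <stdint.h>

// Hypothetical stand-in for the hardware: returns what PINB would read
// (rows on bits 4..7, active low) while column `column` is driven.
typedef uint8_t (*RowReader)(uint8_t column);

// Made-up test pattern: in column c the key in row (c & 3) is pressed.
uint8_t fake_rows(uint8_t column) {
    return static_cast<uint8_t>(~(0x10 << (column & 3)));
}

// Straightforward version: computes 1 << c and 1 << (r + 4) at run time.
void scan_with_shifts(RowReader read_rows, uint8_t bitmap[4]) {
    for (uint8_t r = 0; r < 4; r++) bitmap[r] = 0;
    for (uint8_t c = 0; c < 8; c++) {
        const uint8_t pins = read_rows(c);
        for (uint8_t r = 0; r < 4; r++) {
            if (!(pins & (1 << (r + 4)))) bitmap[r] |= static_cast<uint8_t>(1 << c);
        }
    }
}

// Walking-mask version: row masks are constants and the column mask is
// shifted by exactly one position per iteration.
void scan_with_mask(RowReader read_rows, uint8_t bitmap[4]) {
    bitmap[0] = bitmap[1] = bitmap[2] = bitmap[3] = 0;
    uint8_t mask = 0x01;
    for (uint8_t c = 0; c < 8; c++) {
        const uint8_t pins = read_rows(c);
        if (!(pins & 0x10)) bitmap[0] |= mask;
        if (!(pins & 0x20)) bitmap[1] |= mask;
        if (!(pins & 0x40)) bitmap[2] |= mask;
        if (!(pins & 0x80)) bitmap[3] |= mask;
        mask <<= 1;
    }
}

// Convenience check: both versions must build the same four bitmaps.
bool scans_agree(RowReader read_rows) {
    uint8_t a[4], b[4];
    scan_with_shifts(read_rows, a);
    scan_with_mask(read_rows, b);
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3];
}
```

Whether this actually beats the unrolled version on the Mega depends on what the compiler already does with the macros, so it is worth timing and checking with avr-objdump rather than assuming.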