Debouncing many switches used for MCU input via a shift register

If you are using a Teensy, it has a 32-bit MCU, so you can do your bitwise maths in 32-bit chunks: one bit per switch, 32 switches per operation.
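
As a rough sketch of what that could look like, the C below debounces up to 32 switches at once by keeping the last few raw words read from the shift register chain. A bit changes state only when every stored sample agrees on it. The names `debouncer_t`, `debounce_update` and `DEBOUNCE_SAMPLES` are made up for illustration, and the sample count of 4 is an assumption you would tune to your switches and polling rate.

```c
#include <stdint.h>

/* Hypothetical: number of consecutive samples that must agree. */
#define DEBOUNCE_SAMPLES 4

typedef struct {
    uint32_t history[DEBOUNCE_SAMPLES]; /* last raw reads, one bit per switch */
    uint32_t state;                     /* current debounced state */
    unsigned index;                     /* next history slot to overwrite */
} debouncer_t;

/* Feed one raw 32-bit read; returns the debounced state of all 32 switches. */
uint32_t debounce_update(debouncer_t *d, uint32_t raw)
{
    d->history[d->index] = raw;
    d->index = (d->index + 1u) % DEBOUNCE_SAMPLES;

    uint32_t all_high = UINT32_MAX; /* bits that were 1 in every sample */
    uint32_t any_high = 0;          /* bits that were 1 in at least one sample */
    for (unsigned i = 0; i < DEBOUNCE_SAMPLES; i++) {
        all_high &= d->history[i];
        any_high |= d->history[i];
    }

    /* Set bits that are stably 1, clear bits that are stably 0,
       and leave bits that are still bouncing unchanged. */
    d->state = (d->state | all_high) & any_high;
    return d->state;
}
```

You would zero-initialise one `debouncer_t`, then call `debounce_update` at a fixed interval (for example from a timer tick) with whatever word you clocked in from the shift register. The loop cost does not depend on how many switches are wired up, which is the point of working in 32-bit chunks.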