Need help with a sketch for 16 inputs with two 74HC165 shift registers

Hello all, I found the sketch below online. It reads 8 inputs from a 165 and outputs them to a 595 to light an LED corresponding to each input. It works perfectly for 8 inputs and outputs. Now I am trying to daisy chain a second 165 for 16 inputs, and a second daisy-chained 595 for 16 outputs, but I can still only get the first 8 inputs and outputs to work. I'm stuck on what needs to be changed in this sketch to make it work, or whether it will work at all. Thanks for the help.

int pin_button_clock = 12;
int pin_button_apl = 13; // 165 SH/LD (asynchronous parallel load)
int pin_button_data = 11;
int pin_button_latch = pin_button_apl;

int pin_led_clock = 2;
int pin_led_latch = 3;
int pin_led_data = 4;

byte my_bit;
byte button_byte;

void setup() {
  Serial.begin(9600); // start serial

  pinMode(pin_button_latch, OUTPUT);
  pinMode(pin_button_clock, OUTPUT);
  pinMode(pin_button_data, INPUT);

  pinMode(pin_led_data, OUTPUT);
  pinMode(pin_led_latch, OUTPUT);
  pinMode(pin_led_clock, OUTPUT);
}

void loop() {
  set_led_states(get_button_states()); // read the 165, copy the result to the 595
}

byte get_button_states() {
  pulse_pin(pin_button_latch); // sample the button states
  button_byte = 0; // clear the byte ready for new data
  for (int n = 0; n < 8; n++) {
    button_byte = button_byte << 1; // shift bits to the left, making space to capture the state of the next button
    button_byte = clear_lsb(button_byte); // make sure the new LSB is zero (<< already shifts in a 0, so this is only a safeguard)
    my_bit = digitalRead(pin_button_data); // store the current Q7 value from the button register in my_bit
    button_byte = button_byte | my_bit; // OR the new bit into the LSB
    pulse_pin(pin_button_clock); // cue the next bit slot for reading
  }
  Serial.println(button_byte, BIN);
  return button_byte;
}

void set_led_states(byte button_states) {
  for (int n = 0; n < 8; n++) {
    if ((button_states & 1) != 0) digitalWrite(pin_led_data, HIGH); // if the LSB is 1, turn the 'current' LED on
    else digitalWrite(pin_led_data, LOW);
    button_states = button_states >> 1; // shift to get a new LSB from button_states
    pulse_pin(pin_led_clock); // address the next bit slot
  }
  pulse_pin(pin_led_latch); // when the latch goes from low to high, the data stored in the register's memory is sent to its output pins
}

// Set a pin to low, then high
void pulse_pin(int pin_number) {
  digitalWrite(pin_number, LOW);
  digitalWrite(pin_number, HIGH);
}

// Clear the least significant bit
byte clear_lsb(byte byte_to_clear) {
  return (byte_to_clear & 0xfe); // AND the byte with 11111110 to be sure that the LSB is zero
}

Both the “get” and “send” routines have this loop at their heart:
for (int n=0; n<8; n++)

Clearly, that won’t work for 16 inputs and outputs.

I tried changing those numbers to 16 as well.
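Changing the loop counts to 16 is not enough on its own: button_byte and the parameter of set_led_states are byte, which holds only 8 bits, so the first 8 bits read are shifted off the top and lost. A minimal sketch, assuming the variables are widened to uint16_t and the chips are chained the usual way (the second 165's QH into the first 165's SER, the first 595's QH' into the second 595's SER, clocks and latches shared). So the logic can be checked on a PC, the top part is a hypothetical host-only stand-in: the digitalRead, digitalWrite, HIGH and LOW defined there are fakes, and the sim_* variables and self_test are made-up names, not part of the Arduino core.

```cpp
#include <cassert>
#include <cstdint>

// ---- Host-only stand-ins (hypothetical), NOT the Arduino core ----
// They simulate two chained 74HC165s and two chained 74HC595s so the
// routines below can be checked on a PC. Drop this part in a real sketch.
const int LOW = 0, HIGH = 1;
const int pin_button_clock = 12, pin_button_latch = 13, pin_button_data = 11;
const int pin_led_clock = 2, pin_led_latch = 3, pin_led_data = 4;

uint16_t sim_inputs = 0;  // bits 15..8: 165 wired to the data pin, 7..0: the one behind it
uint16_t sim_165 = 0;     // 165 chain contents; bit 15 is what Q7 shows
uint16_t sim_595 = 0;     // 595 stages; bit 0 = QA of the 595 wired to the Arduino,
                          // bit 15 = QH of the far 595
uint16_t sim_leds = 0;    // latched 595 outputs, same layout as sim_595
int sim_led_data = LOW;   // last level written to the 595 data pin

int digitalRead(int pin) {
  return pin == pin_button_data ? (sim_165 >> 15) & 1 : LOW;
}

// Every HIGH write is treated as a rising edge, which holds for pulse_pin().
void digitalWrite(int pin, int level) {
  if (pin == pin_led_data) { sim_led_data = level; return; }
  if (level == LOW) {
    if (pin == pin_button_latch) sim_165 = sim_inputs;  // SH/LD low: load inputs
    return;
  }
  if (pin == pin_button_clock) sim_165 = uint16_t(sim_165 << 1);
  if (pin == pin_led_clock) sim_595 = uint16_t((sim_595 << 1) | sim_led_data);
  if (pin == pin_led_latch) sim_leds = sim_595;  // copy stages to outputs
}

// ---- The sketch routines, widened from byte to uint16_t ----
void pulse_pin(int pin_number) {  // set a pin low, then high
  digitalWrite(pin_number, LOW);
  digitalWrite(pin_number, HIGH);
}

uint16_t get_button_states() {
  pulse_pin(pin_button_latch);       // load all 16 inputs at once
  uint16_t states = 0;               // 16 bits, so nothing falls off the top
  for (int n = 0; n < 16; n++) {
    states = (states << 1) | digitalRead(pin_button_data);  // append Q7
    pulse_pin(pin_button_clock);     // bring the next bit to Q7
  }
  return states;
}

void set_led_states(uint16_t states) {
  for (int n = 0; n < 16; n++) {
    digitalWrite(pin_led_data, (states & 1) ? HIGH : LOW);  // LSB first
    states >>= 1;
    pulse_pin(pin_led_clock);        // shift it into the 595 chain
  }
  pulse_pin(pin_led_latch);          // update all 16 outputs together
}

// Host-only check: all 16 bits survive, and the first bit sent ends up
// on the far 595's QH (bit 15 of sim_leds).
void self_test() {
  sim_inputs = 0xA5C3;
  assert(get_button_states() == 0xA5C3);
  set_led_states(0x0001);
  assert(sim_leds == 0x8000);
}
```

On the Arduino, delete the stand-in section, keep the pin variables and pinMode calls from the original sketch, and call set_led_states(get_button_states()) from loop(); Serial.println also accepts the 16-bit value. Because bits are still sent LSB first, the 165 wired to the Arduino keeps driving the 595 wired to the Arduino, and the second 165 drives the second 595.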

Wow, that's a lot of code to do a simple thing.
Try something simpler.
Use pin 9 to latch the 165 data
Use pin 10 to latch the 595 data
Connect SCK (D13) to all clock lines
Connect MISO (D12) to the 165 data out
Connect MOSI (D11) to the 595 data in

At the top of the sketch:
#include <SPI.h>

In setup():
SPI.begin(); // default speed 4 MHz
pinMode(9, OUTPUT); // 165 latch (SH/LD)
pinMode(10, OUTPUT); // 595 latch
digitalWrite(9, HIGH);
digitalWrite(10, HIGH);

In loop():
digitalWrite(9, LOW);
digitalWrite(9, HIGH); // capture 165 inputs
byte byte1 = SPI.transfer(0); // read in 165 #2 data
byte byte2 = SPI.transfer(0); // read in 165 #1 data
SPI.transfer(byte1); // send data to 595 #1
SPI.transfer(byte2); // send data to 595 #2
digitalWrite(10, LOW);
digitalWrite(10, HIGH); // latch to output of 595s
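To see where each byte ends up with this wiring, here is a host-side model. SpiRig, transfer and loop_once are hypothetical names, not the Arduino SPI library; the model assumes SPI.transfer() sends and receives MSB first (the library default), that both chips shift on the rising clock edge, and that the 165's output is sampled just before it shifts.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model (hypothetical, not the Arduino SPI library) of the wiring
// above: SCK clocks both chains, MISO reads the 165s, MOSI feeds the 595s.
struct SpiRig {
  uint16_t chain165 = 0;  // bits 15..8: 165 wired to MISO, 7..0: the one behind it
  uint16_t chain595 = 0;  // bits 7..0: 595 wired to MOSI (bit 0 = its QA),
                          // bits 15..8: the 595 after it (bit 15 = its QH)
  uint16_t outputs = 0;   // latched 595 outputs, same layout as chain595

  // One SPI.transfer(): 8 clocks, MSB first, sending and receiving at once.
  uint8_t transfer(uint8_t out) {
    uint8_t in = 0;
    for (int bit = 7; bit >= 0; --bit) {
      in = uint8_t((in << 1) | ((chain165 >> 15) & 1));          // sample MISO
      chain165 = uint16_t(chain165 << 1);                        // 165s shift
      chain595 = uint16_t((chain595 << 1) | ((out >> bit) & 1)); // 595s take MOSI
    }
    return in;
  }
};

// One pass of the loop() above, run against the model.
uint16_t loop_once(SpiRig& rig, uint16_t inputs) {
  rig.chain165 = inputs;            // pin 9 LOW then HIGH: parallel load
  uint8_t byte1 = rig.transfer(0);  // first byte: the 165 wired to MISO
  uint8_t byte2 = rig.transfer(0);  // second byte: the 165 behind it
  rig.transfer(byte1);              // pushed on through to the far 595
  rig.transfer(byte2);              // stays in the 595 wired to MOSI
  rig.outputs = rig.chain595;       // pin 10 LOW then HIGH: latch
  return rig.outputs;
}
```

In this model, the byte from the 165 wired to MISO lands on the far 595, because the first byte sent is pushed through the near 595 into the next one, while bit order within each chip is preserved (the 165's H input maps to the 595's QH). If the two groups of LEDs come out swapped on real hardware, exchanging the two SPI.transfer(byteN) lines should be enough.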

Be sure to have a 0.1uF cap from each shift register's VCC pin to Gnd.