PCA9564 parallel bus to i2c-bus controller

An Arduino with 8 cascaded MCP23017 chips is okay as the lowest level. However, I think that shift registers are better than I2C devices.
The most commonly used shift registers are the '595' (74HC595) for shifting out: https://www.arduino.cc/en/Tutorial/ShiftOut and the '165' (74HC165) for shifting in: Arduino Playground - ShiftRegSN74HC165N.
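To make the bit order concrete, here is a small PC-side C++ model of what Arduino's shiftOut() and shiftIn() do on the data pin. It is only a sketch: the "wire" is a vector of levels, one per clock pulse, standing in for digitalWrite()/digitalRead(), and the names shiftOutBits, shiftInBits and BitOrder are made up for this example. On real hardware you also pulse the latch pin of the 595 after shifting out, and the SH/LD pin of the 165 before shifting in.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Arduino's MSBFIRST / LSBFIRST constants.
enum class BitOrder { MsbFirst, LsbFirst };

// Data-pin level presented before each of the 8 clock pulses,
// i.e. what shiftOut() writes to the data pin, in time order.
std::vector<bool> shiftOutBits(uint8_t value, BitOrder order) {
  std::vector<bool> wire;
  for (int i = 0; i < 8; ++i) {
    int bit = (order == BitOrder::MsbFirst) ? 7 - i : i;
    wire.push_back((value >> bit) & 1);
  }
  return wire;
}

// Rebuild a byte from 8 sampled data-pin levels, like shiftIn() does.
uint8_t shiftInBits(const std::vector<bool>& wire, BitOrder order) {
  assert(wire.size() == 8);  // exactly one byte's worth of clock pulses
  uint8_t value = 0;
  for (int i = 0; i < 8; ++i) {
    int bit = (order == BitOrder::MsbFirst) ? 7 - i : i;
    if (wire[i]) value |= static_cast<uint8_t>(1u << bit);
  }
  return value;
}
```

For example, shiftOutBits(0x80, BitOrder::MsbFirst) puts a 1 on the data pin for the first clock and a 0 for the other seven; with LsbFirst the 1 comes last. Sender and receiver only have to agree on the bit order.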

Sometimes a wireless option is chosen, even when the modules are only centimeters apart from each other. In that case I would not use wireless.

It is possible to build a circular chain (a ring) with serial ports: every Arduino reads its RX and sends the data on to its TX, meanwhile picking out the messages that are meant for that Arduino board.
The disadvantage is that a software layer is needed, and if a wire breaks, none of the modules after the break can be reached.
The advantage is the strong high/low signal, which is very reliable, and that adding more modules is easy.
This should only be done with a hardware serial port. SoftwareSerial would introduce too many problems; for example, it cannot receive while it is transmitting, and a forwarding board has to do exactly that.
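The filtering in such a ring can be very simple. The sketch below assumes a made-up 3-byte frame (destination address, hop counter, payload); the frame format, the hop counter and the names handleFrame and sendAroundRing are my assumptions, not a standard. Each board either keeps a frame addressed to it or passes it on, and the hop counter makes sure that a frame for a missing address is eventually dropped instead of circling forever. The ring itself is simulated on a PC.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical 3-byte frame for a ring of boards wired TX -> RX -> TX ...
struct Frame {
  uint8_t dest;     // address of the board that should consume the frame
  uint8_t ttl;      // hops left, so a frame for a missing address dies out
  uint8_t payload;  // the actual data, e.g. a command or a sensor value
};

enum class Action { Deliver, Forward, Drop };

// What one board does with a frame it has read from its RX pin.
// On a real board, Forward means writing the modified frame to TX.
Action handleFrame(uint8_t myAddr, Frame& f) {
  if (f.dest == myAddr) return Action::Deliver;
  if (f.ttl == 0) return Action::Drop;
  --f.ttl;
  return Action::Forward;
}

// Simulate a ring of nodeCount boards (addresses 0 .. nodeCount-1). The frame
// is sent by board `start`, so the next board on the ring sees it first.
// Returns the address of the board that consumed it, if any.
std::optional<uint8_t> sendAroundRing(uint8_t nodeCount, uint8_t start, Frame f) {
  assert(nodeCount > 0);
  uint8_t node = start;
  while (true) {
    node = static_cast<uint8_t>((node + 1) % nodeCount);  // next board's RX
    switch (handleFrame(node, f)) {
      case Action::Deliver: return node;
      case Action::Drop:    return std::nullopt;
      case Action::Forward: break;  // keep going around the ring
    }
  }
}
```

With 8 boards, a frame sent by board 0 to board 5 passes boards 1 to 4 and is consumed by board 5. A frame for a non-existing address 20 with a hop counter of 8 is dropped after one full round.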

With diodes or extra logic it is possible to have a single shared serial bus instead of a chain-linked ring.

RS-485 is an industrial standard for multiple nodes on a serial bus. Because it is a good standard, it is also used when the distances are very short.

Adding extra hardware generally makes a project less reliable. But since you are thinking about 64 modules (that is a lot), an extra layer in between could make it more reliable.
For example, a few Arduino Mega 2560 boards in the middle, each controlling a group of the lower-level Arduino boards.

In my opinion, the best option is probably a single Arduino board and many modules built from ordinary logic chips (shift registers). Every module can have many shift registers, for example 32 '165 chips for 256 input pins (8 inputs per chip). The communication uses only clock, data and latch signals, like an SPI bus. The clock speed has to be lowered to make it reliable over the wires between the modules.
I think that modules with ordinary logic chips are the most reliable option. Their outputs drive the signals actively high and low, which is better and faster than the open-drain I2C signals, which rely on pull-up resistors for the high level.
Since the Arduino Mega 2560 and Arduino Due have many pins, it is possible to run several separate buses.
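Reading one chain of 32 '165 chips (256 inputs) is just "latch once, then clock out all bits". The C++ model below simulates such a daisy chain so the logic can be checked on a PC. Chain165, readAllInputs and the wiring are hypothetical: I assume chip 0's QH goes to the Arduino and each chip's serial input comes from the QH of the next chip down the chain. In a real sketch, loadPulse() and clockPulse() would be digitalWrite() pulses on the SH/LD and CLK pins, dataOut() a digitalRead() on QH, and a small delay between the clock edges is where the clock gets lowered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kChips = 32;           // example from the text
constexpr std::size_t kInputs = kChips * 8;  // 256 input pins

// Minimal model of a chain of 74HC165 parallel-in / serial-out registers.
struct Chain165 {
  std::array<uint8_t, kChips> pins{};     // per chip: bit 7 = input H ... bit 0 = input A
  std::array<uint8_t, kChips> latched{};  // the chips' internal shift registers

  // SH/LD pulsed low: every chip copies its input pins into its register.
  void loadPulse() { latched = pins; }

  // QH of chip 0, the only output the Arduino sees.
  bool dataOut() const { return latched[0] & 0x80; }

  // One rising clock edge: every chip shifts toward QH, and each chip's
  // serial input takes the old QH bit of the next chip (the last one gets 0).
  void clockPulse() {
    for (std::size_t c = 0; c < kChips; ++c) {
      uint8_t fromNext = (c + 1 < kChips) ? (latched[c + 1] >> 7) : 0;
      latched[c] = static_cast<uint8_t>((latched[c] << 1) | fromNext);
    }
  }
};

// Latch once, then read all 256 bits MSB-first into 32 bytes.
std::array<uint8_t, kChips> readAllInputs(Chain165& chain) {
  std::array<uint8_t, kChips> result{};
  chain.loadPulse();
  for (std::size_t i = 0; i < kInputs; ++i) {
    if (chain.dataOut()) result[i / 8] |= static_cast<uint8_t>(0x80 >> (i % 8));
    chain.clockPulse();  // on hardware: CLK high, short delay, CLK low
  }
  return result;
}
```

After a call, byte n of the result holds the eight inputs of chip n, with input H in the most significant bit. The loop reads QH before the first clock pulse, because the 165 presents its first bit right after the parallel load.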

The more I think about it, the more I prefer a single Arduino board. When you have 64 Arduino boards and need to change the communication protocol, you have to reprogram all of them, which is a lot of work. I also dislike the I2C bus for this more and more. What if you finish your project and it turns out to be too slow?
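A rough estimate helps to answer the "too slow" question before building anything. The functions below count only bus clocks and ignore software overhead (on an AVR board, digitalWrite() and the Wire library add a lot). The I2C numbers assume one sequential read of GPIOA and GPIOB per MCP23017: five bytes of 9 clocks each (8 bits plus ACK) and about 3 extra bit-times for start, repeated start and stop. These assumptions and the function names are mine, made up for this example.

```cpp
#include <cassert>

// Bus time in milliseconds for `clocks` clock periods at `hz`.
// Software overhead is deliberately ignored.
double busTimeMs(double clocks, double hz) {
  assert(hz > 0);
  return clocks / hz * 1000.0;
}

// Shift-register chain: one clock per input bit, no addressing.
double shiftChainMs(int inputs, double hz) { return busTimeMs(inputs, hz); }

// Assumed I2C read of one MCP23017 (16 inputs): address+W, register,
// repeated start, address+R, GPIOA, GPIOB = 5 bytes of 9 clocks,
// plus roughly 3 bit-times for start / repeated start / stop.
double mcp23017ReadMs(int chips, double hz) {
  const double clocksPerChip = 5 * 9 + 3;
  return busTimeMs(chips * clocksPerChip, hz);
}
```

At 100 kHz (the default I2C clock of the Wire library), reading 8 MCP23017 chips (128 inputs) takes about 3.84 ms of bus time, while clocking 128 bits through shift registers at the same 100 kHz takes 1.28 ms, because the chain needs no address or register bytes.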