About the bypass caps, do you think they may be the source of the issue?
Yes, along with the lack of buffering for the clock and latch pins.
So it's not easy to add that many caps on them
Yes, it is. If you add surface-mount ceramic caps to the pins on the underside of the board, you can put one on each chip, which is the recommended number.
Also, you must have a 1.0uF bypass cap for each chip,
I would use 0.1uF ceramic capacitors.