You have much to learn, Grasshopper, if you want to play at this level.
I can say fairly confidently that the code you have will not work.
Not only is it not sending the proper initialization sequence to get the LCD into 8 bit mode and then into 4 bit mode, but it will also violate the hd44780 inter-instruction timings.
For example, when using only 4 data pins, the first few instructions sent to the LCD are sent as a single nibble, not two, and your code has no delay between instructions.
The majority of what you will need to focus on is learning the hd44780 interface, its instructions and how to initialize it.
Interfacing to the LCD using a serial register is the easier part.
You need to spend some serious time with the hd44780 datasheet to understand the various h/w and instruction timings as well as the proper initialization sequence.
https://www.sparkfun.com/datasheets/LCD/HD44780.pdf
You will need to understand all the various timings.
There are timings at multiple levels that must be honored.
There are no guard rails at this level. You can't just slam nibbles at the LCD, because the code will do things too fast and violate those timing requirements, which means it won't work.
There are timings for the AVR instructions - that will be in the AVR datasheet.
You will need to look this up to know how fast you will be flipping pins.
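As a rough illustration of that arithmetic, here is a minimal sketch that converts AVR clock cycles to nanoseconds and back. The 16 MHz clock is an assumption (a typical Uno), and the function names are made up for this example. The 230 ns and 500 ns figures mentioned below are what I believe the hd44780 datasheet lists for PWEH and tcycE at 5V, so check them against the timing table that goes with figure 25.

```cpp
#include <cstdint>

// Assumed CPU clock: 16 MHz. Change this to match your board.
const double kCpuHz = 16000000.0;

// Duration of `cycles` CPU cycles, in nanoseconds.
double cyclesToNs(uint32_t cycles) {
    return cycles * 1e9 / kCpuHz;
}

// Smallest whole number of CPU cycles that lasts at least `ns` nanoseconds.
uint32_t cyclesForNs(double ns) {
    double exact = ns * kCpuHz / 1e9;
    uint32_t whole = static_cast<uint32_t>(exact);  // truncates toward zero
    return (whole < exact) ? whole + 1 : whole;     // round up if needed
}
```

At 16 MHz one cycle is 62.5 ns, so a 230 ns minimum needs 4 cycles and a 500 ns minimum needs 8. An SBI immediately followed by a CBI on a classic AVR is only 2 cycles (125 ns) of high time, which is the kind of thing the assembler output will show you.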
There are timings for the 595 - that will be in that datasheet.
There are low level BUS timings for the hd44780 interface - that will be in figure 25 of the hd44780 datasheet.
In particular, pay attention to things like tAS, PWEH, and tcycE.
There are 4 bit mode instruction sequences and timing - that is in figure 24 on page 46.
And then there are timings for instructions once the LCD is initialized.
Those are in table 6 on page 24.
If you send an instruction to the LCD before the previous one has completed, it will be lost. Even worse, if you are in 4 bit mode and the timing is just right, you might lose just the first nibble. Now the host and the LCD are out of nibble sync and will never recover until the LCD is re-initialized.
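To make the nibble sync problem concrete, here is a toy model that runs on a PC. It is not how the hd44780 is implemented internally; it only assumes that in 4 bit mode the LCD pairs up nibbles in the order they arrive. The names `NibbleReceiver` and `receive` are made up for this example.

```cpp
#include <cstdint>
#include <vector>

// Toy model of the LCD side in 4 bit mode: nibbles are paired in arrival order.
struct NibbleReceiver {
    bool haveHigh = false;        // true while waiting for the low nibble
    uint8_t high = 0;
    std::vector<uint8_t> bytes;   // bytes the "LCD" believes it received

    void clock(uint8_t nibble) {
        nibble &= 0x0F;
        if (!haveHigh) {
            high = nibble;
            haveHigh = true;
        } else {
            bytes.push_back(static_cast<uint8_t>((high << 4) | nibble));
            haveHigh = false;
        }
    }
};

// Host side: send each byte high nibble first. The nibble at position
// `lostNibble` (0-based, counted across the whole stream) never reaches
// the LCD; pass -1 to lose nothing.
std::vector<uint8_t> receive(const std::vector<uint8_t>& data, int lostNibble) {
    NibbleReceiver lcd;
    int index = 0;
    for (uint8_t byte : data) {
        const uint8_t nibbles[2] = { static_cast<uint8_t>(byte >> 4),
                                     static_cast<uint8_t>(byte & 0x0F) };
        for (uint8_t n : nibbles) {
            if (index++ != lostNibble) lcd.clock(n);
        }
    }
    return lcd.bytes;
}
```

Sending 0x48, 0x69 with nothing lost gives back 0x48, 0x69. Lose the very first nibble and the model sees 0x86 with a 0x9 left dangling, and every byte after that is paired wrong as well.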
Until you get the LCD to initialize it doesn't make sense to try to print anything.
I.e. once you can initialize and clear the LCD, you can then start to try to print characters to it.
From my looking, the code you are currently using will violate the hd44780 LCD instruction timing and likely some of the BUS timing.
IMO, if you want to play at this low of a level you need appropriate tools.
My suggestion is to get a logic analyzer - you can use this to verify all your signals.
You can also use the logic analyzer to look at the timings of all the signals going to the shift register and to the LCD to make sure you are not violating any of the various timings - which is easy to accidentally do when working at this level - and to see if the nibbles are what you are expecting them to be.
You can pick up inexpensive USB based analyzers for under $20 USD.
You should also be looking at the assembler output from the compiler to see what is really happening.
It's all part of playing at the bare metal level.
I have written several shift register libraries for controlling a hd44780 display.
BTW, with some additional components it can be done with fewer pins, including using only a single pin.
More commonly it is done with a resistor and a diode as the additional components and controlled with only two pins.
I will also say that the 595 can be problematic in that it is quite sensitive to noise.
I.e. if you don't have good clean power, good grounds, and proper decoupling, it will never work because you will get phantom clocking happening.
In your schematic there is no decoupling, so it may have issues and, depending on wire lengths, may suffer from ground bounce issues.
Also, if you clock the 595 too fast it can also cause issues.
Having spent around 15 years writing various hd44780 libraries, including for the Arduino, I'm very familiar with the hd44780 and how to make it work on the AVR and in the Arduino environment. (I'm the author of the Arduino hd44780 library and was the author of several of the shift register i/o classes for the NewLiquidCrystal library.)
I've written code to talk to the hd44780 pretty much any way possible.
Bit banging on lots of different processors, through i2c, shift registers, etc., using raw port access or Arduino core library i/o functions.
I was also involved with the AVR libc developers to correct some issues in the util/delay.h routines. This came as a direct result of some of the timing issues I ran into developing the NewLiquidCrystal library code for shift registers like the 595.
In the hd44780 library I actually don't include a shift register i/o class.
I have the code.
I just chose not to include it in the release since it is more difficult to get working than an I2C backpack.
For a while, it was cheaper to DIY a serial backpack, but as of about 8 years ago the i2c backpacks got so cheap that you can't even DIY your own shift register backpack for less than you can buy a PCF8574 based one.
Yeah, a shift register based interface can be faster, particularly if using raw port i/o to bit bang it, but it also comes with some issues. On some of the faster platforms, like the ESP based ones, the bit banging is too fast for the shift register, so you have to insert delays between bit toggles - which makes it slower.
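To show where those delays go, here is a minimal sketch of bit banging a byte into a 595 with a delay after each pin toggle. So that it runs on a PC, the pins are replaced by a small software model of the 595 (shift on the rising edge of the shift clock, copy to the outputs on the rising edge of the latch clock). `Model595`, `bitDelay`, and `shiftOutSlow` are names made up for this example, not code from any of the libraries mentioned, and on real hardware `bitDelay` would be whatever short delay the 595 datasheet timings call for on your platform.

```cpp
#include <cstdint>

// Minimal software stand-in for a 74HC595 so the sketch runs on a PC.
struct Model595 {
    bool ser = false, srclk = false, rclk = false;
    uint8_t shiftReg = 0;   // bit 0 = QA ... bit 7 = QH
    uint8_t outputs = 0;    // what the output pins would show

    void setSer(bool v) { ser = v; }
    void setSrclk(bool v) {
        // Data is shifted in on the rising edge of the shift clock.
        if (v && !srclk) {
            shiftReg = static_cast<uint8_t>((shiftReg << 1) | (ser ? 1 : 0));
        }
        srclk = v;
    }
    void setRclk(bool v) {
        // The shift register is copied to the outputs on the rising edge of the latch.
        if (v && !rclk) outputs = shiftReg;
        rclk = v;
    }
};

// Hypothetical delay hook. Here it only counts calls; on real hardware it
// would burn enough time to satisfy the 595 setup and pulse width timings.
static unsigned long delayCalls = 0;
void bitDelay() { ++delayCalls; }

// Shift one byte out MSB first, pausing after every pin toggle.
void shiftOutSlow(Model595& chip, uint8_t value) {
    for (int bit = 7; bit >= 0; --bit) {
        chip.setSer(((value >> bit) & 1) != 0);
        bitDelay();               // data setup time before the clock edge
        chip.setSrclk(true);
        bitDelay();               // clock high time
        chip.setSrclk(false);
    }
    chip.setRclk(true);           // latch the 8 bits onto the outputs
    bitDelay();
    chip.setRclk(false);
}
```

That is 17 delays per byte in this sketch, which is exactly the overhead that eats into the speed advantage on a fast processor.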
These are some of the reasons why I decided to not include a serial shift register i/o class in the hd44780 library.
Playing at this level requires a very good understanding of the hd44780 chip interface and instructions. It takes quite a bit of time to read through the datasheet to grasp all the various timings and the 4 bit instruction sequence.
It also means spending an enormous amount of time with a logic analyzer to verify that all the signals properly conform to the timing specifications.
Also, from my looking at various authors' hd44780 LCD libraries over the years, the majority do not understand the 4 bit initialization sequence.
It isn't a "magic" sequence. It is simply a sequence of regular instructions in a specific order that will first put the LCD into 8 bit mode and then into 4 bit mode. And this will work regardless of what mode the LCD is in or what nibble state it is in when the sequence is started.
If you want a full explanation of this initialization sequence, you can look at the hd44780.cpp module in the hd44780 library.
See the comments starting around line 240.
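For orientation only, here is a rough sketch of that sequence as the datasheet flowchart lays it out. This is not the code from the hd44780 library. `sendNibble` and `waitMicros` are hypothetical hooks that just record what would happen so the sketch runs on a PC; on real hardware `sendNibble` would put 4 bits on D4-D7 and strobe E within the BUS timings. The delay values are deliberately generous assumptions, so check them against figure 24 and table 6.

```cpp
#include <cstdint>
#include <vector>

// 'N' = nibble written to the LCD, 'D' = delay in microseconds.
struct Event { char kind; uint32_t value; };
static std::vector<Event> trace;

// Hypothetical hooks: they only log, so the order and the waits can be inspected.
void sendNibble(uint8_t nibble) { trace.push_back({'N', static_cast<uint32_t>(nibble & 0x0F)}); }
void waitMicros(uint32_t us)    { trace.push_back({'D', us}); }

// A normal 4 bit mode instruction: high nibble, low nibble, then wait for it to finish.
void sendInstruction(uint8_t value, uint32_t execMicros) {
    sendNibble(static_cast<uint8_t>(value >> 4));
    sendNibble(static_cast<uint8_t>(value & 0x0F));
    waitMicros(execMicros);
}

void initLcd4Bit() {
    waitMicros(50000);                  // power-up wait (assumed generous value)

    // Three single nibbles of 0x3: each is (half of) a "function set, 8 bit"
    // instruction. Whatever mode or nibble state the LCD started in, it is in
    // 8 bit mode after the third one.
    sendNibble(0x3); waitMicros(4500);  // datasheet asks for more than 4.1 ms
    sendNibble(0x3); waitMicros(150);   // datasheet asks for more than 100 us
    sendNibble(0x3); waitMicros(150);

    // Still a single nibble: "function set, 4 bit". From here on, two nibbles each.
    sendNibble(0x2); waitMicros(150);

    sendInstruction(0x28, 50);          // function set: 4 bit, 2 lines, 5x8 font
    sendInstruction(0x08, 50);          // display off
    sendInstruction(0x01, 2000);        // clear display (slow instruction)
    sendInstruction(0x06, 50);          // entry mode: increment, no shift
    sendInstruction(0x0C, 50);          // display on (added here; not in the flowchart)
}
```

Note that the first four writes are single nibbles with a wait after each one, and only after the switch to 4 bit mode does each instruction become two nibbles.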
--- bill