LCD 1602 (and similar) databus sniffer

LCD1602_sniffer_V0_10

I've simplified the LCD1602 and similar (HD44780 protocol) sniffer and made it more robust, handling multiple initialisation sequences, including those which rely on the "Internal Reset" condition of the display on power up as described in the HD44780 data sheet.
The usage instructions are contained in the program. This upgrade has been partly inspired by comments and user experience related in this thread.

The new version is attached here.

LCD1602_sniffer_V0_10.zip (4.9 KB)

A test program is attached here (if required). It uses some direct access methods to write to the LCD, instead of using a library, to test also the HD44780 "internal reset" initialisation method which may be encountered in some systems.

LCD1602_write_V0_05.ino (8.0 KB)

It is a long time ago now, but I have had some time to look at this again and made some changes to my design. There is (at least) one thing I cannot agree with in your comments: an analyser for the bit stream that the host sends to the HD44780 must, as far as I can see, be a stateful activity. It requires 3 states:
(1) 8 bit mode
(2) 4 bit mode and expecting the High nibble
(3) 4 bit mode and expecting the Low nibble.
so it can respond appropriately to the current item on the data bus.
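Those three states can be sketched as a small state machine (an illustrative sketch, not the actual sniffer code; mode changes on a function set would be applied by whatever code interprets the assembled instruction):

```cpp
#include <cassert>
#include <cstdint>

// Sniffer-side state for assembling 8 bit instructions from the bus.
enum class BusState { Mode8, Mode4High, Mode4Low };

struct Assembler {
    BusState state = BusState::Mode8; // HD44780 powers up in 8 bit mode
    uint8_t pending = 0;              // high nibble saved while in Mode4High

    // Feed one bus capture (a full byte in 8 bit mode; in 4 bit mode the
    // data rides on DB7-DB4, i.e. the high nibble of the capture).
    // Returns true and sets 'out' when a complete instruction is available.
    bool feed(uint8_t bus, uint8_t& out) {
        switch (state) {
        case BusState::Mode8:
            out = bus;                       // whole instruction at once
            return true;
        case BusState::Mode4High:
            pending = bus & 0xF0;            // keep the high nibble
            state = BusState::Mode4Low;
            return false;
        case BusState::Mode4Low:
            out = pending | (bus >> 4);      // join the two nibbles
            state = BusState::Mode4High;
            return true;
        }
        return false;
    }
};
```

The state transitions between 8 bit and 4 bit mode themselves are not shown here; they would be driven by the code that recognises a Function Set in the assembled instruction.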

If the current state is (2) and that current nibble is a "Function Set to 8 bit" command, it responds by setting its internal state to 8 bit, that is state (1). If the current state is 8 bit mode and it sees, in the four high order bits, the "Function Set to 4 bit command", it responds by setting the state to state (2) above and thus resetting the nibble synchronisation.

If the current state is (2) and that current nibble is a "Function Set to 8 bit" command, it responds by setting its internal state to 8 bit

Not quite sure what you meant by this
The chip does not respond to an instruction in 4 bit mode until it has received two nibbles. i.e. if in 4 bit mode, and you send the high nibble of a function set to change to 8 bit mode, the chip will not revert to 8 bit mode until another nibble is sent.


I didn't say that there were no states involved.
I said that there was no state information for the "initialization" function sequence used to get into nibble sync, unlike what was being done in the prior code, which attempted to identify a sequence of instructions.

Yes there is state information for 8 bit vs 4 bit mode but, there is no state information for "initialization" or an "Internal Reset" to look for a sequence of instructions to force nibble sync. That isn't how the chip set works.

i.e. many people mistakenly think that the 3 instruction sequence outlined in the datasheet for initialization is being looked for by the chip set, but it isn't, because it isn't necessary and the chip doesn't work that way.
The chip simply processes every function set it sees, uses the data length it sees, and switches modes if the mode specified in a function set instruction is different from its current mode.
If switching from 8 bit to 4 bit mode, it sets the expected nibble to the high nibble.
The point of the 3 instruction function set sequence is that after it completes, the chip and the host are guaranteed to be in sync with each other regardless of what state the chip was in when it started. i.e. the LCD chip could have been in 8 bit mode, 4 bit high nibble, or 4 bit low nibble when the sequence started. It won't matter.
And yes, garbage instructions may be executed depending on the mode/state the chip is in (like if the chip was expecting a low nibble), but again it doesn't matter. What matters is that the host and chip are guaranteed to be in sync after the host sends that sequence of instructions.
The synchronization is due to the careful selection of the bits in the instruction set, combined with the specific bits and reserved bits in the function set command, along with sending the 3 instruction sequence. The chip isn't looking for a sequence. The sync "just happens" by sending that sequence.

The main point I've been trying to say is that hd44780 chip never looks for any type of sequence or order of commands/instructions to get in sync with the host.
All it does is process 8 bit instructions as it sees them, processing each one individually.
The difference between 8 bit vs 4 bit mode is how the chip composes the 8 bit instructions that it processes.

There is state information like:

  • 4 bit vs 8 bit mode
  • in 4 bit mode, high nibble vs low nibble
  • BUSY status
  • DDRAM vs CGRAM mode; set by the Set DDRAM address and Set CGRAM address instructions
  • internal memory address counter
  • display starting address - gets modified by display shift instructions
  • several states that get set / modified based on
    entry mode, display control, and cursor/shift instructions.
    These set things like cursor mode, cursor move direction, blink mode, and pixels on/off.

But for processing/fetching instructions, the only state information involved is
8 bit vs 4 bit and in 4 bit if expecting high or low nibble.
And since processors of this vintage were fairly slow, I'm assuming that the "state" information for 8 bit mode vs 4 bit mode is that the chip must have on-board h/w to build the 8 bit instructions, and that it sets the BUSY status when the 8 bit instruction is available.
i.e. in 8 bit mode the h/w latches all the data pins to build the byte when E is dropped.
in 4 bit mode the h/w grabs 4 bits at a time to build the byte.
The micro-controller could set the h/w for the proper mode every time it sees a function set instruction.
h/w could process E transitions and set the BUSY status when a byte has been built.
This would allow the processor to always process 8 bit instructions and when done, clear the BUSY status. The processor could literally be slow as molasses and never fall behind.

IMO, I would implement it like the chip appears to be doing it, using two layers:
a front end that just gathers and builds 8 bit instructions,
and a back end that processes the 8 bit instructions.
8 bit mode vs 4 bit mode only affects how the 8 bit instructions are built.

In a sniffer, I would use a functional layered approach similar to the chip to break the functionality into clean and simple layers.
The ISR queues stuff fast,
and a function just dequeues stuff to build 8 bit instructions.
i.e. I wouldn't process any instructions in the queuing or dequeuing code.
So in your sniffer code, I wouldn't be looking for function set instructions in
processQueue() as that isn't a clean functional layer separation.
i.e. I would have processQueue() just process the queue to create the 8 bit instructions based on being in 8 bit or 4 bit mode.
Then, higher level code can look at each 8 bit instruction and determine what to do.
In the case of a function set, the higher level code could tell processQueue() how to process queued data (8 bit vs 4 bit) by say setting some variable(s).

It keeps the code clean and functionally layered.
i.e. each layer effectively does its one thing.

ISR just queues information on each falling of E

processQueue() just processes the queue to build 8 bit instructions.
It does have to know about being in 8 bit or 4 bit mode but does not have to process any instructions. i.e. it defaults to 8 bit mode, but the mode (8 bit vs 4 bit) is controlled/set by other code that processes the instructions.

Higher level code processes the 8 bit instructions by always looking at 8 bit instructions with no concern whether 4 bit mode or 8 bit mode is being used.
It can choose what instructions to process and how to process them.
In the case of a function set, it tells processQueue() what mode (8 bit or 4 bit) to start using to build the 8 bit instructions.
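A hedged sketch of that layering (the function names follow the discussion above; the queue handling is simplified — a real ISR would capture the port registers into a lock-free ring buffer rather than a std::deque):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

std::deque<uint8_t> q;      // Layer 1: the ISR would push raw E-strobe captures here

bool fourBitMode = false;   // set by the instruction layer, not by processQueue()
bool expectHigh = true;
uint8_t highNibble = 0;

// Layer 2: pop captures until one complete 8 bit instruction is built.
// It knows about 8 bit vs 4 bit mode but interprets nothing.
// Returns false if the queue ran dry first.
bool processQueue(uint8_t& ins) {
    while (!q.empty()) {
        uint8_t bus = q.front();
        q.pop_front();
        if (!fourBitMode) { ins = bus; return true; }
        if (expectHigh) { highNibble = bus & 0xF0; expectHigh = false; }
        else { ins = highNibble | (bus >> 4); expectHigh = true; return true; }
    }
    return false;
}

// Layer 3: interpret each instruction; on a function set it tells layer 2
// which mode to use for subsequent captures.
void handleInstruction(uint8_t ins) {
    if ((ins & 0xE0) == 0x20) {              // function set: 0b001 DL N F * *
        fourBitMode = (ins & 0x10) == 0;     // DL=0 means 4 bit interface
        expectHigh = true;                   // resync to the high nibble
    }
    // ... decode and print other instructions here ...
}
```

The important property is that `processQueue()` never looks inside an instruction; the mode variable it consults is owned by the layer above.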

--- bill

OK. I don't think there is much more to be said on the stateful activity of joining two 4 bit nibbles. It is true that my previous code looked for the complete 3 part initialisation sequence, which was unnecessary and, because this sequence is not even mandatory (although I guess most libraries use it), not strictly correct. Dropping that was a useful simplification. Yes, I could have separated out the process of assembling an 8 bit data item from the dump to the console of the interpreted result, but that is only cosmetic. In practice, it would be implemented by a function call when the 8 bit data item is ready; I might do that if there is a future version.
The other thing you have said is that, even if the hardware is configured exclusively for 4 bit mode, as say in the case of an I2C backpack, it is necessary to connect all 8 data bus wires to the sniffer. I don't believe that is correct. The sniffer should never, in that case, actively process the bits D3, D2, D1, D0. Of course, it is possible that there is a mismatch between the firmware on the host, say assuming the display is wired for the full 8 bits, and the display itself, which is only wired for 4 bits. In that case the sniffer would attempt to process bits D3, D2, D1, D0; however, the display would also, in that case, show invalid data.

If you don't think the 3 function set sequence is mandatory for robustness and reliability, then you still don't understand the host and LCD synchronization and how the hd44780 interface and function set sequence really work.

I guess I should give up, it seems no matter how many times I try to explain the need for the function set sequence and how it works, it doesn't seem to be sinking in.

I'll give it one more go.

Because there is no way to reset the LCD from the host,
doing the 3 function set sequence is mandatory, since it is the only way to ensure that the host and the LCD can get into the same mode and in sync. This is true regardless of whether the host is communicating with the LCD using 8 data pins or 4 data pins.

The 3 function set sequence is the only way to ensure that the LCD can reliably be put into 8 bit mode regardless of what state the LCD is in.
After the LCD is put into 8 bit mode, the host library can do a function set to set the appropriate data length and the font.

In other words, the only way for a library to have a reliable LCD initialization is to start from a known state, the LCD in 8 bit mode, and the only way to ensure that the LCD is in 8 bit mode is to use the 3 function set sequence to put it there.
The LCD is not looking for this function set sequence.
It doesn't have to. The sequence "just works", given the bit patterns of the instructions.
Depending on the mode the LCD is in when the sequence starts, the LCD will interpret these 3 function sets differently. i.e. the LCD can execute 2 or 3 function sets, or even a garbage command, depending on the mode the LCD is in when the host starts the sequence.

If LCD in 8 bit mode:
( a total of three 8 bit mode function sets)

if LCD in 4 bit mode expecting high nibble:
(a total of one 4 bit mode function set, and one 8 bit mode function set)

if in 4 bit mode expecting low nibble:
(a total of one 4 bit mode garbage command, and one 4 bit mode function set)
OR
(a total of one 4 bit mode garbage command and two 8 bit mode function sets)
which one depends on if the garbage instruction happened to put the LCD into 8 bit mode.

So while the host sends 3 function sets, the LCD can and will interpret these function set instructions differently depending on what mode the LCD was in when the sequence started, but no matter what mode the LCD was in when it started, the LCD will always be in 8 bit mode after the E drops on the 3rd function set.
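This convergence can be demonstrated with a tiny model of just the chip's instruction-fetch state (a sketch: only the 8 bit/4 bit mode and nibble phase are modelled, and instruction execution other than function set is ignored):

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of the HD44780 instruction-fetch state.
struct Chip {
    bool fourBit = false;     // powers up in 8 bit mode after "internal reset"
    bool expectHigh = true;
    uint8_t highNibble = 0;

    // One E strobe with 0b0011xxxx ("function set, 8 bit") on DB7-DB4.
    void strobeFunctionSet8() {
        uint8_t bus = 0x30;
        uint8_t ins;
        if (!fourBit) ins = bus;
        else if (expectHigh) { highNibble = bus; expectHigh = false; return; }
        else { ins = highNibble | (bus >> 4); expectHigh = true; }
        if ((ins & 0xE0) == 0x20)            // a function set was assembled
            fourBit = (ins & 0x10) == 0;     // DL bit selects the mode
        // any other assembled value is a "garbage" instruction, ignored here
    }
};
```

Whatever the starting state, three strobes of 0b0011 on DB7-DB4 leave the model in 8 bit mode, exactly as described above; the chip never has to look for the sequence.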

This is all explained in gory detail in the hd44780 library hd44780.cpp file, in the begin() function.

There are libraries out there that do not do this function set sequence
(I have seen 8 bit only libraries skip this and I've also seen some 4 bit libraries skip this)
While it kind of "works", those libraries assume that when the library starts its LCD initialization the LCD is in 8 bit mode. While that is true immediately after an LCD power up, it isn't guaranteed to always be true.
They have issues during a host warm start, depending on the state of the LCD when the warm start occurs.
i.e. if the host doesn't do this function set sequence and the LCD is in 4 bit mode, from either intentionally being put into 4 bit mode by the host or unintentionally from some sort of h/w or s/w issue, then the host will not be able to communicate with the LCD even if the host is reset and restarted, because the host library assumes that the LCD is in 8 bit mode when it starts and the LCD is not in 8 bit mode.
This is true regardless of whether 8 bit or 4 bit communication is being used, but is a particular issue when using 4 bit mode since the LCD is likely to be in 4 bit mode on every host warm start.
The only way to resolve that situation is to power cycle the LCD and restart the host.
IMO, that is a poor implementation given that, if the host simply does the 3 function set sequence, the warm start initialization issue goes away.

--- bill

That isn't what I actually said.
I said that if you are only going to connect 4 data pins to the sniffer to probe DB4 to DB7, then to be fully compatible with the way the LCD works, you would need to put pull downs on the other 4 pins that the sniffer is reading rather than leave them floating, since from my testing the LCD internally reads unconnected data pins as low.

I talked about this in #9 and also suggested that it might be beneficial to have a conditional compilation for 4 pin/bit mode only. i.e. the code only looks at DB4 to DB7.
If building for 4 data pin mode, it could skip reading the other 4 pins and just set their values/states to zero.

So while you can likely get away with reading garbage for DB0 to DB3 when the sniffer is in 8 bit mode but only wired up to four data pins, it won't be perfectly emulating the behavior that I've seen from testing on actual hd44780 chips: unconnected DBx pins are read as low by the chipset.

Another benefit of adding a compilation conditional for 4 data pin only sniffing is that it could allow some optimizations to make the code a bit faster, since you would only need to read and map the pins that are connected.
Also, by moving the pin connections around for the data pins,
like putting Arduino
B0 on LCD DB4
B1 on LCD DB5
B2 on LCD DB6
B3 on LCD DB7
Using those pins could allow getting the nibble more quickly since there would be no bit mapping required: the low nibble of PINB is the nibble.
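The difference can be sketched with simulated port values (on a real AVR the argument would be the PINB/PIND hardware registers; the scattered wiring in the second function is hypothetical, just to show the contrast):

```cpp
#include <cassert>
#include <cstdint>

// With LCD DB4..DB7 on Arduino pins B0..B3, the captured nibble needs no
// per-bit shuffling: it is simply the low four bits of the port read.
uint8_t nibbleFromPortB(uint8_t pinb) {
    return pinb & 0x0F;        // DB4..DB7 land directly in bits 0..3
}

// Contrast: with scattered pins each bit must be tested and repacked.
// Hypothetical wiring: DB4 on D2, DB5 on D3, DB6 on D5, DB7 on D7.
uint8_t nibbleFromScatteredPins(uint8_t pind) {
    return (uint8_t)(((pind >> 2) & 1)
                   | (((pind >> 3) & 1) << 1)
                   | (((pind >> 5) & 1) << 2)
                   | (((pind >> 7) & 1) << 3));
}
```

The first version is a single AND instruction; the second costs several shifts and ORs per capture, which matters inside a fast ISR.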

The amount of time and energy I am prepared to spend discussing, or even reading, repeated observations of the 3 part HD44780 initialisation sequence, which you appear to have elevated to the status of a holy writ, is rapidly coming to an end. However, I will deal only with the following point which I found particularly patronising.

The HD44780 datasheet https://forum.arduino.cc/uploads/short-url/uGDA5ZAHZnkrMk50pl1ZAY6hZsU.pdf defines an alternative initialisation sequence which it refers to as "internal reset", as opposed to the better known "instruction reset". That is, the designer can rely on the device being in a known 8 bit state when it is powered up and, hence, can omit the 3 part initialisation sequence. This is nothing to do with a lack of understanding on my part; the data sheet even gives clear and unambiguous examples of this in Tables 11, 12 and 13 starting at page 40. It could well be that a designer working with a very low specification MCU (the chip has been around since the 1980s) may have been grateful for the saving of a few bytes and machine cycles that it could yield. Such a designer could then, if required, have forced an initialisation by a controlled power cycling of the display. However, I agree that in practice no one would rely on that today and would always, as a precaution, send the 3 part initialisation sequence anyway.

Ok, Truce.

What drove most of my comments in this thread about the 3 function sets is that
I've seen over the years that many people have a lack of understanding, or even a misunderstanding, of how the 3 function sets really work and assume that this sequence is being looked for by the LCD to cause some sort of internal reset.
I've seen many monitoring / snooping / decoding code implementations that have made incorrect assumptions about this sequence, and they will have issues as they aren't emulating how the chip set actually processes the instructions.
This included several logic analyzer decoders like the Saleae LCD decoder as well as your early implementation.

The sad part is that the reality of how the chip works is actually much simpler than what some of these implementations have done.

One of the reasons that I pushed so hard is that any implementation that is looking for this function set sequence will not be robust enough to work in many situations including with my Arduino hd44780 library when used on LCD backpacks due to the probing the hd44780 library does through the PCF8574 chip.

--- bill

After this post, I'll stop.

more detailed stuff below


"initialize by reset" vs "initialize by instruction"

An implementer that truly understands the datasheet and the chip instructions should be able to infer both "initialize by reset" and "initialize by instruction" sequences.
"initialize by reset" simply assumes 8 bit mode is active and thus skips any steps of reliably forcing the LCD into 8 bit mode, because it relies on the LCD already being in 8 bit mode from a fresh power up cycle ("internal reset"), and starts sending instructions for initialization.

That is the only difference between "initialize by reset" and "initialize by instruction".

And then the initial sequence of "initialize by instruction" that forces the LCD into 8 bit mode, so communication can commence, can be inferred by taking into consideration the instruction set bit patterns, the 8 bit/4 bit state the LCD could be in when the sequence starts, and the potential for a garbage instruction being executed. The timing of the steps along the way can be inferred by assuming a worst case of a 100kHz LCD clock instead of the standard 270kHz, and assuming the longest instruction was executed as the garbage instruction.
If you do all that, you end up with the force to 8 bit mode instruction sequence and timing you see in figures 23 and 24 that is just above the sequence in the box at the bottom which is the actual initialization.


About the examples in tables 11, 12, and 13

Technically those 3 examples in tables 11, 12, & 13 are not really alternate initialization sequences.
They are examples that use and depend on the LCD "internal reset" that happens when the LCD powers up.
They do more than just initialize the LCD, they print a message.
The LCD initialization in those examples is:
Table 11: steps 2, 3, 4 (8 bit mode)
Table 12: steps 2, 3, 4, 5 (4 bit mode)
Table 13: steps 2, 3, 4 (8 bit mode)

But as you mentioned, using "initialize by reset" is a shortcut that can work, but it can have issues if the host ever does a warm start initialization of the LCD without power cycling the LCD.
For example, attempting to do the initialization used in Tables 11, 12, and 13 when the LCD is in 4 bit mode expecting a high nibble will fail every other time.
This can cause a situation where every other warm start/reset "works" and I have seen this in some host implementations out in the wild.

What I have noticed over the years in conversations with various people is that some people (not saying this is you) have misinterpreted the phrase "with internal reset" mentioned in a few places in the datasheet and come to the mistaken belief that some of these instruction sequences cause the LCD to do an internal reset, rather than the instruction sequence depending on the LCD having previously done an internal reset by simply powering up.
And I believe that this is what has caused some host libraries to not properly implement a fully robust initialization sequence, as the author mistakenly believed that the instruction sequence he was doing internally resets the LCD and so should always work.


more details on the table 11-13 examples below

The table 11, 12, and 13 examples are also incomplete, or misleading at best, in that they left off or failed to mention a critical step that, if not done, will cause the initialization to fail.
They failed to mention a step between 1 and 2.
The host must ensure that the "internal reset" has completed. It takes time for the internal LCD processor to complete the "internal reset" which is shown on page 23.
The "internal reset" done at powerup actually does a bit more than what is shown, it also initializes DDRAM which is what causes the boxes to show up.
While the host can poll BF to determine when "internal reset" is complete, the datasheet never mentions when BF can be polled to check for completion of the "internal reset".
If you assume that their step 1 includes this time until BF can be looked at, then they still left off a step of either waiting long enough for the "internal reset" to complete or polling BF to ensure that it has completed.

The datasheet is not clear on how soon BF shows up or how long "internal reset" really takes.
The only mention of this is on page 23, where it describes "initialization by internal reset":

The busy state lasts for 10 ms after VCC rises to 4.5 V.

which is odd since the LCD is specified to run all the way down to 2.7v
But no mention of how long before BF is valid after power up.

Also, that 10ms time could vary from that if the clock used is not the standard 270 kHz.

The "initialization by instruction" section also has some information on this, as it is more detailed and complete.
The "initialize by instruction" sequences in figures 23 and 24 specify:

Wait for more than 40 ms after VCC rises to 2.7 V

and

Wait for more than 15 ms after VCC rises to 4.5 V

Since these are not using BF at this point, these indicate that it takes more than 10 ms.
But then these sequences also assume a 100kHz clock instead of a 270kHz clock for their non-BF timing.

Either way, the examples in tables 11 and 12 are incomplete in that they don't mention to wait for the "internal reset" to complete before sending any instructions.

-- bill

Hi,
this is a really cool project!
I am currently trying to simulate an LCD1602 with an Arduino Mega2560, which is pretty much the same as what has been done here. The display is written by an 8031 microcontroller.

But sometimes the display is written much faster (1-2 µs) by the 8031 than the Arduino is able to read the data and write it into a buffer.

Do you have any idea how fast your sketch can read the display Pins?

I haven't actually got any performance figures for that program (the latest version, incidentally, is attached to post #41).
It should be reasonably fast because it queues ports B and D in an ISR on the falling edge of pin "E" directly, instead of using say a series of digitalRead() instructions. However, I would guess that takes 2-3 µs. There may be scope for some optimisation there. Depending on which pins you use on the Mega you may anyway have to change this part of the code.
A quick glance at the HD44780 data sheet https://www.sparkfun.com/datasheets/LCD/HD44780.pdf seems to imply the shortest instruction is 37 µs (clock = 270 kHz), so I guess that the first thing to do is to verify the behaviour of the 8031, probably with a logic analyser, in respect of these very short timing intervals you have mentioned.

While most instruction times are 37us, the minimum hold time for holding data stable after lowering E (tH) on a write is only 10ns.
i.e. the data on the bus doesn't have to remain stable after E is dropped for more than 10ns.
Also, in 4 bit mode, there is no delay requirement, so the host can send the second nibble back to back with the first nibble, which could be as short as 500 ns (tcycE).

In real world implementations the times are considerably longer than the minimums, but there is always the possibility that an optimized implementation could do some pretty quick turnarounds on the bus, so the sniffer might not be able to keep up.

The hd44780 library can start modifying the data bus pins after lowering E before the previous instruction has completed, but it uses digitalWrite() to control all the pins so it tends to take a few us.

That said, on some processors like the ESP parts digitalWrite() is considerably faster than on the AVR.

Indeed. There is at least one theoretical use case which could result in the data on the bus to the HD44780 only being held for the 10 nanoseconds mentioned following the falling edge of "E". That is if the bus is shared with something else apart from the display driver chip. It would be an unusual design, though. In that case, a hardware latching circuit, based on say a 74LS377 8bit register, may solve the problem.
Incidentally, I did not see the 10ns hold time explicitly referenced in the data sheet and it does seem rather fast but I could have missed it.
I would be interested to see a logic analyser output of the 8031 driving the HD44780.

Thanks a lot for your replies.

10ns hold time should not be an issue because the display is written by an 8031 µC.
Unfortunately I don't have a logic analyzer ready to go, but I've attached a measurement done with an oscilloscope.
Channel 1 (blue) shows the enable signal (E), provided by the 8031.
Channel 2 (yellow) shows a test pin I toggle within an interrupt routine directly after (E) = high and (E) = low. So your assumption of a 2-3 µs delay seems fine.
(I don't use digitalWrite() but write directly to the port register.)

The main issue for my display simulator is that I also have to provide data from the display, e.g. to simulate the busy flag...

My next step now is to try an ESP32 (because it's faster) to figure out whether it's fast enough.

If you are interested in more information about my project, take a look to groups.io
Ch1-Disp_E_Ch-2_TestPin_kommentiert

Seems the ESP32 is not an alternative; I couldn't find any possibility to read/write whole ports at once.

I found an 8-bit logic analyzer from AZ-Delivery in one of my boxes which runs with the Saleae Logic 2 software.

I attached the capture file (txt) and my configuration.
Edit: new LCD-capture.txt because 2 pins were swapped at the logic analyzer.
LCD-capture.txt (48.7 KB)

In this document: https://www.sparkfun.com/datasheets/LCD/HD44780.pdf
It is tH in figure 25 page 58
and if you look back at "Bus Timing Characteristics" for write operation on page 52, you can see that minimum value for tH is 10ns

Ignoring the 10ns hold time,

Here is a potential optimization that could be done by the host if using 8 bit/pin mode.
The host could check to see if the next byte to transfer is the same as the previous byte transferred.
If so, there is no setup overhead for the 8 data pins or the control pins.
So all that is necessary is to drop E but still honor tcycE and PWEH.
This could reduce the transfer time of back to back bytes to the minimum E cycle time which is tcycE which is 500ns.
Consider a host that is blanking/erasing a line by writing multiple spaces: it could optimize this to just bump E.

This same tcycE timing applies if the host is VERY fast (fast enough that it could set up the control and data lines for a write transfer in 250 ns).

So the worst case scenario for the sniffer is whether it can catch and keep up with back to back bytes being transferred every 500 ns.
Again, most implementations probably won't do this, or won't do it this fast, but it is the worst case byte transfer rate.
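Putting rough numbers on that (a back-of-envelope sketch; the 500 ns figure is the datasheet tcycE minimum, and the 2.5 µs ISR cost is the guess from earlier in the thread, not a measurement):

```cpp
#include <cassert>

// Minimum E cycle time from the datasheet, and a guessed ISR cost.
constexpr double tcycE_ns = 500.0;   // tcycE, minimum E cycle
constexpr double isr_ns   = 2500.0;  // assumed ~2-3 us per E edge in the sniffer ISR

// Worst-case rates implied by those numbers.
constexpr double maxEdgesPerSec  = 1e9 / tcycE_ns;        // E strobes per second
constexpr double maxBytesPerSec4 = maxEdgesPerSec / 2.0;  // 2 strobes per byte in 4 bit mode
constexpr double isrEdgesPerSec  = 1e9 / isr_ns;          // edges the ISR can service
```

Under these assumptions the bus can theoretically strobe E five times faster than the ISR can service it, which is why a host doing back-to-back minimum-time transfers could outrun the sniffer.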

--- bill

I haven't looked at the ESP32 code, but I seem to recall on the ESP8266 that all the pins are controlled by a single 32 bit register and the pin number is the bit number in the register, which is why it is so much faster than all the goofy Arduino pin look-up table stuff in implementations like the AVR core.

I would think the ESP32 would be doing something similar.

--- bill

Thanks. I see it now tucked away at the back.

I looked at the link and tried to understand what you are doing. It looks like your aim is to build a replica of this portable data programmer (PDP) for the Philips FM1000 series radio using maybe a real 80C31 chip plus original firmware but with the functionality of the other components transferred to a microcontroller.

From the schematic supplied, it looks like the display is driven in 4 bit mode.

If all you are doing is reading the contents of the display, that is, not writing to it, then you do not have to respect the busy flag (as far as I can see). The driving chip (80C31), if it uses it at all, will not attempt to manipulate the 'E' pin until it determines that the HD44780 is not busy, so you just have to act when 'E' falls.

Looking at the video, there were some "special" characters written to the screen, for example a small 'R'. These make it a bit more complex if you need to read those as well. In a complex design, these may be created dynamically on demand (max 8), so you may have to watch them being built to keep track of them. Worse would be if any part of the display had a horizontally scrolling section, which could be very difficult to capture. In the absolute worst case you'd need to fully emulate the HD44780 to understand the relationship between what is sent to the HD44780 and what appears on the display. This is clearly not impossible, as some online Arduino simulators include such an emulation (e.g. Wokwi.com), but you may have difficulty running an emulator fast enough for your application.


For reading ESP32 ports directly you can look at this example at c - ESP32 direct port manipulation - Stack Overflow . I've not tried it myself though.

The logic analyser screen shots would appear ideally like those in post #1 rather than text output. The Saleae logic analyser even has a decoder for 44780 data streams (as you have found). However, I see now that you are considering a solution which may involve modifying the 80C31 code which may ultimately be a better solution.

Last time I tried the hd44780 decoder in the Saleae analyzer, it was not doing the decoding correctly and had decoding issues as a result.
For example, it won't work with my hd44780 library when used on a PCF8574 backpack.

They assumed (like so many other s/w people out there) that the initialization by instruction sequence (particularly the 4 bit initialization) is a bit of magic, and they attempt to track initialization states during function set instructions to try to detect the initialization by instruction sequence.
(Your sniffer is also doing this)
I've said this over and over again to many s/w developers: that isn't how this works.
Trying to detect initialization by instruction sequences is incorrect, as each hd44780 instruction can and MUST be processed on its own if you really want to sniff and emulate the hd44780 chipset.
The "magic" 3 instruction initialize by instruction sequence is not magic and there is no state involved.
i.e. the only state involved is the current 8 bit vs 4 bit mode which is used to determine how to read the bus to process the next instruction. Beyond that there is no other state and each function set instruction processed can set 8 bit or 4 bit mode.

Any hd44780 sniffer/decoding implementation that attempts to do things like count function set instructions, or track some kind of state to try to detect an initialization by instruction sequence, will have issues and not work correctly.

All that is required is to process each instruction as it comes across the data bus and, if the instruction is recognized as a function set, set 8 bit / 4 bit mode appropriately to process the next instruction.
It is just that simple.

--- bill