Is there a limit to the number of bytes per line in a hex file?

Hi, I have an Intel HEX file on a SPIFFS file system (on an ESP8266 board) and I intend to use SPIFFS.readBytesUntil('\n') on it to get one record of the hex file at a time. Each data record in the hex files I have examined so far has had 45 bytes (I confirmed this with a small Python script). For example (I am showing Python bytes objects so that each byte is visible):

b':10FC100049F0982F9A70923029F081FF02C097EF37\r\n'

There were other records as well that had fewer bytes like this one:

b':020000023000CC\r\n'

I am a little confused about the following things:

  1. Is there a limit to the number of bytes per line? Can I assume an upper limit of, say, 64? The reason I ask is that I intend to read it via readBytesUntil('\n') and send it over a UART buffer, so knowing an upper limit will help. In this example, the author uses a buffer[64] to read in a hex file record:
    char buffer[64];
    int lineNumber = 0;
    while (hexFile.available() && !ihex2binError) {
      int length = hexFile.readBytesUntil('\n', buffer, sizeof(buffer));
  2. Does the specification mandate both a carriage return \r and a newline \n at the end, or is this a Windows-only thing?

  3. The documentation for Serial.readBytesUntil() says that the terminator itself is not returned in the buffer. Does that mean that if I run hexFile.readBytesUntil('\n', buffer, sizeof(buffer));, then of the two line-ending characters only the carriage return would end up in the buffer?

If you have so many questions about "Serial.readBytesUntil()", then why do you use it? If I knew the answers, then why are those answers not well documented?

I can summarize your post in one word :rofl:

Question: readBytesUntil
Answer: No

Intel HEX - Wikipedia

To my knowledge there is no fixed line length; there is a record length field that tells you how much data is in each record. Check this link: Documentation – Arm Developer. This format has been around since the 1970s. It is up to the source to determine how big each record is. There is also a binary version and a similar Motorola "S" record format. The HEX file has nothing to do with your buffer; it is up to you and your software to accommodate the line.

Serial links have hardware and/or software flow control, and that is how you can control the transfer without losing data: you have to tell the sender to stop before your buffer overflows, and then to resume when you have room to receive more. Do some research on "XON/XOFF"; it was very common years ago and is still implemented in many systems.

Intel HEX produced by avr-gcc always has the same maximum line length. It doesn't randomly decide to produce a longer line.


"avr gcc" is the source and it controls the line length. It is their choice of line length; this was decided by the people who wrote the code. The line length is not fixed by the format; it is up to whoever writes the record. There is a very well defined way of doing it. It is common to break the data into even binary lengths, sometimes 16 bytes, sometimes more, sometimes less, but note that there is no required length. Read the length field and you will see that each record specifies its own length.


@20nik00

Write your own version of readBytesUntil() and you can choose any reasonable limit for the line length.
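A minimal sketch of what such a replacement could look like, in plain C++ so it can be tried off-target. The name readHexLine, the truncated flag, and the readByte callable are all made up for this illustration and are not part of the Arduino core. It assumes the byte source returns -1 when nothing is left, and it drops the CR so that CRLF and LF files behave the same.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an Arduino API). `readByte` is any callable that
// returns the next byte as 0..255, or -1 when no more data is available.
// Stores one line in `buf` as a NUL-terminated string, without CR or LF.
// Returns the number of characters stored; `truncated` is set when the
// line was longer than the buffer.
template <typename ReadByte>
std::size_t readHexLine(ReadByte readByte, char* buf, std::size_t bufSize,
                        bool& truncated) {
    assert(buf != nullptr && bufSize > 0);
    std::size_t n = 0;
    truncated = false;
    for (;;) {
        int c = readByte();
        if (c < 0 || c == '\n') break;  // end of input or end of line
        if (c == '\r') continue;        // drop CR so CRLF and LF files look the same
        if (n + 1 < bufSize) {
            buf[n++] = static_cast<char>(c);
        } else {
            truncated = true;           // keep consuming so the next call starts on a new line
        }
    }
    buf[n] = '\0';
    return n;
}
```

On a board, the callable would presumably wrap the file's own read() call; that wiring is left out here because it depends on the file object in use.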


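According to the old Intel documentation, the maximum number of data bytes per record is 255.
Adding the 5 overhead bytes (length, two address bytes, record type, checksum) gives 260 encoded bytes per record. Since each of those bytes is written as two hex characters and the colon is a single character, the longest possible line is 1 + 2*260 = 521 characters, plus the line ending.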


If all you are doing with the data is sending it over serial, then why is it necessary to read an entire line at a time?

The two characters after the initial colon specify, in hex, the number of data bytes in the line. So "10" is actually 0x10, or 16 bytes. Since each data byte needs two characters, that makes 32 characters. The other 13 characters are overhead: the colon, the length field, the address, the record type, the checksum, and CR/LF.

But since only two hex characters are allowed to specify the length, the maximum length is 0xFF, which is 255. 0x10 and 0x20 are widely used. I've never run across anything larger.

Anyway, the first two characters after the colon tell you the number of data bytes in the line.
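As a rough illustration of reading that length field, here is a small sketch. The function names hexDigit and recordDataLength are invented for this example; the only assumption is that the line starts with the colon.

```cpp
#include <cassert>
#include <cstddef>

// Convert one ASCII hex digit to its value, or return -1 if it is not a hex digit.
int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: returns the number of data bytes declared by a record
// (0..255), or -1 if the line does not start with ':' followed by two hex digits.
int recordDataLength(const char* line, std::size_t len) {
    assert(line != nullptr);
    if (len < 3 || line[0] != ':') return -1;
    const int hi = hexDigit(line[1]);
    const int lo = hexDigit(line[2]);
    if (hi < 0 || lo < 0) return -1;
    return hi * 16 + lo;
}
```

For the two records quoted in the question, this gives 16 for the ":10FC10..." line and 2 for the ":02000002..." line.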


Note that this is NOT the same as the number of characters in the line, which is 2*n+11 (plus 2 more for CR/LF). 80 bytes should be a safe buffer size. 64 is risky because of those 32-byte records, which need 75 characters before the line ending.
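The arithmetic can be written down as a small sketch. hexLineChars and bufferFits are names made up for this example, and the default assumes the file uses CR/LF line endings as in the records shown in the question.

```cpp
#include <cassert>
#include <cstddef>

// Characters in one record line: ':' + 2 hex chars for each of
// (length, 2 address bytes, type, n data bytes, checksum) = 2*n + 11,
// plus 2 more when the line ends in CR/LF.
constexpr std::size_t hexLineChars(std::size_t dataBytes, bool withCrLf = true) {
    return 2 * dataBytes + 11 + (withCrLf ? 2 : 0);
}

// True if a buffer of bufSize chars can hold a whole line (CR/LF included)
// for records carrying up to dataBytes data bytes.
bool bufferFits(std::size_t bufSize, std::size_t dataBytes) {
    assert(dataBytes <= 255);  // the length field is a single byte
    return bufSize >= hexLineChars(dataBytes);
}
```

With these, a 16-byte record comes out at 45 characters, matching the count in the question, a 32-byte record needs 77, and the theoretical worst case of 255 data bytes needs 523.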


Below is an example of an Intel HEX frame (Fig-1, generated by LOD186.exe for the 8086 microprocessor) to explain the meaning of the various fields of the frame. Please note that every symbol/digit of the frame is transmitted/received as an ASCII code. This frame does not contain CR/LF as the end mark, which the sender could have added.


Figure-1:

Field-(a): There is the symbol (:), which has the ASCII code of 3AH. The IBMPC sends it to the trainer to indicate the start of the transmission of a frame.

Field-(b): 14 (14h = 20 decimal) indicates that the sender will transmit 20 information bytes, which are contained in Field-(e).

It is not a fixed amount -- LOD186.exe accommodates 24 bytes. ASM51.exe accommodates 16 bytes, and Arduino IDE accommodates 16 bytes.

MICROCHIP STUDIO accommodates different values for different frames as is seen below:

:020000020000FC
:040000000395FECF97
:00000001FF

Field-(c): 1000 (1000h) indicates the RAM/buffer location (only the OFFSET) in the receiver at which the information bytes of Field-(e) start being stored. That is, the byte 90h would be stored at location 01000h (0000:1000 for the 8086) of the CSM, 20h would be stored at location 01001h, and so on.

Field-(d): 00 (00h) is the record type; 00h indicates a normal data record.

Field-(e): There are 20 (14h) bytes of information, which agrees with the declaration made in Field-(b) of the frame.

Field-(f): D3 (D3h) is the checksum (CHKSUM). The sender computes this code from all the bytes of Fields (b), (c), (d) and (e) only and sends it as the last byte of the frame. It allows the receiver to decide whether the received data frame is valid.

The receiver computes the CHKSUM in the following ways:
1. All the data bytes from Fields- (b) to (e) are added. The result is: 092Dh
2. The accumulated carry of the result is discarded. The rest is: 2Dh
3. All the bits are inverted (0010 1101 → 1101 0010). The result is: D2h
4. 01 is added. The result is: D3, which is the 2’s complement form of the lower byte of the sum, which is 2Dh in step-2 above.
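The four steps above can be sketched as follows. The function name intelHexChecksum is made up for this example, and it assumes the record has already been converted from ASCII hex into raw bytes covering Fields (b) to (e).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: checksum over the raw bytes of Fields (b)..(e)
// (length, address high, address low, record type, data bytes).
std::uint8_t intelHexChecksum(const std::uint8_t* bytes, std::size_t count) {
    assert(bytes != nullptr || count == 0);
    unsigned sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += bytes[i];                          // step 1: add all the bytes
    }
    const unsigned low = sum & 0xFFu;             // step 2: discard the carry
    const unsigned inverted = ~low & 0xFFu;       // step 3: invert all the bits
    return static_cast<std::uint8_t>((inverted + 1u) & 0xFFu);  // step 4: add 1
}
```

A low byte of 2Dh gives D3h, as in the steps above. For the record :020000023000CC from the question, the bytes 02 00 00 02 30 00 give CCh, which matches its last field.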

The receiver accepts all the bytes of a frame as they arrive from the sender. It adds up all the bytes except the first one (the colon) and the last one (the checksum). The upper byte of the sum is discarded. The lower byte is added to the last byte of the frame; the low byte of that result should be 00h, and if so, the received frame is probably good. Otherwise, the receiver sends an error message to the sender.
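A sketch of that receiver-side check, working directly on the ASCII line. The names hexValue and recordIsValid are invented for this example. It also checks that the declared length matches the number of bytes actually present, which is an extra sanity check added here rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Convert one ASCII hex digit to its value, or return -1 if it is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: true if the line is a well-formed record whose bytes,
// checksum included, add up to 00h in the low byte.
bool recordIsValid(const char* line, std::size_t len) {
    assert(line != nullptr);
    while (len > 0 && (line[len - 1] == '\r' || line[len - 1] == '\n')) --len;
    // Shortest record is ":00000001FF": the colon plus 5 bytes as 10 hex characters.
    if (len < 11 || line[0] != ':' || (len - 1) % 2 != 0) return false;

    const std::size_t byteCount = (len - 1) / 2;
    unsigned sum = 0;
    unsigned declared = 0;
    for (std::size_t i = 0; i < byteCount; ++i) {
        const int hi = hexValue(line[1 + 2 * i]);
        const int lo = hexValue(line[2 + 2 * i]);
        if (hi < 0 || lo < 0) return false;
        const unsigned b = static_cast<unsigned>(hi * 16 + lo);
        if (i == 0) declared = b;   // first byte is the data length field
        sum += b;
    }
    if (byteCount != declared + 5) return false;  // length, 2 address, type, checksum
    return (sum & 0xFFu) == 0;
}

// Convenience overload for NUL-terminated strings.
bool recordIsValid(const char* line) {
    assert(line != nullptr);
    return recordIsValid(line, std::strlen(line));
}
```

Both records from the question pass this check, with or without the trailing CR/LF, and changing any single hex digit makes it fail.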

===> The first two digits (in hex, Fig-1) after the initial colon ( : ) specify the number of information bytes (Field-e of Fig-1) in the line/frame.

Yes! These are the first two bytes after the initial colon ( : ---> 3A) when talking about the ASCII-coded transmission/reception frame.
==> 3A31343130.............................4433 for the frame of Fig-1.

