PaulS:
In the code you posted (at the #105 link), kg_evt_bluetooth_inquiry_response is NOT explicitly set to NULL anywhere. Where is the pointer EXPLICITLY set to NULL. Where is it set to point to some real function?
The repo code doesn't explicitly set the pointers to NULL, though I did modify line 133 in the "support_bluetooth2_iwrap.h" file to do this as a test:
/* 0x03 */ uint8_t (*kg_evt_bluetooth_inquiry_response)(uint8_t *address, uint8_t *cod, int8_t rssi, uint8_t status, uint8_t index, uint8_t name_len, uint8_t *name_data) = 0;
The compiler output for this run was not different than leaving it uninitialized in the source. However, C standards say that global variables are initialized to zero at compile time, and these are global. They should all be zero anyway, and the behavior of the rest of the application suggests this. For testing, I've added zero assignments to the code again (see lines 131-139).
There are also no places in the code where that particular function pointer is assigned to any real functions, but this is by design because that callback is left to the user to implement only if needed. It's typically left as null because it's rarely needed for a user application.
MarkT:
In short the presence or absence of braces is a red-herring, since it makes no difference. The presence of a memcpy() call earlier up is suggestive, you've likely smashed the stack or heap. Small changes in code structure can lead to the compiler generating different stack frame layout, which can interact with the stack corruption in different ways - in other words the symptoms of memory corruption can be anything at all and typically are Heisenbugs. You may have run out of RAM.
THIS is exactly what I expect. ("Heisenbug" is such an appropriate term, too.) The bug appears until I add debug info to look at it, and then it goes away despite no apparent link between the added debug code and the broken behavior. I really do suspect something odd in my code, but can't inspect the stack in a non-invasive way to see what actually happens. Typically I would expect that just adding braces around a single line of code shouldn't do this though...and it appears that with the other changes I've played with today and left in the code, adding braces no longer fixes it (see below). I'm actually glad that this is still reliably occurring right now.
MarkT:
The only ways the braces could have an effect is if there is some broken macro being expended that turns what looks like a single assigment statement into several statements, or if the compiler has a bug. Are you using any macros?
I'm not using macros that I'm aware of, no. I even checked the Arduino IDE build output (in the \Temp\buildxxxxxxxx folder in my userdir AppData location) to see if anything was modified, but it's exactly as I wrote it. The code there is just normal code, no macros.
pYro_65:
In one function you set the name at position 13 ( index 12 )
...
And index twelve is unset. (could be zero, which is effectively an empty string.)
The contents of the "my_iwrap_evt_name" function are all correct in this case except for the fact that index 12 is unset. This is a bug that I introduced today after adding and removing various debugging attempts inside the function. I've fixed it so that this byte is assigned properly again, with the same line of code that used to be there:
payload[12] = name_len;
pYro_65:
if (name_len) memcpy(payload + 13, friendly_name, name_len);
So the name you copy in overruns the array by 1 byte!
I don't think I've overlooked something here, but let me talk through it one more time. The payload packet is always at least 13 bytes (so "uint8_t payload[13]" is the right size). But there could be between 0 and name_len more bytes of ASCII name data (so actually "uint8_t payload[13 + name_len];" is the right size). The memcpy call doesn't include the null terminator, but only the number of actual ASCII character bytes. The payload[12] byte (the 13th byte) contains the name length, and the remaining bytes (if any) are the actual name. There should be enough memory allocated in every case. I think.
The code as it exists on Github right now reliably demonstrates the issue on my hardware. I can currently make it happen every single time I run a device inquiry ("kg_cmd_bluetooth_discover(10);"). Note that this code now has all of the following:
- Explicit function pointer initialization to 0
- Explicit pointer comparison to 0 in conditional statement on line 1108
- Braces around the "if" statement's conditional block
I have noticed that setting name_len to 0 regardless of what is actually in friendly_name appears to get avoid the freeze condition. However, adding debug serial output that spits out the value of name_len to verify that it is correct (which it is) also avoids the freeze condition, despite the fact that in that test case name_len is left alone and not set to 0. The presence of the debug output alone prevents the issue from occurring. Argh.
I will keep trying different modifications and tracing backwards to see where I might be overflowing a buffer or something.