Non-executing code block freezes, but works if block is in { } braces

In one function you set the name at position 13 ( index 12 )

payload[12] = 0; // no NAME yet

However in the function you referenced, you do not use index 12 at all.

if (name_len) memcpy(payload + 13, friendly_name, name_len);

So the name you copy in overruns the array by 1 byte!

And index twelve is unset. (could be zero, which is effectively an empty string.)

PaulS:
In the code you posted (at the #105 link), kg_evt_bluetooth_inquiry_response is NOT explicitly set to NULL anywhere. Where is the pointer EXPLICITLY set to NULL. Where is it set to point to some real function?

The repo code doesn't explicitly set the pointers to NULL, though I did modify line 133 in the "support_bluetooth2_iwrap.h" file to do this as a test:

/* 0x03 */ uint8_t (*kg_evt_bluetooth_inquiry_response)(uint8_t *address, uint8_t *cod, int8_t rssi, uint8_t status, uint8_t index, uint8_t name_len, uint8_t *name_data) = 0;

The compiler output for this run was not different than leaving it uninitialized in the source. However, C standards say that global variables are initialized to zero at compile time, and these are global. They should all be zero anyway, and the behavior of the rest of the application suggests this. For testing, I've added zero assignments to the code again (see lines 131-139).

There are also no places in the code where that particular function pointer is assigned to any real functions, but this is by design because that callback is left to the user to implement only if needed. It's typically left as null because it's rarely needed for a user application.

MarkT:
In short the presence or absence of braces is a red-herring, since it makes no difference. The presence of a memcpy() call earlier up is suggestive, you've likely smashed the stack or heap. Small changes in code structure can lead to the compiler generating different stack frame layout, which can interact with the stack corruption in different ways - in other words the symptoms of memory corruption can be anything at all and typically are Heisenbugs. You may have run out of RAM.

THIS is exactly what I expect. ("Heisenbug" is such an appropriate term, too.) The bug appears until I add debug info to look at it, and then it goes away despite no apparent link between the added debug code and the broken behavior. I really do suspect something odd in my code, but can't inspect the stack in a non-invasive way to see what actually happens. Typically I would expect that just adding braces around a single line of code shouldn't do this though...and it appears that with the other changes I've played with today and left in the code, adding braces no longer fixes it (see below). I'm actually glad that this is still reliably occurring right now.

MarkT:
The only ways the braces could have an effect is if there is some broken macro being expended that turns what looks like a single assigment statement into several statements, or if the compiler has a bug. Are you using any macros?

I'm not using macros that I'm aware of, no. I even checked the Arduino IDE build output (in the \Temp\buildxxxxxxxx folder in my userdir AppData location) to see if anything was modified, but it's exactly as I wrote it. The code there is just normal code, no macros.

pYro_65:
In one function you set the name at position 13 ( index 12 )
...
And index twelve is unset. (could be zero, which is effectively an empty string.)

The contents of the "my_iwrap_evt_name" function are all correct in this case except for the fact that index 12 is unset. This is a bug that I introduced today after adding and removing various debugging attempts inside the function. I've fixed it so that this byte is assigned properly again, with the same line of code that used to be there:

payload[12] = name_len;

pYro_65:

if (name_len) memcpy(payload + 13, friendly_name, name_len);

So the name you copy in overruns the array by 1 byte!

I don't think I've overlooked something here, but let me talk through it one more time. The payload packet is always at least 13 bytes (so "uint8_t payload[13]" is the right size). But there could be between 0 and name_len more bytes of ASCII name data (so actually "uint8_t payload[13 + name_len];" is the right size). The memcpy call doesn't include the null terminator, but only the number of actual ASCII character bytes. The payload[12] byte (the 13th byte) contains the name length, and the remaining bytes (if any) are the actual name. There should be enough memory allocated in every case. I think.

The code as it exists on Github right now reliably demonstrates the issue on my hardware. I can currently make it happen every single time I run a device inquiry ("kg_cmd_bluetooth_discover(10);"). Note that this code now has all of the following:

  • Explicit function pointer initialization to 0
  • Explicit pointer comparison to 0 in conditional statement on line 1108
  • Braces around the "if" statement's conditional block

I have noticed that setting name_len to 0 regardless of what is actually in friendly_name appears to get avoid the freeze condition. However, adding debug serial output that spits out the value of name_len to verify that it is correct (which it is) also avoids the freeze condition, despite the fact that in that test case name_len is left alone and not set to 0. The presence of the debug output alone prevents the issue from occurring. Argh.

I will keep trying different modifications and tracing backwards to see where I might be overflowing a buffer or something.

Have you actually checked your available memory. You may well be running out of sRam.

Can you confirm that the code as currently found on GitHub also exhibits these characteristics (described in the OP)? I personally don't see anything wrong with the way you have written the "if" (except readability).

I don't have all those libraries so I can't get a clean compile, but I suggest you do one both with and without the braces and compare the object files.

See Gammon Forum : Electronics : Microprocessors : Arduino programming traps, tips and style guide for how to disassemble the resulting .elf files.

Since adding the braces shouldn't make any difference I am guessing that you have overwritten memory somewhere, and that the braces have made the compiler make a slightly different register allocation, this different register allocation is springing a crash caused by the memory leak / overwriting.

How big can the (contents of the) variable iwrap_autocall_index get? (eg. 42)

It is what your code looked like was happening, as I mentioned, the function above it set index 12 to zero and commented on no name. And it seemed like you had used index 13 by accident. Ah well.

The repo code doesn't explicitly set the pointers to NULL, though I did modify line 133 in the "support_bluetooth2_iwrap.h" file to do this as a test:

You can not initialize variables in a header file. What that did was make the function a pure virtual function. Which means that, if the code compiled, some other class inherits from the class you modified, AND actually implements the function. So, you are NOT looking at the code that actually gets executed.

Thanks, everyone, for all of your comments so far. At Paul Stoffregen's recommendation I have ordered an AVR One and will see if what kind of debugging I can manage with Atmel Studio. (I actually found a used device available for a steal on eBay from a local US seller, so here's to hoping...)

KenF:
Have you actually checked your available memory. You may well be running out of sRam.

I haven't actually checked, no. I'll look into the best way to do this on a Teensy++ (I've seen some examples on the Arduino Playground, but haven't verified if they work as-is on this device). I suspect this is not to blame since the compiler projects RAM usage of less than 20% of what is available, and I do not have many large allocations at runtime. I'm acutely aware of the limited memory available and try whenever possible to use flash for constant data like string literals. Buffers malloc()'d at runtime are kept few and small, and free()'d in all cases to avoid memory leaks (assuming I haven't missed any, which is not impossible). But regardless, I will definitely test this to make sure.

Yes, the code on the repository right now exhibits the problem. Thanks for the pointer to object file comparison and .elf investigation; I was not aware of this approach before. I'll read up on this and see what I can do with it.

Depending on the number of paired devices, this can be between 0 and 15. In most applications, it will not go above perhaps 2 or 3. In my test case here, it does not go above 0 because there are no paired devices. The autocall algorithm never kicks in since iwrap_pairings equals 0, and the condition on line 566 won't pass.

PaulS:
You can not initialize variables in a header file. What that did was make the function a pure virtual function. Which means that, if the code compiled, some other class inherits from the class you modified, AND actually implements the function. So, you are NOT looking at the code that actually gets executed.

Maybe I am misunderstanding something about the Arduino build environment, but doesn't this only apply when you are using classes? All of the functions and function pointers involved in this are not part of any class definitions. The kg_evt_bluetooth_inquiry_response function pointer is a global variable, and the my_iwrap_evt_name function is a global function, not a member of a class. Unless the Arduino IDE is wrapping everything in its own giant class, I think it is still allowed to declare and initialize this kind of value in a header file, right?

I think it is still allowed to declare and initialize this kind of value in a header file, right?

You have so many files and such useless comments that it's hard to keep track of what's what.

If the function is not in a class, then nothing I said about it applies.

PaulS:
You have so many files and such useless comments that it's hard to keep track of what's what.

Honestly? That's a bit harsh. I won't disagree on the project complexity alone though. Here's a map:

http://www.keyglove.net/docs/arduino/support__bluetooth2__iwrap_8h.html
http://www.keyglove.net/docs/arduino/files.html

PaulS:
If the function is not in a class, then nothing I said about it applies.

Good to know.

I will continue investigating.

I sussed out your approach which is a bit unusual, but in itself I don't think would cause a problem*. You have a lot of code in .h files which is a bit of a novel approach, however since they are only included once, I don't think would be the issue.

More conventionally, you would have a lot of .cpp files, and manage calls from one to the other by using function prototypes in a .h file.

What device are you running this on? A Teensy++?

Is that the one with a AT90USB1286 processor? 8192 bytes of RAM, 130048 bytes of flash?

I seem to recall some issues with the Atmega2560 where code (or data?) crosses a 64 kB boundary.

  • Having said that, your .h file approach effectively produces a monolithic object file which may possibly have problems compared to separate compilation units.

I agree, this is an unusual project structure. I originally took that approach because I ran into issues trying to get the Arduino IDE to compile multiple .cpp/.h files in the same sketch, way back when I started. Even though I managed to get it to compile and work generally, it has created more than a few challenges concerning #include order and which files can reference which other files at which points. At the time, I chalked it up to the way Arduino combined sketch files. I started in 2010 with this, long before v1.0 of the IDE, and maybe things have changed since--or maybe I just had no idea what I was doing when I started, which is more likely, and it always would have worked if I did it the right way.

In either case, if it is possible to use the standard approach of .cpp/.h file combinations today, I will absolutely put in the effort needed to restructure the project.

The monolithic object file is yet another thing that I had not considered, and another reason to make the above change. Thanks for the insight here.

Yes, it's the Teensy++ 2.0 from PJRC, the AT90USB1286 variant.

Yes, it's the Teensy++ 2.0 from PJRC (thanks PaulS)

Paul Stoffregen, the maker of the teensy, posts under his complete name. I do not. You've got the wrong Paul S. there.

PaulS:
Paul Stoffregen, the maker of the teensy, posts under his complete name. I do not. You've got the wrong Paul S. there.

Ah. Oops. Sorry about that.

See: How to avoid the quirks of the IDE sketch file pre-preprocessing

I will definitely read through that prior to refactoring this project into a more normal format. Thanks!
^^^ EDIT: Good grief, this is fantastically simple. Amazing.

Follow-up on SRAM usage: according to Adafruit's quick RAM check function, I still have over 6kB of ram free out of the 8kB on the chip immediately prior to the offending lines of code, so it doesn't look like I'm running out of memory. That's what I thought, but all the same...good.

I am nowhere near the guru that many on here are, and what this reminds me of is something that I encountered about 30 years ago working on assembler on an IBM mainframe, but when a line of code stops or starts mis-behaving when another line of code is inserted (even a single blank assembly instruction), it would seem to me that there is some kind of memory bounds issue involved.
Not necessarily that there is not enough memory, but that one variable ends up writing over another variable.
Without comparing the decompiled code of the two versions (with and without braces), it would be impossible to figure out why. And even having the decompiled code would be extremely difficult.

Can you possibly re-order the sequence of functions within a portion of the code, so that this function is earlier by one or two functions, or later by one or two functions. If it continues to fail, you probably have something, somewhere, that is setting the pointer to that function (in the if statement), but setting it to something that does not act as a function, and thus it "hangs" or terminates in such a way as to mess things up. (If it doesn't continue to fail, then the code will probably fail elsewhere after having moved the function).

My guess, is that when you put the braces in, you have moved the problem to elsewhere within your code, because somewhere you are overwriting a bit of memory outside of where you think you are writing.

Quick update on this issue: I finished rearranging the code into standard C project form, rather than one .ino and a million .h files. This has greatly simplified the code organization, especially regarding order-of-include challenges that I previously had a hard time solving.

I also noticed that at some point, my Arduino IDE had been set to run the Teensy++ at 16MHz (IDE default), rather than 8MHz. It's been modified with the 3.3v LDO, and per Paul Stoffregen's recommendation, that's a bad idea. An AT90USB1286 @ 3.3v is technically overclocked at 16MHz. Maybe the setting got reset when I updated to v1.0.6 of the IDE instead of the v1.0.5-r2 that I had been using...I really don't know how long it has been that way, but I put it back to 8MHz in the middle of the code rearrangement process.

The API event no longer freezes like it did before, despite not having some of the extra-safe adjustments to the code (braces around the conditional block, explicit NULL tests in "if" conditions, etc.). So far, things appear to be 100% reliable again.

I cannot say for sure what change fixed this behavior since there were many of them simultaneously, and I'm also not sure whether the problem is actually gone or only moved. It could still be an invalid memory dereference operation or buffer overflow that is once again invisible. In case it shows up again. I now have an AVR One debugger to attack the problem with. For the moment though, this case is closed. Thanks for all of your help and insight!

Glad to help, and good to hear your code is better organized. :slight_smile: