Go Down

Topic: Non-executing code block freezes, but works if block is in { } braces (Read 8834 times) previous topic - next topic

Jeff Rowberg

Post code - this is all just hand-waving.
I did, a couple of replies ago:

The source file in question is here, on Github:

https://github.com/jrowberg/keyglove/blob/master/keyglove/support_bluetooth2_iwrap.h#L1105

The full project is on the same Github repo:

https://github.com/jrowberg/keyglove/tree/master/keyglove
(You can't put code formatting inside code tags)
My mistake; I forgot to verify whether that would work in the preview before posting. I have removed the incorrect tags. ...actually, in retrospect (adding this via an edit), the "Preview" button on the forum seems to post, rather than preview. That's the second time this has happened. Odd.

PaulS

In the code you posted (at the #105 link), kg_evt_bluetooth_inquiry_response is NOT explicitly set to NULL anywhere. Where is the pointer EXPLICITLY set to NULL. Where is it set to point to some real function?

MarkT

In short the presence or absence of braces is a red-herring, since it makes no
difference.  The presence of a memcpy() call earlier up is suggestive, you've likely
smashed the stack or heap. Small changes in code structure can lead to the compiler
generating different stack frame layout, which can interact with the stack corruption
in different ways - in other words the symptoms of memory corruption can be anything
at all and typically are Heisenbugs.  You may have run out of RAM.

The only ways the braces could have an effect is if there is some broken macro
being expended that turns what looks like a single assigment statement into several
statements, or if the compiler has a bug.  Are you using any macros?
[ I DO NOT respond to personal messages, I WILL delete them unread, use the forum please ]

pYro_65

In one function you set the name at position 13 ( index 12 )

Code: [Select]
payload[12] = 0; // no NAME yet

However in the function you referenced, you do not use index 12 at all.

Code: [Select]
if (name_len) memcpy(payload + 13, friendly_name, name_len);

So the name you copy in overruns the array by 1 byte!

And index twelve is unset. (could be zero, which is effectively an empty string.)

Jeff Rowberg

In the code you posted (at the #105 link), kg_evt_bluetooth_inquiry_response is NOT explicitly set to NULL anywhere. Where is the pointer EXPLICITLY set to NULL. Where is it set to point to some real function?
The repo code doesn't explicitly set the pointers to NULL, though I did modify line 133 in the "support_bluetooth2_iwrap.h" file to do this as a test:

Code: [Select]

/* 0x03 */ uint8_t (*kg_evt_bluetooth_inquiry_response)(uint8_t *address, uint8_t *cod, int8_t rssi, uint8_t status, uint8_t index, uint8_t name_len, uint8_t *name_data) = 0;


The compiler output for this run was not different than leaving it uninitialized in the source. However, C standards say that global variables are initialized to zero at compile time, and these are global. They should all be zero anyway, and the behavior of the rest of the application suggests this. For testing, I've added zero assignments to the code again (see lines 131-139).

There are also no places in the code where that particular function pointer is assigned to any real functions, but this is by design because that callback is left to the user to implement only if needed. It's typically left as null because it's rarely needed for a user application.

In short the presence or absence of braces is a red-herring, since it makes no difference.  The presence of a memcpy() call earlier up is suggestive, you've likely smashed the stack or heap. Small changes in code structure can lead to the compiler generating different stack frame layout, which can interact with the stack corruption in different ways - in other words the symptoms of memory corruption can be anything at all and typically are Heisenbugs.  You may have run out of RAM.
THIS is exactly what I expect. ("Heisenbug" is such an appropriate term, too.) The bug appears until I add debug info to look at it, and then it goes away despite no apparent link between the added debug code and the broken behavior. I really do suspect something odd in my code, but can't inspect the stack in a non-invasive way to see what actually happens. Typically I would expect that just adding braces around a single line of code shouldn't do this though...and it appears that with the other changes I've played with today and left in the code, adding braces no longer fixes it (see below). I'm actually glad that this is still reliably occurring right now.

The only ways the braces could have an effect is if there is some broken macro being expended that turns what looks like a single assigment statement into several statements, or if the compiler has a bug.  Are you using any macros?
I'm not using macros that I'm aware of, no. I even checked the Arduino IDE build output (in the \Temp\buildxxxxxxxx folder in my userdir AppData location) to see if anything was modified, but it's exactly as I wrote it. The code there is just normal code, no macros.

In one function you set the name at position 13 ( index 12 )
...
And index twelve is unset. (could be zero, which is effectively an empty string.)
The contents of the "my_iwrap_evt_name" function are all correct in this case except for the fact that index 12 is unset. This is a bug that I introduced today after adding and removing various debugging attempts inside the function. I've fixed it so that this byte is assigned properly again, with the same line of code that used to be there:

Code: [Select]

payload[12] = name_len;


Code: [Select]
if (name_len) memcpy(payload + 13, friendly_name, name_len);

So the name you copy in overruns the array by 1 byte!
I don't think I've overlooked something here, but let me talk through it one more time. The payload packet is always at least 13 bytes (so "uint8_t payload[13]" is the right size). But there could be between 0 and name_len more bytes of ASCII name data (so actually "uint8_t payload[13 + name_len];" is the right size). The memcpy call doesn't include the null terminator, but only the number of actual ASCII character bytes. The payload[12] byte (the 13th byte) contains the name length, and the remaining bytes (if any) are the actual name. There should be enough memory allocated in every case. I think.

The code as it exists on Github right now reliably demonstrates the issue on my hardware. I can currently make it happen every single time I run a device inquiry ("kg_cmd_bluetooth_discover(10);"). Note that this code now has all of the following:

  • Explicit function pointer initialization to 0
  • Explicit pointer comparison to 0 in conditional statement on line 1108
  • Braces around the "if" statement's conditional block


I have noticed that setting name_len to 0 regardless of what is actually in friendly_name appears to get avoid the freeze condition. However, adding debug serial output that spits out the value of name_len to verify that it is correct (which it is) also avoids the freeze condition, despite the fact that in that test case name_len is left alone and not set to 0. The presence of the debug output alone prevents the issue from occurring. Argh.

I will keep trying different modifications and tracing backwards to see where I might be overflowing a buffer or something.

KenF

Have you actually checked your available memory.  You may well be running out of sRam.

nickgammon

Can you confirm that the code as currently found on GitHub also exhibits these characteristics (described in the OP)? I personally don't see anything wrong with the way you have written the "if" (except readability).

I don't have all those libraries so I can't get a clean compile, but I suggest you do one both with and without the braces and compare the object files.

See http://www.gammon.com.au/forum/?id=12153#info1 for how to disassemble the resulting .elf files.

Since adding the braces shouldn't make any difference I am guessing that you have overwritten memory somewhere, and that the braces have made the compiler make a slightly different register allocation, this different register allocation is springing a crash caused by the memory leak / overwriting.
Please post technical questions on the forum, not by personal message. Thanks!

More info: http://www.gammon.com.au/electronics

nickgammon

How big can the (contents of the) variable iwrap_autocall_index get? (eg. 42)
Please post technical questions on the forum, not by personal message. Thanks!

More info: http://www.gammon.com.au/electronics

pYro_65

It is what your code looked like was happening, as I mentioned, the function above it set index 12 to zero and commented on no name. And it seemed like you had used index 13 by accident. Ah well.

PaulS

Quote
The repo code doesn't explicitly set the pointers to NULL, though I did modify line 133 in the "support_bluetooth2_iwrap.h" file to do this as a test:
You can not initialize variables in a header file. What that did was make the function a pure virtual function. Which means that, if the code compiled, some other class inherits from the class you modified, AND actually implements the function. So, you are NOT looking at the code that actually gets executed.

Jeff Rowberg

Thanks, everyone, for all of your comments so far. At Paul Stoffregen's recommendation I have ordered an AVR One and will see if what kind of debugging I can manage with Atmel Studio. (I actually found a used device available for a steal on eBay from a local US seller, so here's to hoping...)

Have you actually checked your available memory.  You may well be running out of sRam.
I haven't actually checked, no. I'll look into the best way to do this on a Teensy++ (I've seen some examples on the Arduino Playground, but haven't verified if they work as-is on this device). I suspect this is not to blame since the compiler projects RAM usage of less than 20% of what is available, and I do not have many large allocations at runtime. I'm acutely aware of the limited memory available and try whenever possible to use flash for constant data like string literals. Buffers malloc()'d at runtime are kept few and small, and free()'d in all cases to avoid memory leaks (assuming I haven't missed any, which is not impossible). But regardless, I will definitely test this to make sure.

Can you confirm that the code as currently found on GitHub also exhibits these characteristics (described in the OP)? ... this different register allocation is springing a crash caused by the memory leak / overwriting.
Yes, the code on the repository right now exhibits the problem. Thanks for the pointer to object file comparison and .elf investigation; I was not aware of this approach before. I'll read up on this and see what I can do with it.

How big can the (contents of the) variable iwrap_autocall_index get? (eg. 42)
Depending on the number of paired devices, this can be between 0 and 15. In most applications, it will not go above perhaps 2 or 3. In my test case here, it does not go above 0 because there are no paired devices. The autocall algorithm never kicks in since iwrap_pairings equals 0, and the condition on line 566 won't pass.

You can not initialize variables in a header file. What that did was make the function a pure virtual function. Which means that, if the code compiled, some other class inherits from the class you modified, AND actually implements the function. So, you are NOT looking at the code that actually gets executed.
Maybe I am misunderstanding something about the Arduino build environment, but doesn't this only apply when you are using classes? All of the functions and function pointers involved in this are not part of any class definitions. The kg_evt_bluetooth_inquiry_response function pointer is a global variable, and the my_iwrap_evt_name function is a global function, not a member of a class. Unless the Arduino IDE is wrapping everything in its own giant class, I think it is still allowed to declare and initialize this kind of value in a header file, right?

PaulS

Quote
I think it is still allowed to declare and initialize this kind of value in a header file, right?
You have so many files and such useless comments that it's hard to keep track of what's what.

If the function is not in a class, then nothing I said about it applies.

Jeff Rowberg

You have so many files and such useless comments that it's hard to keep track of what's what.
Honestly? That's a bit harsh. I won't disagree on the project complexity alone though. Here's a map:

http://www.keyglove.net/docs/arduino/support__bluetooth2__iwrap_8h.html
http://www.keyglove.net/docs/arduino/files.html

If the function is not in a class, then nothing I said about it applies.
Good to know.

I will continue investigating.

nickgammon

I sussed out your approach which is a bit unusual, but in itself I don't think would cause a problem*. You have a lot of code in .h files which is a bit of a novel approach, however since they are only included once, I don't think would be the issue.

More conventionally, you would have a lot of .cpp files, and manage calls from one to the other by using function prototypes in a .h file.

What device are you running this on? A Teensy++?

Is that the one with a AT90USB1286 processor? 8192 bytes of RAM, 130048 bytes of flash?

I seem to recall some issues with the Atmega2560 where code (or data?) crosses a 64 kB boundary.

* Having said that, your .h file approach effectively produces a monolithic object file which may possibly have problems compared to separate compilation units.
Please post technical questions on the forum, not by personal message. Thanks!

More info: http://www.gammon.com.au/electronics

Jeff Rowberg

I sussed out your approach which is a bit unusual, but in itself I don't think would cause a problem...more conventionally, you would have a lot of .cpp files, and manage calls from one to the other by using function prototypes in a .h file. ... *Having said that, your .h file approach effectively produces a monolithic object file which may possibly have problems compared to separate compilation units.
I agree, this is an unusual project structure. I originally took that approach because I ran into issues trying to get the Arduino IDE to compile multiple .cpp/.h files in the same sketch, way back when I started. Even though I managed to get it to compile and work generally, it has created more than a few challenges concerning #include order and which files can reference which other files at which points. At the time, I chalked it up to the way Arduino combined sketch files. I started in 2010 with this, long before v1.0 of the IDE, and maybe things have changed since--or maybe I just had no idea what I was doing when I started, which is more likely, and it always would have worked if I did it the right way.

In either case, if it is possible to use the standard approach of .cpp/.h file combinations today, I will absolutely put in the effort needed to restructure the project.

The monolithic object file is yet another thing that I had not considered, and another reason to make the above change. Thanks for the insight here.

What device are you running this on? A Teensy++? Is that the one with a AT90USB1286 processor? 8192 bytes of RAM, 130048 bytes of flash? ...I seem to recall some issues with the Atmega2560 where code (or data?) crosses a 64 kB boundary.
Yes, it's the Teensy++ 2.0 from PJRC, the AT90USB1286 variant.

Go Up