Non-executing code block freezes, but works if block is in { } braces

I am running into a problem which seems to have to do with the code execution pointer, and the behavior makes zero sense at all. There is a conditional statement which doesn't match, and since the code it controls is only one line, I omitted the surrounding { } braces for brevity. This is a pattern that I have reused many other times all throughout this fairly complex project without any observable bad consequences.

However, when I let the code run this way, execution apparently stops at that line. Here is the code block:

/**
 * @brief iWRAP "NAME" event handler
 * @param[in] mac MAC address of remote device
 * @param[in] friendly_name Null-ternimated device name string
 */
void my_iwrap_evt_name(const iwrap_address_t *mac, const char *friendly_name) {
    uint8_t name_len = 0;
    if (friendly_name) name_len = strlen(friendly_name);

    uint8_t payload[13 + name_len];
    payload[0] = mac -> address[5];
    payload[1] = mac -> address[4];
    payload[2] = mac -> address[3];
    payload[3] = mac -> address[2];
    payload[4] = mac -> address[1];
    payload[5] = mac -> address[0];
    payload[6] = 0;
    payload[7] = 0;
    payload[8] = 0;
    payload[9] = 0; // RSSI
    payload[10] = 4; // inquiry status
    payload[11] = find_pairing_from_mac(mac);
    if (name_len) memcpy(payload + 13, friendly_name, name_len);

    // send kg_evt_bluetooth_inquiry_response(...) event
    skipPacket = 0;
    if (kg_evt_bluetooth_inquiry_response) skipPacket = kg_evt_bluetooth_inquiry_response(payload + 0, payload + 6, payload[9], payload[10], payload[11], payload[12], name_len ? payload + 13 : 0);
    if (!skipPacket) send_keyglove_packet(KG_PACKET_TYPE_EVENT, 13 + name_len, KG_PACKET_CLASS_BLUETOOTH, KG_PACKET_ID_EVT_BLUETOOTH_INQUIRY_RESPONSE, payload);
}

The offending line of code is this:

    if (kg_evt_bluetooth_inquiry_response) skipPacket = kg_evt_bluetooth_inquiry_response(payload + 0, payload + 6, payload[9], payload[10], payload[11], payload[12], name_len ? payload + 13 : 0);

Note that "kg_evt_bluetooth_inquiry_response" function pointer which is currently NULL, so this condition doesn't match. Now, if I modify this one line of code to include braces:

    if (kg_evt_bluetooth_inquiry_response) {
        skipPacket = kg_evt_bluetooth_inquiry_response(payload + 0, payload + 6, payload[9], payload[10], payload[11], payload[12], name_len ? payload + 13 : 0);
    }

Suddenly, everything works. There is NO reason I can think of why this should be occurring. Is there some weird Arduino IDE Java -> C++ conversion that is messing with the code blocks before compiling? Can I check this somehow?

I have also noticed that if I add a log output command (not shown here) earlier inside the function, in a way that is logically unrelated (and separated by many lines of code) from the condition shown here, the problem also goes away even if I don't add braces. I suspect some weird memory leak in my code or a buffer overflow or something, but I have no way of doing low-level debugging with breakpoints, watches, etc.

I am using Arduino v1.0.6 with a Teensy++ 2.0 (AT90USB1286) and Teensyduino v1.20 (latest as of this moment).

Condition or assignment?
Did you mean ==?

In your statement:

 if (kg_evt_bluetooth_inquiry_response) {
        skipPacket = kg_evt_bluetooth_inquiry_response(payload + 0, payload + 6, payload[9], payload[10], payload[11], payload[12], name_len ? payload + 13 : 0);
    }

what do you think the function call name in the sub-expression:

  if(kg_evt_bluetooth_inquiry_response)

does?

AWOL:
Condition or assignment?
Did you mean ==?

The "skipPacket" variable should be assigned if the condition matches and the function is executed, but in this case there should be only the test for a non-null function pointer, which should fail since the function pointer is null.

econjack:
In your statement:

 if (kg_evt_bluetooth_inquiry_response) {

skipPacket = kg_evt_bluetooth_inquiry_response(payload + 0, payload + 6, payload[9], payload[10], payload[11], payload[12], name_len ? payload + 13 : 0);
   }




what do you think the function call name in the sub-expression:



if(kg_evt_bluetooth_inquiry_response)




does?

The condition is meant to check whether the function pointer has been assigned to an actual function or not. The function pointer is declared like this, but not initialized:

uint8_t (*kg_evt_bluetooth_inquiry_response)(uint8_t *address, uint8_t *cod, int8_t rssi, uint8_t status, uint8_t index, uint8_t name_len, uint8_t *name_data);

In this circumstance, the non-dereferenced "kg_evt_bluetooth_inquiry_response" pointer is null, so "if (null)" evaluates to false (or it should). This kind of conditional check for assigned vs. unassigned function pointers as optionally defined callbacks is present all over the place in the code, and they appear to work fine--with this one exception.

The "skipPacket" variable should be assigned if the condition matches

Which condition?

if (kg_evt_bluetooth_inquiry_response) skipPacket = kg_evt_bluetooth_inquiry_response(payload + 0, payload + 6, payload[9], payload[10], payload[11], payload[12], name_len ? payload + 13 : 0);

Here, "kg_evt_bluetooth_inquiry_response" is called unconditionally, the return value assigned to "skipPacket" and the result of that assignment tested.
However the semicolon at the end means no decision is taken as a result of that test.

AWOL:
Which condition?

This one:

if (kg_evt_bluetooth_inquiry_response) skipPacket = kg_evt_bluetooth_inquiry_response(payload + 0, payload + 6, payload[9], payload[10], payload[11], payload[12], name_len ? payload + 13 : 0);

Translated into English:

"If the kg_evt_bluetooth_inquiry_response function pointer is non-null--in other words, if it has been set to point to a real function instead--then run the kg_evt_bluetooth_inquiry_response function with the supplied arguments, and store the return value in skipPacket."

The skipPacket variable is declared and initialized to 0 one line above this, so it starts with a known value. But it should not be modified regardless, since the "if (kg_evt_bluetooth_inquiry_response)" condition should evaluate as false, and the line of code the subsequently would call that function should not be executed.

My understanding of function pointers (and the rest of the code and other libraries I have written) suggests that if you have a function pointer declared like this:

uint8_t (*my_function_pointer)(uint8_t arg1, uint8_t arg2);

...and then you use only the pointer variable name without a parameter list like this:

if (my_function_pointer != null) { ... }

...it tests the pointer's assigned memory address only, and does not actually call the function (which is what I want here). On the other hand, if you include the parameter list:

if (my_function_pointer(10, 20) != null) { ... }

...then it will dereference the pointer and execute the real function that it is pointing to, with the supplied arguments, and then the conditional statement will evaluate the return value rather than the function pointer's memory address.

Translated into English:

"If the kg_evt_bluetooth_inquiry_response function pointer is non-null--in other words, if it has been set to point to a real function instead--then run the kg_evt_bluetooth_inquiry_response function with the supplied arguments, and store the return value in skipPacket."

That's not how to translate that code into English.
You're calling the function whether or not the pointer to it is NULL.

if (fn && (result = fn(paramList)))
{
  doThis();
}

Tests the function pointer and if it is not NULL, calls the function and assigns the value returned by the function to the variable "result".
If the value of "result" is non-zero, the function "doThis" will be executed.

Until "notify me" is fixed.
I know this should not matter, but have you actually tried to compare to null?
I am not questioning you, but does the process stops at the first if(...) or executes the next if(!...)?
Just asking.

I think it's time for you to post ALL of your code, so that we can try to reproduce the problem.

I'm wondering if you expressly initialised kg_evt_bluetooth_inquiry_response to be null when you declared it. Such as void (*kg_evt_bluetooth_inquiry_response)()=NULL;

But we could all make guesses until the cows come home. Unless you actually post your code, we're all just fumbling around in the dark.

AWOL:
You're calling the function whether or not the pointer to it is NULL.

I follow the example you give, but this statement above doesn't make sense in light of my code or your example. If that were true given the syntax I have in my code, then I would have a gigantic mess of runtime errors all over the place in this codebase and others where attempts to check for function pointer assignment are in fact dereferencing null function pointers and trying to call them. But that is demonstrably not what happens.

These examples from elsewhere online follow my understanding:

In short:

  • Using a function pointer without a parameter list gives you the pointer address.
  • Using a function pointer with a parameter list executes the function.

If my code were actually calling the function pointer in the conditional statement, then your code would be doing so twice. In your example (emphasis mine):

AWOL:

if (fn && (result = fn(paramList)))

{
 doThis();
}



**Tests the function pointer and [u]if it is not NULL[/u]**, calls the function and assigns the value returned by the function to the variable "result". If the value of "result" is non-zero, the function "doThis" will be executed.

That's exactly what I'm saying. The condition "fn && (result = fn(paramList))" will first check to make sure the pointer is not null, and then if it is not null the second half of the logically AND'ed condition will be evaluated (i.e. "fn(paramList)" will run and have its return value stored in the "result" variable).

But I don't want to evaluate the return value inside my if statement's condition; I want it to be run as a result of the condition. In contrast, my code is equivalent to this mutation of your example:

if (fn)
{
  result = fn(paramList);
}

Presumably, this should test the function pointer to ensure it is not null first; then, if it is not null, call the function and store the return value. This is what I am going for. Visually rearranging my code above, it's the same:

if (kg_evt_bluetooth_inquiry_response)
{
  skipPacket = kg_evt_bluetooth_inquiry_response(payload + 0, payload + 6, payload[9], payload[10], payload[11], payload[12], name_len ? payload + 13 : 0);
}

Except, I left the braces off because it's one line:

if (kg_evt_bluetooth_inquiry_response)
    skipPacket = kg_evt_bluetooth_inquiry_response(payload + 0, payload + 6, payload[9], payload[10], payload[11], payload[12], name_len ? payload + 13 : 0);

Is there anything wrong with this?

Vaclav:
I know this should not matter, but have you actually tried to compare to null?

Good question; I have not tried this yet, though it has not been necessary in other places. Using "if (kg_evt_bluetooth_inquiry_response)" should be logically the same as "if (kg_evt_bluetooth_inquiry_response != 0)" as far as I know, but it is worth trying just to be sure.

The challenge here, and why I suspect a memory leak or problem somewhere else in my code, is that seemingly irrelevant modifications suddenly cause this to work again (e.g. adding a debug output line). This is why I don't think it's necessarily due to this specific block of code, but at the same time I don't know where to look.

PaulS:
I think it's time for you to post ALL of your code, so that we can try to reproduce the problem.

No problem as far as I'm concerned, but it's a large file in a project with dozens of others, some of which rely on additional hardware outside the Teensy++ 2.0 board (specifically a Bluegiga WT12 module in this particular case). Reproducing the behavior externally may be difficult. The source file in question is here, on Github:

The full project is on the same Github repo:

KenF:
I'm wondering if you expressly initialised kg_evt_bluetooth_inquiry_response to be null when you declared it. Such as void (*kg_evt_bluetooth_inquiry_response)()=NULL;

I did not, but I can try that as well.

However, the seemingly inexplicable fact remains that the exact same code works with braces around the (non-matching!) conditional code block, and doesn't work when the braces are gone, despite the conditional code being exactly one line.

And, similarly, the same code works (without braces in both cases) when a debug output line is present a dozen lines up, while it locks up without the debug line present.

If my code were actually calling the function pointer in the conditional statement, then your code would be doing so twice.

Nope.

if (fn)

does not call the function, even once, it merely tests the function pointer to decide if it is or is not null.

if (fn && fn (parameterList))

does not call the function iff "fn" is null.
It doesn't check to see if "fn" is a valid pointer, only that it is or is not null.
The function can only be called once if the pointer is not null

AWOL:

if (fn)

does not call the function, even once, it merely tests the function pointer to decide if it is or is not null.

if (fn && fn (parameterList))

does not call the function iff "fn" is null.
It doesn't check to see if "fn" is a valid pointer, only that it is or is not null.

I think we are talking past each other. This is exactly what I was saying about my original code example, and exactly how I want it to work and how it does work in every one of the dozens of cases where I have this kind of code, except this one. Which is why I am pulling my hair out.

I do realize that the "if (fn)" condition does not validate the memory address if one is assigned, but only checks for null vs. non-null. However, adding braces around the conditional block should have zero impact on whether the function pointer being tested contains null or not, yet this what seems to be occurring (although I know it cannot be, which is why I wish I had hardware-level debug access).

Also, to test, I have just tried adding explicit zero initializations:

uint8_t (*func_pointer)() = 0;

...and explicit non-zero conditional checks:

if (func_pointer != 0) ...

...and explicit braces around the single-line conditional blocks:

if (func_pointer != 0) { ... }

...and the Arduino compiler reports the exact same flash and RAM usage before and after this modification. I will leave the code in for slightly better (?) explicit readability and try testing with it as well, but it seems that the compiler is optimizing these explicit bits out, or at least not optimizing any differently than before they were added.

Post code - this is all just hand-waving.
(You can't put code formatting inside code tags)

AWOL:
Post code - this is all just hand-waving.

I did, a couple of replies ago:

AWOL:
(You can't put code formatting inside code tags)

My mistake; I forgot to verify whether that would work in the preview before posting. I have removed the incorrect tags. ...actually, in retrospect (adding this via an edit), the "Preview" button on the forum seems to post, rather than preview. That's the second time this has happened. Odd.

In the code you posted (at the #105 link), kg_evt_bluetooth_inquiry_response is NOT explicitly set to NULL anywhere. Where is the pointer EXPLICITLY set to NULL. Where is it set to point to some real function?

In short the presence or absence of braces is a red-herring, since it makes no
difference. The presence of a memcpy() call earlier up is suggestive, you've likely
smashed the stack or heap. Small changes in code structure can lead to the compiler
generating different stack frame layout, which can interact with the stack corruption
in different ways - in other words the symptoms of memory corruption can be anything
at all and typically are Heisenbugs. You may have run out of RAM.

The only ways the braces could have an effect is if there is some broken macro
being expended that turns what looks like a single assigment statement into several
statements, or if the compiler has a bug. Are you using any macros?

In one function you set the name at position 13 ( index 12 )

payload[12] = 0; // no NAME yet

However in the function you referenced, you do not use index 12 at all.

if (name_len) memcpy(payload + 13, friendly_name, name_len);

So the name you copy in overruns the array by 1 byte!

And index twelve is unset. (could be zero, which is effectively an empty string.)

PaulS:
In the code you posted (at the #105 link), kg_evt_bluetooth_inquiry_response is NOT explicitly set to NULL anywhere. Where is the pointer EXPLICITLY set to NULL. Where is it set to point to some real function?

The repo code doesn't explicitly set the pointers to NULL, though I did modify line 133 in the "support_bluetooth2_iwrap.h" file to do this as a test:

/* 0x03 */ uint8_t (*kg_evt_bluetooth_inquiry_response)(uint8_t *address, uint8_t *cod, int8_t rssi, uint8_t status, uint8_t index, uint8_t name_len, uint8_t *name_data) = 0;

The compiler output for this run was not different than leaving it uninitialized in the source. However, C standards say that global variables are initialized to zero at compile time, and these are global. They should all be zero anyway, and the behavior of the rest of the application suggests this. For testing, I've added zero assignments to the code again (see lines 131-139).

There are also no places in the code where that particular function pointer is assigned to any real functions, but this is by design because that callback is left to the user to implement only if needed. It's typically left as null because it's rarely needed for a user application.

MarkT:
In short the presence or absence of braces is a red-herring, since it makes no difference. The presence of a memcpy() call earlier up is suggestive, you've likely smashed the stack or heap. Small changes in code structure can lead to the compiler generating different stack frame layout, which can interact with the stack corruption in different ways - in other words the symptoms of memory corruption can be anything at all and typically are Heisenbugs. You may have run out of RAM.

THIS is exactly what I expect. ("Heisenbug" is such an appropriate term, too.) The bug appears until I add debug info to look at it, and then it goes away despite no apparent link between the added debug code and the broken behavior. I really do suspect something odd in my code, but can't inspect the stack in a non-invasive way to see what actually happens. Typically I would expect that just adding braces around a single line of code shouldn't do this though...and it appears that with the other changes I've played with today and left in the code, adding braces no longer fixes it (see below). I'm actually glad that this is still reliably occurring right now.

MarkT:
The only ways the braces could have an effect is if there is some broken macro being expended that turns what looks like a single assigment statement into several statements, or if the compiler has a bug. Are you using any macros?

I'm not using macros that I'm aware of, no. I even checked the Arduino IDE build output (in the \Temp\buildxxxxxxxx folder in my userdir AppData location) to see if anything was modified, but it's exactly as I wrote it. The code there is just normal code, no macros.

pYro_65:
In one function you set the name at position 13 ( index 12 )
...
And index twelve is unset. (could be zero, which is effectively an empty string.)

The contents of the "my_iwrap_evt_name" function are all correct in this case except for the fact that index 12 is unset. This is a bug that I introduced today after adding and removing various debugging attempts inside the function. I've fixed it so that this byte is assigned properly again, with the same line of code that used to be there:

payload[12] = name_len;

pYro_65:

if (name_len) memcpy(payload + 13, friendly_name, name_len);

So the name you copy in overruns the array by 1 byte!

I don't think I've overlooked something here, but let me talk through it one more time. The payload packet is always at least 13 bytes (so "uint8_t payload[13]" is the right size). But there could be between 0 and name_len more bytes of ASCII name data (so actually "uint8_t payload[13 + name_len];" is the right size). The memcpy call doesn't include the null terminator, but only the number of actual ASCII character bytes. The payload[12] byte (the 13th byte) contains the name length, and the remaining bytes (if any) are the actual name. There should be enough memory allocated in every case. I think.

The code as it exists on Github right now reliably demonstrates the issue on my hardware. I can currently make it happen every single time I run a device inquiry ("kg_cmd_bluetooth_discover(10);"). Note that this code now has all of the following:

  • Explicit function pointer initialization to 0
  • Explicit pointer comparison to 0 in conditional statement on line 1108
  • Braces around the "if" statement's conditional block

I have noticed that setting name_len to 0 regardless of what is actually in friendly_name appears to get avoid the freeze condition. However, adding debug serial output that spits out the value of name_len to verify that it is correct (which it is) also avoids the freeze condition, despite the fact that in that test case name_len is left alone and not set to 0. The presence of the debug output alone prevents the issue from occurring. Argh.

I will keep trying different modifications and tracing backwards to see where I might be overflowing a buffer or something.