
Topic: What is Scope?

econjack

Scope

One concept that is important for a programmer to understand is scope. Simply stated, scope is the lifetime and visibility a variable has in a program. There are essentially three basic types of scope:

     1) global
     2) function (also called local)
     3) statement block

We'll consider each of these individually.

Global Scope

For a variable to have global scope, the variable must be defined outside of a function block. We can define a variable with global scope as follows.

   Definition: A variable has global scope if the definition of the variable is made
      outside of all program functions.

   Properties:
      1. A variable with global scope is visible from its point of definition to the end
         of the file in which it is defined.
      2. A variable with global scope has a lifetime that exists from the time the
         program is loaded until the program terminates.

What do we mean by the phrase "point of definition"? Consider the following program:

Code: [Select]

int val; // <--- val starts its life with the semicolon

void setup() {
  // put your setup code here, to run once:
  val = 10;
}

void loop() {
    // put your main code here, to run repeatedly:

}
// <--- val ends its life here, the end of the source code file

   
At the top of the program a variable named val is defined. Because it is defined outside of any function (e.g., the setup() or loop() functions), val has global scope. For discussion purposes, val's lifetime starts when the compiler sees the semicolon at the end of the int val statement. Before the next statement is read by the compiler, the compiler has created an entry for val in its symbol table. A compiler symbol table is used to record the details about a variable when its lifetime starts. For our discussion, the important details are 1) its type (an int), 2) its name (val), 3) its scope (global, because of where it is defined), and 4) that memory is allocated for val. If you try to define a variable with the same name at the same scope level, you will get a "duplicate definition" error.

It is important to note that the scope of val extends from its point of definition to the end of the source code file in which it is defined. In the next example, we define a second variable, num, at the bottom of the file, but we try to use it in setup():

Code: [Select]

int val; // <--- val starts its life with the semicolon

void setup() {
  // put your setup code here, to run once:
  val = 10;
  num = 50; // Error here!!
}

void loop() {
    // put your main code here, to run repeatedly:

}
int num; // Global scope, but no code after its definition...worthless
// <--- val ends its life here, the end of the source code file


If you try to compile this program, you get an error message on the assignment statement for num in setup() which says: " 'num' was not declared in this scope". (Actually, the error message should say "num is not defined in this scope". As we shall see later, define and declare are two different concepts and many programmers are sloppy about making the distinction.) The important thing to notice, however, is that our code tries to use num before it is defined. Since the definition of num doesn't occur until the end of the source code file, it comes into scope for about a nanosecond of compile time and then the end of the file is read and its scope ends. Putting the definition of num at the end of the file does no good since it doesn't come into scope until the last point in the program. If you move the definition of num to a new line just below the definition of val, the program compiles without error.


Pros and Cons of Global Scope


Pros: Global scope does make it easier for all aspects of your program to have access to a variable. If you define a variable at the top of the program with global scope, every statement in your source code file can use/change the value of that globally-defined variable. For example:

Code: [Select]

int val;

void setup() {

  Serial.begin(9600);
  val = 10;
  SquareIt();
  Serial.print("The square is: ");
  Serial.println(val);
}

void loop() {

}

void SquareIt()
{
  val = val * val;
}


In this example, we wrote a new function named SquareIt() that takes the value of val and multiplies it by itself. Because val has global scope, the SquareIt() function knows where val lives in memory, can fetch its value and square it. The conclusion is that global scope makes it easy for other parts of the program to access that variable.

Cons: Global scope makes it too easy for all aspects of your program to have access to a variable. For example, suppose you have a large program with several thousand program statements in it and a variable with global scope. For reasons you can't explain, that variable takes on an unexpected value when the program is run. Because every statement can "see" and "use" that global variable, you don't really know where to start looking for the program error. The problem could be anywhere that variable is in scope!

A global variable is like putting a prostitute at the top of your program and then giving every assignment statement in your program a $50 bill. You don't know who the father of the error is or where that error was conceived. Global scope makes it more difficult to find and correct a program error (in a process called debugging) simply because every statement in the program can see and modify the global variable. As a result, it is desirable to limit the number of variables defined with global scope as much as possible. The process of limiting the scope of a variable is called encapsulation. The more you can encapsulate a variable, the easier it is to debug a program.

econjack

#1
Mar 01, 2016, 02:10 pm Last Edit: Mar 02, 2016, 09:21 pm by econjack
Define versus Declare

As you saw earlier, the compiler maintains a symbol table during the compile process to record details about each data item in a program. However, a data definition and a data declaration are two different animals:

   declaration: an entry in the symbol table that constructs an attribute list for a data item. No
      memory is allocated for the data item. A data declaration is most often used for data
      type checking.

   definition: an entry in the symbol table that constructs an attribute list for a data item and
      allocates memory for that data item. A variable cannot store data until it is defined.

In other words, a data definition subsumes a data declaration. Both terms build an attribute list, but only a data definition actually creates a variable capable of storing information by allocating memory for it.
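The difference can be seen in a few lines of plain C++. This is only a sketch: the names sensorCount and readCount are made up for the illustration, and the extern keyword used here to write a pure declaration is covered in more detail later in this thread.

```cpp
// Declaration: the compiler records the name and type, but allocates no memory.
extern int sensorCount;   // hypothetical variable name

// Legal use: the name has been declared, so the compiler can type-check it
// even though the definition has not been seen yet.
int readCount() {
    return sensorCount;
}

// Definition: same attribute list, plus the memory (and an initial value).
int sensorCount = 3;
```

If you delete the last line, the code still compiles, but the linker complains because no memory was ever set aside for sensorCount.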

econjack

Function Block Scope

Function block scope applies to variables that are defined within a function block. Function block scope (also called local scope) for a variable extends from its point of definition to the end of the function block in which it is defined. For example, let's take the last program and move the definition of val into the setup() function:

Code: [Select]


void setup() {
  int val; // ← val comes into scope here
  Serial.begin(9600);
  val = 10;
  SquareIt();
  Serial.print("The square is: ");
  Serial.println(val);
}
// ← val goes out of scope here

void loop() {

}

void SquareIt()
{
  val = val * val;
}


Because val is defined within the setup() function, its scope extends from the semicolon at the end of its definition to the closing brace of the setup() function. Stated differently, no aspect of the program outside of setup() even knows val exists. In fact, if you try to compile the program above, you get an error in the SquareIt() function that says: " 'val' was not declared in this scope".

So, if I want to square the value of val, how can SquareIt() work if it can't access val because of function block scoping rules? Easy...we send the value of val to SquareIt(). We simply make a copy of the value stored in val and send that copy to SquareIt(). To do that, we need to change SquareIt() so it can receive the copy of val for its own use. So, let's try the following changes:

Code: [Select]

void setup() {
  int val;
  Serial.begin(9600);
  val = 10;
  SquareIt(val); // Pass val to SquareIt()
  Serial.print("The square is: ");
  Serial.println(val);
}

void loop() {

}

void SquareIt(int val) // SquareIt() can now receive val
{
  val = val * val;
}


Now the program compiles without error. Alas, it doesn't work correctly either. It shows the value of val to still be 10 after the call to the SquareIt() function even though it should be 100. What went wrong?

Look at the first line of the SquareIt() function. What does the expression enclosed in parentheses:

(int val)

look like? It looks almost identical to the first statement in setup()...the definition of val. How can that be? We already have val defined in setup(), so shouldn't we get a duplicate definition error for this val defined in SquareIt()? No, we won't get an error message because the val in SquareIt() also has function scope, so it knows nothing about the val defined in setup(). The val defined in SquareIt() is a completely different variable that lives in a different place in memory.
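Here is a small desktop C++ sketch of that point (no Serial, so it runs off the board too). The helper names squareItByValue, demoPassByValue, parameterIsSameObject and demoSameObject are invented for the illustration.

```cpp
// Squares its own private copy of the argument; the caller never sees it.
void squareItByValue(int val) {
    val = val * val;
}

// Returns the caller's val after the call: still the original value.
int demoPassByValue() {
    int val = 10;
    squareItByValue(val);
    return val;
}

// True only if the parameter and the caller's variable share one address.
bool parameterIsSameObject(int val, const int *callerVal) {
    return &val == callerVal;
}

// The parameter is a distinct variable, so this reports false.
bool demoSameObject() {
    int val = 10;
    return parameterIsSameObject(val, &val);
}
```

demoPassByValue() gives back 10, not 100, and demoSameObject() gives back false because the two variables named val live at different addresses.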

So how can we fix SquareIt() so it works the way we want it to? Make the following changes to the program:

Code: [Select]

void setup() {
  int val;
  Serial.begin(9600);
  val = 10;
  val = SquareIt(val); // Note the assignment operator
  Serial.print("The square is: ");
  Serial.println(val);
}

void loop() {
}

// This is the function type specifier, int
//    |
//    |
    int SquareIt(int val) // Note change from void to int
    {
      return val * val; // Note the return statement
    }


There are three changes that need to be made: First, we use an assignment statement as part of the function call to SquareIt(). We are now taking the value returned from the function and assigning it into val. Second, for this assignment to work, we need to change SquareIt()'s function type specifier from void (a function that returns nothing to the caller) to int (a function that returns an integer value). Third, because the function now returns an integer value, we need to add a return statement to the function.

This line:

int SquareIt(int val) // The function signature

is often called a function signature and has three parts: 1) the function type specifier (int) which tells you the type of data this function sends back to the caller, 2) the name of the function (SquareIt), and 3) the parameter list which is contained within the parentheses. If you take the function's signature and put a semicolon at the end of the line, it becomes a function prototype. Function prototypes are used by the compiler for purposes of type checking, that is, making sure you are sending the function the right kind of data and using the data returned from the function properly.
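As a sketch in plain C++ (the helper squareOfTen is made up for the example), the prototype lets a call be type-checked before the compiler has seen the function body. As far as I know, the Arduino IDE generates prototypes like this for the functions in a .ino file behind the scenes, which would explain why the sketches above compile even though SquareIt() is defined after setup().

```cpp
// Prototype: the signature followed by a semicolon. No body yet.
int SquareIt(int val);

// The compiler can type-check this call using only the prototype above.
int squareOfTen() {
    return SquareIt(10);
}

// Definition: the same signature, now with a body.
int SquareIt(int val) {
    return val * val;
}
```

Remove the prototype in an ordinary C++ file and the call inside squareOfTen() no longer compiles, because SquareIt has not been declared at that point.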

Pros: Function block scope limits the visibility of a variable and, hence, encapsulates it from the rest of the program. This can make it much easier to track down a bug. If something goes wrong with the variable, at least the bug is confined to the function in which it is defined.

Cons: Because its scope is limited, if another function "needs" the value of a local variable, you need to pass the value of that variable to the function that needs it.

econjack

Pass by Value vs Pass by Reference

As you just saw, SquareIt() cannot permanently change the value of val in setup() because it is a copy of the value that is sent to the function. If we want SquareIt() to permanently change val in setup(), the function must know where val is stored in memory. So, why not tell SquareIt() where val lives? That's exactly what pass by reference does.


Pass by Reference in C

The address-of operator (&) lets you pass the address of where a variable lives in memory. That is, instead of sending a copy of the value of val, let's send its memory address to SquareIt(). We need to make a few changes to the program.

Code: [Select]

void setup() {
  int val;
  Serial.begin(9600);
  val = 10;
  SquareIt(&val);      // Note address-of operator
  Serial.print("The square is: ");
  Serial.println(val);
}

void loop() {
}

void SquareIt(int *ptr)   // Note indirection operator
{
  *ptr = *ptr * *ptr;
}


First, when we call SquareIt(), we pass the function the memory address of where val resides in memory by using the address-of operator, &:

   SquareIt(&val);

The address-of operator tells the compiler to send val's memory address to the function instead of a copy of what is stored in val. Because of this, SquareIt() knows where to find val in memory.

Because you are sending the memory address of val instead of a copy of its value, you need to let the function in on the secret. You do this by using the indirection operator, *. The statement:

void SquareIt(int *ptr)

uses the indirection operator to tell the compiler that SquareIt() is being passed a memory address, not a "normal copy" of the variable's value. In other words, we have a pointer to where the variable resides in memory, not its actual value. So what does this statement do:

  *ptr = *ptr * *ptr;


We could rewrite this as:

  (*ptr) = (*ptr) * (*ptr);

where each parenthesized expression says: "Use the memory address held in ptr and go to that address in memory and fetch the value stored there." But, how many bytes should it fetch? Because we used the int type specifier when we defined the parameter in the function definition's signature:

void SquareIt(int *ptr)

the compiler knows to fetch "int-bytes" of data. This is called the pointer's scalar. Scalars are important because they tell the compiler the number of bytes associated with the pointer and are used in pointer arithmetic (e.g., increment and decrement). Since an int on an 8-bit AVR Arduino such as the Uno uses 2 bytes for storage (on 32-bit boards such as the Due it uses 4), using the pointer scalar causes the code to fetch 2 bytes from the memory address held in the pointer. We know that val was assigned the value of 10 in setup(), so really what you're looking at is:

  (*ptr) = (*ptr) * (*ptr);
  (*ptr) = (10) * (10);
  (*ptr) = 100;


The last statement above takes the new value of 100 and writes it back to the same two bytes in memory. This means you have permanently changed the value of val back in setup()! This process of using a memory address to indirectly change the value of a variable is called indirection. You will often hear programmers say we used a pointer to val to permanently change its value.
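The following desktop C++ sketch puts both ideas together: changing the caller's variable through a pointer, and the pointer's scalar. The helper names squareThroughPointer and bytesPerIntStep are made up for the example, and the byte count it reports depends on the machine you compile for.

```cpp
#include <cstddef>

// Same function as above: squares the int that ptr points to.
void SquareIt(int *ptr) {
    *ptr = *ptr * *ptr;
}

// Passes the address of a local variable and returns what it holds afterward.
int squareThroughPointer(int start) {
    int val = start;
    SquareIt(&val);
    return val;
}

// Measures how many bytes an int pointer moves when it is incremented by one.
std::size_t bytesPerIntStep() {
    int pair[2] = {0, 0};
    const char *first = reinterpret_cast<const char *>(&pair[0]);
    const char *second = reinterpret_cast<const char *>(&pair[0] + 1);
    return static_cast<std::size_t>(second - first);
}
```

squareThroughPointer(10) returns 100, and bytesPerIntStep() always matches sizeof(int) for whatever board or PC the code is built for.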


Pass by Reference in C++


C++, which is the underlying language for the Arduino IDE, has a less clunky syntax for pass by reference. Let's take the same program as above, but use the more modern syntax of C++:

Code: [Select]

void setup() {
  int val;
  Serial.begin(9600);
  val = 10;
  SquareIt(val);      // Note No address-of operator used
  Serial.print("The square is: ");
  Serial.println(val);
}

void loop() {
}

void SquareIt(int &num)    // Note the &: num is a reference parameter
{
   num = num * num;  // Note indirection operator not needed
}


Using C++, the call to SquareIt() in setup() does not require the use of the address-of operator with the argument val. The only place we see the & symbol is in the signature for the SquareIt() function, where it declares num to be a reference rather than taking an address. Indeed, that's how the compiler knows to give SquareIt() access to val itself in setup() (typically by passing its memory address behind the scenes) instead of a copy of val's numeric value. Armed with this information, the compiler is smart enough to know to do indirection automatically on num to fetch the value of val (as defined in setup()) for the squaring operation. The result is assigned back into num in the statement:

  num = num * num;

but since num actually refers to val, it is val whose value is permanently changed. The C++ syntax is much easier to read.
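A quick way to convince yourself that num and val are the same variable is to compare their addresses. This is a desktop C++ sketch; squareThroughReference, refersToSameObject and demoAlias are names invented for the illustration.

```cpp
// Reference version: num is another name for the caller's variable.
void SquareIt(int &num) {
    num = num * num;
}

// Calls SquareIt() on a local variable and returns what it holds afterward.
int squareThroughReference(int start) {
    int val = start;
    SquareIt(val);
    return val;
}

// True if the reference and the pointer designate the same variable.
bool refersToSameObject(int &num, const int *original) {
    return &num == original;
}

// Passes val by reference and by address; both lead to the same place.
bool demoAlias() {
    int val = 10;
    return refersToSameObject(val, &val);
}
```

squareThroughReference(10) returns 100 and demoAlias() returns true, which is the opposite of what happens when val is passed by value.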

econjack

#4
Mar 01, 2016, 02:32 pm Last Edit: Mar 02, 2016, 09:30 pm by econjack
Statement Block Scope

A variable with statement block scope extends its life and visibility from its point of definition to the end of the statement block in which it is defined. For example:

Code: [Select]

void setup() {
  int val;
  Serial.begin(9600);
  val = 0;
  //            | j defined here
  //            |
  //            |
  for (int j = 0; j < 100; j++)
  {
    val += j;
  }             
  // j dies here

  Serial.print("j = ");
  Serial.println(j);
  Serial.print("val equals: ");
  Serial.println(val);
}

void loop() {
}


If you try to compile this example, you get the error message: " 'j' was not declared in this scope" in the statement that attempts to print the value of j. The reason is that variable j comes into scope after the first expression of the for loop is compiled, but it goes out of scope as soon as the closing brace of the for loop is reached. Therefore, the scope of j is confined to the for statement block itself. To get the code to compile, you must remove the Serial.println(j) statement. Indeed, you may as well delete the Serial.print("j = ") statement along with it since j is not in scope.

Pros: Statement block scope is about as narrowly-defined as a variable can be, which makes it pretty simple to determine what is changing the value of such a scoped variable. As a secondary benefit, in a perfect world, the memory allocated to j could be reclaimed the minute it goes out of scope. (Languages with managed memory call the automatic reclaiming of unused memory garbage collection; C and C++ have no garbage collector, and a local variable like j normally lives on the stack or in a register.) Alas, you should not expect that memory to become available immediately: in practice a compiler typically releases a function's stack space when the function returns, not at the end of each statement block.

Cons: Statement block scope doesn't really have any cons. However, C coding style is to define local variables at the top of the function block, which also makes them easy to locate and determine their type. Because statement block variables have their definition buried in the statement block, they are a little harder to find.

Also, once program control is sent out of the statement block, that variable can no longer be used. For example:
Code: [Select]

for (int j = 0; j < number; j++) {
   if (array[j] == target)
      break;
}
found = array[j]; // ERROR...j is out of scope

You cannot use variable j outside of the for loop since it has statement block scope.
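If you do need the loop counter after the loop ends, one common fix is to define it with function block scope, just before the loop. The sketch below assumes a made-up helper named findIndex and some sample data (samples, sampleCount) invented for the example; it returns -1 when the target is not found.

```cpp
const int samples[] = {4, 8, 15, 16};   // hypothetical data for the example
const int sampleCount = 4;

// Returns the index of target in array, or -1 if it is not there.
int findIndex(const int array[], int number, int target) {
    int j;                               // function block scope: outlives the loop
    for (j = 0; j < number; j++) {
        if (array[j] == target)
            break;
    }
    return (j < number) ? j : -1;        // j is still in scope here
}
```

For example, findIndex(samples, sampleCount, 15) returns 2. Note the check on j after the loop: if the target is never found, j equals number and array[j] would read past the end of the array.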

econjack

Scope and "Ties"

What if you wrote the following:

Code: [Select]

int val = 20; // Globally defined val

void setup() {
  int val = 10; // Locally defined val
  Serial.begin(9600);
  Serial.print("val = ");
  Serial.println(val);
}

void loop() {
}


In this program, we have val defined twice; once outside of setup() (i.e., global scope) and once inside setup() (i.e., local scope). Does it draw an error message? No. Depending on the compiler's warning settings, you may get a warning saying that the definition of val "shadows" another definition of val; in other cases, there is no warning at all. The reason is, because their scope levels are different, there is no conflict even though they have the same name.

The bigger question is: What value is printed for val? The rule is simple: If two variables have the same name, the variable with the more restrictive scope level is used. Because local (function block) scope is more restrictive than global scope, the val defined within setup() is the variable used in the print statement, so 10 is displayed.
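The rule can be checked with a short desktop C++ sketch. The function names readLocal and readGlobal are made up for the example. The second function also shows the C++ scope resolution operator (::), which is not discussed above but lets you reach the hidden global when you really need it.

```cpp
int val = 20;            // globally defined val

// The local val hides the global one inside this function.
int readLocal() {
    int val = 10;        // locally defined val
    return val;          // the more restrictive scope wins
}

// ::val names the global val even though a local val is in scope.
int readGlobal() {
    int val = 10;
    (void)val;           // silence the "unused variable" warning
    return ::val;
}
```

readLocal() returns 10 and readGlobal() returns 20. In practice, giving the two variables different names is usually the clearer choice.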

econjack

#6
Mar 01, 2016, 02:38 pm Last Edit: Mar 02, 2016, 09:37 pm by econjack
Scope Across Two Different Source Files

Sometimes programs get to a size where it makes sense to split them apart into two or more source code files. You may have a variable defined with global scope in one file, but you need to use it in a different file. The problem is: The scope of a variable with global scope extends from its point of definition to the end of the current source file. Therefore, the global variable in scope in one file is not in scope in the second file where you want to use it.

For example:

File 1                        File 2

int val;                      void SquareIt()
                              {
                                 val = val * val;
                              }

The variable val is in scope only in File 1, but you are trying to use it in File 2. We can fix this problem with a single statement in File 2:


File 1                        File 2

int val;                      extern int val; // New statement

                              void SquareIt()
                              {
                                 val = val * val;
                              }

The extern keyword can be used to allow us to use val in File 2 even though it is defined in File 1. Simplifying things a bit, the statement:

extern int val;

in File 2 says to the compiler: "The integer variable named val is not defined in this file, but let me use it as an integer variable named val everywhere I need to in File 2." The compiler says: "OK, I'll treat it as an int variable named val, and I'll leave a 'hole' in the code I generate to hold the memory address of where val is actually stored." After the compiler has done its job, another program, called a linker, comes along and "links" the two files together by filling in all the memory address "holes" left by the compiler.

So, what is:

extern int val;

It is a data declaration, NOT a definition. The definition of val is done in File 1. The difference between a data definition and a data declaration is that a definition allocates a memory address for the variable; a declaration does not. Both build an attribute list for a variable (e.g., its name, its type, its scope, etc.), but only a definition sets aside a chunk of memory where the variable lives.

Sometimes you may see something like:

File 1                        File 2

int val;                      extern int val; // New statement
                              void SquareIt();

                              // Many lines of program code...

                              void SquareIt()
                              {
                                 val = val * val;
                              }

As you saw earlier, the second statement in File 2 is a function prototype, and it is a declaration for the SquareIt() function which is defined further down in File 2. Once again, the function prototype is a declaration that allows the compiler to build an attribute list for the function so that the function can be used properly in File 2 before its definition appears. (For File 1 to call SquareIt(), File 1 would need the same prototype.) However, the actual memory for the function is allocated at its definition, hence the definition for the function is in File 2.

econjack

The static Keyword

One of the advantages of a variable with global scope is that its value is not "reset" each time it is used, as is often the case with a local variable. For example, suppose you want to maintain a count of how many times a function is called. Your first attempt results in:

Code: [Select]

void setup() {
  Serial.begin(9600);
}

void loop() {
  int count;

  count = Pass();
  Serial.print("count = ");
  Serial.println(count);
}

int Pass()
{
  int counter = 0;

  return ++counter;
}


When you compile and run it, it doesn't perform like you want: it prints 1 every time, because counter is defined and set to 0 again on each call to Pass(). So you add a global variable to track the count. To do that, you move the definition of count outside of loop() and make it global:

Code: [Select]

int count;

void setup() {
  Serial.begin(9600);
}

void loop() {
  count = Pass();
  Serial.print("count = ");
  Serial.println(count);
}

int Pass()
{
  return ++count;
}


Now you've got the program working correctly, but you've turned count into a hooker by giving it global scope. Is there a way to encapsulate count, but still have the program work as before? Obviously "Yes" or I wouldn't have brought it up. Try:

Code: [Select]

void setup() {
  Serial.begin(9600);
}

void loop() {
  Serial.print("count = ");
  Serial.println(Pass());
}

int Pass()
{
  static int count = 0;    // Note the static keyword
  return ++count;
}


Note the use of the static keyword in the Pass() function. The variable count now has local scope, but the keyword static tells the compiler to allocate its memory in such a way that it is only defined once, when the program is first started, not each time Pass() is called. Variable count now retains its value between function calls to Pass() because it is allocated in memory as though it were a global variable. (Usually, global and static variables are allocated in a fixed static data section of memory that exists for the whole run of the program. Local variables are allocated in a section of memory called the stack, and the heap is the section used for memory requested at run time with malloc() or new.)

Note that we were also able to simplify the code in loop(), too. The big advantage here is that count now behaves like a global in terms of its value, but it is encapsulated in Pass() and has scope that is limited to Pass().
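Stripped of the Serial calls, the difference between the first attempt and the static version can be sketched in plain C++ like this (the names passWithoutStatic and passWithStatic are made up so the two versions can sit side by side):

```cpp
// First attempt: counter is created and set to 0 on every call.
int passWithoutStatic() {
    int counter = 0;
    return ++counter;
}

// Static version: count is initialized once and keeps its value between calls.
int passWithStatic() {
    static int count = 0;
    return ++count;
}
```

Call each function three times in a row: passWithoutStatic() gives 1, 1, 1 while passWithStatic() gives 1, 2, 3. Nothing outside passWithStatic() can touch count.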

Another use of the word static is to allow duplicate variable names to be defined at global scope in two different files of the same program. For example:

   File 1                  File 2

   int val;                int val;

would draw a "multiple definition" error message from the linker. However, if you instead use:

   File 1                  File 2

   static int val;         static int val;

the two variables do not need to be resolved by the linker. Because of the static keyword, each val has internal linkage: it is a separate variable that is visible only within its own file, so the compiler resolves it and the names never clash.

econjack

@AWOL: You posted faster than I could get everything posted!

larryd

@econjack
Good addition.

-dev

I'd like to second the motion for "4) File scope".

This is a common term, and it's actually used in the language standards.  It is more about what parts of a compilation unit (a file) can reference a particular identifier (everything after its declaration).

"File scope" says nothing about whether its linkage is external (extern or nothing) or internal (static), nor its storage class (stack aka auto, static [at file scope {global or local}, or in a function with the static keyword], or heap).  o_O

"File scope" is also used in the negative sense, to mean "not inside a function".

But what to change in your description?  I think the accepted use of global implies external linkage.  It seems like your description of "global scope" is really "file scope", except you mention using it across files, in the entire program.  The later post, "Scope Across Two Different Source Files" is really a "global scope" description.  Maybe just remove the "global" concept out of the first post, and just call it File Scope, and then use the term "Global Scope" inside the "Two Different Source Files" post?

Otherwise, you've written a nice introduction to a crucial (and confusing) topic.   Remember, the guy out in front is the one with all the arrows in his back.  :)

Cheers,
/dev

Robin2

#11
Mar 02, 2016, 11:05 pm Last Edit: Mar 02, 2016, 11:06 pm by Robin2
@econjack, excellent tutorial.

(For the future it is a good idea to create a new topic and make several trivial replies to it in order to reserve a sequence of consecutive posts before someone else intervenes. Then you can edit the trivial replies at your leisure.)

IIRC, if you create an Arduino project with several .ino files, the IDE loads the principal file first (the one with the project name) and then loads the others in alphabetical order. Global variables declared in an "early" file are automatically visible to all the "later" files, but not vice versa.

...R

Boardburner2

Good thread, this. Can it be pinned? Possibly we could have a pinned conf to include Robin's "do lots of things at once" thread.

nickgammon

#13
Mar 03, 2016, 09:01 pm Last Edit: Mar 03, 2016, 09:03 pm by Nick Gammon
Discussions about this post (ie. meta-discussions) have been moved to another thread:

Quote
Discussions about "What is scope?"
Talking about whether or not this post should be a PDF or FAQ confuses the intention of this post, which is to talk about variable scope.
