Allocation, Extent, Environment -------------------------------- A variable is stored in memory at a location (determined by the environment) corresponding to the variable. Constants do not need to be stored in memory. (Constants Vs Literals) Environment stores the binding between variable names and the corresponding locations in memory. The process of setting up this binding is known as storage allocation. 1. static allocation: allocation performed at compile time. Compiler translates all names to corresponding location in the code generated by it. Examples: all variables in original FORTRAN all global and static variables in C/C++/Java 2. stack allocation: needed in any language that supports the notion of local variables for procedures. Also called "automatic allocation" but this is somewhat of a moisnomer now. Examples: all local variables in C/C++/Java procedures and blocks Implementation: Compiler translates all names to relative offsets from a location called the "base pointer" or "frame pointer". The value of this pointer varies will, in general, be different for different procedure invocations. The pointer refers to the base of the "activation record" (AR) for an invocation of a procedure. The AR holds such information as parameter values, local variables, return address (where execution should resume on exit from the current procedure) etc. Example: int fact (int n) { if n = 0 then 1 else { int rv = n * fact (n-1) ; return rv ; } } main() { fact (5) ; } An activation record is created on the stack for each a call to function. The start of activation record is pointed to by a register called BP. The other elements (e.g., parameters, local variables) are accessed by offset from BP (e.g., first parameter may be accessed as BP, second parameter as BP+2, assuming each parameter occupies 2 bytes etc). Then on the first call to fact, BP is decremented (since stack grows downwards, towards address 0 in the memory) to point to new activation record, n is initialiaed to 5, rv is pushed but not initialized (since fact has to compute factorial of 4 yet) and then new activation record (again with n initialized to 4, rv is pushed on top of n) is created for the next recursive call and so on until when n becomes 0, at which time, stack is unrolled where successive rv's are assigned the value of n at that stage and the rv of previous stage (thus computing the factorial). Dynamic allocation - explicit allocation, freeing e.g., malloc/free in C, new/delete in C++ - explicit allocation, automatic free Java - automatic allocation, automatic free Lisp, SML Extent, Lifetime: duration of allocation of a variable Variables and Constants Variables are stored in memory, whereas constants need not be (they may be replaced at the compile time by the compiler with the value they refer to). Value of variables can change at runtime. Variables have a location and value. Location refers to l-value (the value on left of assignment) and value refers to r-value (the value on right) e.g., in x = y, r-value of y is stored in location that is l-value of x. Variables and Constants ------------------------ Variables are stored in memory, whereas constants need not be (they may be replaced at the compile time by the compiler with the value they refer to). Variables are associated with a location and value. The location is called its l-value, while the value stored in this location is called the r-value. These two values can be captured by a diagram such as: Location +----------+ +------------------+ | +-----+ | Name -------| other attributes |----| |Value| | +------------------+ | +-----+ | +----------+ Henceforth, we will omit the "other attributes" from this picture. (In the notes, we use this way to represent the box-and-circle diagram used in your textbook. In particular, this picture captures the first figure in Section 5.5.1 of your textbook.) In assignment statements of the form x = y, we implicitly use the l-value of the variable x on the lhs of the assignment, and the r-value of the variable on the assignment. In other words, we are updating the LOCATION corresponding to x with the VALUE corresponding to y. The semantics of assignment can be understood precisely by looking at the following box-and-circle diagram: +----------+ | +-----+ | y --------| |Value| | | +--|--+ | +-----|----+ | +-----|----+ | V | | +-----+ | x --------| |Value| | | +-----+ | +----------+ Some languages provide explicit operators to access the l-value or r-value of a variable. For instance, the "*" operator in C/C++ accesses the r-value of a variable, even if it is used on the lhs of an assignment, such as *x = y. Similarly, "&" operator is used to override the use of r-values on the rhs, as in x = &y Most imperative languages use the above semantics for assignment, which requires the value of y to copied into the location for x. This called "STORAGE SEMANTICS". Some languages use a POINTER SEMANTICS, where the locations for x and y are simply shared: +----------+ | +-----+ | y --------| |Value| | /| +-----+ | / +----------+ / / / / x -- Pointer semantics is uncommon in languages that permit assignment, since it causes confusion: a future assignment to y may change the value of x. Pointer semantics is used in SNOBOL, and in some situations in LISP. Constants ----------- The semantics of constants is given in terms of their r-values. Although they may be stored in memory, we do not typically think about their l-values: since they are constants, their contents cannot be changed, so it is of no use to access their l-values. Since we only think about the r-value of constants, we say that constants have a "VALUE SEMANTICS" Note that the notion of a constants is symbolic: A constant is simply a symbolic name for a value. Aliases, Dangling References and Garbage ----------------------------------------- Alias: Two variables have the same l-value In C++, we can define reference types using the syntax &: int& y Reference variables have to be initialized with their l-value, e.g.: int x = 1; int& y = x; The resulting picture is: +----------+ | +-----+ | x --------| | 1 | | /| +-----+ | / +----------+ / / / / y -- Later, if we statement y = 3 then the value of x is also changed to 3: +----------+ | +-----+ | x --------| | 3 | | /| +-----+ | / +----------+ / / / / y -- It would not be apparent just by looking at the statement "y = 3" that the value of x would change as a result of this statement. For this reason, we say that the assignment to y has the "side-effect" of changing the value of x. Side-effects cause confusion for someone trying to read or understand a program, and are hence to be avoided in general. Since aliases cause such side-effects, one should be careful to use aliases sparingly. Aliases can also be created due to the use of pointer variables. int *x = NULL; +----------+ | +-----+ | x --------| | o--|-|-----+ | +-----+ | | +----------+ --+-- --- x = new int; *x = 4; +----------+ +----------+ | +-----+ | | +-----+ | x --------| | o--|-|----->| | 4 | | | +-----+ | | +-----+ | +----------+ +----------+ int *y; y = x; Execution of these two statements will result in: +----------+ +----------+ | +-----+ | | +-----+ | x --------| | o--|-|--+-->| | 4 | | | +-----+ | | | +-----+ | +----------+ | +----------+ | | +----------+ | | +-----+ | | y --------| | o--|-|--+ | +-----+ | +----------+ Note that the assignment statement has worked in the same way as it did when x and y were integer variables: the location of the variable on the lhs has been updated with the r-value of the variable on the rhs. It happens to be the case that the r-value of x is itself a pointer, and hence the result shown above. Arrays Vs Pointers in C ------------------------- In C, an array of type T is considered to have the same type as a pointer to type T. For instance, if we have int a[5]; int *b; then b and a have the same type, int*. But the semantics of array and pointers are quite different: -- you cannot assign a value to "a", but you can assign a value to "b" This is because the l-value of "a" cannot be changed. -- you can assign a value to *a, but before you can assign to *b, you have to first allocate storage for it. The box-and-circle diagrams look as follows: (5 locations, each can store integer) +-------------------------+ | +---+---+---+---+---+ | a --------| | | | | | | | | +---+---+---+---+---+ | +-------------------------+ (1 location that can store an integer pointer) +----------+ | +-----+ | b --------| | o--|-|-----> pointer to nowhere | +-----+ | (Unitialized) +----------+ An assignment *a = 3 will result in: +-------------------------+ | +---+---+---+---+---+ | a --------| | 3 | | | | | | | +---+---+---+---+---+ | +-------------------------+ whereas "*b = 3" will result in the following picture, assuming that b has previously been initialized with a pointer to a location that can hold an integer (using a statement such as "b = new int;") +----------+ +----------+ | +-----+ | | +-----+ | b --------| | o--|-|----->| | 3 | | | +-----+ | | +-----+ | +----------+ +----------+ Garbage --------- A location that has been allocated, but is no longer accessible in a program: int *x; int y = 3; x = new int; +----------+ +----------+ | +-----+ | | +-----+ | x --------| | o--|-|----->| | | | | +-----+ | | +-----+ | +----------+ +----------+ +----------+ | +-----+ | y --------| | 3 | | | +-----+ | +----------+ *x = 5; x = &y; results in: (Garbage location: allocated, but no +----------+ +----------+ pointers to it, which | +-----+ | | +-----+ | means that it cannot be x --------| | o--|-|-+ | | | | accessed) | +-----+ | | | +-----+ | +----------+ | +----------+ | +----------+ | | +-----+ |<+ y --------| | 3 | | | +-----+ | +----------+ Garbage leads to loss of available memory, but does not otherwise affect correctness of programs, i.e., programs that create garbage will operate correctly as long as they do not run out of memory. A program that produces garbage is said to have a MEMORY LEAK. Since premature freeing of locations (see DANGLING POINTERS below) causes more serious problems, programmers often err on the side of writing programs that leak memory. The net result is that long-running programs eventually run out of memory and crash. While this behavior may be tolerable for programs such as browsers, it is unacceptable for server programs with high availability requirements. In language with a strong type discipline (eg Java, SML) it is possible to detect garbage automatically and reclaim it. The process of search/reclamation is called GARBAGE COLLECTION. In languages with lax type systems (C/C++), automatic garbage collection is much harder. Dangling pointers ------------------ A pointer that points to a location that has been deallocated. Premature deallocation of memory e.g., int *x, *y, *z ; x = new int ; *x = 3 ; y = x ; delete x ; x = NULL; /* at this place, y is dangling - i.e., y points to a location that has been freed (so program shouldn't have access to it any longer), but in this case, y still points to it and hence has access to it. So y is dangling pointer */ z = new int ; /* it is possible that the operating system (or run-time system) may allocate the same space (ie previously pointed to by x) to z */ *z = 5 ; *y = 2 ; /* since y and z point to the same location, *z = 2 and not 5 as one would expect */ Pictorially, after "y = x" statement is executed in the above program: +----------+ +----------+ | +-----+ | | +-----+ | x --------| | o--|-|--+-->| | 3 | | | +-----+ | | | +-----+ | +----------+ | +----------+ | | +----------+ | | +-----+ | | y --------| | o--|-|--+ | +-----+ | +----------+ After "delete x" and assignment of NULL to x: +----------+ +----------+ | +-----+ | | +-----+ | x --------| | o--|-|--+ +-->| | 3 | | | +-----+ | | | | +-----+ | +----------+ - | +----------+ | | +----------+ | | +-----+ | | y --------| | o--|-|--+ | +-----+ | +----------+ (note that x is no longer pointing to the locatation that contains value "3") After "z = new int" statement : +----------+ +----------+ | +-----+ | | +-----+ | x --------| | o--|-|--+ +-->| | 3 | | | +-----+ | | | | +-----+ | +----------+ - | +----------+ | | +----------+ | | +-----+ | | y --------| | o--|-|--+ | +-----+ | | +----------+ | | | +----------+ | | +-----+ | | z --------| | o--|-|--+ | +-----+ | +----------+ i.e., both y and z point to the same location because the memory allocator allocated the location (that is now free) that contains 3 to z. After "*z = 5" statement: +----------+ +----------+ | +-----+ | | +-----+ | x --------| | o--|-|--+ +-->| | 5 | | | +-----+ | | | | +-----+ | +----------+ - | +----------+ | | +----------+ | | +-----+ | | y --------| | o--|-|--+ | +-----+ | | +----------+ | | | +----------+ | | +-----+ | | z --------| | o--|-|--+ | +-----+ | +----------+ After "*y = 2" statement: +----------+ +----------+ | +-----+ | | +-----+ | x --------| | o--|-|--+ +-->| | 2 | | | +-----+ | | | | +-----+ | +----------+ - | +----------+ | | +----------+ | | +-----+ | | y --------| | o--|-|--+ | +-----+ | | +----------+ | | | +----------+ | | +-----+ | | z --------| | o--|-|--+ | +-----+ | +----------+ Hence the value of *z is now 2!