Run-Time Environments

Introduction

A compiler must accurately implement the source language definition, including:
- Names, Scopes, Bindings
- Data types
- Operators
- Procedures, Parameters
- Flow-of-Control constructs
The compiler works with the OS and other system software to support these abstractions.
The compiler creates and manages a run-time environment in which it assumes its target programs are executed.
A runtime environment is a set of data structures maintained at runtime to implement high-level structures such as:
- Functions
- Objects
- Exceptions
- Data types
- Flow-of-Control constructs
Run time environment encompasses everything needed to execute a program.
Run-time environment deals with issues such as:
1. Allocation of storage locations for objects.
2. Mechanisms for accessing variables.
3. Linkages between procedures.
4. Mechanisms for passing parameters.
5. Interfaces to the OS, i/o devices, and other programs.
Questions to consider regarding function execution:
- What does the dynamic execution of functions look like?
- Where is the executable code for functions located?
- How are parameters passed in and out of functions?
- Where are local variables stored?

Storage Organization

The compiler deals with logical address space.
The OS maps logical addresses to physical addresses.
The representation of an object program (Target Code) in the logical address space consists of:
1. Program area (Code)
2. Data area
Note:
- Logical Address Space: Set of all logical addresses generated by the CPU in reference to a program.
- Physical Address: Location in a memory unit, mapped to the corresponding logical addresses.
Runtime storage is subdivided to hold:
1. Target code (static, size determined at compile time)
2. Static data objects
3. Dynamic data objects – Heap Memory
4. Automatic data objects – Stack memory
Figure 7.1 illustrates the subdivision of run-time memory into code and data areas.
The size of the generated target code is fixed at compile time.
The compiler places the executable target code in a statically determined area called Code, usually at the low end of memory.
The size of data objects, such as Global Constants, are known at compile time and placed in a statically determined area called Static.
Static allocation is possible only when the compiler knows the size of data objects at compile time.
Addresses of statically allocated objects can be compiled into the target code.
To maximize space utilization, Stack and Heap are at opposite ends of the address space.
Stack and Heap areas are dynamic; their sizes change as the program executes.
Heap memory is used to dynamically allocate memory to variables and reclaim it when no longer required.
The Stack stores data structures called activation records generated during procedure calls.
The Stack grows with each call and shrinks with each procedure return/termination.

Static Versus Dynamic Storage Allocation

Static allocation: A storage-allocation decision is static if it can be made by the compiler looking only at the text of the program.

Static allocation

Names are bound to storage locations at compilation time.
Bindings do not change, so no run time support is required.
Names are bound to the same location on every invocation.
Values are retained across activations of a procedure.
Limitations:
- Size of all data objects must be known at compile time.
- Data structures cannot be created dynamically.
- Recursive procedures are not allowed.

Dynamic allocation

Storage allocation decisions are made when the program is running.
- Stack storage allocation
- Heap storage allocation
Compilers use a combination of stack and heap storage allocation for dynamic storage allocation.
Stack storage:
- Local variables are stored in a stack when a function is called and destroyed upon return.
- A stack is used when a variable is not used outside that function.
- Stack automatically cleans up the object.
- Not easily corrupted
- Variables cannot be resized.
Heap storage:
- The heap allows objects to obtain storage when created and return it when invalidated.
- Garbage collection detects useless data elements and reuses their storage.
- Automatic garbage collection is an essential feature of many modern languages.

Stack Vs Heap Memory

Stack is a linear data structure whereas Heap is a hierarchical data structure.
Stack memory will never become fragmented whereas Heap memory can become fragmented as blocks of memory are first allocated and then freed.
Stack accesses local variables only while Heap allows you to access variables globally.
Stack variables can’t be resized whereas Heap variables can be resized.
Stack memory is allocated in a contiguous block whereas Heap memory is allocated in any random order.
Stack doesn’t require to de-allocate variables whereas in Heap de-allocation is needed.
Stack allocation and deallocation are done by compiler instructions whereas Heap allocation and deallocation is done by the programmer.
Example illustrating Stack and Heap Storage Allocation:
- Code snippet involving methods m1 and m2 with variable declarations and TextField object creation.
- Checkpoints are indicated in the code to illustrate the state of the stack and heap.

Stack Allocation of Space: Activation Tree & Activation Record

Activation:
- Each execution of a procedure is referred to as an activation.
Activation tree:
- A tree structure representing all function calls made by a program on a particular execution.
- The activation tree shows how control enters and leaves activations.
- The root of the activation tree represents the activation of main.
Activation tree is a representation of the activations of procedures during the running of an entire program, represented by a tree.
Activation tree depends on the runtime behavior of a program and is not always determined at compile-time.
The static equivalent is the call graph.

Properties of activation trees

Each node represents an activation of a procedure.
The root shows the activation of the main function.
Node $a$ is the parent of $b$ if control flows from $a$ to $b$ .
Node $a$ is to the left of $b$ if the lifetime of $a$ occurs before $b$ .
Activations are shown in the order they are called, from left to right.
One child must finish before the activation to its right can begin.
The main function call serves as the root of the activation tree.
Example: A quicksort program reads 9 integers into an array a and sorts them using the recursive quicksort algorithm.
- The main function calls readArray, sets sentinels, and calls quicksort on the entire data array.
Figure 7.3 shows a sequence of calls resulting from the execution of the program.
- The call to partition(1,9) returns 4, with elements less than the separator in a[1] through a[3] and larger elements in a[5] through a[9].

Additional points on Activations

Each execution of a procedure is an activation of the procedure.
The lifetime of an activation is the sequence of steps present in the execution of the procedure.
If $a$ and $b$ are two procedures, their activations will be:
- Non-overlapping (one is called after the other) or
- Nested (nested procedures).
A procedure is recursive if a new activation begins before an earlier activation of the same procedure has ended.
Figure 7.4 shows an activation tree representing calls during an execution of quicksort.
The tree is only one possibility, as the arguments of subsequent calls and the number of calls along any branch are influenced by the values returned by partition.
The use of a run-time stack is enabled by relationships between the activation tree and program behavior:

Relationships

The sequence of procedure calls corresponds to a preorder traversal of the activation tree.
The sequence of returns corresponds to a postorder traversal of the activation tree.
Flow of control in a program corresponds to a depth-first traversal of the activation tree.
If control lies within a particular activation of some procedure, corresponding to a node $N$ of the activation tree, the activations that are currently open (live) are those that correspond to node $N$ and its ancestors.
If an activation of procedure $p$ calls procedure $q$ , then that activation of $q$ must end before the activation of $p$ can end.

Three common cases

The activation of $q$ terminates normally. Control resumes just after the point in $p$ where the call to $q$ was made.
The activation of $q$ , or some procedure $q$ called, aborts making it impossible for execution to continue. $p$ ends simultaneously with $q$ .
The activation of $q$ terminates due to an exception that $q$ cannot handle. Procedure $p$ may handle the exception, in which case the activation of $q$ has terminated while the activation of $p$ continues, although not necessarily from the point at which the call to $q$ was made. If $p$ cannot handle the exception, then this activation of $p$ terminates at the same time as the activation of $q$ , and presumably the exception will be handled by some other open activation of a procedure.

Exercises

Exercise 1: Write the Activation tree for the code:

int main() {
 Fib(3);
}
int Fib(int n) {
if (n<=1) return n;
return Fib(n-1) + Fib(n-2);
}

Exercise 2: Write Activation tree for the code:

main() {
printf(“Enter Your Name: “);
scanf(“%s”, username);
show_data(username);
printf(“Press any key to continue…”);
. . .
}
int show_data(char *user) {
printf(“Your name is %s”, user);
return 0;
}

Activation Records (or Stack frames)

An activation record is a block of storage that contains all the necessary information required by a single execution of a procedure.
An activation record is used to store the status of the machine (the value of PC and machine registers), actual parameters, return values, access and control links, and local variables when a procedure call occurs.
Function calls are implemented using a stack of activation records.
- Calling a function pushes a new activation record onto the stack.
- Returning from a function pops the current activation record from the stack.
Figure 7.5 shows a general activation record structure.
When control returns from the called procedure, the activation of the calling procedure can be restarted after restoring the values of relevant registers and setting the program counter to the point immediately after the call.
“Once a function returns, its activation record cannot be referenced again.”
Procedure calls and returns are usually managed by a run-time stack called the control stack.
Each live activation has an activation record on the control stack, with the root of the activation tree at the bottom.
The entire sequence of activation records on the stack corresponds to the path in the activation tree to the activation where control currently resides. The latter activation has its record at the top of the stack.
Example: For the activation tree of Fig. 7.4, if control is currently in the activation $q(2,3)$ , then the activation record for $q(2,3)$ is at the top of the control stack. Below that is the activation record $q(1,9)$ , and at the bottom is the activation record for $m$ , the main function and root of the activation tree.
The contents of activation records vary with the language being implemented.

Data in an Activation Record

A list of the kinds of data that might appear in an activation record are:

Temporary values, such as those arising from the evaluation of expressions, in cases where those temporaries cannot be held in registers.
Local data belonging to the procedure whose activation record this is.
A saved machine status, with information about the state of the machine just before the call to the procedure. This information typically includes
- return address (value of the program counter) and
- contents of registers that were used by the calling procedure and that must be restored when the return occurs.
An "access link" may be needed to locate data needed by the called procedure but found elsewhere, e.g., in another activation record.
A control link, pointing to the activation record of the caller.
Space for the return value of the called function, if any. (May be placed in a register for efficiency).
The actual parameters used by the calling procedure. (Commonly placed in registers when possible for greater efficiency).

Example 7.4: Figure 7.6 shows snapshots of the run-time stack as control flows through the activation tree of Fig. 7.4. Since array a is global, space is allocated for it before execution begins with an activation of procedure main, as shown in Fig. 7.6(a).
When control reaches the first call in the body of main, procedure r is activated, and its activation record is pushed onto the stack (Fig. 7.6(b)). The activation record for r contains space for local variable i. When control returns from this activation, its record is popped, leaving just the record for main on the stack.
Control then reaches the call to $q$ (quicksort) with actual parameters 1 and 9, and an activation record for this call is placed on the top of the stack, as in Fig. 7.6(c). The activation record for $q$ contains space for the parameters $m$ and $n$ and the local variable $i$ , following the general layout in Fig. 7.5.
Activations $p(1,3)$ and $q(1,0)$ have begun and ended during the lifetime of $q(1,3)$ , leaving the activation record for $q(1,3)$ on top (Fig. 7.6(d)).
Notice that when a procedure is recursive, it is normal to have several of its activation records on the stack at the same time.

Stack Allocation of Space: Calling Sequence and Return Sequence

Calling Sequence (is a code sequence)
- Procedure calls are implemented by what are known as Calling Sequences.
- Calling sequence is a code sequence that allocates an activation record on the stack and enters information into its fields.
Return Sequence (is a code sequence)
- A return sequence is similar code to restore the state of the machine so the calling procedure can continue its execution after the call.
Calling sequences and the layout of activation records may differ greatly, even among implementations of the same language.
The code in a calling sequence is often divided between the calling procedure (the "caller") and the procedure it calls (the "callee").
- Responsibility is shared between the caller and the callee
There is no exact division of run-time tasks between caller and callee; the source language, the target machine, and the operating system impose requirements that may favor one solution over another.

Design of calling sequences and activation records

Values communicated between caller and callee are generally placed at the beginning of the callee’s activation record, so they are as close as possible to the caller's activation record.
- The caller can compute the values of the actual parameters of the call and place them on top of its own activation record, without creating the entire activation record of the callee or knowing the layout of that record.
- The callee knows where to place the return value relative to its own activation record.
Fixed-length items are generally placed in the middle, such items typically include the control link, the access link, and the machine status fields.
- If the same components of the machine status are saved for each call, then the same code can do the saving and restoring for each.
Items whose size may not be known early enough are placed at the end of the activation record.
- Most local variables have a fixed length, which can be determined by the compiler by examining the type of the variable.
- Some local variables have a size that cannot be determined until the program executes (e.g., dynamically sized array).
- The amount of space needed for temporaries usually depends on how successful the code-generation phase is in keeping temporaries in registers.
Top-of-stack pointer (i.e., top_sp) points to the end of the fixed-length fields in the activation record.
- Fixed-length data can then be accessed by fixed offsets, known to the intermediate-code generator, relative to the top-of-stack pointer.
- Variable-length fields in the activation records are actually "above" the top-of-stack. Their offsets need to be calculated at run time, but they too can be accessed from the top-of stack pointer, by using a positive offset.
A register top-sp points to the end of the machine status field in the current top activation record.
The top-sp pointer position within the callee's activation record is known to the caller, so the caller can be responsible for setting top-sp before control is passed to the callee.

Calling sequence and its division between caller and callee

The caller evaluates the actual parameters and stores them at the beginning of the callee's activation record.
The caller stores a return address into the callee's activation record.
The caller stores the old value of top-sp into the callee's activation record (i.e., in control link field). The caller then increments top-sp to the position shown in Fig. 7.7.
The callee saves the register values and other status information.
The callee initializes its local data and begins execution.

Return sequence

The callee places the return value next to the parameters.
Using information in the machine-status field, the callee restores top-sp and other registers and branches to the return address.
Although top-sp has been decremented, the caller knows where the return value is, relative to the current value of top-sp and may use that value.

Stack Allocation of Space: Variable-Length Data on the Stack

The run-time memory-management system must deal frequently with the allocation of space for objects the sizes of which are not known at compile time, but which are local to a procedure and thus may be allocated on the stack.
Example: A local array whose size depends upon a parameter.
In modern languages, objects whose size cannot be determined at compile time are allocated space in the heap but would require garbage collection.
A common strategy for allocating variable-length arrays is shown in Fig. (variable-length arrays are arrays whose size depends on the value of one or more parameters of the called procedure)
A register top-sp points to the end of the machine status field in the current top activation record.
Procedure $p$ has three local arrays, whose sizes are not determined at compile time.
The storage for these arrays is not part of the activation record for $p$ , although it does appear on the stack.
Only a pointer to the beginning of each array appears in the activation record itself.
When $p$ is executing, these pointers are at known offsets from the top-of-stack pointer, so the target code can access array elements through these pointers.
Also shown is the activation record for a procedure $q$ , called by $p$ . The activation record for $q$ begins after the arrays of $p$ , and any variable-length arrays of $q$ are located beyond that.

Additional Points

Access to the data on the stack is through two pointers, top and top-sp.
- top-sp finds local, fixed-length fields of the top activation record.
- top marks the actual top of the stack; it points to the position at which the next activation record will begin.
top-sp points to the end of this field in the activation record for $q$ . From there, the control-link field for $q$ leads to the place in the activation record for $p$ , where top-sp pointed when $p$ was on top.
The code to reposition top and top-sp can be generated at compile time in terms of sizes known at run time.
When $q$ returns:
- top-sp can be restored from the saved control link in the activation record for $q$ .
- The new value of top is top-sp minus the length of the machine-status, control and access link, return-value, and parameter fields in $q$ 's activation record. This length is known at compile time to the caller, although it may depend on the caller if the number of parameters can vary across calls to $q$ .

Access to Nonlocal Data on the Stack

Local data: Data that belongs to procedure $p$ .
Nonlocal data: Data used within procedure $p$ , but does not belong to $p$ . E.g., a variable used within $p$ , but defined in procedure $q$ .
Access becomes more difficult when procedures can be declared inside other procedures (e.g., ML language).

ML

ML (“Meta Language”) is a general-purpose functional programming language.
ML is statically-scoped, i.e., once the variables are declared and initialized, they cannot change.
Variables are declared using val <name> = <expression>.
Functions are declared using fun <name> ( <arguments> ) = < body >.
Function bodies have the form let <list of definitions> in <statements> end.
Functions can refer to variables available when the function is defined.
ML supports higher-order functions.
- Takes functions as arguments
- Can construct functions
- Can return other functions
ML has no iteration; the effect of iteration is achieved through recursion.
ML takes lists and labeled tree structures as primitive data types.
ML does not require declaration of its data types and recognizes types at compile time.
Data that is local to the procedure is at the activation record at the top of the stack and can be accessed using the top-sp pointer of the stack.

Access to Nonlocal data in Nested Procedures

Access becomes difficult when a language allows nested procedures, i.e., a procedure can access variables whose declarations surround its own declarations.
If $p$ is within $q$ , the relative positions of their activation records at compile time cannot be determined (it’s a dynamic decision).
$p$ and/or $q$ may be recursive, leading to several activation records of $p$ and $q$ on the stack.
If $x$ is a nonlocal name used in nested procedure $p$ , finding its declaration is a static decision.
If $x$ is defined in $q$ , finding the relevant activation of $q$ from an activation of $p$ is a dynamic decision, requiring additional runtime information about activations.
Two common strategies: Access links and Displays

Access to Nonlocal Data on the Stack: Access Links and Displays

Access Links (Refers to non-local data in other activation record)
- The main purpose of the access link is to access the data that is not present in the local scope of the activation record. It is a static link.
- The chain of access links traces the static structure (or scopes) of the program.
- An access link is a pointer to each activation record that obtains a direct implementation of lexical scope for nested procedures.
If procedure $p$ is nested immediately within procedure $q$ in the source code, then the access link in any activation record of $p$ points to the most recent activation record of $q$ .
The nesting depth of $q$ must be exactly one less than the nesting depth of $p$ .
Access links form a chain from the activation record at the top of the stack to a sequence of activations at progressively lower nesting depths.
Along this chain are all the activations whose data and procedures are accessible to the currently executing procedure.
Suppose procedure $p$ activation record is at the top of the stack and has nesting depth $np$ , and $p$ needs to access $x$ , which is an element defined within some procedure $q$ that surrounds $p$ and has nesting depth $nq$ .
To find nonlocal $x$ , we start at the activation record for $p$ at the top of the stack and follow the access link $np - nq$ times, from activation record to activation record.
Finally, we wind up at an activation record for $q$ , this activation record contains the element $x$ that we want.
How are access links determined? What should happen when a procedure $q$ calls procedure $p$ , explicitly?

Two cases

$nq < np$ (Procedure $p$ is at a higher nesting depth than $q$ )
- Called procedure $p$ is nested within $q$ .
- Therefore, $p$ must be declared in $q$ , or the call by $q$ will not be within the scope of $p$ .
- The access link in $p$ should point to the access link of the activation record of the caller $q$ .
$np == nq$
- Procedures are at the same nesting level
- Access link of called procedure $p$ is the same as $q$

When a procedure $p$ is passed to another procedure $q$ as parameter, $q$ may not know the context in which $p$ appears, and it's impossible to set the access link for $p$ .
- Solution: When procedures are used as parameters, the caller needs to pass the proper access link as the parameter along with the name of the procedure parameter.
  Caller always knows the link.
One problem with the access-link approach to nonlocal data is that if the nesting depth gets large, we may have to follow long chains of links to reach the data we need.
A more efficient implementation uses an auxiliary array $d$ , called the display, which consists of one pointer for each nesting depth.
At all times, $d[i]$ is a pointer to the highest activation record on the stack for any procedure at nesting depth $i$ .
Previous value of $d[i]$ must be stored in the current A.R. with depth $i$ .
The advantage of using a display is that execution only needs to search in activations from $d[i]$ , where $i$ is the nesting depth of $q$ .
Follow the pointer $d[i]$ to the activation record for $q$ , wherein $x$ should be defined at a known offset.

	Access Links	Displays
Cost of lookup	Varies. Common case is cheap, but long chains can be costly.	Constant
Cost of maintenance	Variable	Constant

Manipulating Access Links (Additional Explanations)

How are access links determined? What should happen when a procedure $q$ calls procedure $p$ , explicitly.

Two cases

CASE 1:

$nq < np$ (Procedure $p$ is at a higher nesting depth than $q$ ).

Then $p$ must be defined immediately within $q$ , or the call by $q$ would not be at a position that is within the scope of the procedure name $p$ . Thus, the nesting depth of $p$ is exactly one greater than that of $q$ , and the access link from $p$ must lead to $q$ .
It is a simple matter for the calling sequence to include a step that places in the access link for $p$ a pointer to the activation record of $q$ .
Examples include the call of quicksort by sort to set up Fig. 7.11(a), and the call of partition by quicksort to create Fig. 7.11(c).

CASE 2:

$nq > np$ (The nesting depth np of p is less than the nesting depth nq of q)

In order for the call within $q$ to be in the scope of name $p$ , procedure $q$ must be nested within some procedure $r$ , while $p$ is a procedure defined immediately within $r$ .
The top activation record for $r$ can therefore be found by following the chain of access links, starting in the activation record for $q$ , for $nq - np + 1$ hops. Then, the access link for $p$ must go to this activation of $r$ .
Nesting depth of exchange is 2
Nesting depth of partition is 3