CST424:PROGRAMMING
PARADIGMS
MODULE 2
Data Types
• A data type defines a collection of data values and a set
of predefined operations
• A descriptor is the collection of the attributes of a
variable.
• In an implementation, a descriptor is an area of memory
that stores the attributes of a variable.
• If the attributes are all static, descriptors are required
only at compile time.
• These descriptors are built by the compiler, usually as a
part of the symbol table, and are used during compilation.
Data Types
• For dynamic attributes, however, part or all of the
descriptor must be maintained during execution.
• In this case, the descriptor is used by the run-time system.
• In all cases, descriptors are used for type checking and
building the code for the allocation and deallocation
operations
One design issue for all data types:
• What operations are defined and how are they specified?
Primitive Data Types
• Almost all programming languages provide a set of
primitive data types
– Primitive data types: Those not defined in terms of
other data types
– Integer, floating-point, Boolean, character
• Some primitive data types are merely reflections of the
hardware
• Others require only a little non-hardware support for
their implementation
Numeric - Integer
• The most common primitive numeric data type is integer.
• Many computers now support several sizes of integers.
• For example,Java includes four signed integer sizes: byte, short, int,
and long.
• A signed integer value is represented in a computer by a string of
bits, with one of the bits (typically the leftmost) representing the
sign.
• Most integer types are supported directly by the hardware.
• Most computers now use a notation called two’s complement to
store negative integers, which is convenient for addition and
subtraction
• Ones-complement notation has the disadvantage that it has two
representations of zero.
Numeric - Floating Point
• Floating-point data types model real numbers, but the
representations are only approximations for many real values.
• For example, neither of the fundamental numbers pi or e (the base
for the natural logarithms) can be correctlyrepresented in floating-
point notation.
• Another problem with floating-point types is the loss of accuracy
through arithmetic operations
• Floating-point values are represented as fractions and exponents
(the IEEE Floating-Point Standard 754 format)
• Most languages include two floating-point types, often called float
and double.
Numeric - Floating Point
• The float type is the standard size, usually being stored in
four bytes of memory.
• The double type is provided for situations where larger
fractional parts and/or a larger range of exponents is
needed.
• Double-precision variables usually occupy twice as much
storage as float variables and provide at least twice the
number of bits of fraction
Numeric - Floating Point
• The collection of values that can be represented by a
floating-point type is defined in terms of precision and
range.
• Precision is the accuracy of the fractional part of a value,
measured as the number of bits.
• Range is a combination of the range of fractions and,
more important, the range of exponents.
Complex
• Some programming languages support a complex
data type—for example,Fortran and Python.
• Complex values are represented as ordered pairs of
floating-point values.
• In Python, the imaginary part of a complex literal is
specified by following it with a j or J—for example,(7
+ 3j)
• Languages that support a complex type include
operations for arithmeticon complex values.
Numeric - Decimal
• Decimal data types store a fixed number of decimal digits,
with the decimal point at a fixed position in the value.
• Decimal types have the advantage of being able to
precisely store decimal values, at least those within a
restricted range.
• The disadvantages of decimal types are that
– the range of values is restricted because no exponents
are allowed,
– and their representation in memory is mildly wasteful
Boolean
• Simplest of all
• Range of values: two elements, one for “true”and one for
“false”
• Could be implemented as bits, but often as bytes
• Advantage: readability
Character
• Character data are stored in computers as numeric codings.
• The most commonly used coding was the 8-bit code ASCII
(American Standard Code for Information Interchange), which uses
the values 0 to 127 to code 128different characters.
• ISO 8859-1 is another 8-bit character code, but it allows 256
different characters.
• Ada 95+ uses ISO 8859-1.
• Since the ASCII character set became inadequate, in 1991, the
Unicode Consortium published the UCS-2 standard, a 16-bit
character set.
• Unicode includes the characters from most of the
world’s natural languages.
• For example, Unicode includes the Cyrillic alphabet,
used in Serbia, and the Thai digits.
• The first 128 characters of Unicode are identical to
those of ASCII.
• Java was the first widely used language to use the
Unicode character set.
• JavaScript, Python, Perl,C#, and F# uses Unicode
Character String Types
• Values are sequences of characters
• Design issues:
• –Is it a primitive type or just a special kind of array?
• –Should the length of strings be static or dynamic?
• Typical operations:
–Assignment and copying
–Comparison (=, >, etc.)
–Catenation
–Substring reference
–Pattern matching
Character String Types
• C and C++
– Not primitive; use char arrays, terminate with a null
character (0)
– A library of functions to provide operations instead of
primitive operators
– Problems with C string library:
• Functions in library do not guard against overflowing the
destination, e.g., strcpy(src, dest);
What if src is 50 bytes and dest is 20 bytes?
String as an Array of Characters& String
as Aa primitive type:
Mutability: In arrays of characters, you can often change the
characters in place, whereas strings as a primitive type are
usually immutable.
Ease of Use: Using strings as a primitive type provides higher-
level abstractions and convenience (like built-in methods for
concatenation, substring, etc.), while using an array of
characters requires manual management of the array size and
operations.
Memory Management: A string as an array might require more
explicit memory management (like resizing or freeing memory
in languages like C), while strings as primitive types often handle
memory management internally.
Character String Length Options
• Static: COBOL, Java’s String class
– Length is set when string is created
• Limited dynamic length: C and C++
– Length varying, up to a fixed maximum
– In C-based language, a special character is used to
indicate the end of a string’s characters, rather
than maintaining the length
• Dynamic (no maximum): SNOBOL4, Perl,
JavaScript
• Ada supports all three string length options
Character String Types
• Static length: compile-time descriptor
• Limited dynamic length: may need a run-time descriptor
for length
• Dynamic length: need run-time descriptor; allocation/de-
allocation is the biggest implementation problem
• The limited dynamic strings of C and C++ do
not require run-time descriptors,because the
end of a string is marked with the null
character.
• They do not need the maximum length,
because index values in array references are
not range-checked in these languages.
• Static length and limited dynamic length
strings require no special dynamic storage
allocation.
• There are three approaches to supporting the
dynamic allocation and deallocation that is
required for dynamic length strings.
o strings can be stored in a linked list
o store strings as arrays of pointers to individual
characters allocated in the heap.
o store complete strings in adjacent storage
cells.
User Defined ordinal Types
• An ordinal type is one in which the range of possible
values can be easily associated with the set of positive
integers
• Two common user-defined ordinal types
– Enumeration: All possible values, which are named
constants, are provided or enumerated in the
definition, e.g., C# example
enum days {mon, tue, wed, thu, fri, sat, sun};
– Subrange: An ordered contiguous subsequence of an
ordinal type, e.g., 12..18 is a subrange of integer
type
Enumeration Types
• All possible values, which are named constants, are
provided in the definition
• Example
enum days {mon, tue, wed, thu, fri, sat, sun};
• The enumeration constants are typically implicitly
assigned the integer values, 0, 1, . . . but can be explicitly
assigned any integer literal in the type’s definition.
• Design issues
–Is an enumeration constant allowed to appear in more than
one type definition, and if so, how is the type of an
occurrence of that constant checked?
–Are enumeration values coerced to integer?
–Any other type coerced to an enumeration type?
enum colors {red, blue, green, yellow, black};
colors myColor = blue, yourColor = red;
• The enumeration values are coerced to int
when they are put in integer context.
myColor++ would assign green to myColor.
• However, no other type value is coerced to an
enumeration type in C++.
myColor = 4;is illegal in C++.
• In the area of reliability, the enumeration types of Ada, C#, F#,
and Java5.0 provide two advantages:
(1) No arithmetic operations are legal on enumeration types;
this prevents adding days of the week, for example, and
(2) second, no enumeration variable can be assigned a value
outside its defined range.
If the colors enumeration type has 10 enumeration constants
and uses 0..9 as its internal values, no number greater than 9
can be assigned to a colors type variable.
Enumeration Types - Evaluation
• Aid to readability,
– e.g., no need to code a color as a number
• Aid to reliability,
– e.g., compiler can check:
• No enumeration variable can be assigned a value
outside its defined range
• operations (don’t allow colors to be added)
Subrange Types
• An ordered contiguous subsequence of an ordinal type
–Example: 12..18 is a subrange of integer type
• Subrange types were introduced by Pascal and are
included in Ada.
• There are no design issues that are specific to subrange
types.
• In Ada, subranges are included in the category of types
called subtypes.
• Subtypes are not new types; rather, they are new names
for possibly restricted, or constrained, versions of existing
types.
• Ada’s code design
type Days is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
subtype Weekdays is Days range Mon..Fri;
subtype Index is Integer range 1..100;
• The restriction on the existing types is in the range
of possible values.
• All of the operations defined for the parent type are
also defined for the subtype, except assignment of
values outside the specified range.
For example, in
Day1 : Days;
Day2 : Weekdays;
...
Day2 := Day1;
the assignment is legal unless the value of Day1
is Sat or Sun.
• The compiler must generate range-checking
code for every assignment to a subrange
variable.
• While types are checked for compatibility at
compile time,subranges require run-time
range checking.
• One of the most common uses of user-
defined ordinal types is for the indices of
arrays
SubrangeTypes
• Aid to readability
–Make it clear to the readers that variables of subrange
can store only certain range of values
• Reliability
–Assigning a value to a subrange variable that is outside
the specified range is detected as an error
Implementation of User-Defined
Ordinal Types
• Enumeration types are implemented as integers
• Subrange types are implemented like the parent types
with code inserted (by the compiler) to restrict
assignments to subrange variables
Array Types
• An array is an aggregate of homogeneous data elements
in which an individual element is identified by its position
in the aggregate, relative to the first element.
• If any of the subscript expressions in a reference include
variables, then the reference will require an additional
run-time calculation to determine the address of the
memory location being referenced.
Array Types – Design Issues
• The primary design issues specific to arrays are
the following:
• What types are legal for subscripts?
• Are subscripting expressions in element references
range checked?
• When are subscript ranges bound?
• When does array allocation take place?
• Can arrays be initialized when they have their
storage allocated?
• What kinds of slices are allowed, if any?
Arrays and Indices
• Specific elements of an array are referenced by means of
a two-level syntactic mechanism,
• where the first part is the aggregate name,
• and the second part is a possibly dynamic selector
consisting of one or more items known as subscripts or
indices.
• If all of the subscripts in a reference are constants, the
selector is static; otherwise, it is dynamic.
• The selection operation - mapping from the array name
and the set of subscript values to an element in the
aggregate. - arrays are sometimes called finite mappings.
• array_name (subscript_value_list) → element
Arrays and Indices
• The syntax of array references
• The array name is followed by the list of subscripts, which
is surrounded by either parentheses or brackets.
• A problem with using parentheses to enclose subscript
expressions is that they often are also used to enclose
the parameters in subprogram calls;
• this use makes references to arrays appear exactly like
those calls. Ada assignment statement: Sum := Sum + B(I);
• both program readers and compilers are forced to use
other information to determine whether B(I) is a function
call or a reference to an array element.
• This results in reduced readability
Two distinct types are involved in an array
type:
the element type and the type of the
subscripts.
The type of the subscripts is often a subrange
of integers, but Ada allows any ordinal type to
be used as subscripts, such a Boolean,
character, and enumeration.
• Range errors in subscripts are common in
programs, so requiring range checking is an
important factor in the reliability of languages.
• Many contemporary languages do not specify
range checking of subscripts, but Java, ML,
and C# do.
• By default, Ada checks the range of all
subscripts, but this feature can be disabled by
the programmer
Subscript Bindings & Array Categories
• The binding of the subscript type to an array variable is usually
static, but the subscript value ranges are sometimes dynamically
bound.
• In some languages, the lower bound of the subscript range is
implicit.
• In some other languages, the lower bounds of the subscript ranges
must be specified by the programmer.
• Based on the binding to subscript ranges, the binding to storage
and from where the storage is allocated there are five categories of
arrays
• Static array
• Fixed stack-dynamic array
• Stack dynamic array
• fixed heap-dynamic array
• Heap dynamic array
Subscript Bindings & Array Categories
• A static array is one in which the subscript ranges are
statically bound and storage allocation is static (done
before run time).
• The advantage of static arrays is efficiency: No dynamic
allocation or deallocation is required.
• The disadvantage is that the storage for the array is fixed
for the entire execution time of the program.
• Arrays declared in C and C++ functions that include the
static modifier are static.
Subscript Bindings & Array Categories
• A fixed stack-dynamic array is one in which the subscript
ranges are statically bound, but the allocation is done at
declaration elaboration time during execution.
• The advantage of fixed stack-dynamic arrays over static arrays
is space efficiency.
• A large array in one subprogram can use the same space as a
large array in a different subprogram, as long as both
subprograms are not active at the same time. The same is
true if the two arrays are in different blocks that are not
active at the same time.
• The disadvantage is the required allocation and deallocation
time.
• Arrays that are declared in C and C++ functions (without the
static specifier) are examples of fixed stack-dynamic arrays
Subscript Bindings & Array Categories
• A stack-dynamic array is one in which both the subscript
ranges and the storage allocation are dynamically bound at
elaboration time.
• Once the subscript ranges are bound and the storage is
allocated, however, they remain fixed during the lifetime of
the variable.
• The advantage of stack-dynamic arrays over static and fixed
stack-dynamic arrays is flexibility.
• The size of an array need not be known until the array is
about to be used
declare
List : (array (1..LIST_GEN) of INTEGER;
begin
End //user input the value of LIST_GEN, dynamically allocated in declare block
• A fixed heap-dynamic array is similar to a fixed stack-
dynamic array, in that the subscript ranges and the
storage binding are both fixed after storage is
allocated.
• The differences are that both the subscript ranges
and storage bindings are done when the user
program requests them during execution, and the
storage is allocated from the heap, rather than the
stack.
• The advantage of fixed heap-dynamic arrays is
flexibility—the array’s size always fits the
problem.
• The disadvantage is allocation time from the
heap, which is longer than allocation time
from the stack.
• C++ uses the operators new and delete to
manage heap storage.
Subscript Bindings & Array Categories
• A heap-dynamic array is one in which the binding of
subscript ranges and storage allocation is dynamic and
can change any number of times during the array’s
lifetime.
• The advantage of heap-dynamic arrays over the others is
flexibility:
• Arrays can grow and shrink during program execution as
the need for space changes.
• The disadvantage is that allocation and deallocation take
longer and may happen many times during execution of
the program.
Implementation of Array Types
• Implementing arrays requires considerably more compile-
time effort than does implementing primitive types.
• The code to allow accessing of array elements must be
generated at compile time.
• At run time, this code must be executed to produce
element addresses.
• A single-dimensioned array is implemented as a list of
adjacent memory cells.
• Suppose the array list is defined to have a subscript
range lower bound of 0. The access function for list
address(list[k]) = address(list[0]) + k * element_size
• where the first operand of the addition is the constant
part, and the second is the variable part.
Implementation of Array Types
• If the element type is statically bound and the array is
statically bound to storage, then the value of the
constant part can be computed before run time.
Implementation of Array Types
• The generalization of this access function for an arbitrary
lower bound is
• address(list[k]) = address(list[lower_bound]) + ((k -
lower_bound) * element_size)
• Compile – Time Descriptor for a single dimensional array
Implementation of Array Types
• True multidimensional arrays (those that are not arrays of
arrays) are more complex to implement than single-
dimensioned arrays
• Hardware memory is linear—it is usually a simple
sequence of bytes.
• So values of data types that have two or more
dimensions must be mapped onto the single-
dimensioned memory.
• There are two ways in which multidimensional arrays can
be mapped to one dimension: row major order and
column major order
Implementation of Array Types
• In row major order, the elements of the array that have
as their first subscript the lower bound value of that
subscript are stored first, followed by the elements of the
second value of the first subscript, and so forth.
• If the array is a matrix, it is stored by rows. For example, if
the matrix had the values
347
625
138
• it would be stored in row major order as 3, 4, 7, 6, 2, 5, 1,
3, 8
Implementation of Array Types
• In column major order, the elements of an array that
have as their last subscript the lower bound value of that
subscript are stored first, followed by the elements of the
second value of the last subscript, and so forth.
• If the array is a matrix, it is stored by columns.
• If the example matrix were stored in column major order,
it would have the following order in memory:
3, 6, 1, 4, 2, 3, 7, 5, 8
Implementation of Array Types
• The access function for two-dimensional arrays stored in
row major order can be developed as follows.
• In general, the address of an element is the base address
of the structure plus the element size times the number
of elements that precede it in the structure.
• Base address + (element size * no. of preceding elements)
• For a matrix in row major order, the number of elements
that precedes an element is the number of rows above
the element times the size of a row, plus the number of
elements to the left of the element.
Implementation of Array Types
• the access function can be written as
• location(a[i,j]) = address of a[0, 0] + ((((number of rows
above the ith row) * (size of a row)) + (number of
elements left of the jth column)) * element size)
• Because the number of rows above the ith row is i and
the number of elements to the left of the jth column is j,
we have
• location(a[i, j]) = address of a[0, 0] + (((i * n) + j) *
element_size)
• where n is the number of elements per row.
• The first term is the constant part and the last is the
variable part.
Implementation of Array Types
• The generalization to arbitrary lower bounds results in
the following access function:
• location(a[i, j]) = address of a[row_lb, col_lb] + (((i -
row_lb) * n) + (j - col_lb)) * element_size
• This can be rearranged to the form
• location(a[i, j]) = address of a[row_lb, col_lb] - (((row_lb *
n) + col_lb) * element_size) + (((i * n) + j) * element_size)
Record Types
• A record is an aggregate of data elements in which
the individual elements are identified by names and
accessed through offsets from the beginning of the
structure.
• The elements of a record are of potentially different
sizes and reside in adjacent memory locations.
Design issues are specific to records:
• What is the syntactic form of references to
fields?
• Are elliptical references allowed?
eg:COBOL form of a record declaration
Ada Record Type
• The fundamental difference between a record and
an array is that record elements, or fields, are not
referenced by indices.
• The fields are named with identifiers, and references
to the fields are made using these identifiers.
References to Record Fields
COBOL field references have the form
field_name OF record_name_1 OF . . . OF
record_name_n
Eg:
MIDDLE OF EMPLOYEE-NAME OF EMPLOYEE-
RECORD
reference to the field Middle in the earlier Ada
record example:
• Employee_Record.Employee_Name.Middle
• A fully qualified reference to a record field is one in
which all intermediate record names, from the
largest enclosing record to the specific field, are
named in the reference.
• But In an elliptical reference, the field is named, but
any or all of the enclosing record names can be
omitted.
• For example, FIRST, FIRST OF EMPLOYEE-NAME, and FIRST
OFMPLOYEE-RECORD are elliptical references to the
employee’s first name.
• Although elliptical references are a programmer convenience,
they require a compiler to have elaborate data structures
and procedures in order to correctly identify the referenced
field.
• They are detrimental to readability.
• Records are used when the collection of data values
is heterogeneous and the different fields are not
processed in the same way.
• Also, the fields of a record often need not be
processed in a particular order.
• Field names are like literal, orconstant, subscripts.
• Because they are static, they provide very efficient
access to the fields.
• The fields of records are stored in adjacent
memory locations.
• the offset address, relative to the beginning
of the record, is associated with each field.
Field accesses are all handled using these
offsets.
• The compile-time descriptor for a record has
the general form
Pointer Types
• A pointer type is one in which the variables have a range
of values that consists of memory addresses and a special
value, nil.
• The value nil is not a valid address and is used to indicate
that a pointer cannot currently be used to reference a
memory cell
• Uses of Pointers - provide some of the power of indirect
addressing, which is frequently used in assembly
language programming.
• Pointers provide a way to manage dynamic storage.
• A pointer can be used to access a location in an area
where storage is dynamically allocated called a heap.
Pointer and Reference Types
• Variables that are dynamically allocated from the heap
are called heap dynamic variables.
• They often do not have identifiers associated with them
and thus can be referenced only by pointer or reference
type variables.
• Variables without names are called anonymous variables.
Pointer and Reference Types
• Both kinds of uses of pointers add writability to a
language.
• For example, suppose it is necessary to implement a
dynamic structure like a binary tree in a language which
does not have pointers.
• This would require the programmer to provide and
maintain a pool of available tree nodes, which would
probably be implemented in arrays.
• Also, because of the lack of dynamic storage it would be
necessary for the programmer to guess the maximum
number of required nodes.
• This is clearly an awkward and error-prone way to deal
with binary trees.
Pointer Types – Design Issues
• The primary design issues particular to pointers are the
following:
• What are the scope and lifetime of a pointer
variable?
• What is the lifetime of a heap-dynamic variable
(the value a pointer references)?
• Are pointers restricted as to the type of value to
which they can point?
• Are pointers used for dynamic storage
management, indirect addressing, or both?
• Should the language support pointer types,
reference types, or both?
Pointer Operations
• Languages that provide a pointer type usually include two
fundamental pointer operations: assignment and
dereferencing.
• The first operation sets a pointer variable’s value to some
useful address.
• If pointer variables are used only to manage dynamic storage,
then the allocation mechanism, whether by operator or built-
in subprogram, serves to initialize the pointer variable.
• If pointers are used for indirect addressing to variables that
are not heap dynamic, then there must be an explicit
operator or built-in subprogram for fetching the address of a
variable, which can then be assigned to the pointer variable
Pointer Operations
• An occurrence of a pointer variable in an expression can be
interpreted in two distinct ways.
• First, it could be interpreted as a reference to the contents of
the memory cell to which it is bound, which in the case of a
pointer is an address
• the pointer could also be interpreted as a reference to the
value in the memory cell pointed to by the memory cell to
which the pointer variable is bound.
• In this case, the pointer is interpreted as an indirect reference.
The former case is a normal pointer reference; the latter is
the result of dereferencing the pointer.
• Dereferencing, which takes a reference through one level of
indirection, is the second fundamental pointer operation.
Pointer Operations
j=*ptr
Pointer Problems - 1
• A dangling pointer, or dangling reference, is a pointer
that contains the address of a heap-dynamic variable that
has been deallocated.
• The following sequence of operations creates a dangling
pointer in many languages:
1. A new heap-dynamic variable is created and pointer p1 is
set to point at it.
2. Pointer p2 is assigned p1’s value.
3. The heap-dynamic variable pointed to by p1 is explicitly
deallocated (possibly setting p1 to nil), but p2 is not
changed by the operation.
• P2 is now a dangling pointer
int * arrayPtr1;
int * arrayPtr2 = new int[100];
arrayPtr1 = arrayPtr2;
delete [] arrayPtr2;
// Now, arrayPtr1 is dangling, because the heap
storage to which it was pointing has been
deallocated.
Pointer Problems - 1
• Dangling pointers are dangerous for several reasons.
• First, the location being pointed to may have been
reallocated to some new heap-dynamic variable.
• If the new variable is not the same type as the old one,
type checks of uses of the dangling pointer are invalid.
• Even if the new dynamic variable is the same type, its
new value will have no relationship to the old pointer’s
dereferenced value.
• Furthermore, if the dangling pointer is used to change
the heap-dynamic variable, the value of the new heap-
dynamic variable will be destroyed.
Pointer Problems - 2
• Lost Heap Dynamic Variable is an allocated heap-dynamic
variable that is no longer accessible to the user program.
• Such variables are often called garbage, because they are not
useful for their original purpose, and they also cannot be
reallocated for some new use in the program.
• Lost heap-dynamic variables are most often created by the
following sequence of operations:
1. Pointer p1 is set to point to a newly created heap-dynamic
variable.
2. p1 is later set to point to another newly created heap-
dynamic variable.
• The first heap-dynamic variable is now inaccessible, or lost.
This is sometimes called memory leakage.
Reference Types
• A reference type variable is similar to a pointer, with one
important and fundamental difference:
• A pointer refers to an address in memory, while a
reference refers to an object or a value in memory.
• C++ includes a special kind of reference type that is used
primarily for the formal parameters in function
definitions.
• A C++ reference type variable is a constant pointer that is
always implicitly dereferenced.
Reference Types
• When used as formal parameters in function definitions, C++
reference types provide for two-way communication between the
caller function and the called function
• This is not possible with nonpointer primitive parameter types,
because C++ parameters are passed by value.
• Passing a pointer as a parameter accomplishes the same two-way
communication, but pointer formal parameters require explicit
dereferencing, making the code less readable and less safe.
• Reference parameters are referenced in the called function exactly
as are other parameters.
• The calling function need not specify that a parameter whose
corresponding formal parameter is a reference type is anything
unusual.
• The compiler passes addresses, rather than values, to reference
parameters.
Implementation of Pointer Type
• There have been several proposed solutions to the dangling-
pointer problem.
• Among these are tombstones in which every heap-dynamic
variable includes a special cell, called a tombstone, that is
itself a pointer to the heap-dynamic variable.
• The actual pointer variable points only at tombstones and
never to heap-dynamic variables.
• When a heap-dynamic variable is deallocated, the tombstone
remains but is set to nil, indicating that the heap-dynamic
variable no longer exists.
• This approach prevents a pointer from ever pointing to a
deallocated variable.
• Any reference to any pointer that points to a nil tombstone
can be detected as an error.
Implementation of Pointer Type
• Tombstones are costly in both time and space.
• Because tombstones are never deallocated, their storage
is never reclaimed.
• Every access to a heap dynamic variable through a
tombstone requires one more level of indirection, which
requires an additional machine cycle on most computers.
Implementation of Pointer Type
• locks-and-keys approach - In this compiler, pointer values
are represented as ordered pairs (key, address), where
the key is an integer value.
• Heap-dynamic variables are represented as the storage
for the variable plus a header cell that stores an integer
lock value.
• When a heap-dynamic variable is allocated, a lock value
is created and placed both in the lock cell of the heap-
dynamic variable and in the key cell of the pointer that is
specified in the call to new.
• Every access to the dereferenced pointer compares the
key value of the pointer to the lock value in the heap-
dynamic variable
Implementation of Pointer Type
• If they match, the access is legal; otherwise the access is
treated as a run-time error.
• Any copies of the pointer value to other pointers must
copy the key value.
• Therefore, any number of pointers can reference a given
heap dynamic variable.
• When a heap-dynamic variable is deallocated with
dispose, its lock value is cleared to an illegal lock value.
• Then, if a pointer other than the one specified in the
dispose is dereferenced, its address value will still be
intact, but its key value will no longer match the lock, so
the access will not be allowed.
Heap Management
• Heap management can be a very complex run-time
process.
• We examine the process in two separate situations:
– one in which all heap storage is allocated and
deallocated in units of a single size,
– one in which variable-size segments are allocated and
deallocated.
Heap Management
• Single-Size Cells
• The simplest situation is when all allocation and
deallocation is of single-size cells.
• It is further simplified when every cell already contains a
pointer.
• In a single-size allocation heap, all available cells are
linked together using the pointers in the cells, forming a
list of available space.
• Allocation is a simple matter of taking the required
number of cells from this list when they are needed.
Heap Management
• Single-Size Cells
• Deallocation is a much more complex process.
• A heap-dynamic variable can be pointed to by more than
one pointer, making it difficult to determine when the
variable is no longer useful to the program.
• Simply because one pointer is disconnected from a cell
obviously does not make it garbage;
• there could be several other pointers still pointing to the
cell.
Heap Management
• Single-Size Cells
• There are several different approaches to garbage
collection.
• The two most common traditional techniques are in
some ways opposite processes.
• These are named reference counters, in which
reclamation is incremental and is done when inaccessible
cells are created,
• mark-sweep, in which reclamation occurs only when the
list of available space becomes empty.
• These two methods are sometimes called the eager
approach and the lazy approach, respectively.
Heap Management
• Single-Size Cells
• The reference counter method of storage reclamation
accomplishes its goal by maintaining in every cell a
counter that stores the number of pointers that are
currently pointing at the cell.
• Embedded in the decrement operation for the reference
counters, which occurs when a pointer is disconnected
from the cell, is a check for a zero value.
• If the reference counter reaches zero, it means that no
program pointers are pointing at the cell, and it has thus
become garbage and can be returned to the list of
available space.
Heap Management
• Single-Size Cells
• There are three distinct problems with the reference
counter method.
• First, if storage cells are relatively small, the space
required for the counters is significant.
• Second, some execution time is obviously required to
maintain the counter values.
• Every time a pointer value is changed, the cell to which it
was pointing must have its counter decremented, and the
cell to which it is now pointing must have its counter
incremented.
• Of course, if pointer changes are not too frequent, this is
not a problem.
Heap Management
• Single-Size Cells
• The third problem is that complications arise when a
collection of cells is connected circularly.
• The problem here is that each cell in the circular list has a
reference counter value of at least 1, which prevents it
from being collected and placed back on the list of
available space
Heap Management
• Single-Size Cells
• The original mark-sweep process of garbage collection
operates as follows:
• The run-time system allocates storage cells as requested
and disconnects pointers from cells as necessary, without
regard for storage reclamation (allowing garbage to
accumulate), until it has allocated all available cells.
• At this point, a mark-sweep process is begun to gather all
the garbage left floating around in the heap.
• To facilitate the process, every heap cell has an extra
indicator bit or field that is used by the collection
algorithm.
Heap Management
• Single-Size Cells
• The mark-sweep process consists of three distinct phases.
• First, all cells in the heap have their indicators set to
indicate they are garbage. This is, of course, a correct
assumption for only some of the cells.
• The second part, called the marking phase, is the most
difficult. Every pointer in the program is traced into the
heap, and all reachable cells are marked as not being
garbage.
• After this, the third phase, called the sweep phase, is
executed: All cells in the heap that have not been
specifically marked as still being used are returned to the
Heap Management
• Single-Size Cells
• To illustrate the flavor of algorithms used to mark the
cells that are currently in use, we provide the following
simple version of a marking algorithm.
• We assume that all heap-dynamic variables, or heap cells,
consist of an information part; a part for the mark,
named marker; and two pointers named llink and rlink.
• These cells are used to build directed graphs with at most
two edges leading from any node.
• The marking algorithm traverses all spanning trees of the
graphs, marking all cells that are found.
• Like other graph traversals, the marking algorithm uses
recursion.
Heap Management
• Single-Size Cells
for every pointer r do
mark(r)
void mark(void * ptr) {
if (ptr != 0)
if (*[Link] is not marked) {
set *[Link]
mark(*[Link])
mark(*[Link])
}}
Heap Management
Heap Management
• Single-Size Cells
• The most serious problem with the original version of
mark-sweep was that it was done too infrequently—only
when a program had used all or nearly all of the heap
storage.
• Mark-sweep in that situation takes a good deal of time,
because most of the cells must be traced and marked as
being currently used.
• This causes a significant delay in the progress of the
application.
• Furthermore, the process may yield only a small number
of cells that can be placed on the list of available space.
Heap Management
• Single-Size Cells
• This problem has been addressed in a variety of
improvements.
• For example, incremental mark-sweep garbage
collection occurs more frequently, long before memory
is exhausted, making the process more effective in terms
of the amount of storage that is reclaimed.
• Also, the time required for each run of the process is
obviously shorter, thus reducing the delay in application
execution.
• Another alternative is to perform the mark-sweep
process on parts, rather than all of the memory
associated with the application, at different times
Heap Management
• Variable-Size Cells
• Variable-Size Cells Managing a heap from which variable-
size cells are allocated has all the difficulties of managing
one for single-size cells, but also has additional problems.
• The additional problems posed by variable-size cell
management depend on the method used.
• If mark-sweep is used, the following additional problems
occur
Heap Management
• Variable-Size Cells
• The initial setting of the indicators of all cells in the heap
to indicate that they are garbage is difficult.
• Because the cells are different sizes, scanning them is a
problem.
• One solution is to require each cell to have the cell size as
its first field.
• Then the scanning can be done, although it takes slightly
more space and somewhat more time than its ounterpart
for fixed-size cells.
Heap Management
• Variable-Size Cells
• Maintaining the list of available space is another source
of overhead.
• The list can begin with a single cell consisting of all
available space.
• Requests for segments simply reduce the size of this
block.
• Reclaimed cells are added to the list.
• The problem is that before long, the list becomes a long
list of various-size segments, or blocks.
• This slows allocation because requests cause the list to
be searched for sufficiently large blocks.
Heap Management
• Variable-Size Cells
• Eventually, the list may consist of a large number of very
small blocks, which are not large enough for most
requests.
• At this point, adjacent blocks may need to be collapsed
into larger blocks.
• Alternatives to using the first sufficiently large block on
the list can shorten the search but require the list to be
ordered by block size.
• In either case, maintaining the list is additional overhead.
• Type checking is the activity of ensuring that the
operands of an operator are of compatible types.
• the concept of operands and operators is
generalized to include subprograms and assignment
statements.
• Subprograms will be thought of as operators whose
operands are their parameters.
• The assignment symbol will be thought of as a
binary operator, with its target variable and its
expression being the operands.
Type Checking
• Type checking is the activity of ensuring that the
operands of an operator are of compatible types.
• A compatible type is one that either is legal for the
operator or is allowed under language rules to be
implicitly converted by compiler-generated code (or the
interpreter) to a legal type.
• This automatic conversion is called a coercion.
• if an int variable and a float variable are added in Java,
the value of the int variable is coerced to float and a
floating-point add is done.
• A type error is the application of an operator to an
operand of an inappropriate type.
• int is passed for float
Type Checking
• If all bindings of variables to types are static in a language,
then type checking can nearly always be done statically.
• Dynamic type binding requires type checking at run time,
which is called dynamic type checking.
Type Checking
• A programming language is strongly typed if type errors
are always detected.
• This requires that the types of all operands can be
determined, either at compile time or at run time.
• The importance of strong typing lies in its ability to detect
all misuses of variables that result in type errors.
• A strongly typed language also allows the detection, at
run time, of uses of the incorrect type values in variables
that can store values of more than one type.
Type Equivalence
• The compatibility rules dictate the types of operands that
are acceptable for each of the operators and thereby
specify the possible type errors of the language.
• The rules are called compatibility because in some cases
the type of an operand can be implicitly converted by the
compiler or run-time system to make it acceptable to the
operator.
• The type compatibility rules are simple and rigid for the
predefined scalar types.
• However, in the cases of structured types, such as arrays
and records and some user-defined types, the rules are
more complex.
• Type equivalence is a strict form of type
compatibility— compatibility without coercion
• two types are equivalent if an operand of one type
in an expression is substituted for one of the other
type, without coercion.
• There are two approaches to defining type
equivalence:
❖ name type equivalence
❖ structure type equivalence.
Type Compatibility
• Name type compatibility means the two variables have
compatible types if they are in either the same
declaration or in declarations that use the same type
name
• Easy to implement but highly restrictive:
–Subranges of integer types are not compatible with
integer types
–Formal parameters must be the same type as their
corresponding actual parameters (Pascal)
For example, supposing Ada used strict name
type equivalence, consider the following Ada
code:
type Indextype is 1..100;
count : Integer;
index : Indextype;
The types of the variables count and index
would not be equivalent;
Count could not be assigned to index or vice
versa.
Type Compatibility
• Structure type compatibility means that two variables
have compatible types if their types have identical
structures
• More flexible, but harder to implement
–Are two record types compatible if they are structurally
the same but use different field names?
–Are two array types compatible if they are the same
except that the subscripts are different? (e.g. [1..10] and
[0..9])
–Are two enumeration types compatible if their
components are spelled differently?