01 Data Representation
Candidates should be able to:
Show understanding of why user-defined types are necessary
Define and use non-composite types
Define and use composite data types
Choose and design an appropriate user-defined data type for a given problem
Show understanding of the methods of file organization and select an
appropriate method of file organization and file access for a given problem
Show understanding of methods of file access
Show understanding of hashing algorithms
Describe the format of binary floating-point real numbers
Convert binary floating-point real numbers into denary and vice versa
Normalize floating-point numbers
Show understanding of the consequences of a binary representation only
being an approximation to the real number it represents (in certain cases)
Show understanding that binary representations can give rise to rounding
errors
Definitions
The programming language defines the range of possible
Built-in data type values that can be assigned to and the operations that can be
applied to a variable
A data type for which the programmer has included the
User-defined data type
definition in the program
01 Data Representation 1
Composite data type A data type that is derived from other data types
Non-Composite data
A data type defined without referencing another data type
type
A data-type which provides an ordered list of values that a
Enumerated data type
variable of this type can take on
Pointer data type Used to reference a memory location
A data type that allows storing a finite number of different
Sets
values that have no order
Class Includes variables of given data types and methods
Serial file organization Records stored in the order they were added in
Sequential file Physically stores record and ordered according to their key
organization field value
Random file organization Stores records of data in a file in any available position
Each record in the file is read, one by one, until the desired
Sequential Access
record is found
Jumps to a specific record in the file without accessing other
Random Access
records
A mathematical formula used to perform a calculation on the
Hashing Algorithm
key field of the record
Normalized two’s
First and second bits of mantissa must be different for a
complement binary
floating point number to be normalized
number
Number following a calculation is too big to be represented in
Overflow
the given format
Underflow Number is too small to be represented in the given format
1.1 User-defined data types
1.1.1 Data Types
Built-in data types
01 Data Representation 2
The programming language defines the range of possible values that can be
assigned to and the operations that can be applied to a variable
User-defined data types
A data type for which the programmer has included the definition in the
program
It is used to create a new data type
Allows to extend the flexibility of the programming language
Non-Composite data types
A data type defined without referencing another data type
Example: Enumerated data type, Pointer
Enumerated data type
A data-type which provides an ordered list of values that a variable of this
type can take on
Example:
TYPE SchoolDay = (Monday, Tuesday, Wednesday, Thursday, Friday)
Note: Since this is a different data type than STRING, quotation marks are not
used to represent them
Pointer data type
Used to reference a memory location
Example:
DECLARE IntPointer : ^INTEGER
DECLARE MyVar : INTEGER
DECLARE MyVar2 : INTEGER
MyVar ← 57
IntPointer ← @MyVar
MyVar2 ← IntPointer^
IntPointer^ ← 100
01 Data Representation 3
@ symbol before a variable identifier gives the address of the variable
^ is put before the data type to define a pointer data type
^ is placed after a pointer variable. It is an identifier that dereferences the
pointer variable
Composite data types
A data type that is derived from other data types
They are used to extend functionality of programming language
Example: Records, Sets, Classes
Sets
A data type that allows storing a finite number of different values that have no
order
Example:
TYPE Sletter = SET OF CHAR
DEFINE vowel = (’a’, ‘e’, ‘i’, ‘o’, ‘u’) : Sletter
Classes
Includes variables of given data types and methods
Used in Object Oriented Programming, discussed in later chapters
01 Data Representation 4
Skill Check 1
1. Describe the purpose of a user-defined data type. [2]
2. Define, using pseudocode, the following enumerated data types:
a. SchoolDay to hold data about the days students are usually in school.
[1]
b. WeekEnd to hold data about the days that are not school days. [1]
3. Define, using pseudocode, the composite data type ClubMeet. This will
hold data about club members that includes:
First name and last name
The two days they attend:
One on a school day
One not on a school day
Use the enumerated types you created in part(b). [4]
Solution
1. To create a new data type to extend the flexibility of the programming
language
2. TYPE SchoolDay = (Monday, Tuesday, Wednesday, Thursday
TYPE WeekEnd = (Saturday, Sunday)
3. TYPE ClubMeet
DECLARE FirstName : STRING
DECLARE LastName : STRING
DECLARE Schoolday : SchoolDay
DECLARE Weekend : WeekEnd
ENDTYPE
01 Data Representation 5
1.2 File organization and access
1.2.1 File Organization
Serial File Organization
Records stored in the order they were added in
New records appended to end of file
Accessed only sequentially
Reorganization not needed when a new record is added
Sequential File Organization
Physically stores record and ordered according to their key field value
Accessed both sequentially and by direct access with an index file
High hit rate (everyone needs a statement)
Justification:
Suitable for batch processing
All customers need statement
There’s a unique key field
Organized by unique key field
Random File Organization
Stores records of data in a file in any available position
The location of any record in the file is found by using a hashing algorithm on
the key field of a record
Low waiting time
Low hit rate (only one record will match your account number/username)
Justification:
Real time access
01 Data Representation 6
No need to search records.
1.2.2 File Access
Sequential Access
Each record in the file is read, one by one, until the desired record is found
Efficient when every record in the file needs to be processed
Every record is searched until a record is found, or whole file has been
searched and not found, or if key field of current record being checked is
greater than the key field of record being searched in sequential file
organization
Direct Access
Jumps to a specific record in the file without accessing other records
Required when only an individual record from a file needs to be processed
For sequential file, an index of all the key fields is kept and used to look up the
address of the file location where a given record is stored
For random access files, a hashing algorithm is used on the key field to
calculate the address of the file location where a given record is stored
1.2.3 Hashing Algorithms
A mathematical formula used to perform a calculation on the key field of the
record
The result of the calculation gives the address where the record should be
found
To write a record:
Key field is hashed to produce a location address
If location is free, add the data there
Otherwise, use an overflow method to find a free location
If no free location, data cannot be stored
01 Data Representation 7
To read a record:
Key field is hashed to produce a location address
The record at the address is checked to see if it matches the desired
record
If it does not match, then the following records need to be read until a
match is found (if open hash is used), or the overflow area needs to be
searched for a match (if closed hash is used)
Skill Check 2
1. Compare sequential and serial methods of file organization. [4]
2. State the most suitable method of file access when a record is
referenced by a unique address on a disk-type storage medium. [1]
3. State the most suitable method of file access when a bank stores its
data records in ascending order of account number. [1]
Solution
1. In both serial and sequential files, records are stored one after the
other and need to be accessed one after the other. Serial files are
stored in chronological order, whereas sequential files are stored
with ordered records and stored in the order of the key field. In serial
files, new records are added in the next available space. In
sequential files, new records are inserted in the correct position.
2. Direct Access
3. Sequential Access
1.3 Floating point numbers, representation
and manipulation
Allows for representation of fractional values in binary number system
The number is the form M × 2E
01 Data Representation 8
M is the mantissa and E is the exponent
1.3.1 Converting binary floating-point numbers into
denary
For the mantissa, start with -1 as the first bit, 0.5 as the second bit, and every
next bit being half of the previous one
Add up all the mantissa values where a 1 bit appears to get the value of M
For the exponent consider the normal 8-bit binary number with the right most
bit representing 1
Add up the exponent values where a 1 bit appears to get the value of E
Use M×2E to get the denary output
For example:
0.1011010 00000100
1 1 1 1 45
0.1011010= 2
+ 8
+ 16
+ 64
= 64
Exponent = 4
Hence, the number is 45
64
× 24 = 11.25
1.3.2 Converting denary numbers into binary floating-
point numbers
Take a denary number and convert to normal two’s complement binary with
the decimal point and replace missing bits to the right with 0s
Move the binary point after the first digit
Represent the exponent by the number of places the decimal place is shifted
left to
01 Data Representation 9
For example:
4.5 →
0100.1000→ 0.1001000= Mantissa
00000011= Exponent
1.3.3 Potential rounding errors and approximations
Most numbers cannot be represented exactly in binary representation, hence
an approximate value is represented
This approximation becomes more accurate with greater number of bits
allowed to represent the number
Repeated calculation and using of previously rounded values can lead to an
inevitable rounding error
The inaccuracy may become significant enough to see
Normalization
First and second bits of mantissa must be different for a floating point number
to be normalized
Reasons:
To store maximum range of numbers with the smallest number of bits
To maximize the precision of the number for the given number of bits
To prevent multiple representations of the same number
To minimize the number of leading zeroes/ones
To normalize, simply make the mantissa start with 1.0 (for a negative number)
or 0.1 (for a negative number) by shifting bits left
For example:
0.0011100 00000101
Shift bits left to get 0.1110000
Since it was shifted left, the value was increased for the relevant bit,
hence reduce the exponent by the number of left shifts, which is 2 in
01 Data Representation 10
this case
Reducing exponent by 2 gives 00000011
Hence, the normalized form is: 0.111000000000011
1.3.4 Precision versus range
There is a trade-off between range and precision
More bits in mantissa means greater precision and vice versa
More bits in exponent means a greater range and vice versa
For a fixed number of bits, increasing one results in the decrease of other
and hence the trade-off
For a binary number with an 8-bit mantissa and an 8-bit exponent (using two’s
complement):
The maximum positive number which can be stored is:
127
0111111101111111= 128
× 2127
The smallest positive number which can be stored is:
1
0100000010000000= 2
× 2−128
The smallest magnitude negative number which can be stored is:
65
1011111110000000= 128
× 2−128
The largest magnitude negative number which can be stored is:
1000000001111111= −1 × 2127
1.3.5 Floating-point problems
Overflow: Number following a calculation is too big to be represented in the
given format
01 Data Representation 11
Underflow: Number is too small to be represented in the given format
Unable to stored the number zero in normalized form
01 Data Representation 12
Skill Check 3
1. Numbers are stored in a computer using floating-point representation
with:
12 bits for the mantissa
4 bits for the exponent
two’s complement form for both the mantissa and exponent
a) Write the normalized floating-point representation of the following
unsigned binary number using this system. [2]
1011100.011001
b) State the consequence of storing the binary number in part(a)(i) as a
floating-point number in this system. Justify your answer. [2]
2. Explain the reason why binary numbers are stored in normalized form.
[3]
Solution
1.
a. Mantissa: 010111000110
Exponent:
0111
b. The accuracy of the number would be reduced because the least
significant bits of the original number have been lost.
2. To store the maximum range of numbers in the minimum number of
bits. Normalization minimizes the number of significant bits enabling
very large numbers to be stored with accuracy. Avoids the possibility
of many numbers having multiple representations.
Points To Note
01 Data Representation 13
Examples of non-composite user-defined data types include enumerated and
pointer data types
Record, set and class are examples of composite user-defined data types
File organization allow for serial, sequential or direct access
Floating-point representation for a real number allows a wider range of values
to be represented
A normalized floating-point representation achieves the best precision for the
value stored
Stored floating-point values rarely give an accurate representation of the
denary equivalent
01 Data Representation 14