COMPUTER STUDIES
Form Three Notes
CHAPTER 2
DATA PROCESSING
Comprehensive Study Notes
1. Definition of Key Terms
Data
Data is a collection of raw facts — figures, letters, characters, symbols — that convey little or no
meaning on their own without processing.
Information
Information is data that has been processed and is meaningful to the user. It must be available in the
form the user needs it, when they need it.
Data Processing
Data processing refers to the process of transforming raw facts (data) into meaningful output — i.e.,
information.
Data Processing Cycle
The data processing cycle refers to the stages (Data Collection → Data Input → Processing →
Output) that data goes through to be transformed into information.
■ Note: Remember GIGO (Garbage In, Garbage Out): the accuracy of the output depends on the
accuracy of the input data.
2. Data Processing Cycle
The data processing cycle has four primary stages:
Stage: Description
1. Data Collection: Gathering raw data from its point of origin for processing purposes.
2. Data Input: Converting collected data from human-readable form to machine-readable form.
3. Processing: Transformation of input data by the CPU into a more meaningful output.
4. Output: The final activity, producing the desired information and distributing it to target groups.
Fig 1: Electronic Data Processing — Source data is entered into a computer, processed by the CPU, and printed as
output.
3. Data Collection
Methods of Data Collection
• Interview — direct questioning of respondents
• Questionnaire — written set of questions distributed to respondents
• Observation — watching and recording events as they happen
• Record Inspection — examining existing documents and records
Stages of Data Collection
Depending on the method used, data collection may involve these stages:
• Data Creation — Putting together facts in an organised format (manually prepared documents
or captured using scanners, digital cameras, etc.)
• Data Transmission — Transferring data from the point of collection to the processing point
(electronically via computer-to-computer, or physically by post).
• Data Preparation — Converting data from source document to machine-readable form.
• Media Conversion — Converting data from one medium to another (e.g. CD to hard disk for
faster input).
• Input Validation — Subjecting entered data to validity and verification checks before
processing.
• Sorting — Arranging source documents in a particular order for easy and faster data entry.
Verification vs. Validation
Verification: Checking that what is on the input document is exactly the same as what is entered
into the computer.
Validation: Identification and removal of errors by the computer through the co...
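The difference between the two checks can be sketched in Python. This is only an illustration under stated assumptions: the function names `validate_mark` and `verify_entry` are hypothetical, the range 0 to 100 is an assumed rule for an exam mark, and double entry is used here as one common way of carrying out verification.

```python
def validate_mark(raw: str) -> bool:
    """Validation: the computer checks that the value is sensible.

    Assumed rule for this sketch: a mark is a whole number from 0 to 100.
    """
    return raw.isdecimal() and 0 <= int(raw) <= 100


def verify_entry(first_entry: str, second_entry: str) -> bool:
    """Verification: the source value is keyed in twice and the entries are compared."""
    return first_entry == second_entry


print(validate_mark("85"))          # True: within the allowed range
print(validate_mark("850"))         # False: out of range
print(verify_entry("524", "542"))   # False: the two entries differ
```

Note that validation cannot catch every mistake: a mark of 58 typed as 85 is still a valid value, which is why verification against the source document is also needed.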
4. Errors in Data Processing
The accuracy of data entered in the computer determines the accuracy of the information produced.
There are three main types of errors:
(a) Transcription Errors
These errors occur during data entry and include:
• Misreading errors — Incorrect reading of a source document leading to wrong values being
entered (e.g. reading '5' as 'S', or letter 'O' as zero '0'). Usually caused by bad handwriting.
• Transposition errors — Incorrect arrangement of characters, i.e. putting characters in the
wrong order (e.g. entering 524 instead of 542).
■ Note: Transcription errors can be minimised by using direct data capture devices such as scanners
and barcode readers.
(b) Computational Errors
These occur when an arithmetic operation does not produce the expected result:
• Overflow errors — The result of a calculation is too large to be stored in the allocated memory
space (e.g. storing a 9-bit result in an 8-bit memory location).
• Truncation errors — Real numbers with long fractional parts are cut off to fit in allocated
memory (e.g. 0.854692 truncated to 0.854).
• Rounding errors — A digit is raised or lowered to the required rounded number (e.g. 3.59
rounded to 3.6).
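The three computational errors above can be reproduced in a few lines of Python. This is a sketch only: Python integers do not overflow on their own, so the 8-bit memory location is simulated by keeping the lowest 8 bits, and the values 200 and 100 are sample figures chosen for the illustration.

```python
import math

# Overflow: an 8-bit location holds 0 to 255, so the 9-bit result 300 does not fit.
result = 200 + 100
stored = result & 0xFF              # keep only the lowest 8 bits
print(result, stored)               # 300 44

# Truncation: digits after the third decimal place are cut off, not rounded.
truncated = math.trunc(0.854692 * 1000) / 1000
print(truncated)                    # 0.854

# Rounding: 3.59 rounded to one decimal place.
rounded = round(3.59, 1)
print(rounded)                      # 3.6
```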
(c) Algorithm / Logical Errors
These errors occur as a result of wrong algorithm design — the program logic is incorrect even
though the syntax may be fine.
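A short Python sketch of a logical error follows. Both functions are hypothetical examples written for these notes; both run without any syntax error, but only the second one follows the correct algorithm for an average.

```python
def average_wrong(marks):
    # Logical error: divides by a fixed 2 instead of the number of marks.
    return sum(marks) / 2


def average_correct(marks):
    # Correct logic: divide the total by how many marks there are.
    return sum(marks) / len(marks)


marks = [60, 70, 80]
print(average_wrong(marks))     # 105.0 (runs, but the answer is wrong)
print(average_correct(marks))   # 70.0
```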
5. Data Integrity
Definition
Data integrity refers to the accuracy and completeness of data entered in a computer or received
from an information system.
Factors that Determine Data Integrity
• Accuracy — Whether the data/information is true or correct. Computers produce accurate
results as long as correct instructions and data are entered.
• Timeliness — Whether information is available when needed. Outdated information has little
or no value in decision-making.
• Relevance — Data entered must be pertinent to the processing needs at hand.
• Auditability (Verifiability) — The ability of users to check the accuracy and completeness of
information.
Ways to Minimise Threats to Data Integrity
• Use error detection and correction software when transmitting data.
• Design user interfaces that minimise chances of invalid data entry.
• Use devices that capture data directly from the source (e.g. scanners).
• Control access to data by enforcing security measures.
• Back up data, preferably on external storage media.
6. Data Processing Methods
Data can be processed using one of three methods:
(a) Manual Data Processing
Staff use laid-down procedures with pen and paper. No machines are used — only simple tools like
tables and rulers. Tasks include collecting, processing, and distributing information.
Fig 2: Manual Data Processing — Human brain processes data from in-tray to out-tray.
(b) Mechanical Data Processing
Staff use mechanical machines such as calculators, typewriters, cash registers, and duplicating
machines to perform operations.
Fig 3: A Manual Typewriter — an example of a mechanical data processing tool.
(c) Electronic Data Processing
Data is manipulated using electronic machines (computers, mobile phones, washing machines,
digital TVs) to produce information. This method is faster and more accurate, especially for large
volumes of data.
■ Note: The first large-scale electronic general-purpose computer was the ENIAC (Electronic
Numerical Integrator and Computer).
Fig 4: ENIAC (Electronic Numerical Integrator and Computer), one of the earliest computers.
Factors Determining Choice of Data Processing Method
• Size and type of business
• Timing aspects (how urgently information is needed)
• Link between applications
7. Computer Files
Definition
A file is a collection of related records that give a complete set of information about a certain item or
entity. Files can be stored manually (in a file cabinet) or electronically (in a computer storage device).
Elements of a Computer File
• Characters — The smallest element. A single letter, number, or symbol that can be entered,
stored, and output.
• Field — A single character or collection of characters representing one piece of data (e.g. an
employee's name is a field).
• Record — A collection of related fields representing a single entity (e.g. Name, ID No., Sex,
Department = one employee record).
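The elements above can be pictured with a small Python sketch. The class name `EmployeeRecord` and the sample employees are made up for illustration; the four fields follow the example given in the notes.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    """One record: a collection of related fields about a single employee."""
    name: str
    id_no: str
    sex: str
    department: str


# A file: a collection of related records (sample data only).
employee_file = [
    EmployeeRecord("A. Otieno", "1001", "M", "Accounts"),
    EmployeeRecord("B. Wanjiru", "1002", "F", "Sales"),
]

first = employee_file[0]
print(first)               # a whole record
print(first.name)          # one field of that record
print(first.name[0])       # one character of that field
print(len(employee_file))  # number of records in the file
```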
Logical vs. Physical File
Logical File: The way the user views the file, that is, its contents and the processing to be done
on them.
Physical File: The actual arrangement of file contents on the storage media surface...
Advantages of Computerised Filing
• Information takes up less physical space than manual systems.
• Enhances data integrity and reduces duplication.
• Offers faster access and retrieval of data.
• Much easier to update or modify information.
8. Types of Computer Processing Files
File Type: Description
Master File: The main file containing permanent records. Has both static fields (rarely change, e.g. name, ...
Transaction File (Movement File): Holds temporary incoming or outgoing data about an organisation's activities over a period of ...
Reference File: Permanent or semi-permanent file used for reference/look-up purposes (e.g. price lists, PAYE ...
Sort File: Created from existing transaction or master files. Records are sorted in ascending or descending ...
Back-up File: Duplicate copies of existing files. Created whenever an update is carried out. Used in case of ...
Report File: Contains sets of records extracted from master files, used to prepare reports for later printing ...
9. File Processing Activities
• Updating — Changing data in a master file to reflect the current status.
• Referencing — Accessing a record to see its contents without altering it.
• Sorting — Arranging file contents into a predetermined sequence of the key field.
• Merging — Combining the contents of two or more input files into one output file.
• Matching — Comparing input file records to ensure the same records exist in both files.
• Summarising — Accumulating records of interest from a file to form a single record in an
output file.
• Searching — Looking for a specific record of interest within a file.
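Some of these activities (sorting, merging, searching and summarising) can be sketched in Python. The two small input files, the `id` key field and the `amount` field are sample data assumed for this illustration; `heapq.merge` from the standard library combines inputs that are already sorted.

```python
from heapq import merge

# Two small input files; each record has an "id" key field (sample data).
file_a = [{"id": 3, "amount": 500}, {"id": 1, "amount": 200}]
file_b = [{"id": 2, "amount": 300}]


def by_id(record):
    """Return the key field used to order the records."""
    return record["id"]


# Sorting: arrange each file in key-field order.
sorted_a = sorted(file_a, key=by_id)
sorted_b = sorted(file_b, key=by_id)

# Merging: combine the two sorted input files into one output file.
merged = list(merge(sorted_a, sorted_b, key=by_id))

# Searching: look for the record whose key is 2.
found = next((r for r in merged if r["id"] == 2), None)

# Summarising: accumulate the records into a single summary record.
summary = {"records": len(merged), "total_amount": sum(r["amount"] for r in merged)}

print([r["id"] for r in merged])   # [1, 2, 3]
print(found)                       # {'id': 2, 'amount': 300}
print(summary)                     # {'records': 3, 'total_amount': 1000}
```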
File Updating Terms
• Hit Rate — The proportion of a master file's records that are active/processed. Formula:
(Transactions ÷ Total Records) × 100. E.g. 600 ÷ 12,000 × 100 = 5%.
• Volatility — The frequency with which records are added or deleted. High frequency = 'volatile'
file; low frequency = 'static' file.
• Size — The total number of records stored in the file.
• Growth — Files grow as new records are added.
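The hit rate formula above can be checked with a short Python function. The name `hit_rate` is chosen for this sketch; the figures are the ones used in the worked example.

```python
def hit_rate(records_processed: int, total_records: int) -> float:
    """Return the percentage of master file records that were processed."""
    return records_processed * 100 / total_records


print(hit_rate(600, 12_000))   # 5.0
```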
10. File Organisation Methods
File organisation is the arrangement of records within a particular file. There are four main methods:
(a) Sequential File Organisation
Records are stored and accessed in a sorted order using a key field. Searching starts at the
beginning and proceeds record by record until the required record is found or the end of the file
is reached. Mainly used with magnetic tapes.
• Advantages: Simple to understand and organise; easy to maintain; inexpensive storage media.
• Disadvantages: Entire file must be read even with a very low activity rate; random enquiries
are impossible; data redundancy is typically high.
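A sequential search can be sketched in Python as below. The list `tape` stands in for a file on magnetic tape and the function name `sequential_search` is chosen for this illustration. The count of records read shows why a record near the end of the file is slow to reach.

```python
def sequential_search(records, key):
    """Read records in order from the start; return (record, number of records read)."""
    reads = 0
    for record in records:
        reads += 1
        if record["key"] == key:
            return record, reads
        if record["key"] > key:
            # The file is sorted, so the key cannot appear further on.
            break
    return None, reads


# A sorted file of five records (sample keys).
tape = [{"key": k} for k in (101, 102, 103, 104, 105)]

print(sequential_search(tape, 104))   # ({'key': 104}, 4)
print(sequential_search(tape, 100))   # (None, 1)
```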
(b) Serial File Organisation
Records are laid out contiguously one after another in no particular sequence — stored in the same
order they arrive. No relationship exists between contiguous records. Used with magnetic tapes.
(c) Random (Direct) File Organisation
Records are stored randomly but accessed directly. A record key determines where a record is stored
on the media. Used with magnetic and optical disks.
• Advantages: Records are quickly accessed; file update is easily achieved; no indexes required.
• Disadvantages: Data may be accidentally erased or overwritten; expensive hardware and
software required; complex and costly system design.
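The idea that the record key decides the storage position can be sketched in Python. This is a simplified illustration, not how any particular system does it: the rule "key modulo number of slots" is an assumed addressing method, a list of 7 slots stands in for the disk, and two keys that map to the same slot are simply reported as an error.

```python
SLOTS = 7                      # assumed number of storage positions
storage = [None] * SLOTS       # stands in for the disk surface


def address(key: int) -> int:
    """Work out the storage position directly from the record key."""
    return key % SLOTS


def store(record):
    """Place a record at the position given by its key."""
    slot = address(record["key"])
    if storage[slot] is not None:
        # Without this check the earlier record would be overwritten.
        raise ValueError("slot already occupied")
    storage[slot] = record


def fetch(key: int):
    """Go straight to the calculated position; no other records are read."""
    record = storage[address(key)]
    if record is not None and record["key"] == key:
        return record
    return None


for k in (101, 102, 110):
    store({"key": k})

print(address(110))   # 5
print(fetch(110))     # {'key': 110}
print(fetch(103))     # None
```

The occupied-slot check relates to the disadvantage listed above: if two keys give the same address and nothing guards against it, one record overwrites the other.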
(d) Indexed Sequential File Organisation
Similar to sequential organisation, but an index is used to help the computer locate individual records
on the storage media. Used with magnetic disks.
• Advantages: Records can be accessed sequentially or randomly; records are not duplicated; fast
random access.
• Disadvantages: Storage medium is relatively expensive; sequential access is time-consuming;
sequential processing may introduce redundancy.
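An index lookup followed by a short sequential scan can be sketched in Python. The layout is an assumption made for illustration: records are grouped into blocks of three sorted keys, and the index holds the highest key in each block. The standard library function `bisect_left` searches the index.

```python
from bisect import bisect_left

# Sorted records grouped into blocks (sample keys only).
blocks = [
    [101, 104, 109],
    [112, 118, 120],
    [125, 131, 140],
]

# The index: highest key stored in each block.
index = [block[-1] for block in blocks]   # [109, 120, 140]


def find(key):
    """Use the index to pick a block, then scan only that block sequentially."""
    pos = bisect_left(index, key)
    if pos == len(blocks):
        return None                        # key is larger than every key in the file
    for offset, stored_key in enumerate(blocks[pos]):
        if stored_key == key:
            return pos, offset             # (block number, position in block)
    return None


print(find(118))   # (1, 1)
print(find(109))   # (0, 2)
print(find(111))   # None
```

Reading the blocks one after another in order still gives plain sequential access to the whole file.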
Fig 5: Records on a Magnetic Tape — showing unblocked (single) and blocked (multiple) records with Inter-Record
Gaps (IRG).
11. Electronic Data Processing Modes
There are eight main modes of electronic data processing:
(a) On-line Processing
Results are available immediately. All peripherals are under direct control of the CPU. Users can
interact with the system at any time using input/output facilities.
• Applications: Banking, Stock Exchange, Stock Control, Water/Electricity Billing.
• Advantages: Files kept up to date; information readily available; file enquiries possible via
terminals.
• Disadvantages: Complex to develop; costly hardware, software, and storage media.
(b) Time Sharing Processing
The CPU serves two or more users with different processing requirements. Processor time is divided
into time slices allocated equally to all jobs in a queue. Incomplete jobs return to the tail of the queue.
• Applications: Bureaus, learning institutions, companies.
• Advantages: Fast information output; file enquiries possible; user interaction supported.
• Disadvantages: User has no control over the central computer; poor data security; slow
response with many tasks.
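The queue of time slices described above can be simulated in Python. This is a sketch only: the job names and the processing time each job needs are sample values, and `round_robin` is a name chosen for this illustration.

```python
from collections import deque


def round_robin(jobs, time_slice):
    """Give each job one time slice in turn; return the order in which jobs finish."""
    queue = deque(jobs.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= time_slice
        if remaining > 0:
            queue.append((name, remaining))   # incomplete job goes to the tail
        else:
            finished.append(name)
    return finished


# Job name -> units of processor time needed (sample values).
print(round_robin({"A": 3, "B": 1, "C": 2}, time_slice=1))   # ['B', 'C', 'A']
```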
(c) Real Time Processing
The computer processes incoming data immediately as it occurs, updates the transaction file, and
gives an immediate response that affects events as they happen.
• Applications: Airline reservation, hotel reservation, chemical plant processing.
• Advantages: Information instantly available; immediate control; fast and reliable.
• Disadvantages: Requires complex and expensive OS; not easy to develop; requires Front
End Processors (FEPs).
(d) Multi-programming / Multi-tasking
More than one program is executed apparently at the same time by a single CPU. The OS allocates
each program a time slice and determines execution order.
• Advantages: Increases CPU productivity; reduces peripheral-bound operations.
• Disadvantages: Requires more expensive CPUs; complex operating system.
(e) Distributed Processing
Processing tasks are divided and assigned to two or more computers at physically separate sites,
connected by data transmission media. Different database tables can reside on separate computers.
• Application: Banks — customers served from branches while data is updated at the head
branch.
• Advantages: Less risk of total system breakdown; reduced data loss; reduced load on host
computer.
• Disadvantages: Expensive communication costs; sophisticated software required.
(f) Batch Processing
Transactions are accumulated over a period of time (daily, weekly, monthly) and processed all at
once at a pre-specified time.
• Application: Payroll processing.
• Advantages: Simple to develop; timing of reports not critical; low unit processing cost.
• Disadvantages: Time lag between transaction origin and information availability; not suitable
for instant decisions; difficult priority scheduling.
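Batch processing can be sketched in Python as follows. The employee numbers, the hours worked and the function names are assumptions made for this payroll illustration. Transactions are only collected when they occur; the master file changes when the batch is run.

```python
# Master file: total hours recorded per employee (sample data).
master = {"E001": 0, "E002": 0}

# Transactions wait here until the batch run.
batch = []


def record_transaction(employee, hours):
    """Accumulate a transaction without touching the master file."""
    batch.append((employee, hours))


def run_batch():
    """Process all accumulated transactions at once; return how many were processed."""
    for employee, hours in batch:
        master[employee] += hours
    processed = len(batch)
    batch.clear()
    return processed


record_transaction("E001", 8)
record_transaction("E002", 6)
record_transaction("E001", 7)

print(master)        # {'E001': 0, 'E002': 0}  (not yet updated: the time lag)
print(run_batch())   # 3
print(master)        # {'E001': 15, 'E002': 6}
```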
(g) Multi-processing
More than one task is processed simultaneously on different processors within the same computer.
The computer contains more than one independent CPU working in a coordinated way.
(h) Interactive Processing
There is continuous dialogue between the user and the computer. The program keeps prompting the
user to provide input or respond to prompts displayed on screen.
12. Factors to Consider When Selecting a Data Processing Mode
• The need for direct information retrieval and/or file interrogation.
• Control over resources (files, input/output devices).
• Cost of acquiring relevant hardware, software, and media.
• Optimisation of processing time.
• Time factor of information needed for managerial decision-making.
Review Questions
1. Define: (a) Data Processing, (b) Data Processing Cycle.
2. Using an illustration, describe the four primary stages of the data processing cycle.
3. Outline the stages of data collection.
4. What is the relevance of GIGO (Garbage In Garbage Out) to errors in data processing?
5. Explain the two types of transcription errors.
6. State three types of computational errors.
7. Define the term data integrity.
8. Give three factors that determine the integrity of data.
9. State at least five ways of minimising threats to data integrity.
10. Distinguish between data and information.
11. Describe the types of data processing methods.
12. Distinguish between manual, mechanical, and electronic data processing.
Model Answers (Selected)
Answer 1:
(a) Data processing refers to the transformation of raw data into meaningful output (information). (b)
The data processing cycle refers to the stages (Data Collection → Data Input → Processing →
Output) that data goes through during its transformation into information.
Answer 3 — Stages of Data Collection:
• Data Creation
• Data Transmission
• Data Preparation
• Media Conversion
• Input Validation
• Sorting
Answer 6 — Computational Errors:
• Overflow errors
• Truncation errors
• Rounding errors