Data Processing
Definition
It is defined as:
a) Refers to the process of transforming data into meaningful output.
b) Refers to the process of converting data into useful information.
c) Refers to the process of translating collected data into usable information.
Note:
Data – Refers to raw facts that are less meaningful to the owner.
Information – Refers to meaningful results obtained from data processing.
Methods of Data Processing
Refer to the means used to change data into meaningful information. These methods are:
a) Manual data Processing
b) Mechanical data Processing
c) Electronic data Processing
Manual data Processing
Data processing involves human intellectual skills and is done using pens and papers.
Note:
In this case the data to be processed undergoes three steps:
First, the incoming data is received and kept in the In-tray.
Second, the data is processed using the human brain and the results are presented on paper.
Third, the results are kept in the Out-tray, ready to be distributed to the people who need them or stored in a filing cabinet.
Manual Data Processing Summary
In-tray: incoming data
Human brain: data is processed by human effort using pens and paper
Out-tray: information (results) awaiting distribution
Mechanical Data Processing
It can be explained as:
It is data processing that involves the use of simple mechanical devices.
Data processing is done using simple machines such as ordinary calculators and typewriters.
Note:
The devices used in mechanical data processing possess little or no electronic intelligence.
This method provides a faster alternative to the manual method, especially when processing involves repetitive tasks.
Electronic Data Processing
It can be explained as:
Data processing and dissemination are done using microprocessor and artificial intelligence technology.
It is data processing that involves the use of programmable machines integrated with microprocessor and artificial intelligence technologies.
Note:
a) The devices that are used in electronic data processing include:
i) Computers
ii) Cellular Mobile phones
iii) Programmable calculators
b) These devices have the ability to simulate some form of human intelligence in data processing.
c) During electronic data processing, devices such as computers adopt various modes of processing depending on factors such as the operating system.
d) The various modes of electronic data processing are: online processing, real time processing, distributed processing, time sharing, batch processing, multiprocessing, multitasking and interactive processing.
e) The Modes of Electronic Data Processing will be further discussed at the end of this topic
Data Processing Cycle
It can be defined as: -
Refers to the sequence of activities involved in converting data from raw form to information.
Refers to the fundamental stages of input, processing and output that data passes through to be transformed into information.
Note: This process is referred to as a cycle since the output obtained after processing can be stored and used in future as input data; the process therefore keeps repeating itself.
Stages of Data Processing Cycle
These stages are:
Data collection
Data input
Processing
Output
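The four stages can be illustrated with a short Python sketch. It is a minimal illustration using assumed function names (collect, data_input, process, output) and made-up marks, not a prescribed implementation; the stored_results list stands in for storing output so that it can serve as input in a later cycle.

```python
# A minimal sketch of one pass through the data processing cycle.
# All function names and sample values are hypothetical.

def collect() -> list[str]:
    """Data collection: raw facts, e.g. marks copied from a mark sheet."""
    return ["56", "72", "81"]


def data_input(raw: list[str]) -> list[int]:
    """Data input: convert human-readable text into numbers the program can use."""
    return [int(item) for item in raw]


def process(marks: list[int]) -> dict:
    """Processing: calculate and compare values."""
    return {"total": sum(marks), "mean": sum(marks) / len(marks), "highest": max(marks)}


def output(info: dict) -> str:
    """Output: present the information in a readable form."""
    return ", ".join(f"{key}={value:g}" for key, value in info.items())


stored_results = []                  # stored output, available as future input
info = process(data_input(collect()))
stored_results.append(info)
print(output(info))                  # total=209, mean=69.6667, highest=81
```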
Data Collection
It can be defined as: -
It is the process of obtaining crucial data needed for processing
It is the act of looking for crucial facts needed for data processing.
Note: This process: -
Is also referred to as data gathering or fact finding.
Involves data collection methods and the stages of data collection.
Data Collection Methods
Various methods are used in data collection. They are:
Questionnaires
It is a special document containing a set of standard questions to be answered by a large number of people in order to gather information from them.
Interviewing
It is a direct face-to-face conversation between the interviewer and interviewees.
Note: In this case the interviewer obtains answers to questions he asks the interviewee and gets the interviewee’s
suggestions and recommendations.
Observation
It is a fact-finding technique that requires one to watch the activities being carried out, or to participate in performing some of the activities, in order to gather the intended facts.
Sampling
Sampling is the systematic selection of representative elements of a population.
Note: The selected elements are examined closely and the results assumed to reveal useful information about the
entire population.
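Simple random sampling is one common way of selecting representative elements; the sketch below assumes it, since the notes do not name a particular sampling technique. The population of admission numbers and the helper simple_random_sample are hypothetical; Python's random.Random.sample picks distinct elements, each equally likely to be chosen.

```python
import random

population = list(range(1, 501))     # hypothetical admission numbers of 500 students


def simple_random_sample(items: list, size: int, seed=None) -> list:
    """Pick `size` distinct elements, each equally likely to be chosen."""
    rng = random.Random(seed)         # seeding makes the selection reproducible
    return rng.sample(items, size)


sample = simple_random_sample(population, 20, seed=42)
print(sorted(sample))
```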
Stages Of Data Collection
It mainly involves the following processes:
Data creation
Data preparation
Media conversion
Input validation
Data transmission
Sorting
Data creation
It is defined as: -
It is the process of identifying and putting together facts in an organized format.
It is the process of identifying the data required to meet a specific need.
It is the process of identifying, collecting, and recording data in an organized format for a specific purpose.
Data preparation
It is the verification and cleaning up of raw collected data by removing incomplete and erroneous entries.
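Data preparation can be sketched as a filter that drops incomplete and erroneous entries. The records and the two rules below (every field must be filled in; age must be written in digits) are hypothetical examples, not a standard.

```python
# Hypothetical raw records as collected; some entries are incomplete or wrong.
raw_records = [
    {"adm_no": "1001", "name": "Achieng", "age": "15"},
    {"adm_no": "1002", "name": "", "age": "16"},             # incomplete: no name
    {"adm_no": "1003", "name": "Kamau", "age": "fifteen"},   # erroneous: age not in digits
    {"adm_no": "1004", "name": "Wanjiru", "age": "14"},
]


def is_complete(record: dict) -> bool:
    """Every field must contain something other than blank space."""
    return all(value.strip() for value in record.values())


def is_well_formed(record: dict) -> bool:
    """Age must be written using digits only."""
    return record["age"].isdecimal()


clean_records = [r for r in raw_records if is_complete(r) and is_well_formed(r)]
print([r["adm_no"] for r in clean_records])   # ['1001', '1004']
```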
Conversion
It is defined as: -
It is the transformation of prepared data from source documents into machine-readable form.
It is the process of changing prepared data from one format to another, for instance from hard copy to soft copy.
Note:
Transcription is only necessary if the data could not be captured in digital form by the data capture device.
Validation
It is defined as: -
It is performing validity checks on source data before it is processed in order to reduce errors at the input stage.
It is the process of checking the accuracy and quality of source data before using or processing it.
Data transmission
It is the transfer of data in electronic form, by physical or electronic means, to where it will be processed.
Data Input
It refers to the process through which collected data is converted from human-readable form to machine-readable form by means of input devices.
Process of Data Input
It may involve the following steps:
Media conversion
Input validation
Sorting
Media conversion
It is the process of transferring data from one medium to another, when the need arises, to enhance efficiency.
Note: For instance, for faster input, data can be copied from a floppy disk to a CD.
Input validation
It is performing validity checks, using a computer program, on data entered into the computer before it is processed in order to reduce errors at the input stage.
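A validation program typically applies several checks to each entry. This sketch assumes three common ones (presence, type and range checks) on an exam mark; the function name validate_mark and the 0 to 100 range are illustrative assumptions.

```python
def validate_mark(entry: str) -> list:
    """Return the problems found in a typed exam mark (an empty list means valid)."""
    entry = entry.strip()
    if not entry:                      # presence check
        return ["mark is missing"]
    if not entry.isdecimal():          # type check: whole numbers only
        return ["mark must be a whole number"]
    if not 0 <= int(entry) <= 100:     # range check
        return ["mark must be between 0 and 100"]
    return []


for typed in ["75", "", "7S", "120"]:
    print(repr(typed), validate_mark(typed) or "accepted")
```

The entry "7S" shows how a misread handwritten 5 would be caught by the type check.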
Sorting
It is arranging data in a predefined order before processing.
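Sorting on a key field can be sketched with Python's built-in sorted, which returns a new list and leaves the original order intact. The student records are hypothetical.

```python
# Hypothetical student records awaiting processing.
records = [
    {"adm_no": 1042, "name": "Otieno"},
    {"adm_no": 1007, "name": "Njeri"},
    {"adm_no": 1031, "name": "Mutua"},
]

# Predefined order 1: ascending admission number (the key field).
by_adm_no = sorted(records, key=lambda r: r["adm_no"])
# Predefined order 2: descending alphabetical order of name.
by_name_desc = sorted(records, key=lambda r: r["name"], reverse=True)

print([r["adm_no"] for r in by_adm_no])     # [1007, 1031, 1042]
print([r["name"] for r in by_name_desc])    # ['Otieno', 'Njeri', 'Mutua']
```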
Data Processing
It is transformation of input data to more meaningful output (information).
Note: Processing operations include calculation, comparing values and sorting.
Data Output
It is meaningful information produced after data has been processed.
It is the final activity in data processing cycle that generates the desired output.
Note:
The processed data can be stored for future retrieval or be distributed to the target group.
Distribution is also called dissemination and it refers to the act of making information available to those
who need it.
Dissemination may involve electronic presentation over radio or television, distribution of hard copies, or broadcasting over the internet or mobile phones.
Errors in Data Processing
Definition of an error
It is anything that affects the accuracy and the validity of data input and information output.
Note: The accuracy of the data entered into the computer determines the accuracy of the information produced.
Hence the saying "garbage in, garbage out" (GIGO).
Types of Errors in Data Processing
These errors can be classified into three major categories and they include: -
a) Transcription errors
b) Computation errors
c) Algorithm or logical errors
Transcription errors
Refer to errors that occur during data entry.
Categories of Transcription Errors
They are:
a) Misreading errors
They are errors that occur due to misinterpretation of the source document by the user, resulting in a wrong value being entered. For instance, when reading the handwritten figure 589 the user may mistake 5 for S.
b) Transposition errors
They are errors that occur from incorrect arrangement of characters, that is, putting the characters in the wrong order. For instance, the user may enter 396 instead of 369.
Note: Transposition errors can be managed by using modern data capture devices such as bar code readers, optical character readers and digital cameras, since they capture data with minimal user involvement.
Computation errors
Refer to errors that arise when an arithmetic operation does not produce the expected results. Such errors include:
Overflow errors – They are errors that occur if the result of a calculation is too large to be stored in the allocated memory space. For instance, an overflow occurs if a calculation produces a 9-bit result but only one byte (8 bits) is allocated to store it.
Truncation errors – They are errors that occur when extra digits are cut off (truncated) from the fractional part of a real number so that a number with a long fractional part can fit in the allocated memory space. For instance, 0.347598 truncated to four decimal places becomes 0.3475, instead of being rounded off to 0.3476.
Rounding errors – They are errors that occur when a real number is rounded up or down to the required number of digits.
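The three computation errors can be reproduced in a few lines. The sketch assumes an unsigned 8-bit storage location for the overflow case and uses Python's decimal module so that truncating and rounding 0.347598 are exact decimal operations rather than binary floating-point approximations; add_8bit is a hypothetical helper.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def add_8bit(a: int, b: int) -> tuple:
    """Add two unsigned 8-bit values; flag overflow when the sum needs a 9th bit."""
    total = a + b
    stored = total & 0xFF            # only the lowest 8 bits fit in one byte
    return stored, total > 0xFF


print(add_8bit(200, 100))            # (44, True): 300 needs 9 bits, so 256 is lost

x = Decimal("0.347598")
four_places = Decimal("0.0001")
truncated = x.quantize(four_places, rounding=ROUND_DOWN)    # extra digits cut off
rounded = x.quantize(four_places, rounding=ROUND_HALF_UP)   # rounded off instead
print(truncated, rounded)            # 0.3475 0.3476
print(x - truncated, x - rounded)    # error introduced by each method
```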
Algorithm or logical errors – They are errors that result from algorithms that were wrongly designed during program development, so the program ends up giving erroneous output when it runs.
Note: An algorithm is a set of procedural steps followed to solve a given problem; algorithms are used as design tools during program writing.
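A logical error does not stop a program from running; it simply produces wrong output. In this hypothetical example, the averaging algorithm was designed with the wrong divisor.

```python
marks = [60, 70, 80]


def average_buggy(values):
    # Logical error: the algorithm divides by one more than the number of values.
    return sum(values) / (len(values) + 1)


def average_fixed(values):
    # Correct algorithm: total divided by the number of values.
    return sum(values) / len(values)


print(average_buggy(marks))   # 52.5: the program runs, but the answer is wrong
print(average_fixed(marks))   # 70.0
```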
Data Integrity
It is defined as: -
Refers to the accuracy and completeness of data entered in a computer or output received from an
information system.
Refers to the validity, correctness or completeness of data entered in a computer or received from an information system.
Measures of Data Integrity
Integrity of data can be measured in terms of:
Accuracy
Timeliness
Relevance
Accuracy – It refers to how close an approximation is to the actual value.
Note: This accuracy depends on:
Correctness of input – As long as the computer is supplied with correct instructions and data, it will generate accurate results efficiently.
Number length – The accuracy of a real number depends on the number's length. For instance, 84.975 is more accurate than 84.98.
Timeliness – It is defined as: -
Refers to the accuracy of data with respect to the current situation for which it is needed.
It refers to the timely availability of data, ensuring that it is delivered to those who need it on time. For instance, a newspaper notice inviting people to a meeting or occasion must be printed before the event, not after it.
Relevance – It is defined as: -
Refers to how suitable the data entered into the computer is for producing the expected output.
It is the state of data being precise or having logical significance to the matter at hand, so as to meet processing needs and enhance daily operations or decision making.
Threats to Data Integrity
A threat to data integrity is any act or situation that can lead to hazardous exposure or corruption of data.
Ways to Minimize Threats to Data Integrity
The following measures can be taken to minimize and control threats to data:
1. Back up data, preferably on external storage media.
2. Control access to data by enforcing security measures.
3. Design user interfaces that minimize the chances of invalid data entry.
4. Use error detection and correction software when transmitting data.
5. Use devices that capture data directly from the source, such as bar code readers, digital cameras and optical character readers.
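Measure 4 can be illustrated with a checksum. The sketch below uses CRC-32 from Python's standard zlib module as one example of error detection; it only detects corruption (the receiver could, for instance, ask for the data to be sent again) and does not correct it. The send and receive helpers and the message are hypothetical.

```python
import zlib


def send(message: bytes) -> tuple:
    """Sender attaches a CRC-32 checksum computed over the message."""
    return message, zlib.crc32(message)


def receive(message: bytes, checksum: int) -> bool:
    """Receiver recomputes the checksum; a mismatch means the data was corrupted."""
    return zlib.crc32(message) == checksum


data, crc = send(b"ADM 1042, FEES PAID 15000")
print(receive(data, crc))                          # True: arrived intact
print(receive(b"ADM 1042, FEES PAID 16000", crc))  # False: a digit changed in transit
```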
Computer Files
Definition of a file
It is defined as:
a) It is a collection of related records that give a complete set of information about a certain item or entity.
b) It is a collection of related records stored together on a storage device and treated as a single unit.
Note:
A file is representative of a given set of information and it is generated after data has been processed.
A file acts as the basic unit of data storage that stores and organizes information for easy access,
processing, and retrieval.
File storage
Refer to the method of saving and keeping files for safe preservation, easy retrieval, update, or sharing
whenever needed.
Note: It refers both to manual storage of paper files and electronic storage of digital files.
File Storage (Filing) Methods
The two filing systems are:
a) Manual (physical) filing system
b) Computerized (electronic/digital) filing system
a) Manual (Physical) Filing System
It involves storing paper-based records in physical locations such as; Filing Cabinets, Shelves, Archives, and
Storage Rooms.
Weaknesses/Limitations of Manual (Physical) Filing System
1. Requires physical space – Storage space may become congested as records increase.
2. Slow retrieval – Retrieving a specific record/file can be time-consuming, especially when the filing system is not well organized.
3. Prone to damage – Files can be destroyed by fire, water, pests, or wear and tear over time.
4. Prone to loss and misplacement – Human error (misfiling or misplacing) or theft can lead to permanent loss of important documents.
5. Slow processing – Manual handling of records makes tasks like updating, sorting and compiling reports very slow compared to electronic storage.
6. Limited security – Paper files are harder to secure against unauthorized access than encrypted digital storage.
7. High cost of maintenance – It requires physical storage space, cabinets, and clerical staff to manage records.
8. Duplication problems – Creating and maintaining multiple copies of the same record is cumbersome and increases the chances of inconsistency.
9. Not environmentally friendly – It uses large amounts of paper, leading to wastage and environmental impact.
10. No remote access – Unlike digital files, manual files cannot be accessed remotely; one must be physically present where they are stored.
b) Computerized (Electronic/Digital) Filing System
It involves storing files as digital data on computer storage such as local storage devices, network storage, cloud storage and database storage.
Advantages of Computerized Filing System
It offers a much better way of holding information than the manual filing system.
Information takes up much less space than in manual filing.
It is easier to update or modify information.
It offers faster access to and retrieval of data.
It enhances data integrity and reduces duplication.
Elements of a Computer File
They are the constituents that make up a computer file, namely:
a) Character
b) Field
c) Record
Character – It refers to a letter, number or symbol; it is the smallest element of a computer file.
Field – It is a single character or a collection of characters that represents a single piece of data in a computer file.
Record – It is a collection of related fields that represent a single entity.
Note:
Characters can be entered, stored and retrieved as computer output.
An item's 'Name' can be termed a field in an item record.
Item Id, Name, Description, Buying Price and Selling Price make up a record in a stock sheet holding the details of a product.
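The character, field, record and file hierarchy can be modelled with a Python dataclass. The StockRecord class mirrors the stock sheet example above; the products and prices are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class StockRecord:
    """One record: a collection of related fields describing a single product."""
    item_id: str
    name: str
    description: str
    buying_price: float
    selling_price: float


# A file: a collection of related records treated as a single unit.
stock_file = [
    StockRecord("P001", "Pen", "Blue ballpoint pen", 15.0, 20.0),
    StockRecord("P002", "Ruler", "30 cm plastic ruler", 25.0, 35.0),
]

record = stock_file[0]      # a record
field = record.name         # a field within that record
character = field[0]        # a character, the smallest element
print(character, field, record.item_id, len(stock_file))   # P Pen P001 2
```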
Classification of Computer Files
The two main categories are:
a) Logical files
b) Physical files
Logical file – It is a file that expresses what data items it contains and what processing operations may be
performed on the data items.
Physical file – It is a file that expresses how data is stored on a storage media and how processing operations are
made possible.
Types of Computer Processing Files
They include:
Master files
Transaction files
Reference files
Report files
Backup files
Sort files
Master File
It is defined as:
a) It is a file that contains relatively permanent records concerning a particular item or entity.
b) It is a file that stores permanent data or information about an item or entity that rarely changes.
Transaction files
It is defined as:
a) It is a file that records individual transactions or events as they occur.
b) It is a file that holds operational data generated from business or organizational activities.
Purpose
It can later be used to update master files.
It can be used to audit daily, weekly or monthly processing operations
Example
Daily sales file in a supermarket.
Exam scores or fee payment records in a school.
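The main purpose of a transaction file, updating the master file, can be sketched as follows. The stock quantities, item codes and the update_master helper are hypothetical, and the sketch assumes every transaction refers to an item already in the master file (an unknown code would raise a KeyError).

```python
# Hypothetical stock master file: item code -> quantity in stock.
master = {"P001": 120, "P002": 40, "P003": 75}

# Hypothetical transaction file: the day's sales, recorded as they occurred.
transactions = [("P001", 5), ("P003", 10), ("P001", 3), ("P002", 40)]


def update_master(master: dict, transactions: list) -> dict:
    """Apply each sale to its master record; return a new, updated master file."""
    updated = dict(master)                  # leave the original master untouched
    for item_code, quantity_sold in transactions:
        updated[item_code] -= quantity_sold
    return updated


new_master = update_master(master, transactions)
print(new_master)   # {'P001': 112, 'P002': 0, 'P003': 65}
```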
Reference File
It is defined as:
a) It is a file that contains relatively static data used for validation and lookup purposes during processing.
b) It is a file containing constant or rarely changing data used to validate or supplement information in other files.
Purpose:
Provides look-up values during processing.
Provides lookup data for validation or classification.
Ensures consistency and accuracy in data processing.
Examples:
Price lists.
Examination timetable.
School subjects list.
Report File
It is defined as:
a) It is a file that stores processed information organized for reporting purposes
b) It is an output file created after processing has occurred.
c) It is a file that stores relatively permanent records generated after a processing activity or extracted from
the master file.
d) It is a file generated to present summarized or processed data for analysis or decision-making.
Purpose:
Provide insights, trends, or summaries.
Support business decisions and regulatory requirements.
Characteristics:
Generated from master and transaction files.
Often used for decision-making.
Examples:
Student report cards.
Payroll reports.
Financial statements.
Backup File
It is defined as:
a) It is a duplicate copy of important files kept for safety.
b) Refer to copies of files or databases created to prevent data loss due to hardware failure, corruption, or
disasters.
c) It is a file that holds copies of information that can be used to reconstruct the original file in case it is corrupted, lost or accidentally changed.
d) Refer to copies of files made to protect against accidental loss, corruption, or hardware failure
Characteristics:
Used to restore data in case of loss, corruption, or disaster.
Stored on external media or cloud storage.
Examples:
Copy of student database stored on Google Drive.
Payroll backup stored on an external hard disk.
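Making and restoring a backup copy can be sketched with Python's standard shutil.copy2. A temporary folder stands in for real storage locations and the file names are hypothetical; in practice the backup would be kept on separate media, as in the examples above.

```python
import shutil
import tempfile
from pathlib import Path

CONTENT = "1001,Achieng\n1002,Kamau\n"     # hypothetical student records

with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "students.csv"
    backup = Path(folder) / "students_backup.csv"   # would normally be on separate media

    original.write_text(CONTENT)
    shutil.copy2(original, backup)          # make the backup copy

    original.write_text("#@!corrupted")     # simulate accidental damage
    shutil.copy2(backup, original)          # reconstruct the original from the backup

    restored_ok = original.read_text() == CONTENT

print(restored_ok)   # True
```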
Sort File (Work File / Scratch File)
It is defined as:
a) It is a temporary file used during intermediate processing such as sorting, merging, or calculations.
b) It is a temporary file created during the sorting process to organize data in a specific order based on one or more keys.
Purpose:
Improve efficiency for operations like merging or searching.
Prepare data for further processing or reporting.
Examples:
Sorting customer records by last name.
Organizing transaction logs by date.
Note:
The type of a computer processing file is determined by the purpose or function that the data stored in it is intended to fulfill.
File Organization Method
There are four methods of file organization; the choice among them depends on the method of access, efficiency, flexibility and the storage device to be used. These four methods are:
Serial
Sequential
Index sequential
Random
Serial method:
Explanation: -
It is a method in which records are stored in the order in which they are received or entered into the
system without any particular arrangement or sorting.
The records are laid down continuously one after another with no particular sequence.
The records are stored in the order they are created or received, with no specific sorting criteria
Note:
The records are stored in the order in which they come into the file, and there is no relationship between different records.
During access, searching is done starting from the first record towards the last record.
This method is suitable where all the records in the file are to be accessed.
An example of this method is records stored on magnetic tape.
File head → Record → Record → Record → Record → Record → Record → File tail
Application Area
Archiving documents where order of creation is important (e.g., invoices, receipts).
Simple systems with low retrieval needs.
Advantages
Easy to implement and maintain.
No need for complex indexing.
Simple to implement
Suitable for batch processing
Disadvantages
Inefficient for searching or retrieving specific records.
No logical grouping or categorization
Searching may take longer if the file is large
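Serial storage and access can be sketched with a Python list: records are appended in arrival order and a search reads them one by one from the first. The receipts and the serial_search helper are hypothetical; the second value returned counts how many records were read.

```python
# Hypothetical receipts, written to the file in the order they arrive (no sorting).
serial_file = []
for receipt in [("R17", 500), ("R03", 1200), ("R25", 300), ("R11", 750)]:
    serial_file.append(receipt)


def serial_search(file: list, receipt_no: str):
    """Read records from the first towards the last; return (record, records read)."""
    for records_read, record in enumerate(file, start=1):
        if record[0] == receipt_no:
            return record, records_read
    return None, len(file)


print(serial_search(serial_file, "R25"))   # (('R25', 300), 3)
print(serial_search(serial_file, "R99"))   # (None, 4): every record had to be read
```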
Sequential file Organization
Explanation: -
It is a method in which records are stored in a sorted order based on a key field, and the records are
accessed one after another in sequence.
The records are arranged in specified order using their key field either in ascending order or in descending
order.
The records are stored in a sorted order based on a key field either in ascending order or in descending
order.
Note:
The records are stored in a sorted order based on a key field (e.g. Admission No.).
Access is linear: to find a record, you must scan from the beginning until you reach the desired record.
In this organization, records are laid down one after the other, so consecutive records are related through the key field.
The method is suitable for files where several records are accessed at a particular time.
File head → Record (Adm. 1) → Record (Adm. 2) → Record (Adm. 3) → Record (Adm. 4) → Record (Adm. 5) → Record (Adm. 6) → File tail
Note: The admission number is the key field.
A key is a subset of the fields in a record used to uniquely identify the record.
Application Area
Batch processing (e.g., payroll, billing).
Systems where files are processed in a fixed order (e.g., alphabetical, chronological).
Advantages
Simple to implement.
Efficient for batch operations.
It is simple to understand.
It is easy to reorganize files and records.
Loading the file requires only the key records.
Sorted storage makes it easier to access records.
The binary search technique can be used to reduce record search time.
Disadvantages
Slow for random access or updates.
Inserting or deleting records requires rewriting the entire file.
Data access is slow because sorting does not remove the need to access other records sequentially.
The method is unsuitable for modern applications that require fast access to stored records.
Transactions must be sorted on the same key field as the file before they can be processed.
Random enquiries cannot be handled easily since the records are stored one after the other.
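The binary search advantage listed above applies when the sorted records can be read by position (for example on disk or in memory, not on tape). This sketch uses Python's standard bisect module on hypothetical admission numbers; each comparison halves the range still to be searched.

```python
from bisect import bisect_left

# Hypothetical sequential file: records sorted on the key field (admission number).
sequential_file = [(1001, "Achieng"), (1004, "Kamau"), (1009, "Njeri"),
                   (1015, "Otieno"), (1022, "Wanjiru")]
keys = [adm_no for adm_no, _ in sequential_file]   # key values, already in order


def find(adm_no: int):
    """Binary search: compare with the middle key and discard half the range each time."""
    i = bisect_left(keys, adm_no)     # position where adm_no is, or would be inserted
    if i < len(keys) and keys[i] == adm_no:
        return sequential_file[i]
    return None


print(find(1015))   # (1015, 'Otieno')
print(find(1010))   # None
```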
Index Sequential File organization
Explanation: -
It is a method in which records are stored sequentially according to a key field while an index is used to locate records quickly.
The records are arranged sequentially, but they are grouped together using indexes.
The records are stored sequentially, but an index (like a table of contents) allows direct access to specific
records.
File head
Technical (index): Record 1 Computer, Record 2 Agriculture, Record 3 Business
Humanities (index): Record 1 Geography, Record 2 History, Record 3 CRE
File tail
Note:
This method combines sequential storage with an index for faster access.
Records are not as easy to access as in random organization, since both the indexes and the key fields must be read.
Technical and Humanities are the indexes, while admission numbers are the record keys.
Application Area
Databases or systems requiring both sequential and random access (e.g., library catalogs, customer
records).
Systems where files are frequently updated or searched.
Advantages
Faster retrieval than pure sequential methods.
Supports both sequential and random access.
Disadvantages
More complex to implement and maintain.
Requires additional storage for the index.
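One common way to build the index is to record the lowest key stored in each block of the sequential file; the sketch assumes this key-range index rather than the subject grouping used in the example above. The blocks, names and indexed_lookup helper are hypothetical.

```python
from bisect import bisect_right

# Hypothetical sequential file, sorted by admission number and split into blocks.
blocks = [
    [(1001, "Achieng"), (1004, "Kamau"), (1009, "Njeri")],
    [(1015, "Otieno"), (1018, "Mwangi"), (1021, "Atieno")],
    [(1030, "Kiprop"), (1034, "Wanjiru")],
]
# Index: the lowest key in each block, like a table of contents.
index = [block[0][0] for block in blocks]        # [1001, 1015, 1030]


def indexed_lookup(adm_no):
    """Consult the index to pick one block, then scan only that block sequentially."""
    block_no = bisect_right(index, adm_no) - 1
    if block_no < 0:
        return None                               # smaller than every stored key
    for key, name in blocks[block_no]:
        if key == adm_no:
            return key, name
    return None


print(indexed_lookup(1018))                       # (1018, 'Mwangi')
# Sequential access is still possible: read the blocks in order.
all_records = [record for block in blocks for record in block]
print(len(all_records))                           # 8
```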
Random File Organization
Explanation: -
It is a method in which records are stored in any location and can be accessed directly using a key or address
without searching through other records.
Note:
It supports random storage of records and allows forward and backward search in the file, thus providing direct access to the records of interest.
Advantages
The records are quickly accessed due to forward and backward search capabilities
Facilitates quick retrieval of records
Makes it easy to update the transactions record.
Extremely fast for retrieval and updates.
No need for sequential scanning.
Disadvantages
Data may be accidentally erased or overwritten unless special precautions are taken.
Complex to design and manage.
Risk of collisions (two keys pointing to the same location).
Requires advanced hashing or addressing techniques.
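Random (direct) organization is often implemented with hashing: a formula turns the key into a storage address. The sketch assumes the simple key % TABLE_SIZE formula and resolves collisions by linear probing (trying the next slot); both are illustrative choices, and the tiny table size is chosen only so that a collision is easy to see. Deletion is not handled, since emptying a slot would break the probing chain.

```python
TABLE_SIZE = 7                  # number of storage locations (tiny, to show a collision)
table = [None] * TABLE_SIZE     # each slot holds one (key, data) record, or None if empty


def address(key: int) -> int:
    """Hashing: calculate a storage location directly from the key."""
    return key % TABLE_SIZE


def store(key: int, data: str) -> int:
    """Write a record at its calculated address; on a collision, try the next slot."""
    slot = address(key)
    for _ in range(TABLE_SIZE):
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, data)
            return slot
        slot = (slot + 1) % TABLE_SIZE          # linear probing
    raise RuntimeError("no free location left")


def fetch(key: int):
    """Go straight to the calculated address instead of scanning the whole file."""
    slot = address(key)
    for _ in range(TABLE_SIZE):
        if table[slot] is None:
            return None                         # empty slot reached: key not stored
        if table[slot][0] == key:
            return table[slot][1]
        slot = (slot + 1) % TABLE_SIZE
    return None


print(store(1001, "Achieng"))   # 0  (1001 % 7 == 0)
print(store(1008, "Kamau"))     # 1  (1008 % 7 == 0 too: collision, next slot used)
print(fetch(1008))              # Kamau
```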
Factors To Consider When Choosing a File Organization Method
Frequency of update – How often the file needs to be updated determines the file organization method selected.
File access method – The method of accessing records in a file and transferring their contents to the main memory of a computer should match the chosen file organization method.
Nature of the system – A suitable file organization method is adopted depending on how the system runs. For instance, a system that runs periodically can use serial or sequential file organization, while in a system where transactions are processed as they occur and the master file is updated immediately, random file organization is the most appropriate.
Storage media – The storage media influences the file organization method. For instance, magnetic tape only allows serial and sequential organization, while a magnetic disk can also be used for index sequential or random organization.
Electronic Data Processing
This refers to computerized means of processing data under the influence/control of the operating system.
Modes (Types) of Electronic Data Processing
They include:-
a) Online processing
b) Real time processing
c) Distributed processing
d) Time sharing
e) Batch processing
f) Multiprocessing
g) Multitasking
h) Interactive processing
Online Processing
It is defined as: -
Refers to a processing mode where data is processed as soon as it is received.
Refers to a processing system that responds immediately whenever a change is made.
It is a method that utilizes Internet connections and equipment directly attached to a computer, allowing data to be processed as soon as it is received without necessarily giving feedback to the source.
Application Area
It is used mainly for: -
Information recording and research.
Booking a seat on an airline
Features:
Requires a continuous link between user and system.
Data is captured and updated directly into files/databases.
Real Time Processing
It is defined as: -
It is a technique that has the ability to respond almost immediately to various signals in order to acquire and process information, update the relevant file and give immediate feedback.
Refers to a case where the computer processes incoming data as soon as it occurs, updates the transaction file and gives an immediate response.
Application Area
In nuclear power stations, where a certain temperature level must be maintained. Computers are programmed to respond instantly to a slight change in temperature.
Stock trading.
Robotics.
Humidifiers and dehumidifiers in a computer lab: when humidity is high, the computer turns on the dehumidifiers, and vice versa.
Note: It is critical in environments where decisions must be made immediately.
Features:
Very low latency (minimal delay between input and output).
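The humidity example above can be sketched as a function that reacts to each sensor reading as it arrives. The 40% to 60% band is a hypothetical setting, and a genuine real-time system would also have to guarantee a maximum response time, which this sketch does not attempt.

```python
LOW, HIGH = 40, 60       # hypothetical acceptable relative humidity range, in %


def respond(humidity: float) -> str:
    """Decide on an action immediately for a single sensor reading."""
    if humidity > HIGH:
        return "dehumidifier ON"
    if humidity < LOW:
        return "humidifier ON"
    return "both OFF"


# Each reading is acted on as soon as it arrives, before the next one is read.
for reading in [55, 68, 62, 35, 50]:
    print(reading, "->", respond(reading))
```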
Distributed data processing
It is defined as: -
Refer to a processing system where a processing task is carried out by remote workstations connected to
one big central workstation or server.
Refers to a processing system that divides processing tasks between two or more computers that are located on physically separate sites but connected through a network.
Note:
The remote computer may be connected to a central computer that receives input from the remote
computers (terminals), processes the data and updates the master file.
If required, the output can be communicated back to the remote terminals
Application area
ATMs are good examples of this data processing method.
Used in large-scale systems such as Cloud computing services and Blockchain
Features:
Workload is spread out to improve efficiency.
Scalability and fault tolerance.
Systems may be geographically spread but interconnected.
Time sharing
It is defined as: -
It is a case where a central computer's resources are shared among multiple users apparently simultaneously, with each user assigned a small, rapid time slice.
Refers to a multi-user, operating-system-controlled environment that allows different users to access a central processing unit apparently at the same time through terminals connected to it.
Refers to many terminals connected to a central computer and given access to the central processing unit apparently at the same time, where each user is allocated a time slice of the CPU in sequence.
Application area
University mainframe computers shared by students
Note:
Each user is allocated a time slice of the CPU in sequence
The amount of time allocated to each user is controlled by a multi-user operating system.
If a user’s task is not completed during the allocated time slice, he/she is allocated another time slice later
in a round robin manner.
Features
Many users appear to be using the system simultaneously.
Efficient resource use in multi-user environments
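The round robin allocation described in the note can be simulated in a few lines. The users, their CPU requirements and the time slice are in hypothetical units; the function returns the order in which tasks finish.

```python
from collections import deque


def round_robin(jobs: dict, time_slice: int) -> list:
    """Simulate time sharing: each user's task receives one time slice of CPU in turn.

    `jobs` maps each user to the CPU time their task still needs (hypothetical units).
    Returns the order in which the tasks finish.
    """
    if time_slice <= 0:
        raise ValueError("time_slice must be positive")
    queue = deque(jobs.items())          # users waiting for the CPU, in order
    finished = []
    while queue:
        user, remaining = queue.popleft()
        remaining -= min(time_slice, remaining)   # run for one slice, or less if nearly done
        if remaining == 0:
            finished.append(user)
        else:
            queue.append((user, remaining))       # unfinished: rejoin the end of the queue
    return finished


print(round_robin({"Amina": 5, "Brian": 2, "Chebet": 4}, time_slice=2))
# ['Brian', 'Chebet', 'Amina']
```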
Batch Processing
It is defined as: -
It is a method where large volumes of data are collected over a period of time and processed together as
a single unit.
Refers to a method whereby data is collected, grouped, and processed together at a scheduled time.
Refers to a method whereby the data to be processed is organized into groups (batches) and allowed to accumulate over a specified period of time; each batch is then processed at once.
Application area
Payroll – In a payroll processing system, employees’ details concerning number of hours worked, rate of
pay, and other details are collected for a period of time, say one month. These details are then used to
process the payment for the duration worked.
Processing bank cheques
Printing of bank statements
Updating of a stock database
Features
No immediate feedback.
Suitable for large volumes of routine tasks.
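The payroll example can be sketched as a batch job: time sheets accumulate over the month and are processed together in one run. Employee codes, hours and rates are invented for illustration.

```python
# Hypothetical time sheets collected over one month: (employee, hours worked).
time_sheets = [("E01", 40), ("E02", 35), ("E01", 42), ("E02", 38), ("E01", 44)]
hourly_rate = {"E01": 250, "E02": 300}          # hypothetical pay rates


def run_payroll(sheets: list, rates: dict) -> dict:
    """Process the whole batch at once at the end of the period."""
    hours = {}
    for employee, worked in sheets:               # accumulate hours per employee
        hours[employee] = hours.get(employee, 0) + worked
    return {emp: total * rates[emp] for emp, total in hours.items()}


print(run_payroll(time_sheets, hourly_rate))    # {'E01': 31500, 'E02': 21900}
```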
Multiprocessing
It is defined as: -
Refers to the processing of more than one task at the same time in a single computer.
Refers to the use of two or more central processors (CPUs) within a single computer system to execute
instructions simultaneously.
Application area
Modern servers and supercomputers.
High-performance gaming PCs.
Features
Increases speed and reliability.
Can perform multiple instructions at once.
Note:
This is possible in computers like mainframes and network servers.
A computer may contain more than one independent central processing unit which works together in a
coordinated way.
At a given time, the processors may execute instructions from two or more programs or from different
parts of one program simultaneously.
Multi-programming
Refers to a type of processing where more than one program is processed apparently at the same time by a single central processing unit.
Note:
Multi-programming is also referred to as multitasking.
Unlike multiprocessing, in multitasking the computer has only one CPU.
The computer allocates each program a time slice and decides the order in which the programs will be executed.
Interactive Processing
Refers to a method of processing data by prompting the user to provide inputs as data or instructions
Note:
There is a continuous dialogue between the user and the computer.
As the program executes, it keeps on prompting the user to provide input or respond to prompts displayed on the screen.
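The continuous dialogue of interactive processing can be sketched as a prompt-and-respond loop. The running-total task is a hypothetical example; ask and tell default to Python's input and print, and are parameters only so that replies can be scripted, as in the last two lines.

```python
def interactive_session(ask=input, tell=print):
    """Prompt for amounts, respond to each reply, and stop when the user types q."""
    total = 0
    while True:
        reply = ask("Enter an amount (or q to quit): ").strip()
        if reply.lower() == "q":
            tell(f"Final total: {total}")
            return total
        if not reply.isdecimal():
            tell("Please type a whole number.")   # respond, then prompt again
            continue
        total += int(reply)
        tell(f"Running total: {total}")


# Scripted replies stand in for a user at the keyboard.
scripted = iter(["20", "x", "30", "q"])
interactive_session(ask=lambda prompt: next(scripted))
```

Calling interactive_session() with no arguments reads from the keyboard instead.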
Advantages of Electronic Processing
Efficient processing, especially where all required data is available.
Convenient access to and availability of data and information in digital form.
The distance between entities processing data becomes insignificant.
Support for information sharing and collaboration on a wider scale.
Disadvantages of Electronic Information Processing
Security of data can be compromised during storage or while in transit on networks if appropriate
measures are not taken
Lack of legal frameworks in many countries that should support electronic processing activities
Lack of ICT skills among many knowledge workers to support electronic data processing.