Understanding Computer Files and Data Structures
A computer file is a specific piece of data that is held on a computer system. This data may be anything
from an executable program to a user-created document. A computer file has a name that gives the file its
identity and an extension that tells the operating system and associated programs what type of file it is. Files
have a set beginning and end, which means they interact with a computer system in a predictable way.
Computer files are the most basic unit of data that users can store on a disk. Every program, image, video, song
and document is stored as a file.
Record
A record is a value that contains other values, typically in a fixed number and sequence and typically indexed by
names. The elements of records are usually called fields or members.
A record is a collection of related data items or fields. Each record normally corresponds to a specific unit of
information. For example, an employee record may hold the employee number, the employee’s name, the basic salary
and the house rent allowance.
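As an illustration only, the employee record described above can be sketched as a Python dataclass (Python is used here for brevity). The class name EmployeeRecord, the field names and the sample values are hypothetical. The point is that a record holds a fixed number of named fields in a fixed order.

```python
from dataclasses import dataclass, fields


@dataclass
class EmployeeRecord:
    """Hypothetical employee record: a fixed set of named fields in a fixed order."""
    employee_number: str         # used as the primary key
    name: str
    basic_salary: float
    house_rent_allowance: float


# Sample (made-up) record
rec = EmployeeRecord("001", "Ada Obi", 120000.0, 30000.0)

# The field names come back in the order in which they were declared.
print([f.name for f in fields(rec)])
print(rec.name, rec.basic_salary + rec.house_rent_allowance)
```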
Field
A field is a single data item, and many fields make up a record. Each field has a name, and one key field, called the
primary key, is used to identify the record.
Data items are physically arranged as fields in a computer file. Their length may be fixed or variable. Since all
employees have 3-digit employee numbers, a 3-digit field is required to store this data; hence, it is a fixed
field. In contrast, since customers’ names vary considerably in length from one customer to another, a variable-length
field is needed to store this element. This is called a variable field.
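The difference between a fixed field and a variable field can be sketched in Python. This is a minimal illustration under an assumed layout, not a standard file format: the employee number always occupies exactly 3 characters, while the name is preceded by a 2-digit length so that a reader knows where the variable field ends. The function names encode_record and decode_record are hypothetical.

```python
def encode_record(emp_no: int, name: str) -> bytes:
    """Pack one record: a fixed 3-digit field followed by a variable-length name field.

    The variable field is preceded by a 2-digit length prefix (an assumed layout)
    so that a reader knows where it ends.
    """
    if not 0 <= emp_no <= 999:
        raise ValueError("employee number must fit in 3 digits")
    name_bytes = name.encode("utf-8")
    if len(name_bytes) > 99:
        raise ValueError("name too long for a 2-digit length prefix")
    fixed_field = f"{emp_no:03d}".encode("ascii")          # always exactly 3 bytes
    length_prefix = f"{len(name_bytes):02d}".encode("ascii")
    return fixed_field + length_prefix + name_bytes


def decode_record(data: bytes):
    """Unpack a record produced by encode_record; returns (emp_no, name)."""
    emp_no = int(data[0:3])                    # fixed field: always bytes 0-2
    length = int(data[3:5])                    # length of the variable field
    name = data[5:5 + length].decode("utf-8")
    return emp_no, name


packed = encode_record(7, "Chinedu")
print(packed)                  # the 3-digit field is zero-padded
print(decode_record(packed))
```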
Data item
A data item is the smallest unit of information stored in a computer file. It is a single element used to represent a fact,
such as an employee’s name, an item price, etc.
A data file is a collection of records holding the same type of information but about different objects or
individuals.
Types of Data Items
Numeric
This type of data item consists of the digits 0–9
Alphabet
This type of data item consists of the letters A–Z
Alphanumeric
Alphanumeric characters (sometimes shortened to alphanumerics) are a combination of alphabetic and numeric
characters; the term describes the collection of Latin letters and Arabic digits, or a text constructed from this collection.
There are either 36 (single case) or 62 (case-sensitive) alphanumeric characters. The alphanumeric character set
consists of the digits 0–9 and the letters A–Z (plus a–z when case-sensitive).
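The three types of data item can be checked with a small Python sketch. The function name classify is hypothetical. ASCII-only regular expressions are used on purpose, because Python's built-in str.isalnum() also accepts non-Latin letters and digits. The two constants confirm the 36 and 62 character counts given above.

```python
import re
import string

ALNUM_SINGLE_CASE = string.digits + string.ascii_uppercase      # 0-9 and A-Z
ALNUM_CASE_SENSITIVE = string.digits + string.ascii_letters     # 0-9, a-z and A-Z


def classify(item: str) -> str:
    """Classify a data item as numeric, alphabetic or alphanumeric (ASCII only)."""
    if re.fullmatch(r"[0-9]+", item):
        return "numeric"
    if re.fullmatch(r"[A-Za-z]+", item):
        return "alphabetic"
    if re.fullmatch(r"[0-9A-Za-z]+", item):
        return "alphanumeric"
    return "other"   # e.g. contains spaces or punctuation


print(len(ALNUM_SINGLE_CASE), len(ALNUM_CASE_SENSITIVE))
for item in ["0042", "Lagos", "CSC201", "A-1"]:
    print(item, classify(item))
```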
File Structure
Data
Field
Record
File
Represented above is the structure of a file in systematic order from top to bottom.
Data: A data item is the smallest unit of information stored in a computer file.
Field: A field is a collection of related data items.
Record: A record is a collection of related fields.
File: The collection of records is called a file.
Sequential
A sequential file is one in which the records are stored in sorted order on one or more key fields.
(A file whose records are simply kept in the order in which they were added is, strictly speaking, a serial file.)
The order of the records is fixed. The records are stored and sorted in physical, contiguous blocks, and within each
block the records are in sequence. Records in these files can only be read or written sequentially.
Examples of sequential files
1. Invoices for customers sorted on customer number
2. Class registers sorted on last name
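A sequential file such as the invoice example can be sketched in Python as a text file written in key order and read back from first record to last. This is a minimal sketch: the CSV layout, the file name invoices_seq.csv and the sample invoices are assumptions made for illustration.

```python
import csv
import os
import tempfile


def write_sequential(path, records, key_index=0):
    """Write records sorted on one key field (sequential file organization)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for record in sorted(records, key=lambda r: r[key_index]):
            writer.writerow(record)


def read_sequential(path):
    """Records can only be read in order, from the first to the last."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield row


# Hypothetical invoices: (customer number, amount), entered in no particular order
invoices = [("C103", "2500"), ("C101", "1200"), ("C102", "800")]
path = os.path.join(tempfile.gettempdir(), "invoices_seq.csv")
write_sequential(path, invoices)
for row in read_sequential(path):
    print(row)
```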
Indexed
An indexed file is used to speed up key searches in a file. You can think of the index as a one-column table organized
in ascending order and stored on disk. The primary key in the table is used as an index to the record.
An indexed file organization contains reference numbers, like employee numbers, that identify a record in relation
to other records. These references, called primary keys, are unique to a particular record. Alternative
keys can also be defined to allow alternative methods of accessing the record. For example, instead of accessing an
employee’s record using the employee number, you can use an alternative key that references employees by
department. This gives users greater flexibility to search randomly through thousands of records in
a file. However, it requires more complex programming to implement.
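The following Python sketch illustrates an indexed file under simple assumptions. Records are written one per line. The primary index is a list of (employee number, byte offset) pairs kept in ascending key order and searched with a binary search. An alternate index maps each department to its employee numbers. The file name, function names and sample employees are hypothetical.

```python
import bisect
import os
import tempfile

# Hypothetical sample data: (employee number, name, department)
EMPLOYEES = [("E003", "Bola", "Sales"), ("E001", "Emeka", "Audit"), ("E002", "Sade", "Sales")]


def build_indexed_file(path, employees):
    """Write one record per line.

    Returns a primary index (a list of (key, byte offset) kept in ascending key order)
    and an alternate index (department -> list of primary keys).
    """
    primary, alternate = [], {}
    with open(path, "wb") as f:
        for emp_no, name, dept in employees:
            offset = f.tell()                                  # where this record starts
            f.write(f"{emp_no},{name},{dept}\n".encode("utf-8"))
            primary.append((emp_no, offset))
            alternate.setdefault(dept, []).append(emp_no)
    primary.sort()                                             # keep the index in ascending order
    return primary, alternate


def read_by_key(path, primary, emp_no):
    """Binary-search the index, then jump straight to the record on disk."""
    keys = [k for k, _ in primary]
    i = bisect.bisect_left(keys, emp_no)
    if i == len(keys) or keys[i] != emp_no:
        raise KeyError(emp_no)
    with open(path, "rb") as f:
        f.seek(primary[i][1])
        return f.readline().decode("utf-8").rstrip("\n").split(",")


path = os.path.join(tempfile.gettempdir(), "employees_indexed.txt")
primary, alternate = build_indexed_file(path, EMPLOYEES)
print(read_by_key(path, primary, "E002"))
# Alternate key: every employee in Sales, each fetched through the primary index.
print([read_by_key(path, primary, k)[1] for k in alternate["Sales"]])
```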
Random file
This is a file organized via an index. Also called a “direct file” or “direct access file”, it enables quick access to
specific records or other elements within the file rather than having to read the file sequentially. The index points
to a specific location within the file, and the file is read from that point.
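One simple way to make the location of every record computable is to give all records the same length. Record n then starts at n × record size, and the program can jump straight to it. The Python sketch below assumes a 20-byte record length and uses a hypothetical file name and sample names. It shows the idea of direct access rather than any particular file system's method.

```python
import os
import tempfile

RECORD_SIZE = 20  # assumed fixed length of every record, in bytes


def write_records(path, names):
    """Write fixed-length records so that record n always starts at n * RECORD_SIZE."""
    with open(path, "wb") as f:
        for name in names:
            data = name.encode("ascii")[:RECORD_SIZE]
            f.write(data.ljust(RECORD_SIZE, b" "))   # pad each record to the same size


def read_record(path, n):
    """Read record n (counting from 0) directly, without reading the records before it."""
    with open(path, "rb") as f:
        f.seek(n * RECORD_SIZE)
        return f.read(RECORD_SIZE).decode("ascii").rstrip()


path = os.path.join(tempfile.gettempdir(), "direct_access.dat")
write_records(path, ["Ada", "Bayo", "Chika"])
print(read_record(path, 2))
```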
Methods of accessing files
Serial files
Taking magnetic tape as an instance, the only way to access a serially organized file is serially.
Sequential files.
The method of access used is still SERIAL, but the records are now in sequence, and for this reason the term
SEQUENTIAL is often used to describe serial access of a sequential tape file. It is important to note that to
process (e.g. update) a sequential master tape file, the transaction file must also be in the sequence of the master
file. Access is achieved by first reading the transaction file and then reading the master file until the matching
record (using the record keys) is found. Note, therefore, that if the record required is the twentieth record on the file,
then in order to get it into storage to process it, the computer will first have to read in all nineteen preceding records.
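The tape update just described can be simulated in memory. This is a minimal Python sketch that assumes both "files" are lists already sorted on the key, with hypothetical (key, balance) master records and (key, amount) transactions. For each transaction it reads forward through the master file until the matching key is found. Along the way it copies the master records it passes to a new master file, as a tape update does.

```python
def update_master(master, transactions):
    """Sequential update of a master file by a transaction file.

    master:       list of (key, balance) records, sorted on key
    transactions: list of (key, amount) records, sorted on the same key
    Returns the new master file as a list of (key, balance) records.
    """
    new_master = []
    m = 0            # position of the next master record to be read
    current = None   # master record currently being updated
    for key, amount in transactions:
        # Read forward through the master file until the matching key is reached,
        # copying finished master records to the new master file.
        while current is None or current[0] < key:
            if current is not None:
                new_master.append(current)
            if m == len(master):
                raise KeyError(f"no master record for key {key}")
            current = master[m]
            m += 1
        # A missing key, or an out-of-sequence transaction, also ends up here,
        # which is why the transaction file must be in master-file sequence.
        if current[0] != key:
            raise KeyError(f"no master record for key {key}")
        current = (key, current[1] + amount)   # apply the transaction
    if current is not None:
        new_master.append(current)
    new_master.extend(master[m:])   # copy any remaining master records unchanged
    return new_master


master = [("001", 100), ("002", 50), ("003", 0)]
transactions = [("002", 20), ("002", 5), ("003", 10)]
print(update_master(master, transactions))
```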
Random files
Like index-sequential files, random files can only be used on direct access media, such as disks. Random file
organisation is the most efficient way of storing extremely large files, such as national databases.
Random files use a very clever method of providing direct access to records. Each record has its own
specific position or address on the disk. The records are not sorted in any way. The position allocated to each
record is calculated by using a special formula.
Generally speaking, the method of accessing random files is RANDOM. The transaction record keys are put
through the same mathematical formula as the keys of the master records, thus creating the appropriate
bucket address. The transactions, in random order, are then processed against the master file, with the bucket
address providing the address of the record required.
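The notes do not name the "special formula". One common choice, assumed here, is division-remainder hashing: the bucket address is the key modulo the number of buckets. In the Python sketch below, NUM_BUCKETS, the function names and the sample keys are hypothetical. It also shows that two different keys can produce the same bucket address, which is why a bucket may hold more than one record.

```python
NUM_BUCKETS = 7  # hypothetical number of buckets (disk areas) in the file

# Each bucket can hold several records whose keys hash to the same address.
buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_address(key: int) -> int:
    """Assumed 'special formula': division-remainder hashing."""
    return key % NUM_BUCKETS


def store(key: int, record: str) -> None:
    buckets[bucket_address(key)].append((key, record))


def fetch(key: int) -> str:
    """Recompute the address from the key and search only that bucket."""
    for k, record in buckets[bucket_address(key)]:
        if k == key:
            return record
    raise KeyError(key)


store(1234, "Ada")
store(1241, "Bayo")    # 1241 % 7 == 2, the same bucket as 1234
store(1000, "Chika")
print(bucket_address(1234), bucket_address(1241), bucket_address(1000))
print(fetch(1241))
```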
Computer file Classification
Master file
These are files of a fairly permanent nature, e.g. customer ledger, payroll, inventory, etc. A key feature is the
regular updating of these files to show the current position. For example, a customer’s order will be processed,
increasing the “balance owing” figure on the customer’s ledger record. Master records therefore
contain both data of a static nature, e.g. a customer’s name and address, and data that, by its nature, will change each
time a transaction occurs, e.g. the “balance” figure already mentioned.
Transaction file
This is also known as a movement file. It is made up of the various transactions created from the source documents.
In a sales ledger application, the file will contain all the orders received at a particular time. This file is used to
update the master file, and as soon as it has been used it is no longer required. It therefore has a very short life,
because it will be replaced by a file containing the next batch of orders.
Reference files
A file with a reasonable amount of permanency. Examples of data used for reference purposes are price lists,
tables of rates of pay, names and addresses.
Criteria for classifying computer files
Criteria for classifying computer files are:
- By nature of content: it refers to the nature of the file’s content.
- By organization method: it refers to how the records are organized, e.g. sequential, random, etc.
- By storage medium: it refers to the storage device on which a file is stored, such as magnetic or
optical disk, magnetic tape, etc.
Handling Computer files
Basic Operations of Computer Files
Create: Creating a file with a given name
Retrieve: Retrieving a stored or lost file
Copy: Copying a created file to either an external or built-in storage device.
View: Viewing a created file or granting privilege of viewing
Open: Opening a file to use its contents.
Update: Reading or updating the contents
Close: Closing the file, thereby losing access until it is opened again.
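The basic operations listed above map closely onto Python's built-in open() function and the standard shutil module. The sketch below is an illustration under assumptions: the file names notes.txt and notes_copy.txt are hypothetical, and a fresh temporary folder is used so the example can run anywhere.

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()                     # fresh folder so the example can run anywhere
path = os.path.join(folder, "notes.txt")        # hypothetical file name

# Create: mode "x" creates a new file and fails if one with that name already exists.
with open(path, "x") as f:
    f.write("first line\n")
# Close: leaving the "with" block closes the file automatically.

# Open and update: mode "a" appends to the existing contents.
with open(path, "a") as f:
    f.write("second line\n")

# Open and view: mode "r" (the default) reads the contents.
with open(path) as f:
    contents = f.read()
print(contents)

# Copy: duplicate the file under a new name (it could equally go to another drive).
copy_path = shutil.copy(path, os.path.join(folder, "notes_copy.txt"))
print(os.path.exists(copy_path))
```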
Example
The following opens a file using mode OUTPUT and file number 1, and then saves the text Hello World! to the file:
10 CLS
20 OPEN "testfile.dat" FOR OUTPUT AS #1
30 PRINT #1, "Hello World!"
40 CLOSE #1
50 END
Steps involved in accessing Sequential File in BASIC
The OPEN statement is also used for reading information from a file. In general, the OPEN statement follows this
pattern:
OPEN file$ FOR mode AS #filenumber
Here file$ determines the filename to use. The FOR portion indicates how the file will be accessed or operated on;
the mode may be APPEND, BINARY, INPUT, OUTPUT or RANDOM. The number after AS # identifies the file
handle in question.
Example
To open a file for reading, call OPEN and pass INPUT as the file mode. Then you can read the data by using the
INPUT # statement.
10 CLS
20 OPEN "[Link]" FOR INPUT AS #1
30 INPUT #1, text$
40 CLOSE #1
50 PRINT text$
60 END
CODE:
The combination of all these records forms a file. Thus, a file is a group of related records. To facilitate the retrieval
of specific records from a file, at least one field in each record is chosen as the record key. Usually, the key is unique
to every record to avoid duplication of records in a file. For example, fig. 7.2 shows that the matriculation number is
a good field for the record key. The key is also used for searching and sorting records in a file.
The data file [Link] should contain the following data to work properly with the example above.
Example
10 CLS
20 OPEN "[Link]" FOR OUTPUT AS #1
30 PRINT #1, "Matriculation Number Maths English Lang. Total Score"
40 PRINT #1, "0001 50 90 140"
50 PRINT #1, "0002 70 40 110"
60 PRINT #1, "0003 80 60 140"
70 CLOSE #1
80 OPEN "[Link]" FOR INPUT AS #1
90 DO WHILE NOT EOF(1)
100 LINE INPUT #1, text$
110 PRINT text$
120 LOOP
130 CLOSE #1
140 END
Overwriting is the process of writing a new set of binary data over data already held in memory or storage.
Overwriting generally occurs when unused file system clusters are written with new data, replacing the data they
previously held.
Backups have two distinct purposes. The primary purpose is to recover data after data loss, whether through data
deletion or data corruption. The secondary purpose of backups is to recover data from an earlier period of time,
within the constraints of a user-defined data retention policy.
TYPES OF FILE BACKUP
a. Full Backup: Full backup is a method of backup where all the files and folders selected for the
backup will be backed up.
b. Incremental backup: Incremental backup is a backup of all changes made since the last backup. With
incremental backups, one full backup is done first and subsequent backup runs are just the changes made
since the last backup.
c. Differential backup: Differential backup is a backup of all changes made since the last full backup. With
differential backups, one full backup is done first and subsequent backup runs are the changes made since
the last full backup.
d. Mirror Backup: As the name suggests, a mirror backup is a mirror of the source being backed up. With
mirror backups, when a file in the source is deleted, that file is eventually also deleted in the mirror backup.
Because of this, mirror backups should be used with caution, as a file that is deleted by accident or by
a virus may also end up deleted from the mirror backup.
e. Full PC Backup or Full Computer Backup: In this backup, it is not the individual files that are backed
up but an entire image of the computer’s hard drives. With a full PC backup, you can
restore the computer’s hard drives to their exact state when the backup was done. Not
only can work documents, pictures, videos and audio files be restored, but the operating system, hardware
drivers, system files, registry, programs, emails, etc. can also be restored.
f. Local Backup: Local backups are any kind of backup where the storage medium is kept close at hand or in
the same building as the source. It could be a backup done on a second internal hard drive, an attached
external hard drive, CD/DVD-ROM or Network Attached Storage (NAS). Local backups protect digital
content from hard drive failures and virus attacks. They also provide protection from accidental mistakes and
deletions.
g. Online Backup: These are backups that are ongoing or done continuously or frequently to a storage
medium that is always connected to the source being backed up. Typically the storage medium is located
offsite and connected to the backup source by a network or Internet connection. It does not involve human
intervention to plug in drives and storage media for backups to run.
h. Cloud Backup: This term is often used interchangeably with Online Backup and Remote Backup. It is
where data is backed up to a service or storage facility connected over the Internet. With the proper login
credentials, that backup can then be accessed or restored from any other computer with Internet access.
i. FTP Backup: This is a kind of backup where the backup is done via FTP (File Transfer Protocol) over the
Internet to an FTP server. Typically the FTP server is located in a commercial data centre away from the
source data being backed up. When the FTP server is located at a different location, this is another form of
offsite backup.
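To make the difference between full, incremental and differential backups concrete, here is a minimal Python sketch. It only decides which files a backup run would copy and does not copy anything. It assumes that each file's last-modified time is known and that the times of the last full backup and of the last backup of any kind are recorded. The function name and the sample times are hypothetical.

```python
def files_to_back_up(mtimes, last_full, last_backup, kind):
    """Return the names of the files a backup run would copy.

    mtimes:      dict of file name -> last-modified time (any comparable number)
    last_full:   time of the last full backup
    last_backup: time of the last backup of any kind
    """
    if kind == "full":
        return sorted(mtimes)                # everything selected for backup
    if kind == "differential":
        since = last_full                    # all changes since the last FULL backup
    elif kind == "incremental":
        since = last_backup                  # only changes since the LAST backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, t in mtimes.items() if t > since)


mtimes = {"a.doc": 5, "b.xls": 15, "c.jpg": 25}
last_full, last_backup = 10, 20   # full backup at time 10, incremental backup at time 20
for kind in ("full", "differential", "incremental"):
    print(kind, files_to_back_up(mtimes, last_full, last_backup, kind))
```

The differential run picks up b.xls again because it changed after the last full backup, while the incremental run skips it because the incremental backup at time 20 already saved it.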
Antivirus
An anti-virus program protects computer files from malicious virus attacks, and detects and heals files that have
been attacked. Usually it consists of a firewall, a virus scanner and remover, and sometimes other tools as well.
Password
A password is a user-chosen secret string of characters that allows access to a computer, interface, file, etc. The use
of a password is at the user’s discretion, and the user must take care to always remember the password.
What is Word Processing?
A word processor is an electronic device or computer software application that, as directed by the user,
performs word processing: the composition, editing, formatting and sometimes printing of any sort of written
material.
Word processing is the use of computer software to create, edit, view, store, retrieve and print text documents. A
text document is a written communication like letters, reports, memos, and so on. The software that is used for
Word processing is called a Word Processor.
Examples of Word Processors
Microsoft Word
WordStar
WordPerfect
Corel WordPerfect
MultiMate Advantage
Professional Write etc
Word processors are used in place of typewriters because of the quality of their output and the ability to produce
copies without having to retype or photocopy, etc.
Microsoft Word
A word processor can be defined as application software that helps in the production of documents. Microsoft
Word is a word processing program. You can use it to type letters, create and edit reports, and produce other documents.
It is a commonly used word processor today because of its special features. There are different versions of
Microsoft Office Word; examples include:
MS Office 2000
MS Office 2003
MS Office 2007
Copying a document
Copying a document or a portion of a document means duplicating it. The original document remains in place,
while the duplicate is placed in a new location. There are five major methods of copying a document:
Shortcut method
Keyboard method
Drag and drop method
Ribbon bar method
Right mouse method
Shortcut method
Highlight the portion of document to be copied
Right click on the highlighted text
Select Copy
Position the insertion point in a new location
Right click in an empty space
Select Paste
Keyboard method
Highlight the document to be copied
Hold down the Ctrl key as you drag the highlight to a new location
Release the mouse button
Ribbon bar method
Highlight the portion of document to be copied
Click copy on the Home Ribbon
Position the insertion point in a new location
Click on paste from the Home Ribbon
Note
To cut a document means to move the document from its original location to a different location. The document or
data ceases to appear in its former location. The steps for cutting are the same as those for copying; the only
difference is that you select Cut instead of Copy before you paste.
Font
Font Face: This is the typeface in which text appears. Microsoft Word includes font faces such as Arial, Times
New Roman, Tahoma, Elephant, Freestyle Script, Imprint MT Shadow, etc.
To set a font face for your text, do the following:
Type the text
Highlight the text.
From the Home ribbon click on the Font box (Ctrl + Shift + F)
Click the drop down arrow and select a font of your choice
Font size: This sets the size of the text. Microsoft Word lists font sizes ranging from 8 to 72.
To select a font size for your text, do the following:
Type the text
Highlight the text
From the Home ribbon click on the Font Size box (Ctrl + Shift + P)
Click the drop down arrow and select a font size of your choice
Font Style: This applies effects to text such as regular, bold, italic and bold italic.
Bold
To select a bold font style for your text, do the following:
Type the text
Highlight the text
From the Home ribbon click on the B icon
Italic
To select an italic font style for your text, do the following:
Type the text
Highlight the text
From the Home ribbon click on the I icon
Underline
To underline your text, do the following:
Type the text
Highlight the text
From the Home ribbon click on the U icon
To select a different underline style for your text, do the following:
From the Home ribbon click on the drop-down arrow beside the U icon.
Click on the desired underline style
Font colour: This applies colour to text, such as red, green, blue, etc.
Click on the color of your choice
Font Effects: This applies other effects to text, such as strikethrough, subscript, superscript, etc.
Strikethrough
To apply a strikethrough effect to your text, do the following:
Type the text
Highlight the text
From the Home ribbon click on the ……icon
Double strikethrough
To apply a double strikethrough effect to your text, do the following:
Type the text
Highlight the text
From the Home ribbon click on the …….. icon beside font.
Subscript
To apply subscript formatting to your text (for example, to lower the characters 10 below the line), do the following:
Type the text
Highlight the text
From the Home ribbon click on the x₂ icon
Change Case
To change the case of text, do the following:
Type the text
Highlight the text
From the Home ribbon click on the drop-down arrow of the Aa icon
Select the format of case you desire for your text.
Character Spacing: This sets different characteristics of the spacing applied to text. They include
Expanded or Condensed spacing, Kerning, etc.
Paragraph
Indent and Spacing: This feature controls the spacing before or after text. Its settings include alignment,
indentation, spacing and tabs.
Alignment
To set the alignment of a document, do the following:
Right-click the white space on the document and select Paragraph.
Click the Indents and Spacing tab. Under General, select your desired choice in the Alignment
drop-down menu.
Click OK to apply to the document.
Indentation
Type the text
Highlight the text
Right click on the text and select paragraph
Click the Indents and Spacing tab. In the Indentation options, select your desired choice as shown in the
dialog box.
Click OK to apply to the document.
Introduction
The systems approach can be applied to the solution of many types of problems. When this involves the
development of information system solutions to business problems, it is called information systems development
or application development. Most computer-based information systems are conceived, designed, and implemented
using some form of systematic development process. In this process, end users and information specialists design
information systems based on an analysis of the information requirements of an organization. Thus, a major part
of this process is known as systems analysis and design.
Understand what the business or organization requires.
Definition of terms
Systems analysis is the process of understanding in detail what a system should accomplish, how it will
accomplish it and what is required to accomplish it.
System design is the process of specifying in detail how the components of an information system should be
implemented physically.
A systems analyst is a person who uses analysis and design techniques to solve business problems using
information technology.
Skills of a Systems analyst
To be a good and successful systems analyst, the person must have the following skills:
Information technology knowledge and programming expertise
Understanding of business problems
Use of logical methods for solving problems
Ability to find facts about problems and work out how they should be solved
A constant drive to improve the system
People management knowledge and skills
Example: Let us apply this process to solving a simple interest (SI) problem:
A man invested the sum of N500,000.00 for 5 years at an interest rate of 12% per annum. Calculate the
amount at the end of the period.
Solution:
Step 1: Study and understand the problem. The simple interest problem is understood as defined.
Step 2: Verify that the benefits of solving the problem outweigh the cost. The SI can be solved with our
current knowledge and the resources we have, such as four-figure tables, calculators, etc. We do not
need to hire anybody.
Step 3: Define the requirements for solving the problem: The requirements for solving the SI problem are
Principal (N500,000.00), Rate (12%) and Time (5 years).
Step 4: Develop a set of possible or alternative solutions: The problem can be solved in two ways. First,
calculate the SI using the formula I = P*R*T/100, and then calculate the amount as A = Principal + Interest.
Second, calculate the amount directly using the formula A = P(1 + (R*T/100)). Note that * means
multiplication.
Step 5: Decide which solution is best and recommended. We decide to use the first method because it is
simpler than the second.
Step 6: Define the details of the chosen solution: The variables (facts) that we need to solve this problem
are as stated in Step 3 above, and the procedure is the first method stated in Step 4 above.
Step 7: Implement the solution: The problem above is solved as follows:
I. I = P*R*T/100
= N500,000*12*5/100 = N300,000.00
II. Amount = P+I = N500,000 + N300,000 = N800,000
Step 8: Monitor to ensure that the desired result is accomplished: Make sure that the formula, procedure
and calculations are always correct.
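The calculation in Steps 4 and 7 can be checked with a short Python sketch. The function names are illustrative. Both methods from Step 4 are implemented so that their results can be compared, which is one simple way of carrying out the monitoring in Step 8.

```python
def simple_interest(principal, rate, years):
    """I = P*R*T/100, with R given as a percentage per annum."""
    return principal * rate * years / 100


def amount_method_1(principal, rate, years):
    # First method: compute the interest, then A = P + I
    return principal + simple_interest(principal, rate, years)


def amount_method_2(principal, rate, years):
    # Second method: A = P(1 + R*T/100) in a single formula
    return principal * (1 + rate * years / 100)


P, R, T = 500_000, 12, 5
print(simple_interest(P, R, T))
print(amount_method_1(P, R, T))
print(amount_method_2(P, R, T))
```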
CONCEPT OF SYSTEMS
What is a System?
A System is a set of detailed methods, procedures and routines created to carry out a specific activity, perform
a duty, or solve a problem.
A System is an organized, purposeful structure that consists of interrelated and interdependent elements
(components, entities, factors, members, parts etc.). These elements continually influence one another (directly or
indirectly) to maintain their activity and the existence of the system, in order to achieve the goal of the system.
A system is a collection of interrelated components that function together to form a whole and achieve an
outcome.
Examples:
The human body is a system made up of many organs, such as hands, legs, eyes and nose (and so on), that are
interrelated and together form a human being.
A computer system is mainly made up of the hardware, the software, inputs and outputs. Inputs involve
materials or information entering the system, which are processed within the system (by the components). The
output from a system is made up of the items or pieces of information that leave the system.
The hardware components of a computer are made up of input devices, output devices, the primary and
secondary memories, the processor and so on.
The software of a computer system is made up of system software and application
software.
System software is made up of the operating system (such as Windows), drivers, compilers, interpreters and
so on.
Application software includes MS Word, MS Excel, CorelDRAW, payroll and human resources software,
Enterprise Resource Planning (ERP) systems and so on.
What is Subsystem?
A subsystem is a part of a larger system that can function on its own to perform a task. It can be a system having
subsystems of its own, or it may just be a single system. The components that make up the overall computer system
can be described as subsystems. These are mainly hardware and software subsystems. The process of dividing a system
into subsystems and components is called Functional Decomposition.
Some new types of information systems that cannot be classified as above are listed below:
1. Data warehouses
2. Enterprise resource planning
3. Enterprise systems
4. Expert systems
5. Geographic information system
6. Global information system
7. Office Automation
SYSTEMS DEVELOPMENT LIFE CYCLE (SDLC)
Systems development is a planned undertaking with a fixed beginning and end that produces the desired result or
product. It may be a large job that involves many people working for a long period or it can also be a small
assignment that one person can finish in a day. The SDLC provides an overall formalized method for managing the
systems development processes and activities. It represents a detailed and specific set of procedures, steps, and
documents that are required for the development of an information system. It describes the stages involved in an
information system development.
The SDLC assumes that the development of information systems should follow a structured and methodical approach,
requiring each stage of the life cycle, from inception of the idea to delivery of the final system, to be carried out in
a rigid, sequential order.
Note that the terms Systems Development Life Cycle and Systems Development Cycle mean the same thing and can be
used interchangeably.
Definition of SDLC: The systems development life cycle is the process of understanding how an information
system (IS) can support the business needs of an organization, designing the system, building it and delivering it to
the users.
Objectives of SDLC
The objectives of SDLC are:
1. To ensure that high quality systems are delivered.
2. To provide strong controls over the system development, and
3. To maximize the productivity of the systems staff.
Investigation stage
System design stage
operate. It includes installation and deployment. This is the stage where the software is put into use and runs the
actual business.
ADVANTAGES OF SDLC
The advantages of SDLC are:
1. Simple and easy to use
2. Easy to manage due to the rigidity of the model.
3. Phases are processed and completed one at a time. It works well for developing small information systems
where requirements are very well understood.
4. Provides guidelines for systems development, as all the stages and activities are clearly outlined.
5. Promotes consistency among system development projects.
6. Reduces cost of managing different systems at different stages.
7. Helps in efficient allocation of resources to systems development projects.
DISADVANTAGES OF SDLC
The disadvantages of SDLC are:
1. Adjusting scope during the life cycle can kill a project
2. No working software is produced until late during the life cycle
3. High amounts of risk and uncertainty
4. Poor model for complex and object–oriented projects
5. Poor model for long and ongoing projects
6. Poor model where requirements are at a moderate to high risk of changing
7. If followed slavishly, it can result in the generation of unnecessary documents.
8. It takes time to go through the whole long development cycle.
Program Development
Definition of a Program
A computer program can be defined as a list of instructions issued to the computer to perform a particular task.
Programs are written in computer programming languages.
Characteristics of a Good Program
Accuracy
Every good program must be error-free.
Readability
The program must be easy for any programmer to read and understand.
Maintainability
A carefully written program should be easy to amend and maintain if need be.
Efficiency
One of the characteristics of a good program is the ability to solve a particular problem efficiently, without wasting
processing time or memory.
Generality
A good program should be able to solve all similar problems.
Clarity
Every good and tested program must be clear, straightforward and easy to understand.
Reliability
The program should be dependable at all times.
Problem Analysis
Planning the Solution
Flowcharting
Desk checking
Program coding
Program compilation
Program testing/debugging
Program documentation
Description of each stage
Problem definition
The programmer is expected to first of all understand the problem and know exactly what the program entails. The
definition of the problem must be unambiguous.
Problem analysis
The programmer is expected to analyze the problem to determine how it will be solved and what inputs and
outputs are required.
Planning the Solution
Before a program is written, the algorithm or flowchart for that program must be drawn and tested; testing it by
hand before the actual coding is called dry running the program. A flowchart is a diagrammatic
representation of the steps involved in a given program.
Program coding
This is the actual writing or coding of the program in a particular programming language, e.g. BASIC, Visual Basic,
FORTRAN, Pascal, COBOL, etc.
Program compilation
When the coding process is complete, the program is compiled, if necessary. Compilation is necessary when the
program is written in a compiled language.
Program testing
This is similar to proofreading. The written program is tested and its errors corrected to check whether the program
is able to solve the problem it is expected to solve.
Program documentation
This involves writing a detailed description about the program and some specific facts pertaining to the usage and
maintenance of the program.
Program running
This is the actual running or execution of the program with the compiler or interpreter so as to check if the desired
output is generated.
Maintenance
It is the process of updating or amending a previously written program for current use.
Interpreted and Compiled program
Compiler characteristics;
1. Spends a lot of time analyzing and processing the program
2. The resulting executable is some form of machine – specific binary code.
3. The computer hardware interprets (executes) the resulting program directly
4. Execution is fast
Interpreter characteristics:
1. Relatively little time is spent analyzing and processing the program
2. The resulting code is some sort of intermediate code
3. The resulting code is interpreted by another program
4. Program execution is relatively slow
Examples of compiled language are:
1. C
2. C++
3. COBOL
4. FORTRAN