Database Basics: Data vs. Information
Database Basics: Data vs. Information
A database is a collection of logically related data, and a description of this data is designed to
meet the information needs of an organization. A database is a single, possibly large repository
of data that is defined once and then can be accessed simultaneously by many various users. A
database can consist of one or more tables of information that are related in some way. Such a
multi-table database is called a relational database which is a collection of normalized relations
with distinct relation names.
1.1.1 Data versus Information
The terms data and information are closely related, and in fact, are often used interchangeably.
However, it is useful to distinguish between data and information
Data: Stored representations of objects and events that have meaning and importance in the
user's environment.
Information: is processed data that has meaning and increases the knowledge of the person who
uses the data. For example, consider the following list of facts.
For example, consider the following list of facts.
Alemu Bekele 32456
Seid Kemal 45627
Belay Teklu 67892
Abebech Tolosa 34256
The above facts satisfy the definition of data. It is simply the presentation of names with
numbers and is usually considered as useless as it does not give meaning to what the entire mean.
there are different ways to convert the above data into information. In this section, we see the
two basic Methods. The first is to put the data into a table. So, the above data can be represented
as follow
Table 1: Information regarding daily patient treatment
Daily Treatment
Physician Name: Hailu Tefera
Status: Pediatrics Date: March 2022
Name Patient ID Sex Age
Alemu Bekele 32456 M 3
Said Kemal 45627 M 2
Belay Teklu 67892 M 4
Abebech Tolosa 34256 F 1
The second way to convert data into information is to summarize them or otherwise process and
present them in pictorial forms. For example, figure 1-1 shows the distribution of outbreaks in
the four major regions of Ethiopia.
Figure 1
Outbreak
Distribution in Ethiopia
The above diagram shows the outbreak distribution presented in a graphical way. This
information could be used as a basis for intervention Planning.
Metadata: are data that describe the properties or characteristics of end-user data and the context
of that data. It is called data about data. Some of the properties that are typically described
include data names, definitions, length (or size), and allowable values. Metadata describing data
context includes the source of the data, where the data are stored, ownership (or stewardship),
and usage.
1.1.2 File System vs. Database Approach
File systems and Database Management systems are the two ways that could be used to manage,
store, retrieve and manipulate data. A File System is a collection of raw data files stored in the
hard drive whereas a database system is a bundle of applications that are dedicated to managing
data stored in databases. It is the integrated system used for managing digital databases, which
allows the storage of database content, creation/ maintenance of data, search, and other
functionalities. Both systems can be used to allow the user to work with data in a similar way. A
File System is one of the earliest ways of managing data. But due to the shortcomings present in
using a File System to store electronic data, Database Systems came into use sometime later, as
they provide mechanisms to solve those problems.
[Link] File System
The file system has a number of characteristics that differ from the database management
system. In the file system approach, each user defines and implements the needed files for a
specific application to run. For example, in the hospital system, one user will be maintaining the
details of how many patients are there in the hospital along with their histories, these details will
be stored and maintained in a separate file.
Another user will be maintaining the payments the patients made during treatment, the detailed
payment transactions report will be stored and maintained in a separate file. Although both of the
users are interested in the data of the patient, they will be having their details in separate files and
they need different programs to manipulate their files. This will lead to wastage of space and
redundancy or replication of data, which may lead to confusion, sharing of data among various
users is not possible, data inconsistency may occur. These files will not be having any inter-
relationship among the data stored in these files.
Database Management System Approach
A database Management System (DBMS) is a collection of programs that enables users to create
and maintain a database. The DBMS is hence a general-purpose software system that facilitates
the process of defining, constructing, and manipulating the database for various applications.
Defining a database involves specifying the data types, structures, and constraints for the data to
be stored in the database. Constructing the database is the process of storing the data itself on
some storage media that is controlled by the DBMS. Manipulating a database includes such
functions as querying the database to retrieve specific data, updating the database to reflect
changes in the mini world, and generating reports from the data.
1.1.3 History of Database Application
A. Early Database Applications Using Hierarchical and Network Systems
Most of these database systems were implemented in the [Link] of the main problems
with early database systems was the intermixing of conceptual relationships with the physical
storage and placement of records on a disk. The main types of early systems were based on three
main paradigms: hierarchical systems, network model-based systems, and inverted file systems
B. Relational Databases
Early experimental relational systems developed in the late [Link] databases were
originally proposed to separate the physical storage of data from its conceptual representation
and to provide a mathematical foundation for data representation and querying. The relational
data model also introduced high-level query languages that provided an alternative to
programming language interfaces, making it much faster to write new queries.
C. Object-Oriented Applications
The emergence of object-oriented programming languages in the 1980s and the need to store and
share complex, structured objects led to the development of object-oriented databases (OODBs).
Initially, OODBs were considered a competitor to relational databases, since they provided more
general data structures. They are now mainly used in specialized applications, such as
engineering design, multimedia publishing, and manufacturing systems.
D. Emerging of XML (Extensible Markup Language)
The World Wide Web provides a large network of interconnected computers. Users can create
documents using a Web publishing language, such as HyperText Markup Language (HTML),
and store these documents on Web servers where other users (clients) can access them. In the
1990s, electronic commerce (e-commerce) emerged as a major application on the Web.
E. Extending Database Capabilities for New Applications
The success of database systems in traditional applications encouraged developers of other types
of applications to attempt to use them. Such applications traditionally used their own specialized
file and data structures. Database systems now offer extensions to better support the specialized
requirements for some of these applications.
The following are some examples of these applications:
Scientific applications
Storage and retrieval of images, including scanned news or personal photo-graphs,
satellite photographic images, and images from medical procedures such as x-rays and
MRIs
Storage and retrieval of videos
Data mining applications that analyze large amounts of data
Spatial applications that store spatial locations of data like the distribution of disease
Time series applications that store information such as health data at regular points in
time, such as morbidity and mortality trend
1.1.4 Characteristics of the Database Approach
In the database approach, a single repository of data is maintained that is defined once and then
is accessed by various users. The main characteristics of the database approach are the following.
A. Self-Describing Nature of the Database System.
A fundamental characteristic of the database approach is that the database system contains not
only the database itself but also a complete definition or description of the database structure and
constraints. This definition is stored in the system catalog, which contains information such as
the structure of each file; the type and storage format of each data item, and various constraints
on the data. This information stored in the catalog is called meta-data, and it describes the
structure of the primary database.
B. Insulation between Programs, Data, and Data Abstraction
In traditional file processing, the structure of data files is embedded in the access programs, so
any changes to the structure of a file may require changing all programs that access this files. By
contrast, DBMS access programs do not require such changes in most case. The structure of data
files is stored in the DBMs catalog separately from the access programs. We call this property
program data independence.
The characteristic that allows program-data independence and program-operation independence
is called data abstraction.
C. Support of multiple views of the data
A database typically has many users, each of whom may require a different perspective or view
of the database. A view may be a subset of the database or it may contain virtual data that is
derived from the data base files but is not explicitly stored. Some user may not need to be aware
of whether the data they refer to is stored or derived. A multiple DBMS whose users have a
variety of application must provide facilities for defining multiple views.
D. Sharing of data and multiuser transaction processing
A multiuser DBMS, as its name implies, must allow multiple users to access the database at the
same time. This is essential if data for multiple applications is to be integrated and maintained in
a single database. The DBMS must include concurrency control software to ensure that several
users trying to update the same data do so in a controlled manner so that the result of the updates
is correct. For example, in a hospital when several matrons (A woman in charge of nursing in a
medical institution) try to assign a bed to a patient, the DBMS should ensure that each bed can be
accessed by only one matron at a time for assignment to a patient. A fundamental role of
multiuser DBMS software is to ensure that concurrent transaction operate correctly.
1.1.5 Basics of Database Architecture
The database architecture is the set of specifications, rules, and processes that dictate how data is
stored in a database and how data is accessed by components of a system. It includes data types,
relationships, and naming conventions. The database architecture describes the organization of
all database objects and how they work together. It affects integrity, reliability, scalability, and
performance. The database architecture involves anything that defines the nature of the data, the
structure of the data, or how the data flows.
The Three-level of Architecture
The goal of the three-schema architecture is to separate the user applications and the physical
database.
In any data model it is important to distingue between the description of the database and the
database itself. The description of the database is called the database schema, which is specified
during database design. A displayed schema is called a schema diagram.
Table 2: Schema diagram for a database
To keep the data consistent from one record to the next, assign the appropriate data type to each
[Link] this context Data characteristics is expressed in terms of Data types, field name, field
size and field format. most data type in database falls under this category.
CHAR (size): a variable length string (can contain letters, numbers, and special
characters). The size parameter specifies the column length in characters and it can be
from 0 to 255. The default is 1
VARCHAR (size): a variable length string (can contain letters, numbers, and special
characters). The size parameter specifies the maximum column length in characters - can
be from 0 to 65535
BINARY (size): equal to char (), but stores binary byte strings. the size parameter
specifies the column length in bytes. default is 1
VARBINARY (size) equal to varchar (), but stores binary byte strings. the size
parameter specifies the maximum column length in bytes.
BLOB (size): for BLOBs (Binary Large Objects). Holds up to 65,535 bytes of data
In addition to the above listed data type the following are commonly used Numeric Data Types
BOOL Zero is considered as false, nonzero values are considered as true.
BOOLEAN: is Equal to BOOL
INT (size): A medium integer. Signed range is from -2147483648 to 2147483647.
Unsigned range is from 0 to 4294967295.
INTEGER (size): Equal to INT (size)
DOUBLE (size, d): A normal-size floating point number. The total number of digits is
specified in size.
DECIMAL (size, d): An exact fixed-point number. The total number of digits is specified
in size. The number of digits after the decimal point is specified in the d parameter.
DATE: A date Format put as YYYY-MM-DD.
DATETIME: A date and time combination. Format: YYYY-MM-DD hh:mm:ss.
1.4 Designing a simple entity relationship diagram
1.4.1 Relational Model Entity Relationship (ER) Modeling
In this module we are going to take a patient management system to demonstrate all the contents
found here after. We concentrate only on the development process of the database application.
For illustration purpose only some of the major entities in the hospital system are discussed. We
list the data requirements of the database here, and we create its conceptual schema step-by-step
as we introduce the modeling concepts of the ER model. The database, called Hospital keeps
track of the information of the hospital departments, doctors, beds and patients. The hospital
provides inpatient and outpatient service for its clients.
The hospital is organized into departments. Each department has a unique name, a unique
number, and a particular employee who manages the department. The hospital provides
IPD (Inpatient Department) and OPD (Outpatient Department) services.
We store each doctor’s name, id number, office phone, salary, sex and specialization. A
doctor is assigned to one department and takes care for all patients and maintains and
arranges the clinical information.
A patient being treated in the hospital will have MRN (Medical Record Number) during
registration and his/her name, address, date of birth and sex will be stored in the database.
A patient assigned for a bed will be identified and the room number, the type of the room,
the price and the date of admission and discharge will be stored in the database.
The entity-relationship model (or ER model) is a way of graphically representing the logical
relationships of entities (or objects) in order to create a database. In ER modeling, the structure
for a database is portrayed as a diagram, called an entity-relationship diagram (or ER diagram),
that resembles the graphical breakdown of a sentence into its grammatical parts.
1.4.2 Identifying Entities
The basic object that the ER model represents is called an entity, which is a "thing" in the real
world with an independent existence. An entity may be an object with a physical existence a
particular person, car, house, or employee, doctor, patient or it may be an object with a
conceptual existence company, job, or hospital. More concisely an entity is defined as any
object in the system that we want to model and store information about. Entities are represented
by rectangles (either with round or square corners) as
Figure 2. Entity
notations
1.4.3 Attributes and Domain
An Attribute is a property that describes an entity. In the above example, the patient is the entity
and MRN, name, address, and sex etc are attributes of patient. Attributes are the descriptive
properties which are owned by each entity of an Entity Set.
Simple Attributes: Simple attributes are those attributes which cannot be divided further.
Figure 3 Simple Attribute
Here, all the attributes are simple attributes as they cannot be divided further.
Composite Attributes: Composite attributes are those attributes which are composed of many
other simple attributes.
1.4.4 Relationships
A relationship relates two or more distinct entities with a specific meaning. For example,
DOCTOR Assefa Belay treats the patient Meaza Birru. The relationship between the two entities
is “treats”.
an easy way to decide whether an object should be an entity or a relationship is to map nouns in
the requirements to entities, and to map the verbs to relations. For example, in the statement,
“DOCTOR Assefa Belay treats the patient Meaza Birru” we can identify the entities “Doctor”
and “patient,” and the relationship “treats”.
An entity-relationship diagram is used to represent the relationship among the entities.
An entity-relationship (ER) diagram is a specialized graphic that illustrates the relationships
between entities in a database. ER diagrams often use symbols to represent three different types
of information. As we mentioned before in Chen’s ER-modeling rectangles are commonly used
to represent entities. Diamonds are normally used to represent relationships and ovals are used to
represent attributes.
Figure 9: A relationship between Doctor and Patient entities using diamonds.
A number of data modeling techniques are being used today. One of the most common is the
entity relationship diagram (ERD). Several ERD notations are available to show the cardinality
between the entities. In this module we will be using Crow’s Foot Notation.
Components used in the creation of an ERD
Entity - A person, place or thing about which we want to collect and store multiple instances of
data. It has a name, which is a noun, and attributes which describe the data we are interested in
storing.
Relationship -Illustrates an association between two entities. It has a name which is a verb. It
also has cardinality and modality.
Cardinality- Defines the number of occurrences of one entity for a single occurrence of the
related entity. E.g., a doctor can treat one or more patients per day.
C. Many-to-many relationships- occurs when each record in Table A may have many
linked records in Table B and vice-versa. You create such a relationship by defining a
third table, called a junction table, whose primary key consists of the foreign keys from
both table A and table B.
Figure 12: A many to many relationships between Patient and Bed
In a many-to-many relationship a given doctor can treat many patients and the same patient can
be treated by different doctors, so it is a many to many (m: n) relationship.
[Link] Relational keys
There are many types of keys that can be defined in the relational model. These have significant
importance in maintaining data consistency and correctness in the database.
A. Supper Key
A super key is a set of one or more attributes (columns), which can uniquely identify a row in a
table.
B. Candidate Key
A candidate key is one or more attribute that uniquely identifies an entity. Every entity in
relational database must have at least one candidate key but it is possible that some may have
two or more. For example, MRN and name of the patient may identify the patient. Therefore,
MRN and name can be considered candidate keys for a patient table.
C. Primary Key
A primary key is an attribute, or set of attributes, that allows each information for an entity to be
uniquely identified. Every entity in a relational database must have a primary key. For example,
a patient entity has attributes such as MRN, name, address, date of birth and sex, and then MRN
can be used as a primary key. As mentioned above MRN and name are candidate keys for the
patient table but name cannot uniquely identify all the patients, whereas MRN is unique for all
patients. Therefore, MRN is a primary key for the patient table.
D. Foreign Key
Entities are related to each other through foreign keys. A foreign key references a particular
attribute of an entity containing the corresponding primary key. For example, a patient entity
with MRN as its primary key for a patient and doctor entity with doctor Id as its primary key for
doctor information can be related to each other through MRN. Therefore, MRN will be a foreign
key for doctor entity whereas the MRN will be a primary key for the Patient entity.
[Link] Degree of relationships
Degree of relationship simply refers to the concept of, how many numbers of entities associated
with the relationship. In DBMS, a degree of relationship represents the number of entity types
that associate in a relationship. Now, based on the number of linked entity types, we have 4 types
of degrees of relationships.
Unary
Binary
Ternary
N-ary
Unary: In this type of relationship, both the associating entity type are the same. So, we can say
that unary relationships exist when both entity types are the same and we call them the degree of
relationship is 1. Or in other words, in a relation only one entity set is participating then such
type of relationship is known as a unary relationship.
Binary: In a Binary relationship, there are two types of entity associates. So, we can say that a
Binary relationship exists when there are two types of entity and we call them a degree of
relationship is 2. Or in other words, in a relation when two entity sets are participating then such
type of relationship is known as a binary relationship. This is the most used relationship and one
can easily be converted into a relational table.
Ternary: In the Ternary relationship, there are three types of entity associates. So, we can say
that a Ternary relationship exists when there are three types of entity and we call them a degree
of relationship is 3. Since the number of entities increases due to this, it becomes very complex
to turn E-R into a relational table.
N-ary: In the N-ary relationship, there are n types of entity that associates. So, we can say that
an N-ary relationship exists when there are n types of entities. There is one limitation of the N-
ary relationship, as there are many entities so it is very hard to convert into an entity, rational
table. So, this is very uncommon, unlike binary which is very much popular.
Deletion anomalies
Update anomalies
Insertion anomalies
Then data insertion, deletion and update will be propagated through the rest of the database via
the defined relationships consistency.
What is an "inconsistent dependency"? While it is appropriate for a user to look in the Patient
table for the address of a particular patient, it may not make sense to look there for the
specialization of the Doctor who treated that patient. The Doctors specialization is related to, or
dependent on, the doctor and thus should be moved to the Doctor table. Inconsistent
dependencies can make data difficult to access because the path to find the data may be missing
or broken.
There are a few rules for database normalization. Each rule is called a "normal form". Commonly
there are Six types of Normalization as listed below
For this level it is enough to discuss on the first three normal form and the step necessary to each
normal form will be discussed in advance as follow.
Each column is unique in first normal form so in this normal form you should identify repeating
attribute and remove them. A relation will be 1NF if it contains an atomic value. First normal
form disallows the multi-valued attribute, composite attribute, and their combinations. Table in
1NF should be
It should only have single(atomic) valued attributes/columns.
Values stored in a column should be of the same domain
All the columns in a table should have unique names.
the order in which data is stored, does not matter.
Example: employee table at Hospital, it shows employees are working with multiple
departments.
Table 5 Employee at Hospital works in different department
Employee Age Department
Aster Abebe 32 MCH, Emergency
Almaz Tesema 45 OPD
Zebiba Ali 36 Delivery
Shimelis Chala 24 IPD
In first normal form duplication Should be removed then Employee table following first normal
will be like this.
Table 6 Employee in first normal form
After the table is normalized following second normal form it will be like this
Table 8 A drug in second normal form
DrugId Drug
001 Amoxicillin
002 Paracetamol
003 Ceftriaxone
004 Hydralazine
BrandId Brand
1 Trimox
2 Panadol
3 Rocephin
4 Apresoline
Table 8 C drug in second normal form
Drug_Brand_Id DrugID BrandId
01 001 1
02 002 2
C. 03 003 3 Third Normal
04 004 4 Form(3NF)
The entity should be considered already in 2NF, and no column entry should be dependent on
any other entry (value) other than the key for the table. If such an entity exists, move it outside
into a new table. 3NF is achieved, considered as the database is normalized. 3NF is used to
reduce the data duplication. It is also used to achieve the data integrity. If there is no transitive
dependency for non-prime attributes, then the relation must be in third normal form.
Super key in the table above should be identified first. A super key is a set of one or more
attributes (columns), which can uniquely identify a row in a table.
{EMP_ID}
{EMP_ID, EMP_NAME}
{EMP_ID, EMP_NAME, CITY_CODE, EMP_ADDRESS, EMP_NAME, EMP_ADDRESS,
EMP_CITY}
Candidate key: {EMP_ID}
Non-Prime Attribute
An attribute that is not part of any candidate key is known as non-prime attribute. Non-prime
attributes: In the given table above, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super
key (EMP_ID). It violates the rule of third normal form. So, we need to move the EMP_CITY
and EMP_STATE to the new CITY_CODE table, with CITY_CODE as a Primary key.
Table 10 An Employee table in third normal form
Name………………………………………ID…………………Date……………….
Directions: Read the question carefully then choose the best answer among the given alternative
Test I Choose the best answer
1. Assume you are the database end-user and your main job function revolves around constantly
querying and updating the database, using standard types of queries based on this you are
considered as
A. Causal user B. sophisticated user C. Naïve User D. Standalone User
2. The main difference between file system and database approach is
A. In database approach these files will not be having any inter-relationship
B. In database approach simultaneous access of file is possible
C. In file system approach simultaneous access of file is possible
D. Searching is easy in file system than database approach
3. Relational database property ensures the data operation will complete either with success or
with failure.
A. Isolation B. Atomicity C. Durability D. consistency
4. Which relational database software use SQL as query language
A. MySQL B. Microsoft SQL Server C. Oracle D. PostgreSQL
5. in the entity-relationship diagram (ERD) relationship can be represented by
A. diamond B. rectangle C. circle D. Triangle
6. assume you normalize a database and remove any non-key attributes that only depend on part
of the table key to a new table when to do this?
A. 3rd normal form B. 1st normal form C. 2nd normal form D. 4th normal form
So the query goes like all patients who came from Addis Ababa. The resulting output will be like
this as displayed in the following table
Table 15: patient information who came from Addis Ababa
FirstName MiddleName LastName Sex BirthDate city Woreda Kebele
7. Click the first option of the right side seen on the above screen. This screen will open.
8. Click OK and the following screen pops up.
9. Click Next to get the following screen. Make sure to check the product key selection and
click Next.
10. Select the checkbox to accept the license option and click Next.
11. Select SQL Server feature installation option and click Next.
12. Feature selection, Select the components you want to install on your machine.
Database Engine Services: Allow you to install SQL Server instance.
Analysis Services: install an Analysis Services instance on standalone or on
cluster node.
Reporting Services: Allow you to install the server as report server.
SQL Server Data Tool: Allow you to install SQL Server Developer tool to work
with integration packages.
Integration Services: Allow you to install Integration Services.
Management Tool: Allow you to install SQL Server management configuration
tool including command line and power shell tool.
If you selected “All Features with Default” in previous step, all these components will be
checked automatically.
13. after selecting the features to install, setup again runs a check to ensure whether your
machine’s configuration is compatible. If all looks good, click Next.
14. Instance Configuration: this step will ask, what type of instance you want to configure, as
we all know, either we can install default or Named instance.
If default instance is already installed, you’ll have to have proceeded with named
instance.
15. Disk Space requirement at this step, you’ll get disk space summary which will show how
much disk space your instance will take on the machine.
16. Server Configuration On this step you’ll find options to specify Service Accounts and
Collation Configuration.
17. Select service account names and start-up types for the above listed services and click
Collation.
18. Make sure the correct collation selection is checked and click Next.
19. Make sure authentication mode selection and administrators are checked and click Data
Directories.
20. Make sure to select the above directory locations and click Next. The following screen
appears.
21. Click Next on the above screen.
22. Click Next on the above screen to the get the following screen.
23. Make sure to check the above selection correctly and click Install.
LG- 60 LAP TEST-1 Performance Test
Name………………………………………………….ID…………………Date
Time started: ________________________ Time finished: ________________
Instructions: Given the necessary templates, tools, and materials you are required to perform the
following tasks with in 3hr. The project is expected of each student to do it.
Task-1 Perform SQL Server 2012 Installation
LG-60 Operation Sheet 2
Name…………………………………………….ID………………………Date…………….
Time started: ________________________ Time finished: ________________
Instructions: Given necessary templates, tools and materials you are required to perform the
following tasks within 3 hours. The project is expected from each student to do it.
Task-1 Create Database Health_Science_College
Task-2 based on the information provided below create table called DEPARTMENT
Column Data Type Size Constraint
Did CHAR 4 PK, NOT NULL
Dname VARCHAR 30 UNIQUE, NOT NULL
Dlocation VARCHAR 50
Task-3 based on the information provided below create table called DEPARTMENT
Column Data Type Size Constraint
Sid CHAR 10 PK, NOT NULL
Fname VARCHAR 30 NOT NULL
Lname VARCHAR 30 NOT NULL
Sex CHAR 1 DEFAULT ‘F’, must be either ‘F’ or ‘M’
Year_of_Study INTEGER NOT NULL
Dbirth DATE
Age INETEGER Computed or derived from date of birth and current date
Sem_Payment DECIMAL (6, 2)
Paid CHAR 3 Computed or derived from Semester_Payment
Did CHAR 4 FK
Note: The value of ‘Paid’ column is derived from the value of ‘Semester_Payment’ and should be:
‘Yes’, if Semester_Payment = NOT NULL
‘No’, if Semester_Payment = NULL
Task-4 based on the information provided below create table called Course
Sid Fname Lname Sex Yrs Dbirth Age Sem_Payment Paid Did
et001 Andinet Tefera M 1 1998-10-07 2200.00 Yes D03
et002 Sofiya Ahmed F 2 1995-02-02 2580.25 Yes D01
Et003 Asnake Wondim M 3 1995-07-18 2500.00 Yes D02
et004 Genet Lema F 4 1994-02-13 2650.75 Yes D01
et005 Genet Daniel F 4 1995-09-09 2350.25 Yes D03
et006 Solomom Tadesse M 4 1993-12-06 NULL No D02
et007 Adane Eyayu M 4 1994-09-01 2600.75 Yes D02
et008 Kidanu Belete M 1 1997-05-23 NULL No D03
et009 Seble Tefera F 2 1996-04-11 2580.25 Yes D01
et010 Meron Hailu F 4 1994-07-28 2600.75 Yes D02
et011 Jemal Hassen M 1 1995-01-25 2450.00 Yes D02
et012 Semira Kalid F 3 1995-03-16 2500.00 Yes D02
Task-11 based on the information provided below enter data to Course table
Task-12 based on the information provided below enter data to Section table
Task-13 based on the information provided below enter data to PREREQUISITE table
Course_number Prno
CS3380 IT3320
CS3380 MATH2410
IT3142 IT3141
IT3320 CS1310
Task-14 based on the information provided below enter data to GRADE_REPORT table
Stid Section_id Grade Remark
GU5501/10 13 B Very Good
GU5004/08 14 C Good
GU5202/09 16 A Excellent
GU5202/09 11 A Excellent
GU5003/08 14 A Excellent
GU5503/10 12 B Very Good
GU5202/09 15 A Excellent
GU5503/10 17 A Excellent
GU4504/07 15 C Good
GU5503/10 18 C Good
GU5501/10 10 B Very Good
GU4508/07 19 B Very Good
GU4505/07 10 A Excellent
GU4506/07 18 C Good
GU5004/08 19 A Excellent
GU4506/07 14 B Very Good
GU4507/07 12 A Excellent
GU4507/07 14 A Excellent
GU5508/10 10 B Very Good
GU5209/09 11 A Excellent
GU5202/09 13 B Very Good
Task 15 Specify the following queries in SQL.
1. Retrieve all students whose first name starts with ‘S’.
2. Retrieve all records of the first five students.
3. Retrieve grade and student id whose grade is either ‘B’, ‘C’ or ‘D’.
4. Retrieve the names of all senior students whose department is health informatics.
5. Retrieve all students where semester payment is null.
6. Retrieve the names and course number of all courses taught by Mr. Mohammed.
7. Retrieve the course number and name of all courses which have prerequisite course.
8. Retrieve all female students whose department is either computer science or health
informatics.
9. Retrieve the list of students with student id, student name, grade and remark. The list
should be ordered alphabetically by student first name, then last name.
10. Retrieve the course number and name of all courses which have no prerequisite course.
11. For each section taught by Mr. Mohammed, retrieve the course number, semester,
academic year, and number of students who took the section.
12. Retrieve the names and department ID of all students who do not have any grade of ‘A’
in any of their courses.
13. Find the sum of the semester payment, the maximum payment, the minimum payment,
and the average payment of all students.
14. Find the sum of the semester payment, the maximum payment, the minimum payment,
and the average payment of all students whose department is health informatics.
15. Retrieve the number of students who score grades ‘A’ in any course.
16. Count the number of distinct semester payment values in the database.
17. Change the year of study of a student to 3 whose id is ‘GU4508/07’.
18. Delete the record for the student whose first name is ‘Lema’
and whose student id is ’GU5001/10’.
Task 16 Create the following views based on the given specification.
1. Create a view called ‘Student_ Transcript’ that has course name, course number, grade,
semester, academic year, and section identifier.
LG #61 LO #3-Ensure data entry and integrity
Instruction sheet
This learning guide is developed to provide you with the necessary information regarding the
following content coverage and topics:
Designing user friendly data entry form
Checking application of data entry procedure based on institutional guideline/manual.
Checking Data integrity rules for completeness and accuracy
Identifying and implementing Data security, confidentiality and integrity mechanisms
This guide will also assist you to attain the learning outcome stated on the cover page.
Specifically, upon completion of this Learning Guide, you will be able to:
Design user friendly data entry form
Check application of data entry procedure
Check Data integrity rules for completeness and accuracy
Identifying and implementing Data security, confidentiality and integrity mechanisms
Learning Instructions:
1 Read the specific objectives of this Learning Guide.
2 Follow the instructions described below 3 to 6.
3 Read the information written in the information “Sheet 1”
4 Accomplish the “Self-check 1”
5 If you earned a satisfactory evaluation from the “Self-check” proceed to “Operation Sheet
LG-61 Information Sheet 1
When you design a form template that is based on a database, Microsoft Office InfoPath creates
a main data source with groups that contain query fields and data fields and a query data
connection as the main data connection for the form template. These fields and groups
correspond to the way that data is stored in the tables in the database. Query fields contain the
data that is entered by a user to limit the query results to records that match the data in the query
fields. When a form based on this form template uses the main data connection, InfoPath creates
a query by using the data in the query fields. InfoPath then sends the query through the query
data connection. The database returns the results of the query back to the form through the query
data connection. The results of the query are put into data fields, which can be edited through
controls that are bound to these fields. A form can submit data to a database through the form's
main data connection if the form template that the form is based on and the database meet the
following requirements:
The form template is not a browser-compatible form template InfoPath will not
create a submit data connection in the main data connection if you are designing a
browser-compatible form template. To allow users to submit data in a form that is
based on a browser-compatible form template, use a Web service that works with the
database.
The left table in each pair of related tables in the main data source contains a
primary key At least one of the relationships for every pair of related tables must
include a primary key from the left table.
None of the data fields in the main data source of the form store a large binary
data type InfoPath will disable the submit data connection if the query includes
fields that can store a large binary data type, such as pictures, images, OLE objects,
file attachments, the Office Access memo data type, or the SQL Text data type.
1.1.2 Compatibility considerations
When you design a form template that is based on a database, you have the option of designing a
browser-compatible form template. InfoPath will create a query data connection as the main data
connection in a browser-compatible form template. However, browser-compatible form
templates cannot be configured to allow users to submit data to a database. Therefore, if you are
designing a form template based on a SQL Server database and you want your users to submit
their form data to the database through the main data connection, you cannot make that form
template browser-compatible.
[Link] Before you begin
In order to design your form template based on a SQL Server database, you need the following
information from your database administrator:
The name of the server that contains the database that you will connect your form template to.
The name of the database that you will use with this form template.
The authentication required by the database. The database can use either Microsoft
Windows authentication or SQL Server authentication to determine how users can access
the database.
The name of the table that contains the data that you want to send to the form or that will
receive data from the form. This is the primary table. If you are going to use more than
one table in the database, you need the names of those other, child tables. You also need
the names of the fields in the child tables that have relationships with the fields in the
primary table
1. Click OK.
2. In the Data Connection Wizard, click Select Database.
3. In the Select Data Source dialog box, click New Source.
4. In the What kind of data source do you want to connect to list, click Microsoft SQL
Server, and then click Next
5. In the Server name box, type the name of the server with the SQL Server database.
6. Under Log on credentials, do one of the following:
If the database determines who has access based on the credentials used in a
Microsoft Windows network, click Use Windows Authentication.
If the database determines who has access based on a specified user name and
password that you get from the database administrator, click Use the following
User Name and Password, and then type your user name and password in the
User Name and Password boxes.
7. Click Next.
8. In the Select the database that contains the data you want list, click the name of the
database that you want to use, select the Connect to a specific table check box, click
the name of the primary table, and then click Next.
9. On the next page of the wizard, type a name for the file that stores the data connection
information in the File Name box, and then click Finish to save these settings.
Step 2: Add one or more controls to display the query results
1. If the Controls task pane is not visible, click More Controls on the Insert menu, or
press ALT+I, C.
2. Drag a control onto your form template.
3. In the Control Binding dialog box, select the group or field that you want to bind the
control to.
1.1.4 Configure the submit options
If your form template and the tables that you selected in the Data Connection Wizard meet the
requirements in the Overview section, InfoPath configures your form template to submit data
through its main data connection. If you choose to use this submit data connection, InfoPath
configures the form template so that users can submit their form data to the database and adds
both the Submit button to the Standard toolbar and the Submit command to the File menu
when users fill out the form. InfoPath also configures the form template so that, when users
submit their forms, the form remains open and a message is displayed to the user that indicates
whether the form was successfully submitted. You can change the text on the Submit button and
the behavior after the user submits a form.
1. On the Tools menu, click Submit [Link] change the name of the Submit button
that appears on the Standard toolbar and the Submit command that appears on the File
menu when users fill out the form, type the new name in the Caption box in the Submit
Options dialog box.
2. If you do not want people to use a Submit command or the Submit button on the
Standard toolbar when they fill out your form, clear the Show the Submit menu item
and the Submit toolbar button check box. By default, after users submit a form,
InfoPath keeps the form open and displays a message to indicate if the form was
successfully submitted. To change this default behavior, click Advanced, and then do
one of the following:
To close the form or create a new blank form after the user submits a completed
form, click the option that you want in the After submit list.
To create a custom message to indicate if the form was successfully submitted,
select the Use custom messages check box, and then type your messages in the
on success and On failure boxes.
If you do not want to display a message after the user submits a form, clear the
Show success and failure messages check box
Data entry is a big job in Health Information professional and require greater accuracy of the
data what enter. For example, in DHIS2(District Health Information Software) Data entry is
done using datasets (Forms) for an intended organizational unit and for a specified period. So,
data entry requires form created by an application. When you perform data entry in Health
Institution always check the following data entry procedure check
Proof reading
Accuracy of data
Outcome of sorting/filtering
Ensuring instructions with regard to content and format have been followed
Timeliness of data entry
Checking data for completeness, accuracy and security using automated
facilities such as spell checking and sorting data.
1.2.2 Data Validation in SQL
A validation rule is based on an expression which defines a numeric relationship between data
element values. The expression forms a condition which should assert that certain logical criteria
are met. The expression consists of:
a left side
a right side
an operator
A validation rule could assert that "Suspected malaria cases tested" >= "Confirmed malaria
cases".
The left and right sides must return numeric values. Data validation is the method for checking
the accuracy and quality of data.
It is often performed prior to adding, updating, or processing data.
When validating data, we can check if the data is:
complete (no blank or null values)
unique (no duplicate values)
consistent with what we expect (eg a decimal between a certain range).
A. Physical integrity
Physical integrity is the protection of the wholeness and accuracy of that data as it’s stored and
retrieved. When natural disasters strike, power goes out, or hackers disrupt database functions,
physical integrity is compromised. Human error, storage erosion, and a host of other issues can
also make it impossible for data processing managers, system programmers, applications
programmers, and internal auditors to obtain accurate data.
B. Logical integrity
Logical integrity keeps data unchanged as it’s used in different ways in a relational database.
Logical integrity protects data from human error and hackers as well, but in a much different
way than physical integrity does. There are four types of logical integrity:
C. Entity integrity Entity integrity relies on the creation of primary keys. the unique
values that identify pieces of data to ensure that data isn’t listed more than once and
that no field in a table is null. It’s a feature of relational systems which store data in
tables that can be linked and used in a variety of ways.
D. Referential integrity Referential integrity refers to the series of processes that
make sure data is stored and used uniformly. Rules embedded into the database’s
structure about how foreign keys are used ensure that only appropriate changes,
additions, or deletions of data occur. Rules may include constraints that eliminate
the entry of duplicate data, guarantee that data entry is accurate, and/or disallow the
entry of data that doesn’t apply.
E. Domain integrity Domain integrity is the collection of processes that ensure the
accuracy of each piece of data in a domain. In this context, a domain is a set of
acceptable values that a column is allowed to contain. It can include constraints and
other measures that limit the format, type, and amount of data entered.
F. User-defined integrity User-defined integrity involves the rules and constraints
created by the user to fit their particular needs. Sometimes entity, referential, and
domain integrity aren’t enough to safeguard data. Often, specific business rules
must be taken into account and incorporated into data integrity measures.
The above diagrams many of the possible locations for data security threats, accomplishing this
level of security requires careful review, establishment of security procedures and policies, and
implementation and enforcement of those procedures and policies. The following threats must be
addressed in a comprehensive data security plan:
Accidental losses, including human, error, software, and hardware-caused breaches Establishing
operating procedures such as user authorization, uniform software installation procedures, and
hardware maintenance schedules are examples of actions that may be taken to address threats
from accidental losses.
As in any effort that involves human beings, some losses are inevitable, but carefully planned
policies and procedures should reduce the amount and severity of losses. Of potentially more
serious consequence are the threats that are not accidental
1.4.3 Common Types of Database Security attacks and causes
A. Insider threats
An insider threat is a security threat from any one of three sources with privileged access to the
database:
A malicious insider who intends to do harm
A negligent insider who makes errors that make the database vulnerable to attack
An infiltrator an outsider who somehow obtains credentials via a scheme such as
phishing or by gaining access to the credential database itself
Insider threats are among the most common causes of database security breaches and are often
the result of allowing too many employees to hold privileged user access credentials.
B. Human error
Accidents, weak passwords, password sharing, and other unwise or uninformed user behaviors
continue to be the cause of nearly half (49%) of all reported data breaches.
C. Exploitation of database software vulnerabilities
Hackers make their living by finding and targeting vulnerabilities in all kinds of software,
including database management software. All major commercial database software vendors and
open-source database management platforms issue regular security patches to address these
vulnerabilities, but failure to apply these patches in a timely fashion can increase your exposure.
D. SQL/NoSQL injection attacks
A database-specific threat, these involve the insertion of arbitrary SQL or non-SQL attack strings
into database queries served by web applications or HTTP headers. Organizations that don’t
follow secure web application coding practices and perform regular vulnerability testing are open
to these attacks.
E. Buffer overflow exploitations
Buffer overflow occurs when a process attempts to write more data to a fixed-length block of
memory than it is allowed to hold. Attackers may use the excess data, stored in adjacent memory
addresses, as a foundation from which to launch attacks.
F. Denial of service (DoS/DDoS) attacks
In a denial of service (DoS) attack, the attacker deluges the target server in this case the database
server with so many requests that the server can no longer fulfill legitimate requests from actual
users, and, in many cases, the server becomes unstable or crashes.
In a distributed denial of service attack (DDoS), the deluge comes from multiple servers, making
it more difficult to stop the attack.
1.4.4 Types of Data Security
A. Data encryption.
Data encryption is a way of translating data from plaintext (unencrypted) to ciphertext
(encrypted). Users can access encrypted data with an encryption key and decrypted data with a
decryption key.
B. Data backup to the cloud.
C. Password protection.
D. identity and access management
E. intrusion detection and prevention software
1.4.5 Confidentiality and Integrity
A. Confidentiality
Confidentiality means that data, objects, and resources are protected from unauthorized viewing
and other access.
Confidentiality measures protect information from unauthorized access and misuse. Most
information systems house has some degree of sensitivity. It might be proprietary business
information that competitors could use to their advantage, or personal information regarding an
organization’s employees, customers, or clients.
Confidential information often has value and systems are therefore under frequent attack as
criminals hunt for vulnerabilities to exploit. Threat vectors include direct attacks such as stealing
passwords and capturing network traffic and more layered attacks such as social engineering and
phishing. Not all confidentiality breaches are intentional.
A few types of common accidental breaches include
emailing sensitive information to the wrong recipient
publishing private data to public web servers
leaving confidential information displayed on an unattended computer monitor.
Healthcare is an example of an industry where the obligation to protect client information is
very high. Not only do patients expect and demand that healthcare providers protect their
privacy, but there are also strict regulations governing how healthcare organizations manage
security.
The Health Insurance Portability and Accountability Act (HIPAA) addresses security, including
privacy protection, in the handling of personal health information by insurers, providers, and
claims processors. HIPAA rules mandate administrative, physical, and technical safeguards, and
require organizations to conduct a risk analysis.
There are many countermeasures that organizations put in place to ensure confidentiality.
Passwords, access control lists, and authentication procedures use software to control access to
resources. These access control methods are complemented by the use of encryption to protect
information that can be accessed despite the controls, such as emails that are in transit.
Additional confidentiality countermeasures include administrative solutions such as policies and
training, as well as physical controls that prevent people from accessing facilities and equipment.
B. Integrity
Integrity means that data is protected from unauthorized changes to ensure that it is reliable and
correct.
Integrity measures protect information from unauthorized alteration. These measures provide
assurance in the accuracy and completeness of data. The need to protect information includes
both data that is stored on systems and data that is transmitted between systems such as email. In
maintaining integrity, it is not only necessary to control access at the system level but to further
ensure that system users are only able to alter information that they are legitimately authorized to
alter. As with confidentiality protection, the protection of data integrity extends beyond
intentional breaches. Effective integrity countermeasures must also protect against unintentional
alteration, such as user errors or data loss that is a result of a system malfunction.
There are many countermeasures that can be put in place to protect integrity:
Access control and rigorous authentication can help prevent authorized users from
making unauthorized changes.
Hash verifications and digital signatures can help ensure that transactions are
authentic and that files have not been modified or corrupted.
Data integrity vs data security
Data integrity is not data security. The two concepts are clearly summarized on the following
table
data security Data integrity
keep data from getting corrupted. keep the data to be intact and accurate
Its motive is the protection of data. Its motive is the validity of data.
It avoids unauthorized access to data. It avoids human error when data is entered.
It can be implemented through It can be implemented through.
user accounts (passwords) user accounts (passwords)
authentication schemes authentication schemes
Data integrity is not data quality. does the data in your database meet company-defined standards
and the needs of your business? Data quality answers these questions with an assortment of
processes that measure your data’s age, relevance, accuracy, completeness, and reliability.
Much like data security, data quality is only a part of data integrity, but a crucial one. Data
integrity encompasses every aspect of data quality and goes further by implementing an
assortment of rules and processes that govern how data is entered, stored, transferred, and much
more.
In DHIS2 data integrity is assured when higher-level like zonal health department can not edit
lower-level health center data. for incentivize purpose service some service needed to increase
artificially on the report after data is entered. This is usually done higher officials and DHIS2 can
handle such integrity breach.
References
1. Elmasri, R., & Navathe, S. B. (2017). Fundamentals of Database Systems 7th Edition.
2. Jeffrey A. Hoffer, V. Ramesh (2017) Modern Database Management 13th Edition.
3. Ramez Elmasri, Shamkant Navathe. (2018) Brief History of Database Applications . Retrieved April
01,2022, from [Link] Database
Applications_11395/30/03/22
4. Database management system tutorial (2011). Retrieved April 01,2022, from
[Link] 30/03/22
5. Niroj Kumar, database development process (2022) Retrieved March 30,2022 from
[Link]