0% found this document useful (0 votes)
4 views84 pages

Database Basics: Data vs. Information

Uploaded by

tilalembi
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views84 pages

Database Basics: Data vs. Information

Uploaded by

tilalembi
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Introduction to database

A database is a collection of logically related data, and a description of this data is designed to
meet the information needs of an organization. A database is a single, possibly large repository
of data that is defined once and then can be accessed simultaneously by many various users. A
database can consist of one or more tables of information that are related in some way. Such a
multi-table database is called a relational database which is a collection of normalized relations
with distinct relation names.
1.1.1 Data versus Information
The terms data and information are closely related, and in fact, are often used interchangeably.
However, it is useful to distinguish between data and information
Data: Stored representations of objects and events that have meaning and importance in the
user's environment.
Information: is processed data that has meaning and increases the knowledge of the person who
uses the data. For example, consider the following list of facts.
For example, consider the following list of facts.
Alemu Bekele 32456
Seid Kemal 45627
Belay Teklu 67892
Abebech Tolosa 34256
The above facts satisfy the definition of data. It is simply the presentation of names with
numbers and is usually considered as useless as it does not give meaning to what the entire mean.
there are different ways to convert the above data into information. In this section, we see the
two basic Methods. The first is to put the data into a table. So, the above data can be represented
as follow
Table 1: Information regarding daily patient treatment

Daily Treatment
Physician Name: Hailu Tefera
Status: Pediatrics Date: March 2022
Name Patient ID Sex Age
Alemu Bekele 32456 M 3
Said Kemal 45627 M 2
Belay Teklu 67892 M 4
Abebech Tolosa 34256 F 1
The second way to convert data into information is to summarize them or otherwise process and
present them in pictorial forms. For example, figure 1-1 shows the distribution of outbreaks in
the four major regions of Ethiopia.

Figure 1
Outbreak
Distribution in Ethiopia
The above diagram shows the outbreak distribution presented in a graphical way. This
information could be used as a basis for intervention Planning.
Metadata: are data that describe the properties or characteristics of end-user data and the context
of that data. It is called data about data. Some of the properties that are typically described
include data names, definitions, length (or size), and allowable values. Metadata describing data
context includes the source of the data, where the data are stored, ownership (or stewardship),
and usage.
1.1.2 File System vs. Database Approach
File systems and Database Management systems are the two ways that could be used to manage,
store, retrieve and manipulate data. A File System is a collection of raw data files stored in the
hard drive whereas a database system is a bundle of applications that are dedicated to managing
data stored in databases. It is the integrated system used for managing digital databases, which
allows the storage of database content, creation/ maintenance of data, search, and other
functionalities. Both systems can be used to allow the user to work with data in a similar way. A
File System is one of the earliest ways of managing data. But due to the shortcomings present in
using a File System to store electronic data, Database Systems came into use sometime later, as
they provide mechanisms to solve those problems.
[Link] File System
The file system has a number of characteristics that differ from the database management
system. In the file system approach, each user defines and implements the needed files for a
specific application to run. For example, in the hospital system, one user will be maintaining the
details of how many patients are there in the hospital along with their histories, these details will
be stored and maintained in a separate file.
Another user will be maintaining the payments the patients made during treatment, the detailed
payment transactions report will be stored and maintained in a separate file. Although both of the
users are interested in the data of the patient, they will be having their details in separate files and
they need different programs to manipulate their files. This will lead to wastage of space and
redundancy or replication of data, which may lead to confusion, sharing of data among various
users is not possible, data inconsistency may occur. These files will not be having any inter-
relationship among the data stored in these files.
Database Management System Approach
A database Management System (DBMS) is a collection of programs that enables users to create
and maintain a database. The DBMS is hence a general-purpose software system that facilitates
the process of defining, constructing, and manipulating the database for various applications.
Defining a database involves specifying the data types, structures, and constraints for the data to
be stored in the database. Constructing the database is the process of storing the data itself on
some storage media that is controlled by the DBMS. Manipulating a database includes such
functions as querying the database to retrieve specific data, updating the database to reflect
changes in the mini world, and generating reports from the data.
1.1.3 History of Database Application
A. Early Database Applications Using Hierarchical and Network Systems
Most of these database systems were implemented in the [Link] of the main problems
with early database systems was the intermixing of conceptual relationships with the physical
storage and placement of records on a disk. The main types of early systems were based on three
main paradigms: hierarchical systems, network model-based systems, and inverted file systems
B. Relational Databases
Early experimental relational systems developed in the late [Link] databases were
originally proposed to separate the physical storage of data from its conceptual representation
and to provide a mathematical foundation for data representation and querying. The relational
data model also introduced high-level query languages that provided an alternative to
programming language interfaces, making it much faster to write new queries.
C. Object-Oriented Applications
The emergence of object-oriented programming languages in the 1980s and the need to store and
share complex, structured objects led to the development of object-oriented databases (OODBs).
Initially, OODBs were considered a competitor to relational databases, since they provided more
general data structures. They are now mainly used in specialized applications, such as
engineering design, multimedia publishing, and manufacturing systems.
D. Emerging of XML (Extensible Markup Language)
The World Wide Web provides a large network of interconnected computers. Users can create
documents using a Web publishing language, such as HyperText Markup Language (HTML),
and store these documents on Web servers where other users (clients) can access them. In the
1990s, electronic commerce (e-commerce) emerged as a major application on the Web.
E. Extending Database Capabilities for New Applications
The success of database systems in traditional applications encouraged developers of other types
of applications to attempt to use them. Such applications traditionally used their own specialized
file and data structures. Database systems now offer extensions to better support the specialized
requirements for some of these applications.
The following are some examples of these applications:
 Scientific applications
 Storage and retrieval of images, including scanned news or personal photo-graphs,
satellite photographic images, and images from medical procedures such as x-rays and
MRIs
 Storage and retrieval of videos
 Data mining applications that analyze large amounts of data
 Spatial applications that store spatial locations of data like the distribution of disease
 Time series applications that store information such as health data at regular points in
time, such as morbidity and mortality trend
1.1.4 Characteristics of the Database Approach
In the database approach, a single repository of data is maintained that is defined once and then
is accessed by various users. The main characteristics of the database approach are the following.
A. Self-Describing Nature of the Database System.
A fundamental characteristic of the database approach is that the database system contains not
only the database itself but also a complete definition or description of the database structure and
constraints. This definition is stored in the system catalog, which contains information such as
the structure of each file; the type and storage format of each data item, and various constraints
on the data. This information stored in the catalog is called meta-data, and it describes the
structure of the primary database.
B. Insulation between Programs, Data, and Data Abstraction
In traditional file processing, the structure of data files is embedded in the access programs, so
any changes to the structure of a file may require changing all programs that access this files. By
contrast, DBMS access programs do not require such changes in most case. The structure of data
files is stored in the DBMs catalog separately from the access programs. We call this property
program data independence.
The characteristic that allows program-data independence and program-operation independence
is called data abstraction.
C. Support of multiple views of the data
A database typically has many users, each of whom may require a different perspective or view
of the database. A view may be a subset of the database or it may contain virtual data that is
derived from the data base files but is not explicitly stored. Some user may not need to be aware
of whether the data they refer to is stored or derived. A multiple DBMS whose users have a
variety of application must provide facilities for defining multiple views.
D. Sharing of data and multiuser transaction processing
A multiuser DBMS, as its name implies, must allow multiple users to access the database at the
same time. This is essential if data for multiple applications is to be integrated and maintained in
a single database. The DBMS must include concurrency control software to ensure that several
users trying to update the same data do so in a controlled manner so that the result of the updates
is correct. For example, in a hospital when several matrons (A woman in charge of nursing in a
medical institution) try to assign a bed to a patient, the DBMS should ensure that each bed can be
accessed by only one matron at a time for assignment to a patient. A fundamental role of
multiuser DBMS software is to ensure that concurrent transaction operate correctly.
1.1.5 Basics of Database Architecture
The database architecture is the set of specifications, rules, and processes that dictate how data is
stored in a database and how data is accessed by components of a system. It includes data types,
relationships, and naming conventions. The database architecture describes the organization of
all database objects and how they work together. It affects integrity, reliability, scalability, and
performance. The database architecture involves anything that defines the nature of the data, the
structure of the data, or how the data flows.
The Three-level of Architecture
The goal of the three-schema architecture is to separate the user applications and the physical
database.
In any data model it is important to distingue between the description of the database and the
database itself. The description of the database is called the database schema, which is specified
during database design. A displayed schema is called a schema diagram.
Table 2: Schema diagram for a database

In this architecture, schemas can be defined at the following three levels:


A. The internal level has an internal schema, which describes the physical storage structure
of the database. The internal schema uses a physical data model and describes the
complete details of data storage and access paths for the database.
B. The conceptual level has a conceptual schema, which describes the structure of the
whole database for a community of users. The conceptual schema hides the details of
physical storage structures and concentrates on describing entities, data types,
relationships, user operations, and constraints. A high-level data model or an
implementation data model can be used at this level.
C. The external or view level includes a number of external schemas or user views. Each
external schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group. A high-level data
model or an implementation data model can be used at this level.
1.1.6 Actors on the Database Environment
For small database the list of address one person typically defines, constructs, and manipulates
the database. However, many persons are involved in the design, use, maintenances of a large
database with a few hundreds of users. In this section we identify the people whose jobs involve
the day-to-day use of a large database; we call them “actors on the scene”.
A. Database Administrator
In any organization where many persons use the same resource, there is a need for a chief
administrator to oversee and manage these resources. In a database environment, the primary
resource is the database itself and the secondary resource is the DBMS and related software.
Administering these resources is the responsibility of the data administrator (DBA). The
database administrator is responsible for authorizing access to the database, for coordinating and
monitoring its use and for acquiring software and hardware resources as needed. The DBA is
accountable for problem such as breach of security or poor system response time. In large
organizations, the DBA is assisted by a staff that helps carry out these functions.
B. Database Designer
Database designers are responsible for identifying the data to be stored in the database and for
choosing appropriate structures to be represented and store this data. These tasks are mostly
undertaken before the database is actually implemented and populated with data. It is the
responsibility of database designers to communicate with all prospective database users, in order
to understand their requirements, and to come up with a design that meets these requirements. In
many cases, the designers are on the staff of the DBA and may be assigned other staff
responsibilities after the database design is completed. Database designers typically interact with
each potential group of users and develop a view of the database that meets the data and
processing requirements of this group. These views are then analyzed and integrated with the
view of other user groups. The final database design must be capable of supporting the
requirements of all user groups.
C. End users
End users are the people whose jobs require access to the database for querying, updating, and
generating reports; the database primarily exists for their use. There are several categories of
end-users.
 Causal end user occasionally accesses the database, but they may need different
information each time. They use a sophisticated database query language to specify their
request and are typically middle or high-level managers or other occasional browsers.
 Naive or parametric end users make up a sizeable portion of database end users. Their
main job function revolves around constantly querying and updating the database, using
standard types of queries and updates called canned transaction that have been carefully
programmed and tasted.
 Sophisticated end users include ingénues, scientists, business analysis, and other who
thoroughly familiarize themselves with the facilities of the DBMS so as to implement
their applications to meet their complex requirements.
 Stand-alone users maintain personal databases by using ready-made program package
that provide easy to use menu or graphic based interfaces.
A typical DBMS provides multiple facilities to access a database. Naive end users need to learn
very little about the facilities provided by DBMS; they have to understand only the type of
standard transaction designed and implemented for their use. Casual users learn only a few
facilities that they may use repeatedly. Sophisticated users try to learn most of the DBMS
facilities in order to achieve their complex requirements. Standalone users typically become very
proficient in using a specific software package.
D. System analysts and application programmers (Software Engineers)
System analysts determine the requirements of end users, especially naive and parametric end
users, and develop specification for canned transactions that meet these requirements.
Application programmer’s implements’ these specifications as programs; then they test, debug,
document, and maintain this canned transaction. Such analysts and programmer (nowadays
called software engineers) should be familiar with the full range of capabilities provided by the
DBMS to accomplish their tasks.
1.1.7 Advantages of Using the DBMS
A. Controlling redundancy
In conventional data systems, an organization often builds a collection of application programs
often created by different programmers and requiring different components of the operational
data of the organization. The data in conventional data systems is often not centralized. Some
applications may require data to be combined from several systems. These several systems could
have data that is redundant as well as inconsistent (that is, different copies of the same data may
have different values). Data inconsistencies are often encountered in everyday life. For example,
we have all come across situations when a new address is communicated to an organization that
we deal with (e.g. a bank, or Telecom, or a gas company), we find that some of the
communications from that organization are received at the new address while others continue to
be mailed to the old address. Combining all the data in a database would involve reduction in
redundancy as well as inconsistency. It also is likely to reduce the costs for collection, storage
and updating of data.
B. Restricting unauthorized Access
In file systems, applications are developed for specific purpose. Often different system of an
organization would access different components of the operational data. In such an environment,
enforcing security can be quite difficult. Setting up of a database makes it easier to enforce
security restrictions since the data is now centralized. It is easier to control that has access to
what parts of the database. However, setting up a database can also make it easier for a
determined person to breach security. We will discuss this in topic 4 of this module.
C. Better service to the Users
A DBMS is often used to provide better service to the users. In conventional systems,
availability of information is often poor since it normally is difficult to obtain information that
the existing systems were not designed for. Once several conventional systems are combined to
form one centralized data base, the availability of information and its up-todatedness is likely to
improve since the data can now be shared and the DBMS makes it easy to respond to unforeseen
information requests. Centralizing the data in a database also often means that users can obtain
new and combined information that would have been impossible to obtain otherwise. Also, use
of a DBMS should allow users that do not know programming to interact with the data more
easily. Clients/patients usually tend to forget their service card when they go to the hospital or
clinic. But if there is an EMR (Electronic Medical Record) system then MRN can be found by
searching his name in the database.
D. Enforcing Integrity Constraint
The data of an organization using a database approach is centralized and would be used by a
number of users at a time, it is essential to enforce integrity controls.
Integrity may be compromised in many ways. For example, patient may be shown to have taken
hospital service but not even registered. So, enforcing patient registration before any service is
mandatory.
If a number of users are allowed to update the same data item at the same time, there is a
possibility that the result of the updates is not quite what was intended. For example, we could
have a situation where many ward assignments could be made for patients that are larger than the
available beds. Controls therefore must be introduced to prevent such errors to occur because of
concurrent updating activities. However, since all data is stored only once, it is often easier to
maintain integrity than in conventional systems.
E. Enforces standard
Since all access to the database must be through the DBMS, standards are easier to enforce.
Standards may relate to the naming of the data, the format of the data, the structure of the data
etc.
F. Cost of developing and maintaining systems is lower
As noted earlier, it is much easier to respond to unforeseen requests when the data is centralized
in a database than when it is stored in conventional file systems. Although the initial cost of
setting up of a database can be large, one normally expects the overall cost of setting up a
database and developing and maintaining application programs to be lower than for similar
service using conventional systems since the productivity of programmers can be substantially
higher in using non-procedural languages that have been developed with modern DBMS than
using procedural languages.
G. Flexibility of the system is improved
Changes are often necessary to the contents of data stored in any system. These changes are
more easily made in a database than in a conventional system in that these changes do not need
to have any impact on application programs.
H. Provide backup and recovery
A DBMS must provide facilities for recovering from hardware or software failures. The backup
and recovery subsystem of the DBMS is responsible for recovery for example, if the computer
system fails in the middle of a complex update program, the recovery subsystem is responsible
for making sure that the database restored to the state it was in before the program started
executing.
1.1.8 Common Types of Databases
There are various types of databases used for storing different varieties of data. The following
are the most common type
 Relational Database  Cloud Database
 Centralized Database  Object-oriented Databases
 Distributed Database  Hierarchical Databases
 NoSQL Database  Network Databases.
Among the above listed database type for this level, we focus on relational database type. Relational
database uses SQL for storing, manipulating, as well as maintaining the data.
1.2 Relational database system
1.2.1 Introduction
A relational database is based on the relational data model, which stores data in the form of
rows(tuple) and columns(attributes), and together forms a table(relation). A relational database
uses SQL for storing, manipulating, as well as maintaining the data. Edgar F. Codd invented the
database in 1970. Each table in the database carries a key that makes the data unique from others.
Examples of Relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
1.2.2 Properties of Relational Database
There are following four commonly known properties of a relational model known as ACID
properties, where:
A means Atomicity: This ensures the data operation will complete either with success or with
failure. It follows the 'all or nothing' strategy. For example, a transaction will either be
committed or will abort.
C means Consistency: If we perform any operation over the data, its value before and after the
operation should be preserved. For example, the account balance before and after the transaction
should be correct, i.e., it should remain conserved.
I mean Isolation: There can be concurrent users for accessing data at the same time from the
database. Thus, isolation between the data should remain isolated. For example, when multiple
transactions occur at the same time, one transaction effects should not be visible to the other
transactions in the database.
D means Durability: It ensures that once it completes the operation and commits the data, data
changes should remain permanent
1.3 Identifying Data Characteristics on The Basis of User Requirement
1.3.1 Requirement’s analysis
Requirement’s analysis is resembled to understanding the purpose of your database and
functional requirement. understanding the purpose of your database will inform your choices
throughout the design process. Make sure you consider the database from every perspective. For
instance, if you were making a database for a public library, you’d want to consider the ways in
which both patrons and librarians would need to access the data.
Here are some ways to gather information before creating the database:
 Interview the people who will use it
 Analyze business forms, such as invoices, timesheets, surveys
 Comb through any existing data systems (including physical and digital files)
Start by gathering any existing data that will be included in the database. Then list the types of
data you want to store and the entities, or people, things, locations, and events, those data
describe, like this:
Patient Demographic data Health Professional Data
 Full Name  Full Name
 MRN (Medical Record Number)  Civil Service ID
 Zone  Zone
 Woreda  Woreda
 Kebele  Kebele
 Phone Number  Phone Number
Disease Information Drug Information
 Chief Compliant  Generic Drug Name
 History of present Illness  Dose
 Past History of Illness  Rout
 Diagnosis  Expiry date
This information will later become part of the data dictionary, which outlines the tables and
fields within the database. Be sure to break down the information into the smallest useful pieces.
for instance, consider separating the diagnosis from the disease information so that you can later
filter each disease by their diagnosis name. it also, avoid placing the same data point in more
than one table, which adds unnecessary complexity.
Once you know what kinds of data the database will include, where that data comes from, and
how it will be used, you’re ready to start planning out the actual database.
The next step is to lay out a visual representation of your database. To do that, you need to
understand exactly how relational databases are structured. within a database, related data are
grouped into tables, each of which consists of rows (also called tuples) and columns, like a
spreadsheet.
To convert your lists of data into tables, start by creating a table for each type of entity, such as
products, sales, customers, and orders. Here’s an example:
Each row of a table is called a record. Records include data about something or someone, such as
a particular customer. By contrast, columns (also known as fields or attributes) contain a single
type of information that appears in each record. Fox ample the above listed data of drug
information can be converted to table like this
Table 2 shows how raw data converted to formal table
Generic Drug Name dose Rout Expiry date
Ceftriaxone 1 gm IM/IV 20-may-23
Tramadol 50 mg IM 24-jun-23
Amoxicillin 500mg PO 24-jun-23

To keep the data consistent from one record to the next, assign the appropriate data type to each
[Link] this context Data characteristics is expressed in terms of Data types, field name, field
size and field format. most data type in database falls under this category.
 CHAR (size): a variable length string (can contain letters, numbers, and special
characters). The size parameter specifies the column length in characters and it can be
from 0 to 255. The default is 1
 VARCHAR (size): a variable length string (can contain letters, numbers, and special
characters). The size parameter specifies the maximum column length in characters - can
be from 0 to 65535
 BINARY (size): equal to char (), but stores binary byte strings. the size parameter
specifies the column length in bytes. default is 1
 VARBINARY (size) equal to varchar (), but stores binary byte strings. the size
parameter specifies the maximum column length in bytes.
 BLOB (size): for BLOBs (Binary Large Objects). Holds up to 65,535 bytes of data
In addition to the above listed data type the following are commonly used Numeric Data Types
 BOOL Zero is considered as false, nonzero values are considered as true.
 BOOLEAN: is Equal to BOOL
 INT (size): A medium integer. Signed range is from -2147483648 to 2147483647.
Unsigned range is from 0 to 4294967295.
 INTEGER (size): Equal to INT (size)
 DOUBLE (size, d): A normal-size floating point number. The total number of digits is
specified in size.
 DECIMAL (size, d): An exact fixed-point number. The total number of digits is specified
in size. The number of digits after the decimal point is specified in the d parameter.
 DATE: A date Format put as YYYY-MM-DD.
 DATETIME: A date and time combination. Format: YYYY-MM-DD hh:mm:ss.
1.4 Designing a simple entity relationship diagram
1.4.1 Relational Model Entity Relationship (ER) Modeling
In this module we are going to take a patient management system to demonstrate all the contents
found here after. We concentrate only on the development process of the database application.
For illustration purpose only some of the major entities in the hospital system are discussed. We
list the data requirements of the database here, and we create its conceptual schema step-by-step
as we introduce the modeling concepts of the ER model. The database, called Hospital keeps
track of the information of the hospital departments, doctors, beds and patients. The hospital
provides inpatient and outpatient service for its clients.
 The hospital is organized into departments. Each department has a unique name, a unique
number, and a particular employee who manages the department. The hospital provides
IPD (Inpatient Department) and OPD (Outpatient Department) services.
 We store each doctor’s name, id number, office phone, salary, sex and specialization. A
doctor is assigned to one department and takes care for all patients and maintains and
arranges the clinical information.
 A patient being treated in the hospital will have MRN (Medical Record Number) during
registration and his/her name, address, date of birth and sex will be stored in the database.
 A patient assigned for a bed will be identified and the room number, the type of the room,
the price and the date of admission and discharge will be stored in the database.
The entity-relationship model (or ER model) is a way of graphically representing the logical
relationships of entities (or objects) in order to create a database. In ER modeling, the structure
for a database is portrayed as a diagram, called an entity-relationship diagram (or ER diagram),
that resembles the graphical breakdown of a sentence into its grammatical parts.
1.4.2 Identifying Entities
The basic object that the ER model represents is called an entity, which is a "thing" in the real
world with an independent existence. An entity may be an object with a physical existence a
particular person, car, house, or employee, doctor, patient or it may be an object with a
conceptual existence company, job, or hospital. More concisely an entity is defined as any
object in the system that we want to model and store information about. Entities are represented
by rectangles (either with round or square corners) as

Figure 2. Entity
notations
1.4.3 Attributes and Domain
An Attribute is a property that describes an entity. In the above example, the patient is the entity
and MRN, name, address, and sex etc are attributes of patient. Attributes are the descriptive
properties which are owned by each entity of an Entity Set.
Simple Attributes: Simple attributes are those attributes which cannot be divided further.
Figure 3 Simple Attribute
Here, all the attributes are simple attributes as they cannot be divided further.
Composite Attributes: Composite attributes are those attributes which are composed of many
other simple attributes.

Figure 4 Composite Attribute


Here, the attributes “Name” and “Address” are composite attributes as they are composed of
many other simple attributes.
Single valued attributes: are those attributes which can take only one value for a given entity from
an entity set.

Figure 5 Simple Attribute


Here, all the attributes are single valued attributes as they can take only one specific value for each
entity.
Multi Valued Attributes: multi valued attributes are those attributes which can take more than
one value for a given entity from an entity set.
Figure 6 Multi Valued Attribute
Here, the attributes “Mob_no” and “Email_id” are multi valued attributes as they can take more
than one values for a given entity.
Derived Attributes: Derived attributes are those attributes which can be derived from other
attribute(s).

Figure 7 derived Attribute


Here, the attribute “Age” is a
derived attribute as it can be
derived from the attribute
“DOB”.
Key Attributes: Key
attributes are those attributes
which can identify an entity
uniquely in an entity set.

Figure 8 Key Attribute


Here, the attribute “Roll_no” is a key attribute as it can identify any student uniquely.
Null Values: In some cases, a particular entity may not have an applicable value for an attribute.
null can also be used if we do not know the value of an attribute for a particular entity for
example if we do not know the home phone of "Abebe Kebede” The meaning of the former type
of null is not applicable, whereas the meaning of the latter is unknown. The unknown category of
null can be further classified into two cases.
The first case arises when it is known that the attribute value exists but is missing for example, if
the Height attribute of a person is listed as null. The second case arises when it is not known
whether the attribute value exists for example, if the Home Phone attribute of a person is null
Nulls can have multiple interpretations, such as the following:

 The attribute does not apply to this tuple.


 The attribute value for this tuple is unknown.
 The value is known but absent; that is, it has not been recorded yet
 Having the same representation for all nulls compromises the different meanings they
may have
Domain: Each attribute has a domain, an expression of the allowable values for that attribute. It
is a set of atomic values. by atomic we mean that each value in the domain is indivisible as far as
the relational model is concerned. a common method of specifying a domain is to specify a data
type from which the data values forming the domain are drawn. It is also useful to specify a
name for the domain, to help in interpreting its values. Some examples of domains are shown
below: By atomic we mean that each value in the domain is indivisible as far as the relational
model is concerned. for example, Office phone: the set of 10-digit valid phone numbers in
Ethiopia age: is between 0 and 125. sex: male or female

1.4.4 Relationships
A relationship relates two or more distinct entities with a specific meaning. For example,
DOCTOR Assefa Belay treats the patient Meaza Birru. The relationship between the two entities
is “treats”.
an easy way to decide whether an object should be an entity or a relationship is to map nouns in
the requirements to entities, and to map the verbs to relations. For example, in the statement,
“DOCTOR Assefa Belay treats the patient Meaza Birru” we can identify the entities “Doctor”
and “patient,” and the relationship “treats”.
An entity-relationship diagram is used to represent the relationship among the entities.
An entity-relationship (ER) diagram is a specialized graphic that illustrates the relationships
between entities in a database. ER diagrams often use symbols to represent three different types
of information. As we mentioned before in Chen’s ER-modeling rectangles are commonly used
to represent entities. Diamonds are normally used to represent relationships and ovals are used to
represent attributes.
Figure 9: A relationship between Doctor and Patient entities using diamonds.

A number of data modeling techniques are being used today. One of the most common is the
entity relationship diagram (ERD). Several ERD notations are available to show the cardinality
between the entities. In this module we will be using Crow’s Foot Notation.
Components used in the creation of an ERD
Entity - A person, place or thing about which we want to collect and store multiple instances of
data. It has a name, which is a noun, and attributes which describe the data we are interested in
storing.
Relationship -Illustrates an association between two entities. It has a name which is a verb. It
also has cardinality and modality.
Cardinality- Defines the number of occurrences of one entity for a single occurrence of the
related entity. E.g., a doctor can treat one or more patients per day.

[Link] Types of relationships


There are several types of database relationships among this the following are commonly used
A. One-to-one relationships -occur when there is exactly one record in Table A that
corresponds to exactly one record in Table B. One-to-one relationships are single-valued
in both directions. A patient being treated in a hospital will have only one bed. So, the
relationship between the patient and the bed table is one-to-one (1:1) relationship.

Figure 10: 1 to 1 relationship between Patient and Bed

B. One-to-many or many-to-one relationships -is the most common type of relationship.


It occurs when each record in Table A may have many linked records in Table B but
each record in Table B may have only one corresponding record in Table A. For
example, a given hospital department can have many doctors working under it and
usually one doctor works for a particular department. This is a one-to-many relationship
(1: n)
Figure
11: A many to 1 relationship between Patient and Bed

C. Many-to-many relationships- occurs when each record in Table A may have many
linked records in Table B and vice-versa. You create such a relationship by defining a
third table, called a junction table, whose primary key consists of the foreign keys from
both table A and table B.
Figure 12: A many to many relationships between Patient and Bed

In a many-to-many relationship a given doctor can treat many patients and the same patient can
be treated by different doctors, so it is a many to many (m: n) relationship.
[Link] Relational keys
There are many types of keys that can be defined in the relational model. These have significant
importance in maintaining data consistency and correctness in the database.
A. Supper Key
A super key is a set of one or more attributes (columns), which can uniquely identify a row in a
table.
B. Candidate Key
A candidate key is one or more attribute that uniquely identifies an entity. Every entity in
relational database must have at least one candidate key but it is possible that some may have
two or more. For example, MRN and name of the patient may identify the patient. Therefore,
MRN and name can be considered candidate keys for a patient table.
C. Primary Key
A primary key is an attribute, or set of attributes, that allows each information for an entity to be
uniquely identified. Every entity in a relational database must have a primary key. For example,
a patient entity has attributes such as MRN, name, address, date of birth and sex, and then MRN
can be used as a primary key. As mentioned above MRN and name are candidate keys for the
patient table but name cannot uniquely identify all the patients, whereas MRN is unique for all
patients. Therefore, MRN is a primary key for the patient table.
D. Foreign Key
Entities are related to each other through foreign keys. A foreign key references a particular
attribute of an entity containing the corresponding primary key. For example, a patient entity
with MRN as its primary key for a patient and doctor entity with doctor Id as its primary key for
doctor information can be related to each other through MRN. Therefore, MRN will be a foreign
key for doctor entity whereas the MRN will be a primary key for the Patient entity.
[Link] Degree of relationships
Degree of relationship simply refers to the concept of, how many numbers of entities associated
with the relationship. In DBMS, a degree of relationship represents the number of entity types
that associate in a relationship. Now, based on the number of linked entity types, we have 4 types
of degrees of relationships.
 Unary
 Binary
 Ternary
 N-ary
Unary: In this type of relationship, both the associating entity type are the same. So, we can say
that unary relationships exist when both entity types are the same and we call them the degree of
relationship is 1. Or in other words, in a relation only one entity set is participating then such
type of relationship is known as a unary relationship.
Binary: In a Binary relationship, there are two types of entity associates. So, we can say that a
Binary relationship exists when there are two types of entity and we call them a degree of
relationship is 2. Or in other words, in a relation when two entity sets are participating then such
type of relationship is known as a binary relationship. This is the most used relationship and one
can easily be converted into a relational table.
Ternary: In the Ternary relationship, there are three types of entity associates. So, we can say
that a Ternary relationship exists when there are three types of entity and we call them a degree
of relationship is 3. Since the number of entities increases due to this, it becomes very complex
to turn E-R into a relational table.
N-ary: In the N-ary relationship, there are n types of entity that associates. So, we can say that
an N-ary relationship exists when there are n types of entities. There is one limitation of the N-
ary relationship, as there are many entities so it is very hard to convert into an entity, rational
table. So, this is very uncommon, unlike binary which is very much popular.

1.4.5 Mapping E-R diagrams to Tables


To map E-R diagrams to Tables follow the steps shown below.
 For each entity type E in the ER schema, create a relation R (Table) that includes all the
simple attributes of E.
 Choose one of the key attributes of E as the primary key for R.
 If the chosen key of E is composite, the set of simple attributes that form it will together
form the primary key of R.
Relational Model is made up of tables and the mapping between E-R and a relation (tables) can
be summarized on the following table 3
E-R diagrams Tables
A row of table a relational instance/tuple
A column of table an attribute
A table a schema/relation
Cardinality number of rows
Degree number of columns

Diagrammatically it can be expressed as shown in the figure below.

1.5 Database Normalization


Normalization is the process of organizing the fields and tables of a relational database to
minimize redundancy and dependency. Normalization usually involves dividing large tables into
smaller (less redundant) tables and defining relationships between them. Normalization resolves
3 important anomalies

 Deletion anomalies
 Update anomalies
 Insertion anomalies
Then data insertion, deletion and update will be propagated through the rest of the database via
the defined relationships consistency.

What is an "inconsistent dependency"? While it is appropriate for a user to look in the Patient
table for the address of a particular patient, it may not make sense to look there for the
specialization of the Doctor who treated that patient. The Doctors specialization is related to, or
dependent on, the doctor and thus should be moved to the Doctor table. Inconsistent
dependencies can make data difficult to access because the path to find the data may be missing
or broken.

There are a few rules for database normalization. Each rule is called a "normal form". Commonly
there are Six types of Normalization as listed below

A. First normal form.


B. Second normal form.
C. Third normal form.
D. Boyce-Codd Normal Form (BCNF)
E. Fourth Normal Form (4NF)
F. Fifth Normal Form (5NF)

For this level it is enough to discuss on the first three normal form and the step necessary to each
normal form will be discussed in advance as follow.

A. First Normal Form (1NF)

Each column is unique in first normal form so in this normal form you should identify repeating
attribute and remove them. A relation will be 1NF if it contains an atomic value. First normal
form disallows the multi-valued attribute, composite attribute, and their combinations. Table in
1NF should be
 It should only have single(atomic) valued attributes/columns.
 Values stored in a column should be of the same domain
 All the columns in a table should have unique names.
 the order in which data is stored, does not matter.
Example: employee table at Hospital, it shows employees are working with multiple
departments.
Table 5 Employee at Hospital works in different department
Employee Age Department
Aster Abebe 32 MCH, Emergency
Almaz Tesema 45 OPD
Zebiba Ali 36 Delivery
Shimelis Chala 24 IPD

In first normal form duplication Should be removed then Employee table following first normal
will be like this.
Table 6 Employee in first normal form

Employee Age Department


Aster Abebe 32 MCH
Aster Abebe 32 Emergency
Almaz Tesema 45 OPD
Zebiba Ali 36 Delivery
Shimelis Chala 24 IPD

B. Second Normal Form(2NF)


The entity should be considered already in 1NF, and all attributes within the entity should
depend solely on the unique identifier of the entity.
Table7 drug in first normal form
DrugId Drug Brand
001 amoxicillin Trimox
002 Paracetamol Panadol
003 Ceftriaxone Rocephin
004 Hydralazine Apresoline

After the table is normalized following second normal form it will be like this
Table 8 A drug in second normal form
DrugId Drug
001 Amoxicillin
002 Paracetamol
003 Ceftriaxone
004 Hydralazine

Table 8 B drug in second normal form

BrandId Brand
1 Trimox
2 Panadol
3 Rocephin
4 Apresoline
Table 8 C drug in second normal form
Drug_Brand_Id DrugID BrandId
01 001 1
02 002 2
C. 03 003 3 Third Normal
04 004 4 Form(3NF)

The entity should be considered already in 2NF, and no column entry should be dependent on
any other entry (value) other than the key for the table. If such an entity exists, move it outside
into a new table. 3NF is achieved, considered as the database is normalized. 3NF is used to
reduce the data duplication. It is also used to achieve the data integrity. If there is no transitive
dependency for non-prime attributes, then the relation must be in third normal form.

Table 9 Employee table not in third normal form

EMP_ID EMP_NAME City Code EMP_ADDRESS EMP_CITY

222 Abebe Tesema 6000 Addis Ababa Bole

333 Meron Tadele 2000 Oromo Adama

555 Hayat Sied 4000 Afar Semera

Super key in the table above should be identified first. A super key is a set of one or more
attributes (columns), which can uniquely identify a row in a table.
{EMP_ID}
{EMP_ID, EMP_NAME}
{EMP_ID, EMP_NAME, CITY_CODE, EMP_ADDRESS, EMP_NAME, EMP_ADDRESS,
EMP_CITY}
Candidate key: {EMP_ID}
Non-Prime Attribute
An attribute that is not part of any candidate key is known as non-prime attribute. Non-prime
attributes: In the given table above, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super
key (EMP_ID). It violates the rule of third normal form. So, we need to move the EMP_CITY
and EMP_STATE to the new CITY_CODE table, with CITY_CODE as a Primary key.
Table 10 An Employee table in third normal form

EMP_ID EMP_NAME CITY_CODE

222 Abebe Tesema 6000

333 Meron Tadele 2000

444 Tigist Asefa 3000

222 Abebe Tesema 6000

Table 10 B Employee table in third normal form

CITY_CODE EMP_STATE EMP_CITY

6000 Addis Ababa Bole

2000 Oromo Adama

3000 Amhara Bahirdar

6000 Addis Ababa Bole

1.6 Designing Simple Relational Database


Database design is the process of analyzing a problem that needs to be solved, developing the
logical model of the data available to solve that problem, grouping the items of data into related
tables, and finally committing the design to the database software system that will be used for the
actual database.
The goals of database design are multiple:
 Satisfy the information content requirements of the specified users and applications.
 Provide a natural and easy-to-understand structuring of the information.
 Support processing requirements and any performance objectives, such as response time,
processing time, and storage space.
1.6.1 Steps in Database Design
The process of database design is divided into different phases and consists of a series of steps.
The main phases are
1. Requirement’s collection and analysis.
2. Conceptual database design.
3. Choice of a DBMS.
4. Data model mapping (also called logical database design).
5. Physical database design.
6. Database system implementation and tuning.

Figure 13: Database design process


Phase 1: Requirement Collection and Analysis
Before we can effectively design a database, we must know and analyze the expectations of the
users and the intended uses of the database in as much detail as possible.
This process is called requirements collection and analysis. To specify the requirements, we first
identify the other parts of the information system that will interact with the database system.
These include new and existing users and applications, whose requirements are then collected
and analyzed. Typically, the following activities are part of this phase:
1. The major application areas and user groups that will use the database or whose work
will be affected by it are identified.
2. Existing documentation concerning the applications is studied and analyzed. iii. The
current operating environment and planned use of the information is studied.
3. Written responses to sets of questions are sometimes collected from the
4. Potential database users or user groups.
Requirement analysis is carried out for the final users, or customers, of the database system by a
team of system analysts or requirement experts. The initial requirements are likely to be
informal, incomplete, inconsistent, and partially incorrect. Therefore, much work needs to be
done to transform these early requirements into a specification of the application that can be used
by developers and testers.
There is evidence that customer participation in the development process increases customer
satisfaction with the delivered system. For this reason, many practitioners use meetings and
workshops involving all stakeholders.
The requirements collection and analysis phase can be quite time-consuming, but it is crucial to
the success of the information system.

Phase 2: Conceptual Database Design


The second phase of database design involves two parallel activities. The first activity,
conceptual schema design, examines the data requirements resulting from Phase 1 and produces
a conceptual database schema. The second activity, transaction and application design, examines
the database applications analyzed in Phase 1 and produces high-level specifications for these
applications.

Phase 2a Conceptual Schema Design


The purpose of this process is to produce a conceptual schema of the database
 Expressed using concepts of the high-level data model
 not including implementation details (has to be understood by non-technical users)
 but detailed in terms of the “objects” of the domain the database will represent
 Independent of the DBMS to be used (no relational DB-oriented notions!)
 Cannot be used directly to implement the database design is made in terms of a semantic
or conceptual data model
 The goal is to achieve an understanding of database structure, semantics,
interrelationships and constraints

Phase 2b: Transaction Design.


The purpose of Phase 2b, which proceeds in parallel with Phase 2a, is to design the
characteristics of known database transactions (applications) in a DBMS-independent way.
When a database system is being designed, the designers are aware of many known applications
(or transactions) that will run on the database once it is implemented. An important part of
database design is to specify the functional characteristics of these transactions early on in the
design process. This ensures that the database schema will include all the information required
by these transactions. In addition, knowing the relative importance of the various transactions
and the expected rates of their invocation plays a crucial part during the physical database design
(Phase 5).
Transactions usually can be grouped into three categories:
1. Retrieval transactions, which are used to retrieve data for display on a screen or for
printing of a report.
2. Update transactions, which are used to enter new data or to modify existing data in the
database.
3. Mixed transactions, which are used for more complex applications that do some
retrieval and some update.
Phase 3: Choice of a DBMS
The choice of a DBMS is governed by a number of factors - some technical, others economic,
and still others concerned with the activities and rules of the organization. The technical factors
focus on the suitability of the DBMS for the task at hand. Issues to consider are the type of
DBMS (relational, object-relational, object, other), the storage structures and access paths that
the DBMS supports, the user and programmer interfaces available, the types of high-level query
languages, the availability of development tools, the ability to interface with other DBMSs via
standard interfaces, the architectural options related to client-server operation, and so on.
Nontechnical factors include the financial status and the support organization of the vendor

Phase 4: Data Model Mapping (Logical Database Design)


The purpose of this phase is to transform the generic, DBMS independent conceptual schema in
the data model of the chosen DBMS (data model mapping). The mapping can proceed in two
stages:
1. System independent mapping: no consideration of any specific characteristics that may
apply to the specific DBMS package.
2. Tailoring to DBMS: different DBMSs may implement the same data model in slightly
different ways
The result of this phase should be DDL (data definition language) statements in the language of
the chosen DBMS that specify the conceptual and external level schemas of the database system.
But if the DDL statements include some physical design parameters, a complete DDL
specification must wait until after the physical database design phase is completed.

Phase 5: Physical Database Design


Physical database design is the process of choosing specific file storage structures and access
paths for the database files to achieve good performance for the various database applications.
Each DBMS offers a variety of options for file organizations and access paths. Once a specific
DBMS is chosen, the physical database design process is restricted to choosing the most
appropriate structures for the database files from among the options offered by that DBMS. In
this section we give generic guidelines for physical design decisions; they hold for any type of
DBMS. The following criteria are often used to guide the choice of physical database design
options:
1. Response time. This is the elapsed time between submitting a database transaction for
execution and receiving a response.
2. Space utilization. This is the amount of
1.1 Procedures of database
Design storage space used by the database files and
their access path structures on disk, including indexes and other access paths.
3. Transaction throughput. This is the average number of transactions that can be
processed per minute; it is a critical parameter of transaction systems such as those used
for airline reservations or banking. Transaction throughput must be measured under peak
conditions on the system.

Phase 6: Database System Implementation and Tuning


After the logical and physical designs are completed, we can implement the database system.
This is typically the responsibility of the DBA and is carried out in conjunction with the database
designers. Language statements in the DDL are compiled and used to create the database
schemas and (empty) database files. The database can then be loaded (populated) with the data.
If data is to be converted from an earlier computerized system, conversion routines may be
needed to reformat the data for loading into the new database.
Database programs are implemented by the application programmers, by referring to the
conceptual specifications of transactions, and then writing and testing program code with
embedded DML (data manipulation language) commands. Once the transactions are ready and
the data is loaded into the database, the design and implementation phase are over and the
operational phase of the database system begins.
A. Tools and equipment (requirement)
I. Personal computer
II. Graphics design or drawing tool like viso, edrawmax or MS office
B. Procedures/Steps/Techniques
1. Defining Database Objective
2. Database Design Team and Other Stake Holders
3. Mapping Business Process, Rules and Policies
4. Defining User Requirements
5. Creating Data Model
6. Construct Conceptual Data Model
 Only database entities and abstract relationships are visible and attributes are not
visible.
 No special software is needed to draw conceptual data model.
 It can be drawn on a piece of paper or whiteboard
7. Construct Logical Data Model
 The next step in the database design is to construct the logical data model.
 The logical data model is created by expanding the conceptual data model with
some additional details.
 The logical structure of the database is represented by the number of interrelated
tables.
 The relationship between these tables is established by defining the database keys.
8. Database Normalization
a. 1NF
b. 2 NF
c. 3NF
9. Defining Database Requirements Specifications

LG-59 LAP TEST-1 Performance Test


Name……………………………………………ID…………………Date………………….
Time started: ________________________ Time finished: ________________
Instructions: Given the necessary templates, tools, and materials you are required to perform the
following tasks within 5 hours. The given time may be adjusted accordingly by the [Link] is
recommended to perform this operation in group of students
Task-1 Perform database design for any health facilities in your local setup
Self-Check – 1 Written test

Name………………………………………ID…………………Date……………….
Directions: Read the question carefully then choose the best answer among the given alternative
Test I Choose the best answer
1. Assume you are the database end-user and your main job function revolves around constantly
querying and updating the database, using standard types of queries based on this you are
considered as
A. Causal user B. sophisticated user C. Naïve User D. Standalone User
2. The main difference between file system and database approach is
A. In database approach these files will not be having any inter-relationship
B. In database approach simultaneous access of file is possible
C. In file system approach simultaneous access of file is possible
D. Searching is easy in file system than database approach
3. Relational database property ensures the data operation will complete either with success or
with failure.
A. Isolation B. Atomicity C. Durability D. consistency
4. Which relational database software use SQL as query language
A. MySQL B. Microsoft SQL Server C. Oracle D. PostgreSQL
5. in the entity-relationship diagram (ERD) relationship can be represented by
A. diamond B. rectangle C. circle D. Triangle
6. assume you normalize a database and remove any non-key attributes that only depend on part
of the table key to a new table when to do this?
A. 3rd normal form B. 1st normal form C. 2nd normal form D. 4th normal form

Test II: Short Answer Questions


1. List and explain the ACID property of database
2. List and explain characteristics of database approach

LG #60 LO #2- Develop and Use Simple Relational


Database and Apply Queries.
Instruction sheet
This learning guide is developed to provide you with the necessary information regarding the following
content coverage and topics:
 Installing and Configuring DBMS
 Developing Simple Relational Database
 Writing Queries
 Designing a Report Layout
This guide will also assist you to attain the learning outcome stated on the cover page. Specifically,
upon completion of this Learning Guide, you will be able to:
 Install and Configure DBMS (SQL Server)
 Develop Simple Relational Database
 Write Queries
 Design a Report Layout
Learning Instructions:
1 Read the specific objectives of this Learning Guide.
2 Follow the instructions described below 3 to 6.
3 Read the information written in the information “Sheet 1”
4 Accomplish the “Self-check 1”
5 Do the operation sheet
LG-60 Information Sheet 1

2.1 Install And Configure Relational Database Management System (RDBMS)


2.1.1 Introduction To Structural Query Language (SQL) Server
Relational Database management system and DBMS software are different concept. DBMS
software is relating to application. There is many DBMS software available today. here are some
of the most common RDBMS software: -
 SQL Server (like the backend of DAGU2 software uses)
 ORACLE
 MySQL
 SQLite
 MariaDB
 PostgreSQL (like the backend of DHIS2 and HRIS Application Uses)
All DBMS software listed above use common language called SQL (structural Query language).
Among different types of DBMS software for this level we focused on SQL Server
2.1.2 SQL Server Editions
[Link] Editions: - delivers comprehensive high-end datacenter capabilities
with blazing-fast performance, unlimited virtualization1, and end-to-end
business intelligence enabling high service levels for mission-critical workloads
and end-user access to data insights.
[Link] Editions: - delivers basic data management and business intelligence
database for departments and small organizations to run their applications and
supports common development tools for on-premises and cloud enabling
effective database management with minimal IT resources.
[Link] Editions: - is a low total-cost-of-ownership option for Web hosters and
Web VAPs to provide scalability, affordability, and manageability capabilities
for small to large-scale Web properties.
[Link] Editions: - lets developers build any kind of application on top of
SQL Server. It includes all the functionality of Enterprise edition, but is licensed
for use as a development and test system, not as a production server. SQL
Server Developer is an ideal choice for people who build and test applications.
E. Express Editions: - is the entry-level, free database and is ideal for learning and
building desktop and small server data-driven applications. it is the best choice
for independent soft-ware vendors, developers, and hobbyists building client
applications. If you need more advanced database features, SQL Server Express
can be seamlessly upgraded to other higher end versions of SQL Server.
2.1.3 Choosing SQL Server Features
The common features of SQL server are summarized on the following table. all these features
may not be available in all edition of the SQL server.
Server Features Description
SQL Server SQL Server Database Engine includes the Database Engine, the core service for
Database storing, processing, and securing data, replication, full-text search, tools for
Engine managing relational and XML data, in database analytics integration.
Analysis Analysis Services includes the tools for creating and managing online analytical
Services processing (OLAP) and data mining applications creating, managing, and
deploying tabular, matrix, graphical, and free-form reports.
Master Data aster Data Services (MDS) is the SQL Server solution for master data
Services management. MDS can be configured to manage any domain (products,
customers, accounts) and includes hierarchies, granular security, transactions, data
versioning, and business rules, as well as an Add-in for Excel that can be used to
manage data.
Machine Machine Learning Services (In-Database) supports distributed, scalable machine
Learning learning solutions using enterprise data sources. In SQL Server 2016, the R
Services (In language was supported. SQL Server 2019 (15.x) supports R and Python.
Database)
Machine Machine Learning Server (Standalone) supports deployment of dis-tributed,
Learning scalable machine learning solutions on multiple platforms and using multiple
Server enterprise data sources, including Linux and Hadoop. In SQL Server 2016, the R
(Standalone) language was supported. SQL Server 2019 (15.x) supports R and Python.
SQL Server- SQL Server Management Studio (SSMS) is an integrated environment to access,
Management configure, manage, administer, and develop components of SQL Server. SSMS
Studio lets developers and administrators of all skill levels use SQL Server. The latest
edition of SSMS updates SMO, which includes the SQL Assessment API.
SQL Server SQL Server Configuration Manager SQL Server Configuration Manager provides
Configuration basic configuration management for SQL Server services, server protocols, client
Manager protocols, and client aliases.
SQL Server SQL Server Profiler SQL Server Profiler provides a graphical user interface to
Profiler monitor an instance of the Database Engine or Analysis Services.
Database Database Engine Tuning Advisor helps create optimal sets of indexes, indexed
Engine Tuning views, and partitions.
Advisor
SQL Server SQL Server Data Tools SQL Server Data Tools provides an IDE for building
Data Tools solutions for the Business Intelligence components: Analysis Services, Reporting
Services, and Integration Services.
Connectivity Installs components for communication between clients and servers, and network
Components libraries for DB-Library, ODBC, and OLE DB.

2.1.4 Installation Requirement for 2012&2014


 Windows machine having 2 core and 4 GB RAM configuration
 Setup support files.
 .net framework 4.0
 SQL Server native client.
 Windows installer 4.5/later version.
 Actual hard disk space requirements depend on your system configuration and the
features that you decide to install. for example, database engine with R Services
require 2744 MB disk space.
 the rest installation steps will be discussed in detail on the next operation sheet of
this learning outcome.
2.2 Developing a Simple relational database
2.2.1 Introduction to database development
Once logical and physical design is done, the process moves on to the actual database
development. By then, it's only a matter of bringing the database design to life and developing it
according to the schema specifications. design standards must be followed precisely so as to
limit the chances of data anomalies.
Normally the goal of database development is to create a database which contains a high quality
of data and provides the important access to the organization. Beside this there are certain other
goals of database development. These are:
 Develop a common vocabulary
 Define a meaning of data
 To ensure the quality of data
 To find an efficient implementation of data
Skills in Database Development
as a database designer, we have the following two types skills in database development.
A. Soft skill: It is the first skill in database development. These types of skills are qualitative,
subjective and people oriented. qualitative skill emphasizes the generation of feasible
alternatives.
B. Hard skill: This is the second type of skill in database development. This type of skill is
quantitative, objective and data oriented. The quantitative disciplines are statistics,
operational management or any mathematical model.
A database is a structured form of data held in compatible software within computers for
performing specific functions more proficiently.
Steps for developing a database are simple and consist of four stages namely
 Requirements Elicitation
 Conceptual Modeling
 Logical Modeling
 Physical Modeling.
It is significant to understand that Database Management is your key to handle big chunks of
data more effectively. Over a course of time, there occur many malfunctions and errors within
databases when left unchecked. A Professional Database Developer can organize different
entities and maximizes storage for data. This might sound not so tough but when the
implementation of Database Development and Database Management, the data is not streamlined
then it can result in potential data loss. It is recommended to store data timely and in the form of
categories over time and again to avoid any sort of misadventure.
2.2.2 Process of Database Development
Database developmental procedure involves careful study of the business domain and running
projects in line. It is just like building a house from scratch and estimating what budget you need
to construct even about a single brick that goes in building material. From analyzing to creating
essential formats including specific hardware or software available for the development of a
company’s prestigious databases, Implementation may be staged, usually with an initial system
that can be validated and tested before a final system is released for use.
Sectioning and storing of data are done with the help of the newest of versions and introduces
definitive possibilities to complex needs of businesses forming compliant solutions for unique
database requirements of different departments and functional units, both independently and in a
consolidated form.
Latest software and technological trends such as Oracle, MySQL, Teradata, SQL Server, IBM
DB2, Sybase, Netezza, PostgreSQL, and much more are used to convert, process, and store data
into large databases.
2.3 Writing Queries
2.3.1 Types of SQL Commands
The following sections discuss the basic categories of commands used in SQL to perform various
functions. These functions include building database objects, manipulating objects, populating
database tables with data, updating existing data in tables, deleting data, performing database
queries, controlling database access, and overall database administration. The main categories
are:
A. DDL (Data Definition Language)
B. DML (Data Manipulation Language)
C. DCL (Data Control Language);;
D. TCL (Transactional control language)
A. Data Definition Language
Data Definition Language, DDL, is the part of SQL that allows a database user to create and
restructure database objects, such as the creation or the deletion of a table. Some of the most
fundamental DDL commands include the following:
 CREATE: creates a new database
 ALTER: modifies
 DROP: deletes database/table
We specify a database schema by a set of definitions expressed by a special language called a
data-definition language (DDL). For instance, the following statement in the SQL language
defines the account table:
Create table account
(Account number char (10),
Balance integer)
Execution of the above DDL statement creates the account table. In addition, it updates a special
set of tables called the data dictionary.
B. Data Manipulation Language
Data Manipulation Language, DML, is the part of SQL used to manipulate data within objects of
a relational database. There are three basic DML commands:
 INSERT: inserts new data
 UPDATE: updates data
 DELETE: deletes record from a database
 SELECT: retrieves data from the database
A data manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model.
Select customer. customer_name from customer
Were customer. customer_id= '192-83-7465'
The query specifies that those rows from the table customer where the customer-id is 192-83-
7465 must be retrieved, and the customer_name attribute of these rows must be displayed
C. Data Control Language
Data control commands in SQL allow you to control access to data within the database.
These DCL commands are normally used to create objects related to user access and also control
the distribution of privileges among users. Some data control commands are as follows:
 GRANT: grants permissions on a securable to a principal
 REVOKE: removes a previously ranted or denied permission
 EXCUTE AS, statement: sets the execution context of ta session
 EXCUTE AS, clause: defines the execution context of the following user-defined
modules: functions (except inline table- valued functions), procedures, queries and
triggers
D. Transactional control Language
Transactional control language (TCL), used to manage the changes made by DML statements. It
allows statements to be grouped together into logical transactions.
E.g. Commit, roll back and save point
 BEGIN TRANSACTION: marks the starting point of an explicit, logical transaction.
 COMMIT TRANSACTION: marks the end of a successful implicit or explicit
transaction
 ROLL BACK TRANSACTION: rolls back an explicit or implicit transaction to the
beginning of the transaction, or to a save point inside the transaction
 SAVE TRANSACTION: sets a save point with in a transaction.
2.3.2 Writing Queries to develop relational database
[Link] Write query to Create a Database
Creating a database involves using the SQL statement CREATE DATABASE. This statement
has the following general form.
CREATE DATABASE db_name
For the syntax above optional items appear in brackets [ ]. Items written in braces, { }, and
followed by “...” are items that can be repeated any number of times. db_name is the name of the
database. The maximum size of a database name is 128 characters. The maximum number of
databases managed by a single system is 32,767. All databases are stored in files. These files can
be explicitly specified by the system administrator or implicitly provided by the system.
For example
For the database designed earlier the database can physically create by the command:
Create database Hospital
[Link] Writing query to Create Tables
The CREATE TABLE statement creates a new base table with all corresponding columns and
their data types. The basic form of the CREATE TABLE statement is
CREATE TABLE table_name
(col_name1 type1 [(size)] [NOT NULL| NULL]
[{, col_name2 type2 [(size)] [NOT NULL| NULL]} ...]
[{, CONSTRAINT constraint name constraint type [constraint attributes])
table_name is the name of the created base table. The maximum number of tables per database is
limited by the number of objects in the database (there can be more than 2 billion objects in a
database, including tables, views, stored procedures, triggers, and constraints). col_name1,
col_name2,... are the names of the table columns. type1[(size)], type2[(size)],... are data types
and size of corresponding columns. Constraint name is name of the created constraint and
constraint type defines the type of constraint you are creating (Primary key/foreign key).
In the database created by the name selam, we can create the patient table as follows.
Table Definition example
Create table patient (MRN char(10) not null, FirstName
varchar(20) not null, MiddleName varchar(20), LastName
varchar(20), Sex char(6), BirthDate date ,city varchar (20)
Woreda varchar(20), Kebele varchar (20)constraint patient_pk
primary key (MRN))
Note that the constraint patient_pk defines MRN to be the primary key of the table.
[Link] Adding or Dropping a New Column
You can use the ADD clause of the ALTER TABLE statement to add a new column to the
existing table. Only one column can be added for each ALTER TABLE statement.
For example
ALTER TABLE patient
ADD telephone_no CHAR(12) NULL;
ALTER TABLE patient
ADD registered_date DATE;
The ALTER TABLE statements above separately add columns telephone_no and registered_date
to the patient table
You can use the DROP COLUMN clause to drop an existing column of a table.
ALTER TABLE patient
DROP COLUMN telephone_no;
The ALTER TABLE statement above removes the telephone_no column, which was added to
the patient table with the ALTER TABLE statement.
[Link] Inserting data into a Table
The insert statement is used to insert or add a row of data into the table.
To insert records into a table, enter the key words insert into followed by the table name,
followed by an open parenthesis, followed by a list of column names separated by commas,
followed by a closing parenthesis, followed by the keyword values, followed by the list of values
enclosed in parenthesis. The values that you enter will be held in the rows and they will match up
with the column names that you specify. Strings should be enclosed in single quotes, and
numbers should not.
Insert into "tablename" (first_column,...last_column) values
(first_value...last_value);
In the example below, the column name firstName will match up with the value 'Alemu',
and the column name city will match up with the value 'Addis Ababa'.
Example:
Insert into patient (FirstName MiddleName LastName Sex
BirthDate
city woreda kebele) values (‘Alemu’, ‘Zelalem’, ‘Molla’,’M’,
‘2/3/2000’, ‘Addis Ababa’ 06 24)
[Link] Updating Records
The update statement is used to update or change records that match a specified criterion. This is
accomplished by carefully constructing a where clause.
update "tablename" set "columnname" = "newvalue"
[,"nextcolumn" = "newvalue2"...]
where "columnname" OPERATOR "value"
[and|or "column" OPERATOR "value"];
[] = optional
[The above example was line wrapped for better viewing on this Web page.]
Examples:
update patient
set MiddleName = Tolla
where FirstName = Alemu;
[Link] SQL SELECT Statement
The select statement is used to query the database and retrieve selected data that match the
criteria that you specify. Here is the format of a simple select statement:
select "column1" [,"column2",etc] from "tablename"
[where "condition"];[] = optional
The column names that follow the select keyword determine which columns will be returned in
the results. You can select as many column names that you'd like, or you can use a "*" to select
all columns.
The table name that follows the keyword from specifies the table that will be queried to retrieve
the desired results.
The where clause (optional) specifies which data values or rows will be returned or displayed,
based on the criteria described after the keyword where.
Conditional selections used in the where clause:
= Equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
<> Not equal to
LIKE *See note below
The LIKE pattern matching operator can also be used in the conditional selection of the where
clause. Like is a very powerful operator that allows you to select only rows that are "like" what
you specify. The percent sign "%" can be used as a wild card to match any possible character that
might appear before or after the characters specified. For example:
select FirstName, MiddleName, LastName from patient were
FirstName LIKE 'Al%';
2.4 Designing Report layout
Report a set of attributes, in a predetermined format, based on many unrelated records. A report
usually contains the same attributes about each record. For example, a report might list the
product identifier, current year sales, and current year sales goal for all the products for which
sales are below goal. A report usually includes page numbers, titles on each page, the date the
report was printed, and other descriptive information.
Most good business applications contain a built-in reporting tool; this is simply a front-end
interface that calls or runs back-end database queries that are formatted for easy application
usage. For example, eHMIS (Electronic Health Management Information System) software
application may contain specifically defined reports on Health Center or Clinic or Hospital
Monthly Inpatient (IPD) disease report.
2.4.1 Setting conditions to generate a report
In simple terms a report is a database report presents information retrieved from a table or query
in a preformatted, attractive manner. To see with an example, assume that the database created at
the beginning of this topic has the following information. Look at the table found below. The
table holds patient information that got treatment in a given hospital. from the database it is
possible patients that came from Addis Ababa city
Table 14 : patient information

FirstNam MiddleName LastName Sex BirthDate city Woreda Kebele


e
Alemu Zelalem Molla M 2/3/2000 Addis Ababa 06 24

Temesgen Kebede Abebe M 5/3/1991 Dessie 05 15

Ali Jemal Kebede M 6/4/1995 Addis Ababa 04 17

Meserete Ayalew Yared F 4/5/1980 Shashemene 08 23

Tigist Tesfaye Mehari F 1/8/1978 Addis Ababa 15 27

So the query goes like all patients who came from Addis Ababa. The resulting output will be like
this as displayed in the following table
Table 15: patient information who came from Addis Ababa
FirstName MiddleName LastName Sex BirthDate city Woreda Kebele

Alemu Zelalem Molla M 2/3/2000 Addis Ababa 06 24

Ali Jemal Kebede M 6/4/1995 Addis Ababa 04 17

Tigist Tesfaye Mehari F 1/8/1978 Addis Ababa 15 27

Self-Check – 1 Written test

Name…………………………………………… ID………………………… Date……………...


Test I: Multiple Choice Questions
Directions: Read the question carefully then Incircle the best answer among the given
alternative
1. When you work at your health institution external stakeholder ask you to provide a list of
10 top disease from your database which data manipulation language should use for this
query
A. Insert
LG-60 Operation Sheet -1 Procedures of database Installation B.
Drop
C. Select D. Create
2. While working at Health System software like DHIS, higher system administers remove
you from site admin role this is resemble to which SQL data control language
A. Grant B. Revoke C. rollback D. Drop
Test I: Short Answer Questions
Directions: Read the question and provide a short answer for the following question
1. list and explain the common types SQL Language

1. Tools and equipment (requirement)


a. SQL Server 2012/14 setup file
b. Computer with a minimum of 2GB RAM and 2 core
c. Setup support files.
d. .net framework 4.0
e. SQL Server native client.
f. Windows installer 4.5/later version.
Note: use this video in addition to the following steps on installation steps of SQL Server.
[Link]
2. Procedures/Steps/Techniques
3. Download SQL Server [Link]
4. If you have internet connection, ask your instructor to provide the setup file by USB flash
disk, DVD, and external hard disk
5. Double-click the SQLFULL_x86_ENU_Install.exe or SQLFULL_x64_ENU_Install.exe.
6. Click the “SQLFULL_x86_ENU” or “SQLFULL_x64_ENU_Install.exe” folder then
double-click “SETUP” applicationChoose New SQL Server stand-alone installation

7. Click the first option of the right side seen on the above screen. This screen will open.
8. Click OK and the following screen pops up.

9. Click Next to get the following screen. Make sure to check the product key selection and
click Next.
10. Select the checkbox to accept the license option and click Next.

11. Select SQL Server feature installation option and click Next.
12. Feature selection, Select the components you want to install on your machine.
 Database Engine Services: Allow you to install SQL Server instance.
 Analysis Services: install an Analysis Services instance on standalone or on
cluster node.
 Reporting Services: Allow you to install the server as report server.
 SQL Server Data Tool: Allow you to install SQL Server Developer tool to work
with integration packages.
 Integration Services: Allow you to install Integration Services.
 Management Tool: Allow you to install SQL Server management configuration
tool including command line and power shell tool.
If you selected “All Features with Default” in previous step, all these components will be
checked automatically.

13. after selecting the features to install, setup again runs a check to ensure whether your
machine’s configuration is compatible. If all looks good, click Next.

14. Instance Configuration: this step will ask, what type of instance you want to configure, as
we all know, either we can install default or Named instance.
If default instance is already installed, you’ll have to have proceeded with named
instance.

15. Disk Space requirement at this step, you’ll get disk space summary which will show how
much disk space your instance will take on the machine.
16. Server Configuration On this step you’ll find options to specify Service Accounts and
Collation Configuration.
17. Select service account names and start-up types for the above listed services and click
Collation.

18. Make sure the correct collation selection is checked and click Next.
19. Make sure authentication mode selection and administrators are checked and click Data
Directories.

20. Make sure to select the above directory locations and click Next. The following screen
appears.
21. Click Next on the above screen.

22. Click Next on the above screen to the get the following screen.
23. Make sure to check the above selection correctly and click Install.
LG- 60 LAP TEST-1 Performance Test
Name………………………………………………….ID…………………Date
Time started: ________________________ Time finished: ________________
Instructions: Given the necessary templates, tools, and materials you are required to perform the
following tasks with in 3hr. The project is expected of each student to do it.
Task-1 Perform SQL Server 2012 Installation
LG-60 Operation Sheet 2

2.1 Create Mid-Sized Database


2.2 steps to create a database using T-Sql
1. Connect to the Database Engine.
2. From the Standard bar, select New Query.
3. Write these queries to create the database
Create database Primary Hospital
4. Execute these queries by right click on the empty workspace then click on execute
5. Create table Physician by writing this query
Create table Physician
(Full_name varchare(20),
Physician_Id int primary key,
Sex Varchar (7) not null,
Address varchar unique)
Then Execute
6. Create table Physician by writing this query
Create table patient
(Full_name varchare(20),
MRN int primary key,
Sex Varchar (7) not null,
Address varchar unique,
Age int)
Then Execute
7. Create table diagnosis by writing this query
Create table diagnosis
(diagnosis_id identity(0,1),
NCOD varchar(100).
MRN int foreign key,
Physician_Id int foreign key)
LAP TEST Performance Test

Name…………………………………………….ID………………………Date…………….
Time started: ________________________ Time finished: ________________
Instructions: Given necessary templates, tools and materials you are required to perform the
following tasks within 3 hours. The project is expected from each student to do it.
Task-1 Create Database Health_Science_College
Task-2 based on the information provided below create table called DEPARTMENT
Column Data Type Size Constraint
Did CHAR 4 PK, NOT NULL
Dname VARCHAR 30 UNIQUE, NOT NULL
Dlocation VARCHAR 50

Task-3 based on the information provided below create table called DEPARTMENT
Column Data Type Size Constraint
Sid CHAR 10 PK, NOT NULL
Fname VARCHAR 30 NOT NULL
Lname VARCHAR 30 NOT NULL
Sex CHAR 1 DEFAULT ‘F’, must be either ‘F’ or ‘M’
Year_of_Study INTEGER NOT NULL
Dbirth DATE
Age INETEGER Computed or derived from date of birth and current date
Sem_Payment DECIMAL (6, 2)
Paid CHAR 3 Computed or derived from Semester_Payment
Did CHAR 4 FK

Note: The value of ‘Paid’ column is derived from the value of ‘Semester_Payment’ and should be:
 ‘Yes’, if Semester_Payment = NOT NULL
 ‘No’, if Semester_Payment = NULL
Task-4 based on the information provided below create table called Course

Column Data Type Size Constraint


Cno CHAR 10 PK, NOT NULL
Cname VARCHAR 50 UNIQUE, NOT NULL
Credit_hours INTEGER BETWEEN 2 AND 4 inclusive
Did CHAR 4 FK
Task-5 based on the information provided below create table called Section
Column Data Type Size Constraint
Section_id INTEGER IDENTITY(10, 1), PK , NOT NULL
Course_number CHAR 10 FK, NOT NULL
Semester CHAR 4 NOT NULL
Academic_year INT Must be 2015 – 2018, DEFAULT: System year
Instructor VARCHAR 10
Task-6 based on the information provided below create table called Grade_Report

Column Data Type Size Constraint


Stid CHAR 10 PK, FK, NOT NULL
Section_id INTEGER PK, FK, NOT NULL
Grade CHAR Must be one of {‘A’, ‘B’, ‘C’, ‘D’, ‘F’}
Remark Computed or derived from Grade
Note: The value of ‘Remark’ column is derived from the value of Grade column and
should be:
 ‘Excellent’, if Grade = ‘A’  ‘Poor’, if Grade = ‘D’
 ‘Very Good’, if Grade = ‘B’  ‘Fail’, if Grade = ‘F’
 ‘Good’, if Grade = ‘C’
Task-7 based on the information provided below create table called Prerequisite
Column Data Type Size Constraint
Course_number CHAR 10 PK, FK, NOT NULL
Prno CHAR 10 PK, NOT NULL
Task-8 Insert all records to each table as given below.
Task-9 based on the information provided below create table called Department

Did Dname Dlocation


D01 Computer Science ethiopia
D02 Health Informatics ethiopia
D03 Information Technology ethiopia
Task-10 based on the information provided below enter data to student table

Sid Fname Lname Sex Yrs Dbirth Age Sem_Payment Paid Did
et001 Andinet Tefera M 1 1998-10-07 2200.00 Yes D03
et002 Sofiya Ahmed F 2 1995-02-02 2580.25 Yes D01
Et003 Asnake Wondim M 3 1995-07-18 2500.00 Yes D02
et004 Genet Lema F 4 1994-02-13 2650.75 Yes D01
et005 Genet Daniel F 4 1995-09-09 2350.25 Yes D03
et006 Solomom Tadesse M 4 1993-12-06 NULL No D02
et007 Adane Eyayu M 4 1994-09-01 2600.75 Yes D02
et008 Kidanu Belete M 1 1997-05-23 NULL No D03
et009 Seble Tefera F 2 1996-04-11 2580.25 Yes D01
et010 Meron Hailu F 4 1994-07-28 2600.75 Yes D02
et011 Jemal Hassen M 1 1995-01-25 2450.00 Yes D02
et012 Semira Kalid F 3 1995-03-16 2500.00 Yes D02

Task-11 based on the information provided below enter data to Course table

Cno Cname Credit_hours Did


CS1310 Intro to Computer Science 4 D01
IT3320 C++ 3 D02
MATH2410 Discrete Mathematics 4 D03
IT3141 Fundamentals of DBS 3 D02
IT3142 Advanced Database 4 D02
IT4930 Internet Programming 3 D02
CS3380 OOP with Java 4 D01
IT3445 Computer Networking 4 D02

Task-12 based on the information provided below enter data to Section table

Section_id Course_number Semester Academic_year Instructor


10 MATH2410 I 2016 Melkamu
11 CS1310 I 2016 Kemila
12 IT3320 II 2017 Abrham
13 MATH2410 I 2018 John
14 IT3445 II 2017 Saba
15 CS1310 I 2017 Teferi
16 CS3380 II 2015 Sara
17 IT3141 I 2018 Mohammed
18 IT4930 II 2015 Roza
19 IT3142 II 2018 Mohammed

Task-13 based on the information provided below enter data to PREREQUISITE table

Course_number Prno
CS3380 IT3320
CS3380 MATH2410
IT3142 IT3141
IT3320 CS1310

Task-14 based on the information provided below enter data to GRADE_REPORT table
Stid Section_id Grade Remark
GU5501/10 13 B Very Good
GU5004/08 14 C Good
GU5202/09 16 A Excellent
GU5202/09 11 A Excellent
GU5003/08 14 A Excellent
GU5503/10 12 B Very Good
GU5202/09 15 A Excellent
GU5503/10 17 A Excellent
GU4504/07 15 C Good
GU5503/10 18 C Good
GU5501/10 10 B Very Good
GU4508/07 19 B Very Good
GU4505/07 10 A Excellent
GU4506/07 18 C Good
GU5004/08 19 A Excellent
GU4506/07 14 B Very Good
GU4507/07 12 A Excellent
GU4507/07 14 A Excellent
GU5508/10 10 B Very Good
GU5209/09 11 A Excellent
GU5202/09 13 B Very Good
Task 15 Specify the following queries in SQL.
1. Retrieve all students whose first name starts with ‘S’.
2. Retrieve all records of the first five students.
3. Retrieve grade and student id whose grade is either ‘B’, ‘C’ or ‘D’.
4. Retrieve the names of all senior students whose department is health informatics.
5. Retrieve all students where semester payment is null.
6. Retrieve the names and course number of all courses taught by Mr. Mohammed.
7. Retrieve the course number and name of all courses which have prerequisite course.
8. Retrieve all female students whose department is either computer science or health
informatics.
9. Retrieve the list of students with student id, student name, grade and remark. The list
should be ordered alphabetically by student first name, then last name.
10. Retrieve the course number and name of all courses which have no prerequisite course.
11. For each section taught by Mr. Mohammed, retrieve the course number, semester,
academic year, and number of students who took the section.
12. Retrieve the names and department ID of all students who do not have any grade of ‘A’
in any of their courses.
13. Find the sum of the semester payment, the maximum payment, the minimum payment,
and the average payment of all students.
14. Find the sum of the semester payment, the maximum payment, the minimum payment,
and the average payment of all students whose department is health informatics.
15. Retrieve the number of students who score grades ‘A’ in any course.
16. Count the number of distinct semester payment values in the database.
17. Change the year of study of a student to 3 whose id is ‘GU4508/07’.
18. Delete the record for the student whose first name is ‘Lema’
and whose student id is ’GU5001/10’.
Task 16 Create the following views based on the given specification.
1. Create a view called ‘Student_ Transcript’ that has course name, course number, grade,
semester, academic year, and section identifier.
LG #61 LO #3-Ensure data entry and integrity
Instruction sheet
This learning guide is developed to provide you with the necessary information regarding the
following content coverage and topics:
 Designing user friendly data entry form
 Checking application of data entry procedure based on institutional guideline/manual.
 Checking Data integrity rules for completeness and accuracy
 Identifying and implementing Data security, confidentiality and integrity mechanisms
This guide will also assist you to attain the learning outcome stated on the cover page.
Specifically, upon completion of this Learning Guide, you will be able to:
 Design user friendly data entry form
 Check application of data entry procedure
 Check Data integrity rules for completeness and accuracy
 Identifying and implementing Data security, confidentiality and integrity mechanisms

Learning Instructions:
1 Read the specific objectives of this Learning Guide.
2 Follow the instructions described below 3 to 6.
3 Read the information written in the information “Sheet 1”
4 Accomplish the “Self-check 1”
5 If you earned a satisfactory evaluation from the “Self-check” proceed to “Operation Sheet
LG-61 Information Sheet 1

1.1 Designing User-Friendly Data Entry Form


1.1.1 Introduction to form
A form can have one primary data connection, called the main data connection, and it can
optionally have one or more secondary data connections. Depending on your goals for the form,
a data connection might query or submit form data to an external data source, such as a
Microsoft SQL Server database or a Web service.

When you design a form template that is based on a database, Microsoft Office InfoPath creates
a main data source with groups that contain query fields and data fields and a query data
connection as the main data connection for the form template. These fields and groups
correspond to the way that data is stored in the tables in the database. Query fields contain the
data that is entered by a user to limit the query results to records that match the data in the query
fields. When a form based on this form template uses the main data connection, InfoPath creates
a query by using the data in the query fields. InfoPath then sends the query through the query
data connection. The database returns the results of the query back to the form through the query
data connection. The results of the query are put into data fields, which can be edited through
controls that are bound to these fields. A form can submit data to a database through the form's
main data connection if the form template that the form is based on and the database meet the
following requirements:

 The form template is not a browser-compatible form template InfoPath will not
create a submit data connection in the main data connection if you are designing a
browser-compatible form template. To allow users to submit data in a form that is
based on a browser-compatible form template, use a Web service that works with the
database.
 The left table in each pair of related tables in the main data source contains a
primary key At least one of the relationships for every pair of related tables must
include a primary key from the left table.
 None of the data fields in the main data source of the form store a large binary
data type InfoPath will disable the submit data connection if the query includes
fields that can store a large binary data type, such as pictures, images, OLE objects,
file attachments, the Office Access memo data type, or the SQL Text data type.
1.1.2 Compatibility considerations
When you design a form template that is based on a database, you have the option of designing a
browser-compatible form template. InfoPath will create a query data connection as the main data
connection in a browser-compatible form template. However, browser-compatible form
templates cannot be configured to allow users to submit data to a database. Therefore, if you are
designing a form template based on a SQL Server database and you want your users to submit
their form data to the database through the main data connection, you cannot make that form
template browser-compatible.
[Link] Before you begin
In order to design your form template based on a SQL Server database, you need the following
information from your database administrator:
The name of the server that contains the database that you will connect your form template to.
 The name of the database that you will use with this form template.
 The authentication required by the database. The database can use either Microsoft
Windows authentication or SQL Server authentication to determine how users can access
the database.
 The name of the table that contains the data that you want to send to the form or that will
receive data from the form. This is the primary table. If you are going to use more than
one table in the database, you need the names of those other, child tables. You also need
the names of the fields in the child tables that have relationships with the fields in the
primary table

1.1.3 Design the Form Template


To design a form template with a query data connection, you need to do the following:
A. Create the form template: When you create a form template that is based on a database,
InfoPath creates a query data connection as the main data connection between the form
template and the database. This process automatically creates the form template's main
data source.
B. Add one or more controls to display the query results: To allow users to see and edit
the data in the fields in the main data source when they open the form, you can add a
control to the form template and then bind that control to a field in the main data source.
Step 1: Create the form template
1. On the File menu, click Design a Form Template.
2. Under Design a new, in the Design a Form Template dialog box, click Form
template.
3. In the Based-on list, click Database.
4. If you are designing a browser-compatible form template, select the Enable browser-
compatible features only check box.
Note: The Data Connection Wizard will not enable the submit data connection in a browser-
compatible form template. To allow users to submit their data to a database from forms that are
based on a browser-compatible form template, add a secondary data connection to a Web
service that works with the database.

1. Click OK.
2. In the Data Connection Wizard, click Select Database.
3. In the Select Data Source dialog box, click New Source.
4. In the What kind of data source do you want to connect to list, click Microsoft SQL
Server, and then click Next
5. In the Server name box, type the name of the server with the SQL Server database.
6. Under Log on credentials, do one of the following:
 If the database determines who has access based on the credentials used in a
Microsoft Windows network, click Use Windows Authentication.
 If the database determines who has access based on a specified user name and
password that you get from the database administrator, click Use the following
User Name and Password, and then type your user name and password in the
User Name and Password boxes.
7. Click Next.
8. In the Select the database that contains the data you want list, click the name of the
database that you want to use, select the Connect to a specific table check box, click
the name of the primary table, and then click Next.
9. On the next page of the wizard, type a name for the file that stores the data connection
information in the File Name box, and then click Finish to save these settings.
Step 2: Add one or more controls to display the query results

1. If the Controls task pane is not visible, click More Controls on the Insert menu, or
press ALT+I, C.
2. Drag a control onto your form template.
3. In the Control Binding dialog box, select the group or field that you want to bind the
control to.
1.1.4 Configure the submit options
If your form template and the tables that you selected in the Data Connection Wizard meet the
requirements in the Overview section, InfoPath configures your form template to submit data
through its main data connection. If you choose to use this submit data connection, InfoPath
configures the form template so that users can submit their form data to the database and adds
both the Submit button to the Standard toolbar and the Submit command to the File menu
when users fill out the form. InfoPath also configures the form template so that, when users
submit their forms, the form remains open and a message is displayed to the user that indicates
whether the form was successfully submitted. You can change the text on the Submit button and
the behavior after the user submits a form.
1. On the Tools menu, click Submit [Link] change the name of the Submit button
that appears on the Standard toolbar and the Submit command that appears on the File
menu when users fill out the form, type the new name in the Caption box in the Submit
Options dialog box.
2. If you do not want people to use a Submit command or the Submit button on the
Standard toolbar when they fill out your form, clear the Show the Submit menu item
and the Submit toolbar button check box. By default, after users submit a form,
InfoPath keeps the form open and displays a message to indicate if the form was
successfully submitted. To change this default behavior, click Advanced, and then do
one of the following:
 To close the form or create a new blank form after the user submits a completed
form, click the option that you want in the After submit list.
 To create a custom message to indicate if the form was successfully submitted,
select the Use custom messages check box, and then type your messages in the
on success and On failure boxes.
 If you do not want to display a message after the user submits a form, clear the
Show success and failure messages check box

1.2 Checking application of data entry procedure based on institutional


guideline/manual.
1.2.1 Introduction to Data Entry

Data entry is a big job in Health Information professional and require greater accuracy of the
data what enter. For example, in DHIS2(District Health Information Software) Data entry is
done using datasets (Forms) for an intended organizational unit and for a specified period. So,
data entry requires form created by an application. When you perform data entry in Health
Institution always check the following data entry procedure check
 Proof reading
 Accuracy of data
 Outcome of sorting/filtering
 Ensuring instructions with regard to content and format have been followed
 Timeliness of data entry
 Checking data for completeness, accuracy and security using automated
facilities such as spell checking and sorting data.
1.2.2 Data Validation in SQL
A validation rule is based on an expression which defines a numeric relationship between data
element values. The expression forms a condition which should assert that certain logical criteria
are met. The expression consists of:
 a left side
 a right side
 an operator
A validation rule could assert that "Suspected malaria cases tested" >= "Confirmed malaria
cases".
The left and right sides must return numeric values. Data validation is the method for checking
the accuracy and quality of data.
It is often performed prior to adding, updating, or processing data.
When validating data, we can check if the data is:
 complete (no blank or null values)
 unique (no duplicate values)
 consistent with what we expect (eg a decimal between a certain range).

Some examples of how we’d use these checks include:


DHIS2 provides facilities to assign minimum and maximum values and it also help identify
outliers (example outbreak).
Some of you may be saying to yourself “data validation like this should be performed in the
application layer, not the database layer”. Of course, validation in the application layer is
crucial. But there are instances where updates could be performed directly in the database. It is
also good practice to make sure that the database has some form of data validation even it also
exists elsewhere. We’re going to take a look in more detail at the three constraints we’re most
likely to use to validate our data – the NOT NULL, UNIQUE, and CHECK constraints
1.2.3 Constraints in SQL
Constraints in SQL Server are rules that limit the data that goes into our tables. These
constraints ensure the validity of the data and help maintain the integrity of the database.
Constraints can be defined when tables are created or afterward and can apply to individual or
multiple columns.
If we insert data in the column that meets the constraint rule criteria, SQL Server inserts data
successfully. However, if data violates the constraint, the insert statement is aborted with an error
message.
A. NOT NULL Constraint
SQL Server allows NULL values, which translate to ‘unknown value’. There are valid use cases
for using NULLS, but there are also obvious cases where we can’t accept a NULL value. In
these cases, we can define a NOT NULL constraint on a column. In the example below, we are
creating an employee’s table, and specifying that all columns except ‘Middle_Name’ will not
accept NULLs.
CREATE TABLE Doctor
(DocID INT NOT NULL, FirstName Varchar(100) NOT NULL,
MiddleName Varchar(50) NULL, LastName Varchar(100) NOT
NULL,Gender char(1) NOT NULL, Address Varchar(200) NOT NULL);
For example, in DHIS2 every field of data entry should be filled otherwise you can’t pass to the
next field.

1.2.4 UNIQUE Constraint


We typically use the UNIQUE constraint on ID columns. All primary key is Unique by default
but all unique attribute may not be primary key. In the example below, we’re creating a simple
table and specifying that that ‘EmployeeID’ should be unique (and not NULL).
CREATE TABLE Employees
(EmployeeID INT NOT NULL UNIQUE,
FirstName Varchar(100) NOT NULL,
MiddleName Varchar(50) NULL,
LastName Varchar(100) NOT NULL,
Gender char(1) NOT NULL,
Address Varchar(200) NOT NULL);

1.2.5 CHECK Constraint


A check constraint consists of a logical expression to determine what values are valid. A simple
example would be in a payroll database where we want to specify a maximum value that can be
entered. The syntax for the CHECK constraint when creating a table is shown below.
CREATE TABLE table_name
(column1 datatype [ NULL | NOT NULL ],
column2 datatype [ NULL | NOT NULL ],
CONSTRAINT constraint_name CHECK (column_name condition));
A check constraint is a rule that identifies acceptable column values for data in a row within a
SQL Server table. Check constraints help enforce domain integrity. Domain integrity defines the
valid values for columns within a database table. A check constraint can validate the domain
integrity of a single column or a number of columns. You can have multiple check constraint for
a single column.

1.3 Checking Data Integrity Rules for Completeness and Accuracy


1.3.1 Introduction To Data Integrity
Data integrity is the overall accuracy, completeness, and consistency of data. Data integrity also
refers to the safety of data in regard to regulatory compliance and security. It is maintained by a
collection of processes, rules, and standards implemented during the design phase. When the
integrity of data is secure, the information stored in a database will remain complete, accurate,
and reliable no matter how long it’s stored or how often it’s accessed.
The importance of data integrity in protecting yourself from data loss or a data leak cannot be
overstated: in order to keep your data safe from outside forces with malicious intent, you must
first ensure that internal users are handling data correctly. By implementing the appropriate data
validation and error checking, you can ensure that sensitive data is never miscategorized or
stored incorrectly, thus exposing you to potential risk.
1.3.2 Types of data integrity
Maintaining data integrity requires an understanding of the two types of data integrity: physical
integrity and logical integrity. Both are collections of processes and methods that enforce data
integrity in both hierarchical and relational databases.

A. Physical integrity
Physical integrity is the protection of the wholeness and accuracy of that data as it’s stored and
retrieved. When natural disasters strike, power goes out, or hackers disrupt database functions,
physical integrity is compromised. Human error, storage erosion, and a host of other issues can
also make it impossible for data processing managers, system programmers, applications
programmers, and internal auditors to obtain accurate data.

B. Logical integrity
Logical integrity keeps data unchanged as it’s used in different ways in a relational database.
Logical integrity protects data from human error and hackers as well, but in a much different
way than physical integrity does. There are four types of logical integrity:
C. Entity integrity Entity integrity relies on the creation of primary keys. the unique
values that identify pieces of data to ensure that data isn’t listed more than once and
that no field in a table is null. It’s a feature of relational systems which store data in
tables that can be linked and used in a variety of ways.
D. Referential integrity Referential integrity refers to the series of processes that
make sure data is stored and used uniformly. Rules embedded into the database’s
structure about how foreign keys are used ensure that only appropriate changes,
additions, or deletions of data occur. Rules may include constraints that eliminate
the entry of duplicate data, guarantee that data entry is accurate, and/or disallow the
entry of data that doesn’t apply.
E. Domain integrity Domain integrity is the collection of processes that ensure the
accuracy of each piece of data in a domain. In this context, a domain is a set of
acceptable values that a column is allowed to contain. It can include constraints and
other measures that limit the format, type, and amount of data entered.
F. User-defined integrity User-defined integrity involves the rules and constraints
created by the user to fit their particular needs. Sometimes entity, referential, and
domain integrity aren’t enough to safeguard data. Often, specific business rules
must be taken into account and incorporated into data integrity measures.

1.4 Identifying and implementing Data security, confidentiality and integrity


mechanisms
1.4.1 Data security
Database security refers to the range of tools, controls, and measures designed to establish and
preserve database confidentiality, integrity, and availability. Database security in its broad sense
it deals about protection of the data against accidental or intentional loss, destruction, or misuse.
Database security is a complex and challenging endeavor that involves all aspects of information
security technologies and practices. It’s also naturally at odds with database usability. The more
accessible and usable the database, the more vulnerable it is to security threats; the more
invulnerable the database is to threats, the more difficult it is to access and use. Increased
difficulty due to Internet access and client/server technologies are arising.
A. Legal and ethical issues regarding the right to access certain information. Some
information may be deemed to be private and cannot be accessed legally by unauthorized
persons.
B. Policy issues at the governmental, institutional, or corporate level as to what kinds of
information should not be made publicly available—for example, credit ratings and
personal medical records.
C. System-related issues such as the system levels at which various security functions
should be enforced—for example, whether a security function should be handled at the
physical hardware level, the operating system level, or the DBMS level.
D. The need in some organizations to identify multiple security levels and to categorize the
data and users based on these classifications—for example, top secret, secret,
confidential, and unclassified. The security policy of the organization with respect to
permitting access to various classifications of data must be enforced.
Database security must address and protect the following:

 The data in the database


 The database management system (DBMS)
 Any associated applications
 The physical database server and/or the virtual database server and the underlying
hardware
 The computing and/or network infrastructure used to access the database

1.4.2 Common threats and challenges


Threats to data security may be direct threats to the database. For example, those who gain
unauthorized access to a database may then browse, change, or even steal the data to which they
have gained access. Focusing on database security alone, however, will not ensure a secure
database. All parts of the system must be secure, including the database, the network, the
operating system, the building(s) in which the database resides physically, and the personnel who
have any opportunity to access the system. Many software misconfigurations, vulnerabilities, or
patterns of carelessness or misuse can result in breaches. The following are among the most common
types or causes of database security attacks and their causes.

FIG 14 Possible locations of data security threats

The above diagrams many of the possible locations for data security threats, accomplishing this
level of security requires careful review, establishment of security procedures and policies, and
implementation and enforcement of those procedures and policies. The following threats must be
addressed in a comprehensive data security plan:
Accidental losses, including human, error, software, and hardware-caused breaches Establishing
operating procedures such as user authorization, uniform software installation procedures, and
hardware maintenance schedules are examples of actions that may be taken to address threats
from accidental losses.
As in any effort that involves human beings, some losses are inevitable, but carefully planned
policies and procedures should reduce the amount and severity of losses. Of potentially more
serious consequence are the threats that are not accidental
1.4.3 Common Types of Database Security attacks and causes
A. Insider threats
An insider threat is a security threat from any one of three sources with privileged access to the
database:
 A malicious insider who intends to do harm
 A negligent insider who makes errors that make the database vulnerable to attack
 An infiltrator an outsider who somehow obtains credentials via a scheme such as
phishing or by gaining access to the credential database itself
Insider threats are among the most common causes of database security breaches and are often
the result of allowing too many employees to hold privileged user access credentials.
B. Human error
Accidents, weak passwords, password sharing, and other unwise or uninformed user behaviors
continue to be the cause of nearly half (49%) of all reported data breaches.
C. Exploitation of database software vulnerabilities
Hackers make their living by finding and targeting vulnerabilities in all kinds of software,
including database management software. All major commercial database software vendors and
open-source database management platforms issue regular security patches to address these
vulnerabilities, but failure to apply these patches in a timely fashion can increase your exposure.
D. SQL/NoSQL injection attacks
A database-specific threat, these involve the insertion of arbitrary SQL or non-SQL attack strings
into database queries served by web applications or HTTP headers. Organizations that don’t
follow secure web application coding practices and perform regular vulnerability testing are open
to these attacks.
E. Buffer overflow exploitations
Buffer overflow occurs when a process attempts to write more data to a fixed-length block of
memory than it is allowed to hold. Attackers may use the excess data, stored in adjacent memory
addresses, as a foundation from which to launch attacks.
F. Denial of service (DoS/DDoS) attacks
In a denial of service (DoS) attack, the attacker deluges the target server in this case the database
server with so many requests that the server can no longer fulfill legitimate requests from actual
users, and, in many cases, the server becomes unstable or crashes.
In a distributed denial of service attack (DDoS), the deluge comes from multiple servers, making
it more difficult to stop the attack.
1.4.4 Types of Data Security
A. Data encryption.
Data encryption is a way of translating data from plaintext (unencrypted) to ciphertext
(encrypted). Users can access encrypted data with an encryption key and decrypted data with a
decryption key.
B. Data backup to the cloud.
C. Password protection.
D. identity and access management
E. intrusion detection and prevention software
1.4.5 Confidentiality and Integrity
A. Confidentiality
Confidentiality means that data, objects, and resources are protected from unauthorized viewing
and other access.
Confidentiality measures protect information from unauthorized access and misuse. Most
information systems house has some degree of sensitivity. It might be proprietary business
information that competitors could use to their advantage, or personal information regarding an
organization’s employees, customers, or clients.
Confidential information often has value and systems are therefore under frequent attack as
criminals hunt for vulnerabilities to exploit. Threat vectors include direct attacks such as stealing
passwords and capturing network traffic and more layered attacks such as social engineering and
phishing. Not all confidentiality breaches are intentional.
A few types of common accidental breaches include
 emailing sensitive information to the wrong recipient
 publishing private data to public web servers
 leaving confidential information displayed on an unattended computer monitor.
Healthcare is an example of an industry where the obligation to protect client information is
very high. Not only do patients expect and demand that healthcare providers protect their
privacy, but there are also strict regulations governing how healthcare organizations manage
security.
The Health Insurance Portability and Accountability Act (HIPAA) addresses security, including
privacy protection, in the handling of personal health information by insurers, providers, and
claims processors. HIPAA rules mandate administrative, physical, and technical safeguards, and
require organizations to conduct a risk analysis.
There are many countermeasures that organizations put in place to ensure confidentiality.
Passwords, access control lists, and authentication procedures use software to control access to
resources. These access control methods are complemented by the use of encryption to protect
information that can be accessed despite the controls, such as emails that are in transit.
Additional confidentiality countermeasures include administrative solutions such as policies and
training, as well as physical controls that prevent people from accessing facilities and equipment.
B. Integrity
Integrity means that data is protected from unauthorized changes to ensure that it is reliable and
correct.
Integrity measures protect information from unauthorized alteration. These measures provide
assurance in the accuracy and completeness of data. The need to protect information includes
both data that is stored on systems and data that is transmitted between systems such as email. In
maintaining integrity, it is not only necessary to control access at the system level but to further
ensure that system users are only able to alter information that they are legitimately authorized to
alter. As with confidentiality protection, the protection of data integrity extends beyond
intentional breaches. Effective integrity countermeasures must also protect against unintentional
alteration, such as user errors or data loss that is a result of a system malfunction.
There are many countermeasures that can be put in place to protect integrity:
 Access control and rigorous authentication can help prevent authorized users from
making unauthorized changes.
 Hash verifications and digital signatures can help ensure that transactions are
authentic and that files have not been modified or corrupted.
Data integrity vs data security
Data integrity is not data security. The two concepts are clearly summarized on the following
table
data security Data integrity
 keep data from getting corrupted.  keep the data to be intact and accurate
 Its motive is the protection of data.  Its motive is the validity of data.
 It avoids unauthorized access to data.  It avoids human error when data is entered.
 It can be implemented through  It can be implemented through.
 user accounts (passwords)  user accounts (passwords)
 authentication schemes  authentication schemes

Data integrity is not data quality. does the data in your database meet company-defined standards
and the needs of your business? Data quality answers these questions with an assortment of
processes that measure your data’s age, relevance, accuracy, completeness, and reliability.
Much like data security, data quality is only a part of data integrity, but a crucial one. Data
integrity encompasses every aspect of data quality and goes further by implementing an
assortment of rules and processes that govern how data is entered, stored, transferred, and much
more.
In DHIS2 data integrity is assured when higher-level like zonal health department can not edit
lower-level health center data. for incentivize purpose service some service needed to increase
artificially on the report after data is entered. This is usually done higher officials and DHIS2 can
handle such integrity breach.

Self-Check – 1 Written test

Name…………………………………………… ID………………………… Date……………...


Test I: Multiple Choice Questions
Directions: Read the question carefully then Incircle the best answer among the given
alternative value
1. way of translating data from plaintext (unencrypted) to ciphertext (encrypted).
A. Data Integrity B. data Security C. data Confidentiality D. data backup
2. Security threat is described by making the database server with so many requests then the
server becomes unstable or crashes.
A. SQL Injection B. denial of service C. buffer overflow D. human error
Test I: Short Answer Questions
Directions: Read the question and provide a short answer for the following question
1. explain the common Security threat

References
1. Elmasri, R., & Navathe, S. B. (2017). Fundamentals of Database Systems 7th Edition.
2. Jeffrey A. Hoffer, V. Ramesh (2017) Modern Database Management 13th Edition.
3. Ramez Elmasri, Shamkant Navathe. (2018) Brief History of Database Applications . Retrieved April
01,2022, from [Link] Database
Applications_11395/30/03/22
4. Database management system tutorial (2011). Retrieved April 01,2022, from
[Link] 30/03/22
5. Niroj Kumar, database development process (2022) Retrieved March 30,2022 from
[Link]

You might also like