Data Management: DBMS vs. Warehousing
Data Management: DBMS vs. Warehousing
1
File System
• File system was an early attempt to computerize the manual filing
system.
• A file system is a method for storing and organizing computer
files and the data to make it easy to find and access.
• File Systems may use a storage device such as a hard disk or CD‐
ROM.
• It is group of files storing data of an organization.
• Each file is independent from one another.
• Each file contained and processed information for one specific
function like accounting or inventory.
• Files are designed by using application programs written in
programming languages such as COBOL, C, C++, etc….
2
THE DATA HIERARCHY
A File based system
4
Problems of file system
• Data redundancy and inconsistency
• Program data dependence
• Lack of flexibility
• Poor security
• Lack of data sharing and
• Unavailability of data at right time
Data Redundancy
• Often the same information is duplicated in two or more files. It
may lead to inconsistency
• If this is not done, it will lead to multiple different values for same
data field.
6
Difficulty in Accessing Data
7
Data Isolation
• Data isolation means that all the related data is not available in one
file.
• Generally, the data is scattered in various files, and the files may be
in different formats, therefore writing new application programs to
retrieve the appropriate data is difficult.
8
Integrity Problem
• Integrity constraints refers to the rules that ensures completeness
and reliability of data.
• Integrity constraints (e.g. account balance > 0) become part of
program code
• Hard to add new constraints or change existing ones
9
Concurrent Access Anomalies
• Many systems allow multiple users to update the data
simultaneously. In such environment, interaction of concurrent
updates may result in inconsistent data.
10
Data Security
• The data as maintained in flat files is easily accessible and
therefore not secure.
11
Database Approach
• The limitations of File based system is overcome by a Database system
• A database is a computer based record keeping system whose over all purpose is
to record and maintain data.
• The database is a single, large repository of data, which can be used
simultaneously by many departments and users.
• DBMS is an intermediate layer between programs and the data. Programs access
the DBMS, which then accesses the data.
12
DBMS
• DBMS A database management system is the software system
that allows users to define, create and maintain a database and
provides controlled access to the data.
13
HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS
• Relational DBMS
– Represent data as two-dimensional tables
– Each table contains data on entity and attributes
• Table: grid of columns and rows
– Rows (tuples): Records for different entities
– Fields (columns): Represents attribute for entity
– Key field: Field used to uniquely identify each record
– Primary key: Field in table used for key fields
– Foreign key: Primary key used in second table as look-up field to
identify records from original table
Relational Data Model
16
Relational Data Model
17
Example of a Relation
attributes
(or columns)
tuples
(or rows)
18
• Normalisation (avoid redundancy and data
inconsistencies)
– Different forms (1NF,2NF,3NF etc.)
• Primary key(unique field in that table)
– Emp_Id,Customer_Id etc.
• Foreign Keys(look up field in other table where
this is primary key)
26
Data Anomalies
27
Dependencies: Definitions
• Multivalued Attributes (or repeating groups): the values
of non-key attributes or groups of non-key attributes are
not uniquely identified by (directly or indirectly) or not
functionally dependent on the value of the Primary Key
(or its part).
STUDENT
28
General Hardware Company:
Unnormalized Data
Sales-person Sales-person Year Depart-ment
Number Product Name Commission of Number Manager Product Unit
Number Percentage Hire Name Name Price Quantity
137 19440 Baker 101995 73 Scott Hammer 17.50 473
24013 Saw 26.25 170
26722 Pliers 11.50 688
186 16386 Adams 15 2001 59 Lopez Wrench 12.95 1745
19440 Hammer 17.50 2529
21765 Drill 32.99 1962
24013 Saw 26.25 3071
204 21765 Dickens 10 1998 73 Scott Drill 32.99 809
26722 Pliers 11.50 734
361 16386 Carlyle 20 2001 73 Scott Wrench 12.95 3729
21765 Drill 32.99 3110
26722 Pliers 11.50 2738
SALESPERSON/PRODUCT Table
30
Example 1: Determine NF
All attributes are directly
• ISBN Title or indirectly determined
• ISBN Publisher by the primary key;
therefore, the relation is at
• Publisher Address least in 1 NF
BOOK
31
Example 2: Determine NF
In your solution you will write
the following justification:
• Part_ID Description 1) There are M/V attributes;
• Part_ID Price therefore, not 1NF
Conclusion: The relation is not
• Part_ID, Comp_ID No normalized.
PART
32
S_id S_Name S_Address Subject_opted
401 Adam Noida Bio
402 Alex Panipat Maths
403 Stuart Jammu Maths
404 Adam Noida Physics
The Primary key is usually a single column, but sometimes more than one column can
be combined to create a single primary key. For example consider a table which is not in
First normal form
Student Table :
In First Normal Form, any row must not have a column in which more than one value is
saved, like separated with commas. Rather than that, we must separate such data into
multiple rows.
Conversion Process
Operational DB
Middleware
Data Warehouse
43
Data Mining
44
Data Mining
• It is discovery driven.
• To bring out hidden trends/patterns from the
data
– Associations
– Sequential patterns
– Clustering or grouping etc.
Knowledge Discovery in
Databases", or KDD
• data cleaning (to remove noise or irrelevant data),
• data integration (where multiple data sources may be combined)1,
• data selection (where data relevant to the analysis task are retrieved from
the database),
• data transformation (where data are transformed or consolidated into forms
appropriate for mining by performing summary or aggregation operations,
for instance),
• data mining (an essential process where intelligent methods are applied in
order to extract data patterns),
• pattern evaluation (to identify the truly interesting patterns representing
knowledge based on some interestingness measures), and
• knowledge presentation (where visualization and knowledge representation
techniques are used to present the mined knowledge to the user).
46
Data mining
• It is discovery driven
– Brings out hidden patterns and relationships
• Association
• Sequences
• Classification
• Clustering and helps in
• forecasting
Data Mining
• Discover Previously unknown data
characteristics, relationships, dependencies, or
trends
• Typical Data Analysis Relies on end users
– Define the Problem
– Select the Data
– Initial the Data Analysis
– Reacts to External Stimulus
48
Data Mining
• Proactive
• Automatically searches
– Anomalies
– Possible Relationships
– Identify Problems before the end-user
• Data Mining tools analyze the data, uncover problems
or opportunities hidden in data relationships, form
computer models based on their findings, and then
user the models to predict business behavior – with
minimal end-user intervention
49
Data Mining
• A methodology designed to perform
knowledge-discovery expeditions over the
database data with minimal end-user
intervention
• 3 Stages of Data
– Data
– Information
– Knowledge
50
Extraction of Knowledge from Data
51
4 Phases of Data Mining
• Data Preparation
– Identify the main data sets to be used by the data
mining operation (usually the data warehouse)
• Data Analysis and Classification
– Study the data to identify common data
characteristics or patterns
• Data groupings, classifications, clusters, sequences
• Data dependencies, links, or relationships
• Data patterns, trends, deviation
52
4 Phases of Data Mining
• Knowledge Acquisition
– Uses the Results of the Data Analysis and Classification phase
– Data mining tool selects the appropriate modeling or knowledge-
acquisition algorithms
• Neural Networks
• Decision Trees
• Rules Induction
• Genetic algorithms
• Memory-Based Reasoning
• Prognosis
– Predict Future Behavior
– Forecast Business Outcomes
• 65% of customers who did not use a particular credit card in the last 6
months are 88% likely to cancel the account.
53
54
55
Data Warehouse, Data mart, Data mine
Business Data Large volumes of business data ready to use, view and
Warehouse analyse (enterprise focus)









