0% found this document useful (0 votes)
73 views56 pages

Data Management: DBMS vs. Warehousing

A database management system (DBMS) allows users to define, create, and maintain a database and provides controlled access to stored data. A DBMS is a collection of programs that enables users to store, modify, and extract information from a database according to requirements. The DBMS acts as an intermediary layer between programs and data, with programs accessing the DBMS to then access the underlying data. Relational databases represent data in two-dimensional tables with rows representing records and columns representing attributes. Operations like select, join, and project allow users to extract useful subsets of data from related tables. Database normalization aims to reduce redundancy and inconsistencies by organizing data into normal forms.

Uploaded by

Apoorva Pattnaik
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
73 views56 pages

Data Management: DBMS vs. Warehousing

A database management system (DBMS) allows users to define, create, and maintain a database and provides controlled access to stored data. A DBMS is a collection of programs that enables users to store, modify, and extract information from a database according to requirements. The DBMS acts as an intermediary layer between programs and data, with programs accessing the DBMS to then access the underlying data. Relational databases represent data in two-dimensional tables with rows representing records and columns representing attributes. Operations like select, join, and project allow users to extract useful subsets of data from related tables. Database normalization aims to reduce redundancy and inconsistencies by organizing data into normal forms.

Uploaded by

Apoorva Pattnaik
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
  • Introduction to Database Systems
  • File System
  • Database Management Systems (DBMS)
  • Data Warehousing
  • Data Mining

DATABASE MANAGEMENT SYSTEM,

DATA WAREHOUSE & DATA MINING

Dr. Vaibhav Mishra

1
File System
• File system was an early attempt to computerize the manual filing
system.
• A file system is a method for storing and organizing computer
files and the data to make it easy to find and access.
• File Systems may use a storage device such as a hard disk or CD‐
ROM.
• It is group of files storing data of an organization.
• Each file is independent from one another.
• Each file contained and processed information for one specific
function like accounting or inventory.
• Files are designed by using application programs written in
programming languages such as COBOL, C, C++, etc….

2
THE DATA HIERARCHY
A File based system
4
Problems of file system
• Data redundancy and inconsistency
• Program data dependence
• Lack of flexibility
• Poor security
• Lack of data sharing and
• Unavailability of data at right time
Data Redundancy
• Often the same information is duplicated in two or more files. It
may lead to inconsistency

• Assume the same data is repeated in two or more files. If change is


made to data in one file, it is required that change be made to the
data in the other file as well.

• If this is not done, it will lead to multiple different values for same
data field.

6
Difficulty in Accessing Data

• Assume in a banking system there is need to find out the


names of all customers who live within a particular
postal‐code area.
• But there is only a program to generate the list of all
customers.

7
Data Isolation
• Data isolation means that all the related data is not available in one
file.
• Generally, the data is scattered in various files, and the files may be
in different formats, therefore writing new application programs to
retrieve the appropriate data is difficult.

8
Integrity Problem
• Integrity constraints refers to the rules that ensures completeness
and reliability of data.
• Integrity constraints (e.g. account balance > 0) become part of
program code
• Hard to add new constraints or change existing ones

9
Concurrent Access Anomalies
• Many systems allow multiple users to update the data
simultaneously. In such environment, interaction of concurrent
updates may result in inconsistent data.

• Example: Bank account A containing Rs. 6000/‐. If two


transactions of withdraw funds (Rs 500/‐ and Rs 1000/‐
respectively) from account at the same time, result of the
concurrent executions may leave the account in an incorrect state.

• Program on the behalf of each withdrawal read the old balance,


reduce amount and write result back.

10
Data Security
• The data as maintained in flat files is easily accessible and
therefore not secure.

• Example : the Customer_Transaction file has details about the


total available balance of all customers. A customer wants
information about his/her account balance.

• In a file system it is difficult to give the customer access to only


his/her data in the file.

• Thus enforcing security constraints for entire file or for certain


data items are difficult.

11
Database Approach
• The limitations of File based system is overcome by a Database system
• A database is a computer based record keeping system whose over all purpose is
to record and maintain data.
• The database is a single, large repository of data, which can be used
simultaneously by many departments and users.

• DBMS A database management system is the software system that allows


users to define, create and maintain a database and provides controlled access to
the data.

• A database management system (DBMS) is basically a collection of programs


that enables users to store, modify, and extract information from a database as
per the requirements.

• DBMS is an intermediate layer between programs and the data. Programs access
the DBMS, which then accesses the data.

12
DBMS
• DBMS A database management system is the software system
that allows users to define, create and maintain a database and
provides controlled access to the data.

• A database management system (DBMS) is basically a


collection of programs that enables users to store, modify, and
extract information from a database as per the requirements.

• DBMS is an intermediate layer between programs and the


data. Programs access the DBMS, which then accesses the
data.

13
HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS
• Relational DBMS
– Represent data as two-dimensional tables
– Each table contains data on entity and attributes
• Table: grid of columns and rows
– Rows (tuples): Records for different entities
– Fields (columns): Represents attribute for entity
– Key field: Field used to uniquely identify each record
– Primary key: Field in table used for key fields
– Foreign key: Primary key used in second table as look-up field to
identify records from original table
Relational Data Model

• RELATION: A table of values


– A relation may be thought of as a set of rows.
– A relation may alternately be though of as a set of columns.
– Each row represents a fact that corresponds to a real-world entity or
relationship.
– Each row has a value of an item or set of items that uniquely identifies
that row in the table.
– Sometimes row-ids or sequential numbers are assigned to identify the
rows in the table.
– Each column typically is called by its column name or column header
or attribute name.

16
Relational Data Model

17
Example of a Relation
attributes
(or columns)

tuples
(or rows)

18
• Normalisation (avoid redundancy and data
inconsistencies)
– Different forms (1NF,2NF,3NF etc.)
• Primary key(unique field in that table)
– Emp_Id,Customer_Id etc.
• Foreign Keys(look up field in other table where
this is primary key)

• Relationships (one to many etc.)


• Referential integrity
suppliers
Supplier no suppliernam Supplier city state Zip_code
e street

8259 Sujal ind Patel nagar rajkot GJ 360002


8261 Gujjar Comp. Street 5 ahmedabad GJ 382445
8263 Savera ind. Ind. layout mysore KA 560082
8444 BC Raj & Co annasalai chennai TN 600002
Parts
Part _no Part name Unit price Supplier
number
137 Door latch 22.00 8259
145 Side mirror 15.00 8444
150 compressor 154.00 8261
152 Door Lock 50.00 8259
Relational Database Tables
Capabilities of Database Management Systems (DBMSs)

• Operations of a Relational DBMS


– Three basic operations used to develop useful
sets of data
• SELECT: Creates subset of data of all records that
meet stated criteria
• JOIN: Combines relational tables to provide user
with more information than available in individual
tables
• PROJECT: Creates subset of columns in table,
creating tables with only the information specified
THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS
Database Normalization
• The main goal of Database Normalization is to
restructure the logical data model of a database
to:
• Eliminate redundancy
• Organize data efficiently
• Reduce the potential for data anomalies.

26
Data Anomalies

• Data anomalies are inconsistencies in the data stored in a


database as a result of an operation such as update, insertion,
and/or deletion.
• Such inconsistencies may arise when have a particular record
stored in multiple locations and not all of the copies are
updated.

• We can prevent such anomalies by implementing different


level of normalization called Normal Forms (NF).

27
Dependencies: Definitions
• Multivalued Attributes (or repeating groups): the values
of non-key attributes or groups of non-key attributes are
not uniquely identified by (directly or indirectly) or not
functionally dependent on the value of the Primary Key
(or its part).

STUDENT

Stud_ID Name Course_ID Units


101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00

28
General Hardware Company:
Unnormalized Data
Sales-person Sales-person Year Depart-ment
Number Product Name Commission of Number Manager Product Unit
Number Percentage Hire Name Name Price Quantity
137 19440 Baker 101995 73 Scott Hammer 17.50 473
24013 Saw 26.25 170
26722 Pliers 11.50 688
186 16386 Adams 15 2001 59 Lopez Wrench 12.95 1745
19440 Hammer 17.50 2529
21765 Drill 32.99 1962
24013 Saw 26.25 3071
204 21765 Dickens 10 1998 73 Scott Drill 32.99 809
26722 Pliers 11.50 734
361 16386 Carlyle 20 2001 73 Scott Wrench 12.95 3729
21765 Drill 32.99 3110
26722 Pliers 11.50 2738
SALESPERSON/PRODUCT Table

 Records contain multivalued attributes.


7-29
First Normal Form

• Unnormalized – There are multivalued


attributes or repeating groups
• 1 NF – No multivalued attributes or
repeating groups.

30
Example 1: Determine NF
All attributes are directly
• ISBN  Title or indirectly determined
• ISBN  Publisher by the primary key;
therefore, the relation is at
• Publisher  Address least in 1 NF

BOOK

ISBN Title Publisher Address

31
Example 2: Determine NF
In your solution you will write
the following justification:
• Part_ID  Description 1) There are M/V attributes;
• Part_ID  Price therefore, not 1NF
Conclusion: The relation is not
• Part_ID, Comp_ID  No normalized.

PART

Part_ID Descr Price Comp_ID No

32
S_id S_Name S_Address Subject_opted
401 Adam Noida Bio
402 Alex Panipat Maths
403 Stuart Jammu Maths
404 Adam Noida Physics

Updation Anomaly : To update address of a student who occurs twice or more


than twice in a table, we will have to update S_Address column in all the rows, else
data will become inconsistent.
Insertion Anomaly : Suppose for a new admission, we have a Student id(S_id),
name and address of a student but if student has not opted for any subjects yet then
we have to insert NULL there, leading to Insertion Anamoly.
Deletion Anomaly : If (S_id) 402 has only one subject and temporarily he drops it,
when we delete that row, entire student record will be deleted along with it.
As per First Normal Form, no two Rows of data must contain repeating group of
information i.e each set of column must have a unique value, such that multiple columns
cannot be used to fetch the same row. Each table should be organized into rows, and
each row should have a primary key that distinguishes it as unique.

The Primary key is usually a single column, but sometimes more than one column can
be combined to create a single primary key. For example consider a table which is not in
First normal form

Student Table :
In First Normal Form, any row must not have a column in which more than one value is
saved, like separated with commas. Rather than that, we must separate such data into
multiple rows.

Student Age Subject


Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths

Student Table following 1NF will be :


Using the First Normal Form, data redundancy increases, as there will be
many columns with same data in multiple rows but each row as a whole will be
unique.
Database to improve Business
performance and decision making
• Data warehouses:
– It is a database that stores current and historical
data of potential interest to decision makers
throughout the company
• Customer data
• Suppliers data
• Offer letters of employees
• Discipline/legal documents etc.
Note: data can not be altered.
Data Warehouse
• A collection of non-volatile data of different business
subjects and objects, which is time variant and
integrated down various sources and applications and
stored in a manner to make a quick analysis of
business situation
• DSS – friendly data repository for the DSS is the
DATA WAREHOUSE
Integrated, Subject-Oriented, Time-Variant,
Nonvolatile database that provides support for
decision making
• Subject-oriented: A data warehouse is organized around major subjects, such
as customer, vendor, product, and sales. Rather than concentrating on the day-
to-day operations and transaction processing of an organization, a data
warehouse focuses on the modeling and analysis of data for decision makers.
• Integrated: A data warehouse is usually constructed by integrating multiple
heterogeneous sources, such as relational databases, and on-line transaction
records. Data cleaning and data integration techniques are applied to ensure
consistency in naming conventions, attribute measures, and so on.
• Time-variant: Data are stored to provide information from a historical
perspective (e.g., the past 5-10 years). Every key structure in the data warehouse
contains, either implicitly or explicitly, an element of time.
• Nonvolatile: A data warehouse is always a physically separate store of data
transformed from the application data found in the operational environment.
Due to this separation, a data warehouse does not require transaction processing,
recovery, and concurrency control mechanisms. It usually requires only two
operations in data accessing: initial loading of data and access of data. 38
Data Warehouse
Legacy database

Conversion Process
Operational DB

Middleware
Data Warehouse

Select, filter, validate, transform,


External DB
compute, consolidate and move to
DW
Components of a Data Warehouse
41
Data Marts
• Datamart: It is subset of datawarehouse.
Organisations create smaller de-centralised
warehouses called datamarts
– Ex: Finance regarding company P&L, wage bills
etc.
– HR: Medical insurance claims, legal cases,
attrition history etc.
– Sales: customer details, competitors’ details etc.
Data Marts
• Small Data Stores
• More manageable data sets
• Targeted to meet the needs of small groups within the organization
• Small, Single-Subject data warehouse subset that provides decision
support to a small group of people
• Smaller and de-centralised warehouses with in the company are
called data marts
• It is a subset of data warehouse
• It is highly focused towards a particular line of business or
department
– Top five customers data
– Customers with more than 20 % of business

43
Data Mining

44
Data Mining
• It is discovery driven.
• To bring out hidden trends/patterns from the
data
– Associations
– Sequential patterns
– Clustering or grouping etc.
Knowledge Discovery in
Databases", or KDD
• data cleaning (to remove noise or irrelevant data),
• data integration (where multiple data sources may be combined)1,
• data selection (where data relevant to the analysis task are retrieved from
the database),
• data transformation (where data are transformed or consolidated into forms
appropriate for mining by performing summary or aggregation operations,
for instance),
• data mining (an essential process where intelligent methods are applied in
order to extract data patterns),
• pattern evaluation (to identify the truly interesting patterns representing
knowledge based on some interestingness measures), and
• knowledge presentation (where visualization and knowledge representation
techniques are used to present the mined knowledge to the user).

46
Data mining
• It is discovery driven
– Brings out hidden patterns and relationships
• Association
• Sequences
• Classification
• Clustering and helps in
• forecasting
Data Mining
• Discover Previously unknown data
characteristics, relationships, dependencies, or
trends
• Typical Data Analysis Relies on end users
– Define the Problem
– Select the Data
– Initial the Data Analysis
– Reacts to External Stimulus

48
Data Mining
• Proactive
• Automatically searches
– Anomalies
– Possible Relationships
– Identify Problems before the end-user
• Data Mining tools analyze the data, uncover problems
or opportunities hidden in data relationships, form
computer models based on their findings, and then
user the models to predict business behavior – with
minimal end-user intervention

49
Data Mining
• A methodology designed to perform
knowledge-discovery expeditions over the
database data with minimal end-user
intervention
• 3 Stages of Data
– Data
– Information
– Knowledge

50
Extraction of Knowledge from Data

51
4 Phases of Data Mining
• Data Preparation
– Identify the main data sets to be used by the data
mining operation (usually the data warehouse)
• Data Analysis and Classification
– Study the data to identify common data
characteristics or patterns
• Data groupings, classifications, clusters, sequences
• Data dependencies, links, or relationships
• Data patterns, trends, deviation

52
4 Phases of Data Mining
• Knowledge Acquisition
– Uses the Results of the Data Analysis and Classification phase
– Data mining tool selects the appropriate modeling or knowledge-
acquisition algorithms
• Neural Networks
• Decision Trees
• Rules Induction
• Genetic algorithms
• Memory-Based Reasoning
• Prognosis
– Predict Future Behavior
– Forecast Business Outcomes
• 65% of customers who did not use a particular credit card in the last 6
months are 88% likely to cancel the account.

53
54
55
Data Warehouse, Data mart, Data mine
Business Data Large volumes of business data ready to use, view and
Warehouse analyse (enterprise focus)

Data Mart Departmental or functional data including external data for


limited use by department or function (department focus)

Data Mine Data set of probable associations or relations extracted


and organised to evaluate the influencing factors for the
patterns and find solutions to improve business ( business
focus)
Tools for Business Intelligence
• Multi dimensional data analysis and data
mining
– Compare the performance among sections
– Compare performance of 2020 batch with present
batch in certain subjects
– Schedule compliance wrt planned ( conduct of
sessions) 2015 to 2019
• Online analytical processing enables this

DATABASE MANAGEMENT SYSTEM, 
DATABASE MANAGEMENT SYSTEM, 
DATA WAREHOUSE & DATA MINING
DATA WAREHOUSE & DATA MINING
1
Dr. Vai
File System
File System
•
File system was an early attempt to computerize the manual filing 
system.
•
A file system is a met
THE DATA HIERARCHY
A File based system
A File based system
4
Problems of file system
• Data redundancy and inconsistency
• Program data dependence
• Lack of flexibility
• Poor security
•
Data Redundancy
Data Redundancy
•
Often the same information is duplicated in two or more files. It 
may lead to inconsistenc
Difficulty in Accessing Data
Difficulty in Accessing Data
• Assume in a banking system there is need to find out the 
names o
Data Isolation
Data Isolation
• Data isolation means that all the related data is not available in one 
file.
• Generally, th
Integrity Problem
Integrity Problem
• Integrity constraints refers to the rules that ensures completeness 
and reliability of
Concurrent Access Anomalies
Concurrent Access Anomalies
•
Many systems allow multiple users to update the data 
simultaneousl

You might also like