0% found this document useful (0 votes)
3 views27 pages

Database Management Systems Overview

The document outlines the functions and importance of Database Management Systems (DBMS), highlighting issues such as data redundancy, retrieval efficiency, and data integrity. It discusses data organization, including conceptual and physical levels, and introduces data models like the Entity-Relationship model and the Relational model. Additionally, it covers data retrieval methods, indexing, transaction processing, and advanced topics like data analytics.

Uploaded by

lpg pekapontren
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views27 pages

Database Management Systems Overview

The document outlines the functions and importance of Database Management Systems (DBMS), highlighting issues such as data redundancy, retrieval efficiency, and data integrity. It discusses data organization, including conceptual and physical levels, and introduces data models like the Entity-Relationship model and the Relational model. Additionally, it covers data retrieval methods, indexing, transaction processing, and advanced topics like data analytics.

Uploaded by

lpg pekapontren
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

CS 220: Database Management and

Systems Design

Introduction to Databases, part 2


Attendance Quiz: Functions of DBMSs

 On a sheet of paper:
 Write your name
 Briefly describe three functions of database management systems
 We’ll discuss, then you can turn in the paper

1.2
Review

 Example: Facebook Profile


through
Why Study Databases?

1.3
Why Databases?

 Why not store everything in flat files. For example, use the file
system of the OS? Seems simple enough:

Name, Course, Grade


John Smith, CS112, B
Mike Stonebraker, CS234, A
Jim Gray, CS560, A
John Smith, CS560, B+

 You could try this, but you may have problems of:
 Data redundancy
 Efficient data retrieval
 Data integrity

1.4
Problem 1

 Data redundancy and inconsistency


 Multiple file formats, duplication of information in different files
Name, Course, Email, Grade
John Smith, js@[Link], CS120, B
Mike Stonebraker, ms@[Link], CS121, A
Jim Gray, CS220, jg@[Link], A
John Smith, CS220, js@[Link], B+

Why this a problem?


 Wasted space
 Potential inconsistencies (multiple formats, John
Smith vs Smith J.)

1.5
Problem 2

 Data retrieval:
 Find the students who took CS220
 Find the students with GPA > 3.5

For every query we need to write a program!

 We need the retrieval to be:


 Easy to write
 Execute efficiently

1.6
Problem 3

 Data Integrity

 No support for sharing:


 Prevent simultaneous modifications
 No coping mechanisms for system crashes
 No means of Preventing Data Entry Errors (checks must be hard-coded
in the programs)
 Security problems

 Database systems offer solutions to all the above problems

1.7
Data Organization

 Two levels of data modeling


 Conceptual or Logical level: describes data stored in
database, and the relationships among the data.
type customer = record
name : string;
street : string;
city : integer;
end;
 Physical level: describes how a record (e.g., customer) is
stored.

 Also, External (View) level: application programs hide details of


data types. Views can also hide information (e.g., salary) for
security purposes.

1.8
View of Data

A logical architecture for a database system

1.9
Database Schema

 Similar to types and variables in programming languages


 Schema – the structure of the database
 e.g., the database consists of information about a set of
customers and accounts and the relationship between them
 Analogous to type information of a variable in a program
 Physical schema: database design at the physical level
 Logical schema: database design at the logical level

1.10
Data Organization

 Data Models: a framework for describing


 data
 data relationships
 data semantics
 data constraints
 Entity-Relationship model
 We will concentrate on Relational model
 Other models:
 object-oriented model
 semi-structured data models, XML

1.11
Entity-Relationship Model

Example of schema in the entity-relationship model

1.12
Entity Relationship Model (Cont.)

 E-R model of real world


 Entities (objects)
 E.g. customers, accounts, bank branch
 Relationships between entities
 E.g. Account A-101 is held by customer Johnson
 Relationship set depositor associates customers with accounts
 Widely used for database design
 Database design in E-R model usually converted to design in the
relational model (coming up next) which is used for storage and
processing

1.13
Relational Model
Attributes
 Example of tabular data in the relational model

customer- customer- customer- account-


Customer-id
name street city number

192-83-7465 Johnson
Alma Palo Alto A-101
019-28-3746 Smith
North Rye A-215
192-83-7465 Johnson
Alma Palo Alto A-201
321-12-3123 Jones
Main Harrison A-217
019-28-3746 Smith
North Rye A-201

1.14
Data Organization

 Data Storage
Where can data be stored?
 Main memory
 Secondary memory (hard disks)
 Optical storage (DVDs)
 Tertiary store (tapes)
 Move data? Determined by buffer manager
 Mapping data to files? Determined by file manager

1.15
Database Architecture
(data organization)

DBA

DDL Commands

DDL Interpreter

File Manager
Buffer Manager
Storage Manager

Data Metadata
Secondary Storage
Schema

1.16
Data retrieval

 Queries
Query = Declarative data retrieval
describes what data, not how to retrieve it
Ex. Give me the students with GPA > 3.5 vs
Scan the student file and retrieve the records with gpa>3.5
 Why?
1. Easier to write
2. Efficient to execute (why?)

1.17
Data retrieval
Query

Query Optimizer Plan Query Evaluator


Query Processor

 Query Optimizer Data

“compiler” for queries (aka “DML Compiler”)


Plan ~ Assembly Language Program

Optimizer Does Better With Declarative Queries:


1. Algorithmic Query (e.g., in C) Þ 1 Plan to choose from
2. Declarative Query (e.g., in SQL) Þ n Plans to choose from

1.18
SQL

 SQL: widely used (declarative) non-procedural language


 E.g. find the name of the customer with customer-id 192-83-7465
select [Link]-name
from customer
where [Link]-id = ‘192-83-7465’
 E.g. find the balances of all accounts held by the customer with
customer-id 192-83-7465
select [Link]
from depositor, account
where [Link]-id = ‘192-83-7465’ and
[Link]-number = [Link]-
number
 Procedural languages: C++, Java, relational algebra

1.19
Data retrieval:
Indexing

 How to answer fast the query: “Find the student with SID = 101”?

 One approach is to scan the student table, check every student, retrurn
the one with id=101… very slow for large databases

 Any better idea?

1st keep student records ordered by SID. Do a binary search…. Updates…


2nd Use a dynamic search tree!! Allow insertions, deletions, updates and at the
same time keep the records sorted! In databases we use the B+-tree (multiway
search tree)
3rd Use a hash table. Much faster for exact match queries… but cannot support
Range queries. (Also, special hashing schemes are needed for dynamic data)

1.20
3
5
11

30
30
35
100
100
101
B+Tree Example

110
120

1.21
Root

120
130

150
156 150
179
180
180
200
B=4
Database Architecture
(data retrieval)

DB Programmer
User DBA

Code w/ embedded queries


Query DDL Commands

DML Precompiler Query Optimizer


DDL Interpreter
Query Evaluator
Query Processor

File Manager
Storage Manager Buffer Manager

Secondary Storage Indices Data Metadata

Statistics
Schema

1.22
Data Integrity
Transaction processing

 Why Concurrent Access to Data must be Managed?


John and Jane withdraw $50 and $100 from a common
account…

John: Jane:
1. get balance 1. get balance
2. if balance > $50 2. if balance > $100
3. balance = balance - $50 3. balance = balance - $100
4. update balance 4. update balance

Initial balance $300. Final balance=?


It depends…

1.23
Data Integrity
Recovery

Transfer $50 from account A ($100) to account B ($200)

1. get balance for A


2. If balanceA > $50
3. balanceA = balanceA – 50
[Link] balanceA in database
5. Get balance for B System crashes….
6. balanceB = balanceB + 50
7. Update balanceB in database

Recovery management

1.24
Database Architecture

DB Programmer
User DBA
Code w/ embedded queries
Query DDL Commands

DML Precompiler Query Optimizer


DDL Interpreter
Query Evaluator
Query Processor

Transaction Manager File Manager

Buffer Manager Recovery Manager


Storage Manager

Secondary Storage Indices Data Metadata

Statistics Integrity Constraints


Schema

1.25
Advanced Topics

 Data Analytics is an important domain


 Given large amount of data in the database, how can you use them
to help you take business decisions?

 Cluster based data analytics:


 Map-Reduce, shared nothing DBs

 Main memory databases (NVRAM)

1.26
Outline

 Application-oriented
 How to develop database applications: User + DBA
 For learning, we will do things the hard way (i.e., using lightweight
frameworks). In the real world, I recommend a featureful framework
like Django, but doing things the hard way will show the value of
more featureful frameworks.

 System-oriented
 Learn the internals of a relational DBMS

1.27

You might also like