CS 220: Database Management and
Systems Design
Introduction to Databases, part 2
Attendance Quiz: Functions of DBMSs
On a sheet of paper:
Write your name
Briefly describe three functions of database management systems
We’ll discuss, then you can turn in the paper
1.2
Review
Example: Facebook Profile
through
Why Study Databases?
1.3
Why Databases?
Why not store everything in flat files. For example, use the file
system of the OS? Seems simple enough:
Name, Course, Grade
John Smith, CS112, B
Mike Stonebraker, CS234, A
Jim Gray, CS560, A
John Smith, CS560, B+
You could try this, but you may have problems of:
Data redundancy
Efficient data retrieval
Data integrity
1.4
Problem 1
Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
Name, Course, Email, Grade
John Smith, js@[Link], CS120, B
Mike Stonebraker, ms@[Link], CS121, A
Jim Gray, CS220, jg@[Link], A
John Smith, CS220, js@[Link], B+
Why this a problem?
Wasted space
Potential inconsistencies (multiple formats, John
Smith vs Smith J.)
1.5
Problem 2
Data retrieval:
Find the students who took CS220
Find the students with GPA > 3.5
For every query we need to write a program!
We need the retrieval to be:
Easy to write
Execute efficiently
1.6
Problem 3
Data Integrity
No support for sharing:
Prevent simultaneous modifications
No coping mechanisms for system crashes
No means of Preventing Data Entry Errors (checks must be hard-coded
in the programs)
Security problems
Database systems offer solutions to all the above problems
1.7
Data Organization
Two levels of data modeling
Conceptual or Logical level: describes data stored in
database, and the relationships among the data.
type customer = record
name : string;
street : string;
city : integer;
end;
Physical level: describes how a record (e.g., customer) is
stored.
Also, External (View) level: application programs hide details of
data types. Views can also hide information (e.g., salary) for
security purposes.
1.8
View of Data
A logical architecture for a database system
1.9
Database Schema
Similar to types and variables in programming languages
Schema – the structure of the database
e.g., the database consists of information about a set of
customers and accounts and the relationship between them
Analogous to type information of a variable in a program
Physical schema: database design at the physical level
Logical schema: database design at the logical level
1.10
Data Organization
Data Models: a framework for describing
data
data relationships
data semantics
data constraints
Entity-Relationship model
We will concentrate on Relational model
Other models:
object-oriented model
semi-structured data models, XML
1.11
Entity-Relationship Model
Example of schema in the entity-relationship model
1.12
Entity Relationship Model (Cont.)
E-R model of real world
Entities (objects)
E.g. customers, accounts, bank branch
Relationships between entities
E.g. Account A-101 is held by customer Johnson
Relationship set depositor associates customers with accounts
Widely used for database design
Database design in E-R model usually converted to design in the
relational model (coming up next) which is used for storage and
processing
1.13
Relational Model
Attributes
Example of tabular data in the relational model
customer- customer- customer- account-
Customer-id
name street city number
192-83-7465 Johnson
Alma Palo Alto A-101
019-28-3746 Smith
North Rye A-215
192-83-7465 Johnson
Alma Palo Alto A-201
321-12-3123 Jones
Main Harrison A-217
019-28-3746 Smith
North Rye A-201
1.14
Data Organization
Data Storage
Where can data be stored?
Main memory
Secondary memory (hard disks)
Optical storage (DVDs)
Tertiary store (tapes)
Move data? Determined by buffer manager
Mapping data to files? Determined by file manager
1.15
Database Architecture
(data organization)
DBA
DDL Commands
DDL Interpreter
File Manager
Buffer Manager
Storage Manager
Data Metadata
Secondary Storage
Schema
1.16
Data retrieval
Queries
Query = Declarative data retrieval
describes what data, not how to retrieve it
Ex. Give me the students with GPA > 3.5 vs
Scan the student file and retrieve the records with gpa>3.5
Why?
1. Easier to write
2. Efficient to execute (why?)
1.17
Data retrieval
Query
Query Optimizer Plan Query Evaluator
Query Processor
Query Optimizer Data
“compiler” for queries (aka “DML Compiler”)
Plan ~ Assembly Language Program
Optimizer Does Better With Declarative Queries:
1. Algorithmic Query (e.g., in C) Þ 1 Plan to choose from
2. Declarative Query (e.g., in SQL) Þ n Plans to choose from
1.18
SQL
SQL: widely used (declarative) non-procedural language
E.g. find the name of the customer with customer-id 192-83-7465
select [Link]-name
from customer
where [Link]-id = ‘192-83-7465’
E.g. find the balances of all accounts held by the customer with
customer-id 192-83-7465
select [Link]
from depositor, account
where [Link]-id = ‘192-83-7465’ and
[Link]-number = [Link]-
number
Procedural languages: C++, Java, relational algebra
1.19
Data retrieval:
Indexing
How to answer fast the query: “Find the student with SID = 101”?
One approach is to scan the student table, check every student, retrurn
the one with id=101… very slow for large databases
Any better idea?
1st keep student records ordered by SID. Do a binary search…. Updates…
2nd Use a dynamic search tree!! Allow insertions, deletions, updates and at the
same time keep the records sorted! In databases we use the B+-tree (multiway
search tree)
3rd Use a hash table. Much faster for exact match queries… but cannot support
Range queries. (Also, special hashing schemes are needed for dynamic data)
1.20
3
5
11
30
30
35
100
100
101
B+Tree Example
110
120
1.21
Root
120
130
150
156 150
179
180
180
200
B=4
Database Architecture
(data retrieval)
DB Programmer
User DBA
Code w/ embedded queries
Query DDL Commands
DML Precompiler Query Optimizer
DDL Interpreter
Query Evaluator
Query Processor
File Manager
Storage Manager Buffer Manager
Secondary Storage Indices Data Metadata
Statistics
Schema
1.22
Data Integrity
Transaction processing
Why Concurrent Access to Data must be Managed?
John and Jane withdraw $50 and $100 from a common
account…
John: Jane:
1. get balance 1. get balance
2. if balance > $50 2. if balance > $100
3. balance = balance - $50 3. balance = balance - $100
4. update balance 4. update balance
Initial balance $300. Final balance=?
It depends…
1.23
Data Integrity
Recovery
Transfer $50 from account A ($100) to account B ($200)
1. get balance for A
2. If balanceA > $50
3. balanceA = balanceA – 50
[Link] balanceA in database
5. Get balance for B System crashes….
6. balanceB = balanceB + 50
7. Update balanceB in database
Recovery management
1.24
Database Architecture
DB Programmer
User DBA
Code w/ embedded queries
Query DDL Commands
DML Precompiler Query Optimizer
DDL Interpreter
Query Evaluator
Query Processor
Transaction Manager File Manager
Buffer Manager Recovery Manager
Storage Manager
Secondary Storage Indices Data Metadata
Statistics Integrity Constraints
Schema
1.25
Advanced Topics
Data Analytics is an important domain
Given large amount of data in the database, how can you use them
to help you take business decisions?
Cluster based data analytics:
Map-Reduce, shared nothing DBs
Main memory databases (NVRAM)
1.26
Outline
Application-oriented
How to develop database applications: User + DBA
For learning, we will do things the hard way (i.e., using lightweight
frameworks). In the real world, I recommend a featureful framework
like Django, but doing things the hard way will show the value of
more featureful frameworks.
System-oriented
Learn the internals of a relational DBMS
1.27