Introduction to SQL (II)
Roadmap to This Lecture
Set operations
Aggregates
Nested Subqueries
Modification of the Database
Join Expressions
Views
Set Operations
Find courses that ran in Fall 2009 or in Spring 2010
(select course_id from section where semester = 'Fall' and year = 2009)
union
(select course_id from section where semester = 'Spring' and year = 2010)
Find courses that ran in Fall 2009 and in Spring 2010
(select course_id from section where semester = 'Fall' and year = 2009)
intersect
(select course_id from section where semester = 'Spring' and year = 2010)
Find courses that ran in Fall 2009 but not in Spring 2010
(select course_id from section where semester = 'Fall' and year = 2009)
except
(select course_id from section where semester = 'Spring' and year = 2010)
Set Operations
Set operations union, intersect, and except
Each of the above operations automatically eliminates duplicates
To retain all duplicates, use the corresponding multiset versions
union all, intersect all, and except all.
Suppose a tuple occurs m times in r and n times in s; then it occurs:
m + n times in r union all s
min(m, n) times in r intersect all s
max(0, m – n) times in r except all s
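The multiplicity rules can be checked with a small sketch. It uses Python's collections.Counter as a stand-in for a multiset relation; the relations r and s and their contents are made up for illustration and are not part of the lecture's database.

```python
from collections import Counter

# Hypothetical multisets: "CS-101" occurs m = 3 times in r and n = 1 time in s.
r = Counter({"CS-101": 3, "BIO-301": 1})
s = Counter({"CS-101": 1, "PHY-101": 2})

union_all = r + s        # each tuple occurs m + n times
intersect_all = r & s    # each tuple occurs min(m, n) times
except_all = r - s       # each tuple occurs max(0, m - n) times

print(union_all["CS-101"], intersect_all["CS-101"], except_all["CS-101"])
```

Counter drops tuples whose count falls to zero or below, which corresponds to the max(0, m – n) rule for except all.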
Null Values
It is possible for tuples to have a null value, denoted by null, for some
of their attributes
null signifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving null is null
Example: 5 + null returns null
The predicate is null can be used to check for null values.
Example: Find all instructors whose salary is null.
select name
from instructor
where salary is null
Null Values and Three Valued Logic
Any comparison with null returns unknown
Examples: 5 < null, null <> null, and null = null all return unknown
Three-valued logic using the truth value unknown:
OR: (unknown or true) = true,
(unknown or false) = unknown,
(unknown or unknown) = unknown
AND: (true and unknown) = unknown,
(false and unknown) = false,
(unknown and unknown) = unknown
NOT: (not unknown) = unknown
“P is unknown” evaluates to true if predicate P
evaluates to unknown
Result of where clause predicate is treated as false if it evaluates to
unknown
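A minimal sketch of these rules, assuming SQLite (through Python's sqlite3 module) as the database. SQLite has no separate boolean type: it reports true as 1, false as 0, and unknown as NULL, which Python sees as None. The helper name truth is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def truth(expr):
    """Evaluate a SQL boolean expression; unknown comes back as None."""
    return conn.execute(f"select {expr}").fetchone()[0]

comparison = truth("5 < null")           # comparison with null: unknown
or_true = truth("null or 1")             # unknown or true   = true
or_false = truth("null or 0")            # unknown or false  = unknown
and_false = truth("null and 0")          # unknown and false = false
not_unknown = truth("not (5 < null)")    # not unknown       = unknown

# A where clause keeps a row only if its predicate is true,
# so a predicate that evaluates to unknown filters the row out.
rows = conn.execute("select 1 where null = null").fetchall()
print(comparison, or_true, or_false, and_false, not_unknown, rows)
```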
Aggregate Functions
These functions operate on the multiset of values of a column of a
relation, and return a value
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
Aggregate Functions (Cont.)
Find the average salary of instructors in the Computer Science
department
select avg (salary)
from instructor
where dept_name = 'Comp. Sci.';
Find the total number of instructors who teach a course in the Spring
2010 semester
select count (distinct ID)
from teaches
where semester = 'Spring' and year = 2010;
Find the number of tuples in the course relation
select count (*)
from course;
Aggregate Functions – Group By
Find the average salary of instructors in each department
select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name;
Aggregation (Cont.)
Attributes in select clause outside of aggregate functions must appear
in group by list
/* erroneous query */
select dept_name, ID, avg (salary)
from instructor
group by dept_name;
Reason is simple: ID can have several different values within each
dept_name group, so which ID should be returned along with the
average salary?
Aggregate Functions – Having Clause
Find the names and average salaries of all departments whose
average salary is greater than 42000
select dept_name, avg (salary)
from instructor
group by dept_name
having avg (salary) > 42000;
Note: predicates in the having clause are applied after the
formation of groups whereas predicates in the where clause
are applied before forming groups
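The difference between where and having can be seen in a small sketch, assuming SQLite through Python's sqlite3 module. The rows are illustrative and not taken from the lecture's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor (ID text, name text, dept_name text, salary integer)"
)
conn.executemany(
    "insert into instructor values (?, ?, ?, ?)",
    [   # illustrative rows
        ("1", "A", "Physics", 90000),
        ("2", "B", "Physics", 30000),
        ("3", "C", "Music", 40000),
        ("4", "D", "History", 50000),
    ],
)

# where removes individual tuples before the groups are formed;
# having removes whole groups after the averages are computed.
rows = conn.execute(
    """
    select dept_name, avg(salary) as avg_salary
    from instructor
    where salary > 35000
    group by dept_name
    having avg(salary) > 42000
    order by dept_name
    """
).fetchall()
print(rows)
```

The Physics average is computed over the single tuple that survives the where clause (90000), not over both Physics tuples, and the Music group is dropped by having.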
Null Values and Aggregates
Total all salaries
select sum (salary)
from instructor
Above statement ignores null amounts
Result is null if there is no non-null amount
All aggregate operations except count(*) ignore tuples with null values
on the aggregated attributes
What if collection has only null values?
count returns 0
all other aggregates return null
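A short sketch of how nulls interact with aggregates, assuming SQLite through Python's sqlite3 module; the three rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, salary integer)")
conn.executemany(
    "insert into instructor values (?, ?)",
    [("A", 100), ("B", None), ("C", 300)],  # illustrative rows, one null salary
)

# sum, avg and count(salary) skip the null; count(*) counts every tuple.
mixed = conn.execute(
    "select sum(salary), avg(salary), count(salary), count(*) from instructor"
).fetchone()

# Make every salary null and aggregate again.
conn.execute("update instructor set salary = null")
only_nulls = conn.execute(
    "select sum(salary), max(salary), count(salary), count(*) from instructor"
).fetchone()
print(mixed, only_nulls)
```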
Schemas
instructor(ID, name, dept_name, salary)
student(ID, name, dept_name, tot_cred)
takes(ID, course_id, sec_id, semester, year, grade)
teaches(ID, course_id, sec_id, semester, year)
course(course_id, title, dept_name, credits)
section(course_id, sec_id, semester, year)
Nested Subqueries
SQL provides a mechanism for the nesting of subqueries.
A subquery is a select-from-where expression that is nested within
another query.
A common use of subqueries is to perform tests for set membership,
set comparisons, and set cardinality.
Example Query
Find courses offered in Fall 2009 and in Spring 2010
select distinct course_id
from section
where semester = 'Fall' and year = 2009 and
      course_id in (select course_id
                    from section
                    where semester = 'Spring' and year = 2010);
Find courses offered in Fall 2009 but not in Spring 2010
select distinct course_id
from section
where semester = 'Fall' and year = 2009 and
      course_id not in (select course_id
                        from section
                        where semester = 'Spring' and year = 2010);
Example Query
Find the total number of (distinct) students who have taken
course sections taught by the instructor with ID 10101
select count (distinct ID)
from takes
where (course_id, sec_id, semester, year) in
      (select course_id, sec_id, semester, year
       from teaches
       where teaches.ID = 10101);
Note: Above query can be written in a much simpler manner. The
formulation above is simply to illustrate SQL features.
Set Comparison
Find names of instructors with salary greater than that of some
(at least one) instructor in the Biology department.
select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Biology';
Same query using > some clause
select name
from instructor
where salary > some (select salary
                     from instructor
                     where dept_name = 'Biology');
Definition of Some Clause
F <comp> some r ⇔ ∃ t ∈ r such that (F <comp> t)
Where <comp> can be: <, ≤, >, ≥, =, ≠
(5 < some {0, 5, 6}) = true (read: 5 < some tuple in the relation)
(5 < some {0, 5}) = false
(5 = some {0, 5}) = true
(5 ≠ some {0, 5}) = true (since 0 ≠ 5)
(= some) ≡ in
However, (≠ some) ≢ not in
Example Query
Find the names of all instructors whose salary is greater than the salary
of all instructors in the Biology department.
select name
from instructor
where salary > all (select salary
                    from instructor
                    where dept_name = 'Biology');
Definition of all Clause
F <comp> all r ⇔ ∀ t ∈ r (F <comp> t)
(5 < all {0, 5, 6}) = false
(5 < all {6, 10}) = true
(5 = all {4, 5}) = false
(5 ≠ all {4, 6}) = true (since 5 ≠ 4 and 5 ≠ 6)
(≠ all) ≡ not in
However, (= all) ≢ in
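The some and all comparisons behave like an existential and a universal quantifier over the tuples of the subquery result. The sketch below models them with Python's any and all on small single-column relations such as {0, 5, 6}; the function names some and all_ are hypothetical, and null values (which would introduce unknown) are deliberately left out.

```python
import operator as op

def some(f, comp, r):
    """F <comp> some r: the comparison holds for at least one t in r."""
    return any(comp(f, t) for t in r)

def all_(f, comp, r):
    """F <comp> all r: the comparison holds for every t in r."""
    return all(comp(f, t) for t in r)

checks = {
    "5 < some {0,5,6}": some(5, op.lt, [0, 5, 6]),
    "5 < some {0,5}": some(5, op.lt, [0, 5]),
    "5 <> some {0,5}": some(5, op.ne, [0, 5]),
    "5 < all {0,5,6}": all_(5, op.lt, [0, 5, 6]),
    "5 < all {6,10}": all_(5, op.lt, [6, 10]),
    "5 <> all {4,6}": all_(5, op.ne, [4, 6]),
}
for label, value in checks.items():
    print(label, "=", value)
```

The third entry shows why (≠ some) is not the same as not in: 5 differs from some element of {0, 5}, yet 5 is in that relation.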
Test for Empty Relations
The exists construct returns the value true if the argument subquery is
nonempty.
exists r ⇔ r ≠ Ø
not exists r ⇔ r = Ø
Correlation Variables
Yet another way of specifying the query “Find all courses taught in
both the Fall 2009 semester and in the Spring 2010 semester”
select course_id
from section as S
where semester = 'Fall' and year = 2009 and
      exists (select *
              from section as T
              where semester = 'Spring' and year = 2010
                    and S.course_id = T.course_id);
The nested query is a correlated subquery: it refers to S, which is
defined in the outer query
S and T are correlation names (correlation variables)
Scope of variables restricted to the inner-most query structure that
defines them
Not Exists
Find all students who have taken all courses offered in the
Biology department.
select distinct S.ID, S.name
from student as S
where not exists ( (select course_id
                    from course
                    where dept_name = 'Biology')
                   except
                   (select T.course_id
                    from takes as T
                    where S.ID = T.ID));
• First nested query lists all courses offered in Biology
• Second nested query lists all courses a particular student took
Note that X – Y = Ø ⇔ X ⊆ Y (set containment)
Note: Cannot write this query using = all or its variants
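The not exists ... except pattern can be tried out with a small sketch, assuming SQLite through Python's sqlite3 module. The tables are cut down to the columns the query needs and the rows are invented. SQLite does not accept parentheses around the operands of except, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student (ID text, name text);
    create table course (course_id text, dept_name text);
    create table takes (ID text, course_id text);
    -- illustrative data, not from the lecture
    insert into student values ('1', 'Ann'), ('2', 'Bob');
    insert into course values
        ('BIO-101', 'Biology'), ('BIO-301', 'Biology'), ('CS-101', 'Comp. Sci.');
    insert into takes values
        ('1', 'BIO-101'), ('1', 'BIO-301'), ('2', 'BIO-101'), ('2', 'CS-101');
""")

# A student qualifies when (Biology courses) minus (courses the student took) is empty.
rows = conn.execute("""
    select distinct S.ID, S.name
    from student as S
    where not exists (select course_id
                      from course
                      where dept_name = 'Biology'
                      except
                      select T.course_id
                      from takes as T
                      where S.ID = T.ID)
""").fetchall()
print(rows)
```

Ann took both Biology courses, so her difference set is empty; Bob is missing BIO-301, so his is not.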
Test for Absence of Duplicate Tuples
The unique construct tests whether a subquery has any duplicate
tuples in its result; it evaluates to true if there are none.
The unique construct evaluates to “true” on an empty set.
Find all courses that were offered at most once in 2009
select T.course_id
from course as T
where unique (select R.course_id
              from section as R
              where T.course_id = R.course_id
                    and R.year = 2009);
Subqueries in the From Clause
SQL allows a subquery expression to be used in the from clause
Find the average instructors’ salaries of those departments where the
average salary is greater than $42,000.
select dept_name, avg_salary
from (select dept_name, avg (salary) as avg_salary
      from instructor
      group by dept_name)
where avg_salary > 42000;
The above eliminates the need to use the having clause
Another way to write above query
select dept_name, avg_salary
from (select dept_name, avg (salary)
      from instructor
      group by dept_name) as dept_avg (dept_name, avg_salary)
where avg_salary > 42000;
Subqueries in the From Clause (Cont.)
Subqueries in the from clause normally cannot access correlation
variables from other relations in the from clause
And yet another way to write it: lateral clause
Return instructor’s name, his or her salary and the average salary of
his or her department:
select name, salary, avg_salary
from instructor I1,
     lateral (select avg(salary) as avg_salary
              from instructor I2
              where I2.dept_name = I1.dept_name);
Note: lateral is part of the SQL standard, but is not supported on many
database systems; some databases such as SQL Server offer
alternative syntax
With Clause
The with clause provides a way of defining a temporary relation whose
definition is available only to the query in which the with clause occurs.
Find all departments with the maximum budget
with max_budget (value) as
     (select max(budget)
      from department)
select department.dept_name
from department, max_budget
where department.budget = max_budget.value;
You can think of the with clause as declaring local variables and
assigning values to them
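A minimal runnable sketch of the maximum-budget query, assuming SQLite through Python's sqlite3 module. The department rows and budgets are invented for illustration; two departments tie for the maximum to show that both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (dept_name text, building text, budget integer);
    -- illustrative data, not from the lecture
    insert into department values
        ('Physics', 'Watson', 70000),
        ('Finance', 'Painter', 120000),
        ('Music', 'Packard', 120000);
""")

# max_budget is a temporary one-row relation visible only inside this query.
rows = conn.execute("""
    with max_budget (value) as
         (select max(budget) from department)
    select department.dept_name
    from department, max_budget
    where department.budget = max_budget.value
    order by department.dept_name
""").fetchall()
print(rows)
```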
Complex Queries using With Clause
Find all departments where the total salary is greater than the
average of the total salary at all departments
with dept_total (dept_name, value) as
     (select dept_name, sum(salary)
      from instructor
      group by dept_name),
     dept_total_avg (value) as
     (select avg(value)
      from dept_total)
select dept_name
from dept_total, dept_total_avg
where dept_total.value > dept_total_avg.value;
Write it without the with clause?
Scalar Subquery
A scalar subquery is one that is used where a single value is
expected
select dept_name,
       (select count(*)
        from instructor
        where department.dept_name = instructor.dept_name)
       as num_instructors
from department;
What does this query do?
A subquery in the select clause must return a scalar value
Runtime error if subquery returns more than one result tuple
Modification of the Database
Deletion of tuples from a given relation.
Insertion of new tuples into a given relation
Updating of values in some tuples in a given relation
Deletion
Delete all instructors
delete from instructor
Delete all instructors from the Finance department
delete from instructor
where dept_name = 'Finance';
Delete all tuples in the instructor relation for those instructors
associated with a department located in the Watson building.
delete from instructor
where dept_name in (select dept_name
                    from department
                    where building = 'Watson');
Deletion (Cont.)
Delete all instructors whose salary is less than the average salary of
instructors
delete from instructor
where salary < (select avg (salary) from instructor);
Problem?
As we delete tuples from the instructor table, the average salary
changes
Solution used in SQL:
1. First, compute avg salary and find all tuples to delete
2. Next, delete all tuples found above (without
recomputing avg or retesting the tuples)
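The two-step semantics can be observed in a small sketch, assuming SQLite through Python's sqlite3 module. The salaries are chosen so that recomputing the average after the first deletion would remove one more tuple: the initial average is 50, but after deleting the 10 it would be 70.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, salary integer)")
conn.executemany(
    "insert into instructor values (?, ?)",
    [("A", 10), ("B", 50), ("C", 90)],  # illustrative rows, average = 50
)

# The average is computed once, so only A (10 < 50) is deleted.
# If it were recomputed after deleting A, B (50 < 70) would be deleted too.
conn.execute(
    "delete from instructor where salary < (select avg(salary) from instructor)"
)
remaining = [row[0] for row in conn.execute(
    "select name from instructor order by name")]
print(remaining)
```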
Insertion
Add a new tuple to course
insert into course
values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
or equivalently
insert into course (course_id, title, dept_name, credits)
values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);
Add a new tuple to student with tot_cred set to null
insert into student
values ('3003', 'Green', 'Finance', null);
Insertion (Cont.)
Add all instructors to the student relation with tot_cred set to 0
insert into student
select ID, name, dept_name, 0
from instructor
The select from where statement is evaluated fully before any of
its results are inserted into the relation.
Otherwise queries like
insert into table1 select * from table1
would cause problems
Updates
Increase salaries of instructors whose salary is over $100,000 by 3%,
and all others receive a 5% raise
Write two update statements:
update instructor
set salary = salary * 1.05
where salary <= 100000;
update instructor
set salary = salary * 1.03
where salary > 100000;
What’s the problem here?
The order is important: with the 5% update first, a salary just under
$100,000 rises above $100,000 and then receives the 3% raise as well
Can be done better using the case statement (next slide)
Case Statement for Conditional Updates
Same update as before but with case statement
update instructor
set salary = case
               when salary <= 100000 then salary * 1.05
               else salary * 1.03
             end
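The effect of statement order can be compared against the case version in a short sketch, assuming SQLite through Python's sqlite3 module. The helper salaries_after and the two instructors are hypothetical; instructor A starts just under $100,000.

```python
import sqlite3

def salaries_after(statements):
    """Run the given update statements on a fresh two-row instructor table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table instructor (name text, salary real)")
    conn.executemany(
        "insert into instructor values (?, ?)",
        [("A", 99000), ("B", 120000)],  # illustrative rows
    )
    for stmt in statements:
        conn.execute(stmt)
    return dict(conn.execute("select name, salary from instructor"))

# 5% update first: A crosses 100000 and is then raised a second time.
two_updates = salaries_after([
    "update instructor set salary = salary * 1.05 where salary <= 100000",
    "update instructor set salary = salary * 1.03 where salary > 100000",
])

# case decides per tuple in a single pass, so each tuple gets exactly one raise.
with_case = salaries_after([
    """update instructor
       set salary = case
                      when salary <= 100000 then salary * 1.05
                      else salary * 1.03
                    end""",
])
print(two_updates, with_case)
```

With two statements, A ends up with both raises; with case, A receives only the 5% raise. B is unaffected by the ordering.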
Updates with Scalar Subqueries
Recompute and update tot_cred value for all students
update student S
set tot_cred = (select sum(credits)
                from takes natural join course
                where S.ID = takes.ID and
                      takes.grade <> 'F' and
                      takes.grade is not null);
The above sets tot_cred to null for students who have not
taken any course
Instead of sum(credits), use:
case
  when sum(credits) is not null then sum(credits)
  else 0
end
Joined Relations
Join operations take two relations and return as a result
another relation.
A join operation is a Cartesian product which requires that
tuples in the two relations match (under some condition). It
also specifies the attributes that are present in the result of the
join
The join operations are typically used as subquery
expressions in the from clause
Join operations – Example
Relation course
Relation prereq
Observe that
prereq information is missing for CS-315 and
course information is missing for CS-347
Joined Relations
Join operations take two relations and return as a result
another relation.
These additional operations are typically used as subquery
expressions in the from clause
Join condition – defines which tuples in the two relations
match, and what attributes are present in the result of the join.
Join type – defines how tuples in each relation that do not
match any tuple in the other relation (based on the join
condition) are treated.
Outer Join
An extension of the join operation that avoids loss of
information.
Computes the join and then adds tuples from one relation that
do not match tuples in the other relation to the result of the
join.
Uses null values.
Left Outer Join
course natural left outer join prereq
Right Outer Join
course natural right outer join prereq
Full Outer Join
course natural full outer join prereq
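A sketch of the difference between a natural join and a natural left outer join, assuming SQLite through Python's sqlite3 module. The rows are illustrative and trimmed to two columns per relation; they are chosen so that CS-315 has no prereq tuple and CS-347 has no course tuple. Only the left outer join is run, since older SQLite versions do not support right and full outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table course (course_id text, title text);
    create table prereq (course_id text, prereq_id text);
    -- illustrative data
    insert into course values
        ('BIO-301', 'Genetics'), ('CS-190', 'Game Design'), ('CS-315', 'Robotics');
    insert into prereq values
        ('BIO-301', 'BIO-101'), ('CS-190', 'CS-101'), ('CS-347', 'CS-101');
""")

# The plain natural join loses CS-315 because it has no matching prereq tuple.
inner = conn.execute("""
    select course_id, prereq_id
    from course natural join prereq
    order by course_id
""").fetchall()

# The left outer join keeps CS-315 and pads the prereq attribute with null.
left = conn.execute("""
    select course_id, prereq_id
    from course natural left outer join prereq
    order by course_id
""").fetchall()
print(inner)
print(left)
```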
Joined Relations in SQL – Examples
course inner join prereq on course.course_id = prereq.course_id
What is the difference between the above and a
natural join?
The above is a Cartesian product with a selection condition, so
course_id appears twice in the result; a natural join lists it once
Joined Relations in SQL – Examples
course left outer join prereq on course.course_id = prereq.course_id
Joined Relations – Examples
course natural right outer join prereq
course full outer join prereq using (course_id)
Views
In some cases, it is not desirable for all users to see the entire
logical model (that is, all the actual relations stored in the
database).
Consider a person who needs to know an instructor’s name and
department, but not the salary. This person should see a
relation described, in SQL, by
select ID, name, dept_name
from instructor
A view provides a mechanism to hide certain data from the view
of certain users.
Any relation that is not part of the conceptual model but is made
visible to a user as a “virtual relation” is called a view.
View Definition
A view is defined using the create view statement which has
the form
create view v as < query expression >
where <query expression> is any legal SQL expression. The
view name is represented by v.
Once a view is defined, the view name can be used to refer to
the virtual relation that the view generates.
View definition is not the same as creating a new relation by
evaluating the query expression
Rather, a view definition causes the saving of an expression;
the expression is substituted into queries using the view. In
programming language terms, this is “call by name” or lazy
evaluation!
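That a view stores an expression rather than a result can be seen in a small sketch, assuming SQLite through Python's sqlite3 module and two made-up instructor rows. A tuple inserted into the base relation after the view was created still shows up through the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary integer);
    insert into instructor values ('1', 'A', 'Biology', 70000);  -- illustrative row
    create view faculty as
        select ID, name, dept_name from instructor;
""")

before = conn.execute("select count(*) from faculty").fetchone()[0]

# Change the base relation after the view has been defined.
conn.execute("insert into instructor values ('2', 'B', 'Music', 40000)")

# The view's query is evaluated again at this point, so it sees the new tuple.
after = conn.execute("select count(*) from faculty").fetchone()[0]
columns = [d[0] for d in conn.execute("select * from faculty").description]
print(before, after, columns)
```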
Example Views
A view of instructors without their salary
create view faculty as
select ID, name, dept_name
from instructor
A view of all instructors in the Biology department
create view bio_instructors as
select name
from faculty
where dept_name = 'Biology'
Create a view of department salary totals
create view departments_total_salary(dept_name, total_salary) as
select dept_name, sum (salary)
from instructor
group by dept_name;
Views Defined Using Other Views
create view physics_fall_2009 as
select course.course_id, sec_id, building, room_number
from course, section
where course.course_id = section.course_id
      and course.dept_name = 'Physics'
      and section.semester = 'Fall'
      and section.year = '2009';
create view physics_fall_2009_watson as
select course_id, room_number
from physics_fall_2009
where building = 'Watson';
View Expansion
Expand use of a view (physics_fall_2009) in a query/another view
create view physics_fall_2009_watson as
select course_id, room_number
from (select course.course_id, building, room_number
      from course, section
      where course.course_id = section.course_id
            and course.dept_name = 'Physics'
            and section.semester = 'Fall'
            and section.year = '2009')
where building = 'Watson';
Views Defined Using Other Views
One view may be used in the expression defining another view
A view relation v1 is said to depend directly on a view relation v2
if v2 is used in the expression defining v1
A view relation v1 is said to depend on view relation v2 if either
v1 depends directly on v2 or there is a path of dependencies
from v1 to v2
A view relation v is said to be recursive if it depends on itself.
View Expansion
A way to define the meaning of views defined in terms of other
views.
Let view v1 be defined by an expression e1 that may itself
contain uses of view relations.
View expansion of an expression repeats the following
replacement step:
repeat
  Find any view relation vi in e1
  Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
As long as the view definitions are not recursive, this loop will
terminate
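The replacement loop can be sketched as plain text substitution. This is a toy illustration, not how a database system implements view expansion: the views dictionary, its simplified definitions, and the expand function are all hypothetical, and the sketch assumes view names never occur inside string literals or as column names.

```python
import re

# Hypothetical, simplified view definitions keyed by view name.
views = {
    "physics_fall_2009":
        "select course_id, building from section where year = 2009",
    "physics_fall_2009_watson":
        "select course_id from physics_fall_2009 where building = 'Watson'",
}

def expand(expression, views):
    """Repeat: find a view name in the expression and replace it by its definition."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, views)) + r")\b")
    while pattern.search(expression):          # until no view relation is present
        expression = pattern.sub(
            lambda m: "(" + views[m.group(1)] + ")", expression, count=1
        )
    return expression

expanded = expand("select * from physics_fall_2009_watson", views)
print(expanded)
```

As with the loop on the slide, this terminates only when the definitions are not recursive; a view that mentions itself would be substituted forever.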
Update of a View
Add a new tuple to faculty view which we defined earlier
insert into faculty values (’30765’, ’Green’, ’Music’);
This insertion must be represented by the insertion of the tuple
(’30765’, ’Green’, ’Music’, null)
into the instructor relation
Some Updates cannot be Translated Uniquely
create view instructor_info as
select ID, name, building
from instructor, department
where instructor.dept_name = department.dept_name;
insert into instructor_info values ('69987', 'White', 'Taylor');
which department, if multiple departments in Taylor?
what if no department is in Taylor?
Most SQL implementations allow updates only on simple views:
The from clause has only one database relation.
The select clause contains only attribute names of the
relation, and does not have any expressions, aggregates, or
distinct specification.
Any attribute not listed in the select clause can be set to null
The query does not have a group by or having clause.
More Problems
create view history_instructors as
select *
from instructor
where dept_name = 'History';
What happens if we insert ('25566', 'Brown',
'Biology', 100000) into history_instructors?
Materialized Views
When a materialized view is defined, a physical table
representing the view is created at the time of creation.
Update is simple to handle.
How are updates handled to the “base” relations on which the
view was defined?
SQL> select deptname,max(salary) as maxsal from instructor group by deptname order by avg(salary) desc;
DEPTNAME       MAXSAL
---------- ----------
Physics         95000
Finance         90000
Elec. Eng.      80000
Comp. Sci       92000
Biology         72000
History         62000
Music           40000

7 rows selected.