
SQL Set Operations and Aggregates Guide

Uploaded by

Cheryl Prithika
Copyright
© All Rights Reserved

Introduction to SQL (II)

Roadmap to This Lecture


 Set operations
 Aggregates
 Nested Subqueries
 Modification of the Database
 Join Expressions
 Views

Set Operations
 Find courses that ran in Fall 2009 or in Spring 2010

(select course_id from section where semester = 'Fall' and year = 2009)
union
(select course_id from section where semester = 'Spring' and year = 2010)

 Find courses that ran in Fall 2009 and in Spring 2010

(select course_id from section where semester = 'Fall' and year = 2009)
intersect
(select course_id from section where semester = 'Spring' and year = 2010)

 Find courses that ran in Fall 2009 but not in Spring 2010

(select course_id from section where semester = 'Fall' and year = 2009)
except
(select course_id from section where semester = 'Spring' and year = 2010)

Set Operations
 Set operations union, intersect, and except
 Each of the above operations automatically eliminates duplicates
 To retain all duplicates, use the corresponding multiset versions union all, intersect all, and except all.
 Suppose a tuple occurs m times in r and n times in s; then it occurs:
 m + n times in r union all s
 min(m, n) times in r intersect all s
 max(0, m – n) times in r except all s
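The multiset counts above can be checked with a small sketch. It models r and s as Python `Counter` objects, which is only an analogy for SQL multisets; the course IDs and their counts are made-up illustration data.

```python
from collections import Counter

# Hypothetical multisets of course_id values (illustrative data only).
r = Counter({"CS-101": 3, "CS-315": 1})   # CS-101 occurs m = 3 times in r
s = Counter({"CS-101": 2, "BIO-101": 1})  # CS-101 occurs n = 2 times in s

union_all = r + s       # each tuple occurs m + n times
intersect_all = r & s   # each tuple occurs min(m, n) times
except_all = r - s      # each tuple occurs max(0, m - n) times; zero counts are dropped

print(union_all["CS-101"], intersect_all["CS-101"], except_all["CS-101"])
```

BIO-101 occurs 0 times in r and once in s, so it disappears from the `except_all` result rather than appearing with a negative count.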

Null Values
 It is possible for tuples to have a null value, denoted by null, for some of their attributes
 null signifies an unknown value or that a value does not exist.
 The result of any arithmetic expression involving null is null
 Example: 5 + null returns null
 The predicate is null can be used to check for null values.
 Example: Find all instructors whose salary is null.
select name
from instructor
where salary is null

Null Values and Three Valued Logic


 Any comparison with null returns unknown
 Example: 5 < null or null <> null or null = null
 Three-valued logic using the truth value unknown:
 OR: (unknown or true) = true,
(unknown or false) = unknown,
(unknown or unknown) = unknown
 AND: (true and unknown) = unknown,
(false and unknown) = false,
(unknown and unknown) = unknown
 NOT: (not unknown) = unknown
 “P is unknown” evaluates to true if predicate P evaluates to unknown
 Result of where clause predicate is treated as false if it evaluates to unknown
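The rules above can be tried out with a minimal sketch using Python's built-in `sqlite3` module. It assumes SQLite's conventions: there is no separate Boolean type, so true is 1, false is 0, and unknown is returned as null (Python `None`). The instructor rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def value(expr):
    """Evaluate one SQL expression; null/unknown comes back as None."""
    return conn.execute(f"select {expr}").fetchone()[0]

arithmetic = value("5 + null")     # arithmetic with null is null
comparison = value("null = null")  # comparison with null is unknown
or_true = value("null or 1")       # (unknown or true) = true
or_false = value("null or 0")      # (unknown or false) = unknown
and_false = value("null and 0")    # (false and unknown) = false
not_unknown = value("not null")    # (not unknown) = unknown

# Hypothetical two-row instructor table to show the effect on a where clause.
conn.execute("create table instructor (name text, salary numeric)")
conn.executemany("insert into instructor values (?, ?)",
                 [("Kim", 80000), ("Gold", None)])

# 'salary = null' is unknown for every row, so no row qualifies.
eq_null = conn.execute("select name from instructor where salary = null").fetchall()
# 'salary is null' is the correct test.
is_null = conn.execute("select name from instructor where salary is null").fetchall()
```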

Aggregate Functions
 These functions operate on the multiset of values of a column of a relation, and return a value
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values

Aggregate Functions (Cont.)
 Find the average salary of instructors in the Computer Science department
select avg (salary)
from instructor
where dept_name = 'Comp. Sci.';
 Find the total number of instructors who teach a course in the Spring 2010 semester
select count (distinct ID)
from teaches
where semester = 'Spring' and year = 2010;
 Find the number of tuples in the course relation
select count (*)
from course;

Aggregate Functions – Group By
 Find the average salary of instructors in each department
select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name;

[Figure: result relation with columns dept_name and avg_salary]

Aggregation (Cont.)
 Attributes in select clause outside of aggregate functions must appear in group by list
 /* erroneous query */
select dept_name, ID, avg (salary)
from instructor
group by dept_name;
 Reason is simple: ID has different values within each group of dept_name, so which ID should be returned along with the average salary?

Aggregate Functions – Having Clause


 Find the names and average salaries of all departments whose average salary is greater than 42000

select dept_name, avg (salary)
from instructor
group by dept_name
having avg (salary) > 42000;

Note: predicates in the having clause are applied after the formation of groups, whereas predicates in the where clause are applied before forming groups
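The difference between filtering before and after grouping can be seen in a short sketch run against SQLite through Python's `sqlite3` module. The five instructor rows are invented for illustration; only the instructor schema comes from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary numeric)")
# Hypothetical sample rows.
conn.executemany("insert into instructor values (?, ?, ?, ?)", [
    ("1", "A", "Physics", 95000),
    ("2", "B", "Physics", 87000),
    ("3", "C", "Music", 40000),
    ("4", "D", "History", 60000),
    ("5", "E", "History", 30000),
])

# having: groups are formed from all rows, then groups are filtered.
having_rows = conn.execute("""
    select dept_name, avg(salary)
    from instructor
    group by dept_name
    having avg(salary) > 42000
    order by dept_name""").fetchall()

# where: rows are filtered first, then the surviving rows are grouped.
where_rows = conn.execute("""
    select dept_name, avg(salary)
    from instructor
    where salary > 42000
    group by dept_name
    order by dept_name""").fetchall()
```

History appears in both results but with different averages: 45000 when all of its rows are grouped, and 60000 when the 30000 row is removed before grouping.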

Null Values and Aggregates


 Total all salaries
select sum (salary)
from instructor
 Above statement ignores null amounts
 Result is null if there is no non-null amount
 All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes
 What if collection has only null values?
 count returns 0
 all other aggregates return null
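A minimal sketch of these rules, again using SQLite via Python's `sqlite3` module with made-up rows (one of the three salaries is null):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, salary numeric)")
# Hypothetical rows; instructor B has a null salary.
conn.executemany("insert into instructor values (?, ?)",
                 [("A", 60000), ("B", None), ("C", 80000)])

# sum, count(salary) and avg skip the null; count(*) counts every tuple.
total, n_salaries, n_rows, average = conn.execute(
    "select sum(salary), count(salary), count(*), avg(salary) from instructor"
).fetchone()

# Now make every salary null and aggregate again.
conn.execute("update instructor set salary = null")
only_nulls = conn.execute(
    "select sum(salary), count(salary), max(salary) from instructor"
).fetchone()
```

Note that the average is taken over the two non-null salaries, not over all three tuples.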

Schemas
 instructor(ID, name, dept_name, salary)
 student(ID, name, dept_name, tot_cred)
 takes(ID, course_id, sec_id, semester, year, grade)
 teaches(ID, course_id, sec_id, semester, year)
 course(course_id, title, dept_name, credits)
 section(course_id, sec_id, semester, year)

Nested Subqueries
 SQL provides a mechanism for the nesting of subqueries.
 A subquery is a select-from-where expression that is nested within
another query.
 A common use of subqueries is to perform tests for set membership,
set comparisons, and set cardinality.

Example Query
 Find courses offered in Fall 2009 and in Spring 2010

select distinct course_id
from section
where semester = 'Fall' and year = 2009 and
      course_id in (select course_id
                    from section
                    where semester = 'Spring' and year = 2010);

 Find courses offered in Fall 2009 but not in Spring 2010

select distinct course_id
from section
where semester = 'Fall' and year = 2009 and
      course_id not in (select course_id
                        from section
                        where semester = 'Spring' and year = 2010);

Example Query
 Find the total number of (distinct) students who have taken course sections taught by the instructor with ID 10101

select count (distinct ID)
from takes
where (course_id, sec_id, semester, year) in
      (select course_id, sec_id, semester, year
       from teaches
       where teaches.ID = 10101);

 Note: Above query can be written in a much simpler manner. The formulation above is simply to illustrate SQL features.

Set Comparison
 Find names of instructors with salary greater than that of some (at least one) instructor in the Biology department.

select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Biology';

 Same query using > some clause

select name
from instructor
where salary > some (select salary
                     from instructor
                     where dept_name = 'Biology');

Definition of Some Clause


 F <comp> some r ⇔ ∃ t ∈ r such that (F <comp> t)
where <comp> can be: <, ≤, >, ≥, =, ≠

(5 < some {0, 5, 6}) = true (read: 5 < some tuple in the relation)
(5 < some {0, 5}) = false
(5 = some {0, 5}) = true
(5 ≠ some {0, 5}) = true (since 0 ≠ 5)
(= some) ≡ in
However, (≠ some) ≢ not in
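The definition of some can be mirrored in a few lines of Python. This is a sketch of the semantics only: the relation is modeled as a plain list, null values are assumed absent, and the helper name `some` is chosen for illustration.

```python
import operator

def some(f, comp, r):
    """F <comp> some r: true if the comparison holds for at least one t in r."""
    return any(comp(f, t) for t in r)

lt_some_056 = some(5, operator.lt, [0, 5, 6])  # True, because 5 < 6
lt_some_05 = some(5, operator.lt, [0, 5])      # False, nothing in r exceeds 5
eq_some_05 = some(5, operator.eq, [0, 5])      # True, same as "5 in r"
ne_some_05 = some(5, operator.ne, [0, 5])      # True, because 0 != 5

# (!= some) is not the same as "not in": 5 is in r, yet 5 != some r holds.
not_in_05 = 5 not in [0, 5]                    # False
```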

Example Query
 Find the names of all instructors whose salary is greater than the salary of all instructors in the Biology department.

select name
from instructor
where salary > all (select salary
                    from instructor
                    where dept_name = 'Biology');

Definition of all Clause
 F <comp> all r ⇔ ∀ t ∈ r (F <comp> t)

(5 < all {0, 5, 6}) = false
(5 < all {6, 10}) = true
(5 = all {4, 5}) = false
(5 ≠ all {4, 6}) = true (since 5 ≠ 4 and 5 ≠ 6)
(≠ all) ≡ not in
However, (= all) ≢ in
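The all clause can be sketched the same way. The relation is modeled as a Python list, null values are assumed absent, and the helper name `all_` is illustrative (the trailing underscore only avoids shadowing Python's built-in `all`).

```python
import operator

def all_(f, comp, r):
    """F <comp> all r: true if the comparison holds for every t in r."""
    return all(comp(f, t) for t in r)

lt_all_056 = all_(5, operator.lt, [0, 5, 6])  # False, 5 < 0 fails
lt_all_610 = all_(5, operator.lt, [6, 10])    # True
eq_all_45 = all_(5, operator.eq, [4, 5])      # False, 5 = 4 fails
ne_all_46 = all_(5, operator.ne, [4, 6])      # True, same as "5 not in r"

# (= all) is not the same as "in": 5 is in r, yet 5 = all r fails.
in_45 = 5 in [4, 5]                           # True

# On an empty relation the condition holds vacuously.
lt_all_empty = all_(5, operator.lt, [])       # True
```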

Test for Empty Relations
 The exists construct returns the value true if the argument subquery is nonempty.
 exists r ⇔ r ≠ Ø
 not exists r ⇔ r = Ø

Correlation Variables
 Yet another way of specifying the query “Find all courses taught in
both the Fall 2009 semester and in the Spring 2010 semester”
select course_id
from section as S
where semester = 'Fall' and year = 2009 and
      exists (select *
              from section as T
              where semester = 'Spring' and year = 2010
                    and S.course_id = T.course_id);
 Correlated subquery
 Correlation name or correlation variable
 Scope of variables restricted to the inner-most query structure that
defines them

Not Exists
 Find all students who have taken all courses offered in the Biology department.

select distinct S.ID, S.name
from student as S
where not exists ( (select course_id
                    from course
                    where dept_name = 'Biology')
                   except
                   (select T.course_id
                    from takes as T
                    where S.ID = T.ID));

• First nested query lists all courses offered in Biology
• Second nested query lists all courses a particular student took

 Note that X – Y = Ø ⇔ X ⊆ Y (set containment)
 Note: Cannot write this query using = all or its variants
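The set-containment idea behind the not exists ... except pattern can be sketched with Python sets. The student IDs and course lists below are hypothetical; X plays the role of the Biology courses and Y the courses one student took.

```python
# Hypothetical data: X is the set of Biology courses.
biology_courses = {"BIO-101", "BIO-301"}

# Hypothetical takes relation, grouped by student ID (each value is a Y).
takes = {
    "S1": {"BIO-101", "BIO-301", "CS-101"},
    "S2": {"BIO-101"},
    "S3": set(),
}

# A student qualifies when X - Y is empty, i.e. no Biology course is missing.
took_all = sorted(sid for sid, taken in takes.items()
                  if not (biology_courses - taken))

# X - Y = Ø holds exactly when X ⊆ Y, for every student in the sample.
equivalence_holds = all(
    ((biology_courses - taken) == set()) == (biology_courses <= taken)
    for taken in takes.values()
)
```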

Test for Absence of Duplicate Tuples


 The unique construct tests whether a subquery result is free of duplicate tuples; it evaluates to true if no duplicates are present.
 The unique construct evaluates to “true” on an empty set.
 Find all courses that were offered at most once in 2009
select T.course_id
from course as T
where unique (select R.course_id
              from section as R
              where T.course_id = R.course_id
                    and R.year = 2009);

Subqueries in the From Clause
 SQL allows a subquery expression to be used in the from clause
 Find the average instructors’ salaries of those departments where the average salary is greater than $42,000.
select dept_name, avg_salary
from (select dept_name, avg (salary) as avg_salary
      from instructor
      group by dept_name)
where avg_salary > 42000;
 The above query eliminates the need to use the having clause
 Another way to write the above query
select dept_name, avg_salary
from (select dept_name, avg (salary)
      from instructor
      group by dept_name) as dept_avg (dept_name, avg_salary)
where avg_salary > 42000;

Subqueries in the From Clause (Cont.)
 Subqueries in the from clause normally cannot access attributes of other relations in the same from clause
 The lateral clause lifts this restriction
 Return each instructor’s name, his or her salary, and the average salary of his or her department:
select name, salary, avg_salary
from instructor I1,
     lateral (select avg(salary) as avg_salary
              from instructor I2
              where I2.dept_name = I1.dept_name);
 Note: lateral is part of the SQL standard, but is not supported on many database systems; some databases such as SQL Server offer alternative syntax

With Clause
 The with clause provides a way of defining a temporary relation whose definition is available only to the query in which the with clause occurs.
 Find all departments with the maximum budget

with max_budget (value) as
     (select max(budget)
      from department)
select department.dept_name
from department, max_budget
where department.budget = max_budget.value;

 You can think of the with clause as declaring local variables and assigning values to them
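The maximum-budget query runs unchanged on SQLite, which supports the with clause. The sketch below uses Python's `sqlite3` module; the department rows are made-up sample data, and the `order by` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table department (dept_name text, building text, budget numeric)")
# Hypothetical rows; two departments tie for the largest budget.
conn.executemany("insert into department values (?, ?, ?)", [
    ("Physics", "Watson", 70000),
    ("Finance", "Painter", 120000),
    ("History", "Painter", 120000),
    ("Music", "Packard", 80000),
])

# max_budget is a temporary one-row relation visible only inside this query.
max_budget_depts = conn.execute("""
    with max_budget (value) as
         (select max(budget) from department)
    select department.dept_name
    from department, max_budget
    where department.budget = max_budget.value
    order by department.dept_name""").fetchall()
```

Because the comparison is against the single value in max_budget, every department that ties for the maximum is returned.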

Complex Queries using With Clause


 Find all departments where the total salary is greater than the average of the total salary at all departments

with dept_total (dept_name, value) as
     (select dept_name, sum(salary)
      from instructor
      group by dept_name),
     dept_total_avg (value) as
     (select avg(value)
      from dept_total)
select dept_name
from dept_total, dept_total_avg
where dept_total.value > dept_total_avg.value;

 Write it without the with clause?

Scalar Subquery
 A scalar subquery is one that is used where a single value is expected
select dept_name,
       (select count(*)
        from instructor
        where department.dept_name = instructor.dept_name) as num_instructors
from department;
 What does this query do?
 Each expression in the select clause must evaluate to a scalar value
 Runtime error if subquery returns more than one result tuple

Modification of the Database

 Deletion of tuples from a given relation.


 Insertion of new tuples into a given relation
 Updating of values in some tuples in a given relation

Deletion
 Delete all instructors
delete from instructor

 Delete all instructors from the Finance department
delete from instructor
where dept_name = 'Finance';

 Delete all tuples in the instructor relation for those instructors associated with a department located in the Watson building.
delete from instructor
where dept_name in (select dept_name
                    from department
                    where building = 'Watson');

Deletion (Cont.)
 Delete all instructors whose salary is less than the average salary of instructors
delete from instructor
where salary < (select avg (salary) from instructor);

 Problem?
 As we delete tuples from the instructor table, the average salary changes
 Solution used in SQL:
1. First, compute avg salary and find all tuples to delete
2. Next, delete all tuples found above (without recomputing avg or retesting the tuples)
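The two-step rule can be contrasted with a naive approach in a short sketch. The first part runs the delete on SQLite through Python's `sqlite3` module; the second part is a hypothetical helper, `naive_delete`, that recomputes the average after every removal to show what SQL avoids. The four salaries are made-up sample data.

```python
import sqlite3

salaries = [40000, 60000, 80000, 100000]  # hypothetical data, average 70000

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, salary numeric)")
conn.executemany("insert into instructor values (?, ?)",
                 zip(["A", "B", "C", "D"], salaries))

# The average is computed once (70000), then all matching tuples are deleted.
conn.execute("delete from instructor where salary < (select avg(salary) from instructor)")
remaining = [row[0] for row in
             conn.execute("select salary from instructor order by salary")]

def naive_delete(values):
    """Incorrect variant: re-evaluates the average after each deletion."""
    kept = list(values)
    for v in list(values):
        if v < sum(kept) / len(kept):
            kept.remove(v)
    return kept

naive_remaining = naive_delete(salaries)
```

With the average fixed at 70000 only the two lowest salaries go; the naive loop keeps raising the average and ends up removing 80000 as well.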

Insertion
 Add a new tuple to course
insert into course
values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);

 or equivalently
insert into course (course_id, title, dept_name, credits)
values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);

 Add a new tuple to student with tot_cred set to null
insert into student
values ('3003', 'Green', 'Finance', null);

Insertion (Cont.)
 Add all instructors to the student relation with tot_cred set to 0
insert into student
select ID, name, dept_name, 0
from instructor
 The select from where statement is evaluated fully before any of its results are inserted into the relation.
 Otherwise queries like
insert into table1 select * from table1
would cause problems

Updates
 Increase salaries of instructors whose salary is over $100,000 by 3%, and all others receive a 5% raise
 Write two update statements:
update instructor
set salary = salary * 1.05
where salary <= 100000;
update instructor
set salary = salary * 1.03
where salary > 100000;
 What’s the problem here?
 The order is important: run in this order, an instructor whose salary passes $100,000 after the 5% raise also receives the 3% raise
 Can be done better using the case statement (next slide)

Case Statement for Conditional Updates


 Same query as before but with case statement
update instructor
set salary = case
                when salary <= 100000 then salary * 1.05
                else salary * 1.03
             end
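A small sketch comparing the two-statement version with the case version, run on SQLite through Python's `sqlite3` module. The two instructors and their salaries are invented; instructor A sits just under the $100,000 threshold so that the ordering problem becomes visible.

```python
import sqlite3

def fresh_table():
    """Create an in-memory instructor table with two hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table instructor (name text, salary real)")
    conn.executemany("insert into instructor values (?, ?)",
                     [("A", 98000), ("B", 120000)])
    return conn

def salaries(conn):
    return [row[0] for row in
            conn.execute("select salary from instructor order by name")]

# Two statements, 5% raise first: A crosses 100000 and is then raised again.
two_step = fresh_table()
two_step.execute("update instructor set salary = salary * 1.05 where salary <= 100000")
two_step.execute("update instructor set salary = salary * 1.03 where salary > 100000")
two_step_result = salaries(two_step)

# One statement with case: each tuple is tested once, against its old salary.
with_case = fresh_table()
with_case.execute("""
    update instructor
    set salary = case
                    when salary <= 100000 then salary * 1.05
                    else salary * 1.03
                 end""")
case_result = salaries(with_case)
```

In the two-step run instructor A ends up near 105987 (both raises applied), while the case version gives A about 102900, the intended single 5% raise.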

Updates with Scalar Subqueries
 Recompute and update tot_cred value for all students
update student S
set tot_cred = (select sum(credits)
                from takes natural join course
                where S.ID = takes.ID and
                      takes.grade <> 'F' and
                      takes.grade is not null);
 The above sets tot_cred to null for students who have not taken any course
 Instead of sum(credits), use:
case
   when sum(credits) is not null then sum(credits)
   else 0
end

Joined Relations
 Join operations take two relations and return as a result another relation.
 A join operation is a Cartesian product which requires that tuples in the two relations match (under some condition). It also specifies the attributes that are present in the result of the join
 The join operations are typically used as subquery expressions in the from clause

Join operations – Example


 Relation course

 Relation prereq

 Observe that prereq information is missing for CS-315 and course information is missing for CS-347

Joined Relations
 Join operations take two relations and return as a result another relation.
 These additional operations are typically used as subquery expressions in the from clause
 Join condition – defines which tuples in the two relations match, and what attributes are present in the result of the join.
 Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated.

Outer Join

 An extension of the join operation that avoids loss of information.
 Computes the join and then adds tuples from one relation that do not match tuples in the other relation to the result of the join.
 Uses null values.

Left Outer Join

 course natural left outer join prereq
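Since the result table for this slide is not reproduced here, the following sketch runs the left outer join on SQLite through Python's `sqlite3` module. The schemas are simplified to two columns each, and the rows are sample data chosen to match the earlier observation that prereq information is missing for CS-315 and course information is missing for CS-347; the titles are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified schemas (assumption): only the columns needed for the join.
conn.execute("create table course (course_id text, title text)")
conn.execute("create table prereq (course_id text, prereq_id text)")
conn.executemany("insert into course values (?, ?)", [
    ("BIO-301", "Genetics"),
    ("CS-190", "Game Design"),
    ("CS-315", "Robotics"),      # no matching prereq tuple
])
conn.executemany("insert into prereq values (?, ?)", [
    ("BIO-301", "BIO-101"),
    ("CS-190", "CS-101"),
    ("CS-347", "CS-101"),        # no matching course tuple
])

left_rows = conn.execute("""
    select course_id, title, prereq_id
    from course natural left outer join prereq
    order by course_id""").fetchall()
```

CS-315 is kept with a null prereq_id, while CS-347 does not appear because it exists only in the right-hand relation.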

Right Outer Join

 course natural right outer join prereq

Full Outer Join

 course natural full outer join prereq

Joined Relations in SQL – Examples


 course inner join prereq on course.course_id = prereq.course_id

 What is the difference between the above and a natural join?
 Cartesian product with a selection condition

Joined Relations in SQL – Examples

 course left outer join prereq on course.course_id = prereq.course_id

Joined Relations – Examples


 course natural right outer join prereq

 course full outer join prereq using (course_id)

Views
 In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database).
 Consider a person who needs to know an instructor’s name and department, but not the salary. This person should see a relation described, in SQL, by

select ID, name, dept_name
from instructor

 A view provides a mechanism to hide certain data from the view of certain users.
 Any relation that is not of the conceptual model but is made visible to a user as a “virtual relation” is called a view.

View Definition
 A view is defined using the create view statement, which has the form
create view v as <query expression>
where <query expression> is any legal SQL expression. The view name is represented by v.
 Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.
 View definition is not the same as creating a new relation by evaluating the query expression
 Rather, a view definition causes the saving of an expression; the expression is substituted into queries using the view.
 In programming language terms, this is “call by name” or lazy evaluation!
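The "saved expression" behaviour can be observed with a short sketch on SQLite via Python's `sqlite3` module. The instructor rows, including the salary figures, are sample data for illustration; the view is the simple projection of ID, name, and dept_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary numeric)")
conn.execute("insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000)")

# The view stores the query expression, not a copy of the data.
conn.execute("create view faculty as select ID, name, dept_name from instructor")
count_before = conn.execute("select count(*) from faculty").fetchone()[0]

# A tuple inserted into the base relation after the view was created ...
conn.execute("insert into instructor values ('30765', 'Green', 'Music', 40000)")

# ... is visible through the view, because the view is re-evaluated on use.
names_after = conn.execute("select name from faculty order by ID").fetchall()
```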

Example Views
 A view of instructors without their salary
create view faculty as
select ID, name, dept_name
from instructor
 A view of all instructors in the Biology department
create view bio_instructors as
select name
from faculty
where dept_name = 'Biology'
 Create a view of department salary totals
create view departments_total_salary(dept_name, total_salary) as
select dept_name, sum (salary)
from instructor
group by dept_name;
Views Defined Using Other Views
 create view physics_fall_2009 as
select course.course_id, sec_id, building, room_number
from course, section
where course.course_id = section.course_id
      and course.dept_name = 'Physics'
      and section.semester = 'Fall'
      and section.year = 2009;

 create view physics_fall_2009_watson as
select course_id, room_number
from physics_fall_2009
where building = 'Watson';

View Expansion
 Expand use of a view (physics_fall_2009) in a query/another view

create view physics_fall_2009_watson as
(select course_id, room_number
 from (select course.course_id, building, room_number
       from course, section
       where course.course_id = section.course_id
             and course.dept_name = 'Physics'
             and section.semester = 'Fall'
             and section.year = 2009)
 where building = 'Watson');

Views Defined Using Other Views


 One view may be used in the expression defining another view
 A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1
 A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of dependencies from v1 to v2
 A view relation v is said to be recursive if it depends on itself.

View Expansion
 A way to define the meaning of views defined in terms of other views.
 Let view v1 be defined by an expression e1 that may itself contain uses of view relations.
 View expansion of an expression repeats the following replacement step:
repeat
   Find any view relation vi in e1
   Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
 As long as the view definitions are not recursive, this loop will terminate
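The repeat/until loop above can be sketched as plain text substitution. This is a simplification for illustration, not how a database system implements view expansion: view definitions are kept as strings in a hypothetical `views` dictionary, names are matched with a regular expression, and the definitions are assumed to be non-recursive.

```python
import re

# Hypothetical catalog: view name -> text of its defining query.
views = {
    "physics_fall_2009": (
        "select course.course_id, building, room_number "
        "from course, section "
        "where course.course_id = section.course_id "
        "and course.dept_name = 'Physics' "
        "and section.semester = 'Fall' and section.year = 2009"
    ),
    "physics_fall_2009_watson": (
        "select course_id, room_number "
        "from physics_fall_2009 where building = 'Watson'"
    ),
}

def expand(expr, views):
    """Replace view names by their definitions until none remain."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, views)) + r")\b")
    while pattern.search(expr):  # "until no more view relations are present"
        # One replacement step: substitute the first view name found.
        expr = pattern.sub(lambda m: "(" + views[m.group(1)] + ")", expr, count=1)
    return expr

expanded = expand("select * from physics_fall_2009_watson", views)
print(expanded)
```

The first pass replaces physics_fall_2009_watson, which introduces physics_fall_2009; the second pass replaces that, leaving only the base relations course and section.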

Update of a View
 Add a new tuple to the faculty view, which we defined earlier
insert into faculty values ('30765', 'Green', 'Music');
 This insertion must be represented by the insertion of the tuple
('30765', 'Green', 'Music', null)
into the instructor relation

Some Updates cannot be Translated Uniquely
 create view instructor_info as
select ID, name, building
from instructor, department
where instructor.dept_name = department.dept_name;
 insert into instructor_info values ('69987', 'White', 'Taylor');
 Which department, if multiple departments are in Taylor?
 What if no department is in Taylor?
 Most SQL implementations allow updates only on simple views
 The from clause has only one database relation.
 The select clause contains only attribute names of the relation, and does not have any expressions, aggregates, or distinct specification.
 Any attribute not listed in the select clause can be set to null
 The query does not have a group by or having clause.

More Problems
 create view history_instructors as
select *
from instructor
where dept_name = 'History';
 What happens if we insert ('25566', 'Brown', 'Biology', 100000) into history_instructors?

Materialized Views
 When defining a view, simply create a physical table
representing the view at the time of creation.
 Update is simple to handle.
 How are updates handled to the “base” relations on which the
view was defined?

SQL> select deptname,max(salary) as maxsal from instructor group by deptname order by avg(salary) desc;

DEPTNAME       MAXSAL
---------- ----------
Physics         95000
Finance         90000
Elec. Eng.      80000
Comp. Sci       92000
Biology         72000
History         62000
Music           40000

7 rows selected.
