Database Systems with SQL
Part I: Foundations
Chapter 1: Introduction to Database Systems
1.1 What is a Database?
A database is an organized collection of structured information, or data, typically stored
electronically in a computer system. A database is usually controlled by a Database
Management System (DBMS). Together, the data, the DBMS, and the applications that
access them are referred to as a database system.
Think of a traditional file cabinet: it holds folders (tables) containing papers (records). A
database does the same but electronically, with powerful tools to retrieve, insert, update, and
delete data quickly and reliably.
1.2 Database Management System (DBMS)
A DBMS is software that interacts with end users, applications, and the database itself to
capture and analyze data. The DBMS software additionally encompasses the core facilities
provided to administer the database. The sum total of the database, the DBMS, and the
associated applications can be referred to as a database system.
Advantages of a DBMS over file-based systems
Data independence: Application programs are insulated from changes in the way
data is structured and stored.
Efficient data access: A DBMS utilizes sophisticated techniques to store and retrieve
data efficiently, even with very large datasets.
Data integrity and security: The DBMS can enforce integrity constraints (e.g.,
ensuring ages are positive) and control access to data.
Data administration: Centralized management facilitates a shared understanding of
data across an organization.
Concurrent access and crash recovery: A DBMS allows multiple users to access
data simultaneously without compromising integrity, and can recover data after a
system failure.
Reduced application development time: Developing applications on top of a DBMS
is faster due to the high-level functionalities provided (e.g., SQL queries,
transactions).
1.3 The Relational Model
The relational model is the most widely used data model for DBMSs. In this model, data is
organized into one or more tables (or "relations") of columns and rows, with a unique key
identifying each row.
Relation: A table with columns and rows.
Attribute: A named column of a relation.
Tuple: A row of a relation.
Domain: The set of allowable values for one or more attributes.
Schema: The name of a relation and the set of attributes with their corresponding
domains. For example: Student(sid: integer, sname: string, age:
integer).
Instance: A snapshot of the data in the database at a given instant in time.
1.4 Introduction to SQL
Structured Query Language (SQL) is the standard language for interacting with relational
database systems. It is a declarative language: you specify what data you want, not how to
retrieve it. The DBMS's query optimizer determines the most efficient way to execute your
query.
SQL is used to manage and interact with relational databases. It includes several types of
commands that allow users to define database structures, manipulate data, and control access.
Data Definition Language (DDL) - DDL is used to create, modify, and delete
database structures such as tables, indexes, and views. Examples: CREATE, ALTER,
DROP.
Data Manipulation Language (DML) - DML is used to insert, update, delete, and
retrieve data stored in database tables. Examples: INSERT, UPDATE, DELETE,
SELECT.
Data Control Language (DCL) - DCL is used to manage user permissions and
control access to database resources. Examples: GRANT, REVOKE.
Chapter 2: Database Design and Data Definition
Language (DDL)
2.1 The Database Design Process
Good database design is crucial for a successful software system. The process typically
involves:
1. Requirements Analysis: Understanding what data needs to be stored and what
relationships exist.
2. Conceptual Design: Creating a high-level description of the data, often using an
Entity-Relationship (ER) model. Entities become tables, attributes become columns,
and relationships become foreign keys.
3. Logical Design: Translating the conceptual model into the schema of a relational
DBMS. This involves defining tables, attributes, and keys. Normalization is applied
to eliminate redundancy and update anomalies.
4. Physical Design: Deciding on physical storage structures and access methods (like
indexes) to optimize performance.
2.1.1 Keys and Constraints
Primary Key: A column or set of columns that uniquely identifies each row in a table
(e.g., StudentID). It enforces entity integrity (no part of a primary key can be null).
Foreign Key: A column or set of columns in one table that refers to the primary key
of another table. It establishes a relationship and enforces referential integrity
(values must match an existing primary key or be null).
Unique Constraint: Ensures that all values in a column (or a set of columns) are
distinct.
Check Constraint: Specifies a condition that must be true for every row in a table
(e.g., age >= 18).
Not Null Constraint: Ensures a column cannot have NULL values.
2.2 Data Types in SQL
Common SQL data types (syntax may vary slightly across DBMS):
Category Data Types Description
INT, SMALLINT, BIGINT, Integer and floating-point numbers.
Numeric DECIMAL(p,s), NUMERIC(p,s), DECIMAL(5,2) stores numbers like
FLOAT, REAL 123.45.
Character CHAR(n), VARCHAR(n), TEXT Fixed-length and variable-length strings.
Date/Time DATE, TIME, TIMESTAMP, DATETIME Date, time, and timestamp values.
Boolean BOOLEAN True or false.
Binary BLOB, BINARY Binary large objects, e.g., images, files.
Others JSON, XML, UUID, ARRAY Semi-structured and specialized types.
2.3 Creating Tables (CREATE TABLE)
The CREATE TABLE statement defines a new table, its columns, data types, and constraints.
Syntax: in sql
CREATE TABLE table_name (
column1 datatype constraint,
column2 datatype constraint,
...
table_constraints
);
Example: Creating a Students table
sql
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
DateOfBirth DATE,
Email VARCHAR(100) UNIQUE,
EnrollmentDate DATE DEFAULT CURRENT_DATE
);
Example: Creating related Courses and Enrollments tables
sql
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100) NOT NULL,
Credits INT CHECK (Credits > 0)
);
CREATE TABLE Enrollments (
EnrollmentID INT PRIMARY KEY,
StudentID INT NOT NULL,
CourseID INT NOT NULL,
EnrollmentDate DATE DEFAULT CURRENT_DATE,
Grade CHAR(2),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID),
UNIQUE (StudentID, CourseID) -- a student can enroll only once
per course
);
2.4 Modifying Table Structure (ALTER TABLE)
Use ALTER TABLE to add, delete, or modify columns and constraints.
Examples:
sql
-- Add a new column
ALTER TABLE Students ADD PhoneNumber VARCHAR(15);
-- Modify an existing column's data type
ALTER TABLE Students ALTER COLUMN PhoneNumber VARCHAR(20);
-- Drop a column
ALTER TABLE Students DROP COLUMN PhoneNumber;
-- Add a foreign key constraint
ALTER TABLE Enrollments ADD CONSTRAINT FK_Student FOREIGN KEY
(StudentID) REFERENCES Students(StudentID);
-- Drop a constraint
ALTER TABLE Enrollments DROP CONSTRAINT FK_Student;
2.5 Deleting Tables (DROP TABLE)
The DROP TABLE statement removes an existing table and its data permanently.
sql
DROP TABLE Enrollments;
-- To avoid errors if the table doesn't exist:
DROP TABLE IF EXISTS Students;
2.6 Practical Examples and Exercises
Exercise 2.1: Create a database for a simple library. Define tables for Books, Members, and
Loans. Include appropriate primary keys, foreign keys, and constraints (e.g., loan date default
to current date, return date can be null, unique constraint on member email).
Solution: in sql
CREATE TABLE Books (
BookID INT PRIMARY KEY,
Title VARCHAR(200) NOT NULL,
Author VARCHAR(100),
ISBN VARCHAR(20) UNIQUE,
PublishedYear INT CHECK (PublishedYear > 1400)
);
CREATE TABLE Members (
MemberID INT PRIMARY KEY,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
Email VARCHAR(100) UNIQUE,
JoinDate DATE DEFAULT CURRENT_DATE
);
CREATE TABLE Loans (
LoanID INT PRIMARY KEY,
BookID INT NOT NULL,
MemberID INT NOT NULL,
LoanDate DATE DEFAULT CURRENT_DATE,
ReturnDate DATE,
FOREIGN KEY (BookID) REFERENCES Books(BookID),
FOREIGN KEY (MemberID) REFERENCES Members(MemberID)
);
Part II: Data Manipulation and Querying
Chapter 3: Data Manipulation Language (DML) - Core
Operations
Data Manipulation Language (DML) refers to the set of SQL commands used to manage
and manipulate the data stored within database tables. While Data Definition Language
(DDL) defines the structure of the database, DML focuses on interacting with the data
itself. These operations are commonly summarized by the acronym CRUD, which stands for
Create, Read, Update, and Delete.
Importance of DML
DML operations are essential because they allow organizations to:
Maintain and update stored information
Retrieve data for analysis and reporting
Correct or modify existing records
Remove outdated or unnecessary data
These operations form the core interaction layer between applications and relational
databases, enabling systems such as banking platforms, e-commerce websites, and enterprise
management systems to function effectively.
3.1 Inserting Data (INSERT)
The Create operation adds new records into a table. In SQL, this is performed using the
INSERT statement. The INSERT INTO statement adds new rows to a table. Summary:
INSERT INTO specifies the table where data will be added.
The column names identify where values should be stored.
VALUES provides the data to be inserted.
This operation is commonly used when new users register, new transactions occur, or new
records need to be stored.
Syntax:
sql
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);
Examples: in sql
-- Insert a single student
INSERT INTO Students (StudentID, FirstName, LastName, DateOfBirth,
Email)
VALUES (1, 'John', 'Doe', '2000-05-15', '[Link]@[Link]');
-- Insert multiple students
INSERT INTO Students (StudentID, FirstName, LastName, DateOfBirth,
Email)
VALUES
(2, 'Jane', 'Smith', '1999-08-22', '[Link]@[Link]'),
(3, 'Bob', 'Johnson', '2001-01-10', '[Link]@[Link]');
-- Insert using DEFAULT for EnrollmentDate
INSERT INTO Students (StudentID, FirstName, LastName, DateOfBirth,
Email, EnrollmentDate)
VALUES (4, 'Alice', 'Brown', '2000-12-01', 'alice@[Link]',
DEFAULT);
-- Insert with NULL for Grade
INSERT INTO Enrollments (EnrollmentID, StudentID, CourseID, Grade)
VALUES (5001, 1, 101, NULL);
3.2 Querying Data (SELECT basics)
The Read operation retrieves data from a database. This is done using the SELECT
statement. The SELECT statement retrieves data from one or more tables. Summary:
SELECT specifies which columns to retrieve.
FROM identifies the table.
WHERE filters records based on conditions.
Additional clauses such as ORDER BY, GROUP BY, and LIMIT can refine the results.
Basic Syntax: in sql
SELECT column1, column2, ...
FROM table_name;
Examples: in sql
-- Select all columns and all rows
SELECT * FROM Students;
-- Select specific columns
SELECT FirstName, LastName FROM Students;
-- Select with expressions
SELECT FirstName || ' ' || LastName AS FullName FROM Students; --
concatenation (syntax varies)
3.3 Filtering with WHERE
The WHERE clause filters rows based on a condition. WHERE specifies which rows
Operators: =, <>, >, <, >=, <=, BETWEEN, LIKE, IN, IS NULL, AND, OR, NOT.
Examples: in sql
-- Equality
SELECT * FROM Students WHERE StudentID = 1;
-- Range
SELECT * FROM Students WHERE DateOfBirth BETWEEN '2000-01-01' AND
'2000-12-31';
-- Pattern matching (LIKE)
SELECT * FROM Students WHERE Email LIKE '%@[Link]';
-- List of values (IN)
SELECT * FROM Courses WHERE CourseID IN (101, 102, 103);
-- Null check
SELECT * FROM Enrollments WHERE Grade IS NULL;
-- Logical combinations
SELECT * FROM Students WHERE LastName = 'Smith' AND DateOfBirth >
'1999-01-01';
3.4 Sorting with ORDER BY
ORDER BY sorts the result set. Default is ascending (ASC); use DESC for descending.
In sql
SELECT FirstName, LastName, DateOfBirth
FROM Students
ORDER BY LastName ASC, DateOfBirth DESC;
3.5 Removing Duplicates with DISTINCT
DISTINCT eliminates duplicate rows from the result.
sql
SELECT DISTINCT LastName FROM Students;
3.6 Updating Data (UPDATE)
UPDATE modifies existing records. Always use a WHERE clause unless you intend to update
all rows. In sql
UPDATE Students
SET Email = '[Link]@[Link]'
WHERE StudentID = 1;
-- Update multiple columns
UPDATE Enrollments
SET Grade = 'A', EnrollmentDate = '2023-09-01'
WHERE StudentID = 1 AND CourseID = 101;
3.7 Deleting Data (DELETE)
DELETE removes records. Always use a WHERE clause unless you intend to delete all rows.
In sql
DELETE FROM Enrollments WHERE StudentID = 1 AND CourseID = 101;
-- Delete all enrollments for a specific student
DELETE FROM Enrollments WHERE StudentID = 1;
-- Delete all rows (use with caution!)
DELETE FROM Enrollments;
3.8 Practical Examples and Exercises
Exercise 3.1: Using the library database from Exercise 2.1, perform the following:
1. Insert at least 3 books, 3 members, and 2 loans.
2. Retrieve all books by a specific author.
3. Find all loans that are still active (return date is null).
4. Update the return date for a specific loan.
5. Delete a member who has no active loans.
Solutions: In sql
-- 1. Insert data
INSERT INTO Books VALUES
(1, '1984', 'George Orwell', '978-0451524935', 1949),
(2, 'To Kill a Mockingbird', 'Harper Lee', '978-0061120084', 1960),
(3, 'The Great Gatsby', 'F. Scott Fitzgerald', '978-0743273565',
1925);
INSERT INTO Members VALUES
(1, 'Alice', 'Green', 'alice@[Link]', DEFAULT),
(2, 'Bob', 'White', 'bob@[Link]', DEFAULT),
(3, 'Charlie', 'Black', 'charlie@[Link]', DEFAULT);
INSERT INTO Loans (LoanID, BookID, MemberID, LoanDate, ReturnDate)
VALUES
(101, 1, 1, '2024-01-10', NULL),
(102, 2, 2, '2024-01-12', '2024-01-20');
-- 2. Books by Orwell
SELECT * FROM Books WHERE Author = 'George Orwell';
-- 3. Active loans
SELECT * FROM Loans WHERE ReturnDate IS NULL;
-- 4. Update return date for loan 101
UPDATE Loans SET ReturnDate = CURRENT_DATE WHERE LoanID = 101;
-- 5. Delete member with no active loans (first ensure no active
loans)
DELETE FROM Members WHERE MemberID = 3 AND NOT EXISTS (SELECT 1 FROM
Loans WHERE MemberID = 3 AND ReturnDate IS NULL);
Chapter 4: Advanced Querying Techniques
This chapter covers more sophisticated ways to retrieve and analyze data by combining
information from multiple tables and using advanced SQL features.
4.1 Joins
Joins combine rows from two or more tables based on a related column. They are SQL
operations used to combine rows from two or more tables based on a related column,
usually a primary key–foreign key relationship. Joins allow data stored in separate tables to
be queried together, which is a fundamental feature of relational database design.
For example, in a university database:
Students table stores student information.
Courses table stores course details.
Enrollments table links students to courses.
Using joins, we can retrieve information such as which students are enrolled in which
courses.
4.1.1 Inner Join
An INNER JOIN returns only the rows where matching values exist in both tables. Rows
without a match in either table are excluded from the result. Summary:
Students is joined with Enrollments using StudentID.
Enrollments is then joined with Courses using CourseID.
Only students who are actually enrolled in courses appear in the results.
Use case: Finding students registered in specific courses. In sql
SELECT [Link], [Link], [Link]
FROM Students
INNER JOIN Enrollments ON [Link] = [Link]
INNER JOIN Courses ON [Link] = [Link];
4.1.2 Left (Outer) Join
A LEFT JOIN returns all rows from the left table, along with matching rows from the right table. If
there is no match, the columns from the right table contain NULL values. Summary:
All students are listed.
If a student has not enrolled in a course, the Grade column will show NULL.
Use case: Listing all students, including those not yet enrolled in any courses. In sql
SELECT [Link], [Link], [Link]
FROM Students
LEFT JOIN Enrollments ON [Link] = [Link];
4.1.3 Right (Outer) Join
Returns all rows from the right table, and matched rows from the left. (Less common; can be
rewritten as LEFT JOIN by swapping tables.) Example:
SELECT [Link], [Link]
FROM Students
RIGHT JOIN Enrollments
ON [Link] = [Link];
All enrollment records appear in the result.
If an enrollment record has no matching student, student information will appear as
NULL.
4.1.4 Full Outer Join
Returns all rows when there is a match in either table. Unmatched rows from both sides are
included with NULLs. Not all DBMS support FULL OUTER JOIN (e.g., MySQL lacks it,
but can be simulated with UNION).
Example:
SELECT [Link], [Link]
FROM Students
FULL OUTER JOIN Enrollments
ON [Link] = [Link];
Result:
Students without enrollments appear.
Enrollment records without matching students also appear.
Note:
Some database systems like MySQL do not directly support FULL OUTER JOIN. It can be
simulated using LEFT JOIN and RIGHT JOIN combined with UNION.
Example:
SELECT * FROM Students
LEFT JOIN Enrollments ON [Link] = [Link]
UNION
SELECT * FROM Students
RIGHT JOIN Enrollments ON [Link] = [Link];
4.1.5 Self-Join
A Self-Join occurs when a table is joined with itself. This is useful when comparing rows
within the same table. Table aliases must be used to distinguish the two instances of the table.
Example: Find pairs of students born on the same date.
sql
SELECT [Link] AS Student1, [Link] AS Student2, [Link]
FROM Students A
INNER JOIN Students B ON [Link] = [Link]
WHERE [Link] < [Link]; -- avoid self-pair and duplicate pairs
The Students table is referenced twice using aliases A and B.
The condition [Link] < [Link] prevents duplicate pairs and self-
matching.
Use cases: Finding employees with the same manager, Comparing records within the same
dataset, Identifying duplicates.
4.2 Aggregations and GROUP BY
Aggregate functions perform a calculation on a set of values and return a single value.
Common functions: COUNT, SUM, AVG, MIN, MAX. The GROUP BY clause groups rows that have
the same values in specified columns into summary rows.
Example: Count enrollments per course. In sql
SELECT CourseID, COUNT(*) AS EnrollmentCount
FROM Enrollments
GROUP BY CourseID;
Example: Calculate average grade per course (assuming numeric grades). In sql
SELECT [Link], AVG([Link]) AS AvgGrade
FROM Enrollments
JOIN Courses ON [Link] = [Link]
GROUP BY [Link];
4.3 Filtering Groups with HAVING
HAVING filters groups based on aggregate results, similar to WHERE for rows.
Example: Find courses with more than 5 enrollments. In sql
SELECT CourseID, COUNT(*) AS EnrollmentCount
FROM Enrollments
GROUP BY CourseID
HAVING COUNT(*) > 5;
4.4 Subqueries
A subquery is a SQL query that is nested inside another query. It allows the result of one
query to be used as input for another query. Subqueries are useful for breaking complex
queries into smaller logical parts. A subquery is a query nested inside another query.
Subqueries can be used in SELECT, FROM, WHERE, and HAVING clauses. Subqueries are
generally classified based on what they return and how they interact with the outer
query.
Scalar Subquery (returns single value) in sql
A scalar subquery returns one value (one row and one column). It is often used in the
SELECT list or in conditions. Example: Calculate each student’s average grade
SELECT FirstName, LastName,
(SELECT AVG(Grade) FROM Enrollments WHERE StudentID =
[Link]) AS AvgGrade
FROM Students;
The outer query retrieves student names.
The subquery calculates the average grade for each student.
The subquery runs for each row in the Students table.
Use case:
Adding calculated values such as averages, totals, or counts to query results.
Row Subquery (returns a single row) in sql
SELECT * FROM Students
WHERE (StudentID, DateOfBirth) = (SELECT StudentID, DateOfBirth FROM
Students WHERE FirstName = 'John');
Table Subquery (returns a set of rows) in sql
SELECT FirstName, LastName
FROM Students
WHERE StudentID IN (SELECT StudentID FROM Enrollments WHERE CourseID = 101);
Correlated Subquery (references outer query) in sql
-- Find students who have enrolled in more courses than average
SELECT StudentID, FirstName, LastName
FROM Students S
WHERE (SELECT COUNT(*) FROM Enrollments E WHERE [Link] = [Link]) >
(SELECT AVG(cnt) FROM (SELECT COUNT(*) AS cnt FROM Enrollments GROUP BY
StudentID) AS AvgCounts);
4.5 Common Table Expressions (CTEs)
A Common Table Expression (CTE) is a temporary named result set that exists only within
the execution scope of a single SQL statement. CTEs are defined using the WITH clause and
can significantly improve query readability and maintainability.
Key Benefits of CTE
Improved readability - Break complex queries into simpler parts
Reusability - Reference the same result set multiple times in a query
Recursive capabilities - Enable hierarchical or graph-based queries
Alternative to views - Useful for one-off queries without creating permanent views
Example: Using a CTE to find the most popular course. In sql
WITH CourseEnrollments AS (
SELECT CourseID, COUNT(*) AS EnrollmentCount
FROM Enrollments
GROUP BY CourseID
),
MaxEnrollment AS (
SELECT MAX(EnrollmentCount) AS MaxCount FROM CourseEnrollments
)
SELECT CourseID, EnrollmentCount
FROM CourseEnrollments
WHERE EnrollmentCount = (SELECT MaxCount FROM MaxEnrollment);
Recursive CTE Example: (for hierarchical data, e.g., employee hierarchy. In sql
WITH RECURSIVE EmployeeHierarchy AS (
-- Anchor: top-level employees
SELECT EmployeeID, ManagerID, FullName, 1 AS Level
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
-- Recursive: employees reporting to those already in the CTE
SELECT [Link], [Link], [Link], [Link] + 1
FROM Employees e
INNER JOIN EmployeeHierarchy eh ON [Link] = [Link]
)
SELECT * FROM EmployeeHierarchy;
4.6 Window Functions
Window functions perform calculations across a set of rows related to the current row,
without collapsing them. They are identified by the OVER clause.
Common window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG(),
SUM(), AVG() with OVER.
Example: Rank students by their grade in a course (highest grade first). In sql
SELECT
StudentID,
Grade,
RANK() OVER (ORDER BY Grade DESC) AS GradeRank
FROM Enrollments
WHERE CourseID = 101;
Example: For each student, show their enrollment date and the previous enrollment date for
that student. In sql
SELECT
StudentID,
EnrollmentDate,
LAG(EnrollmentDate) OVER (PARTITION BY StudentID ORDER BY
EnrollmentDate) AS PreviousEnrollment
FROM Enrollments;
Example: Running total of sales per month. In sql
SELECT
SalesMonth,
SalesAmount,
SUM(SalesAmount) OVER (ORDER BY SalesMonth) AS RunningTotal
FROM MonthlySales;
4.7 Set Operations
Set operations in SQL allow you to combine results from multiple SELECT statements into a
single result set. These operations treat query results as sets and perform mathematical set
operations on them. Set operations thus combine results from two or more queries. They
require the same number of columns and compatible data types.
UNION – combines results and removes duplicates.
UNION ALL – combines results, keeping duplicates.
INTERSECT – returns rows common to both queries.
EXCEPT (or MINUS) – returns rows from the first query that are not in the second.
Key Requirements for Set Operations
Same number of columns in all SELECT statements
Compatible data types for corresponding columns
Column names are taken from the first SELECT statement
ORDER BY can only be applied to the final result
Database set up for this section
To use the set operations (UNION, INTERSECT, EXCEPT) and CTEs demonstrated in the
examples, your database needs to be structured around a few core tables that represent a
simplified course management system. The examples in sections 4.5 and 4.7 are built upon a
schema that includes tables for students, instructors, courses, enrollments, and related
activities. Here is a breakdown of the tables you would need to create to make all the
examples work.
Core Tables for the Examples
The SQL examples reference several tables that are related to each other. Here are the key
ones:
students: Stores information about each student.
o student_id (Primary Key): A unique identifier for each student.
o first_name, last_name: The student's name.
o Potentially other columns like email, enrollment_date, etc.
instructors: Stores information about each instructor.
o instructor_id (Primary Key): A unique identifier for each instructor.
o first_name, last_name: The instructor's name.
courses: Stores information about the courses offered.
o course_id (Primary Key): A unique identifier for each course (e.g., 'SQL101',
'MATH101').
o course_name: The full name of the course.
o instructor_id (Foreign Key): Links to the instructors table to identify the
teacher .
o prerequisite_id: (Optional) For recursive CTE examples, a column that points
to another course_id to define prerequisites.
enrollments: A junction table that records which students are enrolled in which courses.
This is a classic many-to-many relationship.
o enrollment_id (Primary Key): A unique identifier for each enrollment record.
o student_id (Foreign Key): Links to the students table .
o course_id (Foreign Key): Links to the courses table .
o enrollment_date: The date the student enrolled.
reviews: (Used in advanced CTE examples) Stores student reviews for courses.
o review_id (Primary Key): A unique identifier for each review.
o student_id (Foreign Key): Links to the students table.
o course_id (Foreign Key): Links to the courses table.
o rating: A numerical rating (e.g., 1-5).
o review_date: The date the review was posted.
forum_posts: (Used in advanced CTE examples) Stores student forum activity.
o post_id (Primary Key): A unique identifier for each post.
o student_id (Foreign Key): Links to the students table.
o thread_id: An identifier for the discussion thread.
o post_date: The date the post was made.
Types of Set Operations
1. UNION
Combines results from multiple queries and removes duplicates. In sql
-- Find all people associated with the university (students or
instructors)
SELECT first_name, last_name, 'Student' AS role
FROM students
UNION
SELECT first_name, last_name, 'Instructor' AS role
FROM instructors
ORDER BY last_name, first_name;
2. UNION ALL
Combines results from multiple queries but keeps duplicates. This is faster than UNION
because it doesn't perform duplicate elimination. In sql
-- Get all course enrollments and waitlist entries
SELECT student_id, course_id, enrollment_date, 'Enrolled' AS status
FROM enrollments
UNION ALL
SELECT student_id, course_id, request_date, 'Waitlisted' AS status
FROM waitlist
ORDER BY student_id, enrollment_date;
3. INTERSECT
Returns only rows that appear in both result sets. In sql
-- Find students who are enrolled in both Math and Physics courses
SELECT student_id, first_name, last_name
FROM students
WHERE student_id IN (
SELECT student_id FROM enrollments WHERE course_id = 'MATH101'
INTERSECT
SELECT student_id FROM enrollments WHERE course_id = 'PHYS101'
);
4. EXCEPT (or MINUS in Oracle)
Returns rows from the first query that do not appear in the second query. In sql
-- Find courses that have never been enrolled in
SELECT course_id, course_name
FROM courses
WHERE course_id IN (
SELECT course_id FROM courses
EXCEPT
SELECT DISTINCT course_id FROM enrollments
);
Practical Examples
Example 1: Comprehensive Student Activity Report in sql
-- Combine all student activities (enrollments, reviews, forum posts)
SELECT
student_id,
activity_type,
activity_date,
reference_id
FROM (
SELECT
student_id,
'Enrollment' AS activity_type,
enrollment_date AS activity_date,
course_id AS reference_id
FROM enrollments
UNION ALL
SELECT
student_id,
'Review',
review_date,
course_id
FROM reviews
UNION ALL
SELECT
student_id,
'Forum Post',
post_date,
thread_id
FROM forum_posts
) AS all_activities
WHERE activity_date >= '2024-01-01'
ORDER BY student_id, activity_date DESC;
Example 2: Comparing Student Populations In sql
-- Analyze different student segments
WITH
active_students AS (
SELECT DISTINCT student_id
FROM enrollments
WHERE enrollment_date >= DATE('now', '-6 months')
),
high_achievers AS (
SELECT DISTINCT student_id
FROM grades
WHERE grade IN ('A', 'A+')
),
forum_participants AS (
SELECT DISTINCT student_id
FROM forum_posts
WHERE post_date >= DATE('now', '-3 months')
)
SELECT
'Active Only' AS segment,
COUNT(*) AS student_count
FROM active_students
WHERE student_id NOT IN (
SELECT student_id FROM high_achievers
UNION
SELECT student_id FROM forum_participants
)
UNION ALL
SELECT
'High Achievers Only',
COUNT(*)
FROM high_achievers
WHERE student_id NOT IN (
SELECT student_id FROM active_students
UNION
SELECT student_id FROM forum_participants
)
UNION ALL
SELECT
'Forum Participants Only',
COUNT(*)
FROM forum_participants
WHERE student_id NOT IN (
SELECT student_id FROM active_students
UNION
SELECT student_id FROM high_achievers
)
UNION ALL
SELECT
'Multiple Activities',
COUNT(*)
FROM (
SELECT student_id FROM active_students
INTERSECT
SELECT student_id FROM high_achievers
INTERSECT
SELECT student_id FROM forum_participants
);
Example 3: Finding Overlapping Course Materials in sql
-- Find resources that are used across multiple courses
SELECT
resource_id,
resource_title,
'Used in multiple courses' AS status
FROM resources
WHERE resource_id IN (
SELECT resource_id FROM course_resources WHERE course_id =
'SQL101'
INTERSECT
SELECT resource_id FROM course_resources WHERE course_id =
'DB101'
)
UNION
SELECT
resource_id,
resource_title,
'SQL101 only'
FROM resources
WHERE resource_id IN (
SELECT resource_id FROM course_resources WHERE course_id =
'SQL101'
EXCEPT
SELECT resource_id FROM course_resources WHERE course_id =
'DB101'
)
UNION
SELECT
resource_id,
resource_title,
'DB101 only'
FROM resources
WHERE resource_id IN (
SELECT resource_id FROM course_resources WHERE course_id =
'DB101'
EXCEPT
SELECT resource_id FROM course_resources WHERE course_id =
'SQL101'
);
Example 4: Advanced Data Reconciliation in sql
-- Reconcile payment data between two systems
WITH
system_a_payments AS (
SELECT
student_id,
amount,
payment_date,
'System A' AS source
FROM payments_system_a
WHERE payment_date >= '2024-01-01'
),
system_b_payments AS (
SELECT
student_id,
amount,
payment_date,
'System B' AS source
FROM payments_system_b
WHERE payment_date >= '2024-01-01'
),
all_payments AS (
SELECT * FROM system_a_payments
UNION ALL
SELECT * FROM system_b_payments
)
SELECT
'Matched Payments' AS category,
COUNT(*) AS count,
SUM(amount) AS total_amount
FROM (
SELECT student_id, amount, payment_date
FROM system_a_payments
INTERSECT
SELECT student_id, amount, payment_date
FROM system_b_payments
)
UNION ALL
SELECT
'Only in System A',
COUNT(*),
SUM(amount)
FROM (
SELECT student_id, amount, payment_date
FROM system_a_payments
EXCEPT
SELECT student_id, amount, payment_date
FROM system_b_payments
)
UNION ALL
SELECT
'Only in System B',
COUNT(*),
SUM(amount)
FROM (
SELECT student_id, amount, payment_date
FROM system_b_payments
EXCEPT
SELECT student_id, amount, payment_date
FROM system_a_payments
);
Comparison of Set Operations
Operation Duplicates? Performance Use Case
UNION Removed Slower (sorting Combine distinct results
needed)
UNION ALL Kept Faster Combine all results including
duplicates
INTERSECT Removed Slower Find common records
EXCEPT Removed Slower Find records in one set but not
another
Important Notes:
1. Column Alignment: Columns must match in number and data type: in sql
-- Correct
SELECT student_id, student_name FROM students
UNION
SELECT instructor_id, instructor_name FROM instructors;
-- Incorrect (different number of columns)
SELECT student_id, student_name, email FROM students
UNION
SELECT instructor_id, instructor_name FROM instructors; -- Error!
2. ORDER BY Placement: in sql
-- Correct
SELECT student_id FROM students
UNION
SELECT instructor_id FROM instructors
ORDER BY student_id; -- ORDER BY at the end
-- Incorrect
SELECT student_id FROM students ORDER BY student_id
UNION
SELECT instructor_id FROM instructors; -- Error!
3. Column Names: The first query determines column names in the result: in sql
SELECT student_id AS id, student_name AS name FROM students
UNION
SELECT instructor_id, instructor_name FROM instructors;
-- Result columns will be named "id" and "name"
Set operations are powerful tools for combining and comparing data from different sources,
making them essential for data analysis, reporting, and data reconciliation tasks.
4.8 Practical Examples and Exercises
Exercise 4.1: Using the library database, write queries to:
1. List all members who have borrowed at least one book, along with the number of
books they have borrowed.
2. Find the most borrowed book (the book with the highest number of loans).
3. For each member, show their name and the date of their most recent loan.
4. List members who have never borrowed a book.
Solutions:
sql
-- 1. Members with borrow count
SELECT [Link], [Link], [Link], COUNT([Link]) AS BorrowCount
FROM Members m
LEFT JOIN Loans l ON [Link] = [Link]
GROUP BY [Link], [Link], [Link]
HAVING COUNT([Link]) > 0;
-- 2. Most borrowed book
WITH BookLoanCounts AS (
SELECT BookID, COUNT(*) AS LoanCount
FROM Loans
GROUP BY BookID
)
SELECT [Link], [Link], [Link]
FROM Books b
JOIN BookLoanCounts blc ON [Link] = [Link]
WHERE [Link] = (SELECT MAX(LoanCount) FROM BookLoanCounts);
-- 3. Most recent loan per member
SELECT [Link], [Link], [Link], MAX([Link]) AS LastLoanDate
FROM Members m
LEFT JOIN Loans l ON [Link] = [Link]
GROUP BY [Link], [Link], [Link];
-- 4. Members who never borrowed
SELECT m.*
FROM Members m
LEFT JOIN Loans l ON [Link] = [Link]
WHERE [Link] IS NULL;
Part III: Advanced Database Programming
Chapter 5: Views, Stored Procedures, and Triggers
5.1 Views
A view is a virtual table based on the result set of a SELECT statement. It does not store data
physically but provides a way to encapsulate complex queries and present a customized
perspective of the data.
Creating a view:
sql
CREATE VIEW StudentEnrollmentSummary AS
SELECT
[Link],
[Link] + ' ' + [Link] AS FullName,
COUNT([Link]) AS TotalCoursesEnrolled
FROM Students s
LEFT JOIN Enrollments e ON [Link] = [Link]
GROUP BY [Link], [Link], [Link];
Using the view:
sql
SELECT * FROM StudentEnrollmentSummary WHERE TotalCoursesEnrolled > 0;
Updatable Views: In some DBMS, views can be updatable if they are based on a single table
and do not contain aggregates, DISTINCT, or GROUP BY. You can then INSERT, UPDATE,
DELETE through the view.
Materialized Views: Some databases (like Oracle, PostgreSQL) support materialized views
that store the result set physically for faster access, but they need to be refreshed periodically.
5.2 Stored Procedures
A stored procedure is a set of precompiled SQL statements stored on the server. Procedures
can accept parameters, perform complex operations, and return results. They enhance
performance, security, and reusability.
Syntax (SQL Server / T-SQL example):
sql
CREATE PROCEDURE EnrollStudent
@StudentID INT,
@CourseID INT
AS
BEGIN
IF EXISTS (SELECT 1 FROM Students WHERE StudentID = @StudentID)
AND EXISTS (SELECT 1 FROM Courses WHERE CourseID = @CourseID)
BEGIN
INSERT INTO Enrollments (EnrollmentID, StudentID, CourseID,
EnrollmentDate)
VALUES (NEXT VALUE FOR EnrollmentSeq, @StudentID, @CourseID,
GETDATE());
PRINT 'Enrollment successful.';
END
ELSE
BEGIN
PRINT 'Invalid StudentID or CourseID.';
END
END;
Executing the procedure: in sql
EXEC EnrollStudent @StudentID = 1, @CourseID = 101;
Procedures in PostgreSQL: in sql
CREATE OR REPLACE FUNCTION enroll_student(p_student_id INT, p_course_id
INT)
RETURNS VOID AS $$
BEGIN
IF EXISTS (SELECT 1 FROM students WHERE student_id = p_student_id) AND
EXISTS (SELECT 1 FROM courses WHERE course_id = p_course_id) THEN
INSERT INTO enrollments (enrollment_id, student_id, course_id,
enrollment_date)
VALUES (nextval('enrollment_seq'), p_student_id, p_course_id,
CURRENT_DATE);
RAISE NOTICE 'Enrollment successful.';
ELSE
RAISE NOTICE 'Invalid StudentID or CourseID.';
END IF;
END;
$$ LANGUAGE plpgsql;
5.3 Functions
Functions are similar to procedures but return a single value (scalar) or a table. They can be
used in SQL expressions.
Scalar Function Example (SQL Server): in sql
CREATE FUNCTION [Link] (@BirthDate DATE)
RETURNS INT
AS
BEGIN
RETURN DATEDIFF(YEAR, @BirthDate, GETDATE()) -
CASE WHEN (MONTH(@BirthDate) > MONTH(GETDATE())) OR
(MONTH(@BirthDate) = MONTH(GETDATE()) AND
DAY(@BirthDate) > DAY(GETDATE()))
THEN 1 ELSE 0 END;
END;
Usage: in sql
SELECT FirstName, LastName, [Link](DateOfBirth) AS Age FROM
Students;
Table-Valued Function (PostgreSQL): in sql
CREATE FUNCTION get_student_enrollments(p_student_id INT)
RETURNS TABLE(course_name VARCHAR, grade CHAR) AS $$
BEGIN
RETURN QUERY
SELECT c.course_name, [Link]
FROM enrollments e
JOIN courses c ON e.course_id = c.course_id
WHERE e.student_id = p_student_id;
END;
$$ LANGUAGE plpgsql;
Usage: in sql
SELECT * FROM get_student_enrollments(1);
5.4 Triggers
A trigger is a set of SQL statements that automatically execute (fire) in response to a specific
event (e.g., INSERT, UPDATE, DELETE) on a particular table. Triggers are used for maintaining
complex integrity constraints, auditing changes, and replicating data.
Types of triggers:
BEFORE / AFTER the event.
FOR EACH ROW (row-level) or FOR EACH STATEMENT.
Example (PostgreSQL): A trigger to log all deletions from the Students [Link] sql
-- Create an audit log table
CREATE TABLE StudentsAuditLog (
LogID SERIAL PRIMARY KEY,
Operation VARCHAR(10),
StudentID INT,
OldFirstName VARCHAR(50),
OldLastName VARCHAR(50),
ChangeDate TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create the trigger function
CREATE OR REPLACE FUNCTION log_student_delete()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO StudentsAuditLog (Operation, StudentID, OldFirstName,
OldLastName)
VALUES ('DELETE', [Link], [Link], [Link]);
RETURN OLD;
END;
$$ LANGUAGE plpgsql;
-- Create the trigger
CREATE TRIGGER after_student_delete
AFTER DELETE ON Students
FOR EACH ROW
EXECUTE FUNCTION log_student_delete();
Example (SQL Server): in sql
CREATE TRIGGER trg_StudentDelete
ON Students
AFTER DELETE
AS
BEGIN
INSERT INTO StudentsAuditLog (Operation, StudentID, OldFirstName,
OldLastName, ChangeDate)
SELECT 'DELETE', [Link], [Link],
[Link], GETDATE()
FROM deleted;
END;
5.5 Transactions and ACID
A transaction is a unit of work performed against a database that follows the ACID
properties:
Atomicity: All operations in a transaction are completed successfully, or none are. If
any part fails, the entire transaction is rolled back.
Consistency: A transaction brings the database from one valid state to another,
maintaining all defined rules (constraints, triggers, etc.).
Isolation: Concurrent execution of transactions results in a state that is equivalent to
some serial execution. Isolation levels control how transactions interact.
Durability: Once a transaction is committed, its changes persist even in the event of a
system failure.
Transaction control statements:
BEGIN TRANSACTION (or START TRANSACTION)
COMMIT – makes all changes permanent.
ROLLBACK – undoes all changes since the transaction began.
SAVEPOINT – creates a point within a transaction to which you can later roll back.
Example: in sql
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
-- If both updates succeed:
COMMIT;
-- If something goes wrong:
-- ROLLBACK;
5.6 Concurrency Control and Isolation Levels
When multiple transactions run concurrently, problems can arise:
Dirty read: Reading uncommitted data from another transaction.
Non-repeatable read: In a transaction, reading the same row twice gets different
values because another transaction updated it in between.
Phantom read: In a transaction, running the same query twice gets different rows
because another transaction inserted or deleted rows.
To address these, SQL defines four isolation levels:
Isolation Level Dirty Read Non-repeatable Read Phantom Read
Read Possible Possible Possible
Uncommitted
Read Committed Not Possible Possible Possible
Repeatable Read Not Possible Not Possible Possible
Serializable Not Possible Not Possible Not Possible
Higher isolation levels provide more consistency but reduce concurrency. The default
isolation level varies by DBMS (e.g., PostgreSQL uses Read Committed, SQL Server uses
Read Committed, MySQL with InnoDB uses Repeatable Read).
You can set isolation level for a session or transaction:in sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
-- queries...
COMMIT;
5.7 Practical Examples and Exercises
Exercise 5.1: Create a trigger that prevents a student from enrolling in a course if they
already have a grade of 'F' in that course (assuming they can retake only if they passed). Also,
create a stored procedure to enroll a student with proper error handling.
Solution (PostgreSQL style):in sql
-- Trigger function to check enrollment eligibility
CREATE OR REPLACE FUNCTION check_enrollment_eligibility()
RETURNS TRIGGER AS $$
BEGIN
IF EXISTS (
SELECT 1 FROM Enrollments
WHERE StudentID = [Link] AND CourseID = [Link] AND
Grade = 'F'
) THEN
RAISE EXCEPTION 'Student % cannot re-enroll in course % due to
previous failing grade.', [Link], [Link];
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER before_enrollment_insert
BEFORE INSERT ON Enrollments
FOR EACH ROW
EXECUTE FUNCTION check_enrollment_eligibility();
-- Stored procedure to enroll a student
CREATE OR REPLACE PROCEDURE enroll_student(p_student_id INT, p_course_id
INT)
LANGUAGE plpgsql
AS $$
BEGIN
INSERT INTO Enrollments (EnrollmentID, StudentID, CourseID,
EnrollmentDate)
VALUES (nextval('enrollment_seq'), p_student_id, p_course_id,
CURRENT_DATE);
EXCEPTION
WHEN OTHERS THEN
RAISE NOTICE 'Enrollment failed: %', SQLERRM;
END;
$$;
Part IV: Database Optimization and
Management
Chapter 6: Indexes and Query Optimization
6.1 Understanding Indexes
An index is a data structure that improves the speed of data retrieval operations on a table at
the cost of additional writes and storage space. Indexes are typically implemented as B-Trees
(balanced trees) or hash tables. They allow the database to quickly locate rows without
scanning the entire table.
When to use indexes:
Columns used frequently in WHERE, JOIN, ORDER BY, and GROUP BY clauses.
Columns with high selectivity (many distinct values).
Foreign key columns.
Types of indexes:
Single-column index: Index on one column.
Composite index: Index on multiple columns (order matters).
Unique index: Ensures all values are distinct; automatically created for primary key
and unique constraints.
Full-text index: For text search.
Spatial index: For geospatial data.
Partial index: Index on a subset of rows (PostgreSQL).
Expression index: Index on the result of an expression.
6.2 Creating and Managing Indexes
Syntax:
sql
CREATE INDEX index_name ON table_name (column1, column2, ...);
CREATE UNIQUE INDEX index_name ON table_name (column1, ...);
Examples:in sql
-- Single-column index on LastName
CREATE INDEX idx_students_lastname ON Students (LastName);
-- Composite index on (CourseID, Grade) for queries filtering by both
CREATE INDEX idx_enrollments_course_grade ON Enrollments (CourseID, Grade);
-- Partial index (PostgreSQL): only index active loans
CREATE INDEX idx_active_loans ON Loans (LoanDate) WHERE ReturnDate IS NULL;
Dropping an index: in sql
DROP INDEX idx_students_lastname;
Viewing indexes: Most DBMS provide system catalog views (e.g., pg_indexes in
PostgreSQL, [Link] in SQL Server).
6.3 Query Execution Plans (EXPLAIN)
To understand how a query is executed, use the EXPLAIN command. It shows the query plan
chosen by the optimizer, including which indexes are used, join methods, and estimated
costs.
Example (PostgreSQL): in sql
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM Students WHERE LastName = 'Smith';
The output might show a sequential scan (full table scan) or an index scan. ANALYZE actually
runs the query and shows actual times and rows.
Interpreting an execution plan:
Seq Scan: Scans entire table (slow for large tables).
Index Scan: Uses an index to locate rows efficiently.
Bitmap Heap Scan: Combines multiple index scans.
Nested Loop, Hash Join, Merge Join: Different join algorithms.
Use EXPLAIN to identify bottlenecks and verify that indexes are being used.
6.4 Performance Tuning Tips
1. Index wisely: Too many indexes slow down INSERT, UPDATE, DELETE. Index columns
used in WHERE, JOIN, and ORDER BY.
2. Keep statistics up to date: The query optimizer uses statistics about data distribution.
Run ANALYZE (PostgreSQL) or UPDATE STATISTICS (SQL Server) regularly.
3. Write efficient queries:
o Avoid SELECT *; specify only needed columns.
o Use EXISTS instead of IN for subqueries when appropriate.
o Use UNION ALL instead of UNION if duplicates are not an issue (to avoid
sorting).
o Be careful with functions on indexed columns (e.g., WHERE YEAR(datecol) =
2024 might not use an index on datecol; rewrite as WHERE datecol >=
'2024-01-01' AND datecol < '2025-01-01').
4. Partition large tables: Split a table into smaller physical pieces based on a key (e.g.,
date). Queries that access only a few partitions can be much faster.
5. Use connection pooling: Reduce overhead of establishing connections.
6. Monitor and tune server parameters: Memory settings, cache sizes, etc.
Chapter 7: Security and User Management
7.1 Users and Roles
Most DBMS use a combination of users (individual accounts) and roles (groups of
privileges) to manage access.
Creating a user (MySQL): in sql
CREATE USER 'app_user'@'localhost' IDENTIFIED BY 'strong_password';
Creating a role (PostgreSQL):in sql
CREATE ROLE read_only;
Granting roles to users:in sql
GRANT read_only TO app_user;
7.2 Granting and Revoking Privileges
Privileges can be granted on databases, tables, views, procedures, etc. Common privileges:
SELECT, INSERT, UPDATE, DELETE, EXECUTE, ALL PRIVILEGES.
Examples (MySQL):in sql
-- Grant SELECT on all tables in school database
GRANT SELECT ON school.* TO 'app_user'@'localhost';
-- Grant specific privileges on Students table
GRANT INSERT, UPDATE ON [Link] TO 'app_user'@'localhost';
-- Revoke DELETE privilege
REVOKE DELETE ON school.* FROM 'app_user'@'localhost';
Examples (PostgreSQL):in sql
GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only;
GRANT INSERT, UPDATE ON Students TO app_user;
REVOKE DELETE ON Students FROM app_user;
7.3 Authentication Methods
Modern DBMS support various authentication methods:
Password authentication (plain or encrypted).
Certificate-based authentication (using SSL/TLS client certificates).
Kerberos / LDAP integration for enterprise environments.
Operating system authentication (e.g., PostgreSQL's peer authentication for local
connections).
It is essential to use strong passwords, enable SSL for network connections, and follow the
principle of least privilege (grant only necessary permissions).
Chapter 8: Backup and Recovery
8.1 Backup Strategies
Regular backups are critical to protect against data loss. Common backup types:
Full backup: A complete copy of the entire database.
Incremental backup: Backs up only data changed since the last backup (full or
incremental). Faster and smaller.
Differential backup: Backs up data changed since the last full backup. Larger than
incremental but faster to restore.
Backup tools:
MySQL: mysqldump for logical backups, mysqlbackup for physical backups.
PostgreSQL: pg_dump for logical backups, pg_basebackup for physical backups.
SQL Server: BACKUP DATABASE command, SQL Server Management Studio.
Oracle: RMAN (Recovery Manager).
Example: Logical backup with mysqldump
bash
mysqldump -u root -p school > school_backup.sql
Example: Physical backup with PostgreSQL
bash
pg_basebackup -D /backup/location -F t -z -P
8.2 Restoring Databases
Restoring from a logical backup (SQL dump) is straightforward:
MySQL:
bash
mysql -u root -p school < school_backup.sql
PostgreSQL:
bash
psql -d school -f school_backup.sql
Restoring from physical backups usually involves stopping the database server, copying files,
and restarting.
8.3 Point-in-Time Recovery (PITR)
Point-in-time recovery allows restoring a database to the state it was at any given moment.
This requires:
A base backup (full backup).
A continuous archive of transaction logs (WAL in PostgreSQL, binary logs in
MySQL).
In PostgreSQL, you enable WAL archiving and then restore by replaying logs up to a target
time.
Example steps for PostgreSQL PITR:
1. Take a base backup.
2. Restore the backup.
3. Create a [Link] (or set parameters in [Link]) specifying
restore_command and recovery_target_time.
4. Start the server; it will replay logs until the target time.
Part V: The Modern Database Landscape
Chapter 9: Industry Standards and Database Systems in
2026
The database world in 2026 is diverse, with relational databases still dominating but new
specialized engines gaining ground. Understanding the landscape helps in choosing the right
tool for the job.
9.1 Major RDBMS
Database Type/License Best Suited For Key Considerations
System
Oracle Commercial, Large-scale enterprise Extremely powerful and
multi-model applications with mission- feature-rich but expensive.
critical transactions, high Widely used in finance,
security, and complex telecom, and large
requirements. corporations.
MySQL Open-source Web applications, CMS Owned by Oracle. Popular
(GPL) / platforms (WordPress, for its simplicity and
Commercial Drupal), and startups. performance. MySQL 8.0+
Read-heavy workloads. includes many advanced
features.
PostgreSQL Open-source Organizations needing The favorite for new
(BSD-like) standards compliance, projects due to its
extensibility, complex robustness, active
queries, data integrity, and community, and
advanced data types permissive license.
(JSONB, geospatial, Supports many extensions
vector). (PostGIS, pgvector).
Microsoft Commercial Organizations heavily Powerful features,
SQL Server invested in the Microsoft excellent tooling (SSMS),
ecosystem, needing strong and a free Express edition.
BI tools and Windows Azure SQL Database is its
integration. cloud counterpart.
SQLite Public Domain Embedded systems, Serverless, zero-
mobile apps configuration, and the most
(iOS/Android), IoT deployed database in the
devices, local application world. Not for high-
storage. concurrency web
workloads.
9.2 MySQL vs PostgreSQL: Detailed Comparison
Feature MySQL PostgreSQL
SQL Good, but some Highly standards-compliant.
Compliance deviations.
ACID Yes (with InnoDB). Yes.
Compliance
JSON Support JSON data type, JSONB (binary JSON) with indexing and
functions. powerful operators.
Indexing B-Tree, R-Tree, Full-text. B-Tree, Hash, GiST, SP-GiST, GIN,
BRIN, etc. Very flexible.
Extensibility Limited. Extensive (custom data types, operators,
functions).
Concurrency Uses InnoDB MVCC; MVCC with excellent concurrency control.
good for read-heavy.
Replication Master-slave, group Physical and logical replication, bi-
replication. directional.
Partitioning Range, list, hash (MySQL Range, list, hash, and more via table
8.0). inheritance.
GIS Support Basic (MyISAM only). Excellent via PostGIS extension.
Use Case Simple, read-heavy web Complex, data-integrity-critical apps,
apps. analytics, geospatial, JSON workloads.
Verdict: Choose PostgreSQL for extensibility, data integrity, and complex queries. Choose
MySQL for simplicity and when you need a widely supported, lightweight solution for
standard web applications.
9.3 Cloud Databases and Services
Major cloud providers offer managed database services, abstracting away hardware and
software maintenance.
Provider Service Supported Engines
AWS RDS MySQL, PostgreSQL, Oracle, SQL
Server, MariaDB
Aurora MySQL and PostgreSQL compatible,
cloud-optimized
Azure Azure SQL Database SQL Server
Azure Database for MySQL, PostgreSQL
MySQL/PostgreSQL
Google Cloud SQL MySQL, PostgreSQL, SQL Server
Cloud
AlloyDB for PostgreSQL PostgreSQL-compatible, optimized for
performance
Benefits of cloud databases: automated backups, high availability, scalability, pay-as-you-go
pricing.
9.4 Specialized Databases
Beyond general-purpose RDBMS, specialized databases excel at particular workloads:
Data Warehousing: Snowflake, Google BigQuery, Amazon Redshift, Azure
Synapse. Optimized for analytical queries on massive datasets (OLAP).
NoSQL Document Stores: MongoDB, Couchbase. Store semi-structured data as
JSON-like documents, flexible schema.
Key-Value Stores: Redis, Memcached. In-memory, extremely fast, used for caching
and session management.
Search Engines: Elasticsearch, Apache Solr. Full-text search and log analytics.
Graph Databases: Neo4j, Amazon Neptune. Handle highly connected data (social
networks, recommendation engines).
Time-Series Databases: InfluxDB, TimescaleDB (built on PostgreSQL). Optimized
for time-stamped data (IoT, monitoring).
Vector Databases: Weaviate, Pinecone, pgvector (PostgreSQL extension). Store and
search vector embeddings for AI/ML applications.
9.5 Trends: NewSQL, AI Integration, SQL:2023
NewSQL: Databases that aim to provide the scalability of NoSQL while maintaining
ACID guarantees (e.g., Google Spanner, CockroachDB, YugabyteDB).
AI Integration: Many databases now incorporate machine learning capabilities (e.g.,
BigQuery ML, PostgreSQL with Madlib). Vector search support is exploding due to
generative AI.
SQL:2023: The latest SQL standard introduces new features like property graph
queries (SQL/PGQ) for native graph pattern matching, improved JSON support, and
more. Major DBMS are gradually adopting these.
Chapter 10: Learning Platforms and Tools
10.1 Online Courses and Bootcamps
Platform Key Features Best For Model
Official Microsoft content,
Microsoft Beginners wanting a free,
hands-on exercises using SQL
Learn (SQL structured, and industry- Free
Server, focus on enterprise BI
Server) recognized entry point.
and analytics.
Browser-based coding, focus on Learners preferring a
Subscription
Dataquest real datasets for data analysis, hands-on, code-first
(free to start)
includes portfolio-ready projects. approach.
Live online or in-person Beginners seeking real-
Noble Desktop
instruction, small classes, time feedback and a Paid
SQL Bootcamp
practical querying skills. short-term intensive.
Mobile-first, bite-sized lessons, Learners with busy
Mimo gamified, AI-powered assistance schedules who want to Freemium
for debugging. practice on mobile.
10.2 Interactive Practice Environments
SQLZOO: Interactive tutorials and exercises with real-world datasets (Nobel Prize,
world population). Great for rigorous practice.
SQLBolt: Short, interactive lessons that run in the browser. Quick way to learn or
refresh basics.
W3Schools SQL Tutorial: Go-to reference with a "Try it Yourself" editor for testing
syntax.
Select Star SQL: A narrative-driven tutorial where you solve a murder mystery using
SQL. Excellent for understanding query logic.
HackerRank / LeetCode: SQL problem sets for interview preparation, ranging from
easy to hard.
PGExercises: A site dedicated to PostgreSQL exercises, with a focus on practical
problems.
10.3 Database Tools (GUI)
DBeaver: Free, universal database tool supporting all major DBMS. Great for
development and administration.
pgAdmin: The official PostgreSQL administration and development platform.
MySQL Workbench: Official GUI for MySQL, including data modeling, SQL
development, and administration.
Azure Data Studio: Cross-platform tool for SQL Server and Azure SQL, with
modern interface and extensions.
DataGrip: JetBrains' commercial IDE for databases and SQL, powerful but paid.
Part VI: Comprehensive Lab Exercises
Lab 1: Car Database – Foundational Queries
Objective: Practice basic SELECT statements, filtering with WHERE, using DISTINCT, sorting
with ORDER BY, and simple aggregations.
Dataset: A simplified vehicle dataset with three tables: Car_specs (VIN, brand, model, year,
color), Car_priceloc (VIN, state, price), and Car_condition (VIN, mileage). (Adapted
from University of Virginia exercise.)
Setup: Create the tables and insert sample data using the script below.
sql
-- Create tables
CREATE TABLE Car_specs (
vin VARCHAR(20) PRIMARY KEY,
brand VARCHAR(50),
model VARCHAR(50),
year INT,
color VARCHAR(30)
);
CREATE TABLE Car_priceloc (
vin VARCHAR(20),
state VARCHAR(2),
price DECIMAL(10,2),
FOREIGN KEY (vin) REFERENCES Car_specs(vin)
);
CREATE TABLE Car_condition (
vin VARCHAR(20) PRIMARY KEY,
mileage INT,
FOREIGN KEY (vin) REFERENCES Car_specs(vin)
);
-- Insert sample data
INSERT INTO Car_specs VALUES
('1HGCM82633A123456', 'Toyota', 'Camry', 2019, 'Blue'),
('1HGCM82633A123457', 'Honda', 'Civic', 2020, 'Red'),
('1HGCM82633A123458', 'Ford', 'Focus', 2018, 'Black'),
('1HGCM82633A123459', 'Toyota', 'Corolla', 2021, 'White'),
('1HGCM82633A123460', 'Honda', 'Accord', 2019, 'Blue');
INSERT INTO Car_priceloc VALUES
('1HGCM82633A123456', 'CA', 22000.00),
('1HGCM82633A123457', 'CA', 19000.00),
('1HGCM82633A123458', 'TX', 15000.00),
('1HGCM82633A123459', 'NY', 18500.00),
('1HGCM82633A123460', 'FL', 21000.00);
INSERT INTO Car_condition VALUES
('1HGCM82633A123456', 35000),
('1HGCM82633A123457', 28000),
('1HGCM82633A123458', 42000),
('1HGCM82633A123459', 15000),
('1HGCM82633A123460', 31000);
Instructions: Write SQL queries to answer the following:
1. List all distinct vehicle brands.
2. Find the VIN, brand, and model of all vehicles that are blue.
3. List all vehicles (VIN, price, state) priced between $18,000 and $22,000. Sort by price
descending.
4. Find all Toyotas manufactured after 2018.
5. Count how many vehicles are available in each color.
6. Calculate the average price of vehicles by brand.
7. Find the vehicle(s) with the lowest mileage.
8. List vehicles along with their price and mileage. Show brand, model, year, price,
mileage.
Solutions:
sql
-- 1. Distinct brands
SELECT DISTINCT brand FROM Car_specs;
-- 2. Blue vehicles
SELECT vin, brand, model FROM Car_specs WHERE color = 'Blue';
-- 3. Vehicles priced between 18000 and 22000, sorted descending
SELECT [Link], [Link], [Link]
FROM Car_priceloc cp
WHERE [Link] BETWEEN 18000 AND 22000
ORDER BY [Link] DESC;
-- 4. Toyotas after 2018
SELECT vin, brand, model, year FROM Car_specs WHERE brand = 'Toyota' AND
year > 2018;
-- 5. Count vehicles by color
SELECT color, COUNT(*) AS count FROM Car_specs GROUP BY color;
-- 6. Average price by brand
SELECT [Link], AVG([Link]) AS avg_price
FROM Car_specs cs
JOIN Car_priceloc cp ON [Link] = [Link]
GROUP BY [Link];
-- 7. Vehicle(s) with lowest mileage
SELECT [Link], [Link], [Link], [Link]
FROM Car_specs cs
JOIN Car_condition cc ON [Link] = [Link]
WHERE [Link] = (SELECT MIN(mileage) FROM Car_condition);
-- 8. Vehicles with price and mileage
SELECT [Link], [Link], [Link], [Link], [Link]
FROM Car_specs cs
JOIN Car_priceloc cp ON [Link] = [Link]
JOIN Car_condition cc ON [Link] = [Link];
Lab 2: SQL Murder Mystery – Joins and Subqueries
Objective: Practice joins, subqueries, and logical deduction in an engaging scenario.
Dataset: The SQL Murder Mystery database from NUKnightLab (SQLite). It contains tables:
crime_scene_report, person, drivers_license, get_fit_now_member,
facebook_event_checkin, interview, and others.
Setup: Download the SQLite database file ([Link]) from the GitHub
repository or use an online version. Load it into a SQLite browser or command-line tool.
Scenario: A crime has taken place on Jan.15, 2018 in SQL City. Solve the mystery.
Instructions: You must use your SQL skills to find the murderer. Follow the clues:
1. Start your investigation by retrieving the crime scene report for the murder that
occurred on Jan.15, 2018 in SQL City.
sql
SELECT * FROM crime_scene_report
WHERE type = 'murder' AND city = 'SQL City' AND date = 20180115;
2. The report mentions two witnesses. Find their names and addresses. One witness lives
at the last house on "Northwestern Dr". The other is named Annabel and lives on
"Franklin Ave".
3. Query the interview transcripts for these witnesses. They both mention someone
named Jeremy Bowers.
4. Find Jeremy Bowers based on the clues: he's a gym member with a membership
number starting with "48Z", and his membership status is "gold". He also has a
driver's license with plate number including "H42W".
5. Check the interview of Jeremy Bowers (after you find him, there's a separate
interview table). He confesses he was hired by a mysterious woman.
6. Find the true mastermind. She has red hair, drives a Tesla Model S, and attended the
SQL Symphony Concert 3 times in December 2017.
Solutions (with commentary):
sql
-- Step 1: Crime scene report
SELECT * FROM crime_scene_report
WHERE type = 'murder' AND city = 'SQL City' AND date = 20180115;
-- Result: Security footage shows two witnesses. First witness lives at
last house on Northwestern Dr. Second witness is Annabel, lives on Franklin
Ave.
-- Step 2: Find witnesses
-- First witness
SELECT * FROM person
WHERE address_street_name = 'Northwestern Dr'
ORDER BY address_number DESC
LIMIT 1;
-- ID: 14887, name: Morty Schapiro, license_id: 118009
-- Second witness
SELECT * FROM person
WHERE address_street_name = 'Franklin Ave' AND name LIKE '%Annabel%';
-- ID: 16371, name: Annabel Miller, license_id: 490173
-- Step 3: Interview transcripts
SELECT * FROM interview WHERE person_id = 14887;
-- "I heard a gunshot and then saw a man run out. He had a 'Get Fit Now
Gym' bag. The membership number started with '48Z' and he was a gold
member."
SELECT * FROM interview WHERE person_id = 16371;
-- "I saw the man at the gym on January 9th. He was with a woman. His gym
membership number is '48Z7'."
-- Step 4: Find Jeremy Bowers
-- Gym membership clue: membership id starting with '48Z' and status 'gold'
SELECT * FROM get_fit_now_member
WHERE id LIKE '48Z%' AND membership_status = 'gold';
-- Two members: id 48Z7A (Joe Germuska) and 48Z55 (Jeremy Bowers). Which
one?
-- Use driver's license plate clue: from person table join drivers_license.
SELECT [Link], [Link], d.plate_number
FROM person p
JOIN drivers_license d ON p.license_id = [Link]
JOIN get_fit_now_member g ON [Link] = g.person_id
WHERE [Link] LIKE '48Z%' AND g.membership_status = 'gold' AND d.plate_number
LIKE '%H42W%';
-- Result: Jeremy Bowers (person_id 67318) with plate 0H42W2.
-- Step 5: Interview of Jeremy Bowers
SELECT * FROM interview WHERE person_id = 67318;
-- "I was hired by a woman. She has red hair and drives a Tesla Model S. I
know she attended the SQL Symphony Concert 3 times in December 2017."
-- Step 6: Find the mastermind
-- Need to join person, drivers_license, and facebook_event_checkin
SELECT [Link], [Link], d.hair_color, d.car_make, d.car_model
FROM person p
JOIN drivers_license d ON p.license_id = [Link]
JOIN facebook_event_checkin f ON [Link] = f.person_id
WHERE [Link] = 'female'
AND d.hair_color = 'red'
AND d.car_make = 'Tesla'
AND d.car_model = 'Model S'
AND f.event_name = 'SQL Symphony Concert'
AND [Link] BETWEEN 20171201 AND 20171231
GROUP BY [Link]
HAVING COUNT(*) = 3;
-- Result: Miranda Priestly (id 99716)
-- Congratulations! You found the murderer and the mastermind.
Lab 3: AdventureWorks Sales Analysis – Advanced
Aggregation
Objective: Master aggregation, window functions, date functions, and complex joins on a
realistic enterprise dataset.
Dataset: Microsoft's AdventureWorks sample database (OLTP version). It models a bicycle
manufacturer and contains tables for products, sales, customers, employees, etc. Available for
SQL Server, Azure SQL Database, and also for PostgreSQL (via conversion). We'll assume
SQL Server for this lab.
Setup: Restore the AdventureWorks database to your SQL Server instance. Instructions can
be found on Microsoft's GitHub. Alternatively, use the Azure SQL Database sample.
Instructions: Write SQL queries to answer the following business questions.
1. Monthly Sales Trend: Calculate the total sales amount (TotalDue) for each month
and year from the [Link] table. Order chronologically.
2. Top Selling Products: Find the top 5 best-selling products by total quantity sold
(OrderQty) across all time. Display product name, total quantity, and rank.
3. Product Reorder Status: For each product in [Link], compute the
ratio of SafetyStockLevel to DaysToManufacture. Create a "Reorder Priority"
status: 'High' if ratio > 50, 'Normal' if between 10 and 50, 'Low' if <10 (and
DaysToManufacture > 0). Show product name, safety stock, days to manufacture,
ratio, and priority.
4. Customer Lifetime Value: For each customer, calculate their total purchase amount,
total number of orders, and average order value. Show only customers with at least 5
orders, sorted by total purchase amount descending.
5. Salesperson Performance: For each salesperson, show their total sales amount for
2013 and 2014, and the year-over-year growth percentage. Assume salesperson is
linked via [Link].
6. Running Total of Sales per Territory: For each territory, show the monthly sales
amount and a running total within the territory ordered by month.
7. Product with Highest Sales in Each Category: Using window functions, for each
product category, find the product with the highest total sales amount.
Solutions (SQL Server syntax):
sql
-- 1. Monthly Sales Trend
SELECT
YEAR(OrderDate) AS SalesYear,
MONTH(OrderDate) AS SalesMonth,
SUM(TotalDue) AS TotalSales
FROM [Link]
GROUP BY YEAR(OrderDate), MONTH(OrderDate)
ORDER BY SalesYear, SalesMonth;
-- 2. Top 5 Selling Products by Quantity
SELECT TOP 5
[Link] AS ProductName,
SUM([Link]) AS TotalQuantity,
RANK() OVER (ORDER BY SUM([Link]) DESC) AS Rank
FROM [Link] p
JOIN [Link] sod ON [Link] = [Link]
GROUP BY [Link], [Link]
ORDER BY TotalQuantity DESC;
-- 3. Product Reorder Priority
SELECT
Name,
SafetyStockLevel,
DaysToManufacture,
CASE
WHEN DaysToManufacture > 0 THEN SafetyStockLevel * 1.0 /
DaysToManufacture
ELSE NULL
END AS Ratio,
CASE
WHEN DaysToManufacture > 0 AND SafetyStockLevel * 1.0 /
DaysToManufacture > 50 THEN 'High'
WHEN DaysToManufacture > 0 AND SafetyStockLevel * 1.0 /
DaysToManufacture BETWEEN 10 AND 50 THEN 'Normal'
WHEN DaysToManufacture > 0 AND SafetyStockLevel * 1.0 /
DaysToManufacture < 10 THEN 'Low'
ELSE 'N/A'
END AS ReorderPriority
FROM [Link]
WHERE DaysToManufacture > 0; -- exclude products that are not manufactured
-- 4. Customer Lifetime Value
SELECT
[Link],
CONCAT([Link], ' ', [Link]) AS CustomerName,
COUNT([Link]) AS OrderCount,
SUM([Link]) AS TotalPurchase,
AVG([Link]) AS AvgOrderValue
FROM [Link] c
JOIN [Link] p ON [Link] = [Link]
JOIN [Link] soh ON [Link] = [Link]
GROUP BY [Link], [Link], [Link]
HAVING COUNT([Link]) >= 5
ORDER BY TotalPurchase DESC;
-- 5. Salesperson Performance YoY Growth
WITH Sales2013 AS (
SELECT
[Link],
SUM([Link]) AS Sales2013
FROM [Link] soh
WHERE YEAR([Link]) = 2013
GROUP BY [Link]
),
Sales2014 AS (
SELECT
[Link],
SUM([Link]) AS Sales2014
FROM [Link] soh
WHERE YEAR([Link]) = 2014
GROUP BY [Link]
)
SELECT
ISNULL([Link], [Link]) AS SalesPersonID,
s13.Sales2013,
s14.Sales2014,
CASE
WHEN s13.Sales2013 IS NULL OR s13.Sales2013 = 0 THEN NULL
ELSE (s14.Sales2014 - s13.Sales2013) / s13.Sales2013 * 100.0
END AS YoYGrowthPercent
FROM Sales2013 s13
FULL OUTER JOIN Sales2014 s14 ON [Link] = [Link]
ORDER BY YoYGrowthPercent DESC;
-- 6. Running Total of Sales per Territory
WITH TerritoryMonthlySales AS (
SELECT
[Link],
YEAR([Link]) AS SalesYear,
MONTH([Link]) AS SalesMonth,
SUM([Link]) AS MonthlySales
FROM [Link] soh
GROUP BY [Link], YEAR([Link]), MONTH([Link])
)
SELECT
TerritoryID,
SalesYear,
SalesMonth,
MonthlySales,
SUM(MonthlySales) OVER (PARTITION BY TerritoryID ORDER BY SalesYear,
SalesMonth) AS RunningTotal
FROM TerritoryMonthlySales
ORDER BY TerritoryID, SalesYear, SalesMonth;
-- 7. Product with Highest Sales in Each Category
WITH ProductCategorySales AS (
SELECT
[Link] AS CategoryName,
[Link] AS ProductName,
SUM([Link]) AS TotalSales,
ROW_NUMBER() OVER (PARTITION BY [Link] ORDER BY
SUM([Link]) DESC) AS rn
FROM [Link] p
JOIN [Link] ps ON [Link] =
[Link]
JOIN [Link] pc ON [Link] =
[Link]
JOIN [Link] sod ON [Link] = [Link]
GROUP BY [Link], [Link], [Link]
)
SELECT CategoryName, ProductName, TotalSales
FROM ProductCategorySales
WHERE rn = 1
ORDER BY TotalSales DESC;
Additional Practice Problems
Problem 1: Using the library database, write a query to find members who have borrowed
more than the average number of books borrowed per member.
Solution:in sql
WITH MemberBorrowCounts AS (
SELECT MemberID, COUNT(*) AS BorrowCount
FROM Loans
GROUP BY MemberID
)
SELECT m.*, [Link]
FROM Members m
JOIN MemberBorrowCounts mbc ON [Link] = [Link]
WHERE [Link] > (SELECT AVG(BorrowCount) FROM MemberBorrowCounts);
Problem 2: For each book, show its title and the number of times it has been borrowed, but
only include books that have been borrowed at least once. Also show the book's rank by
borrow count.
Solution: in sql
SELECT
[Link],
COUNT([Link]) AS BorrowCount,
RANK() OVER (ORDER BY COUNT([Link]) DESC) AS Rank
FROM Books b
LEFT JOIN Loans l ON [Link] = [Link]
GROUP BY [Link], [Link]
HAVING COUNT([Link]) > 0;
Problem 3: Write a trigger that automatically updates a LastModified column in the
Students table whenever a row is updated. Also create a StudentHistory table to track
changes.
Solution (PostgreSQL): in sql
-- Add LastModified column
ALTER TABLE Students ADD COLUMN LastModified TIMESTAMP DEFAULT
CURRENT_TIMESTAMP;
-- Create history table
CREATE TABLE StudentHistory (
HistoryID SERIAL PRIMARY KEY,
StudentID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(100),
ChangeType CHAR(1), -- 'U' for update, 'D' for delete
ChangeDate TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Trigger function for update
CREATE OR REPLACE FUNCTION update_student_lastmodified()
RETURNS TRIGGER AS $$
BEGIN
[Link] = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER before_student_update
BEFORE UPDATE ON Students
FOR EACH ROW
EXECUTE FUNCTION update_student_lastmodified();
-- Trigger function for logging updates
CREATE OR REPLACE FUNCTION log_student_update()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO StudentHistory (StudentID, FirstName, LastName, Email,
ChangeType)
VALUES ([Link], [Link], [Link], [Link], 'U');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER after_student_update
AFTER UPDATE ON Students
FOR EACH ROW
EXECUTE FUNCTION log_student_update();
Appendices
Appendix A: SQL Syntax Quick Reference
Data Definition:
CREATE TABLE table_name (col_defs)
ALTER TABLE table_name ADD|DROP|MODIFY column
DROP TABLE table_name
Data Manipulation:
INSERT INTO table_name (cols) VALUES (vals)
SELECT cols FROM table WHERE condition GROUP BY cols HAVING condition
ORDER BY cols
UPDATE table SET col = val WHERE condition
DELETE FROM table WHERE condition
Joins:
SELECT ... FROM t1 JOIN t2 ON [Link] = [Link]
LEFT JOIN, RIGHT JOIN, FULL JOIN
Aggregation:
COUNT(*), SUM(col), AVG(col), MIN(col), MAX(col)
GROUP BY col
HAVING condition
Constraints:
PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, CHECK, DEFAULT
Transactions:
BEGIN TRANSACTION, COMMIT, ROLLBACK, SAVEPOINT
Indexes:
CREATE INDEX idx_name ON table (col)
DROP INDEX idx_name
Views:
CREATE VIEW view_name AS select_statement
DROP VIEW view_name
Stored Procedures/Functions:
CREATE PROCEDURE name (params) AS ... (syntax varies)
CREATE FUNCTION name (params) RETURNS type AS ...
Triggers:
CREATE TRIGGER name {BEFORE|AFTER} event ON table FOR EACH ROW
EXECUTE ...
Appendix B: Glossary
ACID: Atomicity, Consistency, Isolation, Durability – properties of database
transactions.
Attribute: A column in a relation.
Candidate Key: A set of columns that uniquely identifies a tuple; one candidate is
chosen as primary key.
Constraint: A rule enforced on data columns (e.g., primary key, foreign key).
Data Definition Language (DDL): SQL statements that define the database
structure.
Data Manipulation Language (DML): SQL statements that manipulate data.
Foreign Key: A column that references a primary key in another table.
Index: A data structure that speeds up data retrieval.
Join: Combining rows from two or more tables based on related columns.
Normalization: The process of organizing data to reduce redundancy and improve
integrity.
Primary Key: A column (or set) that uniquely identifies each row.
Query: A request for data or information from a database.
Relation: A table in the relational model.
Schema: The structure that defines the organization of data (tables, views, etc.).
Stored Procedure: A set of SQL statements stored on the server for reuse.
Transaction: A unit of work that is atomic, consistent, isolated, and durable.
Trigger: A set of instructions that automatically execute in response to certain events.
Tuple: A row in a relation.
View: A virtual table based on a stored query.
Appendix C: Further Reading and Resources
Books:
Database System Concepts by Silberschatz, Korth, Sudarshan
SQL Performance Explained by Markus Winand
The Art of SQL by Stephane Faroult
Online Documentation:
PostgreSQL Official Docs: [Link]
MySQL Reference Manual: [Link]
Microsoft SQL Server Docs: [Link]
Websites and Tutorials:
Use The Index, Luke: [Link]
Modern SQL: [Link]
SQLZOO: [Link]
PGExercises: [Link]
Community:
Stack Overflow (tag: sql)
Reddit r/SQL
Database Administrators Stack Exchange