Module 3: SQL for Data Analysis (8 Hours)
SQL Basics – Detailed Notes
➢ SQL (Structured Query Language) is a standard language used to manage and
analyze data stored in databases. SQL is widely used in data analytics, business
intelligence, and software development for retrieving and manipulating
structured data.
➢ Most modern organizations store their data in databases and use SQL to analyze
that data efficiently.
➢ SQL is used with database management systems such as MySQL, PostgreSQL,
Microsoft SQL Server, and Oracle Database.
➢ Understanding SQL basics is essential for data analysts because it helps them
retrieve and analyze large amounts of data quickly.
1. What is a Database
Meaning
A database is an organized collection of data that is stored electronically and can be
easily accessed, managed, and updated.
In simple terms, a database is a structured place where information is stored so that it
can be retrieved and analyzed whenever needed.
For example, organizations store different types of information in databases such as:
• Customer information
• Product details
• Sales records
• Employee data
Instead of storing data in paper files, companies use databases to manage data
efficiently.
Example of a Database
Consider a student database.
Student_ID Name Course Marks
101 Rahul Data Analytics 85
102 Amit Data Analytics 90
103 Neha Data Analytics 88
This information is stored inside a database system.
Characteristics of a Database
A database usually has the following characteristics:
Organized Structure
Data is stored in a structured format so it can be easily searched and updated.
Data Consistency
Data remains accurate and consistent across the system.
Data Security
Access to data can be controlled using permissions.
Data Sharing
Multiple users can access the database at the same time.
Efficient Data Retrieval
Users can retrieve specific information quickly using queries.
Types of Databases
Some common types of databases include:
Relational Databases
Data is stored in tables with rows and columns.
Example systems:
• MySQL
• PostgreSQL
• SQL Server
NoSQL Databases
Used for unstructured or semi-structured data.
Example systems:
• MongoDB
• Cassandra
In data analytics, relational databases are most commonly used.
2. RDBMS Concept
Meaning
RDBMS stands for Relational Database Management System.
An RDBMS is a software system used to create, manage, and manipulate relational
databases.
In an RDBMS, data is stored in tables, and relationships are established between tables
using keys.
Examples of RDBMS software include:
• MySQL
• PostgreSQL
• Microsoft SQL Server
• Oracle Database
Features of RDBMS
Table-Based Structure
Data is stored in tables.
Relationships Between Tables
Tables can be connected using keys such as primary keys and foreign keys.
Data Integrity
RDBMS ensures data accuracy and consistency.
SQL Support
Users can interact with the database using SQL queries.
Multi-User Access
Multiple users can access the database simultaneously.
Example of RDBMS Structure
Suppose a company stores data in two tables:
Employees Table
Employee_ID Name Department_ID
101 Rahul 1
102 Amit 2
Departments Table
Department_ID Department_Name
1 Sales
2 Marketing
Here, Department_ID links the two tables together.
This relationship allows users to retrieve data from multiple tables efficiently.
3. Tables, Rows, and Columns
In relational databases, data is organized into tables.
A table is made up of rows and columns.
Tables
Meaning
A table is a structured collection of related data stored in rows and columns.
Each table represents a specific entity such as:
• Customers
• Employees
• Products
• Orders
Example Table
Customer_ID Name City Age
101 Rahul Delhi 25
102 Amit Mumbai 30
103 Neha Indore 28
This table stores customer information.
Rows
Meaning
A row represents a single record in a table.
Each row contains information about one entity.
Example:
Customer_ID Name City Age
101 Rahul Delhi 25
This row represents one customer record.
Rows are also called records or tuples.
Columns
Meaning
A column represents an attribute or field of the data.
Each column contains a specific type of information.
Example:
Column Name Description
Customer_ID Unique identifier
Name Customer name
Column Name Description
City Customer location
Age Customer age
Columns define the structure of the table.
Example: Table Structure
Table: Employees
Employee_ID Name Department Salary
101 Rahul HR 35000
102 Amit IT 45000
103 Neha Finance 40000
Here:
Table → Employees
Columns → Employee_ID, Name, Department, Salary
Rows → Each employee record
Importance of Tables in Data Analysis
Tables help organize data in a way that makes it easy to:
• Retrieve information
• Filter and sort data
• Perform calculations
• Generate reports
SQL queries allow analysts to extract meaningful insights from tables.
SQL Commands –
SQL (Structured Query Language) is used to retrieve, filter, analyze, and manage data
stored in relational databases. Data analysts use SQL commands to extract useful
information from large datasets stored in database systems such as MySQL,
PostgreSQL, Microsoft SQL Server, and Oracle Database.
These SQL commands help analysts perform tasks like retrieving specific records,
filtering data, performing calculations, grouping data, and combining tables.
1. Data Query Commands
Data Query commands are used to retrieve data from database tables.
SELECT
Meaning
The SELECT statement is used to retrieve data from a database table.
It is the most commonly used SQL command.
Syntax
SELECT column_name
FROM table_name;
Example
Table: Students
ID Name Marks
1 Rahul 85
2 Amit 90
3 Neha 88
Query:
SELECT Name, Marks
FROM Students;
Result:
Name Marks
Rahul 85
Amit 90
Neha 88
Select All Columns
SELECT *
FROM Students;
* means all columns.
WHERE
Meaning
The WHERE clause is used to filter records based on conditions.
Syntax
SELECT column_name
FROM table_name
WHERE condition;
Example
SELECT *
FROM Students
WHERE Marks > 85;
Result → Only students with marks greater than 85.
ORDER BY
Meaning
The ORDER BY clause sorts the result in ascending or descending order.
Syntax
SELECT column_name
FROM table_name
ORDER BY column_name ASC|DESC;
Example
SELECT *
FROM Students
ORDER BY Marks DESC;
This sorts students by marks from highest to lowest.
ASC → Ascending
DESC → Descending
LIMIT
Meaning
The LIMIT clause restricts the number of rows returned by a query.
Syntax
SELECT *
FROM table_name
LIMIT number;
Example
SELECT *
FROM Students
LIMIT 3;
This returns only first 3 records.
2. Filtering Conditions
Filtering helps in retrieving specific rows based on conditions.
AND
Meaning
The AND operator returns records only when all conditions are true.
Example
SELECT *
FROM Employees
WHERE Salary > 30000 AND Department = 'IT';
This returns employees who:
• earn more than 30000
• work in IT department
OR
Meaning
The OR operator returns records when any condition is true.
Example
SELECT *
FROM Employees
WHERE Department = 'HR' OR Department = 'IT';
This returns employees from either department.
NOT
Meaning
The NOT operator reverses a condition.
Example
SELECT *
FROM Employees
WHERE NOT Department = 'Sales';
This returns employees not working in Sales department.
BETWEEN
Meaning
The BETWEEN operator selects values within a range.
Syntax
SELECT *
FROM table_name
WHERE column BETWEEN value1 AND value2;
Example
SELECT *
FROM Students
WHERE Marks BETWEEN 70 AND 90;
This returns students whose marks are between 70 and 90.
LIKE
Meaning
The LIKE operator is used for pattern matching in text data.
Wildcards
% → multiple characters
_ → single character
Example
SELECT *
FROM Customers
WHERE Name LIKE 'A%';
This returns names starting with A.
Example results:
Amit
Anita
Arjun
IN
Meaning
The IN operator allows selecting multiple values in a WHERE clause.
Example
SELECT *
FROM Employees
WHERE Department IN ('HR','IT','Finance');
This returns employees from the specified departments.
3. Aggregate Functions
Aggregate functions perform calculations on multiple rows of data and return a single
result.
COUNT
Meaning
Counts the number of rows.
Example
SELECT COUNT(*)
FROM Employees;
Result → total number of employees.
SUM
Meaning
Calculates the total sum of a numeric column.
Example
SELECT SUM(Salary)
FROM Employees;
Result → total salary expense.
AVG
Meaning
Calculates the average value.
Example
SELECT AVG(Salary)
FROM Employees;
Result → average salary.
MIN
Meaning
Returns the smallest value.
Example
SELECT MIN(Salary)
FROM Employees;
Result → lowest salary.
MAX
Meaning
Returns the highest value.
Example
SELECT MAX(Salary)
FROM Employees;
Result → highest salary.
4. Grouping Data
Grouping allows analysts to summarize data by categories.
GROUP BY
Meaning
The GROUP BY clause groups rows that have the same values.
Example
SELECT Department, COUNT(*)
FROM Employees
GROUP BY Department;
Result:
Department Employees
HR 5
IT 8
Finance 4
HAVING
Meaning
The HAVING clause filters grouped results.
It is similar to WHERE but used after GROUP BY.
Example
SELECT Department, COUNT(*)
FROM Employees
GROUP BY Department
HAVING COUNT(*) > 5;
This shows departments with more than 5 employees.
5. SQL Joins
Joins are used to combine data from multiple tables.
They are very important in data analysis because real-world databases usually contain
multiple related tables.
INNER JOIN
Meaning
An INNER JOIN returns records that have matching values in both tables.
Example
Table: Employees
Emp_ID Name Dept_ID
1 Rahul 1
2 Amit 2
Table: Departments
Dept_ID Dept_Name
1 HR
2 IT
Query:
SELECT [Link], Departments.Dept_Name
FROM Employees
INNER JOIN Departments
ON Employees.Dept_ID = Departments.Dept_ID;
Result:
Name Department
Rahul HR
Amit IT
LEFT JOIN
Meaning
A LEFT JOIN returns:
• all records from the left table
• matching records from the right table
If no match exists, NULL values appear.
Example
SELECT [Link], Departments.Dept_Name
FROM Employees
LEFT JOIN Departments
ON Employees.Dept_ID = Departments.Dept_ID;
This returns all employees even if they have no department.
RIGHT JOIN
Meaning
A RIGHT JOIN returns:
• all records from the right table
• matching records from the left table
Example
SELECT [Link], Departments.Dept_Name
FROM Employees
RIGHT JOIN Departments
ON Employees.Dept_ID = Departments.Dept_ID;
This returns all departments even if no employee belongs to them.
FULL JOIN
Meaning
A FULL JOIN returns:
• all records from both tables
• matched records where possible
• NULL values where no match exists
Example
SELECT [Link], Departments.Dept_Name
FROM Employees
FULL JOIN Departments
ON Employees.Dept_ID = Departments.Dept_ID;
This combines all records from both tables.
Advanced SQL for Data Analysis
➢ Advanced SQL techniques help analysts perform complex data analysis and
advanced calculations directly inside databases. These techniques allow users
to work with large datasets efficiently and extract deeper insights.
➢ Advanced SQL features are supported by most relational database systems such
as MySQL, PostgreSQL, Microsoft SQL Server, and Oracle Database.
➢ The main advanced SQL concepts include Subqueries, CASE statements, and
Window Functions.
1. Subqueries
Meaning
A subquery is a query written inside another SQL query.
It is also called an inner query or nested query.
The inner query executes first and its result is used by the outer query.
Subqueries help solve complex problems where data needs to be filtered or calculated
based on another query.
Basic Syntax
SELECT column_name
FROM table_name
WHERE column_name OPERATOR
(SELECT column_name FROM table_name WHERE condition);
Example
Table: Employees
Emp_ID Name Salary
1 Rahul 30000
2 Amit 50000
3 Neha 45000
Query: Find employees earning more than average salary
SELECT Name, Salary
FROM Employees
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees
);
Explanation
1. Inner query calculates average salary.
2. Outer query selects employees whose salary is greater than the average.
Types of Subqueries
Single Row Subquery
Returns only one value.
Example:
SELECT *
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);
Multiple Row Subquery
Returns multiple values.
Example:
SELECT *
FROM Employees
WHERE Department_ID IN (
SELECT Department_ID
FROM Departments
);
Correlated Subquery
A correlated subquery depends on the outer query and runs once for each row.
Example:
SELECT Name, Salary
FROM Employees e
WHERE Salary >
(SELECT AVG(Salary)
FROM Employees
WHERE Department_ID = e.Department_ID);
This finds employees earning more than the average salary in their department.
Advantages of Subqueries
• Simplifies complex queries
• Improves readability
• Helps perform comparisons with aggregated values
• Useful for filtering and calculations
2. CASE Statements
Meaning
A CASE statement is used to perform conditional logic in SQL queries.
It works similarly to IF–ELSE statements in programming languages.
CASE allows SQL to categorize data or create new columns based on conditions.
Syntax
SELECT column_name,
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE result
END
FROM table_name;
Example
Table: Students
Name Marks
Rahul 85
Amit 65
Neha 45
Query:
SELECT Name, Marks,
CASE
WHEN Marks >= 75 THEN 'Distinction'
WHEN Marks >= 50 THEN 'Pass'
ELSE 'Fail'
END AS Result
FROM Students;
Result:
Name Marks Result
Rahul 85 Distinction
Amit 65 Pass
Neha 45 Fail
Uses of CASE Statements
CASE statements are widely used for:
• Categorizing data
• Creating conditional columns
• Performing logical calculations
• Creating reports and dashboards
CASE with Aggregation
CASE can also be used with aggregate functions.
Example:
SELECT
SUM(CASE WHEN Department='IT' THEN Salary ELSE 0 END) AS IT_Total_Salary
FROM Employees;
This calculates total salary for the IT department.
3. Window Functions :
Meaning
A Window Function performs calculations across a set of table rows that are related to
the current row.
Unlike aggregate functions, window functions do not collapse rows.
They allow calculations like ranking, running totals, and moving averages.
Window functions are very useful in data analytics and business intelligence.
Basic Syntax
SELECT column_name,
FUNCTION() OVER (PARTITION BY column_name ORDER BY column_name)
FROM table_name;
Common Window Functions
Some common window functions include:
• ROW_NUMBER()
• RANK()
• DENSE_RANK()
• SUM() OVER()
• AVG() OVER()
ROW_NUMBER()
Meaning
Assigns a unique row number to each record.
Example:
SELECT Name, Salary,
ROW_NUMBER() OVER (ORDER BY Salary DESC) AS Rank_Number
FROM Employees;
This assigns ranking based on salary.
RANK()
Meaning
Ranks rows but allows duplicate ranks.
Example:
SELECT Name, Salary,
RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employees;
If two employees have the same salary, they receive the same rank.
DENSE_RANK()
Meaning
Similar to RANK but does not skip ranking numbers.
Example:
SELECT Name, Salary,
DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employees;
SUM() OVER()
Used to calculate running totals.
Example:
SELECT Name, Salary,
SUM(Salary) OVER (ORDER BY Salary) AS Running_Total
FROM Employees;
This shows cumulative salary totals.
PARTITION BY
Meaning
PARTITION BY divides data into groups before applying window functions.
Example:
SELECT Name, Department, Salary,
RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS Dept_Rank
FROM Employees;
This ranks employees within each department.
Advantages of Window Functions
Window functions are powerful because they:
• Perform complex analytics easily
• Avoid multiple queries
• Support ranking and cumulative calculations
• Improve query performance
They are commonly used in data analytics, financial analysis, and reporting
dashboards.