0% found this document useful (0 votes)
3 views23 pages

Module 3 SQL

Module 3 covers SQL for data analysis, emphasizing the importance of SQL in managing and analyzing data stored in databases. It explains database concepts, types, and the structure of relational databases, along with SQL commands for data querying, filtering, aggregation, and joining tables. Advanced SQL techniques such as subqueries, CASE statements, and window functions are also introduced to enhance data analysis capabilities.

Uploaded by

Narendra jain
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views23 pages

Module 3 SQL

Module 3 covers SQL for data analysis, emphasizing the importance of SQL in managing and analyzing data stored in databases. It explains database concepts, types, and the structure of relational databases, along with SQL commands for data querying, filtering, aggregation, and joining tables. Advanced SQL techniques such as subqueries, CASE statements, and window functions are also introduced to enhance data analysis capabilities.

Uploaded by

Narendra jain
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Module 3: SQL for Data Analysis (8 Hours)

SQL Basics – Detailed Notes

➢ SQL (Structured Query Language) is a standard language used to manage and


analyze data stored in databases. SQL is widely used in data analytics, business
intelligence, and software development for retrieving and manipulating
structured data.
➢ Most modern organizations store their data in databases and use SQL to analyze
that data efficiently.
➢ SQL is used with database management systems such as MySQL, PostgreSQL,
Microsoft SQL Server, and Oracle Database.
➢ Understanding SQL basics is essential for data analysts because it helps them
retrieve and analyze large amounts of data quickly.

1. What is a Database

Meaning

A database is an organized collection of data that is stored electronically and can be


easily accessed, managed, and updated.

In simple terms, a database is a structured place where information is stored so that it


can be retrieved and analyzed whenever needed.

For example, organizations store different types of information in databases such as:

• Customer information

• Product details

• Sales records

• Employee data

Instead of storing data in paper files, companies use databases to manage data
efficiently.

Example of a Database

Consider a student database.


Student_ID Name Course Marks

101 Rahul Data Analytics 85

102 Amit Data Analytics 90

103 Neha Data Analytics 88

This information is stored inside a database system.

Characteristics of a Database

A database usually has the following characteristics:

Organized Structure

Data is stored in a structured format so it can be easily searched and updated.

Data Consistency

Data remains accurate and consistent across the system.

Data Security

Access to data can be controlled using permissions.

Data Sharing

Multiple users can access the database at the same time.

Efficient Data Retrieval

Users can retrieve specific information quickly using queries.

Types of Databases

Some common types of databases include:

Relational Databases

Data is stored in tables with rows and columns.

Example systems:

• MySQL

• PostgreSQL

• SQL Server
NoSQL Databases

Used for unstructured or semi-structured data.

Example systems:

• MongoDB

• Cassandra

In data analytics, relational databases are most commonly used.

2. RDBMS Concept

Meaning

RDBMS stands for Relational Database Management System.

An RDBMS is a software system used to create, manage, and manipulate relational


databases.

In an RDBMS, data is stored in tables, and relationships are established between tables
using keys.

Examples of RDBMS software include:

• MySQL

• PostgreSQL

• Microsoft SQL Server

• Oracle Database

Features of RDBMS

Table-Based Structure

Data is stored in tables.

Relationships Between Tables

Tables can be connected using keys such as primary keys and foreign keys.

Data Integrity

RDBMS ensures data accuracy and consistency.

SQL Support

Users can interact with the database using SQL queries.


Multi-User Access

Multiple users can access the database simultaneously.

Example of RDBMS Structure

Suppose a company stores data in two tables:

Employees Table

Employee_ID Name Department_ID

101 Rahul 1

102 Amit 2

Departments Table

Department_ID Department_Name

1 Sales

2 Marketing

Here, Department_ID links the two tables together.

This relationship allows users to retrieve data from multiple tables efficiently.

3. Tables, Rows, and Columns

In relational databases, data is organized into tables.

A table is made up of rows and columns.

Tables

Meaning

A table is a structured collection of related data stored in rows and columns.

Each table represents a specific entity such as:

• Customers

• Employees

• Products
• Orders

Example Table

Customer_ID Name City Age

101 Rahul Delhi 25

102 Amit Mumbai 30

103 Neha Indore 28

This table stores customer information.

Rows

Meaning

A row represents a single record in a table.

Each row contains information about one entity.

Example:

Customer_ID Name City Age

101 Rahul Delhi 25

This row represents one customer record.

Rows are also called records or tuples.

Columns

Meaning

A column represents an attribute or field of the data.

Each column contains a specific type of information.

Example:

Column Name Description

Customer_ID Unique identifier

Name Customer name


Column Name Description

City Customer location

Age Customer age

Columns define the structure of the table.

Example: Table Structure

Table: Employees

Employee_ID Name Department Salary

101 Rahul HR 35000

102 Amit IT 45000

103 Neha Finance 40000

Here:

Table → Employees

Columns → Employee_ID, Name, Department, Salary

Rows → Each employee record

Importance of Tables in Data Analysis

Tables help organize data in a way that makes it easy to:

• Retrieve information

• Filter and sort data

• Perform calculations

• Generate reports

SQL queries allow analysts to extract meaningful insights from tables.


SQL Commands –

SQL (Structured Query Language) is used to retrieve, filter, analyze, and manage data
stored in relational databases. Data analysts use SQL commands to extract useful
information from large datasets stored in database systems such as MySQL,
PostgreSQL, Microsoft SQL Server, and Oracle Database.

These SQL commands help analysts perform tasks like retrieving specific records,
filtering data, performing calculations, grouping data, and combining tables.

1. Data Query Commands

Data Query commands are used to retrieve data from database tables.

SELECT

Meaning

The SELECT statement is used to retrieve data from a database table.

It is the most commonly used SQL command.

Syntax

SELECT column_name

FROM table_name;

Example

Table: Students

ID Name Marks

1 Rahul 85

2 Amit 90

3 Neha 88

Query:

SELECT Name, Marks

FROM Students;

Result:
Name Marks

Rahul 85

Amit 90

Neha 88

Select All Columns

SELECT *

FROM Students;

* means all columns.

WHERE

Meaning

The WHERE clause is used to filter records based on conditions.

Syntax

SELECT column_name

FROM table_name

WHERE condition;

Example

SELECT *

FROM Students

WHERE Marks > 85;

Result → Only students with marks greater than 85.

ORDER BY

Meaning

The ORDER BY clause sorts the result in ascending or descending order.

Syntax

SELECT column_name
FROM table_name

ORDER BY column_name ASC|DESC;

Example

SELECT *

FROM Students

ORDER BY Marks DESC;

This sorts students by marks from highest to lowest.

ASC → Ascending
DESC → Descending

LIMIT

Meaning

The LIMIT clause restricts the number of rows returned by a query.

Syntax

SELECT *

FROM table_name

LIMIT number;

Example

SELECT *

FROM Students

LIMIT 3;

This returns only first 3 records.

2. Filtering Conditions

Filtering helps in retrieving specific rows based on conditions.

AND

Meaning

The AND operator returns records only when all conditions are true.

Example
SELECT *

FROM Employees

WHERE Salary > 30000 AND Department = 'IT';

This returns employees who:

• earn more than 30000

• work in IT department

OR

Meaning

The OR operator returns records when any condition is true.

Example

SELECT *

FROM Employees

WHERE Department = 'HR' OR Department = 'IT';

This returns employees from either department.

NOT

Meaning

The NOT operator reverses a condition.

Example

SELECT *

FROM Employees

WHERE NOT Department = 'Sales';

This returns employees not working in Sales department.

BETWEEN

Meaning

The BETWEEN operator selects values within a range.

Syntax
SELECT *

FROM table_name

WHERE column BETWEEN value1 AND value2;

Example

SELECT *

FROM Students

WHERE Marks BETWEEN 70 AND 90;

This returns students whose marks are between 70 and 90.

LIKE

Meaning

The LIKE operator is used for pattern matching in text data.

Wildcards

% → multiple characters
_ → single character

Example

SELECT *

FROM Customers

WHERE Name LIKE 'A%';

This returns names starting with A.

Example results:

Amit
Anita
Arjun

IN

Meaning

The IN operator allows selecting multiple values in a WHERE clause.

Example

SELECT *
FROM Employees

WHERE Department IN ('HR','IT','Finance');

This returns employees from the specified departments.

3. Aggregate Functions

Aggregate functions perform calculations on multiple rows of data and return a single
result.

COUNT

Meaning

Counts the number of rows.

Example

SELECT COUNT(*)

FROM Employees;

Result → total number of employees.

SUM

Meaning

Calculates the total sum of a numeric column.

Example

SELECT SUM(Salary)

FROM Employees;

Result → total salary expense.

AVG

Meaning

Calculates the average value.

Example

SELECT AVG(Salary)

FROM Employees;

Result → average salary.


MIN

Meaning

Returns the smallest value.

Example

SELECT MIN(Salary)

FROM Employees;

Result → lowest salary.

MAX

Meaning

Returns the highest value.

Example

SELECT MAX(Salary)

FROM Employees;

Result → highest salary.

4. Grouping Data

Grouping allows analysts to summarize data by categories.

GROUP BY

Meaning

The GROUP BY clause groups rows that have the same values.

Example

SELECT Department, COUNT(*)

FROM Employees

GROUP BY Department;

Result:
Department Employees

HR 5

IT 8

Finance 4

HAVING

Meaning

The HAVING clause filters grouped results.

It is similar to WHERE but used after GROUP BY.

Example

SELECT Department, COUNT(*)

FROM Employees

GROUP BY Department

HAVING COUNT(*) > 5;

This shows departments with more than 5 employees.

5. SQL Joins

Joins are used to combine data from multiple tables.

They are very important in data analysis because real-world databases usually contain
multiple related tables.

INNER JOIN

Meaning

An INNER JOIN returns records that have matching values in both tables.

Example

Table: Employees
Emp_ID Name Dept_ID

1 Rahul 1

2 Amit 2

Table: Departments

Dept_ID Dept_Name

1 HR

2 IT

Query:

SELECT [Link], Departments.Dept_Name

FROM Employees

INNER JOIN Departments

ON Employees.Dept_ID = Departments.Dept_ID;

Result:

Name Department

Rahul HR

Amit IT

LEFT JOIN

Meaning

A LEFT JOIN returns:

• all records from the left table

• matching records from the right table

If no match exists, NULL values appear.

Example

SELECT [Link], Departments.Dept_Name

FROM Employees

LEFT JOIN Departments


ON Employees.Dept_ID = Departments.Dept_ID;

This returns all employees even if they have no department.

RIGHT JOIN

Meaning

A RIGHT JOIN returns:

• all records from the right table

• matching records from the left table

Example

SELECT [Link], Departments.Dept_Name

FROM Employees

RIGHT JOIN Departments

ON Employees.Dept_ID = Departments.Dept_ID;

This returns all departments even if no employee belongs to them.

FULL JOIN

Meaning

A FULL JOIN returns:

• all records from both tables

• matched records where possible

• NULL values where no match exists

Example

SELECT [Link], Departments.Dept_Name

FROM Employees

FULL JOIN Departments

ON Employees.Dept_ID = Departments.Dept_ID;

This combines all records from both tables.


Advanced SQL for Data Analysis

➢ Advanced SQL techniques help analysts perform complex data analysis and
advanced calculations directly inside databases. These techniques allow users
to work with large datasets efficiently and extract deeper insights.
➢ Advanced SQL features are supported by most relational database systems such
as MySQL, PostgreSQL, Microsoft SQL Server, and Oracle Database.
➢ The main advanced SQL concepts include Subqueries, CASE statements, and
Window Functions.

1. Subqueries

Meaning

A subquery is a query written inside another SQL query.

It is also called an inner query or nested query.

The inner query executes first and its result is used by the outer query.

Subqueries help solve complex problems where data needs to be filtered or calculated
based on another query.

Basic Syntax

SELECT column_name

FROM table_name

WHERE column_name OPERATOR

(SELECT column_name FROM table_name WHERE condition);

Example

Table: Employees

Emp_ID Name Salary

1 Rahul 30000

2 Amit 50000

3 Neha 45000

Query: Find employees earning more than average salary

SELECT Name, Salary


FROM Employees

WHERE Salary > (

SELECT AVG(Salary)

FROM Employees

);

Explanation

1. Inner query calculates average salary.

2. Outer query selects employees whose salary is greater than the average.

Types of Subqueries

Single Row Subquery

Returns only one value.

Example:

SELECT *

FROM Employees

WHERE Salary > (SELECT AVG(Salary) FROM Employees);

Multiple Row Subquery

Returns multiple values.

Example:

SELECT *

FROM Employees

WHERE Department_ID IN (

SELECT Department_ID

FROM Departments

);

Correlated Subquery

A correlated subquery depends on the outer query and runs once for each row.

Example:
SELECT Name, Salary

FROM Employees e

WHERE Salary >

(SELECT AVG(Salary)

FROM Employees

WHERE Department_ID = e.Department_ID);

This finds employees earning more than the average salary in their department.

Advantages of Subqueries

• Simplifies complex queries

• Improves readability

• Helps perform comparisons with aggregated values

• Useful for filtering and calculations

2. CASE Statements

Meaning

A CASE statement is used to perform conditional logic in SQL queries.

It works similarly to IF–ELSE statements in programming languages.

CASE allows SQL to categorize data or create new columns based on conditions.

Syntax

SELECT column_name,

CASE

WHEN condition1 THEN result1

WHEN condition2 THEN result2

ELSE result

END

FROM table_name;
Example

Table: Students

Name Marks

Rahul 85

Amit 65

Neha 45

Query:

SELECT Name, Marks,

CASE

WHEN Marks >= 75 THEN 'Distinction'

WHEN Marks >= 50 THEN 'Pass'

ELSE 'Fail'

END AS Result

FROM Students;

Result:

Name Marks Result

Rahul 85 Distinction

Amit 65 Pass

Neha 45 Fail

Uses of CASE Statements

CASE statements are widely used for:

• Categorizing data

• Creating conditional columns

• Performing logical calculations

• Creating reports and dashboards


CASE with Aggregation

CASE can also be used with aggregate functions.

Example:

SELECT

SUM(CASE WHEN Department='IT' THEN Salary ELSE 0 END) AS IT_Total_Salary

FROM Employees;

This calculates total salary for the IT department.

3. Window Functions :

Meaning

A Window Function performs calculations across a set of table rows that are related to
the current row.

Unlike aggregate functions, window functions do not collapse rows.

They allow calculations like ranking, running totals, and moving averages.

Window functions are very useful in data analytics and business intelligence.

Basic Syntax

SELECT column_name,

FUNCTION() OVER (PARTITION BY column_name ORDER BY column_name)

FROM table_name;

Common Window Functions

Some common window functions include:

• ROW_NUMBER()

• RANK()

• DENSE_RANK()

• SUM() OVER()

• AVG() OVER()

ROW_NUMBER()
Meaning

Assigns a unique row number to each record.

Example:

SELECT Name, Salary,

ROW_NUMBER() OVER (ORDER BY Salary DESC) AS Rank_Number

FROM Employees;

This assigns ranking based on salary.

RANK()

Meaning

Ranks rows but allows duplicate ranks.

Example:

SELECT Name, Salary,

RANK() OVER (ORDER BY Salary DESC) AS Rank

FROM Employees;

If two employees have the same salary, they receive the same rank.

DENSE_RANK()

Meaning

Similar to RANK but does not skip ranking numbers.

Example:

SELECT Name, Salary,

DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank

FROM Employees;

SUM() OVER()

Used to calculate running totals.

Example:

SELECT Name, Salary,


SUM(Salary) OVER (ORDER BY Salary) AS Running_Total

FROM Employees;

This shows cumulative salary totals.

PARTITION BY

Meaning

PARTITION BY divides data into groups before applying window functions.

Example:

SELECT Name, Department, Salary,

RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS Dept_Rank

FROM Employees;

This ranks employees within each department.

Advantages of Window Functions

Window functions are powerful because they:

• Perform complex analytics easily

• Avoid multiple queries

• Support ranking and cumulative calculations

• Improve query performance

They are commonly used in data analytics, financial analysis, and reporting
dashboards.

You might also like