Data
It is a collection of raw facts or unprocessed data or any information.
It can be in any form such as numbers, text, images, audio or any other format.
Information
It is data that has processed, organized that has meaning and useful.
To store the data we have Files, DataBases.
DrawBacks of Files
No Security to the data.
Limited capacity to store data.( Max capacity is Hard Disk Capacity )
Difficult to recover the lost data in case of hard disk failure.
To make changes in data we need the help of a programming language ( to read or write ).
To overcome these failures we prefer databases.
DataBases
It is an organized collection of data that is stored on a device so that they can be accessed easily
or updated.
Depending on the type of database data is stored in different formats like tables, documents,
key-value pairs, graphs etc.
Types Of Databases
Relational Databases
Stores data in table format. Eg: MySQL, PostgreSQL, Oracle etc.
NoSQL Databases
Stores data in different formats like documents, key-value pairs, graphs, objects etc.
Eg: MongoDB, Cassandra, Redis etc.
In order to create a database to store data we need a software which is ‘DBMS’.
DBMS
It stands for Database Mangement System.
It helps you to create, manage, interact with the databases.
It is a software interface or a layer used for providing interaction between user or application
and the raw data.
It defines data, store it , manipulate and provides security to data in database.
Types Of DBMS
Relational DBMS
Here data is stored in tables with relationships between them, it has the rigid schema,
there is no duplication of data due to Normalization technique and hence it maintains
accurate data
Eg: MySQL, PostgreSQL, Oracle, Microsoft SQL Server, SQLite
NoSQL DBMS
Here data is stored in different formats, depending on the way of storing data we have
different types of dbms.
Document DBMS:
Stores data in the form of documents, has the flexible schema.
Eg: MongoDB, CouchDB.
Key-Value DBMS:
Stores data in the form of key-value pairs ( stores data as collection of unique keys and
their associated values ), key acts as unique identifier for retrieving corresponding value)
Eg: Redis
Heirarchical DBMS:
Stores data in tree like structure with single root node and multiple child nodes
connected to it.
Eg: IBM IMS
Graph DBMS:
Stores and manages data using graph structures
Eg: Neo4j
Object DBMS (ODBMS):
Data is stored in the form of objects, similar to how data is handled in object – oriented
programming languages.
Eg: ObjectDB
To interact or communicate with rdbms we need a language that is SQL
SQL
It stands for Structured Query Language, also known SEQUEL ( Structured English Query
Language ).
It is a common Query Language to communicate with any rdbms.
SQL is not a case sensitive Language.
By using this users can perform various operations like creating, retrieving, manage databases
etc.
In RDBMS data is stored in table format, Tables also known Entities or Objects, Rows are also
known as Records or Tuples, Columns are known as Fields or Attributes.
In SQL each query ends with a semi colon.
To interact with RDBMS we need to use SQL Commands.
SQL Commands
In SQL we have five commands, they are : DDL, DML, DQL, DCL, TCL.
DDL:
It stands for Data Defnition Language.
It is used to define structure of a table.
Create, Drop, Truncate, Alter, Rename arse DDL commands.
DML
It stands for Data Manipulation.
These commands are used to modify the data in tables.
Insert, Update, Delete are DML commands.
DQL
It stands for Data Query Language.
It is used specifically for querying data from the database.
It includes only one Command ‘Select’.
DCL
It stands for Data Control Language.
These commands are used to control access, permissions and privileges on database objects.
Grant, Revoke are DCL commands.
TCL
It stands for Transaction Control Language.
These commands are used to manage transactions properly.
Commit, Rollback, Savepoint are TCL commands.
Operators
Arithmetic Operators: +, -, *, /, %
Logical Operators: and, or, not
Comparision Operators: >, >=, <, <=, =, != (or) <>
Some of the remaining operators are: is, in, between, any, all, like, exists etc…
AGGREGATE FUNCTIONS
-> These are also known as Multi Row Functions.
-> These functions are used to perform operations on multiple records and gives a single result.
-> The multiple records can be either all the records in a table or all the records in a specific category
defined by ‘Group By’ clause.
-> We cannot use aggregate functions along with single column when we didn’t specify group by clause
-> The following are the aggregate functions -
1) sum(), 2) max(), 3) min(), 4) avg(), 5) count()
1) SUM()
-> It is used to do summation of all the values in a specific column or in a specific category.
Syntax: SELECT SUM(col name) FROM <Table name>;
Ex: SELECT SUM(marks) FROM Students;
2) MAX()
-> It is used to find the maximum value of a column or in a category.
Ex: SELECT MAX(col name) FROM <Table name>;
3) MIN()
-> It is used to find the minimum value of a column or in a category.
Ex: SELECT MIN(col name) FROM <Table name>;
4) AVG()
-> It is used to find the average of all the values in a column or in a category.
Ex: SELECT AVG(col name) FROM <Table name>;
-> It doesn’t include null value by default, to include that record value with another value we can use
‘Coalesce’ Function
SELECT AVG(COALESCE(col name, value)) FROM <Table name>;
Ex: SELECT AVG(COALESCE(marks,0)) AS Average FROM Students;
COALESCE()
-> It takes/displays user given values instead of null values in a column.
-> It can take any number of parameters and returns the first not null value out of all parameters.
Ex: SELECT COALESCE(marks, ’ab’) AS marks FROM Students;
SELECT COALESCE(Location, null, ’Hyd’, null) FROM Students;
5) COUNT()
-> It is used to find the no of records in table or in a category.
-> We can specify either asterisk (*) or column name w.r.t this function.
-> If we mention column name w.r.t this function then it doesn’t count the record that has null value.
Syntax: SELECT COUNT(*) FROM <Table name>;
Ex: SELECT COUNT(*) FROM Students; // retrieves total number of records in table
SELECT COUNT(marks) FROM Students;
SELECT COUNT(DISTINCT branch) FROM Students; //counts duplicate values only once
GROUP BY
-> This clause is used to separate the data into different categories based on column data and we can
perform grouping operations on that categories.
-> We usually use this clause with aggregate functions to summarize data based on grouping criteria.
-> The columns other than the aggregate functions that are mentioned with respect to ‘SELECT’ must be
mentioned with respect to GROUP BY clause.
Syntax: SELECT col1, AGGREGATE_FUNCTION(col2) FROM <Table name> Group by col1;
Ex: SELECT branch, max(marks) as HIGHEST_MARKS from students
GROUP BY branch;
-> To specify the condition on the aggregated data, we use the ‘HAVING’ clause.
Ex: SELECT branch, AVG(marks) AS average_marks FROM students
GROUP BY branch HAVING AVG(marks) > 20;
Note:
‘Where’ Clause is followed by ‘From’ keyword, ‘Having’ Clause is followed by ‘Group By’ Clause.
‘Where’ Clause is used to filter data row by row, ‘Having’ Clause filters the groups.
Aggregate functions can be used with ‘Having’ Clause but not with ‘Where’ Clause.
Flow Of Execution:
5)Select Col_Name
1)From Table_Name
2)Where Conditions
3)Group By Col_Name
4)Having Conditions
6)Order By Col_Name Asc/Desc
7)Limit value1 Offset value2;
WINDOW FUNCTIONS
-> These functions perform operations based on total records and give separate values for each and
every record.
-> Here we can give values by considering total table or separate category by using ‘partition by’ clause
and we can give values in either ascending or descending order by using ‘order by’ clause
-> We need to specify these in ‘over’ clause
-> The following are the window functions -
1) AGGREGATE FUNCTIONS, 2) RANKING FUNCTIONS, 3) VALUE FUNCTIONS
RANKING FUNCTIONS -
-> These functions give the value by comparing the current record observation with all the observations
in the set of records. The following are the ranking functions -
a) rank(), b) dense_rank(), c) row_number(), d) percent_rank(), e) ntile(int)
a) Rank()
-> This function gives the same value for similar observations and the next observation will get the value
based on the number of equal ties count.
Syntax: SELECT *, RANK() OVER(PARTITION BY col1 ORDER BY col2) FROM <Table name>;
Ex: SELECT *, RANK() OVER(ORDER BY marks DESC) AS r FROM Students;
SELECT *, RANK() OVER(PARTITION BY branch ORDER BY marks DESC) AS r FROM Students;
b) Dense_rank()
-> This function will give the same value for similar observations and next observation will get the next
consecutive value.
Syntax: SELECT *, DENSE_RANK() OVER(PARTITION BY col1 ORDER BY col2) FROM <Table name>;
Ex: SELECT *, DENSE_RANK() OVER(ORDER BY marks DESC) AS dr FROM Students;
SELECT *, DENSE_RANK() OVER(PARTITION BY branch ORDER BY marks DESC) AS dr FROM Students;
c) Row_number()
-> This function is used to assign unique values to all the observations whether they are similar or
different observations.
Syntax: SELECT *, ROW_NUMBER() OVER(PARTITION BY col1 ORDER BY col2) FROM <Table name>;
Ex: SELECT *, ROW_NUMBER() OVER(order by marks DESC) as rn from students;
SELECT *, ROW_NUMBER() OVER(PARTITION BY branch order by marks DESC) as rn from students;
d) Percent_rank() -
-> It is used when we need to assign comparision values for every observation with respect to the
highest value or the least value.
-> The value ranges from 0 to 1.
Syntax: SELECT *,PERCENT_RANK() OVER(PARTITION BY col1 ORDER BY col2) FROM <Table name>;
Ex: SELECT *, PERCENT_RANK() OVER(ORDER BY marks) AS pr FROM Students;
SELECT *, PERCENT_RANK() OVER(PARTITION BY branch ORDER BY marks) AS pr FROM Students;
e) Ntile(int) -
-> This function will divide the data into specific number of parts based on user given value.
Syntax: SELECT *, NTILE(value) OVER(PARTITION BY col1 ORDER BY col2) FROM <Table name>;
Ex: SELECT *, NTILE(3) OVER(ORDER BY marks) AS n FROM Students;
SELECT *, NTILE(2) OVER(PARTITION BY branch ORDER BY marks) AS n FROM Students;
CONSTRAINTS
-> By using these we can enter only valid data into the table.
-> These are used to keep restriction to the data given by the user.
-> These are used at column level.
The following are the constraints in MYSql
1) UNIQUE, 2) NOT NULL, 3) DEFAULT, 4) CHECK, 5) PRIMARY KEY and
6) FOREIGN KEY
1) UNIQUE
-> By using this constraint we can enter only distinct values. It doesn't accept the duplicate values.
-> In MySQL, PostgreSQL the UNIQUE constraint allows the null values for any number of times.
Syntax:
CREATE TABLE <Table_Name> (col1 datatype UNIQUE, col2 datatype, col3 datatype);
Ex: CREATE TABLE Students(id int UNIQUE, sname varchar(255), branch varchar(255));
-> To add the ‘UNIQUE’ constraint to a column after the table creation -
Syntax: ALTER TABLE Students ADD UNIQUE(branch); (or)
ALTER TABLE Students MODIFY COLUMN branch varchar(255) UNIQUE;
2) NOT NULL
-> When using this constraint, we can make a field as mandatory.
-> The user can’t enter details or a record into a table unless if they provide data for this constraint field.
-> It doesn't accept the null values.
Syntax:
CREATE TABLE <Table_Name> (col1 datatype NOT NULL, col2 datatype, col3 datatype);
Ex: CREATE TABLE Students(id int UNIQUE, sname varchar(255) NOT NULL, branch varchar(255));
-> To add the ‘NOT NULL’ constraint to a column after the table creation -
Syntax: ALTER TABLE Students MODIFY COLUMN branch varchar(255) NOT NULL;
3) DEFAULT
-> This is used to replace the by default ‘NULL’ value with user provided value as the default value.
-> If user doesn’t enter value for that field then this default value will be inserted automatically.
Syntax:
CREATE TABLE <Table_Name> (col1 datatype DEFAULT(value), col2 datatype, col3 datatype);
Ex: CREATE TABLE Students(id int UNIQUE, sname varchar(255) NOT NULL, branch varchar(255),
marks int, age int DEFAULT(18));
-> To add the ‘DEFAULT’ constraint to a column after the table creation -
Syntax: ALTER TABLE Students MODIFY COLUMN marks int DEFAULT(0);
4) CHECK
-> By using this constraint we can specify the conditions on column data.
-> We can enter the data into this column in the table only if it satisfies the condition.
Syntax:
CREATE TABLE <Table_Name> (col1 datatype , col2 datatype, col3 datatype, CHECK(condition));
Ex: CREATE TABLE Students(id int UNIQUE, sname varchar(255) NOT NULL, branch varchar(255),
age int DEFAULT(18), CHECK(age>=14)); // input data should be greater or equal to 14 else the
data won’t be entered into the table.
-> To add the ‘CHECK’ constraint to a column after the table creation -
Syntax: ALTER TABLE Students MODIFY COLUMN id int CHECK(id>10);
5) PRIMARY KEY
-> It is one of the constraints in SQL.
-> It is a combination of UNIQUE and NOT NULL constraints.
-> It doesn’t allow duplicate values and doesn’t allow the null values.
-> A table can have only one primary key.
-> Whenever we use this constraint, it automatically creates the clustered index and sorts the data in
ascending order.
Syntax:
CREATE TABLE <Table_Name> (col1 datatype PRIMARY KEY, col2 datatype, col3 datatype);
Ex: CREATE TABLE Students(id int PRIMARY KEY, sname varchar(255) NOT NULL, branch
varchar(255), age int DEFAULT(18));
-> If one column cannot satisfy the primary key property, then we can combine two or more columns to
form a primary key, which is known as composite primary key.
Composite Primary Key Syntax:
CREATE TABLE Students(id int, sname varchar(255) NOT NULL, branch varchar(255), age int
DEFAULT(18), PRIMARY KEY( id, branch ));
->To add the PRIMARY KEY constraint after table creation
Syntax:
ALTER TABLE <table_name> ADD PRIMARY KEY(col_name);
->To remove PRIMARY KEY constraint
ALTER TABLE DROP PRIMARY KEY;
6) FOREIGN KEY
-> It acts as the reference for the primary key of another table.
-> A table can have multiple foreign keys.
-> Foreign key column must contain the values that are present in primary key column that it refers.
-> A foreign key column can have duplicate values and null values also.
-> Foreign key column name & primary key column name may or may not be same.
-> If two or more columns are combined to form a foreign key, then it is known as COMPOSITE FOREIGN
KEY.
Syntax:
CREATE TABLE <Table_Name> (col1 datatype, col2 datatype, col3 datatype,
FOREIGN KEY(col2) REFERENCES <PRIMARY KEY Table_Name> (PRIMARY KEY column name));
Ex: CREATE TABLE branches(branch_ id int PRIMARY KEY, branch_name varchar(15) NOT NULL,
seats int);
CREATE TABLE Students(sid int PRIMARY KEY, sname varchar(15) NOT NULL, location varchar(5),
age int, FOREIGN KEY(branch) REFERENCES branches (branch_id));
//relation between branches table and students table by referencing the branch id from the branches
table to branch_id column in the students table using the foreign key constraint
-> To add a foreign key to a column after the table creation -
Syntax –
ALTER table table_name ADD FOREIGN KEY(column_name) references <primary_key tale
name> (primary key column name);
-> To remove a foreign key to a column after the table creation -
Syntax - ALTER table table_name DROP FOREIGN KEY CONSTRAINT_NAME;
AUTO INCREMENT -
-> It is a property of a column that is used to assign the values by incrementing one to the previous
record value.
-> It cannot be used individually. It is used with either primary key or unique constraint.
-> It can be used only with INTEGER data types.
Ex:
CREATE table branches(branch_ id int PRIMARY KEY AUTO INCREMENT, branch_name
varchar(15) NOT NULL, seats int);
JOINS
-> By using these, we can retrieve the data from two or more tables based on the condition. The
following are the joins in MySql.
1) Cross Join, 2) Natural Join, 3) Inner Join, 4) Right Join, 5) Left Join, 6) Self Join
1) Cross Join
-> It retrieves the data from two or more tables without specifying any condition.
-> But the condition will be built automatically such that every record of one table comes with every
record of another table (result of Cartesian product).
Syntax: SELECT * FROM table1 AS t1 CROSS JOIN table2 AS t2;
2) Natural Join
-> It retrieves the data from two or more tables without specifying any condition.
-> But the condition will be built automatically such that -
i) If there are similar columns in the tables, then those columns will be merged and only the common
values in that column will be displayed,
ii) If there are no similar columns, it gives the result of Cartesian product.
Syntax: SELECT * FROM table1 AS t1 NATURAL JOIN table2 AS t2;
3) Inner Join -
-> It is used to retrieve the data from two or more tables that satisfies the given condition.
-> To specify the condition, we need to use either 'ON' or 'USING' keyword.
-> USING keyword is preferred if we have the same column names on which the condition is built.
Syntax: SELECT * FROM table1 AS t1 INNER JOIN table2 AS t2 ON Condition;
Equi Join
-> It is used to display the similar data like inner join but without using join keyword.
Syntax: SELECT * FROM table1 AS t1, table2 AS t2 WHERE Condition;
4) Left Join
-> It retrieves the data completely from the left table and the data from the right table that satisfies the
given condition.
-> The values will be replaced with null value that comes from the right table which doesn't satisfy the
given conditions.
Syntax: SELECT * FROM table1 AS t1 LEFT JOIN table2 AS t2 ON Condition;
5) Right Join
-> It retrieves the data completely from the right table and the data from the left table that satisfies the
given condition.
-> The values will be replaced with null value that comes from the left table which doesn't satisfy the
given conditions.
Syntax: SELECT * FROM table1 AS t1 RIGHT JOIN table2 AS t2 ON Condition;
6) Self Join
-> It is used when we build the condition on the same table itself.
Syntax: SELECT * FROM table1 AS t1 JOIN table1 AS t2 ON Condition;
Normalization
-> It is a database design technique used to reduce redundancy and improves efficiency of the database
by reducing anomalies.
-> It helps in making the database more efficient, flexible, and easy to maintain.
-> The goal of normalization is to divide large tables into smaller by maintaining well-defined
relationships between them.
-> It is achieved by following the normal forms
First Normal Form (1NF):
A table is in 1NF, if it follows the below conditions
-> Every cell must contain a single value.
-> Each row is unique( table shouldn’t contain duplicate rows ).
Second Normal Form (2NF):
A table is in 2NF, if it follows the below conditions
-> It must follow 1NF.
-> There should not be partial dependency in the table ( Partial Dependency means a non-key attribute
in a table not depending on entire primary key ).
Third Normal Form (3NF):
A table is in 3NF, if it follows the below conditions
-> It must follow 2NF.
-> There should not be transitive dependency in the table ( Transitive Dependency means a non-key
attribute in a table depending on another non-key attribute ).
BCNF:
-> It stands for Boyce-Codd Normal Form.
-> A table is in BCNF, if it follows the below conditions
-> It must follow 3NF.
-> It is a strict version of 3NF.
-> The attribute that decides the value of another attribute must be a Candidate key.
Fourth Normal Form (4NF):
A table is in 4NF, if it follows the below conditions
-> It must follow BCNF.
-> There should not be any multi – valued dependencies in the table. ( multi – valued dependencies
means different attributes depending on Primary Key but these attributes doesn’t have any relation
between them )
Procedures
-> They are also known as Stored Procedures.
-> They are stored in a database.
-> It is a collection of SQL queries grouped together to perform a specific task.
-> It can be used at any part in an application by encapsulating logic within the procedure, this reduces
writing complex queries multiple times.
-> The keywords create, begin, end, delimiter are used to define a procedure.
Syntax
1) Procedure without parameters
Delimiter ## -- can use any symbol to change query ending symbol
Create procedure p1()
Begin
-- logic --
End ## -- Should use the symbol i.e. mentioned w.r.t Delimiter keyword
Delimiter ; -- To change query ending symbol to semi colon
2) Procedure with ‘in’ parameters
Delimiter $$
Create procedure p2(in variable_name datatype)
Begin
-- logic --
End $$
Delimiter ;
3) Procedure with ‘out’ parameters
Delimiter %%
Create procedure p3(out variable_name datatype)
Begin
-- logic --
End %%
Delimiter ;
4) Procedure with ‘inout’ parameters
Delimiter &&
Create procedure p4(inout variable_name datatype)
Begin
-- logic --
End &&
Delimiter ;
Note:
To permanently delete a procedure from database
Drop procedure procedure_name;
Triggers
->It is a specialized stored procedure that is automatically invoked when an event occurs in a database.
-> The event can be any changes that are made on the data, as it involves data modification triggers are
usually defined on DML Commands(insert, update, delete).
-> It is invoked either before or after the event happens.
-> The keywords create, before, after, begin, end, delimiter are used.
Syntax
Delimiter ^^
Create trigger t1
[After|Before] [Insert|Update|Delete] on table_name
For each row
Begin
-- logic --
End ^^
Delimiter ;
Note:
To permanently delete a trigger from database
Drop trigger procedure_name;