0% found this document useful (0 votes)
6 views50 pages

MySQL Database Operations Guide

The document is a comprehensive guide to MySQL, covering essential topics such as data types, SQL commands, CRUD operations, and database normalization. It includes detailed explanations of various SQL statements like CREATE, DROP, and JOIN, as well as comparisons between different SQL databases like SQLite and PostgreSQL. Additionally, it addresses performance considerations and best practices for structuring and managing databases.

Uploaded by

unluvkyguy.21
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views50 pages

MySQL Database Operations Guide

The document is a comprehensive guide to MySQL, covering essential topics such as data types, SQL commands, CRUD operations, and database normalization. It includes detailed explanations of various SQL statements like CREATE, DROP, and JOIN, as well as comparisons between different SQL databases like SQLite and PostgreSQL. Additionally, it addresses performance considerations and best practices for structuring and managing databases.

Uploaded by

unluvkyguy.21
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Table of Contents

Introduction..........................................................6
Data Types in MySQL................................................7
CREATE TABLE....................................................7
DROP TABLE........................................................7
RENAME TABLE....................................................8
INSERT INTO.......................................................8
SELECT..............................................................8
SELECT DISTINCT.................................................8
ALTER TABLE......................................................9
1. Add a column to a table using ALTER TABLE with ADD................9
2. Modify a column using ALTER TABLE with MODIFY..................9
3. Rename columns using ALTER TABLE with CHANGE COLUMN. . .9
4. Drop a column using ALTER TABLE with DROP COLUMN...........9

ORDER BY in MySQL.............................................10
Aliases in MySQL...................................................10
Column Alias........................................................10
Table Alias..........................................................10
WHERE clause in MySQL.........................................10
IN Operator in MySQL.............................................11
LIKE Operator in MySQL.........................................11
IS NULL Operator in MySQL......................................11
JOIN................................................................12
INNER JOIN......................................................................... 12
LEFT JOIN........................................................................... 12
RIGHT JOIN......................................................................... 12
CROSS JOIN.........................................................................13

GROUP BY.........................................................13
HAVING............................................................13
Comparing SQL Databases.........................................14
SQLite vs PostgreSQL.............................................................14

1
Chapter 2: SQL Tables..............................................14
How to use the CREATE TABLE statement...................................14
How to Alter Tables.................................................................15
How to use ALTER TABLE....................................................................15
Intro to Migrations..................................................................16
Example of a bad migration.......................................................16
SQL Data Types..................................................................... 16
SQLite Data Types..................................................................16
For example:.....................................................................................16
Boolean values...................................................................................17

Chapter 3: Constraints..............................................17
NOT NULL constraint..............................................................17
SQLite limitation.................................................................... 17
Primary Key Constraints...........................................................18
Your primary key will almost always be the "id" column................................18
Foreign Key Constraints...........................................................18
Creating a Foreign Key in SQLite................................................18
Schema................................................................................ 19
There is no perfect way to architect a database schema.................................19
How do we decide on a sane schema architecture?......................................19

Chapter 4: CRUD Operations in SQL..............................19


What is CRUD?......................................................................19
HTTP and CRUD....................................................................19
SQL Insert Statement...............................................................20
HTTP CRUD Database lifecycle.................................................20
Manual Entry.........................................................................20
SQL Injection........................................................................ 21
Count...................................................................................21
HTTP CRUD database lifecycle..................................................21

2
WHERE clause.......................................................................22
Using a WHERE clause............................................................22
Finding NULL values...............................................................23
IS NULL.............................................................................. 23
IS NOT NULL.......................................................................23
DELETE.............................................................................. 23
DELETE statement..................................................................23
The danger of deleting data...................................................................23
Strategy 2 - Soft deletes........................................................................24
Update query in SQL...............................................................24
Update statement................................................................................24
Object-Relational Mapping (ORMs).............................................24
Example: Using an ORM......................................................................25
Example: Using straight SQL.................................................................25
Should you use an ORM?..........................................................25

Chapter 5: Basic SQL Queries......................................26


How to use the AS Clause in SQL...............................................26
SQL Functions....................................................................... 26
IIF function.......................................................................................26
How to Use BETWEEN with WHERE.........................................27
How to return distinct values......................................................27
Logical Operators....................................................................27
Equality operators...................................................................28
The IN operator......................................................................29
The LIKE keyword..................................................................29
Underscore Operator................................................................30

Chapter 6: How to Structure Return Data in SQL..................30


The LIMIT keyword................................................................30
The SQL ORDER BY keyword..................................................31

3
Examples..........................................................................................31
Order By and Limit.................................................................31

Chapter 7: How to Perform Aggregations in SQL..................31


Why use aggregations?.............................................................31
The SUM function...................................................................32
The MAX function..................................................................32
A note on schema....................................................................32
The MIN function................................................................... 33
The GROUP BY clause............................................................33
Example of GROUP BY........................................................................33
The AVG() function................................................................33
The HAVING clause................................................................34
HAVING vs WHERE in SQL....................................................34
The ROUND function..............................................................34

Chapter 8: SQL Subqueries.........................................35


Subqueries............................................................................ 35
How to retreive data from multiple tables......................................35
Subquery syntax..................................................................... 35
No tables necessary.................................................................36

Chapter 9: Database Normalization.................................36


Table Relationships................................................................. 36
Types of Relationships.............................................................36
One-to-one........................................................................................37
One to many......................................................................................37
Many to many....................................................................................37
Joining tables.........................................................................38
Unique constraints across 2 fields...........................................................38
Database normalization.............................................................38
What is data integrity?..............................................................38

4
What is data redundancy?..........................................................38
Normal Forms........................................................................39
1st Normal Form (1NF)........................................................................40
2nd Normal Form (2NF).......................................................................41
3rd Normal Form (3NF).......................................................................42
Boyce-Codd Normal Form (BCNF)..........................................................43
Normalization Review..............................................................44

Chapter 10: How to Join Tables in SQL.............................44


INNER JOIN......................................................................... 44
The ON clause....................................................................... 45
Namespacing on Tables............................................................45
LEFT JOIN........................................................................... 46
RIGHT JOIN......................................................................... 46
SQLite Restriction...................................................................47
FULL JOIN...........................................................................47
SQLite................................................................................. 48

Chapter 11: Database Performance.................................48


SQL Indexes..........................................................................48
CREATE INDEX....................................................................48
Index Review.........................................................................49
Multi-column indexes..............................................................49
Rule of thumb........................................................................ 49
Denormalizing for speed...........................................................49
SQL Injection........................................................................ 50
How do we protect against SQL injection?.....................................50

5
Module 15: MYSQL Essentials

Introduction
MySQL is one of the most popular open-source Relational Database Management Systems
(RDBMS).

6
A MySQL client might hold multiple databases. Each database might contain multiple tables
with each table holding data of the same type. Each table contains rows and columns with
each row denoting a single entry and each column denoting different attributes of the entries.

Data Types in MySQL


MySQL supports a list of predefined data types that we can use to effectively model our
tables. These data types are:

INT: for integer data.

DECIMAL: for decimal data.

BOOLEAN: for boolean data.

CHAR: for fixed-length string.

VARCHAR: for variable-length string.

TEXT: for long-form text.

DATE: for date data.

TIME: for time data.

DATETIME: for date-time data.

TIMESTAMP: for timestamp data.

MySQL Commands

CREATE TABLE
The CREATE TABLE statement is used in MySQL to create a new table in a database. The
syntax for this is shown below:

Syntax:

CREATE TABLE [IF NOT EXISTS] table_name(


column1_definition,
column2_definition,
...,
table_constraints
);

DROP TABLE
The DROP TABLE statement is used in MySQL to drop or delete a table from the database.
The syntax for this is shown below:

7
Syntax:

DROP [TEMPORARY] TABLE [IF EXISTS] table_name;

RENAME TABLE
The RENAME TABLE statement is used in MySQL to rename the existing tables. One or
more tables can be renamed using this statement.

Syntax:

RENAME TABLE old_table_name to new_table_name;

INSERT INTO
The INSERT INTO statement is used in MySQL to insert rows into a table. One or more
rows can be inserted into a table using this statement.

Syntax:

INSERT INTO table_name (column1, column2, ….)


VALUES (value1, value2, ....),
(value1, value2, ...),
(value1, value2, ...),
....................;

SELECT
The SELECT statement is used for querying data. It allows you to select data from one or
more tables.
Syntax:

SELECT column1, column 2,....


FROM table_name;

SELECT DISTINCT
The SELECT DISTINCT statement is used in MySQL to remove duplicate rows.

Syntax:

SELECT DISTINCT column1, column2,...


FROM table_name;

8
ALTER TABLE
The statement ALTER TABLE can be used in MySQL to add a column, modify a column,
drop a column, rename a column from a table.

1. Add a column to a table using ALTER TABLE with ADD


The syntax for adding a column to a table is shown below:

Syntax:

ALTER TABLE table_name


ADD
new_column_name column_definition
[FIRST|AFTER column_name]

2. Modify a column using ALTER TABLE with MODIFY


We can modify one or multiple columns of a table using MODIFY with ALTER TABLE
statement.

The syntax for this is shown below:

ALTER TABLE table_name


MODIFY
column_name column_definition
[FIRST|AFTER column_name]

3. Rename columns using ALTER TABLE with CHANGE


COLUMN
We can rename a column of a table using the CHANGE COLUMN keyword with ALTER
TABLE.

The syntax for this is shown below:

ALTER TABLE table_name


CHANGE COLUMN original_column_name new_column_name column_definition
[FIRST | AFTER column_name]

4. Drop a column using ALTER TABLE with DROP COLUMN


We can drop a column or multiple columns using DROP COLUMN with ALTER TABLE.

The syntax for this is shown below:

ALTER TABLE table_name


DROP COLUMN column_name;

9
ORDER BY in MySQL
The ORDER BY clause is used in MySQL to sort the retrieved data in a particular order.

Syntax:

SELECT column1, column2,...


FROM table_name
ORDER BY Column1 ASC/DESC, Column2 ASC/DESC,... ;

Aliases in MySQL
Aliases are used to give columns or tables a temporary or simple name. AS keyword is used
to create an alias.

Column Alias
The syntax for column name aliases is written below.

Syntax:

SELECT column_name AS given_name


FROM table_name;

Table Alias
The aliases can be used to give simple and different names to tables also.

Syntax:

SELECT column1, column2, ….


FROM table_name AS given_name;

WHERE clause in MySQL


The WHERE clause is used to apply a particular condition while selecting rows from the
table. It helps in filtering the rows according to any particular condition.

Syntax:

SELECT column1, column2, …..


FROM table_name
WHERE condition;

10
IN Operator in MySQL
The IN operator is used to check if a value matches any of the values in a list of values. It is
similar to the OR operator as if any of the values in the list matches it returns true.

Syntax:

SELECT column1, column2, …..


FROM table_name
WHERE column_name IN (value1, value2, ….);

or

SELECT column1, column2, …..


FROM table_name
WHERE column_name IN (SELECT statement );

LIKE Operator in MySQL


The LIKE operator is used in MySQL to search for a specific pattern in a string. If an
expression matches the pattern, it returns true else false.

There are two wildcards in MySQL, used with the LIKE operator for searching a pattern.

The percentage sign (%). It represents zero or more characters.

The underscore sign (_). It represents a single character.

Syntax:

SELECT column1, column2, …..


FROM table_name
WHERE column_name LIKE pattern;

For example:

SELECT customer_id, customer_name, product_id


FROM customers
WHERE customer_name LIKE 'A'%;

IS NULL Operator in MySQL


The IS NULL is used to check if a value is NULL or not. If the value is NULL, it returns true
else false.

Syntax:

SELECT column1, column2, …..


FROM table_name
WHERE column_name IS NULL;

11
JOIN
Joins are used in relational databases to combine data from multiple tables based on a
common column between them. A foreign key may be used to reference a row in another
table and join can be done based on those columns. Two or more tables may have some
related data, and to combine all the data from multiple tables joins are used.

There are different types of joins in MySQL.

1. INNER JOIN

2. LEFT JOIN

3. RIGHT JOIN

4. CROSS JOIN

INNER JOIN
The INNER JOIN produces the output by combining those rows which have matching
column values.

Syntax:

SELECT column_names
FROM table1
INNER JOIN table2 ON table1.common_column=table2.common_column
INNER JOIN table3 ON table1.common_column=table3.common_column
...;

LEFT JOIN
The LEFT JOIN returns all the rows from the left table ‘A’ and the matching rows from the
right table ‘B’ in the join. The rows from the left table, which have no matching values in the
right table will be returned with a NULL value in the link column.

Syntax:

SELECT column_names
FROM table1
LEFT JOIN table2 ON table1.common_column=table2.common_column;

RIGHT JOIN
The RIGHT JOIN returns all the rows from the right table ‘B’ and the matching rows from
the left table ‘A’ in the join. The rows from the right table, which have no matching values in
the left table will be returned with a NULL value in the link column.

Syntax:

12
SELECT Column_names
FROM table1
RIGHT JOIN table2 ON table1.common_column=table2.common_column;

CROSS JOIN
CROSS JOIN returns the cartesian product of rows from the tables in the join. It combines
each row of the first table with each row of the second table. If there are X rows in the first
table and Y rows in the second table then the number of rows in the joined table will be
X*Y.

Syntax:

SELECT column_names
FROM table1
CROSS JOIN table2;

GROUP BY
The GROUP BY clause is used to arrange the rows in a group using a particular column
value. If there are multiple rows with the same value for a column then all those rows will be
grouped with that column value.

Syntax:

SELECT column1,column2,…
FROM table_name
WHERE condition
GROUP BY column1,column2, …
Order BY column1, column2, ….
The GROUP BY clause is generally used with aggregate functions like SUM, AVG,
COUNT, MAX, MIN. The aggregate functions provide information about each group.

HAVING
The HAVING clause is used with the GROUP BY clause in a query to specify come
conditions and filter some groups or aggregates that fulfill those conditions. The difference
between WHERE and HAVING is, WHERE is used to filter rows, and HAVING is used to
filter groups by applying some conditions.

Syntax:

SELECT column1, column2, …


FROM table_name
WHERE conditions
GROUP BY column1, column2, …..
HAVING conditions
13
Comparing SQL Databases
Let's dive deeper and talk about some of the popular SQL Databases and what makes them
different from one another. Some of the most popular SQL Databases right now are:

 PostgreSQL

 MySQL

 Microsoft SQL Server

 SQLite

 And many others

Source: [Link]

While all of these Databases use SQL, each database defines specific rules, practices, and
strategies that separate them from their competitors.

SQLite vs PostgreSQL
Personally, SQLite and PostgreSQL are my favorites from the list above. Postgres is a very
powerful, open-source, production-ready SQL database. SQLite is a lightweight, embeddable,
open-source database. I usually choose one of these technologies if I'm doing SQL work.

SQLite is a serverless database management system (DBMS) that has the ability to run within
applications, whereas PostgreSQL uses a Client-Server model and requires a server to be
installed and listening on a network, similar to an HTTP server.

See a full comparison here.


Again, in this course we will be working with SQLite, a lightweight and simple database. For
most backend web servers, PostgreSQL is a more production-ready option, but SQLite is
great for learning and for small systems.

Chapter 2: SQL Tables


The CREATE TABLE statement is used to create a new table in a database.

How to use the CREATE TABLE statement


To create a table, use the CREATE TABLE statement followed by the name of the table and
the fields you want in the table.

CREATE TABLE employees (id INTEGER, name TEXT, age INTEGER, is_manager
BOOLEAN, salary INTEGER);

14
Each field name is followed by its datatype. We'll get to data types in a minute.

It's also acceptable and common to break up the CREATE TABLE statement with some
whitespace like this:

CREATE TABLE employees(


id INTEGER,
name TEXT,
age INTEGER,
is_manager BOOLEAN,
salary INTEGER
);

How to Alter Tables


We often need to alter our database schema without deleting it and re-creating it. Imagine if
Twitter deleted its database each time it needed to add a feature, that would be a disaster!
Your account and all your tweets would be wiped out on a daily basis.

Instead, we can use use the ALTER TABLE statement to make changes in place without
deleting any data.

How to use ALTER TABLE


With SQLite an ALTER TABLE statement allows you to:

1. Rename a table or column, which you can do like this:

ALTER TABLE employees


RENAME TO contractors;

ALTER TABLE contractors


RENAME COLUMN salary TO invoice;

2. ADD or DROP a column, which you can do like this:

ALTER TABLE contractors


ADD COLUMN job_title TEXT;

ALTER TABLE contractors


DROP COLUMN is_manager;

Intro to Migrations
A database migration is a set of changes to a relational database. In fact, the ALTER TABLE
statements we did in the last exercise were examples of migrations.

15
Migrations are helpful when transitioning from one state to another, fixing mistakes, or
adapting a database to changes.

Good migrations are small, incremental and ideally reversible changes to a database. As you
can imagine, when working with large databases, making changes can be scary. We have to
be careful when writing database migrations so that we don't break any systems that depend
on the old database schema.

Example of a bad migration


If a backend server periodically runs a query like SELECT * FROM people, and we execute
a database migration that alters the table name from people to users without updating the
code, the application will break. It will try to grab data from a table that no longer exists.

A simple solution to this problem would be to deploy new code that uses a new query:

SELECT * FROM users;

And we would deploy that code to production immediately following the migration.

SQL Data Types


SQL as a language can support many different data types. But the datatypes that your
database management system (DBMS) supports will vary depending on the specific database
you're using.

SQLite only supports the most basic types, and we're using SQLite in this course.

SQLite Data Types


Let's go over the data types supported by SQLite: and how they are stored.

1. NULL - Null value.

2. INTEGER - A signed integer stored in 0,1,2,3,4,6, or 8 bytes.

3. REAL - Floating point value stored as an 64-bit IEEE floating point number.

4. TEXT - Text string stored using database encoding such as UTF-8

5. BLOB - Short for Binary large object and typically used for images, audio or other
multimedia.

For example:
CREATE TABLE employees (
id INTEGER,
name TEXT,
age INTEGER,
is_manager BOOLEAN,
salary INTEGER

16
);

Boolean values
It's important to note that SQLite does not have a separate BOOLEAN storage class. Instead,
boolean values are stored as integers:

 0 = false

 1 = true

It's not actually all that weird – boolean values are just binary bits after all!

SQLite will still let you write your queries using boolean expressions and true/false
keywords, but it will convert the booleans to integers under-the-hood.

Chapter 3: Constraints
A constraint is a rule we create on a database that enforces some specific behavior. For
example, setting a NOT NULL constraint on a column ensures that the column will not
accept NULL values.

If we try to insert a NULL value into a column with the NOT NULL constraint, the insert
will fail with an error message. Constraints are extremely useful when we need to ensure that
certain kinds of data exist within our database.

NOT NULL constraint


The NOT NULL constraint can be added directly to the CREATE TABLE statement.

CREATE TABLE employees(


id INTEGER PRIMARY KEY,
name TEXT UNIQUE,
title TEXT NOT NULL
);

SQLite limitation
In other dialects of SQL you can ADD CONSTRAINT within an ALTER TABLE statement.
SQLite does not support this feature, so when we create our tables we need to make sure we
specify all the constraints we want.

Here's a list of SQL Features SQLite does not implement in case you're curious.

Primary Key Constraints


A key defines and protects relationships between tables. A primary key is a special column
that uniquely identifies records within a table. Each table can have one, and only one primary
key.
17
Your primary key will almost always be the "id" column
It's very common to have a column named id on each table in a database, and that id is the
primary key for that table. No two rows in that table can share an id.

A PRIMARY KEY constraint can be explicitly specified on a column to ensure uniqueness,


rejecting any inserts where you attempt to create a duplicate ID.

Foreign Key Constraints


Foreign keys are what makes relational databases relational! Foreign keys define the
relationships between tables. Simply put, a FOREIGN KEY is a field in one table that
references another table's PRIMARY KEY.

Creating a Foreign Key in SQLite


Creating a FOREIGN KEY in SQLite happens at table creation! After we define the table
fields and constraints we add an additional CONSTRAINT where we define the FOREIGN
KEY and its REFERENCES.

Here's an example:

CREATE TABLE departments (


id INTEGER PRIMARY KEY,
department_name TEXT NOT NULL
);

CREATE TABLE employees (


id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER,
CONSTRAINT fk_departments
FOREIGN KEY (department_id)
REFERENCES departments(id)
);

In this example, an employee has a department_id. The department_id must be the same as
the id field of a record from the departments table.

Schema
We've used the word schema a few times now, let's talk about what that word means. A
database's schema describes how data is organized within it.

Data types, table names, field names, constraints, and the relationships between all of those
entities are part of a database's schema.

18
There is no perfect way to architect a database schema
When designing a database schema there typically isn't a "correct" solution. We do our best
to choose a sane set of tables, fields, constraints, etc that will accomplish our project's goals.
Like many things in programming, different schema designs come with different tradeoffs.

How do we decide on a sane schema architecture?


One very important decision that needs to be made is to decide which table will store a user's
balance! As you can imagine, ensuring our data is accurate when dealing with money is
super important. We want to be able to:

 Keep track of a user's current balance

 See the historical balance at any point in the past

 See a log of which transactions changed the balance over time

There are many ways to approach this problem. For our first attempt, let's try the simplest
schema that fulfills our project's needs.

Chapter 4: CRUD Operations in SQL


What is CRUD?
CRUD is an acronym that stands for CREATE, READ, UPDATE, and DELETE. These four
operations are the bread and butter of nearly every database you will create.

HTTP and CRUD


The CRUD operations correlate nicely with the HTTP methods you may have already
learned:

 HTTP POST - CREATE

 HTTP GET - READ

 HTTP PUT - UPDATE

 HTTP DELETE - DELETE

SQL Insert Statement


Tables are pretty useless without data in them. In SQL we can add records to a table using an
INSERT INTO statement. When using an INSERT statement we must first specify the table
we are inserting the record into, followed by the fields within that table we want to add
VALUES to.

Here's an example of an INSERT INTO statement:

19
INSERT INTO employees(id, name, title)
VALUES (1, 'Allan', 'Engineer');

HTTP CRUD Database lifecycle


It's important to understand how data flows through a typical web application.

1. The front-end processes some data from user input - maybe a form is submitted.

2. The front-end sends that data to the server through an HTTP request - maybe a
POST.

3. The server makes a SQL query to it's database to create an associated record -
Probably using an INSERT statement.

4. Once the server has processed that the database query was successful, it responds to
the front-end with a status code! Hopefully a 200-level code (success)!

Manual Entry
Manually INSERTing every single record in a database would be an extremely time-
consuming task! Working with raw SQL as we are now is not super common when designing
backend systems.

When working with SQL within a software system, like a backend web application, you'll
typically have access to a programming language such as Go or Python.

For example, a backend server written in Go can use string concatenation to dynamically
create SQL statements, and that's usually how it's done.

sqlQuery := [Link](`
INSERT INTO users(name, age, country_code)
VALUES ('%s', %v, %s);

20
`, [Link], [Link], [Link])

SQL Injection
The example above is an oversimplification of what really happens when you access a
database using Go code. In essence, it's correct. String interpolation is how production
systems access databases. That said, it must be done carefully to not be a security
vulnerability. We'll talk more about that later!

Count
We can use a SELECT statement to get a count of the records within a table. This can be very
useful when we need to know how many records there are, but we don't particularly care
what's in them.

Here's an example in SQLite:

SELECT count(*) from employees;

The * in this case refers to a column name. We don't care about the count of a specific
column - we want to know the number of total records so we can use the wildcard (*).

HTTP CRUD database lifecycle


We talked about how a "create" operation flows through a web application. Let's talk about a
"read".

Let's talk through an example. Our product manager wants to show profile data on a user's
settings page. Here's how we could engineer that feature request:

1. First, the front-end webpage loads.

21
2. The front-end sends an HTTP GET request to a /users endpoint on the back-end
server.

3. The server receives the request.

4. The server uses a SELECT statement to retrieve the user's record from the users table
in the database.

5. The server converts the row of SQL data into a JSON object and sends it back to the
front-end.

WHERE clause
In order to keep learning about CRUD operations in SQL, we need to learn how to make the
instructions we send to the database more specific. SQL accepts a WHERE statement within
a query that allows us to be very specific with our instructions.

If we were unable to specify the specific record we wanted to READ, UPDATE, or DELETE
making queries to a database would be very frustrating, and very inefficient.

Using a WHERE clause


Say we had over 9000 records in our users table. We often want to look at specific user data
within that table without retrieving all the other records in the table. We can use a SELECT
statement followed by a WHERE clause to specify which records to retrieve. The SELECT
statement stays the same, we just add the WHERE clause to the end of the SELECT.

Here's an example:

SELECT name FROM users WHERE power_level >= 9000;

This will select only the name field of any user within the users table WHERE the
power_level field is greater than or equal to 9000.

Finding NULL values


You can use a WHERE clause to filter values by whether or not they're NULL.

IS NULL
SELECT name FROM users WHERE first_name IS NULL;

IS NOT NULL
SELECT name FROM users WHERE first_name IS NOT NULL;

22
DELETE
When a user deletes their account on Twitter, or deletes a comment on a YouTube video, that
data needs to be removed from its respective database.

DELETE statement
A DELETE statement removes a record from a table that match the WHERE clause. As an
example:

DELETE from employees


WHERE id = 251;

This DELETE statement removes all records from the employees table that have an id of
251!

The danger of deleting data


Deleting data can be a dangerous operation. Once removed, data can be really hard if not
impossible to restore! Let's talk about a couple of common ways back-end engineers protect
against losing valuable customer data.

Strategy 1 - Backups
If you're using a cloud-service like GCP's Cloud SQL or AWS's RDS you should always turn
on automated backups. They take an automatic snapshot of your entire database on some
interval, and keep it around for some length of time.

For example, the [Link] database has a backup snapshot taken daily and we retain those
backups for 30 days. If I ever accidentally run a query that deletes valuable data, I can restore
it from the backup.

You should have a backup strategy for production databases.

Strategy 2 - Soft deletes


A "soft delete" is when you don't actually delete data from your database, but instead just
"mark" the data as deleted.

For example, you might set a deleted_at date on the row you want to delete. Then, in your
queries you ignore anything that has a deleted_at date set. The idea is that this allows your
application to behave as if it's deleting data, but you can always go back and restore any data
that's been removed.

You should probably only soft-delete if you have a specific reason to do so. Automated
backups should be "good enough" for most applications that are just interested in protecting
against developer mistakes.

23
Update query in SQL
Whenever you update your profile picture or change your password online, you are changing
the data in a field on a table in a database. Imagine if every time you accidentally messed up a
Tweet on Twitter you had to delete the entire tweet and post a new one instead of just editing
it...

...Well, that's a bad example.

Update statement
The UPDATE statement in SQL allows us to update the fields of a record. We can even
update many records depending on how we write the statement.

An UPDATE statement specifies the table that needs to be updated, followed by the fields
and their new values by using the SET keyword. Lastly a WHERE clause indicates the
record(s) to update.

UPDATE employees
SET job_title = 'Backend Engineer', salary = 150000
WHERE id = 251;

Object-Relational Mapping (ORMs)


An Object-Relational Mapping or an ORM for short, is a tool that allows you to perform
CRUD operations on a database using a traditional programming language. These typically
come in the form of a library or framework that you would use in your backend code.

The primary benefit an ORM provides is that it maps your database records to in-memory
objects. For example, in Go we might have a struct that we use in our code:

type User struct {


ID int
Name string
IsAdmin bool
}

This struct definition conveniently represents a database table called users, and an instance of
the struct represents a row in the table.

Example: Using an ORM


Using an ORM we might be able to write simple code like this:

user := User{
ID: 10,
Name: "Lane",
IsAdmin: false,

24
}

// generates a SQL statement and runs it,


// creating a new record in the users table
[Link](user)

Example: Using straight SQL


Using straight SQL we might have to do something a bit more manual:

user := User{
ID: 10,
Name: "Lane",
IsAdmin: false,
}

[Link]("INSERT INTO users (id, name, is_admin) VALUES (?, ?, ?);",


[Link], [Link], [Link])

Should you use an ORM?


That depends – an ORM typically trades simplicity for control.

Using straight SQL you can take full advantage of the power of the SQL language. Using an
ORM, you're limited by whatever functionality the ORM has.

If you run into issues with a specific query, it can be harder to debug with an ORM because
you have to dig through the framework's code and documentation to figure out how the
underlying queries are being generated.
I recommend doing projects both ways so that you can learn about the tradeoffs. At the end of
the day, when you're working on a team of developers, it will be a team decision.

Chapter 5: Basic SQL Queries


How to use the AS Clause in SQL
Sometimes we need to structure the data we return from our queries in a specific way. An AS
clause allows us to "alias" a piece of data in our query. The alias only exists for the duration
of the query.

AS keyword

The following queries return the same data:

25
SELECT employee_id AS id, employee_name AS name
FROM employees;

and:

SELECT employee_id, employee_name


FROM employees;

The difference is that the results from the aliased query would have column names id and
name instead of employee_id and employee_name.

SQL Functions
At the end of the day, SQL is a programming language, and it's one that supports functions.
We can use functions and aliases to calculate new columns in a query. This is similar to how
you might use formulas in Excel.

IIF function
In SQLite, the IIF function works like a ternary. For example:

IIF(carA > carB, "Car a is bigger", "Car b is bigger")

If a is greater than b, this statement evaluates to the string "Car a is bigger". Otherwise, it
evaluates to "Car b is bigger".

Here's how we can use IIF() and a directive alias to add a new calculated column to our result
set:

SELECT quantity,
IIF(quantity < 10, "Order more", "In Stock") AS directive
from products

How to Use BETWEEN with WHERE


We can check if certain values are between two numbers using the WHERE clause in an
intuitive way. The WHERE clause doesn't always have to be used to specify specific id's or
values. We can also use it to help narrow down our result set. Here's an example:

SELECT employee_name, salary


FROM employees
WHERE salary BETWEEN 30000 and 60000;

This query returns all the employees name and salary fields for any rows where the salary is
BETWEEN 30,000 and 60,000. We can also query results that are NOT BETWEEN two
specified values.

26
SELECT product_name, quantity
FROM products
WHERE quantity NOT BETWEEN 20 and 100;

This query returns all the product names where the quantity was not between 20 and 100. We
can use conditionals to make the results of our query as specific as we need them to be.

How to return distinct values


Sometimes we want to retrieve records from a table without getting back any duplicates.

For example, we may want to know all the different companies our employees have worked
at previously, but we don't want to see the same company multiple times in the report.

SELECT DISTINCT

SQL offers us the DISTINCT keyword that removes duplicate records from the resulting
query.

SELECT DISTINCT previous_company


FROM employees;

This only returns one row for each unique previous_company value.

Logical Operators
We often need to use multiple conditions to retrieve the exact information we want. We can
begin to structure much more complex queries by using multiple conditions together to
narrow down the search results of our query.

The logical AND operator can be used to narrow down our result sets even more.

AND operator

SELECT product_name, quantity, shipment_status


FROM products
WHERE shipment_status = 'pending'
AND quantity BETWEEN 0 and 10;

This only retrieves records where both the shipment_status is "pending" AND the quantity is
between 0 and 10.

Equality operators
All of the following operators are supported in SQL. The = is the main one to watch out for,
it's not == like in many other languages.

 =

27
 <

 >

 <=

 >=

For example, in Python you might compare two values like this:

if name == "age"

Whereas in SQL you would do:

WHERE name = "age"

OR operator

As you've probably guessed, if the logical AND operator is supported, the OR operator is
probably supported as well.

SELECT product_name, quantity, shipment_status


FROM products
WHERE shipment_status = 'out of stock'
OR quantity BETWEEN 10 and 100;

This query retrieves records where either the shipment_status condition OR the quantity
condition are met.

Order of operations matter when using these operators.


You can group logical operations with parentheses to specify the order of operations.

(this AND that) OR the_other

The IN operator
Another variation to the WHERE clause we can utilize is the IN operator. IN returns true or
false if the first operand matches any of the values in the second operand. The IN operator is
a shorthand for multiple OR conditions.

These two queries are equivalent:

SELECT product_name, shipment_status


FROM products
WHERE shipment_status IN ('shipped', 'preparing', 'out of stock');
SELECT product_name, shipment_status
FROM products

28
WHERE shipment_status = 'shipped'
OR shipment_status = 'preparing'
OR shipment_status = 'out of stock';

Hopefully, you're starting to see how querying specific data using fine-tuned SQL clauses
helps reveal important insights. The larger a table becomes the harder it becomes to analyze
without proper queries.

The LIKE keyword


Sometimes we don't have the luxury of knowing exactly what it is we need to query. Have
you ever wanted to look up a song or a video but you only remember part of the name? SQL
provides us an option for when we're in situations LIKE this.

The LIKE keyword allows for the use of the % and _ wildcard operators. Let's focus on %
first.

% Operator

The % operator will match zero or more characters. We can use this operator within our
query string to find more than just exact matches depending on where we place it.

Here are some examples that show how these work:

Product starts with "banana":

SELECT * FROM products


WHERE product_name LIKE 'banana%';

Product ends with "banana":

SELECT * from products


WHERE product_name LIKE '%banana';

Product contains "banana":

SELECT * from products


WHERE product_name LIKE '%banana%';

Underscore Operator
As discussed, the % wildcard operator matches zero or more characters. Meanwhile, the _
wildcard operator only matches a single character.

SELECT * FROM products


WHERE product_name LIKE '_oot';

29
The query above matches products like:

 boot

 root

 foot

SELECT * FROM products


WHERE product_name LIKE '__oot';

The query above matches products like:

 shoot

 groot

Chapter 6: How to Structure Return Data in SQL


The LIMIT keyword
Sometimes we don't want to retrieve every record from a table. For example, it's common for
a production database table to have millions of rows, and SELECTing all of them might crash
your system. This is where the LIMIT keyword enters the chat.

The LIMIT keyword can be used at the end of a select statement to reduce the number of
records returned.

SELECT * FROM products


WHERE product_name LIKE '%berry%'
LIMIT 50;

The query above retrieves all the records from the products table where the name contains the
word berry. If we ran this query on the Facebook database, it would almost certainly return a
lot of records.

The LIMIT statement only allows the database to return up to 50 records matching the query.
This means that if there aren't that many records matching the query, the LIMIT statement
will not have an effect.

The SQL ORDER BY keyword


SQL also offers us the ability to sort the results of a query using ORDER BY. By default, the
ORDER BY keyword sorts records by the given field in ascending order, or ASC for short.
However, ORDER BY does support descending order as well with the keyword DESC.

30
Examples
This query returns the name, price, and quantity fields from the products table sorted by price
in ascending order:

SELECT name, price, quantity FROM products


ORDER BY price;

This query returns the name, price, and quantity of the products ordered by the quantity in
descending order:

SELECT name, price, quantity FROM products


ORDER BY quantity desc;

Order By and Limit


When using both ORDER BY and LIMIT, the ORDER BY clause must come first.

Chapter 7: How to Perform Aggregations in SQL


An "aggregation" is a single value that's derived by combining several other values. We
performed an aggregation earlier when we used the count statement to count the number of
records in a table.

Why use aggregations?


Data stored in a database should generally be stored raw. When we need to calculate some
additional data from the raw data, we can use an aggregation.

Take the following count aggregation as an example:

SELECT COUNT(*)
FROM products
WHERE quantity = 0;

This query returns the number of products that have a quantity of 0. We could store a count
of the products in a separate database table, and increment/decrement it whenever we make
changes to the products table - but that would be redundant.

It's much simpler to store the products in a single place (we call this a single source of truth)
and run an aggregation when we need to derive additional information from the raw data.

The SUM function


The sum aggregation function returns the sum of a set of values.

31
For example, the query below returns a single record containing a single field. The returned
value is equal to the total salary being collected by all of the employees in the employees
table.

SELECT sum(salary)
FROM employees;

Which returns:

SUM(SALARY)

2483

The MAX function


As you may expect, the max function retrieves the largest value from a set of values. For
example:

SELECT max(price)
FROM products

This query looks through all of the prices in the products table and returns the price with the
largest price value. Remember it only returns the price, not the rest of the record. You always
need to specify each field you want a query to return.

A note on schema
 The sender_id will be present for any transactions where the user in question (user_id)
is receiving money (from the sender).

 The recipient_id will be present for any transactions where the user in question
(user_id) is sending money (to the recipient).

In other words, a transaction can only have a sender_id or a recipient_id - not both. The
presence of one or the other indicates whether money is going into or out of the user's
account.

This user_id, recipient_id, sender_id schema we've designed is only one way to design a
transactions database - there are other valid ways to do it. It's the one we're using, and later
we'll talk more about the tradeoffs in different database design options.

The MIN function


The min function works the same as the max function but finds the lowest value instead of
the highest value.

32
SELECT product_name, min(price)
from products;

This query returns the product_name and the price fields of the record with the lowest price.

The GROUP BY clause


There are times we need to group data based on specific values.

SQL offers the GROUP BY clause which can group rows that have similar values into
"summary" rows. It returns one row for each group. The interesting part is that each group
can have an aggregate function applied to it that operates only on the grouped data.

Example of GROUP BY
Imagine that we have a database with songs and albums, and we want to see how many songs
are on each album. We can use a query like this:

SELECT album_id, count(song_id)


FROM songs
GROUP BY album_id;

This query retrieves a count of all the songs on each album. One record is returned per album,
and they each have their own count.

The AVG() function


Just like we may want to find the minimum or maximum values within a dataset, sometimes
we need to know the average!

SQL offers us the AVG() function. Similar to MAX(), AVG() calculates the average of all
non-NULL values.

select song_name, avg(song_length)


from songs

This query returns the average song_length in the songs table.

The HAVING clause


When we need to filter the results of a GROUP BY query even further, we can use the
HAVING clause. The HAVING clause specifies a search condition for a group.

The HAVING clause is similar to the WHERE clause, but it operates on groups after they've
been grouped, rather than rows before they've been grouped.

SELECT album_id, count(id) as count


FROM songs
GROUP BY album_id
33
HAVING count > 5;

This query returns the album_id and count of its songs, but only for albums with more than 5
songs.

HAVING vs WHERE in SQL


It's fairly common for developers to get confused about the difference between the HAVING
and the WHERE clauses - they're pretty similar after all.

The difference is fairly simple in actuality:

 A WHERE condition is applied to all the data in a query before it's grouped by a
GROUP BY clause.

 A HAVING condition is only applied to the grouped rows that are returned after a
GROUP BY is applied.

This means that if you want to filter on the result of an aggregation, you need to use
HAVING. If you want to filter on a value that's present in the raw data, you should use a
simple WHERE clause.

The ROUND function


Sometimes we need to round some numbers, particularly when working with the results of an
aggregation. We can use the ROUND() function to get the job done.

The SQL round() function allows you to specify both the value you wish to round and the
precision to which you wish to round it:

round(value, precision)

If no precision is given, SQL will round the value to the nearest whole value:

select song_name, round(avg(song_length), 1)


from songs

This query returns the average song_length from the songs table, rounded to a single decimal
point.

Chapter 8: SQL Subqueries


Subqueries
Sometimes a single query is not enough to retrieve the specific records we need.

It is possible to run a query on the result set of another query - a query within a query! This is
called "query-ception"... erm... I mean a "subquery".

34
Subqueries can be very useful in a number of situations when trying to retrieve specific data
that wouldn't be accessible by simply querying a single table.

How to retreive data from multiple tables


Here is an example of a subquery:

SELECT id, song_name, artist_id


FROM songs
WHERE artist_id IN (
SELECT id
FROM artists
WHERE artist_name LIKE 'Rick%'
);

In this hypothetical database, the query above selects all of the song_ids, song_names, and
artist_ids from the songs table that are written by artists whose name starts with "Rick".
Notice that the subquery allows us to use information from a different table - in this case the
artists table.

Subquery syntax
The only syntax unique to a subquery is the parentheses surrounding the nested query. The IN
operator could be different, for example, we could use the = operator if we expect a single
value to be returned.

Here's an example:

SELECT id, song_name, artist_id


FROM songs
WHERE artist_id IN (
SELECT id
FROM artists
WHERE artist_name LIKE 'Rick%'
);

No tables necessary
When working on a back-end application, this doesn't come up often, but it's important to
remember that SQL is a full programming language. We usually use it to interact with data
stored in tables, but it's quite flexible and powerful.

For example, you can SELECT information that's simply calculated, with no tables
necessary.

SELECT 5 + 10 as sum;

35
Chapter 9: Database Normalization
Table Relationships
Relational databases are powerful because of the relationships between the tables. These
relationships help us to keep our databases clean and efficient.

A relationship between tables assumes that one of these tables has a foreign key that
references the primary key of another table.

@youtube

Types of Relationships
There are 3 primary types of relationships in a relational database:

1. One-to-one

2. One-to-many

3. Many-to-many

One-to-one
A one-to-one relationship most often manifests as a field or set of fields on a row in a table.
For example, a user will have exactly one password.

Settings fields might be another example of a one-to-one relationship. A user will have
exactly one email_preference and exactly one birthday.

One to many
When talking about the relationships between tables, a one-to-many relationship is probably
the most commonly used relationship.

36
A one-to-many relationship occurs when a single record in one table is related to potentially
many records in another table.

Note that the one->many relation only goes one way, a record in the second table can not be
related to multiple records in the first table!

Examples of one-to-many relationships


 A customers table and a orders table. Each customer has 0, 1, or many orders that
they've placed.

 A users table and a transactions table. Each user has 0, 1, or many transactions that
taken part in.

Many to many
A many-to-many relationship occurs when multiple records in one table can be related to
multiple records in another table.

Examples of many-to-many relationships


 A products table and a suppliers table - Products may have 0 to many suppliers, and
suppliers can supply 0 to many products.

 A classes table and a students table - Students can take potentially many classes and
classes can have many students enrolled.

Joining tables
Joining tables helps define many-to-many relationships between data in a database. As an
example, when defining the relationship above between products and suppliers, we would
define a joining table called products_suppliers that contains the primary keys from the tables
to be joined.
Then, when we want to see if a supplier supplies a specific product, we can look in the
joining table to see if the ids share a row.

Unique constraints across 2 fields


When enforcing specific schema constraints we may need to enforce the UNIQUE constraint
across two different fields.

CREATE TABLE product_suppliers (


product_id INTEGER,
supplier_id INTEGER,
UNIQUE(product_id, supplier_id)
);

This ensures that we can have multiple rows with the same product_id or supplier_id, but we
can't have two rows where both the product_id and supplier_id are the same.

37
Database normalization
Database normalization is a method for structuring your database schema in a way that
helps:

 Improve data integrity

 Reduce data redundancy

What is data integrity?


"Data integrity" refers to the accuracy and consistency of data. For example, if a user's age is
stored in a database, rather than their birthday, that data becomes incorrect automatically with
the passage of time.

It would be better to store a birthday and calculate the age as needed.

What is data redundancy?


"Data redundancy" occurs when the same piece of data is stored in multiple places. For
example: saving the same file multiple times to different hard drives.

Data redundancy can be problematic, especially when data in one place is changed such that
the data is no longer consistent across all copies of that data.

Normal Forms
The creator of "database normalization", Edgar F. Codd, described different "normal forms" a
database can adhere to. We'll talk about the most common ones.

 First normal form (1NF)

 Second normal form (2NF)


 Third normal form (3NF)

 Boyce-Codd normal form (BCNF)

38
In short, 1st normal form is the least "normalized" form, and Boyce-Codd is the most
"normalized" form.

The more normalized a database, the better its data integrity, and the less duplicate data you'll
have.

In the context of normal forms, "primary key" means something a bit different

In the context of database normalization, we're going to use the term "primary key" slightly
differently. When we're talking about SQLite, a "primary key" is a single column that
uniquely identifies a row.

When we're talking more generally about data normalization, the term "primary key" means
the collection of columns that uniquely identify a row. That can be a single column, but it can
actually be any number of columns. A primary key is the minimum number of columns
needed to uniquely identify a row in a table.

If you think back to the many-to-many joining table product_suppliers, that table's "primary
key" was actually a combination of the 2 ids, product_id and supplier_id:

CREATE TABLE product_suppliers (


product_id INTEGER,
supplier_id INTEGER,
UNIQUE(product_id, supplier_id)
);

1st Normal Form (1NF)


To be compliant with first normal form, a database table simply needs to follow 2 rules:

 It must have a unique primary key.

 A cell can't have a nested table as its value (depending on the database you're using,
this may not even be possible)

39
Example of NOT 1st normal form

name age email

Lane 27 lane@[Link]

Lane 27 lane@[Link]

Allan 27 allan@[Link]

This table does not adhere to 1NF. It has two identical rows, so there isn't a unique primary
key for each row.

Example of 1st normal form


The simplest way (but not the only way) to get into first normal form is to add a unique id
column.

id name age email

1 Lane 27 lane@[Link]

2 Lane 27 lane@[Link]

3 Allan 27 allan@[Link]

It's worth noting that if you create a "primary key" by ensuring that two columns are always
"unique together" that works too.

You should almost never design a table that doesn't adhere to 1NF

First normal form is simply a good idea. I've never built a database schema where each table
isn't at least in first normal form.

2nd Normal Form (2NF)


A table in second normal form follows all the rules of 1st normal form, and one additional
rule:

 All columns that are not part of the primary key are dependent on the entire primary
key, and not just one of the columns in the primary key.

Example of 1st NF, but not 2nd NF

In this table, the primary key is a combination of first_name + last_name.

first_name last_name first_initial

Lane Wagner l

Lane Small l

Allan Wagner a

40
This table does not adhere to 2NF. The first_initial column is entirely dependent on the
first_name column, rendering it redundant.

Example of 2nd normal form


One way to convert the table above to 2NF is to add a new table that maps a first_name
directly to its first_initial. This removes any duplicates:

first_name last_name

Lane Wagner

Lane Small

Allan Wagner

first_name first_initial

Lane l

Allan a

2NF is usually a good idea

You should probably default to keeping your tables in second normal form. That said, there
are good reasons to deviate from it, particularly for performance reasons. The reason being
that when you have query a second table to get additional data it can take a bit longer.

My rule of thumb is:

Optimize for data integrity and data de-duplication first. If you have speed issues, de-
normalize accordingly.

3rd Normal Form (3NF)


A table in 3rd normal form follows all the rules of 2nd normal form, and one additional rule:

 All columns that aren't part of the primary are dependent solely on the primary key.

Notice that this is only slightly different from second normal form. In second normal form we
can't have a column completely dependent on a part of the primary key, and in third normal
form we can't have a column that is entirely dependent on anything that isn't the entire
primary key.

Example of 2nd NF, but not 3rd NF


In this table, the primary key is simply the id column.

id name first_initial email

1 Lane l [Link]@[Link]

2 Breanna b breanna@[Link]

3 Lane l [Link]@[Link]

41
This table is in 2nd normal form because first_initial is not dependent on a part of the primary
key. However, because it is dependent on the name column it doesn't adhere to 3rd normal
form.

Example of 3rd normal form


The way to convert the table above to 3NF is to add a new table that maps a name directly to
its first_initial. Notice how similar this solution is to 2NF.

id name email

1 Lane [Link]@[Link]

2 Breanna breanna@[Link]

3 Lane [Link]@[Link]

name first_initial

Lane l

Breanna b

3NF is usually a good idea

The same exact rule of thumb applies to the second and third normal forms.

Optimize for data integrity and data de-duplication first by adhering to 3NF. If you have
speed issues, de-normalize accordingly.

Remember the IIF function and the AS clause.

Boyce-Codd Normal Form (BCNF)


A table in Boyce-Codd normal form (created by Raymond F Boyce and Edgar F Codd)
follows all the rules of 3rd normal form, plus one additional rule:

 A column that's part of a primary key can not be entirely dependent on a column that's
not part of that primary key.

This only comes into play when there are multiple possible primary key combinations that
overlap. Another name for this is "overlapping candidate keys".

Only in rare cases does a table in third normal form not meet the requirements of Boyce-
Codd normal form.

Example of 3rd NF, but not Boyce-Codd NF


release_year release_date sales name

2001 2001-01-02 100 Kiss me tender

2001 2001-02-04 200 Bloody Mary

42
2002 2002-04-14 100 I wanna be them

2002 2002-06-24 200 He got me

The interesting thing here is that there are 3 possible primary keys:

 release_year + sales

 release_date + sales

 name

This means that by definition this table is in 2nd and 3rd normal form because those forms
only restrict how dependent a column that is not part of a primary key can be.

This table is not in Boyce-Codd's normal form because release_year is entirely dependent on
release_date.

Example of Boyce-Codd normal form

The easiest way to fix the table in our example is to simply remove the duplicate data from
release_date. Let's make that column release_day_and_month.

release_year release_day_and_month sales name

2001 01-02 100 Kiss me tender

2001 02-04 200 Bloody Mary

2002 04-14 100 I wanna be them

2002 06-24 200 He got me

BCNF is usually a good idea

The same exact rule of thumb applies to the 2nd, 3rd and Boyce-Codd normal forms. That
said, it's unlikely you'll see BCNF-specific issues in practice.

Optimize for data integrity and data de-duplication first by adhering to Boyce-Codd normal
form. If you have speed issues, de-normalize accordingly.

Normalization Review
In my opinion, the exact definitions of 1st, 2nd, 3rd and Boyce-Codd normal forms simply
are not all that important in your work as a back-end developer.

However, what is important is to understand the basic principles of data integrity and data
redundancy that the normal forms teach us.

Let's go over some rules of thumb that you should commit to memory - they'll serve you well
when you design databases and even just in coding interviews.

43
Rules of thumb for database design

1. Every table should always have a unique identifier (primary key)

2. 90% of the time, that unique identifier will be a single column named id

3. Avoid duplicate data

4. Avoid storing data that is completely dependent on other data. Instead, compute it on
the fly when you need it.

5. Keep your schema as simple as you can. Optimize for a normalized database first.
Only denormalize for speed's sake when you start to run into performance problems.

We'll talk more about speed optimization in a later chapter.

Chapter 10: How to Join Tables in SQL


Joins are one of the most important features that SQL offers. Joins allow us to make use of
the relationships we have set up between our tables. In short, joins allow us to query multiple
tables at the same time.

INNER JOIN
The simplest and most common type of join in SQL is the INNER JOIN. By default, a JOIN
command is an INNER JOIN.

An INNER JOIN returns all of the records in table_a that have matching records in table_b,
as demonstrated by the following Venn diagram.

The ON clause
In order to perform a join, we need to tell the database which fields should be "matched up".
The ON clause is used to specify these columns to join.

44
SELECT *
FROM employees
INNER JOIN departments
ON employees.department_id = [Link];

The query above returns all the fields from both tables. The INNER keyword doesn't have
anything to do with the number of columns returned - it only affects the number of rows
returned.

Namespacing on Tables
When working with multiple tables, you can specify which table a field exists on using a ..
For example:

table_name.column_name

SELECT [Link], [Link]


FROM students
INNER JOIN classes on classes.class_id = students.class_id;

The above query returns the name field from the students table and the name field from the
classes table.

LEFT JOIN
A LEFT JOIN will return every record from table_a regardless of whether or not any of those
records have a match in table_b. A left join will also return any matching records from
table_b.

Here is a Venn diagram to help visualize the effect of a LEFT JOIN.

A small trick you can do to make writing the SQL query easier is define an alias for each
table. Here's an example:

SELECT [Link], [Link]


FROM employees e
LEFT JOIN departments d

45
ON e.department_id = [Link];

Notice the simple alias declarations e and d for employees and departments respectively.

Some developers do this to make their queries less verbose. That said, I personally hate it
because single-letter variables are harder to understand the meaning of.

RIGHT JOIN
A RIGHT JOIN is, as you may expect, the opposite of a LEFT JOIN. It returns all records
from table_b regardless of matches, and all matching records between the two tables.

SQLite Restriction
SQLite does not support right joins, but many dialects of SQL do. If you think about it, a
RIGHT JOIN is just a LEFT JOIN with the order of the tables switched, so it's not a big deal
that SQLite doesn't support the syntax.

FULL JOIN
A FULL JOIN combines the result set of the LEFT JOIN and RIGHT JOIN commands. It
returns all records from both from table_a and table_b regardless of whether or not they have
matches.

46
SQLite
Like RIGHT JOINs, SQLite doesn't support FULL JOINs but they are still important to
know.

Chapter 11: Database Performance


SQL Indexes
An index is an in-memory structure that ensures that queries we run on a database are
performant, that is to say, they run quickly.

If you've learned about data structures, most database indexes are just binary trees. The
binary tree can be stored in ram as well as on disk, and it makes it easy to lookup the location
of an entire row.

PRIMARY KEY columns are indexed by default, ensuring you can look up a row by its id
very quickly. But if you have other columns that you want to be able to do quick lookups on,
you'll need to index them.

CREATE INDEX
CREATE INDEX index_name on table_name (column_name);

It's fairly common to name an index after the column it's created on with a suffix of _idx.

47
Index Review
As we discussed, an index is a data structure that can perform quick lookups. By indexing a
column, we create a new in-memory structure, usually a binary-tree, where the values in the
indexed column are sorted into the tree to keep lookups fast.

In terms of Big-O complexity, a binary tree index ensures that lookups are O(log(n)).

Shouldn't we index everything? We can make the database ultra-fast!

While indexes make specific kinds of lookups much faster, they also add performance
overhead - they can slow down a database in other ways.

Think about it: if you index every column, you could have hundreds of binary trees in
memory. That needlessly bloats the memory usage of your database. It also means that each
time you insert a record, that record needs to be added to many trees - slowing down your
insert speed.

The rule of thumb is simple:

Add an index to columns you know you'll be doing frequent lookups on. Leave everything else
un-indexed. You can always add indexes later.

Multi-column indexes
Multi-column indexes are useful for the exact reason you might think - they speed up lookups
that depend on multiple columns.

CREATE INDEX

CREATE INDEX first_name_last_name_age_idx


ON users (first_name, last_name, age);

A multi-column index is sorted by the first column first, the second column next, and so
forth. A lookup on only the first column in a multi-column index gets almost all of the
performance improvements that it would get from its own single-column index. But lookups
on only the second or third column will have very degraded performance.

Rule of thumb
Unless you have specific reasons to do something special, only add multi-column indexes if
you're doing frequent lookups on a specific combination of columns.

Denormalizing for speed


I left you with a cliffhanger in the "normalization" chapter. As it turns out, data integrity and
deduplication come at a cost, and that cost is usually speed.

48
Joining tables together, using subqueries, performing aggregations, and running post-hoc
calculations all take time. At very large scales these advanced techniques can actually take a
huge performance toll on an application - sometimes grinding the database server to a halt.

Storing duplicate information can drastically speed up an application that needs to look it up
in different ways. For example, if you store a user's country information right on their user
record, no expensive join is required to load their profile page.

That said, denormalize at your own risk. Denormalizing a database incurs a large risk of
inaccurate and buggy data.

In my opinion, it should be used as a kind of "last resort" in the name of speed.

SQL Injection
SQL is a very common way hackers attempt to cause damage or breach a database.

he joke here is that if someone was using this query:

INSERT INTO students(name) VALUES (?);

And the "name" of a student was 'Robert'); DROP TABLE students;-- then the resulting SQL
query would look like this:

INSERT INTO students(name) VALUES ('Robert'); DROP TABLE students;--)

As you can see, this is actually 2 queries! The first one inserts "Robert" into the database, and
the second one deletes the students table!

How do we protect against SQL injection?


You need to be aware of SQL injection attacks, but to be honest the solution these days is to
simply use a modern SQL library that sanitizes SQL inputs. We don't often need to sanitize
inputs by hand at the application level anymore.

For example, the Go standard library's SQL packages automatically protects your inputs
against SQL attacks if you use it properly. In short, don't interpolate user input into raw

49
strings yourself - make sure your database library has a way to sanitize inputs, and pass it
those raw values.

50

You might also like