MySQL Database Operations Guide
MySQL Database Operations Guide
Introduction..........................................................6
Data Types in MySQL................................................7
CREATE TABLE....................................................7
DROP TABLE........................................................7
RENAME TABLE....................................................8
INSERT INTO.......................................................8
SELECT..............................................................8
SELECT DISTINCT.................................................8
ALTER TABLE......................................................9
1. Add a column to a table using ALTER TABLE with ADD................9
2. Modify a column using ALTER TABLE with MODIFY..................9
3. Rename columns using ALTER TABLE with CHANGE COLUMN. . .9
4. Drop a column using ALTER TABLE with DROP COLUMN...........9
ORDER BY in MySQL.............................................10
Aliases in MySQL...................................................10
Column Alias........................................................10
Table Alias..........................................................10
WHERE clause in MySQL.........................................10
IN Operator in MySQL.............................................11
LIKE Operator in MySQL.........................................11
IS NULL Operator in MySQL......................................11
JOIN................................................................12
INNER JOIN......................................................................... 12
LEFT JOIN........................................................................... 12
RIGHT JOIN......................................................................... 12
CROSS JOIN.........................................................................13
GROUP BY.........................................................13
HAVING............................................................13
Comparing SQL Databases.........................................14
SQLite vs PostgreSQL.............................................................14
1
Chapter 2: SQL Tables..............................................14
How to use the CREATE TABLE statement...................................14
How to Alter Tables.................................................................15
How to use ALTER TABLE....................................................................15
Intro to Migrations..................................................................16
Example of a bad migration.......................................................16
SQL Data Types..................................................................... 16
SQLite Data Types..................................................................16
For example:.....................................................................................16
Boolean values...................................................................................17
Chapter 3: Constraints..............................................17
NOT NULL constraint..............................................................17
SQLite limitation.................................................................... 17
Primary Key Constraints...........................................................18
Your primary key will almost always be the "id" column................................18
Foreign Key Constraints...........................................................18
Creating a Foreign Key in SQLite................................................18
Schema................................................................................ 19
There is no perfect way to architect a database schema.................................19
How do we decide on a sane schema architecture?......................................19
2
WHERE clause.......................................................................22
Using a WHERE clause............................................................22
Finding NULL values...............................................................23
IS NULL.............................................................................. 23
IS NOT NULL.......................................................................23
DELETE.............................................................................. 23
DELETE statement..................................................................23
The danger of deleting data...................................................................23
Strategy 2 - Soft deletes........................................................................24
Update query in SQL...............................................................24
Update statement................................................................................24
Object-Relational Mapping (ORMs).............................................24
Example: Using an ORM......................................................................25
Example: Using straight SQL.................................................................25
Should you use an ORM?..........................................................25
3
Examples..........................................................................................31
Order By and Limit.................................................................31
4
What is data redundancy?..........................................................38
Normal Forms........................................................................39
1st Normal Form (1NF)........................................................................40
2nd Normal Form (2NF).......................................................................41
3rd Normal Form (3NF).......................................................................42
Boyce-Codd Normal Form (BCNF)..........................................................43
Normalization Review..............................................................44
5
Module 15: MYSQL Essentials
Introduction
MySQL is one of the most popular open-source Relational Database Management Systems
(RDBMS).
6
A MySQL client might hold multiple databases. Each database might contain multiple tables
with each table holding data of the same type. Each table contains rows and columns with
each row denoting a single entry and each column denoting different attributes of the entries.
MySQL Commands
CREATE TABLE
The CREATE TABLE statement is used in MySQL to create a new table in a database. The
syntax for this is shown below:
Syntax:
DROP TABLE
The DROP TABLE statement is used in MySQL to drop or delete a table from the database.
The syntax for this is shown below:
7
Syntax:
RENAME TABLE
The RENAME TABLE statement is used in MySQL to rename the existing tables. One or
more tables can be renamed using this statement.
Syntax:
INSERT INTO
The INSERT INTO statement is used in MySQL to insert rows into a table. One or more
rows can be inserted into a table using this statement.
Syntax:
SELECT
The SELECT statement is used for querying data. It allows you to select data from one or
more tables.
Syntax:
SELECT DISTINCT
The SELECT DISTINCT statement is used in MySQL to remove duplicate rows.
Syntax:
8
ALTER TABLE
The statement ALTER TABLE can be used in MySQL to add a column, modify a column,
drop a column, rename a column from a table.
Syntax:
9
ORDER BY in MySQL
The ORDER BY clause is used in MySQL to sort the retrieved data in a particular order.
Syntax:
Aliases in MySQL
Aliases are used to give columns or tables a temporary or simple name. AS keyword is used
to create an alias.
Column Alias
The syntax for column name aliases is written below.
Syntax:
Table Alias
The aliases can be used to give simple and different names to tables also.
Syntax:
Syntax:
10
IN Operator in MySQL
The IN operator is used to check if a value matches any of the values in a list of values. It is
similar to the OR operator as if any of the values in the list matches it returns true.
Syntax:
or
There are two wildcards in MySQL, used with the LIKE operator for searching a pattern.
Syntax:
For example:
Syntax:
11
JOIN
Joins are used in relational databases to combine data from multiple tables based on a
common column between them. A foreign key may be used to reference a row in another
table and join can be done based on those columns. Two or more tables may have some
related data, and to combine all the data from multiple tables joins are used.
1. INNER JOIN
2. LEFT JOIN
3. RIGHT JOIN
4. CROSS JOIN
INNER JOIN
The INNER JOIN produces the output by combining those rows which have matching
column values.
Syntax:
SELECT column_names
FROM table1
INNER JOIN table2 ON table1.common_column=table2.common_column
INNER JOIN table3 ON table1.common_column=table3.common_column
...;
LEFT JOIN
The LEFT JOIN returns all the rows from the left table ‘A’ and the matching rows from the
right table ‘B’ in the join. The rows from the left table, which have no matching values in the
right table will be returned with a NULL value in the link column.
Syntax:
SELECT column_names
FROM table1
LEFT JOIN table2 ON table1.common_column=table2.common_column;
RIGHT JOIN
The RIGHT JOIN returns all the rows from the right table ‘B’ and the matching rows from
the left table ‘A’ in the join. The rows from the right table, which have no matching values in
the left table will be returned with a NULL value in the link column.
Syntax:
12
SELECT Column_names
FROM table1
RIGHT JOIN table2 ON table1.common_column=table2.common_column;
CROSS JOIN
CROSS JOIN returns the cartesian product of rows from the tables in the join. It combines
each row of the first table with each row of the second table. If there are X rows in the first
table and Y rows in the second table then the number of rows in the joined table will be
X*Y.
Syntax:
SELECT column_names
FROM table1
CROSS JOIN table2;
GROUP BY
The GROUP BY clause is used to arrange the rows in a group using a particular column
value. If there are multiple rows with the same value for a column then all those rows will be
grouped with that column value.
Syntax:
SELECT column1,column2,…
FROM table_name
WHERE condition
GROUP BY column1,column2, …
Order BY column1, column2, ….
The GROUP BY clause is generally used with aggregate functions like SUM, AVG,
COUNT, MAX, MIN. The aggregate functions provide information about each group.
HAVING
The HAVING clause is used with the GROUP BY clause in a query to specify come
conditions and filter some groups or aggregates that fulfill those conditions. The difference
between WHERE and HAVING is, WHERE is used to filter rows, and HAVING is used to
filter groups by applying some conditions.
Syntax:
PostgreSQL
MySQL
SQLite
Source: [Link]
While all of these Databases use SQL, each database defines specific rules, practices, and
strategies that separate them from their competitors.
SQLite vs PostgreSQL
Personally, SQLite and PostgreSQL are my favorites from the list above. Postgres is a very
powerful, open-source, production-ready SQL database. SQLite is a lightweight, embeddable,
open-source database. I usually choose one of these technologies if I'm doing SQL work.
SQLite is a serverless database management system (DBMS) that has the ability to run within
applications, whereas PostgreSQL uses a Client-Server model and requires a server to be
installed and listening on a network, similar to an HTTP server.
CREATE TABLE employees (id INTEGER, name TEXT, age INTEGER, is_manager
BOOLEAN, salary INTEGER);
14
Each field name is followed by its datatype. We'll get to data types in a minute.
It's also acceptable and common to break up the CREATE TABLE statement with some
whitespace like this:
Instead, we can use use the ALTER TABLE statement to make changes in place without
deleting any data.
Intro to Migrations
A database migration is a set of changes to a relational database. In fact, the ALTER TABLE
statements we did in the last exercise were examples of migrations.
15
Migrations are helpful when transitioning from one state to another, fixing mistakes, or
adapting a database to changes.
Good migrations are small, incremental and ideally reversible changes to a database. As you
can imagine, when working with large databases, making changes can be scary. We have to
be careful when writing database migrations so that we don't break any systems that depend
on the old database schema.
A simple solution to this problem would be to deploy new code that uses a new query:
And we would deploy that code to production immediately following the migration.
SQLite only supports the most basic types, and we're using SQLite in this course.
3. REAL - Floating point value stored as an 64-bit IEEE floating point number.
5. BLOB - Short for Binary large object and typically used for images, audio or other
multimedia.
For example:
CREATE TABLE employees (
id INTEGER,
name TEXT,
age INTEGER,
is_manager BOOLEAN,
salary INTEGER
16
);
Boolean values
It's important to note that SQLite does not have a separate BOOLEAN storage class. Instead,
boolean values are stored as integers:
0 = false
1 = true
It's not actually all that weird – boolean values are just binary bits after all!
SQLite will still let you write your queries using boolean expressions and true/false
keywords, but it will convert the booleans to integers under-the-hood.
Chapter 3: Constraints
A constraint is a rule we create on a database that enforces some specific behavior. For
example, setting a NOT NULL constraint on a column ensures that the column will not
accept NULL values.
If we try to insert a NULL value into a column with the NOT NULL constraint, the insert
will fail with an error message. Constraints are extremely useful when we need to ensure that
certain kinds of data exist within our database.
SQLite limitation
In other dialects of SQL you can ADD CONSTRAINT within an ALTER TABLE statement.
SQLite does not support this feature, so when we create our tables we need to make sure we
specify all the constraints we want.
Here's a list of SQL Features SQLite does not implement in case you're curious.
Here's an example:
In this example, an employee has a department_id. The department_id must be the same as
the id field of a record from the departments table.
Schema
We've used the word schema a few times now, let's talk about what that word means. A
database's schema describes how data is organized within it.
Data types, table names, field names, constraints, and the relationships between all of those
entities are part of a database's schema.
18
There is no perfect way to architect a database schema
When designing a database schema there typically isn't a "correct" solution. We do our best
to choose a sane set of tables, fields, constraints, etc that will accomplish our project's goals.
Like many things in programming, different schema designs come with different tradeoffs.
There are many ways to approach this problem. For our first attempt, let's try the simplest
schema that fulfills our project's needs.
19
INSERT INTO employees(id, name, title)
VALUES (1, 'Allan', 'Engineer');
1. The front-end processes some data from user input - maybe a form is submitted.
2. The front-end sends that data to the server through an HTTP request - maybe a
POST.
3. The server makes a SQL query to it's database to create an associated record -
Probably using an INSERT statement.
4. Once the server has processed that the database query was successful, it responds to
the front-end with a status code! Hopefully a 200-level code (success)!
Manual Entry
Manually INSERTing every single record in a database would be an extremely time-
consuming task! Working with raw SQL as we are now is not super common when designing
backend systems.
When working with SQL within a software system, like a backend web application, you'll
typically have access to a programming language such as Go or Python.
For example, a backend server written in Go can use string concatenation to dynamically
create SQL statements, and that's usually how it's done.
sqlQuery := [Link](`
INSERT INTO users(name, age, country_code)
VALUES ('%s', %v, %s);
20
`, [Link], [Link], [Link])
SQL Injection
The example above is an oversimplification of what really happens when you access a
database using Go code. In essence, it's correct. String interpolation is how production
systems access databases. That said, it must be done carefully to not be a security
vulnerability. We'll talk more about that later!
Count
We can use a SELECT statement to get a count of the records within a table. This can be very
useful when we need to know how many records there are, but we don't particularly care
what's in them.
The * in this case refers to a column name. We don't care about the count of a specific
column - we want to know the number of total records so we can use the wildcard (*).
Let's talk through an example. Our product manager wants to show profile data on a user's
settings page. Here's how we could engineer that feature request:
21
2. The front-end sends an HTTP GET request to a /users endpoint on the back-end
server.
4. The server uses a SELECT statement to retrieve the user's record from the users table
in the database.
5. The server converts the row of SQL data into a JSON object and sends it back to the
front-end.
WHERE clause
In order to keep learning about CRUD operations in SQL, we need to learn how to make the
instructions we send to the database more specific. SQL accepts a WHERE statement within
a query that allows us to be very specific with our instructions.
If we were unable to specify the specific record we wanted to READ, UPDATE, or DELETE
making queries to a database would be very frustrating, and very inefficient.
Here's an example:
This will select only the name field of any user within the users table WHERE the
power_level field is greater than or equal to 9000.
IS NULL
SELECT name FROM users WHERE first_name IS NULL;
IS NOT NULL
SELECT name FROM users WHERE first_name IS NOT NULL;
22
DELETE
When a user deletes their account on Twitter, or deletes a comment on a YouTube video, that
data needs to be removed from its respective database.
DELETE statement
A DELETE statement removes a record from a table that match the WHERE clause. As an
example:
This DELETE statement removes all records from the employees table that have an id of
251!
Strategy 1 - Backups
If you're using a cloud-service like GCP's Cloud SQL or AWS's RDS you should always turn
on automated backups. They take an automatic snapshot of your entire database on some
interval, and keep it around for some length of time.
For example, the [Link] database has a backup snapshot taken daily and we retain those
backups for 30 days. If I ever accidentally run a query that deletes valuable data, I can restore
it from the backup.
For example, you might set a deleted_at date on the row you want to delete. Then, in your
queries you ignore anything that has a deleted_at date set. The idea is that this allows your
application to behave as if it's deleting data, but you can always go back and restore any data
that's been removed.
You should probably only soft-delete if you have a specific reason to do so. Automated
backups should be "good enough" for most applications that are just interested in protecting
against developer mistakes.
23
Update query in SQL
Whenever you update your profile picture or change your password online, you are changing
the data in a field on a table in a database. Imagine if every time you accidentally messed up a
Tweet on Twitter you had to delete the entire tweet and post a new one instead of just editing
it...
Update statement
The UPDATE statement in SQL allows us to update the fields of a record. We can even
update many records depending on how we write the statement.
An UPDATE statement specifies the table that needs to be updated, followed by the fields
and their new values by using the SET keyword. Lastly a WHERE clause indicates the
record(s) to update.
UPDATE employees
SET job_title = 'Backend Engineer', salary = 150000
WHERE id = 251;
The primary benefit an ORM provides is that it maps your database records to in-memory
objects. For example, in Go we might have a struct that we use in our code:
This struct definition conveniently represents a database table called users, and an instance of
the struct represents a row in the table.
user := User{
ID: 10,
Name: "Lane",
IsAdmin: false,
24
}
user := User{
ID: 10,
Name: "Lane",
IsAdmin: false,
}
Using straight SQL you can take full advantage of the power of the SQL language. Using an
ORM, you're limited by whatever functionality the ORM has.
If you run into issues with a specific query, it can be harder to debug with an ORM because
you have to dig through the framework's code and documentation to figure out how the
underlying queries are being generated.
I recommend doing projects both ways so that you can learn about the tradeoffs. At the end of
the day, when you're working on a team of developers, it will be a team decision.
AS keyword
25
SELECT employee_id AS id, employee_name AS name
FROM employees;
and:
The difference is that the results from the aliased query would have column names id and
name instead of employee_id and employee_name.
SQL Functions
At the end of the day, SQL is a programming language, and it's one that supports functions.
We can use functions and aliases to calculate new columns in a query. This is similar to how
you might use formulas in Excel.
IIF function
In SQLite, the IIF function works like a ternary. For example:
If a is greater than b, this statement evaluates to the string "Car a is bigger". Otherwise, it
evaluates to "Car b is bigger".
Here's how we can use IIF() and a directive alias to add a new calculated column to our result
set:
SELECT quantity,
IIF(quantity < 10, "Order more", "In Stock") AS directive
from products
This query returns all the employees name and salary fields for any rows where the salary is
BETWEEN 30,000 and 60,000. We can also query results that are NOT BETWEEN two
specified values.
26
SELECT product_name, quantity
FROM products
WHERE quantity NOT BETWEEN 20 and 100;
This query returns all the product names where the quantity was not between 20 and 100. We
can use conditionals to make the results of our query as specific as we need them to be.
For example, we may want to know all the different companies our employees have worked
at previously, but we don't want to see the same company multiple times in the report.
SELECT DISTINCT
SQL offers us the DISTINCT keyword that removes duplicate records from the resulting
query.
This only returns one row for each unique previous_company value.
Logical Operators
We often need to use multiple conditions to retrieve the exact information we want. We can
begin to structure much more complex queries by using multiple conditions together to
narrow down the search results of our query.
The logical AND operator can be used to narrow down our result sets even more.
AND operator
This only retrieves records where both the shipment_status is "pending" AND the quantity is
between 0 and 10.
Equality operators
All of the following operators are supported in SQL. The = is the main one to watch out for,
it's not == like in many other languages.
=
27
<
>
<=
>=
For example, in Python you might compare two values like this:
if name == "age"
OR operator
As you've probably guessed, if the logical AND operator is supported, the OR operator is
probably supported as well.
This query retrieves records where either the shipment_status condition OR the quantity
condition are met.
The IN operator
Another variation to the WHERE clause we can utilize is the IN operator. IN returns true or
false if the first operand matches any of the values in the second operand. The IN operator is
a shorthand for multiple OR conditions.
28
WHERE shipment_status = 'shipped'
OR shipment_status = 'preparing'
OR shipment_status = 'out of stock';
Hopefully, you're starting to see how querying specific data using fine-tuned SQL clauses
helps reveal important insights. The larger a table becomes the harder it becomes to analyze
without proper queries.
The LIKE keyword allows for the use of the % and _ wildcard operators. Let's focus on %
first.
% Operator
The % operator will match zero or more characters. We can use this operator within our
query string to find more than just exact matches depending on where we place it.
Underscore Operator
As discussed, the % wildcard operator matches zero or more characters. Meanwhile, the _
wildcard operator only matches a single character.
29
The query above matches products like:
boot
root
foot
shoot
groot
The LIMIT keyword can be used at the end of a select statement to reduce the number of
records returned.
The query above retrieves all the records from the products table where the name contains the
word berry. If we ran this query on the Facebook database, it would almost certainly return a
lot of records.
The LIMIT statement only allows the database to return up to 50 records matching the query.
This means that if there aren't that many records matching the query, the LIMIT statement
will not have an effect.
30
Examples
This query returns the name, price, and quantity fields from the products table sorted by price
in ascending order:
This query returns the name, price, and quantity of the products ordered by the quantity in
descending order:
SELECT COUNT(*)
FROM products
WHERE quantity = 0;
This query returns the number of products that have a quantity of 0. We could store a count
of the products in a separate database table, and increment/decrement it whenever we make
changes to the products table - but that would be redundant.
It's much simpler to store the products in a single place (we call this a single source of truth)
and run an aggregation when we need to derive additional information from the raw data.
31
For example, the query below returns a single record containing a single field. The returned
value is equal to the total salary being collected by all of the employees in the employees
table.
SELECT sum(salary)
FROM employees;
Which returns:
SUM(SALARY)
2483
SELECT max(price)
FROM products
This query looks through all of the prices in the products table and returns the price with the
largest price value. Remember it only returns the price, not the rest of the record. You always
need to specify each field you want a query to return.
A note on schema
The sender_id will be present for any transactions where the user in question (user_id)
is receiving money (from the sender).
The recipient_id will be present for any transactions where the user in question
(user_id) is sending money (to the recipient).
In other words, a transaction can only have a sender_id or a recipient_id - not both. The
presence of one or the other indicates whether money is going into or out of the user's
account.
This user_id, recipient_id, sender_id schema we've designed is only one way to design a
transactions database - there are other valid ways to do it. It's the one we're using, and later
we'll talk more about the tradeoffs in different database design options.
32
SELECT product_name, min(price)
from products;
This query returns the product_name and the price fields of the record with the lowest price.
SQL offers the GROUP BY clause which can group rows that have similar values into
"summary" rows. It returns one row for each group. The interesting part is that each group
can have an aggregate function applied to it that operates only on the grouped data.
Example of GROUP BY
Imagine that we have a database with songs and albums, and we want to see how many songs
are on each album. We can use a query like this:
This query retrieves a count of all the songs on each album. One record is returned per album,
and they each have their own count.
SQL offers us the AVG() function. Similar to MAX(), AVG() calculates the average of all
non-NULL values.
The HAVING clause is similar to the WHERE clause, but it operates on groups after they've
been grouped, rather than rows before they've been grouped.
This query returns the album_id and count of its songs, but only for albums with more than 5
songs.
A WHERE condition is applied to all the data in a query before it's grouped by a
GROUP BY clause.
A HAVING condition is only applied to the grouped rows that are returned after a
GROUP BY is applied.
This means that if you want to filter on the result of an aggregation, you need to use
HAVING. If you want to filter on a value that's present in the raw data, you should use a
simple WHERE clause.
The SQL round() function allows you to specify both the value you wish to round and the
precision to which you wish to round it:
round(value, precision)
If no precision is given, SQL will round the value to the nearest whole value:
This query returns the average song_length from the songs table, rounded to a single decimal
point.
It is possible to run a query on the result set of another query - a query within a query! This is
called "query-ception"... erm... I mean a "subquery".
34
Subqueries can be very useful in a number of situations when trying to retrieve specific data
that wouldn't be accessible by simply querying a single table.
In this hypothetical database, the query above selects all of the song_ids, song_names, and
artist_ids from the songs table that are written by artists whose name starts with "Rick".
Notice that the subquery allows us to use information from a different table - in this case the
artists table.
Subquery syntax
The only syntax unique to a subquery is the parentheses surrounding the nested query. The IN
operator could be different, for example, we could use the = operator if we expect a single
value to be returned.
Here's an example:
No tables necessary
When working on a back-end application, this doesn't come up often, but it's important to
remember that SQL is a full programming language. We usually use it to interact with data
stored in tables, but it's quite flexible and powerful.
For example, you can SELECT information that's simply calculated, with no tables
necessary.
SELECT 5 + 10 as sum;
35
Chapter 9: Database Normalization
Table Relationships
Relational databases are powerful because of the relationships between the tables. These
relationships help us to keep our databases clean and efficient.
A relationship between tables assumes that one of these tables has a foreign key that
references the primary key of another table.
@youtube
Types of Relationships
There are 3 primary types of relationships in a relational database:
1. One-to-one
2. One-to-many
3. Many-to-many
One-to-one
A one-to-one relationship most often manifests as a field or set of fields on a row in a table.
For example, a user will have exactly one password.
Settings fields might be another example of a one-to-one relationship. A user will have
exactly one email_preference and exactly one birthday.
One to many
When talking about the relationships between tables, a one-to-many relationship is probably
the most commonly used relationship.
36
A one-to-many relationship occurs when a single record in one table is related to potentially
many records in another table.
Note that the one->many relation only goes one way, a record in the second table can not be
related to multiple records in the first table!
A users table and a transactions table. Each user has 0, 1, or many transactions that
taken part in.
Many to many
A many-to-many relationship occurs when multiple records in one table can be related to
multiple records in another table.
A classes table and a students table - Students can take potentially many classes and
classes can have many students enrolled.
Joining tables
Joining tables helps define many-to-many relationships between data in a database. As an
example, when defining the relationship above between products and suppliers, we would
define a joining table called products_suppliers that contains the primary keys from the tables
to be joined.
Then, when we want to see if a supplier supplies a specific product, we can look in the
joining table to see if the ids share a row.
This ensures that we can have multiple rows with the same product_id or supplier_id, but we
can't have two rows where both the product_id and supplier_id are the same.
37
Database normalization
Database normalization is a method for structuring your database schema in a way that
helps:
Data redundancy can be problematic, especially when data in one place is changed such that
the data is no longer consistent across all copies of that data.
Normal Forms
The creator of "database normalization", Edgar F. Codd, described different "normal forms" a
database can adhere to. We'll talk about the most common ones.
38
In short, 1st normal form is the least "normalized" form, and Boyce-Codd is the most
"normalized" form.
The more normalized a database, the better its data integrity, and the less duplicate data you'll
have.
In the context of normal forms, "primary key" means something a bit different
In the context of database normalization, we're going to use the term "primary key" slightly
differently. When we're talking about SQLite, a "primary key" is a single column that
uniquely identifies a row.
When we're talking more generally about data normalization, the term "primary key" means
the collection of columns that uniquely identify a row. That can be a single column, but it can
actually be any number of columns. A primary key is the minimum number of columns
needed to uniquely identify a row in a table.
If you think back to the many-to-many joining table product_suppliers, that table's "primary
key" was actually a combination of the 2 ids, product_id and supplier_id:
A cell can't have a nested table as its value (depending on the database you're using,
this may not even be possible)
39
Example of NOT 1st normal form
Lane 27 lane@[Link]
Lane 27 lane@[Link]
Allan 27 allan@[Link]
This table does not adhere to 1NF. It has two identical rows, so there isn't a unique primary
key for each row.
1 Lane 27 lane@[Link]
2 Lane 27 lane@[Link]
3 Allan 27 allan@[Link]
It's worth noting that if you create a "primary key" by ensuring that two columns are always
"unique together" that works too.
You should almost never design a table that doesn't adhere to 1NF
First normal form is simply a good idea. I've never built a database schema where each table
isn't at least in first normal form.
All columns that are not part of the primary key are dependent on the entire primary
key, and not just one of the columns in the primary key.
Lane Wagner l
Lane Small l
Allan Wagner a
40
This table does not adhere to 2NF. The first_initial column is entirely dependent on the
first_name column, rendering it redundant.
first_name last_name
Lane Wagner
Lane Small
Allan Wagner
first_name first_initial
Lane l
Allan a
You should probably default to keeping your tables in second normal form. That said, there
are good reasons to deviate from it, particularly for performance reasons. The reason being
that when you have query a second table to get additional data it can take a bit longer.
Optimize for data integrity and data de-duplication first. If you have speed issues, de-
normalize accordingly.
All columns that aren't part of the primary are dependent solely on the primary key.
Notice that this is only slightly different from second normal form. In second normal form we
can't have a column completely dependent on a part of the primary key, and in third normal
form we can't have a column that is entirely dependent on anything that isn't the entire
primary key.
1 Lane l [Link]@[Link]
2 Breanna b breanna@[Link]
3 Lane l [Link]@[Link]
41
This table is in 2nd normal form because first_initial is not dependent on a part of the primary
key. However, because it is dependent on the name column it doesn't adhere to 3rd normal
form.
id name email
1 Lane [Link]@[Link]
2 Breanna breanna@[Link]
3 Lane [Link]@[Link]
name first_initial
Lane l
Breanna b
The same exact rule of thumb applies to the second and third normal forms.
Optimize for data integrity and data de-duplication first by adhering to 3NF. If you have
speed issues, de-normalize accordingly.
A column that's part of a primary key can not be entirely dependent on a column that's
not part of that primary key.
This only comes into play when there are multiple possible primary key combinations that
overlap. Another name for this is "overlapping candidate keys".
Only in rare cases does a table in third normal form not meet the requirements of Boyce-
Codd normal form.
42
2002 2002-04-14 100 I wanna be them
The interesting thing here is that there are 3 possible primary keys:
release_year + sales
release_date + sales
name
This means that by definition this table is in 2nd and 3rd normal form because those forms
only restrict how dependent a column that is not part of a primary key can be.
This table is not in Boyce-Codd's normal form because release_year is entirely dependent on
release_date.
The easiest way to fix the table in our example is to simply remove the duplicate data from
release_date. Let's make that column release_day_and_month.
The same exact rule of thumb applies to the 2nd, 3rd and Boyce-Codd normal forms. That
said, it's unlikely you'll see BCNF-specific issues in practice.
Optimize for data integrity and data de-duplication first by adhering to Boyce-Codd normal
form. If you have speed issues, de-normalize accordingly.
Normalization Review
In my opinion, the exact definitions of 1st, 2nd, 3rd and Boyce-Codd normal forms simply
are not all that important in your work as a back-end developer.
However, what is important is to understand the basic principles of data integrity and data
redundancy that the normal forms teach us.
Let's go over some rules of thumb that you should commit to memory - they'll serve you well
when you design databases and even just in coding interviews.
43
Rules of thumb for database design
2. 90% of the time, that unique identifier will be a single column named id
4. Avoid storing data that is completely dependent on other data. Instead, compute it on
the fly when you need it.
5. Keep your schema as simple as you can. Optimize for a normalized database first.
Only denormalize for speed's sake when you start to run into performance problems.
INNER JOIN
The simplest and most common type of join in SQL is the INNER JOIN. By default, a JOIN
command is an INNER JOIN.
An INNER JOIN returns all of the records in table_a that have matching records in table_b,
as demonstrated by the following Venn diagram.
The ON clause
In order to perform a join, we need to tell the database which fields should be "matched up".
The ON clause is used to specify these columns to join.
44
SELECT *
FROM employees
INNER JOIN departments
ON employees.department_id = [Link];
The query above returns all the fields from both tables. The INNER keyword doesn't have
anything to do with the number of columns returned - it only affects the number of rows
returned.
Namespacing on Tables
When working with multiple tables, you can specify which table a field exists on using a ..
For example:
table_name.column_name
The above query returns the name field from the students table and the name field from the
classes table.
LEFT JOIN
A LEFT JOIN will return every record from table_a regardless of whether or not any of those
records have a match in table_b. A left join will also return any matching records from
table_b.
A small trick you can do to make writing the SQL query easier is define an alias for each
table. Here's an example:
45
ON e.department_id = [Link];
Notice the simple alias declarations e and d for employees and departments respectively.
Some developers do this to make their queries less verbose. That said, I personally hate it
because single-letter variables are harder to understand the meaning of.
RIGHT JOIN
A RIGHT JOIN is, as you may expect, the opposite of a LEFT JOIN. It returns all records
from table_b regardless of matches, and all matching records between the two tables.
SQLite Restriction
SQLite does not support right joins, but many dialects of SQL do. If you think about it, a
RIGHT JOIN is just a LEFT JOIN with the order of the tables switched, so it's not a big deal
that SQLite doesn't support the syntax.
FULL JOIN
A FULL JOIN combines the result set of the LEFT JOIN and RIGHT JOIN commands. It
returns all records from both from table_a and table_b regardless of whether or not they have
matches.
46
SQLite
Like RIGHT JOINs, SQLite doesn't support FULL JOINs but they are still important to
know.
If you've learned about data structures, most database indexes are just binary trees. The
binary tree can be stored in ram as well as on disk, and it makes it easy to lookup the location
of an entire row.
PRIMARY KEY columns are indexed by default, ensuring you can look up a row by its id
very quickly. But if you have other columns that you want to be able to do quick lookups on,
you'll need to index them.
CREATE INDEX
CREATE INDEX index_name on table_name (column_name);
It's fairly common to name an index after the column it's created on with a suffix of _idx.
47
Index Review
As we discussed, an index is a data structure that can perform quick lookups. By indexing a
column, we create a new in-memory structure, usually a binary-tree, where the values in the
indexed column are sorted into the tree to keep lookups fast.
In terms of Big-O complexity, a binary tree index ensures that lookups are O(log(n)).
While indexes make specific kinds of lookups much faster, they also add performance
overhead - they can slow down a database in other ways.
Think about it: if you index every column, you could have hundreds of binary trees in
memory. That needlessly bloats the memory usage of your database. It also means that each
time you insert a record, that record needs to be added to many trees - slowing down your
insert speed.
Add an index to columns you know you'll be doing frequent lookups on. Leave everything else
un-indexed. You can always add indexes later.
Multi-column indexes
Multi-column indexes are useful for the exact reason you might think - they speed up lookups
that depend on multiple columns.
CREATE INDEX
A multi-column index is sorted by the first column first, the second column next, and so
forth. A lookup on only the first column in a multi-column index gets almost all of the
performance improvements that it would get from its own single-column index. But lookups
on only the second or third column will have very degraded performance.
Rule of thumb
Unless you have specific reasons to do something special, only add multi-column indexes if
you're doing frequent lookups on a specific combination of columns.
48
Joining tables together, using subqueries, performing aggregations, and running post-hoc
calculations all take time. At very large scales these advanced techniques can actually take a
huge performance toll on an application - sometimes grinding the database server to a halt.
Storing duplicate information can drastically speed up an application that needs to look it up
in different ways. For example, if you store a user's country information right on their user
record, no expensive join is required to load their profile page.
That said, denormalize at your own risk. Denormalizing a database incurs a large risk of
inaccurate and buggy data.
SQL Injection
SQL is a very common way hackers attempt to cause damage or breach a database.
And the "name" of a student was 'Robert'); DROP TABLE students;-- then the resulting SQL
query would look like this:
As you can see, this is actually 2 queries! The first one inserts "Robert" into the database, and
the second one deletes the students table!
For example, the Go standard library's SQL packages automatically protects your inputs
against SQL attacks if you use it properly. In short, don't interpolate user input into raw
49
strings yourself - make sure your database library has a way to sanitize inputs, and pass it
those raw values.
50