Module 6: Adding a Database Layer
Section 1: Database Layer Considerations: -
Relational and Non-Relational Databases: -
Relational Databases: If your use case lends itself to strict schema rules, where
the schema structure is well defined and does not need to change often, a relational
database is a good choice. It is also suitable when you are migrating an on-premises
relational workload or when your workload involves online transaction processing
(OLTP). However, if your application needs extreme read/write capacity, a
relational database might not be the right choice. For many other use cases, a
relational database can be the best low-effort solution.
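As a minimal sketch of what strict schema rules and transactional processing mean in practice, the following uses Python's built-in sqlite3 module as a stand-in for a managed relational engine. The accounts table, its columns, and the transfer function are hypothetical names chosen for illustration only.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with a strictly defined schema (hypothetical table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " owner TEXT NOT NULL,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
        [(1, "alice", 100), (2, "bob", 50)],
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move funds atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit is undone too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

The schema rejects rows that break its rules, and the transaction guarantees that a failed transfer leaves both balances unchanged.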
Non-relational Databases (NoSQL): When a caching layer is needed to improve read
performance, when storing JSON documents, or when single-digit millisecond data
retrieval is needed, a non-relational database is a suitable option. Non-relational
databases are optimized for specific data models and access patterns that help enable
higher performance instead of trying to achieve the functionality of relational
databases.
Both relational and non-relational databases can scale horizontally and vertically.
Amazon Database Options: -
Relational: Amazon RDS is used for transactional applications like enterprise resource
planning (ERP), customer relationship management (CRM), and ecommerce
applications to store structured data.
Non-Relational: These are purpose-built databases designed to quickly and efficiently
perform the specific functions these applications require.
Less responsibility with managed AWS database services: -
Database Capacity Planning: -
There are two ways to scale databases:
- Vertically: Scaling vertically involves expanding the resources that the existing
server uses in order to increase its capacity. This complex and time-consuming
process includes upgrading memory, storage, or processing power. Usually this
means that the database will be down for some period of time.
- Horizontally: Horizontal scaling involves increasing the number of servers that
the database runs on, which decreases the load on each server. The compute
capacity is added while the database is running, which usually means that this
scaling happens without downtime.
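The effect of horizontal scaling on per-server load can be sketched with simple arithmetic. This is an idealized model that assumes requests are spread evenly across servers; the function names and the numbers in the usage note are hypothetical.

```python
import math


def load_per_server(total_requests_per_sec: float, servers: int) -> float:
    """Average load each server carries when requests are spread evenly."""
    if servers < 1:
        raise ValueError("at least one server is required")
    return total_requests_per_sec / servers


def servers_needed(total_requests_per_sec: float, capacity_per_server: float) -> int:
    """Smallest server count whose combined capacity covers the total load."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return math.ceil(total_requests_per_sec / capacity_per_server)
```

For example, 9,000 requests per second spread across 3 servers is 3,000 per server, and a fleet of servers that each handle 2,500 requests per second needs 4 servers to cover the same load.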
Section 2: Amazon RDS: -
Amazon Relational Database Service: -
Benefits of Amazon RDS: -
Amazon RDS database architecture: -
Architecture diagram of a database layer: -
Aurora: -
- Aurora is up to five times faster than standard MySQL databases and three times
faster than standard PostgreSQL databases.
- Aurora features a distributed, fault-tolerant, self-healing storage system that
auto scales up to 64 TB per database instance.
- It delivers high performance and availability with up to 15 low-latency read
replicas, point-in-time recovery, continuous backup to Amazon Simple Storage
Service (Amazon S3), and replication across three Availability Zones.
Aurora database clusters: -
Aurora Serverless: -
Amazon Aurora Serverless is an on-demand, auto scaling configuration for Aurora
where the database automatically starts up, shuts down, and scales capacity up or
down based on your application's needs.
With Aurora Serverless, you can run your database in the cloud without managing any
database instances.
Aurora Serverless v2 is particularly useful for the following use cases:
- Variable workloads: You're running workloads that have sudden and
unpredictable increases in activity. With Aurora Serverless v2, your database
automatically scales capacity to meet the needs of the application's peak load
and scales back down when the surge of activity is over.
- New applications: You're deploying a new application and you're unsure about
the DB instance size you need. By using Aurora Serverless v2, you can set up a
cluster with one or many DB instances and have the database auto scale to the
capacity requirements of your application.
- Development and testing: With Aurora Serverless v2, you can create DB
instances with a low minimum capacity. You can set the maximum capacity high
enough that those DB instances can still run substantial workloads without
running low on memory. When the database isn't in use, all of the DB instances
scale down to avoid unnecessary charges.
- Capacity planning: Suppose that you usually adjust your database capacity or
verify the optimal database capacity for your workload by modifying the DB
instance classes of all the DB instances in a cluster. With Aurora Serverless v2,
you can avoid this administrative overhead. You can determine the appropriate
minimum and maximum capacity by running the workload and checking how
much the DB instances actually scale. You can modify existing DB instances
from provisioned to Aurora Serverless v2 or from Aurora Serverless v2 to
provisioned. You don't need to create a new cluster or a new DB instance in such
cases.
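The minimum and maximum capacity mentioned above are expressed in Aurora capacity units (ACUs). The following is a hedged sketch of how the scaling range might be prepared for the boto3 RDS create_db_cluster call through its ServerlessV2ScalingConfiguration parameter. The helper names are hypothetical, the parameter set is deliberately partial (credentials and networking arguments are omitted), and the allowed ACU range depends on the engine version, so it is not validated here.

```python
def serverless_v2_scaling(min_acu: float, max_acu: float) -> dict:
    """Build the ServerlessV2ScalingConfiguration structure (hypothetical helper)."""
    for value in (min_acu, max_acu):
        # Capacity is specified in half-ACU steps, for example 0.5, 1, 1.5.
        if not float(value * 2).is_integer():
            raise ValueError("capacity must be a multiple of 0.5 ACU")
    if min_acu > max_acu:
        raise ValueError("minimum capacity cannot exceed maximum capacity")
    return {"MinCapacity": min_acu, "MaxCapacity": max_acu}


def create_cluster_params(cluster_id: str, min_acu: float, max_acu: float) -> dict:
    """Partial keyword arguments for an RDS create_db_cluster call.

    Assumption: other required arguments are supplied by the caller.
    """
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "aurora-postgresql",
        "ServerlessV2ScalingConfiguration": serverless_v2_scaling(min_acu, max_acu),
    }
```

A development cluster might use a low minimum such as 0.5 ACU with a higher maximum, then adjust both values after observing how far the instances actually scale.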
Amazon RDS use case: Banking Transactions: -
Amazon RDS EC2 instance types and sizing: -
- General purpose Amazon RDS instance types are suitable for CPU-intensive
workloads and workloads with moderate CPU usage that experience temporary
spikes in use. The T and M families are general purpose instance types.
- Memory-optimized Amazon RDS instance types are suitable for query-intensive
workloads or high connection counts. The R and X families are memory-optimized
instance types.
When it comes to upgrading instance types, first determine if the workload is memory
intensive or compute intensive. Identify which resource is constrained, and upgrade
based on that deficit.
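The sizing rule above can be sketched as a small decision function. The 80 percent threshold and the function name are hypothetical; in practice the utilization figures would come from your monitoring data, and the result is a starting point rather than a sizing recommendation.

```python
def suggest_instance_family(
    cpu_utilization_pct: float,
    memory_utilization_pct: float,
    threshold_pct: float = 80.0,  # hypothetical cut-off for "constrained"
) -> str:
    """Pick an instance family based on which resource is constrained."""
    if max(cpu_utilization_pct, memory_utilization_pct) < threshold_pct:
        return "no upgrade indicated"
    if memory_utilization_pct >= cpu_utilization_pct:
        # Memory is the tighter resource: memory-optimized families.
        return "memory-optimized (R or X family)"
    # CPU is the tighter resource: general purpose families, per the notes above.
    return "general purpose (T or M family)"
```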
Amazon RDS security best practices: -
Section 3: Amazon RDS Proxy connection management: -
Amazon RDS Proxy: -
Connection pooling: Improved scalability: -
Many applications, including those built on modern serverless architectures, can have
thousands of open connections from the application to the database server. Not all of
these connections are always carrying out a transaction. RDS Proxy detects these gaps
in operations and reuses the connection to serve other application connections.
You might want to consider using RDS Proxy in these situations:
- Any database instance that encounters errors regarding too many connections is
a good candidate for associating with a proxy. The proxy enables applications to
open many client connections while the proxy manages a smaller number of
long-lived connections to the database instance.
- Applications that typically open and close large numbers of database
connections and don't have built-in connection pooling mechanisms are good
candidates for using a proxy.
- Applications that keep a large number of connections open for long periods are
typically good candidates for using a proxy. Applications in industries such as
software as a service (SaaS) or ecommerce often minimize the latency for
database requests by leaving connections open. With RDS Proxy, an application
can keep more connections open than it can when connecting directly to the
database instance.
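The idea of many client connections sharing a small number of database connections can be illustrated with a toy model. This is not how RDS Proxy is implemented; the class and the connection names are hypothetical, and the sketch only shows that a backend connection is held for the duration of a transaction and then returned for reuse.

```python
from collections import deque


class ConnectionMultiplexer:
    """Toy model: many clients share a small pool of backend connections."""

    def __init__(self, backend_connections: int):
        # Long-lived connections to the database, opened once.
        self._idle = deque(f"db-conn-{i}" for i in range(backend_connections))
        self.clients = set()

    def connect(self, client_id) -> None:
        """Register a client; no backend connection is consumed yet."""
        self.clients.add(client_id)

    def run_transaction(self, client_id, work):
        """Borrow a backend connection only while the transaction runs."""
        if client_id not in self.clients:
            raise KeyError(f"unknown client: {client_id}")
        if not self._idle:
            raise RuntimeError("all backend connections are busy")
        conn = self._idle.popleft()
        try:
            return work(conn)
        finally:
            self._idle.append(conn)  # returned for reuse by other clients
```

In this model, 1,000 registered clients can run their transactions over just two backend connections, as long as they do not all need one at the same instant.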
Seamless and fast failover: Improved availability: -
Streamlined authentication: Improved application security: -
With RDS Proxy, you can enforce IAM authentication for database access, which
improves security. Because the application no longer needs database passwords
embedded in its code, credential handling is also streamlined.
1. The application requests an authentication token from IAM. IAM returns the
authentication token to the Amazon Elastic Container Service (Amazon ECS)
application container.
2. The application sends a database request to RDS Proxy connecting with the
validated IAM token.
3. RDS Proxy calls Secrets Manager for the mapped identity. Secrets Manager
returns the secret (username and password) to RDS Proxy.
4. RDS Proxy sends the database request to the Aurora database instance
connecting with the secret.
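From the application's side, steps 1 and 2 might look like the following sketch. It assumes boto3 is installed and AWS credentials are available when fetch_iam_token is called; the connection settings use libpq-style keys as an assumption about the database driver, and the endpoint, user, and database names in the usage note are hypothetical. Steps 3 and 4 happen inside RDS Proxy and need no application code.

```python
def fetch_iam_token(proxy_endpoint: str, port: int, db_user: str, region: str) -> str:
    """Step 1: obtain a short-lived authentication token instead of a password."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    rds = boto3.client("rds", region_name=region)
    return rds.generate_db_auth_token(
        DBHostname=proxy_endpoint, Port=port, DBUsername=db_user, Region=region
    )


def connection_settings(
    proxy_endpoint: str, port: int, db_user: str, token: str, database: str
) -> dict:
    """Step 2: settings for connecting to the proxy with the token.

    Assumption: a libpq-style driver; IAM authentication requires TLS,
    so the connection asks for it explicitly.
    """
    return {
        "host": proxy_endpoint,
        "port": port,
        "user": db_user,
        "password": token,  # the token takes the place of a stored password
        "dbname": database,
        "sslmode": "require",
    }
```

An application would call fetch_iam_token shortly before connecting, because the token is short-lived, and pass the result to connection_settings.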
Backing up data in Amazon RDS: -
Amazon RDS cross-region backups: -
For added disaster recovery capability, you can configure your Amazon RDS database
instance to replicate snapshots and transaction logs to a destination AWS Region of
your choice:
1. Snapshots and transaction logs from the primary RDS database are stored in an
S3 bucket that is controlled by Amazon RDS.
2. When backup replication is configured for a database instance, RDS initiates a
cross-Region copy of all snapshots and transaction logs on the database
instance.
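Configuring backup replication through the API could be sketched as follows. The helper builds keyword arguments intended for the boto3 RDS call start_db_instance_automated_backups_replication, which would be issued with a client created in the destination Region. The helper name, the default retention period, and the ARN in the usage note are hypothetical.

```python
from typing import Optional


def backup_replication_params(
    source_instance_arn: str,
    retention_days: int = 7,  # hypothetical default
    kms_key_id: Optional[str] = None,
) -> dict:
    """Keyword arguments for starting cross-Region automated backup replication."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    params = {
        "SourceDBInstanceArn": source_instance_arn,
        "BackupRetentionPeriod": retention_days,
    }
    if kms_key_id:
        # Encrypted source instances need a KMS key in the destination Region.
        params["KmsKeyId"] = kms_key_id
    return params
```

For a source instance such as arn:aws:rds:us-east-1:123456789012:db:example-db, the resulting dictionary would be unpacked into the API call made in the destination Region.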
Additionally, with Amazon RDS, you can create a read replica in a different AWS Region
from the source DB instance to do the following:
- Improve your disaster recovery capabilities.
- Scale read operations into an AWS Region closer to your users.
- Make it easier to migrate from a data center in one AWS Region to a data center
in another AWS Region.
Amazon RDS encryption for backups: -
Section 3: Amazon DynamoDB: -
DynamoDB: -
A flexible schema gives you the ability to adapt as your business requirements change
without the burden of having to redefine the table schema as you would in relational
databases.
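A flexible schema can be pictured with a toy in-memory table in which only the key attribute is mandatory. This is an illustration in plain Python, not the DynamoDB API; the attribute names and the put_item helper are hypothetical.

```python
def put_item(table: dict, item: dict, key_attr: str = "pk") -> None:
    """Store an item; only the key attribute is required, everything else is free-form."""
    if key_attr not in item:
        raise ValueError(f"item is missing its key attribute: {key_attr}")
    table[item[key_attr]] = item


def attribute_names(table: dict) -> set:
    """All attribute names seen across items, none of which were declared up front."""
    names = set()
    for item in table.values():
        names.update(item)
    return names
```

Two items in the same table can carry different attributes, for example one with only a name and another with a name, a loyalty tier, and a list of tags, without any change to a table definition.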
DynamoDB use cases: -
DynamoDB features: -
Serverless performance with limitless scalability:
- Secondary indexes provide flexibility on how to access your data.
- Amazon DynamoDB Streams is ideal for an event-driven architecture.
(DynamoDB Streams records a time-ordered sequence of every item-level
change in near-real time.)
- Multi-Region, multi-active data replication with global tables.
Built-in security and reliability:
- DynamoDB encrypts all customer data at rest by default.
- Point-in-time recovery protects data from accidental operations.
- DynamoDB allows fine-grained access control.
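To make the DynamoDB Streams feature listed above more concrete, the following sketch replays a time-ordered list of change records into a simple materialized view. The record layout follows the shape that stream records have when delivered to a consumer such as a Lambda function (eventName, Keys, NewImage). The key attribute name pk and the assumption that the stream includes new images are choices made for this example.

```python
def apply_stream_records(records: list, view: dict) -> dict:
    """Apply item-level changes, in order, to a dictionary keyed by partition key."""
    for record in records:
        change = record["dynamodb"]
        key = change["Keys"]["pk"]["S"]  # assumes a string partition key named "pk"
        if record["eventName"] in ("INSERT", "MODIFY"):
            view[key] = change["NewImage"]  # assumes the stream carries new images
        elif record["eventName"] == "REMOVE":
            view.pop(key, None)
    return view
```

Because the records are time-ordered, replaying them in sequence leaves the view in the same state as the latest version of each item.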
Amazon DynamoDB data structure: -
Amazon DynamoDB sample base table: -
Alternate schema using a global secondary index: -
With a global secondary index (GSI), DynamoDB creates a read-only copy of the base
table in which you can pivot the data around different partition and sort keys. You do not
have to include every attribute of your items in the GSI. This provides an alternate
schema on your DynamoDB base table.
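The pivot that a GSI performs can be sketched in plain Python. This is a conceptual model, not the DynamoDB API: build_gsi is a hypothetical helper that regroups base-table items under a different partition key, orders them by a different sort key, and keeps only the projected attributes.

```python
def build_gsi(base_items: list, partition_key: str, sort_key: str, projected: list) -> dict:
    """Return {partition value: [index entries sorted by sort_key]}."""
    index = {}
    for item in base_items:
        if partition_key not in item or sort_key not in item:
            continue  # items without the index key attributes do not appear in the index
        entry = {name: item[name] for name in projected if name in item}
        entry[partition_key] = item[partition_key]
        entry[sort_key] = item[sort_key]
        index.setdefault(item[partition_key], []).append(entry)
    for entries in index.values():
        entries.sort(key=lambda e: e[sort_key])
    return index
```

For an orders table keyed by customer and order ID, an index built on status and order date would answer a question such as "which orders are shipped, oldest first" without scanning every customer.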
Alternate schema using a local secondary index: -
You can retrieve data from a DynamoDB base table using eventual consistency or strong
consistency. If you request a strongly consistent read, DynamoDB returns a response
with the most up-to-date data. The response reflects the updates from all previous write
operations that were successful.
A global secondary index supports only eventually consistent reads. If your use case
requires strongly consistent reads on an alternate sort key, you could implement an
alternate schema with a local secondary index (LSI), which shares the base table's
partition key and supports both consistency modes.
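A strongly consistent query against an LSI might be prepared as in the sketch below. The dictionary uses parameter names from the low-level DynamoDB Query operation (TableName, IndexName, KeyConditionExpression, ConsistentRead); the helper, table, index, and attribute names are hypothetical, and the partition key is assumed to be a string.

```python
def lsi_query_params(
    table_name: str,
    index_name: str,
    partition_key: str,
    partition_value: str,
    strongly_consistent: bool = True,
) -> dict:
    """Keyword arguments for a Query against a local secondary index."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": partition_key},
        "ExpressionAttributeValues": {":pk": {"S": partition_value}},
        # Request the most up-to-date data; an LSI accepts this flag.
        "ConsistentRead": strongly_consistent,
    }
```

The same request with ConsistentRead set to true against a global secondary index would be rejected, which is why the choice of index type matters for this use case.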
Multi-region replication: Amazon DynamoDB global tables: -
By default, Amazon DynamoDB replicates your data across multiple Availability Zones
in a single Region. However, there might be occasions when you want to replicate your
data across multiple Regions.
- A global table is a collection of one or more replica tables, all owned by a single
AWS account. A replica table is a single DynamoDB table that functions as a part
of a global table. Each replica stores the same set of data items. You can add
replica tables to the global table so that it can be available in additional Regions.
DynamoDB security best practices: -
The following best practices can help you anticipate and prevent security incidents in
DynamoDB:
- Use IAM roles to authenticate access.
- Use IAM policies for DynamoDB base authorization.
- Use IAM policy conditions for fine-grained access control.
- Use a VPC endpoint and policies to access DynamoDB.
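As an illustration of IAM policy conditions for fine-grained access control, the following sketch builds a policy document that limits a federated user to items whose partition key equals that user's own ID. It uses the dynamodb:LeadingKeys condition key; the chosen actions, the identity provider variable, and the table ARN in the usage note are assumptions for this example.

```python
import json


def leading_key_policy(table_arn: str) -> dict:
    """Policy allowing access only to items keyed by the caller's own user ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringEquals": {
                        # Assumes web identity federation with Login with Amazon.
                        "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                    }
                },
            }
        ],
    }


def policy_json(table_arn: str) -> str:
    """Serialize the policy for attaching to an IAM role."""
    return json.dumps(leading_key_policy(table_arn), indent=2)
```

Calling policy_json with a table ARN such as arn:aws:dynamodb:us-east-1:123456789012:table/GameScores yields a document that can be attached to the role that federated users assume.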
The following best practices for DynamoDB can help you detect potential security
weaknesses and incidents:
- Use AWS CloudTrail to monitor AWS managed KMS key usage.
- Monitor DynamoDB operations by using CloudTrail.
- Monitor DynamoDB configuration with AWS Config.
- Monitor DynamoDB compliance with AWS Config rules.
Section 4: Purpose-Built Databases: -
The evolution of purpose-built databases: -
Amazon Redshift: -
Data warehouses are relational databases that have been optimized for reporting and
analytics. This involves reading large amounts of data to understand relationships and
trends across the data.
An Amazon Redshift data warehouse is an enterprise-class relational database query
and management system.
The following are some use cases for Amazon Redshift:
- Automatically create, train, and deploy machine learning models to improve
financial and demand forecasts.
- Securely share data among accounts, organizations, and partners while building
applications on top of third-party data.
- Increase developer productivity by getting simplified data access without
configuring drivers and managing database connections.
AWS fully managed purpose-built database options: -
- Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable,
document database service that supports MongoDB workloads.
- Amazon Keyspaces (for Apache Cassandra) is a highly available and managed
Apache Cassandra–compatible database service.
- Amazon MemoryDB for Redis is a Redis-compatible, durable, in-memory
database service for ultra-fast performance.
- Amazon Neptune is a fast, reliable, fully managed graph database service that
you can use to build and run applications that work with highly connected
datasets.
- Amazon Timestream is a scalable, fully managed, fast time series database
service for IoT and operational applications.
- Amazon Quantum Ledger Database (QLDB) is a fully managed ledger database
that tracks each and every application data change and maintains a complete
and verifiable history of changes over time.
All of these databases are fully managed cloud services that help you limit the time and
cost of experimenting with and maintaining different types of databases.
Managing a database to your business need: -
Section 5: Migrating data into AWS databases: -
AWS DMS: -
AWS DMS is a web service that you can use to migrate data from a source data store to
a target data store. These two data stores are called endpoints.
AWS DMS homogeneous migration: -
Homogeneous data migrations in AWS DMS simplify the migration of self-managed,
on-premises databases or cloud databases to the equivalent engine on Amazon RDS or
Aurora.
Tools for heterogeneous database migration: -
Heterogeneous migration is migrating between source and target endpoints that use
different database engines.
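The distinction between homogeneous and heterogeneous migration can be expressed as a small classification over the two endpoints. The Endpoint type and its fields are hypothetical and are not the AWS DMS API; the engine field is assumed to hold an engine family name such as mysql or oracle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A source or target data store (hypothetical model)."""
    name: str
    engine: str  # engine family, for example "mysql", "postgresql", "oracle"


def migration_type(source: Endpoint, target: Endpoint) -> str:
    """Classify a migration by comparing the engine families of its endpoints."""
    same_engine = source.engine.lower() == target.engine.lower()
    return "homogeneous" if same_engine else "heterogeneous"
```

A migration from an on-premises MySQL database to a MySQL-compatible target is homogeneous, while Oracle to PostgreSQL is heterogeneous and typically also involves converting the schema.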
AWS DMS heterogeneous migration with AWS SCT: -
AWS DMS replicates data from a database into a data lake: -
Section 6: Applying AWS Well-Architected Framework principles to the database
layer: -