0% found this document useful (0 votes)
9 views62 pages

Redshift Data Warehouse

The document discusses modernizing data warehouses using Amazon Redshift, highlighting its architecture, features, and benefits such as reduced maintenance costs and faster data processing. It emphasizes the advantages of a lake house approach, the use of columnar storage, and advanced query acceleration with AQUA technology. Additionally, it covers data warehousing trends, migration strategies, and various technical concepts related to Amazon Redshift's functionality.

Uploaded by

Narsireddy Avula
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views62 pages

Redshift Data Warehouse

The document discusses modernizing data warehouses using Amazon Redshift, highlighting its architecture, features, and benefits such as reduced maintenance costs and faster data processing. It emphasizes the advantages of a lake house approach, the use of columnar storage, and advanced query acceleration with AQUA technology. Additionally, it covers data warehousing trends, migration strategies, and various technical concepts related to Amazon Redshift's functionality.

Uploaded by

Narsireddy Avula
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Modernize your data warehouse

Aneesh Chandra PN
Data & Analytics Specialist Solutions Architect
Amazon Web Services

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
• Modern analytics and Amazon Redshift

• Architecture and concepts

• Accelerating your data warehouse migration

• Additional resources

• Q&A
Data warehousing trends

Data
010010010
01010001
100010100

Exponential growth End-to-end insights from Migrations


of event data analyzing all of your data to the cloud
Why Amazon Redshift
Tens of thousands of customers use Amazon Redshift and process over 2 EB of data per day

Lake formation catalog & security 3x faster than other Usage-based, RIs
cloud data warehouses Predictable costs
Exabyte querying & AWS integrated
(e.g., AWS DMS, Amazon CloudWatch)

Virtually unlimited AWS-grade security (e.g., VPC, encryption Easy to provision & manage, automated
elastic linear scaling with AWS KMS, AWS CloudTrail) backups, AWS support, and 99.9% SLAs
Data warehouse
(business data)

Amazon
Redshift

Data lake
(event data)

Customers moving to data lake architectures


Amazon Redshift enables you to have a lake house approach
“ Moving to Amazon Redshift has helped us reduce our overall maintenance costs
by nearly 80% compared with our legacy data warehouse. By leveraging Amazon
Redshift Spectrum’s ability to query data directly in our Amazon S3 data lake, we
have been able to easily integrate new data sources in hours, not days or weeks.
This has not only reduced our time to insight, but it helped us control our
infrastructure costs. Amazon Redshift requires very little maintenance, to the
point where we don’t even have a dedicated administrator, and we spend less


than an hour a month on maintenance and administration.
Elliott Cordo
VP of Data Analytics
Equinox
Tens of thousands of customers use Amazon Redshift
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
PostgreSQL
OLAP

MPP

PostgreSQL Columnar
AWS Cloud

Amazon Amazon IAM Amazon


SWF VPC EC2

OLAP

MPP

PostgreSQL Columnar

Amazon S3 AWS KMS Amazon Amazon


Route 53 CloudWatch
AWS Cloud

Amazon Amazon IAM Amazon


SWF VPC EC2

OLAP

MPP

PostgreSQL Columnar

Amazon Redshift

Amazon S3 AWS KMS Amazon Amazon


Route 53 CloudWatch
Features delivered to meet customer needs
Robust result Large # of tables support Copy command support
set caching ~20,000 for ORC, Parquet
IAM role chaining Elastic resize Groups
Performance enhancements:
Amazon Redshift Spectrum: date CloudWatch
Auto- Health and performance Automatic table hash join, vacuum, window functions,
formats, scalar JSON and ION file
monitoring with support for resize ops, aggregations, console,
format support, region expansion,
predicate filtering
analyze Amazon CloudWatch distribution style WLM queues union all, efficient compile
code cache

200+
Unload ~25 query monitoring
Auto WLM AQUA (Advanced Query Accelerator)
to CSV rules (QMR) support

Resiliency of
Concurrency scaling DC1 migration to DC2
ROLLBACK processing
Autoanalyze for
Manage multi-part Spectrum Request Apply new
incremental changes
query in AWS console
on table Accelerator distribution key

Performance: Bloom filters in


Amazon Redshift
Faster classic joins, complex queries that Amazon Redshift Spectrum:
Spectrum: Row group
filtering in Parquet and
ORC, nested data support,
resize with new features in the create internal table,
communication layer
Concurrency scaling

enhanced VPC routing,


multiple partitions
optimized data
transfer protocol
past 18 months Integration with
Autovacuum sort,
autoanalyze, and
AWS Lake Formation
autotable sort
Performance: Join pushdowns
Snapshot scheduler
Auto WLM with to subquery, mixed workload
temporary tables, rank functions,
Advisor recommendations AZ64 compression Console
query priorities Stored procedures null handling in join, for distribution keys encoding redesign
single row insert
Column-level access Performance of Federated Materialized Manual pause
Spatial processing control with RA3 inter-region
AWS Lake Formation snapshot transfers query views and resume
Amazon Redshift architecture
Massively parallel, shared-nothing SQL clients/BI tools
columnar architecture JDBC/ODBC

Leader node Leader


SQL endpoint node
Stores metadata
Coordinates parallel SQL processing
Compute Compute Compute
Compute nodes node node node
Local, columnar storage
Executes queries in parallel Load
Load, unload, backup, restore Unload
Backup
Restore
Amazon Redshift Spectrum nodes
Execute queries directly against Amazon S3
Amazon S3
Amazon Redshift architecture
Massively parallel, shared-nothing SQL clients/BI tools
columnar architecture JDBC/ODBC

Leader node Leader


SQL endpoint node
Stores metadata
Coordinates parallel SQL processing
Compute Compute Compute
Compute nodes node node node
Local, columnar storage
Executes queries in parallel Load Amazon
Load, unload, backup, restore Query ... Redshift
1 2 3 4 N Spectrum
Amazon Redshift Spectrum nodes
Execute queries directly against Amazon S3
Amazon S3
Amazon Redshift evolving architecture
Massively parallel, shared-nothing
SQL clients/BI tools
columnar architecture
Leader node JDBC/ODBC

Compute nodes
Leader
Amazon Redshift Spectrum nodes
node

Amazon Redshift managed storage


Pay separately for storage and compute Compute Compute Compute
Large high-speed SSD-backed cache node node node
Automatic scaling (up to 64 TB/instance)
Supports up to 8.2 PB of cluster storage
AQUA: Advanced Query Accelerator (preview)
A new distributed and hardware-accelerated processing layer that will make Amazon Redshift
10x faster than any other cloud data warehouse without increasing cost

Compute
Compute Compute Compute
Compute
Compute Compute
Compute Compute
Compute
Clusters
Clusters
RA3
Clusters
Compute
Clusters
Clusters
RA3
Clusters
Compute
Clusters
Clusters
RA3
Clusters Minimize data movement over the network
Clusters Clusters Clusters
cluster cluster cluster by pushing down operations to AQUA nodes

AQUA nodes with custom AWS-designed analytics


processors to make operations (compression,
AQUA AQUA AQUA AQUA encryption, filtering, and aggregations) faster than
node node node node
traditional CPUs

Available in preview only with RA3 – no code


changes required

Amazon Redshift managed storage


© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Terminology and concepts: Node types
Amazon Redshift analytics – RA3 (new)
Amazon Redshift managed storage – solid-state disks + Amazon S3

Dense compute – DC2


Solid-state disks

Dense storage – DS2


Magnetic disks

Instance type Disk type Size Memory CPUs Slices


RA3 4xlarge (new) RMS Scales to 16 TB 96 GB 12 4

RA3 16xlarge (new) RMS Scales to 64 TB 384 GB 48 16

DC2 large SSD 160 GB 16 GB 2 2

DC2 8xlarge SSD 2.56 TB 244 GB 32 16

DS2 xlarge Magnetic 2 TB 32 GB 4 2

DS2 8xlarge Magnetic 16 TB 244 GB 36 16


Terminology and concepts: Columnar
Amazon Redshift uses a columnar architecture for storing data on disk

Goal: Reduce I/O for analytics queries

Physically store data on disk by column rather than row

Read only the column data that is required


Example: Columnar architecture
CREATE TABLE deep_dive (
aid INT --audience_id
aid loc dt
,loc CHAR(3) --location 1 SFO 2017-10-20
);
,dt DATE --date
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
aid loc dt

SELECT min(dt) FROM deep_dive;

Row-based storage
• Need to read everything
• Unnecessary I/O
Example: Columnar architecture
CREATE TABLE deep_dive (
aid INT --audience_id
aid loc dt
,loc CHAR(3) --location 1 SFO 2017-10-20
);
,dt DATE --date
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
aid loc dt

SELECT min(dt) FROM deep_dive;

Column-based storage
Only scan blocks for relevant column
Terminology and concepts: Compression
Goals Impact
Allow more data to be stored within Allows 2–4x more data
an Amazon Redshift cluster to be stored within the cluster
Improve query performance
by decreasing I/O

By default, COPY automatically analyzes and compresses data on first load


into an empty table

ANALYZE COMPRESSION is a built-in command that finds the optimal compression


for each column on an existing table
Compression example
CREATE TABLE deep_dive (
aid INT --audience_id
aid loc dt
,loc CHAR(3) --location 1 SFO 2017-10-20
);
,dt DATE --date
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
aid loc dt

Add 1 of 13 different encodings


to each column
Compression example
CREATE TABLE deep_dive (
aid INT ENCODE AZ64
aid loc dt
,loc CHAR(3) ENCODE BYTEDICT 1 SFO 2017-10-20
);
,dt DATE ENCODE RUNLENGTH
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
aid loc dt

• More efficient compression by


storing the same data type in
the columnar architecture
• Columns grow and shrink
independently
• Reduces storage requirements
• Reduces I/O
Terminology and concepts: Blocks
Column data is persisted to 1 MB immutable blocks

Blocks are individually encoded with 1 of 13 encodings

A full block can contain millions of values


Terminology and concepts: Zone maps
Goal
Eliminates unnecessary I/O

In-memory block metadata


• Contains per-block minimum and maximum values
• All blocks automatically have zone maps
• Effectively prunes blocks that cannot contain data for a given query
Terminology and concepts: Data sorting
Goal Impact
Make queries run faster by Enables range-restricted scans
increasing the effectiveness of to prune blocks by leveraging
zone maps and reducing I/O zone maps

Achieved with the table property SORTKEY defined on one or


more columns

Optimal sort key is dependent on:


• Query patterns
• Business requirements
• Data profile
Example: Sort key
CREATE TABLE deep_dive (
aid INT --audience_id Add a sort key to one or more
,loc CHAR(3) --location columns to physically sort
,dt DATE --date
the data on disk
) SORTKEY (dt, loc);

deep_dive
aid loc dt
1 SFO 2017-10-20

2 JFK 2017-10-20

3 SFO 2017-04-01

4 JFK 2017-05-14
Example: Sort key
CREATE TABLE deep_dive (
aid INT --audience_id Add a sort key to one or more
,loc CHAR(3) --location columns to physically sort
,dt DATE --date
the data on disk
) SORTKEY (dt, loc);

deep_dive deep_dive (sorted)


aid loc dt aid loc dt
1 SFO 2017-10-20 3 SFO 2017-04-01

2 JFK 2017-10-20 4 JFK 2017-05-14

3 SFO 2017-04-01 2 JFK 2017-10-20

4 JFK 2017-05-14 1 SFO 2017-10-20


Example: Zone maps and sorting
SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017';

Unsorted table
MIN: 01-JUNE-2017
MAX: 20-JUNE-2017

MIN: 08-JUNE-2017
MAX: 30-JUNE-2017

MIN: 12-JUNE-2017
MAX: 20-JUNE-2017

MIN: 02-JUNE-2017
MAX: 25-JUNE-2017
Example: Zone maps and sorting
SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017';

Unsorted table
MIN: 01-JUNE-2017
MAX: 20-JUNE-2017

MIN: 08-JUNE-2017
MAX: 30-JUNE-2017

MIN: 12-JUNE-2017
MAX: 20-JUNE-2017

MIN: 02-JUNE-2017
MAX: 25-JUNE-2017
Example: Zone maps and sorting
SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017';

Unsorted table Sorted by date


MIN: 01-JUNE-2017 MIN: 01-JUNE-2017
MAX: 20-JUNE-2017 MAX: 06-JUNE-2017

MIN: 08-JUNE-2017 MIN: 07-JUNE-2017


MAX: 30-JUNE-2017 MAX: 12-JUNE-2017

MIN: 12-JUNE-2017 MIN: 13-JUNE-2017


MAX: 20-JUNE-2017 MAX: 21-JUNE-2017

MIN: 02-JUNE-2017 MIN: 21-JUNE-2017


MAX: 25-JUNE-2017 MAX: 30-JUNE-2017
Example: Zone maps and sorting
SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017';

Unsorted table Sorted by date


MIN: 01-JUNE-2017 MIN: 01-JUNE-2017
MAX: 20-JUNE-2017 MAX: 06-JUNE-2017

MIN: 08-JUNE-2017 MIN: 07-JUNE-2017


MAX: 30-JUNE-2017 MAX: 12-JUNE-2017

MIN: 12-JUNE-2017 MIN: 13-JUNE-2017


MAX: 20-JUNE-2017 MAX: 21-JUNE-2017

MIN: 02-JUNE-2017 MIN: 21-JUNE-2017


MAX: 25-JUNE-2017 MAX: 30-JUNE-2017
Terminology and concepts: Slices

A slice can be thought of


as a virtual compute node Facts about slices

Unit of data partitioning Each compute node is initialized with


either 2 or 16 slices
Parallel query processing
Table rows are distributed to slices
A slice processes only its own data
Data distribution
Distribution style is a table property that KEY
dictates how that table’s data is distributed
throughout the cluster
KEY: Value is hashed, same value goes to same location (slice) Slice Slice Slice Slice
1 2 3 4
ALL: Full table data goes to the first slice of every node Node 1 Node 2

EVEN: Round robin


AUTO: Combines EVEN and ALL ALL
Goals EVEN
Distribute data evenly for
parallel processing Slice Slice Slice Slice Slice Slice Slice Slice
1 2 3 4 1 2 3 4
Minimize data movement Node 1 Node 2 Node 1 Node 2
during query processing
Data distribution example
CREATE TABLE deep_dive ( Table: deep_dive
aid INT --audience_id
,loc CHAR(3) --location User columns System columns
,dt DATE --date
) (EVEN|KEY|ALL|AUTO); aid loc dt ins del row

Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, EVEN example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE EVEN; (4, 'JFK', '2017-05-14');

Table: deep_dive
User Columns System Columns
aid loc dt ins del row

Rows: 1
Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, EVEN example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE EVEN; (4, 'JFK', '2017-05-14');

Table: deep_dive Table: deep_dive


User Columns System Columns User Columns System Columns
aid loc dt ins del row aid loc dt ins del row

Rows: 1 Rows: 1
Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, EVEN example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE EVEN; (4, 'JFK', '2017-05-14');

Table: deep_dive Table: deep_dive Table: deep_dive


User Columns System Columns User Columns System Columns User Columns System Columns
aid loc dt ins del row aid loc dt ins del row aid loc dt ins del row

Rows: 1 Rows: 1 Rows: 1


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, EVEN example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE EVEN; (4, 'JFK', '2017-05-14');

Table: deep_dive Table: deep_dive Table: deep_dive Table: deep_dive


User Columns System Columns User Columns System Columns User Columns System Columns User Columns System Columns
aid loc dt ins del row aid loc dt ins del row aid loc dt ins del row aid loc dt ins del row

Rows: 1 Rows: 1 Rows: 1 Rows: 1


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');

Rows: 0 Rows: 0 Rows: 0 Rows: 0


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');

Table: deep_dive
User Columns System Columns
aid loc dt ins del row

Rows: 1 Rows: 0 Rows: 0 Rows: 0


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');

Table: deep_dive Table: deep_dive


User Columns System Columns User Columns System Columns
aid loc dt ins del row aid loc dt ins del row

Rows: 1 Rows: 1 Rows: 0 Rows: 0


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');

Table: deep_dive Table: deep_dive


User Columns System Columns User Columns System Columns
aid loc dt ins del row aid loc dt ins del row

Rows: 2 Rows: 1 Rows: 0 Rows: 0


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');

Table: deep_dive Table: deep_dive


User Columns System Columns User Columns System Columns
aid loc dt ins del row aid loc dt ins del row

Rows: 2 Rows: 2 Rows: 0 Rows: 0


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, KEY Example #2
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (aid); (4, 'JFK', '2017-05-14');

Rows: 0 Rows: 0 Rows: 0 Rows: 0


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, KEY Example #2
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (aid); (4, 'JFK', '2017-05-14');

Table: deep_dive Table: deep_dive Table: deep_dive Table: deep_dive


User Columns System Columns User Columns System Columns User Columns System Columns User Columns System Columns
aid loc dt ins del row aid loc dt ins del row aid loc dt ins del row aid loc dt ins del row

Rows: 1 Rows: 1 Rows: 1 Rows: 1


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution, ALL example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE ALL; (4, 'JFK', '2017-05-14');

Table: deep_dive Table: deep_dive


User Columns System Columns User Columns System Columns
aid loc dt ins del row aid loc dt ins del row

Rows: 4 Rows: 0 Rows: 4 Rows: 0


Slice 0 Slice 1 Slice 2 Slice 3

Node 1 Node 2
Data distribution summary
DISTSTYLE KEY
Goals
• Optimize JOIN performance between large tables by distributing on columns used in the ON clause
• Optimize INSERT INTO SELECT performance SELECT diststyle, skew_rows
FROM svv_table_info WHERE "table" = 'deep_dive';
• Optimize GROUP BY performance
diststyle | skew_rows
The column that is being distributed on should -----------+----------- Ratio of the slice with the most
have a high cardinality and not cause row skew KEY(aid) | 1.07  and least number of rows

DISTSTYLE ALL
Goals
• Optimize JOIN performance with dimension tables
• Reduces disk usage on small tables
Small- and medium-size dimension tables (<3M rows)

DISTSTYLE EVEN
If neither KEY or ALL apply

DISTSTYLE AUTO
Default distribution combines DISTSTYLE ALL and EVEN
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS migration tooling

AWS Schema Conversion Tool (AWS SCT) converts


your commercial database and data warehouse
schemas to open-source engines or AWS native
services, such as Amazon Aurora and Amazon Redshift

AWS Database Migration Service (AWS DMS) easily


and securely migrates and/or replicates your
databases and data warehouses to AWS
AWS SCT
AWS SCT helps automate database schema
and code conversion tasks when migrating from
source to target database engines
Features
Create assessment reports for
homogeneous/heterogeneous migrations
Convert
Convert database schema
Convert data warehouse schema
Convert embedded application code
Code browser that highlights places where
manual edits are required
Source DB AWS SCT Target DB
Secure connections to your databases with SSL
Service substitutions/ETL modernization to AWS Glue
Migrate data to data warehouses using SCT
data extractors
Optimize schemas in Amazon Redshift
AWS SCT data extractors
Extract data from your data warehouse and migrate to Amazon Redshift
• Extracts data through local migration agents
• Data is optimized for Amazon Redshift and saved in local files
• Files are loaded to an Amazon S3 bucket (through network or AWS Snowball Edge)
and then to Amazon Redshift

Microsoft SQL
Server

Source DW AWS SCT Amazon Amazon


S3 bucket Redshift
AWS DMS

Migrating Migrate between on-premises and AWS

databases Migrate between databases


to AWS
Automated schema conversion

Data replication for zero


downtime migration
Legacy data warehouse migration tips
When moving from legacy row-based data warehouses
• Denormalize tables where it makes sense (predicate columns in the fact table)
• Avoid date dimension tables
• Amazon Redshift is efficient with wide tables because of columnar storage and compression

When moving from SMP legacy data warehouses


• Colocation of tables is required for fast JOINS
Leverage DIST STYLE ALL/AUTO or KEY
• Amazon Redshift is designed for big data (>100 GB to PB scale)
Smaller datasets consider Amazon Aurora PostgreSQL
• Wrap workflows in explicit transactions

Leverage Amazon Redshift stored procedures for faster migrations


PL/pgSQL stored procedures were added to make porting legacy procedures easier
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWSLabs on GitHub – Amazon Redshift
[Link]
[Link]
[Link]
Admin scripts
Collection of utilities for running diagnostics on your cluster

Admin views
Collection of utilities for managing your cluster, generating schema DDL, and so on

Analyze Vacuum utility


Utility that you can schedule to vacuum and analyze the tables within your Amazon Redshift cluster

Column Encoding utility


Utility that applies optimal column encoding to an established schema with data already loaded
AWS Big Data Blog – Amazon Redshift
Amazon Redshift ngineering’s Advanced Table Design Playbook
[Link]
playbook-preamble-prerequisites-and-prioritization/
–Zach Christopherson

Top 10 Performance Tuning Techniques for Amazon Redshift


[Link]
redshift/
–Ian Meyers and Zach Christopherson

Twelve Best Practices for Amazon Redshift Spectrum


[Link]
–Po Hong and Peter Dalton
AWS Training and Certification

Explore tailored Build cloud skills with Demonstrate expertise with Find entry-level cloud
learning paths for 550+ free digital an industry-recognized talent with AWS
customers and training courses, or dive credential Academy and AWS
partners deep with classroom re/Start
training

[Link]/training
Thank you!
Aneesh Chandra PN
nanhyama@[Link]

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

You might also like