Redshift Data Warehouse
Redshift Data Warehouse
Aneesh Chandra PN
Data & Analytics Specialist Solutions Architect
Amazon Web Services
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
• Modern analytics and Amazon Redshift
• Additional resources
• Q&A
Data warehousing trends
Data
010010010
01010001
100010100
Lake formation catalog & security 3x faster than other Usage-based, RIs
cloud data warehouses Predictable costs
Exabyte querying & AWS integrated
(e.g., AWS DMS, Amazon CloudWatch)
Virtually unlimited AWS-grade security (e.g., VPC, encryption Easy to provision & manage, automated
elastic linear scaling with AWS KMS, AWS CloudTrail) backups, AWS support, and 99.9% SLAs
Data warehouse
(business data)
Amazon
Redshift
Data lake
(event data)
”
than an hour a month on maintenance and administration.
Elliott Cordo
VP of Data Analytics
Equinox
Tens of thousands of customers use Amazon Redshift
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
PostgreSQL
OLAP
MPP
PostgreSQL Columnar
AWS Cloud
OLAP
MPP
PostgreSQL Columnar
OLAP
MPP
PostgreSQL Columnar
Amazon Redshift
200+
Unload ~25 query monitoring
Auto WLM AQUA (Advanced Query Accelerator)
to CSV rules (QMR) support
Resiliency of
Concurrency scaling DC1 migration to DC2
ROLLBACK processing
Autoanalyze for
Manage multi-part Spectrum Request Apply new
incremental changes
query in AWS console
on table Accelerator distribution key
Compute nodes
Leader
Amazon Redshift Spectrum nodes
node
Compute
Compute Compute Compute
Compute
Compute Compute
Compute Compute
Compute
Clusters
Clusters
RA3
Clusters
Compute
Clusters
Clusters
RA3
Clusters
Compute
Clusters
Clusters
RA3
Clusters Minimize data movement over the network
Clusters Clusters Clusters
cluster cluster cluster by pushing down operations to AQUA nodes
Row-based storage
• Need to read everything
• Unnecessary I/O
Example: Columnar architecture
CREATE TABLE deep_dive (
aid INT --audience_id
aid loc dt
,loc CHAR(3) --location 1 SFO 2017-10-20
);
,dt DATE --date
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
aid loc dt
Column-based storage
Only scan blocks for relevant column
Terminology and concepts: Compression
Goals Impact
Allow more data to be stored within Allows 2–4x more data
an Amazon Redshift cluster to be stored within the cluster
Improve query performance
by decreasing I/O
deep_dive
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
Example: Sort key
CREATE TABLE deep_dive (
aid INT --audience_id Add a sort key to one or more
,loc CHAR(3) --location columns to physically sort
,dt DATE --date
the data on disk
) SORTKEY (dt, loc);
Unsorted table
MIN: 01-JUNE-2017
MAX: 20-JUNE-2017
MIN: 08-JUNE-2017
MAX: 30-JUNE-2017
MIN: 12-JUNE-2017
MAX: 20-JUNE-2017
MIN: 02-JUNE-2017
MAX: 25-JUNE-2017
Example: Zone maps and sorting
SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017';
Unsorted table
MIN: 01-JUNE-2017
MAX: 20-JUNE-2017
MIN: 08-JUNE-2017
MAX: 30-JUNE-2017
MIN: 12-JUNE-2017
MAX: 20-JUNE-2017
MIN: 02-JUNE-2017
MAX: 25-JUNE-2017
Example: Zone maps and sorting
SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017';
Node 1 Node 2
Data distribution, EVEN example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE EVEN; (4, 'JFK', '2017-05-14');
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 1
Slice 0 Slice 1 Slice 2 Slice 3
Node 1 Node 2
Data distribution, EVEN example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE EVEN; (4, 'JFK', '2017-05-14');
Rows: 1 Rows: 1
Slice 0 Slice 1 Slice 2 Slice 3
Node 1 Node 2
Data distribution, EVEN example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE EVEN; (4, 'JFK', '2017-05-14');
Node 1 Node 2
Data distribution, EVEN example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE EVEN; (4, 'JFK', '2017-05-14');
Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');
Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');
Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');
Node 1 Node 2
Data distribution, KEY Example #1
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (loc); (4, 'JFK', '2017-05-14');
Node 1 Node 2
Data distribution, KEY Example #2
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (aid); (4, 'JFK', '2017-05-14');
Node 1 Node 2
Data distribution, KEY Example #2
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE KEY DISTKEY (aid); (4, 'JFK', '2017-05-14');
Node 1 Node 2
Data distribution, ALL example
CREATE TABLE deep_dive ( INSERT INTO deep_dive VALUES
aid INT --audience_id (1, 'SFO', '2016-09-01'),
,loc CHAR(3) --location (2, 'JFK', '2016-09-14'),
,dt DATE --date (3, 'SFO', '2017-04-01'),
) DISTSTYLE ALL; (4, 'JFK', '2017-05-14');
Node 1 Node 2
Data distribution summary
DISTSTYLE KEY
Goals
• Optimize JOIN performance between large tables by distributing on columns used in the ON clause
• Optimize INSERT INTO SELECT performance SELECT diststyle, skew_rows
FROM svv_table_info WHERE "table" = 'deep_dive';
• Optimize GROUP BY performance
diststyle | skew_rows
The column that is being distributed on should -----------+----------- Ratio of the slice with the most
have a high cardinality and not cause row skew KEY(aid) | 1.07 and least number of rows
DISTSTYLE ALL
Goals
• Optimize JOIN performance with dimension tables
• Reduces disk usage on small tables
Small- and medium-size dimension tables (<3M rows)
DISTSTYLE EVEN
If neither KEY or ALL apply
DISTSTYLE AUTO
Default distribution combines DISTSTYLE ALL and EVEN
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS migration tooling
Microsoft SQL
Server
Admin views
Collection of utilities for managing your cluster, generating schema DDL, and so on
Explore tailored Build cloud skills with Demonstrate expertise with Find entry-level cloud
learning paths for 550+ free digital an industry-recognized talent with AWS
customers and training courses, or dive credential Academy and AWS
partners deep with classroom re/Start
training
[Link]/training
Thank you!
Aneesh Chandra PN
nanhyama@[Link]
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.