Redshift Project Data Warehousing Guide

Uploaded by

hawk eye
Copyright
© All Rights Reserved

Questionnaire for Data Warehousing

General Information
What are the primary objectives of the data warehousing project?
Who are the key stakeholders and decision-makers for this project?

Data Volume and Format


What is the exact daily volume of incoming data?
What is the expected growth in daily data volume?
What is the expected data growth rate over the next 1-3 years?
Are there any variations in the JSON data format?
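The volume and growth answers translate directly into a storage estimate. A minimal sketch, assuming compound annual growth, a fixed 365-day year, and raw (uncompressed) sizes; all input figures are hypothetical:

```python
DAYS_PER_YEAR = 365  # simplifying assumption: ignores leap years


def projected_daily_volume_gb(current_daily_gb: float, annual_growth_rate: float, years: float) -> float:
    """Daily volume after `years` of compound growth: V_t = V_0 * (1 + r) ** t."""
    return current_daily_gb * (1 + annual_growth_rate) ** years


def raw_storage_gb(current_daily_gb: float, annual_growth_rate: float, years: int) -> float:
    """Raw data accumulated over `years`, with the daily volume stepping up once per year."""
    return sum(
        projected_daily_volume_gb(current_daily_gb, annual_growth_rate, y) * DAYS_PER_YEAR
        for y in range(years)
    )
```

For example, a hypothetical 10 GB/day growing 50% per year reaches 22.5 GB/day after two years, and the first two years accumulate 9,125 GB of raw data. Redshift's columnar compression usually reduces the stored size, and the retention policy caps how much of it is kept.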

Data Sources
What are the primary data sources?
How often is data ingested (real-time, hourly, daily)?
Are there any specific data transformation requirements?

Data Storage and Management


What is the data retention policy?
Is there a need for partitioning the data for efficient querying?
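Tables stored inside Redshift have no user-defined partitions; the closest equivalent is a sort key on a date column, which lets range-filtered queries skip blocks. Partitioning proper applies when raw data stays in Amazon S3 and is queried through Redshift Spectrum external tables. A minimal sketch, assuming a hypothetical Hive-style `year=/month=/day=` prefix layout, showing how the same layout turns a retention policy into dropping whole partitions:

```python
from datetime import date, timedelta


def partition_prefix(table: str, day: date) -> str:
    """Hive-style date prefix, e.g. 'events/year=2022/month=01/day=01/' (hypothetical layout)."""
    return f"{table}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def expired_partitions(table: str, days: list[date], today: date, retention_days: int) -> list[str]:
    """Prefixes whose day falls before the retention window and can be archived or deleted."""
    cutoff = today - timedelta(days=retention_days)
    return [partition_prefix(table, d) for d in days if d < cutoff]
```

Each prefix would still need to be registered on the external table (for example with `ALTER TABLE ... ADD PARTITION`) and dropped again once it expires.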

Data Access and Security


Who will need access to the data warehouse?
What are the security requirements for data access and encryption?
Are there any regulatory compliance requirements?

Performance and Scalability


What are the performance expectations for query execution times?
How many concurrent users or queries are expected?
How should the system scale to handle increasing data volume and user load?

Integration and API Access


Are there existing tools or applications that need to integrate with Redshift?
What are the specific requirements for the Redshift Data API and Lambda functions?
Are there any preferences or restrictions regarding serverless architecture?
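To make the Data API and Lambda question concrete, here is a minimal sketch of a Lambda function that runs a query through the Redshift Data API using boto3's `redshift-data` client (`execute_statement`, `describe_statement`, `get_statement_result`). The cluster identifier, database, Secrets Manager ARN, environment variable names, and the `transactions` table are hypothetical placeholders; the client is injectable so the logic can be exercised without AWS access:

```python
import os
import time

# Hypothetical connection settings, read from Lambda environment variables.
CLUSTER_ID = os.environ.get("REDSHIFT_CLUSTER_ID", "example-cluster")
DATABASE = os.environ.get("REDSHIFT_DATABASE", "dev")
SECRET_ARN = os.environ.get("REDSHIFT_SECRET_ARN", "example-secret-arn")


def run_statement(client, sql, poll_seconds=1.0, max_polls=60):
    """Submit SQL through the Redshift Data API and block until it finishes."""
    statement_id = client.execute_statement(
        ClusterIdentifier=CLUSTER_ID, Database=DATABASE, SecretArn=SECRET_ARN, Sql=sql
    )["Id"]
    for _ in range(max_polls):
        status = client.describe_statement(Id=statement_id)["Status"]
        if status == "FINISHED":
            return statement_id
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"statement {statement_id} ended as {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"statement {statement_id} still running after {max_polls} polls")


def lambda_handler(event, context, client=None):
    """Lambda entry point: count rows in a hypothetical `transactions` table."""
    if client is None:
        import boto3  # bundled with the AWS Lambda Python runtimes

        client = boto3.client("redshift-data")
    statement_id = run_statement(client, "SELECT COUNT(*) FROM transactions")
    records = client.get_statement_result(Id=statement_id)["Records"]
    # Each record is a list of typed field dicts, e.g. [{"longValue": 42}].
    return {"statement_id": statement_id, "row_count": records[0][0]["longValue"]}
```

For Amazon Redshift Serverless, `execute_statement` takes `WorkgroupName` instead of `ClusterIdentifier`. Polling keeps the sketch short, but the function is billed while it waits, so long-running statements may suit an asynchronous pattern better.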

Cost and Budget


What is the allocated budget for the project?
Are there any specific cost management strategies or tools in use?

Timeline and Milestones


What is the expected timeline for the project phases?
What are the critical milestones and their deadlines?

Support and Maintenance


What are the expectations for ongoing support and maintenance?
Is there a need for training sessions for the client’s team on using Redshift and related tools?
E.g., nested loops in JSON
Technical Information
Technical Specifications of Data
Data Schema Details: What are the schema details of your JSON data? Please provide examples.
Data Integrity Requirements: What are the specific data integrity constraints (e.g., foreign keys, unique constraints)?
Data Types and Sizes: What are the specific data types and sizes involved in your datasets?

Data Processing and Transformation
Transformation Logic: What specific transformations need to be applied to the data before loading into Redshift?
Data Cleaning: What are the data cleaning steps required for your datasets?
Batch vs. Streaming: Do you require batch processing, stream processing, or both?

Data Ingestion
Ingestion Tools: What tools or technologies are you considering for data ingestion?
Ingestion Performance: What are the latency requirements for data ingestion?
Error Handling: How should the system handle ingestion errors or data anomalies?

Example (Data Schema Details): { "customer_id": 123, "name": "John Doe", "transactions": [{ "date": "2022-01-01", "amount": 100.50 }] }
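The nested `transactions` array in this schema example has to be handled before or during loading. One common approach is to flatten each document into a parent row and child rows. A minimal sketch, assuming hypothetical target tables for customers and transactions with the columns shown; amounts are parsed as `Decimal` so values such as 100.50 are not turned into binary floats:

```python
import json
from decimal import Decimal


def parse_record(raw: str) -> dict:
    """Parse one JSON document, keeping decimal amounts exact."""
    return json.loads(raw, parse_float=Decimal)


def flatten_customer(record: dict) -> tuple[dict, list[dict]]:
    """Split one nested customer document into a customer row and transaction rows."""
    customer = {"customer_id": record["customer_id"], "name": record["name"]}
    transactions = [
        # Each child row carries the parent key so it can be joined back to the customer.
        {"customer_id": record["customer_id"], "date": t["date"], "amount": t["amount"]}
        for t in record.get("transactions", [])
    ]
    return customer, transactions
```

Flattening is one option; Redshift can also store the raw document in a `SUPER` column and unnest it with PartiQL queries. Which fits better depends largely on how much the JSON format varies.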
Example (Data Integrity Requirements): customer_id must be unique; [Link] cannot be null
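Redshift accepts PRIMARY KEY, UNIQUE, and FOREIGN KEY declarations but does not enforce them (the query planner treats them as informational); NOT NULL is enforced. Uniqueness therefore has to be checked in the pipeline. A minimal sketch; because the second field name in the integrity example is not legible, the required-field list is a parameter, and `name` in the test data is only a placeholder:

```python
def integrity_violations(rows, unique_key="customer_id", required=("customer_id",)):
    """Return (row_index, problem) pairs for null required fields and duplicate keys."""
    seen = set()
    problems = []
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                problems.append((i, f"{field} is null"))
        key = row.get(unique_key)
        if key is None:
            continue  # already reported above if the key is required
        if key in seen:
            problems.append((i, f"duplicate {unique_key} {key}"))
        seen.add(key)
    return problems
```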
Example (Data Types and Sizes): customer_id (integer), name (string, max 100 chars), amount (decimal(10,2))
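These types can be checked before loading so that bad rows are rejected with a clear message instead of failing a bulk load. Two Redshift details matter here: INTEGER is a 4-byte signed type, and the length of VARCHAR(n) is measured in bytes, not characters, so a 100-character limit on non-ASCII names may need a wider column. A minimal sketch, assuming rows are dicts with amounts already parsed as `Decimal`:

```python
from decimal import Decimal, InvalidOperation

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1  # range of Redshift INTEGER
NAME_MAX_BYTES = 100  # Redshift VARCHAR(n) length is in bytes, not characters
AMOUNT_PRECISION, AMOUNT_SCALE = 10, 2  # DECIMAL(10,2)


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """True if `value` fits DECIMAL(precision, scale) without rounding."""
    try:
        quantized = value.quantize(Decimal(1).scaleb(-scale))
    except InvalidOperation:  # e.g. infinity, or too many digits for the context
        return False
    if quantized != value:  # more than `scale` fractional digits, or NaN
        return False
    return len(quantized.as_tuple().digits) <= precision


def type_errors(row: dict) -> list[str]:
    """Check one row against the example column types."""
    errors = []
    cid = row.get("customer_id")
    if not isinstance(cid, int) or isinstance(cid, bool) or not INT32_MIN <= cid <= INT32_MAX:
        errors.append("customer_id must be a 32-bit integer")
    name = row.get("name")
    if not isinstance(name, str) or len(name.encode("utf-8")) > NAME_MAX_BYTES:
        errors.append(f"name must be a string of at most {NAME_MAX_BYTES} UTF-8 bytes")
    amount = row.get("amount")
    if not isinstance(amount, Decimal) or not fits_decimal(amount, AMOUNT_PRECISION, AMOUNT_SCALE):
        errors.append(f"amount must fit DECIMAL({AMOUNT_PRECISION},{AMOUNT_SCALE})")
    return errors
```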

Example (Transformation Logic): Convert timestamp fields to UTC, aggregate daily sales data into monthly totals
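The order of these two steps matters: converting to UTC before aggregating can move a sale into a different month. A minimal sketch, assuming timestamps arrive as ISO 8601 strings with an explicit offset and sales as (timestamp, amount) pairs; timestamps without an offset are rejected rather than guessed:

```python
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal


def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Assumption: naive timestamps are rejected rather than guessed.
        raise ValueError(f"timestamp has no UTC offset: {ts}")
    return dt.astimezone(timezone.utc)


def monthly_totals(sales):
    """Sum (timestamp, amount) pairs into {(year, month): total}, bucketed by UTC month."""
    totals = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for ts, amount in sales:
        utc = to_utc(ts)
        totals[(utc.year, utc.month)] += amount
    return dict(totals)
```

With this logic, a sale at 22:30 on 31 January at UTC-5 is counted in February, because it is 03:30 on 1 February in UTC. Whether monthly totals should follow UTC or a business time zone is a requirement to confirm with the client.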
Example (Data Cleaning): Remove or correct records with null customer_ids, deduplicate entries
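A minimal sketch of these two cleaning steps, assuming records are dicts and that "duplicate" means an exact repeat of the whole record (deduplicating on a business key instead is an equally plausible reading). Records with a null customer_id are dropped here; a correction step would go in the same place. Counts are returned so they can be logged:

```python
import json


def clean_records(records):
    """Drop records with a null customer_id and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned, null_ids, duplicates = [], 0, 0
    for rec in records:
        if rec.get("customer_id") is None:
            null_ids += 1  # a correction step could replace this drop
            continue
        # Canonical JSON text identifies a record regardless of key order.
        fingerprint = json.dumps(rec, sort_keys=True, default=str)
        if fingerprint in seen:
            duplicates += 1
            continue
        seen.add(fingerprint)
        cleaned.append(rec)
    return cleaned, {"null_customer_id": null_ids, "duplicates": duplicates}
```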
Example (Batch vs. Streaming): Batch processing nightly, real-time streaming for transaction data

Example (Ingestion Tools): AWS Data Pipeline for batch, Amazon Kinesis for real-time streams
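On the streaming side, the main producer-side design choice for Kinesis is the partition key. A minimal sketch of writing one transaction with boto3's `put_record`, assuming a hypothetical stream name and passing the client in so the function can be tested without AWS:

```python
import json


def send_transaction(kinesis_client, stream_name: str, txn: dict) -> dict:
    """Write one transaction record to a Kinesis data stream via put_record."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(txn, default=str).encode("utf-8"),
        # Records with the same partition key go to the same shard, so one
        # customer's transactions keep their relative order.
        PartitionKey=str(txn["customer_id"]),
    )
```

In practice this would be called as `send_transaction(boto3.client("kinesis"), "transactions-stream", txn)`; at higher volumes, `put_records` sends many records per request. Delivery into Redshift can then be handled by Kinesis Data Firehose or by Redshift streaming ingestion rather than custom consumer code.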
Example (Ingestion Performance): Batch processing within 4 hours, stream processing under 10 seconds
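One common way to hold a stream under a latency target like 10 seconds while still loading efficiently is micro-batching: records are buffered briefly and loaded in small batches, with a cap on how long any record may wait. A minimal sketch with hypothetical thresholds (500 records or 5 seconds, leaving headroom for the load itself); `flush_fn` stands for whatever loads a batch:

```python
import time


class MicroBatcher:
    """Buffer streamed records and hand them to `flush_fn` in small batches.

    A batch is flushed when it reaches `max_records` or when its oldest record
    has waited `max_wait_seconds`, which bounds the buffering delay.
    """

    def __init__(self, flush_fn, max_records=500, max_wait_seconds=5.0, clock=time.monotonic):
        self._flush_fn = flush_fn
        self._max_records = max_records
        self._max_wait = max_wait_seconds
        self._clock = clock
        self._buffer = []
        self._oldest = None  # clock reading when the current batch started

    def add(self, record):
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(record)
        if len(self._buffer) >= self._max_records:
            self.flush()
        else:
            self.poll()

    def poll(self):
        """Call periodically (e.g. from a timer) so a quiet stream still flushes on time."""
        if self._buffer and self._clock() - self._oldest >= self._max_wait:
            self.flush()

    def flush(self):
        if self._buffer:
            batch, self._buffer, self._oldest = self._buffer, [], None
            self._flush_fn(batch)
```

The class is not thread-safe; if `poll` runs on a timer thread while `add` runs elsewhere, both need to share a lock.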
Example (Error Handling): Log errors to CloudWatch, retry ingestion twice, notify via SNS for critical failures
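A minimal sketch of this policy, assuming it runs in AWS Lambda (where output from Python's `logging` module ends up in CloudWatch Logs) and that "critical" means the load still fails after both retries. The exponential backoff between attempts is an added assumption, and `load_batch` and `notify` are hypothetical callables supplied by the pipeline:

```python
import logging
import time

logger = logging.getLogger("ingestion")


def ingest_with_retry(load_batch, batch, notify, retries=2, backoff_seconds=1.0):
    """Run `load_batch(batch)`, retrying on failure and alerting once retries are exhausted."""
    attempts = retries + 1  # "retry twice" means up to three attempts in total
    for attempt in range(1, attempts + 1):
        try:
            return load_batch(batch)
        except Exception as exc:
            logger.error("ingestion attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                notify(f"ingestion failed after {attempts} attempts: {exc}")
                raise
            time.sleep(backoff_seconds * 2 ** (attempt - 1))  # assumed exponential backoff
```

`notify` could wrap an SNS call such as `boto3.client("sns").publish(TopicArn=TOPIC_ARN, Message=msg)`, with `TOPIC_ARN` a placeholder for the alert topic.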

Page 4

You might also like