Questionnaire for Data Warehousing
General Information
What are the primary objectives of the data warehousing project?
Who are the key stakeholders and decision-makers for this project?
Data Volume and Format
What is the exact daily volume of incoming data?
What is the expected growth in daily data volume?
What is the expected data growth rate over the next 1-3 years?
Are there any variations in the JSON data format?
Data Sources
What are the primary data sources?
How often is data ingested (real-time, hourly, daily)?
Are there any specific data transformation requirements?
Data Storage and Management
What is the data retention policy?
Is there a need for partitioning the data for efficient querying?
Data Access and Security
Who will need access to the data warehouse?
What are the security requirements for data access and encryption?
Are there any regulatory compliance requirements?
Performance and Scalability
What are the performance expectations for query execution times?
How many concurrent users or queries are expected?
How should the system scale to handle increasing data volume and user load?
Integration and API Access
Are there existing tools or applications that need to integrate with Redshift?
What are the specific requirements for the Redshift Data API and Lambda functions?
Are there any preferences or restrictions regarding serverless architecture?
Cost and Budget
What is the allocated budget for the project?
Are there any specific cost management strategies or tools in use?
Timeline and Milestones
What is the expected timeline for the project phases?
What are the critical milestones and their deadlines?
Support and Maintenance
What are the expectations for ongoing support and maintenance?
Is there a need for training sessions for the client’s team on using Redshift and related tools?
e.g., nested structures in JSON
Technical Information
Technical Specifications of Data
Data Schema Details: What are the schema details of your JSON data? Please provide examples.
Data Integrity Requirements: What are the specific data integrity constraints (e.g., foreign keys, unique constraints)?
Data Types and Sizes: What are the specific data types and sizes involved in your datasets?
Data Processing and Transformation
Transformation Logic: What specific transformations need to be applied to the data before loading into Redshift?
Data Cleaning: What are the data cleaning steps required for your datasets?
Batch vs. Streaming: Do you require batch processing, stream processing, or both?
Data Ingestion
Ingestion Tools: What tools or technologies are you considering for data ingestion?
Ingestion Performance: What are the latency requirements for data ingestion?
Error Handling: How should the system handle ingestion errors or data anomalies?
Example (Data Schema Details): { "customer_id": 123, "name": "John Doe", "transactions": [{ "date": "2022-01-01", "amount": 100.50 }] }
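A nested record like the one above usually has to be flattened before it fits a relational table. The following is a minimal sketch of one way to do that, assuming each transaction should become its own row; the helper name flatten_transactions is hypothetical and not part of any tool named in this questionnaire.

```python
import json

# Sample record copied from the schema example above.
record = json.loads(
    '{"customer_id": 123, "name": "John Doe", '
    '"transactions": [{"date": "2022-01-01", "amount": 100.50}]}'
)


def flatten_transactions(rec):
    """Return one flat row per nested transaction (hypothetical helper)."""
    rows = []
    for txn in rec.get("transactions", []):
        rows.append({
            "customer_id": rec["customer_id"],
            "name": rec["name"],
            "date": txn["date"],
            "amount": txn["amount"],
        })
    return rows


rows = flatten_transactions(record)
```

A customer with no transactions produces no rows in this sketch; whether such customers should still be loaded is a question for the client.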
Example (Data Integrity Requirements): customer_id must be unique; [Link] cannot be null
Example (Data Types and Sizes): customer_id (integer), name (string, max 100 chars), amount (decimal, 10,2)
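The type and size rules in the example above could be checked before loading, roughly as sketched below. This assumes flat rows with the three fields from the example; validate_row and its messages are hypothetical. Note that the length check counts characters, while a Redshift VARCHAR length is measured in bytes, so multi-byte text may need a stricter limit.

```python
from decimal import Decimal, InvalidOperation

MAX_NAME_LEN = 100
# DECIMAL(10,2) leaves 8 digits before the decimal point.
AMOUNT_LIMIT = Decimal(10) ** 8


def validate_row(row):
    """Return a list of problems found in one flat row (hypothetical helper)."""
    problems = []

    cid = row.get("customer_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(cid, int) or isinstance(cid, bool):
        problems.append("customer_id must be an integer")

    name = row.get("name")
    if not isinstance(name, str) or len(name) > MAX_NAME_LEN:
        problems.append("name must be a string of at most 100 characters")

    try:
        amount = Decimal(str(row.get("amount")))
        if not amount.is_finite():
            raise InvalidOperation
        # Too large, or more than two decimal places.
        if abs(amount) >= AMOUNT_LIMIT or amount != amount.quantize(Decimal("0.01")):
            problems.append("amount must fit DECIMAL(10,2)")
    except InvalidOperation:
        problems.append("amount must be a decimal number")

    return problems


ok = validate_row({"customer_id": 123, "name": "John Doe", "amount": 100.50})
bad = validate_row({"customer_id": "123", "name": "x" * 101, "amount": "1.234"})
```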
Example (Transformation Logic): Convert timestamp fields to UTC, aggregate daily sales data into monthly totals
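The two transformations named above can be sketched as follows. This is only an illustration, assuming timestamps arrive as ISO 8601 strings with a UTC offset and amounts arrive as strings; the function names and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal


def to_utc(ts):
    """Parse an ISO 8601 timestamp with an offset and convert it to UTC."""
    return datetime.fromisoformat(ts).astimezone(timezone.utc)


def monthly_totals(daily_sales):
    """Sum (timestamp, amount) pairs into totals keyed by 'YYYY-MM' in UTC."""
    totals = defaultdict(Decimal)  # Decimal() starts each month at 0
    for ts, amount in daily_sales:
        month = to_utc(ts).strftime("%Y-%m")
        totals[month] += Decimal(amount)
    return dict(totals)


# Hypothetical input; amounts are strings to avoid float rounding.
daily_sales = [
    ("2022-01-31T20:00:00-05:00", "100.50"),  # falls on 2022-02-01 in UTC
    ("2022-01-15T09:00:00+00:00", "20.00"),
    ("2022-02-10T12:00:00+01:00", "5.25"),
]
totals = monthly_totals(daily_sales)
```

The first sale shows why the order of the two steps matters: converting to UTC first moves it from January into February, which changes both monthly totals.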
Example (Data Cleaning): Remove or correct records with null customer_ids, deduplicate entries
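A minimal sketch of the cleaning steps above, assuming flat records with hashable values and treating only exact copies as duplicates. It removes records with a null customer_id rather than correcting them; clean_records is a hypothetical helper.

```python
def clean_records(records):
    """Drop records with a null customer_id, then remove exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec.get("customer_id") is None:
            continue  # missing or null customer_id: discard
        key = tuple(sorted(rec.items()))  # order-independent fingerprint
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Hypothetical sample data.
records = [
    {"customer_id": 1, "name": "A"},
    {"customer_id": None, "name": "B"},
    {"customer_id": 1, "name": "A"},
    {"customer_id": 2, "name": "C"},
]
cleaned = clean_records(records)
```

If duplicates should instead be detected on a business key such as customer_id alone, the fingerprint would be built from that key only.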
Example (Batch vs. Streaming): Batch processing nightly, real-time streaming for transaction data
Example (Ingestion Tools): AWS Data Pipeline for batch, Amazon Kinesis for real-time streams
Example (Ingestion Performance): Batch processing within 4 hours, stream processing under 10 seconds
Example (Error Handling): Log errors to CloudWatch, retry ingestion twice, notify via SNS for critical failures
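The error-handling policy above could look roughly like the sketch below. To keep it self-contained, Python's logging module stands in for CloudWatch and a plain callback stands in for SNS; both are assumptions, as are the names ingest_with_retry and flaky_ingest. In a real deployment the callback might wrap an SNS publish call.

```python
import logging

logger = logging.getLogger("ingestion")


def ingest_with_retry(ingest, batch, notify, max_retries=2):
    """Run ingest(batch); retry up to max_retries times, then notify and re-raise."""
    attempts = max_retries + 1  # one initial attempt plus the retries
    for attempt in range(1, attempts + 1):
        try:
            return ingest(batch)
        except Exception as exc:
            logger.error("Ingestion attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                # Critical failure: all attempts exhausted.
                notify(f"Ingestion failed after {attempts} attempts: {exc}")
                raise


# Hypothetical ingest function that fails twice, then succeeds.
calls = {"count": 0}


def flaky_ingest(batch):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return len(batch)


alerts = []
result = ingest_with_retry(flaky_ingest, [1, 2, 3], alerts.append)
```

In this run the third attempt succeeds, so nothing is sent to the notification callback. A production version would normally wait between attempts, which is omitted here.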