Snowflake — Quick Reference Notes
Cloud Data Warehouse
Architecture · DDL/DML · Warehouses · Loading · Streams · Tasks · Semi-Structured · Security
Index
# Topic What's Covered
1 What is Snowflake? Architecture overview, cloud platforms, editions
2 Snowflake Architecture Storage, compute, cloud services layers, virtual warehouses
3 Database Objects Hierarchy Organization → Account → Database → Schema → Table/View
4 Data Types Numeric, string, date/time, semi-structured, boolean, variant
5 DDL — Tables & Schema CREATE, ALTER, DROP, CLONE, TRANSIENT, TEMPORARY
tables
6 DML — Query & Modify Data SELECT, INSERT, UPDATE, DELETE, MERGE, TRUNCATE
7 Virtual Warehouses Sizes, auto-suspend, auto-resume, multi-cluster warehouses
8 Data Loading — COPY INTO Stages (internal/external), file formats, COPY INTO syntax
9 Streams & Change Data Stream types, consuming streams, DML change tracking
Capture
10 Tasks & Orchestration Creating tasks, scheduling, DAG of tasks, serverless tasks
11 Semi-Structured Data VARIANT, ARRAY, OBJECT, FLATTEN, querying JSON/Parquet
12 Time Travel & Fail-safe AT/BEFORE syntax, undrop, cloning from history
13 Cloning Zero-copy clone — tables, schemas, databases
14 Access Control & Security RBAC, roles, privileges, row-level security, column masking
15 Performance & Optimization Clustering, result cache, query profile, pruning, materialized
views
16 Snowflake Functions String, date, numeric, conditional, semi-structured, window
functions
17 Snowpipe — Continuous Auto-ingest, event notifications, pipe status
Loading
18 Stored Procedures & UDFs JavaScript/Python/SQL procs, scalar and table UDFs
19 Data Sharing & Marketplace Secure data sharing, reader accounts, data marketplace
20 Best Practices Warehouse sizing, clustering keys, cost control, naming
conventions
1. What is Snowflake?
Snowflake is a fully managed cloud data warehouse (SaaS) with a unique multi-cluster shared data
architecture. It separates storage and compute, allowing each to scale independently.
Feature Details
Type Cloud-native SaaS data warehouse
Cloud platforms AWS, Azure, Google Cloud
Language ANSI SQL + Snowflake extensions
Scaling Storage and compute scale independently
Concurrency Multi-cluster warehouses handle thousands of concurrent queries
Semi-structured Native support for JSON, Avro, Parquet, ORC, XML via VARIANT
Editions Standard → Enterprise → Business Critical → Virtual Private
2. Snowflake Architecture
Snowflake has three distinct layers that operate independently — this is what makes it unique from
traditional data warehouses.
Layer What it does Managed by
Cloud Services Authentication, metadata, query Snowflake (always on)
parsing, optimization, access
control
Query Processing Virtual Warehouses — MPP You — create/size/suspend
(Compute) clusters that execute queries
Database Storage Compressed columnar micro- Snowflake (automatic)
partitions on cloud object
storage (S3/Blob/GCS)
Virtual Warehouse Sizes
Size Credits/hr Use case
X-Small (XS) 1 Dev, testing, small queries
Small (S) 2 Light ETL, small reports
Medium (M) 4 Standard analytics workloads
Large (L) 8 Complex queries, mid-size ETL
X-Large (XL) 16 Heavy ETL, large dataset queries
2X-Large 32 Very heavy workloads
4X-Large 64 Massive data processing
6X-Large 128 Largest ETL / high concurrency
💡 Bigger warehouse = faster query (more compute). Cost = credits × credits_per_hour × credit_price. Auto-
suspend when idle!
3. Database Objects Hierarchy
Organization -- optional top-level grouping
└── Account -- your Snowflake account
└── Database -- logical container
└── Schema -- groups related objects
├── Table -- structured data
├── View -- saved SELECT query
├── Stage -- file location for loading
├── File Format -- parsing rules for files
├── Sequence -- auto-increment numbers
├── Stream -- CDC change log
├── Task -- scheduled SQL job
├── Pipe -- continuous data loading
├── Stored Proc -- programmatic logic
└── UDF -- user-defined function
USE — switch context
USE ROLE SYSADMIN;
USE WAREHOUSE COMPUTE_WH;
USE DATABASE MY_DB;
USE SCHEMA MY_DB.PUBLIC;
4. Data Types
Category Data Types
Numeric NUMBER(p,s), DECIMAL, INT, BIGINT, SMALLINT, TINYINT,
FLOAT, DOUBLE, REAL
String VARCHAR(n), CHAR(n), STRING, TEXT — all stored as VARCHAR
internally
Boolean BOOLEAN — TRUE / FALSE / NULL
Date & Time DATE, TIME, DATETIME, TIMESTAMP, TIMESTAMP_LTZ,
TIMESTAMP_NTZ, TIMESTAMP_TZ
Semi-Structured VARIANT, OBJECT, ARRAY — store JSON, Avro, Parquet natively
Geo GEOGRAPHY, GEOMETRY — spatial data (WGS84)
Binary BINARY, VARBINARY
VARIANT — the universal semi-structured type
-- VARIANT can hold JSON, array, object, or any primitive
CREATE TABLE events (
event_id INT,
payload VARIANT -- store raw JSON here
);
-- Insert JSON
INSERT INTO events SELECT 1, PARSE_JSON('{"user":"alice","action":"login"}');
-- Query VARIANT fields with colon notation
SELECT payload:user::VARCHAR AS user_name,
payload:action::VARCHAR AS action
FROM events;
5. DDL — Tables & Schema
CREATE DATABASE / SCHEMA
CREATE DATABASE IF NOT EXISTS analytics;
CREATE SCHEMA IF NOT EXISTS [Link];
ALTER DATABASE analytics RENAME TO analytics_v2;
DROP DATABASE IF EXISTS old_db;
CREATE TABLE types
-- Permanent table (default) — survives cluster restarts
CREATE TABLE orders (
order_id NUMBER AUTOINCREMENT PRIMARY KEY,
customer_id NUMBER NOT NULL,
status VARCHAR(50) DEFAULT 'pending',
amount DECIMAL(12,2),
order_date DATE,
payload VARIANT
);
-- Transient table — no Fail-safe, lower storage cost
CREATE TRANSIENT TABLE staging_orders LIKE orders;
-- Temporary table — session-scoped, dropped on disconnect
CREATE TEMPORARY TABLE temp_calc AS
SELECT * FROM orders WHERE order_date = CURRENT_DATE;
-- Create table from query
CREATE TABLE orders_2024 AS
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
ALTER TABLE
ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0;
ALTER TABLE orders DROP COLUMN discount;
ALTER TABLE orders RENAME COLUMN status TO order_status;
ALTER TABLE orders ALTER COLUMN amount SET NOT NULL;
ALTER TABLE orders CLUSTER BY (order_date, customer_id);
ALTER TABLE orders SWAP WITH orders_backup; -- atomic table swap
Views
-- Standard view
CREATE OR REPLACE VIEW v_active_orders AS
SELECT * FROM orders WHERE status != 'cancelled';
-- Secure view — hides definition from non-owners
CREATE OR REPLACE SECURE VIEW v_sensitive_orders AS
SELECT order_id, amount FROM orders;
-- Materialized view — pre-computed, auto-refreshed
CREATE OR REPLACE MATERIALIZED VIEW mv_daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM orders GROUP BY order_date;
6. DML — Query & Modify Data
SELECT with Snowflake extras
-- QUALIFY — filter window function results (no subquery needed)
SELECT order_id, customer_id, amount,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rn
FROM orders
QUALIFY rn = 1; -- top order per customer
-- SAMPLE — random row sampling
SELECT * FROM orders SAMPLE (10); -- 10% of rows
SELECT * FROM orders SAMPLE (1000 ROWS); -- exactly 1000 rows
-- TOP (alternative to LIMIT)
SELECT TOP 100 * FROM orders ORDER BY amount DESC;
MERGE — upsert
MERGE INTO orders AS target
USING (SELECT * FROM orders_staging) AS source
ON target.order_id = source.order_id
WHEN MATCHED AND [Link] != [Link] THEN
UPDATE SET [Link] = [Link],
[Link] = [Link]
WHEN NOT MATCHED THEN
INSERT (order_id, customer_id, status, amount)
VALUES (source.order_id, source.customer_id, [Link], [Link]);
MULTI-TABLE INSERT
INSERT ALL
WHEN status = 'shipped' THEN INTO orders_shipped
WHEN status = 'cancelled' THEN INTO orders_cancelled
ELSE INTO orders_other
SELECT * FROM orders_staging;
7. Virtual Warehouses
Create & Manage
CREATE WAREHOUSE IF NOT EXISTS transform_wh
WAREHOUSE_SIZE = 'MEDIUM'
AUTO_SUSPEND = 60 -- suspend after 60s idle
AUTO_RESUME = TRUE -- auto-start on query
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 3 -- multi-cluster: scale out
SCALING_POLICY = 'ECONOMY' -- or STANDARD
COMMENT = 'ETL warehouse';
ALTER WAREHOUSE transform_wh SUSPEND;
ALTER WAREHOUSE transform_wh RESUME;
ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'LARGE';
DROP WAREHOUSE IF EXISTS old_wh;
SHOW WAREHOUSES;
Multi-Cluster Warehouses
Policy Behavior
STANDARD Starts new clusters immediately when queries queue up
ECONOMY Waits 6 minutes before starting a new cluster — saves credits
💡 Use separate warehouses per workload type (ETL, reporting, ad-hoc) to isolate performance and control costs.
8. Data Loading — COPY INTO
Stage types
Stage Type Description Example path
User stage Auto-created per user, private @~ (tilde)
Table stage Auto-created per table @%table_name
Named internal stage Explicitly created, shared @my_stage
Named external stage Points to S3 / Azure Blob / GCS @s3_stage
Create a Named Stage
-- Internal named stage
CREATE STAGE my_internal_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"');
-- External stage — S3
CREATE STAGE s3_raw_stage
URL = 's3://my-bucket/raw/'
CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
FILE_FORMAT = (TYPE = 'PARQUET');
-- External stage with storage integration (preferred — no keys)
CREATE STAGE s3_secure_stage
URL = 's3://my-bucket/raw/'
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
File Formats
CREATE FILE FORMAT csv_format
TYPE = 'CSV'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ('NULL', 'null', '')
EMPTY_FIELD_AS_NULL = TRUE;
CREATE FILE FORMAT json_format
TYPE = 'JSON'
STRIP_OUTER_ARRAY = TRUE; -- unwrap top-level array
CREATE FILE FORMAT parquet_format
TYPE = 'PARQUET'
SNAPPY_COMPRESSION = TRUE;
COPY INTO — load data
-- Load from stage into table
COPY INTO orders
FROM @s3_raw_stage/orders/
FILE_FORMAT = (FORMAT_NAME = 'csv_format')
PATTERN = '.*orders.*\.csv'
ON_ERROR = 'CONTINUE' -- CONTINUE / SKIP_FILE / ABORT_STATEMENT
PURGE = TRUE; -- delete staged files after load
-- Load specific columns
COPY INTO orders (order_id, customer_id, amount)
FROM (SELECT $1, $2, $4 FROM @my_stage/[Link]);
-- Validate without loading
COPY INTO orders FROM @my_stage VALIDATION_MODE = 'RETURN_ERRORS';
COPY INTO — unload data
-- Export table to stage
COPY INTO @s3_raw_stage/exports/orders_
FROM (SELECT * FROM orders WHERE order_date = CURRENT_DATE)
FILE_FORMAT = (TYPE = 'PARQUET')
OVERWRITE = TRUE
MAX_FILE_SIZE = 104857600; -- 100 MB per file
9. Streams & Change Data Capture
A Stream records DML changes (INSERT, UPDATE, DELETE) made to a table since the stream was
last consumed. Used for CDC and incremental processing.
Stream types
Type Tracks
Standard (default) All DML — inserts, updates, deletes
Append-only Only INSERT operations — no updates/deletes
Insert-only Only inserts — for external tables
Create & consume a stream
-- Create stream on a table
CREATE STREAM orders_stream
ON TABLE orders
APPEND ONLY = FALSE; -- standard stream
-- Stream metadata columns added automatically:
-- METADATA$ACTION -- 'INSERT' or 'DELETE'
-- METADATA$ISUPDATE -- TRUE if row is part of an UPDATE
-- METADATA$ROW_ID -- unique row identifier
-- Query stream (shows changes since last consumption)
SELECT * FROM orders_stream;
-- Consume stream — process into target table
-- Stream is cleared only when consumed inside a DML transaction
INSERT INTO orders_processed
SELECT order_id, customer_id, amount, METADATA$ACTION AS change_type
FROM orders_stream
WHERE METADATA$ACTION = 'INSERT';
-- Check if stream has new data
SELECT SYSTEM$STREAM_HAS_DATA('orders_stream');
10. Tasks & Orchestration
Tasks are scheduled SQL statements (or stored procedure calls). They can be chained into a DAG for
complex orchestration.
Create a Task
-- Scheduled task — runs every 15 minutes
CREATE TASK load_orders_task
WAREHOUSE = 'TRANSFORM_WH'
SCHEDULE = '15 MINUTE' -- or USING CRON '0 * * * * UTC'
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream') -- conditional
AS
INSERT INTO orders_processed
SELECT * FROM orders_stream WHERE METADATA$ACTION = 'INSERT';
-- Serverless task (Snowflake manages compute)
CREATE TASK serverless_task
USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'MEDIUM'
SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
CALL my_stored_procedure();
Task DAG — chaining tasks
-- Root task (has SCHEDULE)
CREATE TASK root_task
WAREHOUSE = 'TRANSFORM_WH'
SCHEDULE = '1 HOUR'
AS SELECT 1; -- just a trigger
-- Child task (depends on root_task)
CREATE TASK child_task_1
WAREHOUSE = 'TRANSFORM_WH'
AFTER root_task -- runs after root_task succeeds
AS CALL load_staging();
-- Grandchild task
CREATE TASK child_task_2
WAREHOUSE = 'TRANSFORM_WH'
AFTER child_task_1
AS CALL transform_to_mart();
-- Tasks are suspended by default — must RESUME
ALTER TASK child_task_2 RESUME;
ALTER TASK child_task_1 RESUME;
ALTER TASK root_task RESUME; -- resume root last
ALTER TASK root_task SUSPEND;
EXECUTE TASK root_task; -- manual trigger
SHOW TASKS;
11. Semi-Structured Data
Snowflake natively stores JSON, Avro, Parquet, ORC, and XML in VARIANT columns — no schema
needed upfront.
Querying JSON — colon notation
-- Sample data: payload = {"user": {"name": "Alice", "age": 30}, "tags":
["vip","new"]}
SELECT
payload:user:name::VARCHAR AS user_name,
payload:user:age::INT AS user_age,
payload:tags[0]::VARCHAR AS first_tag, -- array index
GET(payload, 'user') AS user_obj,
GET_PATH(payload, '[Link]') AS nested_name
FROM events;
FLATTEN — expand arrays
-- Explode array into rows
SELECT
e.event_id,
[Link]::VARCHAR AS tag
FROM events e,
LATERAL FLATTEN(INPUT => [Link]:tags) f;
-- Flatten nested object
SELECT
[Link] AS field_name,
[Link] AS field_value
FROM events,
LATERAL FLATTEN(INPUT => payload, RECURSIVE => TRUE) f;
Build semi-structured data
-- Build OBJECT
SELECT OBJECT_CONSTRUCT('name', first_name, 'id', customer_id) AS customer_obj
FROM customers;
-- Build ARRAY
SELECT ARRAY_CONSTRUCT(1, 2, 3) AS my_array;
SELECT ARRAY_AGG(product_id) AS product_list FROM order_items;
-- Parse and serialize
SELECT PARSE_JSON('{"a":1}') AS parsed;
SELECT TO_JSON(payload) AS json_string FROM events;
SELECT TYPEOF(payload:user) AS data_type FROM events;
12. Time Travel & Fail-safe
Time Travel lets you query data as it existed at a past point in time, or restore accidentally
dropped/modified data.
Time Travel — AT / BEFORE
-- Query table as it was 1 hour ago
SELECT * FROM orders AT (OFFSET => -3600);
-- Query at a specific timestamp
SELECT * FROM orders AT (TIMESTAMP => '2024-06-01 09:00:00'::TIMESTAMP);
-- Query before a specific statement ran
SELECT * FROM orders BEFORE (STATEMENT => '8e5d0ca9-005e-44e6-b858-a8f5b37c5726');
-- Restore accidentally deleted rows
INSERT INTO orders
SELECT * FROM orders BEFORE (STATEMENT => LAST_QUERY_ID());
-- Restore a dropped table
UNDROP TABLE orders;
UNDROP SCHEMA my_schema;
UNDROP DATABASE my_db;
-- Clone table from point in time
CREATE TABLE orders_restored CLONE orders
AT (TIMESTAMP => '2024-06-01 08:00:00'::TIMESTAMP);
Time Travel retention periods
Table type Max retention
Permanent table — Standard 1 day
edition
Permanent table — Enterprise 90 days (configurable)
edition
Transient table 1 day max
Temporary table 1 day max (session only)
Feature Time Travel Fail-safe
Purpose Self-service recovery by user Snowflake-managed disaster recovery
Retention 0–90 days (you set it) 7 days (fixed, after time travel expires)
Accessible by User (SQL queries) Snowflake Support only
13. Cloning
Zero-copy cloning creates an instant copy of a table, schema, or database — no data is duplicated.
Clones share micro-partitions with the source until data diverges.
Syntax
-- Clone a table (instant, no storage cost initially)
CREATE TABLE orders_backup CLONE orders;
-- Clone a schema
CREATE SCHEMA analytics.staging_backup CLONE [Link];
-- Clone a database
CREATE DATABASE analytics_dev CLONE analytics;
-- Clone from Time Travel
CREATE TABLE orders_june1 CLONE orders
AT (TIMESTAMP => '2024-06-01 00:00:00'::TIMESTAMP);
-- Clone a stage
CREATE STAGE my_stage_backup CLONE my_stage;
💡 Cloning is perfect for dev/test environments — clone prod database instantly with zero storage cost at clone
time.
14. Access Control & Security
Snowflake uses Role-Based Access Control (RBAC). Users are granted roles, and roles are granted
privileges on objects.
Built-in system roles
Role Access level
ACCOUNTADMIN Full account control — billing, users, top-level objects. Use sparingly.
SYSADMIN Create and manage databases, warehouses, schemas, tables
SECURITYADMIN Create and manage users and roles
USERADMIN Create users and roles (subset of SECURITYADMIN)
PUBLIC Default role every user gets — minimal privileges
Create roles & grant privileges
-- Create a role
CREATE ROLE analyst_role;
CREATE ROLE transformer_role;
-- Grant object privileges to role
GRANT USAGE ON DATABASE analytics TO ROLE analyst_role;
GRANT USAGE ON SCHEMA [Link] TO ROLE analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA [Link] TO ROLE analyst_role;
GRANT SELECT ON FUTURE TABLES IN SCHEMA [Link] TO ROLE analyst_role;
-- Grant warehouse usage
GRANT USAGE ON WAREHOUSE reporting_wh TO ROLE analyst_role;
-- Assign role to user
GRANT ROLE analyst_role TO USER alice;
-- Role hierarchy — analyst_role inherits transformer_role privileges
GRANT ROLE transformer_role TO ROLE analyst_role;
-- Revoke
REVOKE SELECT ON TABLE orders FROM ROLE analyst_role;
Row-Level Security — Row Access Policy
-- Policy: users only see rows matching their region
CREATE ROW ACCESS POLICY region_policy
AS (row_region VARCHAR) RETURNS BOOLEAN ->
row_region = CURRENT_ROLE() -- simple example
OR CURRENT_ROLE() = 'ACCOUNTADMIN';
-- Attach to table
ALTER TABLE orders
ADD ROW ACCESS POLICY region_policy ON (region);
Dynamic Data Masking
-- Mask email for non-admin roles
CREATE MASKING POLICY email_mask
AS (val VARCHAR) RETURNS VARCHAR ->
CASE
WHEN CURRENT_ROLE() IN ('ACCOUNTADMIN', 'ANALYST') THEN val
ELSE REGEXP_REPLACE(val, '.+@', '*****@')
END;
-- Apply to column
ALTER TABLE customers
MODIFY COLUMN email SET MASKING POLICY email_mask;
15. Performance & Optimization
Query result cache
-- Results cached for 24 hours at Cloud Services layer
-- Exact same query = instant result, zero warehouse cost
-- Cache invalidated when underlying data changes
-- Check if last query used cache:
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
-- In query profile: look for 'Query Result Reuse'
Micro-partition pruning
-- Snowflake stores metadata (min/max) per micro-partition
-- Queries on filtered columns prune irrelevant partitions
-- GOOD — prunes by date (common filter column)
SELECT * FROM orders WHERE order_date = '2024-06-01';
-- CLUSTERING KEY — explicitly define pruning column for large tables
ALTER TABLE orders CLUSTER BY (order_date);
ALTER TABLE orders CLUSTER BY (order_date, customer_id);
-- Check clustering info
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');
Search Optimization Service
-- Accelerates point lookups on large tables (equality/IN queries)
ALTER TABLE orders ADD SEARCH OPTIMIZATION;
ALTER TABLE orders DROP SEARCH OPTIMIZATION;
-- Check search optimization status
SHOW TABLES LIKE 'orders';
Query Profile — how to analyze
Metric to check What it means
Partitions scanned vs total High ratio = poor pruning — add cluster key
Spillage to local/remote disk Warehouse too small — increase size
Bytes sent over network Large data shuffle — optimize JOIN order
Query result reuse Cache hit — free query
Compilation time Long compile = complex query or metadata issue
16. Snowflake Functions
Conditional Functions
IFF(condition, true_val, false_val) -- single if/else
IFF(amount > 100, 'high', 'low')
CASE WHEN ... THEN ... ELSE ... END -- multi-branch
COALESCE(col1, col2, 'default') -- first non-null
NULLIF(col, 0) -- returns NULL if col = 0
NVL(col, 'fallback') -- alias for COALESCE
NVL2(col, 'not null val', 'null val')
ZEROIFNULL(col) -- NULL → 0
DECODE(col, 'a', 1, 'b', 2, 0) -- like CASE for equality
String Functions
UPPER(col) / LOWER(col) / INITCAP(col)
TRIM(col) / LTRIM(col) / RTRIM(col)
LENGTH(col) / SUBSTR(col, start, len)
CONCAT(a, b, c) or a || b || c
REPLACE(col, 'old', 'new')
SPLIT(col, ',') -- returns ARRAY
SPLIT_PART(col, ',', 1) -- 1-indexed part
CONTAINS(col, 'substr') -- returns BOOLEAN
STARTSWITH(col, 'prefix') / ENDSWITH(col, 'suffix')
REGEXP_LIKE(col, 'pattern')
REGEXP_REPLACE(col, 'pattern', 'replacement')
REGEXP_SUBSTR(col, 'pattern', 1, 1)
LPAD(col, 10, '0') / RPAD(col, 10, ' ')
Date & Time Functions
CURRENT_DATE -- today's date
CURRENT_TIMESTAMP -- current datetime with tz
SYSDATE() -- alias for CURRENT_TIMESTAMP
DATEADD(day, 7, order_date) -- add 7 days
DATEDIFF(day, start_date, end_date) -- diff in days
DATE_TRUNC('month', order_date) -- truncate to month
EXTRACT(year FROM order_date) -- or YEAR(order_date)
TO_DATE('2024-06-01', 'YYYY-MM-DD')
TO_TIMESTAMP('2024-06-01 10:00:00')
CONVERT_TIMEZONE('UTC', 'Asia/Kolkata', ts_col) -- tz conversion
LAST_DAY(order_date) -- last day of month
Window Functions (same as standard SQL)
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)
RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
DENSE_RANK() OVER (...)
LAG(amount, 1) OVER (ORDER BY order_date) -- previous row
LEAD(amount, 1) OVER (ORDER BY order_date) -- next row
SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS
running_total
-- Snowflake-specific: QUALIFY to filter window results
SELECT * FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) = 1;
RATIO_TO_REPORT(amount) OVER (PARTITION BY dept) -- pct of group total
17. Snowpipe — Continuous Loading
Snowpipe continuously loads data into tables as new files arrive in a stage — no need to manually run
COPY INTO.
Create a Pipe
CREATE PIPE orders_pipe
AUTO_INGEST = TRUE -- trigger via cloud event notification
AS
COPY INTO orders
FROM @s3_raw_stage/orders/
FILE_FORMAT = (FORMAT_NAME = 'csv_format');
-- Get the SQS ARN for S3 event notification setup:
SHOW PIPES;
SELECT SYSTEM$PIPE_STATUS('orders_pipe');
-- Manually refresh (for files already in stage)
ALTER PIPE orders_pipe REFRESH;
-- Pause / resume
ALTER PIPE orders_pipe SET PIPE_EXECUTION_PAUSED = TRUE;
ALTER PIPE orders_pipe SET PIPE_EXECUTION_PAUSED = FALSE;
💡 Snowpipe uses serverless compute — no warehouse needed. Billed per file/credit based on usage.
18. Stored Procedures & UDFs
Stored Procedure — JavaScript
CREATE OR REPLACE PROCEDURE update_order_status(order_id_param INT)
RETURNS STRING
LANGUAGE JAVASCRIPT
AS $$
var sql = `UPDATE orders SET status = 'processed'
WHERE order_id = ${ORDER_ID_PARAM}`;
[Link]({sqlText: sql});
return 'Done';
$$;
CALL update_order_status(12345);
Stored Procedure — Snowflake Scripting (SQL)
CREATE OR REPLACE PROCEDURE refresh_marts()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
CREATE OR REPLACE TABLE marts.fct_orders AS
SELECT * FROM staging.stg_orders WHERE status = 'delivered';
CREATE OR REPLACE TABLE marts.dim_customers AS
SELECT DISTINCT customer_id, customer_name FROM staging.stg_orders;
RETURN 'Marts refreshed';
END;
$$;
CALL refresh_marts();
UDF — Scalar
-- SQL UDF
CREATE OR REPLACE FUNCTION cents_to_dollars(cents NUMBER)
RETURNS NUMBER(12,2)
AS 'cents / 100.0';
-- Usage
SELECT cents_to_dollars(amount_cents) FROM orders;
-- JavaScript UDF
CREATE OR REPLACE FUNCTION parse_domain(email VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS $$
return [Link]('@')[1];
$$;
UDTF — Table Function
-- Returns multiple rows per input row
CREATE OR REPLACE FUNCTION split_to_rows(val VARCHAR, delim VARCHAR)
RETURNS TABLE (item VARCHAR)
LANGUAGE JAVASCRIPT
AS $$
{
processRow: function(row, rowWriter) {
[Link]([Link]).forEach(item => [Link]({ITEM:
[Link]()}));
}
}
$$;
SELECT * FROM TABLE(split_to_rows('a,b,c', ','));
19. Data Sharing & Marketplace
Snowflake allows sharing live data between accounts with zero data copy — the consumer queries the
provider's storage directly.
Feature Details
Secure Data Sharing Share databases/schemas/tables to other Snowflake accounts — live,
no copy
Reader Accounts Share data with non-Snowflake customers (Snowflake manages their
account)
Data Marketplace Discover and subscribe to third-party datasets (weather, finance, geo)
Data Exchange Private marketplace for sharing within your org or partners
Create a Share
-- Provider side
CREATE SHARE orders_share;
GRANT USAGE ON DATABASE analytics TO SHARE orders_share;
GRANT USAGE ON SCHEMA [Link] TO SHARE orders_share;
GRANT SELECT ON TABLE [Link].fct_orders TO SHARE orders_share;
-- Add consumer account
ALTER SHARE orders_share ADD ACCOUNTS = consumer_account_locator;
-- Consumer side — create database from share
CREATE DATABASE shared_orders FROM SHARE provider_account.orders_share;
SELECT * FROM shared_orders.marts.fct_orders;
20. Best Practices
Area Best Practice
Warehouse sizing Start small (XS/S), scale up only if queries are slow. Monitor with
Query Profile.
Auto-suspend Set AUTO_SUSPEND = 60s for dev, 300s for prod to avoid idle credit
waste.
Separate warehouses Use dedicated warehouses per workload: ETL, reporting, ad-hoc, data
science.
Clustering keys Add CLUSTER BY only on large tables (>1TB) with frequent range
filters.
Avoid SELECT * Select only needed columns — reduces data scanned and speeds
queries.
Use TRANSIENT tables for No Fail-safe = lower storage costs for throwaway staging data.
staging
Prefer COPY INTO over INSERT Bulk loads via COPY are far faster than row-by-row INSERTs.
Use Time Travel wisely Set DATA_RETENTION_TIME_IN_DAYS = 1 on staging, higher on
critical tables.
Zero-copy clone for dev/test Clone prod database for testing — no storage cost, instant.
Secure views for sensitive data Use SECURE VIEW + masking policies instead of sharing raw tables.
Monitor costs Use SNOWFLAKE.ACCOUNT_USAGE views —
WAREHOUSE_METERING_HISTORY, QUERY_HISTORY.
Avoid cartesian joins Always specify ON clause in JOINs — CROSS JOIN creates
explosive row counts.
Useful account_usage queries
-- Top 10 most expensive queries last 7 days
SELECT query_text, total_elapsed_time/1000 AS sec,
credits_used_cloud_services
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP)
ORDER BY total_elapsed_time DESC LIMIT 10;
-- Credits used per warehouse last 30 days
SELECT warehouse_name,
SUM(credits_used) AS total_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP)
GROUP BY warehouse_name
ORDER BY total_credits DESC;
-- Storage usage
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE
ORDER BY usage_date DESC LIMIT 7;
End of Snowflake Notes