0% found this document useful (0 votes)

4 views17 pages

Snowflake Notes

The document provides a comprehensive overview of Snowflake, a cloud data warehouse, detailing its architecture, features, and functionalities including data types, DDL/DML operations, virtual warehouses, data loading, streams, tasks, and security. It outlines the hierarchical organization of database objects and explains various aspects such as semi-structured data handling, performance optimization, and best practices. Additionally, it includes practical examples of SQL commands for creating and managing database elements and tasks within Snowflake.

Uploaded by

rp105263

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

4 views17 pages

Snowflake Notes

Uploaded by

rp105263

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

Snowflake — Quick Reference Notes

Cloud Data Warehouse

Architecture · DDL/DML · Warehouses · Loading · Streams · Tasks · Semi-Structured · Security

Index
# Topic What's Covered
1 What is Snowflake? Architecture overview, cloud platforms, editions
2 Snowflake Architecture Storage, compute, cloud services layers, virtual warehouses
3 Database Objects Hierarchy Organization → Account → Database → Schema → Table/View
4 Data Types Numeric, string, date/time, semi-structured, boolean, variant
5 DDL — Tables & Schema CREATE, ALTER, DROP, CLONE, TRANSIENT, TEMPORARY
tables
6 DML — Query & Modify Data SELECT, INSERT, UPDATE, DELETE, MERGE, TRUNCATE
7 Virtual Warehouses Sizes, auto-suspend, auto-resume, multi-cluster warehouses
8 Data Loading — COPY INTO Stages (internal/external), file formats, COPY INTO syntax
9 Streams & Change Data Stream types, consuming streams, DML change tracking
Capture
10 Tasks & Orchestration Creating tasks, scheduling, DAG of tasks, serverless tasks
11 Semi-Structured Data VARIANT, ARRAY, OBJECT, FLATTEN, querying JSON/Parquet
12 Time Travel & Fail-safe AT/BEFORE syntax, undrop, cloning from history
13 Cloning Zero-copy clone — tables, schemas, databases
14 Access Control & Security RBAC, roles, privileges, row-level security, column masking
15 Performance & Optimization Clustering, result cache, query profile, pruning, materialized
views
16 Snowflake Functions String, date, numeric, conditional, semi-structured, window
functions
17 Snowpipe — Continuous Auto-ingest, event notifications, pipe status
Loading
18 Stored Procedures & UDFs JavaScript/Python/SQL procs, scalar and table UDFs
19 Data Sharing & Marketplace Secure data sharing, reader accounts, data marketplace
20 Best Practices Warehouse sizing, clustering keys, cost control, naming
conventions

1. What is Snowflake?
Snowflake is a fully managed cloud data warehouse (SaaS) with a unique multi-cluster shared data
architecture. It separates storage and compute, allowing each to scale independently.

Feature Details
Type Cloud-native SaaS data warehouse
Cloud platforms AWS, Azure, Google Cloud
Language ANSI SQL + Snowflake extensions
Scaling Storage and compute scale independently
Concurrency Multi-cluster warehouses handle thousands of concurrent queries
Semi-structured Native support for JSON, Avro, Parquet, ORC, XML via VARIANT
Editions Standard → Enterprise → Business Critical → Virtual Private

2. Snowflake Architecture
Snowflake has three distinct layers that operate independently — this is what makes it unique from
traditional data warehouses.

Layer What it does Managed by

Cloud Services Authentication, metadata, query Snowflake (always on)
parsing, optimization, access
control
Query Processing Virtual Warehouses — MPP You — create/size/suspend
(Compute) clusters that execute queries
Database Storage Compressed columnar micro- Snowflake (automatic)
partitions on cloud object
storage (S3/Blob/GCS)

Virtual Warehouse Sizes

Size Credits/hr Use case
X-Small (XS) 1 Dev, testing, small queries
Small (S) 2 Light ETL, small reports
Medium (M) 4 Standard analytics workloads
Large (L) 8 Complex queries, mid-size ETL
X-Large (XL) 16 Heavy ETL, large dataset queries
2X-Large 32 Very heavy workloads
4X-Large 64 Massive data processing
6X-Large 128 Largest ETL / high concurrency
💡 Bigger warehouse = faster query (more compute). Cost = credits × credits_per_hour × credit_price. Auto-
suspend when idle!

3. Database Objects Hierarchy

Organization -- optional top-level grouping
└── Account -- your Snowflake account
└── Database -- logical container
└── Schema -- groups related objects
├── Table -- structured data
├── View -- saved SELECT query
├── Stage -- file location for loading
├── File Format -- parsing rules for files
├── Sequence -- auto-increment numbers
├── Stream -- CDC change log
├── Task -- scheduled SQL job
├── Pipe -- continuous data loading
├── Stored Proc -- programmatic logic
└── UDF -- user-defined function

USE — switch context

USE ROLE SYSADMIN;
USE WAREHOUSE COMPUTE_WH;
USE DATABASE MY_DB;
USE SCHEMA MY_DB.PUBLIC;

4. Data Types
Category Data Types
Numeric NUMBER(p,s), DECIMAL, INT, BIGINT, SMALLINT, TINYINT,
FLOAT, DOUBLE, REAL
String VARCHAR(n), CHAR(n), STRING, TEXT — all stored as VARCHAR
internally
Boolean BOOLEAN — TRUE / FALSE / NULL
Date & Time DATE, TIME, DATETIME, TIMESTAMP, TIMESTAMP_LTZ,
TIMESTAMP_NTZ, TIMESTAMP_TZ
Semi-Structured VARIANT, OBJECT, ARRAY — store JSON, Avro, Parquet natively
Geo GEOGRAPHY, GEOMETRY — spatial data (WGS84)
Binary BINARY, VARBINARY

VARIANT — the universal semi-structured type

-- VARIANT can hold JSON, array, object, or any primitive
CREATE TABLE events (
event_id INT,
payload VARIANT -- store raw JSON here
);

-- Insert JSON
INSERT INTO events SELECT 1, PARSE_JSON('{"user":"alice","action":"login"}');

-- Query VARIANT fields with colon notation

SELECT payload:user::VARCHAR AS user_name,
payload:action::VARCHAR AS action
FROM events;
5. DDL — Tables & Schema
CREATE DATABASE / SCHEMA
CREATE DATABASE IF NOT EXISTS analytics;
CREATE SCHEMA IF NOT EXISTS [Link];

ALTER DATABASE analytics RENAME TO analytics_v2;

DROP DATABASE IF EXISTS old_db;

CREATE TABLE types

-- Permanent table (default) — survives cluster restarts
CREATE TABLE orders (
order_id NUMBER AUTOINCREMENT PRIMARY KEY,
customer_id NUMBER NOT NULL,
status VARCHAR(50) DEFAULT 'pending',
amount DECIMAL(12,2),
order_date DATE,
payload VARIANT
);

-- Transient table — no Fail-safe, lower storage cost

CREATE TRANSIENT TABLE staging_orders LIKE orders;

-- Temporary table — session-scoped, dropped on disconnect

CREATE TEMPORARY TABLE temp_calc AS
SELECT * FROM orders WHERE order_date = CURRENT_DATE;

-- Create table from query

CREATE TABLE orders_2024 AS
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

ALTER TABLE
ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0;
ALTER TABLE orders DROP COLUMN discount;
ALTER TABLE orders RENAME COLUMN status TO order_status;
ALTER TABLE orders ALTER COLUMN amount SET NOT NULL;
ALTER TABLE orders CLUSTER BY (order_date, customer_id);
ALTER TABLE orders SWAP WITH orders_backup; -- atomic table swap

Views
-- Standard view
CREATE OR REPLACE VIEW v_active_orders AS
SELECT * FROM orders WHERE status != 'cancelled';

-- Secure view — hides definition from non-owners

CREATE OR REPLACE SECURE VIEW v_sensitive_orders AS
SELECT order_id, amount FROM orders;

-- Materialized view — pre-computed, auto-refreshed

CREATE OR REPLACE MATERIALIZED VIEW mv_daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM orders GROUP BY order_date;
6. DML — Query & Modify Data
SELECT with Snowflake extras
-- QUALIFY — filter window function results (no subquery needed)
SELECT order_id, customer_id, amount,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rn
FROM orders
QUALIFY rn = 1; -- top order per customer

-- SAMPLE — random row sampling

SELECT * FROM orders SAMPLE (10); -- 10% of rows
SELECT * FROM orders SAMPLE (1000 ROWS); -- exactly 1000 rows

-- TOP (alternative to LIMIT)

SELECT TOP 100 * FROM orders ORDER BY amount DESC;

MERGE — upsert
MERGE INTO orders AS target
USING (SELECT * FROM orders_staging) AS source
ON target.order_id = source.order_id
WHEN MATCHED AND [Link] != [Link] THEN
UPDATE SET [Link] = [Link],
[Link] = [Link]
WHEN NOT MATCHED THEN
INSERT (order_id, customer_id, status, amount)
VALUES (source.order_id, source.customer_id, [Link], [Link]);

MULTI-TABLE INSERT
INSERT ALL
WHEN status = 'shipped' THEN INTO orders_shipped
WHEN status = 'cancelled' THEN INTO orders_cancelled
ELSE INTO orders_other
SELECT * FROM orders_staging;

7. Virtual Warehouses
Create & Manage
CREATE WAREHOUSE IF NOT EXISTS transform_wh
WAREHOUSE_SIZE = 'MEDIUM'
AUTO_SUSPEND = 60 -- suspend after 60s idle
AUTO_RESUME = TRUE -- auto-start on query
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 3 -- multi-cluster: scale out
SCALING_POLICY = 'ECONOMY' -- or STANDARD
COMMENT = 'ETL warehouse';

ALTER WAREHOUSE transform_wh SUSPEND;

ALTER WAREHOUSE transform_wh RESUME;
ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'LARGE';
DROP WAREHOUSE IF EXISTS old_wh;

SHOW WAREHOUSES;
Multi-Cluster Warehouses
Policy Behavior
STANDARD Starts new clusters immediately when queries queue up
ECONOMY Waits 6 minutes before starting a new cluster — saves credits
💡 Use separate warehouses per workload type (ETL, reporting, ad-hoc) to isolate performance and control costs.

8. Data Loading — COPY INTO

Stage types
Stage Type Description Example path
User stage Auto-created per user, private @~ (tilde)
Table stage Auto-created per table @%table_name
Named internal stage Explicitly created, shared @my_stage
Named external stage Points to S3 / Azure Blob / GCS @s3_stage

Create a Named Stage

-- Internal named stage
CREATE STAGE my_internal_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"');

-- External stage — S3
CREATE STAGE s3_raw_stage
URL = 's3://my-bucket/raw/'
CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
FILE_FORMAT = (TYPE = 'PARQUET');

-- External stage with storage integration (preferred — no keys)

CREATE STAGE s3_secure_stage
URL = 's3://my-bucket/raw/'
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

File Formats
CREATE FILE FORMAT csv_format
TYPE = 'CSV'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ('NULL', 'null', '')
EMPTY_FIELD_AS_NULL = TRUE;

CREATE FILE FORMAT json_format

TYPE = 'JSON'
STRIP_OUTER_ARRAY = TRUE; -- unwrap top-level array

CREATE FILE FORMAT parquet_format

TYPE = 'PARQUET'
SNAPPY_COMPRESSION = TRUE;
COPY INTO — load data
-- Load from stage into table
COPY INTO orders
FROM @s3_raw_stage/orders/
FILE_FORMAT = (FORMAT_NAME = 'csv_format')
PATTERN = '.*orders.*\.csv'
ON_ERROR = 'CONTINUE' -- CONTINUE / SKIP_FILE / ABORT_STATEMENT
PURGE = TRUE; -- delete staged files after load

-- Load specific columns

COPY INTO orders (order_id, customer_id, amount)
FROM (SELECT $1, $2, $4 FROM @my_stage/[Link]);

-- Validate without loading

COPY INTO orders FROM @my_stage VALIDATION_MODE = 'RETURN_ERRORS';

COPY INTO — unload data

-- Export table to stage
COPY INTO @s3_raw_stage/exports/orders_
FROM (SELECT * FROM orders WHERE order_date = CURRENT_DATE)
FILE_FORMAT = (TYPE = 'PARQUET')
OVERWRITE = TRUE
MAX_FILE_SIZE = 104857600; -- 100 MB per file

9. Streams & Change Data Capture

A Stream records DML changes (INSERT, UPDATE, DELETE) made to a table since the stream was
last consumed. Used for CDC and incremental processing.

Stream types
Type Tracks
Standard (default) All DML — inserts, updates, deletes
Append-only Only INSERT operations — no updates/deletes
Insert-only Only inserts — for external tables

Create & consume a stream

-- Create stream on a table
CREATE STREAM orders_stream
ON TABLE orders
APPEND ONLY = FALSE; -- standard stream

-- Stream metadata columns added automatically:

-- METADATA$ACTION -- 'INSERT' or 'DELETE'
-- METADATA$ISUPDATE -- TRUE if row is part of an UPDATE
-- METADATA$ROW_ID -- unique row identifier

-- Query stream (shows changes since last consumption)

SELECT * FROM orders_stream;

-- Consume stream — process into target table

-- Stream is cleared only when consumed inside a DML transaction
INSERT INTO orders_processed
SELECT order_id, customer_id, amount, METADATA$ACTION AS change_type
FROM orders_stream
WHERE METADATA$ACTION = 'INSERT';

-- Check if stream has new data

SELECT SYSTEM$STREAM_HAS_DATA('orders_stream');

10. Tasks & Orchestration

Tasks are scheduled SQL statements (or stored procedure calls). They can be chained into a DAG for
complex orchestration.

Create a Task
-- Scheduled task — runs every 15 minutes
CREATE TASK load_orders_task
WAREHOUSE = 'TRANSFORM_WH'
SCHEDULE = '15 MINUTE' -- or USING CRON '0 * * * * UTC'
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream') -- conditional
AS
INSERT INTO orders_processed
SELECT * FROM orders_stream WHERE METADATA$ACTION = 'INSERT';

-- Serverless task (Snowflake manages compute)

CREATE TASK serverless_task
USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'MEDIUM'
SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
CALL my_stored_procedure();

Task DAG — chaining tasks

-- Root task (has SCHEDULE)
CREATE TASK root_task
WAREHOUSE = 'TRANSFORM_WH'
SCHEDULE = '1 HOUR'
AS SELECT 1; -- just a trigger

-- Child task (depends on root_task)

CREATE TASK child_task_1
WAREHOUSE = 'TRANSFORM_WH'
AFTER root_task -- runs after root_task succeeds
AS CALL load_staging();

-- Grandchild task
CREATE TASK child_task_2
WAREHOUSE = 'TRANSFORM_WH'
AFTER child_task_1
AS CALL transform_to_mart();

-- Tasks are suspended by default — must RESUME

ALTER TASK child_task_2 RESUME;
ALTER TASK child_task_1 RESUME;
ALTER TASK root_task RESUME; -- resume root last

ALTER TASK root_task SUSPEND;

EXECUTE TASK root_task; -- manual trigger
SHOW TASKS;
11. Semi-Structured Data
Snowflake natively stores JSON, Avro, Parquet, ORC, and XML in VARIANT columns — no schema
needed upfront.

Querying JSON — colon notation

-- Sample data: payload = {"user": {"name": "Alice", "age": 30}, "tags":
["vip","new"]}

SELECT
payload:user:name::VARCHAR AS user_name,
payload:user:age::INT AS user_age,
payload:tags[0]::VARCHAR AS first_tag, -- array index
GET(payload, 'user') AS user_obj,
GET_PATH(payload, '[Link]') AS nested_name
FROM events;

FLATTEN — expand arrays

-- Explode array into rows
SELECT
e.event_id,
[Link]::VARCHAR AS tag
FROM events e,
LATERAL FLATTEN(INPUT => [Link]:tags) f;

-- Flatten nested object

SELECT
[Link] AS field_name,
[Link] AS field_value
FROM events,
LATERAL FLATTEN(INPUT => payload, RECURSIVE => TRUE) f;

Build semi-structured data

-- Build OBJECT
SELECT OBJECT_CONSTRUCT('name', first_name, 'id', customer_id) AS customer_obj
FROM customers;

-- Build ARRAY
SELECT ARRAY_CONSTRUCT(1, 2, 3) AS my_array;
SELECT ARRAY_AGG(product_id) AS product_list FROM order_items;

-- Parse and serialize

SELECT PARSE_JSON('{"a":1}') AS parsed;
SELECT TO_JSON(payload) AS json_string FROM events;
SELECT TYPEOF(payload:user) AS data_type FROM events;

12. Time Travel & Fail-safe

Time Travel lets you query data as it existed at a past point in time, or restore accidentally
dropped/modified data.

Time Travel — AT / BEFORE

-- Query table as it was 1 hour ago
SELECT * FROM orders AT (OFFSET => -3600);
-- Query at a specific timestamp
SELECT * FROM orders AT (TIMESTAMP => '2024-06-01 09:00:00'::TIMESTAMP);

-- Query before a specific statement ran

SELECT * FROM orders BEFORE (STATEMENT => '8e5d0ca9-005e-44e6-b858-a8f5b37c5726');

-- Restore accidentally deleted rows

INSERT INTO orders
SELECT * FROM orders BEFORE (STATEMENT => LAST_QUERY_ID());

-- Restore a dropped table

UNDROP TABLE orders;
UNDROP SCHEMA my_schema;
UNDROP DATABASE my_db;

-- Clone table from point in time

CREATE TABLE orders_restored CLONE orders
AT (TIMESTAMP => '2024-06-01 08:00:00'::TIMESTAMP);

Time Travel retention periods

Table type Max retention
Permanent table — Standard 1 day
edition
Permanent table — Enterprise 90 days (configurable)
edition
Transient table 1 day max
Temporary table 1 day max (session only)

Feature Time Travel Fail-safe

Purpose Self-service recovery by user Snowflake-managed disaster recovery
Retention 0–90 days (you set it) 7 days (fixed, after time travel expires)
Accessible by User (SQL queries) Snowflake Support only

13. Cloning
Zero-copy cloning creates an instant copy of a table, schema, or database — no data is duplicated.
Clones share micro-partitions with the source until data diverges.

Syntax
-- Clone a table (instant, no storage cost initially)
CREATE TABLE orders_backup CLONE orders;

-- Clone a schema
CREATE SCHEMA analytics.staging_backup CLONE [Link];

-- Clone a database
CREATE DATABASE analytics_dev CLONE analytics;

-- Clone from Time Travel

CREATE TABLE orders_june1 CLONE orders
AT (TIMESTAMP => '2024-06-01 00:00:00'::TIMESTAMP);
-- Clone a stage
CREATE STAGE my_stage_backup CLONE my_stage;

💡 Cloning is perfect for dev/test environments — clone prod database instantly with zero storage cost at clone
time.

14. Access Control & Security

Snowflake uses Role-Based Access Control (RBAC). Users are granted roles, and roles are granted
privileges on objects.

Built-in system roles

Role Access level
ACCOUNTADMIN Full account control — billing, users, top-level objects. Use sparingly.
SYSADMIN Create and manage databases, warehouses, schemas, tables
SECURITYADMIN Create and manage users and roles
USERADMIN Create users and roles (subset of SECURITYADMIN)
PUBLIC Default role every user gets — minimal privileges

Create roles & grant privileges

-- Create a role
CREATE ROLE analyst_role;
CREATE ROLE transformer_role;

-- Grant object privileges to role

GRANT USAGE ON DATABASE analytics TO ROLE analyst_role;
GRANT USAGE ON SCHEMA [Link] TO ROLE analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA [Link] TO ROLE analyst_role;
GRANT SELECT ON FUTURE TABLES IN SCHEMA [Link] TO ROLE analyst_role;

-- Grant warehouse usage

GRANT USAGE ON WAREHOUSE reporting_wh TO ROLE analyst_role;

-- Assign role to user

GRANT ROLE analyst_role TO USER alice;

-- Role hierarchy — analyst_role inherits transformer_role privileges

GRANT ROLE transformer_role TO ROLE analyst_role;

-- Revoke
REVOKE SELECT ON TABLE orders FROM ROLE analyst_role;

Row-Level Security — Row Access Policy

-- Policy: users only see rows matching their region
CREATE ROW ACCESS POLICY region_policy
AS (row_region VARCHAR) RETURNS BOOLEAN ->
row_region = CURRENT_ROLE() -- simple example
OR CURRENT_ROLE() = 'ACCOUNTADMIN';

-- Attach to table
ALTER TABLE orders
ADD ROW ACCESS POLICY region_policy ON (region);

Dynamic Data Masking

-- Mask email for non-admin roles
CREATE MASKING POLICY email_mask
AS (val VARCHAR) RETURNS VARCHAR ->
CASE
WHEN CURRENT_ROLE() IN ('ACCOUNTADMIN', 'ANALYST') THEN val
ELSE REGEXP_REPLACE(val, '.+@', '*****@')
END;

-- Apply to column
ALTER TABLE customers
MODIFY COLUMN email SET MASKING POLICY email_mask;

15. Performance & Optimization

Query result cache
-- Results cached for 24 hours at Cloud Services layer
-- Exact same query = instant result, zero warehouse cost
-- Cache invalidated when underlying data changes

-- Check if last query used cache:

SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
-- In query profile: look for 'Query Result Reuse'

Micro-partition pruning
-- Snowflake stores metadata (min/max) per micro-partition
-- Queries on filtered columns prune irrelevant partitions

-- GOOD — prunes by date (common filter column)

SELECT * FROM orders WHERE order_date = '2024-06-01';

-- CLUSTERING KEY — explicitly define pruning column for large tables

ALTER TABLE orders CLUSTER BY (order_date);
ALTER TABLE orders CLUSTER BY (order_date, customer_id);

-- Check clustering info

SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');

Search Optimization Service

-- Accelerates point lookups on large tables (equality/IN queries)
ALTER TABLE orders ADD SEARCH OPTIMIZATION;
ALTER TABLE orders DROP SEARCH OPTIMIZATION;

-- Check search optimization status

SHOW TABLES LIKE 'orders';

Query Profile — how to analyze

Metric to check What it means
Partitions scanned vs total High ratio = poor pruning — add cluster key
Spillage to local/remote disk Warehouse too small — increase size
Bytes sent over network Large data shuffle — optimize JOIN order
Query result reuse Cache hit — free query
Compilation time Long compile = complex query or metadata issue

16. Snowflake Functions

Conditional Functions
IFF(condition, true_val, false_val) -- single if/else
IFF(amount > 100, 'high', 'low')

CASE WHEN ... THEN ... ELSE ... END -- multi-branch

COALESCE(col1, col2, 'default') -- first non-null

NULLIF(col, 0) -- returns NULL if col = 0
NVL(col, 'fallback') -- alias for COALESCE
NVL2(col, 'not null val', 'null val')
ZEROIFNULL(col) -- NULL → 0
DECODE(col, 'a', 1, 'b', 2, 0) -- like CASE for equality

String Functions
UPPER(col) / LOWER(col) / INITCAP(col)
TRIM(col) / LTRIM(col) / RTRIM(col)
LENGTH(col) / SUBSTR(col, start, len)
CONCAT(a, b, c) or a || b || c
REPLACE(col, 'old', 'new')
SPLIT(col, ',') -- returns ARRAY
SPLIT_PART(col, ',', 1) -- 1-indexed part
CONTAINS(col, 'substr') -- returns BOOLEAN
STARTSWITH(col, 'prefix') / ENDSWITH(col, 'suffix')
REGEXP_LIKE(col, 'pattern')
REGEXP_REPLACE(col, 'pattern', 'replacement')
REGEXP_SUBSTR(col, 'pattern', 1, 1)
LPAD(col, 10, '0') / RPAD(col, 10, ' ')

Date & Time Functions

CURRENT_DATE -- today's date
CURRENT_TIMESTAMP -- current datetime with tz
SYSDATE() -- alias for CURRENT_TIMESTAMP

DATEADD(day, 7, order_date) -- add 7 days

DATEDIFF(day, start_date, end_date) -- diff in days
DATE_TRUNC('month', order_date) -- truncate to month
EXTRACT(year FROM order_date) -- or YEAR(order_date)
TO_DATE('2024-06-01', 'YYYY-MM-DD')
TO_TIMESTAMP('2024-06-01 10:00:00')
CONVERT_TIMEZONE('UTC', 'Asia/Kolkata', ts_col) -- tz conversion
LAST_DAY(order_date) -- last day of month

Window Functions (same as standard SQL)

ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)
RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
DENSE_RANK() OVER (...)
LAG(amount, 1) OVER (ORDER BY order_date) -- previous row
LEAD(amount, 1) OVER (ORDER BY order_date) -- next row
SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS
running_total

-- Snowflake-specific: QUALIFY to filter window results

SELECT * FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) = 1;

RATIO_TO_REPORT(amount) OVER (PARTITION BY dept) -- pct of group total

17. Snowpipe — Continuous Loading

Snowpipe continuously loads data into tables as new files arrive in a stage — no need to manually run
COPY INTO.

Create a Pipe
CREATE PIPE orders_pipe
AUTO_INGEST = TRUE -- trigger via cloud event notification
AS
COPY INTO orders
FROM @s3_raw_stage/orders/
FILE_FORMAT = (FORMAT_NAME = 'csv_format');

-- Get the SQS ARN for S3 event notification setup:

SHOW PIPES;
SELECT SYSTEM$PIPE_STATUS('orders_pipe');

-- Manually refresh (for files already in stage)

ALTER PIPE orders_pipe REFRESH;

-- Pause / resume
ALTER PIPE orders_pipe SET PIPE_EXECUTION_PAUSED = TRUE;
ALTER PIPE orders_pipe SET PIPE_EXECUTION_PAUSED = FALSE;

💡 Snowpipe uses serverless compute — no warehouse needed. Billed per file/credit based on usage.

18. Stored Procedures & UDFs

Stored Procedure — JavaScript
CREATE OR REPLACE PROCEDURE update_order_status(order_id_param INT)
RETURNS STRING
LANGUAGE JAVASCRIPT
AS $$
var sql = `UPDATE orders SET status = 'processed'
WHERE order_id = ${ORDER_ID_PARAM}`;
[Link]({sqlText: sql});
return 'Done';
$$;

CALL update_order_status(12345);
Stored Procedure — Snowflake Scripting (SQL)
CREATE OR REPLACE PROCEDURE refresh_marts()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
CREATE OR REPLACE TABLE marts.fct_orders AS
SELECT * FROM staging.stg_orders WHERE status = 'delivered';

CREATE OR REPLACE TABLE marts.dim_customers AS

SELECT DISTINCT customer_id, customer_name FROM staging.stg_orders;

RETURN 'Marts refreshed';

END;
$$;

CALL refresh_marts();

UDF — Scalar
-- SQL UDF
CREATE OR REPLACE FUNCTION cents_to_dollars(cents NUMBER)
RETURNS NUMBER(12,2)
AS 'cents / 100.0';

-- Usage
SELECT cents_to_dollars(amount_cents) FROM orders;

-- JavaScript UDF
CREATE OR REPLACE FUNCTION parse_domain(email VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS $$
return [Link]('@')[1];
$$;

UDTF — Table Function

-- Returns multiple rows per input row
CREATE OR REPLACE FUNCTION split_to_rows(val VARCHAR, delim VARCHAR)
RETURNS TABLE (item VARCHAR)
LANGUAGE JAVASCRIPT
AS $$
{
processRow: function(row, rowWriter) {
[Link]([Link]).forEach(item => [Link]({ITEM:
[Link]()}));
}
}
$$;

SELECT * FROM TABLE(split_to_rows('a,b,c', ','));

19. Data Sharing & Marketplace

Snowflake allows sharing live data between accounts with zero data copy — the consumer queries the
provider's storage directly.
Feature Details
Secure Data Sharing Share databases/schemas/tables to other Snowflake accounts — live,
no copy
Reader Accounts Share data with non-Snowflake customers (Snowflake manages their
account)
Data Marketplace Discover and subscribe to third-party datasets (weather, finance, geo)
Data Exchange Private marketplace for sharing within your org or partners

Create a Share
-- Provider side
CREATE SHARE orders_share;

GRANT USAGE ON DATABASE analytics TO SHARE orders_share;

GRANT USAGE ON SCHEMA [Link] TO SHARE orders_share;
GRANT SELECT ON TABLE [Link].fct_orders TO SHARE orders_share;

-- Add consumer account

ALTER SHARE orders_share ADD ACCOUNTS = consumer_account_locator;

-- Consumer side — create database from share

CREATE DATABASE shared_orders FROM SHARE provider_account.orders_share;
SELECT * FROM shared_orders.marts.fct_orders;

20. Best Practices

Area Best Practice
Warehouse sizing Start small (XS/S), scale up only if queries are slow. Monitor with
Query Profile.
Auto-suspend Set AUTO_SUSPEND = 60s for dev, 300s for prod to avoid idle credit
waste.
Separate warehouses Use dedicated warehouses per workload: ETL, reporting, ad-hoc, data
science.
Clustering keys Add CLUSTER BY only on large tables (>1TB) with frequent range
filters.
Avoid SELECT * Select only needed columns — reduces data scanned and speeds
queries.
Use TRANSIENT tables for No Fail-safe = lower storage costs for throwaway staging data.
staging
Prefer COPY INTO over INSERT Bulk loads via COPY are far faster than row-by-row INSERTs.
Use Time Travel wisely Set DATA_RETENTION_TIME_IN_DAYS = 1 on staging, higher on
critical tables.
Zero-copy clone for dev/test Clone prod database for testing — no storage cost, instant.
Secure views for sensitive data Use SECURE VIEW + masking policies instead of sharing raw tables.
Monitor costs Use SNOWFLAKE.ACCOUNT_USAGE views —
WAREHOUSE_METERING_HISTORY, QUERY_HISTORY.
Avoid cartesian joins Always specify ON clause in JOINs — CROSS JOIN creates
explosive row counts.
Useful account_usage queries
-- Top 10 most expensive queries last 7 days
SELECT query_text, total_elapsed_time/1000 AS sec,
credits_used_cloud_services
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP)
ORDER BY total_elapsed_time DESC LIMIT 10;

-- Credits used per warehouse last 30 days

SELECT warehouse_name,
SUM(credits_used) AS total_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP)
GROUP BY warehouse_name
ORDER BY total_credits DESC;

-- Storage usage
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE
ORDER BY usage_date DESC LIMIT 7;

End of Snowflake Notes

Snowflake Data Warehousing Essentials
100% (1)
Snowflake Data Warehousing Essentials
40 pages
Snowflake Object Management Guide
No ratings yet
Snowflake Object Management Guide
10 pages
Snowflake Snowpro Core Ebook
No ratings yet
Snowflake Snowpro Core Ebook
12 pages
Snowflake Interview Preparation Guide
No ratings yet
Snowflake Interview Preparation Guide
14 pages
Snowflake Data Management Guide
No ratings yet
Snowflake Data Management Guide
36 pages
Snowflake Data Warehouse Overview
No ratings yet
Snowflake Data Warehouse Overview
6 pages
Snowflake Table Types Overview
No ratings yet
Snowflake Table Types Overview
4 pages
Snowflake Components and SnowSQL Setup
No ratings yet
Snowflake Components and SnowSQL Setup
45 pages
Snowflake Programming Masterclass 2024
100% (1)
Snowflake Programming Masterclass 2024
342 pages
Snowflake
No ratings yet
Snowflake
7 pages
Snowflake Architecture and Data Management
100% (1)
Snowflake Architecture and Data Management
51 pages
Snowflake Query Management Essentials
No ratings yet
Snowflake Query Management Essentials
122 pages
Snowflake Architecture Deep Dive
No ratings yet
Snowflake Architecture Deep Dive
51 pages
Detailed Snowflake Notes
No ratings yet
Detailed Snowflake Notes
8 pages
Snowflake Interview Answers PDF
No ratings yet
Snowflake Interview Answers PDF
3 pages
Snowflake Guide: From Beginner to Expert
No ratings yet
Snowflake Guide: From Beginner to Expert
35 pages
Data Warehouse and Data Lake Concepts
No ratings yet
Data Warehouse and Data Lake Concepts
43 pages
Snowflake Cheatsheet
No ratings yet
Snowflake Cheatsheet
10 pages
SQL Data Engineer Cheatsheet - HTML
No ratings yet
SQL Data Engineer Cheatsheet - HTML
96 pages
SnowPro Core Certification Cheat Sheet
100% (1)
SnowPro Core Certification Cheat Sheet
7 pages
Snowflake SnowPro Certification Cheat Sheet
83% (12)
Snowflake SnowPro Certification Cheat Sheet
7 pages
DW Unit1 Notes
No ratings yet
DW Unit1 Notes
21 pages
Snowflake Architecture and Features Overview
No ratings yet
Snowflake Architecture and Features Overview
8 pages
Understanding Snowflake Architecture
100% (1)
Understanding Snowflake Architecture
35 pages
Snowflake Architecture Overview
No ratings yet
Snowflake Architecture Overview
220 pages
SnowPro Core Certification Prep Guide
No ratings yet
SnowPro Core Certification Prep Guide
37 pages
Snowflake Basics
No ratings yet
Snowflake Basics
11 pages
DataWarehousing Notes
No ratings yet
DataWarehousing Notes
16 pages
DWM Assignment
No ratings yet
DWM Assignment
6 pages
Snowflake Architecture and Key Concepts
No ratings yet
Snowflake Architecture and Key Concepts
3 pages
DWH Bosscoder
No ratings yet
DWH Bosscoder
22 pages
DynamoDB Table Creation and Management Guide
No ratings yet
DynamoDB Table Creation and Management Guide
23 pages
Snowflake Cloud Data Warehouse Overview
No ratings yet
Snowflake Cloud Data Warehouse Overview
73 pages
Database Lab Report: Star & Snowflake Schema
No ratings yet
Database Lab Report: Star & Snowflake Schema
20 pages
Snowflake Course Content Overview
No ratings yet
Snowflake Course Content Overview
5 pages
Snowflake Data Warehouse Training Guide
No ratings yet
Snowflake Data Warehouse Training Guide
18 pages
Snowflake Data Access and Management
No ratings yet
Snowflake Data Access and Management
22 pages
Understanding Data Warehousing Concepts
No ratings yet
Understanding Data Warehousing Concepts
8 pages
Snowflake - Interview Questions
No ratings yet
Snowflake - Interview Questions
15 pages
Snowflake Data Engineering Cheat Sheet
No ratings yet
Snowflake Data Engineering Cheat Sheet
10 pages
Snowflake - All Concepts To Get Started
No ratings yet
Snowflake - All Concepts To Get Started
93 pages
SriLanka DataEngineering Bootcamp Tutorials
No ratings yet
SriLanka DataEngineering Bootcamp Tutorials
16 pages
Data Warehousing & ETL Exam Notes
No ratings yet
Data Warehousing & ETL Exam Notes
3 pages
DE Intern Test Submit
No ratings yet
DE Intern Test Submit
10 pages
Snowflake and SQL Interview Questions
No ratings yet
Snowflake and SQL Interview Questions
13 pages
Types of Tables in Snowflake Explained
No ratings yet
Types of Tables in Snowflake Explained
51 pages
Snowflake Connection and DDL Guide
No ratings yet
Snowflake Connection and DDL Guide
50 pages
Data Warehousing Insights: Snowflake vs Oracle
No ratings yet
Data Warehousing Insights: Snowflake vs Oracle
30 pages
Physical Database Design Best Practices
No ratings yet
Physical Database Design Best Practices
6 pages
Snowflake Data Storage and Management Guide
No ratings yet
Snowflake Data Storage and Management Guide
34 pages
Comprehensive Snowflake MasterClass Guide
33% (3)
Comprehensive Snowflake MasterClass Guide
192 pages
Snowflake: Company Origins and Features
0% (1)
Snowflake: Company Origins and Features
15 pages
Snowflake Database and Table Management Guide
No ratings yet
Snowflake Database and Table Management Guide
6 pages
Data Warehousing Essentials Guide
No ratings yet
Data Warehousing Essentials Guide
14 pages
Snowflake Data Warehouse Architecture
No ratings yet
Snowflake Data Warehouse Architecture
20 pages
Accounting and Inventory Systems Guide
No ratings yet
Accounting and Inventory Systems Guide
21 pages
PPDM Lite Architectural Principles 1.0
No ratings yet
PPDM Lite Architectural Principles 1.0
17 pages
Epicor Service Connect Installation and Implementation Guide
No ratings yet
Epicor Service Connect Installation and Implementation Guide
32 pages
M.Tech Computer Science Syllabus 2019-20
No ratings yet
M.Tech Computer Science Syllabus 2019-20
24 pages
Azure Exam 70-487 Study Guide
No ratings yet
Azure Exam 70-487 Study Guide
140 pages
Essential Food Safety Training Guide
No ratings yet
Essential Food Safety Training Guide
8 pages
Project Monthly Performance
No ratings yet
Project Monthly Performance
99 pages
Data Warehouse Overview and Components
No ratings yet
Data Warehouse Overview and Components
59 pages
Migration of Python Framework in IBM Cloud
No ratings yet
Migration of Python Framework in IBM Cloud
24 pages
Fraud Protection Application Internship Report
No ratings yet
Fraud Protection Application Internship Report
26 pages
Database Administration Essentials Guide
No ratings yet
Database Administration Essentials Guide
14 pages
Client-Server Computing Exam Questions
No ratings yet
Client-Server Computing Exam Questions
53 pages
AI-Powered Public Speaking Coach
No ratings yet
AI-Powered Public Speaking Coach
15 pages
AI-Driven Proposal Evaluation for SIH 2025
No ratings yet
AI-Driven Proposal Evaluation for SIH 2025
7 pages
Web-Based Notes Manager Application
No ratings yet
Web-Based Notes Manager Application
2 pages
WebLogic Server Administration Guide
No ratings yet
WebLogic Server Administration Guide
126 pages
MIS 107 Course Outline - Spring 2024
No ratings yet
MIS 107 Course Outline - Spring 2024
4 pages
SQL Server 2022 Datasheet
No ratings yet
SQL Server 2022 Datasheet
2 pages
Exchange Server Diagnostics Log Guide
No ratings yet
Exchange Server Diagnostics Log Guide
69 pages
Morgan Stanley IT Analyst Resume
No ratings yet
Morgan Stanley IT Analyst Resume
6 pages
SAS Marketing Automation 5.3 Admistrator Guide Third Edition PDF
0% (1)
SAS Marketing Automation 5.3 Admistrator Guide Third Edition PDF
248 pages
Lec2 DB Enviroment
No ratings yet
Lec2 DB Enviroment
34 pages
Computer Science Sample Paper 2024-25
100% (1)
Computer Science Sample Paper 2024-25
7 pages
Cu DatabaseSystems
No ratings yet
Cu DatabaseSystems
8 pages
High Performance Techniques For Microsoft SQL Server
No ratings yet
High Performance Techniques For Microsoft SQL Server
307 pages
Microservices Architecture in Cloud Apps
No ratings yet
Microservices Architecture in Cloud Apps
19 pages
Kuppam Engineering College Project on Face Recognition
No ratings yet
Kuppam Engineering College Project on Face Recognition
15 pages
Mastering Management Reporter 2012 Guide
No ratings yet
Mastering Management Reporter 2012 Guide
57 pages
Execute Dynamic SQL Commands in SQL Server
No ratings yet
Execute Dynamic SQL Commands in SQL Server
3 pages
School Management System Project Report
No ratings yet
School Management System Project Report
16 pages