0% found this document useful (0 votes)
14 views12 pages

Snowflake Snowpro Core Ebook

This document serves as a guide for the Udemy course on Snowflake SnowPro Core Certification, detailing Snowflake's architecture, features, and functionalities. It covers aspects such as data loading methods, query processing, security measures, and billing structures. Additionally, it provides code examples for creating tables, views, and tasks within the Snowflake environment.

Uploaded by

8527479210
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views12 pages

Snowflake Snowpro Core Ebook

This document serves as a guide for the Udemy course on Snowflake SnowPro Core Certification, detailing Snowflake's architecture, features, and functionalities. It covers aspects such as data loading methods, query processing, security measures, and billing structures. Additionally, it provides code examples for creating tables, views, and tasks within the Snowflake environment.

Uploaded by

8527479210
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the

course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
Snowflake Data Cloud Features and Architecture: - Snowflake optimizer determines the most efficient way to execute a query.
What is Snowflake? - Metadata Manager manages the metadata associated with data stored in Snowflake. Schema: A schema is a logical grouping of database objects (tables, views, etc.). Each
• Cloud based data platform (AWS, GCP, Azure Supported) - Snowflake Security includes encryption, network isolation, and other security features schema belongs to a single database
• Not built from old db tech to protect data.
• Cloud native Table Types :
• Separates Compute from Storage
• It is SaaS (Min upfront cost, Pay-as-you-go, fully managed Infra) Snowflake has multiple editions to choose from:

Traditional Architecture:
Shared Disk Architecture (ex: Oracle Real Application Clusters)
- High availability/reliability, easy data sharing b/w nodes, dynamic load balance
- Single point of failure, Network bottleneck
Shared Nothing Architecture (ex: Teradata) *External tables are read only and points to external stage.
- High scalable, better performance, flexible conf
- Fixed load balancing, Data Skew, Time consuming data partitioning Code for table creation:
For each Snowflake account, you choose Edition, cloud provider and location/Region.
Snowflake (SF) Architecture: SF has 3 distinct layer, Cloud Services, Query Processing Snowflake UI is called Snowsight. CREATE TEMPORARY (TEMP)| TRANSIENT| PERMANENT (Optional) TABLE
Layer, Database Storage. <Table_name> (
Data Exchange: Snowflake support can enable it, can share and control data between ID NUMBER,
selected group of entities. );

Data Types:

View Type:

Database Storage: Snowflake Partners: Huge partner network for data integration, ML/Data Science,
- Data into optimized, Compressed and columnar format (Proprietary) Security, Governance & Observability, BI, SQL Dev & Mgmt and Native programming
- Data is stored in cloud storage, accessible to all nodes Interface.
- Data not accessible directly, need to read via SF
- Cloud agnostic (AWS, GCP, Azure Supported) Snowflake Drivers and Connectors: Supports Go, JDBC, ODBC, .Net, [Link], PHP PDOst,
Python, Kafka Connector, Apace Spark.
Query Processing Layer:
- Data into optimized, Compressed and columnar format SQL Scripting: Snowflake Scripting is an extension to Snowflake SQL that adds support Code for View Creation :
- Data is stored in cloud storage, accessible to all nodes for procedural logic.
- Data not accessible directly, need to read via SF You can Declare Variables, return a value, branching construct, IF/CASE, Loops, FOR, CREATE OR REPLACE SECURE MATERIALIZED VIEW <View_name> AS SELECT….
- Cloud agnostic (AWS, GCP, Azure Supported) WHILE, REPEAT, LOOP, CURSOR, RESULTSET, determine number of rows impacted by
DML, (SQLROWCOUNT, SQLFOUND, SQLNOTFOUND), Handles Exception. Advantage of Materialized View: (Enterprise + feature)
Cloud Services:
- Snowflake cloud services coordinate platform activities Snowpark: Snowpark is a developer experience for Snowflake that allows developers to • Materialized views can improve the performance of repeat queries.
- Manage user requests from login to query dispatch write code in their preferred language (Python, Scala, or Java) and run that code directly • Materialized views are automatically by Snowflake.
- Responsible for security, access management, monitoring and other operational tasks on Snowflake. • Data accessed through materialized views is always Up-to-date.

- Authentication and access control verifies user identity and manages access to data. Database: A database is a logical grouping of schemas. Each database belongs to a single Extra cost of Storage and compute ( Managed by SF)
- Infrastructure manager provisions and manages compute resources for Snowflake. Snowflake account.
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
actions can be taken using the changed data. This process is referred to as change data
Snowflake Loading: capture (CDC). --Create TASK for Serverless (Example)

Bulk Loading Continuous Loading Types CREATE OR REPLACE TASK TASK_DEMO_S


COPY Command Snowpipe + Copy • Standard USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL' --By Default Medium
Large Data Incremental Data • Append-only SCHEDULE = '1 MINUTE'
User Managed WH Serverless • Insert-only (External Table) AS INSERT <command>;
64 Days Metadata 14 Days Metadata
Auth Done by Client Needs Key-Pair Auth with JSON web Supported table types
token • Standard tables (Including shared table)
• Directory tables
Supported File Type for loading : CSV, TSV, JSON, PARQUET, AVRO, ORC, XML
• External tables
• Underlying tables for a view
Bulk Loading:
Streams goes stale when source tables go out of retention period
Stale stream doesn’t track changes
• Load using Copy Command
Shall consume the stream within retention period
• Can also transform the data (Column Re-ordering, Column Omission, Casts,
Truncate)
• Auto Duplicate detection
• Stored in the metadata of the target table for 64 days. Available upon
completion of the COPY statement as the statement output.
• May not load in files in order
For scheduling can also use CRON option
Check Stream’s stale date in column STALE_AFTER
Continuous Loading /Pipes :
Extra column comes with Stream, METADATA$ACTION, METADATA$ISUPDATE
• We use Snowpipe METADATA$ROW_ID
• Loads data in micro batches, within a minute
• Serverless – Managed by Snowflake Stream Code Example:
• Use options like Stream, Tasks CREATE OR REPLACE STREAM raw_data_stream ON TABLE raw_data;

Example of PIPE: Append only Stream:


CREATE OR REPLACE STREAM raw_data_stream2 ON TABLE raw_data append_only=true;
CREATE [ OR REPLACE ] PIPE [ IF NOT EXISTS ] <name>
[ AUTO_INGEST = [ TRUE | FALSE ] ] Tasks:
[ ERROR_INTEGRATION = <integration_name> ] A task can execute, Single SQL statement, Call to a stored procedure, Procedural logic
[ AWS_SNS_TOPIC = '<string>' ] using Snowflake Scripting.
[ INTEGRATION = '<string>' ]
[ COMMENT = '<string_literal>' ] Tasks can execute with serverless or user-managed WH
AS <copy_statement>;

File Considerations while loading

-The number of load operations that run in parallel cannot exceed the number of data Sequences:
files to be loaded Sequences are used to generate unique numbers across sessions and statements;
-Split Large files in smaller ones and combine small ones in large one however, Snowflake doesn’t guarantee generating sequence numbers without gaps.
-Ideal size 100-250 mb (or larger) in size compressed
-Loading very large file eg. 100 GB or larger, not recommended Syntax : CREATE OR REPLACE SEQUENCE SEQUEN_EX START = 1 INCREMENT = 1;
-For large Files, carefully use ON_ERROR option
-If load goes beyond 24 hours, then job can be cancelled with no data load -Create TASK with User Managed Warehouse (Example) SELECT SEQUENCE_SAMPLE2.nextval a, SEQUENCE_SAMPLE2.nextval b,
-Consider accumulating file in source at 1 minute frequency SEQUENCE_SAMPLE2.nextval c, SEQUENCE_SAMPLE2.nextval d;
CREATE OR REPLACE TASK TASK_DEMO_U
Streams: WAREHOUSE = COMPUTE_WH - Default value for the INCREMENT and START is 1.
A stream object records data manipulation language (DML) changes made to tables, SCHEDULE = '1 MINUTE'
including inserts, updates, and deletes, as well as metadata about each change, so that AS INSERT <command>;
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
Micro-Partitioning: Data storage monitoring
• Snowflake automatically divides table in micro-partitions
• Each micro-partition contains 50-500 MB of uncompressed data (SF Select Admin » Cost Management » Consumption
compress it further and stores)
• Organized in columnar format Snowflake provides various methods to view and control data storage for tables, stages,
• Very large tables can have multi-million micro-partitions and clones. Users can use Snowsight, the Classic Console, the SHOW TABLES command,
• Micro-partitions are immutable (Time-Travel, Cloning) or the TABLE_STORAGE_METRICS view to monitor storage usage and costs. Users can
also remove staged files or drop cloned objects to reduce storage.

Snowflake stores MP metadata: --Local Schema


• Range of values for each of the column in MP
• Number of district values SELECT * FROM LA_DB.INFORMATION_SCHEMA.TABLE_STORAGE_METRICS;
• Additional info which can be used for optimized and efficient query
processing -- Across Account

Clustering SELECT * FROM


Clustering is a technique by which Snowflake co-locate same value data in same SYSTEM$CLUSTERING_DEPTH - Computes the average depth of the table according to SNOWFLAKE_SAMPLE_DATA.INFORMATION_SCHEMA.TABLE_STORAGE_METRICS;
micro-partitions, to improve the performance and optimize the cost. the specified columns.
Search optimization service: (Enterprise + feature)
How to choose candidate for Clustering: SYSTEM$CLUSTERING_INFORMATION (including clustering depth) - Returns clustering
information, including average clustering depth, for a table based on one or more The search optimization service can significantly improve the performance of certain
• Table with large number of Micro-partitions columns in the table types of lookup and analytical queries that use an extensive set of predicates for
• Queries need to read small amount of data (WHERE clause) filtering.
• Queries Sort the data (Order by) Code for Clustering :
• Many/most queries do select on, or sort on, the same few column(s) What Kind of Queries can be improved?
ALTER TABLE <table_name> CLUSTER BY (<column>);
Snowflake maintains clustering metadata • Selective point look up queries, returns small distinct rows
How to choose Clustering Keys : • Substring or regular expression searches ( LIKE, ILIKE etc.)
• Total Number of Micro-Partitions
• Queries on fields in VARIANT, OBJECT, ARRAY columns (Equality, IN
• Number of micro-partitions containing values that overlap with each
• Use column most used in where clause Predicates etc.)
other
• Depth of the overlapping partitions • Choose Join predicates • Queries using selected geo func
• Balance out the cardinality (Lower >> minimal pruning, Higher >> Inefficient
Overlapping MP and Overlap Depth grouping)
• For Multi Clustering Key move from low to high cardinality

• Search Access Path gets updated with data needed for look up
• New updates in table gets automatically updated
• Check progress search_optimization_progress in SHOW TABLES
• Cost for storage and compute
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
• When Scale Up, Charged only for extra compute
Before SOS • When Scale Down, Charged for both, for brief period

Query should have a column with 100-200k distinct values Compute Cost for Cloud services :
Selective filters bring very few records • Authentication, enforces security, performs query compilation and
Query runs at least for few seconds optimization, handles request query caching etc.
• Snowflake credits are used to pay for the processing time used by each
SOS Vs Clustering virtual warehouse.
• If usage is less than 10% of daily credit consumed, it is free.
SOS applies for all columns
SOS uses addition data structure Snowflake Storage Cost:

Code for SOS : • Staged File Costs


• Database Costs
ALTER TABLE <Table_Name> add search optimization; • Time Travel and Fail-safe Costs
• Temporary and Transient Tables Costs
Snowflake Billing :
• Cloning Tables, Schemas, and Databases Costs
Snowflake SQL API:
Snowflake Data Transfer Cost : Data transfer is the process of moving data into
(ingress) and out of (egress) Snowflake. The Snowflake SQL API is a REST API that you can use to access and update data in a
Snowflake database.
• Unloading - No charges for data ingress/loading but cost if you unload
• Replication - Data transfer in same region and same cloud provider is
free Cost when region or cloud provider is different
• Writing External Function - When data gets transferred out for function
• Per-Byte fee for Egress

*Snowflake does not apply data egress charges when a Snowflake client or driver
retrieves query results across regions within the same cloud platform or across
different cloud platforms

Features
Compute Cost :
- Submit SQL statements for execution (Queries, DDL,DML)
- Check the status of the execution of a statement
• A virtual warehouse is one or more clusters of compute resources that Streamlit is an open-source Python library that makes it easy to create and share
- Cancel the execution of a statement
enable executing queries, loading data, and performing other DML custom web apps for machine learning and data science.
- Manage Deployments ( provision user, role etc.)
operations.
• Snowflake credits are used to pay for the processing time used by each Limitation
virtual warehouse. - PUT and GET are not supported

Billing
- Leverages cloud service layer

Cost Based on Size of VW and How long they Run, not on complexity of the Query. Account Access and Security

• Billed By Seconds Network security and policies (Admin » Security » Network Policies)
• 60 Sec minimum • Allows to restrict access to SF based on IP address
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
• SECURITYADMIN or Higher (eg ACCOUNTADMIN) can create network GRANT USAGE ON SCHEMA DUMMY.DUMMY_SC TO ROLE DUMMY_ROLE;
policies
• Can be set at ACCOUNT or USER level Type of table privilege
• Only one Policy per ACCOUNT, or per USER SELECT
TRUNCATE
• Supports CIDR Notation
DELETE
• Blocked IP gets preference OWNERSHIP
• Support only Internet Protocol version 4 (i.e. IPv4) addresses, not IPv6 Etc…

CREATE NETWORK POLICY mypolicy1


ALLOWED_IP_LIST=('[Link]/24')
BLOCKED_IP_LIST=('[Link]');
System Defined and Custom Roles

Multi-Factor Authentication (MFA) :Powered By Duo Security Service

• Managed by Snowflake, User can self enroll


• Not enabled by default Securable object: An entity to which access can be granted. Unless allowed by a grant,
• Recommended to have MFA for all ACCOUNTADMIN role access is denied.
• Ideally for Snowsight but fully supported for Snowsql, JDBC/ODBC
drivers Role: An entity to which privileges can be granted. Roles are in turn assigned to users.
Note that roles can also be assigned to other roles, creating a role hierarchy.

Privilege: A defined level of access to an object. Multiple distinct privileges may be used
to control the granularity of access granted.

User: A user identity recognized by Snowflake, whether associated with a person or


program.

Account roles ORGADMIN : At Org level, Can create Account, can run SHOW ORGANIZATION
To permit SQL actions on any object in your account, grant privileges on the object to an ACCOUNTS, SHOW REGIONS
account role. ACCOUNTADMIN : Parent to SYSADMIN and SECURITYADMIN
SECURITYADMIN : Manages GRANTS, Manage USERS/ROLES, PARENT OF USERADMIN
Database Roles SYSADMIN : Create DB, WH, Assign all custom role to SYSADMIN
Federated Authentication (SSO) : In a federated environment, user authentication is To limit SQL actions to a single database, as well as any object in the database, grant USERADMIN : CREATE USER/ROLE
separated from user access through the use of one or more external entities that privileges on the object to a database role in the same database. PUBLIC : Pseudo Role, Auto granted to all
provide independent authentication of user credentials.
Primary Role Role Hierarchy and Privilege Inheritance:
Service Provider: The default assumed Role USE ROLE <ROLE NAME>
In a Snowflake federated environment, Snowflake serves as the SP.
Identity provider (IdP): Secondary Role
Creating and maintaining user credentials and other profile information. Can assign any number of secondary role
Authenticating users for SSO access to the SP.
Privileges are granted to roles, and roles are granted to users, to specify the operations
Snowflake Supports SCIM 2.0, Native app support : Microsoft and Okta
that the users can perform on objects in the system.

Access Control Framework (DAC + RBAC) -- Grant ROLE To new user


Discretionary Access Control (DAC): Each object has an owner, who can in turn grant GRANT ROLE DUMMY_ROLE TO USER USER1;
access to that object.
Role-based Access Control (RBAC): Access privileges are assigned to roles, which are in -- Grant SELECT on TABLE
turn assigned to users. GRANT SELECT ON TABLE DUMMY.DUMMY_SC.cust_det TO ROLE DUMMY_ROLE;

-- USAGE on DB and SCHEMA Needed in order to access TABLE


GRANT USAGE ON DATABASE DUMMY TO ROLE DUMMY_ROLE;
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
Database Role:

-- Find the login history at account level


SELECT * FROM account_usage.login_history;

Access History : (Enterprise + feature) Access History in Snowflake refers to when the
user query reads data and when the SQL statement performs a data write operation,
such as INSERT, UPDATE, and DELETE along with variations of the COPY command,
from the source data object to the target data object.

• Needs Enterprise Edition or Higher


• Maintains history of read and write operations
• Used for compliance auditing
• Stores data on source column, object accessed directly and indirectly
An organization is First-class Snowflake object that links the accounts owned by the org
Column Level Security : (Enterprise + feature) Organizations simplify Account management and billing, Replication & Failover/Failback,
Columns in Access_History
Snowflake Secure Data Sharing and administration tasks.
Dynamic Data Masking (Needs Enterprise Edition or Higher) - Use ORGADMIN Role
- Use masking policies to selectively mask column data QUERY_ID | QUERY_START_TIME | USER_NAME | DIRECT_OBJECTS_ACCESSED |
Account identifier uniquely identify the SF account: BASE_OBJECTS_ACCESSED | OBJECTS_MODIFIED | OBJECT_MODIFIED_BY_DDL |
External Tokenization Format: <account_identifier>.[Link] POLICIES_REFERENCED
- Tokenization is the process of removing sensitive data by replacing it with an
undecipherable token <account_identifier > = <orgname>-<account_name> Code example for access_history :

SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY


WHERE USER_NAME = 'TRAINING' ORDER BY QUERY_START_TIME;

Information Schema: Object Tag: (Enterprise + feature)


The Snowflake Information Schema (aka “Data Dictionary”) consists of a set of system-
defined views and table functions that provide extensive metadata information about
Object tags in Snowflake are schema-level objects that allow you to attach metadata
the objects created in your account. The Snowflake Information Schema is based on the
to other Snowflake objects, such as tables, views, and columns. Max 50 tags per
SQL-92 ANSI Information Schema, but with the addition of views and functions that are
specific to Snowflake. object.

• Read Only Schema CREATE TAG IF NOT EXISTS COST ALLOWED_VALUES 'MKT','ADVT' COMMENT =
• Available for every DB ‘comment’;
• Maintains data about all DB object and some account level information, In
form of Views
• Table function for historical and usage data across your account Performance & Cost Optimization Concepts
• Need Privilege
Query Profile: It provides a graphical representation of the main components of the
-- Find the login history with information schema processing plan for the query, with statistics for each component, along with details
Row Level Security : (Enterprise + feature)
SELECT * FROM TABLE(information_schema.login_history()); and statistics for the overall query.
Only relevant rows are shown to authorized person
Account Usage:
Views that display object metadata and usage metrics for your account.
- Type: Account_Usage and Readers_Account_Usage
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
Warehouse Cache (Local Disk Cache) Maximum number of clusters, greater than 1 (up to 10).
At Query Processing Layer Minimum number of clusters, equal to or less than the maximum (up to 10).
Local to the WH
Purged when WH is suspended or shut Additionally, multi-cluster warehouses support all the same properties and actions as
Need to find whether it is worth to auto-suspend a WH single-cluster warehouses, including:

Specifying a warehouse size.


Query Pruning: Skipping unnecessary scan of micro-partition is Query Pruning
Resizing a warehouse at any time.
Improves the performance.
Auto-suspending a running warehouse due to inactivity; note that this does not apply
Virtual Warehouse Config Options : to individual clusters, but rather the entire multi-cluster warehouse.
Auto-resuming a suspended warehouse when new queries are submitted.

Scale Out) (Enterprise + feature)) in Snowflake refers to the ability of Snowflake to


automatically add or remove clusters (As per the demand) to a virtual data
warehouse based on the query load or concurrent user.

Scale Up in Snowflake means increasing the resources for a single warehouse. It’s like
EXPLAIN PLANS
upgrading from a SMALL to a MEDIUM or from a MEDIUM to a LARGE warehouse.
EXPLAIN [ USING {TABULAR | JSON | TEXT}] <statement>
Scale Down is exactly opposite of Scale up. Happens post completion of existing
Query History (Option under Activity)
queries.
Provides 14 days history
Scaling Policy:
QUERY_HISTORY (Under Information Schema)
Last 7 days activity

QUERY_HISTORY View (Under SF Account Usage)


Up to 365 days activity

Data Spilling: Data spilling occurs when an operation cannot fit in memory. The
system begins by spilling data to the local disk of a warehouse node then to remote
disks. These operations can significantly slow down query execution. To mitigate this,
it’s recommended to use a larger warehouse, process data in smaller batches, or
optimize the query.

Types of Cache:

You can monitor Warehouse load Admin>>Warehouse: You can check warehouse
Query Result Cache (USE_CACHED_RESULT)
wise Query history.
In Cloud services layer
Expires in 24 hours
Work for exact same query
Doesn’t work with functions, which evaluates at runtime e.g. CURRENT_TIMESTAMP()
and UUID_STRING(), CURRENT_DATE() is exception
Doesn’t work with external functions
Doesn’t work when, table change, config change, Micro-partitions change

Metadata Cache Multi Cluster Warehouse:


In Cloud services layer Resource Monitor:
Stores Metadata Information Multi-cluster warehouses, Snowflake supports allocating, either statically or
Warehouse not needed dynamically, additional clusters to make a larger pool of compute resources available. • Monitoring credit usage by user managed WH and cloud services (Not
Row Count, MIN/MAX values, DISTINCT count etc. Serverless)
Options available: • Can set at Account level or WH level
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
• Can set Credit Quota (Eg, 100 Credit)
• Monitor consumption by User managed WH and Cloud services
• You can define frequency at which monitoring resets
• (Daily, Weekly, Monthly, Yearly, Never)
• Action on reaching threshold (Suspend Immediately,Suspend,Notify)

Query Acceleration Service (QAS) (Enterprise + feature)

• QAS can accelerate part of WH workload


• Reduces the impact of outlier query
• Works for ( Ad hoc Analysis, Unpredictable data volume per query, Queries
with large scan and selective filter)

SF Performance Optimization for below areas:

Storage: Clustering, QAS, SOS, Materialized View, Purge stage data, use of temp table,
Cost Consideration during replication and failover
Save on time travel (DATA_RETENTION_TIME_IN_DAYS), Monitor memory
consumption.
Data Transfer: Initial replication, Subsequent Sync FILE FORMAT:
Compute: To identify delta, to copy
Warehouse : Reduce Queues, Reduce spilling, Right size WH, WH cache optimization,
Storage: Target account Supported File Type for loading : CSV (Default),TSV, JSON, PARQUET, AVRO, ORC, XML
limit concurrent queries (MAX_CONCURRENCY_LEVEL),Separate WH , Monitor WH,
Control Cost: Select objects, adjust the refresh frequency Supported File Type for Unloading : CSV (Default),TSV,JSON,PARQUET
Resource Monitor.
Monitor and Control Cost Compression : AUTO (Default) | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Query : Query Profile stat, Query History, Avoid SELECT *, Use filter.
Resource Monitor : To monitor WH, can notify, suspend Data can be loaded/unloaded with UI or SQL statement.
Cost Optimization :
Budget : At account level, Send mail, also forecasting feature
Check Cost Insights in Admin>>Cost Mgmt >> Account Overview File Format definition (Example)
Attributing Costs
CREATE OR REPLACE FILE FORMAT my_csv_ff
Tagging : For logical bifurcation and chargeback using query/object tag TYPE = CSV
FIELD_DELIMITER = ','
To find cost use Views TAG_REFERENCES, WAREHOUSE_METERING_HISTORY, SKIP_HEADER = 1
QUERY_ATTRIBUTION_HISTORY from ACCOUNT_USAGE. NULL_IF = ('NULL', 'null')
EMPTY_FIELD_AS_NULL = true
Data Loading and Unloading COMPRESSION = NONE;

Stage : “Snowflake refers to the location of data files in cloud storage as a stage” Describe FF
DESC FILE FORMAT my_csv_ff_new;

Show FF
SHOW FILE FORMATS;
Types of objects to be used :
Drop FF
DROP FILE FORMAT my_csv_ff_new;

FF can be set at COPY INTO, STAGE and TABLE DEFINITION.

Put Command
PUT [Link] Path>CSV @~/customers_user/staged AUTO_COMPRESS = FALSE
OVERWRITE = TRUE;

COPY INTO example:

COPY INTO mytable


This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
FROM s3://mybucket/data/files Frequency Estimate: Uses Space Saving Algo to approximate frequency of value in the Example : Example : SELECT province, store_ID, profit,100 * RATIO_TO_REPORT(profit)
STORAGE_INTEGRATION = myint dataset. Function like: APPROX_TOP_K, OVER (PARTITION BY province) AS percent_profit FROM store_profit ORDER BY
ENCRYPTION=(MASTER_KEY = 'eSx...') APPROX_TOP_K_ACCUMULATE/COMBINE/ESTIMATE province, store_ID;
FILE_FORMAT = (FORMAT_NAME = my_csv_format);
Percentile Estimate: Uses T-digest Algo to approximate percentile values in the dataset. System Function : Control Functions that allows you to execute action in the system.
Various option with COPY INTO ON_ERROR Options: Example : SELECT SYSTEM$ABORT_SESSION(1065153868222);
SAMPLING :
ON_ERROR = { CONTINUE | SKIP_FILE (DEFAULT IN SNOWPIPE) | SKIP_FILE_<num> | Table Function : A table function returns a set of rows for each input row. The returned
'SKIP_FILE_<num>%' | ABORT_STATEMENT (DEFAULT IN BULK LOADING) } FIXED-SIZE set can contain zero, one, or more rows. Each row can contain one or more columns.
SIZE_LIMIT = <num> Fixed Numbers of rows are returned Example : VALIDATE
RETURN_FAILED_ONLY = TRUE | FALSE (DEFAULT) Example: SELECT COUNT(*) FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE SAMPLE(100 ROWS); UDF and UDTF:
(DEFAULT) --- “With user-defined functions (UDFs), you can extend the system to perform operations
ENFORCE_LENGTH = TRUE (DEFAULT) | FALSE FRACTION BASED that are not available through the built-in, system-defined functions provided by
TRUNCATECOLUMNS = TRUE | FALSE BERNOULLI | ROW : (p/100)*n, Number of rows vary, row wise Snowflake.”
FORCE = TRUE | FALSE Example: SELECT COUNT(*) FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER UDF with returning Table is called UDTF.
PURGE = TRUE | FALSE TABLESAMPLE BERNOULLI (20.3) SEED(10); Languages Support: SQL, JS, Python, Java

SYSTEM | BLOCK : Faster, Might be biased, block wise Example: SELECT [Link].udf_sample()
VALIDATE and VALIDATION_MODE
Example: SELECT COUNT(*) FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER Stored Procedure:
VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS TABLESAMPLE(1); A stored procedure is a named, precompiled collection of SQL statements and control-of-
flow language constructs that are stored in a database and can be called from SQL CALL
Example : COPY INTO customers_named Sampling on Join is possible if, BERNOULII/ROW BASED, No Seed Used. command, including other stored procedures.
FROM @internal_named_stage SAMPLE and TABLESAMPLE are synonymous. Languages Support: SQL, JS, Python (These 3 via Snowpark), Java, Scala
FILE_FORMAT = my_csv_ff
ON_ERROR = CONTINUE Supporting Function Types: UDF Vs SP:
RETURN_FAILED_ONLY = TRUE;
Feature User-Defined Function (UDF) Stored Procedure
VALIDATE: Validates the files loaded in a past execution of the COPY INTO <table> Return Value Must return a value Not Mandatory to return a value
command Use case When part of SQL stmt For DML, DDL Operations
Example: SELECT * FROM TABLE (VALIDATE (customers_named, JOB_ID => '_last')) Java, JS, Python, Scala, Snowflake
Languages Java, JS, Python, Scala, SQL
Scripting
Unloading data : Can be called for every row of a query Can only be executed as a
- Bulk(Table) unloading with COPY INTO <Location> Granularity
result set standalone command
- For getting single file from multiple files use SINGLE = TRUE, default is FALSE Inside Single SP with a statement, Can
- Can partition the data by PARTITION BY Multiple UDFs can be called
SELECT have nested SP
DB Access May not have access Has access
GET Command
GET @internal_named_stage file ://<File_path>;
Semi-Structured Data : Semi-structured data is data that does not conform to the
-- INSERT OVERWRITES the existing data in the employees table with the new data from
standards of traditional structured data, but contains tags (labels) or other types of
the select statement.
mark-up that identify individual, distinct entities within the data.
INSERT OVERWRITE INTO employees SELECT id, name, department FROM employees
WHERE department = 'Engineering';
Snowflake provides native data types (ARRAY, OBJECT, and VARIANT)
Snowflake supports querying the data
Data Transformations
Supported Data formats (For Load)
Scalar Function: Returns 1 value per invocation.
- JSON
Estimation Functions: Example: SELECT CEIL(135.135); Returns >> 136
- Avro
Distinct Count: Uses HyperLogLog (HLL) Algo to count distinct value in a column.
- ORC
Aggregate Function: Returns 1 value per group of rows.
- Parquet
Similarity Estimate: Uses MinHash Function to find similarity between 2 datasets. Example: SELECT AVG(salary) from <table>;
- XML
Window Function : Aggregate functions that can operate on a subset of rows within the
Supported Data formats (For Unload)
set of input rows.
- JSON
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
- Parquet

In a VARIANT column, JSON null values are stored as a string containing the word “null”

DATA ENCRYPTION:

At Rest : Encrypted with AES 256 Bit


At Transit : TLS 1.2

Periodic Key Rotation


- Snowflake auto-rotates the key every 30 days
Example of flattening of data: - Post 30 days, Active Keys get retired
CREATE OR REPLACE table dump_json as
SELECT src:Department::string as dep, Periodic Rekeying (Enterprise + feature)
[Link]:SubZone :: string as subzone_col, - Manual Enablement
[Link]:Store :: string as store_col, - Destroys retired key post one year
[Link]:SR :: string as sr_col, - Ensures that data is encrypted with latest tech
[Link]:Sales :: string as sales_col
Directory :
• Stores file level meta data about files in stage Tri-Secret Secure (Business Critical Edition on Higher)
FROM raw_source, - Combination of CMK and SMK
LATERAL FLATTEN(INPUT => SRC:Zone) as Zone, • Implicit object layered on stage
• Can be enabled for external/internal stage - Composite Master Key is used
LATERAL FLATTEN(INPUT => [Link]:Division) as div, - If CMK is revoked, You cant decrypt data in Snowflake
LATERAL FLATTEN(INPUT => [Link]:Orders) as ord; • Stores URL by BUILD_STAGE_FILE_URL
- Provides 3 levels of data protection: encryption, authentication, and control
• Can be enabled during stage creation or later (by alter)
Predicates determine the data types for values in semi-structured data. • Data can be refreshed manually or automatically (only for cloud i.e external
Eg. SELECT * FROM new_vartab WHERE IS_ARRAY(data) stages)
• No explicit privilege required to access
Full list here : [Link]

Unstructured data : Unstructured data is information that does not fit into a predefined Example Code :
data model or schema.
Example : PDF, Image, Video, VCF (genomics), KDF (semiconductors),HDF5 (aeronautics) CREATE OR REPLACE STAGE internal_UN_stage
etc. ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')
DIRECTORY=(ENABLE=TRUE);
SF Supports for Unstructured data :
Data Protection and Data Sharing
• Securely access data files located in cloud storage.
• Share file access URLs with collaborators and partners. Continuous Data Protection with Snowflake:
• Load file access URLs and other file metadata into Snowflake tables.
• Process Unstructured data.
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
- Source in past doesn’t exists Secure Data Sharing (Not available for VPS): Secure Data Sharing lets you share
- Pipe object with AUTO_INGEST = TRUE recreated selected objects in a database in your account with other Snowflake accounts. No data
gets copied or replicated.
Code : CREATE OR REPLACE TABLE clone_data CLONE sample_data;
Secure data sharing can be done with other snowflake account or readers (non-sf)
REPLICATION :
account.

Below objects can be shared:

• Tables
• External tables
• Secure views
• Secure materialized views
• Secure UDFs

When data shared with other SF account, consumer pays for the compute.
When data shared with none SF account, provider pays for the compute.

Data Sharing Considerations:

• Shared databases are read-only. Users in a consumer account can


CLONING: view/query data, but cannot insert or update data, or create any objects in
Cloning in Snowflake refers to the process of creating an exact copy the database
of a database, schema, or table. This feature can be used to create a • No limit on Number of Shares or Number of accounts it is shared with
backup of existing data, test new data structures, or create a new • Only for DB objects • Can not create a clone of a shared database or any schemas/tables in the
environment for development or testing purposes. • Privileges granted on DB are not replicated database
• Error when replicating to lower edition (have workaround) • Not supported, Time Travel for a shared database or any schemas/tables in
Cloning doesn’t copy the load metadata, hence any file loaded in original table, can be • Can replicate to multiple accounts the database
loaded in cloned table. • Data transfer and Data Compute cost • Not supported, Editing the comments for a shared database.
• Shared databases and all the objects in the database cannot be re-shared
Objects which can be cloned and which can’t be cloned shown below. with other accounts
Below objects are not supported for Replications: • Shared databases cannot be replicated
• Can consume as many as shares you want but can create only db per share
• External tables • Can’t share from Business critical to non-Business critical account, can
• Event tables override settings
• Stages • Sharing happens between same cloud provider and same region (For cross
• Pipes region, need to replicate first)
• DBs created from SHARE • Sharing TO and FROM VPS is not supported

Data Provider : Under Data > Provider Studio

• Need ACCOUNTADMIN or relevant Auth


• Can create and publish listing
• Monitor the performance of listing

Cloning with Time-Travel is only available for Only available for Database, Schema, Table
& Streams
Cloning fails if
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]

Best wishes for your exam. For any feedback or


corrections, you can always reach us at
learn.anywhere1@[Link]

You might also like