0% found this document useful (0 votes)

14 views12 pages

Snowflake Snowpro Core Ebook

This document serves as a guide for the Udemy course on Snowflake SnowPro Core Certification, detailing Snowflake's architecture, features, and functionalities. It covers aspects such as data loading methods, query processing, security measures, and billing structures. Additionally, it provides code examples for creating tables, views, and tasks within the Snowflake environment.

Uploaded by

8527479210

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views12 pages

Snowflake Snowpro Core Ebook

Uploaded by

8527479210

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the

course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
Snowflake Data Cloud Features and Architecture: - Snowflake optimizer determines the most efficient way to execute a query.
What is Snowflake? - Metadata Manager manages the metadata associated with data stored in Snowflake. Schema: A schema is a logical grouping of database objects (tables, views, etc.). Each
• Cloud based data platform (AWS, GCP, Azure Supported) - Snowflake Security includes encryption, network isolation, and other security features schema belongs to a single database
• Not built from old db tech to protect data.
• Cloud native Table Types :
• Separates Compute from Storage
• It is SaaS (Min upfront cost, Pay-as-you-go, fully managed Infra) Snowflake has multiple editions to choose from:

Traditional Architecture:
Shared Disk Architecture (ex: Oracle Real Application Clusters)
- High availability/reliability, easy data sharing b/w nodes, dynamic load balance
- Single point of failure, Network bottleneck
Shared Nothing Architecture (ex: Teradata) *External tables are read only and points to external stage.
- High scalable, better performance, flexible conf
- Fixed load balancing, Data Skew, Time consuming data partitioning Code for table creation:
For each Snowflake account, you choose Edition, cloud provider and location/Region.
Snowflake (SF) Architecture: SF has 3 distinct layer, Cloud Services, Query Processing Snowflake UI is called Snowsight. CREATE TEMPORARY (TEMP)| TRANSIENT| PERMANENT (Optional) TABLE
Layer, Database Storage. <Table_name> (
Data Exchange: Snowflake support can enable it, can share and control data between ID NUMBER,
selected group of entities. );

Data Types:

View Type:

Database Storage: Snowflake Partners: Huge partner network for data integration, ML/Data Science,
- Data into optimized, Compressed and columnar format (Proprietary) Security, Governance & Observability, BI, SQL Dev & Mgmt and Native programming
- Data is stored in cloud storage, accessible to all nodes Interface.
- Data not accessible directly, need to read via SF
- Cloud agnostic (AWS, GCP, Azure Supported) Snowflake Drivers and Connectors: Supports Go, JDBC, ODBC, .Net, [Link], PHP PDOst,
Python, Kafka Connector, Apace Spark.
Query Processing Layer:
- Data into optimized, Compressed and columnar format SQL Scripting: Snowflake Scripting is an extension to Snowflake SQL that adds support Code for View Creation :
- Data is stored in cloud storage, accessible to all nodes for procedural logic.
- Data not accessible directly, need to read via SF You can Declare Variables, return a value, branching construct, IF/CASE, Loops, FOR, CREATE OR REPLACE SECURE MATERIALIZED VIEW <View_name> AS SELECT….
- Cloud agnostic (AWS, GCP, Azure Supported) WHILE, REPEAT, LOOP, CURSOR, RESULTSET, determine number of rows impacted by
DML, (SQLROWCOUNT, SQLFOUND, SQLNOTFOUND), Handles Exception. Advantage of Materialized View: (Enterprise + feature)
Cloud Services:
- Snowflake cloud services coordinate platform activities Snowpark: Snowpark is a developer experience for Snowflake that allows developers to • Materialized views can improve the performance of repeat queries.
- Manage user requests from login to query dispatch write code in their preferred language (Python, Scala, or Java) and run that code directly • Materialized views are automatically by Snowflake.
- Responsible for security, access management, monitoring and other operational tasks on Snowflake. • Data accessed through materialized views is always Up-to-date.

- Authentication and access control verifies user identity and manages access to data. Database: A database is a logical grouping of schemas. Each database belongs to a single Extra cost of Storage and compute ( Managed by SF)
- Infrastructure manager provisions and manages compute resources for Snowflake. Snowflake account.
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
actions can be taken using the changed data. This process is referred to as change data
Snowflake Loading: capture (CDC). --Create TASK for Serverless (Example)

Bulk Loading Continuous Loading Types CREATE OR REPLACE TASK TASK_DEMO_S

COPY Command Snowpipe + Copy • Standard USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL' --By Default Medium
Large Data Incremental Data • Append-only SCHEDULE = '1 MINUTE'
User Managed WH Serverless • Insert-only (External Table) AS INSERT <command>;
64 Days Metadata 14 Days Metadata
Auth Done by Client Needs Key-Pair Auth with JSON web Supported table types
token • Standard tables (Including shared table)
• Directory tables
Supported File Type for loading : CSV, TSV, JSON, PARQUET, AVRO, ORC, XML
• External tables
• Underlying tables for a view
Bulk Loading:
Streams goes stale when source tables go out of retention period
Stale stream doesn’t track changes
• Load using Copy Command
Shall consume the stream within retention period
• Can also transform the data (Column Re-ordering, Column Omission, Casts,
Truncate)
• Auto Duplicate detection
• Stored in the metadata of the target table for 64 days. Available upon
completion of the COPY statement as the statement output.
• May not load in files in order
For scheduling can also use CRON option
Check Stream’s stale date in column STALE_AFTER
Continuous Loading /Pipes :
Extra column comes with Stream, METADATA$ACTION, METADATA$ISUPDATE
• We use Snowpipe METADATA$ROW_ID
• Loads data in micro batches, within a minute
• Serverless – Managed by Snowflake Stream Code Example:
• Use options like Stream, Tasks CREATE OR REPLACE STREAM raw_data_stream ON TABLE raw_data;

Example of PIPE: Append only Stream:

CREATE OR REPLACE STREAM raw_data_stream2 ON TABLE raw_data append_only=true;
CREATE [ OR REPLACE ] PIPE [ IF NOT EXISTS ] <name>
[ AUTO_INGEST = [ TRUE | FALSE ] ] Tasks:
[ ERROR_INTEGRATION = <integration_name> ] A task can execute, Single SQL statement, Call to a stored procedure, Procedural logic
[ AWS_SNS_TOPIC = '<string>' ] using Snowflake Scripting.
[ INTEGRATION = '<string>' ]
[ COMMENT = '<string_literal>' ] Tasks can execute with serverless or user-managed WH
AS <copy_statement>;

File Considerations while loading

-The number of load operations that run in parallel cannot exceed the number of data Sequences:
files to be loaded Sequences are used to generate unique numbers across sessions and statements;
-Split Large files in smaller ones and combine small ones in large one however, Snowflake doesn’t guarantee generating sequence numbers without gaps.
-Ideal size 100-250 mb (or larger) in size compressed
-Loading very large file eg. 100 GB or larger, not recommended Syntax : CREATE OR REPLACE SEQUENCE SEQUEN_EX START = 1 INCREMENT = 1;
-For large Files, carefully use ON_ERROR option
-If load goes beyond 24 hours, then job can be cancelled with no data load -Create TASK with User Managed Warehouse (Example) SELECT SEQUENCE_SAMPLE2.nextval a, SEQUENCE_SAMPLE2.nextval b,
-Consider accumulating file in source at 1 minute frequency SEQUENCE_SAMPLE2.nextval c, SEQUENCE_SAMPLE2.nextval d;
CREATE OR REPLACE TASK TASK_DEMO_U
Streams: WAREHOUSE = COMPUTE_WH - Default value for the INCREMENT and START is 1.
A stream object records data manipulation language (DML) changes made to tables, SCHEDULE = '1 MINUTE'
including inserts, updates, and deletes, as well as metadata about each change, so that AS INSERT <command>;
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
Micro-Partitioning: Data storage monitoring
• Snowflake automatically divides table in micro-partitions
• Each micro-partition contains 50-500 MB of uncompressed data (SF Select Admin » Cost Management » Consumption
compress it further and stores)
• Organized in columnar format Snowflake provides various methods to view and control data storage for tables, stages,
• Very large tables can have multi-million micro-partitions and clones. Users can use Snowsight, the Classic Console, the SHOW TABLES command,
• Micro-partitions are immutable (Time-Travel, Cloning) or the TABLE_STORAGE_METRICS view to monitor storage usage and costs. Users can
also remove staged files or drop cloned objects to reduce storage.

Snowflake stores MP metadata: --Local Schema

• Range of values for each of the column in MP
• Number of district values SELECT * FROM LA_DB.INFORMATION_SCHEMA.TABLE_STORAGE_METRICS;
• Additional info which can be used for optimized and efficient query
processing -- Across Account

Clustering SELECT * FROM

Clustering is a technique by which Snowflake co-locate same value data in same SYSTEM$CLUSTERING_DEPTH - Computes the average depth of the table according to SNOWFLAKE_SAMPLE_DATA.INFORMATION_SCHEMA.TABLE_STORAGE_METRICS;
micro-partitions, to improve the performance and optimize the cost. the specified columns.
Search optimization service: (Enterprise + feature)
How to choose candidate for Clustering: SYSTEM$CLUSTERING_INFORMATION (including clustering depth) - Returns clustering
information, including average clustering depth, for a table based on one or more The search optimization service can significantly improve the performance of certain
• Table with large number of Micro-partitions columns in the table types of lookup and analytical queries that use an extensive set of predicates for
• Queries need to read small amount of data (WHERE clause) filtering.
• Queries Sort the data (Order by) Code for Clustering :
• Many/most queries do select on, or sort on, the same few column(s) What Kind of Queries can be improved?
ALTER TABLE <table_name> CLUSTER BY (<column>);
Snowflake maintains clustering metadata • Selective point look up queries, returns small distinct rows
How to choose Clustering Keys : • Substring or regular expression searches ( LIKE, ILIKE etc.)
• Total Number of Micro-Partitions
• Queries on fields in VARIANT, OBJECT, ARRAY columns (Equality, IN
• Number of micro-partitions containing values that overlap with each
• Use column most used in where clause Predicates etc.)
other
• Depth of the overlapping partitions • Choose Join predicates • Queries using selected geo func
• Balance out the cardinality (Lower >> minimal pruning, Higher >> Inefficient
Overlapping MP and Overlap Depth grouping)
• For Multi Clustering Key move from low to high cardinality

• Search Access Path gets updated with data needed for look up
• New updates in table gets automatically updated
• Check progress search_optimization_progress in SHOW TABLES
• Cost for storage and compute
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
• When Scale Up, Charged only for extra compute
Before SOS • When Scale Down, Charged for both, for brief period

Query should have a column with 100-200k distinct values Compute Cost for Cloud services :
Selective filters bring very few records • Authentication, enforces security, performs query compilation and
Query runs at least for few seconds optimization, handles request query caching etc.
• Snowflake credits are used to pay for the processing time used by each
SOS Vs Clustering virtual warehouse.
• If usage is less than 10% of daily credit consumed, it is free.
SOS applies for all columns
SOS uses addition data structure Snowflake Storage Cost:

Code for SOS : • Staged File Costs

• Database Costs
ALTER TABLE <Table_Name> add search optimization; • Time Travel and Fail-safe Costs
• Temporary and Transient Tables Costs
Snowflake Billing :
• Cloning Tables, Schemas, and Databases Costs
Snowflake SQL API:
Snowflake Data Transfer Cost : Data transfer is the process of moving data into
(ingress) and out of (egress) Snowflake. The Snowflake SQL API is a REST API that you can use to access and update data in a
Snowflake database.
• Unloading - No charges for data ingress/loading but cost if you unload
• Replication - Data transfer in same region and same cloud provider is
free Cost when region or cloud provider is different
• Writing External Function - When data gets transferred out for function
• Per-Byte fee for Egress

*Snowflake does not apply data egress charges when a Snowflake client or driver
retrieves query results across regions within the same cloud platform or across
different cloud platforms

Features
Compute Cost :
- Submit SQL statements for execution (Queries, DDL,DML)
- Check the status of the execution of a statement
• A virtual warehouse is one or more clusters of compute resources that Streamlit is an open-source Python library that makes it easy to create and share
- Cancel the execution of a statement
enable executing queries, loading data, and performing other DML custom web apps for machine learning and data science.
- Manage Deployments ( provision user, role etc.)
operations.
• Snowflake credits are used to pay for the processing time used by each Limitation
virtual warehouse. - PUT and GET are not supported

Billing
- Leverages cloud service layer

Cost Based on Size of VW and How long they Run, not on complexity of the Query. Account Access and Security

• Billed By Seconds Network security and policies (Admin » Security » Network Policies)
• 60 Sec minimum • Allows to restrict access to SF based on IP address
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
• SECURITYADMIN or Higher (eg ACCOUNTADMIN) can create network GRANT USAGE ON SCHEMA DUMMY.DUMMY_SC TO ROLE DUMMY_ROLE;
policies
• Can be set at ACCOUNT or USER level Type of table privilege
• Only one Policy per ACCOUNT, or per USER SELECT
TRUNCATE
• Supports CIDR Notation
DELETE
• Blocked IP gets preference OWNERSHIP
• Support only Internet Protocol version 4 (i.e. IPv4) addresses, not IPv6 Etc…

CREATE NETWORK POLICY mypolicy1

ALLOWED_IP_LIST=('[Link]/24')
BLOCKED_IP_LIST=('[Link]');
System Defined and Custom Roles

Multi-Factor Authentication (MFA) :Powered By Duo Security Service

• Managed by Snowflake, User can self enroll

• Not enabled by default Securable object: An entity to which access can be granted. Unless allowed by a grant,
• Recommended to have MFA for all ACCOUNTADMIN role access is denied.
• Ideally for Snowsight but fully supported for Snowsql, JDBC/ODBC
drivers Role: An entity to which privileges can be granted. Roles are in turn assigned to users.
Note that roles can also be assigned to other roles, creating a role hierarchy.

Privilege: A defined level of access to an object. Multiple distinct privileges may be used
to control the granularity of access granted.

User: A user identity recognized by Snowflake, whether associated with a person or

program.

Account roles ORGADMIN : At Org level, Can create Account, can run SHOW ORGANIZATION
To permit SQL actions on any object in your account, grant privileges on the object to an ACCOUNTS, SHOW REGIONS
account role. ACCOUNTADMIN : Parent to SYSADMIN and SECURITYADMIN
SECURITYADMIN : Manages GRANTS, Manage USERS/ROLES, PARENT OF USERADMIN
Database Roles SYSADMIN : Create DB, WH, Assign all custom role to SYSADMIN
Federated Authentication (SSO) : In a federated environment, user authentication is To limit SQL actions to a single database, as well as any object in the database, grant USERADMIN : CREATE USER/ROLE
separated from user access through the use of one or more external entities that privileges on the object to a database role in the same database. PUBLIC : Pseudo Role, Auto granted to all
provide independent authentication of user credentials.
Primary Role Role Hierarchy and Privilege Inheritance:
Service Provider: The default assumed Role USE ROLE <ROLE NAME>
In a Snowflake federated environment, Snowflake serves as the SP.
Identity provider (IdP): Secondary Role
Creating and maintaining user credentials and other profile information. Can assign any number of secondary role
Authenticating users for SSO access to the SP.
Privileges are granted to roles, and roles are granted to users, to specify the operations
Snowflake Supports SCIM 2.0, Native app support : Microsoft and Okta
that the users can perform on objects in the system.

Access Control Framework (DAC + RBAC) -- Grant ROLE To new user

Discretionary Access Control (DAC): Each object has an owner, who can in turn grant GRANT ROLE DUMMY_ROLE TO USER USER1;
access to that object.
Role-based Access Control (RBAC): Access privileges are assigned to roles, which are in -- Grant SELECT on TABLE
turn assigned to users. GRANT SELECT ON TABLE DUMMY.DUMMY_SC.cust_det TO ROLE DUMMY_ROLE;

-- USAGE on DB and SCHEMA Needed in order to access TABLE

GRANT USAGE ON DATABASE DUMMY TO ROLE DUMMY_ROLE;
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
Database Role:

-- Find the login history at account level

SELECT * FROM account_usage.login_history;

Access History : (Enterprise + feature) Access History in Snowflake refers to when the
user query reads data and when the SQL statement performs a data write operation,
such as INSERT, UPDATE, and DELETE along with variations of the COPY command,
from the source data object to the target data object.

• Needs Enterprise Edition or Higher

• Maintains history of read and write operations
• Used for compliance auditing
• Stores data on source column, object accessed directly and indirectly
An organization is First-class Snowflake object that links the accounts owned by the org
Column Level Security : (Enterprise + feature) Organizations simplify Account management and billing, Replication & Failover/Failback,
Columns in Access_History
Snowflake Secure Data Sharing and administration tasks.
Dynamic Data Masking (Needs Enterprise Edition or Higher) - Use ORGADMIN Role
- Use masking policies to selectively mask column data QUERY_ID | QUERY_START_TIME | USER_NAME | DIRECT_OBJECTS_ACCESSED |
Account identifier uniquely identify the SF account: BASE_OBJECTS_ACCESSED | OBJECTS_MODIFIED | OBJECT_MODIFIED_BY_DDL |
External Tokenization Format: <account_identifier>.[Link] POLICIES_REFERENCED
- Tokenization is the process of removing sensitive data by replacing it with an
undecipherable token <account_identifier > = <orgname>-<account_name> Code example for access_history :

SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY

WHERE USER_NAME = 'TRAINING' ORDER BY QUERY_START_TIME;

Information Schema: Object Tag: (Enterprise + feature)

The Snowflake Information Schema (aka “Data Dictionary”) consists of a set of system-
defined views and table functions that provide extensive metadata information about
Object tags in Snowflake are schema-level objects that allow you to attach metadata
the objects created in your account. The Snowflake Information Schema is based on the
to other Snowflake objects, such as tables, views, and columns. Max 50 tags per
SQL-92 ANSI Information Schema, but with the addition of views and functions that are
specific to Snowflake. object.

• Read Only Schema CREATE TAG IF NOT EXISTS COST ALLOWED_VALUES 'MKT','ADVT' COMMENT =
• Available for every DB ‘comment’;
• Maintains data about all DB object and some account level information, In
form of Views
• Table function for historical and usage data across your account Performance & Cost Optimization Concepts
• Need Privilege
Query Profile: It provides a graphical representation of the main components of the
-- Find the login history with information schema processing plan for the query, with statistics for each component, along with details
Row Level Security : (Enterprise + feature)
SELECT * FROM TABLE(information_schema.login_history()); and statistics for the overall query.
Only relevant rows are shown to authorized person
Account Usage:
Views that display object metadata and usage metrics for your account.
- Type: Account_Usage and Readers_Account_Usage
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
Warehouse Cache (Local Disk Cache) Maximum number of clusters, greater than 1 (up to 10).
At Query Processing Layer Minimum number of clusters, equal to or less than the maximum (up to 10).
Local to the WH
Purged when WH is suspended or shut Additionally, multi-cluster warehouses support all the same properties and actions as
Need to find whether it is worth to auto-suspend a WH single-cluster warehouses, including:

Specifying a warehouse size.

Query Pruning: Skipping unnecessary scan of micro-partition is Query Pruning
Resizing a warehouse at any time.
Improves the performance.
Auto-suspending a running warehouse due to inactivity; note that this does not apply
Virtual Warehouse Config Options : to individual clusters, but rather the entire multi-cluster warehouse.
Auto-resuming a suspended warehouse when new queries are submitted.

Scale Out) (Enterprise + feature)) in Snowflake refers to the ability of Snowflake to

automatically add or remove clusters (As per the demand) to a virtual data
warehouse based on the query load or concurrent user.

Scale Up in Snowflake means increasing the resources for a single warehouse. It’s like
EXPLAIN PLANS
upgrading from a SMALL to a MEDIUM or from a MEDIUM to a LARGE warehouse.
EXPLAIN [ USING {TABULAR | JSON | TEXT}] <statement>
Scale Down is exactly opposite of Scale up. Happens post completion of existing
Query History (Option under Activity)
queries.
Provides 14 days history
Scaling Policy:
QUERY_HISTORY (Under Information Schema)
Last 7 days activity

QUERY_HISTORY View (Under SF Account Usage)

Up to 365 days activity

Data Spilling: Data spilling occurs when an operation cannot fit in memory. The
system begins by spilling data to the local disk of a warehouse node then to remote
disks. These operations can significantly slow down query execution. To mitigate this,
it’s recommended to use a larger warehouse, process data in smaller batches, or
optimize the query.

Types of Cache:

You can monitor Warehouse load Admin>>Warehouse: You can check warehouse
Query Result Cache (USE_CACHED_RESULT)
wise Query history.
In Cloud services layer
Expires in 24 hours
Work for exact same query
Doesn’t work with functions, which evaluates at runtime e.g. CURRENT_TIMESTAMP()
and UUID_STRING(), CURRENT_DATE() is exception
Doesn’t work with external functions
Doesn’t work when, table change, config change, Micro-partitions change

Metadata Cache Multi Cluster Warehouse:

In Cloud services layer Resource Monitor:
Stores Metadata Information Multi-cluster warehouses, Snowflake supports allocating, either statically or
Warehouse not needed dynamically, additional clusters to make a larger pool of compute resources available. • Monitoring credit usage by user managed WH and cloud services (Not
Row Count, MIN/MAX values, DISTINCT count etc. Serverless)
Options available: • Can set at Account level or WH level
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
• Can set Credit Quota (Eg, 100 Credit)
• Monitor consumption by User managed WH and Cloud services
• You can define frequency at which monitoring resets
• (Daily, Weekly, Monthly, Yearly, Never)
• Action on reaching threshold (Suspend Immediately,Suspend,Notify)

Query Acceleration Service (QAS) (Enterprise + feature)

• QAS can accelerate part of WH workload

• Reduces the impact of outlier query
• Works for ( Ad hoc Analysis, Unpredictable data volume per query, Queries
with large scan and selective filter)

SF Performance Optimization for below areas:

Storage: Clustering, QAS, SOS, Materialized View, Purge stage data, use of temp table,
Cost Consideration during replication and failover
Save on time travel (DATA_RETENTION_TIME_IN_DAYS), Monitor memory
consumption.
Data Transfer: Initial replication, Subsequent Sync FILE FORMAT:
Compute: To identify delta, to copy
Warehouse : Reduce Queues, Reduce spilling, Right size WH, WH cache optimization,
Storage: Target account Supported File Type for loading : CSV (Default),TSV, JSON, PARQUET, AVRO, ORC, XML
limit concurrent queries (MAX_CONCURRENCY_LEVEL),Separate WH , Monitor WH,
Control Cost: Select objects, adjust the refresh frequency Supported File Type for Unloading : CSV (Default),TSV,JSON,PARQUET
Resource Monitor.
Monitor and Control Cost Compression : AUTO (Default) | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Query : Query Profile stat, Query History, Avoid SELECT *, Use filter.
Resource Monitor : To monitor WH, can notify, suspend Data can be loaded/unloaded with UI or SQL statement.
Cost Optimization :
Budget : At account level, Send mail, also forecasting feature
Check Cost Insights in Admin>>Cost Mgmt >> Account Overview File Format definition (Example)
Attributing Costs
CREATE OR REPLACE FILE FORMAT my_csv_ff
Tagging : For logical bifurcation and chargeback using query/object tag TYPE = CSV
FIELD_DELIMITER = ','
To find cost use Views TAG_REFERENCES, WAREHOUSE_METERING_HISTORY, SKIP_HEADER = 1
QUERY_ATTRIBUTION_HISTORY from ACCOUNT_USAGE. NULL_IF = ('NULL', 'null')
EMPTY_FIELD_AS_NULL = true
Data Loading and Unloading COMPRESSION = NONE;

Stage : “Snowflake refers to the location of data files in cloud storage as a stage” Describe FF
DESC FILE FORMAT my_csv_ff_new;

Show FF
SHOW FILE FORMATS;
Types of objects to be used :
Drop FF
DROP FILE FORMAT my_csv_ff_new;

FF can be set at COPY INTO, STAGE and TABLE DEFINITION.

Put Command
PUT [Link] Path>CSV @~/customers_user/staged AUTO_COMPRESS = FALSE
OVERWRITE = TRUE;

COPY INTO example:

COPY INTO mytable

This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
FROM s3://mybucket/data/files Frequency Estimate: Uses Space Saving Algo to approximate frequency of value in the Example : Example : SELECT province, store_ID, profit,100 * RATIO_TO_REPORT(profit)
STORAGE_INTEGRATION = myint dataset. Function like: APPROX_TOP_K, OVER (PARTITION BY province) AS percent_profit FROM store_profit ORDER BY
ENCRYPTION=(MASTER_KEY = 'eSx...') APPROX_TOP_K_ACCUMULATE/COMBINE/ESTIMATE province, store_ID;
FILE_FORMAT = (FORMAT_NAME = my_csv_format);
Percentile Estimate: Uses T-digest Algo to approximate percentile values in the dataset. System Function : Control Functions that allows you to execute action in the system.
Various option with COPY INTO ON_ERROR Options: Example : SELECT SYSTEM$ABORT_SESSION(1065153868222);
SAMPLING :
ON_ERROR = { CONTINUE | SKIP_FILE (DEFAULT IN SNOWPIPE) | SKIP_FILE_<num> | Table Function : A table function returns a set of rows for each input row. The returned
'SKIP_FILE_<num>%' | ABORT_STATEMENT (DEFAULT IN BULK LOADING) } FIXED-SIZE set can contain zero, one, or more rows. Each row can contain one or more columns.
SIZE_LIMIT = <num> Fixed Numbers of rows are returned Example : VALIDATE
RETURN_FAILED_ONLY = TRUE | FALSE (DEFAULT) Example: SELECT COUNT(*) FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE SAMPLE(100 ROWS); UDF and UDTF:
(DEFAULT) --- “With user-defined functions (UDFs), you can extend the system to perform operations
ENFORCE_LENGTH = TRUE (DEFAULT) | FALSE FRACTION BASED that are not available through the built-in, system-defined functions provided by
TRUNCATECOLUMNS = TRUE | FALSE BERNOULLI | ROW : (p/100)*n, Number of rows vary, row wise Snowflake.”
FORCE = TRUE | FALSE Example: SELECT COUNT(*) FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER UDF with returning Table is called UDTF.
PURGE = TRUE | FALSE TABLESAMPLE BERNOULLI (20.3) SEED(10); Languages Support: SQL, JS, Python, Java

SYSTEM | BLOCK : Faster, Might be biased, block wise Example: SELECT [Link].udf_sample()
VALIDATE and VALIDATION_MODE
Example: SELECT COUNT(*) FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER Stored Procedure:
VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS TABLESAMPLE(1); A stored procedure is a named, precompiled collection of SQL statements and control-of-
flow language constructs that are stored in a database and can be called from SQL CALL
Example : COPY INTO customers_named Sampling on Join is possible if, BERNOULII/ROW BASED, No Seed Used. command, including other stored procedures.
FROM @internal_named_stage SAMPLE and TABLESAMPLE are synonymous. Languages Support: SQL, JS, Python (These 3 via Snowpark), Java, Scala
FILE_FORMAT = my_csv_ff
ON_ERROR = CONTINUE Supporting Function Types: UDF Vs SP:
RETURN_FAILED_ONLY = TRUE;
Feature User-Defined Function (UDF) Stored Procedure
VALIDATE: Validates the files loaded in a past execution of the COPY INTO <table> Return Value Must return a value Not Mandatory to return a value
command Use case When part of SQL stmt For DML, DDL Operations
Example: SELECT * FROM TABLE (VALIDATE (customers_named, JOB_ID => '_last')) Java, JS, Python, Scala, Snowflake
Languages Java, JS, Python, Scala, SQL
Scripting
Unloading data : Can be called for every row of a query Can only be executed as a
- Bulk(Table) unloading with COPY INTO <Location> Granularity
result set standalone command
- For getting single file from multiple files use SINGLE = TRUE, default is FALSE Inside Single SP with a statement, Can
- Can partition the data by PARTITION BY Multiple UDFs can be called
SELECT have nested SP
DB Access May not have access Has access
GET Command
GET @internal_named_stage file ://<File_path>;
Semi-Structured Data : Semi-structured data is data that does not conform to the
-- INSERT OVERWRITES the existing data in the employees table with the new data from
standards of traditional structured data, but contains tags (labels) or other types of
the select statement.
mark-up that identify individual, distinct entities within the data.
INSERT OVERWRITE INTO employees SELECT id, name, department FROM employees
WHERE department = 'Engineering';
Snowflake provides native data types (ARRAY, OBJECT, and VARIANT)
Snowflake supports querying the data
Data Transformations
Supported Data formats (For Load)
Scalar Function: Returns 1 value per invocation.
- JSON
Estimation Functions: Example: SELECT CEIL(135.135); Returns >> 136
- Avro
Distinct Count: Uses HyperLogLog (HLL) Algo to count distinct value in a column.
- ORC
Aggregate Function: Returns 1 value per group of rows.
- Parquet
Similarity Estimate: Uses MinHash Function to find similarity between 2 datasets. Example: SELECT AVG(salary) from <table>;
- XML
Window Function : Aggregate functions that can operate on a subset of rows within the
Supported Data formats (For Unload)
set of input rows.
- JSON
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
- Parquet

In a VARIANT column, JSON null values are stored as a string containing the word “null”

DATA ENCRYPTION:

At Rest : Encrypted with AES 256 Bit

At Transit : TLS 1.2

Periodic Key Rotation

- Snowflake auto-rotates the key every 30 days
Example of flattening of data: - Post 30 days, Active Keys get retired
CREATE OR REPLACE table dump_json as
SELECT src:Department::string as dep, Periodic Rekeying (Enterprise + feature)
[Link]:SubZone :: string as subzone_col, - Manual Enablement
[Link]:Store :: string as store_col, - Destroys retired key post one year
[Link]:SR :: string as sr_col, - Ensures that data is encrypted with latest tech
[Link]:Sales :: string as sales_col
Directory :
• Stores file level meta data about files in stage Tri-Secret Secure (Business Critical Edition on Higher)
FROM raw_source, - Combination of CMK and SMK
LATERAL FLATTEN(INPUT => SRC:Zone) as Zone, • Implicit object layered on stage
• Can be enabled for external/internal stage - Composite Master Key is used
LATERAL FLATTEN(INPUT => [Link]:Division) as div, - If CMK is revoked, You cant decrypt data in Snowflake
LATERAL FLATTEN(INPUT => [Link]:Orders) as ord; • Stores URL by BUILD_STAGE_FILE_URL
- Provides 3 levels of data protection: encryption, authentication, and control
• Can be enabled during stage creation or later (by alter)
Predicates determine the data types for values in semi-structured data. • Data can be refreshed manually or automatically (only for cloud i.e external
Eg. SELECT * FROM new_vartab WHERE IS_ARRAY(data) stages)
• No explicit privilege required to access
Full list here : [Link]

Unstructured data : Unstructured data is information that does not fit into a predefined Example Code :
data model or schema.
Example : PDF, Image, Video, VCF (genomics), KDF (semiconductors),HDF5 (aeronautics) CREATE OR REPLACE STAGE internal_UN_stage
etc. ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')
DIRECTORY=(ENABLE=TRUE);
SF Supports for Unstructured data :
Data Protection and Data Sharing
• Securely access data files located in cloud storage.
• Share file access URLs with collaborators and partners. Continuous Data Protection with Snowflake:
• Load file access URLs and other file metadata into Snowflake tables.
• Process Unstructured data.
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]
- Source in past doesn’t exists Secure Data Sharing (Not available for VPS): Secure Data Sharing lets you share
- Pipe object with AUTO_INGEST = TRUE recreated selected objects in a database in your account with other Snowflake accounts. No data
gets copied or replicated.
Code : CREATE OR REPLACE TABLE clone_data CLONE sample_data;
Secure data sharing can be done with other snowflake account or readers (non-sf)
REPLICATION :
account.

Below objects can be shared:

• Tables
• External tables
• Secure views
• Secure materialized views
• Secure UDFs
•
When data shared with other SF account, consumer pays for the compute.
When data shared with none SF account, provider pays for the compute.

Data Sharing Considerations:

• Shared databases are read-only. Users in a consumer account can

CLONING: view/query data, but cannot insert or update data, or create any objects in
Cloning in Snowflake refers to the process of creating an exact copy the database
of a database, schema, or table. This feature can be used to create a • No limit on Number of Shares or Number of accounts it is shared with
backup of existing data, test new data structures, or create a new • Only for DB objects • Can not create a clone of a shared database or any schemas/tables in the
environment for development or testing purposes. • Privileges granted on DB are not replicated database
• Error when replicating to lower edition (have workaround) • Not supported, Time Travel for a shared database or any schemas/tables in
Cloning doesn’t copy the load metadata, hence any file loaded in original table, can be • Can replicate to multiple accounts the database
loaded in cloned table. • Data transfer and Data Compute cost • Not supported, Editing the comments for a shared database.
• Shared databases and all the objects in the database cannot be re-shared
Objects which can be cloned and which can’t be cloned shown below. with other accounts
Below objects are not supported for Replications: • Shared databases cannot be replicated
• Can consume as many as shares you want but can create only db per share
• External tables • Can’t share from Business critical to non-Business critical account, can
• Event tables override settings
• Stages • Sharing happens between same cloud provider and same region (For cross
• Pipes region, need to replicate first)
• DBs created from SHARE • Sharing TO and FROM VPS is not supported

Data Provider : Under Data > Provider Studio

• Need ACCOUNTADMIN or relevant Auth

• Can create and publish listing
• Monitor the performance of listing

Cloning with Time-Travel is only available for Only available for Database, Schema, Table
& Streams
Cloning fails if
This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the course
before reading this book, if you got this file without the course then you can purchase the course here. [Link]

Best wishes for your exam. For any feedback or

corrections, you can always reach us at
learn.anywhere1@[Link]

Snowflake Architecture and Features Overview
No ratings yet
Snowflake Architecture and Features Overview
8 pages
Snowflake Data Warehouse Overview
No ratings yet
Snowflake Data Warehouse Overview
6 pages
Snowflake Data Management Features
No ratings yet
Snowflake Data Management Features
16 pages
Snowflake Cloud Data Warehouse Overview
No ratings yet
Snowflake Cloud Data Warehouse Overview
73 pages
Snowflake Database Overview and Features
No ratings yet
Snowflake Database Overview and Features
10 pages
Snowflake Database Architecture Overview
No ratings yet
Snowflake Database Architecture Overview
20 pages
Snowflake Programming Masterclass 2024
100% (1)
Snowflake Programming Masterclass 2024
342 pages
SnowPro Core Certification Cheat Sheet
100% (1)
SnowPro Core Certification Cheat Sheet
7 pages
Snowflake SnowPro Certification Cheat Sheet
83% (12)
Snowflake SnowPro Certification Cheat Sheet
7 pages
Snowflake
No ratings yet
Snowflake
7 pages
Snowflake Data Warehousing Essentials
100% (1)
Snowflake Data Warehousing Essentials
40 pages
Kent Graziano UKDVUG Snowflake-Features-18NOV20
No ratings yet
Kent Graziano UKDVUG Snowflake-Features-18NOV20
40 pages
Snowflake Basics
No ratings yet
Snowflake Basics
11 pages
Snowflake Data Warehousing Overview
No ratings yet
Snowflake Data Warehousing Overview
10 pages
Snowflake L1 Interview QA Acxchange
No ratings yet
Snowflake L1 Interview QA Acxchange
10 pages
Snowflake Data Storage and Management Guide
No ratings yet
Snowflake Data Storage and Management Guide
34 pages
Snowflake Data Loading Techniques Guide
No ratings yet
Snowflake Data Loading Techniques Guide
52 pages
Snowflake Data Access and Management
No ratings yet
Snowflake Data Access and Management
22 pages
Understanding Snowflake Stages Explained
No ratings yet
Understanding Snowflake Stages Explained
51 pages
Snowflake Architecture and Key Concepts
No ratings yet
Snowflake Architecture and Key Concepts
3 pages
Snowflake Interview Q&A Cheat Sheet
No ratings yet
Snowflake Interview Q&A Cheat Sheet
2 pages
Snowflake Architecture and Data Management
100% (1)
Snowflake Architecture and Data Management
51 pages
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
71 pages
Detailed Snowflake Notes
No ratings yet
Detailed Snowflake Notes
8 pages
Snowflake Guide: From Beginner to Expert
No ratings yet
Snowflake Guide: From Beginner to Expert
35 pages
Snowflake Architecture Overview
No ratings yet
Snowflake Architecture Overview
220 pages
Snowflake Notes
No ratings yet
Snowflake Notes
17 pages
Snowflake Guide: Architecture & Performance
No ratings yet
Snowflake Guide: Architecture & Performance
11 pages
Snowflake vs. Redshift vs. BigQuery
No ratings yet
Snowflake vs. Redshift vs. BigQuery
4 pages
Snowflake Data Migration and Architecture Guide
No ratings yet
Snowflake Data Migration and Architecture Guide
102 pages
Snowflake Data Engineering Course Outline
No ratings yet
Snowflake Data Engineering Course Outline
4 pages
Snowflake 03
No ratings yet
Snowflake 03
10 pages
Snowflake Data Warehouse Overview
No ratings yet
Snowflake Data Warehouse Overview
16 pages
Snowflake - Interview Questions
No ratings yet
Snowflake - Interview Questions
15 pages
Snowflake Cloud Data Platform Overview
No ratings yet
Snowflake Cloud Data Platform Overview
108 pages
Snowflake Architecture and Features Explained
No ratings yet
Snowflake Architecture and Features Explained
4 pages
Snowflake Data Pipeline Overview
No ratings yet
Snowflake Data Pipeline Overview
16 pages
SnowPro Core Certification Cheat Sheet
No ratings yet
SnowPro Core Certification Cheat Sheet
2 pages
E-Invoicing and FX Rate Data Pipelines
No ratings yet
E-Invoicing and FX Rate Data Pipelines
12 pages
Comprehensive Snowflake MasterClass Guide
33% (3)
Comprehensive Snowflake MasterClass Guide
192 pages
Snowflake 10 Page Study Guide
No ratings yet
Snowflake 10 Page Study Guide
10 pages
Snowflake Data Cloud Overview
No ratings yet
Snowflake Data Cloud Overview
2,157 pages
Snowflake: Company Origins and Features
0% (1)
Snowflake: Company Origins and Features
15 pages
Snowflake Guide: From Basics to Mastery
No ratings yet
Snowflake Guide: From Basics to Mastery
11 pages
Snowflake Data Engineer Course Syllabus
No ratings yet
Snowflake Data Engineer Course Syllabus
4 pages
Snowflake Overview and Key Features
No ratings yet
Snowflake Overview and Key Features
20 pages
Understanding Snowflake Architecture
100% (1)
Understanding Snowflake Architecture
35 pages
Snowflake Data Loading and Staging Guide
No ratings yet
Snowflake Data Loading and Staging Guide
25 pages
Snowflake Data Warehousing Overview
No ratings yet
Snowflake Data Warehousing Overview
90 pages
Keboola Best Practices Guide
No ratings yet
Keboola Best Practices Guide
76 pages
Snowflake Interview Preparation Guide
No ratings yet
Snowflake Interview Preparation Guide
14 pages
Snowflake Object Management Guide
No ratings yet
Snowflake Object Management Guide
10 pages
Understanding Snowflake Data Warehouse
No ratings yet
Understanding Snowflake Data Warehouse
34 pages
Snowflake Cheatsheet
No ratings yet
Snowflake Cheatsheet
10 pages
Nexans Expert Tool Kit Compressed
No ratings yet
Nexans Expert Tool Kit Compressed
12 pages
Kindergarten Addition Lesson Plan
No ratings yet
Kindergarten Addition Lesson Plan
2 pages
Recommended Verbs for Learning Outcomes
No ratings yet
Recommended Verbs for Learning Outcomes
3 pages
UP65EV7555 Vehicle Registration Details
No ratings yet
UP65EV7555 Vehicle Registration Details
1 page
Vision Hema 8: Blood Cell Analysis Solution
No ratings yet
Vision Hema 8: Blood Cell Analysis Solution
2 pages
SADS Workshop Overview and Procedures
No ratings yet
SADS Workshop Overview and Procedures
89 pages
Data Types and Collection Methods
No ratings yet
Data Types and Collection Methods
39 pages
Class XI Computer Science Exam Blueprint
No ratings yet
Class XI Computer Science Exam Blueprint
1 page
Potentiometry and Colorimetry Overview
No ratings yet
Potentiometry and Colorimetry Overview
11 pages
Overview of Volkswagen Group
No ratings yet
Overview of Volkswagen Group
15 pages
Understanding Single Phase Transformers
No ratings yet
Understanding Single Phase Transformers
73 pages
Auto-Turret Digital Brinell Tester
No ratings yet
Auto-Turret Digital Brinell Tester
3 pages
Tender for Emergency Operations Centre
No ratings yet
Tender for Emergency Operations Centre
24 pages
EL-FLOW Base Manual
No ratings yet
EL-FLOW Base Manual
36 pages
Red Hat Linux Cluster Command Guide
100% (1)
Red Hat Linux Cluster Command Guide
8 pages
Nutritional Treatment of Coronavirus
No ratings yet
Nutritional Treatment of Coronavirus
9 pages
Sma 805 General Topology
No ratings yet
Sma 805 General Topology
2 pages
Types of Computer Aided Process Planning
No ratings yet
Types of Computer Aided Process Planning
2 pages
Overview of Vitamin B Complex
No ratings yet
Overview of Vitamin B Complex
9 pages
Delivery Manager Roles and Responsibilities
No ratings yet
Delivery Manager Roles and Responsibilities
4 pages
EARP Project Brief for Gakenke Area
No ratings yet
EARP Project Brief for Gakenke Area
61 pages
Logistics Specialist Christa Jones
No ratings yet
Logistics Specialist Christa Jones
2 pages
Auto Scaling Basics on HUAWEI CLOUD
No ratings yet
Auto Scaling Basics on HUAWEI CLOUD
43 pages
Word Search: Communication Types & Principles
No ratings yet
Word Search: Communication Types & Principles
1 page
Operations Improvement Strategies Explained
No ratings yet
Operations Improvement Strategies Explained
41 pages
DTALite: Open-Source Traffic Modeling
No ratings yet
DTALite: Open-Source Traffic Modeling
240 pages
Testbank Microeconomics 8 Ed Pindyck Fast Download
No ratings yet
Testbank Microeconomics 8 Ed Pindyck Fast Download
207 pages
Yanitza Ruiz: Experienced Administrative Professional
No ratings yet
Yanitza Ruiz: Experienced Administrative Professional
2 pages
Adaptive Luenberger Observer for PMSM Control
No ratings yet
Adaptive Luenberger Observer for PMSM Control
6 pages
Qatar Cabinet Reshuffle and Updates
No ratings yet
Qatar Cabinet Reshuffle and Updates
20 pages

Snowflake Snowpro Core Ebook

Uploaded by

Snowflake Snowpro Core Ebook

Uploaded by

This Book is created as part of Udemy course Snowflake SnowPro Core Certification COF-C02 Masterclass, please complete the

Bulk Loading Continuous Loading Types CREATE OR REPLACE TASK TASK_DEMO_S

Example of PIPE: Append only Stream:

File Considerations while loading

Snowflake stores MP metadata: --Local Schema

Clustering SELECT * FROM

Code for SOS : • Staged File Costs

CREATE NETWORK POLICY mypolicy1

Multi-Factor Authentication (MFA) :Powered By Duo Security Service

• Managed by Snowflake, User can self enroll

User: A user identity recognized by Snowflake, whether associated with a person or

Access Control Framework (DAC + RBAC) -- Grant ROLE To new user

-- USAGE on DB and SCHEMA Needed in order to access TABLE

-- Find the login history at account level

• Needs Enterprise Edition or Higher

SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY

Information Schema: Object Tag: (Enterprise + feature)

Specifying a warehouse size.

Scale Out) (Enterprise + feature)) in Snowflake refers to the ability of Snowflake to

QUERY_HISTORY View (Under SF Account Usage)

Metadata Cache Multi Cluster Warehouse:

Query Acceleration Service (QAS) (Enterprise + feature)

• QAS can accelerate part of WH workload

SF Performance Optimization for below areas:

FF can be set at COPY INTO, STAGE and TABLE DEFINITION.

COPY INTO example:

COPY INTO mytable

At Rest : Encrypted with AES 256 Bit

Periodic Key Rotation

Below objects can be shared:

Data Sharing Considerations:

• Shared databases are read-only. Users in a consumer account can

Data Provider : Under Data > Provider Studio

• Need ACCOUNTADMIN or relevant Auth

Best wishes for your exam. For any feedback or

You might also like