• Data Storage in Snowflake
• Supported Data Types
• Defining Constraints and Using Operators
Learning
• Database and Type with Features
Objectives
• Table Types with Features
• Types of Views with Features and Applications
• Role Based Access Control + Access Management
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized,
compressed, columnar format. Snowflake stores this optimized data in cloud storage(AWS, GCP, AZURE).
Row-based storage is ideal for transactional databases with frequent insertions and updates, while
column-based storage excels in analytical workloads involving large-scale data scans and aggregations
Data files staged in Snowflake internal stages are not subject to the additional costs associated with Time
Travel and Fail-safe, but they do incur standard data storage costs
Feature (Row-Based) (Traditional RDBMS) (Columnar-Based) (Snowflake)
Storage Format Primarily row-based (row partitioning) Fully columnar storage
Data Retrieval Best for full record reads Best for accessing specific columns
Compression Moderate (block-level compression) High (columnar compression with encoding)
Slower for frequent small updates (uses micro-
Insert/Update Performance Faster for row-by-row inserts and updates
partitions)
Aggregation Queries Slower for large-scale aggregation Very fast due to columnar and MPP architecture
Storage Optimization Row-level locking and indexing Automatic clustering and partition pruning
Concurrency Good for mixed workloads (OLTP + OLAP) Excellent for concurrent analytic workloads
Best for operational systems, real-time Best for data warehousing, BI, and big data
Use Case Fit
transactions analytics
Cloud-native auto-scaling with separate
Scaling Shared-nothing, linear scaling
compute/storage
Minimal indexing, relies on columnar scan and
Indexing Heavily relies on indexes for performance
metadata
Example Scenario Bank transaction system, CRM systems Marketing analytics, reporting dashboards
Introduction
to Micro
partitions
Snowflake automatically organizes all data in tables into
micro-partitions, which are contiguous storage units
Size: Each micro-partition contains between 50MB and 500MB of uncompressed data.
Automatic Table Partitioning: Snowflake automatically creates and manages micro-partitions for all tables.
Columnar Storage: Micro-partitions store data in a columnar format, enabling efficient compression and
query optimization.
Metadata: Snowflake stores metadata for each micro-partition, including the range of values for each column,
distinct value counts, and other information that aids in query optimization.
Query Performance: Improve query performance through horizontal and vertical query pruning, scanning only
the needed micro-partitions for better query performance.
Immutable: Original micro-partitions in Snowflake remain immutable. When data changes new micro-
partitions are generated, and the existing ones are never altered.
Numeric Data Types
DECIMAL, NUMERIC , NUMBER: Default precision and scale are (38,0).
INT, INTEGER, BIGINT, SMALLINT, TINYINT, BYTEINT: Synonymous with NUMBER, except precision
and scale can’t be specified.
FLOAT , FLOAT4 , FLOAT8: Snowflake treats all three as 64-bit floating-point numbers.
String & Binary Data Types
VARCHAR: Default (and maximum) length is 16,777,216 bytes.
CHAR, CHARACTER: Synonymous with VARCHAR, except the default length is VARCHAR(1).
STRING, TEXT: Synonymous with VARCHAR.
BINARY: The maximum length is 8 MB.
Logical Data Types
BOOLEAN: Can have TRUE or FALSE values. BOOLEAN can also have an UNKNOWN value, which is
represented by NULL
Date and Time Data Types
DATE: DATE accepts dates in the most common forms (YYYY-MM-DD, DD-MON-YYYY, and so on).
TIME: Snowflake supports a single TIME data type for storing times in the form of HH:MI:SS
TIMESTAMP_LTZ , TIMESTAMP_NTZ , TIMESTAMP_TZ.
a. TIMESTAMP_LTZ: TIMESTAMP with local time zone; time zone, if provided, isn’t stored.
b. TIMESTAMP_NTZ: TIMESTAMP with no time zone; time zone, if provided, isn’t stored.
c. TIMESTAMP_TZ : TIMESTAMP with time zone.
DATETIME: It is an alias for TIMESTAMP_NTZ
Semi Structured Data Types
VARIANT: A VARIANT value can have a maximum size of up to 128 MB of uncompressed data.
OBJECT: An OBJECT value contains key-value pairs .
ARRAY: An array contains 0 or more pieces of data. Each element is accessed by specifying its
position in the array.
Snowflake supports the following constraint types : PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL but does
not enforce them, except for NOT NULL constraints, which are always enforced.
Snowflake supports constraints on permanent, transient, temporary, and hybrid tables
Operators in
Snowflake
Types of
Tables in
Snowflake
Standard table type in Snowflake. When you create a table without specifying the type, it becomes a permanent table
Data stored in permanent tables persists indefinitely (subject to your Snowflake account retention policies) and is
available for querying until explicitly dropped.
Permanent tables contribute to your overall storage costs.
Use Cases: General-purpose data storage, data warehousing, business intelligence, long-term data retention
Example: CREATE OR REPLACE TABLE PERMANENT_TBL (NAME VARCHAR(100))
Temporary tables exist only for the duration of the session in which they were created.
Once the session ends (e.g., you disconnect from Snowflake), the data and the table definition are automatically
dropped.
Temporary tables do not contribute to your long-term storage costs.
Use Cases: ETL/ELT staging, temporary workspaces for data transformation, storing intermediate query results.
Example: CREATE OR REPLACE TEMPORARY TABLE TEMP_TABLE (NAME VARCHAR(100));
Transient tables are an alternative to regular tables, designed for scenarios where data persistence isn’t a top
priority
Transient tables are similar to permanent tables with the key difference that they do not have a Fail-safe period.
Like permanent tables, transient tables add to the overall storage costs associated with your account.
Use Cases: transient tables are specifically designed for transitory data that needs to be maintained beyond each
session
Example: create transient table student (id number, name varchar(100));
Snowflake also supports creating
transient databases and schemas.
All tables created in a transient
schema, as well as all schemas
created in a transient database, are
transient by definition
After creation, transient tables
cannot be converted to any other
table type.
Temporary table can exist with
same name as permanent table
inside same schema however all
queries and other operations
performed in the session on the
table affect only the temporary
table.
Dynamic
Tables
A dynamic table materializes the results of a query that you specify and is automatically kept up
to date on a schedule.
Instead of creating a separate target table and writing code to transform and update the data in
that table, you can define the target table as a dynamic table, and you can specify the SQL
statement that performs the transformation.
Dynamic Tables are refreshed at a frequency you specify, called “target lag”
For example, you can specify that the data should be at most five minutes behind the updates to the base
table. Based on this target freshness, the automated process sets up refreshes so that the data in the
dynamic table is kept up to date within this target
Dynamic tables support two refresh modes
a. Incremental refresh
b. Full refresh
Dynamic table is used in below scenarios:
a. You don’t want to write code to track data dependencies and manage data refresh
b. You don’t need, or want to avoid, the complexity of streams and tasks.
c. You do need to materialize the results of a query of multiple base tables.
d. You don’t need fine-grained refresh schedule control and you just want to specify the target data freshness
for your pipelines.
Database Object that contains SQL query built over one or more tables.
Using a view, we can pull only selected columns from the underlying table and provide custom output to
users without letting them access table directly.
Created to access data from one or more tables in a
custom manner compared to accessing individual
tables directly to avoid running complex SQL
Non – statements by business users.
materialized Example: Employee table, department table, salary
Views table needs to be joined and provide comprehensive
view to manager to see the relevant columns only.
Store the results of the underlying query as a physical table within
Snowflake.
Joining of tables is not allowed.
Snowflake automatically refreshes materialized views after changes are
made to the base tables they depend on.
Materialized Example: Creating pre-aggregated data for dashboards and reporting
Views Available in Enterprise Edition and Higher.
Materialized view incurs both storage cost and compute cost.
• Limitations:
Can query single table only
Does not support normal joins or self joins
Does not support all aggregate and window functions
A secure view doesn’t allow users to see definition of view.
Useful for concept of data sharing with secure views only.
SECURE keyword needs to be used in view DDL to create
secure views.
Definition is exposed only to the authorized users.
Secure Views
The internals of a secure view are not exposed even in query
profile to any user.
For queries containing secure views snowflake doesn’t expose
the amount of data scanned or the total amount of data.
Zero-copy cloning in Snowflake creates a copy of a database, schema, or table without
physically duplicating the underlying data
We can maintain multiple copies of data at no extra cost hence zero copy clone
Changes made in the clone do not affect the original, and vice versa
Zero-copy cloning is used for various purposes like testing, development, and data analysis.
No additional storage costs are incurred until changes are made to the data
Enables testing and development environments without impacting production data.
Objects that can be cloned: Databases, Schemas , Tables, Streams , Stages , File Formats ,
Tasks.
Snowflake Time Travel enables accessing historical data (i.e. data that has been changed or deleted) at
any point within a defined period
Using Time Travel, you can perform the following actions within a defined period of time:
a) Query data in the past that has since been updated or deleted.
b) Create clones of entire tables, schemas, and databases at or before specific points in the past.
c) Restore tables, schemas, and databases that have been dropped.
Retention Period
Number of days for which historical data is preserved.
a) Standard Edition: 1 Day Retention Period (can be updated to 0)
b) Enterprise Edition and higher: 90 days retention period (can be updated b/w 0-90)
Retention period of 0 days means TT is disabled for the object
Higher the retention period, higher the storage cost
Method 1: At Specified Timestamp
SELECT * FROM my_table AT(TIMESTAMP => 'Wed, 26 Jun 2024 09:20:00 -0700'::timestamp_tz);
Method 2: At Some Time Ago
SELECT * FROM my_table AT(OFFSET => -60*5);
Method 3: Before executing any statement or query
SELECT * FROM my_table BEFORE(STATEMENT => '8e5d0ca9-005e-44e6-b858-a8f5b37c5726');
Method 4: Undrop objects under retention period
UNDROP TABLE ‘TABLE_NAME’;
UNDROP SCHEMA ‘SCHEMA_NAME’;
UNDROP DATABASE ‘DATABASE_NAME’;
ACCOUNTADMIN: The god-mode role that can do anything on the account-level. ACCOUNTADMIN role inherits
all the privileges that SECURITYADMIN and SYSADMIN have.
SECURITYADMIN - This role has the MANAGE GRANTS account-level privilege granted to it, allowing it to grant (or
revoke) grants on all objects in the account to (or from) other roles. Notice that SECURITYADMIN role inherits all
the privileges passed to it from the USERADMIN role.
USERADMIN - As its name suggests, this role is used for creating and managing users (with the help of CREATE
USER and CREATE ROLE account-level privileges.
SYSADMIN - This role is allowed to create all objects in the account (including databases, warehouses, and other
schema-level objects).
PUBLIC - All users in the account are granted the PUBLIC role by default, but unless explicitly granted access to
objects, it can not do much. Still, be very careful not to accidentally grant it privileges on important data, as all
users in the account will be able to access it
Discretionary Access Control (DAC)
Each object has an owner, who can in turn
grant access to that object.
Role-based Access Control (RBAC)
Access privileges are assigned to roles, which
are in turn assigned to users.
User-based Access Control (UBAC)
Access privileges are assigned directly to
users.
User - An entity that enables a person (or service) to connect to Snowflake
Object - An entity that a User can access (i.e. table, view, database), if they have the right privileges.
Privilege - An operation that a User could execute on an Object, if their Role has been granted it.
Role - A bridging entity between Users and Privileges. Privileges are granted to Roles, and Roles are granted
to Users.
Secondary Roles: Additional roles that can be activated alongside a user's primary role within a session
USE SECONDARY ROLES ALL: Activates all roles granted to the user as secondary roles, in addition to the
primary role.
USE SECONDARY ROLES NONE: Disables all secondary roles.
USE SECONDARY ROLES <role1>, <role2> : Activates a specific set of roles as secondary roles.
Future Grants: As new objects are created, the defined privileges are automatically granted to a role, simplifying
grant management
Example:
-- Grant the SELECT privilege on all new (future) tables in a schema to role R1
GRANT SELECT ON FUTURE TABLES IN SCHEMA s1 TO ROLE r1;
-- Granting USAGE and MONITOR on all future schemas in a database
GRANT USAGE, MONITOR ON FUTURE SCHEMAS IN DATABASE my_database TO ROLE admin_role;