OVERVIEW
Step 1: go ahead to [Link] and create a trial account
Step 2: download and install SnowSQL
WHAT’S SNOWFLAKE
Snowflake is a cloud-based data warehouse. It is a fully managed SQL database built for the
cloud that supports both structured and semi-structured (i.e. JSON, XML, Parque etc) data
Snowflake enables data storage, processing, and analytic solutions that are faster, easier to
use, and far more flexible than traditional data warehousing [Link] provides all of the
functionality of a traditional enterprise analytic database, along with many additional special
features and unique capabilities.
Snowflake is a true SaaS offering 1(check out the foot note for further readings on SaaS)2. More
specifically:
● There is no hardware (virtual or physical) to select, install, configure, or manage.
● There is virtually no software to install, configure, or manage.
1
What’s SaaS?
2
SaaS Explained
● Ongoing maintenance, management, upgrades, and tuning are handled by Snowflake.
Snowflake uses virtual compute instances for its compute needs and a storage service for
persistent storage of data.
Accessing Snowflake
You can access and operate on Snowflake through 5 different ways:
1. Snowflake Web Interface–the Classic UI or Snowsight as Snowflake calls it
2. Snowflake CLI (Command Line Interface)--SnowSQL
3. ODBC and JDBC drivers
4. Native connectors
5. Third-party connectors
6. Code Editor or IDE (Integrated Development Environment)
1. Snowflake Web Interface (SnowSight)
All aspects of managing and using Snowflake can be accessed via the Snowflake Web Interface
Snowsight
Snowsight is the Snowflake web interface that’s used to perform various Snowflake operations,
such as:
● Building and running queries.
● Loading data into tables.
● Monitoring query performance and copy history.
● Creating and managing users and other account-level objects.
● Creating and using virtual warehouses.
● Creating and modifying databases and all database objects.
● Sharing data with other Snowflake accounts.
● Exploring and using the Snowflake Marketplace.
Go to [Link] and use your Snowflake credentials to sign in.
The Classic UI
2. Command Line Interface-CLI or SnowSQL
SnowSQL is Snowflake’s interactive CLI (Command Line Interface) Client. We can access all
aspects of managing and using Snowflake. That’s, almost anything you can do from
Snowflake’s web interface, you can do it in SnowSQL
SnowSQL has to be installed first.
You can find the installer executable from [Link]
Once you have installed SnowSQL, run the following commands to connect to your Snowflake
instance
snowsql -a [Link]-east-1 -u ashiferaw
3. ODBC and JDBC drivers
These are connectors/drivers that can be used by other applications (e.g. Tableau) to connect to
Snowflake
[Link]
4. Native Connectors
Native connectors are ones such as Python-based or Spark-based development kits that can be
used to develop applications for connecting to Snowflake
5. Third-Party Connectors
These are connectors can be used to connect applications such as ETL and orchestration tools
(e.g. Informatica, dbt) and BI tools (e.g. Power BI, Tableau) to Snowflake.
SNOWFLAKE ARCHITECTURE
Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database
architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for
persisted data that is accessible from all compute nodes in the platform. But similar to
shared-nothing architectures, Snowflake processes queries using MPP3 (massively parallel
processing) compute clusters where each node in the cluster stores a portion of the entire data
set locally. This approach offers the data management simplicity of a shared-disk architecture,
but with the performance and scale-out benefits of a shared-nothing architecture.
3
Massively Parallel Processing (MPP)
Snowflake’s architecture consists of 3 layers:
1. Storage
2. Compute or Query Processing
3. Service or Cloud Services
1.The Storage Layer
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized,
compressed, columnar format. Snowflake stores this optimized data in cloud storage
2.The Compute Layer
Query execution is performed in the compute layer. Snowflake processes queries using “virtual
warehouses”. Each virtual warehouse is an MPP compute cluster composed of multiple
compute nodes allocated by Snowflake from a cloud provider.
Each virtual warehouse is an independent compute cluster that does not share compute
resources with other virtual warehouses (that’s why Snowflake has a shared-nothing
architecture). As a result, each virtual warehouse has no impact on the performance of other
virtual warehouses.
3.The Service Layer
The cloud services layer is a collection of services that coordinate activities across Snowflake.
These services tie together all of the different components of Snowflake in order to process user
requests, from login to query dispatch. The cloud services layer also runs on compute instances
provisioned by Snowflake from the cloud provider.
● Services managed in this layer include:
● Authentication
● Infrastructure management
● Metadata management
● Query parsing and optimization
● Access control
ACCESS MANAGEMENT
lorem ipsum
STAGES
Snowflake doesn’t allow any direct data loading to a table; it has to be done via a stage location.
Stages in Snowflake are locations used to store data. A stage is a database object and
represents a directory like location where all the data lands first before it gets loaded into tables
for further processing.
Before ingesting data into a table in Snowflake, data has to be loaded into a stage using the
PUT command, and then it has to be loaded into the table using the COPY INTO command.
Similarly, if we need to unload the data from a table, it has to be loaded into the stage using the
GET command, and then it has to be exported using the COPY INTO command. All this can
only be done using the SnowSQK CLI.
When you load the data from local to the stage, snowflake automatically compresses the file. If
you check the file in the stage after loading you will see an extension of .gz all the time inside
the stage.
There are two types of stages in Snowfake:
1. Internal Stages: if the data that needs to be loaded is stored inside and managed by
Snowflake.
2. External Stages: If the data that needs to be loaded into Snowflake is stored in other
cloud regions like AWS S3 or Azure or GCP
Internal Stages
There are 3 types of internal stages
1. User Stages: are tied to a specific user. Every user has a default stage created. This
cannot be modified or removed. We can copy the files to our user stages to load them
further into the table. Once the load is completed we need to ensure to remove these
files explicitly otherwise we need to pay for storage. Files in one user stage cannot be
accessed by another user. So if you need to load multiple tables from a specific user
then this is the best option. We need to refer to the user stages using ‘@~’
2. Table Stages: whenever a table is created, then automatically table stage is created.
Table stages are a lot like User stages except the fact that table stage for a particular
table cannot be accessed through another table
3. Internal Named Stages: these stages offer more flexibility compared to user or table
stages. These are some of the snowflake objects. So all the operations that can be
performed on objects can be performed on Internal named stages as well. We need to
create these stages manually and we can also specify the file format options while
creating the stage itself which is unlike the table or user stage. We need to refer to the
internal Named stages using ‘~’.
External Stages
If the files are located in an external cloud location, for example, if you need to load files from
AWS S3 into snowflake then an external stage can be used.
Unlike Internal stages, loading and unloading the data can be directly done using COPY INTO.
Get and Put commands are not supported in external stages. The external stage can be created
via Web user interface or SNOWSQL as well.
DATA LOADING
1.Snowpipe: Continuous Real Time Data Loading