Databricks Fundamentals
Data Lakehouse
Dalibor Wijas
November 2022
Introduction
Dalibor Wijas
Solution architect, Prague, Czech Republic
25 years in the software development industry
Worked for many customers from various industries in the field of e-commerce, data
warehousing, data engineering, data science and analytics
Certified professional in AWS, Microsoft Azure and Databricks
/// Training Databricks / 15.11.2022 / Dalibor Wijas
Dalibor.Wijas@datasentics.com
+420 602 436 803
Linkedin.com/in/wijas
Who am I?
Databricks
What is Databricks?
Why do we need Databricks?
What is a Data Lakehouse?
How is it possible to join these two concepts together?
So what is a Data Lakehouse, then?
What is the Data Lakehouse in Databricks?
What is Apache Spark?
Source: https://www.databricks.com/spark/getting-started-with-apache-spark
What is Apache Spark?
Source: https://medium.com/geekculture/apache-spark-architecture-f57fd3dd2f1e
What is Delta Lake?
Source: https://delta.io
Can we do the same data warehousing here?
What does Databricks offer to Data Engineers?
Key takeaways
What is Databricks?
• Cloud-based SaaS application providing a Data Lakehouse platform that enables users to unify
their data in a simple, open and collaborative web-based environment for all data use
cases.
• Based on open-standard technologies (Apache Spark and Delta Lake) for massively
parallel processing in a distributed cloud environment, optimized for cloud object storage
and metadata-driven computation.
• Covers all use cases: Data Engineering, Data Warehousing, Streaming, Machine
Learning and AI.
• Includes tooling for data ingestion, data transformation, data pipelines and orchestration,
data governance (data catalog, data quality, data lineage and data management), security
and observability.
• Connected to the outside world through direct connectors, partner solutions and the Delta Sharing
protocol, including BI and analytical tools such as Power BI and Tableau.
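To make "optimized for cloud object storage and metadata-driven computation" more concrete, here is a toy sketch in plain Python. It is loosely modeled on the idea behind Delta Lake's transaction log: data files are never changed in place, and an ordered log of commits records which files belong to each table version. The `_log` directory, the `commit`, `live_files` and `demo` helpers, and the file names are all hypothetical and heavily simplified. This is not the Delta Lake API or its actual log format.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Set, Tuple


def commit(table: Path, version: int, actions: List[Dict[str, str]]) -> None:
    """Write one commit file; its existence makes the version visible."""
    log_dir = table / "_log"  # hypothetical log directory name
    log_dir.mkdir(parents=True, exist_ok=True)
    # Exclusive create ("x") fails if another writer already took this version.
    with open(log_dir / f"{version:020d}.json", "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def live_files(table: Path, as_of: Optional[int] = None) -> Set[str]:
    """Replay the log in version order to find the files of a table version."""
    files: Set[str] = set()
    # Zero-padded names sort lexicographically in version order.
    for entry in sorted((table / "_log").glob("*.json")):
        if as_of is not None and int(entry.stem) > as_of:
            break
        for line in entry.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"])
            elif "remove" in action:
                files.discard(action["remove"])
    return files


def demo() -> Tuple[Set[str], Set[str]]:
    """Create two versions of a toy table and read both of them back."""
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "sales"
        commit(table, 0, [{"add": "part-0.parquet"}, {"add": "part-1.parquet"}])
        # Version 1 replaces part-0 with part-2; nothing is edited in place.
        commit(table, 1, [{"remove": "part-0.parquet"}, {"add": "part-2.parquet"}])
        return live_files(table), live_files(table, as_of=0)


if __name__ == "__main__":
    latest, first = demo()
    print("latest version:", sorted(latest))
    print("version 0:", sorted(first))
```

A reader only needs the log to know which files to scan, so older versions stay readable (the `as_of` argument) and a half-written commit is never visible. The sketch leaves out schema, statistics and checkpoints, which a real table format also tracks.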
Databricks & ELSA platform Training Part I