Introduction to Apache Spark Basics

Apache Spark is a unified computing engine designed for large-scale distributed data processing, with APIs in Python, Java, Scala, and R. It offers libraries for machine learning, SQL, stream processing, and graph processing, so a single application can handle several kinds of workload. Spark emphasizes fast, parallel computation rather than storage and can read data from a variety of storage systems. Running Spark itself requires only a Java installation.

Uploaded by

20912023100044

Spark for Big Data

Agenda: Chapter 1

01 What is Apache Spark?

02 Apache Spark’s Philosophy

03 Running Spark

1. What is Apache Spark?

▪ Apache Spark is a unified computing engine and a set of libraries designed for large-scale distributed data processing.

▪ Spark supports multiple widely used programming languages (Python, Java, Scala, and R).

▪ It includes libraries with composable APIs for machine learning (MLlib), SQL for interactive queries (Spark SQL), stream processing (Structured Streaming) for interacting with real-time data, and graph processing (GraphX).
2. Apache Spark’s Philosophy

Three components: Unified, Computing Engine, Libraries

Unified

❖ Spark operations can be applied across many types of workloads and expressed in any of the supported programming languages: Scala, Java, Python, SQL, and R.

❖ Spark offers unified libraries with well-documented APIs that include the following modules as core components: Spark SQL, Spark Structured Streaming, Spark MLlib, and GraphX, combining all the workloads running under one engine.

❖ You can write a single Spark application that can do it all: no need for distinct engines for disparate workloads, no need to learn separate APIs. With Spark, you get a unified processing engine for your workloads.
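
As a rough illustration of the "single application" idea, the sketch below answers one question twice inside the same Spark session, once with the DataFrame API and once with SQL. It assumes PySpark and Java are installed and runs Spark in local mode. The function name `run_unified_demo`, the application name, and the sample rows are made up for this example.

```python
def run_unified_demo():
    """Total sales per category, computed with the DataFrame API and with SQL."""
    # Imported here so this file still loads where PySpark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .master("local[1]")        # run locally with a single worker thread
        .appName("unified-demo")   # hypothetical application name
        .getOrCreate()
    )
    try:
        # Made-up sample rows, for illustration only.
        sales = spark.createDataFrame(
            [("books", 12.0), ("games", 30.0), ("books", 8.0)],
            ["category", "amount"],
        )

        # Workload 1: the DataFrame API.
        api_rows = (
            sales.groupBy("category")
            .agg(F.sum("amount").alias("total"))
            .orderBy("category")
            .collect()
        )

        # Workload 2: SQL over the same data, in the same session.
        sales.createOrReplaceTempView("sales")
        sql_rows = spark.sql(
            "SELECT category, SUM(amount) AS total "
            "FROM sales GROUP BY category ORDER BY category"
        ).collect()

        # Convert Row objects to plain tuples so they are easy to compare.
        return [tuple(r) for r in api_rows], [tuple(r) for r in sql_rows]
    finally:
        spark.stop()
```

Calling `run_unified_demo()` returns two lists that should hold the same totals, since both queries run on the same engine against the same data.
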

Computing Engine

❖ Spark focuses on its fast, parallel computation engine rather than on storage.

❖ That means you can use Spark to read data stored in a variety of storage systems, including:

Cloud storage systems:
✓ Azure Storage
✓ Amazon S3

Distributed file systems:
✓ Apache Hadoop
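
One way to picture the separation of computation from storage: the read call stays the same and only the dataset URI changes. The sketch below assumes PySpark and Java are installed. Every bucket, container, account, and host name in it is hypothetical, and the cloud and HDFS URIs would also need the matching Hadoop connector and credentials, which are not shown here.

```python
# Hypothetical dataset locations; all bucket, container, account, and host
# names are made up. Only the local file URI works without extra setup.
EXAMPLE_URIS = {
    "local file": "file:///tmp/events.csv",
    "Amazon S3": "s3a://example-bucket/events.csv",
    "Azure Storage": (
        "abfss://example-container@exampleaccount.dfs.core.windows.net/events.csv"
    ),
    "Apache Hadoop (HDFS)": "hdfs://namenode.example:8020/data/events.csv",
}


def count_rows(uri):
    """Count the data rows of a CSV dataset; only the URI names the storage system."""
    # Imported here so this file still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")        # run locally with a single worker thread
        .appName("storage-demo")   # hypothetical application name
        .getOrCreate()
    )
    try:
        # header=True treats the first line as column names, not as data.
        return spark.read.csv(uri, header=True).count()
    finally:
        spark.stop()
```

With a suitably configured cluster, `count_rows(EXAMPLE_URIS["Amazon S3"])` and `count_rows(EXAMPLE_URIS["Apache Hadoop (HDFS)"])` would go through the same code path; nothing in the function is specific to one storage system.
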

Libraries

❖ Spark’s final component is its libraries, which build on its design as a unified engine to provide a unified API for common data analysis tasks.

❖ Spark includes libraries for SQL and structured data (Spark SQL), machine learning (MLlib), stream processing (Spark Streaming and the newer Structured Streaming), and graph analytics (GraphX).

3. Running Spark

You can use Spark from Python, Java, Scala, R, or SQL. Spark itself is written in Scala and runs on the Java Virtual Machine (JVM), so all you need to run Spark is an installation of Java.

If you want to use the Python API, you will also need a Python interpreter (version 2.7 or later for older Spark releases; current releases require Python 3). If you want to use R, you will need a version of R on your machine.
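
The requirements above can be checked with a few lines of standard-library Python. This is only a sketch: it assumes the `java` and `R` executables are discoverable on `PATH` (a Java installation referenced only through `JAVA_HOME` would be missed), and the function name and the `min_python` parameter are made up for this example.

```python
import shutil
import sys


def check_spark_prerequisites(min_python=(2, 7)):
    """Report which of the runtimes Spark may need are visible on this machine.

    min_python mirrors the version stated in the text; pass a higher value,
    such as (3, 8), to match the requirement of a newer Spark release.
    """
    return {
        # Java is the only hard requirement for running Spark itself.
        "java": shutil.which("java") is not None,
        # Needed only for the Python API.
        "python": sys.version_info[:2] >= tuple(min_python),
        # Needed only for the R API.
        "r": shutil.which("R") is not None,
    }


if __name__ == "__main__":
    for name, available in check_spark_prerequisites().items():
        print(f"{name}: {'found' if available else 'not found'}")
```
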
Thank you
