UNIT-2 Cloud Computing Final PDF

The document covers Hadoop and Python, focusing on Hadoop's architecture, components, and job execution processes, including MapReduce and HDFS. It also discusses cloud application design principles and Python basics, highlighting its features, installation, and programming paradigms. The document serves as a comprehensive guide for understanding big data processing with Hadoop and programming with Python.

Uploaded by

padhu6121985

UNIT - II Hadoop and Python

Hadoop Map Reduce: Apache Hadoop, Hadoop Map Reduce Job Execution, Hadoop Schedulers,
Hadoop Cluster setup.
Cloud Application Design: Reference Architecture for Cloud Applications, Cloud Application Design
Methodologies, Data Storage Approaches.
Python Basics: Introduction, Installing Python, Python data Types & Data Structures, Control flow,
Function, Modules, Packages, File handling, Date/Time Operations, Classes

Apache Hadoop:

Introduction:

➢ Hadoop is an open-source software framework for storing and processing Big Data in a
distributed manner on large clusters of commodity hardware. It is developed under the Apache Software
Foundation (ASF) and released under the Apache License 2.0.
➢ Hadoop is written in the Java programming language and is a top-level Apache
project.
➢ Doug Cutting and Mike J. Cafarella developed Hadoop.
➢ Hadoop's design was inspired by Google's published work on the MapReduce programming
model and the Google File System (GFS).
➢ It is optimized to handle massive quantities of structured, semi-structured or unstructured
data using commodity hardware, that is, relatively inexpensive computers.
➢ It is designed to scale from a single server to thousands of machines, each offering local
computation and storage, and so supports very large data sets in a distributed computing
environment.
Hadoop ecosystem:

➢ The Hadoop ecosystem is neither a programming language nor a single service; it is a platform or framework
that solves big data problems. It can be viewed as a suite that encompasses a number of services
(ingesting, storing, analyzing and maintaining data). The components listed in this section work both
individually and in collaboration.
➢ The Hadoop ecosystem provides the furnishings that turn the framework into a comfortable home for
big data activity that reflects your specific needs and tastes.
➢ The Hadoop ecosystem includes both official Apache open source projects and a wide range of
commercial tools and solutions.

The following are the components of Hadoop ecosystem:

1. HDFS: Hadoop Distributed File System. It stores data files as close to their original form as
possible.
2. HBase: Hadoop’s distributed, column-oriented database. It supports structured data storage for large
tables.
3. Hive: Hadoop’s data warehouse. It enables analysis of large data sets using a language very similar
to SQL, so data stored in a Hadoop cluster can be accessed through Hive.
4. Pig: An easy-to-understand data flow language. It supports the analysis of large data sets
on Hadoop without writing code in the MapReduce paradigm.
5. ZooKeeper: An open-source coordination service that handles configuration and synchronization for distributed systems.
6. Oozie: A workflow scheduler system for managing Apache Hadoop jobs.
7. Mahout: A scalable machine learning and data mining library.
8. Chukwa: A data collection system for managing large distributed systems.
9. Sqoop: A tool for transferring bulk data between Hadoop and structured data stores such as relational
databases.
10. Ambari: A web-based tool for provisioning, managing and monitoring Apache Hadoop clusters.

HDFS:
➢ The Hadoop Distributed File System (HDFS) is a Java-based distributed file system that stores
large data sets across multiple nodes in a Hadoop cluster. Installing Hadoop provides HDFS
as the underlying storage system for data in the distributed environment.
➢ HDFS follows a distributed file system design and runs on
commodity hardware. It is designed to be highly fault tolerant even though it is deployed on
low-cost hardware; it achieves this by replicating each block on several DataNodes.
➢ HDFS runs on top of the existing file systems on each node in a Hadoop cluster. It is a block-structured
file system in which each file is divided into blocks of a pre-determined size. These blocks are stored across
a cluster of one or several machines.
➢ HDFS is Hadoop’s primary storage system. It is designed to reliably store vast amounts of data across
a cluster of machines.
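The block-structured layout described above can be illustrated with a little arithmetic. The following is only a sketch: the 128 MB block size and the replication factor of 3 are assumed values (they match the usual defaults for `dfs.blocksize` and `dfs.replication` in Hadoop 2 and later, but a cluster may be configured differently), and the function names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MB)
REPLICATION = 3                 # assumed replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block for a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full_blocks
    if remainder:
        # The last block only occupies as much space as the data it holds.
        blocks.append(remainder)
    return blocks


def raw_storage(file_size, replication=REPLICATION):
    """Total bytes consumed across the cluster once every block is replicated."""
    return file_size * replication


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = split_into_blocks(300 * mb)
    print([b // mb for b in blocks])       # block sizes in MB
    print(raw_storage(300 * mb) // mb)     # raw storage in MB
```

Under these assumptions, a 300 MB file occupies two full 128 MB blocks plus one 44 MB block, and consumes 900 MB of raw cluster storage.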

Architecture and components:

The HDFS architecture revolves around a master-slave model. At the top sits the NameNode,
which manages metadata— essentially, the file system’s directory tree and information about
each file’s location. It doesn’t store the actual data.

The DataNodes are the workhorses. They manage storage attached to the nodes and serve
read/write requests from clients. Each DataNode regularly reports back to the NameNode with a
heartbeat and block reports to ensure consistent state tracking.

Finally, HDFS includes a Secondary NameNode, not to be confused with a failover node.
Instead, it periodically checkpoints the NameNode’s metadata to reduce startup time and
memory overhead.
NameNode:
The NameNode is the master server in a Hadoop Distributed File System (HDFS) cluster. It
acts as the central coordinator, managing the filesystem namespace and regulating client
access to data stored across the DataNodes (slave nodes).
DataNode:
A DataNode is a daemon that runs on each slave machine. It stores data blocks as instructed by the NameNode
and serves read/write requests from clients.

Secondary NameNode in HDFS:

The Secondary NameNode in Hadoop is a helper to the NameNode. It is not a backup NameNode server
that can quickly take over if the NameNode fails.

JobTracker:
The JobTracker is the master daemon responsible for managing the execution of MapReduce jobs (in Hadoop 1.x). It is the
point of contact between Hadoop and the client application.

TaskTracker:
This daemon is responsible for executing the individual tasks assigned by the JobTracker.
Each TaskTracker continuously sends heartbeat messages to the JobTracker. When the JobTracker fails to receive a
heartbeat message from a TaskTracker, it assumes that the TaskTracker has failed and
reschedules its tasks on another available node in the cluster.
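The heartbeat mechanism can be sketched in a few lines of Python. This is a toy model, not Hadoop's implementation: the class name `HeartbeatMonitor`, the timeout value, the worker and task names, and the round-robin reassignment policy are all assumptions made for illustration.

```python
class HeartbeatMonitor:
    """Toy master-side view of worker liveness and task assignment."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a worker counts as failed
        self.last_seen = {}      # worker name -> time of last heartbeat
        self.assigned = {}       # task name -> worker name

    def heartbeat(self, worker, now):
        """Record a heartbeat received from a worker at time `now`."""
        self.last_seen[worker] = now

    def assign(self, task, worker):
        self.assigned[task] = worker

    def failed_workers(self, now):
        """Workers whose last heartbeat is older than the timeout."""
        return sorted(w for w, seen in self.last_seen.items()
                      if now - seen > self.timeout)

    def reschedule(self, now):
        """Move tasks from failed workers to live ones; return the moves made."""
        failed = set(self.failed_workers(now))
        alive = sorted(w for w in self.last_seen if w not in failed)
        moved = {}
        if not alive:
            return moved  # nowhere to move the tasks
        orphaned = sorted(t for t, w in self.assigned.items() if w in failed)
        for i, task in enumerate(orphaned):
            self.assigned[task] = alive[i % len(alive)]  # simple round-robin
            moved[task] = self.assigned[task]
        return moved


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=10)
    monitor.heartbeat("tt1", 0)
    monitor.heartbeat("tt2", 0)
    monitor.assign("map-1", "tt2")
    monitor.heartbeat("tt1", 12)      # tt2 stays silent
    print(monitor.reschedule(15))
```

In the usage example, `tt2` has been silent for longer than the timeout at time 15, so its task is moved to `tt1`.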
➢ MapReduce is the data processing layer of Hadoop. It processes the huge amounts of structured and
unstructured data stored in HDFS.
➢ It processes data in parallel by dividing a job into a set of independent tasks. Parallel
processing improves speed and reliability.

Hadoop MapReduce data processing takes place in two phases: the Map phase and the Reduce phase.

Map phase: The first phase of data processing. This is where the complex
logic, business rules and costly code are specified.
Reduce phase: The second phase of processing. This is where lightweight processing such as
aggregation or summation is specified.
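The two phases can be illustrated with the classic word-count example. The sketch below runs in a single Python process and only mimics the programming model; the function names are hypothetical and no Hadoop API is involved.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(word, counts):
    """Reduce: aggregate all the counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Group the map output by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_phase(line):
            grouped[word].append(count)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big cluster"]))
```

The map function holds the per-record logic, while the reduce function performs only a summation, which mirrors the division of work described above.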
Steps of MapReduce Job Execution flow:
MapReduce processes data in several phases with the help of different components. The
steps of job execution in Hadoop are as follows.

1. Input Files
The data for a MapReduce job is stored in input files, which reside in HDFS. The input file format is arbitrary:
line-based log files and binary formats can both be used.

2. InputFormat
InputFormat defines how these input files are split and read. It selects the files or other objects
used for input and creates the InputSplits.

3. InputSplits
An InputSplit represents the data that will be processed by an individual Mapper. One map task is
created for each split, so the number of map tasks equals the number of InputSplits. The framework divides each split into
records, which the mapper processes.

4. RecordReader
It communicates with the InputSplit and converts the data into key-value pairs suitable for reading
by the Mapper. With the default TextInputFormat, the RecordReader emits each line as the value, keyed by its byte offset in the file.

5. Mapper
It processes each input record produced by the RecordReader and generates intermediate key-value pairs. The
intermediate output may be of a completely different type from the input pair. The output of the mapper is the full
collection of these key-value pairs.

6. Combiner
The Combiner is a mini-reducer that performs local aggregation on the mapper’s output. It minimizes the
data transfer between mapper and reducer. Once the combiner completes, the framework
passes its output to the partitioner for further processing.

7. Partitioner
The Partitioner comes into play when the job uses more than one reducer. It takes the output of
the combiner and decides which reducer receives each key, so that all values for a key reach the same reducer.

8. Shuffling and Sorting

After partitioning, the output is shuffled to the reduce nodes. Shuffling is the physical movement of the
data over the network. Once all the mappers have finished and their output has been shuffled to the reducer
nodes, the framework merges and sorts this intermediate output, which is then provided as input to the reduce phase.
9. Reducer
The Reducer takes the set of intermediate key-value pairs produced by the mappers as input and
runs a reduce function on each key and its values to generate the output.
The output of the reducer is the final output, which the framework stores on HDFS.
10. RecordWriter
It writes the output key-value pairs from the Reducer phase to the output files.
11. OutputFormat
OutputFormat defines how the RecordWriter writes these output key-value pairs to the output files.
The OutputFormat instances provided by Hadoop write the final
output of the reducer to files in HDFS.
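The execution flow above can be traced end to end with a small single-process simulation. It is a sketch under stated assumptions: the function names are hypothetical, the stages run in the order given in these notes, in-memory strings stand in for InputSplits, and `zlib.crc32` is used as a deterministic stand-in for the hash function a real partitioner would apply to the key.

```python
import zlib
from collections import defaultdict
from itertools import groupby


def record_reader(split):
    """Yield (byte_offset, line) pairs, similar in spirit to TextInputFormat."""
    offset = 0
    for line in split.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))


def mapper(_offset, line):
    """Emit a (word, 1) pair for each word in the line."""
    for word in line.split():
        yield word.lower(), 1


def combiner(pairs):
    """Locally sum the counts produced by one map task."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Pick the reducer for a key: hash(key) modulo the number of reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    return key, sum(values)


def run_pipeline(splits, num_reducers=2):
    """Return one list of (word, count) pairs per reducer (one "part file" each)."""
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:  # one map task per split
        mapped = [kv for off, line in record_reader(split) for kv in mapper(off, line)]
        for key, value in combiner(mapped):
            buckets[partitioner(key, num_reducers)].append((key, value))  # shuffle
    output = []
    for bucket in buckets:
        bucket.sort(key=lambda kv: kv[0])  # sort by key before reducing
        output.append([reducer(key, [v for _, v in group])
                       for key, group in groupby(bucket, key=lambda kv: kv[0])])
    return output


if __name__ == "__main__":
    for part in run_pipeline(["a b a\nb c\n", "a c c\n"]):
        print(part)
```

Because the partitioner depends only on the key, every occurrence of a word lands in the same reducer's bucket, which is what allows each reducer to compute a complete count independently.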
Hadoop Schedulers:
Hadoop schedulers manage cluster resources for big data jobs. The key types are the
FIFO Scheduler, the fairness-focused Fair Scheduler, and the capacity-based Capacity Scheduler.
The latter two improve on basic queuing for shared clusters by
balancing resource utilization, fairness, and performance, often using techniques such as delay
scheduling to improve data locality in dynamic cloud environments.

1. FIFO Scheduler
First In First Out (FIFO) was the default scheduling policy in early Hadoop releases (in Apache Hadoop 2 and later, YARN uses the Capacity Scheduler by default). The FIFO
Scheduler gives preference to applications that arrive first over those
that arrive later. It places the applications in a queue and executes them in the
order of their submission (first in, first out).
Irrespective of size and priority, the requests of the first application
in the queue are allocated first. Only once the first application's requests are satisfied
is the next application in the queue served.

Advantages:
• It is simple to understand and doesn’t need any configuration.
• Jobs are executed in the order of their submission.
Disadvantages:
• It is not suitable for shared clusters. If a large application arrives before
a shorter one, the large application will use all the resources in the
cluster, and the shorter application has to wait for its turn. This leads to
starvation.
• It does not balance resource allocation between
long applications and short applications.
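The effect of submission order under FIFO can be shown with a small simulation. This is a simplified sketch: it assumes all jobs are submitted at time 0, that each job occupies the whole cluster while it runs, and that runtimes are known in advance; the function names and job names are hypothetical.

```python
def fifo_schedule(jobs):
    """jobs: list of (name, runtime) in submission order.

    Returns {name: (start_time, finish_time)} when jobs run strictly one after another.
    """
    clock = 0
    timeline = {}
    for name, runtime in jobs:
        timeline[name] = (clock, clock + runtime)
        clock += runtime
    return timeline


def average_wait(timeline):
    """Mean time jobs spend waiting in the queue before they start."""
    return sum(start for start, _ in timeline.values()) / len(timeline)


if __name__ == "__main__":
    large_first = fifo_schedule([("large", 100), ("small", 2)])
    small_first = fifo_schedule([("small", 2), ("large", 100)])
    print(large_first, average_wait(large_first))
    print(small_first, average_wait(small_first))
```

When the large job is submitted first, the 2-unit job cannot start until time 100; with the order reversed, the large job is delayed by only 2 units. This is the imbalance that the Capacity and Fair schedulers are designed to reduce.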
2. Capacity Scheduler
The CapacityScheduler allows multiple tenants to securely share a large
Hadoop cluster. It is designed to run Hadoop applications in a shared,
multi-tenant cluster while maximizing the throughput and the utilization of the
cluster.
It supports hierarchical queues to reflect the structure of the organizations or
groups that use the cluster resources. A queue hierarchy contains three
types of queues: root, parent, and leaf.

Advantages:
• It maximizes the utilization of resources and throughput in the Hadoop
cluster.
• It provides elasticity for groups or organizations in a cost-effective manner.
• It gives capacity guarantees and safeguards to the organizations
using the cluster.
Disadvantage:
• It is the most complex of the three schedulers.
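The root, parent and leaf queue hierarchy can be sketched as a nested dictionary in which each queue is given a percentage of its parent's capacity. The queue names and percentages below are invented for illustration, and the sketch assumes the common convention that the children of a queue together account for 100% of that queue.

```python
# Hypothetical queue hierarchy: "capacity" is a percentage of the parent queue.
QUEUES = {
    "capacity": 100,
    "children": {
        "engineering": {
            "capacity": 60,
            "children": {
                "dev": {"capacity": 25, "children": {}},
                "prod": {"capacity": 75, "children": {}},
            },
        },
        "analytics": {"capacity": 40, "children": {}},
    },
}


def leaf_capacities(node, name="root", parent_absolute=100.0):
    """Return {queue_path: percent of the whole cluster} for every leaf queue."""
    absolute = parent_absolute * node["capacity"] / 100.0
    children = node["children"]
    if not children:
        return {name: absolute}  # leaf queue: applications are submitted here
    total = sum(child["capacity"] for child in children.values())
    if total != 100:
        raise ValueError(f"children of {name} sum to {total}%, expected 100%")
    result = {}
    for child_name, child in children.items():
        result.update(leaf_capacities(child, f"{name}.{child_name}", absolute))
    return result


if __name__ == "__main__":
    for path, share in leaf_capacities(QUEUES).items():
        print(f"{path}: {share:.1f}% of the cluster")
```

With these example figures, `root.engineering.dev` is guaranteed 25% of 60%, that is, 15% of the cluster, while `root.analytics` is guaranteed 40%.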

3. Fair Scheduler
The FairScheduler allows YARN applications to fairly share resources in large
Hadoop clusters. With the FairScheduler, there is no need to reserve a set
amount of capacity because it dynamically balances resources between all
running applications.

It assigns resources to applications in such a way that all applications get, on
average, an equal amount of resources over time.

By default, the FairScheduler bases its scheduling fairness decisions only on
memory. It can be configured to schedule on both memory and CPU.
Advantages:
• It provides a reasonable way to share a Hadoop cluster among a
number of users.
• The FairScheduler can work with application priorities, where the priorities
are used as weights in determining the fraction of the total resources that
each application should get.
Disadvantage:
• It requires configuration.
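The idea of weighted fair sharing can be sketched with a weighted max-min allocation over a single resource such as memory. This is an illustrative simplification, not the FairScheduler's actual algorithm: it ignores queues, minimum shares and preemption, and the function name and figures are hypothetical.

```python
def fair_shares(total, demands, weights=None):
    """Weighted max-min fair allocation of `total` units of one resource.

    demands: {app: units requested}; weights: {app: weight}, defaulting to 1 each.
    An application never receives more than it demands; leftover capacity is
    redistributed among the applications that still want more.
    """
    if weights is None:
        weights = {app: 1.0 for app in demands}
    shares = {app: 0.0 for app in demands}
    active = set(demands)
    remaining = float(total)
    while active and remaining > 1e-9:
        per_weight = remaining / sum(weights[a] for a in active)
        satisfied = {a for a in active if demands[a] <= per_weight * weights[a]}
        if not satisfied:
            # Everyone wants more than its share: split what is left by weight.
            for a in active:
                shares[a] = per_weight * weights[a]
            break
        for a in satisfied:
            shares[a] = float(demands[a])
            remaining -= demands[a]
        active -= satisfied
    return shares


if __name__ == "__main__":
    print(fair_shares(100, {"a": 10, "b": 60, "c": 60}))
    print(fair_shares(90, {"x": 100, "y": 100}, {"x": 2, "y": 1}))
```

In the first call, application `a` needs only 10 units, so the remaining 90 are split equally between `b` and `c`. In the second call, the weights 2 and 1 give `x` twice the share of `y`.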

Hadoop Cluster setup:

Hadoop Cluster Setup process:
Setting up a Hadoop cluster involves configuring multiple machines to work together as a unified system
for processing and storing large datasets. Here's a basic outline of the process:
1. Hardware Setup:
Choose suitable hardware for the cluster, including servers with sufficient RAM, CPU, and storage
capacity. Ensure that all machines have a reliable network connection.
2. Operating System Installation:
Install a compatible operating system (e.g., Linux) on each machine in the cluster.
3. Java Installation:
Install the Java Development Kit (JDK) on all machines, as Hadoop is built using Java.
4. Hadoop Installation:
Download the Hadoop distribution and extract it on all machines.
Configure the Hadoop environment variables, such as HADOOP_HOME and JAVA_HOME.
5. Configuration Files:
Modify the Hadoop XML configuration files (such as core-site.xml, hdfs-site.xml, and mapred-site.xml) to specify the cluster
settings, such as the NameNode and DataNode details.

6. SSH Setup:
Set up passwordless SSH between the master and slave nodes to enable secure communication.
7. Hadoop Daemons:
Start the Hadoop daemons, including the Namenode, Datanode, Resource Manager, and Node Manager,
on their respective machines.

8. Testing:
Verify the cluster setup by running sample MapReduce jobs and checking the Hadoop web interface for
cluster status.
9. Maintenance:
Regularly monitor the cluster for performance, resource utilization, and any potential issues.
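As an illustration of the configuration step (step 5), the sketch below generates the `<configuration>`/`<property>` XML layout that Hadoop site files use. The helper name `build_site_xml`, the host name `master`, the port 9000 and the replication value are assumptions chosen for the example; `fs.defaultFS` and `dfs.replication` are the standard property names for the default file system URI and the block replication factor.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return the text of a Hadoop-style site file for a {name: value} dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values: adjust the host, port and replication to the cluster.
    print(build_site_xml({"fs.defaultFS": "hdfs://master:9000"}))  # core-site.xml
    print(build_site_xml({"dfs.replication": 3}))                  # hdfs-site.xml
```

Generating the files from one script and copying them to every node is one way to keep the configuration identical across the cluster.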
Cloud Application Design: Reference Architecture for Cloud Applications, Cloud Application Design
Methodologies, Data Storage Approaches.
1. Scalability
• Scalability is an important factor that drives application designers to move to cloud computing
environments. Building applications that can serve millions of users without taking a hit on their
performance has always been challenging. With the growth of cloud computing, application designers can
provision adequate resources to meet their workload levels.

2. Reliability & Availability

Reliability of a system is defined as the probability that a system will perform the intended functions
under stated conditions for a specified amount of time. Availability is the probability that a system will
perform a specified function under given conditions at a prescribed time.

3. Security
Security is an important design consideration for cloud applications given the outsourced nature of cloud
computing environments.

4. Maintenance & Upgradation:

To achieve a rapid time-to-market, businesses typically launch their applications with a core set of
features ready and then incrementally add new features as and when they are complete.
5. Performance
Applications should be designed with the performance requirements in mind.
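The reliability and availability definitions given in item 2 above can be made concrete with the standard textbook formulas. These formulas are not stated in the notes and are brought in here as assumptions: steady-state availability is taken as MTBF / (MTBF + MTTR), reliability as exp(-t / MTBF) under a constant failure rate, and replicas are assumed to fail independently.

```python
import math


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(t_hours, mtbf_hours):
    """Probability of running t hours without failure, assuming a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


def replicated_availability(single, replicas):
    """Availability of a service that is up whenever at least one replica is up."""
    return 1 - (1 - single) ** replicas


if __name__ == "__main__":
    a = availability(mtbf_hours=999, mttr_hours=1)
    print(f"single instance availability: {a:.4f}")
    print(f"two replicas: {replicated_availability(a, 2):.6f}")
    print(f"reliability over 100 h: {reliability(100, 999):.4f}")
```

Under these assumptions, adding a second independent replica raises availability from 0.999 to 0.999999, which is one reason cloud applications are usually deployed across several instances.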
Reference Architectures – e-Commerce, Business-to-Business, Banking and
Financial apps
Python Basics: Introduction, Installing Python, Python data Types & Data Structures, Control flow,
Function, Modules, Packages, File handling, Date/Time Operations, Classes
Python:
Python is a general-purpose, high-level, interpreted programming language. It focuses on
readability and simple syntax. Its syntax is English-like, so reading Python code is similar to reading
an English sentence.
Python comes with a large standard library and has an extensive ecosystem of third-party packages covering most programming tasks.
History of Python
In December 1989, Guido van Rossum was looking for a hobby project to keep him occupied around
Christmas week. He had already been planning to write a new scripting language descended from
ABC that would also appeal to Unix/C hackers, so he ended up writing an interpreter for it. Being a big fan
of the British comedy troupe Monty Python, he chose to call the project ‘Python’ in an irreverent mood.

Key Characteristics

• Easy to Learn: Syntax is similar to English, and programs typically need fewer lines of code.
• Interpreted: Executes code line-by-line, allowing for rapid prototyping.
• High-Level: Manages complex tasks like memory automatically.
• Dynamically Typed: Type checking happens at runtime, not compile time.
• Cross-Platform: Runs on Windows, macOS, Linux, etc.
• Multi-Paradigm: Supports different programming styles (procedural, OOP, functional).
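Two of the characteristics listed above, dynamic typing and multi-paradigm support, can be seen in a short example. The function and class names are made up for illustration; the same sum is computed in procedural, object-oriented and functional style.

```python
from functools import reduce

# Dynamically typed: the same name can be rebound to values of different types,
# and the type is checked only when the value is used at runtime.
value = 42
value = "forty-two"


def total_procedural(numbers):
    """Procedural style: an explicit loop updating a local variable."""
    total = 0
    for n in numbers:
        total += n
    return total


class Accumulator:
    """Object-oriented style: state and behaviour bundled in a class."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self  # returning self allows calls to be chained


def total_functional(numbers):
    """Functional style: fold the sequence with a function, no mutation."""
    return reduce(lambda acc, n: acc + n, numbers, 0)


if __name__ == "__main__":
    data = [1, 2, 3]
    print(type(value).__name__)
    print(total_procedural(data))
    print(Accumulator().add(1).add(2).add(3).total)
    print(total_functional(data))
```

All three versions return the same result; which style to use is a matter of readability and the structure of the surrounding program.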

Installation of Python:
