Introduction to Data Storage and Processing
Installing the Hadoop Distributed File System (HDFS)
Defining key design assumptions and architecture
Configuring and setting up the file system
Issuing commands from the console
Reading and writing files
Setting the stage for MapReduce
Reviewing the MapReduce approach
Introducing the computing daemons
Dissecting a MapReduce job
Defining Hadoop Cluster Requirements
Planning the architecture
Selecting appropriate hardware
Designing a scalable cluster
Building the cluster
Installing Hadoop daemons
Optimizing the network architecture
Configuring a Cluster
Preparing HDFS
Setting basic configuration parameters
Configuring block allocation, redundancy and replication
Deploying MapReduce
Installing and setting up the MapReduce environment
Delivering redundant load balancing via Rack Awareness
Maximizing HDFS Robustness
Creating a fault-tolerant file system
Isolating single points of failure
Maintaining High Availability
Triggering manual failover
Automating failover with ZooKeeper
Leveraging NameNode Federation
Extending HDFS resources
Managing the namespace volumes
Introducing YARN
Critiquing the YARN architecture
Identifying the new daemons
Managing Resources and Cluster Health
Allocating resources
Setting quotas to constrain HDFS utilization
Prioritizing access to MapReduce using schedulers
Maintaining HDFS
Starting and stopping Hadoop daemons
Monitoring HDFS status
Adding and removing data nodes
Administering MapReduce
Managing MapReduce jobs
Tracking progress with monitoring tools
Commissioning and decommissioning compute nodes
Maintaining a Cluster
Employing the standard built-in tools
Managing and debugging processes using JVM metrics
Performing Hadoop status checks
Tuning with supplementary tools
Assessing performance with Ganglia
Benchmarking to ensure continued performance
Extending Hadoop
Simplifying information access
Enabling SQL-like querying with Hive
Installing Pig to create MapReduce jobs
Integrating additional elements of the ecosystem
Imposing a tabular view on HDFS with HBase
Configuring Oozie to schedule workflows
Implementing Data Ingress and Egress
Facilitating generic input/output
Moving bulk data into and out of Hadoop
Transmitting HDFS data over HTTP with WebHDFS
Acquiring application-specific data
Collecting multi-sourced log files with Flume
Importing and exporting relational information with Sqoop
Planning for Backup, Recovery and Security
Coping with inevitable hardware failures
Securing your Hadoop cluster
DURATION : 45 Working Days
FACULTY : MS. SINDU
Hadoop Admin
How the Hadoop Distributed File System and MapReduce work
What hardware configurations are optimal for Hadoop clusters
How to configure Hadoop's options for best cluster performance
How to configure NameNode High Availability
How to configure NameNode Federation
How to configure the FairScheduler to provide service-level agreements for multiple users of a cluster
How to install and implement Kerberos-based security for your cluster
What system administration issues exist with other Hadoop projects such as Hive, Pig, and HBase
Introduction to Big Data
Characteristics of Big Data
Why parallel computing is important
Discuss various products developed by vendors
Introducing Hadoop
Components of Hadoop
Starting Hadoop
Identify various processes
Hands on
Working with HDFS
Basic file commands
Web-based user interface
Reading from and writing to files
Run a word count program
View jobs in the Web UI
Hands on
Installation & Configuration of Hadoop
Types of installation (RPMs & Tar files)
Set up ssh for the Hadoop cluster
Tree structure
XML, masters and slaves files
Checking system health
Discuss block size and replication factor
Benchmarking the cluster
Hands on
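The block size and replication factor topic in this module comes down to simple arithmetic, sketched below in Python. The function name hdfs_footprint is hypothetical, and the defaults of 128 MiB blocks and a replication factor of 3 are assumptions matching the usual dfs.blocksize and dfs.replication defaults in recent Hadoop releases; check your own cluster's settings before relying on them.

```python
def hdfs_footprint(file_size_bytes, block_size_bytes=128 * 1024**2, replication=3):
    """Estimate how a single file is laid out in HDFS.

    Assumed defaults: 128 MiB blocks, replication factor 3.
    """
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = -(-file_size_bytes // block_size_bytes)
    return {
        "blocks": blocks,                             # entries the NameNode tracks for this file
        "block_replicas": blocks * replication,       # physical copies spread across DataNodes
        "raw_bytes": file_size_bytes * replication,   # a short last block only uses its real size
    }


if __name__ == "__main__":
    one_gib = 1024**3
    print(hdfs_footprint(one_gib))
```

A larger block size means fewer blocks for the NameNode to track per file, while a higher replication factor multiplies the raw disk space consumed.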
Advanced administration activities
Adding and decommissioning nodes
Purpose of the Secondary NameNode
Recovery from a failed NameNode
Managing quotas
Enabling trash
Hands on
Monitoring the Hadoop Cluster
Hadoop infrastructure monitoring
Hadoop-specific monitoring
Install and configure Nagios / Ganglia
Capture metrics
Hands on
Other Components of the Hadoop ecosystem
Discuss Hive, Sqoop, Pig, HBase, Flume
Use cases of each
Use Hadoop streaming to write code in Perl / Python
Hands on
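As a taste of the Hadoop streaming exercise, here is a minimal word count sketch in Python. Streaming passes records to the mapper and reducer as lines of text, with key and value separated by a tab, and delivers the reducer's input sorted by key. The function names mapper, reducer and run_local are hypothetical, and run_local only imitates the sort-and-shuffle step in memory; in a real job the mapper and reducer would be separate scripts reading standard input.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word. Input must already be sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Imitate map -> sort -> reduce in memory (a stand-in for the cluster)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_local(["Hadoop stores data", "Hadoop processes data"]):
        print(record)
```

Keeping the map and reduce logic in plain functions makes it easy to test on a laptop before submitting anything to a cluster.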