UNIT V HADOOP RELATED TOOLS 6
HBASE – DATA MODEL AND IMPLEMENTATIONS – HBASE
CLIENTS – HBASE EXAMPLES – [Link] – GRUNT – PIG
DATA MODEL – PIG LATIN – DEVELOPING AND TESTING
PIG LATIN [Link] – DATA TYPES AND FILE FORMATS
– HIVEQL DATA DEFINITION – HIVEQL DATA
MANIPULATION – HIVEQL QUERIES.
HBase - Overview
• Hadoop uses a distributed file system (HDFS) for storing
big data, and MapReduce to process it.
Limitations of Hadoop
• Hadoop can perform only batch processing, and data is
accessed only in a sequential manner. Even the simplest
lookup therefore requires scanning the entire dataset.
Hadoop Random Access Databases
• Applications such as HBase, Cassandra, CouchDB, Dynamo,
and MongoDB are databases that store huge amounts of
data and access it in a random manner.
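The difference between sequential and random access can be illustrated with a small in-memory sketch. This is not Hadoop or HBase code: the record layout and the names `sequential_lookup`, `random_lookup`, and `keyed_store` are assumptions made only for illustration.

```python
# Toy illustration: sequential scan versus keyed (random) access.
# Names and record layout are hypothetical, not part of any Hadoop API.

records = [(f"row{i:05d}", f"value-{i}") for i in range(10_000)]


def sequential_lookup(rows, key):
    """Scan rows in order; return (value, rows_examined)."""
    examined = 0
    for row_key, value in rows:
        examined += 1
        if row_key == key:
            return value, examined
    return None, examined


# A dict plays the role of a keyed store: one lookup, no scan.
keyed_store = dict(records)


def random_lookup(store, key):
    """Fetch a value directly by key; return None if absent."""
    return store.get(key)


value, examined = sequential_lookup(records, "row09999")
print(value, examined)
print(random_lookup(keyed_store, "row09999"))
```

To reach the last key, the scan has to examine all 10,000 records, while the keyed store returns the same value with a single lookup.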
What is HBase?
• HBase is a distributed column-oriented
database built on top of the Hadoop file
system.
• It is an open-source project and is horizontally
scalable.
• Its data model is similar to Google's Bigtable
and is designed to provide quick random
access to huge amounts of structured data.
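A column-oriented table of this kind can be pictured as a nested map: row key, then column family, then column, then value. The sketch below is a toy in-memory model under that assumption. It is not the HBase client API, and `ToyTable`, the family names, and the sample row are hypothetical.

```python
# Toy model of a column-oriented table:
#   row key -> column family -> column -> value
# In-memory illustration only; this is not the HBase client API.
from collections import defaultdict


class ToyTable:
    def __init__(self, column_families):
        self.column_families = set(column_families)
        # rows[row_key][family][qualifier] = value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        """Write one cell; the column family must be declared up front."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier):
        """Return one cell by key, or None if it was never written."""
        row = self.rows.get(row_key)
        if row is None:
            return None
        return row.get(family, {}).get(qualifier)


# Hypothetical sample data.
table = ToyTable(["personal", "professional"])
table.put("emp1", "personal", "name", "Asha")
table.put("emp1", "professional", "role", "engineer")
print(table.get("emp1", "personal", "name"))
```

Each cell is addressed directly by its row key and column, which is what makes random access to a single row cheap compared with scanning a file.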
HBase and HDFS
HDFS                                  HBase
HDFS is a distributed file system     HBase is a database built on top
suitable for storing large files.     of HDFS.

HDFS does not support fast            HBase provides fast lookups for
individual record lookups.            large tables.

It provides high-latency batch        It provides low-latency access to
processing.                           single rows from billions of
                                      records (random access).

It provides only sequential access    HBase internally uses hash tables
to data.                              and provides random access; it
                                      stores the data in indexed HDFS
                                      files for faster lookups.
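The idea of an indexed file can be sketched as follows: a hash table maps each row key to the byte offset of its record, so a lookup seeks straight to that record instead of reading the file from the start. This is a simplified illustration on a local temporary file. It does not reproduce HBase's actual storage format, and `write_records`, `lookup`, and the tab-separated record layout are assumptions.

```python
# Toy indexed file: a dict (a hash table) maps each row key to the byte
# offset of its record, so a lookup seeks directly to that record.
# Local-file illustration only; not HBase's real on-disk format.
import os
import tempfile


def write_records(path, records):
    """Write 'key<TAB>value' lines and return {key: byte_offset}."""
    index = {}
    with open(path, "wb") as f:
        for key, value in records:
            index[key] = f.tell()
            f.write(f"{key}\t{value}\n".encode("utf-8"))
    return index


def lookup(path, index, key):
    """Seek to the record for key and return its value, or None if absent."""
    offset = index.get(key)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        line = f.readline().decode("utf-8").rstrip("\n")
    _, value = line.split("\t", 1)
    return value


records = [(f"row{i:04d}", f"value-{i}") for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "table.dat")
index = write_records(path, records)
print(lookup(path, index, "row0750"))
```

Only one line of the file is read per lookup, regardless of how many records the file holds.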
• [Link]
hbase_installation.htm
DATA MODEL AND IMPLEMENTATIONS