HBase Data Model and Implementations

The document discusses Hadoop-related tools, focusing on HBase, a distributed column-oriented database built on the Hadoop file system. It highlights the limitations of Hadoop, such as its batch processing capabilities and sequential data access, while emphasizing HBase's ability to provide quick random access to structured data. Additionally, it compares HDFS and HBase, noting that HDFS is suitable for large file storage but lacks fast individual record lookups, which HBase addresses through its architecture.

Uploaded by

itstudents589

UNIT V HADOOP RELATED TOOLS 6

HBASE – DATA MODEL AND IMPLEMENTATIONS – HBASE CLIENTS – HBASE EXAMPLES – [Link] – GRUNT – PIG DATA MODEL – PIG LATIN – DEVELOPING AND TESTING PIG LATIN [Link] – DATA TYPES AND FILE FORMATS – HIVEQL DATA DEFINITION – HIVEQL DATA MANIPULATION – HIVEQL QUERIES.

HBase - Overview
• Hadoop uses a distributed file system (HDFS) to store big data and MapReduce to process it.
Limitations of Hadoop
• Hadoop can perform only batch processing, and data is accessed only sequentially, so even a lookup of a single record means scanning through the dataset.
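
The batch, sequential style of processing described above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map and reduce idea, not the Hadoop MapReduce API; the function names `map_phase` and `reduce_phase` and the sample records are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Read every record in order and emit a (word, 1) pair per word."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1


def reduce_phase(pairs):
    """Group the emitted pairs by word and sum the counts."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Hypothetical input; a real job would read blocks of files from HDFS.
records = ["HBase runs on HDFS", "HDFS stores large files"]
word_counts = reduce_phase(map_phase(records))
print(word_counts)
```

Note that the job touches every record once, front to back, before any answer is available. That is fine for whole-dataset summaries, but it is a poor fit when only one record is needed.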

Hadoop Random Access Databases
• Databases such as HBase, Cassandra, CouchDB, Dynamo, and MongoDB store huge amounts of data and allow that data to be accessed in a random manner.
What is HBase?
• HBase is a distributed, column-oriented database built on top of the Hadoop file system.
• It is an open-source project and is horizontally scalable.
• Its data model is similar to Google's Bigtable and is designed to provide quick random access to huge amounts of structured data.
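
To make "column-oriented" and "random access" more concrete, here is a toy model of a table as nested dictionaries: row key, then column family, then column qualifier, then value. It is a simplified sketch and not the HBase client API; the helper names `put` and `get`, the table name `employees`, and the sample rows are all hypothetical. Real HBase also keeps a timestamp per cell, which is left out here.

```python
def put(table, row_key, family, qualifier, value):
    """Store one cell under row key -> column family -> qualifier."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def get(table, row_key, family=None):
    """Fetch a whole row, or one column family of it, directly by row key."""
    row = table.get(row_key, {})
    return row if family is None else row.get(family, {})


# Hypothetical sample data; rows need not share the same columns.
employees = {}
put(employees, "row1", "personal", "name", "raju")
put(employees, "row1", "professional", "role", "manager")
put(employees, "row2", "personal", "name", "ravi")

print(get(employees, "row1"))
print(get(employees, "row2", "personal"))
```

The lookup goes straight to the requested row key instead of reading the rows that come before it, which is the access pattern the bullets above describe.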
HBase and HDFS
| HDFS | HBase |
| HDFS is a distributed file system suitable for storing large files. | HBase is a database built on top of HDFS. |
| HDFS does not support fast individual record lookups. | HBase provides fast lookups for larger tables. |
| It provides high-latency batch processing. | It provides low-latency access to single rows from billions of records (random access). |
| It provides only sequential access to data. | HBase internally uses hash tables and provides random access, and it stores the data in indexed HDFS files for faster lookups. |
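
The difference between sequential access and indexed random access in the comparison above can be sketched as follows. This example uses a binary search over sorted row keys as a stand-in for an index; it illustrates the general idea only and is not HBase's actual storage format or lookup code. The function names and the generated sample data are assumptions.

```python
from bisect import bisect_left


def sequential_lookup(records, key):
    """Scan records in order; return (value, number of records examined)."""
    examined = 0
    for record_key, value in records:
        examined += 1
        if record_key == key:
            return value, examined
    return None, examined


def indexed_lookup(sorted_keys, values, key):
    """Binary-search a sorted key index; return the value or None."""
    position = bisect_left(sorted_keys, key)
    if position < len(sorted_keys) and sorted_keys[position] == key:
        return values[position]
    return None


# Hypothetical data: 10,000 rows with zero-padded keys, so they sort in order.
records = [(f"row{n:05d}", n * n) for n in range(10_000)]
sorted_keys = [record_key for record_key, _ in records]
values = [value for _, value in records]

scan_value, examined = sequential_lookup(records, "row09999")
index_value = indexed_lookup(sorted_keys, values, "row09999")
print(scan_value, examined, index_value)
```

Both calls return the same value, but the scan has to examine every record to reach the last row, while the binary search needs only about log2(10,000), roughly 14, key comparisons.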
• [Link]
hbase_installation.htm
DATA MODEL AND IMPLEMENTATIONS
