HIVE
•There are several ways to run MapReduce operations:
•The traditional approach: a Java MapReduce program, for structured, semi-structured, and unstructured data.
•The scripting approach: Pig, for processing structured and semi-structured data.
•The query approach: the Hive Query Language (HiveQL or HQL), for processing structured data using Hive.
What is Hive
• Hive is a data warehouse infrastructure tool to process structured
data in Hadoop. It resides on top of Hadoop to summarize Big Data,
and makes querying and analyzing easy.
• Hive was initially developed by Facebook. The Apache Software
Foundation later took it up and developed it further as open-source
software under the name Apache Hive.
•Features of Hive
•It stores the schema in a database and the processed data in HDFS.
•It is designed for OLAP (online analytical processing).
•It provides an SQL-like query language called HiveQL or HQL.
•It is familiar, fast, scalable, and extensible.
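The first feature, keeping the schema in a database while the data sits in HDFS, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `METASTORE` dictionary, the `employees` table, its columns, and the path are hypothetical stand-ins, and a list of strings plays the role of a file in HDFS. The Ctrl-A delimiter is Hive's default field separator for text tables.

```python
DELIM = "\x01"  # Ctrl-A, Hive's default field delimiter for text tables

# Hypothetical stand-in for the schema database: table name -> schema and data location.
METASTORE = {
    "employees": {
        "columns": [("id", int), ("name", str), ("salary", float)],
        "location": "/user/hive/warehouse/employees",  # illustrative HDFS path
    }
}

# Stand-in for the raw file contents in HDFS: delimited text with no schema inside.
RAW_ROWS = [
    DELIM.join(["1", "Asha", "52000.0"]),
    DELIM.join(["2", "Ravi", "48000.5"]),
]


def read_table(table, raw_rows):
    """Apply the stored schema to the raw lines at read time."""
    columns = METASTORE[table]["columns"]
    rows = []
    for line in raw_rows:
        fields = line.split(DELIM)
        rows.append({name: cast(value) for (name, cast), value in zip(columns, fields)})
    return rows


employees = read_table("employees", RAW_ROWS)
print(employees[0])
```

The data lines carry no column names or types; those come only from the schema entry, which is why the same files can be re-read under a changed table definition.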
Unit Name | Operation
User Interface | Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight (on Windows Server).
Metastore | Hive uses a database server of choice to store the schema, or metadata, of tables and databases: the columns in each table, their data types, and the HDFS mapping.
HiveQL Process Engine | HiveQL is similar to SQL and queries against the schema information in the Metastore. It is one replacement for the traditional approach of writing a MapReduce program: instead of writing the program in Java, we write a query for the MapReduce job and process it.
Execution Engine | The Hive Execution Engine connects the HiveQL Process Engine and MapReduce. It processes the query and generates the same results a MapReduce job would. It follows the MapReduce model.
HDFS or HBase | The Hadoop Distributed File System (HDFS) or HBase is the storage layer that holds the data.
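To make the "write a query instead of a MapReduce program" point concrete, here is a minimal Python sketch of the map, shuffle, and reduce steps that a simple grouped count corresponds to. It is a conceptual illustration, not Hive's actual query planner; the `employees` records, the `dept` column, and the function names are hypothetical.

```python
from collections import defaultdict

# The HiveQL statement this sketch mimics (table and column names are hypothetical):
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept;

records = [
    {"name": "Asha", "dept": "sales"},
    {"name": "Ravi", "dept": "eng"},
    {"name": "Mina", "dept": "sales"},
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    yield record["dept"], 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


pairs = [pair for record in records for pair in map_phase(record)]
result = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(result)
```

With Hive, the single statement in the comment replaces the three hand-written phases; the engine is responsible for turning it into the equivalent job.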
Relational Database | Hive
Maintains a database | Maintains a data warehouse
Fixed schema | Varied schema
Sparse tables | Dense tables
Doesn’t support partitioning | Supports automatic partitioning
Stores normalized data | Stores both normalized and denormalized data
Uses SQL (Structured Query Language) | Uses HQL (Hive Query Language)
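The partitioning entry in the comparison can be illustrated with a rough Python sketch. Hive lays out each partition of a table as a subdirectory named column=value under the table's location, and the values of the partition columns live in the path rather than in the data files. The code below mimics routing rows to such directories based on their values; the `sales` table, its columns, and the path are hypothetical, and a dictionary stands in for the file system.

```python
from collections import defaultdict
import posixpath

TABLE_LOCATION = "/user/hive/warehouse/sales"  # illustrative path, not from the slides
PARTITION_COLS = ["country", "year"]           # hypothetical partition columns


def partition_dir(table_location, row, partition_cols):
    """Build the directory for a row, e.g. <table>/country=IN/year=2023."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return posixpath.join(table_location, *parts)


def route_rows(rows, table_location, partition_cols):
    """Group rows by partition directory; partition columns go in the path, not the data."""
    layout = defaultdict(list)
    for row in rows:
        data = {k: v for k, v in row.items() if k not in partition_cols}
        layout[partition_dir(table_location, row, partition_cols)].append(data)
    return dict(layout)


rows = [
    {"country": "IN", "year": 2023, "amount": 120.0},
    {"country": "IN", "year": 2023, "amount": 80.0},
    {"country": "US", "year": 2024, "amount": 200.0},
]

layout = route_rows(rows, TABLE_LOCATION, PARTITION_COLS)
for directory, data in sorted(layout.items()):
    print(directory, data)
```

A query that filters on a partition column, such as country, then only needs to read the matching directories instead of scanning the whole table.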