Hive

Hive is a data warehouse infrastructure tool built on Hadoop for processing structured data. It was initially developed by Facebook and later taken up by the Apache Software Foundation as the open-source project Apache Hive. It lets users query and analyze Big Data using HiveQL, a SQL-like language, as an alternative to writing MapReduce programs in Java or scripting them in Pig. Hive stores table metadata in a separate database, offers several user interfaces, and is designed for OLAP workloads.

Uploaded by

amarjeetakskumar
HIVE

•There are various ways to execute MapReduce operations:
  •The traditional approach, using a Java MapReduce program for structured, semi-structured, and unstructured data.
  •The scripting approach, using Pig to process structured and semi-structured data with MapReduce.
  •The Hive Query Language (HiveQL or HQL) approach, using Hive to process structured data with MapReduce.
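
To make the contrast between these approaches concrete, here is a minimal sketch in plain Python (not Hadoop code) of the map, shuffle, and reduce phases that a hand-written MapReduce job would perform to total salaries per department. The record layout, the function names, and the HiveQL statement in the closing comment are illustrative assumptions and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical input records: (employee, department, salary).
RECORDS = [
    ("asha", "sales", 300),
    ("ben", "sales", 200),
    ("chen", "ops", 500),
]

def map_phase(records):
    """Emit one (key, value) pair per record, as a mapper would."""
    for _name, dept, salary in records:
        yield dept, salary

def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}

totals = reduce_phase(shuffle_phase(map_phase(RECORDS)))
print(totals)

# In HiveQL the same aggregation would be roughly one statement
# (table and column names are assumed):
#   SELECT dept, SUM(salary) FROM employee GROUP BY dept;
```

The point of the comparison is that the three functions collapse into a single declarative query when the data is structured, which is the case Hive targets.
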
What is Hive

•Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
•Hive was initially developed by Facebook. Later the Apache Software Foundation took it up and developed it further as an open-source project under the name Apache Hive.

Features of Hive

•It stores the schema in a database and the processed data in HDFS.
•It is designed for OLAP.
•It provides an SQL-type query language called HiveQL or HQL.
•It is familiar, fast, scalable, and extensible.

Unit Name | Operation
User Interface | Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight (on Windows Server).
Metastore | Hive uses a database server to store the schema or metadata of tables, databases, and columns in a table, along with their data types and HDFS mapping.
HiveQL Process Engine | HiveQL is similar to SQL and queries against the schema information in the Metastore. It is one replacement for the traditional approach of writing a MapReduce program: instead of writing the program in Java, we can write a query for a MapReduce job and process it.
Execution Engine | The Hive Execution Engine is the link between the HiveQL Process Engine and MapReduce. It processes the query and generates the same results that MapReduce would. It uses the flavor of MapReduce.
HDFS or HBASE | The Hadoop Distributed File System (HDFS) or HBASE is the data storage technique used to store data in the file system.
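
The division of labour among these units can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: the `METASTORE` dictionary, the `run_query` helper, and the CSV file are hypothetical stand-ins, whereas real Hive keeps the metastore in a relational database and the data in HDFS or HBASE.

```python
import csv
import os
import tempfile

# Toy stand-in for the storage layer: a delimited file in a temp directory.
data_dir = tempfile.mkdtemp()
data_path = os.path.join(data_dir, "employee.csv")
with open(data_path, "w", newline="") as f:
    csv.writer(f).writerows([["asha", "sales", "300"], ["chen", "ops", "500"]])

# Toy stand-in for the metastore: schema and storage location,
# kept apart from the data itself.
METASTORE = {
    "employee": {
        "columns": [("name", str), ("dept", str), ("salary", int)],
        "location": data_path,
    }
}

def run_query(table, select, where=None):
    """Toy execution engine: look up the schema, scan the file, apply it on read."""
    meta = METASTORE[table]
    rows = []
    with open(meta["location"], newline="") as f:
        for raw in csv.reader(f):
            # Cast each raw field using the type recorded in the metastore.
            row = {name: cast(value)
                   for (name, cast), value in zip(meta["columns"], raw)}
            if where is None or where(row):
                rows.append(tuple(row[col] for col in select))
    return rows

result = run_query("employee", ["name", "salary"],
                   where=lambda r: r["salary"] > 400)
print(result)
```

Keeping the schema outside the data file is what lets the same file be described, or re-described, without rewriting it. The sketch skips query parsing and planning entirely.
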
Relational Database | Hive
Maintains a database | Maintains a data warehouse
Fixed schema | Varied schema
Sparse tables | Dense tables
Doesn't support partitioning | Supports automatic partitioning
Stores normalized data | Stores both normalized and denormalized data
Uses SQL (Structured Query Language) | Uses HQL (Hive Query Language)
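
The partitioning row can be made concrete with a small Python sketch. Hive stores each partition of a table under its own subdirectory named `column=value`, so a query that filters on the partition column only needs to read the matching directory. The helper names and sample rows below are hypothetical, and a dictionary stands in for the directory tree.

```python
from collections import defaultdict

# Hypothetical rows: (name, country).
ROWS = [("asha", "IN"), ("ben", "US"), ("chen", "IN")]

def write_partitioned(rows, key_index, column):
    """Group rows into 'directories' named column=value, one per distinct value."""
    layout = defaultdict(list)
    for row in rows:
        layout[f"{column}={row[key_index]}"].append(row)
    return dict(layout)

def read_partition(layout, column, value):
    """Partition pruning: touch only the directory that matches the filter."""
    return layout.get(f"{column}={value}", [])

layout = write_partitioned(ROWS, key_index=1, column="country")
print(sorted(layout))
print(read_partition(layout, "country", "IN"))
```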
