0% found this document useful (0 votes)

10 views75 pages

Hive: Big Data SQL Query Tool

Uploaded by

23mca005

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views75 pages

Hive: Big Data SQL Query Tool

Uploaded by

23mca005

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

HIVE

INTRODUCTION

• Hive is a tool which was developed to help people work with Big Data
without actually needing to spend time learning MapReduce.
• Hive is a “data warehouse software” that enables you to query and
manipulate data using a SQL-like language known as HiveQL.
• It was developed at Facebook so that people who had experience in SQL
would be able to work on querying datasets without actually learning new
paradigms like MapReduce or new programming languages.
Developers face problem in writing MapReduce logic
● How to port existing
○ relational databases
○ SQL infrastructure with Hadoop?
● End users are familiar with SQL queries than MapReduce
and Pig
● Hive’s SQL-like query language makes data churning easy
KEY FEATURES OF HIVE

The main features of Hive are:

● An SQL-like interface to write queries on large datasets.
● Hive can be used to process all variants of data i.e. Structured, Semi-
structured and Unstructured.
● A variety of built-in functions for working with dates, strings, etc.
● Easy ETL (extraction, transformation, and loading) of data.
HIVE ARCHITECTURE
ARCHITECTURE OF HIVE

• User Interface - Hive is a data warehouse infrastructure software that can

create interaction between user and HDFS. The user interfaces that Hive
supports are Hive Web UI, Hive command line, and Hive HD Insight
• Meta Store -Hive chooses respective database servers to store the schema
or Metadata of tables, databases, columns in a table, their data types and
HDFS mapping.
• HiveQL Process Engine- HiveQL is similar to SQL for querying on
schema info on the Megastore. It is one of the replacements of traditional
approach for MapReduce program. Instead of writing MapReduce
program in Java, we can write a query for MapReduce job and process it.
ARCHITECTURE OF HIVE

• Execution Engine - The conjunction part of HiveQL process Engine and

MapReduce is Hive Execution Engine. Execution engine processes the
query and generates results as same as MapReduce results. It uses the
flavour of MapReduce.
• HDFS or HBASE - Hadoop distributed file system or HBASE are the data
storage techniques to store data into the file system
WORKING OF HIVE
WORKING

• Execute Query- The Hive interface such as Command Line or Web UI

sends query Driver to execute.
• Get Plan- The driver takes the help of query complier that parses the
query to check the syntax and query plan or the requirement of query.
• Get Metadata- The compiler sends metadata request to Megastore
• Send Metadata- Megastore sends metadata as a response to the compiler.
• Send Plan- The compiler checks the requirement and resends the plan to
the driver. Up to here, the parsing and compiling of a query is complete.
• Execute Plan- the driver sends the execute plan to the execution engine.
• Execute Job- Internally, the process of execution job is a MapReduce job.
The execution engine sends the job to JobTracker, which is in Name node
and it assigns this job to TaskTracker, which is in Data node. Here, the
query executes MapReduce job.
• Metadata Ops- Meanwhile in execution, the execution engine can execute
metadata operations with Megastore.
• Fetch Result- The execution engine receives the results from Data nodes.
• Send Results- The execution engine sends those resultant values to the
driver.
• Send Results- The driver sends the results to Hive Interfaces.
HIVE DATA TYPES AND DATA MODELS
• Hive stores and queries data using its data models. The purpose of using
data models is to make querying convenient and fast.
There are four main components in Hive data models, which are similar to
how an RDBMS stores data:
1. Databases
2. Tables
3. Partitions
4. Buckets
TABLES IN HIVE

Let’s start with tables. Hive has two types of tables:

● Managed (or internal) table
● External table
Note: 'Managed table' and the 'internal table' are synonymous terms.
You should use external tables when:
● You want to use the data outside of Hive as well. For example, when
another existing program is running on the same cluster.
● You want the data to remain stored on the HDFS even after dropping
tables because Hive does not delete the data stored outside (of the Hive
database).
● You do not want Hive to control the storage of your data
(location/directories of storage/etc.).
You use managed tables when:
● The data is temporary. So, when the Hive table is dropped, the data stored
in the internal table is deleted along with the metadata
● You want Hive to manage the life cycle of the data completely, i.e. both
store and process it
HIVE DATA TYPES

All the data types in hive are classified into four types
• Column Types
• Literals
• Null Values
• Complex Types
• Integer Types - Integer type data can be specified using integral data
types, INT. When the data range exceeds the range of INT, you need to use
BIGINT and if the data range is smaller than the INT, you use
SMALLINT. TINYINT is smaller than SMALLINT.
• String Types - String type data types can be specified using single quotes
(' ') or double quotes (" "). It contains two data types: VARCHAR and
CHAR. Hive follows C-types escape characters.
• Timestamp - It supports traditional UNIX timestamp with optional
nanosecond precision. It supports [Link] format “YYYY-
MM-DD HH:MM:[Link]” and format “yyyy-mmdd
hh:mm:[Link]”.
• Dates - DATE values are described in year/month/day format in the form
{{YYYY-MM-DD}}.
• Decimals -The DECIMAL type in Hive is as same as Big Decimal format
of Java. It is used for representing immutable arbitrary precision.
• Union Types - Union is a collection of heterogeneous data [Link] can
create an instance using create union.
LITERALS

• Floating Point Types - Floating point types are nothing but numbers with
decimal points. Generally, this type of data is composed of DOUBLE data
type.
• Decimal Type - Decimal type data is nothing but floating point value with
higher range than DOUBLE data type. The range of decimal type is
approximately -10-308 to 10308.
COMPLEX TYPES

• Arrays - Arrays in Hive are used the same way they are used in Java.
Syntax: ARRAY<data_type>
• Maps - Maps in Hive are similar to Java Maps.
Syntax: MAP<primitive_type, data_type>
• Structs - Structs in Hive is similar to using complex data with comment.
Syntax: STRUCT<col_name : data_type [ COMMENT col_comment, … ]>
HIVE SCHEMAS

• Hive is schema-on-read
• Schema is only enforced when the data is read (at query time)
• Allows greater flexibility: same data can be read using multiple
schemas
• Contrast with an RDBMS, which is schema-on-write
• Schema is enforced when the data is loaded
• Speeds up queries at the expense of load times
HIVE DDL

Create Database Statement

A database in Hive is a namespace or a collection of tables.
hive> CREATE SCHEMA userdb;
hive> SHOW DATABASES;
Drop database
hive> DROP DATABASE IF EXISTS userdb;
CREATING HIVE TABLES

Create a table called T with two columns, the first being an integer and the
other a string.
hive> CREATE TABLE T(a INT, b STRING);
CREATE TABLE SYNTAX

CREATE TABLE table_name

(col1 data_type,
col2 data_type,
col3 data_type,
col4 datatype )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS format_type;
SIMPLE TABLE

CREATE TABLE page_view

(viewTime INT,
userid BIGINT,
page_url STRING,
referrer_url STRING,
ip STRING COMMENT 'IP Address of the User' )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
MORE COMPLEX TABLE
CREATE TABLE employees (
(name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING,
city:STRING,
state:STRING,
zip:INT>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
EXTERNAL TABLE
CREATE EXTERNAL TABLE page_view_stg
(viewTime INT,
userid BIGINT,
page_url STRING,
referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/staging/page_view';
ALTERING AND DROPPING TABLES

hive> ALTER TABLE T RENAME TO K;

hive> ALTER TABLE K ADD COLUMNS (col INT);
hive> ALTER TABLE HIVE_TABLE ADD COLUMNS (col1 INT COMMENT 'a comment'
);
DML

To understand the Hive DML commands, let's see the employee and employee_department table first.
DML COMMANDS

LOAD DATA
hive> LOAD DATA LOCAL INPATH './usr/Desktop/
[Link]' OVERWRITE INTO TABLE Employee;
SELECTS and FILTERS
hive> SELECT E.EMP_ID FROM Employee E WHERE [Link]='US';
GROUP BY
hive> SELECT E.EMP_ID FROM Employee E GROUP BY [Link];
SORT BY

Hive sort by and order by commands are used to fetch data in sorted order. The
main differences between sort by and order by commands are given below.
hive> SELECT E.EMP_ID FROM Employee E SORT BY [Link];
May use multiple reducers for final output.
Only guarantees ordering of rows within a reducer.
May give partially ordered result.
ORDER BY

hive> SELECT E.EMP_ID FROM Employee E order BY [Link];

Uses single reducer to guarantee total order in output.
LIMIT can be used to minimize sort time.
JOINS
INNER & LEFT OUTER

Inner joins
Select * from employee join employeedepartment ON ([Link]
pid=[Link])
Left outer joins
Select [Link], empName, department from employee e Left outer joi
n employeedepartment ed on([Link]=[Link]);
RIGHT OUTER & FULL OUTER JOIN

Right outer joins

Select [Link], empName, department from employee e Right outer join employee
department ed on([Link]=[Link]);
Full outer joins
Select [Link], empName, department from employee e FULL outer join employee
department ed on([Link]=[Link]);
MORE HIVE QUERIES
AlexRon,123,Mathematics#Physics#Chemistry#MolecularBiology,semse
terfee|18000.50#labfee|1000#examfee|2500,011#1023550456
Alexa,111,Mathematics#Physics#English#ComputerScience,semseterfe
e|17000.50#labfee|2000#examfee|3500,012#1045550451
Derek,143,Mathematics#Physics#Geography#MolecularBiology,semset
erfee|20000.50#labfee|1500#examfee|2200,014#34523550456
Dona,115,Chemistry#Physics#English#ComputerScience,semseterfee|
17400.40#labfee|1000#examfee|1500,011#10455545251

(let the name of the above data file is [Link])

1. To create the table from the data use the below query
CREATE TABLE IF NOT EXISTS students (name STRING ,id INT ,subjects
ARRAY < STRING >,feeDetails MAP < STRING , FLOAT >,phoneNumber
STRUCT <areacode: INT , number : INT > )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '#'
MAP KEYS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
2. Load the data into the table (if stored on HDFS) ,
remember this will move the file to
load data inpath 'add path to your file here' overwrite
into table students;
Note: If you are using the file located on your local directory
you will need to use the below query.
load data local inpath 'add path to your file here'
overwrite into table students;
3. To verify if the data has been loaded correctly use the
below query
Select * FROM students;
EXPLODE FUNCTION
explode(ARRAY<T> a)
Explodes an array to multiple rows. Returns a row-set with a single column (col),
one row for each element from the array.
explode(MAP<Tkey,Tvalue> m)
Explodes a map to multiple rows. Returns a row-set with a two columns (key,value)
, one row for each key-value pair from the input map.
posexplode(ARRAY<T> a)
Explodes an array to multiple rows with additional positional column of int type
(position of items in the original array, starting with 0). Returns a row-set with two
columns (pos,val), one row for each element from the array.
inline(ARRAY<STRUCT<f1:T1,...,fn:Tn>> a)
Explodes an array of structs to multiple rows. Returns a row-set with N columns (N
= number of top level elements in the struct), one row per struct from the array.
USING ADVANCED FUNCTIONS
1. Explode()
Select explode(feedetails) FROM students;
Select explode(subjects) FROM students;
Select explode(feedetails) FROM students WHERE name = "Alexa" ;
2. Upper()
Select upper ( name ) from students;
3. Regex_Replace()
Select regexp_replace( concat ( upper ( name ), id ), ' ' , '' ) as username
from students;
NESTED QUERY EXAMPLE

1. Query to determine the set of students having lab fees greater than 1000
Select name , feeDetails[ 'labfee' ] from students where feeDetails[ 'labfee' ] > 1000
;
2. Assume that the labfees are getting hiked by 20% and you would like to see the projected
values in case it is greater than 1500.
Select name , feeDetails[ 'labfee' ]* 1.2 as projected_fee from students where
feeDetails[ 'labfee' ]* 1.2 > 1500 ;

FROM
( SELECT name , feeDetails[ 'labfee' ]* 1.2 as projected_fee from
students) newfee
SELECT [Link] where newfee.projected_fee > 1500 ;
PARTITIONING AND BUCKETING

1. Partitioning : Partitions are used to make queries faster by dividing the tables into
smaller parts using partition key columns.
( For example, if in a table with details of employees in a company(name, salary, department,
designation), you can create a partition for each department. Hive will store each partition
separately and will scan only the partition that is needed, thereby making the query faster.)

2. Bucketing : Like partitioning, bucketing also divides the data into smaller and more
manageable parts. The key difference between partitioning and bucketing is bucketing uses
the hash of a column to create multiple 'buckets', whereas partitions are more
straightforward - they are simply segmented directories according to the value of a column
(year, month etc.).
PARTITIONING AND CLUSTERING

For creating partitions, "PARTITIONED BY" clause is used.

Similarly for creating buckets, "CLUSTERED BY“ clause is
used.
Guidelines for when to use partitioning and
bucketing:
● You typically perform partitioning on columns (or groups of
columns) such that it produces a manageable number of
partitions (directories), such as year, month, state, country
etc.
● However, if you partition on say customer ID, the number
of partitions will be enormous and may actually reduce
performance; thus, you can choose to create buckets on
customer ID
STATIC PARTITIONING EXAMPLE

1. To create table use the below query

CREATE TABLE IF NOT EXISTS customer_partitioned(customer_fname
varchar ( 64),customer_lname varchar ( 64 ),customer_addr
string ,city varchar ( 64 ))PARTITIONED BY (country VARCHAR
( 64 ),state VARCHAR ( 64 ))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
STATIC PARTITIONING

2. To load data into the table use the below command

LOAD DATA LOCAL INPATH 'yourfilepath'
INTO TABLE customer_partitioned
PARTITION (country= 'usa' , state= 'az’ );
DYNAMIC PARTITIONING EXAMPLE

1. To create table use the below query

CREATE TABLE customer(customer_fname
VARCHAR(64),customer_lname
VARCHAR(64),customer_addr STRING,city VARCHAR(64),state
VARCHAR(64),country VARCHAR(64)) ROW FORMAT DELIMITED
FIELDS
TERMINATED BY ‘,’ STORED AS TEXTFILE;
2. To load data into the table use the below command
LOAD DATA LOCAL INPATH 'yourdatapath' INTO TABLE customer ;
3. By default, dynamic partitioning is disabled in Hive to prevent
accidental partition creation. To enable, we use:
set [Link] = true ;
set [Link]=nonstrict;
4. Create a partitioned table using the below query
CREATE TABLE partitioned_customer(customer_fname
varchar ( 64 ),customer_lname varchar ( 64 ),customer_addr
string ,city
varchar ( 64 ))PARTITIONED BY (country VARCHAR ( 64 ),state
VARCHAR ( 64 ));
5. Load data into the partitioned table using the previously created
normal table
Insert into table partitioned_customer partition (country,state)
select
customer_fname,customer_lname,customer_addr,city,country,sta
te from
customer;
6. To show all the partitions
show partitions partitioned_customer;
7. Simple select query using partition
Select * from partitioned_customer where state= 'az' and country= 'usa' ;
8. If you want to drop any partitions, it can be done as:
ALTER Table partitioned_customer DROP IF EXISTS PARTITION (country= 'usa' ,
state= 'nw' ,);
9. If you need to change the file to be referred by a partition you can use the below query
ALTER TABLE customer_partitioned PARTITION (country= 'usa' ,state= 'az' ) SET
LOCATION 'newpath' ;
ONE MORE EXAMPLE OF PARTITIONING

•Creation of table all states with 3 column names such as state, district, and enrollment
•Loading data into table all states
•Creation of partition table with state as partition key
•In this step Setting partition mode as non-strict( This mode will activate dynamic partition mode)
•Loading data into partition tablestate_part
•Actual processing and formation of partition tables based on state as partition key
•There is going to 38 partition outputs in HDFS storage with the file name as state name
BUCKETING

Buckets in hive is used in segregating of hive table-data into

multiple files or directories. it is used for efficient querying.
• The data i.e. present in that partitions can be divided further into
Buckets
• The division is performed based on Hash of particular columns that
we selected in the table.
• Buckets use some form of Hashing algorithm at back end to read
each record and place it into buckets
• In Hive, we have to enable buckets by using the
[Link]=true;
BUCKETING EXAMPLE

1. To create a normal table use the below query

CREATE TABLE IF NOT EXISTS products_input(productid
int ,transactiondate date ,amount int , store string ) ROW
FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS
TEXTFILE;
2. To load data into the table use the below command
load data inpath 'yourdatapath' into table products_input;
3. To enable bucketing use:
set [Link]=nonstrict;
set [Link]= true ;
set [Link]= true ;
4. Create a bucketed table using the below query
CREATE TABLE IF NOT EXISTS products_bucket(productid
int ,amount int ) PARTITIONED BY ( store String ,transactiondate date
) CLUSTERED
BY (productid) into 4 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
5. Load data into the bucketed table using the previously created normal
table
insert into table products_bucket partition
( store ,transactiondate) select
productid,amount, store ,transactiondate from products_input;
6. Simple select query using buckets
select * from products_bucket where store = "walmart" and
transactiondate= "2018-01-01" and productid = 123 ;
7. Select query for comparison
select * from products_input where store = "walmart" and
transactiondate= "2018-01-01" and productid = 123 ;
1. CREATE SAMPLEBUCKET WITH COLUMN NAMES AS
FIRST_NAME, JOB_ID, DEPARTMENT, SALARY AND COUNTRY .
2. LOADING DATA INTO TABLE SAMPLE
BUCKET
3. DISPLAYING 4 BUCKETS THAT CREATED IN
STEP 1
UDFS(USER DEFINED FUNCTIONS) IN HIVE

• User Defined Functions, also known as UDF, allow you to create

custom functions to process records or groups of records.
• To write a simple UDF:
1. Extend the [Link] class
2. Write an "evaluate" method that has a signature equivalent to the
signature of your UDF in HiveQL.
SIMPLE UDF EXAMPLE
package udfs;
import [Link];

class SimpleUDFExample extends UDF

{ public Text evaluate(Text input)

{ if (input==null) return null;
return new Text("Hello " + [Link]()); } }

ve> ADD JAR /root/hivedemo/[Link];

ve> CREATE TEMPORARY FUNCTION helloworld as ‘[Link]’;
ive> select helloworld(name) from tablename limit 1000;
/** A simple UDF to convert Celcius to Fahrenheit
*/
package p;
import [Link];

public class ConvertToCelcius extends UDF

{ public double evaluate(double value)
{ return (value - 32) / 1.8; } }

hive> add jar [Link]

hive> create temporary function fahrenheit_to_celcius using
“[Link]“; hive> SELECT
fahrenheit_to_celcius(temp_fahrenheit) from temperature_data;
WORKING WITH SEMI STRUCTURED DATA USING HIVE
(XML, JSON)

• Step 1) Creation of Table "xmlsample_guru" with str column with string

data type.
• Step 2) Using XPath () method we will be able to fetch the data stored
inside XML tags.
• Step 3) In this step, we will fetch and display the Raw XML of table
"xmlsample_guru."
JSON FORMAT

• Twitter and websites data is stored in JSON format.

• Whenever we try to fetch data from online servers it
will return JSON files.
• Using Hive as data store we can able to load JSON
data into Hive tables by creating schemas.
• Step 1) In this step, we are going to create JSON table name
"json_guru". Once created loading and displaying contents of the actual
schema.
• Step 2) Using get_json_object() Method we can able to fetch the Data
values stored in JSON hierarchy

Introduction to Apache Hive
No ratings yet
Introduction to Apache Hive
25 pages
Introduction to Apache Hive and Big Data
No ratings yet
Introduction to Apache Hive and Big Data
59 pages
Hive Execution Engine Overview
No ratings yet
Hive Execution Engine Overview
18 pages
Hive in Big Data: Overview and Usage
100% (1)
Hive in Big Data: Overview and Usage
24 pages
Hive Data Types in Big Data
No ratings yet
Hive Data Types in Big Data
24 pages
Chapter+9+ HIVE
No ratings yet
Chapter+9+ HIVE
50 pages
Understanding Hive: Features & Commands
No ratings yet
Understanding Hive: Features & Commands
80 pages
Hive: Overview, Architecture, and Data Modeling
No ratings yet
Hive: Overview, Architecture, and Data Modeling
28 pages
Introduction to Hive and HiveQL Basics
No ratings yet
Introduction to Hive and HiveQL Basics
14 pages
Hive Overview and Architecture in BDA
No ratings yet
Hive Overview and Architecture in BDA
23 pages
Introduction to Apache Hive Basics
No ratings yet
Introduction to Apache Hive Basics
80 pages
Overview of Apache Hive Features
No ratings yet
Overview of Apache Hive Features
30 pages
Unit IV
No ratings yet
Unit IV
57 pages
Overview of Apache Hive Data Formats
100% (1)
Overview of Apache Hive Data Formats
47 pages
Understanding Hive Map Types
No ratings yet
Understanding Hive Map Types
49 pages
Apache Hive for Data Analysts
No ratings yet
Apache Hive for Data Analysts
8 pages
Introduction to Apache Hive and Pig
No ratings yet
Introduction to Apache Hive and Pig
90 pages
Overview of Apache Hive Features
No ratings yet
Overview of Apache Hive Features
39 pages
Hive Query Language Commands
No ratings yet
Hive Query Language Commands
16 pages
Overview of Apache Hive Features
No ratings yet
Overview of Apache Hive Features
28 pages
Big Data Unit 4
No ratings yet
Big Data Unit 4
12 pages
Overview of Apache Hive Features
No ratings yet
Overview of Apache Hive Features
29 pages
Introduction to Apache Hive Framework
No ratings yet
Introduction to Apache Hive Framework
26 pages
Understanding Apache Hive and Big Data
No ratings yet
Understanding Apache Hive and Big Data
29 pages
Bigdata Unit 4
No ratings yet
Bigdata Unit 4
13 pages
Hive: Big Data Processing Overview
No ratings yet
Hive: Big Data Processing Overview
43 pages
Apache Hive: Data Warehouse & HiveQL Guide
No ratings yet
Apache Hive: Data Warehouse & HiveQL Guide
45 pages
Apache Hive Overview and Features
No ratings yet
Apache Hive Overview and Features
50 pages
BDA Unit IV
No ratings yet
BDA Unit IV
101 pages
Big Data 4.2
No ratings yet
Big Data 4.2
50 pages
Apache Hive: Data Warehouse Tool Overview
No ratings yet
Apache Hive: Data Warehouse Tool Overview
10 pages
Introduction to Apache Hive Overview
No ratings yet
Introduction to Apache Hive Overview
64 pages
Big Data Warehousing with Hive
No ratings yet
Big Data Warehousing with Hive
22 pages
Overview of Hive and Its Evolution
No ratings yet
Overview of Hive and Its Evolution
9 pages
Overview of Hive Architecture and Features
No ratings yet
Overview of Hive Architecture and Features
23 pages
Hive Overview and Key Features
No ratings yet
Hive Overview and Key Features
28 pages
Hive Overview: Features, Limitations, and Workflow
No ratings yet
Hive Overview: Features, Limitations, and Workflow
39 pages
Understanding HIVE for Big Data Analytics
No ratings yet
Understanding HIVE for Big Data Analytics
20 pages
Introduction to Apache Hive Basics
No ratings yet
Introduction to Apache Hive Basics
32 pages
Overview of Apache Hive Architecture
No ratings yet
Overview of Apache Hive Architecture
27 pages
Chapter9 HIVE
No ratings yet
Chapter9 HIVE
77 pages
Understanding Hive in Big Data
No ratings yet
Understanding Hive in Big Data
30 pages
Understanding Apache Hive for Big Data
No ratings yet
Understanding Apache Hive for Big Data
28 pages
Apache Hive Overview and Commands
No ratings yet
Apache Hive Overview and Commands
28 pages
Understanding Hive in Hadoop
No ratings yet
Understanding Hive in Hadoop
61 pages
Understanding Hive: Features and Architecture
No ratings yet
Understanding Hive: Features and Architecture
40 pages
Apache Hive and HBase Overview Guide
No ratings yet
Apache Hive and HBase Overview Guide
57 pages
Comprehensive Guide to Apache Hive
No ratings yet
Comprehensive Guide to Apache Hive
24 pages
Introduction to Hive in Big Data
No ratings yet
Introduction to Hive in Big Data
69 pages
Introduction to Apache Hive and Its Components
No ratings yet
Introduction to Apache Hive and Its Components
39 pages
Introduction to Apache Hive Overview
No ratings yet
Introduction to Apache Hive Overview
33 pages
Hive ODBC Integration in Big Data
No ratings yet
Hive ODBC Integration in Big Data
30 pages
Understanding Apache Hive for Hadoop
No ratings yet
Understanding Apache Hive for Hadoop
18 pages
Introduction to Hive and Its Features
100% (1)
Introduction to Hive and Its Features
30 pages
Overview of Apache Hive Features and Architecture
No ratings yet
Overview of Apache Hive Features and Architecture
16 pages
Understanding Hive in Hadoop
No ratings yet
Understanding Hive in Hadoop
69 pages
Hive Database Commands for Big Data Analysis
No ratings yet
Hive Database Commands for Big Data Analysis
8 pages
Overview of Apache Hive Features
No ratings yet
Overview of Apache Hive Features
32 pages
Understanding Apache Hive: Features & Architecture
No ratings yet
Understanding Apache Hive: Features & Architecture
52 pages
Principles of Database Manageme - Wilfried Lemahieu
100% (8)
Principles of Database Manageme - Wilfried Lemahieu
1,843 pages
SQL Table Creation and Query Examples
No ratings yet
SQL Table Creation and Query Examples
41 pages
SQL Interview Questions: One-Stop Resource
No ratings yet
SQL Interview Questions: One-Stop Resource
30 pages
Azure SQL DBA Interview Prep Guide
No ratings yet
Azure SQL DBA Interview Prep Guide
2 pages
OLTP vs OLAP: Data Warehouse Insights
No ratings yet
OLTP vs OLAP: Data Warehouse Insights
18 pages
Understanding MySQL and SQL Basics
No ratings yet
Understanding MySQL and SQL Basics
6 pages
E-R Diagram for Registrar's Office
No ratings yet
E-R Diagram for Registrar's Office
1 page
BCA Database Management System Syllabus
No ratings yet
BCA Database Management System Syllabus
4 pages
WAEC SS3 Data Processing Test 2025
100% (4)
WAEC SS3 Data Processing Test 2025
5 pages
Database Auditing: Methods & Importance
No ratings yet
Database Auditing: Methods & Importance
25 pages
Benefits of Geographic Information Systems
No ratings yet
Benefits of Geographic Information Systems
17 pages
Virginia Henderson: Biography & Impact
No ratings yet
Virginia Henderson: Biography & Impact
1 page
SQL Queries for Video Streaming Analysis
No ratings yet
SQL Queries for Video Streaming Analysis
4 pages
Understanding the Navigation Pane in Access
No ratings yet
Understanding the Navigation Pane in Access
8 pages
Big Data Analytics: Applications & Challenges
No ratings yet
Big Data Analytics: Applications & Challenges
6 pages
Comparing Object-Oriented and Relational Databases
No ratings yet
Comparing Object-Oriented and Relational Databases
12 pages
CSC 415 DBMS Concepts Overview
No ratings yet
CSC 415 DBMS Concepts Overview
8 pages
DPDPA Data Protection Policy Template
No ratings yet
DPDPA Data Protection Policy Template
3 pages
Database Applications Lab Syllabus
No ratings yet
Database Applications Lab Syllabus
30 pages
Backend Developer Intern CV
No ratings yet
Backend Developer Intern CV
1 page
HANA SQL Executed Statements Overview
No ratings yet
HANA SQL Executed Statements Overview
8 pages
Big Data Analytics Classification Full 20 Marks
No ratings yet
Big Data Analytics Classification Full 20 Marks
4 pages
Database Security: Threats & Solutions
No ratings yet
Database Security: Threats & Solutions
26 pages
Open Table Formats in Data Lakes
No ratings yet
Open Table Formats in Data Lakes
10 pages
Database Options for Leisure Centre Management
No ratings yet
Database Options for Leisure Centre Management
68 pages
Nonprofit Cloud
No ratings yet
Nonprofit Cloud
450 pages
Data Warehousing Overview and Differences
No ratings yet
Data Warehousing Overview and Differences
12 pages
Database Management Systems Exam Guide
No ratings yet
Database Management Systems Exam Guide
10 pages
Data Analyst Resume: Python & Power BI
No ratings yet
Data Analyst Resume: Python & Power BI
1 page
Yoast SEO Setup Guide for WordPress
No ratings yet
Yoast SEO Setup Guide for WordPress
4 pages

Hive: Big Data SQL Query Tool

Uploaded by

Hive: Big Data SQL Query Tool

Uploaded by

HIVE

The main features of Hive are:

• User Interface - Hive is a data warehouse infrastructure software that can

• Execution Engine - The conjunction part of HiveQL process Engine and

• Execute Query- The Hive interface such as Command Line or Web UI

Let’s start with tables. Hive has two types of tables:

Create Database Statement

CREATE TABLE table_name

CREATE TABLE page_view

hive> ALTER TABLE T RENAME TO K;

hive> SELECT E.EMP_ID FROM Employee E order BY [Link];

Right outer joins

(let the name of the above data file is [Link])

For creating partitions, "PARTITIONED BY" clause is used.

1. To create table use the below query

2. To load data into the table use the below command

1. To create table use the below query

Buckets in hive is used in segregating of hive table-data into

1. To create a normal table use the below query

• User Defined Functions, also known as UDF, allow you to create

class SimpleUDFExample extends UDF

{ public Text evaluate(Text input)

ve> ADD JAR /root/hivedemo/[Link];

public class ConvertToCelcius extends UDF

hive> add jar [Link]

• Step 1) Creation of Table "xmlsample_guru" with str column with string

• Twitter and websites data is stored in JSON format.

You might also like