Types of Analytics Explained: Big Data Insights

Uploaded by

Domakonda Neha

1. Explain in detail about types of analytics.

Data sets that are very large in size are called Big Data.

1. Descriptive analytics
• Descriptive analytics answers the question, “What happened?”
• This type of analytics is by far the most commonly used by customers, providing
reporting and analysis on past events.
• It helps companies understand things such as:
1. How much did we sell as a company?
2. What was our overall productivity?
• Descriptive analytics deals with data on past trends. It finds out what has
happened in the past; those findings can then inform decisions about the future,
although forecasting itself belongs to predictive analytics.
• Example –
Take DMart as an example. By looking at sales history, we can find out which
products have sold the most or are in high demand. Based on that analysis, we can
decide to stock those items in larger quantities for the coming year.
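
The DMart example above amounts to summing past sales per product and ranking the totals. A minimal sketch in Python, assuming a hypothetical list of (product, units sold) records; the product names and figures are made up for illustration and are not real DMart data:

```python
from collections import Counter

# Hypothetical sales records (product, units sold); illustrative values only.
sales = [
    ("rice", 120),
    ("soap", 45),
    ("rice", 80),
    ("biscuits", 60),
    ("soap", 30),
]

def units_sold_by_product(records):
    """Sum the units sold for each product across all records."""
    totals = Counter()
    for product, units in records:
        totals[product] += units
    return totals

totals = units_sold_by_product(sales)
# most_common() orders products from highest to lowest total units sold.
top_sellers = totals.most_common()
print(top_sellers)
```

The ranking only describes what has already been sold; deciding how much to stock next year is a separate step that builds on it.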
2. Diagnostic analytics

Diagnostic analytics works hand in hand with descriptive analytics. Where descriptive analytics
finds out what happened in the past, diagnostic analytics finds out why it happened, what
measures were taken at that time, or how frequently it has happened. It gives a detailed
explanation of a particular scenario by understanding behavior patterns.
Example –
Take DMart again. Suppose we want to find out why a particular product is in high demand:
is it because of its brand, or because of its quality? Diagnostic analytics can help answer
such questions.

3. Predictive analytics
• Predictive analytics determines what is likely to happen based on historical data, using
machine learning.
• Predictive analytics helps companies address use cases such as:
1. Predicting maintenance issues and part breakdown in machines.
2. Determining credit risk and identifying potential fraud.
3. Predicting and avoiding customer churn by identifying signs of customer dissatisfaction.

The information obtained from descriptive and diagnostic analytics can be used to predict
what is likely to happen in the future. This does not make us fortune-tellers: by looking at
past trends and behavioral patterns, we forecast what might happen.
Example –
A good example is the recommender systems of Amazon and Netflix. When you buy a product
from Amazon, the payment page shows a recommendation saying that customers who purchased
this item also purchased another product. That recommendation is based on customers' past
purchase behavior: analysts create associations between products from that history, and
those associations drive the recommendations shown when you buy a product.
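
The association idea in the example above can be sketched by counting how often two products appear in the same past order. This is only an illustrative co-occurrence count over hypothetical orders, not a description of how Amazon or Netflix actually build their recommenders; the function names are invented for this sketch.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past orders; each order is the set of products bought together.
orders = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"phone", "case"},
]

def co_purchase_counts(orders):
    """Count, for every ordered pair of products, how many orders contain both."""
    counts = Counter()
    for basket in orders:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def also_bought(product, counts, top_n=2):
    """Return up to top_n products most often bought together with `product`."""
    related = Counter(
        {other: n for (item, other), n in counts.items() if item == product}
    )
    return [other for other, _ in related.most_common(top_n)]

counts = co_purchase_counts(orders)
print(also_bought("phone", counts, top_n=1))
```

Here the forecast is simply "what was bought together before is likely to be bought together again"; production systems use far richer models.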

4. Prescriptive analytics
• Prescriptive analytics is truly guided analytics: it prescribes, or guides you toward,
a specific action to take.
This is an advanced form of predictive analytics. A prediction usually leaves several
options open, and it can be unclear which option will actually work. Prescriptive analytics
helps find the best option. Where predictive analytics forecasts future data, prescriptive
analytics helps decide what to do to make the forecast outcome happen.
Example –
A good example is Google's self-driving car: by looking at past trends and forecasted
data, it decides when to turn or when to slow down, much like a human driver.

In summary:
• Prescriptive: The best course of action for a given situation.
• Predictive: What is likely to happen, predicted from past patterns.
• Diagnostic: Why something happened.
• Descriptive: What has happened.

2. What is HBase? Explain its role in data processing and real-time analytics.
HBase
• HBase is an open-source, sorted-map data store built on Hadoop.
• It is column-oriented and horizontally scalable.
• It is based on Google's Bigtable.
• It has a set of tables that keep data in key-value format.
• HBase is well suited to sparse data sets, which are very common in big data
use cases.
• It is part of the Hadoop ecosystem and provides random, real-time
read/write access to data in the Hadoop Distributed File System (HDFS).
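
To make the "sorted map" and "sparse" points concrete, here is a toy in-memory model in Python. It is not the HBase API: the class name, method names, and the `family:qualifier` column strings are assumptions chosen only to illustrate the data model (row key, then column, then value, with missing cells simply absent).

```python
class SparseSortedTable:
    """Toy in-memory model of a sorted, sparse, column-oriented table."""

    def __init__(self):
        # row key -> {"family:qualifier": value}; absent cells take no space.
        self._rows = {}

    def put(self, row_key, column, value):
        """Write one cell; the row is created on first write."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        """Random read of one cell; returns None if the cell does not exist."""
        return self._rows.get(row_key, {}).get(column)

    def scan(self, start_key, stop_key):
        """Yield rows with start_key <= key < stop_key, in sorted key order."""
        for key in sorted(self._rows):
            if start_key <= key < stop_key:
                yield key, dict(self._rows[key])

table = SparseSortedTable()
table.put("user#002", "info:name", "Asha")
table.put("user#001", "info:name", "Ravi")
table.put("user#001", "stats:logins", 7)
table.put("user#105", "info:name", "Meena")

print(table.get("user#001", "stats:logins"))
print(list(table.scan("user#001", "user#100")))
```

Because rows are kept in key order, a range scan returns neighbouring keys together, which is why row-key design matters in stores of this kind.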
3. a) What are HDFS commands? Explain.
[Link]
b) Write a short note on HDFS high availability.
4. a) Explain about text analytics.
Text Analytics is a process of analyzing and understanding written or spoken
language. It employs computer algorithms and techniques to extract valuable
information, patterns, and insights from extensive textual data. In simpler terms, text
analytics empowers computers to understand and interpret human language.

How Does Text Analytics Work?

The text analytics process typically includes several key steps, such as language
identification, tokenization, sentence breaking, part-of-speech tagging, chunking, syntax
parsing, and sentence chaining.

Steps of Text Analytics Process

Language Identification
• Objective: Determine the language in which the text is written.
• How it works: Algorithms analyze patterns within the text to identify the
language. This is essential as different languages may have different rules
and structures.
Tokenization
• Objective: Divide the text into individual units, often words or sub-word
units (tokens).
• How it works: Tokenization breaks down the text into meaningful units,
making it easier to analyze and process.
Sentence Breaking
• Objective: Identify and separate individual sentences in the text.
• How it works: Algorithms analyze the text to determine where one sentence
ends and another begins. This is crucial for tasks that require understanding
the context of sentences.
Part of Speech Tagging
• Objective: Assign a grammatical category (part of speech) to each token in a
sentence.
• How it works: Machine learning models or rule-based systems analyze the
context and relationships between words to assign appropriate part-of-speech
tags (e.g., noun, verb, adjective) to each token.
Chunking
• Objective: Identify and group related words (tokens) together, often based on
the part-of-speech tags.
• How it works: Chunking helps in identifying phrases or meaningful chunks
within a sentence. This step is useful for extracting information about specific
entities or relationships between words.
Syntax Parsing
• Objective: Analyze the grammatical structure of sentences to understand
relationships between words.
• How it works: Syntax parsing involves creating a syntactic tree that
represents the grammatical structure of a sentence. This tree helps in
understanding the syntactic relationships and dependencies between words.
Sentence Chaining
• Objective: Connect and understand the relationships between multiple
sentences.
• How it works: Algorithms analyze the content and context of different
sentences to establish connections or dependencies between them. This step is
crucial for tasks that require a broader understanding of the text, such as
summarization or document-level sentiment analysis.
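
Two of the steps above, sentence breaking and tokenization, can be sketched with Python's standard `re` module. The rules here are deliberately naive assumptions (split after `.`, `!` or `?`; treat each punctuation mark as its own token) and would mishandle abbreviations such as "e.g."; real text analytics tools use trained models or richer rule sets.

```python
import re

def split_sentences(text):
    """Naive sentence breaking: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Text analytics is useful. It helps computers read language!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, sentence_tokens in zip(sentences, tokens):
    print(sentence, "->", sentence_tokens)
```

Later steps such as part-of-speech tagging and parsing take these token lists as their input.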

b) Define big data. Explain the evolution of big data and the 4 Vs of big data.
5. What is HDFS? What are the components of HDFS
architecture? Explain.
The Hadoop Distributed File System (HDFS) was developed using a distributed file system
design.
HDFS Architecture
Given below is the architecture of a Hadoop File System.

HDFS follows the master-slave architecture and has the following elements.

Namenode

The namenode is commodity hardware that runs the GNU/Linux operating system and the
namenode software. The namenode software itself can run on commodity hardware. The system
hosting the namenode acts as the master server and does the following tasks −

• Manages the file system namespace.
• Regulates clients' access to files.
• Executes file system operations such as renaming, closing, and opening files and
directories.

Datanode
The datanode is commodity hardware running the GNU/Linux operating system and the
datanode software. For every node (commodity hardware/system) in a cluster, there is a
datanode. These nodes manage the data storage of their system.

• Datanodes perform read-write operations on the file system, as per client request.
• They also perform operations such as block creation, deletion, and replication
according to the instructions of the namenode.

Block

User data is stored in HDFS files. Each file is divided into one or more segments,
which are stored on individual data nodes. These file segments are called blocks. In
other words, the minimum amount of data that HDFS can read or write is called a block. The
default block size is 128 MB in Hadoop 2.x and later (64 MB in Hadoop 1.x), and it can be
changed as needed in the HDFS configuration.
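
The effect of the block size on how a file is split can be worked out with simple arithmetic. The sketch below takes the block size as a parameter and uses 64 MB and 128 MB only as example values; the 500 MB file size and the function name are assumptions for illustration.

```python
MB = 1024 * 1024

def block_layout(file_size, block_size):
    """Return (number of blocks, size of the last block), both sizes in bytes."""
    if file_size == 0:
        return 0, 0
    # Ceiling division: a partly filled final block still counts as a block.
    num_blocks = -(-file_size // block_size)
    last_block = file_size - (num_blocks - 1) * block_size
    return num_blocks, last_block

for block_mb in (64, 128):
    blocks, last = block_layout(500 * MB, block_mb * MB)
    print(f"{block_mb} MB blocks: {blocks} blocks, last block {last // MB} MB")
```

A larger block size means fewer blocks per file, and so fewer entries for the namenode to track. The final block holds only the bytes that remain.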

6. Discuss in detail about the MapReduce framework

A MapReduce job is divided into two main phases: the Map phase and the Reduce
phase.
1. Map: As the name suggests, its main use is to map the input data into key-value
pairs. The input to the map may itself be a key-value pair, where the key can be an
identifier such as an address and the value is the actual data stored there. The Map()
function is executed on each input pair and generates intermediate key-value pairs,
which work as input for the Reducer, or Reduce() function.

2. Reduce: The intermediate key-value pairs that work as input for the Reducer
are shuffled, sorted, and sent to the Reduce() function, which aggregates the values
for each key.
How the Job Tracker and the Task Tracker deal with MapReduce:
1. Job Tracker: The Job Tracker manages all the resources and all the jobs across
the cluster. It also schedules each map task on a Task Tracker running on the data node
that holds the data, since there can be hundreds of data nodes available in the cluster.
2. Task Tracker: The Task Trackers can be considered the actual slaves that work
on the instructions given by the Job Tracker. A Task Tracker is deployed on each node
in the cluster and executes the Map and Reduce tasks as instructed by the Job Tracker.
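
The two phases can be imitated on a single machine with a word count, the usual introductory MapReduce example. This is a plain Python sketch of the data flow (map, then shuffle and sort, then reduce), not Hadoop code; the function names and input lines are assumptions for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

lines = ["big data big insights", "data drives insights"]
intermediate = [pair for line in lines for pair in map_phase(line)]
results = [reduce_phase(key, values) for key, values in shuffle_and_sort(intermediate)]
print(results)
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves intermediate pairs across the network to the reducers.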
7. Describe the anatomy of file read and file write in HDFS.

Anatomy of File Read in HDFS

Let’s get an idea of how data flows between the client interacting with HDFS, the name
node, and the data nodes with the help of a diagram. Consider the figure:
Step 1: The client opens the file it wishes to read by calling open() on the FileSystem
object (which for HDFS is an instance of DistributedFileSystem).
Step 2: DistributedFileSystem (DFS) calls the name node, using remote procedure
calls (RPCs), to determine the locations of the first few blocks in the file. For each
block, the name node returns the addresses of the data nodes that have a copy of that
block. The DFS returns an FSDataInputStream to the client for it to read data from.
FSDataInputStream in turn wraps a DFSInputStream, which manages the data node and
name node I/O.
Step 3: The client then calls read() on the stream. DFSInputStream, which has stored
the data node addresses for the first few blocks in the file, then connects to the
first (closest) data node for the first block in the file.
Step 4: Data is streamed from the data node back to the client, which calls read()
repeatedly on the stream.
Step 5: When the end of the block is reached, DFSInputStream closes the
connection to the data node, then finds the best data node for the next block. This
happens transparently to the client, which from its point of view is simply reading a
continuous stream. Blocks are read in order, with the DFSInputStream opening new connections
to data nodes as the client reads through the stream. It also calls the name node
to retrieve the data node locations for the next batch of blocks as needed.
Step 6: When the client has finished reading the file, it calls close() on the
FSDataInputStream.
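
The read path above can be mimicked with a small simulation: the name node's answer is modelled as a list of replica locations per block, closest first, and the client reads each block from the first data node listed. This is a toy Python model, not the Hadoop client API; the node names, block contents, and function name are hypothetical.

```python
# Hypothetical cluster state: block id -> bytes held by each data node.
DATANODES = {
    "dn1": {0: b"Hello, ", 1: b"HDFS "},
    "dn2": {0: b"Hello, ", 2: b"reader"},
    "dn3": {1: b"HDFS ", 2: b"reader"},
}

# What the name node would return: for each block, the data nodes holding
# a replica, ordered closest first.
BLOCK_LOCATIONS = [["dn2", "dn1"], ["dn1", "dn3"], ["dn3", "dn2"]]

def read_file(block_locations, datanodes):
    """Read blocks in order, each from the first (closest) data node listed."""
    chunks = []
    contacted = []
    for block_id, replicas in enumerate(block_locations):
        closest = replicas[0]
        contacted.append(closest)
        chunks.append(datanodes[closest][block_id])
    # The caller sees one continuous byte stream, not individual blocks.
    return b"".join(chunks), contacted

content, contacted = read_file(BLOCK_LOCATIONS, DATANODES)
print(content, contacted)
```

The client receives a single stream even though three different data nodes served the three blocks.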

Anatomy of File Write in HDFS

Next, we’ll look at how files are written to HDFS. Consider figure 1.2 to get a better
understanding of the concept.

Note: HDFS follows the write-once, read-many-times model. Files already stored in HDFS
cannot be edited in place, but data can be appended by reopening the
files.

Step 1: The client creates the file by calling create() on DistributedFileSystem (DFS).

Step 2: DFS makes an RPC call to the name node to create a new file in the file
system’s namespace, with no blocks associated with it. The name node performs
various checks to make sure the file doesn’t already exist and that the client has the
right permissions to create the file. If these checks pass, the name node makes a
record of the new file; otherwise, file creation fails and the client is
thrown an IOException. The DFS returns an FSDataOutputStream for the
client to start writing data to.
Step 3: As the client writes data, the DFSOutputStream splits it into packets,
which it writes to an internal queue called the data queue. The data queue is consumed
by the DataStreamer, which is responsible for asking the name node to allocate new blocks by
picking a list of suitable data nodes to store the replicas. The list of data nodes
forms a pipeline, and here we’ll assume the replication level is three, so there are three
nodes in the pipeline. The DataStreamer streams the packets to the first data node
in the pipeline, which stores each packet and forwards it to the second data node
in the pipeline.
Step 4: Similarly, the second data node stores the packet and forwards it to the third
(and last) data node in the pipeline.

Step 5: The DFSOutputStream also maintains an internal queue of packets that are waiting to
be acknowledged by data nodes, called the “ack queue”. A packet is removed from the ack
queue only when it has been acknowledged by all the data nodes in the pipeline.
Step 6: When the client has finished writing data, it calls close() on the stream. This
action flushes all the remaining packets to the data node pipeline and
waits for acknowledgments before contacting the name node to signal that the
file is complete.
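
The queueing in the write path can be sketched as a small single-threaded simulation: data is split into packets, each packet is stored by every data node in the pipeline in turn, and a packet leaves the ack queue once the whole pipeline has it. This is a toy Python model with hypothetical node names and a made-up function name, not the Hadoop client; a real client streams packets concurrently and handles node failures.

```python
from collections import deque

def write_through_pipeline(data, packet_size, pipeline):
    """Simulate packets flowing through a data node pipeline with acknowledgments."""
    # Split the data into packets and place them on the data queue.
    data_queue = deque(
        data[i:i + packet_size] for i in range(0, len(data), packet_size)
    )
    ack_queue = deque()
    stored = {node: [] for node in pipeline}

    while data_queue:
        packet = data_queue.popleft()
        ack_queue.append(packet)       # now waiting for acknowledgment
        for node in pipeline:          # each node stores, then forwards
            stored[node].append(packet)
        # Every node in the pipeline holds the packet, so it is acknowledged.
        ack_queue.popleft()

    return stored, len(ack_queue)

stored, pending = write_through_pipeline(b"abcdefgh", 3, ["dn1", "dn2", "dn3"])
print(stored["dn3"], pending)
```

With a replication level of three, every packet ends up on all three nodes, and no packets remain unacknowledged when the write finishes.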

8. Explain how reporting and analytics differ. Why are they important?
An organisation often requires both reporting and analysis to explore
new business insights from big data.
A very common mistake organisations make is to equate reporting with
analysis.
To avoid this, it is essential to know the difference between a report and an
analysis: a report organizes data into summaries that show what is happening, while
an analysis examines data and reports to explain why it is happening and what
can be done about it.
Reporting and analytics play a crucial role in the realm of big data analytics for several
reasons:

1. Decision Making: Reporting and analytics provide insights derived from large
volumes of data, enabling informed decision-making. By analyzing trends,
patterns, and anomalies, organizations can make data-driven decisions that are
aligned with their strategic objectives.
2. Performance Monitoring: Reporting tools allow organizations to monitor the
performance of various aspects of their operations in real-time. This helps in
identifying areas of improvement, optimizing processes, and maximizing
efficiency.
3. Identifying Trends and Patterns: Big data analytics help in uncovering hidden
trends and patterns within the data that might not be immediately apparent. By
analyzing these patterns, organizations can gain valuable insights into customer
behavior, market trends, and emerging opportunities.
4. Predictive Analytics: Reporting and analytics can be used for predictive
modeling, enabling organizations to forecast future trends and outcomes based
on historical data. This helps in proactive decision-making and strategic planning.
5. Customer Insights: Big data analytics can provide valuable insights into
customer behavior, preferences, and sentiments. By analyzing customer data,
organizations can personalize their marketing efforts, improve customer
satisfaction, and enhance customer retention.
6. Risk Management: Reporting and analytics help in identifying and mitigating
risks by analyzing historical data and identifying potential risk factors. This allows
organizations to take proactive measures to minimize risks and uncertainties.
7. Cost Optimization: By analyzing data related to resource utilization, operational
efficiency, and expenditure, organizations can identify opportunities for cost
optimization and resource allocation.

9. Explain all the phases in the analysis process with the necessary diagram.
10. a) Explain about types of data.
There are three types of Big Data: structured, semi-structured, and unstructured
data.

1. Structured Data: Any data in a fixed format is known as structured data. It can
only be accessed, stored, or processed in a particular format. This type of data is
stored in the form of tables with rows and columns. An Excel file or a SQL database
table is an example of structured data.
• Data that is to the point, factual, and highly organized is referred to as
structured data.
• It is easy to search and analyze structured data.
• Structured data exists in a predefined format.
• A relational database consisting of tables with rows and columns is one of the best
examples of structured data.
• Structured data generally exists in tables such as Excel files and Google Sheets
spreadsheets.
• SQL (Structured Query Language) is used for managing structured data.

2. Unstructured Data: Unstructured data does not have a fixed format; it is stored
without a predefined structure. An example of unstructured data is a web page with
text, images, videos, etc.

• Unstructured data is data that lacks any predefined model or format.
• It requires a lot of storage space, and it is hard to keep secure.
• It cannot be represented in a data model or schema.
• That is why managing, analyzing, or searching unstructured data is hard.
• It resides in many different formats, such as text, images, audio, and video files.
• It is qualitative in nature and is sometimes stored in a non-relational (NoSQL)
database.

3. Semi-structured Data: Semi-structured data is a combination of structured and
unstructured forms of data. It does not contain any table to show
relations; it contains tags or other markers to show hierarchy. JSON files, XML
files, and CSV files (comma-separated files) are semi-structured data examples.
The e-mails we send or receive are also an example of semi-structured data.

• Semi-structured data is a type of data that is not purely structured, but
also not completely unstructured.
• It contains some level of organization or structure, but does not conform
to a rigid schema or data model.
• Semi-structured data is typically characterized by the use of metadata or
tags that provide additional information about the data elements. For
example, an XML document
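
As a small illustration of tags without a rigid schema, the sketch below parses two JSON records with Python's standard `json` module. The records and field names are made up for this example; the point is only that each record carries its own field names and nesting, and that the two records need not share the same fields.

```python
import json

# Hypothetical semi-structured records: same kind of entity, different fields.
raw = """
[
  {"id": 1, "name": "Asha", "phones": ["111", "222"]},
  {"id": 2, "name": "Ravi", "address": {"city": "Delhi"}}
]
"""

records = json.loads(raw)

# Each record describes its own structure through its keys (the "tags").
field_sets = [sorted(record.keys()) for record in records]
print(field_sets)

# Nested values show hierarchy that a flat table row could not hold directly.
city = records[1]["address"]["city"]
print(city)
```

Loading the same records into a relational table would first require deciding on a fixed set of columns, which is the step semi-structured formats leave open.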

b) Discuss about convergence of IT and analytics.

