Understanding Big Data Concepts

Uploaded by

Chiranjeeb Nayak
Introduction to Big Data Analytics
By Dr. Abhisek Sethy
WHAT IS BIG DATA
• Big Data is a collection of data that is huge in volume and keeps growing
exponentially with time. Its size and complexity are so large that no traditional
data management tool can store or process it efficiently.

• Big Data, a popular term recently, has come to be defined as a large amount of
data that can’t be stored or processed by conventional data storage or
processing equipment.

• Due to the massive amounts of data produced by human and machine activities,
the data are so complex and expansive that they can neither be interpreted by
humans nor fit into a relational database for analysis.

• However, when suitably evaluated using modern tools, these massive volumes of
data provide organizations with useful insights that help them improve their
business by making informed decisions.
• Big data, typically measured in petabytes or terabytes, materializes
from three major sources—transactional data, machine data, and
social data.

Digital Data

• Digital data is the electronic representation of information in a
format or language that machines can read and understand. In more
technical terms, digital data is information that has been converted
into a binary, machine-readable format.

• Digital data is information stored on a computer system as a series
of 0s and 1s in a binary language. Digital data is discrete: it jumps from
one value to the next in a step-by-step sequence. Example: whenever
we send an email, read a social media post, or take pictures with
a digital camera, we are working with digital data.
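The idea of information stored as a series of 0s and 1s can be made concrete with a minimal Python sketch. The helper names `to_bits` and `from_bits` and the sample text are illustrative assumptions, not part of the original slides; UTF-8 is assumed as the text encoding.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Inverse of to_bits: rebuild the text from its 8-bit groups."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


message = "Hi"  # hypothetical sample text
encoded = to_bits(message)
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi
```

Each character becomes one or more bytes, and each byte is one of only 256 discrete values, which is the "step by step" behaviour described above.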
TYPES OF BIG-DATA
• Big Data is generally categorized into three different
varieties. They are as shown below:
• Structured Data
• Semi-Structured Data
• Unstructured Data
• Structured Data has a dedicated data model and a well-defined structure. It
follows a consistent order and is designed so that it can be easily accessed and
used by a person or a computer. Structured data is usually stored in well-defined
columns, typically in databases.
Example: tables in a database management system (DBMS)

• A certain schema binds it, so all the data has the same set of properties.
Structured data is also called relational data. It is split into multiple tables to
enhance the integrity of the data by creating a single record to depict an entity.
Relationships are enforced by the application of table constraints.
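As a rough illustration of a schema, multiple tables, and relationships enforced by constraints, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and sample rows are hypothetical; note that SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# The schema fixes the set of properties every row must have.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 250.0)")


def try_insert_orphan_order(connection):
    """Return True if the constraint rejects an order for a missing customer."""
    try:
        connection.execute("INSERT INTO orders (customer_id, amount) VALUES (99, 10.0)")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = try_insert_orphan_order(conn)
rows = conn.execute(
    "SELECT c.name, o.amount FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()
print(rejected)  # True
print(rows)      # [('Asha', 250.0)]
```

The customer is stored once and referenced by id from the orders table, so the entity has a single record and the constraint keeps the two tables consistent.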

• Semi-Structured Data can be considered another form of structured data. It
inherits a few properties of structured data, but most of it lacks a definite
structure and does not obey the formal structure of data models such as an RDBMS.
Example: comma-separated values (CSV), emails, XML, markup languages like HTML

• Semi-structured data is not bound by any rigid schema for data storage and
handling. The data is not in the relational format and is not neatly organized into
rows and columns like that in a spreadsheet. However, there are some features like
key-value pairs that help in discerning the different entities from each other.
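A small sketch can show what "no rigid schema, but key-value pairs" looks like in practice. JSON is used here as an assumed example format (it is not named in the slides), and the records are made up for illustration.

```python
import json

# Hypothetical records: each one carries its own set of keys.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "tags": ["premium"], "address": {"city": "Pune"}},
  {"id": 3, "name": "Meera"}
]
"""
records = json.loads(raw)

# There is no table definition, so the set of fields is discovered from the data.
all_keys = sorted({key for record in records for key in record})
print(all_keys)  # ['address', 'email', 'id', 'name', 'tags']

# Missing fields have to be handled explicitly by the reader of the data.
cities = [record.get("address", {}).get("city") for record in records]
print(cities)  # [None, 'Pune', None]
```

The keys still let a program tell entities and attributes apart, even though the records do not line up into uniform rows and columns.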
• Unstructured Data is a completely different type: it has no
structure and does not follow the formal structural rules of
data models. It does not even have a consistent format and
varies all the time, although occasionally it may carry
information such as date and time.
Example: audio files, images, etc.
CHARACTERISTICS OF BIG DATA
• Big data is characterized by five key properties, often referred to as the "5 Vs":
Volume, Velocity, Variety, Veracity, and Value. These characteristics highlight
the unique challenges and opportunities presented by large, complex datasets.
• Volume: the size and amounts of big data that companies manage and analyze
• Value: the most important “V” from the perspective of the business, the value
of big data usually comes from insight discovery and pattern recognition that
lead to more effective operations, stronger customer relationships and other
clear and quantifiable business benefits
• Variety: the diversity and range of different data types, including unstructured
data, semi-structured data and raw data
• Velocity: the speed at which companies receive, store and manage data – e.g.,
the specific number of social media posts or search queries received within a
day, hour or other unit of time
• Veracity: the “truth” or accuracy of data and information assets, which often
determines executive-level confidence
• The additional characteristic of variability can also be considered:
• Variability: the changing nature of the data companies seek to capture,
manage and analyze – e.g., in sentiment or text analytics, changes in the
meaning of key words or phrases
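Velocity, as described above, can be measured as the number of events received per unit of time. The following is a minimal sketch that counts events per hour; the timestamps and the function name `events_per_hour` are hypothetical and stand in for a real stream of posts or queries.

```python
from collections import Counter
from datetime import datetime

# Hypothetical arrival times of social media posts (ISO 8601 strings).
timestamps = [
    "2024-03-01T09:05:00", "2024-03-01T09:40:00", "2024-03-01T09:59:59",
    "2024-03-01T10:15:00", "2024-03-01T11:01:00", "2024-03-01T11:30:00",
]


def events_per_hour(stamps):
    """Count how many events fall into each clock hour."""
    counts = Counter(
        datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:00") for stamp in stamps
    )
    return dict(sorted(counts.items()))


velocity = events_per_hour(timestamps)
print(velocity)
# {'2024-03-01 09:00': 3, '2024-03-01 10:00': 1, '2024-03-01 11:00': 2}
```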
• Key Applications:
• Personalized Recommendations: Big data helps businesses
understand customer behavior and preferences to offer tailored
recommendations for products, services, and content.
• Fraud Detection: Financial institutions use big data to analyze
transaction patterns and identify fraudulent activities, minimizing
financial losses.
• Risk Management: Big data analytics helps assess and mitigate
risks in various sectors, such as finance, insurance, and supply
chain management.
• Network Optimization: Telecom companies use big data to
optimize network performance, improve service quality, and
manage resources efficiently.
• Predictive Maintenance: In manufacturing and other industries,
big data enables predictive maintenance by analyzing sensor
data to anticipate equipment failures and prevent downtime.
• Smart Cities: Big data is used to manage traffic flow, optimize
energy consumption, improve public safety, and enhance
overall city management.
• Healthcare: Big data helps in predictive diagnostics,
personalized treatment plans, and improving patient outcomes.
• Transportation and Logistics: Big data optimizes routes,
manages inventory, and streamlines logistics operations,
reducing costs and improving efficiency.
• Marketing and Advertising: Big data enables targeted
advertising campaigns, personalized marketing messages, and
improved customer engagement.
• Education: Big data is used to personalize learning experiences,
track student progress, and improve educational outcomes.
• Climate and Earth Science: Big data helps analyze climate
patterns, predict weather events, and understand
environmental changes.
Data Analytics
• Data analytics is the process of examining raw data to draw
conclusions, make predictions, and drive informed decision-
making. It involves collecting, cleaning, transforming, and
analyzing data using various techniques to identify patterns,
trends, and insights.
• This process helps organizations understand their data, optimize
operations, personalize customer experiences, and improve
overall performance.

• The chief aim of data analytics is to apply statistical analysis and
technologies to data to find trends and solve problems.

• To ensure robust analysis, data analytics teams leverage a range
of data management techniques, including data mining, data
DATA ANALYTICS VS. DATA ANALYSIS
• While the terms data analytics and data analysis are frequently
used interchangeably, data analysis is a subset of data analytics
concerned with examining, cleansing, transforming, and modeling
data to derive conclusions.
• Data analytics includes the tools and techniques used to perform
data analysis.
Data Analytics | Data Analysis
It is described as a traditional form or generic form of analytics. | It is described as a particularized form of analytics.
It includes several stages like the collection of data and then the inspection of business data is done. | To process data, firstly raw data is defined in a meaningful manner, then data cleaning and conversion are done to get meaningful information from raw data.
It supports decision making by analyzing enterprise data. | It analyzes the data by focusing on insights into business data.
It uses various tools to process data such as Tableau, Python, Excel, etc. | It uses different tools to analyze data such as RapidMiner, OpenRefine, NodeXL, KNIME, etc.
Descriptive analysis cannot be performed on this. | A descriptive analysis can be performed on this.
One can find anonymous relations with the help of this. | One cannot find anonymous relations with the help of this.
It does not deal with inferential analysis. | It supports inferential analysis.


DATA ANALYTICS VS. DATA SCIENCE
• Data analytics and data science are closely related.
• Data analytics is a component of data science, used to understand
what an organization’s data looks like.
• Generally, the outputs of data analytics are reports and
visualizations.
• Data science takes the output of analytics to study and solve
problems.
• The difference between data analytics and data science is often
seen as one of timescale.
• Data analytics describes the current or historical state of reality,
whereas data science uses that data to predict and/or understand
the future.
WHAT IS DATA ANALYTICS IN BUSINESS?

• Data analytics is the practice of examining data to answer questions,
identify trends, and extract insights. When data analytics is used in
business, it’s often called business analytics.
• You can use tools, frameworks, and software to analyze data, such
as Microsoft Excel and Power BI, Google Charts, Datawrapper,
Infogram, Tableau, and Zoho Analytics. These can help you examine
data from different angles and create visualizations that illuminate
the story you’re trying to tell.
• Algorithms and machine learning also fall into the data analytics
field and can be used to gather, sort, and analyze data at a higher
volume and faster pace than humans can. Writing algorithms is a
more advanced data analytics skill, but you don’t need deep
knowledge of coding and statistical modeling to experience the
benefits of data-driven decision-making.
Types of Data Analytics
Descriptive Analytics
• As the name suggests, the purpose of descriptive analytics
is to simply describe what has happened; it doesn’t try to
explain why this might have happened or to establish
cause-and-effect relationships. The aim is solely to provide
an easily digestible snapshot.

• Descriptive analytics looks at data and analyzes past events
for insight into how to approach future events. It examines
past performance by mining historical data to understand
the cause of success or failure in the past.
• Google Analytics is a good example of descriptive analytics in action; it
provides a simple overview of what’s been going on with your website,
showing you how many people visited in a given time period, for example, or
where your visitors came from. Similarly, tools like HubSpot will show you
how many people opened a particular email or engaged with a certain
campaign.
• There are two main techniques used in descriptive analytics: Data
aggregation and data mining.
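Data aggregation, the first of the two techniques named above, amounts to grouping raw records and summarizing them. Here is a minimal sketch in plain Python; the visit records and the function name `aggregate_by_source` are made up for illustration and do not come from any real analytics tool.

```python
from collections import defaultdict

# Hypothetical raw records: one row per batch of website visits.
visits = [
    {"source": "search", "visitors": 120},
    {"source": "social", "visitors": 45},
    {"source": "search", "visitors": 80},
    {"source": "email", "visitors": 30},
    {"source": "social", "visitors": 55},
]


def aggregate_by_source(rows):
    """Sum visitors per traffic source (a simple group-by aggregation)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["source"]] += row["visitors"]
    return dict(totals)


totals = aggregate_by_source(visits)
total_visitors = sum(totals.values())
print(totals)          # {'search': 200, 'social': 100, 'email': 30}
print(total_visitors)  # 330
```

The output is exactly the kind of snapshot descriptive analytics aims for: how many people visited and where they came from, with no attempt to explain why.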
Diagnostic Analytics

• Diagnostic analytics seeks to delve deeper in order to understand
why something happened. The main purpose of diagnostic
analytics is to identify and respond to anomalies within your data.
For example: if your descriptive analysis shows that there was a
20% drop in sales for the month of March, you’ll want to find out
why. The next logical step is to perform a diagnostic analysis.

• Diagnostic analytics addresses the next logical question, “Why did
this happen?”

• Diagnostic analytics uses data (often generated via descriptive
analytics) to discover the factors or reasons for past performance.
• Common techniques used for diagnostic analytics are: data
discovery, data mining, and correlations.
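Of the techniques listed, correlation is the easiest to sketch. The example below computes the Pearson correlation coefficient by hand between two hypothetical monthly series; the figures for `ad_spend` and `sales` are invented for illustration and are not taken from the slides.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures; the fourth month has the drop under investigation.
ad_spend = [10, 12, 9, 4, 11, 13]
sales = [100, 118, 95, 60, 108, 125]

r = pearson(ad_spend, sales)
print(round(r, 3))  # close to 1: the two series move together
```

A coefficient near 1 suggests the drop in sales coincided with the drop in advertising spend. Correlation alone does not prove the cause, so it serves as a pointer for further investigation.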
Predictive Analytics
• What is likely to happen in the future? Predictive analytics
applies techniques such as statistical modeling, forecasting, and
machine learning to the output of descriptive and diagnostic
analytics to make predictions about future outcomes.
• Predictive analytics seeks to predict what is likely to happen in
the future. Based on past patterns and trends, data analysts can
devise predictive models which estimate the likelihood of a
future event or outcome. This is especially useful as it enables
businesses to plan ahead.
Prescriptive Analytics
• What do we need to do? Prescriptive analytics is a type of advanced
analytics that involves the application of testing and other
techniques to recommend specific solutions that will deliver desired
outcomes. In business, prescriptive analytics uses machine learning,
business rules, and algorithms.
• Predictive analytics turns data into valuable, actionable
information. It uses data to determine the probable outcome of an
event or the likelihood of a situation occurring. Predictive analytics
draws on a variety of statistical techniques from modeling, machine
learning, data mining, and game theory that analyze current and
historical facts to make predictions about a future event. Techniques
that are used for predictive analytics are:
• Linear Regression
• Time Series Analysis and Forecasting
• Data Mining
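Linear regression, the first technique in the list, can be sketched with ordinary least squares on a single variable. The monthly sales figures below are hypothetical and deliberately exactly linear so the result is easy to check by hand; `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


months = [1, 2, 3, 4, 5]
sales = [110, 120, 130, 140, 150]  # hypothetical, exactly linear for easy checking

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predict month 6 from the past trend
print(slope, intercept, forecast)  # 10.0 100.0 160.0
```

Real data is noisy, so in practice the fitted line gives an estimate of the likely outcome rather than an exact value.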
