Big Data
Theme
Large-Scale Data Management
Big Data Analytics
Data Science and Analytics
• How to manage very large amounts of data and extract value and
knowledge from them
Introduction to Big Data
What is Big Data?
What makes data, “Big” Data?
What is Big Data?
• Big data is a collection of data sets so large and complex that it
becomes difficult to process using on-hand database
management tools.
• The challenges include capture, curation, storage, search,
sharing, analysis, and visualization.
• The trend to larger data sets is due to the additional information
derivable from analysis of a single large set of related data, as
compared to separate smaller sets with the same total amount of
data, allowing correlations to be found to "spot business trends,
determine quality of research, prevent diseases, link legal citations,
combat crime, and determine real-time roadway traffic conditions."
(Wikipedia)
Big Data Definition
• No single standard definition…
“Big Data” is data whose scale, diversity, and
complexity require new architecture, techniques,
algorithms, and analytics to manage it and extract
value and hidden knowledge from it…
Big Data: A definition
• Put another way, big data is the
realization of greater business
intelligence by storing, processing, and
analyzing data that was previously
ignored due to the limitations of
traditional data management
technologies.
Definition and Characteristics
• “Big data is high-volume, high-velocity and high-variety
information assets that demand cost-effective, innovative
forms of information processing for enhanced insight and
decision making.” (Gartner)
• “While enterprises struggle to consolidate systems and
collapse redundant databases to enable greater
operational, analytical & collaborative consistencies,
changing economic conditions have made this job more
difficult. E-commerce, in particular, has exploded data
management challenges along three dimensions: volumes,
velocity & variety. IT organizations must compile a variety of
approaches to have at their disposal for dealing with each.”
(Doug Laney)
What made Big Data needed?
• Increased analytics need
• Increased computation need
• Increased data volumes
• Lowered barrier to entry and success
• Innovative techniques
• Cost effective
Lots of data
• 2.5 quintillion bytes of data are generated every day!
• A quintillion is 10^18
• Data come from many quarters.
• Social media sites
• Sensors
• Digital photos
• Business transactions
• Location-based data
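To put the daily figure above into more familiar units, the conversion can be sketched as follows. The helper name `to_unit` and the decimal (SI) unit table are assumptions made for this illustration; they are not part of the original slides.

```python
# Decimal (SI) byte units; assumed here for illustration.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
         "PB": 10**15, "EB": 10**18, "ZB": 10**21}

def to_unit(num_bytes: float, unit: str) -> float:
    """Convert a byte count to the given SI unit."""
    return num_bytes / UNITS[unit]

QUINTILLION = 10**18
daily_bytes = 2.5 * QUINTILLION          # figure quoted on the slide

print(to_unit(daily_bytes, "EB"))        # exabytes generated per day
print(to_unit(daily_bytes * 365, "ZB"))  # rough zettabytes per year
```

In these units, 2.5 quintillion bytes per day is 2.5 exabytes per day, or a little under one zettabyte per year if the daily rate stayed constant.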
Characteristics of Big Data:
1-Scale (Volume)
• Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
Exponential increase in
collected/generated data
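As a rough check of the numbers above, the overall growth factor and the implied yearly growth rate can be computed as below. This is a sketch that assumes growth compounds at a constant annual rate between 2009 and 2020, which the slide does not state; the function names are hypothetical.

```python
def growth_factor(start: float, end: float) -> float:
    """Overall multiplication factor between two volumes."""
    return end / start

def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Constant yearly rate r such that start * (1 + r) ** years == end."""
    return (end / start) ** (1 / years) - 1

factor = growth_factor(0.8, 35.0)                    # volumes in zettabytes
rate = annual_growth_rate(0.8, 35.0, 2020 - 2009)    # 11 yearly steps

print(f"{factor:.1f}x overall, {rate:.1%} per year")
```

Under that assumption, the two endpoints correspond to roughly a 44x increase, or about 41% growth per year.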
Characteristics of Big Data:
2-Complexity (Variety)
• Various formats, types, and structures
• Text, numerical, images, audio, video,
sequences, time series, social media
data, multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be
generating/collecting many types of
data
Characteristics of Big Data:
3-Speed (Velocity)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions ➔ missing opportunities
• Examples
• E-Promotions: Based on your current location, your purchase history,
what you like ➔ send promotions right now for store next to you
• Healthcare monitoring: sensors monitoring your activities and body ➔
any abnormal measurements require immediate reaction
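The healthcare-monitoring example can be sketched as a small stream processor that reacts to each reading as it arrives instead of waiting for a batch. The function name `monitor`, the sample readings, and the thresholds are hypothetical values chosen for illustration only; they are not clinical guidance.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, measured value)

def monitor(readings: Iterable[Reading],
            low: float = 50.0, high: float = 120.0) -> Iterator[Reading]:
    """Yield a reading as soon as its value falls outside [low, high]."""
    for timestamp, value in readings:
        if value < low or value > high:
            yield timestamp, value

# A tiny in-memory stand-in for a live sensor feed.
stream = [(0.0, 72.0), (1.0, 75.0), (2.0, 131.0), (3.0, 74.0), (4.0, 44.0)]

for ts, value in monitor(stream):
    print(f"t={ts}: abnormal reading {value}")
```

Because `monitor` is a generator, each alert is produced the moment the offending reading is seen, which is the property the velocity dimension asks for.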
Big Data: 3V’s
Some Make it 4V’s
The four dimensions of use
• Aspects of the way in which users want to interact with
their data…
• Totality: Users have an increased desire to process and
analyze all available data
• Exploration: Users apply analytic approaches where the
schema is defined in response to the nature of the query
• Frequency: Users have a desire to increase the rate of
analysis in order to generate more accurate and timely
business intelligence
• Dependency: Users need to balance investment in existing
technologies and skills with the adoption of new techniques
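The "Exploration" dimension, where the schema is defined in response to the query, is often called schema-on-read. A minimal sketch follows, assuming raw events are stored as JSON lines; the sample records and the helper name `project` are hypothetical.

```python
import json

# Raw records are stored as-is; they do not share a fixed set of fields.
RAW_EVENTS = [
    '{"user": "a", "type": "purchase", "amount": 30.0}',
    '{"user": "b", "type": "click", "page": "/home"}',
    '{"user": "a", "type": "purchase", "amount": 12.5, "coupon": "X1"}',
]

def project(raw_lines, fields):
    """Parse each raw record and keep only the fields the query asks for."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}

# The "schema" (user, amount) is chosen by this query, not fixed at load time.
spend = {}
for row in project(RAW_EVENTS, ["user", "amount"]):
    if row["amount"] is not None:
        spend[row["user"]] = spend.get(row["user"], 0.0) + row["amount"]

print(spend)
```

A different question, for example page views per user, would call `project` with a different field list over the same raw data, without reloading or restructuring it.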
So, in a nutshell
• Big Data is about better analytics!
Why Big Data
Big Data Conundrum
• Problems:
• Although there is a massive spike in
available data, the percentage of the
data that an enterprise can understand
is on the decline
• The data that the enterprise is trying
to understand is saturated with both
useful signals and lots of noise.
The Big Data platform Manifesto
imperatives and underlying technologies
IBM’s Big Data Platform
What to do with the data
Harnessing Big Data
• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
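The difference between the first two workload types can be illustrated with Python's built-in `sqlite3` module. This is only a sketch: SQLite stands in for both a transactional DBMS and a data warehouse, and the `sales` table and its rows are hypothetical. RTAP is not shown, since it typically relies on a streaming architecture rather than a single database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: many small transactions, each touching a few rows.
orders = [("north", 20.0), ("south", 35.0), ("north", 15.0)]
for region, amount in orders:
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO sales (region, amount) VALUES (?, ?)",
            (region, amount),
        )

# OLAP-style work: one read-only query that scans and aggregates many rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```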
Who’s Generating Big Data
• Mobile devices (tracking all objects all the time)
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data,
but by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion
The Model Has Changed…
• The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data
What’s driving Big Data
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
Value of Big Data Analytics
• Big data is more real-time in nature
than traditional DW applications
• Traditional DW architectures (e.g.
Exadata, Teradata) are not
well-suited for big data apps
• Shared-nothing, massively parallel
processing (MPP), scale-out architectures
are well-suited for big data apps
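The shared-nothing idea can be sketched in a few lines: data is partitioned by key so that each node owns its slice, each node aggregates only its own records, and the partial results are merged at the end. The function names and sample records are hypothetical, and the "nodes" here are plain lists processed one after another; in a real system each partition would live on a separate machine and run in parallel.

```python
import zlib
from collections import Counter

def partition(records, num_nodes):
    """Assign each record to exactly one node by hashing its key."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in records:
        # crc32 gives a hash that is stable across runs, unlike hash(str).
        nodes[zlib.crc32(key.encode()) % num_nodes].append((key, value))
    return nodes

def local_sum(node_records):
    """Work a node can do using only its own data (no shared state)."""
    totals = Counter()
    for key, value in node_records:
        totals[key] += value
    return totals

records = [("north", 20), ("south", 35), ("north", 15), ("east", 5)]
partials = [local_sum(node) for node in partition(records, num_nodes=3)]

# Each key lives on exactly one node, so merging never has to reconcile
# conflicting totals for the same key.
merged = Counter()
for part in partials:
    merged.update(part)
print(dict(merged))
```

Scaling out in this design means raising `num_nodes` and repartitioning, rather than buying a larger single machine.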
Challenges in Handling Big Data
• The Bottleneck is in technology
• New architecture, algorithms, techniques are needed
• Also in technical skills
• Experts in using the new technology and dealing with big data
What Technology Do We Have
For Big Data?
Big Data Technology
Big Data Initiatives: Possible Courses of Action
• Complex big data applications in Science, Engineering,
Medicine, Healthcare, Finance, Law & Education
• Indian traditional knowledge
• Transportation
• Big data analytics in SMEs
• Real-life case studies of value creation through big data
analytics