PSIT201:
Big Data Analytics
Lecture # 4
Compiled By:
Beena Kapadia
Quick Recap of Previous Lecture
• State of the Practice in Analytics
• Key Roles for the New Big Data Ecosystem
• Examples of Big Data Analytics
4/12/2021 [Link] AM Big Data Analytics 2
State of the Practice in Analytics
Business Driver | Examples
Optimize business operations | Sales, pricing, profitability, efficiency
Identify business risk | Customer churn, fraud, default
Predict new business opportunities | Upsell, cross-sell, best new customer prospects
Comply with laws or regulatory requirements | Anti-Money Laundering, Fair Lending
TABLE 1-2 Business Drivers for Advanced Analytics
The first three examples do not represent new problems.
The last example portrays emerging regulatory requirements.
Different types of analytics:
• BI versus Data Science: BI looks from the past to today; Data Science looks from today into the future.
• Current analytical architecture (data flow): Most traditional data architectures inhibit data
exploration and more sophisticated analysis.
• Drivers of Big Data: In the 1990s, the volume of information was measured in terabytes. In the 2000s
it reached petabyte scale, and from 2010 onward it has been at exabyte scale.
• Emerging Big Data ecosystem and a new approach to analytics: Technologies such as Hadoop are
used to perform natural language processing on unstructured, textual data from social
media websites, and in many other modern applications.
Key Roles for the New Big Data Ecosystem
1. Deep Analytical Talent: This group is technically savvy, with strong analytical skills. Its
members have advanced training in quantitative disciplines, such as mathematics, statistics, and
machine learning.
• Examples of current professions fitting into this group include statisticians,
economists, mathematicians, and the new role of the Data Scientist.
2. Data Savvy Professionals: These people tend to have a base knowledge of working
with data, or an appreciation for some of the work being performed by data scientists
and others with deep analytical talent.
• Examples of data savvy professionals include financial analysts, market research
analysts, life scientists, operations managers, and business and functional managers.
3. Technology and Data Enablers: This group represents people who provide technical
expertise to support analytical projects and who manage large-scale data architectures
within companies and other organizations.
• This role requires skills related to computer engineering, programming, and
database administration.
Examples of Big Data Analytics
Three examples of Big Data Analytics come from different areas: retail, IT infrastructure, and social
media.
1. Retail: As mentioned earlier, Big Data presents many opportunities to improve sales and
marketing analytics.
An example of this is the U.S. retailer Target. After analyzing consumer purchasing behavior,
Target's statisticians determined that the retailer made a great deal of money from three main
life-event situations: marriage, divorce, and pregnancy.
2. IT infrastructure: The MapReduce paradigm is an ideal technical framework for many Big Data
projects, which rely on large data sets with unconventional data structures, such as social media data.
3. Social media: Social media represents a tremendous opportunity to leverage social and professional
interactions to derive new insights, as LinkedIn does.
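The MapReduce paradigm named in the IT infrastructure example can be illustrated with a minimal single-machine sketch in Python. This is not Hadoop code. The function names (map_phase, shuffle_phase, reduce_phase) and the sample posts are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical sample input: short social media posts (illustrative only).
posts = [
    "big data is big",
    "data science uses big data",
]

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one record."""
    return [(word, 1) for word in record.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(posts)
print(counts)
```

In a real cluster the map and reduce functions would run in parallel on many nodes; here they run in one process so the three phases are easy to follow.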
Poll Activity : Lecture # 4
Today’s Content
Big Data Analytics:
• Introduction to big data analytics
• Classification of Analytics
• Challenges of Big Data
Big Data Analytics:
Introduction to big data analytics
Big Data analytics is the process of collecting, organizing, and analyzing large amounts of data to
discover hidden patterns, correlations, and other meaningful insights.
Big Data Analytics is:
• Technology-enabled analytics: Several data analytics and visualization tools are available in the
market today from leading vendors such as IBM, Tableau, SAS, R Analytics, Statistica, and World
Programming Systems (WPS) to help process and analyze your big data.
• About gaining a meaningful, deeper, and richer insight into your business to steer it in the right
direction, understanding the customer's demographics to cross-sell and up-sell to them, better
leveraging the services of your vendors and suppliers, etc.
• About gaining a competitive edge over your competitors through findings that enable
quicker and better decision-making.
• A tight handshake between three communities: IT, business users and data
scientists.
• Working with datasets whose volume and variety exceed the current storage and
processing capabilities and infrastructure of your enterprise.
• About moving code to data. This makes perfect sense as the program for distributed
processing is tiny (just a few KBs) compared to the data (Terabytes or Petabytes
today and likely to be Exabytes or Zettabytes in near future).
• Time-sensitive decisions made in near real time by processing a steady stream of
real-time data.
• Working with datasets whose volume and variety are beyond the storage and
processing capability of typical database software.
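The "moving code to data" point can be checked with some rough arithmetic. The sketch below compares ideal transfer times for the data and for the program. The link speed (1 Gbit/s), data size (1 TB), and program size (10 KB) are assumed figures chosen only for illustration.

```python
# Assumed figures for illustration only: a 1 Gbit/s network link,
# a 1 TB data set, and a 10 KB program (decimal units).
LINK_BITS_PER_SECOND = 1_000_000_000
DATA_BYTES = 1_000_000_000_000   # 1 TB
CODE_BYTES = 10_000              # 10 KB

def transfer_seconds(size_bytes, link_bits_per_second=LINK_BITS_PER_SECOND):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bits_per_second

move_data = transfer_seconds(DATA_BYTES)
move_code = transfer_seconds(CODE_BYTES)

print(f"Moving the data: {move_data:,.0f} s (about {move_data / 3600:.1f} hours)")
print(f"Moving the code: {move_code:.5f} s")
print(f"Ratio: {move_data / move_code:,.0f} to 1")
```

Under these assumptions, shipping the data takes hours while shipping the program takes a fraction of a millisecond, which is why distributed frameworks send the program to the nodes that already hold the data.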
Classification of Analytics
There are basically two schools of thought:
1. Those that classify analytics into basic, operationalized, advanced, and monetized.
2. Those that classify analytics into analytics 1.0, analytics 2.0, and analytics 3.0.
1 First School of Thought
It includes Basic analytics, Operationalized analytics, Advanced analytics and
Monetized analytics.
• Basic analytics: This is primarily the slicing and dicing of data to support basic business
insights. It covers reporting on historical data, basic visualization, etc.
• Operationalized analytics: Analytics becomes operationalized when it is woven into the
enterprise's business processes.
• Advanced analytics: This is largely about forecasting the future by way of
predictive and prescriptive modelling.
• Monetized analytics: This is analytics used to derive direct business revenue.
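The "slicing and dicing" behind basic analytics can be sketched in a few lines of Python. The sales records and the helper names (slice_records, dice_records, total_by) are hypothetical and exist only for this illustration. A real project would more likely use a BI tool or a data frame library.

```python
from collections import defaultdict

# Hypothetical historical sales records (illustrative data only).
sales = [
    {"region": "North", "product": "Laptop", "quarter": "Q1", "amount": 1200},
    {"region": "North", "product": "Phone",  "quarter": "Q1", "amount": 800},
    {"region": "South", "product": "Laptop", "quarter": "Q1", "amount": 1500},
    {"region": "South", "product": "Phone",  "quarter": "Q2", "amount": 700},
    {"region": "North", "product": "Laptop", "quarter": "Q2", "amount": 1000},
]

def slice_records(records, field, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in records if r[field] == value]

def dice_records(records, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in records
            if all(r[field] in values for field, values in allowed.items())]

def total_by(records, *dimensions):
    """Report: total the amount for each combination of the given dimensions."""
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[d] for d in dimensions)] += r["amount"]
    return dict(totals)

q1_sales = slice_records(sales, "quarter", "Q1")
north_laptops = dice_records(sales, region={"North"}, product={"Laptop"})
by_region = total_by(sales, "region")
q1_by_product = total_by(q1_sales, "product")
print(by_region)
print(q1_by_product)
```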
2 Second School of Thought
• Let us take a closer look at analytics 1.0, analytics 2.0, and analytics 3.0. Refer to Table
2.1. Figure 2.1 shows the subtle growth of analytics from Descriptive → Diagnostic
→ Predictive → Prescriptive analytics.
Analytics 1.0 | Analytics 2.0 | Analytics 3.0
Era: mid 1990s to 2009 | Era: 2005 to 2012 | Era: 2012 to present
Descriptive statistics | Descriptive statistics and predictive statistics | Descriptive + predictive + prescriptive statistics
Key questions asked: What happened? Why did it happen? | Key questions asked: Why will it happen? | Key questions asked: What and when will it happen? What action should be taken to take advantage of what will happen?
Data from legacy systems, ERP, CRM, and 3rd party applications | Big data | A blend of big data and data from legacy systems, ERP, CRM, and 3rd party applications
Small and structured data sources | Big data is being taken up seriously. Data is mainly unstructured, arriving at a much higher pace. | A blend of big data and traditional data
Data stored in enterprise data warehouses or data marts. | Data had to be stored and processed rapidly, often on massive parallel servers running Hadoop. | Data has to yield insights and offerings with speed and impact.
Data was internally sourced. | Data was often externally sourced. | Data is both internally and externally sourced.
Relational databases | Database appliances, Hadoop clusters, SQL to Hadoop environments, etc. | In-memory analytics, in-database processing, agile analytical methods, machine learning techniques, etc.
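The step from descriptive to predictive to prescriptive statistics in the table above can be sketched on a tiny data set. The monthly sales figures, the stock level, and the reorder rule are all assumptions invented for illustration. The straight-line fit stands in for whatever predictive model a real project would use.

```python
from statistics import mean

# Hypothetical monthly unit sales (illustrative data only).
monthly_sales = [100, 110, 120, 130, 140, 150]

# Descriptive: what happened?
average_sales = mean(monthly_sales)

def fit_line(ys):
    """Least squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Predictive: what will happen next month?
slope, intercept = fit_line(monthly_sales)
forecast = slope * len(monthly_sales) + intercept

# Prescriptive: what action should be taken? (Toy rule; the stock level is assumed.)
CURRENT_STOCK = 140
units_to_order = max(0, round(forecast) - CURRENT_STOCK)

print(f"Average so far: {average_sales}")
print(f"Forecast for next month: {forecast:.0f}")
print(f"Suggested order: {units_to_order} units")
```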
Challenges of Big Data
There are seven main challenges of big data:
• Scale
• Security
• Schema
• Continuous availability
• Consistency
• Partition tolerance
• Data quality
• Scale: Storage must be able to cope with the large volume, velocity, and variety
of big data.
• Security: Most NoSQL big data platforms have poor security mechanisms (they lack
proper authentication and authorization) when it comes to
safeguarding big data.
• Schema: Rigid schemas have no place. Big data calls for dynamic schemas.
• Continuous availability: The big question here is how to provide 24/7 support,
because almost all RDBMS and NoSQL big data platforms have a certain amount of
downtime built in.
• Consistency: Should one opt for consistency or eventual consistency?
• Partition tolerance: How can we build partition-tolerant systems that can handle both
hardware and software failures?
• Data quality: How can we maintain data quality (accuracy, completeness, timeliness,
etc.)? Do we have appropriate metadata in place?
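Two of the data quality dimensions named above, completeness and timeliness, can be measured with simple ratios. The sketch below is one possible way to do so. The customer records, the field names, the reference date, and the 30-day freshness threshold are assumptions made for illustration.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value (illustrative data only).
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2021, 4, 10)},
    {"id": 2, "email": None,            "updated": date(2021, 4, 11)},
    {"id": 3, "email": "c@example.com", "updated": date(2020, 1, 5)},
    {"id": 4, "email": "d@example.com", "updated": None},
]

def completeness(rows, field):
    """Fraction of rows in which the field is present (not None)."""
    return sum(row[field] is not None for row in rows) / len(rows)

def timeliness(rows, field, as_of, max_age_days):
    """Fraction of rows whose date field is no older than max_age_days."""
    fresh = sum(
        row[field] is not None and (as_of - row[field]).days <= max_age_days
        for row in rows
    )
    return fresh / len(rows)

AS_OF = date(2021, 4, 12)  # assumed reference date
email_completeness = completeness(records, "email")
update_timeliness = timeliness(records, "updated", AS_OF, max_age_days=30)

print(f"Email completeness: {email_completeness:.0%}")
print(f"Records updated in the last 30 days: {update_timeliness:.0%}")
```

Accuracy is harder to automate because it needs a trusted reference to compare against, so it is left out of this sketch.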
Importance of Big Data
Big Data Technologies
Data Science
Quiz Activity 2 : Lecture # 4
THANK YOU