Big Data Analytics in Education Overview

The document provides an overview of Big Data Analytics in education, detailing its importance, classification, and the technologies involved. It covers the steps in the analytics process, the role of cloud computing, and how data analytics can benefit businesses. Additionally, it highlights key statistics and trends in the global big data market, emphasizing the significance of data science in extracting insights from large datasets.

Uploaded by

alexa007helper

Lecture 2

Big Data and Analytics in Education

Aditya Rajbongshi
Assistant Professor
Dept. of Educational Technology and Engineering
University of Frontier Technology, Bangladesh
Agenda

i. Introduction to Big Data Analytics
ii. Classification of Analytics
iii. How Can Data Analytics Help Businesses?
iv. Why is big data analytics important?
v. Big Data Statistics
vi. The global big data market
vii. Big Data Technologies & Tools
viii. The Role of Cloud Computing in Big Data Analytics Services
ix. Data Science
Introduction to Big Data Analytics
Big Data Analytics is a field of study and practice that involves processing and
analyzing large and complex datasets to extract valuable insights and
information.
The working procedure of big data analytics involves several steps, from
defining objectives and acquiring data to generating actionable insights:
• Define Objectives and Questions: Clearly define the objectives of the analytics
project and the specific questions you want to answer.
• Data Collection: Collect relevant data from various sources.
• Data Ingestion: Load the data into a storage system such as the Hadoop
Distributed File System (HDFS) or a NoSQL database. Ingestion tools and
frameworks help manage the flow of data into the analytics platform.
• Data Cleaning and Pre-processing: Clean and preprocess the data to handle
missing values, correct errors, and standardize formats.
• Data Storage: Store the preprocessed data in a way that allows for efficient
retrieval and analysis.
• Data Exploration and Descriptive Analytics: Summarize and visualize the data
using techniques such as statistical measures, charts, and graphs.
• Data Transformation and Feature Engineering: Transform the data and engineer
new features to make it suitable for analysis.
• Model Development (Machine Learning): Apply machine learning algorithms to
build predictive models. This step involves selecting appropriate algorithms,
training the models on historical data, and tuning parameters for optimal
performance.
• Model Evaluation: Evaluate the performance of the machine learning models using
validation datasets.
• Deployment: Deploy the models into production environments, making them
available for real-time or batch processing.
• Monitoring and Maintenance: Continuously monitor the performance of
deployed models and update them as needed.
• Visualization and Reporting: Visualization tools help stakeholders understand
complex patterns and make informed decisions.
• Iterative Improvement: The iterative nature of big data analytics allows
organizations to continuously improve models and derive more value from
their data.
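
The cleaning, feature engineering, model development, and evaluation steps above can be sketched at toy scale. This is a minimal illustration under stated assumptions, not a production pipeline: the student records, the `engagement` feature, and the threshold "model" are all hypothetical and chosen only to keep the example in plain Python.

```python
from statistics import mean

# Hypothetical raw records: (hours_studied, attendance, passed).
# All values are invented for illustration; None marks a missing field.
RAW = [
    ("2", "60%", 0), ("9", "95%", 1), (None, "80%", 1), ("4", "70%", 0),
    ("7", "90%", 1), ("1", "50%", 0), ("8", None, 1), ("6", "85%", 1),
    ("3", "65%", 0), ("6", "80%", 1),
]

def clean(records):
    """Cleaning and pre-processing: drop incomplete rows, standardize formats."""
    rows = []
    for hours, attendance, passed in records:
        if hours is None or attendance is None:
            continue
        rows.append((float(hours), float(attendance.rstrip("%")) / 100, passed))
    return rows

def engineer(rows):
    """Feature engineering: combine hours and attendance into one engagement score."""
    return [(hours * attendance, passed) for hours, attendance, passed in rows]

def accuracy(threshold, data):
    """Share of records where 'score >= threshold' matches the pass/fail label."""
    return mean(1 if (score >= threshold) == bool(passed) else 0
                for score, passed in data)

def train(data):
    """Model development: pick the midpoint threshold with the best training accuracy."""
    scores = sorted(score for score, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(scores, scores[1:])]
    return max(candidates, key=lambda t: accuracy(t, data))

features = engineer(clean(RAW))
train_set, validation_set = features[:5], features[5:]  # simple hold-out split
threshold = train(train_set)
print("learned threshold:", round(threshold, 2))
print("validation accuracy:", accuracy(threshold, validation_set))
```

In practice each function would be replaced by a dedicated tool (for example a distributed store for ingestion and a machine learning library for training), but the order of the steps stays the same.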
Classification of Analytics
Data analytics is categorized into four types:
Descriptive analytics: examines and interprets historical data in order to answer
the question “What happened?”. It presents historical data and changes within a
business through pie and bar charts, line graphs, scatter plots, tables, diagrams,
generated narratives, and other visualization tools. For example, a company that
wants to review its monthly sales growth and other financial metrics would use
descriptive analytics.

Diagnostic analytics: a more advanced data examination method that aims to
answer the question “Why did something happen?” through techniques such as
data discovery, drill-down, data mining, and correlations. For example, if a
company has a high employee turnover rate, diagnostic analytics can help
identify the causes by analyzing factors such as compensation ratio, promotion
waiting time, pay raises, tenure, and performance.
Predictive analytics: combines the outcomes of the “what” and “why” from the
two previous types of data analytics and interprets them in an attempt to forecast
future events and actions within a business. Because this type of analysis is quite
complex, it relies on techniques such as data mining, artificial intelligence (AI),
and machine learning. For example, retail companies can gather information on
what time of the year a certain product sells the fastest or whether there is a
particular pattern of sales for that product.

Prescriptive analytics: Unlike predictive analytics, which predicts what is likely
to happen based on data, prescriptive analytics uses data to suggest the best
course of action. Take the predictive example of a product’s sell-through rate:
prescriptive analytics is what businesses use to turn those predictions into
profitable business decisions. In this case, the analysis aids decision-making
around stocking inventory and marketing strategies for that particular product.
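
The four types can be shown side by side on one small dataset. The sketch below is only illustrative: the monthly sales and advertising figures are invented, the least-squares trend line stands in for a real predictive model, and the "forecast plus 10% buffer" stocking rule is a hypothetical example of a prescriptive recommendation.

```python
from statistics import mean

# Hypothetical monthly unit sales and advertising spend for one product.
sales = [100, 110, 125, 130, 150, 160]
ad_spend = [10, 11, 13, 13, 16, 17]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def linear_fit(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    xs = range(len(ys))
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Descriptive: what happened?
average_sales = mean(sales)
growth = (sales[-1] - sales[0]) / sales[0]

# Diagnostic: why did it happen? (a correlation is a hint, not proof of a cause)
correlation = pearson(ad_spend, sales)

# Predictive: what is likely to happen next month?
slope, intercept = linear_fit(sales)
forecast = slope * len(sales) + intercept

# Prescriptive: what should we do? Hypothetical rule: stock forecast plus 10%.
recommended_stock = round(forecast * 1.10)

print(f"average={average_sales:.1f} growth={growth:.0%} "
      f"correlation={correlation:.2f} forecast={forecast:.1f} "
      f"stock={recommended_stock}")
```

Each step consumes the output of the one before it, which mirrors how the four types build on each other in the descriptions above.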
How Can Data Analytics Help Businesses?

• Make better decisions
• Develop better marketing strategies
• Improve customer experience
• Mitigate business risks
• Improve consumer engagement
• Increase operational efficiency
• Optimize sales and marketing
Why is big data analytics important?
Big data analytics helps organizations harness their data and use it to identify new
opportunities. That, in turn, leads to smarter business moves, more efficient
operations, higher profits, and happier customers. Businesses that use big data with
advanced analytics gain value in many ways, such as:
• Reducing cost. Big data technologies like cloud-based analytics can
significantly reduce the cost of storing large amounts of data (for
example, in a data lake). In addition, big data analytics helps organizations find
more efficient ways of doing business.
• Making faster, better decisions. The speed of in-memory analytics, combined
with the ability to analyze new sources of data such as streaming data from IoT
devices, helps businesses analyze information immediately and make fast,
informed decisions.
• Developing and marketing new products and services. Being able to gauge
customer needs and customer satisfaction through analytics empowers
businesses to give customers what they want, when they want it. With big data
analytics, more companies have an opportunity to develop innovative new
products to meet customers’ changing needs.
Big Data Statistics
Big data statistics highlights:
• In 2020, the amount of data created and replicated reached a new high, surpassing
64 zettabytes.
• Between 2021 and 2026, data generation is projected to exceed 221 zettabytes.
• The global big data analytics market was valued at $271.83 billion in 2022.
• Spending on big data analytics solutions is estimated to reach $42.2 billion in 2023.
• As of April 2023, there were 5.18 billion internet users generating data at an
unprecedented rate.
In 2023, big data tools and technologies are showing deeper penetration across several
end-use industries, including IT and telecoms, retail, gaming, and media and
entertainment. This trend is expected to continue through 2029. Other
industries expected to use big data extensively include:
• Banking, financial services, and insurance (BFSI)
• Healthcare
• Manufacturing
• Travel and hospitality
• Education and research
• Transportation and logistics
The global big data healthcare market was dominated by North America in 2021.
As one of the first regions to digitize, North America held 37% of the market.
The Asia-Pacific region was the fastest-growing emerging market, jumping to 41%
in 2021.
Leveraging analytical tools to track supply chain performance metrics can save
hospitals up to $10 million per year.
In addition, big data can help key players in the healthcare industry with:
• Prediction of disease outbreaks.
• Early symptom detection.
• Electronic health records.
• Real-time alerting.
• Enhancing patient engagement.
• Research acceleration.
• Analysis of medical images.
The banking and financial industry can leverage big data and analytics for:
• Fraud detection.
• Risk management.
• Personalized marketing.
• Customer relationship management.
• Financial trend prediction.
Across multiple industries, including tourism, big data simplifies and streamlines
transportation through:
• Congestion management and traffic control.
• Route planning.
• Traffic safety.
• Real-time processing.
• Predictive analytics to detect accident-prone areas.
In 2022, the amount of data created and replicated exceeded 94 zettabytes.
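
The two data-volume figures quoted in this section (more than 64 zettabytes in 2020 and more than 94 zettabytes in 2022) imply a compound annual growth rate. The short calculation below is a rough sketch: both figures are lower bounds, so the result is only an approximation, and the variable names are illustrative.

```python
# Lower-bound figures quoted in the slides, in zettabytes.
data_2020_zb = 64
data_2022_zb = 94
years = 2022 - 2020

# Compound annual growth rate: (end / start) ** (1 / years) - 1
annual_growth = (data_2022_zb / data_2020_zb) ** (1 / years) - 1
print(f"Implied annual growth: {annual_growth:.1%}")
```

Under these assumptions the volume of data created and replicated grew by roughly a fifth per year over the two-year span.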
The global big data market
Big data sources and their descriptions:
• Social media: Platforms like Facebook, Twitter, Instagram, and LinkedIn.
• Internet of Things (IoT): Connected devices such as sensors, wearables, and
smart devices.
• Online transactions and e-commerce: Data generated through online purchases,
transactions, and customer interactions on e-commerce platforms.
• Sensor networks: Networks of sensors in various domains such as weather
monitoring, industrial systems, and smart grids.
• Scientific research: Scientific experiments, simulations, and observations.
• Mobile applications: Mobile apps collect user data, including location
information, preferences, and usage patterns.
• Video and image data: Multimedia content, such as video streams, images, and
CCTV footage.
Key competitors in the global big data market include:
• IBM
• Google
• Oracle
• Microsoft
• SAS
• Teradata
• AWS
• Salesforce
• Accenture
Big Data Technologies & Tools

Big data technologies encompass a wide range of tools and frameworks designed to
store, process, and analyze large and complex datasets. Here are some key
categories of big data technologies:

Storage Technologies:
• HDFS (Hadoop Distributed File System): A fundamental component of the Hadoop
ecosystem that provides scalable and reliable storage for large datasets.
• NoSQL Databases: These databases are designed to handle various types of unstructured
or semi-structured data. Examples include MongoDB, Cassandra, Couchbase, and
Redis.
• Data Warehouses: Traditional data warehouses like Teradata and Oracle Exadata, and
modern cloud-based warehouses like Amazon Redshift, Google BigQuery, and Snowflake,
are used for structured data storage and analytical processing.
Processing Frameworks:
• Hadoop MapReduce: A programming model and processing framework for distributed
computing on large datasets.
• Apache Spark: An open-source, fast, and general-purpose cluster-computing framework
that supports in-memory data processing. Spark is known for its ease of use and
versatility, handling batch processing, interactive queries, streaming analytics, and
machine learning.
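
The MapReduce programming model mentioned above can be illustrated with the classic word-count example. This is a single-process sketch in plain Python, not Hadoop code: the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative, and a real cluster would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Two tiny "input splits" standing in for blocks of a large file.
documents = ["big data needs big storage", "data drives decisions"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be distributed, which is what lets the model scale to large datasets.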

Stream Processing:
• Apache Kafka: A distributed event streaming platform that is widely used for building
real-time data pipelines and streaming applications.
• Apache Flink: A stream processing framework for processing large-scale data streams in
real-time, supporting event time processing and stateful computations.
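
Event-time processing and stateful computation can be sketched without any streaming framework. The example below is a simplified illustration, not Flink or Kafka code: the 60-second window size and the event data are hypothetical, and it processes a finite list, whereas a real stream processor works on unbounded input and must also decide when a window is complete.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical tumbling-window size

def window_counts(events):
    """Count events per (window_start, key) using each event's own timestamp."""
    state = defaultdict(int)  # running state that persists between events
    for event_time, key in events:
        window_start = event_time - event_time % WINDOW_SECONDS
        state[(window_start, key)] += 1
    return dict(state)

# (event_time_in_seconds, key); the third event arrives out of order.
events = [(5, "login"), (70, "login"), (42, "login"), (65, "quiz"), (119, "login")]
result = window_counts(events)
print(result)
```

The late event at second 42 is still counted in the first window because assignment depends on the timestamp carried by the event rather than on its arrival order.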
Batch Processing:
• Apache Hadoop (MapReduce): Although not as prominent as it once was, Hadoop's
MapReduce is still used for batch processing in certain scenarios.
• Apache Spark (Batch Processing): Spark also supports batch processing and is often
used as a more efficient alternative to Hadoop MapReduce.

Machine Learning and Analytics:
• TensorFlow and PyTorch: Popular open-source libraries for machine learning, deep
learning, and neural network-based applications.
• Scikit-Learn: A machine learning library in Python that provides simple and efficient
tools for data analysis and modeling.
• RapidMiner, KNIME: Platforms that provide graphical user interfaces for designing
and deploying machine learning workflows.
Data Integration and ETL:
• Apache NiFi: A powerful data integration tool that provides a web-based interface for designing
data flows and automating data movement between systems.
• Apache Kafka Connect: Connects Kafka with external systems, enabling seamless data integration
and ETL (Extract, Transform, Load) operations.
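
The Extract, Transform, Load pattern named above can be sketched with the Python standard library. This is a minimal illustration rather than a NiFi or Kafka Connect configuration: the CSV content, the `scores` table, and the cleaning rules are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from files, APIs, or queues.
RAW_CSV = """student_id,course,score
1,Math,88
2,math,
3,Physics,92
"""

def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows without a score and standardize types and casing."""
    cleaned = []
    for row in rows:
        if not row["score"]:
            continue
        cleaned.append((int(row["student_id"]),
                        row["course"].strip().title(),
                        int(row["score"])))
    return cleaned

def load(rows, connection):
    """Load: write the cleaned rows into a target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(student_id INTEGER, course TEXT, score INTEGER)"
    )
    connection.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    connection.commit()

connection = sqlite3.connect(":memory:")  # in-memory database for the demo
load(transform(extract(RAW_CSV)), connection)

row_count = connection.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
average = connection.execute("SELECT AVG(score) FROM scores").fetchone()[0]
print(row_count, average)
```

Keeping the three stages as separate functions mirrors how integration tools let each stage be configured, monitored, and replaced independently.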

Visualization and Business Intelligence:
• Tableau, Power BI, Qlik: Popular tools for creating interactive visualizations and business
intelligence dashboards.
• Apache Superset: An open-source data exploration and visualization platform that integrates with
various data sources.

Containerization and Orchestration:
• Docker and Kubernetes: Containerization platforms that enable the deployment and scaling of
applications, including big data applications.
• Apache Mesos: A cluster manager that simplifies the deployment and management of distributed
applications.
The Role of Cloud Computing in Big Data Analytics Services
The cloud allows for seamless scaling of resources based on the volume of data to
be processed. This flexibility is pivotal for big data processing, which often
involves vast amounts of data.
The meteoric rise of cloud computing has made these services more accessible and
affordable, particularly for companies in the USA.
Cloud computing offers a scalable, flexible, and cost-effective solution for
storing and processing big data.
Additionally, cloud-based services from providers like AWS, Azure, and Google
Cloud offer managed big data services, simplifying infrastructure management for
organizations.
Cloud Computing in Big Data Analytics
Cloud computing has revolutionized big data analytics in several ways:
Scalability: The cloud allows for seamless scaling of resources based on the volume of data
to be processed. This flexibility is pivotal for big data processing, which often involves vast
amounts of data.
Cost-Efficiency: Cloud platforms are typically pay-as-you-go, reducing the need for hefty
upfront investments in infrastructure. This cost-effective approach is a significant advantage
for any big data analytics company in the USA.
Speed and Efficiency: With cloud computing, businesses can quickly deploy big data
applications and process large datasets faster than traditional systems, enhancing the speed
and efficiency of big data analytics.
Accessibility: The cloud enables accessibility to big data from anywhere, anytime,
enhancing collaboration among teams and contributing to more informed decision-making.
Data Security: With robust security measures in place, cloud platforms provide secure
storage and processing environments for big data.
Data Science
Data science is a crucial component in the realm of big data because it provides the
methodologies, techniques, and tools needed to extract meaningful insights from large and
complex datasets. Such datasets are difficult to interpret directly; data science addresses
this challenge by offering advanced analytics and statistical techniques to make sense of
the data. Here is why and how data science is essential in the context of big data:
i. Handling Large Volumes of Data
ii. Extracting Insights and Patterns
iii. Predictive Analytics
iv. Real-time and Near-real-time Analytics
v. Data Cleaning and Preprocessing
vi. Customized and Personalized Experiences
vii. Optimization and Efficiency
viii. Decision Support
Responsibilities, Soft state eventual consistency

Soft State Eventual Consistency in Big Data Analytics:
a) Streaming Analytics
b) Data Processing Frameworks
c) Machine Learning Models
d) Trade-off for Performance

In summary, in big data analytics, responsibilities encompass a broad set of tasks
ranging from data collection to analysis and reporting.
