Data Analytics Overview for COMP 333

The document provides an overview of Data Analytics, emphasizing its importance across various industries and outlining the course objectives for COMP 333. It details the main components of data analytics, including descriptive data analysis, data wrangling, and exploratory data analysis, as well as the iterative data analytics process. The document also distinguishes between different types of data analysis: descriptive, predictive, and prescriptive.

Uploaded by

lankwitzjacques

COMP 333 — Week 1 Nutshell

Data Analytics in a Nutshell

This lecture provides an overview of Data Analytics
to let you orient yourself for COMP 333
and see what is important and where to focus your efforts.
To quote the course outline:
“The aim of this course is to introduce students
to the Python programming language
and related tools for data analytics; and
to expose them to a broad range of data analysis problems across a range of disciplines.”

Why?

Because

Data Analytics has permeated every industry, government, and business function.
The future will need data-driven approaches for all fields of human endeavour.

What is Data Analytics?

The aim of data analytics is to add value to your data
so it becomes actionable data
which means it helps you and your organisation to make decisions.
In the business world you will see this termed “monetization of data”.
The main steps of data analytics are
- descriptive data analysis
- data wrangling
- exploratory data analysis
These steps fit into an overall data analytics process
where you combine an understanding of the data and the business
to come up with data-driven input into the decision-making of the organization.

THE IMPORTANT THINGS

This is an overview.
It is an orientation of what is to come.
You are not meant to understand everything in this document today!
Each topic will be done in much more detail again later in the semester.

Descriptive Data Analysis (DDA)

DDA is a basic tool for understanding your data.
DDA is used throughout all stages of data analytics.

Be aware of the type of data that you have:

- categorical versus continuous
  - categorical: nominal versus ordinal
  - continuous: interval versus ratio
- structured versus unstructured
and for numerical values, be aware of
- accuracy
- precision
- significant digits
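The notes ask you to be aware of precision and significant digits. As a minimal Python sketch (the helper `round_sig` and the readings are hypothetical, not taken from the course), the following shows that binary floating point cannot store a value like 0.1 exactly, and rounds a computed mean back to the number of significant digits the measurements actually support.

```python
import math

def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant digits (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Position of the leading digit: 1234.5 -> 3, 0.004567 -> -3
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)

# Binary floating point cannot represent 0.1 exactly,
# so compare computed values with a tolerance, not ==
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Made-up scale readings with 4 significant digits each;
# reporting their mean to 15 digits would imply false precision
readings = [72.46, 72.51, 72.43]
mean = sum(readings) / len(readings)
print(round_sig(mean, 4))            # 72.47
```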

Describe the data and the data distribution for each feature in the dataset
- central tendency: mean, median, mode
- variation: standard deviation, inter-quartile range (IQR)
- outlier values and extreme values
- skew
- kurtosis
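The per-feature summary above can be sketched with Python's standard `statistics` module. This is an illustration, not the course's reference code: skew and kurtosis are computed here from population central moments (an assumption), so the numbers differ slightly from bias-corrected versions such as pandas' `Series.skew()`.

```python
import statistics

def describe(values: list[float]) -> dict[str, float]:
    """Summarise one numeric feature: centre, spread, range, and shape."""
    n = len(values)
    mean = statistics.fmean(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    # Central moments used for the shape statistics
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
        "skew": m3 / m2 ** 1.5,              # 0 for a symmetric distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }

summary = describe([1, 2, 3, 4, 5])
for name, value in summary.items():
    print(f"{name:>16}: {value:.3f}")
```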

You want descriptions that are robust to the presence of outliers (for example, the median and IQR rather than the mean and standard deviation).
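To see why robustness matters, the following Python sketch (with made-up values) adds one extreme value to a small sample: the mean jumps while the median barely moves. It also flags outliers with Tukey's fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, which is one common convention and the one box plots typically use for their whiskers; the factor 1.5 is a convention, not a rule.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

typical = [31, 35, 38, 40, 42, 45, 47, 50]  # made-up values
with_extreme = typical + [900]              # one extreme value added

print(statistics.fmean(typical), statistics.median(typical))            # 41.0 41.0
print(statistics.fmean(with_extreme), statistics.median(with_extreme))  # mean ~136.4, median 42
print(tukey_outliers(with_extreme))                                     # [900]
```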

Visualize the data with box plots, violin plots, histograms, and scatter plots.

Data Wrangling
Data wrangling is extremely important because your data is typically “messy”
and the Garbage-In-Garbage-Out (GIGO) rule of computation applies,
so you need to tidy up your data before doing “serious” work.

Data wrangling is generally 60%+ of the time and effort for data analytics!

For data wrangling, you need to look closely at your data, so DDA is a basic tool.

Steps in data wrangling:

Step 1: Discover
Step 2: Structure
Step 3: Cleanse
Step 4: Enrich
Step 5: Validate
Step 6: Publish

Issues for data cleaning:

- errors in data
- outliers and anomalies
- missing values and imputation of missing values
- unification and normalization so data is comparable
- entity recognition
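Several of these cleaning issues can be sketched on a few hypothetical records in plain Python. The validity ranges (age 0 to 120, height 50 to 250 cm) and the choice of median imputation and min-max scaling are assumptions for illustration. Invalid entries are treated as missing, gaps are filled with the column median, and one column is rescaled to [0, 1] so it is comparable with features measured in other units.

```python
import statistics

# Hypothetical raw records (made-up): None marks a missing value,
# and an age of -1 is a data-entry error
raw = [
    {"id": 1, "age": 34, "height_cm": 172.0},
    {"id": 2, "age": None, "height_cm": 165.0},
    {"id": 3, "age": -1, "height_cm": 180.0},
    {"id": 4, "age": 51, "height_cm": None},
    {"id": 5, "age": 29, "height_cm": 158.0},
]

def impute_median(records, col, is_valid):
    """Treat invalid values as missing, then fill gaps with the column median."""
    rows = [dict(r) for r in records]  # work on copies; keep the raw data intact
    for r in rows:
        if r[col] is not None and not is_valid(r[col]):
            r[col] = None  # error -> missing
    fill = statistics.median(r[col] for r in rows if r[col] is not None)
    for r in rows:
        if r[col] is None:
            r[col] = fill
    return rows

def min_max_scale(records, col):
    """Normalization: rescale a column to [0, 1]."""
    lo = min(r[col] for r in records)
    hi = max(r[col] for r in records)
    span = hi - lo
    return [{**r, col + "_scaled": (r[col] - lo) / span if span else 0.0} for r in records]

clean = impute_median(raw, "age", lambda a: 0 <= a <= 120)
clean = impute_median(clean, "height_cm", lambda h: 50 <= h <= 250)
clean = min_max_scale(clean, "age")
for r in clean:
    print(r)
```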

Data wrangling corresponds to the traditional ETL (Extract-Transform-Load) process
from data warehouses and OLAP (online analytical processing).
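The three ETL stages can be sketched with Python's standard library. This is a toy illustration under stated assumptions: an inline CSV string stands in for the source system, and an in-memory SQLite database (`sqlite3`) stands in for the data warehouse. Transform runs before Load, which is what distinguishes ETL from ELT.

```python
import csv
import io
import sqlite3

# Extract: a real pipeline would read files or APIs; here, made-up inline CSV
RAW_CSV = """date,store,sales
2024-01-01,North, 120
2024-01-01,south,95
2024-01-02,North,
2024-01-02,South,110
"""

def extract(text):
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Unify store names, convert types, drop rows with missing sales."""
    out = []
    for row in rows:
        sales = row["sales"].strip()
        if not sales:
            continue  # a real pipeline might log or impute instead
        out.append((row["date"], row["store"].strip().title(), int(sales)))
    return out

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, store TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute(
    "SELECT store, SUM(sales) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(totals)  # [('North', 120), ('South', 205)]
```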

The output of data wrangling is formatted as “Tidy Data”,
which has three basic properties:
1. Each variable is saved in its own column
2. Each observation is saved in its own row
3. Each type of observation is stored in its own (single) table
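A common violation of these properties is a “wide” table whose column headers are values (here, years) rather than variable names. The plain-Python sketch below, with made-up numbers, reshapes it so that each row is one observation and each column one variable. pandas offers the same reshaping as `pandas.melt`; the function here is a hand-written stand-in.

```python
# A wide table: one row per country, one column per year (made-up counts)
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 30, "2020": 35},
]

def melt(rows, id_col, var_name, value_name):
    """Reshape wide rows into tidy (long) rows: one observation per row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                tidy.append({id_col: row[id_col], var_name: key, value_name: value})
    return tidy

tidy = melt(wide, "country", "year", "cases")
for r in tidy:
    print(r)
# {'country': 'A', 'year': '2019', 'cases': 10}
# {'country': 'A', 'year': '2020', 'cases': 12}
# ...
```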

Exploratory Data Analysis (EDA)
EDA grew out of the statistics community.
EDA is the heart of data analytics.
EDA involves data wrangling and descriptive data analysis.

EDA develops a data-driven solution to your problem
by exploring the data to find which features lead to a solution.

The steps of EDA

Step 1: Data wrangling: collect, load, enrich data
Step 2: Descriptive data analysis: check data types, check distributions
Step 3: Feature engineering
Step 4: Modeling
Step 5: Story-Telling

A checklist for EDA:

Q1. What question(s) are you trying to solve (or prove wrong)?
Q2. What kind of data do you have and how do you treat different types?
Q3. What's missing from the data and how do you deal with it?
Q4. Where are the outliers and why should you care about them?
Q5. How can you add, change or remove features to get more out of your data?
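Q5 (adding, changing, or removing features) can be illustrated with a small Python sketch on hypothetical order records. Which features help is problem-specific; the choices below are assumptions for illustration only: a derived revenue column is added, a raw date string is changed into weekday features a model can use, and an identifier column is dropped because a raw ID rarely generalises.

```python
from datetime import date

# Hypothetical order records (made-up values)
orders = [
    {"order_date": "2024-03-02", "price": 20.0, "quantity": 3, "customer_id": "c17"},
    {"order_date": "2024-03-04", "price": 5.5, "quantity": 10, "customer_id": "c03"},
]

def engineer(row):
    d = date.fromisoformat(row["order_date"])
    return {
        # Added: combine two raw columns into a more informative one
        "revenue": row["price"] * row["quantity"],
        # Changed: a date string becomes categorical / boolean features
        "weekday": d.strftime("%A"),
        "is_weekend": d.weekday() >= 5,  # Monday == 0, so 5 and 6 are Sat/Sun
        # Removed: customer_id is simply not carried over
    }

features = [engineer(o) for o in orders]
for f in features:
    print(f)
```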

The Data Analytics Process
The data analytics process is how the business community looks at data analytics.

Step 1: Business Understanding: What are the business goals and problems?
Step 2: Data Understanding: Explore and visualize the data.
Step 3: Data Preparation: Generate features
Step 4: Modeling: Create models
Step 5: Evaluating: Train models and evaluate effectiveness
Step 6: Deploying: Use this data-driven approach for the goal of the business on a regular basis.

This can be viewed as a highly iterative cycle:

- Define the Goal: What problem are you solving?
- Collect and Manage Data: What information do you need?
- Build the Model: Find patterns in the data that lead to solutions.
- Evaluate and Critique the Model: Does the model solve your problem?
- Present Results and Document: Establish that you can solve the problem, and how.
- Deploy Model: Deploy the model to solve the problem in the real world.

THE LESS IMPORTANT THINGS

Models and Machine Learning

Story-Telling and Visualization

Deployment and Big Data Infrastructure

THE EVEN LESS IMPORTANT THINGS (these mainly provide context)

Correlation, Causality, and Confounding Factors

Data Warehouses and Business Intelligence

Confirmatory Data Analysis

that is, the scientific method with
planned (not exploratory)
experimental design, data collection, and data analysis

Descriptive vs Predictive vs Prescriptive Data Analysis

Descriptive Data Analysis describes your data from past activities; it provides insight into
the past and answers: “What has happened?”
Predictive Data Analysis produces results for unseen data and future activities; it uses statistical
models and forecasts to understand the future and answers: “What could happen?”
Prescriptive Data Analysis models viable solutions to a problem and the impact of choosing a
given solution; it uses optimization and simulation algorithms to advise on possible outcomes
and answers: “What should we do?”

Common questions

Explain how the process of data wrangling ensures the data's readiness for further analysis.

Data wrangling ensures data readiness for further analysis by transforming messy raw data into a structured format, which involves cleansing errors, handling missing values, normalizing data for comparability, and ensuring data consistency and integrity. This results in 'Tidy Data', which is well suited to sophisticated analysis and model building.

Describe the significance and process of data wrangling in data analytics.

Data wrangling is crucial because it addresses the 'messy' nature of raw data, following the 'Garbage-In-Garbage-Out' (GIGO) principle. The process involves discovering, structuring, cleansing, enriching, validating, and publishing data, and it often accounts for over 60% of the time and effort in data analytics. The goal is to output 'Tidy Data', in which each variable and each observation is clearly defined and organized.

What role does the understanding of the business goal play in the data analytics process?

Understanding the business goal is foundational in the data analytics process because it sets the direction for which problems the analytics initiatives are trying to solve. This understanding shapes the approach to data collection, feature generation, model building, and evaluation, so that the analytics outputs directly contribute to achieving the business objectives.

What are the challenges involved in identifying and dealing with outliers during data wrangling?

Identifying and dealing with outliers involves challenges such as distinguishing genuine outliers from errors, which requires both domain knowledge and statistical analysis. Outliers can disproportionately affect a model's performance, so robust methods such as trimming, transformation, or resistant statistics (for example, the median) are needed. Decisions on handling outliers often require balancing data integrity against model performance.

What is the main purpose of data analytics in organizations, and how is it typically monetized in a business context?

The main purpose of data analytics in organizations is to add value to data so it becomes actionable and aids decision-making. In a business context this is often termed 'monetization of data': the data is leveraged to generate insights or economic benefits, driving decisions that can increase profits or efficiency.

How does Exploratory Data Analysis (EDA) contribute to the data analytics process, particularly in addressing data challenges?

Exploratory Data Analysis (EDA) helps identify underlying patterns, relationships, and anomalies within the data through an iterative approach. It involves data wrangling and descriptive data analysis, enabling a deeper understanding of data types and distributions, handling of missing values, and transformation of features to improve model outputs. EDA is a critical step for developing a data-driven solution, as it explores which data features are useful for modeling and, ultimately, decision-making.

Why is feature engineering considered an important step in Exploratory Data Analysis, and what does it entail?

Feature engineering is important in Exploratory Data Analysis because it creates new features from raw data that can make machine learning models more effective. It includes adding, changing, or removing features to improve model performance by exposing features that carry significant insights or patterns, which influences the model's explanatory power and accuracy.

What factors determine the type of descriptive statistics that should be used in evaluating a dataset?

The type of descriptive statistics to use depends on the nature of the data (categorical or continuous, structured or unstructured) and on the characteristics of its distribution, such as central tendency, variation, outliers, skewness, and kurtosis. These factors determine whether measures like the mean, median, mode, standard deviation, or inter-quartile range are appropriate for describing the dataset.

Compare descriptive, predictive, and prescriptive data analytics in terms of their objectives and applications.

Descriptive data analytics summarizes past data to understand trends and patterns, answering 'What has happened?' Predictive data analytics uses models built on historical data to forecast future outcomes, answering 'What could happen?' Prescriptive data analytics evaluates potential interventions and recommends actions to optimize outcomes, answering 'What should we do?' Each type serves a distinct objective: descriptive for understanding, predictive for forecasting, and prescriptive for decision-making.

How does the iterative nature of the data analytics process improve the effectiveness of model deployment?

The iterative nature of the data analytics process improves deployment by allowing continuous refinement and re-evaluation of data inputs, models, and assumptions. Each iteration can uncover new insights, adjust strategies based on model feedback, and keep the model aligned with evolving business goals and data characteristics, which increases the accuracy and usefulness of the deployed solution.
