Week11 - COMP517 S2 2025

Week 11 Machine Learning: An Introduction
COMP517 Data Analysis
SCHOOL OF ENGINEERING, COMPUTER AND MATHEMATICAL SCIENCES
Announcements

Upcoming submissions:
• Lab Week 10 due 17 October (3%)
• No extensions or SCA applicable to this assessment
• Group Assignment 2 (50%)
• Due date is Friday 24th October, 11:59 p.m.
Overview

• Recap of Linear Association

• Assumptions for Linear Regression
• An Introduction to Machine Learning
Assumptions

• There must be a linear relationship between the dependent and independent variables
• Normally distributed error (residual)
• No multicollinearity
• Multicollinearity exists when two or more of the explanatory variables are highly correlated
• Homoscedasticity
• The variance of the residuals must be constant across the predicted values
Checking for Multicollinearity
There are two ways multicollinearity is usually checked:
1) Correlation matrix (heat map)
2) Variance Inflation Factor (VIF)
The VIF of a predictor is the ratio of the variance of its coefficient estimate in a model with multiple terms to the variance it would have in a model with that term alone:
$\mathrm{VIF}_j = \dfrac{1}{1 - R_j^2}$
where $R_j^2$ is the coefficient of determination obtained by regressing predictor $X_j$ on all the other predictors.

$R^2$ is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).

(A common rule of thumb is that a VIF above 10 indicates high multicollinearity; a cutoff of 5 is also commonly used. Depending on the industry, some practitioners consider a VIF above 2.5 to be high.)
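Both checks can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the data are synthetic (x2 is deliberately built as a near-copy of x1), and the function name `vif` is our own. It computes the correlation matrix that a heat map would display, then each predictor's VIF as $1/(1 - R_j^2)$, with $R_j^2$ taken from an ordinary least-squares regression of that predictor on the others plus an intercept.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = observations).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an OLS regression of
    column j on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # design matrix: intercept + every other predictor
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Hypothetical data: x2 is almost a copy of x1, x3 is unrelated noise
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)  # 3 x 3 correlation matrix (what a heat map shows)
print(np.round(corr, 2))
print(np.round(vif(X), 1))           # x1 and x2 well above the cutoffs, x3 close to 1
```

In practice, statsmodels offers a `variance_inflation_factor` function for the same purpose; it regresses on the columns you pass it, so an intercept column usually has to be included in the matrix.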
Homoscedasticity

Homoscedasticity essentially means ‘same variance’ and is an important concept in linear regression.
• Homoscedasticity describes a situation in which the error term (the noise or disturbance in the relationship between the independent and dependent variables) has the same variance across all values of the independent variables.
• The spread of the residuals is constant across observations, i.e., the variance is constant.
• As the predicted value of the dependent variable changes, the spread of the error term does not vary much.
Detecting Homoscedasticity

• The example plot below indicates heteroscedasticity, with its classic cone or fan shape.

[Link]
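Besides inspecting the residual plot by eye, a formal check is possible. The sketch below computes the Breusch–Pagan LM statistic (n times the $R^2$ from regressing the squared residuals on the predictor). The test itself is a swapped-in technique not covered on the slide, and the data and function names are hypothetical. Under homoscedasticity the statistic is approximately chi-squared with one degree of freedom for a single predictor, so values well above about 3.84 suggest heteroscedasticity at the 5% level.

```python
import numpy as np

def ols_residuals(x, y):
    """Residuals from an OLS fit of y on x (with intercept)."""
    A = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef

def breusch_pagan_lm(x, y):
    """Breusch-Pagan LM statistic: n * R^2 of squared residuals regressed on x."""
    u2 = ols_residuals(x, y) ** 2
    aux_resid = ols_residuals(x, u2)
    r2 = 1.0 - (aux_resid @ aux_resid) / np.sum((u2 - u2.mean()) ** 2)
    return len(y) * r2

# Hypothetical data
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=500)
y_homo = 2 + 3 * x + rng.normal(0, 1, size=500)        # constant error variance
y_hetero = 2 + 3 * x + rng.normal(0, 1, size=500) * x  # spread grows with x (fan shape)

print(round(breusch_pagan_lm(x, y_homo), 2))
print(round(breusch_pagan_lm(x, y_hetero), 2))
```

statsmodels provides a ready-made version as `het_breuschpagan`; the hand-rolled function here is only meant to show what the statistic measures.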
Data Science and Related Disciplines

Image source: [Link]

What is Machine Learning?

“Learning is any process by which a system improves performance from experience.”
- Herbert Simon

Definition by Tom Mitchell (1998):

Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Slide credit: Eric Eaton
Traditional Programming vs. Machine Learning
What is Machine Learning?...

• Machine learning is the programming of computers to optimize a performance criterion using example data or past experience.
• Role of Statistics:
• Inference from a sample
• Role of Computer Science is to build efficient algorithms to:
• Solve the optimization problem
• Represent and evaluate the model for inference
Why “Learn” ?

• Learning may be used when:

• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech
recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)

• Learning isn’t always useful

• There is no need to “learn” to calculate payroll
Samuel’s Checkers-Player

Machine Learning: “Field of study that gives computers the ability to learn without being explicitly programmed.”
- Arthur Samuel (1959)

A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," in IBM Journal of Research and Development, vol. 3, no. 3, pp. 210-229, July 1959, doi: 10.1147/rd.33.0210.
Slide credit: Eric Eaton
Defining the Learning Task
Improve on task T, with respect to performance metric P,
based on experience E
• Task T: Playing checkers
• P: Percentage of games won against an arbitrary opponent
• E: Playing practice games against itself
• Task T: Recognizing hand-written words
• P: Percentage of words correctly classified
• E: Database of human-labelled images of handwritten words
• T: Driving on four-lane highways using vision sensors
• P: Average distance travelled before a human-judged error
• E: A sequence of images and steering commands recorded while
observing a human driver.

Slide credit: Ray Mooney

Learning = Generalization

• “Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task or tasks drawn from the same population more efficiently and more effectively the next time.”
- H. Simon

The ability to perform a task in a situation which has never been encountered before

Slide credit: Eric Eaton

Measuring Performance

• Model (classification) accuracy

• Solution correctness
• Solution quality (length, efficiency)
• Speed of performance

Slide credit: Ray Mooney

Tasks that are best solved by using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power
plant
• Prediction:
– Future stock prices or currency exchange rates

Slide credit: Geoffrey Hinton

NOTE: [Link]
geoffrey-hinton-sounding-alarm-
ai#:~:text=A%20deep%20learning%20pioneer%20is,AI%20and%20business%2C%20deli
vered%20monthly.&text=In%20a%20widely%20discussed%20interview,inability%20to%
Sample Applications

• Retail: Market basket analysis, Customer relationship management (CRM)
• Finance: Credit scoring, fraud detection
• Manufacturing: Optimization, troubleshooting
• Medicine: Medical diagnosis, treatments
• Telecommunications: Quality of service optimization
• Bioinformatics: Motifs, DNA/RNA sequence alignment
• Web mining: Search engines
• And many more…
Types of Learning

• Supervised (inductive) learning

– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Given: rewards from a sequence of actions
Supervised vs. Unsupervised Learning
Supervised Learning

• Learning a concept C from examples

• Family car (vs. sports cars)
• “A” student (vs. all other students)
• Blockbuster movie (vs. all other movies)
• Input: N labeled examples
• Representation: descriptive features
• These define the “feature space”
• Goal: given <input x, label g(x)>, learn a good approximation to g
• Minimize number of errors on new x’s
Supervised Learning: Regression

Draw a line through these dots.

Supervised Learning: Regression…

• Given $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
• Learn a function $f(x)$ to predict $y$ given $x$
– $y$ is real-valued == regression

Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
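"Drawing a line through the dots" usually means fitting the least-squares line. The sketch below uses the closed-form simple linear regression estimates (slope = covariance of x and y divided by variance of x). The five points are hypothetical and are not the Witt data cited above; `fit_line` is an illustrative name.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y ≈ b0 + b1 * x (simple linear regression)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Hypothetical points lying near the line y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
b0, b1 = fit_line(x, y)
print(f"f(x) = {b0:.2f} + {b1:.2f} x")      # f(x) = 1.06 + 1.97 x
print("prediction at x = 5:", round(b0 + b1 * 5, 2))
```

The learned function $f$ can then predict a real-valued $y$ for any new $x$, which is exactly the regression setting described on the slide.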
Example: Traffic Jams Prediction
Supervised Learning: Binary Classification
Example Credit scoring
• Assign one of two labels (i.e. low-risk / high-risk) to the input (here customers’ income and savings)
• Classification requires a model (a classifier) to determine which label to assign to items

Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
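The discriminant above has two parameters, θ1 and θ2, and "learning" means choosing them from labelled examples. A minimal sketch, assuming a brute-force search over observed values that minimises the number of training errors. The customer data, the thresholds it recovers and the function names are all hypothetical; real credit-scoring models are far richer.

```python
from itertools import product

def classify(income, savings, theta1, theta2):
    """The slide's discriminant: low-risk only if both thresholds are exceeded."""
    return "low-risk" if income > theta1 and savings > theta2 else "high-risk"

def fit_thresholds(customers, labels):
    """Brute-force search for (theta1, theta2) minimising training errors.

    Candidate thresholds are the observed values plus one value below the minimum.
    """
    income_vals = sorted({inc for inc, _ in customers})
    savings_vals = sorted({sav for _, sav in customers})
    best = None
    for t1, t2 in product([income_vals[0] - 1] + income_vals,
                          [savings_vals[0] - 1] + savings_vals):
        errors = sum(classify(inc, sav, t1, t2) != lab
                     for (inc, sav), lab in zip(customers, labels))
        if best is None or errors < best[0]:
            best = (errors, t1, t2)
    return best

# Hypothetical labelled customers: (income, savings) in $1000s
customers = [(30, 5), (45, 25), (60, 10), (70, 30), (80, 40), (55, 22), (25, 30), (90, 15)]
labels = ["high-risk", "high-risk", "high-risk", "low-risk",
          "low-risk", "low-risk", "high-risk", "high-risk"]

errors, theta1, theta2 = fit_thresholds(customers, labels)
print(errors, theta1, theta2)
print(classify(65, 35, theta1, theta2))   # apply the learned rule to a new customer
```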
Supervised Learning: Classification
• Given $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
• Learn a function $f(x)$ to predict $y$ given $x$
– $y$ is categorical == classification
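One simple way to learn such an $f(x)$ for a categorical $y$ is k-nearest neighbours, chosen here purely for illustration (the slides do not prescribe it): a new point receives the majority label of the k closest training points. The data and labels below are hypothetical, and `knn_predict` is our own name.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict a categorical label for `query` by majority vote of its k nearest training points."""
    neighbours = sorted(zip((math.dist(x, query) for x in train_x), train_y))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature examples with categorical labels
train_x = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (6.0, 7.0), (7.0, 6.0)]
train_y = ["class A", "class A", "class A", "class B", "class B", "class B"]

print(knn_predict(train_x, train_y, (1.5, 1.5)))
print(knn_predict(train_x, train_y, (6.5, 6.5)))
```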
Supervised Learning: Classification…
• $x$ can be multi-dimensional
• Each dimension corresponds to an
attribute/feature:
• Age
• Tumour size
• Colour
• Distance from optic nerve
• Cell type is the most telling feature, but it’s risky
to do a biopsy of the eye
• ML can help determine when a feature is needed

Slide credit: Eric Eaton

Supervised Learning: Classification…

Example: Processing Loan Applications

• Given: questionnaire with financial and personal
information
• Problem: should money be lent?
• Simple statistical method covers 90% of cases;
borderline cases referred to loan officers
• But: 50% of accepted borderline cases defaulted!
• Solution(?): reject all borderline cases
Processing Loan Applications
• Input consisted of 1000 training examples of
borderline cases
• Each example tracked 20 attributes that included:
Age, Years with current Employer, Years with the
bank, other credit cards, etc.
• A machine learning procedure produced a small set
of classification rules that made predictions on an
independently chosen test set
• Correct predictions were made on around 67% of
this test set
• These rules not only improved the success rate, but
also helped the company to explain to customers
why their applications were declined
Decision Trees
• Another popular classification technique is the Decision Tree
Example: Contact Lens prescription
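Decision-tree learners such as ID3 choose which attribute to split on by measuring how much the split reduces label entropy (information gain). The sketch below shows that criterion on a handful of hypothetical rows loosely styled after the contact-lens example; the rows are invented and are not the dataset used on the slide, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy from splitting `rows` (a list of dicts) on `attribute`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical rows and recommended-lens labels
rows = [
    {"tear_rate": "reduced", "astigmatism": "no"},
    {"tear_rate": "reduced", "astigmatism": "yes"},
    {"tear_rate": "normal",  "astigmatism": "no"},
    {"tear_rate": "normal",  "astigmatism": "yes"},
    {"tear_rate": "reduced", "astigmatism": "yes"},
    {"tear_rate": "normal",  "astigmatism": "no"},
]
labels = ["none", "none", "soft", "hard", "none", "soft"]

for attr in ("tear_rate", "astigmatism"):
    print(attr, round(information_gain(rows, labels, attr), 3))
# The attribute with the larger gain becomes the root split of the tree.
```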
Association Rules
• Market basket analysis
• A market basket is a collection of items purchased by a customer in
a transaction
• The goal here is to track items that are frequently purchased
together – hence this falls into the Association Rules category
• This will enable managers to better target customers and thereby increase sales
• Finding Frequent Itemsets
• Need to first define “frequent” – the support of an itemset is the
fraction of transactions that contain all the items in the itemset
• For example, we may be interested in finding all itemsets which
have 75% or more support
• Thus for our basket, we need to identify all frequently occurring
• Single items,
• Pairs of items
• Triples and so on
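The support definition above translates directly into code. This is a brute-force sketch over single items, pairs and triples; practical algorithms such as Apriori prune candidates instead of enumerating them all. The four baskets are hypothetical, and the 75% threshold matches the example on the slide.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support, max_size=3):
    """All itemsets of up to `max_size` items whose support is at least `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):  # single items, pairs, triples
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[combo] = s
    return result

# Hypothetical market baskets
transactions = [
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"beer", "chips", "milk"},
    {"bread", "milk"},
]
print(frequent_itemsets(transactions, min_support=0.75))
```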

1/5/08 32
Learning Associations

• Market Basket analysis:

$P(Y \mid X)$: probability that somebody who buys X also buys Y, where X and Y are products/services.

Example: $P(\text{chips} \mid \text{beer}) = 0.7$
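$P(Y \mid X)$ can be estimated from transaction data as the fraction of baskets containing X that also contain Y (often called the rule's confidence). The counts below are hypothetical and chosen only so that the estimate reproduces the 0.7 in the example; `confidence` is an illustrative name.

```python
def confidence(x, y, transactions):
    """Estimate P(Y | X): among transactions containing all of X, the fraction that also contain Y."""
    x, y = set(x), set(y)
    with_x = [t for t in transactions if x <= t]
    return sum(y <= t for t in with_x) / len(with_x)

# Hypothetical baskets: 10 contain beer, 7 of those also contain chips
transactions = [{"beer", "chips"}] * 7 + [{"beer"}] * 3 + [{"chips", "salsa"}] * 5

print(confidence({"beer"}, {"chips"}, transactions))   # 7 / 10 = 0.7
print(confidence({"chips"}, {"beer"}, transactions))   # 7 / 12: the rule is not symmetric
```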

Classification Algorithms
Issues in Supervised Learning

1. Representation: which features to use?

2. Model Selection: complexity, noise, bias
3. Evaluation: how well does it perform?
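Model selection and evaluation interact: a more complex model always fits its training data at least as well, but only held-out data shows whether it generalises. The sketch below fits polynomials of increasing degree (degree standing in for model complexity) to hypothetical noisy data and compares training and test error on a simple deterministic split. The data-generating function and the split are assumptions made for illustration.

```python
import numpy as np

# Hypothetical noisy data: y = sin(pi * x) + Gaussian noise
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, 40))
y = np.sin(np.pi * x) + rng.normal(0, 0.2, size=x.size)

# Deterministic hold-out split: even positions train, odd positions test
train_idx, test_idx = np.arange(0, 40, 2), np.arange(1, 40, 2)

def mse(coeffs, idx):
    """Mean squared error of a fitted polynomial on the points in idx."""
    return float(np.mean((np.polyval(coeffs, x[idx]) - y[idx]) ** 2))

results = {}
for degree in (1, 3, 9):  # model complexity = polynomial degree
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    results[degree] = (mse(coeffs, train_idx), mse(coeffs, test_idx))
    print(degree, [round(v, 3) for v in results[degree]])  # (train MSE, test MSE)
```

Training error can only fall as the degree rises; the test column is the one that tells you whether the extra complexity is fitting signal or noise.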

1/5/08
Unsupervised Learning
• Learn trends, patterns
• No labels or feedback
• Learning “what normally happens”

• Clustering: Grouping similar instances

• Example applications
• Customer segmentation in CRM, e.g., targeted mailings
• Image segmentation: find objects
• Genomics: group individuals by genetic similarity

36
Slide credit: Eric Eaton
Unsupervised Learning

• Given $x_1, x_2, \dots, x_n$ (without labels)

• Output the hidden structure behind the $x$'s
– E.g., clustering

Slide credit: Eric Eaton
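Clustering can be illustrated with k-means, one common algorithm for it (named here as our choice, not the slide's). The sketch below alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. It uses a simple deterministic farthest-point initialisation so the result is reproducible; k-means++ is a randomised refinement of that idea. The six points are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    X = np.asarray(X, dtype=float)
    # initialisation: start at the first point, then repeatedly add the point
    # farthest from all centroids chosen so far
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabelled points forming two groups
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(np.round(centroids, 2))
```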

Applications

• Classification: aka Pattern recognition

• Face recognition: Pose, lighting, occlusion (glasses,
beard), make-up, hair style
• Character recognition: Different handwriting styles.
• Speech recognition: Temporal dependency.
• Use of a dictionary or the syntax of the language.
• Sensor fusion: Combine multiple modalities, e.g., visual (lip image) and acoustic for speech
• Medical diagnosis: From symptoms to illnesses
Autonomous cars

• Nevada made it legal for autonomous cars to drive on roads in June 2011
• As of 2017, 29 states had enacted legislation regarding autonomous cars
• Today: Autopilot and FSD (supervised) cars

Slide credit: Eric Eaton

Autonomous car technology
Current state-of-the-art: Neural Networks and Deep Learning
[Figure: Learning techniques (today): prediction accuracy vs. explainability (notional). Techniques plotted include neural nets, deep learning, ensemble methods, random forests, graphical models, Bayesian belief nets, SRL, CRFs, HBNs, AOGs, MLNs, statistical models, Markov models, decision trees and SVMs.]

Most machine learning methods work well because of human-designed representations and input features.
ML becomes just optimizing weights to best make a final prediction.
Neural Network Introduction
Weights

Activation functions

How do we train?

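For a network with 3 inputs, one hidden layer of 4 neurons and an output layer of 2 neurons:
• 4 + 2 = 6 neurons (not counting inputs)
• [3 × 4] + [4 × 2] = 20 weights
• 4 + 2 = 6 biases
• 26 learnable parameters in total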
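The parameter count above can be checked in code. The sketch below builds a 3-4-2 network from weight matrices and bias vectors, counts the parameters, and runs one forward pass. The choice of ReLU for the hidden layer and a linear output layer is our assumption for illustration; training (adjusting these 26 numbers by gradient descent) is not shown.

```python
import numpy as np

def count_parameters(layer_sizes):
    """Weights between consecutive layers, plus one bias per non-input neuron."""
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases

def relu(z):
    return np.maximum(0.0, z)

def forward(x, params):
    """Forward pass: ReLU hidden layers, linear output layer (an assumption)."""
    *hidden, (W_out, b_out) = params
    h = x
    for W, b in hidden:
        h = relu(h @ W + b)
    return h @ W_out + b_out

rng = np.random.default_rng(0)
sizes = [3, 4, 2]  # 3 inputs, 4 hidden neurons, 2 outputs
params = [(rng.normal(size=(m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

weights, biases = count_parameters(sizes)
print(weights, biases, weights + biases)  # 20 6 26
out = forward(np.array([0.5, -1.0, 2.0]), params)
print(out.shape)                          # (2,)
```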

Demo
Machine Learning in Automatic Speech Recognition
• A Typical Speech Recognition System
• ML is used to predict phone states from the sound spectrogram
Deep learning has state-of-the-art results
# Hidden layers:       1     2     4     8     10    12
Word error rate (%):   16.0  12.8  11.4  10.9  11.0  11.1

Baseline GMM performance = 15.4%

[Zeiler et al. “On rectified linear units for speech recognition” ICASSP 2013]
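The table reports word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognised transcript into the reference, divided by the number of reference words. A minimal sketch using a word-level edit-distance table; the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit-distance) table."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```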
Example: NN used to read digit ‘4’

[Link]
Why is Deep Learning useful?
• Deep learning provides a very flexible, universal, learnable framework for representing visual and linguistic information.
• It is relatively easy to train and can be used for both supervised and unsupervised learning
• It can make use of large amounts of training data
• Around 2010, DL started outperforming other ML techniques, first in speech and vision, then in NLP
• Popular architectures: Perceptron, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Autoencoders
• Popular platforms: Keras, TensorFlow and PyTorch

Google Trend of Popularity

Generative Adversarial Networks (GAN)

• Example: Image Super-Resolution

[Link]
ZhGrHRfFI6UUYOo
Reinforcement Learning

• Given a sequence of states and actions with (delayed) rewards, output a policy
• A policy is a mapping from states → actions that tells you what to do in a given state
• Examples:
• Credit assignment problem
• Game playing
• Robot in a maze
• Balance a pole on your hand
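A "robot in a maze" can be shrunk to a five-cell corridor to show how a policy is learned from delayed rewards. The sketch below uses tabular Q-learning, one standard reinforcement-learning algorithm chosen here for illustration; the environment, reward of +1 at the goal, and the values of alpha, gamma and epsilon are all hypothetical. Only the final step earns a reward, yet the value of moving right propagates back to the start state.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical corridor environment: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    best = max(q[state])
    # break ties at random so an all-zero table still explores
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = greedy(q, state, rng)  # exploit
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q = q_learning()
# The learned policy: the best action in each non-terminal state
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```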
The Agent-Environment Interface

Slide credit: Sutton & Barto

Reinforcement Learning…

• [Link]
References

• Computational and Inferential Thinking
• Simple Linear Regression
• Ian Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (3rd edition). Elsevier, 2011, Chapter 4.

Data Science Lec 9
No ratings yet
Data Science Lec 9
9 pages
Understanding Machine Learning Concepts
No ratings yet
Understanding Machine Learning Concepts
29 pages
ML Complete Notes
No ratings yet
ML Complete Notes
23 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
42 pages
Supervised Learning in Machine Learning
No ratings yet
Supervised Learning in Machine Learning
42 pages
Unit-1 Machine Learning Techniques
No ratings yet
Unit-1 Machine Learning Techniques
10 pages
Machine Learning Techniques Overview
100% (1)
Machine Learning Techniques Overview
113 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
19 pages
Machine Learning Concepts and Techniques
No ratings yet
Machine Learning Concepts and Techniques
13 pages
INT522 Unit-5
No ratings yet
INT522 Unit-5
59 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
89 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
63 pages
Chapter - 01 - Introduction To ML
No ratings yet
Chapter - 01 - Introduction To ML
60 pages
Overview of Machine Learning Concepts
No ratings yet
Overview of Machine Learning Concepts
62 pages
ML Unit 1
No ratings yet
ML Unit 1
94 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
19 pages
Linier & Logistic
No ratings yet
Linier & Logistic
15 pages
Machine Learning Fundamentals Explained
No ratings yet
Machine Learning Fundamentals Explained
24 pages
Understanding Machine Learning Concepts
No ratings yet
Understanding Machine Learning Concepts
68 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
128 pages
Machine Learning Overview and Techniques
No ratings yet
Machine Learning Overview and Techniques
445 pages
Machine Learning Techniques Overview
No ratings yet
Machine Learning Techniques Overview
145 pages
Understanding Supervised Machine Learning
No ratings yet
Understanding Supervised Machine Learning
45 pages
Eml JHP Lec 01
No ratings yet
Eml JHP Lec 01
51 pages
Understanding Machine Learning Basics
No ratings yet
Understanding Machine Learning Basics
24 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
38 pages
Fundamentals of Machine Learning Unit 1
No ratings yet
Fundamentals of Machine Learning Unit 1
9 pages
Machine Learning Workshop Overview
No ratings yet
Machine Learning Workshop Overview
44 pages
Overview of Machine Learning Types
No ratings yet
Overview of Machine Learning Types
6 pages
Machine Learning Basics with Python
No ratings yet
Machine Learning Basics with Python
54 pages
Understanding Machine Learning Basics
No ratings yet
Understanding Machine Learning Basics
38 pages
Introduction to Machine Learning Concepts
100% (1)
Introduction to Machine Learning Concepts
83 pages
Intro to Machine Learning Concepts
No ratings yet
Intro to Machine Learning Concepts
17 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
51 pages
Machine Learning
No ratings yet
Machine Learning
119 pages
Types and Techniques in Machine Learning
No ratings yet
Types and Techniques in Machine Learning
9 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
15 pages
Data Science Principles: Machine Learning
No ratings yet
Data Science Principles: Machine Learning
28 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
19 pages
ML Chapter1 English
No ratings yet
ML Chapter1 English
14 pages
Machine Learning IV
No ratings yet
Machine Learning IV
79 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
80 pages
Unit IV Supervised Machine Learning For Financial Data Analysis
No ratings yet
Unit IV Supervised Machine Learning For Financial Data Analysis
60 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
50 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
316 pages
Machine Learning Overview and Applications
No ratings yet
Machine Learning Overview and Applications
73 pages
Data Science Principles in Machine Learning
No ratings yet
Data Science Principles in Machine Learning
28 pages
ML Chapter1 Summary
No ratings yet
ML Chapter1 Summary
14 pages
CS-E4715: Supervised ML Overview
No ratings yet
CS-E4715: Supervised ML Overview
37 pages
MLC Semester Si
No ratings yet
MLC Semester Si
33 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
47 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
42 pages
Geocluster Mod in Machine Learning
No ratings yet
Geocluster Mod in Machine Learning
124 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
16 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
94 pages
Flaw Comparison in Cloth Suppliers
No ratings yet
Flaw Comparison in Cloth Suppliers
36 pages
Multiple Linear Regression Analysis Guide
No ratings yet
Multiple Linear Regression Analysis Guide
16 pages
Crash Pulse Model for Injury Prediction
No ratings yet
Crash Pulse Model for Injury Prediction
5 pages
Regression Models in Machine Learning
No ratings yet
Regression Models in Machine Learning
15 pages
Return On Equity-Debt To Equity Ratio
No ratings yet
Return On Equity-Debt To Equity Ratio
14 pages
Motor Yacht Displacement Estimation Techniques
100% (1)
Motor Yacht Displacement Estimation Techniques
14 pages
Mulitple Linear Regression
No ratings yet
Mulitple Linear Regression
54 pages
Econometric Techniques and Models Explained
No ratings yet
Econometric Techniques and Models Explained
6 pages
Understanding Multicollinearity in Regression
No ratings yet
Understanding Multicollinearity in Regression
8 pages
DA4139 Level II CFA Mock Exam 1 Afternoon
100% (1)
DA4139 Level II CFA Mock Exam 1 Afternoon
34 pages
Understanding Factor Analysis Methods
100% (1)
Understanding Factor Analysis Methods
54 pages
Tax Compliance in Small Businesses Uganda
No ratings yet
Tax Compliance in Small Businesses Uganda
21 pages
Impact of Cell Towers on Property Prices
No ratings yet
Impact of Cell Towers on Property Prices
22 pages
Mauritius Economic Growth Analysis
No ratings yet
Mauritius Economic Growth Analysis
35 pages
795-Article Text-3910-2-10-20251219
No ratings yet
795-Article Text-3910-2-10-20251219
14 pages
Customer Revisit Intention at Decade Coffee Shop
No ratings yet
Customer Revisit Intention at Decade Coffee Shop
10 pages
Enhancing Self-Employment for TVET Graduates
No ratings yet
Enhancing Self-Employment for TVET Graduates
110 pages
The Relationship Between Working Capital Management and Profitability: A Case Study of Cement Industry in Pakistan
No ratings yet
The Relationship Between Working Capital Management and Profitability: A Case Study of Cement Industry in Pakistan
8 pages
Zakat's Effect on Indonesia's HDI
No ratings yet
Zakat's Effect on Indonesia's HDI
12 pages
Wage Determinants in SOEs vs. Private Sector
No ratings yet
Wage Determinants in SOEs vs. Private Sector
15 pages
UAV Flight Time Prediction Framework
No ratings yet
UAV Flight Time Prediction Framework
5 pages
Communication and Job Satisfaction in BPS Binjai
No ratings yet
Communication and Job Satisfaction in BPS Binjai
4 pages
Regression Webinar
No ratings yet
Regression Webinar
31 pages
Macroeconomic Trends in Sub-Saharan Africa
No ratings yet
Macroeconomic Trends in Sub-Saharan Africa
19 pages
Gen Z Investment Behavior Insights
No ratings yet
Gen Z Investment Behavior Insights
15 pages
Affiliate Marketing's Impact on Gen Z Purchases
No ratings yet
Affiliate Marketing's Impact on Gen Z Purchases
9 pages
Service Quality & Delivery Speed Impact on Customer Satisfaction
No ratings yet
Service Quality & Delivery Speed Impact on Customer Satisfaction
15 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
10 pages
15-Day Statistics Crash Course
No ratings yet
15-Day Statistics Crash Course
47 pages
Intra-Industry Trade in Australian Manufacturing
No ratings yet
Intra-Industry Trade in Australian Manufacturing
23 pages

Week11 - COMP517 S2 2025

Uploaded by

Week11 - COMP517 S2 2025

Uploaded by

Week 11 Machine Learning-

• Recap of Linear Association

• Must be linear relationship between the dependent and

There are 2 ways multicollinearity is usually checked

R2 is the proportion of the variation in the dependent variable that is predictable

(A rule of thumb is that if VIF is higher than 5, then multicollinearity is high, a

Homoscedasticity essentially means ‘same variance' and is an

• The example plot below indicates Heteroscedasticity and its

Image source: [Link]

“Learning is any process by which a system improves

Definition by Tom Mitchell (1998):

• Machine learning is the programming of computers to

• Learning maybe used when:

• Learning isn’t always useful

Machine Learning: Field of study that gives computers the

A. L. Samuel, "Some Studies in Machine Learning Using the Game of

Slide credit: Ray Mooney

• “Learning denotes changes in the system that are

The ability to perform a task in a situation which has

Slide credit: Eric Eaton

• Model (classification) accuracy

Slide credit: Ray Mooney

Slide credit: Geoffrey Hinton

• Retail: Market basket analysis, Customer relationship

• Supervised (inductive) learning

• Learning a concept C from examples

Draw a line through these dots.

Discriminant: IF income > θ1 AND savings > θ2

Slide credit: Eric Eaton

Example: Processing Loan Applications

• Market Basket analysis:

Example: P ( chips | beer ) = 0.7

1. Representation: which features to use?

• Clustering: Grouping similar instances

• Given , , … , (without labels)

Slide credit: Eric Eaton

• Classification: aka Pattern recognition

• Nevada made it legal for autonomous cars to drive on roads

Slide credit: Eric Eaton

Most machine learning methods work well because of human-designed

4 + 2 = 6 neurons (not counting inputs)

• ML used to predict of phone states from the sound

Word Error Rate % 16.0 12.8 11.4 10.9 11.0 11.1

Baseline GMM performance = 15.4%

Google Trend of Popularity

• Example: Image Super-Resolution

• Given a sequence of states and actions with (delayed)

Slide credit: Sutton & Barto

• Computational and Inferential Thinking

You might also like