Week 11 Machine Learning: An Introduction
COMP517 Data Analysis
SCHOOL OF ENGINEERING, COMPUTER AND
MATHEMATICAL SCIENCES
Announcements
Upcoming submissions:
• Lab Week 10 due 17 October (3%)
• No extensions or SCA are applicable to this assessment
• Group Assignment 2 (50%)
• Due date is Friday 24th October, 11:59 p.m.
Overview
• Recap of Linear Association
• Assumptions for Linear Regression
• An Introduction to Machine Learning
Assumptions
• There must be a linear relationship between the dependent and
independent variables
• Normally distributed error (residual)
• No Multicollinearity
• Multicollinearity exists when two or more of the explanatory
variables are highly correlated
• Homoscedasticity
• The variance of the residuals must be constant across predicted
values
Checking for Multicollinearity
There are 2 ways multicollinearity is usually checked
1) Correlation Matrix (Heat Map)
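A minimal sketch of the correlation-matrix check, assuming NumPy is available. The three predictor columns (x1, x2, x3) are made-up data for illustration; x2 is built to be nearly a copy of x1, so that pair should show a correlation close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)    # hypothetical: nearly a copy of x1
x3 = rng.normal(size=200)               # hypothetical: unrelated to x1 and x2

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)     # 3 x 3 matrix of pairwise correlations
print(np.round(corr, 2))
```

Off-diagonal entries close to +1 or -1 flag pairs of explanatory variables that may be collinear; a heat map is simply a coloured view of this matrix.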
2) Variance Inflation Factor (VIF)
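VIF is the ratio of the variance of a coefficient in a model with multiple
terms to the variance of that coefficient in a model with that term alone.
$\mathrm{VIF}_j = \dfrac{1}{1 - R_j^2}$
($R_j^2$ = coefficient of determination from regressing explanatory variable $j$ on the other explanatory variables)
In general, $R^2$ is the proportion of the variation in the dependent variable that is predictable
from the independent variable(s).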
(A rule of thumb is that a VIF higher than 10 indicates high multicollinearity; a
cutoff of 5 is also commonly used. Depending on the industry, some people
treat values above 2.5 as high.)
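The VIF calculation can be sketched with NumPy alone, using the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the remaining columns. The function name `vif` and the data are assumptions made for illustration, not part of the lecture material.

```python
import numpy as np

def vif(X):
    """Return one VIF per column of X, using VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]                                  # treat column j as the response
        others = np.delete(X, j, axis=1)             # all remaining columns
        A = np.column_stack([np.ones(n), others])    # add an intercept term
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()             # R^2 of column j on the rest
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # hypothetical: highly correlated with x1
x3 = rng.normal(size=200)              # hypothetical: independent of the others
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))
```

In this made-up data the first two columns should show a large VIF and the third a VIF near 1.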
Homoscedasticity
Homoscedasticity essentially means ‘same variance' and is an
important concept in linear regression.
• Homoscedasticity means that the variance of the error term (the noise
or disturbance in the relationship between independent and dependent
variables) is the same across the values of the independent
variables.
• The variance of the residuals is constant across observations.
• As the predicted value of the dependent variable changes, the spread of the error term
does not vary much.
Detecting Homoscedasticity
• The example plot below indicates Heteroscedasticity and its
classic cone or fan shape.
[Link]
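One rough way to see the fan shape numerically is to compare the residual variance for low and high fitted values. The sketch below assumes NumPy and uses simulated data whose noise grows with x; the split into two halves is an informal check chosen for illustration, not a formal statistical test.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 400)
y = 2.0 * x + rng.normal(scale=0.5 * x)    # hypothetical data: noise grows with x

slope, intercept = np.polyfit(x, y, deg=1) # least-squares straight line
fitted = slope * x + intercept
resid = y - fitted

order = np.argsort(fitted)                 # sort observations by fitted value
half = len(x) // 2
var_low = resid[order[:half]].var()        # residual variance, low fitted values
var_high = resid[order[half:]].var()       # residual variance, high fitted values
ratio = var_high / var_low
print(f"residual variance ratio (high / low fitted values): {ratio:.2f}")
```

A ratio near 1 is consistent with homoscedasticity; a ratio well above (or below) 1 suggests the residual spread changes with the fitted values.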
Data Science and Related Disciplines
Image source: [Link]
What is Machine Learning?
“Learning is any process by which a system improves
performance from experience.”
- Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Slide credit: Eric Eaton
Traditional Programming vs. Machine
Learning
What is Machine Learning?...
• Machine learning is the programming of computers to
optimize a performance criterion using example data or
past experience.
• Role of Statistics:
• Inference from a sample
• Role of Computer Science is to build efficient algorithms
to:
• Solve the optimization problem
• Representing and evaluating the model for inference
Why “Learn” ?
• Learning may be used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech
recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
• Learning isn’t always useful
• There is no need to “learn” to calculate payroll
Samuel’s Checkers-Player
“Machine Learning: Field of study that gives computers the
ability to learn without being explicitly programmed.”
-Arthur Samuel (1959)
A. L. Samuel, "Some Studies in Machine Learning Using the Game of
Checkers," in IBM Journal of Research and Development, vol. 3, no. 3, pp.
210-229, July 1959, doi: 10.1147/rd.33.0210.
Slide credit: Eric Eaton
Defining the Learning Task
Improve on task T, with respect to performance metric P,
based on experience E
• Task T: Playing checkers
• P: Percentage of games won against an arbitrary opponent
• E: Playing practice games against itself
• Task T: Recognizing hand-written words
• P: Percentage of words correctly classified
• E: Database of human-labelled images of handwritten words
• Task T: Driving on four-lane highways using vision sensors
• P: Average distance travelled before a human-judged error
• E: A sequence of images and steering commands recorded while
observing a human driver.
Slide credit: Ray Mooney
Learning = Generalization
• “Learning denotes changes in the system that are
adaptive in the sense that they enable the system to do
the task or tasks drawn from the same population more
efficiently and more effectively the next time.”
- H. Simon
The ability to perform a task in a situation which has
never been encountered before
Slide credit: Eric Eaton
Measuring Performance
• Model (classification) accuracy
• Solution correctness
• Solution quality (length, efficiency)
• Speed of performance
Slide credit: Ray Mooney
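As an illustration of the first measure, classification accuracy is the fraction of predictions that match the true labels. The function name and the labels below are made up for this sketch.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels: 4 of the 5 predictions agree with the truth.
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(y_true, y_pred))
```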
Tasks that are best solved by using a learning
algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power
plant
• Prediction:
– Future stock prices or currency exchange rates
Slide credit: Geoffrey Hinton
NOTE: see the article on Geoffrey Hinton sounding the alarm about AI: [Link]
Sample Applications
• Retail: Market basket analysis, Customer relationship
management (CRM)
• Finance: Credit scoring, fraud detection
• Manufacturing: Optimization, troubleshooting
• Medicine: Medical diagnosis, treatments
• Telecommunications: Quality of service optimization
• Bioinformatics: Motifs, DNA/RNA sequence alignment
• Web mining: Search engines
• And many more…
Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
Supervised vs. Unsupervised Learning
Supervised Learning
• Learning a concept C from examples
• Family car (vs. sports cars)
• “A” student (vs. all other students)
• Blockbuster movie (vs. all other movies)
• Input: N labeled examples
• Representation: descriptive features
• These define the “feature space”
• Goal: given labeled examples <input x, output g(x)>,
learn a good approximation to g
• Minimize number of errors on new x’s
1/5/08 22
Supervised learning:
Regression
Regression
Draw a line through these dots.
Supervised Learning: Regression…
• Given $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
• Learn a function $f(x)$ to predict $y$ given $x$
– $y$ is real-valued == regression
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
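"Drawing a line through the dots" can be sketched as a least-squares fit. The example assumes NumPy; the five (x, y) pairs are invented for illustration and are not the data from the slide.

```python
import numpy as np

# Hypothetical (x, y) pairs; y is real-valued, so this is a regression task.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares straight line

def f(x_new):
    """Predict y for a new x using the fitted line."""
    return slope * x_new + intercept

print(f"f(x) = {slope:.2f} * x + {intercept:.2f}")
print(f"prediction at x = 6: {f(6.0):.2f}")
```

The learned function f is then used on inputs that were not in the training data, which is the point of the exercise.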
Example: Traffic Jams Prediction
Supervised Learning: Binary Classification
Example Credit scoring
• Assign one of two
labels (i.e. low-risk /
high-risk) to the
input (here
customers’ income
and savings)
• Classification
requires a model (a
classifier) to
determine which
label to assign to
items
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
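The discriminant above translates directly into code. In this sketch the threshold values for θ1 and θ2 are arbitrary placeholders; in a learning setting they would be estimated from labeled customer data rather than set by hand.

```python
def credit_risk(income, savings, theta1=40_000, theta2=10_000):
    """Apply the rule: IF income > theta1 AND savings > theta2 THEN low-risk."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

# Hypothetical customers
print(credit_risk(income=55_000, savings=20_000))
print(credit_risk(income=55_000, savings=5_000))
```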
Supervised Learning: Classification
• Given $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
• Learn a function $f(x)$ to predict $y$ given $x$
– $y$ is categorical == Classification
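To make "learn a function to predict a categorical y" concrete, here is a sketch of a 1-nearest-neighbour classifier. This technique is not named on the slides; it is used here only because it is short. The training points and labels are invented, and NumPy is assumed.

```python
import numpy as np

def predict_1nn(X_train, y_train, x_new):
    """Return the label of the training point closest to x_new."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    return y_train[int(np.argmin(distances))]

# Hypothetical training data: two features per example, two classes.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [6.0, 5.0], [7.0, 6.5]])
y_train = ["class A", "class A", "class B", "class B"]

print(predict_1nn(X_train, y_train, np.array([1.2, 1.4])))
print(predict_1nn(X_train, y_train, np.array([6.5, 6.0])))
```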
Supervised Learning: Classification…
• $x$ can be multi-dimensional
• Each dimension corresponds to an
attribute/feature:
• Age
• Tumour size
• Colour
• Distance from optic nerve
• Cell type is the most telling feature, but it’s risky
to do a biopsy of the eye
• ML can help determine when a feature is needed
Slide credit: Eric Eaton
Supervised Learning: Classification…
Example: Processing Loan Applications
• Given: questionnaire with financial and personal
information
• Problem: should money be lent?
• Simple statistical method covers 90% of cases;
borderline cases referred to loan officers
• But: 50% of accepted borderline cases defaulted!
• Solution(?): reject all borderline cases
Processing Loan Applications
• Input consisted of 1000 training examples of
borderline cases
• Each example tracked 20 attributes that included:
Age, Years with current Employer, Years with the
bank, other credit cards, etc.
• A machine learning procedure produced a small set
of classification rules that made predictions on an
independently chosen test set
• Correct predictions were made on around 67% of
this test set
• These rules not only improved the success rate, but
also helped the company to explain to customers
why their applications were declined
Decision Trees
• Another popular classification technique is the Decision Tree
Example: Contact Lens prescription
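A decision tree can be read as a nested sequence of attribute tests. The sketch below is loosely modelled on the contact-lens data set discussed in Witten and Frank; the attribute names and the exact splits are assumptions made for illustration and may differ from the tree shown on the slide.

```python
def recommend_lens(tear_rate, astigmatic, prescription):
    """Walk a small hand-written decision tree from the root to a leaf."""
    if tear_rate == "reduced":          # root test
        return "none"
    if not astigmatic:                  # second-level test
        return "soft"
    if prescription == "myope":         # third-level test
        return "hard"
    return "none"

# Hypothetical patients
print(recommend_lens("reduced", astigmatic=False, prescription="myope"))
print(recommend_lens("normal", astigmatic=False, prescription="hypermetrope"))
print(recommend_lens("normal", astigmatic=True, prescription="myope"))
```

A learning algorithm would choose these tests and their order from data; here they are fixed by hand to show the structure only.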
Association Rules
• Market basket analysis
• A market basket is a collection of items purchased by a customer in
a transaction
• The goal here is to track items that are frequently purchased
together – hence this falls into the Association Rules category
• This will enable Managers to better target customers and thereby
increase sales
• Finding Frequent Itemsets
• Need to first define “frequent” – the support of an itemset is the
fraction of transactions that contain all the items in the itemset
• For example, we may be interested in finding all itemsets which
have 75% or more support
• Thus for our basket, we need to identify all frequently occurring
• Single items,
• Pairs of items
• Triples and so on
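The support definition and the 75% threshold can be sketched as a brute-force search over single items, pairs and triples. The four baskets are invented for illustration; practical tools use algorithms such as Apriori to prune the search instead of enumerating every candidate.

```python
from itertools import combinations

# Hypothetical transactions (market baskets)
transactions = [
    {"bread", "milk", "beer"},
    {"bread", "milk"},
    {"bread", "milk", "chips"},
    {"bread", "milk", "beer", "chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(itemset <= t for t in transactions)   # subset test per basket
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support=0.75, max_size=3):
    """Brute-force search over all itemsets of up to max_size items."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result

frequent = frequent_itemsets(transactions)
print(frequent)
```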
Learning Associations
• Market Basket analysis:
P (Y | X ) probability that somebody who buys X also buys Y
where X and Y are products/services.
Example: P ( chips | beer ) = 0.7
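P(Y | X) can be estimated from transaction data as the number of baskets containing both X and Y divided by the number containing X. The baskets below are made up so that the estimate matches the 0.7 in the example.

```python
def conditional_probability(x, y, transactions):
    """Estimate P(y | x) = count(x and y) / count(x) from transactions."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        raise ValueError(f"no transaction contains {x!r}")
    return sum(y in t for t in with_x) / len(with_x)

# Hypothetical baskets: 10 contain beer, and 7 of those also contain chips.
transactions = (
    [{"beer", "chips"}] * 7
    + [{"beer"}] * 3
    + [{"chips"}] * 2
    + [{"milk"}] * 4
)
print(conditional_probability("beer", "chips", transactions))
```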
Classification Algorithms
Issues in Supervised Learning
1. Representation: which features to use?
2. Model Selection: complexity, noise, bias
3. Evaluation: how well does it perform?
Unsupervised Learning
• Learn trends, patterns
• No labels or feedback
• Learning “what normally happens”
• Clustering: Grouping similar instances
• Example applications
• Customer segmentation in CRM : e.g., targeted mailings
• Image segmentation: find objects
• Genomics: group individuals by genetic similarity
Slide credit: Eric Eaton
Unsupervised Learning
• Given $x_1, x_2, \ldots, x_n$ (without labels)
• Output hidden structure behind the x’s
– E.g., clustering
Slide credit: Eric Eaton
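Clustering can be sketched with k-means, one common algorithm for grouping unlabeled points (the slides do not name a specific algorithm). The example assumes NumPy, uses six invented points, and initialises the centroids with the first k points, which is a simplification chosen to keep the run deterministic.

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = X[:k].copy()                     # simplistic init: first k points
    for _ in range(n_iter):
        # distance from every point to every centroid, shape (n_points, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)                # nearest centroid per point
        for j in range(k):
            if np.any(labels == j):              # skip empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Hypothetical unlabeled data: two visibly separate groups
X = np.array([[1.0, 1.0], [8.0, 8.0], [1.2, 0.8],
              [7.8, 8.2], [0.9, 1.1], [8.1, 7.9]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(np.round(centroids, 2))
```

No labels are supplied; the grouping is the "hidden structure" recovered from the x's alone.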
Applications
• Classification: aka Pattern recognition
• Face recognition: Pose, lighting, occlusion (glasses,
beard), make-up, hair style
• Character recognition: Different handwriting styles.
• Speech recognition: Temporal dependency.
• Use of a dictionary or the syntax of the language.
• Sensor fusion: Combine multiple modalities; e.g., visual
(lip image) and acoustic for speech
• Medical diagnosis: From symptoms to illnesses
Autonomous cars
• Nevada made it legal for autonomous cars to drive on roads
in June 2011
• As of 2017, 29 states have enacted legislation regarding
autonomous cars
• Today – autopilot and FSD (supervised) cars
Slide credit: Eric Eaton
Autonomous car technology
Current state-of-the-art: Neural
Networks and Deep Learning
[Figure: Learning Techniques (today), plotted by Prediction Accuracy against Explainability (notional). Labels: Neural Nets, Deep Learning, Graphical Models, Ensemble Methods, Bayesian Belief Nets, SRL, Random Forests, CRFs, HBNs, AOGs, Statistical Models, MLNs, Decision Trees, Markov Models, SVMs]
Most machine learning methods work well because of human-designed
representations and input features
ML becomes just optimizing weights to best make a final prediction
Neural Network Introduction
Weights
Activation functions
How do we train?
4 + 2 = 6 neurons (not counting inputs)
[3 x 4] + [4 x 2] = 20 weights
4 + 2 = 6 biases
26 learnable parameters
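The parameter count can be reproduced for a network with 3 inputs, 4 hidden neurons and 2 outputs. The sketch assumes NumPy; the random weights and the ReLU activation in the forward pass are assumptions for illustration, since the slide does not specify them.

```python
import numpy as np

layer_sizes = [3, 4, 2]   # inputs, hidden neurons, output neurons

n_neurons = sum(layer_sizes[1:])                                  # 4 + 2
n_weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
n_biases = sum(layer_sizes[1:])                                   # one per neuron
n_params = n_weights + n_biases
print(n_neurons, n_weights, n_biases, n_params)

# Forward pass with randomly initialised weights (untrained network)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

def forward(x):
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU activation (an assumption)
    return hidden @ W2 + b2                 # two output values

out = forward(np.array([1.0, 2.0, 3.0]))
print(out.shape)
```

Training adjusts the entries of W1, b1, W2 and b2, which together make up the learnable parameters counted above.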
Demo
Machine Learning in
Automatic Speech Recognition
• A Typical Speech Recognition System
• ML is used to predict phone states from the sound
spectrogram
Deep learning has state-of-the-art results
# Hidden Layers       1     2     4     8     10    12
Word Error Rate %     16.0  12.8  11.4  10.9  11.0  11.1
Baseline GMM performance = 15.4%
[Zeiler et al. “On rectified linear units for speech
recognition” ICASSP 2013]
Example: NN used to read digit ‘4’
[Link]
Why is Deep Learning useful?
• Deep learning provides a very flexible, universal, learnable
framework for representing visual and linguistic information.
• It’s quite easy to learn, and can be used for both unsupervised and supervised learning
• It can utilize large amounts of training data
• In ~2010 DL started outperforming other ML techniques; first in
speech and vision, then NLP
• Popular architectures: Perceptron, Convolutional Neural
Network (CNN), Recurrent Neural Network (RNN), Autoencoders
• Popular platforms: Keras, TensorFlow and PyTorch
Google Trend of Popularity
Generative Adversarial Networks (GAN)
• Example: Image Super-Resolution
[Link]
Reinforcement Learning
• Given a sequence of states and actions with (delayed)
rewards, output a policy
• Policy is a mapping from states → actions that tells you what to do in
a given state
• Examples:
• Credit assignment problem
• Game playing
• Robot in a maze
• Balance a pole on your hand
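The idea of learning a policy from delayed rewards can be sketched with tabular Q-learning on a toy corridor, a simplified stand-in for the "robot in a maze" example. Q-learning is not named on the slides; the corridor, the reward, and the constants below are all assumptions chosen for illustration.

```python
import random

N_STATES = 5                       # corridor cells 0..4; cell 4 is the goal
ACTIONS = {"left": -1, "right": +1}
ALPHA, GAMMA = 0.5, 0.9            # learning rate and discount factor

def step(state, action):
    """Move one cell (walls at both ends); reward 1 only on reaching the goal."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

rng = random.Random(0)             # fixed seed so the run is repeatable
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):               # episodes
    state = 0
    for _ in range(100):           # step limit per episode
        action = rng.choice(list(ACTIONS))          # explore with random actions
        nxt, reward = step(state, action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        # Q-learning update: move the estimate towards reward + discounted future value
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
        if state == N_STATES - 1:  # goal reached, episode ends
            break

# The policy maps each non-goal state to the action with the highest Q-value.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The reward arrives only at the goal, yet the learned policy covers every earlier cell, which is the credit assignment problem in miniature.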
The Agent-Environment Interface
Slide credit: Sutton & Barto
Reinforcement Learning…
• [Link]
References
• Computational and Inferential Thinking
• Simple Linear Regression
• Data Mining: Practical Machine Learning Tools and
Techniques (3rd edition) / Ian Witten, Eibe Frank; Elsevier,
2011, Chapter 4