
Supervised Learning in Machine Learning

This document provides an overview of machine learning and supervised learning techniques. It discusses classification and decision tree learning. Classification involves using attributes to predict a class label, with examples like predicting tumors as benign or malignant. Decision tree learning is a classification technique that builds a tree structure with decision nodes and leaf nodes to classify examples. The document outlines the process of building a decision tree model using a tennis play dataset and attributes like outlook and temperature. It discusses advantages like interpretability and disadvantages like limited tree structures of decision tree learning.

Uploaded by

zombiee hook

Lecture 07

Machine Learning

Mona Leeza
Outline
• Learning
• Supervised Learning
– Classification
– Prediction
Machine Learning
• Supervised (Task Driven): develop a predictive model based on both input and output data
– Classification: Decision Tree, NN, Naïve Bayes, KNN, SVM, Discriminant Analysis, Ensemble Methods, Random Forest
– Regression: Ordinary LSR, Linear Regression, Logistic Regression, MARS, LOESS
• Unsupervised (Data Driven): discover an internal representation from input data only
– Clustering: kMeans, Kmedoids, Fuzzy C-means, Hierarchical, SOM, Hidden Markov Model, Gaussian Mixture
– Dimension Reduction: PCA, LDA
• Reinforcement (algorithms learn to react to an environment)
– Decision Process
– Reward System
– Recommendation Systems
Comparison with Traditional Programming
Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
What is Learning
• Webster's definition of “to learn”:
“To gain knowledge or understanding of, or skill in, by study, instruction or experience”
• Learning a set of new facts
• Learning HOW to do something
• Improving ability of something already learned
Why “Learn” ?
• There is no need to “learn” to calculate payroll
• Learning is used when:
– Human expertise does not exist
– Humans are unable to explain their expertise
(speech recognition)
– Solution changes in time (routing on a computer
network)
– Solution needs to be adapted to particular cases
(user biometrics)
Learning
• Examples
– Walking (motor skills)
– Riding a bike (motor skills)
– Telephone number (memorizing)
– Playing backgammon (strategy)
– Develop scientific theory (abstraction)
– Language
– Recognize fraudulent credit card transactions
– Etc.
Supervised Learning
Supervised Learning
• Supervised learning is the machine learning
task of inferring a function from labeled
training data.
• The training data consist of a set of training
examples.
• In supervised learning, each example is a
pair consisting of an input object (typically
a vector) and a desired output value (also
called the supervisory signal).
Supervised Learning Algorithms
A supervised learning algorithm analyzes the
training data and produces an inferred function, which is called:
• a classifier
– if the output is discrete
OR
• a regression function
– if the output is continuous
Working of supervised learning algorithms

Given a set of training examples of the form:
{ (x1, y1), . . ., (xN, yN) }
a learning algorithm seeks a function
g : X → Y
where X is the input space and Y is the output space and the
function g is an element of some space of possible functions G,
usually called the hypothesis space.
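The search for g inside a hypothesis space G can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the four training pairs are made up, G is restricted to five threshold functions, and "seeking g" is reduced to picking the threshold with the lowest training error.

```python
# Training examples (x_i, y_i): x is a single number, y is a 0/1 label (made-up data).
training = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

def make_threshold(t):
    """One hypothesis g_t in G: predict 1 when x >= t, otherwise 0."""
    return lambda x: 1 if x >= t else 0

def training_error(g, examples):
    """Fraction of training examples that g labels incorrectly."""
    return sum(g(x) != y for x, y in examples) / len(examples)

# Hypothesis space G: one threshold function per candidate value of t.
candidates = [0.5, 1.5, 2.5, 3.5, 4.5]

# "Learning" here is just choosing the member of G with the lowest training error.
best_t = min(candidates, key=lambda t: training_error(make_threshold(t), training))
g = make_threshold(best_t)

print(best_t, g(3.2), g(0.0))
```

Real learning algorithms search far larger hypothesis spaces (trees, lines, networks), but the roles of X, Y, g and G are the same.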
Classification
Classification: Definition
• Given a collection of records (training set )
• Each record contains a set of attributes, one of the
attributes is the class.
• Find a model for class attribute as a function of the values
of other attributes.
• Goal: previously unseen records should be assigned a class
as accurately as possible.

• A test set is used to determine the accuracy of the model.


Usually, the given data set is divided into training and test
sets, with training set used to build the model and test set
used to validate it.
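The train/test procedure described above can be sketched as follows. The function names, the 30% test fraction, the toy records and the fixed stand-in model are all assumptions for illustration; in practice the model would be learned from the training set.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(1 for attributes, label in test_set if model(attributes) == label)
    return correct / len(test_set)

# Hypothetical records: (attribute vector, class label).
records = [((size,), "malignant" if size > 5 else "benign") for size in range(1, 11)]
train, test = split_train_test(records)

def model(attributes):
    """Stand-in for a learned classifier: a fixed rule on the single attribute."""
    return "malignant" if attributes[0] > 5 else "benign"

print(len(train), len(test), accuracy(model, test))
```

Keeping the test records out of model building is what makes the accuracy estimate meaningful for previously unseen records.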
The Basics - General Approach
Step 1 (learning): Training data (attribute vector + class label) → Classification algorithm → Classification rules

The Basics - General Approach
Step 2 (classification): the classification rules are used in two ways
• Test data → estimate classifier accuracy (to avoid overfitting), measured as the % of test set tuples correctly classified
• New data, e.g. (Fahim Anwar, youth, medium) → predict classification: Loan decision? Risky


Examples of Classification Task
Applications
• Predicting tumor cells as benign or malignant
• Classifying credit card transactions as legitimate or
fraudulent
• Categorizing news stories as finance, weather,
entertainment, sports, etc
Different Types of Classifiers
• Back propagation
• Bayesian Classifiers
• Decision Trees
• Density estimation methods
• Fuzzy set theory
• Linear discriminant analysis (LDA)
• Logistic regression
• Naive Bayes classifier
• Nearest neighbor classification
• Neural networks
• Quadratic discriminant analysis (QDA)
• Support Vector Machine
• many more…
Decision Tree Learning
Definition
• A decision tree is a classifier in the form of a tree structure
– Decision node: specifies a test on a single attribute
– Leaf node: indicates the value of the target attribute
– Arc/edge: one outcome of the split on an attribute
– Path: a conjunction of tests that leads to the final decision

Decision trees classify instances or examples by
starting at the root of the tree and moving through it
until a leaf node is reached.
Key Requirements
• Attribute-value description: object or case must be
expressible in terms of a fixed collection of properties
or attributes (e.g., hot, mild, cold).
• Predefined classes (target values): the target
function has discrete output values (single or
multiclass)
• Sufficient data: enough training cases should be
provided to learn the model.
Training Example
• Consider an example of playing tennis
• Attributes (features)
– Outlook, temp, humidity, wind
• Values
– Description of features
– E.g. Outlook values: sunny, cloudy, overcast
• Target
– Play
– Represents the output of the model
• Instances
– Examples D1 to D14 of the dataset
• Concept
– Learn to decide whether to play tennis, i.e. find h from the given data set

Day   Outlook    Temp   Humidity   Wind     Play
D1    Sunny      Hot    High       Weak     No
D2    Sunny      Hot    High       Strong   No
D3    Cloudy     Hot    High       Weak     Yes
D4    Overcast   Mild   High       Weak     Yes
D5    Overcast   Cool   Normal     Weak     Yes
D6    Overcast   Cool   Normal     Strong   No
D7    Cloudy     Cool   Normal     Strong   Yes
D8    Sunny      Mild   High       Weak     No
D9    Sunny      Cool   Normal     Weak     Yes
D10   Overcast   Mild   Normal     Weak     Yes
D11   Sunny      Mild   Normal     Strong   Yes
D12   Cloudy     Mild   High       Strong   Yes
D13   Cloudy     Hot    Normal     Weak     Yes
D14   Overcast   Mild   High       Strong   No
Decision Tree for Play Tennis

Shall we play tennis today?


Advantages of using Decision Tree
• Fast to implement
• Simple to implement, because classification is performed without
much computation
• Results can be converted to a set of easily interpretable rules that
can be used in a knowledge system such as a database, where the
rules are built from the labels of the nodes and the labels of
the arcs
• Can handle continuous and categorical variables
• Can handle noisy data
• Provides a clear indication of which fields are most important
for prediction or classification
Disadvantages of using Decision Tree
• "Univariate" splits: partitioning uses only one attribute
at a time, which limits the types of possible trees
• Large decision trees may be hard to understand
• Performs poorly with many classes and little data
ID3 Algorithm

Other Decision Tree Learning Algorithms


• C4.5
• CART (Classification and Regression Tree)
• CHAID (CHi-squared Automatic Interaction Detection)
• MARS
• QUEST (Quick, Unbiased, Efficient, Statistical Tree)
• SLIQ
• SPRINT
• Many more…..
Decision Tree Construction
• Top-down tree construction schema:
• Examine training database and find best splitting
predicate for the root node
• Partition training database
• Recurse on each child node
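The top-down schema above can be sketched on the play-tennis table. This is an ID3-style sketch under stated assumptions, not the slides' own implementation: the "best splitting predicate" is chosen by information gain (one of the measures listed later in the lecture), the tree is stored as a nested dict, and the function names are made up.

```python
import math
from collections import Counter

RAW = """Sunny Hot High Weak No
Sunny Hot High Strong No
Cloudy Hot High Weak Yes
Overcast Mild High Weak Yes
Overcast Cool Normal Weak Yes
Overcast Cool Normal Strong No
Cloudy Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Overcast Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Cloudy Mild High Strong Yes
Cloudy Hot Normal Weak Yes
Overcast Mild High Strong No"""
COLUMNS = ["Outlook", "Temp", "Humidity", "Wind", "Play"]
rows = [dict(zip(COLUMNS, line.split())) for line in RAW.splitlines()]

def entropy(rows, target):
    """Expected information (in bits) needed to classify a row."""
    total = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def best_attribute(rows, attributes, target):
    """Attribute whose split leaves the least entropy, i.e. has the highest gain."""
    def gain(attribute):
        remainder = 0.0
        for value in {r[attribute] for r in rows}:
            part = [r for r in rows if r[attribute] == value]
            remainder += len(part) / len(rows) * entropy(part, target)
        return entropy(rows, target) - remainder
    return max(attributes, key=gain)

def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure partition: make a leaf
        return labels[0]
    if not attributes:                 # nothing left to test: majority leaf
        return Counter(labels).most_common(1)[0][0]
    attribute = best_attribute(rows, attributes, target)   # 1. find best split
    remaining = [a for a in attributes if a != attribute]
    children = {}
    for value in sorted({r[attribute] for r in rows}):      # 2. partition
        part = [r for r in rows if r[attribute] == value]
        children[value] = build_tree(part, remaining, target)  # 3. recurse
    return {attribute: children}

tree = build_tree(rows, COLUMNS[:-1], "Play")
print(tree)
```

On this table the sketch puts Outlook at the root, then tests Humidity under Sunny and Wind under Overcast, while the Cloudy branch is a pure "Yes" leaf.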
Decision Tree Induction - Attribute Selection Measures
• Attribute selection measure: a heuristic for
selecting the splitting criterion that “best” splits a
given data partition into smaller, mutually exclusive
classes
• Attributes are ranked according to a measure
– the attribute having the best score is chosen as the
splitting attribute
– a split-point is chosen for continuous attributes
– a splitting subset is chosen for discrete attributes with
binary trees
• Measures: Information Gain, Gain Ratio, Gini Index
Decision Tree Induction - Attribute Selection
Measures
Information Gain
• Based on Shannon’s information theory
• Goal is to minimize the expected number of tests needed to
classify a tuple
– guarantees that a simple (but not necessarily the simplest) tree is found
• The attribute with the highest information gain is chosen as the
splitting attribute
– minimizes the information needed to classify tuples in the
resulting partitions
– reflects the least “impurity” in the resulting partitions
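For reference, the quantities used in the worked example can be written out as below. These are the standard definitions of information gain; the notation is an assumption: D is a data partition with m classes, p_i is the fraction of tuples in D belonging to class C_i, and attribute A splits D into partitions D_1, ..., D_v.

```latex
\[
\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i
\]
\[
\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)
\]
\[
\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)
\]
```

Info(D) is the entropy of D, Info_A(D) is the entropy still remaining after splitting on A, and Gain(A) is the reduction obtained by the split.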
Decision Tree Induction - Attribute Selection Measures
Training set D (class counts: C1 = yes = 9, C2 = no = 5)

RID   age           income   student   credit_rating   Class: buys_computer
1     youth         high     no        fair            no
2     youth         high     no        excellent       no
3     middle aged   high     no        fair            yes
4     senior        medium   no        fair            yes
5     senior        low      yes       fair            yes
6     senior        low      yes       excellent       no
7     middle aged   low      yes       excellent       yes
8     youth         medium   no        fair            no
9     youth         low      yes       fair            yes
10    senior        medium   yes       fair            yes
11    youth         medium   yes       excellent       yes
12    middle aged   medium   no        excellent       yes
13    middle aged   high     yes       fair            yes
14    senior        medium   no        excellent       no

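1- compute Info(D) = I(9,5) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.940 bits
2- compute Info_age(D) = 5/14 I(2,3) + 4/14 I(4,0) + 5/14 I(3,2) = 0.694 bits
3- compute Gain(age) = 0.940 - 0.694 = 0.246 bits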


Decision Tree Induction - Attribute Selection Measures
Repeating the computation for every attribute of the buys_computer training set gives:
Gain(age) = 0.246 bits
Gain(income) = 0.029 bits
Gain(student) = 0.151 bits
Gain(credit_rating) = 0.048 bits
age has the highest information gain, so it is selected as the splitting attribute.
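The four gains can be recomputed from the 14 training tuples with a short sketch. The function names are assumptions, and "middle_aged" is written with an underscore only so that rows can be split on whitespace. Computed without intermediate rounding, the values may differ from the slide's figures in the third decimal place (for example about 0.247 instead of 0.246 for age), because the slide subtracts already rounded numbers.

```python
import math
from collections import Counter

RAW = """
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
D = [dict(zip(COLUMNS, line.split())) for line in RAW.strip().splitlines()]

def info(rows, target="buys_computer"):
    """Info(D): expected information (entropy) of the class label, in bits."""
    total = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_after_split(rows, attribute):
    """Info_A(D): weighted entropy of the partitions produced by the attribute."""
    total = len(rows)
    result = 0.0
    for value in {r[attribute] for r in rows}:
        part = [r for r in rows if r[attribute] == value]
        result += len(part) / total * info(part)
    return result

def gain(rows, attribute):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(rows) - info_after_split(rows, attribute)

gains = {a: gain(D, a) for a in COLUMNS[:-1]}
best = max(gains, key=gains.get)

for attribute, value in gains.items():
    print(f"Gain({attribute}) = {value:.3f} bits")
print("split on:", best)
```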
Decision Tree Induction - Attribute Selection Measures
• The middle aged branch contains only "yes" tuples, so the same process of
splitting has to happen for the two remaining branches (youth and senior).
• For the branch age = youth (age<=30) we still have the attributes income, student and
credit_rating.
• The expected information (entropy) of this partition is I(2 Yes, 3 No) = I(2,3) = -2/5 log2(2/5) – 3/5
log2(3/5) = 0.97
• For income we have three values: income high (0 yes and 2 no),
income medium (1 yes and 1 no) and income low (1 yes and 0 no)
• Entropy(income) = 2/5 (0) + 2/5 (-1/2 log2(1/2) - 1/2 log2(1/2)) + 1/5 (0)
= 2/5 (1) = 0.4
• Gain(income) = 0.97 – 0.4 = 0.57
Decision Tree Induction - Attribute Selection
Measures
• For Student we have two values student yes (2 yes
and 0 no) and student no (0 yes 3 no)
• Entropy(student) = 2/5(0) + 3/5(0) = 0
• Gain (student) = 0.97 – 0 = 0.97
• We can then safely split on attribute student without
checking the other attributes since the information
gain is maximized.
Decision Tree Induction - Attribute Selection
Measures

Since these two new branches are from distinct classes, we make them into
leaf nodes with their respective class as label.
Decision Tree Induction - Attribute Selection Measures
• Again the same process is needed for the other branch of
age (age = senior).
• The expected information (entropy) of this partition is I(3 Yes, 2 No) = I(3,2) = -3/5
log2(3/5) – 2/5 log2(2/5) = 0.97
• For income we have two values: income medium (2 yes and
1 no) and income low (1 yes and 1 no)
• Entropy(income) = 3/5 (-2/3 log2(2/3) - 1/3 log2(1/3)) + 2/5 (-1/2 log2(1/2) - 1/2 log2(1/2))
= 3/5 (0.9182) + 2/5 (1) = 0.55 + 0.4 = 0.95
• Gain(income) = 0.97 – 0.95 = 0.02
Decision Tree Induction - Attribute Selection
Measures
• For student we have two values: student yes (2 yes and 1 no)
and student no (1 yes and 1 no)
• Entropy(student) = 3/5 (-2/3 log2(2/3) - 1/3 log2(1/3)) + 2/5 (-1/2 log2(1/2) - 1/2 log2(1/2)) = 0.95
• Gain(student) = 0.97 – 0.95 = 0.02
• For credit_rating we have two values: credit_rating fair (3
yes and 0 no) and credit_rating excellent (0 yes and 2 no)
• Entropy(credit_rating) = 0
• Gain(credit_rating) = 0.97 – 0 = 0.97
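The branch-level computations for age = youth and age = senior can be checked with the sketch below. It is an illustration under assumptions: the helper names are made up, "middle_aged" uses an underscore for parsing, and each branch is simply the subset of the 14 training tuples with that age value.

```python
import math
from collections import Counter

RAW = """
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
D = [dict(zip(COLUMNS, line.split())) for line in RAW.strip().splitlines()]

def entropy(rows):
    """Entropy of the buys_computer label within the given rows, in bits."""
    total = len(rows)
    counts = Counter(r["buys_computer"] for r in rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(rows, attribute):
    """Entropy of the rows minus the weighted entropy after splitting on attribute."""
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        part = [r for r in rows if r[attribute] == value]
        remainder += len(part) / len(rows) * entropy(part)
    return entropy(rows) - remainder

def branch_gains(age_value):
    """Gains of the remaining attributes inside one branch of the age split."""
    branch = [r for r in D if r["age"] == age_value]
    return {a: gain(branch, a) for a in ["income", "student", "credit_rating"]}

youth = branch_gains("youth")
senior = branch_gains("senior")

for name, gains in [("youth", youth), ("senior", senior)]:
    best = max(gains, key=gains.get)
    print(name, {a: round(g, 2) for a, g in gains.items()}, "-> split on", best)
```

In the youth branch student separates the classes completely, and in the senior branch credit_rating does, which matches the choices made in the slides.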
Decision Tree Induction - Attribute Selection
Measures
• We then split based on credit_rating.
• These splits give partitions that each contain records from a single class
• Make these into leaf nodes with their class label attached

New example: age = youth (age<=30), income = medium, student = yes, credit_rating = fair

Follow the branch age = youth (age<=30), then student = yes: we predict class buys_computer = yes
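Classifying the new example amounts to walking the finished tree from the root to a leaf. The sketch below assumes the tree is stored as a nested dict that mirrors the splits derived above (age at the root, student under youth, credit_rating under senior); the representation and the function name are illustrative choices.

```python
# Tree from the worked example: inner nodes are {attribute: {value: subtree}},
# leaves are class labels.
TREE = {"age": {
    "youth": {"student": {"no": "no", "yes": "yes"}},
    "middle_aged": "yes",
    "senior": {"credit_rating": {"fair": "yes", "excellent": "no"}},
}}

def classify(tree, example):
    """Start at the root and follow the branch matching the example until a leaf."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))               # attribute tested at this node
        tree = tree[attribute][example[attribute]]  # follow the matching branch
    return tree

X = {"age": "youth", "income": "medium", "student": "yes", "credit_rating": "fair"}
prediction = classify(TREE, X)
print("buys_computer =", prediction)
```

Note that income is never consulted: the path for this example only tests age and student.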
Decision Tree Induction - Tree Pruning
• The tree may overfit the training data by modelling anomalies and outliers
• Pruning removes the least reliable branches
– DT becomes less complex
• Prepruning: statistically assess the goodness of a split
before it takes place
– hard to choose thresholds for statistical significance
• Postpruning: remove sub-trees from already constructed
trees
1. remove sub-tree branches and replace with leaf node
2. leaf is labeled with most frequent class in sub-tree
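Postpruning can be sketched as a bottom-up pass that replaces a sub-tree with a majority-class leaf. The slides do not say how to decide which sub-trees to remove, so this sketch swaps in one common criterion as an assumption: a sub-tree is replaced when the leaf makes no more errors than the sub-tree on a separate validation set (often called reduced-error pruning). The data and names are made up.

```python
from collections import Counter

def classify(tree, example):
    """Follow branches until a leaf; unseen attribute values yield None."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute].get(example[attribute])
    return tree

def errors(tree, rows, target):
    """Number of rows the tree (or single leaf label) classifies incorrectly."""
    return sum(classify(tree, r) != r[target] for r in rows)

def postprune(tree, train_rows, val_rows, target):
    if not isinstance(tree, dict) or not train_rows:
        return tree
    attribute = next(iter(tree))
    # Prune the children first (bottom-up).
    children = {
        value: postprune(child,
                         [r for r in train_rows if r[attribute] == value],
                         [r for r in val_rows if r[attribute] == value],
                         target)
        for value, child in tree[attribute].items()
    }
    subtree = {attribute: children}
    # Candidate leaf: most frequent class among the training rows in this sub-tree.
    leaf = Counter(r[target] for r in train_rows).most_common(1)[0][0]
    # Replace the sub-tree when the leaf is at least as good on validation data.
    if errors(leaf, val_rows, target) <= errors(subtree, val_rows, target):
        return leaf
    return subtree

def row(student, credit, buys):
    return {"student": student, "credit": credit, "buys": buys}

# Made-up data: the "excellent -> no" branch was fitted to one noisy training row.
tree = {"student": {"yes": {"credit": {"fair": "yes", "excellent": "no"}}, "no": "no"}}
train = [row("yes", "fair", "yes")] * 3 + [row("yes", "excellent", "no"),
                                           row("no", "fair", "no")]
val = [row("yes", "excellent", "yes"), row("yes", "fair", "yes"), row("no", "fair", "no")]

pruned = postprune(tree, train, val, "buys")
print(pruned)
```

Here the credit sub-tree is collapsed into a "yes" leaf, while the split on student is kept because removing it would add a validation error.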
Decision Tree Induction - Rule Extraction
from a Decision Tree
Rule Extraction from a Decision Tree – What are Rules?
R1: (age = youth) ^ (student = yes) → (buys_computer = yes)
coverage(R1) = 2/14 = 14.28% (R1 covers 2 of the 14 training tuples, RID 9 and 11)
accuracy(R1) = 2/2 = 100% (both covered tuples are classified correctly)
X: (age = youth, income = medium, student = yes, credit_rating = fair)
Rule Extraction from a Decision Tree
• Rules are easier to understand than large
trees.
• One rule is created for each path from
the root to a leaf
• Each attribute-value pair along a path
forms a conjunction (ANDed); the leaf
holds the class prediction (THEN)
• Rules are mutually exclusive and
exhaustive

Example: Rule extraction from our buys_computer decision tree

R1: IF age = youth AND student = no THEN buys_computer = no
R2: IF age = youth AND student = yes THEN buys_computer = yes
R3: IF age = middle aged THEN buys_computer = yes
R4: IF age = senior AND credit_rating = excellent THEN buys_computer = no
R5: IF age = senior AND credit_rating = fair THEN buys_computer = yes
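Rule extraction can be sketched as a traversal that emits one rule per root-to-leaf path, followed by a coverage and accuracy check against the training tuples. The tree below is the one derived in the worked example (in the senior branch, credit_rating fair leads to yes and excellent to no); the nested-dict representation and function names are assumptions.

```python
RAW = """
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
D = [dict(zip(COLUMNS, line.split())) for line in RAW.strip().splitlines()]

TREE = {"age": {
    "youth": {"student": {"no": "no", "yes": "yes"}},
    "middle_aged": "yes",
    "senior": {"credit_rating": {"excellent": "no", "fair": "yes"}},
}}

def extract_rules(tree, conditions=()):
    """Return one (conditions, class) pair per path from the root to a leaf."""
    if not isinstance(tree, dict):
        return [(conditions, tree)]
    attribute = next(iter(tree))
    rules = []
    for value, child in tree[attribute].items():
        rules.extend(extract_rules(child, conditions + ((attribute, value),)))
    return rules

def coverage_and_accuracy(rule, rows, target="buys_computer"):
    """coverage = covered / all tuples; accuracy = correctly classified / covered."""
    conditions, label = rule
    covered = [r for r in rows if all(r[a] == v for a, v in conditions)]
    correct = [r for r in covered if r[target] == label]
    coverage = len(covered) / len(rows)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

rules = extract_rules(TREE)
for number, (conditions, label) in enumerate(rules, start=1):
    antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions)
    cov, acc = coverage_and_accuracy((conditions, label), D)
    print(f"R{number}: IF {antecedent} THEN buys_computer = {label} "
          f"(coverage {cov:.2%}, accuracy {acc:.0%})")
```

Because every training tuple follows exactly one path, the coverages of the five rules add up to 100%.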
Prediction
What Is Prediction?
• Prediction is similar to classification
– First, construct a model
– Second, use model to predict unknown value
• Major method for prediction is regression
– Linear and multiple regression
– Non-linear regression

• Prediction is different from classification
– Classification predicts categorical class labels
– Prediction models continuous-valued functions
Linear and Multiple Regression Analysis
• Linear regression: Y = α + β X
– Two parameters, α and β, specify the line and are estimated
using the data at hand
– by applying the least squares criterion to the known values Y1, Y2, …, X1, X2, …:
$$\beta = \frac{\sum_{i=1}^{s} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{s} (x_i - \bar{x})^2}, \qquad \alpha = \bar{y} - \beta \bar{x}$$
• Multiple regression: Y = b0 + b1 X1 + b2 X2
– Many nonlinear functions can be transformed into the above.
Example: Linear Regression Analysis
• Given data

X (Years experience)   Y (Salary)
3                      30K
8                      57K
9                      64K
13                     75K
3                      36K
6                      43K
11                     59K
21                     90K
1                      20K
16                     83K

$$\bar{x} = 9.1, \qquad \bar{y} = 55.4$$
$$\beta = \frac{(3 - 9.1)(30 - 55.4) + \dots + (16 - 9.1)(83 - 55.4)}{(3 - 9.1)^2 + \dots + (16 - 9.1)^2} = 3.5$$
$$\alpha = 55.4 - (3.5)(9.1) = 23.6$$
y = 23.6 + 3.5x
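The least squares estimates can be recomputed directly from the ten data points. This is a minimal sketch with assumed function and variable names. Worked without rounding, the listed data give a mean salary of 55.7, a slope of about 3.57 and an intercept of about 23.2, so the result differs slightly from the slide's figures of 55.4, 3.5 and 23.6.

```python
years = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 75, 36, 43, 59, 90, 20, 83]   # in thousands

def least_squares(xs, ys):
    """Return (alpha, beta) of the line y = alpha + beta * x fitted by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # beta = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    # alpha = y_bar - beta * x_bar
    alpha = y_bar - beta * x_bar
    return alpha, beta

alpha, beta = least_squares(years, salary)
print(f"y = {alpha:.1f} + {beta:.2f}x")

# Predicted salary (in thousands) for a hypothetical 10 years of experience.
print(round(alpha + beta * 10, 1))
```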
Example: Linear Regression Analysis
X (Years experience)   Y (Salary)   Yp (Predicted) [Yp = 23.6 + 3.5X]   Absolute error abs(Y-Yp)   Squared error sqr(Y-Yp)
3                      30K          34.1K                               4.1                        16.81
8                      57K          51.6K                               5.4                        29.16
9                      64K          55.1K                               8.9                        79.21
13                     75K          69.1K                               5.9                        34.81
3                      36K          34.1K                               1.9                        3.61
6                      43K          44.6K                               1.6                        2.56
11                     59K          62.1K                               3.1                        9.61
21                     90K          97.1K                               7.1                        50.41
1                      20K          27.1K                               7.1                        50.41
16                     83K          79.6K                               3.4                        11.56

Mean Absolute Error = 4.85
Root Mean Squared Error = 5.37
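The two error measures in the table can be reproduced with a few lines. The sketch takes the line y = 23.6 + 3.5x from the slides as given; the variable names are assumptions.

```python
import math

years = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 75, 36, 43, 59, 90, 20, 83]   # in thousands

def predict(x):
    """Regression line used in the slides."""
    return 23.6 + 3.5 * x

residuals = [y - predict(x) for x, y in zip(years, salary)]

# Mean Absolute Error: average of |Y - Yp|.
mae = sum(abs(e) for e in residuals) / len(residuals)

# Root Mean Squared Error: square root of the average of (Y - Yp)^2.
rmse = math.sqrt(sum(e * e for e in residuals) / len(residuals))

print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

RMSE is never smaller than MAE, and it grows faster when a few predictions are far off, because errors are squared before averaging.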
Classification vs Regression
Classification
• Classification means to group the output into a class.
• E.g. classification to predict the type of tumor, i.e. harmful or not harmful, using training data
• If the output is a discrete/categorical variable, then it is a classification problem.
Regression
• Regression means to predict the output value using training data.
• E.g. regression to predict the house price from training data
• If the output is a real number/continuous, then it is a regression problem.
TUTORIAL 07
1. Compare traditional programming with machine learning.
2. What is learning? Why learn?
3. What is supervised learning? What are supervised learning
algorithms?
4. What is a classifier? Explain with an example.
5. What is a decision tree? What are the pros and cons of decision trees?
6. What are attribute selection measures?
7. What are the criteria to prune a tree?
8. How are rules extracted from a decision tree? Give an example.
9. How is the goodness of a rule measured?
10. What is prediction? How is it different from classification?
