
Machine Learning Unit - 4

Rule-based models in machine learning utilize a set of interpretable rules for decision-making, with decision trees being a prominent example that structures decisions in a tree-like format. These models, including rule-based classifiers and association rule mining, excel in explainability but may struggle with complex relationships and noise in data. Techniques such as pruning and attribute selection measures like Information Gain and Gini Index are essential for optimizing decision trees and enhancing their predictive performance.

Rule Based Models:

Rule-based models in machine learning represent decision-making using a set of rules. These
models are interpretable and transparent, making them valuable in domains where
explainability is crucial. Here's an overview:
Decision Trees: They consist of a tree-like structure where each internal node represents a
feature, each branch represents a decision rule, and each leaf node represents the outcome.
Decision trees are intuitive and can handle both numerical and categorical data.
Rule-Based Classification: Models like RIPPER (Repeated Incremental Pruning to Produce Error
Reduction) or CN2 (a rule-induction algorithm by Clark and Niblett) generate rules by iteratively
constructing simple decision rules that cover specific classes in the dataset.
Association Rule Mining: Algorithms like Apriori and FP-Growth extract rules showing
relationships between different items in datasets, commonly used in market basket analysis.
Rule Ensemble Models: These combine multiple rules from various sources or models (like
decision trees) to enhance predictive performance.
Fuzzy Rule-Based Systems: They use fuzzy logic to handle uncertainty and imprecision in data,
creating rules based on fuzzy sets and fuzzy reasoning.
Rule-based models have strengths in interpretability and explainability, making them suitable
for domains where understanding the decision-making process is as important as predictive
accuracy. However, they might struggle with capturing complex relationships present in some
datasets and can be sensitive to noise or outliers.
Decision Tree

o Decision Tree is a supervised learning technique that can be used for both classification
and regression problems, but it is mostly preferred for solving classification problems. It
is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
o A decision tree has two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make decisions and have multiple branches, whereas
Leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.

Dept of CSE, ASGI B. Uma Maheswari


Assistant Professor
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands into further branches and constructs a tree-like structure.
o A common algorithm for building the tree is CART, which stands for Classification and
Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits
the tree into subtrees.
o The diagram below explains the general structure of a decision tree:

Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.

Why use Decision Trees?

There are various algorithms in machine learning, so choosing the best algorithm for the given
dataset and problem is the main point to remember while creating a machine learning model.
Below are two reasons for using a decision tree:

o Decision trees usually mimic the way humans think when making a decision, so they are easy
to understand.
o The logic behind a decision tree can be easily understood because it has a tree-like
structure.

Decision Tree Terminologies
Root Node: The root node is where the decision tree starts. It represents the entire dataset,
which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after
reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing unwanted branches from the tree.
Parent/Child node: A node that is divided into sub-nodes is called a parent node, and its
sub-nodes are called child nodes. The root node is a parent node that is not the child of any other node.

How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record, the algorithm starts from the root
node of the tree. It compares the value of the root attribute with the corresponding attribute of the
record (from the real dataset) and, based on the comparison, follows the branch and jumps to the next
node.

For the next node, the algorithm again compares the attribute value with the other sub-nodes
and moves further. It continues this process until it reaches a leaf node of the tree. The
complete process can be better understood using the algorithm below:

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be classified
further; the final node is called a leaf node.
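The five steps above can be sketched as a small ID3-style builder in Python that uses information gain as the ASM. This is a minimal sketch under stated assumptions: every attribute is categorical, each record is a dictionary, and the names `build_tree`, `predict` and the toy records are hypothetical illustrations, not taken from these notes. ID3-style multiway splits are used here only for simplicity; CART would make binary splits with the Gini index instead.

```python
import math
from collections import Counter

def entropy(labels):
    """Impurity of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy(S) minus the weighted entropy of the subsets created by splitting on attr."""
    total = len(labels)
    weighted = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        weighted += len(subset) / total * entropy(subset)
    return entropy(labels) - weighted

def build_tree(rows, labels, attrs):
    # Stopping rule (Step-5): a pure node, or no attributes left, becomes a leaf
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step-2: choose the best attribute with the attribute selection measure
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    # Step-4: create a decision node for the best attribute
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attrs if a != best]
    # Step-3: divide S into one subset per value of the best attribute
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Step-5: recursively build a subtree for each subset
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return node

def predict(tree, row):
    """Start at the root and follow the matching branch until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree

# Hypothetical toy data, loosely inspired by the job-offer example
rows = [
    {"salary": "high", "cab": "yes"},
    {"salary": "high", "cab": "no"},
    {"salary": "low", "cab": "yes"},
    {"salary": "low", "cab": "no"},
]
labels = ["accept", "accept", "decline", "decline"]
tree = build_tree(rows, labels, ["salary", "cab"])
print(tree)
```

Here `salary` separates the classes perfectly (gain 1.0) while `cab` gives no gain, so `salary` becomes the root. `predict` raises a `KeyError` for an attribute value never seen in training; a fuller version would fall back to the majority class.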

Example: Suppose there is a candidate who has a job offer and wants to decide whether he
should accept the offer or not. To solve this problem, the decision tree starts with the root
node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node
(distance from the office) and one leaf node based on the corresponding labels. The next
decision node further gets split into one decision node (Cab facility) and one leaf node. Finally,
the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the
diagram below:

Attribute Selection Measures

While implementing a decision tree, the main issue is how to select the best attribute
for the root node and for the sub-nodes. To solve this problem, a technique
called Attribute Selection Measure (ASM) is used. With this measure, we can easily select the
best attribute for the nodes of the tree. There are two popular ASM techniques:

o Information Gain
o Gini Index

1. Information Gain:
o Information gain is the measurement of the change in entropy after the segmentation of a
dataset based on an attribute.
o It calculates how much information a feature provides about a class.
o According to the value of information gain, we split the node and build the decision tree.
o A decision tree algorithm always tries to maximize the value of information gain, and the
node/attribute with the highest information gain is split first. It can be calculated using
the formula below:

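Information Gain = Entropy(S) - [(Weighted Avg) × Entropy(each feature)], that is,

$$\text{Information Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$

where $S_v$ is the subset of $S$ in which attribute $A$ has the value $v$.

Entropy: Entropy is a metric that measures the impurity of a set of samples. It specifies the
randomness in the data. For a two-class (yes/no) problem, entropy can be calculated as:

$$\text{Entropy}(S) = -P(\text{yes})\log_2 P(\text{yes}) - P(\text{no})\log_2 P(\text{no})$$

Where,

o S = the set of samples (|S| is the total number of samples)
o P(yes) = probability of yes
o P(no) = probability of no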
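As a worked check of the information gain and entropy formulas, the sketch below computes them for a hypothetical node with 4 yes and 4 no samples that some attribute splits into subsets of (3 yes, 1 no) and (1 yes, 3 no). The function names are illustrative, and the usual convention 0·log2(0) = 0 is assumed.

```python
import math

def entropy(p_yes):
    """Entropy(S) = -P(yes)log2 P(yes) - P(no)log2 P(no), with 0*log2(0) taken as 0."""
    result = 0.0
    for p in (p_yes, 1.0 - p_yes):
        if p > 0:
            result -= p * math.log2(p)
    return result

def information_gain(parent_counts, child_counts):
    """parent_counts and each entry of child_counts are (n_yes, n_no) pairs."""
    def h(counts):
        n_yes, n_no = counts
        return entropy(n_yes / (n_yes + n_no))
    total = sum(parent_counts)
    # Weighted average of the child entropies, weights = |S_v| / |S|
    weighted = sum(sum(c) / total * h(c) for c in child_counts)
    return h(parent_counts) - weighted

gain = information_gain((4, 4), [(3, 1), (1, 3)])
print(round(gain, 4))  # 0.1887
```

The parent is maximally mixed (entropy 1.0) and each child has entropy of about 0.8113, so the split gains about 0.1887 bits.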

2. Gini Index:
o The Gini index is a measure of impurity or purity used while creating a decision tree in the
CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini
index.
o CART uses the Gini index to create binary splits only.
o The Gini index can be calculated using the formula below:

$$\text{Gini Index} = 1 - \sum_{j} P_j^2$$

where $P_j$ is the proportion of samples that belong to class $j$.
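A minimal sketch of the Gini formula in Python, together with the weighted Gini of a binary split, which is the quantity a CART-style learner compares across candidate splits. The labels are hypothetical.

```python
from collections import Counter

def gini_index(labels):
    """Gini = 1 - sum_j P_j^2, where P_j is the proportion of class j."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Gini of a binary split: each side weighted by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)

print(gini_index(["yes", "yes", "no", "no"]))                 # 0.5, maximally mixed for two classes
print(gini_index(["yes", "yes", "yes"]))                      # 0.0, a pure node
print(round(weighted_gini(["yes", "yes", "no"], ["no"]), 3))  # 0.333
```

A lower weighted Gini means purer children, which is why the attribute or threshold with the lowest value is preferred.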


Pruning: Getting an Optimal Decision Tree

Pruning is the process of deleting unnecessary nodes from a tree in order to get the optimal
decision tree.

A tree that is too large increases the risk of overfitting, and a small tree may not capture all the
important features of the dataset. Therefore, a technique that decreases the size of the learning
tree without reducing accuracy is known as pruning. There are mainly two types of
tree pruning techniques:

o Cost Complexity Pruning
o Reduced Error Pruning
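Cost complexity pruning is usually stated as minimizing R_α(T) = R(T) + α·|leaves(T)|, where R(T) is the tree's training error rate. Its "weakest link" step computes, for each internal node t, the error increase per leaf removed, g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1), and collapses the node with the smallest g(t). The sketch below shows one such step; the `Node` class, the error counts and the 100-sample tree are hypothetical, and every internal node is assumed to have at least two children so the denominator is never zero.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    errors: int  # training errors if this node were turned into a leaf
    children: List["Node"] = field(default_factory=list)

def leaves(node):
    """All leaves of the subtree rooted at node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def internal_nodes(node):
    """All non-leaf nodes of the subtree rooted at node."""
    if not node.children:
        return []
    return [node] + [n for child in node.children for n in internal_nodes(child)]

def effective_alpha(node, n_samples):
    """g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1): error increase per leaf removed."""
    r_node = node.errors / n_samples
    r_subtree = sum(leaf.errors for leaf in leaves(node)) / n_samples
    return (r_node - r_subtree) / (len(leaves(node)) - 1)

def prune_weakest_link(root, n_samples):
    """Collapse the internal node with the smallest g(t) into a leaf; return its alpha."""
    weakest = min(internal_nodes(root), key=lambda t: effective_alpha(t, n_samples))
    alpha = effective_alpha(weakest, n_samples)
    weakest.children = []  # the subtree is replaced by a single leaf
    return alpha

# Hypothetical tree trained on 100 samples
root = Node(errors=10, children=[
    Node(errors=4, children=[Node(errors=1), Node(errors=1)]),
    Node(errors=2),
])
alpha = prune_weakest_link(root, n_samples=100)
print(alpha, len(leaves(root)))
```

The left subtree saves only 2 errors for its extra leaf (g = 0.02), while the root saves 6 errors over 2 extra leaves (g = 0.03), so the left subtree is pruned first. Repeating this step gives a sequence of smaller trees, and the final tree is typically chosen on validation data.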

Advantages of the Decision Tree
o It is simple to understand, as it follows the same process a human follows while
making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes of a problem.
o It requires less data cleaning than other algorithms.

Disadvantages of the Decision Tree

o A decision tree may contain many layers, which makes it complex.
o It may overfit the data; this can be reduced by using the Random Forest
algorithm.
o With more class labels, the computational complexity of the decision tree may increase.

Rule-Based Classifier
Rule-based classifiers are another type of classifier that makes class decisions
using various “if...else” rules. These rules are easily interpretable, and thus these
classifiers are generally used to generate descriptive models. The condition used with “if” is
called the antecedent, and the predicted class of each rule is called the consequent.
Properties of rule-based classifiers:
• Coverage: The percentage of records that satisfy the antecedent conditions of a
particular rule.
• The rules generated by rule-based classifiers are generally not mutually exclusive,
i.e., many rules can cover the same record.
• The rules generated by rule-based classifiers may not be exhaustive, i.e., there
may be some records that are not covered by any of the rules.
• The decision boundaries created by the individual rules are linear, but the overall boundary can be
much more complex than that of a decision tree because many rules can be triggered for the same
record.
• The IF part of the rule is called the rule antecedent or precondition.
• The THEN part of the rule is called the rule consequent.
• The antecedent part (the condition) consists of one or more attribute tests, and these tests
are logically ANDed.
• The consequent part consists of the class prediction.
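The properties above can be illustrated with a small sketch: each rule pairs an antecedent (attribute tests joined with AND) with a consequent class, coverage is the fraction of records whose antecedent holds, and overlapping or missing coverage is handled by one common strategy, an ordered rule list with a default class. The `Rule` class, the animal rules and the records are hypothetical examples, not rules from these notes.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Record = Dict[str, str]

@dataclass
class Rule:
    antecedent: Callable[[Record], bool]  # IF part: attribute tests joined with AND
    consequent: str                       # THEN part: the predicted class

def coverage(rule, records):
    """Fraction of records that satisfy the rule's antecedent."""
    return sum(rule.antecedent(r) for r in records) / len(records)

def classify(rules, record, default=None):
    """Ordered rule list: the first rule whose antecedent fires decides the class.
    Records covered by no rule get the default class (rules may not be exhaustive)."""
    for rule in rules:
        if rule.antecedent(record):
            return rule.consequent
    return default

rules = [
    Rule(lambda r: r["blood"] == "warm" and r["gives_birth"] == "yes", "mammal"),
    Rule(lambda r: r["blood"] == "warm" and r["can_fly"] == "yes", "bird"),
    Rule(lambda r: r["blood"] == "cold", "reptile"),
]
records = [
    {"blood": "warm", "gives_birth": "yes", "can_fly": "yes"},  # bat: rules 1 and 2 both fire
    {"blood": "warm", "gives_birth": "no", "can_fly": "yes"},   # eagle
    {"blood": "cold", "gives_birth": "no", "can_fly": "no"},    # snake
    {"blood": "warm", "gives_birth": "no", "can_fly": "no"},    # covered by no rule
]
for rule in rules:
    print(rule.consequent, coverage(rule, records))
print(classify(rules, records[0]), classify(rules, records[3], default="unknown"))
```

The bat record shows that the rules are not mutually exclusive, and the last record shows that they are not exhaustive.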

Association Rule Learning


• Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another data item and maps them accordingly so that the result can
be more profitable. It tries to find interesting relations or associations among the
variables of a dataset. It is based on different rules to discover the interesting relations
between variables in the database.
• Association rule learning is one of the most important concepts of machine learning,
and it is employed in market basket analysis, web usage mining, continuous production,
etc. Market basket analysis is a technique used by large retailers to discover
the associations between items. We can understand it by taking the example of a
supermarket, where products that are purchased together are placed
together.
• For example, if a customer buys bread, they are also likely to buy butter, eggs, or milk,
so these products are stored on the same shelf or nearby. Consider the diagram below:


• Association rule learning can be divided into three types of algorithms:

1. Apriori
2. Eclat
3. F-P Growth Algorithm

We will understand these algorithms in later chapters.

How does Association Rule Learning work?

Association rule learning works on the concept of an If-Then statement, such as "if A then B".

Here the If element is called the antecedent, and the Then element is called the consequent. These
types of relationships, where we can find some association or relation between two items, are
known as single cardinality. It is all about creating rules, and as the number of items increases,
the cardinality also increases accordingly. So, to measure the associations between thousands
of data items, there are several metrics. These metrics are given below:

o Support
o Confidence
o Lift

Let's understand each of them:

Support

Support is the frequency of an itemset, i.e., how frequently it appears in the dataset. It is defined as
the fraction of the transactions T that contain the itemset X. For an itemset X and a set of
transactions T, it can be written as:

$$\text{Support}(X) = \frac{\text{Freq}(X)}{|T|}$$

Confidence

Confidence indicates how often the rule has been found to be true, i.e., how often the items X
and Y occur together in the dataset when the occurrence of X is already given. It is the ratio of
the number of transactions that contain both X and Y to the number of transactions that contain X:

$$\text{Confidence}(X \rightarrow Y) = \frac{\text{Freq}(X, Y)}{\text{Freq}(X)}$$

Lift

Lift is the strength of a rule, which can be defined by the formula below:

$$\text{Lift}(X \rightarrow Y) = \frac{\text{Support}(X, Y)}{\text{Support}(X) \times \text{Support}(Y)}$$

It is the ratio of the observed support to the expected support if X and Y were independent
of each other. It has three possible cases:

o If Lift = 1: The occurrences of the antecedent and the consequent are independent of
each other.
o If Lift > 1: It indicates the degree to which the two itemsets are dependent on each other.
o If Lift < 1: It tells us that one item is a substitute for the other, which means one item has
a negative effect on the other.
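The three metrics can be computed directly from a list of transactions. This is a minimal sketch: transactions are assumed to be Python sets, and the four grocery baskets are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Confidence(X -> Y) = Support(X, Y) / Support(X) = Freq(X, Y) / Freq(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Lift(X -> Y) = Support(X, Y) / (Support(X) * Support(Y))."""
    both = support(set(x) | set(y), transactions)
    return both / (support(x, transactions) * support(y, transactions))

# Hypothetical market baskets
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
print(support({"bread", "butter"}, transactions))                  # 0.5
print(round(confidence({"bread"}, {"butter"}, transactions), 3))   # 0.667
print(round(lift({"bread"}, {"butter"}, transactions), 3))         # 1.333
```

Here bread appears in 3 of 4 baskets and butter in 2 of 4, but they appear together in 2, so Lift > 1 and buying bread makes butter more likely than it would be by chance.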

Types of Association Rule Learning

Association rule learning can be divided into three algorithms:

Apriori Algorithm

The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed
to work on databases that contain transactions. With the help of these association rules, it
determines how strongly or how weakly two objects are connected. This algorithm uses
a breadth-first search and a Hash Tree to calculate the itemset associations efficiently. It is an
iterative process for finding the frequent itemsets in a large dataset.

This algorithm was proposed by R. Agrawal and R. Srikant in 1994. It is mainly used
for market basket analysis and helps to find the products that can be bought together. It can
also be used in the healthcare field to find drug reactions for patients.

What is Frequent Itemset?

Frequent itemsets are those itemsets whose support is greater than or equal to the threshold value or
user-specified minimum support. This means that if {A, B} is a frequent itemset, then A and B
individually must also be frequent itemsets.

Suppose there are two transactions: A = {1,2,3,4,5} and B = {2,3,7}. With a minimum support
count of 2, the items 2 and 3 are frequent, since they appear in both transactions.

Steps for Apriori Algorithm

Below are the steps for the Apriori algorithm:

Step-1: Determine the support of the itemsets in the transactional database, and select the
minimum support and confidence.

Step-2: Take all itemsets in the transactions whose support value is higher than the minimum or
selected support value.

Step-3: Find all the rules from these subsets that have a confidence value higher than the threshold
or minimum confidence.

Step-4: Sort the rules in decreasing order of lift.
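The four steps can be sketched in Python as a level-wise search (C1 → L1 → C2 → L2 …) followed by rule generation. This is a minimal sketch, not an optimized implementation: it counts candidates with a plain scan instead of a Hash Tree, and the five transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Level-wise (breadth-first) search; returns {frozenset(itemset): support_count}."""
    transactions = [frozenset(t) for t in transactions]

    def frequent_only(candidates):
        # One scan of the database per level to count every candidate
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in counts:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support_count}

    # C1 -> L1: single items
    current = frequent_only({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: merge two frequent (k-1)-itemsets into a k-itemset candidate
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = frequent_only(candidates)
        frequent.update(current)
        k += 1
    return frequent

def association_rules(frequent, n_transactions, min_confidence):
    """Step-3 and Step-4: keep rules above min_confidence, sorted by decreasing lift."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                conf = sup / frequent[lhs]  # subsets of frequent itemsets are frequent
                if conf >= min_confidence:
                    lift = conf / (frequent[rhs] / n_transactions)
                    found.append((lhs, rhs, conf, lift))
    return sorted(found, key=lambda rule: rule[3], reverse=True)

# Hypothetical transactions
transactions = [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}, {"A", "C"}, {"A", "B", "C"}]
frequent = apriori(transactions, min_support_count=2)
found = association_rules(frequent, len(transactions), min_confidence=0.6)
for lhs, rhs, conf, lift in found:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}", f"lift={lift:.2f}")
```

The prune step relies on the property stated earlier: if an itemset is frequent, all of its subsets are frequent, so any candidate with an infrequent subset can be discarded without counting it.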

Apriori Algorithm Working

We will understand the apriori algorithm using an example and mathematical calculation:

Example: Suppose we have the following dataset that has various transactions, and from this
dataset, we need to find the frequent itemsets and generate the association rules using the
Apriori algorithm:

Solution:
Step-1: Calculating C1 and L1:
o In the first step, we create a table that contains the support count (the frequency of each
itemset individually in the dataset) of each itemset in the given dataset. This table is called
the Candidate set or C1.

o Now, we take out all the itemsets that have a support count greater than or equal to the
Minimum Support (2). This gives us the table for the frequent itemset L1.
Since all the itemsets except E have a support count greater than or equal to the minimum
support, the itemset E will be removed.

Step-2: Candidate Generation C2, and L2:

o In this step, we generate C2 with the help of L1. In C2, we create the pairs of the
itemsets of L1 in the form of subsets.
o After creating the subsets, we again find the support count from the main transaction
table of the dataset, i.e., how many times these pairs have occurred together in the given
dataset. This gives us the table below for C2:

o Again, we compare the C2 support counts with the minimum support count, and
after comparing, the itemsets with a lower support count are eliminated from table C2.
This gives us the table below for L2:

Step-3: Candidate generation C3, and L3:

o For C3, we repeat the same two processes, but now we form the C3 table with
subsets of three items together, and calculate the support count from the dataset.
It gives the table below:

o Now we create the L3 table. As we can see from the C3 table above, there is only one
combination of itemsets that has a support count equal to the minimum support count. So,
L3 will have only one combination, i.e., {A, B, C}.

Step-4: Finding the association rules for the subsets:

To generate the association rules, we first create a new table with the possible rules from
the combination {A, B, C}. For each rule, we calculate the confidence using the
formula sup(A ^ B)/sup(A). After calculating the confidence value for all rules, we exclude the
rules that have less confidence than the minimum threshold (50%).

Consider the table below:

| Rules | Support | Confidence |
|---|---|---|
| A ^ B → C | 2 | Sup{(A ^ B) ^ C}/sup(A ^ B) = 2/4 = 0.5 = 50% |
| B ^ C → A | 2 | Sup{(B ^ C) ^ A}/sup(B ^ C) = 2/4 = 0.5 = 50% |
| A ^ C → B | 2 | Sup{(A ^ C) ^ B}/sup(A ^ C) = 2/4 = 0.5 = 50% |
| C → A ^ B | 2 | Sup{C ^ (A ^ B)}/sup(C) = 2/5 = 0.4 = 40% |
| A → B ^ C | 2 | Sup{A ^ (B ^ C)}/sup(A) = 2/6 = 0.33 = 33.33% |
| B → A ^ C | 2 | Sup{B ^ (A ^ C)}/sup(B) = 2/7 = 0.29 = 28.57% |

As the given threshold or minimum confidence is 50%, the first three rules A ^ B → C, B ^ C → A, and
A ^ C → B can be considered the strong association rules for the given problem.
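The confidence column of the rule table can be reproduced from the support counts it relies on: sup(A ^ B ^ C) = 2, sup(A ^ B) = sup(B ^ C) = sup(A ^ C) = 4, sup(C) = 5, sup(A) = 6 and sup(B) = 7. A minimal sketch that enumerates every rule from {A, B, C} and keeps the ones meeting the 50% threshold:

```python
from itertools import combinations

# Support counts used in the worked example
support_count = {
    frozenset("ABC"): 2,
    frozenset("AB"): 4, frozenset("BC"): 4, frozenset("AC"): 4,
    frozenset("A"): 6, frozenset("B"): 7, frozenset("C"): 5,
}
itemset = frozenset("ABC")
min_confidence = 0.5

strong_rules = []
for size in (1, 2):  # antecedents with one item, then with two items
    for lhs in map(frozenset, combinations(sorted(itemset), size)):
        rhs = itemset - lhs
        # confidence(lhs -> rhs) = sup(lhs ^ rhs) / sup(lhs)
        conf = support_count[itemset] / support_count[lhs]
        print(f"{'^'.join(sorted(lhs))} -> {'^'.join(sorted(rhs))}: {conf:.2%}")
        if conf >= min_confidence:
            strong_rules.append((lhs, rhs))
```

The printed values match the table (33.33%, 28.57% and 40.00% for the single-item antecedents, 50.00% for each two-item antecedent), and only the three two-item rules survive the threshold.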

Advantages of Apriori Algorithm


o It is an easy-to-understand algorithm.
o The join and prune steps of the algorithm can be easily implemented on large datasets.

Disadvantages of Apriori Algorithm


o The Apriori algorithm is slow compared to other algorithms.
o The overall performance can be reduced because it scans the database multiple times.
o The time complexity and space complexity of the Apriori algorithm are O(2^D), which is very
high. Here D represents the horizontal width of the database, i.e., the number of distinct items.

Eclat Algorithm

Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a depth-first
search technique to find frequent itemsets in a transaction database. It typically runs faster
than the Apriori algorithm.
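A common way to realize Eclat's depth-first search is a vertical data layout: each item is mapped to the set of transaction IDs (TIDs) that contain it, and the support of a larger itemset is the size of the intersection of its items' TID sets, so the database is not rescanned. This is a minimal sketch under that assumption, using the same hypothetical transactions as a small test case.

```python
def eclat(transactions, min_support_count):
    """Depth-first frequent itemset mining on a vertical (item -> TID set) layout."""
    # Vertical layout: each item maps to the IDs of the transactions containing it
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset of prefix + item) pairs that may extend the prefix
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support_count:
                continue
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Go deeper: intersect with the TID sets of the remaining items
            suffix = [(other, tids & other_tids) for other, other_tids in candidates[i + 1:]]
            extend(itemset, suffix)

    extend(frozenset(), sorted(tidsets.items()))
    return frequent

# Hypothetical transactions
transactions = [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}, {"A", "C"}, {"A", "B", "C"}]
for itemset, count in eclat(transactions, min_support_count=2).items():
    print(sorted(itemset), count)
```

Each branch of the recursion explores all extensions of one prefix before moving to the next, which is the depth-first order that distinguishes Eclat from Apriori's level-by-level search.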
F-P Growth Algorithm

The F-P growth algorithm stands for Frequent Pattern growth, and it is an improved version of the
Apriori Algorithm. It represents the database in the form of a tree structure that is known as a
frequent-pattern tree (FP-tree). The purpose of this tree is to extract the most frequent
patterns.

In Data Mining, finding frequent patterns in large databases is very important and has been
studied on a large scale in the past few years. Unfortunately, this task is computationally
expensive, especially when many patterns exist.

The FP-Growth Algorithm was proposed by Han. It is an efficient and scalable method for
mining the complete set of frequent patterns by pattern fragment growth, using an extended
prefix-tree structure, named the frequent-pattern tree (FP-tree), for storing compressed and crucial
information about frequent patterns. In his study, Han showed that his method outperforms
other popular methods for mining frequent patterns, e.g., the Apriori Algorithm and Tree
Projection. Later works showed that FP-Growth also performs better than other
methods, including Eclat and Relim. The popularity and efficiency of the FP-Growth Algorithm
have led to many studies that propose variations to improve its performance.

What is FP Growth Algorithm?

The FP-Growth Algorithm is an alternative way to find frequent itemsets without using
candidate generation, thus improving performance. To do so, it uses a divide-and-conquer
strategy. The core of this method is the use of a special data structure named the
frequent-pattern tree (FP-tree), which retains the itemset association information.

This algorithm works as follows:

o First, it compresses the input database, creating an FP-tree instance to represent frequent
items.
o After this first step, it divides the compressed database into a set of conditional databases,
each associated with one frequent pattern.
o Finally, each such database is mined separately.

Using this strategy, the FP-Growth reduces the search costs by recursively looking for short
patterns and then concatenating them into the long frequent patterns.

In large databases, it may be impossible to hold the FP-tree in main memory. A strategy to cope
with this problem is to partition the database into a set of smaller databases (called projected
databases) and then construct an FP-tree from each of these smaller databases.

FP-Tree

The frequent-pattern tree (FP-tree) is a compact data structure that stores quantitative
information about frequent patterns in a database. Each transaction is read and then mapped
onto a path in the FP-tree. This is done until all transactions have been read. Different
transactions with common subsets allow the tree to remain compact because their paths
overlap.

A frequent Pattern Tree is made with the initial item sets of the database. The purpose of the
FP tree is to mine the most frequent pattern. Each node of the FP tree represents an item of the
item set.

The root node represents null, while the lower nodes represent the item sets. The associations
of the nodes with the lower nodes, that is, the item sets with the other item sets, are maintained
while forming the tree.

Han defines the FP-tree as the tree structure given below:

1. One root is labelled as "null" with a set of item-prefix subtrees as children and a frequent-
item-header table.
2. Each node in the item-prefix subtree consists of three fields:
o Item-name: registers which item is represented by the node;
o Count: the number of transactions represented by the portion of the path reaching
the node;
o Node-link: links to the next node in the FP-tree carrying the same item name, or null
if there is none.
3. Each entry in the frequent-item-header table consists of two fields:
o Item-name: the same as in the node;
o Head of node-link: a pointer to the first node in the FP-tree carrying the item name.

Additionally, the frequent-item-header table can have the count support for an item. The below
diagram is an example of a best-case scenario that occurs when all transactions have the same
itemset; the size of the FP-tree will be only a single branch of nodes.

The worst-case scenario occurs when every transaction has a unique item set. So the space
needed to store the tree is greater than the space used to store the original data set because
the FP-tree requires additional space to store pointers between nodes and the counters for each
item. The diagram below shows how a worst-case scenario FP-tree might appear. As you can
see, the tree's complexity grows with each transaction's uniqueness.
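The FP-tree structure described above (a null root, nodes holding item-name, count and node-link, and a frequent-item-header table) can be built with two database scans. This is a minimal sketch of construction only, not of the full FP-Growth mining step; the class and function names are hypothetical, the `last` dictionary is an added helper for appending to node-link chains, and ties in item frequency are broken alphabetically as an assumption.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item        # item-name (None for the "null" root)
        self.count = 0          # transactions represented by the path reaching this node
        self.parent = parent
        self.children = {}      # item-name -> child FPNode
        self.node_link = None   # next node in the tree carrying the same item-name

def build_fp_tree(transactions, min_support_count):
    # Scan 1: count item supports and keep only the frequent items
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support_count}

    root = FPNode(None, None)
    header = {}  # frequent-item-header table: item -> head of its node-link chain
    last = {}    # helper (not part of Han's definition): tail of each node-link chain

    # Scan 2: insert each transaction's frequent items, most frequent first
    for t in transactions:
        items = sorted((i for i in t if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                # Append the new node to the item's node-link chain
                if item in header:
                    last[item].node_link = child
                else:
                    header[item] = child
                last[item] = child
            child.count += 1  # shared prefixes reuse existing nodes
            node = child
    return root, header, frequent

def support_from_header(header, item):
    """Follow an item's node-link chain and add up the counts."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.node_link
    return total

# Hypothetical transactions
transactions = [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}, {"A", "C"}, {"A", "B", "C"}]
root, header, frequent = build_fp_tree(transactions, min_support_count=2)
print({item: support_from_header(header, item) for item in sorted(header)})
```

Identical transactions collapse into a single branch, matching the best case described above, while transactions with few shared prefixes create many separate branches, matching the worst case.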

Advantages of FP Growth Algorithm

Here are the advantages of the FP growth algorithm:

o This algorithm needs to scan the database only twice, whereas Apriori scans
the transactions in each iteration.
o No candidate pairing of items is done in this algorithm, which makes it faster.
o The database is stored in a compact version in memory.
o It is efficient and scalable for mining both long and short frequent patterns.

Disadvantages of FP-Growth Algorithm

This algorithm also has some disadvantages, such as:

o FP Tree is more cumbersome and difficult to build than Apriori.


o It may be expensive.
o The algorithm may not fit in the shared memory when the database is large.

Applications of Association Rule Learning

It has various applications in machine learning and data mining. Below are some popular
applications of association rule learning:

o Market Basket Analysis: It is one of the most popular examples and applications of association
rule mining. This technique is commonly used by big retailers to determine the associations
between items.
o Medical Diagnosis: Association rules can assist in diagnosing patients, as they
help in identifying the probability of illness for a particular disease.
o Protein Sequence: Association rules help in determining the synthesis of artificial
proteins.
o It is also used for Catalog Design, Loss-leader Analysis, and many other
applications.

Difference between Apriori and FP Growth Algorithm

Apriori and FP-Growth are the most basic frequent itemset mining (FIM) algorithms. There are some basic
differences between these algorithms:

| Apriori | FP Growth |
|---|---|
| Apriori generates frequent patterns by making itemsets using pairings such as single itemsets, double itemsets, and triple itemsets. | FP Growth generates an FP-tree for making frequent patterns. |
| Apriori uses candidate generation, where frequent subsets are extended one item at a time. | FP-growth generates a conditional FP-tree for every item in the data. |
| Since Apriori scans the database in each step, it becomes time-consuming for data where the number of items is larger. | The FP-tree requires only two database scans in its beginning steps, so it consumes less time. |
| A converted version of the database is saved in memory. | A set of conditional FP-trees for every item is saved in memory. |
| It uses a breadth-first search. | It uses a depth-first search. |

CLASSIFICATION & REGRESSION TREES

Classification and regression trees (CART) is a term used to describe decision tree
algorithms that are used for classification and regression learning tasks.

CART was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and
Charles Stone for classification and regression tasks. It is a predictive model that helps to find
a variable based on other labelled variables. To be more clear, tree models predict the result
by asking a set of if-else questions.

Two advantages of using tree models are as follows:

o They are able to capture the non-linearity in the dataset.

o There is no need to standardize the data when using tree models.

What are Decision Trees?

If you strip it down to the fundamentals, decision tree algorithms are nothing but if-else
statements that can be used to predict a result based on data.

CLASSIFICATION TREES

It is an algorithm where the target variable is fixed or categorical. The algorithm is then
used to identify the “class” into which the target variable is most likely to fall. An
example of a classification-type problem would be determining who will or won’t subscribe to
a digital platform, or who will or won’t graduate from high school.

These are examples of simple binary classifications, where the categorical variable can assume only
one of two mutually exclusive values. In other cases, you might need to predict among
more than two classes. For example, you may need to predict which type of smartphone a
consumer may decide to purchase. In such cases, there are multiple values for the categorical variable.
Here’s what a classic classification tree looks like.

Regression Trees
A regression tree refers to an algorithm where the target variable is continuous and the algorithm
is used to predict its value. As an example of a regression-type problem, you may want to
predict the selling price of a residential house, which is a continuous variable.

This will depend on both continuous factors like square footage as well as categorical factors
like the type of home, the area in which the property is located, and so on.

CART Algorithm
Classification and Regression Trees (CART) is a decision tree algorithm that is used for both
classification and regression tasks. It is a supervised learning algorithm that learns from
labelled data to predict unseen data.
• Tree structure: CART builds a tree-like structure consisting of nodes and branches.
The nodes represent different decision points, and the branches represent the
possible outcomes of those decisions. The leaf nodes in the tree contain a predicted
class label or value for the target variable.
• Splitting criteria: CART uses a greedy approach to split the data at each node. It
evaluates all possible splits and selects the one that best reduces the impurity of the
resulting subsets. For classification tasks, CART uses Gini impurity as the splitting
criterion. The lower the Gini impurity, the purer the subset. For regression
tasks, CART uses the reduction in the residual sum of squares as the splitting criterion. The larger
the residual reduction, the better the split fits the data.
• Pruning: To prevent overfitting of the data, pruning is used to remove
the nodes that contribute little to the model accuracy. Cost complexity pruning and
information gain pruning are two popular pruning techniques. Cost complexity
pruning adds a penalty for the number of leaves to the tree's error and removes the subtrees
whose contribution to accuracy does not justify their size. Information gain pruning involves
calculating the information gain of each node and removing nodes that have a low information gain.
How does the CART algorithm work?
The CART algorithm works via the following process:
1. The best split point of each input is obtained.
2. Based on the best split points of each input in Step 1, the new “best” split point is
identified.
3. The chosen input is split according to the “best” split point.
4. Splitting continues until a stopping rule is satisfied or no further desirable split is
available.

The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does this by
searching for the best homogeneity of the sub-nodes, with the help of the Gini index criterion.
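Steps 1 to 3 can be sketched for numeric inputs as a greedy search: for each input, try a threshold between every pair of adjacent sorted values, score the binary split by weighted Gini impurity, and then pick the input whose best split scores lowest. This is a minimal sketch of a single split, not a full CART implementation; the function names and the age/income data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Step 1: best threshold for one numeric input, as (threshold, weighted_gini)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]    # x <= threshold
        right = [y for _, y in pairs[i:]]   # x > threshold
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best

def best_feature_split(columns, ys):
    """Step 2: compare the per-input best splits and keep the lowest weighted Gini."""
    return min(((name,) + best_split(xs, ys) for name, xs in columns.items()),
               key=lambda s: s[2])

# Hypothetical data: did the customer buy the product?
bought = ["no", "no", "no", "yes", "yes", "yes"]
columns = {
    "age": [22, 25, 28, 35, 40, 52],
    "income": [30, 60, 45, 50, 35, 40],
}
print(best_feature_split(columns, bought))
```

Here `age` separates the classes perfectly at 31.5, so it wins over `income`. Step 4 would apply the same search recursively to the two resulting subsets until a stopping rule is met.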

When to use Classification and Regression Trees

Classification trees are mostly used when the dataset must be split into classes that are part of
the response variable. In many cases, the classes are Yes or No; in other words, there are just two
mutually exclusive classes. In some cases, there may be more than two classes, in which case a
variant of the classification tree algorithm is used.

Regression trees, on the other hand, are used when the response variable is continuous. For
example, if the response variable is something like the price of a property or the
temperature of the day, a regression tree is used.

We can say that regression trees are used for prediction-type problems, while
classification trees are used for classification-type problems.

How Classification and Regression Trees Work

A classification tree splits the dataset based on the homogeneity of the data. Say, for
example, there are two variables, income and age, which determine whether or not a consumer
will buy a particular kind of phone.

If the training data shows that 95% of people who are older than 30 bought the phone, the
data gets split there and age becomes a top node in the tree. This split makes the data “95%
pure”. Measures of impurity like entropy or the Gini index are used to quantify the homogeneity of
the data in classification trees.

In a regression tree, a regression model is fit to the target variable using each of the independent
variables. After this, the data is split at several points for every independent variable.

At each such point, the error between the predicted values and the actual values is squared to
get “A Sum of Squared Errors” (SSE). The SSE is compared across the variables, and
the variable or point with the lowest SSE is chosen as the split point. This process
is continued recursively.
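The SSE-based split search can be sketched for one numeric variable. This is a minimal sketch that assumes, as CART does, that each side of a split predicts the mean of its target values; the square-footage and price figures are hypothetical (prices in thousands).

```python
def sse(values):
    """Sum of squared errors around the mean, the value a leaf would predict."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_regression_split(xs, ys):
    """Try every midpoint between adjacent distinct x values; return (threshold, SSE)
    for the split whose left and right SSE add up to the lowest total."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical houses: square footage and selling price (in thousands)
square_feet = [800, 900, 1000, 2000, 2100, 2200]
price = [100, 110, 105, 300, 310, 305]
print(best_regression_split(square_feet, price))  # (1500.0, 100.0)
```

The split at 1500 square feet leaves each side tightly grouped around its mean (105 and 305), so its total SSE is far lower than any other threshold. Comparing this value across all variables, and recursing on each side, gives the procedure described above.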

Difference Between Classification and Regression Trees

Decision trees are easily understood, and there are several presentations of classification and
regression trees that make things even simpler. However, it’s important to know that there are some
fundamental differences between classification and regression trees.

Advantages of Classification and Regression Trees

The purpose of the analysis conducted by any classification or regression tree is to create a set
of if-else conditions that allow the accurate prediction or classification of a case. Classification
and regression trees work to provide accurate predictions or predicted classifications, based on
the set of if-else conditions. They typically have several advantages over regular decision trees.
