THE APRIORI ALGORITHM
PRESENTED BY
MAINUL HASSAN
INTRODUCTION
The Apriori Algorithm is an influential algorithm for mining
frequent itemsets for boolean association rules.
Some key points of the Apriori algorithm –
• It mines frequent itemsets from a traditional database for
boolean association rules.
• Every subset of a frequent itemset must also be a frequent itemset.
For example, if {l1, l2} is a frequent itemset, then {l1} and {l2}
must be frequent itemsets too.
• It finds frequent itemsets iteratively.
• It uses the frequent itemsets to generate association rules.
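The subset property in the second point can be stated in terms of support counts. A short sketch, assuming $T$ is the set of transactions, $\sigma(X)$ (a symbol not used in the slides) is the number of transactions in $T$ that contain itemset $X$, and min_sup is the minimum support count:

```latex
\[
X \subseteq Y
\;\Longrightarrow\;
\{\, t \in T : Y \subseteq t \,\} \subseteq \{\, t \in T : X \subseteq t \,\}
\;\Longrightarrow\;
\sigma(X) \ge \sigma(Y)
\]
\[
\sigma(Y) \ge \mathit{min\_sup}
\;\Longrightarrow\;
\sigma(X) \ge \sigma(Y) \ge \mathit{min\_sup}
\]
```

Read in the contrapositive, an infrequent itemset cannot have a frequent superset, which is what allows candidates to be discarded early.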
CONCEPTS
• A set of all items in a store: $I = \{i_1, i_2, \ldots, i_m\}$
• A set of all transactions (Transactional Database T): $T = \{t_1, t_2, \ldots, t_N\}$
• Each $t_i$ is a set of items s.t. $t_i \subseteq I$
• Each transaction has a transaction ID (TID).
Starting from the initial frequent itemsets, the Apriori algorithm is divided into 3 sections –
• Candidate generation
• Support calculation
• Candidate pruning
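These definitions map directly onto sets in code. A minimal Python sketch, assuming a made-up four-transaction store (the item names, the TIDs and the helper `support_count` are illustrative and do not come from the slides):

```python
# Hypothetical toy store; the items and TIDs below are invented for illustration.
I = frozenset({"bread", "milk", "butter", "eggs"})   # set of all items

# Transactional database T: each TID maps to one transaction t_i (a set of items).
T = {
    "T1": frozenset({"bread", "milk"}),
    "T2": frozenset({"bread", "butter"}),
    "T3": frozenset({"bread", "milk", "butter"}),
    "T4": frozenset({"milk", "eggs"}),
}

# Every transaction must be a subset of I.
assert all(t <= I for t in T.values())


def support_count(itemset, database):
    """Number of transactions in `database` that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in database.values() if itemset <= t)


print(support_count({"bread"}, T))          # 3
print(support_count({"bread", "milk"}, T))  # 2
```

Using `frozenset` keeps itemsets hashable, so they can later serve as dictionary keys when support counts are stored per itemset.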
CONCEPTS
• Uses a level-wise search, where k-itemsets are used to explore
(k+1)-itemsets.
• Frequent subsets are extended one item at a time, which is
known as the candidate generation process.
• Groups of candidates are tested against the data.
• It identifies the frequent individual items in the database and
extends them to larger and larger itemsets as long as those
itemsets appear sufficiently often in the database.
• The Apriori algorithm determines the frequent itemsets, which are
then used to determine association rules.
• Any candidate itemset can be pruned if it has an infrequent
subset.
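Candidate generation and pruning can be sketched as a single function. This is one possible Python reading, not a definitive implementation; the function name `generate_candidates` and the letter-named items are assumptions made for illustration:

```python
from itertools import combinations


def generate_candidates(frequent_k, k):
    """Build (k+1)-item candidates from the frequent k-itemsets.

    Join: merge two frequent k-itemsets whose union has exactly k+1 items.
    Prune: drop a candidate if any of its k-item subsets is not frequent.
    """
    frequent_k = {frozenset(s) for s in frequent_k}
    candidates = set()
    for a, b in combinations(frequent_k, 2):
        union = a | b
        if len(union) != k + 1:
            continue
        if all(frozenset(sub) in frequent_k for sub in combinations(union, k)):
            candidates.add(union)
    return candidates


# Hypothetical frequent itemsets, invented for illustration.
L1 = [{"a"}, {"b"}, {"c"}]
C2 = generate_candidates(L1, 1)   # the three pairs {a,b}, {a,c}, {b,c}

L2 = [{"a", "b"}, {"a", "c"}]     # suppose {b, c} turned out to be infrequent
C3 = generate_candidates(L2, 2)   # empty: {a,b,c} is pruned because {b,c} is missing
```

The second call shows the pruning rule at work: {a, b, c} is never counted against the data because one of its subsets is already known to be infrequent.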
THE APRIORI ALGORITHM – PSEUDO
CODE
o Join Step: $C_k$ is generated by joining $L_{k-1}$ with itself.
o Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a
frequent k-itemset.
o Pseudo-code:
$C_k$: candidate itemsets of size k
$L_k$: frequent itemsets of size k
$L_1$ = {frequent items};
for (k = 1; $L_k \neq \emptyset$; k++) do begin
    $C_{k+1}$ = candidates generated from $L_k$;
    for each transaction t in database do
        increment the count of all candidates in $C_{k+1}$ that are contained in t
    $L_{k+1}$ = candidates in $C_{k+1}$ with min_support
end
return $\bigcup_k L_k$;
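The pseudo-code translates fairly directly into Python. The sketch below is one possible rendering under stated assumptions: transactions are given as a list of sets, `min_support` is an absolute count, and the function name `apriori` and the toy database are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support count} for a list of transactions."""
    transactions = [frozenset(t) for t in transactions]

    # L_1: count single items and keep those meeting min_support.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    L_k = {s: c for s, c in counts.items() if c >= min_support}

    frequent = {}
    k = 1
    while L_k:                       # for (k = 1; L_k != empty; k++)
        frequent.update(L_k)         # accumulate the union of all L_k

        # C_{k+1}: join L_k with itself, then prune by the subset property.
        C_next = set()
        for a, b in combinations(L_k, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in L_k for sub in combinations(union, k)
            ):
                C_next.add(union)

        # One database scan: count the candidates contained in each transaction.
        counts = defaultdict(int)
        for t in transactions:
            for c in C_next:
                if c <= t:
                    counts[c] += 1

        # L_{k+1}: candidates with at least min_support.
        L_k = {s: c for s, c in counts.items() if c >= min_support}
        k += 1
    return frequent


# Hypothetical toy database, invented for illustration.
db = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
result = apriori(db, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Each pass of the `while` loop performs exactly one scan over the transactions, which mirrors the level-wise structure of the pseudo-code.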
HOW THE ALGORITHM WORKS
1. We build a candidate list of k-itemsets and extract a
frequent list of k-itemsets using the support count.
2. After that, we use the frequent list of k-itemsets to
determine the candidate and frequent lists of (k+1)-itemsets.
3. We use pruning to do that.
4. We repeat until the candidate list or the frequent list of
k-itemsets is empty.
5. Then we return the list of (k-1)-itemsets.
EXAMPLE OF APRIORI
ALGORITHM
Consider the following Transactional Database –
Step 1: Minimum support count = 2
TID     Items
T100    1 3 4
T200    2 3 5
T300    1 2 3 5
T400    2 5
T500    1 3 5
Candidate itemset-1
itemsets    Support
{1}         3
{2}         3
{3}         4
{4}         1
{5}         4
prune: {4} is removed because the minimum support count is 2
Frequent itemset-1
itemsets    Support
{1}         3
{2}         3
{3}         4
{5}         4
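Step 1 can be reproduced with a single counting pass. A small Python sketch follows; T100 is taken to be {1, 3, 4}, which is the reading that agrees with the support counts in the tables, and the variable names are illustrative:

```python
from collections import Counter

# Assumption: T100 is read as {1, 3, 4}, which agrees with the listed support counts.
database = {
    "T100": {1, 3, 4},
    "T200": {2, 3, 5},
    "T300": {1, 2, 3, 5},
    "T400": {2, 5},
    "T500": {1, 3, 5},
}
MIN_SUPPORT = 2

# Candidate itemset-1: one scan counts every individual item.
candidate_1 = Counter(item for items in database.values() for item in items)

# Frequent itemset-1: keep items whose count reaches the minimum support count.
frequent_1 = {item: n for item, n in candidate_1.items() if n >= MIN_SUPPORT}

print(sorted(candidate_1.items()))  # [(1, 3), (2, 3), (3, 4), (4, 1), (5, 4)]
print(sorted(frequent_1.items()))   # [(1, 3), (2, 3), (3, 4), (5, 4)]
```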
Step 2:
Candidate itemset-2 (support counted by scanning the database)
itemsets    Support
{1, 2}      1
{1, 3}      3
{1, 5}      2
{2, 3}      2
{2, 5}      3
{3, 5}      3
prune: {1, 2} is removed because its support count is below 2
Frequent itemset-2
itemsets    Support
{1, 3}      3
{1, 5}      2
{2, 3}      2
{2, 5}      3
{3, 5}      3
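The pair counts in Step 2 can be checked the same way. A short Python sketch, again taking T100 to be {1, 3, 4} (the reading that agrees with the listed support counts) and using illustrative variable names:

```python
from itertools import combinations

# T100..T500, with T100 assumed to be {1, 3, 4}.
database = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
frequent_items = [1, 2, 3, 5]   # Frequent itemset-1
MIN_SUPPORT = 2

# Candidate itemset-2: every pair of frequent items, with its support count.
candidate_2 = {
    pair: sum(1 for t in database if set(pair) <= t)
    for pair in combinations(frequent_items, 2)
}

# Frequent itemset-2: pairs that reach the minimum support count.
frequent_2 = {pair: n for pair, n in candidate_2.items() if n >= MIN_SUPPORT}

for pair, n in candidate_2.items():
    print(pair, n, "" if n >= MIN_SUPPORT else "<- pruned")
```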
Step 3:
Candidate itemset-3
itemsets     2-item subsets              In FI2?
{1, 2, 3}    {1, 2}, {1, 3}, {2, 3}      No
{1, 2, 5}    {1, 2}, {1, 5}, {2, 5}      No
{1, 3, 5}    {1, 3}, {1, 5}, {3, 5}      Yes
{2, 3, 5}    {2, 3}, {2, 5}, {3, 5}      Yes
Remember: every subset of a frequent itemset must also be a frequent itemset.
Frequent itemset-2 is {1, 3}, {1, 5}, {2, 3}, {2, 5}, {3, 5}. The subset {1, 2}
does not match any of these, so {1, 2, 3} and {1, 2, 5} are pruned. The other
two itemsets are checked in the same way and pass.
Frequent itemset-3 (support counted by scanning the database)
itemsets     Support
{1, 3, 5}    2
{2, 3, 5}    2
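The subset check in Step 3 can be written out explicitly. A Python sketch under the same assumption as before (T100 read as {1, 3, 4}, the reading that agrees with the listed support counts); the four candidates are the ones listed on the slide and the variable names are illustrative:

```python
from itertools import combinations

# T100..T500, with T100 assumed to be {1, 3, 4}.
database = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
frequent_2 = {frozenset(p) for p in [(1, 3), (1, 5), (2, 3), (2, 5), (3, 5)]}
MIN_SUPPORT = 2

candidates_3 = [(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 5)]

# Prune: a candidate survives only if all of its 2-item subsets are in FI2.
survivors = []
for cand in candidates_3:
    missing = [s for s in combinations(cand, 2) if frozenset(s) not in frequent_2]
    if missing:
        print(cand, "pruned, not in FI2:", missing)
    else:
        survivors.append(cand)

# Support calculation for the surviving candidates.
frequent_3 = {}
for cand in survivors:
    n = sum(1 for t in database if set(cand) <= t)
    if n >= MIN_SUPPORT:
        frequent_3[cand] = n

print(frequent_3)  # {(1, 3, 5): 2, (2, 3, 5): 2}
```

Only the survivors are counted against the database, so the two pruned candidates cost no scan time.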
Step 4:
Candidate itemset-4
itemsets        3-item subsets                                In FI-3?    Support
{1, 2, 3, 5}    {1, 2, 3}, {1, 2, 5}, {1, 3, 5}, {2, 3, 5}    No          1
Remember: every subset of a frequent itemset must also be a frequent itemset.
Frequent itemset-3 is {1, 3, 5}, {2, 3, 5}. The subsets {1, 2, 3} and {1, 2, 5}
do not match any of these, so {1, 2, 3, 5} is pruned.
Frequent itemset-4
itemsets        Support
Empty
Frequent itemset-4 is empty, so the algorithm stops.
APRIORI ALGORITHM
• Advantages
    • Uses the large itemset property
    • Easily parallelized
    • Easy to implement
• Disadvantages
    • Assumes the transaction database is memory resident
    • Requires many database scans
THE END