Mathematical Model for Association Analysis
Association analysis (or association rule mining) is a data mining technique to discover
interesting relations (associations, correlations) among a large set of items in transactional
databases.
Basic Concepts
1. Itemset
o A collection of one or more items.
o Example: {Milk, Bread} is a 2-itemset.
2. Transaction (T)
o A set of items bought together.
o Example: T1 = {Milk, Bread, Butter}
3. Support (Sup)
o Fraction of transactions that contain an itemset.
o Formula:
\[ \mathrm{Support}(X) = \frac{\text{Number of transactions containing } X}{\text{Total number of transactions}} \]
4. Confidence (Conf)
o Strength of the rule X → Y (probability of Y given X).
o Formula:
\[ \mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \]
5. Lift
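o Measures how much more often X and Y occur together than would be expected if
they were independent. A lift above 1 indicates a positive association, a lift of 1
indicates independence, and a lift below 1 indicates a negative association.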
o Formula:
\[ \mathrm{Lift}(X \rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X) \times \mathrm{Support}(Y)} \]
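The three measures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `support`, `confidence` and `lift`, and the toy transactions over items A, B and C, are assumptions made for this example. Exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    # Assumes a non-empty transaction list.
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions):
    """Support(X ∪ Y) / Support(X); assumes X occurs at least once."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Support(X ∪ Y) / (Support(X) × Support(Y))."""
    return confidence(x, y, transactions) / support(y, transactions)


# Hypothetical toy data, used only to exercise the functions.
toy_transactions = [
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"A", "C"}),
    frozenset({"B"}),
]

print(support({"A", "B"}, toy_transactions))       # 1/2
print(confidence({"A"}, {"B"}, toy_transactions))  # 2/3
print(lift({"A"}, {"B"}, toy_transactions))        # 8/9
```

In this toy data A and B each appear in 3 of 4 transactions and together in 2 of 4, so the lift of 8/9 is slightly below 1: they co-occur a little less often than independence would predict.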
Large Item Sets
Definition
Large item sets (frequent item sets) are those itemsets whose support is greater
than or equal to a minimum support threshold.
These are crucial because only large item sets are used to generate strong association
rules.
Example Dataset
Transaction ID    Items
T1                {Milk, Bread, Butter}
T2                {Milk, Bread}
T3                {Bread, Butter}
T4                {Milk, Bread, Butter, Eggs}
T5                {Milk, Eggs}
Step 1: 1-Itemsets with Support
Milk: appears in T1, T2, T4, T5 → Support = 4/5 = 0.8
Bread: appears in T1, T2, T3, T4 → Support = 4/5 = 0.8
Butter: appears in T1, T3, T4 → Support = 3/5 = 0.6
Eggs: appears in T4, T5 → Support = 2/5 = 0.4
If min support = 0.5, then {Milk}, {Bread}, {Butter} are large item sets. {Eggs} is not.
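Step 1 amounts to counting how many transactions each item appears in and comparing the resulting fraction with the threshold. A minimal sketch, assuming the five transactions from the example dataset are held as Python sets (the variable names are chosen for this illustration):

```python
from collections import Counter

# The example dataset, one set per transaction.
transactions = [
    {"Milk", "Bread", "Butter"},          # T1
    {"Milk", "Bread"},                    # T2
    {"Bread", "Butter"},                  # T3
    {"Milk", "Bread", "Butter", "Eggs"},  # T4
    {"Milk", "Eggs"},                     # T5
]
MIN_SUPPORT = 0.5

# Each transaction is a set, so an item is counted at most once per transaction.
item_counts = Counter(item for t in transactions for item in t)
n = len(transactions)
item_support = {item: count / n for item, count in item_counts.items()}

large_1 = sorted(item for item, s in item_support.items() if s >= MIN_SUPPORT)
print(item_support)
print(large_1)  # ['Bread', 'Butter', 'Milk']
```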
Step 2: 2-Itemsets with Support
{Milk, Bread}: in T1, T2, T4 → Support = 3/5 = 0.6
{Milk, Butter}: in T1, T4 → Support = 2/5 = 0.4 (below threshold)
{Bread, Butter}: in T1, T3, T4 → Support = 3/5 = 0.6
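{Milk, Eggs}: in T4, T5 → Support = 2/5 = 0.4 (below threshold)
Any pair containing Eggs must fall below the threshold, because {Eggs} itself has support 0.4; the remaining pairs, {Bread, Eggs} and {Butter, Eggs}, appear only in T4 (support 0.2).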
So large 2-itemsets: {Milk, Bread}, {Bread, Butter}
Step 3: 3-Itemsets with Support
{Milk, Bread, Butter}: in T1, T4 → Support = 2/5 = 0.4 (below threshold)
This is the only 3-itemset that can be built from the large 1-itemsets, so no 3-itemset is large.
Final Large Item Sets
1-itemsets: {Milk}, {Bread}, {Butter}
2-itemsets: {Milk, Bread}, {Bread, Butter}
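Steps 1 to 3 can be combined into a single level-wise search. The sketch below uses an Apriori-style join and prune; the source does not name a specific algorithm, so this is one possible choice for illustration, and the function name `frequent_itemsets` is an assumption of this example.

```python
from itertools import combinations

# The example dataset, one set per transaction.
transactions = [
    frozenset({"Milk", "Bread", "Butter"}),          # T1
    frozenset({"Milk", "Bread"}),                    # T2
    frozenset({"Bread", "Butter"}),                  # T3
    frozenset({"Milk", "Bread", "Butter", "Eggs"}),  # T4
    frozenset({"Milk", "Eggs"}),                     # T5
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    def keep_large(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: every distinct item is a candidate 1-itemset.
    current = keep_large({frozenset([i]) for t in transactions for i in t})
    result = {}
    k = 1
    while current:
        result.update(current)
        k += 1
        # Join step: merge pairs of large (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have any non-large (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_large(candidates)
    return result


large = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(large.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.5 this returns the same five large item sets listed above. Note that the prune step discards {Milk, Bread, Butter} before its support is even counted, because its subset {Milk, Butter} is not large.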
Generating Rules from Large Item Sets
From {Milk, Bread} (Support = 0.6):
Rule: Milk → Bread
Confidence = 0.6 / 0.8 = 0.75
Rule: Bread → Milk
Confidence = 0.6 / 0.8 = 0.75
From {Bread, Butter} (Support = 0.6):
Rule: Bread → Butter
Confidence = 0.6 / 0.8 = 0.75
Rule: Butter → Bread
Confidence = 0.6 / 0.6 = 1.0 (strong rule: every transaction that contains Butter also contains Bread)
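The rule calculations above can be reproduced programmatically, and lift can be computed alongside confidence using the formula from the Basic Concepts section. This is a minimal sketch: the helper `support` and the `rules` list are names assumed for this example, and exact fractions are used so that 0.6 / 0.8 comes out as exactly 3/4.

```python
from fractions import Fraction

# The example dataset, one set per transaction.
transactions = [
    {"Milk", "Bread", "Butter"},          # T1
    {"Milk", "Bread"},                    # T2
    {"Bread", "Butter"},                  # T3
    {"Milk", "Bread", "Butter", "Eggs"},  # T4
    {"Milk", "Eggs"},                     # T5
]


def support(itemset):
    """Exact fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(hits, len(transactions))


# The large 2-itemsets found in the worked example.
large_2 = [("Milk", "Bread"), ("Bread", "Butter")]

rules = []
for a, b in large_2:
    # Each 2-itemset yields two candidate rules, one per direction.
    for x, y in ((a, b), (b, a)):
        conf = support({x, y}) / support({x})
        lift = conf / support({y})
        rules.append((x, y, conf, lift))

for x, y, conf, lift in rules:
    print(f"{x} -> {y}: confidence = {float(conf):.2f}, lift = {float(lift):.4f}")
```

On this dataset the two Milk/Bread rules have lift 0.9375, slightly below 1, while the two Bread/Butter rules have lift 1.25. So although three of the four rules share a confidence of 0.75, only the Bread/Butter pair occurs together more often than independence would predict.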