UNIT 2
RETAILING ANALYTICS
Market Basket Analysis and Lift, RFM Analysis and Optimizing Direct Mail Campaigns, Using the
SCAN*PRO Model and its Variants, Allocating Retail Space and Sales Resources, Forecasting Sales
from Few Data Points.
Market Basket Analysis (MBA) is a data mining technique used to uncover associations
between products bought together in transactions. It helps identify buying patterns based on
customer purchasing behavior.
It answers questions like:
"If a customer buys product A, how likely are they to buy product B?"
Key Concepts in Market Basket Analysis:
Itemset: A group of items purchased together.
Support: How frequently an itemset appears in the dataset.
Confidence: The likelihood of item B being purchased when item A is purchased.
Lift: A measure that shows how much more likely item B is purchased when item A
is purchased, compared to its typical purchase rate.
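Written as formulas, the three measures defined above take the following standard form. Here A and B stand for any two items (or itemsets), and A ∩ B denotes transactions that contain both.

```latex
\begin{align*}
\text{Support}(A) &= \frac{\text{number of transactions containing } A}{\text{total number of transactions}} \\
\text{Confidence}(A \rightarrow B) &= \frac{\text{Support}(A \cap B)}{\text{Support}(A)} \\
\text{Lift}(A \rightarrow B) &= \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}
\end{align*}
```

A lift above 1 indicates a positive association, a lift of 1 indicates that the two items are bought independently, and a lift below 1 indicates that buying A makes buying B less likely.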
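Illustration: suppose a store records 1000 transactions, of which 100 contain Milk, 80 contain
Bread, and 60 contain both. Then: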
Support(Milk) = 100 / 1000 = 0.10
Support(Bread) = 80 / 1000 = 0.08
Support(Milk ∩ Bread) = 60 / 1000 = 0.06
Confidence(Milk → Bread) = 0.06 / 0.10 = 0.6
Lift(Milk → Bread) = 0.6 / 0.08 = 7.5 → This suggests a strong positive
association.
Example
Imagine a grocery store has the following transactions:
Transaction ID   Items Bought
1                Bread, Milk
2                Bread, Diaper, Beer, Eggs
3                Milk, Diaper, Beer, Coke
4                Bread, Milk, Diaper, Beer
5                Bread, Milk, Diaper, Coke
Let’s analyze the rule:
"If Diaper, then Beer"
Count of:
Diaper transactions: 4
Diaper & Beer together: 3
So:
Support(Diaper → Beer) = 3/5 = 0.60
Confidence(Diaper → Beer) = 3/4 = 0.75
Support(Beer) = 3/5 = 0.60
Lift(Diaper → Beer) = 0.75 / 0.60 = 1.25
➡ Interpretation:
Customers who buy diapers are 1.25 times as likely to buy beer as the average customer.
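The same calculation can be checked with a few lines of Python. This is a minimal sketch using only the five transactions from the table above; the helper names (`count`, `support`, `confidence`, `lift`) are chosen here for illustration and do not come from any library.

```python
# The five transactions from the grocery store example.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)


def support(itemset):
    """Share of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    return count(set(antecedent) | set(consequent)) / count(antecedent)


def lift(antecedent, consequent):
    """Confidence of the rule relative to the consequent's overall support."""
    return confidence(antecedent, consequent) / support(consequent)


rule_support = support({"Diaper", "Beer"})          # 3 of 5 transactions
rule_confidence = confidence({"Diaper"}, {"Beer"})  # 3 of the 4 diaper baskets
rule_lift = lift({"Diaper"}, {"Beer"})

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
# prints: support=0.60 confidence=0.75 lift=1.25
```

For real datasets with many items, an algorithm such as Apriori is normally used to avoid counting every possible itemset; the sketch above only evaluates one rule that is already known.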
Important Questions
Explain Market Basket Analysis with an example. How is it useful in business decision-making?
Define Support, Confidence, and Lift in Market Basket Analysis. Explain the significance of each with
a suitable example.
Describe the steps involved in performing Market Basket Analysis using the Apriori algorithm.
A dataset contains 1000 transactions. Item A appears in 200 transactions, item B in 300 transactions,
and both in 120 transactions. Calculate support, confidence, and lift for the rule A → B and interpret
the results.
A store has 500 transactions:
120 contain tea
100 contain sugar
80 contain both
Calculate:
(a) Support(Tea), Support(Sugar), Support(Tea ∩ Sugar)
(b) Confidence(Tea → Sugar)
(c) Lift(Tea → Sugar)
Chapter : RFM Analysis and Optimizing Direct Mail Campaigns
RFM Analysis (Recency, Frequency, Monetary) is a powerful marketing technique used to identify
and segment customers based on their purchasing behavior. It’s especially effective for optimizing
direct mail campaigns by targeting the most promising customers and increasing ROI.
RFM stands for:
1. Recency (R) – How recently a customer made a purchase
o The more recent the purchase, the more likely the customer is to buy again.
2. Frequency (F) – How often the customer makes a purchase
o Frequent buyers are more loyal and valuable over time.
3. Monetary (M) – How much money the customer spends
o Customers who spend more are often more important to a business.
By analyzing these three factors, businesses can segment their customer base into groups, making it
easier to target the right people for direct mail campaigns. This helps to increase the effectiveness of
the campaign, save money, and improve sales.
Step-by-Step Process of RFM Analysis (How RFM Analysis Works)
1. Data Collection: First, a business gathers customer purchase data, including
transaction dates, amounts spent, and frequency of purchases.
2. Scoring: Each customer is assigned a score for each of the three dimensions
(Recency, Frequency, and Monetary). Scores typically range from 1 to 5,
with 5 representing the best value (e.g., most recent, most frequent, highest spending).
3. Customer Segmentation: Based on these scores, customers are grouped into
different segments. For instance:
o High Recency, High Frequency, High Monetary: Loyal customers who
purchase frequently and spend a lot.
o Low Recency, Low Frequency, Low Monetary: At-risk customers who may
need re-engagement efforts.
4. Actionable Insights: Once the customers are segmented, businesses can tailor their
marketing strategies. For example, loyal customers (high RFM scores) can be
rewarded with exclusive offers, while less active customers (low RFM scores) can be
sent re-engagement emails or special promotions to bring them back.
===============================================================================
Example:
A customer has the following transaction record:
Last purchase: 10 days ago
Total purchases in the last year: 12
Total amount spent: ₹36,000
Given the following scoring rules:
Recency: 0–7 days = 5, 8–14 days = 4, 15–30 days = 3, 31–60 days = 2, >60 days = 1
Frequency: 10+ = 5, 8–9 = 4, 5–7 = 3, 3–4 = 2, 1–2 = 1
Monetary: >₹30,000 = 5, ₹20k–30k = 4, ₹10k–20k = 3, ₹5k–10k = 2, <₹5k = 1
Calculate the RFM score and identify the customer segment. Suggest a suitable
marketing strategy.
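Solution:
Recency: 10 days falls in the 8–14 day band, so R = 4.
Frequency: 12 purchases falls in the 10+ band, so F = 5.
Monetary: ₹36,000 is above ₹30,000, so M = 5.
RFM score = 455.
Segment: a frequent, high-spending loyal customer, just short of a Champion (555) because the
last purchase was more than a week ago.
Suggested strategy: loyalty rewards and referral incentives, plus a timely offer to prompt the
next purchase.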
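The scoring rules in this example can be sketched as three small functions. The function names are illustrative, and the treatment of exact band boundaries (for instance, whether exactly ₹30,000 scores 4 or 5) is an assumption, since the rules as stated leave it open.

```python
def recency_score(days_since_last_purchase):
    """Recency bands: 0-7 = 5, 8-14 = 4, 15-30 = 3, 31-60 = 2, over 60 = 1."""
    if days_since_last_purchase <= 7:
        return 5
    if days_since_last_purchase <= 14:
        return 4
    if days_since_last_purchase <= 30:
        return 3
    if days_since_last_purchase <= 60:
        return 2
    return 1


def frequency_score(purchases_last_year):
    """Frequency bands: 10+ = 5, 8-9 = 4, 5-7 = 3, 3-4 = 2, 1-2 = 1."""
    if purchases_last_year >= 10:
        return 5
    if purchases_last_year >= 8:
        return 4
    if purchases_last_year >= 5:
        return 3
    if purchases_last_year >= 3:
        return 2
    return 1


def monetary_score(total_spent):
    """Monetary bands in rupees; boundary values are assigned to the higher band
    except at 30,000, where the rule says strictly greater than (assumption)."""
    if total_spent > 30_000:
        return 5
    if total_spent >= 20_000:
        return 4
    if total_spent >= 10_000:
        return 3
    if total_spent >= 5_000:
        return 2
    return 1


def rfm_score(days_since_last_purchase, purchases_last_year, total_spent):
    """Concatenate the three scores into a string such as '455'."""
    return (
        f"{recency_score(days_since_last_purchase)}"
        f"{frequency_score(purchases_last_year)}"
        f"{monetary_score(total_spent)}"
    )


# The customer from the example: 10 days ago, 12 purchases, Rs. 36,000 spent.
example = rfm_score(10, 12, 36_000)
print(example)  # prints: 455
```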
==================================================================================
Applications of RFM Analysis
Customer Segmentation
RFM (Recency, Frequency, and Monetary) analysis is a key tool in customer segmentation. By
analyzing how recently, how frequently, and how much a customer spends, businesses can
divide their customer base into distinct groups. For instance, customers who recently made a
purchase, buy frequently, and spend a significant amount can be categorized as loyal
customers. On the other hand, customers who haven't made a purchase recently, buy
infrequently, or spend less may be considered at risk or low-value customers. This
segmentation allows businesses to develop targeted marketing strategies, ensuring that each
customer group receives the most relevant and effective communication.
Optimizing Direct Mail Campaigns
RFM analysis is highly effective for optimizing direct mail campaigns. By analyzing RFM
scores, businesses can identify the most promising customers to receive promotional materials.
Customers with high recency, frequency, and monetary scores are more likely to engage with
offers, making them the prime targets for direct mail campaigns. This focused approach ensures
that marketing efforts are directed toward customers who are more likely to respond, thereby
increasing the return on investment. It also prevents wasting resources on customers who are
unlikely to engage with the promotion.
Customer Retention
RFM analysis plays a critical role in customer retention. By assessing recency, businesses can
identify customers who have not made a purchase in a while and may be at risk of becoming
inactive. This insight allows companies to take proactive steps to re-engage these customers,
such as sending them targeted promotions, special offers, or reminders. Retaining existing
customers is often more cost-effective than acquiring new ones, and using RFM analysis helps
businesses focus their efforts on customers who need encouragement to stay loyal.
Loyalty Program Design
RFM analysis helps businesses design more effective loyalty programs by identifying high-
value customers. By understanding which customers frequently make purchases and spend the
most, businesses can offer personalized rewards, discounts, or exclusive benefits. This targeted
approach ensures that loyalty programs are more appealing to the right customers, encouraging
them to remain engaged with the brand and make more frequent purchases. In this way, RFM
analysis helps strengthen customer relationships and increase long-term loyalty.
Sales Forecasting and Cross-Selling
RFM analysis also contributes to sales forecasting by providing insights into customer buying
patterns. By understanding how often customers make purchases and how much they spend,
businesses can predict future sales trends and plan inventory and promotions accordingly.
Additionally, RFM analysis helps identify opportunities for cross-selling and up-selling.
Customers who make frequent purchases or spend a lot are prime candidates for additional
product recommendations, allowing businesses to increase the average order value and overall
sales.
Application of RFM in Direct Mail Campaigns
Direct mail campaigns are targeted marketing efforts that involve sending promotional content via
physical mail. These campaigns can be costly due to design, printing, and postage expenses.
Therefore, effective targeting is essential to maximize return on investment (ROI). RFM analysis
enables marketers to prioritize high-value customers and tailor content accordingly.
Steps in Using RFM for Direct Mail:
1. Data Collection:
Compile transactional data, including purchase dates, frequency of purchases, and total
amount spent.
2. RFM Scoring:
Assign R, F, and M scores to each customer based on business-defined ranges or percentile
rankings.
3. Segmentation:
Group customers based on their combined RFM scores. Common segments include:
o Champions (555): Recent, frequent, and high spenders.
o Loyal Customers (x5x): Frequent buyers who may not be recent.
o At Risk (1xx): Good past customers who have not purchased recently.
o Lost Customers (111): Low engagement across all metrics.
4. Campaign Personalization:
Design mail content to suit each segment:
o Champions: Personalized thank-you notes, VIP offers.
o Loyal: Loyalty rewards, referral incentives.
o At Risk: Win-back offers, reminders.
o Lost: Comeback discounts, reactivation messages.
5. Performance Monitoring:
Analyze campaign response rates by segment to refine targeting in future mailings.
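Steps 3 and 4 above can be sketched as a lookup from RFM scores to a segment and then to mail content. This is only an illustration: "At Risk" is read here as a low recency score (R = 1), following its description, and the order in which the rules are checked, as well as the fallback "Other" segment and its content, are assumptions made for this sketch.

```python
def assign_segment(r, f, m):
    """Map R, F, M scores (each 1-5) to one of the segments listed above."""
    if (r, f, m) == (5, 5, 5):
        return "Champions"
    if (r, f, m) == (1, 1, 1):
        return "Lost"
    if r == 1:
        # Not purchased recently, but not low on every metric.
        return "At Risk"
    if f == 5:
        # Frequent buyers, whatever their recency or spend.
        return "Loyal"
    return "Other"  # hypothetical catch-all for scores the list does not name


# Mail content per segment, taken from the campaign personalization step.
MAIL_CONTENT = {
    "Champions": "Personalized thank-you note, VIP offer",
    "Loyal": "Loyalty rewards, referral incentive",
    "At Risk": "Win-back offer, reminder",
    "Lost": "Comeback discount, reactivation message",
    "Other": "Standard promotional mailer",  # assumption, not from the text
}

for scores in [(5, 5, 5), (3, 5, 2), (1, 4, 4), (1, 1, 1)]:
    segment = assign_segment(*scores)
    print(scores, segment, "->", MAIL_CONTENT[segment])
```

Checking the exact matches (555 and 111) before the broader patterns matters, because 111 would otherwise also match the low-recency rule.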
Benefits of RFM-Driven Direct Mail Campaigns
Improved Targeting: Messages are sent to customers with the highest likelihood of
responding.
Cost Efficiency: Marketing budget is optimized by avoiding low-value or inactive customers.
Higher Conversion Rates: Personalized messaging improves customer engagement and
sales.
Strategic Customer Relationship Management: Fosters long-term loyalty and retention.
Scenario:
A retail company wants to launch a direct mail campaign and decides to use RFM
analysis to segment its customers and tailor messages effectively.
The following is the data for three customers:
Customer   Last Purchase (days ago)   Purchases in Last Year   Total Spent (₹)
A          5                          15                       45,000
B          35                         8                        25,000
C          120                        2                        4,000
Scoring Scale:
Recency (R):
0–7 days = 5, 8–14 = 4, 15–30 = 3, 31–60 = 2, >60 = 1
Frequency (F):
10+ = 5, 8–9 = 4, 5–7 = 3, 3–4 = 2, 1–2 = 1
Monetary (M):
>₹30,000 = 5, ₹20k–30k = 4, ₹10k–20k = 3, ₹5k–10k = 2, <₹5k = 1
Step-by-Step RFM Score Calculation:
Customer   R Score   F Score   M Score   RFM Score   Segment
A          5         5         5         555         Champion
B          2         4         4         244         Potential Loyalist
C          1         1         1         111         Lost
Direct Mail Strategy Based on Segments:
Customer A (555 - Champion):
Send an exclusive thank-you letter with a VIP discount code and early access to new arrivals.
Customer B (244 - Potential Loyalist):
Send a loyalty program invitation and a personalized coupon to encourage repeat purchases.
Customer C (111 - Lost):
Send a reactivation mail with a strong comeback offer (e.g., 40% off) and a feedback form.
Important Questions:
1) What is RFM analysis? Explain how Recency, Frequency, and Monetary
values are used to segment customers
2) Explain the step-by-step process of performing RFM analysis in marketing
3) Explain the steps in RFM Analysis
4) Explain the applications of RFM Analysis
5) Explain the applications of the RFM Analysis in Direct Mail Campaigns
==================================================================================
Chapter : Using the SCAN*PRO Model and its Variants
Definition of SCAN*PRO Model:
The SCAN*PRO model (Scanner-based Promotional Model) is a regression-based statistical
model used to analyze the effect of various marketing mix variables (like price, promotions,
and displays) on sales performance. This helps companies see what influences their product
sales the most.
The name reflects this core function: analyzing the impact of promotional activities using
retail point-of-sale (POS) scanner data. "SCAN" refers to the barcode scanner data that
captures product-level sales information, while "PRO" highlights the model’s focus on
promotions such as price discounts, in-store displays, and advertising features. The asterisk
(*) symbolizes the interaction of multiple marketing mix variables.
Structure of the SCAN*PRO Model
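The SCAN*PRO model is a regression equation that relates sales to price and promotional
variables. It is estimated in logarithmic form, so that the price coefficient can be read
directly as a percentage effect (an elasticity). In the simplified form implied by the
coefficients below, it can be written as:
ln(Sales) = β0 + β1 · ln(Price) + β2 · Display + β3 · Feature + ε
β0: Intercept (baseline level of log sales); ε is the error term.
β1: Measures how sensitive sales are to price changes.
β2: Measures the impact of in-store displays on sales.
β3: Measures the impact of marketing features (ads, promotions) on sales.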
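To see how the coefficients are read, the sketch below assumes the simplified log-linear form ln(Sales) = β0 + β1·ln(Price) + β2·Display + β3·Feature. The coefficient values are hypothetical, chosen only for illustration and not estimated from any real data.

```python
import math

# Hypothetical coefficients (illustration only, not estimated from data).
BETA_0 = 5.0   # intercept: log sales at price 1 with no display or feature
BETA_1 = -2.0  # price elasticity
BETA_2 = 0.4   # display effect
BETA_3 = 0.3   # feature effect


def predict_sales(price, display, feature):
    """Predicted unit sales; `display` and `feature` are 0/1 indicators."""
    log_sales = (
        BETA_0
        + BETA_1 * math.log(price)
        + BETA_2 * display
        + BETA_3 * feature
    )
    return math.exp(log_sales)


base = predict_sales(price=2.0, display=0, feature=0)

# A display multiplies sales by exp(BETA_2), whatever the price.
display_multiplier = predict_sales(2.0, 1, 0) / base

# A 1% price cut raises sales by roughly -BETA_1 percent.
pct_change = (predict_sales(2.0 * 0.99, 0, 0) / base - 1) * 100

print(f"display multiplier: {display_multiplier:.2f}")
print(f"sales change for a 1% price cut: {pct_change:.2f}%")
```

Because the model is linear in logarithms, the binary variables act as multipliers on sales rather than as fixed additions, which is why a display effect of 0.4 corresponds to about a 49% uplift (exp(0.4) ≈ 1.49).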
Use of the SCAN*PRO Model in Analyzing Sales Data
The SCAN*PRO model is useful for:
1. Understanding Marketing Impact: It helps businesses see how changes in things
like price or promotions affect sales. For example, how much will sales increase if a
product’s price is reduced?
2. Price Sensitivity: The model helps businesses see how price changes impact sales.
This is called price elasticity. For example, if a product’s price drops, how much will
sales go up?
3. Measuring Promotion Effectiveness: The model shows how well promotions (like
in-store displays or ads) work at increasing sales. It tells companies if their
promotional activities are worth the investment.
4. Making Better Marketing Decisions: By understanding what affects sales,
companies can plan better marketing strategies. For example, should they focus more
on discounts or advertising?
5. Predicting Future Sales: The model helps companies forecast how their sales will
change if they change their prices or launch new promotions.
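Price elasticity, mentioned in point 2 above, can be estimated as the slope of a straight line fitted to ln(Sales) against ln(Price). The sketch below uses synthetic data generated from a known elasticity of -2 (an assumption made so the result can be checked), and fits the line with ordinary least squares written out by hand.

```python
import math

# Synthetic data: sales generated from Sales = 100 * Price^(-2), no noise.
prices = [1.0, 1.2, 1.5, 2.0, 2.5]
sales = [100.0 * p ** -2.0 for p in prices]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


log_prices = [math.log(p) for p in prices]
log_sales = [math.log(s) for s in sales]

elasticity, log_base_sales = fit_line(log_prices, log_sales)

print(f"estimated price elasticity: {elasticity:.2f}")
print(f"estimated base sales at price 1: {math.exp(log_base_sales):.1f}")
```

With real scanner data the points would not lie exactly on a line, and display and feature indicators would be included as additional regressors.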
Key Variables Used in the SCAN*PRO Model
The SCAN*PRO model typically includes the following key variables:
1. Price
Definition: The retail price of the product.
Type: Continuous (often used in logarithmic form).
Effect on Sales:
o There is usually a negative relationship between price and sales.
o As the price increases, the sales tend to decrease, and vice versa.
o This effect is measured using price elasticity, which shows the percentage
change in sales due to a 1% change in price.
2. Display
Definition: Indicates whether the product is given special in-store placement or
display (e.g., end-of-aisle or promotional stand).
Type: Binary (1 = displayed, 0 = not displayed).
Effect on Sales:
o Products on display often experience increased visibility, which can lead to
higher sales.
o Displays draw customer attention and encourage impulse purchases.
3. Feature
Definition: Indicates whether the product is featured in advertising, such as store
flyers, newspapers, or catalogues.
Type: Binary (1 = featured, 0 = not featured).
Effect on Sales:
o Featuring a product in advertisements generally results in greater consumer
awareness and can lead to significant sales increases.
o The effect can vary depending on the reach and appeal of the media used.
4. Promotion (or Deal)
Definition: Indicates whether the product is part of a temporary price reduction or
promotional offer.
Type: Binary (1 = promotion available, 0 = no promotion).
Effect on Sales:
o Promotions usually lead to a short-term increase in sales.
o Some customers may be motivated to stock up during the promotional period.
5. Brand
Definition: Identifies the specific brand of the product.
Type: Categorical.
Effect on Sales:
o Well-known brands often have a higher base level of sales.
o Brand loyalty can influence how customers respond to price changes or
promotions.
6. Advertising Expenditure (Optional in extended models)
Definition: The amount of money spent on advertising and media campaigns for the
product.
Type: Continuous.
Effect on Sales:
o Higher advertising spend may increase brand awareness and lead to greater
sales, especially in competitive markets.
7. Time/Seasonality Variables (Optional)
Definition: Includes variables to capture seasonal effects or specific time periods
such as holidays or festivals.
Type: Time-based (e.g., monthly dummy variables).
Effect on Sales:
o Helps explain fluctuations in sales due to seasonal demand patterns.
==========================================================================
Major Variants of the SCAN*PRO Model
The SCAN*PRO model is used to study how different marketing actions (like pricing, displays, and
promotions) affect sales. Over time, researchers and marketers have developed variants of the basic
SCAN*PRO model to handle more complex situations, improve accuracy, or include additional factors.
1. Heterogeneous SCAN*PRO (HETERO SCAN*PRO)
Purpose: To account for differences across stores or markets.
Explanation: In real life, different stores may have different customer types,
competition, or store sizes. This model allows the effects of price or promotion to
vary across stores, instead of assuming the same effect everywhere.
Use: Helps tailor marketing strategies for specific locations.
2. Multiplicative SCAN*PRO
Purpose: To improve model accuracy when relationships between variables are not
linear.
Explanation: This model uses a multiplicative (log-log) form, which means it
studies percentage changes. For example, it looks at how a 1% change in price
affects sales in percentage terms.
Use: Gives more realistic estimates of price elasticity and promotion effectiveness.
3. Lagged SCAN*PRO
Purpose: To measure delayed effects of marketing actions.
Explanation: Sometimes, promotions or ads do not lead to immediate sales. The
lagged model includes past values (e.g., last week’s promotion) to see how sales
respond over time.
Use: Useful for long-term planning and understanding carry-over effects of
advertising.
4. Dynamic SCAN*PRO
Purpose: To allow coefficients (like price effect) to change over time.
Explanation: Consumer behavior and market conditions can change. This model
updates its parameters regularly to reflect current trends.
Use: Helpful in fast-changing markets or during seasonal events.
5. Cross-Brand or Cross-Category SCAN*PRO
Purpose: To analyze interactions between competing brands or product
categories.
Explanation: A promotion for Brand A may affect the sales of Brand B. This model
looks at how one product's marketing affects another product’s sales.
Use: Useful for competitive analysis and category management.
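As an illustration of the lagged variant described above, the sketch below builds a "last week's promotion" variable from a weekly promotion indicator. The weekly flags are hypothetical, and padding the first week with 0 assumes there was no promotion before the sample starts; in practice the first row is often dropped instead.

```python
def lagged(values, lag=1, fill=0):
    """Shift a series forward by `lag` periods, padding the start with `fill`."""
    if not 0 <= lag <= len(values):
        raise ValueError("lag must be between 0 and the series length")
    return [fill] * lag + list(values[: len(values) - lag])


promo = [0, 1, 0, 0, 1, 0]       # hypothetical weekly promotion flags
promo_last_week = lagged(promo)  # [0, 0, 1, 0, 0, 1]

# Each row pairs this week's promotion with last week's, ready to be used
# as two separate regressors in the sales equation.
for week, (current, previous) in enumerate(zip(promo, promo_last_week), start=1):
    print(f"week {week}: promo={current} promo_last_week={previous}")
```

A positive coefficient on the lagged variable would indicate a carry-over effect, while a negative one would suggest that customers stocked up during the promotion and bought less the following week.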
Advantages of the SCAN*PRO Model
1. Data-Driven Insights
Uses actual sales data from retail stores (scanner data).
Helps make evidence-based decisions rather than guesses.
2. Measures the Impact of Marketing Mix
Separately identifies the effect of price, display, feature, and promotion.
Shows how each factor influences sales, which helps optimize marketing strategies.
3. Quantifies Price Sensitivity
Provides price elasticity of demand.
Helps marketers know how sensitive customers are to price changes.
4. Supports Promotion Planning
Shows which promotions (like displays or discounts) are most effective.
Helps plan cost-effective campaigns.
5. Customizable
Can be adjusted for different brands, products, and stores.
Variants (like dynamic or lagged models) allow deeper analysis.
Limitations of the SCAN*PRO Model
1. Assumes Constant Effects
Basic model assumes that the effect of price or display is the same across all stores
and times.
In real life, customer behavior changes over time and location.
2. Ignores Some External Factors
Does not always include competitor actions, weather, or seasonal effects unless
specially added.
This can reduce accuracy.
3. Limited Long-Term Impact Measurement
Focuses more on short-term effects (e.g., weekly sales).
May not capture brand loyalty or long-term brand building.
4. Data Requirements
Needs large amounts of clean scanner data.
Small companies or stores without detailed sales data may not be able to use it
effectively.
5. Complexity in Interpretation
The model uses logarithmic and statistical terms, which may be difficult to
understand for non-technical users.
Requires trained analysts to interpret results correctly.
Important Questions:
1. What is the SCAN*PRO model? Explain its structure and use in analyzing sales data.
2. Describe the key variables used in the SCAN*PRO model and how they influence product
sales.
3. What are the major variants of the SCAN*PRO model? Briefly explain the purpose of
each.
4. Discuss the advantages and limitations of using the SCAN*PRO model in marketing
decision-making.
Chapter : Forecasting Sales from Few Data Points
Sales forecasting is the process of estimating future sales based on past data. When only a
few data points are available (e.g., 3–6 observations), advanced forecasting techniques may
not be practical. Instead, simpler, intuitive methods are applied to give a rough but helpful
estimate.
With, say, 4–6 weeks of sales data, basic models such as the following are used:
Linear regression
Log-linear models (like SCAN*PRO)
Moving average (the simplest method)
Exponential smoothing (weights recent observations more heavily)
Example
Given Sales Data:
Month (X) Sales (Y in $1000s)
1 20
2 22
3 25
4 27
5 30
Calculate the sales for the 6th month.
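Solution:
Fit a straight line Y = mX + b to the data by least squares.
Step 1: Compute the Required Sums
n = 5, ΣX = 15, ΣY = 124, ΣXY = 397, ΣX² = 55
Step 2: Calculate Slope (m) of the Line
m = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²) = (5 × 397 − 15 × 124) / (5 × 55 − 15²) = 125 / 50 = 2.5
Step 3: Calculate the Intercept (b)
b = (ΣY − mΣX) / n = (124 − 2.5 × 15) / 5 = 86.5 / 5 = 17.3
Step 4: Forecast Sales for Month 6
Y = 17.3 + 2.5 × 6 = 32.3, i.e. forecast sales of about $32,300.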
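The least-squares line for this example can be computed in a few lines of Python. This is a minimal sketch using the five data points given above and the standard formulas for the slope and intercept of a simple linear regression; the variable names are illustrative.

```python
months = [1, 2, 3, 4, 5]
sales = [20, 22, 25, 27, 30]  # in $1000s

n = len(months)
sum_x = sum(months)                                 # 15
sum_y = sum(sales)                                  # 124
sum_xy = sum(x * y for x, y in zip(months, sales))  # 397
sum_x2 = sum(x * x for x in months)                 # 55

# Slope and intercept of the least-squares line Y = slope * X + intercept.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 125 / 50
intercept = (sum_y - slope * sum_x) / n                           # 86.5 / 5

forecast_month_6 = intercept + slope * 6

print(f"slope={slope:.1f} intercept={intercept:.1f} forecast={forecast_month_6:.1f}")
# prints: slope=2.5 intercept=17.3 forecast=32.3
```

The forecast of 32.3 corresponds to about $32,300. With only five observations the estimate is sensitive to any single unusual month, so it is best treated as a rough guide.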
Moving Average Method
The Moving Average (MA) method smooths out short-term fluctuations and highlights longer-term
trends in data. It works by taking the average of a fixed number of past periods to forecast the next
period.
Example: Forecast Sales
Week (t) Sales (Y)
1 4
2 5
3 7
4 10
5 ?
Solution :
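The window length for this example is not stated, so the sketch below assumes a 3-week moving average and, for comparison, a 2-week one. The function name is illustrative.

```python
def moving_average_forecast(history, window):
    """Forecast the next period as the mean of the last `window` observations."""
    if not 1 <= window <= len(history):
        raise ValueError("window must be between 1 and the number of observations")
    recent = history[-window:]
    return sum(recent) / window


weekly_sales = [4, 5, 7, 10]  # weeks 1 to 4 from the table above

forecast_3 = moving_average_forecast(weekly_sales, window=3)  # (5 + 7 + 10) / 3
forecast_2 = moving_average_forecast(weekly_sales, window=2)  # (7 + 10) / 2

print(f"3-week moving average forecast for week 5: {forecast_3:.2f}")
print(f"2-week moving average forecast for week 5: {forecast_2:.2f}")
```

Under the 3-week assumption the week 5 forecast is about 7.33, and under the 2-week assumption it is 8.5. Both are below the week 4 value of 10, because a moving average lags behind when sales are rising steadily.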
Challenges of Forecasting with Limited Data
Lack of Trend Visibility
With few data points, it is difficult to identify meaningful patterns or trends in sales
performance.
Sensitivity to Outliers
One unusual value (e.g., a promotional spike) can affect the forecast heavily and reduce
reliability.
Overfitting Risk
With a small dataset, models may fit the available data perfectly but perform poorly on future
data.
Poor Parameter Estimation
Statistical models like regression require sufficient data to estimate relationships between
variables. Limited data leads to unstable coefficients.
No Data for Model Validation
Validating a forecast model usually requires splitting the data into training and testing sets.
With few data points, this is not possible.
Restriction on Model Complexity
Advanced forecasting techniques (e.g., ARIMA, neural networks) require larger datasets and
are unsuitable for limited data environments.
Improving Forecast Accuracy with Few Data Points
Despite these limitations, the following approaches can improve the reliability of forecasts
made from small datasets:
Use of Simple Forecasting Models
Methods such as linear regression, moving averages, or exponential smoothing can work well
with fewer inputs.
Apply Exponential Smoothing
This method gives more weight to recent data, reducing the impact of noise and outliers.
Supplement with External or Analogous Data
Data from similar products or markets can help fill gaps and support better decision-making.
Incorporate Expert Judgment
Managerial experience and market knowledge can guide adjustments to model-based
forecasts.
Avoid Overfitting
Keep models simple, using fewer variables and straightforward equations to avoid fitting the
model too closely to limited data.
Use Percentage or Log Transformations
Modeling relative changes (e.g., log-log models) often provides more stable results than
using raw values.
Scenario Planning
Forecast multiple possibilities (e.g., optimistic, pessimistic, and expected scenarios) to
prepare for uncertainty.
Important Questions
1. What is sales forecasting? Discuss how sales can be forecasted using limited
historical data.
2. Explain the use of simple linear regression for forecasting sales from few data
points with an example.
3. Describe different methods used to forecast sales when only a few data points
are available.
4. What are the challenges of forecasting sales using limited data? How can
accuracy be improved in such cases?