Recommendation
Systems
Collaborative Filtering
Yijun Lin
Thanks for source slides and material
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets
[Link] 1
Recall: Collaborative Filtering
Overview
• CF works by collecting user feedback on items: e.g.,
ratings
• Two classes of CF algorithms:
• Neighborhood-based or Memory-based approaches
• User-based CF
• Item-based CF
2
Recall: Collaborative Filtering
Overview
• CF works by collecting user feedback on items: e.g.,
ratings
• Two classes of CF algorithms:
• Neighborhood-based or Memory-based approaches
• User-based CF
• Item-based CF
• Model-based approaches
• Estimate parameters of statistical models
• Latent factor and matrix factorization models
3
Model-Based Collaborative Filtering
• Provide recommendations by estimating parameters of
(statistical) models for predicting user ratings
• Developing models to learn complex patterns
• Based on training set – Supervised Learning
• Then make predictions for CF tasks with the learned models
• Example models:
• Bayesian models
• Classification algorithms (if users’ ratings are in categories)
• Regression models and SVD (singular value decomposition ) for
numerical ratings
4
Regression Models and SVD
Methods
• Numerical ratings are common in real recommender
systems
• Regression and SVD methods are good at making
predictions for numerical values
• Linear regression methods
• Latent factor models
• Learn model parameters to minimize the error between
predictions and ground truth on training data
• After developing such a model, fitting the model to a
new data sample to make prediction
5
UV-Decomposition
• For a utility matrix with rows and columns (i.e., users
and items)
• Find a matrix with rows and columns and a matrix
with rows and columns
• Such that closely approximates in those entries
where is nonblank
𝑀 𝑈 𝑉 6
UV-Decomposition
• For a utility matrix with rows and columns (i.e., users
and items)
• Find a matrix with rows and columns and a matrix
with rows and columns
• Such that closely approximates in those entries
where is nonblank => Minimize reconstruction error on
known ratings
𝑀 𝑈 𝑉 7
UV-Decomposition
• For a utility matrix with rows and columns (i.e., users
and items)
• Find a matrix with rows and columns and a matrix
with rows and columns
• Such that closely approximates in those entries
where is nonblank => Use to estimate the missing
rating
𝑀 𝑈 𝑉 8
Characteristics and Challenges of CF
• Data Sparsity
• Cold Start Problem
• Synonyms
• Scalability
• Gray Sheep
•…
9
Characteristics and Challenges of CF
(1)
• Data Sparsity
• Many commercial recommender systems deal with large
user/item sets
• Most users do not rate most items: Utility matrix is extremely
sparse
• Limitations?
10
Characteristics and Challenges of CF
(1)
• Data Sparsity
• Many commercial recommender systems deal with large
user/item sets
• Most users do not rate most items: Utility matrix is extremely
sparse
• This reduces probability of finding set of users with similar
ratings
11
Characteristics and Challenges of CF
(1)
• Data Sparsity
• Many commercial recommender systems deal with large
user/item sets
• Most users do not rate most items: Utility matrix is extremely
sparse
• This reduces probability of finding set of users with similar
ratings
• Potential Solutions: Clustering / Dimensionality Reduction
12
Characteristics and Challenges of CF
(1)
• Data Sparsity
• Many commercial recommender systems deal with large
user/item sets
• Most users do not rate most items: Utility matrix is extremely
sparse
• This reduces probability of finding Revise
set ofthe
users with similar
utility matrix so the columns
Harry-Potter
ratings Star-
Wars
represent clusters of items
• Potential Solutions: Clustering / Dimensionality Reduction
13
Characteristics and Challenges of CF
(1)
• Data Sparsity
• Many commercial recommender systems deal with large
user/item sets
• Most users do not rate most items: Utility matrix is extremely
sparse
• This reduces probability of finding set of users with similar
ratings
• Potential Solutions: Clustering / Dimensionality Reduction
• Principle Component Analysis (PCA), Singular Value
Decomposition (SVD)
• Remove unrepresentative or insignificant users/items to reduce size of
the matrix 14
Characteristics and Challenges of CF
(2)
• Cold Start Problem
• When a new item has just entered the system
• Limitations?
15
Characteristics and Challenges of CF
(2)
• Cold Start Problem
• When a new item has just entered the system
• Can’t be recommended until some users rate it
• Also called “first-rater problem”
• When a new user has just entered the system
• Limitations?
16
Characteristics and Challenges of CF
(2)
• Cold Start Problem
• When a new item has just entered the system
• Can’t be recommended until some users rate it
• Also called “first-rater problem”
• When a new user has just entered the system
• Cannot give good recommendations because of lack of rating or
purchase history
• Potential Solutions:
• Content-based systems do not rely on ratings from other
users
• Hybrid CF (content-boosted CF): External content
information can be used to produce predictions for new users 17
Characteristics and Challenges of CF
(3)
• Synonyms:
• Same or very similar items that have different names or
entries
• Limitations?
18
Characteristics and Challenges of CF
(3)
• Synonyms:
• Same or very similar items that have different names or entries
• Most recommender systems are unable to discover this latent
association
• These products are treated differently
• Potential Solutions:
• Automatic term expansion or construction of thesaurus
• Construct a semantic space where terms and documents that
are closely associated
19
Characteristics and Challenges of CF
(4)
• Scalability
• Traditional CF systems suffer scalability problems at very large
scale
• Limitations?
20
Characteristics and Challenges of CF
(4)
• Scalability
• Traditional CF systems suffer scalability problems at very large
scale
• Expensive computation, cannot do online/real-time
recommendation
• Potential Solutions:
• Dimensionality reduction techniques
• Item-based CF algorithms: Instead of calculating similarities
between all pairs of items, calculate similarity only between
pairs of co-rated items by a user => Trick for HW4 21
Characteristics and Challenges of CF
(5)
• Gray Sheep
• There are users whose opinions do not consistently agree or
disagree with any group of people
• Limitations?
22
Characteristics and Challenges of CF
(5)
• Gray Sheep
• There are users whose opinions do not consistently agree or
disagree with any group of people
• Do not benefit from collaborative filtering
• Potential Solutions:
• Hybrid approach combining content-based and CF
recommendations
23
Characteristics and Challenges of CF
(5)
• Black sheep
• Idiosyncratic tastes
• Make recommendations nearly impossible
• Considered an acceptable failure
24
Summary of Collaborative Filtering
Methods
• Does CF require feature selection?
• Can CF do personalized recommendation?
• Can CF learn latent patterns from user interactions?
25
Summary of Collaborative Filtering
Methods
• Pros:
• No feature selection needed
• Provide personalized recommendation
• Consider user interactions to capture complex user
preferences
26
Summary of Collaborative Filtering
Methods
• Can CF recommend for new users?
• Can CF recommend for new items?
• Can CF recommend if there is no common users/items?
• Does CF tend to recommend items to a user with unique
taste?
27
Summary of Collaborative Filtering
Methods
• Pros:
• No feature selection needed
• Provide personalized recommendation
• Consider user interactions to capture complex user
preferences
• Cons:
• Cold start: Need enough users in the system to find a match
• First rater: Cannot recommend an item that has not been
previously rated
• Sparsity: Hard to find users that have rated the same items
• Popularity bias: Cannot recommend items to someone with 28