0% found this document useful (0 votes)
11 views58 pages

Recommender Systems Overview and Techniques

The document discusses recommender systems, which are information filtering systems that predict user ratings for various items such as music, movies, and books. It outlines different paradigms of recommender systems, including collaborative filtering, content-based filtering, and hybrid approaches, highlighting their advantages and disadvantages. The document also delves into techniques for measuring user similarity and making predictions, emphasizing the importance of understanding user preferences and item characteristics in enhancing user experience and increasing sales for providers.

Uploaded by

milake3794
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views58 pages

Recommender Systems Overview and Techniques

The document discusses recommender systems, which are information filtering systems that predict user ratings for various items such as music, movies, and books. It outlines different paradigms of recommender systems, including collaborative filtering, content-based filtering, and hybrid approaches, highlighting their advantages and disadvantages. The document also delves into techniques for measuring user similarity and making predictions, emphasizing the importance of understanding user preferences and item characteristics in enhancing user experience and increasing sales for providers.

Uploaded by

milake3794
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Information Retrieval and

Web Analytics

Andreas Kaltenbrunner & Ricardo Baeza-Yates

Recommender systems
(Part I)
References

• Mining of Massive Datasets (Chapter 9).


[Link]

• Recommender Systems Handbook


Francesco Ricci, Lior Rokach, Bracha Shapira
Springer
Introduction

• Recommender System is a form of information filtering


system which seeks to predict user ratings of items.

• Recommender System finds application in music,


movies, news, books, research articles and many other.
Application areas
In the Social Web
Why using Recommender Systems?

• Value for the customer


– Find things that are interesting
– Narrow down the set of choices
– Help to explore the space of options
– Discover new things
– Entertainment
– …
• Value for the provider
– Offer unique personalized service for the customer
– Increase trust and customer loyalty
– Increase sales, click through rates, conversion, etc.
– Opportunities for promotion, persuasion
– Obtain more knowledge about customers
– …
Jeff Bezos

• “If I have 3 million


customers on the Web,
I should have 3 million
stores on the Web”
– Jeff Bezos, CEO of
[Link]
– Degree in Computer
Science
– US$83.6 billion
estimated (November
2020), ranked no. 1 in
the Forbes list of
world’s billionaires
Long-Tails Graph in RS

Distribution of popularity among items or products in a


marketplace.
Problem domain

• Recommendation systems (RS) help to match users with items


– Ease information overload
– Sales assistance (guidance, advisory, persuasion, …)

RS are software agents that elicit the interests and preferences of


individual consumers […] and make recommendations accordingly.
They have the potential to support and improve the quality of the
decisions consumers make while searching for and selecting
products online.
• [Xiao & Benbasat, MISQ, 2007]

• Different system designs / paradigms


– Based on availability of exploitable data
– Implicit and explicit user feedback
– Domain characteristics
Recommender systems

• RS seen as a function
• Given:
– User model (e.g., ratings, preferences, demographics,
situational context)
– Items (with or without description of item
characteristics)
• Find:
– Relevance score. Used for ranking.
• Finally:
– Recommend items that are assumed to be relevant
• But:
– Remember that relevance might be context-dependent
– Characteristics of the list itself might be important
(diversity)
Paradigms of recommender systems

Collaborative: "Tell me what's


popular among my peers"

Collaborative filtering models can recommend an item to


user A based on the interests of a similar user B.
Paradigms of recommender systems

Content-based: "Show me
more of the same I've liked"

Content-based filtering uses item features to recommend


other items similar to what the user likes, based on their
previous actions or explicit feedback.
Paradigms of recommender systems

Content-based: "Show me more of the same I've liked"

Content-based filtering uses item features to recommend


other items similar to what the user likes, based on their
previous actions or explicit feedback.
Paradigms of recommender systems

Knowledge-based: "Tell me
what fits based on my needs"
Paradigms of recommender systems

Hybrid: combinations of various


inputs and/or composition of
different mechanism
Recommender systems: basic techniques

Pros Cons
Collaborative No knowledge-engineering Requires some form of rating
effort, serendipity of results, feedback, cold start for new
learns market segments users and new items

Content-based No community required, Content descriptions


comparison between items necessary, cold start for
possible new users, no surprises

Knowledge-based Deterministic Knowledge engineering


recommendations, assured effort to bootstrap, basically
quality, no cold-start, can static, does not react to
resemble sales dialogue short-term trends
COLLABORATIVE FILTERING
Collaborative Filtering (CF)

• The most prominent approach to generate recommendations


– used by large, commercial e-commerce sites
– well-understood, various algorithms and variations exist
– applicable in many domains (book, movies, DVDs, ..)
• Approach
– use the "wisdom of the crowd" to recommend items
• Basic assumption and idea
– Users give ratings to catalog items (implicitly or explicitly)
Explicit
• users specify how much they liked for example a particular movie by
providing a numerical rating.
• this approach usually does not scale because only a small fraction
of users leave ratings and reviews
Implicit:
• E.g., a user watches a movie => system infers that she/he likes it.
– Customers who had similar tastes in the past, will have similar
tastes in the future
User-based nearest-neighbor collaborative
filtering (1)

• The basic technique:


– Given an "active user" (Alice) and an item I not yet
seen by Alice
– Goal is to estimate Alice's rating for item I, e.g., by
• find a set of users (peers) who liked the same items as
Alice in the past and who have rated item I
• use, e.g., the average of their ratings to predict, if Alice
will like item I
• do this for all items Alice has not seen and
recommend the best-rated
Item1 Item2 Item3 Item4 Item5
Alice 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 3 3 1 5 4
User4 1 5 5 2 1
User-based nearest-neighbor collaborative
filtering (2)

• Some first questions


– How do we measure similarity?
– How many neighbors should we consider?
– How do we generate a prediction from the
neighbors' ratings?

Item1 Item2 Item3 Item4 Item5


Alice 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 3 3 1 5 4
User4 1 5 5 2 1
Measuring user similarity

Item1 Item2 Item3 Item4 Item5


Alice 5 3 4 4 ?
User1 3 1 2 3 3 sim = 0,85
User2 4 3 4 3 5 sim = 0,71

User3 3 3 1 5 4 sim = 0,00

User4 1 5 5 2 1 sim = -0,79


Making predictions

• is the average rating of target user a.


• N is the neighborhood of most similar users.
• d is the average rating of neighbour user b.
• is the similarity between user a and b.
• is the rating assigned by user b to item p.
Item-based collaborative filtering

• Basic idea:
– Use the similarity between items (and not
users) to make predictions
• Example:
– Look for items that are similar to Item5
– Take Alice's ratings for these items to
predict the rating for Item5
Item1 Item2 Item3 Item4 Item5
Alice 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 3 3 1 5 4
User4 1 5 5 2 1
Item-based Collaborative Filtering

sij: similarity of items i and j


rxj: rating of user x on item j
N(i;x): set items rated by x similar to i
Item-based CF (|N|=2)

users
movies

- unknown rating - rating between 1 to 5


Item-based CF (|N|=2)
movies

- estimate rating of movie 1 by user 5


Item-based CF (|N|=2)

Here we use Pearson correlation as similarity:


Neighbor selection: 1) Subtract mean rating mi from each movie i
Identify movies similar to m1 = (1+3+5+5+4)/5 = 3.6
row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
movie 1, rated by user 5 2) Compute Pearson correlation (i.e., cosine similarity if avg
Item-based CF (|N|=2)

Compute similarity weights:


s1,3=0.41, s1,6=0.59
Item-based CF (|N|=2)

Predict by taking weighted average:


r1.5 = (0.41*2 + 0.59*3) / (0.41+0.59) = 2.6
Before:
CF: Common Practice

• Define similarity sij of items i and j


• Select k nearest neighbors N(i; x)
– Items most similar to i, that were rated by x
• Estimate rating rxi as the weighted average:

baseline estimate for rxi ◼ μ = overall mean movie rating


◼ bx = rating deviation of user x
= (avg. rating of user x) – μ
◼ bi = rating deviation of movie i
Item-Item vs. User-User

Avatar LOTR Matrix Pirates

Alice

Bob

Carol

David

◼ In practice, it has been observed that item-item often works


better than user-user
◼ Why? Items are simpler, users have multiple tastes
Example CF: Item vs Item
ITEMS
1 2 3 4

1 5 1 3
b3=(1+1+4+3)/4-μ=2.25-3= -0.75
U b4=(3+2+3+5)/4-μ=3.25-3= 0.25
2 1 2
S b63=μ+b6+b3=3+1-0.75= 3.25
E b64=μ+b6+b4=3+1+0.25= 4.25
R 3
r61= b61+(s13·(r63-b63)
S +s14·(r64-b64)) /(s13+s14)
4 1 r61=3.5+(0.48·(3-3.25)
+0.32·(5-4.25))/(0.48+0.32)

r61=3.68
5 4 4 3

6 ? 4 3 5

Average ratings: s12=0 s13=0.48 s14=0.32


μ = (5+1+3+2+1+4+3+5+1+4+3+4)/12 = 36/12 = 3
bx= b6=(4+3+5)/3-μ = 12/3-3 = 1
bi = b1=(1+4)/2-μ = 2.5-3 = -0.5
bxi = b61=μ+bx+bi = 3+1-0.5 = 3.5
Memory-based and model-based
approaches

• User-based CF is said to be "memory-based"


– the rating matrix is directly used to find neighbors /
make predictions
– does not scale for most real-world scenarios
– large e-commerce sites have tens of millions of
customers and millions of items
• Model-based approaches
– based on an offline pre-processing or "model-learning"
phase
– at run-time, only the learned model is used to make
predictions
– models are updated / re-trained periodically
– large variety of techniques used
– model-building and updating can be computationally
expensive
Model-based approaches

• Plethora of different techniques proposed in the last


years, e.g.,
– Matrix factorization techniques, statistics
• singular value decomposition, principal component
analysis
– Association rule mining
• compare: shopping basket analysis
– Probabilistic models
• clustering models, Bayesian networks, probabilistic
Latent Semantic Analysis
– Various other machine learning approaches
• Costs of pre-processing
– Usually not discussed
– Incremental updates possible?
The Netflix Prize


The Netflix Utility Matrix R
480,000 users
Matrix R 1 3 4
3 5 5
4 5 5
3
17,700
movies 3
2 2 2
5
2 1 1
3 3
1
The Netflix Utility Matrix R
480,000 users
Matrix R 1 3 4
3 5 5
4 5 5
3
17,700
movies 3
Training Data Set 2 ? ? Test Data Set
?
2 1 ?
3 ? True rating of
user x on item i
1

Predicted rating
BellKor Recommender System

• The winner of the Netflix Challenge!


• Multi-scale modeling of the data: Global effects
Combine top level, “regional”
modeling of the data, with
a refined, local view:
– Global: Factorization
• Overall deviations of users/movies
– Factorization:
• Addressing “regional” effects Collaborative
filtering
– Collaborative filtering:
• Extract local patterns
Modeling Local & Global Effects

• Global:
– Mean movie rating: 3.7 stars
– The Sixth Sense is 0.5 stars above avg.
– Joe rates 0.2 stars below avg.
⇒ Baseline estimation:
Joe will rate The Sixth Sense 4 stars
• Local neighborhood (CF/NN):
– Joe didn’t like related movie Signs
– ⇒ Final estimate:
Joe will rate The Sixth Sense 3.8 stars
Recap: Collaborative Filtering (CF)

• Earliest and most popular collaborative


filtering method
• Derive unknown ratings from those of “similar”
movies (item-item variant)
• Define similarity measure sij of items i and j
• Select k-nearest neighbors, compute the rating
– N(i; x): items most similar to i that were rated
by x

sij… similarity of items i and j


rxj…rating of user x on item j
N(i;x)… set of items similar to
item i that were rated by x
Recap: deviations in classic CF

• In practice we get better estimates if we


model deviations:

baseline estimate for rxi


Problems/Issues:
1) Similarity measures are “arbitrary”
2) Pairwise similarities neglect
interdependencies among users
μ = overall mean rating 3) Taking a weighted average can be
bx = rating deviation of user x restricting
= (avg. rating of user x) – μ Solution: Instead of sij use wij that we
bi = (avg. rating of movie i) – μ estimate directly from data
Recommendations via Optimization 1 3 4
3 5 5
4 5 5
3
3
2 2 2
5
2 1 1
• Goal: Make good recommendations 3 3
1
– Quantify goodness using RMSE:
Lower RMSE ⇒ better recommendations
– Want to make good recommendations on
items that user has not yet seen.

– Let’s set build a system such that it works


well on known (user, item) ratings
And hope the system will also predict well the
unknown ratings
Recommendations via Optimization

SSE = sum of squared errors

Predicted rating True


rating
Latent Factor Models

• “SVD” on Netflix data: R ≈ Q · PT SVD: A = U Σ VT

users factors

users
items

factors
≈items

PT
R Q
• For now, let’s assume we can approximate the rating
matrix R as a product of “thin” Q · PT
– R has missing entries but let’s ignore that for now!
• Basically, we will want the reconstruction error to be small on
known ratings and we don’t care about the values on the missing
ones
Latent Factor Models (e.g., SVD)
Serious Braveheart
The Color Amadeus
Purple

Lethal Weapon
Sense and
Sensibility Ocean’s 11
Geared Geared
towards towards
females males

The Lion King

The Princess Independence


Diaries Day
Dumb and
Funny Dumber
Ratings as Products of Factors

• How to estimate the missing rating of


user x for item i ?

qi = row i of Q
px = column x of PT
Ratings as Products of Factors

• How to estimate the missing rating of


user x for item i ?

qi = row i of Q
px = column x of PT
Ratings as Products of Factors

• How to estimate the missing rating of


user x for item i ?

qi = row i of Q
px = column x of PT

ȓxi=-0.5·(-2)+0.6·0.3+0.5·2.4=2.38
Latent Factor Models
Serious Braveheart
The Color Amadeus
Purple

Lethal Weapon
Sense and
Sensibility Ocean’s 11
Geared Factor 1Geared
towards towards
females males

The Lion King


Factor 2

The Princess Independence


Diaries Day
Dumb and
Funn Dumber
y
Latent Factor Models
Serious Braveheart
The Color Amadeus
Purple

Lethal Weapon
Sense and
Sensibility Ocean’s 11
Geared Factor 1Geared
towards towards
females males

The Lion King


Factor 2

The Princess Independence


Diaries Day
Dumb and
Funn Dumber
y
Recap: SVD
n n
• Remember SVD:
– A: Input data matrix
V T
Σ
– U: Left singular vecs m
– V: Right singular vecs
A ≈m

– Σ: Singular values
– U,V are orthonormal. U
We do not need this here, we do not care, we just
want something that works

• In our case:
“SVD” on Netflix data: R ≈ Q · PT
A = R, Q = U, PT = Σ VT
Latent Factor Models


FINDING THE LATENT
FACTORS
Back to Our Problem

• Want to minimize SSE for unseen test data


• Idea: Minimize SSE on training data
– Want large k (# of factors) to capture all the
signals
– But SSE on test data begins to rise for k > 2
1 3 4
3 5 5

• This is a classical example of overfitting:


4 5 5
3
3
2 ? ?

– With too much freedom (too many free 2 1


3
?

?
?

parameters) the model starts fitting noise


1

• I.e., it fits too well the training data and thus not
generalizing well to unseen test data
Detour: Minimizing a function


Dealing with Missing Entries
1 3 4
3 5 5
4 5 5
3
3

• To solve overfitting, we introduce 2 ?

2 1
?
?

regularization:
3 ?
1

– Allow rich model where there is sufficient


data
– Shrink aggressively where data is scarce

“error” “length”
λ1, λ2 … user set regularization parameters

Note: We do not care about the “raw” value of the objective function,
but we care in P,Q that achieve the minimum of the objective
Summary: Recommender Systems

Main paradigms:
• Collaborative filtering: based on user or
item similarity.
• Content-based filtering: relies on item
features and user profiles.
• Knowledge-based and hybrid
approaches: combine structured knowledge
and multiple methods.
Summary: Recommender Systems

• Collaborative filtering (CF):recommending


items based on similar users or items.
• Two types: user-based and item-based CF.
• CF enables serendipity and captures
evolving community preferences.
•Model-based methods, such as matrix
factorization (SVD), have advanced scalability
and accuracy.
•Key challenges remain:
•Cold-start problem
•Data sparsity
•Balancing accuracy, novelty, and diversity

You might also like