Understanding Recommendation Systems

Chapter 6 discusses recommendation systems, which provide personalized suggestions based on user preferences and behaviors. It covers various techniques such as content-based filtering, collaborative filtering (both user-based and item-based), and model-based approaches like clustering and matrix factorization. The chapter also highlights the advantages and limitations of each method, as well as the importance of hybrid systems to enhance recommendation accuracy.
Copyright
© All Rights Reserved

Chapter 6


What is a recommendation system?
A recommendation system (or recommender system) is a tool
designed to provide personalized suggestions to users based on
their preferences, behavior, and interactions with a platform.
These systems analyze data such as purchase history, browsing
history, user demographics, and contextual information to
deliver relevant content.
Content-based techniques
 Content-based techniques are a class of recommender systems that suggest
items to users based on the attributes of items they have previously liked.
Unlike collaborative filtering, which uses data from other similar users, a
content-based system relies only on an individual user's preferences and the
features of the items.
 For example, a movie recommendation system might note that a user has
watched and enjoyed several action movies starring a specific actor. The
system would then recommend other films that share the same "action"
genre or feature the same actor.
How content-based techniques work
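One way to picture the mechanics, as a minimal sketch rather than a definitive implementation: each item is described by a feature vector, the user profile is the average of the vectors of the items the user liked, and unseen items are ranked by cosine similarity to that profile. The film titles, the four features, and the helper names below are hypothetical, chosen only to mirror the action-movie example.

```python
import numpy as np

# Hypothetical item features: [action, comedy, romance, stars_actor_x]
ITEM_FEATURES = {
    "Action Film 1": [1, 0, 0, 1],
    "Action Film 2": [1, 0, 0, 1],
    "Action Film 3": [1, 0, 0, 1],   # same genre, same actor
    "Action Film 4": [1, 0, 0, 0],   # same genre, different actor
    "Romantic Comedy": [0, 1, 1, 0],
}

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def recommend(liked, features=ITEM_FEATURES):
    """Rank unseen items by similarity to the average of the liked items."""
    profile = np.mean([features[title] for title in liked], axis=0)
    scores = {
        title: cosine(profile, vec)
        for title, vec in features.items()
        if title not in liked
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = recommend(["Action Film 1", "Action Film 2"])
for title, score in ranking:
    print(f"{title}: {score:.2f}")
```

Only this one user's history and the item attributes are used; no other user's ratings enter the computation.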
Collaborative Filtering
Types of Collaborative Filtering
User-Based Collaborative Filtering (UBCF)
User-Based Collaborative Filtering is a recommendation technique that suggests items
to a user based on the preferences of other users who have similar tastes or behavior.
It assumes that:
“Users who agreed in the past will agree again in the future.”
How It Works
 Collect user-item ratings
 Each user rates some items (like movies, songs, or products).
 Example: User A rates movies on a scale of 1–5.

User   Movie 1   Movie 2   Movie 3   Movie 4
A      5         4         ?         2
B      5         5         4         2
C      1         2         1         5

 Find similar users


 Compare user A’s ratings with other users (B, C, etc.).
 Measure similarity using:
 Cosine similarity
 Pearson correlation
 Euclidean distance
 Example: A is most similar to B.
 Predict missing ratings
 Use the ratings of similar users (neighbors) to estimate A’s missing ratings.
 Example: Since B liked Movie 3, A might like it too.
 Generate recommendations
 Recommend items (movies, songs, etc.) that similar users liked but the active user hasn’t rated yet.
 Example
If User A and User B both liked Inception and Interstellar, and B also liked The Matrix,
then The Matrix is recommended to User A.
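The four steps can be sketched on the small A/B/C rating table from this section. This is an illustrative sketch, not a production method: it assumes cosine similarity computed only over co-rated movies and a similarity-weighted average over all neighbors; the function names are made up for the example.

```python
import numpy as np

# Ratings from the A/B/C table; np.nan marks the rating to predict.
ratings = {
    "A": np.array([5, 4, np.nan, 2], dtype=float),
    "B": np.array([5, 5, 4, 2], dtype=float),
    "C": np.array([1, 2, 1, 5], dtype=float),
}

def cosine_on_corated(u, v):
    """Cosine similarity restricted to items both users have rated."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    if not mask.any():
        return 0.0
    a, b = u[mask], v[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(target, item, ratings):
    """Similarity-weighted average of the neighbors' ratings for one item."""
    num = den = 0.0
    for name, vec in ratings.items():
        if name == target or np.isnan(vec[item]):
            continue
        sim = cosine_on_corated(ratings[target], vec)
        num += sim * vec[item]
        den += abs(sim)
    return num / den if den else float("nan")

# Step 2: similarity of A to every other user.
sims = {n: cosine_on_corated(ratings["A"], v) for n, v in ratings.items() if n != "A"}
# Step 3: predict A's rating for Movie 3 (column index 2).
prediction = predict("A", 2, ratings)
print(sims, round(prediction, 2))
```

With raw cosine on all-positive ratings, C still receives a fairly high similarity to A, which pulls the estimate toward C's low rating. Mean-centering each user's ratings first (Pearson correlation) separates B and C more sharply.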
 Advantages
• Simple and intuitive to implement.
• Doesn’t need content features (works just from ratings).
• Captures human behavior patterns directly.
 Limitations
• Cold start problem: Can’t recommend for new users/items with no ratings.
• Sparsity problem: Many users don’t rate enough items.
• Scalability issue: Similarity must be recalculated for many users as data grows.
 Use Cases
• Movie recommendation (Netflix)
• E-commerce product recommendation (Amazon)
• Music or playlist recommendation (Spotify)
• Social media content suggestions
Item-Based Collaborative Filtering (IBCF)
Item-Based Collaborative Filtering recommends items similar to those a user has
already liked, based on item-to-item relationships rather than user-to-user similarities.
It assumes that:
“If a user liked one item, they will also like similar items.”
How It Works
 Collect user-item ratings
 Each user rates items (like movies, songs, or products).
User   Movie A   Movie B   Movie C   Movie D
A      5         4         ?         2
B      5         5         4         2
C      1         2         1         5
 Compute item-item similarity
Measure similarity between items based on how users rated them.
Common similarity measures:
 Cosine similarity
 Pearson correlation
 Example: If users who liked Movie A also liked Movie B,
then A and B are similar items.
 Predict missing ratings
For a user’s unrated item, find similar items they already rated.
Estimate rating using similarities and user’s known ratings.
 Generate recommendations
Recommend the top-rated or most similar items to the user.
Example
If a user liked Inception and Interstellar, and those are similar to The Matrix,
then the system recommends The Matrix.
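The item-based variant can be sketched on the same rating table. The sketch assumes cosine similarity between item columns, computed only over users who rated both items, and a similarity-weighted average of the target user's own ratings; the function names are hypothetical.

```python
import numpy as np

# Rows: users A, B, C. Columns: Movie A, Movie B, Movie C, Movie D.
R = np.array([
    [5, 4, np.nan, 2],
    [5, 5, 4, 2],
    [1, 2, 1, 5],
], dtype=float)

def item_similarity(R, i, j):
    """Cosine similarity of two item columns over users who rated both."""
    mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    if not mask.any():
        return 0.0
    a, b = R[mask, i], R[mask, j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_item_based(R, user, item):
    """Weight the user's known ratings by each item's similarity to the target item."""
    num = den = 0.0
    for j in range(R.shape[1]):
        if j == item or np.isnan(R[user, j]):
            continue
        sim = item_similarity(R, item, j)
        num += sim * R[user, j]
        den += abs(sim)
    return num / den if den else float("nan")

# Similarity of Movie C (index 2) to every movie, then user A's predicted rating.
item_sims = [item_similarity(R, 2, j) for j in range(R.shape[1])]
prediction = predict_item_based(R, user=0, item=2)
print([round(s, 3) for s in item_sims], round(prediction, 2))
```

Because item-item similarities depend on many users' ratings, they can be precomputed offline and reused, which is one reason this variant tends to scale better than the user-based one.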

Advantages
 More stable than user-based filtering (item similarities change less often).
 Works well even with many users.
 Faster in large systems (e.g., Amazon).

Limitations
 Still suffers from cold start for new items.
 Doesn’t consider user preferences beyond rated items.

Use Cases
 E-commerce: “Customers who bought this also bought…”
 Streaming services: “Because you watched…”
 Online learning: “Students who viewed this course also viewed…”
Model-Based Collaborative Filtering
Clustering-Based Algorithms

 Model-Based Collaborative Filtering uses machine learning models to predict a user’s preferences based on patterns learned from past user–item interactions.
 When clustering-based algorithms are used, the system groups similar users or
items into clusters and makes predictions using those clusters.
Types of Clustering Algorithms
K-Means Clustering
 K-Means stands as one of the most widely used clustering algorithms due to its simplicity and
efficiency. This centroid-based technique organizes data points around central vectors that
represent clusters.
 The algorithm works through a straightforward process:
1. Randomly initialize K centroids (cluster centers)
2. Assign each data point to its nearest centroid
3. Recalculate the centroids based on the assigned points
4. Repeat until convergence or maximum iterations reached
K-Means excels with spherical clusters of similar size but requires specifying the number of
clusters (K) beforehand. It is commonly used for customer segmentation, image compression, and
document clustering.
 Example:
In a movie recommendation system, users are clustered based on their movie ratings. A new user is
assigned to the nearest cluster (e.g., 'Action Movie Lovers'), and recommendations are made based
on that group’s preferences.
How K-Means Clustering Works
 K-means stands out as one of the most accessible clustering algorithms
for beginners to understand
 The fundamental idea behind K-means is finding commonalities by
measuring distances between data points—the closer two points are, the
more similar they are considered.
Step-by-step process

 K-means follows a straightforward iterative approach:


 Initialization: Begin by randomly selecting K points as initial cluster centroids
 Assignment: Calculate the distance between each data point and all centroids,
then assign each point to its closest centroid
 Update: Recalculate the centroids by taking the mean of all points assigned to
each cluster
 Repeat: Continue steps 2-3 until the centroids no longer change significantly or
you reach a maximum number of iterations
 During this process, K-means attempts to minimize the total intra-cluster
variation (the sum of squared distances from each point to its assigned centroid).
This measurement, often called “inertia” or “within-cluster sum of squares,”
decreases with each iteration as the algorithm refines the clusters.
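 Clustering can enhance recommendations by grouping similar users or items, so that predictions
draw on a cluster's aggregate preferences. It can ease cold-start issues, because a new user or item
only needs to be assigned to a cluster, and it narrows the neighbor search for user-based and
item-based filtering.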
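The initialization, assignment, update, and repeat steps can be sketched in plain NumPy. This is a minimal illustration under stated assumptions: the six user rating vectors are made up, Euclidean distance is used, and an empty cluster simply keeps its previous centroid.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Basic K-Means; returns labels, centroids, and the inertia after each assignment."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    inertia_history = []
    for _ in range(max_iter):
        # Assignment: distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        inertia_history.append(float((dists[np.arange(len(X)), labels] ** 2).sum()))
        # Update: each centroid becomes the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        # Repeat until the centroids stop moving.
        if shift < tol:
            break
    return labels, centroids, inertia_history

# Hypothetical users rating four movies (two action, two romance).
X = np.array([
    [5, 5, 1, 1], [4, 5, 2, 1], [5, 4, 1, 2],
    [1, 1, 5, 5], [2, 1, 4, 5], [1, 2, 5, 4],
], dtype=float)

labels, centroids, history = kmeans(X, k=2)
print(labels, history)
```

The recorded inertia never increases from one iteration to the next, but the result depends on the random start: K-Means can settle in a local optimum, which is why implementations usually run several initializations and keep the best one.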
Matrix Factorization Method
 Matrix Factorization (MF) is a model-based collaborative filtering technique that decomposes a large
user–item rating matrix into two lower-dimensional matrices —
one representing users and another representing items.
 It helps to discover latent (hidden) features that explain the relationships between users and items.
These latent features can then be used to cluster similar users or items.
 In a user–item rating matrix (R):
 Each row represents a user
 Each column represents an item
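 Each cell $R_{u,i}$ represents the rating of user u for item i
 Matrix Factorization approximates:
$R \approx P Q^{T}$
Where:
P = User-feature matrix (one row $p_u$ per user)
Q = Item-feature matrix (one row $q_i$ per item)
The dot product of a user's row of P and an item's row of Q predicts a missing rating: $\hat{R}_{u,i} = p_u \cdot q_i$.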
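The approximation of R by the product of a user-feature matrix P and an item-feature matrix Q can be sketched as follows. Note the assumption: this is gradient-descent factorization fitted only on the observed ratings, not a literal SVD (plain SVD is not defined when entries are missing). The number of factors, learning rate, regularization strength, and epoch count are illustrative choices.

```python
import numpy as np

# Small rating matrix; np.nan marks a missing rating.
R = np.array([
    [5, 4, np.nan, 2],
    [5, 5, 4, 2],
    [1, 2, 1, 5],
], dtype=float)

def factorize(R, n_factors=2, lr=0.01, reg=0.02, epochs=3000, seed=0):
    """Learn P (users x factors) and Q (items x factors) by stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))
    observed = [(u, i) for u in range(n_users) for i in range(n_items)
                if not np.isnan(R[u, i])]
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]          # error on one known rating
            p_u = P[u].copy()                     # keep old value for Q's update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

P, Q = factorize(R)
predicted = P @ Q.T                               # dense matrix of predictions
mask = ~np.isnan(R)
rmse = float(np.sqrt(np.mean((R[mask] - predicted[mask]) ** 2)))
print(np.round(predicted, 2), round(rmse, 3))
```

The entry of `predicted` at a position where R is missing is the model's estimate for that rating; the error is measured only on the ratings that were actually observed.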
 Algorithms Used
1. Singular Value Decomposition (SVD) – Most common matrix factorization technique.
2. Non-negative Matrix Factorization (NMF) – Ensures non-negative features for interpretability.
3. Probabilistic Matrix Factorization (PMF) – Adds probabilistic modeling to handle uncertainty in ratings.

 Advantages
Reduces data dimensionality.
Captures hidden user and item features.
Works well even with sparse data.
Supports clustering for grouping similar behaviors.

 Disadvantages
Computationally expensive for very large datasets.
Requires tuning of hyperparameters.
Struggles with cold start (new users/items with no data).
Example
In a movie recommendation system (like Netflix):
1. The rating matrix is factorized using SVD.
2. Clustering is applied on user factors to find groups like “Action lovers” or “Romance lovers.”
3. Recommendations are made based on cluster-level similarities.
Applications
1. Movie, music, and e-commerce recommendations.
2. Social media interest group detection.
3. Customer segmentation in marketing.
Deep Learning Integration in UBCF
 Deep learning can enhance UBCF by learning latent features automatically rather than relying
solely on similarity metrics.
Common Approaches:

 Autoencoders
 Treat the user’s item ratings as input.
 Learn compressed latent representation of users.
 Reconstruct the missing ratings.
Neural Collaborative Filtering (NCF)
 Use embedding layers for users and items.
 Pass embeddings through deep neural networks
to predict ratings.

Advantages of Deep Learning UBCF:


 Captures non-linear relationships between
users and items.
 Handles sparse data better.
 Learns complex latent features automatically.

Example Flow: Deep Learning UBCF


 Input user-item matrix → mask missing values.
 Encode users/items into latent vectors via neural network.
 Compute predicted ratings through a feedforward network.
 Recommend top-N items to users based on predicted ratings.
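The data flow of the embedding-plus-feedforward approach can be sketched in NumPy. This shows the forward pass only: all weights are random and untrained, so the scores carry no meaning. The layer sizes, the single hidden layer, and the sigmoid rescaling to a 1–5 range are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, emb_dim, hidden = 4, 6, 8, 16

# Embedding tables and MLP weights; random here, learned from ratings in practice.
user_emb = rng.normal(scale=0.1, size=(n_users, emb_dim))
item_emb = rng.normal(scale=0.1, size=(n_items, emb_dim))
W1 = rng.normal(scale=0.1, size=(2 * emb_dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, 1))
b2 = np.zeros(1)

def predict_ratings(user_id):
    """Score every item for one user: embed, concatenate, feed forward."""
    u = np.repeat(user_emb[user_id][None, :], n_items, axis=0)  # (n_items, emb_dim)
    x = np.concatenate([u, item_emb], axis=1)                   # (n_items, 2 * emb_dim)
    h = np.maximum(0.0, x @ W1 + b1)                            # hidden layer with ReLU
    logit = (h @ W2 + b2).ravel()
    return 1.0 + 4.0 / (1.0 + np.exp(-logit))                   # squash into the 1-5 range

def top_n(user_id, already_rated, n=3):
    """Return the n highest-scoring items the user has not rated yet."""
    scores = predict_ratings(user_id)
    ranked = [int(i) for i in np.argsort(-scores)]
    return [i for i in ranked if i not in already_rated][:n]

scores = predict_ratings(0)
recs = top_n(0, already_rated={1, 4}, n=3)
print(np.round(scores, 3), recs)
```

Training would adjust the embeddings and layer weights to minimize the error on known ratings, typically with a deep learning framework rather than hand-written NumPy.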
Why Hybrid Filtering?
1. Collaborative Filtering (CF):
 Pros: Can find patterns across users, no need for item attributes.
 Cons: Cold start problem (new items/users), sparsity issue.
2. Content-Based Filtering (CBF):
 Pros: Works well with new items if attributes are known, personalized.
 Cons: Limited to items similar to what the user has already rated; no discovery of new interests.
3. Hybrid systems aim to:
✓ Reduce cold start and sparsity problems.
✓ Improve recommendation accuracy.
✓ Balance personalization and novelty.

Applications
 E-commerce: Amazon combining purchase history (CF) and product attributes (CBF).
 Streaming services: Netflix and Spotify combining user ratings and content features.
 Social networks: Suggest friends or groups using hybrid techniques.
Utility Matrix
A utility matrix is a foundational concept in recommendation systems, especially in
collaborative filtering. It represents user preferences for a set of items in matrix form.
Here's a breakdown to help you understand and build a recommendation system
using a utility matrix.

A utility matrix is a 2D matrix where:


 Rows represent users
 Columns represent items (like movies, books, products, etc.)
 Each cell contains a rating or preference (e.g., 1–5 stars) that a user has given to an
item. If no rating exists, the cell is typically empty or marked as NaN (missing value).
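As a small sketch of how such a matrix might be assembled, the snippet below turns a list of (user, item, rating) observations into a NumPy array with NaN for missing ratings. The observations are invented for illustration and are not taken from any table in this chapter.

```python
import numpy as np

# Hypothetical (user, item, rating) observations.
observations = [
    ("User1", "Movie A", 4), ("User1", "Movie B", 2), ("User1", "Movie D", 5),
    ("User2", "Movie A", 3), ("User2", "Movie C", 5), ("User2", "Movie D", 1),
    ("User3", "Movie B", 4), ("User3", "Movie C", 2), ("User3", "Movie D", 3),
]

users = sorted({u for u, _, _ in observations})
items = sorted({i for _, i, _ in observations})

# Start with every cell missing, then fill in the known ratings.
utility = np.full((len(users), len(items)), np.nan)
for user, item, rating in observations:
    utility[users.index(user), items.index(item)] = rating

# Fraction of user-item pairs with no rating.
sparsity = float(np.isnan(utility).mean())
print(utility)
print(f"sparsity: {sparsity:.2f}")
```

Real utility matrices are usually far sparser than this toy one, so they are often stored in a sparse format rather than as a dense array.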

Movie A Movie B Movie C Movie D


User1 5 3 1
User2 4 2 1
User3 2 4 5
Types of Recommendation Systems Using Utility Matrix

 User-Based Collaborative Filtering


 Finds users who are similar based on rating patterns.
 Recommends items liked by similar users.
 Item-Based Collaborative Filtering
 Finds items similar to what a user has liked.
 Recommends similar items based on item-item similarity.
 Matrix Factorization (e.g., SVD)
 Decomposes the utility matrix into latent features for users and items.
 Predicts missing ratings.
 Example: Movie ratings
Imagine a streaming service with four users (Alice, Bob, Carol, and David) and five
movies. The ratings are on a scale of 1 to 5, with blank cells indicating that a user has
not rated that movie yet.
 Initial utility matrix:
Avatar   Interstellar   The Matrix   The Godfather   Deadpool

Alice 5 4 1

Bob 4 5 5

Carol 4 5

David 1 2 4 5

 How the system uses this matrix


 The goal of the recommendation system is to predict the missing values in this matrix
to recommend new movies to users. It does this by analyzing existing data through
methods like collaborative filtering.
 User-based collaborative filtering
This method identifies users with similar tastes and recommends items the similar users liked.
For example, to recommend a movie to Alice:
Find similar users: The system compares Alice's ratings to others. It notices that Alice and Bob
both rated Avatar highly (5 and 4, respectively) and Interstellar similarly (4 and 5).
Identify new items: The system sees that Bob rated The Matrix a 5, and Alice has not rated it.
Predict and recommend: Based on Bob's positive rating, the system can recommend The
Matrix to Alice.
 Item-based collaborative filtering
 This method focuses on finding items that are similar to one another. For example, to
predict Alice's rating for The Godfather:
 Find similar items: The system compares ratings for The Godfather with other movies. It
sees that users Carol and David both rated The Godfather and The Matrix similarly (Carol
gave them 5 and 4; David gave them 4 and 2).
 Leverage existing ratings: Alice's high ratings for movies like Avatar and Interstellar are
compared with the ratings for The Matrix. This is a more complex comparison, but the
system could identify that The Godfather is similar to movies Alice has enjoyed.
 Predict and recommend: The system would then predict Alice's potential rating for The
Godfather and decide whether to recommend it.
 Matrix factorization
 This technique decomposes the large, sparse utility matrix into two smaller, lower-
rank matrices—a user-feature matrix and an item-feature matrix.
 Decomposition: The system learns "latent factors" (hidden characteristics) for both
users and items based on the observed ratings. For example, one latent factor
might represent a user's preference for sci-fi, and another might represent an item's
genre.
 Prediction: By multiplying these two new matrices, the system can generate a
complete, dense utility matrix with predictions for all missing entries.
 Recommendation: The highest-predicted rating for a movie Alice has not seen
becomes the recommendation. For instance, the system might predict that Alice
would rate The Matrix a 4.5 and The Godfather a 2.0, so it recommends The Matrix.
Features of Documents
A feature is a measurable property or characteristic of a document. In text processing,
features help algorithms understand the content and structure of the document.
Tools & Techniques to Discover Features
 Preprocessing the Text
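Tokenization: the text is split into individual words or "tokens".
Stopword removal
Lemmatization / stemming
Lowercasing
Ex.
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
# The NLTK tokenizer and stopword data must be fetched once with nltk.download(...)
tokens = word_tokenize("Natural Language Processing with Python.")
stop_words = set(stopwords.words('english'))  # build the lookup set once
filtered = [w for w in tokens if w.lower() not in stop_words]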
 Bag-of-Words (BoW)
The Bag-of-Words model represents a document as an unordered collection of its words. It
disregards grammar and word order and focuses only on the frequency of each word.
 Vectorization:
Each document is converted into a vector where each dimension corresponds to a
unique word in the vocabulary. The value in each dimension is the count of that word in the
document

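from sklearn.feature_extraction.text import CountVectorizer

docs = ["AI is cool", "AI and ML are the future"]
vectorizer = CountVectorizer()  # raw word counts; TfidfVectorizer gives TF-IDF weights instead
X = vectorizer.fit_transform(docs)  # sparse document-term matrix
print(vectorizer.get_feature_names_out())  # vocabulary, in column order
print(X.toarray())  # one count vector per document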
Example:
Document 1: "The cat sat on the mat."
Document 2: "The dog ate the cat."
The vocabulary would be: ["the", "cat", "sat", "on", "mat", "dog", "ate"].
The BoW vectors for the documents would be:
Document 1: [2, 1, 1, 1, 1, 0, 0]
Document 2: [2, 1, 0, 0, 0, 1, 1]
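The two vectors can be reproduced with a few lines of plain Python. This sketch assumes lowercasing, a simple letters-only tokenizer, and a vocabulary ordered by first appearance, which matches the ordering used in the example.

```python
import re

docs = ["The cat sat on the mat.", "The dog ate the cat."]

def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

# Vocabulary in order of first appearance across the documents.
vocabulary = []
for doc in docs:
    for word in tokenize(doc):
        if word not in vocabulary:
            vocabulary.append(word)

# One count per vocabulary word, per document.
vectors = [[tokenize(doc).count(word) for word in vocabulary] for doc in docs]

print(vocabulary)   # ['the', 'cat', 'sat', 'on', 'mat', 'dog', 'ate']
print(vectors[0])   # [2, 1, 1, 1, 1, 0, 0]
print(vectors[1])   # [2, 1, 0, 0, 0, 1, 1]
```

Library vectorizers such as scikit-learn's `CountVectorizer` sort the vocabulary alphabetically, so the column order there differs even though the counts are the same.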
 Term Frequency-Inverse Document Frequency (TF-IDF)
 This technique is a more advanced version of BoW that measures the importance of a word in a
document relative to a corpus. It gives higher weight to words that are frequent in a specific
document but rare across all documents.
 Process:
 Term Frequency (TF): Measures how often a word appears in a document.
 Inverse Document Frequency (IDF): Measures how rare or common a word is across all documents.
 Calculation: The TF-IDF score is the product of TF and IDF. Common words like "the" and "a" get low
scores, while unique keywords get higher scores.
 TF-IDF stands for “Term Frequency – Inverse Document Frequency”. It is a numerical statistic that measures the importance of a word in a document relative to the corpus.
 Term Frequency: the number of times a word appears in a document, normalized by the document's length:
 $\mathrm{tf}(t, d) = \dfrac{\text{number of times term } t \text{ appears in document } d}{\text{total number of terms in document } d}$
 Inverse Document Frequency: a measure of how rare or common a word is across all documents:
 $\mathrm{idf}(t) = \log\dfrac{n}{\mathrm{df}(t)} + 1$
 where t = term, d = document, n = total number of documents, and df(t) = the document frequency of term t (the number of documents that contain t).
 $\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)$
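The three formulas can be applied directly to the two cat/dog sentences from the Bag-of-Words example. The sketch assumes the natural logarithm (the formulas do not fix a base) and a simple lowercase tokenizer; the function names are chosen for the illustration.

```python
import math
import re

docs = ["The cat sat on the mat.", "The dog ate the cat."]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

corpus = [tokenize(d) for d in docs]

def tf(term, tokens):
    """Occurrences of the term divided by the document length."""
    return tokens.count(term) / len(tokens)

def idf(term, corpus):
    """log(n / df(t)) + 1; the term must occur in at least one document."""
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / df) + 1

def tf_idf(term, tokens, corpus):
    return tf(term, tokens) * idf(term, corpus)

# Scores for every distinct word of document 1.
scores_doc1 = {t: tf_idf(t, corpus[0], corpus) for t in set(corpus[0])}
for term, score in sorted(scores_doc1.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this tiny corpus "sat" outscores "cat" because it appears in only one document, but "the" still scores highest since it occurs twice and the corpus is too small for IDF to discount it, which is one reason stopwords are usually removed first. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its numbers differ from these.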
N-grams
An N-gram is a contiguous sequence of n items (words or characters) from a text. This
method helps capture some context by preserving the order of a short sequence of words,
unlike the simple BoW approach.
 Example (for the sentence "The cat sat on the mat"):
 Unigrams (1-gram): "The", "cat", "sat", "on", "the", "mat"
 Bigrams (2-gram): "The cat", "cat sat", "sat on", "on the", "the mat"
 Trigrams (3-gram): "The cat sat", "cat sat on", "sat on the", "on the mat"
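Word-level N-grams like the ones listed can be generated with a sliding window. A minimal sketch, assuming whitespace tokenization; the function name is made up for the example.

```python
def ngrams(text, n):
    """Return all contiguous word sequences of length n, joined by spaces."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "The cat sat on the mat"
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)

print(bigrams)   # ['The cat', 'cat sat', 'sat on', 'on the', 'the mat']
print(trigrams)  # ['The cat sat', 'cat sat on', 'sat on the', 'on the mat']
```

A sentence of m words yields m - n + 1 N-grams, so larger n gives fewer but more specific features.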
Topic Modeling (Latent Dirichlet Allocation)
 This is an unsupervised method for discovering abstract "topics" within a collection of
documents.
 Process:
 Input: A large collection of documents.
 Model building: The algorithm identifies clusters of co-occurring words and assigns a
probability distribution of topics to each document and a probability distribution of
words to each topic.
 Output: A set of topics, where each topic is characterized by a list of words.
 Example: In a corpus of news articles, one discovered topic might be associated with
words like "election," "vote," and "candidate," while another might involve words like
"stock," "market," and "economy."
Feature discovery beyond text
Metadata extraction
Beyond the content of the document, metadata provides valuable features for
machine learning models.
 Examples: Document type (PDF, Word, etc.), author, creation date, and file size.
Named Entity Recognition (NER)
 NER models identify and label specific entities in a document, such as people,
organizations, locations, and dates.
Example: In a document mentioning "Apple Inc. announced a new product in
Cupertino," an NER model would identify "Apple Inc." as an organization and
"Cupertino" as a location.
Optical Character Recognition (OCR)
For scanned documents and images, OCR is used to extract text and convert it into
a machine-readable format. The resulting text can then be processed using other
NLP techniques.
Questions
 Evaluate different types of recommendation models for various applications.
 Evaluate the effectiveness of a utility matrix in collaborative filtering. What are its limitations, and how might
data sparsity affect recommendation quality? Justify your answer with suitable arguments.
 Develop a utility matrix for a movie recommendation dataset with at least five users and five movies, and
describe how missing values can be predicted using collaborative filtering
 Assess the role of feature extraction methods (e.g., TF-IDF, embeddings) in improving content-based
recommendation accuracy.
 Critically analyze the challenges of cold-start and data sparsity problems in recommender systems and
propose possible solutions.
 Develop a personalized news recommendation framework that integrates user profiling, topic modeling, and
collaborative filtering.
 Construct a hybrid recommendation approach combining user-based and item-based collaborative filtering.
 Develop a complete recommendation framework using a utility matrix that integrates user-based
collaborative filtering and content-based filtering. Explain each step.
 Explain the working of the K-Means clustering algorithm in detail with a step-by-step example.
 Explain how deep learning can improve similarity computation in UBCF.
 Discuss the role of feature extraction in content-based recommendation systems.
 Social Network Graphs
 Formulate an approach for visualizing and interpreting overlapping community structures
in dynamic social networks.
 Evaluate the effectiveness of clustering techniques in analyzing social networks. How do
they help in understanding community structures?
 Explain and analyze different partitioning methods for large social graphs.
 Explain the importance of detecting overlapping communities in real-world social
networks. Provide an example scenario.
 Examine how graph partitioning differs in directed versus undirected graphs.
 Analyze how community detection helps in identifying influential groups in social media
networks.
 Assess the challenges in evaluating community detection results in large-scale social
networks.
 Time Series
 Define time series. Explain the components of a time series with examples.
 Discuss various methods used to evaluate forecast performance and select an appropriate
forecasting model.
 Write short notes on any two of the following: (a) Naïve Forecast, (b) Simple Moving Average,
(c) Exponential Smoothing.
 What are common accuracy measures for time series forecasts?
 Define autocorrelation and explain its significance in time series analysis.
 What are the main components of an ARIMA(p, d, q) model?
 Explain the concept of overfitting in time series modeling.
 Explain how regression analysis can be used for time series forecasting. Provide an example.
 Discuss the assumptions of regression models in the context of time series data.
 Explain the mathematical formulation of Holt’s linear and Holt-Winters exponential smoothing
methods.
 Discuss how seasonality and trend are incorporated into advanced exponential smoothing
models.
 Critically evaluate the performance of exponential smoothing and ARIMA models for short-term
forecasting.
 Develop a forecasting model using regression and discuss how residual analysis can validate
the model.
