KNN-Based Movie Recommendation System

Project Report submitted

for Artificial Intelligence (UCS411)

Submitted by :

Garvish Manan (102303083)

Rachit Mahajan (102303495)
Lakshya Swar (102303086)

Submitted to
Dr. Simranjeet Kaur

DEPARTMENT OF COMPUTER ENGINEERING

THAPAR INSTITUTE OF ENGINEERING AND TECHNOLOGY,
PATIALA, PUNJAB
INDIA
Jan-May 2025
MOVIE RECOMMENDATION
SYSTEM USING K-NEAREST
NEIGHBOURS ALGORITHM
(KNN)

Table of Contents

1. Abstract
2. Introduction
3. Problem Statement
4. Objective
5. Methodology
5.1 Dataset Collection
5.2 Data Preprocessing and Merging
5.3 Creating a unified ‘Tags’ Field
5.4 Text Vectorization – CountVectorizer
Settings Used
5.5 Similarity Computation – Cosine Similarity
5.6 Recommendation Logic – K-Nearest Neighbors (KNN)
5.7 Enhancing Results with Poster Display using IMDb API
5.8 Building the web app with Streamlit
6. Results
7. Conclusion
8. References

1. Abstract

In today’s digital era, users are flooded with content choices across platforms like Netflix,
Prime Video, and Disney+. While the availability of thousands of movies sounds exciting,
it often leads to choice overload, where users struggle to decide what to watch next. To
tackle this issue, recommender systems have become essential. These systems aim to
suggest relevant content, saving time and improving the user experience.

This project presents a Content-Based Movie Recommendation System built using
classical Machine Learning techniques, specifically the K-Nearest Neighbors (KNN)
algorithm. It utilizes basic, interpretable ML tools and focuses solely on movie metadata
like genres, keywords, plot summaries, cast, and crew information.

The data used comes from the TMDB 5000 Movies and Credits dataset, which includes
structured and semi-structured text data. After cleaning the data and extracting key
features, a text-based representation (called tags) is created for each movie by combining
the different attributes. These tags are then converted into numerical format using
CountVectorizer, a basic frequency-based text feature extractor.

Once vectorized, the system uses cosine similarity to compare movies and identify those
that are textually closest to the selected title. The KNN algorithm then retrieves the top 5
most similar movies. Finally, the recommended results are presented to the user through
a Streamlit web UI, which also displays posters of the suggested movies using the
IMDb API.

This project demonstrates that even with simple and interpretable machine learning
models like KNN, user-friendly applications can be built. One can understand core ML
concepts like data preprocessing, vectorization and similarity measurement without
entering the black box complexity of deep learning.

This project follows a clear, stepwise approach to building a functional application. The
simplicity of the methods used makes the system easy to understand, efficient to
implement, and suitable for both learning purposes and real-world applications.

2. Introduction
In the modern digital world, we are surrounded by a massive amount of content, whether
it’s movies, music, books, or shopping products. As the number of available choices
grows, it becomes harder for users to decide what to pick next. This is where
recommender systems come into play. These systems help narrow down the choices by
suggesting relevant items based on user preferences or data patterns.

There are mainly three types of recommender systems used today:

1. Collaborative Filtering – This method recommends items by analyzing user
behavior, such as ratings, watch history, or likes. It looks for users with similar
tastes and suggests what they liked. While powerful, this method suffers from the
cold start problem: it cannot recommend well when a user or item has little or no
data.

2. Content-Based Filtering – Instead of relying on user interactions, this method
focuses on the properties of the items themselves. For example, if a user likes a
sci-fi movie with space and robots, the system will recommend other movies with
similar features. This makes content-based systems reliable even with limited user
data.

3. Hybrid Systems – These systems combine both collaborative and content-based
methods to get the best of both worlds. However, with a small dataset and a
restriction to classical ML techniques, a hybrid system would have been too complex
to implement for this project.

For this project, we chose to use a content-based recommendation system. This method
works best when we have enough useful details about each movie like its genre, story
summary, keywords, main actors, and director. By looking at this kind of content, the
system can find movies that are similar in topic or style and suggest them to users.

This is different from other systems that depend on user reviews, watch history, or ratings
(like collaborative filtering). Content-based systems do not need any user interaction data.
This is helpful in cases where such data doesn’t exist, for example, if a movie is new and
hasn’t been watched or rated much yet. We can still recommend it if it has similarities to
other known movies.

Another good thing about content-based systems is that they are easy to understand. If a
user asks why a certain movie was recommended, we can explain it clearly, for example:
“These movies are both thrillers with time-travel themes,” or “They have the same cast.”
This makes the system more trustworthy and easier to explain in both learning and
real-life situations.

We used Python to build the system, because it’s easy to work with and has many useful
tools. We used:

● Pandas to handle and clean the movie data,

● Scikit-learn to convert movie information into numbers and use the K-Nearest
Neighbors (KNN) model, and

● Streamlit to make a simple web app where users can get recommendations by
selecting a movie they like.

This introduction gives a clear picture of what the project is about. In the next parts, we’ll
talk more about the actual problem we tried to solve, the steps we followed, and how well
the system performed.

3. Problem Statement
The overwhelming volume of digital content, especially movies, has made it increasingly
difficult for users to discover relevant titles efficiently. Platforms today provide thousands
of options, but the very scale of this offering leads to choice paralysis. Users are left
confused about what to watch next, leading to disengagement.

While industry-grade recommendation engines used by platforms like Netflix or YouTube
provide effective suggestions, they are often powered by deep learning, collaborative
filtering, or hybrid systems that require extensive user data, high computational resources,
and complex infrastructure. We limited this project to content-based filtering because no
user interaction data was available.

The core issue addressed by this project is:

Can we design an effective movie recommendation system that works even without user
interaction data, and still provides meaningful suggestions using only content metadata?

This becomes especially important when:

● The dataset does not contain user ratings, reviews, or watch history

● The goal is to understand and apply interpretable ML methods rather than black
box models

● The system must be lightweight, modular, and explainable

● New items (movies) need to be recommended despite having no user data, a situation
known as the cold start problem

Another part of the problem lies in the nature of movie metadata itself, which is
semi-structured, textual, and varied. Features like plot summaries, keywords, genres, cast,
and crew contain rich information, but extracting and meaningfully combining them into a
usable format for ML models poses a technical challenge. Vectorizing such data while
preserving context and meaning, especially without deep NLP models, is non-trivial.

Hence, this project sets out to solve:

● How to design a system that can convert heterogeneous movie metadata into a
unified, machine understandable form

● How to calculate similarity between movies based purely on content data

● How to recommend top similar movies in a simple yet attractive UI

● How to do all this using only classical ML techniques and lightweight tools,
so that the system stays within the provided guidelines

This project tackles the problem through the lens of practical ML education: building an
application that balances functionality, interpretability, and scalability using modest
resources.

4. Objective
The objective of this project is to develop a practical and efficient content-based movie
recommendation system that utilizes movie metadata such as genres, keywords, plot
summaries, cast, and crew details. By using classical machine learning techniques, the
goal is to create a lightweight and easy to understand system that effectively recommends
movies.

The following objectives were pursued:

● Efficient Text Representation Using Feature Extraction Techniques:

We utilized the CountVectorizer method to convert movie metadata (like plot
summaries and genres) into numerical features. This vectorization technique
captures word frequency patterns and helps in representing textual data in a
structured format that can be processed by the ML model.

● Employ Cosine Similarity for Measuring Movie Similarity:

By calculating cosine similarity between movie vectors, the system identifies films
with the closest content-based features, so that recommendations reflect
similarities in movie content such as genre, cast, and themes.

● Enhance User Experience with Real-Time API Integration:

We incorporated the IMDb API to fetch and display real-time movie posters for
recommended titles. This not only improves the aesthetics of the user interface but
also provides an engaging, interactive experience for users.

● Create a Simple and Intuitive User Interface:

The frontend interface was built using Streamlit, allowing users to interact with the
recommendation system easily. Users can input a movie they like, and the system
will generate a list of similar movies, providing a smooth and user-friendly
browsing experience.

● Scalability for Broader Applications:

The architecture allows easy adaptation to other domains like books, music, or
products. This was achieved by structuring the backend to handle various types of
metadata input, which can be modified to suit different recommendation needs.

● Integrate Software Engineering Best Practices:

In addition to focusing on machine learning, we adhered to software engineering
best practices, including version control using Git, clean code principles, and
documentation (a README file alongside [Link]). This ensured that the
system is not only functional but also easy for any beginner to set up.

5. Methodology

The movie recommendation system was developed using a clear, step-by-step pipeline
designed to be modular, explainable, and lightweight. This methodology focuses on
content-based filtering, a classical recommendation technique that uses item features
rather than user behavior. The system recommends similar movies by analyzing
descriptive content like plot, cast, and genre using the K-Nearest Neighbors (KNN)
algorithm and cosine similarity for comparison.

Below are the stages followed:

5.1 Dataset Collection

We used the TMDB 5000 Movies and Credits Dataset, which is publicly available on
Kaggle. This dataset includes:

● tmdb_5000_movies.csv: Contains details like title, genres, overview (plot
summary), keywords, vote average, and popularity.

● tmdb_5000_credits.csv: Contains the cast and crew data for each movie.

These two files were chosen because they provide the structured and semi-structured
metadata necessary for content-based recommendations. Importantly, the dataset does not
contain per-user ratings or interaction data, so collaborative filtering was not applicable.

5.2 Data Preprocessing and Merging

To prepare the data for analysis, we performed the following steps:

● Merging the two datasets using the common movie ID to combine metadata
information.

● Dropping unnecessary columns such as budget, homepage, vote_count, and other
fields not relevant to content similarity.

● Handling missing values by dropping or filling them where appropriate.

● Parsing JSON-like strings (present in columns like genres, keywords, cast, and
crew) to extract meaningful fields using Python’s ast.literal_eval and list
comprehension.

○ From cast, only the top 3 actors were selected.

○ From crew, we filtered out only the director by checking the job role.

This step converted complex fields into simple lists of strings, making them suitable for
text-based processing.
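
The parsing step can be illustrated with a minimal sketch. It assumes each field is a string holding a list of dictionaries with a "name" key (and a "job" key for crew entries); the sample strings and the helper names parse_names and parse_director are made up for illustration and are not taken from the project code.

```python
import ast

def parse_names(raw, limit=None):
    """Turn a JSON-like string of dicts into a list of their 'name' values."""
    items = ast.literal_eval(raw)  # safely evaluates the literal, no arbitrary code
    names = [item["name"] for item in items]
    return names if limit is None else names[:limit]

def parse_director(raw):
    """Keep only crew members whose job role is 'Director'."""
    items = ast.literal_eval(raw)
    return [member["name"] for member in items if member.get("job") == "Director"]

# Illustrative inputs (assumed shape, invented values)
genres_raw = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'
cast_raw = '[{"name": "Actor A"}, {"name": "Actor B"}, {"name": "Actor C"}, {"name": "Actor D"}]'
crew_raw = '[{"job": "Editor", "name": "Person X"}, {"job": "Director", "name": "Person Y"}]'

genres = parse_names(genres_raw)
top_cast = parse_names(cast_raw, limit=3)   # only the top 3 actors
director = parse_director(crew_raw)
```

In a pandas workflow, helpers like these would typically be applied column by column, for example with DataFrame.apply.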

5.3 Creating a unified ‘Tags’ Field

To capture the core essence of each movie, we created a new feature called tags, which
consolidates several key metadata attributes into a single textual representation. This
unified field serves as the foundation for similarity comparison in the recommendation
engine.

The tags field was formed by combining the following movie details:

● Overview (plot summary)

● Genres
● Keywords
● Names of the top 3 cast members
● Director’s name

To make the data consistent and ready for further processing:

● All text was converted to lowercase to avoid mismatches due to case sensitivity.
● Multi-word terms were joined (e.g., “Science Fiction” → “sciencefiction”) so that
such phrases are treated as single meaningful units.

This approach allowed us to turn semi-structured and varied movie metadata into a
uniform text string that summarizes the thematic and stylistic identity of each film.
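
As a rough sketch of how such a tags string could be assembled, the snippet below lowercases everything and removes spaces inside multi-word terms. The function names collapse and build_tags and the sample values are assumptions for illustration only.

```python
def collapse(terms):
    """Lowercase each term and remove internal spaces ('Science Fiction' -> 'sciencefiction')."""
    return [term.replace(" ", "").lower() for term in terms]

def build_tags(overview, genres, keywords, cast, director):
    """Combine the overview words and the collapsed metadata terms into one string."""
    words = overview.lower().split()
    parts = words + collapse(genres) + collapse(keywords) + collapse(cast) + collapse(director)
    return " ".join(parts)

# Invented example movie
tags = build_tags(
    overview="A crew travels through space",
    genres=["Science Fiction", "Adventure"],
    keywords=["space travel"],
    cast=["Actor A"],
    director=["Person Y"],
)
```

Joining names this way keeps, say, two different actors who share a first name from being counted as a match on that shared word.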

5.4 Text Vectorization – CountVectorizer ( )

To enable similarity comparison between movies, we needed a way to convert the textual
tags field into numerical form. For this, we applied the CountVectorizer from Scikit-learn,
a frequency-based vectorization method.

CountVectorizer transforms a collection of text documents into a matrix where:

● Each row represents a movie,
● Each column corresponds to a term in the retained vocabulary, and
● The values indicate how frequently that term appears in that movie’s tags.

Settings Used:
● max_features = 5000: Limits the vocabulary to the most frequent 5000 terms.
● stop_words='english': Removes common words like “the”, “and”, “is”, etc., which
carry little meaning in determining movie similarity.

Note:

This method is based on the Bag of Words (BoW) model, which assumes that the meaning
or context of a document can be approximated by the frequency of its words. While BoW
ignores grammar and word order, it works effectively for tasks involving content
comparison.

No deep semantic or syntactic processing was involved, only straightforward word
frequency analysis using classical vectorization.
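
To make the Bag of Words idea concrete, here is a hand-rolled sketch in plain Python. It is not Scikit-learn's implementation: the tokenizer simply splits on whitespace, the stop-word set is a tiny stand-in, and the names fit_vocabulary and vectorize are hypothetical. It only mirrors the two settings described above (a capped vocabulary and stop-word removal).

```python
from collections import Counter

# Tiny illustrative stop-word set; Scikit-learn's English list is much longer.
STOP_WORDS = {"the", "and", "is", "a", "of"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

def fit_vocabulary(documents, max_features):
    """Keep the max_features most frequent terms across all documents."""
    totals = Counter()
    for doc in documents:
        totals.update(tokenize(doc))
    # Rank by descending count; break ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return sorted(term for term, _ in ranked[:max_features])

def vectorize(documents, vocabulary):
    """Build one row of term counts per document."""
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows

docs = ["space robot space adventure", "the robot and the heist", "heist thriller"]
vocab = fit_vocabulary(docs, max_features=4)
matrix = vectorize(docs, vocab)
```

With max_features=4, the least frequent term ("thriller") falls outside the vocabulary, which is the same trade-off the 5000-term cap makes at full scale.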

5.5 Similarity Computation – Cosine Similarity
Once we had vector representations of all movies, we needed a way to compare them. We
used cosine similarity, a popular similarity measure for text data.

Cosine similarity measures the cosine of the angle between two vectors rather than their
absolute distance. It is defined as:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

Where:

● A and B are the vector representations of two movies.

● A · B is the dot product, and ||A||, ||B|| are the magnitudes.

Why cosine similarity? Two movies can be similar in content even if their vectors have
different magnitudes (for example, when one has a longer description), because the
direction of the vectors (theme and content) remains the same.

We computed the cosine similarity matrix for all movie pairs and stored it in memory for
fast lookups during recommendation.
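
The following minimal sketch computes cosine similarity in plain Python and builds a small all-pairs matrix. The vectors are invented count vectors and the function names are illustrative; in practice a library routine would normally be used instead of nested loops.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def similarity_matrix(vectors):
    """Pairwise similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

shorter = [1, 0, 1, 2]   # a short description
longer = [2, 0, 2, 4]    # same word proportions, twice the length
other = [0, 1, 0, 0]     # shares no words with the first two
sim = similarity_matrix([shorter, longer, other])
```

Here sim[0][1] comes out as 1 (up to floating-point rounding) even though the second vector is twice as long, while sim[0][2] is 0 because the vectors share no terms.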

5.6 Recommendation Logic – K-Nearest Neighbors (KNN)

With the similarity matrix available, we implemented a simple K-Nearest Neighbors style
algorithm to generate recommendations.

Steps followed:

● Given a movie title input by the user, we locate its index in the dataset.
● We retrieve its row from the similarity matrix.
● We sort the scores in descending order and return the top 5 most similar movies
(excluding the selected one itself).
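
The lookup steps above can be sketched as follows, assuming the titles are held in a list and the similarity matrix is a list of rows in the same order. The function name recommend and the toy data are illustrative, not the project's actual code.

```python
def recommend(title, titles, similarity, k=5):
    """Return the k titles most similar to the given one, excluding itself."""
    index = titles.index(title)  # raises ValueError if the title is unknown
    scores = similarity[index]
    # Rank every other movie by its similarity score, highest first.
    ranked = sorted(
        (i for i in range(len(titles)) if i != index),
        key=lambda i: scores[i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:k]]

# Toy data: four movies and a made-up symmetric similarity matrix
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
similarity = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]
top = recommend("Movie A", titles, similarity, k=2)
```

Because the neighbours are read from a precomputed row, each request costs one sort over that row rather than a fresh comparison against every movie's text.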

Fig.1 KNN Recommendation illustration

5.7 Enhancing Results with Poster Display using IMDb API

To make the recommendations visually appealing, we fetched movie posters using the
IMDb API. Each recommended movie’s ID is used to construct a URL to retrieve its
poster image. This was done through a helper function using HTTP requests.

This added a professional and polished touch to the system, making it more than just a
text-based tool.
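
A helper of this kind might look like the sketch below. The endpoint URL and the "poster_url" response field are placeholders invented for illustration; they do not describe any real poster API, and a real service would also need its own URL scheme and API key. The fetch parameter is an assumption added so the function can be exercised without network access.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint and response field: placeholders, not a real API.
POSTER_ENDPOINT = "https://example.invalid/movies/{movie_id}"
POSTER_FIELD = "poster_url"

def build_poster_request_url(movie_id):
    """Construct the request URL from the movie's ID."""
    return POSTER_ENDPOINT.format(movie_id=movie_id)

def fetch_poster_url(movie_id, fetch=None, timeout=5):
    """Return the poster URL for a movie, or None if it cannot be retrieved."""
    if fetch is None:
        def fetch(url):
            # Default: a plain HTTP GET using the standard library
            with urlopen(url, timeout=timeout) as response:
                return response.read().decode("utf-8")
    try:
        payload = json.loads(fetch(build_poster_request_url(movie_id)))
    except (OSError, ValueError):
        return None  # network failure or malformed JSON
    if not isinstance(payload, dict):
        return None
    return payload.get(POSTER_FIELD)

def fake_fetch(url):
    """Stand-in for the network call, returning a canned JSON response."""
    return json.dumps({"poster_url": "https://example.invalid/posters/155.jpg"})

poster = fetch_poster_url(155, fetch=fake_fetch)
```

Returning None on failure lets the UI fall back to a title-only card instead of breaking when a poster is unavailable.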

5.8 Building the web app with Streamlit
We used Streamlit, a Python-based web framework designed for ML and data science
apps, to create a clean and interactive user interface.

Features of the UI:

● Dropdown to select a movie title
● Button to trigger recommendation
● Side-by-side display of recommended movies with poster and title
● Instant loading and responsiveness using Streamlit’s caching features

Fig.2 Workflow of the Content-Based Movie Recommendation System using KNN

Stage                  Tool / Technique            Purpose
Data Cleaning          Pandas                      Read, merge, and preprocess CSVs
Feature Extraction     Python string manipulation  Extract relevant metadata
Text Representation    CountVectorizer             Convert tags into vectors
Similarity Measure     Cosine Similarity           Compare movies by content
Recommendation Logic   KNN                         Retrieve top similar movies
Poster Fetching        IMDb API                    Enhance output with visuals
UI Framework           Streamlit                   Build interactive web interface

6. Results

When a user selects a movie, the system retrieves five similar titles by computing similarity
scores based on structured metadata such as genres, keywords, plot overviews, lead actors, and
director names. For instance, when tested with the input movie The Dark Knight, the
recommended movies included Batman Begins, The Dark Knight Rises, Man of Steel, Iron Man,
and Watchmen. These suggestions align closely with the superhero genre and share common
stylistic and thematic elements like dark narratives and action sequences.

The system processes user queries efficiently, returning recommendations within a fraction of a
second due to the precomputed and stored cosine similarity matrix. This ensures high
responsiveness without relying on complex or computationally heavy algorithms. Since the
recommendations are based on straightforward metadata fields and word frequency vectors, the
underlying logic remains easy to interpret and transparent, making it reliable for analysis and
debugging.

However, the system is not without limitations. Since it relies on a basic word count approach, it
may not fully capture deeper semantics or the tone of the content. Movies with richer metadata
may also influence results more heavily, introducing bias. Additionally, the system lacks
personalization and delivers the same set of recommendations to all users for a given movie.

Despite these limitations, the model achieves its goal effectively by producing coherent and
contextually accurate recommendations using lightweight, interpretable techniques. This validates
the design choice to focus on metadata driven feature extraction and basic vector operations for
similarity computation.

7. Conclusion

This project gave us hands-on experience in building a complete machine learning system using
simple and interpretable techniques. Through this, we understood the importance of data
preprocessing: how cleaning, transforming, and structuring movie metadata can directly impact
the quality of recommendations. Creating the unified tags field taught us how combining multiple
text sources can help define the unique identity of each item.

Beyond technical skills, this project emphasized design thinking: how to keep things modular,
explainable, and visually engaging through tools like Streamlit and API integration. It taught us
that good user experience is as important as a working algorithm.

Key Takeaways:
● Clean data is the foundation of any ML project.
● Cosine similarity is a powerful way to find related content without complex logic.

● User-friendly interfaces make ML tools more accessible and fun to use.

● You don’t need deep learning to solve real-world problems. Start simple, think smart.

Overall, this project was not just about coding a solution—it was about learning how to turn data
into decisions and building a bridge between raw information and real user needs.

8. References

1. Scikit-learn Documentation – CountVectorizer:

[Link]
[Link]

2. Scikit-learn Documentation – Cosine Similarity:

[Link]
[Link]

3. Movie Dataset (TMDb 5000 Movie Dataset):

[Link]

4. Bag of Words Model – Towards Data Science:
[Link]
b4a91

5. Cosine Similarity Explained – Medium:

[Link]

6. K-Nearest Neighbors Algorithm – GeeksforGeeks:

[Link]

7. Introduction to Recommendation Systems – Analytics Vidhya:

[Link]
