Predicting Consumer Behavior with ML

The document discusses predicting consumer behavior using machine learning techniques, specifically K-Nearest Neighbors (KNN) and Naive Bayes algorithms, applied to a dataset from Myntra. It highlights the influence of factors like price and brand on purchasing decisions and presents findings that indicate Naive Bayes achieved an accuracy of approximately 48% in predicting consumer buying behavior. The analysis emphasizes the importance of data quality and algorithm selection in improving predictive accuracy.

Uploaded by

Devika Mehrotra

DA&R

PREDICTING CONSUMER BEHAVIOR THROUGH MACHINE LEARNING

Celestine D’souza
Student, Department of Fashion Technology, National Institute of Fashion Technology
Jodhpur, India
Devika Mehrotra
Student, Department of Fashion Technology, National Institute of Fashion Technology
Jodhpur, India

1. ABSTRACT
With digital innovations, it has become easier for consumers to search for items, match their
preferences with products, and verify the reputation of those products. Consumers decide on a daily
basis whether to buy a product. Often the decision to purchase is influenced by price alone, but it
can also be influenced by other factors. Today, big data analytics has revolutionized how brands
market to their audiences using digital technology. Consumers' journeys can be described through
their searches, selections, and purchases of products and services. This paper aims to examine the
relationship between consumer behaviour and variables such as price, brand image, and product
information for the Myntra product catalogue, using big data and predictive analytics with algorithms
such as Naive Bayes and KNN to predict consumers' behaviour accurately and to use this information
for business decisions.

Keywords: Consumer Behaviour, Machine Learning, Sentiment Analysis

2. INTRODUCTION

A perfect world would have one-to-one interactions between your business and your customers. That's
unlikely, and usually not feasible. Profiling your customers based on their shared characteristics allows
you to more effectively target their needs. You can cluster customers based on internal and
supplemental data such as demographics, geography, product channels, and previous purchases. If you
have a good understanding of customer behaviours within each segment, you can optimize your
communication and offerings across the entire customer lifecycle and even anticipate their
requirements before they are even aware of them. It is difficult to define the logic behind most of our
purchasing decisions. Our buying decisions are heavily influenced by emotions, trust, communication
skills, culture, and intuition. To understand these decisions, we implemented two models, KNN and
Naïve Bayes, on our dataset, the Myntra product catalogue.
People share many characteristics with their nearest peers, whether thought processes, working
etiquette, philosophies, or anything else. As a result, we build friendships with people we deem
similar to ourselves. A similar principle applies to the KNN algorithm. To figure out which class a
new, unknown data point belongs to, it locates that point's closest neighbours. KNN stands for
"K-Nearest Neighbor" and is a supervised machine learning algorithm that can solve both
classification and regression problems. KNN calculates the distance from the unknown data point to
every point in the training data and keeps the ones with the shortest distances. It is therefore
called a distance-based algorithm.
Naive Bayes is a machine learning model that is often recommended when working with data that
contains millions of records. It gives very good results for NLP tasks such as sentiment analysis.
As an algorithm, it is simple and fast. It is based on Bayes' theorem, which concerns conditional
probability: the probability of something happening given what has already occurred. Based on prior
knowledge of an event, we can determine the conditional probability.

3. LITERATURE REVIEW

3.1 K-NN ALGORITHM

The K-Nearest Neighbour algorithm is a simple supervised machine learning technique. It assumes
similarity between the new case and the available cases and puts the new case into the category it
is most similar to. The K-NN algorithm stores all the available data and classifies a new data point
based on similarity1. It is also called a lazy learner algorithm because it does not learn from the
training set immediately; in the training phase it just stores the dataset, and when it gets new
data, it classifies that data into the category most similar to the new data2.
Example: Suppose we have an image of an animal that looks similar to either a dog or a cat, and we
want to confirm whether the image is of a dog or a cat. For this classification we can use the KNN
algorithm: the KNN model compares the features of the new image with the features of the dog and cat
images in the dataset, and the most similar images decide whether the image is of a dog or a cat.
Principle: Consider Figure 1. Let us say we have plotted the data points from our training set on a
two-dimensional feature space. As shown, we have a total of 6 data points (3 red and 3 blue). Red
data points belong to ‘class1’ and blue data points belong to ‘class2’. The yellow data point in the
feature space represents the new point for which a class is to be predicted. Because it lies closest
to the red points, we say it belongs to ‘class1’ (red points)3.

Figure 1
Figure 2

1 [Link]
2 [Link]
3 [Link]

This is the main principle behind K nearest neighbours: the new point is classified by the data
points that have the minimum distance to it, and K is the number of such data points we consider in
our implementation of the algorithm. Therefore, the distance metric and the K value are two
important considerations when using the KNN algorithm. Euclidean distance is the most popular
distance metric.
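
As a minimal sketch of this principle, the following Python code classifies a new point by a majority vote among its K nearest neighbours under Euclidean distance. The six training points mirror the 3 red ('class1') and 3 blue ('class2') points described above, but their coordinates, the query point, and the function names are assumptions chosen for illustration; they are not taken from Figure 1 or from our implementation.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Distance from the query to every training point, sorted ascending.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Vote among the k nearest neighbours.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical coordinates: 3 'class1' (red) and 3 'class2' (blue) points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["class1", "class1", "class1", "class2", "class2", "class2"]

new_point = (2.0, 2.0)  # the 'yellow' point whose class is unknown
predicted = knn_predict(points, labels, new_point, k=3)
print(predicted)
```

With these assumed coordinates, the three nearest neighbours of the new point are all 'class1' points, so the vote is unanimous.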
K is an essential parameter in the KNN algorithm, so how should a K value be chosen?
i) Using error curves: The graph (Figure 2) shows training and test error curves for different
values of K. At low K values the variance is high, which is why the test error is high while the
training error is low4.
ii) Knowledge of the domain is most useful in choosing the K value.
iii) The K value should be odd for binary classification, so that the vote cannot end in a tie.
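
The error-curve approach in point i) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Figure 2: the two overlapping Gaussian clusters, the sample sizes, the random seed, and the helper names are all hypothetical. The code computes the training and test error for several odd values of K, which are the numbers one would plot as error curves.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """train is a list of ((x, y), label) pairs; return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def error_rate(train, data, k):
    """Fraction of points in data that KNN (fitted on train) misclassifies."""
    wrong = sum(1 for point, label in data if knn_predict(train, point, k) != label)
    return wrong / len(data)

def make_data(n, rng):
    """Synthetic binary data: two overlapping Gaussian clusters (an assumption)."""
    data = []
    for _ in range(n):
        label = rng.choice([0, 1])
        centre = 0.0 if label == 0 else 2.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

rng = random.Random(0)
train, test = make_data(120, rng), make_data(60, rng)

# Odd K values only, as recommended for binary classification.
error_curve = {
    k: (error_rate(train, train, k), error_rate(train, test, k))
    for k in range(1, 16, 2)
}
for k, (train_err, test_err) in error_curve.items():
    print(f"k={k:2d}  train error={train_err:.3f}  test error={test_err:.3f}")
```

At K = 1 every training point is its own nearest neighbour, so the training error is zero; this is the low-K, high-variance end of the curve. A reasonable K is one where the test error is lowest.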

3.2 NAIVE BAYES CLASSIFIER

The Naive Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for
solving classification problems. It is mostly used in text classification with high-dimensional
training datasets. It is one of the simplest and most effective algorithms, and it helps in building
fast machine learning models that can make quick predictions. It is called a probabilistic
classifier, which means it predicts on the basis of the probability of an object. The most popular
applications of NB are spam filtering, sentiment analysis, and classifying articles.
Bayes' theorem gives the probability of an event occurring given that another event has already
occurred. Mathematically, Bayes' theorem is stated as the following equation:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where A and B are events and P(B) is not 0. We are trying to find the probability of event A given
that event B is true; event B is therefore also termed the evidence. P(A) is the prior of A, i.e.,
the probability of the event before the evidence is seen. The evidence is an attribute value of an
unknown instance (here, event B). P(A|B) is the posterior probability of A, i.e., the probability of
the event after the evidence is seen, and P(B|A) is the likelihood, i.e., the probability of the
evidence given that A is true.
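
A short worked example may make the roles of prior, evidence, and posterior concrete. The numbers below are hypothetical and are not derived from our dataset: assume event A is "the consumer buys" with prior 0.4, event B is "the product is discounted", and the discount is present for 70% of purchases and 20% of non-purchases. The function name and parameters are likewise illustrative.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) using Bayes' theorem."""
    # Total probability of the evidence: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood_b_given_a * prior_a + likelihood_b_given_not_a * (1 - prior_a)
    if evidence == 0:
        raise ValueError("P(B) must not be 0")
    return likelihood_b_given_a * prior_a / evidence

# Hypothetical values: P(buy) = 0.4, P(discount | buy) = 0.7, P(discount | not buy) = 0.2
p_buy_given_discount = posterior(
    prior_a=0.4, likelihood_b_given_a=0.7, likelihood_b_given_not_a=0.2
)
print(round(p_buy_given_discount, 3))
```

Here P(B) = 0.7 × 0.4 + 0.2 × 0.6 = 0.40, so the posterior is 0.28 / 0.40 = 0.7: seeing the evidence raises the probability of a purchase from the prior of 0.4 to 0.7.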
The implementation of NB starts with data pre-processing, followed by fitting NB to the training
set, predicting the test result, assessing the accuracy of the test result (confusion matrix), and
visualizing the test set result.
Naive Bayes is one of the fastest and simplest ML models for predicting the class of a dataset. It
is used for both binary and multi-class classification, performs well in multi-class prediction
compared with other models, and is a popular choice for text classification problems. However, NB
assumes that all features are unrelated to one another and therefore cannot learn the relationships
between features.
The NB model has quite a few applications, such as credit scoring, medical data classification,
real-time prediction, spam filtering, and sentiment analysis.
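
To illustrate the fit and predict steps described above, the following is a minimal sketch of a Naive Bayes classifier for categorical features, written from scratch with add-one (Laplace) smoothing. It is not the implementation used in this study: the class name, the smoothing choice, the toy rows, and the feature values ("BrandA", "BrandB", "low", "high") are all assumptions made for illustration.

```python
import math
from collections import Counter

class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # feature_counts[label][i][value] = how often feature i took this value in this class
        self.feature_counts = {
            c: [Counter() for _ in range(n_features)] for c in self.class_counts
        }
        self.feature_values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.feature_values[i].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalised log P(class | row), treating features as independent."""
        total = sum(self.class_counts.values())
        scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for i, value in enumerate(row):
                numerator = self.feature_counts[c][i][value] + 1
                denominator = count + len(self.feature_values[i])
                score += math.log(numerator / denominator)  # log likelihood
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)

# Hypothetical training rows: (brand, price band) -> buying behaviour
rows = [
    ("BrandA", "low"), ("BrandA", "low"), ("BrandA", "high"), ("BrandB", "low"),
    ("BrandB", "high"), ("BrandB", "high"), ("BrandA", "low"), ("BrandB", "high"),
]
labels = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes"]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("BrandA", "low")), model.predict(("BrandB", "high")))
```

Because each feature contributes its own independent likelihood term, the sketch also shows the limitation noted above: the model cannot represent an interaction between brand and price.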
4 [Link]
[Link]

4. METHODOLOGY

The popularity of e-commerce sites among customers is growing. In online shopping, brands play a
huge role, as they affect the consumer's perception of quality, price, social status, and so on.
4.1 Data Collection:
1) The dataset used for customer behaviour analysis was collected from the open platform
[Link]. The data is from Myntra, an e-commerce website, and contains 12491 observations
with 9 columns covering product, product information, price, brand, gender, product ID, primary
colour, and buying behaviour. The dataset is discrete, with numeric and textual variables. A
small illustrative sample is shown in Table 1.

Table 1: A small illustrative sample of entries in our data set, which contains 12492 consumer
decisions to purchase or not to purchase a specific product.

| S.No. | Product ID | Product Name | Product Brand | Gender | Price | Description | Colour | Num Images | Buying Behaviour |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 10017413 | GAP Boys Graphic Crew Sweatshirt | GAP | Men | 749 | Long sleeves with banded cuffs | Black | 4 | yes |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12492 | 10261845 | 7Rainbow Beige & Blue Checked Saree | 7 Rainbow | Women | 587 | Blue checked saree with tasselled detail | Blue | 3 | yes |

2) The second model we used was K-NN, which was implemented on two sheets consisting of abnormal
data. They contain data about shoulder angle, neck angle, and trunk angle. The dataset is discrete,
with numeric variables.

| S.No. | neck_angle | shoulder_angle | elbow_angle | trunkangle | l_neck_angle | L Shoulder angle | L elbow angle | L trunk angle |
|---|---|---|---|---|---|---|---|---|
| 0 | -62.115444 | -70.3564 | -112.233 | -98.8793 | -82.19283 | -72.997948 | -144.036 | 27.42654 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 111 | -56.4353 | -81.8038 | -94.8011 | -92.7495 | -77.5833787 | -81.866684 | -140.632 | 28.46722 |

5. CLASSIFICATION METHODOLOGIES
For our analysis, we adopted two different, well-known classification approaches. The algorithms
were chosen based on their widespread use, well-understood behaviour, and convincing performance in
other classification tasks. In addition, both work across a wide range of heterogeneous data types,
where some features are categorical and others are continuous (Saavi Stubseid, 2018).

6. RESULTS AND DISCUSSION

Table 3: Analysis result of customer buying behaviour for the Naïve Bayes algorithm.

| Classifier | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|
| Naïve Bayes | 48.78 | 48.83 | 52.05 |

For the Naïve Bayes classifier we used testing data of 3122 observations. Table 3 shows the analysis
result of customer buying behaviour depending on the brand, price, and product. According to the
analysis done using the confusion matrix, the Naïve Bayes classifier gives an accuracy of
approximately 48.78%. Our analysis also predicts that, out of the 3122 observations, 1459 consumers
will buy the product and 1663 will not when the main factors are product brand and price.
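
For reference, the three measures reported in Table 3 are all derived from the four cells of a confusion matrix. The sketch below shows the standard definitions on a small, hypothetical set of ten labels; the label lists and function names are assumptions for illustration, and the individual cell counts of our own confusion matrix are not reproduced here.

```python
def confusion_counts(y_true, y_pred, positive="yes"):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) as fractions between 0 and 1."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical actual and predicted buying behaviour for ten test observations.
y_true = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]
y_pred = ["yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes", "yes"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy, precision, recall = classification_metrics(tp, fp, fn, tn)
print(tp, fp, fn, tn)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
```

The number of observations predicted as "yes" is tp + fp and the number predicted as "no" is fn + tn, which is how a count of predicted buyers and non-buyers relates to the same matrix.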
The accuracy plot for both datasets 14 and 15 gives a result of 1, which is considered ideal for
abnormal datasets. The relatively low accuracy of KNN is caused by several factors. One of them is
that every characteristic has the same influence on the distance calculation. The solution to this
problem is to give a weight to each data characteristic5. The accuracy of KNN can be increased by
improving the dataset, eliminating missing values, and adding more numerical variables to the model
implementation.
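
The idea of weighting each characteristic can be sketched with a weighted Euclidean distance. This is a plain per-feature weighting chosen for illustration; it is not the local-mean-based, distance-weighted method of the cited work, and the two training points, the query, and the weights are hypothetical.

```python
import math

def weighted_euclidean(a, b, weights):
    """Euclidean distance in which each feature's squared difference is scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def nearest_neighbour(train, query, weights):
    """Return the label of the single training point closest to query."""
    closest = min(train, key=lambda item: weighted_euclidean(item[0], query, weights))
    return closest[1]

# Hypothetical points whose second feature is on a much larger scale than the first.
train = [((10.0, 500.0), "A"), ((40.0, 480.0), "B")]
query = (12.0, 470.0)

equal_weights = nearest_neighbour(train, query, (1.0, 1.0))
down_weighted = nearest_neighbour(train, query, (1.0, 0.01))
print(equal_weights, down_weighted)
```

With equal weights the large-scale second feature dominates and the query is matched to "B"; once that feature is down-weighted, the first feature decides and the query is matched to "A".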

7. CONCLUSION
In this paper we have proposed the classification of a consumer's buying behaviour using machine
learning. We collected our data, which consists of all Myntra products; apart from this, we used two
abnormal datasheets. We used mainly two models, KNN and Naive Bayes. In this analysis we found
that for KNN the accuracy in both abnormal datasheets is 0 and the accuracy rate plot indicated the
value as 1.0. As the data set is small in the case of abnormal data, the accuracy tends to be
higher. However, the confusion matrix could not be applied.
In the Naive Bayes implementation, the final output shows that we built a Naive Bayes classifier
that can predict whether a consumer will buy or not depending on the brand and price of the
products, with an accuracy of approximately 48%. Through the confusion matrix we learned that, out
of 3122 observations, 1459 consumers will buy the product and 1663 will not when the main factors
are product brand and price. The ggplot() graph helps in visualising price in the Naive Bayes
model.

5 K U Syaliman et al 2018 J. Phys.: Conf. Ser. 978 012047, "Improving the accuracy of k-nearest
neighbor using local mean based and distance weight".