Predicting Consumer Behavior with ML

The document discusses predicting consumer behavior using machine learning techniques, specifically K-Nearest Neighbors (KNN) and Naive Bayes algorithms, applied to a dataset from Myntra. It highlights the influence of factors like price and brand on purchasing decisions and presents findings that indicate Naive Bayes achieved an accuracy of approximately 48% in predicting consumer buying behavior. The analysis emphasizes the importance of data quality and algorithm selection in improving predictive accuracy.


DA&R

PREDICTING CONSUMER BEHAVIOR THROUGH MACHINE LEARNING

Celestine D’souza
Student, Department of Fashion Technology, National Institute of Fashion Technology
Jodhpur, India
Devika Mehrotra
Student, Department of Fashion Technology, National Institute of Fashion Technology
Jodhpur, India

1. ABSTRACT
With digital innovations, it has become easier for consumers to search for items, match their
preferences with products, and verify the reputation of those products. Consumers decide on a daily
basis whether to buy a product. Often the decision to purchase is driven by price alone, but it can
also be influenced by other factors. Today, big data analytics has revolutionized how brands market
to their audiences using digital technology, because consumers' journeys can be described through
their searches, selections, and purchases of products and services. This paper aims to examine the
relationship between consumer behaviour and variables such as price, brand image, and product
information for the Myntra product catalogue. It uses big data and predictive analytics, with
algorithms such as Naive Bayes and KNN, to predict consumers' behaviour accurately and to use this
information for business decisions.

Keywords: Consumer Behaviour, Machine Learning, Sentiment Analysis

2. INTRODUCTION

In a perfect world, a business would have one-to-one interactions with each of its customers. That
is usually not feasible. Profiling customers based on their shared characteristics allows a business
to target their needs more effectively. Customers can be clustered based on internal and
supplemental data such as demographics, geography, product channels, and previous purchases. A good
understanding of customer behaviour within each segment makes it possible to optimize communication
and offerings across the entire customer lifecycle, and even to anticipate customers' requirements
before they are aware of them. It is difficult to define the logic behind most purchasing decisions,
which are heavily influenced by emotions, trust, communication skills, culture, and intuition. To
understand this behaviour, we implemented two models, KNN and Naïve Bayes, on our dataset, the
Myntra product catalogue.
People often share many characteristics with their nearest peers, whether in thought process,
working etiquette, philosophy, or anything else, and so we build friendships with people we deem
similar to ourselves. A similar principle applies to the KNN algorithm. To determine which class a
new, unknown data point belongs to, it locates the point's closest neighbours. KNN stands for
"K-Nearest Neighbor". It is a supervised machine learning algorithm that can be used for both
classification and regression problems. KNN calculates the distance from the unknown data point to
the points in the training data and selects the ones with the shortest distances. It is therefore
called a distance-based algorithm.
Naive Bayes is a machine learning model that is often recommended when working with data that
contains millions of records. It gives very good results for NLP tasks such as sentiment analysis.
As an algorithm, it is simple and fast. It is based on Bayes' theorem, which deals with conditional
probability. The conditional probability is the probability of something happening given what has
already occurred. Based on prior knowledge of an event, we can determine the conditional probability.

3. LITERATURE REVIEW

3.1 K-NN ALGORITHM


The K-Nearest Neighbour algorithm is a simple supervised machine learning technique. It assumes
similarity between the new case and the available cases, and puts the new case into the category
that is most similar to it. The K-NN algorithm stores all the available data and classifies a new
data point based on similarity.1 It is also called a lazy learner algorithm because it does not
learn from the training set immediately. Instead, at the training phase it just stores the dataset,
and when it gets new data, it classifies that data into the category that is most similar to the
new data.2
Example: Suppose we have an image of an animal that looks similar to either a dog or a cat, and we
want to confirm whether the image is of a dog or a cat. For this classification we can use the KNN
algorithm. The KNN model compares the features of the new image with the features of the dog and
cat images in the dataset, and the most similar images decide whether the new image is of a dog or
a cat.
Principle: Consider figure 1. Suppose we have plotted the data points from our training set on a
two-dimensional feature space. As shown, we have a total of 6 data points (3 red and 3 blue). Red
data points belong to ‘class1’ and blue data points belong to ‘class2’. The yellow data point in the
feature space represents the new point for which a class is to be predicted. Since its nearest
neighbours are red, we say it belongs to ‘class1’ (red points).3

Figure 1
Figure 2

1 [Link]
2 [Link]
3 [Link]

This is the main principle behind K nearest neighbours, that is, the data points that have the
minimum distance to the new point. K is the number of such data points we consider in our
implementation of the algorithm. Therefore, the distance metric and the K value are two important
considerations when using the KNN algorithm. Euclidean distance is the most popular distance metric.
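
The nearest-neighbour principle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the six training coordinates, the new point, and the function names `euclidean` and `knn_predict` are all hypothetical and chosen only to mirror the 3 red / 3 blue layout of figure 1.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train, new_point, k=3):
    """Return the majority label among the k training points closest to new_point.

    train is a list of (features, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: euclidean(item[0], new_point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical coordinates: three 'class1' (red) and three 'class2' (blue) points.
train = [
    ((1.0, 1.0), "class1"), ((1.5, 2.0), "class1"), ((2.0, 1.5), "class1"),
    ((5.0, 5.0), "class2"), ((5.5, 6.0), "class2"), ((6.0, 5.5), "class2"),
]

new_point = (2.0, 2.0)  # plays the role of the yellow point
prediction = knn_predict(train, new_point, k=3)
print(prediction)
```

With K = 3, the three closest points all carry the label ‘class1’, so the new point is assigned to that class.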
K is an essential parameter in the KNN algorithm, so how should the K value be chosen?
i) Using error curves: The graph (figure 2) shows the training and test error curves for different
values of K. At low K values there is high variance, which is why the test error is high and the
training error is low.4
ii) Knowledge of the domain is most useful in choosing the K value.
iii) The K value should be odd when considering binary classification, so that the majority vote
cannot end in a tie.
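
The error-curve approach in point i) can be sketched as follows. This is only an illustration on synthetic data: the two Gaussian clusters, the 90/30 split, the candidate K values, and the helper names `knn_predict`, `error_rate`, and `make_points` are assumptions made for the example and are not taken from the datasets used in this paper.

```python
import math
import random
from collections import Counter

def knn_predict(train, point, k):
    """Majority label among the k training points nearest to point."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def error_rate(train, data, k):
    """Fraction of (features, label) pairs in data that KNN misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)

def make_points(rng, centre, label, n):
    """Draw n synthetic 2-D points around centre (hypothetical data)."""
    return [((rng.gauss(centre[0], 1.5), rng.gauss(centre[1], 1.5)), label)
            for _ in range(n)]

rng = random.Random(0)
data = make_points(rng, (0, 0), "no", 60) + make_points(rng, (2, 2), "yes", 60)
rng.shuffle(data)
train, test = data[:90], data[90:]

# Training and test error for several odd values of K (binary classification).
curve = {k: (error_rate(train, train, k), error_rate(train, test, k))
         for k in (1, 3, 5, 7, 9, 11)}

for k, (train_err, test_err) in curve.items():
    print(f"K={k:2d}  train error={train_err:.3f}  test error={test_err:.3f}")
```

At K = 1 every training point is its own nearest neighbour, so the training error is zero while the test error is typically not; this is the high-variance regime described above. A reasonable K is one where the test error stops improving.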

3.2 NAIVE BAYES CLASSIFIER


The Naive Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and is used
for solving classification problems. It is mostly used in text classification with
high-dimensional training datasets. It is one of the simplest and most effective algorithms, and it
helps in constructing fast machine learning models that can make quick predictions. It is called a
probabilistic classifier, which means that it predicts on the basis of the probability of an object
belonging to each class. The most popular applications of NB are spam filtering, sentiment
analysis, and classifying articles.
Bayes' theorem gives the probability of an event occurring given that another event has already
occurred. Mathematically, Bayes' theorem is stated as the following equation:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where A and B are events and P(B) is not 0. We are trying to find the probability of event A given
that event B is true; event B is therefore also termed the evidence. P(A) is the prior of A, i.e.,
the probability of the event before the evidence is seen. The evidence is an attribute value of an
unknown instance, here event B. P(A|B) is the posterior probability of A, i.e., the probability of
the event after the evidence is seen, and P(B|A) is the likelihood of the evidence given that A is
true.
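
A small worked example may make the roles of prior, likelihood, and evidence concrete. The probabilities below are invented purely for illustration (they are not estimated from the Myntra data), and the function name `posterior` is hypothetical. Here A stands for "the consumer buys" and B for "the product is discounted".

```python
def posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B), with P(B) != 0."""
    if evidence_b == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood_b_given_a * prior_a / evidence_b

# Hypothetical numbers, for illustration only.
p_buy = 0.4                  # prior P(A)
p_discount_given_buy = 0.5   # likelihood P(B|A)
p_discount_given_not = 0.25  # P(B|not A)

# Evidence P(B) via the law of total probability.
p_discount = p_discount_given_buy * p_buy + p_discount_given_not * (1 - p_buy)

p_buy_given_discount = posterior(p_buy, p_discount_given_buy, p_discount)
print(round(p_buy_given_discount, 4))
```

Under these assumed numbers, seeing the evidence raises the probability of a purchase from the prior of 0.4 to a posterior of 0.2 / 0.35, roughly 0.57.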
The implementation of NB starts with data pre-processing, followed by fitting NB to the training
set, predicting the result, assessing the accuracy of the test result (confusion matrix), and
visualizing the test set result.
Naive Bayes is one of the fastest and easiest ML models for finding the class of [Link]. It is used
for both binary and multi-class classification. It performs well in multi-class prediction compared
to other models, and it is a popular choice for text classification problems. However, NB assumes
that all features are independent of one another and therefore cannot capture relationships between
features.
The NB model is used for many purposes and has quite a few applications, such as credit scoring,
medical data classification, real-time prediction, spam filtering, and sentiment analysis.
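
The fit-then-predict steps described above can be sketched with a small categorical Naive Bayes classifier written from scratch. This is a minimal sketch under stated assumptions, not the implementation used in this study: the eight training rows, the "low"/"high" price bands, the use of Laplace (add-one) smoothing, and the function names `fit` and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict(model, row):
    """Return the class with the highest posterior score for row.

    Features are treated as independent given the class (the 'naive'
    assumption); Laplace smoothing avoids zero probabilities.
    """
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_p = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            numerator = feature_counts[(i, label)][value] + 1
            denominator = count + len(values[i])
            log_p += math.log(numerator / denominator)  # log likelihood
        scores[label] = log_p
    return max(scores, key=scores.get)

# Hypothetical training rows: (brand, price band) -> buying behaviour.
rows = [("GAP", "low"), ("GAP", "low"), ("7 Rainbow", "low"), ("GAP", "low"),
        ("GAP", "high"), ("7 Rainbow", "high"), ("7 Rainbow", "high"), ("7 Rainbow", "high")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

model = fit(rows, labels)
print(predict(model, ("GAP", "low")))
print(predict(model, ("7 Rainbow", "high")))
```

Working in log probabilities keeps the product of many small likelihoods numerically stable, which matters once the number of features grows.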
4 [Link]
[Link]

4. METHODOLOGY

The popularity of e-commerce sites among customers is growing. In online shopping, brands play a
huge role, as they affect the consumer’s perception of quality, price, social status, and so on.
4.1 Data Collection:
S.No. | Product ID | Product Name | Product Brand | Gender | Price | Description | Colour | Num Images | Buying Behaviour
1 | 10017413 | GAP Boys Graphic Crew Sweatshirt | GAP | Men | 749 | Long sleeves with banded cuffs | Black | 4 | yes
... | ... | ... | ... | ... | ... | ... | ... | ... | ...
12492 | 10261845 | 7Rainbow Beige & Blue Checked Saree | 7 Rainbow | Women | 587 | Blue checked saree with tasselled detail | Blue | 3 | yes
1) The dataset used for customer behaviour analysis was collected from the open platform
[Link]. The data is from Myntra, an e-commerce website, and contains 12491 observations
with 9 columns covering product, product information, price, brand, gender, product ID, primary
colour, and buying behaviour. The dataset is discrete, with numeric and textual variables. A
small illustrative sample is shown in Table 1.

Table 1: A small illustrative sample of entries in our data set, which contains 12492 consumer
decisions to purchase or not to purchase a specific product.

S.No. | neck_angle | shoulder_angle | elbow_angle | trunkangle | l_neck_angle | L Shoulder angle | L elbow angle | L trunk angle
0 | -62.1154 | -70.3564 | -112.233 | -98.8793 | -82.1928344 | -72.997948 | -144.036 | 27.42654
... | ... | ... | ... | ... | ... | ... | ... | ...
111 | -56.4353 | -81.8038 | -94.8011 | -92.7495 | -77.5833787 | -81.866684 | -140.632 | 28.46722
2) The second model we used was K-NN, which was implemented on two sheets consisting of abnormal
data. These contain data about shoulder angle, neck angle, and trunk angle. The dataset is
discrete, with numeric variables.

5. CLASSIFICATION METHODOLOGIES
For our analysis, we adopted the use of two different, well-known classification approaches. The algorithms
were chosen based on their widespread use, well-understood behaviour, and convincing performance in other
classification tasks. In addition, both are functional across a wide range of heterogeneous data types, with some
features being categorical while others are continuous (Saavi Stubseid, 2018).

6. RESULTS AND DISCUSSION

Table 3: Analysis result of customer buying behaviour for Naïve Bayes algorithm.

Classifier | Accuracy (%) | Precision (%) | Recall (%)
Naïve Bayes | 48.78 | 48.83 | 52.05

For the Naïve Bayes classifier we used testing data of 3121 observations. Table 3 shows the analysis
result of customer buying behaviour depending on the brand, price, and product. According to the
analysis done using the confusion matrix, the Naïve Bayes classifier gives an accuracy of
approximately 48.78%. Through our analysis, we also found that, out of 3122 observations, 1459
consumers will buy the product and 1663 won’t when the main factors are product brand and price.
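
For reference, accuracy, precision, and recall are all derived from the four cells of a binary confusion matrix. The sketch below shows the arithmetic; the cell counts are hypothetical and are not the confusion matrix obtained in this study, and the function name `classification_metrics` is an assumption made for the example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from binary confusion-matrix counts.

    tp: predicted 'yes', actually 'yes'    fp: predicted 'yes', actually 'no'
    fn: predicted 'no',  actually 'yes'    tn: predicted 'no',  actually 'no'
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical counts, for illustration only.
accuracy, precision, recall = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(f"accuracy={accuracy:.2%}  precision={precision:.2%}  recall={recall:.2%}")
```

For a binary problem with roughly balanced classes, an accuracy close to 50% is about what random guessing would achieve, which is a useful baseline when reading Table 3.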
The accuracy plot of both datasets 14 and 15 gives a result of 1, which is considered ideal for
abnormal datasets. The relatively low accuracy of kNN is caused by several factors. One of them is
that every characteristic in the method has the same influence on the distance calculation. The
solution to this problem is to give a weight to each data characteristic.5 The accuracy of kNN can
be increased by improving the dataset, eliminating missing values, and adding more numerical values
into the model implementation.
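
The idea of giving each characteristic its own weight can be sketched with a feature-weighted Euclidean distance. This is a generic illustration and not the local-mean and distance-weight method of the work cited in footnote 5; the training points, the weights, and the function names `weighted_distance` and `weighted_knn_predict` are all hypothetical.

```python
import math
from collections import Counter

def weighted_distance(p, q, weights):
    """Euclidean distance in which each feature contributes according to its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))

def weighted_knn_predict(train, point, weights, k=3):
    """Majority label among the k nearest training points under the weighted distance."""
    nearest = sorted(train, key=lambda item: weighted_distance(item[0], point, weights))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical data: the first feature separates the classes, the second is
# large-scale noise that dominates an unweighted distance.
train = [
    ((1.0, 90.0), "yes"), ((1.2, 10.0), "yes"), ((0.8, 50.0), "yes"),
    ((3.0, 88.0), "no"), ((3.2, 12.0), "no"), ((2.9, 52.0), "no"),
]
point = (1.1, 51.0)

equal = weighted_knn_predict(train, point, weights=(1.0, 1.0), k=3)
weighted = weighted_knn_predict(train, point, weights=(1.0, 0.01), k=3)
print(equal, weighted)
```

With equal weights the noisy second feature decides which neighbours are closest and the point is labelled "no"; down-weighting that feature lets the informative first feature decide and the label becomes "yes". In practice, weights would be chosen from domain knowledge or tuned on validation data.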

7. CONCLUSION
In this paper we have proposed the classification of the buying behaviour of consumers using
machine learning. We collected data consisting of all Myntra products and, apart from this, used two
abnormal datasheets. We used mainly two models, KNN and Naive Bayes. In this analysis we found
that for KNN the accuracy in both abnormal datasheets is 0 and the accuracy rate plot indicated the
value as 1.0. As the data set is small in the case of abnormal data, the accuracy tends to be
higher. However, the confusion matrix could not be applied.
In the Naive Bayes implementation, the final output shows that we built a Naive Bayes classifier
that can predict whether a consumer will buy or not depending on the brand and price of the
products, with an accuracy of approximately 48%. Through the confusion matrix we found that, out of
3122 observations, 1459 consumers will buy the product and 1663 won’t when the main factors are
product brand and price. The ggplot() graph helps in visualising price in the Naive Bayes model.

5 K U Syaliman et al 2018 J. Phys.: Conf. Ser. 978 012047: Improving the accuracy of k-nearest
neighbor using local mean based and distance weight.