Predictive Analytics Exam Questions

Predictive Analytics

Uploaded by

vidhu.cooky

Predictive Analytics

BAZG512/MBAZG512/PDBAZG512
S2-24
EC3 – Comprehensive Exam (Regular)
Full Marks - 40

Q1. In KNN, discuss the impact of choosing a very small value for 'K' (e.g., K=1) versus a
very large value for 'K' (e.g., K close to the total number of training samples) on the model's
performance, touching upon potential issues like bias, variance, and sensitivity to noise.
(3 Marks)

Q2. Answer the following questions with respect to Naïve Bayes Classifier. (8 Marks)
(a) Clearly explain the "naive" assumption that is central to the Naive Bayes Classifier.
(1 Mark)
(b) How does this "naive" assumption simplify the calculation of probabilities needed for
classification? (1 Mark)
(c) Despite this often unrealistic assumption, provide one reason why Naive Bayes can still
be an effective classifier in many real-world applications. (1 Mark)
(d) Imagine a simplified scenario where a Naive Bayes Classifier is used to determine if a
patient is likely to have a 'Flu' or 'No Flu', based on two symptoms: 'Fever' (Yes/No) and
'Cough' (Yes/No).
From historical data, the following probabilities have been estimated:
P(Fever=Yes | Flu) = 0.8
P(Cough=Yes | Flu) = 0.7
P(Flu) = 0.1 (Prior probability of having the Flu)
P(Fever=Yes | No Flu) = 0.1
P(Cough=Yes | No Flu) = 0.2
P(No Flu) = 0.9 (Prior probability of not having the Flu)
A new patient presents with both Fever=Yes AND Cough=Yes.
You want to calculate P(Flu | Fever=Yes, Cough=Yes) and P(No Flu | Fever=Yes,
Cough=Yes).
Without needing to calculate the final exact posterior probabilities, write down the
expressions for P(Flu | Fever=Yes, Cough=Yes) and P(No Flu | Fever=Yes, Cough=Yes)
using Bayes' theorem and the "naive" assumption. Which condition (Flu or No Flu)
appears more probable for this patient given the symptoms and priors, and briefly explain
your reasoning by comparing the numerators of these expressions. (5 Marks)
Q3.
In a retail environment, Principal Component Analysis (PCA) is applied to a dataset
containing customer behaviour data, focusing on aspects like online shopping frequency,
average order value, customer ratings, and subscription to promotional offers. The loading
vectors for the first two principal components (PC1 and PC2) are provided below. (7 marks)

Feature                        V1 (PC1)   V2 (PC2)
Online shopping frequency      0.6         0.2
Average order value            0.6        -0.3
Customer ratings               0.7         0.5
Subscription to promotions     0.3         0.8

Interpret the loading vectors concerning PC1 and PC2. Discuss the implications of a data
point with high PC1 and one with high PC2 regarding the customer segment it represents.
Provide insights into the similarities or differences among various characteristics of the
customers indicated by the features.

Q4. In the context of e-commerce, how can the K-means++ algorithm be applied to perform
customer segmentation and improve targeted marketing strategies? Explain which drawback
of the K-means algorithm is addressed by K-means++ in this case. (4 marks)

Q5.
(4 + 4 = 8 marks)
ROC curves for 4 different Machine Learning techniques applied to an employee churn
prediction problem are given below. Comment on which model should be selected and why.
Consider the three threshold points marked in the figure as th1 (0.6), th2 (0.5) and th3 (0.4) on
the orange curve for the Random Forest model. Discuss the three threshold values and
their effect on the model result. Also, suggest your choice of threshold.
Q6.
Answer the following questions with respect to the following figure. (10 Marks)

a) Suppose the above figure shows decision boundaries for a KNN model and a Logistic
Regression model applied to a 2D dataset. State which decision boundary (A or
B) belongs to which algorithm (KNN or Logistic Regression). Explain why. [4 Marks]
b) What function is denoted by the following equation? In which machine learning
algorithm (KNN or Logistic regression) is it used? [1 Mark]

$$\frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2)}}$$

c) What would be the equation of the decision boundary for the ML algorithm referred
to in question b)? Explain [2 Marks]

d) With reference to question b), explain the function and its necessity in the context of
classification problems. [3 Marks]

Common questions

Using Bayes' theorem and the naive assumption of conditional independence, the posteriors for 'Flu' and 'No Flu' given Fever=Yes and Cough=Yes are: P(Flu | Fever=Yes, Cough=Yes) ∝ P(Fever=Yes | Flu) × P(Cough=Yes | Flu) × P(Flu) = 0.8 × 0.7 × 0.1 = 0.056, and P(No Flu | Fever=Yes, Cough=Yes) ∝ P(Fever=Yes | No Flu) × P(Cough=Yes | No Flu) × P(No Flu) = 0.1 × 0.2 × 0.9 = 0.018. Both posteriors share the same denominator, P(Fever=Yes, Cough=Yes), so comparing the numerators is sufficient. Since 0.056 > 0.018, 'Flu' appears more probable for the patient: the much higher symptom likelihoods under 'Flu' outweigh its low prior of 0.1.
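The comparison of numerators can be checked numerically. The sketch below uses only the probabilities stated in Q2(d); the variable names are illustrative, and the normalization step goes one step beyond what the question asks for.

```python
# Probabilities as given in Q2(d) of the exam paper.
priors = {"Flu": 0.1, "No Flu": 0.9}
p_fever_yes = {"Flu": 0.8, "No Flu": 0.1}
p_cough_yes = {"Flu": 0.7, "No Flu": 0.2}

# Naive assumption: P(Fever, Cough | class) = P(Fever | class) * P(Cough | class).
numerators = {c: p_fever_yes[c] * p_cough_yes[c] * priors[c] for c in priors}

# The shared denominator P(Fever=Yes, Cough=Yes) is the sum of the numerators.
evidence = sum(numerators.values())
posteriors = {c: numerators[c] / evidence for c in numerators}

for c in priors:
    print(f"{c}: numerator={numerators[c]:.3f}, posterior={posteriors[c]:.3f}")
# Flu: numerator=0.056, posterior=0.757
# No Flu: numerator=0.018, posterior=0.243
```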

When selecting a machine learning model using ROC curves for employee churn prediction, one should compare the area under the curve (AUC), which measures the model's ability to distinguish between classes across all thresholds. A larger AUC indicates better overall performance, and a curve that lies closer to the top-left corner is preferable. Additionally, evaluating specific threshold points like th1 (0.6), th2 (0.5), and th3 (0.4) on a particular curve (e.g., the orange curve for the Random Forest model) reveals the trade-off between true positive rate and false positive rate at each point. The optimal threshold balances sensitivity and specificity. For instance, if the cost of false negatives is high, a lower threshold might be preferred to capture more true positives, even at the expense of more false positives.
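As a rough illustration of comparing models by AUC, the sketch below computes AUC as the fraction of (positive, negative) pairs that a model ranks correctly. The labels and the scores of `model_a` and `model_b` are made-up values, not data from the exam figure.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical churn labels (1 = churned) and scores from two made-up models.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]
model_b = [0.6, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1, 0.8]

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(auc_a, auc_b)  # 0.9375 0.6875
```

Under these assumed scores, `model_a` would be preferred because it ranks churners above non-churners more consistently.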

The 'naive' assumption in the Naive Bayes classifier is that the features are conditionally independent given the class label. This means that, once the class is known, the value of one feature carries no information about the value of any other feature. This assumption simplifies probability calculations by allowing the joint likelihood of the features to be written as the product of the individual per-feature likelihoods, so only one conditional probability per feature and class has to be estimated, which significantly reduces the amount of data and computation required.
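To make the simplification concrete, the sketch below counts how many class-conditional probabilities must be estimated for binary (Yes/No) features with and without the naive assumption. The function names are illustrative, and class priors are left out of the count.

```python
def full_joint_params(n_features, n_classes=2):
    # One probability per feature combination per class,
    # minus one per class because the probabilities sum to 1.
    return n_classes * (2 ** n_features - 1)

def naive_bayes_params(n_features, n_classes=2):
    # One P(feature=Yes | class) per feature and class.
    return n_classes * n_features

for n in (2, 10, 20):
    print(n, full_joint_params(n), naive_bayes_params(n))
# 2 6 4
# 10 2046 20
# 20 2097150 40
```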

Choosing a very small value for 'K', such as K=1, in KNN makes the model sensitive to noise and results in low bias but high variance. This is because the model effectively memorizes the training data and can easily misclassify a sample if it happens to be near a noisy data point. Conversely, choosing a very large value for 'K' results in a smoother decision boundary, which reduces variance but increases bias as the model overlooks local patterns in the data. In the extreme case where K approaches the number of training samples, every query is assigned the majority class, so the model underfits and ignores important local structure.
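The effect of K can be seen on a tiny example. The sketch below is a minimal pure-Python KNN on hypothetical 1-D data with one deliberately mislabeled point; the data and the helper `knn_predict` are assumptions made for illustration.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (1-D)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: class "A" on the left, class "B" on the right,
# plus one noisy "B" point (2.6) sitting inside the "A" region.
train = [(1.0, "A"), (2.0, "A"), (3.0, "A"), (4.0, "A"), (5.0, "A"),
         (2.6, "B"),
         (8.0, "B"), (9.0, "B"), (10.0, "B")]

print(knn_predict(train, 2.7, k=1))  # B: follows the single noisy neighbor
print(knn_predict(train, 2.7, k=3))  # A: the noise is outvoted
print(knn_predict(train, 9.0, k=3))  # B: local structure is respected
print(knn_predict(train, 9.0, k=9))  # A: K = N always returns the majority class
```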

The equation 1/(1 + e^(-(b0 + b1x1 + b2x2))) denotes the logistic (sigmoid) function, which maps any real-valued number into the (0, 1) interval. It is used in Logistic Regression, where it converts a linear combination of the input features into a probability of class membership. This mapping is necessary for binary classification because the raw linear score is unbounded and cannot be read as a probability; after the transformation, the output can be compared against a threshold to assign a class label.
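A minimal sketch of the function follows. The coefficient values are hypothetical, chosen only so that one of the sample points falls exactly on a linear score of zero; the two-branch form is a common way to avoid overflow for large negative inputs.

```python
import math

def logistic(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x1, x2, b0, b1, b2):
    """Probability of the positive class for a two-feature logistic regression."""
    return logistic(b0 + b1 * x1 + b2 * x2)

# Hypothetical coefficients, not taken from the exam paper.
b0, b1, b2 = -3.0, 1.0, 2.0

print(predict_proba(0.0, 0.0, b0, b1, b2))  # linear score -3: probability below 0.5
print(predict_proba(1.0, 1.0, b0, b1, b2))  # linear score 0: probability exactly 0.5
print(predict_proba(2.0, 2.0, b0, b1, b2))  # linear score 3: probability above 0.5
```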

For the customer behavior dataset, PC1 has positive loadings on all four features, with the largest on customer ratings (0.7), online shopping frequency (0.6), and average order value (0.6), so it acts as an overall engagement and spending dimension. A data point with a high PC1 value therefore represents a customer who shops often, spends more per order, and rates highly. PC2 is dominated by 'Subscription to promotions' (0.8) and, to a lesser extent, 'Customer ratings' (0.5), and has a negative loading on average order value (-0.3). A high PC2 value thus indicates customers who are particularly responsive to promotions but tend to place lower-value orders, possibly those who look for discounts and offers. This differentiation can help retailers tailor marketing strategies to the different purchasing profiles and promotional responsiveness of customer segments.
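A component score is the dot product of a customer's standardized feature values with the loading vector. The sketch below uses the loadings exactly as printed in Q3; the two customers and their z-scores are hypothetical.

```python
# Feature order: online shopping frequency, average order value,
# customer ratings, subscription to promotions.
pc1 = [0.6, 0.6, 0.7, 0.3]   # loadings as printed in Q3
pc2 = [0.2, -0.3, 0.5, 0.8]

def score(loadings, x):
    """Projection of a standardized observation onto one loading vector."""
    return sum(w * v for w, v in zip(loadings, x))

# Hypothetical standardized (z-score) customers, invented for illustration.
engaged_spender = [1.5, 1.5, 1.0, 0.0]
promo_seeker = [0.0, -1.0, 0.5, 2.0]

for name, x in [("engaged_spender", engaged_spender), ("promo_seeker", promo_seeker)]:
    print(name, round(score(pc1, x), 2), round(score(pc2, x), 2))
# engaged_spender 2.5 0.35
# promo_seeker 0.35 2.15
```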

Threshold values directly influence the trade-off between sensitivity (true positive rate) and specificity (1 - false positive rate) in a Random Forest model. Each threshold corresponds to one operating point on the ROC curve, so moving between th1 (0.6), th2 (0.5), and th3 (0.4) moves along the curve and changes the classification outcomes. A higher threshold such as 0.6 makes the model more conservative: specificity increases and there are fewer false positives, but more true churners are missed. Conversely, a lower threshold like 0.4 increases sensitivity, capturing more true positives at the cost of a higher false positive rate. The choice of threshold should thus reflect the relative costs of the two types of error in the specific context of employee churn prediction.
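The trade-off can be tabulated for the three thresholds. In the sketch below, the labels and predicted churn probabilities are invented for illustration and do not come from the exam figure.

```python
def rates_at(threshold, labels, scores):
    """Return (true positive rate, false positive rate) at a given threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical labels (1 = churned) and predicted churn probabilities.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.55, 0.45, 0.65, 0.5, 0.42, 0.3, 0.2, 0.1]

for th in (0.6, 0.5, 0.4):
    tpr, fpr = rates_at(th, labels, scores)
    print(f"threshold={th}: TPR={tpr:.2f}, FPR={fpr:.2f}")
# threshold=0.6: TPR=0.50, FPR=0.17
# threshold=0.5: TPR=0.75, FPR=0.33
# threshold=0.4: TPR=1.00, FPR=0.50
```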

The decision boundary of KNN is typically nonlinear and irregular, because each prediction depends only on the labels of the nearest training points and the boundary therefore follows the local arrangement of the samples. In contrast, the decision boundary of Logistic Regression is linear: a straight line in 2D, or a hyperplane in general, because the logistic function is applied to a linear combination of the features. These boundaries illustrate KNN's flexibility on complex datasets requiring fine local distinctions, whereas Logistic Regression assumes a global linear relationship that suits simpler, roughly linearly separable data.
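The linearity of the Logistic Regression boundary follows directly from the logistic function. The short derivation below assumes the usual classification threshold of 0.5 and writes $z$ for the linear score.

```latex
\begin{align*}
p(x_1, x_2) &= \frac{1}{1 + e^{-z}}, \qquad z = b_0 + b_1 x_1 + b_2 x_2 \\
p(x_1, x_2) = 0.5 &\iff 1 + e^{-z} = 2 \iff e^{-z} = 1 \iff z = 0 \\
\text{Decision boundary:} \quad & b_0 + b_1 x_1 + b_2 x_2 = 0
  \quad\Longleftrightarrow\quad x_2 = -\frac{b_0 + b_1 x_1}{b_2} \quad (b_2 \neq 0)
\end{align*}
```

Points with $z > 0$ receive a probability above 0.5 and fall on the positive side of this line; points with $z < 0$ fall on the negative side.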

The K-means++ algorithm improves customer segmentation in e-commerce by providing a better initialization of the cluster centers, which often results in better clusters than standard K-means. Standard K-means picks its initial centroids uniformly at random, so a poor draw can lead to suboptimal clusters and convergence to a poor local minimum. K-means++ mitigates this by choosing the first centroid at random and each subsequent centroid with probability proportional to its squared distance from the nearest centroid already chosen. The initial centroids are therefore spread out across the data, which tends to give more stable and more representative segments and, in turn, better-targeted marketing strategies.
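The seeding step can be sketched in a few lines. This is a minimal illustration of the distance-weighted sampling only, not a full clustering routine; the customer data and the function name `kmeans_pp_init` are assumptions, and in practice the features would normally be scaled first.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """Pick k initial centroids by k-means++ (squared-distance-weighted) sampling."""
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Weight of each point = squared distance to its nearest chosen centroid.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

# Hypothetical customers as (orders per month, average order value) pairs.
customers = [(1, 20), (2, 22), (1, 25),
             (10, 30), (11, 28), (12, 33),
             (3, 200), (2, 210), (4, 190)]

seeds = kmeans_pp_init(customers, 3, random.Random(0))
print(seeds)
```

Points already chosen have weight zero, so the same customer is not selected twice, and far-away customers are the most likely to become the next seed.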

Naive Bayes can still be effective in many real-world scenarios because classification only requires the correct class to receive the highest posterior, not accurate probability estimates, and the ranking of the classes often survives even when the independence assumption is violated. In addition, the model estimates relatively few parameters, which makes it resistant to overfitting in high-dimensional feature spaces, and variants exist for both continuous and categorical features.


Common questions

Using the Naive Bayes classifier, how can you express the probabilities for 'Flu' and 'No Flu' given symptoms of both Fever and Cough in a patient, and which condition appears more probable?

How should the ROC curves and threshold points be analyzed to select the best machine learning model for predicting employee churn?

What is the 'naive' assumption in the Naive Bayes classifier, and how does it simplify probability calculations for classification?

How does the choice of a very small or very large value of 'K' in the K-Nearest Neighbors (KNN) algorithm affect model performance in terms of bias, variance, and sensitivity to noise?

What mathematical function corresponds to the logistic function used in Logistic Regression, and what is its role in classification?

In a retail context, how would a data point with high principal component 1 (PC1) differ from one with high principal component 2 (PC2) when using Principal Component Analysis (PCA) on customer behavior data?

How do threshold values affect the performance of a Random Forest model in an employee churn problem when viewed through its ROC curve?

What decision boundary characteristics distinguish KNN from Logistic Regression, and how do these features relate to their inherent algorithms?

How can the K-means++ algorithm be utilized to improve customer segmentation in e-commerce and what problem of the K-means algorithm does it address?

Why can the Naive Bayes classifier be effective despite its unrealistic assumption of feature independence?
