
Key Concepts in Machine Learning

The document outlines various topics in machine learning, including types of machine learning, evaluation metrics, and specific algorithms like logistic regression and decision trees. It also covers advanced concepts such as neural networks, clustering techniques, and reinforcement learning. Additionally, it discusses practical applications and challenges in machine learning, along with the importance of testing and model evaluation.

Uploaded by

Jovan .J.B
Copyright
© All Rights Reserved

2 Marks

Unit – 1
List the three main types of Machine Learning.
What does the ROC curve represent?
What does the ROC curve represent in machine learning evaluation?
Mention the importance of testing in Machine Learning.
Mention the steps in the Machine Learning process.
Write the formula for Precision in terms of TP, FP.

Unit – 2
What is the main objective of Logistic Regression?
What is an ensemble method in ML?
Give the mathematical equation of simple linear regression.
Differentiate Regression and Classification Algorithms.
How does simple linear regression differ from multiple linear regression?
What does a higher Gini Index value signify about impurity?

Unit – 3
What is meant by agglomerative hierarchical clustering?
Expand ICA and write one application.
List the applications of K-means clustering.
What is dimensionality reduction in PCA?
State two advantages of using PCA.
Write the difference between K-Means and K-Modes clustering.

Unit – 4
Why is weight initialization important in neural networks?
Differentiate between training error and validation error.
Who introduced the perceptron model? Where?
State the limitations of SVM.
What is the role of activation functions in neural networks?
Define the Curse of Dimensionality.

Unit – 5
Define a probabilistic graphical model.
Write two applications of the Monte Carlo method.
Why are random sampling techniques used in Monte Carlo methods?
Write the difference between exploration and exploitation in RL.
What is a Naive Bayes classifier?
Define a Hidden Markov Model.

6 Marks
Unit – 1
1. Explain the Machine Learning process with a neat diagram.
2. Demonstrate with an example how training, validation, and test sets are used in ML
model evaluation.
3. E-commerce platforms like Amazon and Flipkart use Machine Learning for real-time
recommendation systems. Explain the working of such systems and discuss the
challenges in scalability, personalization, and data sparsity.
4. A binary classifier produced the following results on a dataset of 200 samples:
True Positives (TP) = 70
False Positives (FP) = 20
True Negatives (TN) = 90
False Negatives (FN) = 20
a) Construct the confusion matrix.
b) Calculate Accuracy, Precision, Recall, and F1-Score.
5. A spam filter classifies 200 emails as Spam or Not Spam. The results are:
• True Positives (Spam correctly identified) = 60
• True Negatives (Not Spam correctly identified) = 100
• False Positives (Not Spam misclassified as Spam) = 20
• False Negatives (Spam misclassified as Not Spam) = 20
Construct the confusion matrix and calculate:
a) Accuracy
b) Precision
c) Recall (Sensitivity)
d) F1 Score
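The arithmetic in questions 4 and 5 can be checked with a short Python sketch. It assumes the common layout with rows as actual classes and columns as predicted classes (positive class first); the function names are illustrative, not part of the question.

```python
def confusion_matrix(tp, fp, tn, fn):
    """2x2 matrix: rows = actual (positive, negative), columns = predicted."""
    return [[tp, fn],
            [fp, tn]]


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # of predicted positives, fraction that are correct
    recall = tp / (tp + fn)      # of actual positives, fraction that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# (TP, FP, TN, FN) taken from questions 4 and 5 above
for label, counts in [("Q4", (70, 20, 90, 20)), ("Q5 (spam)", (60, 20, 100, 20))]:
    acc, prec, rec, f1 = classification_metrics(*counts)
    print(label, confusion_matrix(*counts))
    print(f"  accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

For question 4 this gives accuracy 160/200 = 0.800 and precision = recall = F1 = 70/90 ≈ 0.778; for question 5, accuracy 0.800 and precision = recall = F1 = 60/80 = 0.750.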
6. Explain the different preprocessing techniques such as handling missing values,
normalization, encoding categorical data, and feature scaling with suitable examples.

Unit – 2
1. Compare Linear Discriminant Analysis (LDA) and Logistic Regression in terms of
assumptions and applications.
2. Fit a simple linear regression model for the following dataset and find the regression
equation (Y on X):
X 1 2 3 4 5
Y 2 4 5 4 5
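A minimal sketch of the ordinary least-squares fit for question 2, using the standard formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄ (the function name is illustrative):

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of Y on X: returns (a, b) for Y = a + bX."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))  # co-deviation
    sxx = sum((x - x_mean) ** 2 for x in xs)                        # deviation of X
    b = sxy / sxx               # slope
    a = y_mean - b * x_mean     # intercept: line passes through (x_mean, y_mean)
    return a, b


X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
a, b = fit_simple_linear_regression(X, Y)
print(f"Y = {a:.2f} + {b:.2f}X")
```

With x̄ = 3 and ȳ = 4, the sums are Σ(x − x̄)(y − ȳ) = 6 and Σ(x − x̄)² = 10, so b = 0.6 and a = 4 − 0.6 × 3 = 2.2, i.e. Y = 2.2 + 0.6X.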
3. Supervised learning relies on labelled data for model training. Explain how labelled
data is generated and maintained in real-world applications such as fraud detection
and medical diagnosis.
4. Build a Decision Tree for Loan Approval in Banking. Use Decision Trees to assess
whether a loan application should be approved. The decision is based on factors like
credit score, income, employment status and loan history.
5. Discuss various evaluation metrics such as accuracy, precision, recall, F1-score, and
ROC-AUC with suitable examples. How do these metrics help in selecting the right
model for a given application?
6. Using an example dataset that predicts whether a student will pass or fail based on study
hours and attendance, explain the working of a Decision Tree with a neat diagram.

Unit – 3
1. Apply K-Means clustering (K = 2) to the dataset: (2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4).
Assume initial centroids as (2, 10) and (5, 8). Perform one iteration of the assignment step.
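The question does not name a distance metric, so the sketch below assumes Euclidean distance (`math.dist` needs Python 3.8+). It also shows the centroid update that would follow the assignment step; the helper names are illustrative.

```python
import math


def assign(points, centroids):
    """Assignment step: index of the nearest centroid (Euclidean) for each point."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update(points, labels, k):
    """Update step: each new centroid is the mean of the points assigned to it."""
    new_centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        xs, ys = zip(*members)  # assumes no cluster is empty
        new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return new_centroids


points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4)]
centroids = [(2, 10), (5, 8)]
labels = assign(points, centroids)
for p, lab in zip(points, labels):
    print(p, "-> C%d" % (lab + 1))
print("updated centroids:", update(points, labels, k=2))
```

Under this assumption only (2, 10) stays in C1 and the other five points join C2, whose updated centroid is (5.6, 5.2). The metric matters here: with Manhattan distance, (2, 5) is 5 from (2, 10) and 6 from (5, 8), so it would join C1 instead.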
2. A company collects customer purchase data with 10,000 features (dimensions).
Discuss the challenges due to the curse of dimensionality.
3. Explain the process of dimensionality reduction using PCA. Illustrate with an example.
4. Elaborate the different types of density-based clustering and compare them based
on their key features.
5. Describe the different centroid-based clustering methods with examples.
6. Explain, with program code, the dimensionality reduction of any image dataset
using Principal Component Analysis (PCA).

Unit – 4
1. Explain the role of forward pass, error calculation, and weight update in
backpropagation.
2. Draw the architecture of a multilayer perceptron with 2 input nodes, 1 hidden layer (3
neurons), and 1 output neuron. Explain the flow of information.
3. Describe the backpropagation algorithm in neural networks. Explain its steps with a
simple two-layer example.
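The three stages named in questions 1 and 3 (forward pass, error calculation, weight update) can be sketched in plain Python for a network with 2 inputs, 2 sigmoid hidden units and 1 sigmoid output, trained with squared error E = ½(t − y)². The input, target, starting weights and learning rate are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(x, target, W1, b1, W2, b2, lr):
    """One forward pass, error calculation and gradient-descent weight update.

    Network: len(x) inputs -> len(W1) sigmoid hidden units -> 1 sigmoid output.
    Error: E = 0.5 * (target - y) ** 2. Returns the error before the update.
    """
    # 1. Forward pass
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)

    # 2. Error calculation
    error = 0.5 * (target - y) ** 2

    # 3. Backward pass: deltas are dE/dz for each neuron (sigmoid' = s * (1 - s))
    delta_out = (y - target) * y * (1 - y)
    delta_hidden = [delta_out * w * hj * (1 - hj) for w, hj in zip(W2, h)]

    # 4. Weight update: w <- w - lr * dE/dw (deltas above use the old W2)
    W2 = [w - lr * delta_out * hj for w, hj in zip(W2, h)]
    b2 = b2 - lr * delta_out
    W1 = [[w - lr * d * xi for w, xi in zip(row, x)] for row, d in zip(W1, delta_hidden)]
    b1 = [b - lr * d for b, d in zip(b1, delta_hidden)]
    return error, W1, b1, W2, b2


# Hypothetical starting values
x, target = [0.05, 0.10], 0.01
W1, b1 = [[0.15, 0.20], [0.25, 0.30]], [0.35, 0.35]
W2, b2 = [0.40, 0.45], 0.60

errors = []
for _ in range(100):
    error, W1, b1, W2, b2 = train_step(x, target, W1, b1, W2, b2, lr=0.5)
    errors.append(error)
print(f"error at step 1: {errors[0]:.4f}, at step 100: {errors[-1]:.4f}")
```

The hidden deltas are computed from the output delta before W2 changes, which is the step that "propagates the error backwards"; repeating the step drives the error down.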
4. Compare linear and non-linear SVMs in terms of decision boundary, kernel functions,
and computational complexity. Give suitable examples.
5. Explain with an example the working of a Perceptron.
6. Discuss the importance of weight initialization in neural networks with an example.

Unit – 5
1. Explain the role of the forward algorithm in HMMs with a simple example.
2. Consider a Bayesian network with nodes: Rain (R), Sprinkler (S), WetGrass (W).
Draw the Bayesian network structure and explain the dependencies.
3. Describe the structure of a Bayesian Network with a neat diagram. Construct an
example Bayesian Network for a medical diagnosis problem (e.g., Disease →
Symptoms).
4. Elaborate the role of reinforcement learning in intelligent systems. With suitable
examples, explain how an RL agent interacts with the environment using states, actions,
and rewards.
5. Explain the concept of Markov Models with the help of transition probabilities.
6. Describe the use of Monte Carlo techniques in Machine Learning for optimization and
probabilistic estimation with suitable examples.
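Estimating π by random sampling is a standard illustration of Monte Carlo probabilistic estimation (a generic example, not one taken from this question bank). The fraction of uniform points in the unit square that land inside the quarter circle approximates π/4; the seed is fixed only so runs are reproducible.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi from points sampled uniformly in [0, 1) x [0, 1)."""
    rng = random.Random(seed)  # seeded generator for reproducibility
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:   # point falls inside the quarter circle
            inside += 1
    return 4 * inside / n_samples


for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n))
```

The estimation error shrinks roughly as 1/√n, which is why Monte Carlo methods trade computation (more samples) for accuracy.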

10 Marks
Unit – 1
1. Evaluate the impact of overfitting and underfitting in ML with real-world examples.
Suggest techniques to overcome them.
2. Discuss in detail the bias–variance tradeoff. How can an ML engineer balance both
during model building?
3. Illustrate appropriate fitting versus overfitting and explain the techniques used to
avoid overfitting.
4. Explain the Bias–Variance trade-off in machine learning with a neat diagram.
Illustrate with an example how underfitting and overfitting affect model performance.
5. A dataset of 1000 emails is classified into "spam" and "not spam". Out of 200 spam
emails, the model correctly predicted 150 as spam and 50 as not spam. Out of 800
non-spam emails, it correctly predicted 720 as not spam. Construct the confusion
matrix and calculate accuracy, precision, recall, and F1-score.
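A minimal sketch for question 5, treating "spam" as the positive class; the counts not stated directly (FN and FP) follow from the totals. The function name is illustrative.

```python
def spam_metrics(spam_total, spam_as_spam, ham_total, ham_as_ham):
    """Derive confusion-matrix counts (positive class = spam) and the four metrics."""
    tp = spam_as_spam               # spam predicted as spam
    fn = spam_total - spam_as_spam  # spam predicted as not spam
    tn = ham_as_ham                 # not spam predicted as not spam
    fp = ham_total - ham_as_ham     # not spam predicted as spam
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return (tp, fp, tn, fn), (accuracy, precision, recall, f1)


counts, scores = spam_metrics(spam_total=200, spam_as_spam=150, ham_total=800, ham_as_ham=720)
print("TP, FP, TN, FN =", counts)
print("accuracy=%.3f precision=%.3f recall=%.3f f1=%.3f" % scores)
```

The 80 non-spam emails flagged as spam (800 - 720) are the false positives, which is why precision (150/230 ≈ 0.652) is lower than recall (150/200 = 0.75); accuracy is 870/1000 = 0.87 and F1 = 300/430 ≈ 0.698.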
6. Explain the testing process in machine learning. Discuss its importance in evaluating
model performance and common testing strategies.

Unit – 2
1. Design a logistic regression model for predicting whether a student passes/fails based on
study hours and attendance. Show dataset partitioning and evaluation metrics.
2. Use K-Nearest Neighbor classifier (K=3) to classify the test point X=6 based on the
training data:
X Class
2 A
4 A
5 B
7 B
9 B
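A minimal K-NN sketch for question 2 with K = 3 and distance |x − 6|. Ties in distance are broken by the order of the training data (Python's `sorted` is stable), an assumption that does not change the result here.

```python
from collections import Counter


def knn_classify(train, query, k):
    """train: list of (x, label). Majority vote among the k nearest by |x - query|."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], nearest


train = [(2, "A"), (4, "A"), (5, "B"), (7, "B"), (9, "B")]
label, nearest = knn_classify(train, 6, k=3)
print("nearest:", nearest, "-> class", label)
```

The three nearest points are x = 5 and x = 7 (distance 1, class B) and x = 4 (distance 2, class A), so X = 6 is classified as B.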
3. Consider the following dataset of 2D points with their class labels. We want to classify a
new point Q (5, 4) using the K-Nearest Neighbours algorithm with the Euclidean distance
metric:
• Compute the distances from Q to all points.
• Classify Q using K = 3 neighbours.
• Write the predicted class label of Q.

Point Coordinates (x, y) Class
P1 (1, 2) A
P2 (2, 3) A
P3 (3, 3) B
P4 (6, 5) B
P5 (7, 8) B
P6 (8, 6) A
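The three steps of question 3 can be sketched in Python as follows (`math.dist` needs Python 3.8+; variable names are illustrative).

```python
import math
from collections import Counter

# Training points from the question: name -> ((x, y), class)
points = {
    "P1": ((1, 2), "A"),
    "P2": ((2, 3), "A"),
    "P3": ((3, 3), "B"),
    "P4": ((6, 5), "B"),
    "P5": ((7, 8), "B"),
    "P6": ((8, 6), "A"),
}
Q = (5, 4)
K = 3

# Step 1: Euclidean distance from Q to every training point
distances = {name: math.dist(xy, Q) for name, (xy, _) in points.items()}
for name, d in distances.items():
    print(f"{name}: {d:.3f}")

# Steps 2-3: the K nearest points and a majority vote over their labels
nearest = sorted(distances, key=distances.get)[:K]
votes = Counter(points[name][1] for name in nearest)
predicted = votes.most_common(1)[0][0]
print("nearest:", nearest, "-> predicted class:", predicted)
```

The three nearest points are P4 (√2 ≈ 1.414, B), P3 (√5 ≈ 2.236, B) and P2 (√10 ≈ 3.162, A), so the majority vote assigns Q to class B.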
4. Write Python code to apply the following data cleaning methods to the example dataset
given below and discuss the need for each data cleaning step. Write the output.
• Find the null values
• Drop the column GENDER
• Drop rows with missing values

5. Demonstrate the concept of Multivariate Regression with an example.
6. Write Python code to apply the following data cleaning methods to the example dataset
given below and discuss the need for each data cleaning step. Write the output.
• Find the null values
• Drop the column GENDER
• Drop rows with missing values

Unit – 3
1. Given the dataset below, perform PCA up to 2 principal components and show the
reduced representation:

2. Compare K-Means and Hierarchical Clustering in terms of algorithm, complexity, cluster
shape, scalability, and interpretability. Based on the comparison, suggest which method you
would apply for clustering large, high-dimensional datasets.
3. Discuss Principal Component Analysis (PCA) in detail. Derive how PCA reduces
dimensionality and differentiate between Probabilistic PCA and standard PCA.
4. Cluster the following eight coordinates into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9). Assume
the initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2). The distance function
between two points a = (x1, y1) and b = (x2, y2) is defined as ρ(a, b) = |x2 – x1| + |y2 – y1|.
Use the K-Means algorithm to find the three cluster centres after the second iteration.
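A minimal sketch of question 4 using the Manhattan distance ρ defined in the question. Keeping an empty cluster's old centre is an assumption of this sketch; the case does not arise with this data.

```python
def manhattan(a, b):
    """Distance function from the question: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def kmeans_iteration(points, centers):
    """One assignment step followed by one centre-update step."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        d = [manhattan(p, c) for c in centers]
        clusters[d.index(min(d))].append(name)   # nearest centre wins
    new_centers = []
    for members, old in zip(clusters, centers):
        if not members:                          # assumption: keep an empty cluster's centre
            new_centers.append(old)
            continue
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return clusters, new_centers


points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
centers = [points["A1"], points["A4"], points["A7"]]   # initial centres from the question

for iteration in (1, 2):
    clusters, centers = kmeans_iteration(points, centers)
    print(f"iteration {iteration}: clusters={clusters} centres={centers}")
```

After the first iteration the centres are (2, 10), (6, 6) and (1.5, 3.5); after the second, A8 moves to the first cluster, giving clusters {A1, A8}, {A3, A4, A5, A6}, {A2, A7} and centres (3, 9.5), (6.5, 5.25) and (1.5, 3.5).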
5. Discuss in detail the types of hierarchical clustering and their steps with a neat diagram.
6. Perform K-Means clustering with K=2 on the dataset:
Points: (1,1), (2,1), (4,3), (5,4)
Initial Centroids: (1,1), (5,4)
Show cluster assignment after first iteration.
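A minimal sketch of the first iteration for question 6, assuming Euclidean distance (the question does not specify one; Manhattan distance gives the same assignment for these four points).

```python
import math

points = [(1, 1), (2, 1), (4, 3), (5, 4)]
centroids = [(1, 1), (5, 4)]

# Assignment step: each point joins the cluster of its nearest centroid
clusters = {0: [], 1: []}
for p in points:
    d = [math.dist(p, c) for c in centroids]
    clusters[d.index(min(d))].append(p)

# Update step: recompute each centroid as the mean of its cluster
new_centroids = [
    (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
    for pts in clusters.values()
]
print(clusters)
print(new_centroids)
```

The first cluster is {(1, 1), (2, 1)} and the second is {(4, 3), (5, 4)}, so the centroids move to (1.5, 1) and (4.5, 3.5).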

Unit – 4
1. (i) Define a perceptron and its basic components. (4)
(ii) A perceptron has inputs X1 = 1, X2 = 0, weights W1 = 0.4, W2 = 0.6, and bias = 0.2.
Compute the perceptron output using a step activation function (threshold = 0.5). (6)
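A minimal sketch of part (ii), assuming the bias is added to the weighted sum and the step function outputs 1 when the net input is at least the threshold (the question does not say ≥ or >, which makes no difference here).

```python
def perceptron_output(inputs, weights, bias, threshold=0.5):
    """Weighted sum plus bias, passed through a step function at `threshold`."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    output = 1 if net >= threshold else 0
    return net, output


net, y = perceptron_output([1, 0], [0.4, 0.6], bias=0.2)
print(f"net = {net:.2f}, output = {y}")
```

Here net = 0.4 × 1 + 0.6 × 0 + 0.2 = 0.6, which exceeds the threshold 0.5, so the perceptron outputs 1.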
2. Describe linear SVM. Given a dataset with two classes that are linearly separable,
sketch the decision boundary formed by a linear SVM. Label support vectors.
3. Explain the working of a Multilayer Perceptron with the backpropagation learning
algorithm.
4. Elaborate the concept of margin in Support Vector Machines and discuss the limitations
of SVM.
5. Explain how SVM can be used as a linear and non-linear classifier with an example.
6. Describe the architecture of Multilayer Perceptron with a neat diagram and its training
process.

Unit – 5
1. Apply the Naïve Bayes algorithm to classify a new instance given the following
training set:
Weather Temperature Play
Sunny Hot No
Overcast Hot Yes
Rainy Mild Yes
Sunny Mild Yes
Predict whether to Play or Not Play when Weather = Sunny and Temperature = Mild.
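A minimal sketch of question 1 using exact fractions and raw relative frequencies (no Laplace smoothing), as in a hand calculation; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction

# Training set from the question: (Weather, Temperature, Play)
data = [
    ("Sunny", "Hot", "No"),
    ("Overcast", "Hot", "Yes"),
    ("Rainy", "Mild", "Yes"),
    ("Sunny", "Mild", "Yes"),
]


def naive_bayes_scores(data, query):
    """Unnormalised posterior P(class) * prod_i P(feature_i = value_i | class)."""
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for cls, count in class_counts.items():
        score = Fraction(count, len(data))                 # prior P(class)
        rows = [row for row in data if row[-1] == cls]
        for i, value in enumerate(query):
            matches = sum(1 for row in rows if row[i] == value)
            score *= Fraction(matches, count)              # P(feature_i = value | class)
        scores[cls] = score
    return scores


scores = naive_bayes_scores(data, ("Sunny", "Mild"))
prediction = max(scores, key=scores.get)
print({cls: str(s) for cls, s in scores.items()}, "->", prediction)
```

For Yes the score is 3/4 × 1/3 × 2/3 = 1/6, while No scores 0 because Mild never occurs with No, so the prediction is Play = Yes. Laplace (add-one) smoothing would give 3/20 for Yes and 1/24 for No, with the same prediction.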
2. Analyze the interaction between exploration and exploitation in reinforcement learning.
Explain how this trade-off affects the learning of an optimal policy.
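Epsilon-greedy action selection on a multi-armed bandit is one common way to illustrate this trade-off; it is a generic example, not one the question prescribes. The arm means, the unit-variance Gaussian rewards, and the ε values below are hypothetical.

```python
import random


def epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Run epsilon-greedy on a Gaussian multi-armed bandit.

    With probability epsilon the agent explores (random arm); otherwise it
    exploits the arm with the highest estimated value so far.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # times each arm was pulled
    estimates = [0.0] * k     # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                          # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total += reward
    return estimates, counts, total / steps


for eps in (0.0, 0.1):
    est, counts, avg = epsilon_greedy([0.2, 0.5, 1.0], eps, steps=5000)
    print(f"epsilon={eps}: pulls per arm={counts}, average reward={avg:.3f}")
```

With ε = 0 the agent never explores, so it can lock onto whichever arm first looks good; a small ε keeps sampling every arm, lets the value estimates converge, and ends up exploiting the best arm most of the time.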
3. A Naive Bayes classifier is used for text classification. Consider 2 classes: Sports and
Politics.
P(Sports)=0.6, P(Politics)=0.4
P(word="match"|Sports)=0.05
P(word="match"|Politics)=0.01
Compute the posterior probability of each class when the word "match" appears.
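A minimal sketch of question 3 applying Bayes' rule with exact fractions:

```python
from fractions import Fraction

priors = {"Sports": Fraction(6, 10), "Politics": Fraction(4, 10)}
p_match = {"Sports": Fraction(5, 100), "Politics": Fraction(1, 100)}  # P("match" | class)

# Bayes' rule: P(class | "match") = P("match" | class) * P(class) / P("match")
evidence = sum(p_match[c] * priors[c] for c in priors)                # P("match")
posteriors = {c: p_match[c] * priors[c] / evidence for c in priors}

for c, p in posteriors.items():
    print(f"P({c} | match) = {p} = {float(p):.4f}")
```

P("match") = 0.6 × 0.05 + 0.4 × 0.01 = 0.034, so P(Sports | match) = 0.03 / 0.034 = 15/17 ≈ 0.882 and P(Politics | match) = 2/17 ≈ 0.118; the text is classified as Sports.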
4. Explain the idea of policy, value function, and Q-function in Reinforcement Learning
and illustrate it with a real-world example.
5. In a Hidden Markov Model, given the observed sequence O = {A, B}, compute P(O) using the
forward algorithm with the provided transition and emission matrices.
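The transition and emission matrices referred to in question 5 are not reproduced in this document, so the sketch below uses a hypothetical two-state HMM. It implements the forward recursion α_t(j) = [Σ_i α_{t−1}(i) a_ij] b_j(o_t) and checks it against brute-force enumeration of all hidden paths.

```python
from itertools import product

# Hypothetical two-state HMM; substitute the matrices given in the question.
states = ("S1", "S2")
start = {"S1": 0.6, "S2": 0.4}                    # initial probabilities pi
trans = {"S1": {"S1": 0.7, "S2": 0.3},            # transition matrix A
         "S2": {"S1": 0.4, "S2": 0.6}}
emit = {"S1": {"A": 0.5, "B": 0.5},               # emission matrix B
        "S2": {"A": 0.1, "B": 0.9}}


def forward(obs):
    """P(obs) via the forward algorithm."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}   # initialisation
    for o in obs[1:]:                                          # induction
        alpha = {j: sum(alpha[i] * trans[i][j] for i in states) * emit[j][o]
                 for j in states}
    return sum(alpha.values())                                 # termination


def brute_force(obs):
    """Same probability by summing over every hidden-state path (exponential cost)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


O = ["A", "B"]
print(forward(O), brute_force(O))
```

With these hypothetical numbers both functions give P(O) ≈ 0.2156; the forward algorithm needs on the order of N²T operations for N states and T observations, instead of enumerating all N^T paths.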
6. Describe the reinforcement learning process in detail. Explain the roles of agents,
environment, states, actions, and rewards.
