
Machine Learning Exam Paper GR20D5129

This document is a machine learning exam question paper that contains two parts: Part A consists of 10 short answer questions worth 2 marks each on topics like reinforcement learning, overfitting, entropy, linear discriminant analysis, K-means clustering, and active learning. Part B consists of 5 long answer questions worth 10 marks each, including questions on computer vision applications, logistic regression, decision trees, clustering, dimensionality reduction, and sequential data modeling. Students are required to answer all questions in the paper over its 3 hour duration for a total of 70 marks.

Uploaded by

SH Gaming

CODE: GR20D5129    GR 20    SET - 4

M.Tech Year I Semester Regular Examinations, October/November 2021
Machine Learning
(Data Science)
Time: 3 hours    Max Marks: 70

Instructions:
1. The question paper comprises Part-A and Part-B.
2. Part-A (for 20 marks) must be answered at one place in the answer book.
3. Part-B (for 50 marks) consists of five questions with internal choice; answer all questions.

PART A
(Answer ALL questions. All questions carry equal marks)
10 × 2 = 20 Marks

1. a. How is Reinforcement Learning different from Unsupervised Learning? [2]
b. Define the term Overfitting. Give one solution for it. [2]
c. Give the formula for Entropy. What is the role of Information gain in constructing decision trees? [2]
d. What is the role of Linear Discriminant Analysis in Machine Learning? [2]
e. Compare Mixture and Latent factor models. [2]
f. Write the importance of K-Means Clustering. [2]
g. Write about the R²-Score and its significance. [2]
h. Give the significance of Hyperparameter Optimization. [2]
i. How does Machine Learning deal with sparse data? [2]
j. What is the advantage of Active Learning? [2]

PART B
(Answer ALL questions. All questions carry equal marks)
5 × 10 = 50 Marks

2. (a) Discuss any three applications related to Computer Vision. [10]
(b) Define the terms bias and variance. Elaborate on the bias-variance trade-off.

OR

3. (a) Explain in detail about learning curves. [10]
(b) How are Machine Learning and Deep Learning related? Explain regularization in deep learning.

4. (a) What is Logistic Regression? Explain the mathematical modelling involved in the Logistic Regression classifier with an example. [10]
(b) Elaborate on Distance based Methods.

OR

5. Consider the following data and construct the decision tree by using the ID3 algorithm. [10]

Age   Competition   Type       Profit (Class)
Old   Yes           Software   Down
Old   No            Software   Down
Old   No            Hardware   Down
Mid   Yes           Software   Down
Mid   Yes           Hardware   Down
Mid   No            Hardware   Up
Mid   No            Software   Up
New   Yes           Software   Up
New   No            Hardware   Up
New   No            Software   Up

6. (a) Write about Density based clustering algorithms. [10]
(b) Does K-means come under Agglomerative or Hierarchical clustering? Justify your answer with a numerical example.

OR

7. Describe the PCA and Kernel PCA Matrix Factorization approaches for Dimensionality Reduction. [10]

8. (a) Explain the need of Ensemble models. Write about Boosting and Bagging. [10]
(b) Explain how the Mean Square Error and R-Square metrics affect Regression analysis.

OR

9. (a) Discuss evaluating machine learning algorithms. [10]
(b) What is a Pipeline? How will this mechanism increase the accuracy of the models? Explain with a relevant example.

10. (a) Explain the RNN algorithm for sequential data. [10]
(b) Explain the need of Active Learning in Machine Learning.

OR

11. (a) Elaborate the procedure for Time Series analysis for Stock market data. [10]
(b) What distinguishes reinforcement learning from deep learning and machine learning?

****


Common questions

Powered by AI

The entropy of a dataset \( S \) is \( H(S) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i) \), where \( P(x_i) \) is the proportion of examples in \( S \) that belong to class \( x_i \). Information gain measures the reduction in entropy after the dataset is split on an attribute: it is the entropy of the parent node minus the weighted average entropy of the child nodes. When constructing a decision tree, algorithms such as ID3 choose at each node the attribute with the highest information gain, because that attribute separates the data into classes most cleanly. The choice is greedy, so the resulting tree is not guaranteed to be optimal, but it tends to be compact, which helps it generalize to unseen data.
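
As a minimal sketch of how these quantities are computed, the Python snippet below evaluates entropy and information gain on the ten-row Age/Competition/Type/Profit dataset from the ID3 question in the paper. The rows are an assumed reconstruction of the garbled table, and the names `entropy` and `information_gain` are illustrative, not taken from any library.

```python
from collections import Counter
from math import log2

# Rows are (Age, Competition, Type, Profit); Profit is the class label.
# Assumption: this is a reconstruction of the table in the exam paper.
DATA = [
    ("Old", "Yes", "Software", "Down"),
    ("Old", "No", "Software", "Down"),
    ("Old", "No", "Hardware", "Down"),
    ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"),
    ("Mid", "No", "Hardware", "Up"),
    ("Mid", "No", "Software", "Up"),
    ("New", "Yes", "Software", "Up"),
    ("New", "No", "Hardware", "Up"),
    ("New", "No", "Software", "Up"),
]
ATTRIBUTES = {"Age": 0, "Competition": 1, "Type": 2}


def entropy(labels):
    """H(S) = -sum p_i * log2(p_i) over the class proportions p_i."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, column):
    """Parent entropy minus the size-weighted entropy of each branch."""
    labels = [row[-1] for row in rows]
    branches = {}
    for row in rows:
        branches.setdefault(row[column], []).append(row[-1])
    weighted = sum(len(b) / len(rows) * entropy(b) for b in branches.values())
    return entropy(labels) - weighted


gains = {name: information_gain(DATA, col) for name, col in ATTRIBUTES.items()}
best = max(gains, key=gains.get)  # attribute ID3 would test at the root
```

Under this reconstruction, Age has the highest gain (0.6 bits, against roughly 0.12 for Competition and 0 for Type), so ID3 would place Age at the root of the tree.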

Machine Learning handles sparse data, meaning data in which most entries are zero (as in text features or collaborative filtering), with techniques that either accommodate or reduce the sparsity. One common approach is regularization, such as L1 regularization (Lasso), which encourages sparsity in the model coefficients themselves. Another is Matrix Factorization, widely used in recommendation systems, where a sparse matrix is approximated by the product of two lower-dimensional matrices that capture the latent patterns. Feature selection methods help by reducing dimensionality and retaining only the most informative features. Additionally, techniques like Sparse Coding and Compressive Sensing explicitly represent data as a sparse combination of basis elements, which suits high-dimensional, sparse settings. These strategies help keep sparse data from compromising the performance, efficiency, and scalability of Machine Learning models.

Mixture Models and Latent Factor Models are both probabilistic in nature but have distinct purposes and methodologies. Mixture Models, such as Gaussian Mixture Models, assume that each data point is generated by one of several component distributions, each representing a different cluster or group within the data. They are useful for capturing population heterogeneity and are often used for clustering, without modelling any underlying structure beyond the mixture. Latent Factor Models, such as those used in collaborative filtering, assume instead that the observed data is influenced by unobserved (latent) factors. These models aim to uncover the factors responsible for observed correlations and are commonly used in recommendation systems to model interactions between entities, such as users and items. In short, Mixture Models focus on clustering based on the data distribution, while Latent Factor Models emphasize discovering hidden structure that influences the observable data.

Linear Discriminant Analysis (LDA) is used in Machine Learning primarily for dimensionality reduction and classification. It projects data from a higher-dimensional space to a lower-dimensional space while maintaining separability among classes. LDA maximizes the ratio of between-class variance to within-class variance, so that the classes remain as distinct as possible when mapped to the smaller subspace. Unlike PCA, which maximizes variance without regard to class labels, LDA explicitly uses the class label information, making it better suited to classification tasks where class separability is essential. LDA is therefore most useful when the objective is to find the feature space that best discriminates between known classes.
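
The variance-ratio idea can be illustrated with a small two-class sketch. For two classes, the direction that maximizes the between-class to within-class variance ratio is proportional to \( S_W^{-1}(m_1 - m_0) \), where \( S_W \) is the within-class scatter matrix and \( m_0, m_1 \) are the class means. The Python code below computes that direction for 2-D points; the data points and function names are made up for illustration.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the scatter matrix around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_direction(class0, class1):
    """Return w proportional to S_W^{-1} (m1 - m0) for 2-D data."""
    m0, m1 = mean(class0), mean(class1)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sxx, sxy, syy = (a + b for a, b in zip(scatter(class0, m0), scatter(class1, m1)))
    det = sxx * syy - sxy * sxy
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)


def project(points, w):
    """Project each 2-D point onto direction w (a 1-D representation)."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


# Hypothetical toy data for two classes.
CLASS0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
CLASS1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0)]

w = fisher_direction(CLASS0, CLASS1)
proj0, proj1 = project(CLASS0, w), project(CLASS1, w)
```

On this toy data the two classes do not overlap after projection onto `w`, so a single threshold on the 1-D values is enough to classify them.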

Active Learning is beneficial in Machine Learning because it selectively queries the most informative data points for labeling, which reduces the overall labeling cost and lets the model reach good performance with fewer labeled instances. This is particularly advantageous where labeling is expensive, time-consuming, or requires expert input, such as medical diagnosis or fine-tuning language models, where large labeled datasets are scarce. Compared with random sampling, Active Learning concentrates on data points that are likely to improve the decision boundary or fill gaps in the model's current understanding. This selective querying gives the model the most value per label, which makes it an effective strategy when resources are constrained.
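
One common querying strategy is uncertainty sampling: ask for the label of the pool point whose predicted probability is closest to 0.5. The Python sketch below assumes a hypothetical 1-D logistic model with weight `w` and bias `b`; the pool values and function names are illustrative only.

```python
from math import exp


def predict_proba(x, w, b):
    """Probability of the positive class under a 1-D logistic model."""
    return 1.0 / (1.0 + exp(-(w * x + b)))


def select_query(pool, w, b):
    """Uncertainty sampling: pick the point whose probability is nearest 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(x, w, b) - 0.5))


# Hypothetical unlabeled pool and current model parameters.
POOL = [-3.0, -0.2, 0.1, 2.5, 4.0]
query = select_query(POOL, w=1.0, b=0.0)  # the point to send to the labeler
```

Here the point `0.1` is selected because it lies closest to the model's current decision boundary at `x = 0`; points far from the boundary would add little information if labeled.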

The R-Square, or R² Score, is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variable or variables in a regression model. It indicates how well the model replicates the observed outcomes, based on the share of total variation that the model explains. The R² Score matters because it gives a single quantifiable value for the goodness-of-fit of the model, with values closer to 1 indicating a better fit. However, a high R² does not necessarily mean the model is good: on the training data, R² does not decrease when more variables are added to a least-squares model, even if they do not improve prediction, so chasing a higher R² can lead to overfitting. R² should therefore be interpreted in the context of the model complexity and the specific characteristics of the data.
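
In formula form, \( R^2 = 1 - SS_{res} / SS_{tot} \), where \( SS_{res} \) is the sum of squared residuals and \( SS_{tot} \) is the sum of squared deviations of the targets from their mean. The short Python sketch below computes it together with the Mean Square Error; the numbers are made-up illustration data and the function names are not from any particular library.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and model predictions.
Y_TRUE = [3.0, 5.0, 7.0, 9.0]
Y_PRED = [2.5, 5.5, 7.0, 9.0]

mse = mean_squared_error(Y_TRUE, Y_PRED)   # 0.5 / 4
r2 = r2_score(Y_TRUE, Y_PRED)              # 1 - 0.5 / 20
baseline = r2_score(Y_TRUE, [6.0] * 4)     # always predicting the mean
```

A model that always predicts the mean of the targets gets an R² of 0, which is why R² is often read as improvement over that baseline.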

K-Means clustering is significant in Machine Learning because it provides a simple yet efficient way to group data into distinct clusters, which facilitates data analysis and summarization. The algorithm partitions the dataset into \( k \) clusters, where each data point belongs to the cluster with the nearest mean (centroid), and that mean serves as the prototype of the cluster. Key advantages are its efficiency on large datasets and its straightforward implementation. However, K-Means has notable limitations. It is sensitive to the initial placement of centroids, so different runs can give different results. It implicitly assumes that clusters are roughly spherical and similar in size, which may not match the real cluster structure, so it struggles with varying cluster sizes and densities. It is also not robust to outliers and noise, which makes careful preprocessing and a sensible choice of \( k \) important.
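
The assign-then-update loop described above can be sketched in a few lines of plain Python. This is a minimal version of the standard Lloyd iteration under simplifying assumptions: the initial centroids are passed in explicitly (which also makes the sensitivity to initialization easy to experiment with), and the toy points are invented for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iterations=100):
    """Alternate assignment and centroid update until centroids stop moving."""
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups.
POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(POINTS, centroids=[(1.0, 1.0), (8.0, 8.0)])
```

With these starting centroids the loop converges after one update, with three points in each cluster. Starting from poorly placed centroids on less cleanly separated data can produce a different partition, which is the initialization sensitivity noted above.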

Hyperparameter Optimization is the process of finding the best combination of hyperparameters for a Machine Learning model. Hyperparameters are external configuration values that are not learned from the training data but set before the learning process begins. This optimization is crucial because hyperparameters can significantly affect the model's predictive power, convergence, and computational efficiency. Poorly chosen values can lead to underfitting, overfitting, or inefficient learning. Techniques like grid search, random search, and Bayesian optimization are used to explore the hyperparameter space systematically, typically scoring each candidate on held-out validation data. Tuning these values helps the model extract meaningful patterns from the data, which improves generalization and predictive performance on unseen data.
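
Grid search, the simplest of these techniques, can be sketched as an exhaustive loop over every combination of candidate values. In the Python example below, the model is a hypothetical 1-D ridge-regularized line fit with two hyperparameters, `lam` (penalty strength) and `fit_intercept`; the data, grid values, and function names are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical training and validation data generated by y = 2x + 1.
TRAIN_X, TRAIN_Y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
VAL_X, VAL_Y = [4.0, 5.0], [9.0, 11.0]


def fit_line(xs, ys, lam, fit_intercept):
    """Fit y = w*x + b with an L2 penalty lam on w (closed form)."""
    x_mean = sum(xs) / len(xs) if fit_intercept else 0.0
    y_mean = sum(ys) / len(ys) if fit_intercept else 0.0
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    return w, y_mean - w * x_mean


def validation_mse(lam, fit_intercept):
    """Train on the training split, score on the validation split."""
    w, b = fit_line(TRAIN_X, TRAIN_Y, lam, fit_intercept)
    return sum((w * x + b - y) ** 2 for x, y in zip(VAL_X, VAL_Y)) / len(VAL_X)


def grid_search(param_grid, evaluate):
    """Try every combination in the grid and keep the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


PARAM_GRID = {"lam": [0.0, 0.1, 1.0, 10.0], "fit_intercept": [True, False]}
best_params, best_score = grid_search(PARAM_GRID, validation_mse)
```

Because the toy data is noise-free, the search selects no penalty and a fitted intercept. On noisy real data a nonzero penalty often wins, and the cost of the search grows with the product of the grid sizes, which is what motivates random search and Bayesian optimization.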

Overfitting occurs in Machine Learning when a model learns not only the underlying pattern in the training data but also its noise and outliers, so that it performs well on the training data but poorly on unseen data. It indicates that the model has become too complex and too specific to the training dataset. One common solution is regularization, such as adding a penalty on large coefficients in linear models (L1 or L2 regularization). Regularization keeps the model complexity in check and helps the model generalize to new data by discouraging it from fitting the noise.
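
The shrinking effect of an L2 penalty can be seen in the simplest possible case, a single-weight model \( y \approx w x \). Minimizing \( \sum_i (y_i - w x_i)^2 + \lambda w^2 \) gives \( w = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda) \). The Python sketch below evaluates this closed form for a few values of \( \lambda \); the data and the function name `ridge_weight` are illustrative assumptions.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2 for one weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


# Hypothetical data lying exactly on y = 2x.
XS, YS = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]

# Larger penalties pull the weight toward zero.
weights = {lam: ridge_weight(XS, YS, lam) for lam in (0.0, 1.0, 14.0)}
```

With no penalty the weight is 2.0, and with a penalty of 14 it is halved to 1.0. The example only shows the shrinkage mechanism; how much penalty actually helps against overfitting depends on the noise in the data and is usually chosen on validation data.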

Reinforcement Learning (RL) differs from Unsupervised Learning in its learning approach. In RL, an agent learns to make decisions by taking actions in an environment so as to maximize cumulative reward, without explicit supervision. Learning is driven by feedback on its actions in the form of rewards or penalties. In contrast, Unsupervised Learning involves finding hidden patterns or intrinsic structure in input data that has no labeled responses. The data is not associated with any output labels, and the algorithms try to learn the underlying structure without any explicit signal of success. Thus, RL focuses on sequential decision-making guided by rewards, while Unsupervised Learning focuses on organizing data and understanding its inherent patterns.
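
To make the reward-driven loop concrete, here is a minimal tabular Q-learning sketch in Python. The environment is a hypothetical five-state corridor in which the agent starts at the left end and receives a reward of 1 only on reaching the right end; the hyperparameter values (`alpha`, `gamma`, `epsilon`) and all names are assumptions chosen for illustration.

```python
import random

N_STATES = 5           # states 0..4; state 4 is the terminal goal
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic corridor: reward 1 only on the move that reaches the goal."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=50, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values from interaction, using epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon:
                action = rng.choice((LEFT, RIGHT))       # explore
            else:
                best = max(q[state])                     # exploit, random tie-break
                action = rng.choice([a for a in (LEFT, RIGHT) if q[state][a] == best])
            next_state, reward = step(state, action)
            # Move Q(s, a) toward reward + discounted best value of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = ["RIGHT" if q[s][RIGHT] > q[s][LEFT] else "LEFT" for s in range(N_STATES - 1)]
```

No labels are ever provided: the agent discovers that moving right is better purely from the delayed reward signal. An unsupervised method given the same states would have no such signal to optimize.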

