ML Key

The document outlines the examination structure for the Machine Learning course, including various topics and questions for both Part-A and Part-B, covering definitions, scenarios, algorithms, and concepts related to machine learning. It includes questions on unsupervised learning, decision trees, clustering techniques, and the significance of different models like Naïve Bayes and Hidden Markov models. Additionally, it discusses practical applications and theoretical underpinnings of machine learning methodologies.
Course code : 20CS41002

IV B.Tech (CSE/IoT/IT) I Semester (Supplementary) Examinations May 2024

Machine Learning (Scheme of Evaluation)

Part-A 10x2M =20M

1 . a. Define machine learning 1M+1M


b. Mention four scenarios where unsupervised learning is preferred 1M +1M
c. List out various purity metrics in decision tree algorithm 2M
d. What do you mean by a support vector 1M +1M
e. Specify three limitations of K-means clustering 1+1
f. Give two scenarios where spectral clustering is applicable 1M+ 1M
g. Highlight assumptions made by Naïve Bayes classifier 1M+1M
h. State various characteristics of Markov process 2M
i. What is the main objective of representation learning 2M
j. Justify the need for active learning 2M
Part-B
5x10M =50M
2 a. Explain different forms of machine learning with examples. Specify two cases
where machine learning is not applicable 2M+2M+1M
b. Differentiate between Overfitting and Underfitting in machine learning with example.
2.5M +2.5M
OR
3 a. Explain the need for training a machine learning model. What precautions are to be followed
when selecting train and test datasets for learning
b. Briefly discuss the Bias-Variance Tradeoff and its significance 2M+ 2M

4 a. Explain various linear models for classification with suitable examples 2.5M +2.5M
b. Logistic regression can be used to achieve binary classification. Justify the statement.
2M + 3M
OR
5 a. Consider the training dataset shown in Table [Link] how a decision tree discretizes the
continuous attribute Percentage for tree construction. 10M

Table 1 Sample dataset.

S.No   Percentage   Awarded
1      95           YES
2      80           YES
3      72           NO
4      65           YES
5      95           YES
6      32           NO
7      66           NO
8      54           NO
9      89           YES
10     72           YES
6 a. Compare hierarchical and spectral clustering techniques 2M +3M
b. What do you mean by Curse of dimensionality? Why is it a major concern in machine
learning 2M+3M
OR
7 a. Explain the need for principal component analysis and how it is achieved mathematically.
1.5M+1.5M
b. Consider the data points (2,6) and (1,7). Apply principal component analysis and find the
transformed data. 3M+3M+1M
8 a. How do we model uncertainty using Bayesian networks 2M+3M
b. Consider the training dataset shown in Table.2. Apply Naïve Bayes classifier to predict
whether the student gets job offer or not in this final year of the course when the test data is
(CGPA=8.5, Interactive =Yes) 3M+2M

OR
9 a. What is a Markov chain? Compare and contrast Markov model and Hidden Markov
probabilistic graphical models 3M+2M
b. Discuss three fundamental problems that can be solved using Hidden Markov model
2.5M+2.5M
10 a. Discuss Generation and Recognition in representation learning 3M+2M
b. What is active learning ? Explain its heuristics 2M+3M

OR
11 a. Justify the below statement with suitable explanation.
Ensemble learning can be achieved using Bootstrap Aggregation and Boosting 2M+2M+1M
b. What do you understand by deep learning ? Highlight the need for deep learning and
situation where deep learning is preferred to machine learning 2.5M+2.5M

Geethanjali college of Engineering and Technology (Autonomous),Hyderabad

IV B.Tech (CSE/IoT/IT) I Semester (Supplementary) Examinations May 2024

1.a. Define machine learning CO1 BTL1

b. Mention four scenarios where unsupervised learning is preferred CO1 BTL1

c. List out various purity metrics in decision tree algorithm CO2 BTL2

d. What do you mean by a support vector CO2 BTL1


e. Specify three limitations of K-means clustering CO3 BTL2

f. Give two scenarios where spectral clustering is applicable CO3 BTL2

g. Highlight assumptions made by Naïve Bayes classifier CO4 BTL1

h. State various characteristics of Markov process CO4 BTL1

i. What is the main objective of representation learning CO5 BTL1

j. Justify the need for active learning CO5 BTL2

PART-B

2. a. Explain different forms of machine learning with examples. Specify two cases
where machine learning is not applicable 7M CO1 BTL3

2 b. Differentiate between Overfitting and Underfitting in machine learning with example.
3M CO1 BTL4

OR

3 a. Explain the need for training a machine learning model. What precautions are to be
followed when selecting train and test datasets for learning 5M CO1 BTL2

3 b. Briefly discuss the Bias-Variance Tradeoff and its significance 4M CO1 BTL1

4 a. Explain various linear models for classification with suitable examples 5M CO2 BTL3

4 b. Logistic regression can be used to achieve binary classification. Justify the statement.
5M CO2 BTL4

OR

5 a. Consider the training dataset shown in Table [Link] how a decision tree discretizes
the continuous attribute Percentage for tree construction. 10M CO2 BTL5

Table 1 Sample dataset.

S.No   Percentage   Awarded
1      95           YES
2      80           YES
3      72           NO
4      65           YES
5      95           YES
6      32           NO
7      66           NO
8      54           NO
9      89           YES
10     72           YES

6 a. Compare hierarchical and spectral clustering techniques 5M CO3 BTL2

6 b. What do you mean by Curse of dimensionality? Why is it a major concern in machine
learning 5M CO3 BTL3

OR
7 a. Explain the need for principal component analysis and how it is achieved
mathematically. 4M CO3 BTL3

7 b. Consider the data points (2,6) and (1,7). Apply principal component analysis and find
the transformed data. 6M CO3 BTL5

8 a. How do we model uncertainty using Bayesian networks 6M CO4 BTL4

b. Consider the training dataset shown in Table.2. Apply Naïve Bayes classifier to predict
whether the student gets job offer or not in this final year of the course when the test data
is (CGPA=8.5, Interactive =Yes) 3M+2M
OR
9 a. What is a Markov chain? Compare and contrast Markov model and Hidden Markov
probabilistic graphical models 3M+2M
b. Discuss three fundamental problems that can be solved using Hidden Markov model
2.5M+2.5M
10 a. Discuss Generation and Recognition in representation learning 3M+2M
b. What is active learning ? Explain its heuristics 2M+3M

OR
11 a. Justify the below statement with suitable explanation.
Ensemble learning can be achieved using Bootstrap Aggregation and Boosting
2M+2M+1M
b. What do you understand by deep learning ? Highlight the need for deep learning and
situation where deep learning is preferred to machine learning
2.5M+2.5M

b. What is a random forest model? Explain with an example 5M CO2 BTL1

Random Forest is a classifier that trains a number of decision trees on various subsets of the given
dataset and combines their outputs to improve predictive accuracy. Instead of relying on one
decision tree, the random forest takes the prediction from each tree and predicts the final output
based on the majority vote of those predictions (for regression, the average). A greater number of
trees in the forest generally improves accuracy up to a point and reduces the risk of overfitting
compared with a single tree.
2M

Example: Suppose there is a dataset that contains multiple fruit images. This dataset is given to
the Random Forest classifier. The dataset is divided into subsets, and each subset is given to one
decision tree. During the training phase, each decision tree learns from its own subset. When a new
data point arrives, each tree produces a prediction, and the Random Forest classifier predicts the
final decision based on the majority of those results.

3M
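
The voting step described above can be sketched in a few lines. This is only an illustration of the majority vote, not a full random forest; the function name `majority_vote` and the fruit labels are hypothetical.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical outputs of five trees for one fruit image
votes = ["apple", "banana", "apple", "apple", "cherry"]
final = majority_vote(votes)
print(final)
```

In a real forest each entry of `votes` would come from a decision tree trained on its own bootstrap subset of the data.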

6 a. Which type of Hierarchical Clustering is most commonly used 5M CO3 BTL4


The agglomerative hierarchical clustering algorithm is the most commonly used form of hierarchical
cluster analysis (HCA). To group the data points into clusters, it follows the bottom-up approach:
the algorithm treats each data point as a single cluster at the beginning, and then repeatedly
merges the closest pair of clusters. It does this until all the clusters are merged into a single
cluster that contains all the data points. 2M

This hierarchy of clusters is represented in the form of a dendrogram.

Step-1: Create each data point as a single cluster. Let's say there are N data points, so the number
of clusters will also be N.

Step-2: Take two closest data points or clusters and merge them to form one cluster. So, there
will now be N-1 clusters.

Step-3: Again, take the two closest clusters and merge them together to form one cluster. There
will be N-2 clusters.

Step-4: Repeat Step 3 until only one cluster is left.

Step-5: Once all the clusters are combined into one big cluster, build the dendrogram and cut it
at the level that gives the number of clusters required by the problem. 3M
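
The steps above can be sketched as follows. This is a minimal illustration that assumes 1-D points and single linkage (distance between the closest members of two clusters); the function name `agglomerative` and the sample values are hypothetical, and other linkage rules are equally valid.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns the list of merges, each as (cluster_a, cluster_b, distance).
    """
    clusters = [[p] for p in points]          # Step 1: N singleton clusters
    merges = []
    while len(clusters) > 1:                  # Steps 2-4: merge until one is left
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = agglomerative([1.0, 2.0, 6.0, 7.0, 15.0])
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

The recorded merge distances are exactly the heights at which branches join in the dendrogram, so cutting the dendrogram at a chosen height corresponds to undoing every merge above that distance.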

b. What is Latent Dirichlet Allocation (LDA) used for ? 5M CO3 BTL1

Latent Dirichlet Allocation (LDA) is a technique for topic modeling. It models each document as a
mixture of topics and each topic as a distribution over words, with both mixtures governed by
Dirichlet distributions.

The LDA makes two key assumptions:

1. Documents are a mixture of topics, and

2. Topics are a mixture of tokens (or words) 2M

We know the first step with the text data is to clean, preprocess and tokenize the text to words.

After preprocessing the documents, we get a document-word matrix where:

• D1, D2, D3, D4, and D5 are the five documents, and

• the words are represented by the Ws; say there are 8 unique words, W1 to W8.

So, the corpus is the preprocessed document-word matrix, in which every row is a document and
every column is a token (word).

LDA converts this document-word matrix into two other matrices: a Document-Topic matrix and a
Topic-Word matrix.

3M
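
To make the two-matrix factorization concrete, the sketch below builds a document-by-topic matrix and a topic-by-word matrix with the 5 documents and 8 words used above. It does not fit LDA to any text: the matrices are random Dirichlet draws, and the number of topics (2) is an assumed value chosen only to show the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_topics, n_words = 5, 2, 8   # 2 topics is an assumed value

# Document-topic matrix: each row is one document's mixture over topics
doc_topic = rng.dirichlet(alpha=np.ones(n_topics), size=n_docs)    # 5 x 2
# Topic-word matrix: each row is one topic's distribution over words
topic_word = rng.dirichlet(alpha=np.ones(n_words), size=n_topics)  # 2 x 8

# Their product gives each document's expected word distribution
doc_word = doc_topic @ topic_word                                   # 5 x 8
print(doc_word.shape)
print(doc_word.sum(axis=1))   # every row sums to 1
```

Fitting LDA means searching for the two smaller matrices whose product best explains the observed document-word counts.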

OR

7 a. What is the purpose of spectral clustering 3M CO3 BTL2

Spectral clustering is a clustering technique that uses the connectivity between the data points
to form clusters. It uses the eigenvalues and eigenvectors of a similarity matrix (or of the graph
Laplacian derived from it) to project the data into a lower-dimensional space, in which the data
points are then clustered. It is based on a graph representation of the data, where the data
points are nodes and the similarity between two data points is the weight of the edge joining
them. 1.5M
Spectral clustering is applied in many areas, including image segmentation, educational data
mining, entity resolution, speech separation, clustering of protein sequences, and text image
segmentation. Because it is based on graph theory, the approach can identify communities of
vertices in a graph from the edges connecting them. The method is flexible and also allows us to
cluster non-graph data, by first building a similarity graph from the data points. 1.5M
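
A minimal sketch of the idea, under stated assumptions: six illustrative 1-D points, Gaussian edge weights with sigma = 1, the unnormalized Laplacian L = D - W, and a simple sign split on the second eigenvector in place of a full k-means step. None of these choices comes from the answer key.

```python
import numpy as np

# Six 1-D points forming two groups (illustrative values)
x = np.array([0.0, 1.0, 2.0, 5.0, 6.0, 7.0])

# Similarity graph: Gaussian (RBF) edge weights, sigma = 1 assumed
diff = x[:, None] - x[None, :]
W = np.exp(-diff**2 / 2.0)
np.fill_diagonal(W, 0.0)

# Unnormalized graph Laplacian L = D - W
D = np.diag(W.sum(axis=1))
L = D - W

# eigh returns eigenvalues in ascending order; the eigenvector of the
# second-smallest eigenvalue (the Fiedler vector) embeds the points in 1-D
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]

# Cluster in the embedded space: here a simple sign split
labels = (fiedler > 0).astype(int)
print(labels)
```

For more than two clusters, the usual approach is to keep several eigenvectors and run k-means on the rows of that embedding.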

b. Given data = {2, 3, 4, 5, 6, 7 ; 1, 5, 3, 6, 7, 8}. Compute the principal component using the
PCA Algorithm 7M CO3 BTL5

Solved example:

Given data = { 2, 3, 4, 5, 6, 7 ; 1, 5, 3, 6, 7, 8 }.
Consider the two dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
Compute the principal component using PCA Algorithm.
Step-01:

Get data.
The given feature vectors are-
x1 = (2, 1)
x2 = (3, 5)
x3 = (4, 3)
x4 = (5, 6)
x5 = (6, 7)
x6 = (7, 8)
3M
3M

1M
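
The remaining steps of the computation (mean vector, covariance matrix, eigen-decomposition, projection) can be checked numerically with the sketch below. It uses the six patterns given in the question; dividing the covariance by n rather than n - 1 is an assumption, and the other convention scales the eigenvalues without changing the principal direction.

```python
import numpy as np

# The six 2-D patterns given in the question, one per row
X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)

mean = X.mean(axis=0)                 # mean vector
Xc = X - mean                         # subtract the mean from each pattern

# Covariance matrix; dividing by n (not n - 1) is an assumption here
cov = (Xc.T @ Xc) / len(X)

# Eigen-decomposition; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
principal = eigvecs[:, -1]            # eigenvector of the largest eigenvalue

# Project the centered data onto the principal component
projected = Xc @ principal

print("mean:", mean)
print("covariance:\n", cov)
print("eigenvalues:", eigvals)
print("principal component:", principal)
```

The sign of `principal` is arbitrary: an eigenvector and its negative describe the same direction.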

8 a. Explain Naïve Bayes Classifier with Example ? 5M CO4 BTL3

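Bayes' theorem, also known as Bayes' Rule or Bayes' law, is used to determine the probability of a
hypothesis given prior knowledge. It depends on conditional probability. The Naïve Bayes
classifier applies this theorem under the assumption that the features are conditionally
independent given the class, and predicts the class with the highest posterior probability.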
2M
3M
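
For reference, Bayes' theorem and the resulting Naïve Bayes decision rule can be written as below. The symbols are notation introduced here for illustration: H is a hypothesis, E the evidence, c a class label, and x_1, ..., x_n the feature values.

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}

\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product form in the second line follows from treating the features as conditionally independent given the class, which is the "naïve" assumption.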

b. How can we learn the values for the HMM parameters A and B given some data? 5M CO4 BTL4

Hidden Markov models (HMMs) are statistical models that have been applied for many years in
fields such as medicine, computer science, and data science, and they underlie many
sequence-modeling algorithms. An HMM uses a sequence of observations to make predictions or
support decisions about hidden states that are not directly observed. Given observation data, the
transition probabilities A and the emission probabilities B are typically learned with the
Baum-Welch (forward-backward) algorithm, a special case of expectation-maximization.
3M
2M
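
A standard way to learn A and B from an observation sequence is the Baum-Welch algorithm. Its re-estimation step can be written as below, in notation assumed here rather than taken from the answer key: O = o_1, ..., o_T is the observation sequence, q_t the hidden state at time t, lambda the current parameters, and v_k the k-th output symbol.

```latex
\gamma_t(i) = P(q_t = i \mid O, \lambda), \qquad
\xi_t(i, j) = P(q_t = i,\; q_{t+1} = j \mid O, \lambda)

\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\hat{b}_j(k) = \frac{\sum_{t=1}^{T} \mathbb{1}[o_t = v_k]\, \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```

The quantities gamma and xi are computed with the forward and backward passes, and the two updates are repeated until the likelihood of the observations stops improving.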

OR

9 a Why do we need Conditional Independence ? 5M CO4 BTL4

In probability theory, two events A and B are conditionally independent given a third event C, if
the occurrence or nonoccurrence of A and the occurrence or nonoccurrence of B are independent
events in their conditional probability distribution given C. In the standard notation of probability
theory, A and B are conditionally independent given C if and only if

$$P(A \cap B \mid C) = P(A \mid C)\, P(B \mid C)$$

3M

In other words, A and B are conditionally independent if and only if, given knowledge of
whether C occurs, knowledge of whether A occurs provides no information on the likelihood of
B occurring, and knowledge of whether B occurs provides no information on the likelihood of A
occurring.

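Conditional independence is needed because it lets a joint distribution be factored into smaller
local terms, which is what makes models such as Naïve Bayes and Bayesian networks tractable. The
concept is essential to graph-based theories of statistical inference, as it establishes a
mathematical relation between a collection of conditional statements and a graphoid. 2M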
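
The definition can be checked numerically on a small example. The sketch below uses hypothetical probabilities in which C is a common cause of A and B: the joint is built as P(C) P(A|C) P(B|C), so A and B are conditionally independent given C even though they are not independent on their own.

```python
from itertools import product

# Hypothetical distributions: C is a common cause of A and B
p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: 0.9, 1: 0.2}   # P(A=1 | C=c)
p_b_given_c = {0: 0.8, 1: 0.1}   # P(B=1 | C=c)

def bern(p, v):
    """Probability that a Bernoulli(p) variable takes value v."""
    return p if v == 1 else 1 - p

# Joint built as P(C) P(A|C) P(B|C)
joint = {
    (a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for a, b, c in product([0, 1], repeat=3)
}

def prob(**fixed):
    """Marginal probability of the given assignments, e.g. prob(a=1, c=0)."""
    total = 0.0
    for (a, b, c), p in joint.items():
        vals = {"a": a, "b": b, "c": c}
        if all(vals[k] == v for k, v in fixed.items()):
            total += p
    return total

# Conditional independence: P(A,B|C) equals P(A|C) P(B|C) for every c
for c in (0, 1):
    lhs = prob(a=1, b=1, c=c) / prob(c=c)
    rhs = (prob(a=1, c=c) / prob(c=c)) * (prob(b=1, c=c) / prob(c=c))
    print(c, round(lhs, 4), round(rhs, 4))

# Marginally, A and B are still dependent
print(round(prob(a=1, b=1), 4), round(prob(a=1) * prob(b=1), 4))
```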

b. What is the significance of Markov Models and what are their limitations? 5M CO4 BTL5

Markov chains are the simplest type of Markov model and are used to represent systems where all
states are observable. A Markov chain shows all possible states and, between states, the
transition rate, which is the probability of moving from one state to another per unit of time.
Applications of this type of model include prediction of market crashes, speech recognition and
search engine algorithms. 2.5M

Limitations:

Markov chain models often work well, but they have some limitations. One of the main problems is
that they become very complicated as more states and more interactions among states are included.
This complexity becomes particularly problematic in the presence of time-dependent
probabilities. 2.5M
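
As a small illustration of how a Markov chain evolves, the sketch below propagates a state distribution through a fixed transition matrix. The two states and their probabilities are hypothetical values chosen for the example.

```python
# Hypothetical two-state chain: states "sunny" and "rainy"
states = ["sunny", "rainy"]
P = [[0.9, 0.1],    # transition probabilities from sunny
     [0.5, 0.5]]    # transition probabilities from rainy

def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0]            # start in "sunny" with certainty
for _ in range(3):
    dist = step(dist, P)
print(dict(zip(states, dist)))
```

The transition matrix has one row per state, so its size grows with the square of the number of states, which is the complexity issue noted in the limitations.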

10 a. Explain about Representation Learning 5M CO5 BTL2

Representation learning is a class of machine learning approaches that allow a system to discover
the representations required for feature detection or classification from raw data. The
requirement for manual feature engineering is reduced by allowing a machine to learn the
features and apply them to a given activity.

In representation learning, data is sent into the machine, and it learns the representation on its
own. It is a way of determining a data representation of the features, the distance function, and
the similarity function that determines how the predictive model will perform. Representation
learning works by reducing high-dimensional data to low-dimensional data, making it easier to
discover patterns and anomalies while also providing a better understanding of the data’s overall
behaviour. 3M

Machine learning tasks such as classification frequently demand input that is mathematically and
computationally convenient to process, which motivates representation learning. Real-world data,
such as photos, video, and sensor data, has resisted attempts to define specific features
algorithmically. An alternative is to discover such features or representations by examining the
data, rather than depending on explicit algorithms. 2M

b. Explain about Gradient Boosting Machines 5M CO5 BTL2

Gradient Boosting Machine (GBM) is one of the most popular forward learning ensemble
methods in machine learning. It is a powerful technique for building predictive models for
regression and classification tasks.

GBM gives a predictive model in the form of an ensemble of weak prediction models, such as
decision trees. When decision trees are used as the weak learners, the resulting algorithm is
called gradient-boosted trees. 2M

It combines the predictions from the individual learners, each one trained to correct the errors
of those before it, to build a final predictive model that is more accurate than any single
learner.

3M
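
A minimal sketch of the boosting loop, under stated assumptions: squared-error loss, 1-D inputs, single-threshold regression stumps as the weak learners, and a fixed learning rate. The function names and the sine-curve data are hypothetical; production implementations differ in many details.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold regression stump for 1-D inputs (squared error)."""
    best = None
    for thr in np.unique(x)[:-1]:          # skip the max so both sides are non-empty
        left, right = residual[x <= thr], residual[x > thr]
        err = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or err < best[0]:
            best = (err, thr, left.mean(), right.mean())
    _, thr, lval, rval = best
    return thr, lval, rval

def predict_stump(stump, x):
    thr, lval, rval = stump
    return np.where(x <= thr, lval, rval)

def gradient_boost(x, y, n_rounds=20, lr=0.5):
    """Boosting for squared loss: each stump fits the current residuals."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred                # negative gradient of 1/2 squared error
        stump = fit_stump(x, residual)
        pred = pred + lr * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, pred

x = np.linspace(0.0, 6.0, 30)
y = np.sin(x)
base, stumps, pred = gradient_boost(x, y)
mse_start = np.mean((y - y.mean())**2)
mse_end = np.mean((y - pred)**2)
print(mse_start, mse_end)
```

Each round adds a small correction in the direction that reduces the remaining error, which is why the training error falls as stumps are added.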

11 a. Explain about Ensemble Learning and Benefits of Ensemble Learning 5M CO5 BTL2

Ensemble learning combines the mapping functions learned by different classifiers to generate an
aggregated mapping function. The diverse methods proposed over the years use different strategies for
computing this combination.

Bagging: Bagging is short for “bootstrap aggregating” and is one of the earliest ensemble methods
proposed.

For this method, subsamples of a dataset are created by a procedure called “bootstrap sampling.”
To put it simply, random subsets of the dataset are drawn with replacement, meaning that the same
data point may be present in several subsets.
These subsets are treated as independent datasets, and a separate Machine Learning model is fit on
each. At test time, the predictions from all the models trained on different subsets of the same
data are combined, for example by voting or averaging.

2M
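
Bootstrap sampling itself is short to write down. In the sketch below, the function name `bootstrap_samples` and the ten-item stand-in dataset are hypothetical; each subset has the same size as the original data and is drawn with replacement.

```python
import random

def bootstrap_samples(data, n_subsets, seed=0):
    """Draw n_subsets samples of len(data) items each, with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_subsets)]

data = list(range(10))                 # stand-in for ten training examples
subsets = bootstrap_samples(data, n_subsets=3)
for s in subsets:
    print(sorted(s))
```

Because sampling is with replacement, a subset usually repeats some examples and omits others, which is what makes the models trained on different subsets differ.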
Boosting:

The boosting ensemble mechanism works in a way markedly different from the bagging mechanism.
Here, instead of parallel processing of data, sequential processing of the dataset occurs. The
first classifier is fed with the entire dataset, and its predictions are analyzed. The instances
where Classifier-1 fails to produce correct predictions (typically samples near the decision
boundary of the feature space) are fed to the second classifier. This is done so that Classifier-2
can specifically focus on the problematic areas of feature space and learn an appropriate decision
boundary. Further classifiers are added in the same way, and the ensemble of all these classifiers
is then used to make the final prediction on the test data.
2M

Benefits:
• Performance: An ensemble can make better predictions and achieve better performance than any
single contributing model.
• Robustness: An ensemble reduces the spread or dispersion of the predictions and model
performance.

1M

b Explain about Neural Networks and Deep Learning 5M CO5 BTL2

Neural Networks: The network starts with an input layer that receives input in data form. The
connections into the hidden layers carry weights, and each node in a hidden layer sums its
weighted inputs. Each node in the hidden layer processes its inputs and passes an output to the
next hidden layer and, lastly, to the output layer.
An ANN mirrors a biological neuron as follows:

• Input to a neuron - input layer

• Neuron - hidden layer

• Output to the next neuron - output layer

A neural network is a system of hardware or software patterned after the operation of neurons in
the human brain. Neural networks, also called artificial neural networks, are a means of
achieving deep learning. 2.5M
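
The flow from input layer to hidden layer to output layer can be sketched as a single forward pass. The layer sizes, the sigmoid activation, and the random weights below are assumptions made for illustration; no training is performed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Assumed sizes: 3 inputs, 4 hidden units, 1 output
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(x):
    hidden = sigmoid(x @ W1 + b1)      # weighted sum at each hidden node, then activation
    return sigmoid(hidden @ W2 + b2)   # output layer

x = np.array([[0.5, -1.0, 2.0]])       # one input example
out = forward(x)
print(out.shape, out)
```

Training would adjust `W1`, `b1`, `W2`, and `b2` so that the output matches known targets; a deep network stacks more hidden layers of the same form.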

Deep Learning:

Deep learning is a subset of machine learning that makes computers do what comes naturally to
humans: learn by example. Computers are trained with images as examples, a process very different
from hard-coding a program to recognize something. You don't control how the model learns; you
control the data that goes into it. The computer identifies an object based on the images it was
fed earlier. Deep learning-based systems are powered by artificial neurons, a synthetic form of
the biological neuron. 2.5M
