
Key Machine Learning Questions & Answers

This document contains a question bank for machine learning concepts. It provides definitions and questions that test understanding of fundamentals such as supervised vs. unsupervised learning, classification vs. regression, and linear vs. non-linear models. It also includes questions about specific algorithms: decision trees, naive Bayes, KNN, linear regression, logistic regression, and SVMs. Example questions cover how to implement the algorithms, calculate key metrics, and evaluate their effectiveness. The decision tree questions cover how to calculate information gain and entropy, detect overfitting, and determine the stopping criteria.

Uploaded by

manisha mudgal
Copyright
© All Rights Reserved
  • Question Bank for Section A
  • Questions on Linear and Logistic Regression and SVM
  • Questions on KNN Algorithm
  • Questions on Decision Trees
  • Questions on Naïve Bayes Classifier

QUESTION BANK FOR SECTION A

Basic ML questions:

a) Define Machine learning? Briefly explain the types of learning.


c) What are the issues in decision tree induction?

[Link]
[Link] and hypothesis space
Feature space/feature matrix
Information gain
Gini index
How does this change with positive and negative data values of a feature
[Link] curve
Soft and hard margin in SVM
[Link]
Support vector
Pre pruning
Post pruning
[Link]
[Link] probability.

Q. Differentiate between
Supervised and unsupervised machine learning
Classification and regression
One v/s all and one v/s one multiclass classification
Linear and logistic regression
Machine learning and traditional programming
Linearly and non-linearly separable data.
Q. What are the elements of reinforcement learning?

Q. How to classify mixed data?

Q. Write short notes on


a) Logistic regression
b) Backpropagation algorithm
c) Issues in machine learning
Q. Which one is the best supervised machine learning algorithm out of decision tree, naive
Bayes, K-NN and SVM for classifying large PDF documents?

Questions on Distance based method/nearest neighbour/KNN


1. What is “K” in KNN algorithm?

2. How do we decide the value of "K" in KNN algorithm?

3. Why is the odd value of “K” preferable in KNN algorithm?


4. What is the difference between Euclidean Distance and Manhattan distance? What is the
formula of Euclidean distance and Manhattan distance?

5. Why is KNN algorithm called Lazy Learner?

6. Why should we not use KNN algorithm for large datasets?

7. What are the advantages and disadvantages of KNN algorithm?


8. A dealer has a warehouse that stores a variety of fruits and vegetables. When fruit is
brought to the warehouse, various types of fruit may be mixed together. The dealer wants a
model that will sort the fruit according to type. Justify with reasons how machine learning
model is efficient compared to feature based classification technique.
9. In the K nearest neighbor recognition what would be the best distance metric to
implement for a handwritten digit recognizer?
10. How to combine and code SVM and KNN for image classification?
11. How can we increase the accuracy of KNN?

Questions on linear and logistic regression and SVM.

Q. Explain linear regression with example.


1. What is a logistic function? What is the range of values of a logistic function?
2. Why is logistic regression very popular?
3. What is the formula for the logistic regression function?
4. How can the probability of a logistic regression model be expressed as conditional
probability?
5. What are the outputs of the logistic model and the logistic function?
6. Why can’t linear regression be used in place of logistic regression for binary classification?
7. Is the decision boundary linear or nonlinear in the case of a logistic regression model?
8. What is the likelihood function?
9. What is the Maximum Likelihood Estimator (MLE)?
10. Why can’t we use Mean Square Error (MSE) as a cost function for logistic regression?
11. Why is accuracy not a good measure for classification problems?
12. Which algorithm is better at handling outliers: logistic regression or SVM?
13. How will you deal with the multiclass classification problem using logistic regression?
Q. How to choose the best fit line? Explain with example.
Q. Explain the working of SVM with diagram.
Q. The values of independent variable x and dependent value y are given below:
Find the least square regression line y=ax+b. Estimate the value of y when x is 10.

Q. What is the goal of the support vector machine (SVM)? How to compute the margin?
Q. Construct a decision tree to classify students based on their academic performance. Explain with
example.
Q. Suppose that we want to build a neural network that classifies two dimensional data (i.e., X
= [x1, x2]) into two classes: diamonds and crosses. We have a set of training data that is
plotted as follows:

Draw a network that can solve this classification problem. Justify your choice of the number
of nodes and the architecture. Draw the decision boundary that your network can find on the
diagram.
Q. What is kernel trick in SVM? Explain with the help of example. Also explain the types of
kernel.
Q. Why is SVM more efficient than logistic regression for classification?
Q. How will you apply SVM to detect credit card fraud?

Questions on Decision trees

Q. Consider the following dataset for predicting the outcome of a tennis match

Write the formula for information gain for an attribute. Calculate information gain for
all attributes. Which is selected as the root node?

Q. How are entropy and information gain related vis-a-vis decision trees?
Q. How do you calculate the entropy of children nodes after the split based on a feature?
Q. Explain overfitting problem in decision trees.
Q. What is the stopping criteria in decision trees?
Q. What is a decision tree? How is a decision tree constructed? Explain with example.

Questions on Naïve Bayes classifier

Q. Explain naïve Bayes classifier in context with Bayes theorem


[Link] Bayes theorem.
Q. Why is Naive Bayes naive?

Common questions


Accuracy can be misleading on imbalanced datasets because it ignores the class distribution. A model can achieve high accuracy simply by always predicting the majority class, even when the minority class is the one of interest. In such cases precision, recall, and the F1-score give better insight into performance on each class, particularly the minority class.
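The gap between accuracy and the per-class metrics can be seen in a minimal sketch. The function names and the 95/5 class split below are illustrative assumptions, not taken from the question bank.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A model that always predicts the majority class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))
```

The always-negative model scores high on accuracy while its recall on the minority class is zero, which is the failure the paragraph describes.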

Decision trees can easily overfit the training data by capturing noise and small fluctuations in the dataset. To mitigate this, pruning techniques (pre-pruning and post-pruning) are used. Pre-pruning stops the tree's growth early, before it becomes too complex, while post-pruning removes branches that contribute little after the tree is fully grown. Both simplify the model and improve its generalization to new data.

The kernel trick lets an SVM operate in a higher-dimensional feature space without explicitly computing coordinates in that space: a kernel function returns the inner product of two points as if they had already been mapped there. This is particularly useful for data that is not linearly separable in the original input space. Common kernels include the linear, polynomial, and radial basis function (RBF) kernels. The linear kernel keeps the original space and yields a linear boundary, while the polynomial and RBF kernels correspond to richer feature spaces and let the SVM form non-linear decision boundaries.
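A small sketch can make the idea concrete. It assumes 2-D inputs and the degree-2 polynomial kernel K(x, z) = (x · z)^2, and checks that the kernel value equals the inner product under an explicit feature map. The names `phi`, `poly_kernel` and `rbf_kernel` are illustrative, and `gamma` is an assumed parameter.

```python
import math


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel: similarity decays with squared Euclidean distance."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z), dot(phi(x), phi(z)))
```

The kernel reaches the same number as the mapped inner product without ever building `phi(x)`. For the RBF kernel the corresponding feature space is infinite-dimensional, so only the kernel form is practical.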

Entropy is a measure of the disorder or impurity in a dataset, with higher values indicating greater disorder. In decision tree construction, information gain is used to evaluate the effectiveness of an attribute in classifying the data. It is calculated by measuring the reduction in entropy after the dataset is split based on the attribute. An optimal split is one that achieves the highest information gain, indicating a more homogeneous division of data post-split, leading to clearer distinctions between nodes.
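The calculation described above can be sketched as follows, using entropy in bits (log base 2). The tiny label and feature lists are hypothetical and are not the tennis dataset referenced in the question bank.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted_child_entropy = sum(
        len(g) / n * entropy(g) for g in groups.values()
    )
    return entropy(labels) - weighted_child_entropy


# Hypothetical data: one feature separates the classes, the other does not.
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

print(information_gain(labels, informative))
print(information_gain(labels, uninformative))
```

The attribute with the larger gain would be preferred for the split, since it leaves purer child nodes.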

Mean Square Error (MSE) is a poor cost function for logistic regression because passing the linear score through the logistic function and then squaring the error makes the cost non-convex in the parameters, so gradient-based optimization can get stuck. Instead, logistic regression minimizes the negative log-likelihood (log loss, or cross-entropy), which is equivalent to Maximum Likelihood Estimation (MLE) of the parameters. This cost is convex in the parameters and can be minimized efficiently.
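As a rough illustration of the cost involved, the sketch below computes the average negative log-likelihood for a one-feature logistic model. The single weight `w`, bias `b`, and the two data points are assumptions for the example, and the code is not hardened against numerical overflow for extreme scores.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def negative_log_likelihood(w, b, xs, ys):
    """Average log loss of p = sigmoid(w * x + b) against labels in {0, 1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


# Hypothetical data: negative x belongs to class 0, positive x to class 1.
xs, ys = [-1.0, 1.0], [0, 1]

print(negative_log_likelihood(0.0, 0.0, xs, ys))  # uninformative model
print(negative_log_likelihood(2.0, 0.0, xs, ys))  # better-fitting model
```

Maximizing the likelihood and minimizing this quantity select the same parameters, so a lower value means a better fit.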

The hyperplane in a Support Vector Machine (SVM) serves as the decision boundary that separates different classes in the feature space. The goal of an SVM is to find the optimal hyperplane that maximizes the margin between the data points of different classes, ensuring that the classification is as robust as possible. This maximization of the margin reduces the model's susceptibility to overfitting and increases its ability to generalize to unseen data.
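For a linear SVM the margin has a closed form. The sketch below assumes the usual canonical scaling, in which the support vectors satisfy w · x + b = ±1, so the margin width is 2 / ||w||. The weight vector and point are made-up values.

```python
import math


def norm(w):
    """Euclidean length of the weight vector."""
    return math.sqrt(sum(wi * wi for wi in w))


def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / norm(w)


def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm(w)


# Hypothetical hyperplane parameters and query point.
w, b = (3.0, 4.0), -5.0
print(margin_width(w))
print(signed_distance(w, b, (3.0, 4.0)))
```

Because the width is inversely proportional to ||w||, maximizing the margin is the same as minimizing ||w|| subject to the classification constraints.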

Support Vector Machines (SVM) are generally less sensitive to outliers than logistic regression. The SVM decision boundary is determined only by the support vectors, the points that lie on or inside the margin or are misclassified; correctly classified points beyond the margin do not influence its position. In logistic regression every data point contributes to the log-likelihood, so outliers, especially mislabeled points far from the boundary, can pull the decision boundary away from its best position.
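One way to see the difference is to compare the per-point losses the two models minimize. The sketch below assumes labels y in {-1, +1} and writes each loss as a function of the margin value m = y · f(x); the function names are illustrative.

```python
import math


def hinge_loss(m):
    """SVM loss: exactly zero once a point is beyond the margin (m >= 1)."""
    return max(0.0, 1.0 - m)


def logistic_loss(m):
    """Logistic regression loss: positive for every point, however far."""
    return math.log(1.0 + math.exp(-m))


# m > 1: correctly classified, well beyond the margin.
# m < 0: misclassified, for example a mislabeled outlier.
for m in (3.0, 1.0, 0.0, -3.0):
    print(m, hinge_loss(m), logistic_loss(m))
```

Points with m >= 1 contribute nothing to the hinge loss and so cannot move the SVM boundary, while they still contribute a small amount to the logistic loss. A misclassified outlier adds loss under both, so neither model is immune.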

Multiclass classification with logistic regression can use 'one vs. all', where a separate binary classifier is trained for each class to decide whether an example belongs to that class or not; for K classes this means K classifiers. Alternatively, 'one vs. one' trains a classifier for every pair of classes, K(K-1)/2 in total. 'One vs. all' needs fewer classifiers, while each 'one vs. one' classifier sees only two classes at a time, which can give simpler decision boundaries; which approach is more accurate depends on the data.
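The difference in how many binary problems each scheme creates can be sketched by enumerating them. The class names are hypothetical and no classifier is actually trained here.

```python
from itertools import combinations


def one_vs_all_tasks(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, "rest") for c in classes]


def one_vs_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


classes = ["A", "B", "C", "D"]  # hypothetical class labels
print(len(one_vs_all_tasks(classes)), one_vs_all_tasks(classes))
print(len(one_vs_one_tasks(classes)), one_vs_one_tasks(classes))
```

With K classes the first list has K entries and the second K(K-1)/2, so the gap widens quickly as the number of classes grows.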

Supervised learning involves training a model on labeled data, where the outcomes are known, allowing the model to learn the mapping from inputs to outputs. Unsupervised learning, in contrast, deals with unlabeled data, where the model tries to identify patterns and structures, such as clustering or association. Distinguishing between these types is crucial because it influences the choice of algorithms and the approach to problem-solving. Supervised learning is suitable for tasks where outcomes are known and precise prediction is required, while unsupervised learning is used for exploratory data analysis and discovering hidden patterns.

Linear regression is used for predicting continuous outcome variables and assumes a linear relationship between features and target. Logistic regression, on the other hand, is used for predicting binary outcomes and leverages the logistic function to estimate probabilities, resulting in outputs bounded between 0 and 1. While linear regression provides direct numerical predictions, logistic regression provides class probabilities, making it essential for classification tasks.
