Machine Learning - Question

Machine Learning QuestionBank

Uploaded by

Kaustubh Desale
Copyright
© All Rights Reserved

Machine Learning

Module 1 :
1. Explain any five business applications of Machine learning(5/20)
2. Write a short note on issues in Machine Learning.(10/24)
3. Consider the use case of Email spam detection. Identify and explain the suitable
machine learning technique for this task.(10/24)
4. How to choose the right ML algorithm? (5/23)
5. Explain any five applications of Machine Learning. (10/23)
6. Explain how to choose the right algorithm for machine learning application.(10/23)
7. Explain the terms overfitting, underfitting, bias & variance tradeoff w.r.t. Machine
Learning.(10/23)
8. What are the issues in Machine learning?(5/22)
9. Explain the steps of developing Machine Learning applications.(10/22)
10. Define Machine Learning and explain, with an example, the importance of Machine
Learning.(5/19)

Module 2:
1. Explain performance evaluation metrics for binary classification with suitable
example.(5/24)
2. Explain Gini index along with an example.(5/24)
3. Consider the example below where the mass, 𝑦 (grams), of a chemical is related to the
time, 𝑥 (seconds), for which the chemical reaction has been taking place according to
the table. Find the equation of the regression line. Also explain performance
evaluation measures for regression.

(10/24)
4. Explain Clustering with minimal spanning tree along with example. Consider the
dataset given below with 3 features Color, Wig, Num. Ears and one output variable
Emotion.

● Find root node of decision tree using GINI index.


● Explain techniques that can be used to handle overfitting in decision trees.(10/24)
5. Explain Regression line, Scatter plot, Error in prediction and Best fitting line. (5/23)
6. Explain the concept of Logistic Regression (5/23)
7. Explain Multivariate Linear regression method. (10/23)
8. Create a decision tree using Gini Index to classify the following dataset for profit.
Find SVD for A = (10/23)

9. Linear Regression (10/23)


10. Explain any five performance measures along with example.(5/23)
11. Differentiate between Logistic regression and Support vector machine.(5/23)
12. Explain the following: Receiver operating characteristic curve and Area under the
curve.(10/23)
13. Explain the concept of regression and enlist its types. A clinical trial gave the data for
BMI and Cholesterol level for 10 Patients as shown in table below, Identify the
machine learning method used to solve the above problem and predict the likely value
of Cholesterol level for someone who has BMI of 27.(10/23)

14. Explain the concept of decision tree. Consider the dataset given in a table below. The
dataset has 3 features, Past Trend, Open Interest and Trading Volume, and one class
label as Return. Compute the Gini Index for all features and specify which node will
be chosen as a root node in decision tree.(10/23)

15. Explain Regression line, Scatter plot, Error in prediction and Best fitting line(5/22)
16. Explain Logistic Regression(5/22)
17. Explain Linear regression along with an example. (10/22)

18. (10/22)
19. Performance Metrics for Classification (10/22)
20. List some advantages of derivative-based optimization techniques. Explain Steepest
Descent method for optimization(10/19)
21. Explain various basic evaluation measures of supervised learning Algorithm for
Classification.(10/19)
22. Consider following table for binary classification. Calculate the root of the decision
tree using Gini index.(10/19)

23. Logistic Regression(5/19)


Module 3:
1. Explain the concept of k fold cross validation.(5/24)
2. Compare Bagging and Boosting with reference to ensemble learning. Explain how
these methods help to improve the performance of the machine learning model(10/24)
3. Explain Ensemble learning algorithm Random Forest and its use cases in real world
applications.(10/24)
4. Explain the Random Forest algorithm in detail. (10/23)
5. Explain the concept of bagging and boosting.(10/23)
6. Explain the necessity of cross validation in Machine learning applications and K-fold
cross validation in detail.(10/23)
7. Explain different ways to combine classifiers.(10/23)
8. Explain the Random Forest algorithm in detail.(10/22)
9. Explain the different ways to combine the classifiers.(10/22)

Module 4:
1. Define following terminologies with reference to Support vector machine: Hyper
plane, Support Vectors, Hard Margin, Soft Margin, Kernel (10/24)
2. Describe Multiclass classification.(10/23)
3. Explain support vector machine as a constrained optimization problem.(10/23)
4. Explain kernel Trick in support vector machine.(10/24)
5. Explain multiclass classification techniques.(10/23)
6. Explain the concept of margin and support vector(5/22)
7. Describe Multiclass classification.(10/22)
8. Why is SVM more accurate than logistic regression?(5/19)
9. Explain Radial Basis Function with example.(5/19)
10. Define Support Vector Machine. Explain how margin is computed and optimal
hyper-plane is decided.(10/19)

Module 5:
1. What is Density based clustering? Explain the steps used for clustering task using
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
algorithm.(10/24)
2. Explain K-means algorithm. (5/23)
3. DBSCAN (10/23)
4. Explain clustering with minimal spanning tree with reference to Graph based
clustering.(10/23)
5. Explain the concept of Expectation Maximization Algorithm. (10/23)
6. Explain DBSCAN algorithm along with example(10/23)
7. Explain the distance metrics used in clustering. (5/22)
8. Explain EM algorithm. (10/22)
9. DBSCAN(10/22)
10. EM Algorithm(5/19)

Module 6:
1. What is dimensionality reduction? Explain how it can be utilized for classification and
clustering task in Machine learning.(5/24)
2. Explain the Dimensionality reduction technique Linear Discriminant Analysis and its
real-world applications.(10/24)
3. Explain the concept of feature selection and extraction.(5/23)
4. Linear Discriminant Analysis for Dimension Reduction (10/23)
5. Explain Linear Discriminant Analysis.(5/23)
6. Explain in detail Principal Component Analysis for Dimensionality reduction(10/23)
7. Compute the Linear Discriminant projection for the following two-dimensional
dataset. X1= (x1, x2) = {(4,1), (2,4), (2,3), (3,6), (4,4)} and X2= (x1, x2) = {(9,10),
(6,8), (9,5), (8,7), (10,8)}(10/22)
8. Principal Component Analysis for Dimension Reduction (10/22)
9. What is Dimensionality reduction? Describe how Principal Component Analysis is
carried out to reduce dimensionality of data sets.(10/19)

10. Find the singular value decomposition of (10/19)

Common questions

Dimensionality reduction improves model performance by reducing the number of input variables, which mitigates the curse of dimensionality and improves interpretability without significant loss of information. Techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) help uncover latent structure, leading to more manageable and computationally efficient models. This often results in faster training, less overfitting, and better generalization to new data.
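As a rough illustration of how PCA reduces dimensionality, the sketch below projects 2-D points onto their first principal component using the closed-form eigen-decomposition of the 2x2 covariance matrix. The function name `pca_2d` and the sample points are assumptions made for this example; it also assumes the points are not all identical, since the total variance must be non-zero.

```python
import math

def pca_2d(points):
    """Return (direction, projected, explained) for the first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector; the matrix is already diagonal when b == 0
    if b != 0:
        vx, vy = lam1 - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # 1-D coordinates of each point along the principal direction
    projected = [dx * direction[0] + dy * direction[1] for dx, dy in centered]

    # Fraction of total variance kept by the first component
    explained = lam1 / (a + c)
    return direction, projected, explained

# Illustrative data lying exactly on the line y = 2x
sample_points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, projected, explained = pca_2d(sample_points)
print(direction, explained)
```

Because the sample points lie on a single line, one component captures all of the variance, so the 2-D data can be replaced by the 1-D `projected` values without loss.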

Logistic Regression is preferable when interpretability is key, because its coefficients show directly how each feature contributes to the prediction. It is effective for binary classification with linear decision boundaries and is computationally cheaper on large datasets. Conversely, SVM is better suited to non-linear and high-dimensional data because it can use kernel functions. Logistic Regression is also a reasonable choice when outliers are not a major concern for the problem.

The ROC curve plots the true positive rate against the false positive rate at various threshold settings, visualizing classifier performance across thresholds. AUC is the area under the ROC curve and summarizes the model's ability to distinguish between classes. A higher AUC indicates better performance: 0.5 corresponds to no discriminative power (random guessing) and 1.0 to a perfect classifier. Both help in model selection by comparing the diagnostic ability of different classifiers.
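The threshold sweep described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names `roc_points` and `auc` and the example labels and scores are assumptions, labels are taken to be 0 or 1, and both classes must be present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they yield a single ROC point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Illustrative data: 3 positives, 3 negatives
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(example_labels, example_scores)
area = auc(curve)
print(curve, area)
```

For this data, 8 of the 9 positive-negative pairs are ranked correctly, so the area is 8/9, which matches the interpretation of AUC as the probability that a random positive scores higher than a random negative.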

Binary classification performance metrics include accuracy, precision, recall, and F1-score, all of which measure how well the model distinguishes between two classes. The Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) show how the model trades off true positives against false positives across thresholds. In regression tasks, metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared measure how well a model predicts continuous values by quantifying prediction errors and the variance explained by the model.
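A short sketch of how these metrics are computed from predictions is given below. The function names and the sample values are assumptions for illustration; zero denominators are mapped to 0.0 by convention here, and R-squared assumes the true values are not all equal.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """MAE, MSE and R-squared for continuous targets."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "r2": r2}

# Illustrative values only
clf = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0],
                             [1, 0, 0, 1, 0, 1, 1, 0])
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(clf)
print(reg)
```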

SVM is formulated as a constrained optimization problem whose goal is to find the hyperplane that maximizes the margin between classes in a feature space. The hyperplane must separate the data points with the widest possible margin, subject to the constraint that the points are classified correctly. In practice this means trading off classification error against the geometric margin. The problem is usually solved with Lagrange multipliers, and kernel functions extend it from linear to non-linear relationships.
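For reference, the standard soft-margin formulation can be written as below. The symbols are assumptions introduced here, not taken from the question bank: $w$ and $b$ define the hyperplane, $\xi_i$ are slack variables, $C$ is the penalty on margin violations, $\alpha_i$ are the Lagrange multipliers, and $K$ is the kernel function.

```latex
% Primal problem (soft margin)
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n

% Dual problem (obtained via Lagrange multipliers)
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
 - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C,\qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

Setting every $\xi_i = 0$ (equivalently, letting $C \to \infty$) gives the hard-margin case, and choosing $K(x_i, x_j) = x_i^{\top} x_j$ gives the linear SVM. The geometric margin width is $2 / \lVert w \rVert$, which is why minimizing $\lVert w \rVert^{2}$ maximizes the margin.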

Overfitting in decision trees occurs when the model becomes too complex and captures noise along with the actual data patterns. Techniques to address this include pruning, which removes sections of the tree that provide little predictive power; setting a maximum depth for the tree; and using ensemble methods such as Random Forests to average out the overfitting tendencies of individual trees.

The Gini Index measures the impurity of the class labels in a node, and therefore how well a candidate split separates the classes. The feature with the lowest weighted Gini Index after a split is selected as the root node because it best separates the data into the desired classes. Choosing splits this way keeps the tree compact and improves prediction accuracy.
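The root-node selection described above can be sketched in a few lines. The helper names (`gini`, `weighted_gini`, `best_root`) and the tiny dataset are hypothetical and are not one of the exam tables; the sketch assumes categorical features and a multiway split on each feature value.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(rows, feature, target):
    """Gini of a split on `feature`, weighted by the size of each branch."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    return sum(len(group) / n * gini(group) for group in groups.values())

def best_root(rows, features, target):
    """Feature with the lowest weighted Gini, i.e. the purest split."""
    return min(features, key=lambda f: weighted_gini(rows, f, target))

# Hypothetical toy data for illustration only
toy_rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
root = best_root(toy_rows, ["outlook", "windy"], "play")
print(root, weighted_gini(toy_rows, "outlook", "play"),
      weighted_gini(toy_rows, "windy", "play"))
```

In this toy data, splitting on `outlook` produces two pure branches (weighted Gini 0.0), while splitting on `windy` leaves each branch evenly mixed (weighted Gini 0.5), so `outlook` is chosen as the root.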

Bagging, or Bootstrap Aggregating, trains multiple versions of a model on subsets of the data sampled with replacement and averages (or votes over) their outputs to improve stability and accuracy. Boosting builds a sequence of models in which each one corrects the errors of its predecessors, placing more emphasis on hard-to-learn instances at each iteration. Bagging mainly reduces variance and boosting mainly reduces bias; both improve generalization.
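A minimal sketch of bagging is shown below, assuming a very simple base learner: a one-feature threshold rule ("stump"). The function names, the stump itself, and the data are all assumptions chosen to keep the example short; real ensembles such as Random Forest use full decision trees as base learners.

```python
import random
from collections import Counter

def fit_stump(points):
    """Base learner: threshold t minimising errors for the rule 'x > t -> class 1'."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in points}):
        err = sum((x > t) != bool(y) for x, y in points)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(points, n_models, rng):
    """Train each stump on a bootstrap sample (same size, drawn with replacement)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(int(x > t) for t in models)
    return votes.most_common(1)[0][0]

# Illustrative 1-D data: class 0 for small x, class 1 for large x
train_points = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]
stumps = bagging_fit(train_points, n_models=11, rng=random.Random(0))
print(stumps)
print(bagging_predict(stumps, 0), bagging_predict(stumps, 10))
```

Each bootstrap sample gives a slightly different threshold, and the vote smooths out that variation, which is the variance reduction the paragraph above refers to.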

Selecting a suitable machine learning algorithm starts with the problem type (classification, regression, clustering, etc.) and the characteristics of the data. One must also consider the size and nature of the dataset, the desired interpretability of the model, the risk of overfitting, and the available computational resources. Comparing performance metrics across candidate models with a validation technique such as cross-validation helps in choosing the right algorithm.
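The cross-validation step mentioned above relies on splitting the data into k folds, each used once for validation. The sketch below only generates the index splits; the function name `k_fold_indices` is an assumption, and the indices are left unshuffled for clarity, although data is usually shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]

    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(train_idx, val_idx)
```

To compare algorithms, each candidate would be trained on `train_idx` and scored on `val_idx` for every fold, and the fold scores averaged.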

Machine learning models face several issues that affect their deployment: overfitting, where a model captures noise along with the underlying pattern in the data; underfitting, where the model is too simple to capture the underlying pattern; the bias-variance tradeoff, which involves balancing model complexity against the ability to generalize; data quality and quantity, since insufficient or poor-quality data leads to inaccurate models; and computational complexity, which limits scalability and real-time processing.
