Question Bank: Introduction to Machine Learning
Unit 1
• Explain the basic concepts of Machine Learning. How does Machine Learning differ from
traditional programming approaches? Illustrate with real-life examples.
• Describe the various types of Machine Learning techniques (Supervised, Unsupervised, Semi-
Supervised, Reinforcement). Compare them with respect to their applications and limitations.
• Discuss the major issues in designing and implementing Machine Learning strategies. How can
challenges such as overfitting, underfitting, and data quality be addressed?
• What is data exploration in Machine Learning? Explain the different types of data and data
attributes with suitable examples.
• Provide a detailed account of statistical description techniques used for summarizing data in
Machine Learning. How do mean, variance, and correlation help in understanding datasets?
• Discuss the role of data visualization in Machine Learning. What are some commonly used
visualization techniques, and how do they aid in data exploration?
• Explain data similarity measures. How do measures such as Euclidean distance, cosine
similarity, and the Jaccard coefficient differ? Provide examples of their applications.
• What are the different issues in data preprocessing? Explain the importance of handling missing
values, noise, and inconsistencies in a dataset.
• Discuss the methods of data cleaning. Illustrate how missing values, noisy data, and duplicate
records can be effectively managed in Machine Learning projects.
• What is data integration? Describe the challenges faced during the integration of multiple data
sources and explain techniques to resolve conflicts and redundancies.
• Explain data reduction techniques in detail. How do feature selection, dimensionality reduction
(e.g., PCA), and data compression contribute to efficient Machine Learning?
• Describe the concept of data transformation. Explain normalization, standardization, and
attribute construction with appropriate examples.
• What is discretization in Machine Learning? Discuss the role of binning, histogram analysis, and
decision tree-based methods in discretizing continuous attributes.
• Data preprocessing is often considered more time-consuming than model building in Machine
Learning. Justify this statement with examples and practical insights.
• Compare and contrast the roles of data exploration, data preprocessing, and data
transformation in the Machine Learning pipeline. Why are these steps crucial before applying
algorithms?
• Case Study Question: Given a dataset with missing values, redundant attributes, and highly
skewed distributions, describe the step-by-step process you would follow for data preprocessing
before applying a classification algorithm.
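For the similarity-measure question in this unit, a minimal Python sketch may help show how the three measures behave on small inputs. The function names and sample data are illustrative assumptions, and treating two empty sets as fully similar is a convention chosen here, not a fixed rule.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (ignores magnitude)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_coefficient(s, t):
    """Size of the intersection divided by size of the union of two sets."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # assumed convention for two empty sets
    return len(s & t) / len(s | t)


# Hypothetical sample data: v2 is v1 scaled by 2.
v1, v2 = [1, 2, 3], [2, 4, 6]
basket_a = {"milk", "bread", "eggs"}
basket_b = {"bread", "eggs", "butter"}

print(euclidean_distance(v1, v2))   # non-zero: the vectors differ in magnitude
print(cosine_similarity(v1, v2))    # close to 1: the vectors point the same way
print(jaccard_coefficient(basket_a, basket_b))
```

The sample vectors are far apart by Euclidean distance yet almost identical by cosine similarity. That is the usual argument for preferring cosine similarity when only direction matters, as with term-frequency vectors. The Jaccard coefficient applies to sets rather than numeric vectors.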
Unit 2
• Explain the basic concepts of classification and regression in Machine Learning. How do they
differ in terms of problem definition, output variables, and applications? Provide examples.
• What are the key performance metrics for classification models? Discuss accuracy, precision,
recall, F1-score, ROC curves, and confusion matrices with suitable illustrations.
• Discuss how the performance of classification models can be visualized. Explain techniques such as
the ROC curve, the precision-recall curve, and decision boundary plots.
• Explain the working of the k-Nearest Neighbor (k-NN) classifier. What are its strengths and
weaknesses? How does the choice of ‘k’ affect performance?
• Describe the construction and working of a Decision Tree Classifier. How does attribute selection
(using Gini index, Information Gain, or Gain Ratio) impact the final model?
• Explain the Naïve Bayes Classifier with suitable mathematical formulation. Discuss situations
where it performs well despite the “naïve” assumption.
• What is Ensemble Classification? Explain the Random Forest strategy in detail. Compare Random
Forest with single Decision Trees in terms of accuracy, overfitting, and interpretability.
• Discuss the difference between bagging, boosting, and stacking in Ensemble Classification.
Provide examples of algorithms implementing each strategy.
• Explain linear regression with mathematical formulation. How are regression coefficients
estimated using the least squares method? Discuss assumptions and limitations of linear
regression.
• What are nonlinear regression methods? Provide examples and explain how they differ from
linear regression in modeling complex relationships.
• Compare and contrast linear and nonlinear regression methods. Under what conditions should
one prefer nonlinear regression over linear regression? Provide examples.
• Discuss the role of performance analysis in regression tasks. Explain evaluation metrics such as
R², Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error
(MAE).
• Case Study Question: Given a dataset on house prices, explain step-by-step how you would
apply classification (e.g., predicting price category as “Low”, “Medium”, “High”) and regression
(predicting actual house price). Highlight the challenges in both approaches.
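For the classification-metrics question in this unit, the following sketch derives accuracy, precision, recall, and F1-score from the four cells of a binary confusion matrix. The function name, the label encoding (1 as the positive class), and the sample labels are assumptions made for illustration. Returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute binary confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

In this sample there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so precision and recall both come out at 3/4 while accuracy is 8/10.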
Unit 3
• Explain the basic concepts of clustering in Machine Learning. How does clustering differ from
classification? Provide suitable real-world applications of clustering.
• What are partitioning methods in clustering? Explain the working of the k-Means algorithm with
a step-by-step example. Discuss its limitations.
• Discuss the k-Medoids clustering method. Compare it with k-Means in terms of performance,
robustness to noise, and computational complexity.
• Explain hierarchical clustering methods. How do agglomerative and divisive hierarchical
clustering differ in terms of strategy, computation, and outcomes? Illustrate with examples.
• Discuss the advantages and disadvantages of hierarchical clustering compared to partitioning
clustering methods. In what scenarios is hierarchical clustering more appropriate?
• What are density-based clustering methods? Explain the working principle of DBSCAN. How
does it handle clusters of arbitrary shape and noise better than k-Means?
• Explain the key parameters of DBSCAN (ε and MinPts). How do these parameters influence the
identification of core points, border points, and noise points?
• Discuss the concept of association rule mining. Explain the basic measures of rule
interestingness such as support, confidence, and lift with suitable examples.
• What are the major challenges in association rule mining? Explain redundancy, scalability, and
interpretability issues with real-life examples.
• Define outlier analysis. Discuss various statistical, distance-based, and density-based methods
for detecting outliers. Provide practical use cases of outlier detection.
• Explain how clustering methods (like k-Means and DBSCAN) can be extended or modified for
outlier detection. Give examples where clustering naturally reveals outliers.
• Case Study Question: Given a retail dataset of customer transactions, explain how you would
apply clustering, association rule mining, and outlier analysis to extract useful knowledge for
decision-making.
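For the DBSCAN parameter question in this unit, the sketch below shows how ε and MinPts decide whether a point is a core, border, or noise point. It covers only the labelling step, not the full clustering procedure. It assumes the common convention that a point counts as a member of its own ε-neighbourhood. The function name and sample points are hypothetical.

```python
import math


def label_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' under DBSCAN's definitions."""
    # eps-neighbourhood of each point (assumed to include the point itself).
    neighbourhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    # A core point has at least min_pts points in its eps-neighbourhood.
    core = {i for i, nb in enumerate(neighbourhoods) if len(nb) >= min_pts}

    labels = []
    for i, nb in enumerate(neighbourhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not core, but within eps of a core point
        else:
            labels.append("noise")   # neither core nor reachable from a core point
    return labels


# Hypothetical data: a unit square, one point hanging off it, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
print(label_points(points, eps=1.0, min_pts=3))
```

With these settings the four corners of the square are core points, (2, 1) is a border point because its only neighbour is the core point (1, 1), and (10, 10) is noise. Raising MinPts or shrinking ε turns more points into noise.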
Unit 4
• Explain the basic concepts of frequent pattern mining. Why is it important in knowledge
discovery and data mining? Provide examples from real-world applications.
• Discuss association analysis in detail. What are support, confidence, and lift? Explain how these
measures help in generating and evaluating association rules.
• Explain the working of the Apriori algorithm for frequent itemset mining. Provide a step-by-step
example showing candidate generation and pruning.
• What are the limitations of the Apriori algorithm? Discuss the strategies and improvements used
to overcome these limitations.
• Compare and contrast frequent itemset mining methods: Apriori, FP-Growth, and Eclat.
Highlight their strengths, weaknesses, and use cases.
• What is data stream mining? Explain the challenges of mining frequent patterns in data streams
compared to static datasets.
• Discuss methods for mining time-series data. How do trend analysis, periodicity detection, and
sequence mining help in extracting meaningful patterns?
• Critically analyze the role of frequent pattern mining in modern applications such as
recommendation systems, fraud detection, and web usage mining. Provide examples to support
your arguments.
• Explain the importance of incremental and online algorithms in mining frequent patterns from
data streams. How do they differ from traditional batch algorithms like Apriori, and what
challenges do they address?
• Case Study Question: Assume you are analyzing a supermarket’s transaction database. Explain
how frequent pattern mining and association rule mining can be applied to identify customer
buying habits. How would this differ if the data came from a continuous data stream (e.g., online
purchases)?
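For the Apriori question in this unit, the sketch below illustrates the join and prune steps on a tiny transaction list. It is a simplified teaching version, not an optimized implementation. It uses an absolute support count (`min_count`) instead of a fraction, and the function name and sample transactions are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frequent itemset: support count} using level-wise Apriori search."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        # Support count: number of transactions containing the whole itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): count(frozenset([i])) for i in items}
    current = {c: n for c, n in current.items() if n >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        current = {c: count(c) for c in candidates}
        current = {c: n for c, n in current.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical supermarket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
for itemset, n in sorted(apriori(transactions, min_count=2).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With `min_count=2`, the candidate {bread, butter, milk} is generated in the join step but removed in the prune step, because its subset {butter, milk} occurs in only one transaction. No database scan is needed to reject it, which is the point of the Apriori property.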
Unit 5
• Explain the basic working principle of Support Vector Machines (SVM). How do the concepts
of hyperplanes and support vectors help in classification problems?
• Discuss the importance of kernel functions in SVM. Compare linear, polynomial, and radial basis
function (RBF) kernels with examples of their applications.
• Explain the advantages and limitations of SVM compared to traditional classifiers such as k-NN,
Decision Trees, and Naïve Bayes. Provide real-life scenarios where SVM is most suitable.
• What are Artificial Neural Networks (ANN)? Explain the structure of a basic feed-forward neural
network, including input layer, hidden layers, and output layer.
• Discuss the process of training an ANN using backpropagation. How do activation functions
(sigmoid, ReLU, tanh) influence the performance of neural networks?
• Compare and contrast SVM and ANN in terms of architecture, learning process, computational
complexity, and applications. In what situations would you prefer one over the other?
• Describe practical applications of SVM and ANN in real-world machine learning problems such
as image recognition, natural language processing, and medical diagnosis.
• What are the key challenges in applying neural networks? Discuss issues such as overfitting,
vanishing gradient, and high computational cost, along with possible solutions.
• Explain how advanced techniques such as SVM and ANN have contributed to modern AI applications.
Provide case studies or examples from industries such as healthcare, finance, and autonomous
systems.
• Case Study Question: A dataset of handwritten digits is given. Explain step-by-step how you
would design and implement both an SVM and an ANN model to solve this classification
problem. Compare their expected performance and challenges.
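For the activation-function and vanishing-gradient questions in this unit, the sketch below compares the derivatives of sigmoid, tanh, and ReLU. The helper `gradient_scale` is a hypothetical simplification: it multiplies activation derivatives across layers and ignores the weight matrices, so it only indicates the trend, not the actual gradient in a trained network.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2  # peaks at 1.0 when x = 0


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # derivative at x = 0 is taken as 0 here


def gradient_scale(grad_fn, pre_activations):
    """Product of activation derivatives along one path through the layers.

    Simplifying assumption: weights are ignored, so this only shows how the
    activation function alone shrinks or preserves the backpropagated signal.
    """
    scale = 1.0
    for z in pre_activations:
        scale *= grad_fn(z)
    return scale


# Hypothetical 10-layer path with every pre-activation at a small positive value.
path = [0.5] * 10
print("sigmoid:", gradient_scale(sigmoid_grad, path))
print("tanh:   ", gradient_scale(tanh_grad, path))
print("relu:   ", gradient_scale(relu_grad, path))
```

Because the sigmoid derivative never exceeds 0.25, ten sigmoid layers scale the signal by at most 0.25^10, which is below one millionth. ReLU passes the gradient through unchanged for positive inputs, which is one reason it is often preferred in deep networks.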