1. What is Supervised Machine Learning?
Supervised Machine Learning is a type of machine learning where the model is trained using a
labeled dataset.
👉 Key idea:
● The data contains both inputs (features) and their correct outputs (labels/targets).
● The algorithm learns the relationship between input and output so it can predict the
output for new, unseen inputs.
Steps in Supervised Learning:
1. Collect Data – Gather labeled training data.
Example: Emails labeled as spam or not spam.
2. Train Model – Provide input features (X) and correct output labels (Y).
3. Learning – Algorithm finds patterns and relationships.
4. Prediction – Model predicts outputs for new inputs.
5. Evaluation – Accuracy, precision, recall, etc. are used to measure performance.
Types of Supervised Learning:
1. Classification – Output is a category/class.
Example: Predicting whether a tumor is benign or malignant.
2. Regression – Output is a continuous value.
Example: Predicting house prices based on features like size, location, etc.
Example:
● Training Data:
Hours Studied | Marks Scored
2 | 30
4 | 60
6 | 90
○ The model learns the relationship between hours studied (input) and marks scored (output).
○ It then predicts marks for new inputs (e.g., 5 hours).
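A minimal sketch of this example in plain Python (no ML library assumed): `fit_line` is a hypothetical helper that fits a straight line by ordinary least squares, the simplest regression model, and then predicts marks for 5 hours.

```python
def fit_line(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares; return (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means.
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Labeled training data: hours studied (input X) and marks scored (label Y).
hours = [2, 4, 6]
marks = [30, 60, 90]

b0, b1 = fit_line(hours, marks)  # training: learn the input-output relationship
predicted = b0 + b1 * 5          # prediction for an unseen input (5 hours)
print(f"Marks = {b0:.1f} + {b1:.1f} * hours; prediction for 5 hours: {predicted:.1f}")
```

Because the three training points lie exactly on marks = 15 × hours, the fitted line has intercept 0 and slope 15, so the prediction for 5 hours is 75.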
2. List the data types in Machine Learning.
In Machine Learning, data is the foundation. Different types of data are used depending on the
problem.
1. Numerical Data (Quantitative)
● Represents numbers.
● Can be measured or counted.
● Types:
○ Continuous: Values can take any number within a range.
Example: Height, weight, temperature.
○ Discrete: Whole numbers/counts.
Example: Number of students, cars, clicks.
2. Categorical Data (Qualitative)
● Represents categories or labels.
● Types:
○ Nominal: Categories without order.
Example: Gender (Male/Female), Colors (Red/Blue/Green).
○ Ordinal: Categories with a natural order.
Example: Movie ratings (Poor, Average, Good, Excellent), Education levels.
3. Binary Data
● Special case of categorical data with only two values.
Example: Yes/No, 0/1, Spam/Not Spam.
4. Text Data
● Words, sentences, or documents.
Example: Reviews, Tweets, Emails.
5. Time-Series Data
● Data collected over time intervals.
Example: Stock prices, weather records, sensor readings.
6. Image, Audio, and Video Data
● Unstructured data used in computer vision, speech, and multimedia tasks.
Example: Face images, speech recordings, CCTV video.
3. What is Machine learning? Draw the ML Life cycle
Machine Learning (ML) is an application of Artificial Intelligence (AI) that gives systems the
ability to learn from data and improve automatically with experience without being explicitly
programmed.
It focuses on developing computer programs that can access data, find patterns, and make
predictions or decisions.
Machine Learning Life Cycle (7 Steps):
1. Gathering Data:
○ Collect data from files, databases, internet, or devices.
○ The quantity and quality of data directly affect the model’s accuracy.
2. Data Preparation:
○ Organize and preprocess collected data.
○ Randomize the data and prepare it for training.
3. Data Wrangling:
○ Clean the data by removing missing values, duplicates, and noise.
○ Convert raw data into a usable format for analysis.
4. Data Analysis:
○ Apply analytical techniques and build models.
○ Choose ML methods like Classification, Regression, Clustering, etc.
5. Train the Model:
○ Use training datasets and algorithms.
○ The model learns rules, patterns, and relationships from data.
6. Test the Model:
○ Evaluate the model using test datasets.
○ Check accuracy and performance before real-world use.
7. Deployment:
○ Deploy the trained model into real-world applications.
○ Used for making predictions and decisions.
4. Explain the roles of epsilon (ε) and MinPts.
Roles of ε (Epsilon) and MinPts in DBSCAN
1. Epsilon (ε):
○ It defines the radius of the neighborhood around a data point.
○ In simple terms, it tells how close points should be to each other to be considered
as neighbors.
○ A small ε → fewer points have enough neighbors, so clusters become smaller or fragment and more points are labeled noise.
○ A large ε → may merge separate clusters into one big cluster.
2. MinPts (Minimum Points):
○ It is the minimum number of points required to form a dense region (cluster)
within the ε-neighborhood.
○ If the number of points in the ε-neighborhood (usually counting the point itself) ≥ MinPts → the point is a core point.
○ If fewer than MinPts → the point is a border point if it lies within ε of a core point; otherwise it is noise.
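The core/border/noise rules can be sketched in a few lines of Python. `classify_points` is a hypothetical helper, not part of any library, and it assumes the ε-neighborhood count includes the point itself (conventions differ between implementations). Only the point labeling is shown; growing clusters outward from core points is omitted.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' using DBSCAN's definitions.

    Assumption: a point's eps-neighborhood includes the point itself.
    """
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points
    ]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            labels.append("core")    # dense: at least MinPts points within eps
        elif any(j in core for j in nb):
            labels.append("border")  # not dense itself, but within eps of a core point
        else:
            labels.append("noise")   # neither dense nor near a dense point
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.4, 1), (8, 8)]
print(classify_points(points, eps=1.5, min_pts=4))
print(classify_points(points, eps=0.5, min_pts=4))  # smaller eps: no dense region at all
```

With ε = 1.5 and MinPts = 4, the four tightly packed points are core points, (2.4, 1) is a border point and (8, 8) is noise; shrinking ε to 0.5 leaves no dense region, so every point becomes noise.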
5. Explain the different types of Regression in Machine Learning.
Types of Regression in Machine Learning
Regression is a supervised learning technique used to predict a continuous output.
The main types are:
1. Linear Regression – Models the relationship between an independent variable (X) and a dependent variable (Y) with a straight line.
Example: Predicting house prices based on size.
2. Multiple Linear Regression – Uses two or more independent variables to predict the output.
Example: Predicting salary using experience, education, and age.
3. Polynomial Regression – Fits data using a polynomial curve (non-linear relationship).
Example: Growth rate prediction.
4. Ridge Regression (L2 Regularization) – Reduces overfitting by adding a penalty on the squared size of the coefficients.
5. Lasso Regression (L1 Regularization) – Similar to Ridge, but penalizes the absolute size of the coefficients, which can shrink some coefficients to exactly zero (feature selection).
6. Elastic Net Regression – Combination of Lasso and Ridge regression (L1 + L2 penalties).
7. Logistic Regression – Despite its name, used for classification problems: it predicts the probability of a categorical (usually binary) outcome.
Example: Predicting whether an email is spam or not spam.
6. Define the terms: 1. Underfitting 2. Overfitting 3. Bias 4. Variance
Definitions
1. Underfitting:
○ A situation where the ML model is too simple to capture the patterns in data.
○ It performs poorly on both training and testing data.
Example: Using a straight line to fit highly curved data.
2. Overfitting:
○ A situation where the ML model is too complex and learns not only patterns but
also noise from training data.
○ It performs well on training data but poorly on test data.
Example: A model memorizing answers instead of generalizing.
3. Bias:
○ Error due to wrong assumptions made by the model.
○ High bias → model is too simple, leading to underfitting.
4. Variance:
○ Error due to the model’s sensitivity to small fluctuations in training data.
○ High variance → model is too complex, leading to overfitting.
7. Write the differences between Classification and Regression.
Aspect | Classification | Regression
Definition | Predicts a category or class label. | Predicts a continuous numerical value.
Output Type | Discrete (e.g., Yes/No, Spam/Not Spam). | Continuous (e.g., salary, temperature).
Example | Predicting if an email is spam or not spam. | Predicting house price based on area and location.
Algorithms Used | Logistic Regression, Decision Tree, Random Forest, SVM, Naive Bayes. | Linear Regression, Polynomial Regression, Ridge/Lasso Regression.
Evaluation Metrics | Accuracy, Precision, Recall, F1-score, ROC-AUC. | Mean Squared Error (MSE), Mean Absolute Error (MAE), R² score.
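The evaluation metrics in the last row can be computed by hand. The following plain-Python sketch uses hypothetical helper names and made-up labels; libraries such as scikit-learn provide equivalent, more thoroughly tested functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def regression_metrics(y_true, y_pred):
    """Mean squared error, mean absolute error and R² for a regressor."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    mean_t = sum(y_true) / len(y_true)
    ss_total = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_total
    return mse, mae, r2


# Classification: 1 = spam, 0 = not spam (made-up labels).
print(classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
# Regression: made-up true and predicted values.
print(regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))
```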
8. Differentiate between Predictive and Descriptive Models.
Aspect | Predictive Model | Descriptive Model
Definition | Used to predict future outcomes based on past data. | Used to analyze and describe patterns in existing data.
Goal | Forecast unknown values. | Summarize and explain data.
Output | Prediction of future events or values. | Insights, relationships, and groupings in data.
Techniques | Regression, Classification, Time Series Forecasting. | Clustering, Association Rules, Summarization.
Example | Predicting customer churn, stock prices. | Grouping customers by buying behavior, market basket analysis.
9. Explain the applications of machine learning algorithms.
1. Healthcare:
○ Disease prediction and diagnosis (e.g., cancer detection).
○ Drug discovery and personalized treatment.
2. Finance:
○ Fraud detection in transactions.
○ Credit scoring and stock market prediction.
3. E-commerce:
○ Product recommendations (like Amazon, Flipkart).
○ Dynamic pricing and customer behavior analysis.
4. Social Media & Marketing:
○ Targeted advertising and sentiment analysis.
○ Content recommendation (e.g., YouTube, Instagram).
5. Transportation:
○ Self-driving cars and route optimization.
○ Predicting traffic and fuel consumption.
6. Natural Language Processing (NLP):
○ Chatbots, virtual assistants (Siri, Alexa).
○ Machine translation (Google Translate).
7. Image & Speech Recognition:
○ Face recognition (security systems, smartphones).
○ Voice recognition (speech-to-text, voice commands).
10. Write the differences between Eager learners and Lazy learners.
Aspect | Eager Learners | Lazy Learners
Definition | Build a general model during training, before seeing test data. | Delay learning until a query is made; store the training data and do the work at prediction time.
Training Phase | Training is slow (the model is built first). | Training is fast (just store the data).
Prediction Phase | Prediction is fast, since the model is already built. | Prediction is slow, since computation is done at query time.
Memory Usage | Uses less memory (stores only the model). | Uses more memory (stores the full training data).
Examples | Decision Trees, Naive Bayes, Neural Networks. | k-Nearest Neighbors (k-NN), Case-based reasoning.
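To make the contrast concrete, here is a minimal lazy learner in Python: a k-NN classifier whose `fit` only stores the data and whose `predict_one` does all the distance computations at query time. The class and method names are illustrative assumptions, not a library API, and the points are made up.

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal lazy learner: fit() only stores data; all work happens in predict_one()."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorizing the labeled examples: fast, but memory grows with data.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # Query time: distance to every stored example, then a vote among the k nearest.
        order = sorted(range(len(self.X)), key=lambda i: math.dist(self.X[i], x))
        votes = Counter(self.y[i] for i in order[: self.k])
        return votes.most_common(1)[0][0]


X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y = ["A", "A", "A", "B", "B", "B"]
model = KNNClassifier(k=3).fit(X, y)
print(model.predict_one((2, 2)), model.predict_one((6.5, 6.5)))  # A B
```

An eager learner such as a decision tree would instead spend its effort inside `fit` building a model, and then answer each query with a short tree traversal.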
11. What is the cross-validation Method? Explain the K-fold cross-validation Method.
Cross-Validation Method
● Cross-validation is a resampling technique used in Machine Learning to evaluate
model performance.
● It divides the dataset into training and testing sets multiple times to ensure the model
performs well on unseen data.
● Helps detect overfitting and gives a more reliable performance estimate than a single train/test split.
K-Fold Cross-Validation
● In K-Fold Cross-Validation, the dataset is divided into K equal parts (folds).
● The process works as follows:
1. Split the dataset into K folds.
2. Use K-1 folds for training and 1 fold for testing.
3. Repeat the process K times, each time using a different fold as test data.
4. Calculate the average accuracy from all K trials.
Example:
● If K = 5 → The dataset is divided into 5 folds.
● The model is trained on 4 folds and tested on the 1 remaining fold.
● This repeats 5 times, and the average performance is taken.
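The K-fold procedure above can be sketched in plain Python. `k_fold_indices` and `cross_validate` are hypothetical helpers, and the model being scored (always predicting the training mean, scored by mean absolute error) is a deliberately trivial stand-in for a real learner. The folds are taken in order; in practice the data are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for each of the K folds (no shuffling)."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))  # 1 fold for testing
        train_idx = [i for i in range(n_samples) if not start <= i < start + size]  # K-1 folds
        yield train_idx, test_idx
        start += size


def cross_validate(score_fn, n_samples, k):
    """Average the score over all K train/test rounds."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Hypothetical "model": predict the mean of the training targets; score = mean absolute error.
y = [3, 5, 4, 6, 5, 7, 6, 8, 7, 9]


def mean_model_mae(train_idx, test_idx):
    prediction = sum(y[i] for i in train_idx) / len(train_idx)
    return sum(abs(y[i] - prediction) for i in test_idx) / len(test_idx)


for fold, (train, test) in enumerate(k_fold_indices(len(y), k=5), start=1):
    print(f"Fold {fold}: test indices {test}, {len(train)} training samples")
print("Average MAE over 5 folds:", round(cross_validate(mean_model_mae, len(y), 5), 3))
```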
12. What is Feature Engineering? Explain the Process of Feature Engineering.
What is Feature Engineering?
● Feature Engineering is the process of selecting, modifying, and creating features
(input variables) from raw data to improve the performance of a Machine Learning
model.
● Good feature engineering helps models learn patterns better and increases accuracy.
Process in Feature Engineering
1. Feature Creation:
○ Create new features from existing ones.
Example: From date of birth, create age.
2. Feature Transformation:
○ Transform features to make them suitable for models.
Example: Apply a log or square-root transform to reduce skewness, or normalization to bring features to a common scale.
3. Feature Extraction:
○ Reduce dimensionality by extracting useful information.
Example: Principal Component Analysis (PCA) for images/text.
4. Feature Selection:
○ Select only important features and remove irrelevant ones to avoid overfitting.
Example: Choosing marks and study hours to predict performance, ignoring eye color.
5. Handling Missing Values:
○ Fill missing values using mean/median or remove incomplete records.
6. Encoding Categorical Data:
○ Convert categories into numerical values.
Example: Label Encoding (Male=0, Female=1) or One-Hot Encoding (one 0/1 column per category, e.g., Red → [1, 0, 0]).
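Several of these steps can be illustrated on a few made-up records in plain Python: age is created from date of birth, a missing income is filled with the mean, gender is label-encoded and color is one-hot encoded. The record fields and the fixed reference date are assumptions chosen so the output is reproducible.

```python
from datetime import date
from statistics import mean

# Made-up raw records; one income value is missing (None).
records = [
    {"dob": date(2000, 5, 17), "gender": "Male", "color": "Red", "income": 40000},
    {"dob": date(1995, 11, 2), "gender": "Female", "color": "Blue", "income": None},
    {"dob": date(1988, 1, 30), "gender": "Female", "color": "Green", "income": 65000},
]
reference_day = date(2024, 1, 1)  # fixed "today" so the output is reproducible


def age_on(dob, today):
    """Feature creation: whole years of age on a given day."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))


income_fill = mean(r["income"] for r in records if r["income"] is not None)  # mean imputation
colors = sorted({r["color"] for r in records})  # categories for one-hot encoding

features = []
for r in records:
    row = {
        "age": age_on(r["dob"], reference_day),
        "gender": 1 if r["gender"] == "Female" else 0,  # label encoding: Male=0, Female=1
        "income": r["income"] if r["income"] is not None else income_fill,
    }
    row.update({f"color_{c}": int(r["color"] == c) for c in colors})  # one-hot encoding
    features.append(row)

for row in features:
    print(row)
```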
13. Why is the Algorithm named Naive Bayes? Explain Bayes’ Theorem and the working
of a Naive Bayes Classifier. Also, state the applications of the Naive Bayes Algorithm.
14. Explain the K-Nearest Neighbour (K-NN) Algorithm in Machine Learning. Discuss how
the K-NN Algorithm works, the method of selecting the value of K, and list the
advantages and disadvantages of the K-NN Algorithm.
15. Explain Bootstrap sampling Method.
● Definition:
Bootstrap is a resampling technique used in statistics and machine learning.
It involves repeatedly drawing random samples with replacement from the original
dataset and then estimating statistics (mean, variance, accuracy, etc.) from these
samples.
● The main goal is to estimate the accuracy and variability of a model or statistic when
the dataset is limited.
How Does Bootstrap Sampling Work?
1. Suppose we have a dataset of size n.
2. Create a new sample of size n by randomly picking data points with replacement.
○ “With replacement” means the same data point can be chosen more than once.
3. Repeat the above process many times (e.g., 1000 times) to generate multiple bootstrap samples.
4. Compute the statistic, or train the model, on each bootstrap sample and measure performance (often on the out-of-bag points that were not drawn into that sample).
5. Use the results to compute confidence intervals, bias, and variance estimates.
Example:
● Dataset: {1, 2, 3, 4} (n=4).
● Bootstrap sample 1: {2, 3, 3, 4}.
● Bootstrap sample 2: {1, 1, 2, 4}.
● Bootstrap sample 3: {2, 2, 3, 4}.
● Each time, samples are taken with replacement.
Applications of Bootstrap in ML:
1. Estimating model accuracy (e.g., classification error rate).
2. Calculating confidence intervals for predictions.
3. Used in Bagging (Bootstrap Aggregation), e.g., Random Forest.
4. Helps reduce overfitting: bagging averages models trained on different bootstrap samples, which lowers variance.
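A percentile bootstrap confidence interval for the mean can be sketched with Python's standard `random` module (`random.Random.choices` draws with replacement). The helper `bootstrap_ci` and its defaults are illustrative assumptions; the percentile method shown is one of several ways to form bootstrap intervals.

```python
import random
from statistics import mean


def bootstrap_ci(data, stat=mean, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (default: the mean)."""
    rng = random.Random(seed)
    n = len(data)
    # Each bootstrap sample has the same size n and is drawn WITH replacement,
    # so some points repeat and others are left out.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


data = [1, 2, 3, 4]
print("One bootstrap sample:", random.Random(42).choices(data, k=len(data)))
print("95% interval for the mean:", bootstrap_ci(data))
```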
16. Explain the Decision Tree Classification Algorithm.
Definition:
● A Decision Tree is a supervised machine learning algorithm used for classification and
regression tasks.
● It works like a tree structure where:
○ Each internal node represents a test on an attribute (feature).
○ Each branch represents the outcome of the test.
○ Each leaf node represents a class label (decision).
How Does Decision Tree Classification Work?
1. Select the Root Node:
○ Choose the feature that best splits the data into classes.
○ Common criteria: Information Gain, Gini Index, Chi-Square.
2. Split Data:
○ Partition the dataset into subsets based on the chosen attribute.
3. Repeat Process:
○ For each subset, choose the next best feature and create child nodes.
4. Stopping Condition:
○ Stop splitting if:
■ All samples belong to the same class, or
■ No more features are left.
5. Final Tree:
○ The tree structure is formed with decision nodes and leaf nodes.
○ Used to classify new data by traversing the tree.
Example:
If we want to classify whether a person will play tennis:
● Feature 1: Weather (Sunny, Rainy).
● If Sunny → Check Humidity.
● If Humidity = High → No, else → Yes.
● If Rainy → Check Wind.
● If Wind = Strong → No, else → Yes.
Advantages:
● Easy to understand and interpret.
● Works with both categorical and numerical data.
● Requires little data preprocessing.
Disadvantages:
● Prone to overfitting on small datasets.
● Unstable (small changes in data can change the tree).
● Biased towards features with many levels.
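The split criteria named in step 1 can be computed directly. This plain-Python sketch evaluates Gini impurity, entropy and information gain on a small made-up version of the play-tennis example and picks the feature with the highest gain as the root; it does not build the full tree.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - children


# Toy "play tennis" data (made up for illustration).
rows = [
    {"weather": "Sunny", "wind": "Weak"},
    {"weather": "Sunny", "wind": "Strong"},
    {"weather": "Rainy", "wind": "Weak"},
    {"weather": "Rainy", "wind": "Strong"},
    {"weather": "Sunny", "wind": "Weak"},
    {"weather": "Rainy", "wind": "Weak"},
]
play = ["Yes", "No", "Yes", "No", "Yes", "Yes"]

print(f"Gini(root) = {gini(play):.3f}, Entropy(root) = {entropy(play):.3f}")
for feature in ("weather", "wind"):
    print(f"Information gain of {feature}: {information_gain(rows, play, feature):.3f}")
best = max(("weather", "wind"), key=lambda f: information_gain(rows, play, f))
print("Root node splits on:", best)
```

Here wind separates the labels perfectly (gain ≈ 0.918), while weather leaves both child nodes as mixed as the root (gain 0), so wind would be chosen as the root node.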
17. What is Clustering Method in ML? List the different types of Clustering
Methods in Machine Learning. Explain any one method.
What is Clustering in ML?
● Clustering is an unsupervised learning method in Machine Learning.
● It is used to group similar data points into clusters such that:
○ Data points in the same cluster are more similar to each other.
○ Data points in different clusters are dissimilar.
● Example: Grouping customers by purchasing habits, grouping news articles by topic.
Types of Clustering Methods in ML
1. Partitioning Methods
○ Example: K-Means, K-Medoids.
○ Divide data into non-overlapping clusters.
2. Hierarchical Methods
○ Example: Agglomerative, Divisive Clustering.
○ Build a tree-like structure of clusters.
3. Density-Based Methods
○ Example: DBSCAN, OPTICS.
○ Form clusters based on dense regions of points.
4. Grid-Based Methods
○ Example: STING, CLIQUE.
○ Divide the data space into a grid and form clusters.
5. Model-Based Methods
○ Example: Gaussian Mixture Models (GMMs).
○ Assume data is generated from a mixture of probability distributions.
Explain One Method: K-Means Clustering (Partitioning Method)
1. Choose the number of clusters (K).
2. Randomly initialize K centroids.
3. Assign each data point to the nearest centroid (form clusters).
4. Recalculate centroids as the mean of assigned points.
5. Repeat steps 3 & 4 until centroids do not change (convergence).
Example:
● Dataset of customer income & spending.
● K-Means groups customers into 3 clusters:
○ Low income–low spending,
○ High income–high spending,
○ Low income–high spending.
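The five K-Means steps map directly onto a short plain-Python loop. This is a sketch under simple assumptions (Euclidean distance, an optional explicit initialization, a fixed iteration cap), not a production implementation; library versions usually add smarter seeding such as k-means++ and several restarts. The customer values are made up.

```python
import math
import random


def k_means(points, k, init=None, max_iter=100, seed=0):
    """Plain K-Means; returns (centroids, clusters)."""
    # Step 2: initial centroids, chosen at random unless given explicitly.
    centroids = list(init) if init else random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        # Step 3: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Step 4: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # Step 5: converged, centroids did not change
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up customers: (income, spending score).
customers = [
    (15, 10), (18, 12), (16, 8),   # low income, low spending
    (80, 85), (85, 90), (78, 88),  # high income, high spending
    (20, 80), (17, 85), (22, 90),  # low income, high spending
]
centroids, clusters = k_means(customers, k=3, init=[(15, 10), (80, 85), (20, 80)])
for c, members in zip(centroids, clusters):
    print(tuple(round(v, 1) for v in c), members)
```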
18. Explain how to improve the performance of a model.
Improving Performance of a Model
Improving a model’s performance involves applying strategies to increase accuracy, reduce
errors, and make predictions more generalizable. The key steps include:
1. Data Preprocessing
● Cleaning Data: Remove missing, inconsistent, or noisy data to prevent errors in
learning.
● Feature Scaling/Normalization: Ensure features are on a similar scale to help
algorithms converge faster (important for distance-based algorithms like K-NN or
gradient-based models).
● Encoding Categorical Data: Convert categorical variables into numerical format (e.g.,
One-Hot Encoding) so models can process them.
2. Feature Engineering
● Feature Selection: Remove irrelevant or redundant features to reduce overfitting and
improve accuracy.
● Feature Creation: Generate new meaningful features from existing data (e.g., extracting
“month” from a date).
● Dimensionality Reduction: Techniques like PCA reduce complexity and improve
performance.
3. Model Selection and Tuning
● Choosing the Right Model: Select a model suitable for the problem type (e.g.,
regression, classification, clustering).
● Hyperparameter Tuning: Adjust hyperparameters (settings chosen before training, e.g., learning rate, number of trees, K in K-NN) to optimize performance.
● Regularization: Techniques like L1/L2 reduce overfitting by penalizing large weights.
4. Ensemble Methods
● Combine predictions from multiple models to improve accuracy and robustness:
○ Bagging: Reduces variance (e.g., Random Forest)
○ Boosting: Reduces bias (e.g., XGBoost, AdaBoost)
○ Stacking: Combines multiple models to leverage their strengths.
5. Cross-Validation
● Split data into training and testing sets multiple times (e.g., k-fold cross-validation) to check that the model generalizes well.
● Helps in detecting overfitting and getting reliable performance metrics.
6. Increasing Data Quality and Quantity
● More Data: A larger dataset can improve learning and reduce overfitting.
● Data Augmentation: Generate synthetic data for better generalization (common in
image and text data).
7. Monitoring and Iteration
● Evaluate Metrics: Use accuracy, precision, recall, F1-score, ROC-AUC, etc., depending
on the problem.
● Iterative Improvement: Continuously refine the model using feedback, retraining, and
testing.
19. Explain the Apriori Algorithm in Machine Learning with its working steps. Also,
discuss the advantages and disadvantages of the Apriori Algorithm.
Apriori Algorithm in Machine Learning
The Apriori Algorithm is a popular association rule mining technique used to find frequent
itemsets and generate rules from large datasets. It is mainly used in market basket analysis to
discover patterns like “if a customer buys X, they are likely to buy Y.”
Working Steps of Apriori Algorithm
1. Set Minimum Support – Define a minimum threshold for itemsets to be considered
frequent.
2. Generate Candidate Itemsets – Start with single items (1-itemsets) and find those
meeting minimum support.
3. Prune Infrequent Itemsets – Remove itemsets that don’t meet minimum support.
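4. Generate Higher-Order Itemsets – Join frequent k-itemsets to form (k+1)-itemset candidates (2-itemsets, 3-itemsets, etc.). By the Apriori property (every subset of a frequent itemset must also be frequent), candidates with an infrequent subset are pruned before counting; the rest are pruned if they fall below minimum support.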
5. Generate Association Rules – For frequent itemsets, generate rules of the form A → B
with confidence above a minimum threshold.
Advantages
● Simple and easy to understand.
● Efficient for small to medium datasets.
● Helps in discovering meaningful patterns in transactional data.
Disadvantages
● Generates many candidate itemsets, leading to high computation cost.
● Not suitable for large datasets with low support thresholds.
● Works directly only with categorical (item) data; continuous attributes must be discretized first.
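The working steps can be sketched in plain Python on a tiny made-up set of baskets. `apriori` and `association_rules` are hypothetical helpers that follow the join, prune and count loop described above; real libraries add many optimizations.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Steps 2-3: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Step 4 (join): combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Step 4 (prune): every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Step 5: rules A -> B with confidence = support(A ∪ B) / support(A)."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[lhs]  # lhs is frequent by the Apriori property
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), round(confidence, 2)))
    return rules


baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
frequent = apriori(baskets, min_support=0.4)
for itemset, supp in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
print(association_rules(frequent, min_confidence=0.7))
```

With minimum support 0.4, bread, butter and milk and all three pairs of them are frequent, while {bread, butter, milk} (support 0.2) is not; at minimum confidence 0.7 only bread → butter and butter → bread (confidence 0.75) survive.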
20. Explain the complete process of Classification Learning with the help of a
flowchart/diagram.
Classification Learning is a type of supervised machine learning where the goal is to predict
the category or class label of new data based on past labeled data. Examples include spam
detection, disease prediction, or credit scoring.
Steps in Classification Learning
1. Data Collection – Gather relevant data with known labels.
2. Data Preprocessing – Clean data, handle missing values, normalize features, and
encode categorical variables.
3. Feature Selection/Extraction – Choose the most important features that help in
classification.
4. Split Dataset – Divide data into training and testing sets.
5. Train the Model – Use the training data to build a classifier using algorithms like
Decision Tree, K-NN, or SVM.
6. Test the Model – Evaluate the model on the testing data to measure performance
(accuracy, precision, recall).
7. Model Deployment – Use the trained model to classify new, unseen data.
Flowchart of Classification Learning
Start
↓
Data Collection
↓
Data Preprocessing
↓
Feature Selection/Extraction
↓
Split Dataset (Train/Test)
↓
Train Model (Classifier)
↓
Test Model (Evaluate Performance)
↓
Model Deployment
↓
End
21. Explain the concept of Random Forest in Machine Learning. Illustrate with an
example and highlight its benefits, drawbacks, and uses.
Random Forest in Machine Learning
Random Forest is an ensemble learning technique used for classification and regression.
It works by combining multiple Decision Trees to make predictions. Each tree is trained on a random bootstrap sample of the data and considers only a random subset of features at each split; the final prediction is made by majority voting (for classification) or averaging (for regression).
Example
Suppose a bank wants to predict whether a customer will default on a loan (Yes/No):
1. Random Forest creates multiple decision trees, each trained on a different subset of
customer data.
2. Each tree predicts whether the customer will default.
3. The Random Forest aggregates the predictions – if most trees say “Yes,” the final
prediction is “Yes.”
Benefits
● Handles large datasets with high dimensionality.
● Reduces overfitting compared to a single decision tree.
● Can handle both categorical and numerical data.
● Provides feature importance information.
Drawbacks
● Slower than a single decision tree due to multiple trees.
● Less interpretable compared to a single decision tree.
● Requires more memory and computation.
Uses
● Finance – Loan default prediction, credit scoring.
● Healthcare – Disease diagnosis.
● Marketing – Customer segmentation, churn prediction.
● E-commerce – Product recommendation.
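The bagging and voting idea can be sketched in plain Python. To keep it short, each "tree" here is a decision stump (a single split), the random feature subset is drawn once per tree (equivalent to per-split selection when a tree has only one split), and the loan data are made up. Treat it as an illustration of bootstrap sampling plus majority voting, not a full Random Forest.

```python
import random
from collections import Counter


def train_stump(X, y, features):
    """Best single-threshold split (a one-level decision tree) over the given features."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [lbl for row, lbl in zip(X, y) if row[f] <= threshold]
            right = [lbl for row, lbl in zip(X, y) if row[f] > threshold]
            # Each side predicts its majority label.
            left_lbl = Counter(left).most_common(1)[0][0]
            right_lbl = Counter(right).most_common(1)[0][0] if right else left_lbl
            errors = sum(lbl != left_lbl for lbl in left) + sum(lbl != right_lbl for lbl in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_lbl, right_lbl)
    _, f, threshold, left_lbl, right_lbl = best
    return lambda row: left_lbl if row[f] <= threshold else right_lbl


def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample (with replacement)
        feats = rng.sample(range(len(X[0])), n_features)      # random subset of features
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]  # majority vote


# Made-up loan data: (income in thousands, debt-to-income ratio in %) -> will default?
X = [(20, 60), (25, 55), (30, 70), (22, 65), (70, 20), (80, 15), (75, 25), (90, 10)]
y = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"]

forest = train_forest(X, y)
print(predict(forest, (18, 75)), predict(forest, (95, 5)))
```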
22. Explain Simple Linear Regression and Multiple Linear Regression.
Regression in Machine Learning
Regression is a supervised learning technique used to predict continuous numerical values
based on input variables.
1. Simple Linear Regression (SLR)
● Models the relationship between one independent variable (X) and one dependent variable (Y).
● Assumes a linear relationship:
$Y = b_0 + b_1 X + \epsilon$
Where:
● $b_0$ = intercept
● $b_1$ = slope
● $\epsilon$ = error term
Example: Predicting a student’s marks (Y) based on hours studied (X).
2. Multiple Linear Regression (MLR)
● Models the relationship between two or more independent variables (X₁, X₂, …, Xₙ) and one dependent variable (Y).
● Linear equation:
$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \epsilon$
Example: Predicting house price (Y) based on size (X₁), number of rooms (X₂), and location
(X₃).
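The coefficients of either model can be estimated by least squares. The sketch below solves the normal equations with plain-Python Gaussian elimination (with a single column in X it reduces to simple linear regression). The data are made up and were generated exactly from Y = 50 + 2·X₁ + 10·X₂, so the fit should recover those coefficients up to rounding. Numerical libraries typically use QR or SVD based solvers (e.g., `numpy.linalg.lstsq`) instead, because forming AᵀA can be ill-conditioned.

```python
def fit_multiple_linear(X, y):
    """Least-squares [b0, b1, ..., bn] by solving the normal equations (AᵀA) b = Aᵀy."""
    A = [[1.0, *row] for row in X]  # leading column of 1s gives the intercept b0
    p = len(A[0])
    # Augmented matrix [AᵀA | Aᵀy].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)] + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


# Made-up data: house size (X1) and number of rooms (X2) -> price (Y).
X = [(50, 1), (60, 2), (80, 2), (100, 3), (120, 4)]
y = [160, 190, 230, 280, 330]  # generated exactly from Y = 50 + 2*X1 + 10*X2

b0, b1, b2 = fit_multiple_linear(X, y)
print(f"Y = {b0:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
print("Predicted price for size 90, 3 rooms:", round(b0 + b1 * 90 + b2 * 3, 2))
```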
Key Difference
Feature | Simple Linear Regression | Multiple Linear Regression
Independent Variables | 1 | 2 or more
Complexity | Simple | More complex
Use | Basic prediction | Real-world scenarios with multiple factors
23. Define the K-Medoids clustering algorithm. What are its key characteristics,
including any use of parameters like epsilon (ε), and discuss its advantages,
limitations, and common applications
K-Medoids Clustering Algorithm
Definition:
K-Medoids is a partition-based clustering algorithm similar to K-Means, but instead of using
the mean of the cluster as the center, it uses an actual data point (medoid) as the
representative of a cluster. This makes it more robust to noise and outliers.
Key Characteristics
1. Medoid Selection:
○ A medoid is the data point in a cluster whose average dissimilarity to all other
points in the cluster is minimum.
2. Distance Metric:
○ Can use Manhattan distance, Euclidean distance, or any dissimilarity
measure.
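3. Parameter ε (Epsilon):
○ Standard K-Medoids (e.g., PAM) has no ε parameter; its main parameter is k, the number of clusters. ε is a defining parameter of density-based methods such as DBSCAN.
○ In some implementations, ε is used as a tolerance: the algorithm stops when a swap would reduce the total cost by less than ε, i.e., when the clusters have stabilized.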
4. Algorithm Steps (Basic):
○ Choose k random medoids.
○ Assign each data point to the nearest medoid.
○ Compute the total cost (sum of distances from each point to its medoid).
○ Swap a medoid with a non-medoid point if the swap reduces the total cost.
○ Repeat the assignment, cost and swap steps until the medoids no longer change or the convergence threshold ε is reached.
5. Robustness:
○ Less sensitive to outliers than K-Means because medoids are actual points.
Advantages
● Handles outliers and noise better than K-Means.
● Produces more representative clusters in real-world datasets.
● Works with arbitrary distance/dissimilarity measures.
Limitations
● Computationally expensive for large datasets because it computes distances between
all pairs of points.
● Requires predefined k (number of clusters).
● Not suitable for very high-dimensional data without dimensionality reduction.
Common Applications
● Market segmentation in retail.
● Image and document clustering.
● Fraud detection (outliers handled well).
● Biological data clustering (e.g., gene expression data).
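The swap-based procedure can be sketched in plain Python. This is a simplified greedy variant of PAM (it accepts any swap that lowers the total cost rather than searching for the single best swap each round), uses Euclidean distance, and starts from the first k points; the data, including the outlier at (25, 25), are made up.

```python
import math
from itertools import product


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def k_medoids(points, k):
    """Greedy swap-based K-Medoids (a simplified variant of PAM)."""
    medoids = list(points[:k])  # initial medoids: here simply the first k points
    cost = total_cost(points, medoids)
    improved = True
    while improved:  # stop when no swap lowers the total cost
        improved = False
        for i, candidate in product(range(k), points):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]  # swap medoid i for a non-medoid
            trial_cost = total_cost(points, trial)
            if trial_cost < cost:
                medoids, cost, improved = trial, trial_cost, True
    # Final assignment of each point to its nearest medoid.
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: math.dist(p, m))].append(p)
    return medoids, clusters


points = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9), (25, 25)]
medoids, clusters = k_medoids(points, k=2)
print("Medoids:", medoids)
big = clusters[max(medoids)]
print("Cluster containing the outlier:", big)
print("Its mean would be:", tuple(sum(axis) / len(big) for axis in zip(*big)))
```

The medoid of the cluster that contains the outlier stays at an actual data point, (9, 9), whereas that cluster's mean, the center K-Means would use, is pulled to (11.8, 11.8).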
24. What is Association Rule Learning in Machine learning? Write the applications of
Association Rule Learning in different fields with suitable examples.
Association Rule Learning (ARL) in Machine Learning
Definition:
Association Rule Learning is an unsupervised machine learning technique used to discover
interesting relationships, patterns, or correlations between items in large datasets. It is
widely used in market basket analysis, where it helps identify products that are frequently
bought together.
Format of an Association Rule:
$A \rightarrow B$
● Meaning: If A occurs, B is likely to occur.
● Key metrics:
1. Support – Fraction of transactions that contain both A and B.
2. Confidence – Probability that B occurs when A occurs: Support(A ∪ B) / Support(A).
3. Lift – Confidence(A → B) / Support(B); lift > 1 means B is more likely when A is present than in general.
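The three metrics can be computed directly from their definitions. In this plain-Python sketch the transactions are made up, and `support`, `confidence` and `lift` are hypothetical helpers.

```python
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"milk"},
    {"eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(a, b):
    """P(B | A) = support(A ∪ B) / support(A)."""
    return support(a | b) / support(a)


def lift(a, b):
    """confidence(A -> B) / support(B); > 1 means A and B co-occur more than by chance."""
    return confidence(a, b) / support(b)


A, B = {"bread"}, {"butter"}
print(f"support={support(A | B):.2f} confidence={confidence(A, B):.2f} lift={lift(A, B):.2f}")
```

For bread → butter this gives support 0.50, confidence 1.00 and lift 2.00: every bread buyer also bought butter, twice as often as butter appears overall.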
Applications of Association Rule Learning
Field | Example / Use Case
Retail / Market Basket Analysis | Customers who buy bread also buy butter.
E-commerce | Used for product recommendations.
Healthcare | Discovering patterns in patient data: Patients with symptom A often develop disease B. Helps in early diagnosis.
Banking / Finance | Fraud detection: Transactions with unusual patterns can indicate fraudulent activity.
Web & Social Media | Recommendation systems: Users who like movie X are likely to like movie Y.
Manufacturing | Defect analysis: Products with component A often fail when component B is present.
Summary
Association Rule Learning helps organizations to:
● Discover hidden patterns in large datasets.
● Make data-driven decisions like cross-selling, recommendations, and fraud detection.
● Improve customer experience and operational efficiency.
25. Explain Hierarchical Clustering.
Hierarchical Clustering
Definition:
Hierarchical Clustering is an unsupervised machine learning algorithm used to group similar
data points into clusters based on distance or similarity measures. Unlike K-Means, it does
not require specifying the number of clusters beforehand.
It builds a hierarchy (tree) of clusters, which can be visualized using a dendrogram.
Types of Hierarchical Clustering
1. Agglomerative (Bottom-Up)
○ Starts with each data point as a single cluster.
○ Iteratively merges the two closest clusters until all points form a single cluster
or a stopping criterion is met.
2. Divisive (Top-Down)
○ Starts with all data points in one cluster.
○ Iteratively splits clusters into smaller clusters until each point is in its own
cluster or a stopping criterion is met.
Steps of Hierarchical Clustering
1. Compute a distance matrix for all data points (e.g., Euclidean distance).
2. Choose a linkage criterion to measure cluster similarity (e.g., single, complete,
average).
3. Merge or split clusters based on the chosen criterion.
4. Repeat until all data points are clustered into a hierarchy.
5. Represent the results using a dendrogram, showing clusters and their merge/split
levels.
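A naive agglomerative version of these steps can be sketched in plain Python. It recomputes linkage distances on every pass (simple but slow) and supports only single and complete linkage; average linkage would average the pairwise distances instead. The points are made up, and the returned merge history holds the heights a dendrogram would draw.

```python
import math


def agglomerative(points, linkage="single"):
    """Bottom-up clustering; returns the merge history as (cluster_a, cluster_b, distance)."""
    clusters = [[p] for p in points]               # start: every point is its own cluster
    combine = min if linkage == "single" else max  # single = closest pair, complete = farthest pair
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = combine(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]    # merge the two closest clusters
        del clusters[j]                            # j > i, so index i stays valid
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (12, 0)]
for a, b, d in agglomerative(points):
    print(f"merge {a} + {b} at height {d:.2f}")
```

For these points the merges happen at heights 1.00, 1.00, 6.40 and 8.60; cutting the dendrogram between 1 and 6.4 would leave three clusters.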
Advantages
● Does not require specifying the number of clusters in advance.
● Produces a tree-like structure (dendrogram) which gives insight into the data
hierarchy.
● Can work with different types of distance metrics.
Limitations
● Computationally expensive for large datasets: storing the distance matrix needs O(n²) memory, and a naive agglomerative implementation takes O(n³) time.
● Sensitive to noise and outliers.
● Once a merge or split is done, it cannot be undone (no backtracking).
Applications
● Gene expression analysis in bioinformatics.
● Customer segmentation in marketing.
● Document or text clustering in NLP.
● Image segmentation and pattern recognition.