Machine Learning Concepts and Techniques

Uploaded by

Manshi Singh

GURU TEGH BAHADUR 4TH CENTENARY ENGINEERING COLLEGE

Session: 2022-2026

Assignment-1
Machine Learning (CIE-421T)
Part A – Short Answer
Q-1(a) What do you understand by noise in data? What could be implications on the result, if noise is not
treated properly?

Ans- Noise refers to irrelevant, random, or meaningless data that does not represent the true characteristics of the
underlying pattern.

Implications:

• Leads to poor model accuracy and unreliable predictions.
• Increases the risk of overfitting, as the model tries to learn random fluctuations.

Q-1(b) What do you understand by overfitting of data? Give any two methods to avoid overfitting.

Ans- Overfitting occurs when a model learns not only the underlying pattern but also the noise in the training data,
performing well on training data but poorly on unseen data.

Methods to avoid overfitting:

• Regularization (L1/L2 penalties, dropout).
• Cross-validation, or using more training data.

Q-1(c) When should we use classification over regression? Explain using example.

Ans- Classification is used when the output variable is categorical (discrete labels). Regression is used when the
output variable is continuous numeric.

Example:

• Classification: Predicting if an email is Spam or Not Spam.
• Regression: Predicting the price of a house based on area, location, etc.

Q-1(d) Define the terms – Precision, Recall, F1-score and Accuracy.

Ans- With TP, FP, TN and FN denoting true positives, false positives, true negatives and false negatives:

• Precision: Fraction of correctly predicted positive cases out of all predicted positives. Precision = TP / (TP + FP)
• Recall: Fraction of correctly predicted positive cases out of all actual positives. Recall = TP / (TP + FN)
• F1-score: Harmonic mean of Precision and Recall. F1 = 2 × (Precision × Recall) / (Precision + Recall)
• Accuracy: Proportion of correctly classified instances. Accuracy = (TP + TN) / (TP + FP + TN + FN)
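The four definitions above can be checked with a few lines of Python. This is a minimal sketch: the function name and the confusion-matrix counts are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, f1, accuracy) from confusion-matrix counts."""
    # Guard against empty denominators (e.g. no predicted positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy

# Hypothetical counts: 40 TP, 10 FP, 45 TN, 5 FN.
precision, recall, f1, accuracy = classification_metrics(40, 10, 45, 5)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

With these counts the precision is 40/50 and the accuracy is 85/100. The F1-score lies between precision and recall, closer to the smaller of the two.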

Q-1(e) Define LDA and mention any two limitations.

Ans- LDA (Linear Discriminant Analysis) is a supervised dimensionality reduction and classification technique that
projects data onto a lower-dimensional space by maximizing the separation between classes while minimizing
within-class variance.

Limitations:

• Assumes classes are normally distributed with equal covariance matrices (may not hold in real data).
• Performs poorly if classes are not linearly separable.

Part B – Descriptive / Analytical


Q-2(a) Differentiate between Supervised Learning and Unsupervised Learning.

Ans-

| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Definition | Learning with labeled data (input–output pairs given). | Learning with unlabeled data (only input, no output labels). |
| Goal | Predict outputs for new inputs (classification/regression). | Find hidden patterns, structure, or groupings. |
| Examples | Predicting house prices, spam detection, disease diagnosis. | Customer segmentation, market basket analysis, anomaly detection. |
| Algorithms | Linear Regression, Logistic Regression, Decision Trees, SVM. | k-Means, Hierarchical Clustering, PCA. |
| Output | Predict a class label or continuous value. | Discover clusters or reduce dimensionality. |

Q-3(b) Explain Generative Probabilistic Classification.

Ans- A classification approach where we model the joint probability distribution P(x, y) and then use Bayes’
theorem to compute the posterior probability P(y | x).

Working:

• Estimate the likelihood P(x | y).
• Estimate the prior probability P(y).
• Use Bayes’ theorem: P(y | x) = P(x | y) × P(y) / P(x)
• Assign the class with the maximum posterior probability (MAP estimation).

Example:

• Naïve Bayes classifier assumes features are conditionally independent given the class.
• If class = {Spam, Not Spam}, we compute which class has the higher posterior probability for the words in a given email.

Advantage: Works well even with small data, simple and fast.

Limitation: Strong independence assumptions may reduce accuracy.
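The working steps can be sketched as a tiny Naïve Bayes spam filter. This is an illustration under stated assumptions: the four training emails are invented, add-one (Laplace) smoothing is added so unseen words do not give zero probability, and P(x) is dropped because it is the same for every class and does not change which class wins.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (words, label). Returns (priors, word_counts, vocab)."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)   # per-class word frequencies
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: n / len(docs) for c, n in label_counts.items()}  # P(y)
    return priors, word_counts, vocab

def predict(words, priors, word_counts, vocab):
    """Return the class with the highest log-posterior score (MAP)."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in words:
            # P(word | class) with add-one smoothing (an assumption of this sketch).
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Invented toy corpus.
toy_emails = [
    (["win", "money", "now"], "Spam"),
    (["free", "money", "offer"], "Spam"),
    (["meeting", "schedule", "today"], "Not Spam"),
    (["project", "meeting", "notes"], "Not Spam"),
]
model = train_naive_bayes(toy_emails)
print(predict(["free", "money"], *model))     # words seen only in spam
print(predict(["meeting", "notes"], *model))  # words seen only in non-spam
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.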

Q-4(c) Discuss Bagging and Boosting.

Ans- Bagging (Bootstrap Aggregating):

• Train multiple models on different bootstrapped subsets of the data (sampled with replacement).
• Aggregate results (majority vote for classification, average for regression).
• Reduces variance, which helps limit overfitting.
• Example: Random Forest = Bagging of Decision Trees.

Boosting:

• Sequentially train models where each new model focuses on correcting errors made by the previous ones.
• Combine weak learners into a strong learner (weighted voting).
• Primarily reduces bias, and can also reduce variance.
• Examples: AdaBoost, Gradient Boosting, XGBoost.

Difference:

• Bagging = models trained independently (can run in parallel), mainly reduces variance.
• Boosting = models trained sequentially, mainly reduces bias and can also reduce variance.
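To make the bagging half concrete, here is a minimal sketch: bootstrap samples, one simple model per sample, and a majority vote. The 1-D dataset, the decision-stump base learner and all function names are assumptions chosen for illustration. A Random Forest would use full decision trees plus random feature selection.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a 1-D decision stump: predict 1 if x > threshold, else 0."""
    xs = sorted({x for x, _ in sample})
    # Candidate thresholds: midpoints between neighbouring values
    # (fall back to the single value if the sample has one distinct x).
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or xs

    def errors(t):
        return sum((x > t) != bool(y) for x, y in sample)

    return min(candidates, key=errors)

def bagging_fit(data, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) points with replacement.
        sample = [rng.choice(data) for _ in data]
        stumps.append(fit_stump(sample))
    return stumps

def bagging_predict(stumps, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(int(x > t) for t in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical data: small x values are class 0, large x values are class 1.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
stumps = bagging_fit(data)
print(bagging_predict(stumps, 2), bagging_predict(stumps, 8))
```

Each stump is fitted without reference to the others, which is why bagging can be run in parallel. Boosting cannot, because each round depends on the errors of the previous one.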

Q-5(a). Explain Bayesian Estimation and Maximum Likelihood Estimation in generative learning.
Ans- Maximum Likelihood Estimation (MLE):

• Finds the parameters that maximize the likelihood of observing the given data. It ignores prior information.
• Example: For a Gaussian distribution, estimate the mean μ and variance σ² that maximize the likelihood of the data.
• Limitation: Can overfit when data is scarce, and ignores prior knowledge.

Bayesian Estimation:

• Uses Bayes’ theorem to combine the likelihood with a prior probability, giving a posterior distribution for the
parameters.
• Provides a distribution over parameters instead of a single estimate.
• Advantage: Handles uncertainty better; the prior acts as a regularizer, which reduces overfitting.
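The contrast is easiest to see on a coin-flip (Bernoulli) model rather than the Gaussian mentioned above. This sketch assumes a Beta(2, 2) prior, chosen only for illustration. With a Beta prior and Bernoulli data the posterior is again a Beta distribution, so it can be written down directly.

```python
def mle_bernoulli(heads, flips):
    """MLE of P(heads): the observed frequency."""
    return heads / flips

def beta_posterior(heads, flips, a=2.0, b=2.0):
    """Beta(a, b) prior + Bernoulli data -> Beta(a + heads, b + tails) posterior."""
    return a + heads, b + (flips - heads)

# Hypothetical data: only 3 flips, all heads.
heads, flips = 3, 3
p_mle = mle_bernoulli(heads, flips)

post_a, post_b = beta_posterior(heads, flips)
p_bayes = post_a / (post_a + post_b)   # posterior mean of P(heads)

print(f"MLE estimate:            {p_mle:.3f}")
print(f"Bayesian posterior mean: {p_bayes:.3f}  (posterior is Beta({post_a}, {post_b}))")
```

With three heads in three flips, MLE concludes the coin always lands heads, which is the overfitting risk noted above. The Bayesian estimate is pulled toward the prior, and it moves toward the MLE as more data arrives.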

Q-6(b). Explain the Decision Tree Algorithm with example.

Ans- A decision tree is a supervised learning algorithm that splits data into branches based on attribute values.
Each internal node represents an attribute, branches represent decisions, and leaves represent outcomes.

• Select the best feature to split using metrics like Information Gain (Entropy) or Gini Index.
• Create a decision node for the chosen feature.
• Split the dataset into subsets.
• Repeat recursively until a stopping criterion is met (pure nodes or depth limit).

Example: Predicting “Play Tennis” from weather attributes.

Dataset (Weather → Play Tennis):

• Features: Outlook (Sunny, Overcast, Rain), Temperature, Humidity, Wind.
• Target: Play (Yes/No).

Advantages: Easy to interpret and visualize.

Disadvantage: Prone to overfitting.
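The first step, choosing the best feature to split on, can be sketched with entropy and information gain. The six rows below are an invented miniature of the Play Tennis data using only two of its features. They are not the standard dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Invented toy rows.
rows = [
    {"Outlook": "Sunny",    "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Overcast", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Strong", "Play": "No"},
]

gains = {f: information_gain(rows, f, "Play") for f in ("Outlook", "Wind")}
best_feature = max(gains, key=gains.get)
print(gains, "-> split on", best_feature)
```

On these rows Outlook gives the larger gain, because the Sunny and Overcast branches are already pure. The tree would split on Outlook first and then recurse into the Rain branch.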

Part C – Long Answer


Q-4(a) Write a short note on Support Vector Machine (SVM).

Ans: SVM is a supervised learning algorithm that classifies data by finding an optimal hyperplane.

Working Principle:

• It finds an optimal hyperplane that best separates data points of different classes.
• For linearly separable data, the hyperplane maximizes the margin, i.e. the distance between the hyperplane and
the nearest data points of each class (these nearest points are called support vectors).

Types:

• Linear SVM – works when data is linearly separable.
• Non-linear SVM – uses kernel functions (RBF, polynomial) to implicitly map data into a higher-dimensional space
where it can be separated linearly.

Advantages: Effective in high-dimensional spaces, including cases where the number of dimensions exceeds the
number of samples.

Applications: Text classification, image recognition, bioinformatics.
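The margin and support-vector ideas can be illustrated numerically. In this sketch the weights w and bias b are hand-picked for four made-up 2-D points. They are not the output of an SVM solver. Under the usual scaling, support vectors satisfy y(w·x + b) = 1 and the margin width is 2/||w||.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical separating hyperplane x1 + x2 = 2, scaled so the
# closest points have |w.x + b| = 1.
w, b = (0.5, 0.5), -1.0

# Made-up training points with labels +1 / -1.
points = [((2, 2), 1), ((3, 3), 1), ((0, 0), -1), ((-1, 0), -1)]

predictions = [1 if decision_value(w, b, x) >= 0 else -1 for x, _ in points]

# Support vectors lie exactly on the margin: y * (w.x + b) == 1.
support_vectors = [x for x, y in points
                   if math.isclose(y * decision_value(w, b, x), 1.0)]

margin_width = 2 / math.sqrt(sum(wi * wi for wi in w))

print("predictions:    ", predictions)
print("support vectors:", support_vectors)
print("margin width:   ", round(margin_width, 3))
```

Only the two points closest to the boundary determine the hyperplane. Moving the other two points further away would not change it.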

Q-5(b) Explain Logistic Regression.

Ans: Logistic regression is a classification algorithm that outputs a probability between 0 and 1, used when the
dependent variable is categorical.

Concept:

• Instead of predicting values directly (like linear regression), it predicts the probability of a data point
belonging to a class.
• Uses the sigmoid function: P(y = 1 | x) = 1 / (1 + e^(−(β0 + β1x)))

Decision Rule: If probability > 0.5 → Class 1, else Class 0.

Advantages: Simple, interpretable, works well for binary classification.

Applications: Spam detection, disease prediction, customer churn analysis.
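A minimal sketch of the sigmoid and the 0.5 decision rule for a single feature x. The coefficients β0 = −4 and β1 = 2 are made-up values, not fitted to any data.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta0, beta1):
    """P(y = 1 | x) for a one-feature logistic regression model."""
    return sigmoid(beta0 + beta1 * x)

def predict_class(x, beta0, beta1, threshold=0.5):
    """Decision rule: class 1 if the probability exceeds the threshold."""
    return 1 if predict_proba(x, beta0, beta1) > threshold else 0

# Hypothetical coefficients (in practice learned by maximum likelihood).
beta0, beta1 = -4.0, 2.0

for x in (1.0, 2.0, 3.0):
    p = predict_proba(x, beta0, beta1)
    print(f"x={x}: P(y=1|x)={p:.3f} -> class {predict_class(x, beta0, beta1)}")
```

At x = 2 the linear term is exactly zero, so the probability is 0.5. This is the decision boundary for these coefficients.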

Q-6(c) Write the AdaBoost Algorithm.

Ans: AdaBoost (Adaptive Boosting) is an ensemble method that combines multiple weak classifiers to form a
strong classifier.

Algorithm Steps:

• Start by assigning equal weights to all training samples.
• Train a weak classifier (e.g., a decision stump).
• Increase the weights of misclassified samples.
• Train the next classifier, focusing more on difficult cases.
• The final model is formed by a weighted vote of all classifiers.

Advantages: Often improves accuracy substantially over a single weak classifier, mainly by reducing bias (it can
also reduce variance).

Limitation: Sensitive to noisy data.

Example: Used in face detection in computer vision.
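The steps can be sketched in code using the standard discrete AdaBoost update. Each classifier gets a vote weight alpha = 0.5 × ln((1 − err) / err), and each sample weight is multiplied by exp(−alpha × y × h(x)) and then renormalised. The 1-D toy dataset and the decision-stump weak learner are assumptions made for illustration.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: `polarity` left of the threshold, `-polarity` right of it."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    # Thresholds halfway between consecutive points (xs assumed sorted, unit-spaced).
    for threshold in [x + 0.5 for x in xs[:-1]]:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost_fit(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n                      # equal initial weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)  # train weak classifier
        err = max(err, 1e-10)                    # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this classifier
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples (y * h(x) = -1) are multiplied by exp(+alpha).
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]   # renormalise to sum to 1
    return ensemble

def adaboost_predict(ensemble, x):
    """Final model: sign of the alpha-weighted vote."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy labels (+1 / -1) that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(xs, ys, rounds=3)
print([adaboost_predict(ensemble, x) for x in xs])
```

On this dataset the first stump alone misclassifies three points, while the weighted vote of three stumps classifies every training point correctly.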


Q-7(a) What are the Goals of Machine Learning?

Ans:

• Automation of tasks – reduce human effort by making machines learn from data.
• Prediction – forecast future trends (e.g., stock prices, disease risk).
• Classification – assign data into categories (e.g., spam filtering).
• Clustering/Pattern discovery – find hidden structures in data (e.g., customer segmentation).
• Decision making – assist in making intelligent decisions based on data.
• Adaptation – improve system performance automatically with experience.

Q-8(b) Explain Overfitting.

Ans: Overfitting occurs when a model learns the training data too closely, even memorizing noise and irrelevant
details.

Symptoms: High accuracy on training data and low accuracy on test data (poor generalization).

Causes: Model is too complex or training data is insufficient.

Solutions:

• Use cross-validation.
• Apply regularization (L1, L2).
• Prune decision trees.

Example: Memorizing exam questions leads to a high score in practice (training), but failure in a new exam (test).

Q-9(c) What is Nearest Neighbor?

Ans: k-Nearest Neighbors (k-NN) is a supervised learning algorithm used for classification and regression, based on
the closest data points in feature space.

Algorithm (KNN):

• Choose a value of k.
• Calculate the distance (Euclidean, Manhattan) between the test point and all training points.
• Select the k nearest neighbors.
• For classification: assign the most frequent class among the neighbors.
• For regression: take the average of the neighbors’ values.

Advantages: Simple, has no explicit training phase, and works for both classification and regression.

Limitations: Computationally expensive for large datasets, sensitive to noise.

Example: Handwritten digit recognition (MNIST dataset).
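The algorithm above fits in a few lines of Python. This sketch uses Euclidean distance via `math.dist` (Python 3.8+), and the 2-D and 1-D training points are made up for illustration.

```python
import math
from collections import Counter

def nearest(train, query, k):
    """Return the k training items closest to the query (Euclidean distance)."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]

def knn_classify(train, query, k=3):
    """Classification: most frequent label among the k nearest neighbors."""
    labels = [label for _, label in nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression: average target value of the k nearest neighbors."""
    return sum(value for _, value in nearest(train, query, k)) / k

# Hypothetical labelled points: two well-separated groups.
train_cls = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
             ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_classify(train_cls, (2, 2)), knn_classify(train_cls, (6, 5)))

# Hypothetical 1-D regression data.
train_reg = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
print(knn_regress(train_reg, (2,)))
```

Every prediction scans and sorts the whole training set, which is the computational cost noted in the limitations.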

Q-10(d) Describe the Limitations of Perceptron Model.


Ans:

• Linear separability: The major limitation of the perceptron is that it can only classify linearly separable
data. If the classes cannot be separated by a straight line or hyperplane (as in the XOR problem), the
perceptron fails, because it cannot represent non-linear decision boundaries.
• No probabilistic interpretation: It produces a hard decision (0 or 1) rather than probabilities, which
makes it unsuitable for tasks requiring uncertainty estimation.
• Fixed learning rate: The learning rate is fixed, and choosing an inappropriate value can cause slow
convergence or oscillations during training.
• Single-layer structure: It has a single-layer architecture, meaning it lacks the hidden layers required to
model complex relationships between inputs and outputs.
• Overfitting and noise: Susceptible if trained too long or on noisy data. The perceptron is sensitive to noisy
and overlapping data, which can lead to incorrect classifications.
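The linear-separability limitation can be demonstrated directly. This sketch assumes a step activation, zero initial weights, a learning rate of 1.0 and a cap of 50 epochs. It trains the same perceptron on AND (linearly separable) and on XOR (not linearly separable).

```python
def train_perceptron(samples, epochs=50, lr=1.0):
    """Perceptron learning rule; returns (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            prediction = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - prediction          # -1, 0 or +1
            if error:
                # Move the boundary toward the misclassified point.
                w[0] += lr * error * x1
                w[1] += lr * error * x2
                b += lr * error
                mistakes += 1
        if mistakes == 0:                        # a full pass with no errors
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND)
_, _, xor_converged = train_perceptron(XOR)
print("AND converged:", and_converged)
print("XOR converged:", xor_converged)
```

AND converges within a few epochs. XOR never reaches an error-free pass regardless of the number of epochs, since no single line separates its two classes.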
