
ML Paper Solution

The document provides an overview of key concepts in machine learning, including confusion matrix, distribution of error, clustering, reinforcement learning, regression, and semi-supervised learning. It discusses the lifecycle of machine learning, detailing steps from problem definition to model monitoring and maintenance, as well as applications in various fields such as healthcare, finance, and autonomous vehicles. Additionally, it introduces the Naive Bayes theorem as a classification algorithm that assumes feature independence.

Uploaded by

chiman Saini
Copyright
© All Rights Reserved

1. (a) What is a confusion matrix?

A confusion matrix is a simple table used to measure how well a classification model is performing. It compares the
predictions made by the model with the actual results and shows where the model was right or wrong. This helps
you understand where the model is making mistakes so you can improve it. For a binary classifier, it breaks the
predictions down into four categories:
• True Positive (TP): The model predicted a positive outcome and the actual outcome was positive.
• True Negative (TN): The model predicted a negative outcome and the actual outcome was negative.
• False Positive (FP): The model predicted a positive outcome but the actual outcome was negative. This is
also known as a Type I error.
• False Negative (FN): The model predicted a negative outcome but the actual outcome was positive. This is
also known as a Type II error.
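
The four categories can be counted directly from paired lists of actual and predicted labels. The following is a minimal sketch; the function name `confusion_counts` and the sample labels are made up for illustration.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct (TP) or a Type I error (FP)
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a Type II error (FN) or correct (TN)
            counts["FN" if a == positive else "TN"] += 1
    return {name: counts[name] for name in ("TP", "TN", "FP", "FN")}

# Hypothetical labels: 1 = positive, 0 = negative
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))
```

For these sample labels the model is right six times, raises one false alarm and misses one positive case.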

(b) What is Distribution of Error?

The distribution of errors is a key concept in evaluating the performance of a machine learning model. Errors refer
to the difference between the predicted values and the actual values (or ground truth) from a dataset. Analyzing the
distribution of errors helps in understanding the behavior of the model and can provide insights into whether the
model is performing well or if there are issues like bias, variance, or outliers.

Types of Errors

Bias: This refers to the error introduced by the model's assumptions. High bias means the model is too simplistic
and underfits the data. Underfitting typically occurs when the model cannot capture the underlying patterns in the
data.

Variance: This refers to the error introduced by the model's sensitivity to small fluctuations in the training data.
High variance means the model is too complex and overfits the data, capturing noise and fluctuations as if they were
real patterns.

Residuals: The residuals are the differences between the actual values and the predicted values, i.e., the errors made
by the model.
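
As a rough illustration of looking at the distribution of errors, the sketch below computes residuals and two summary numbers. The values are hypothetical. A mean residual far from zero hints at systematic error (bias), while a large spread hints at unstable predictions or outliers.

```python
from statistics import mean, stdev

# Hypothetical actual and predicted values from a regression model
actual = [3.0, 5.0, 7.5, 9.0, 11.0]
predicted = [2.5, 5.5, 7.0, 9.5, 10.0]

# Residual = actual value minus predicted value
residuals = [a - p for a, p in zip(actual, predicted)]

print("residuals:", residuals)
print("mean residual:", mean(residuals))   # near 0 suggests little systematic error
print("spread (std dev):", stdev(residuals))
```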

(c) What is the purpose of Clustering?

Clustering is an unsupervised machine learning technique that groups similar data points together into clusters based
on their characteristics, without using any labeled data. The objective is to ensure that data points within the same
cluster are more similar to each other than to those in different clusters, enabling the discovery of natural groupings
and hidden patterns in complex datasets.
• Goal: Discover the natural grouping or structure in unlabeled data without predefined categories.
• How: Data points are assigned to clusters based on similarity or distance measures.
• Similarity Measures: Can include Euclidean distance, cosine similarity or other metrics depending
on data type and clustering method.
• Output: Each group is assigned a cluster ID, representing shared characteristics within the cluster.
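
One common way to realise these points is k-means, which the text above does not name; it is used here only as an example of distance-based clustering. The sketch assumes Euclidean distance, two clusters and hand-picked starting centroids, and the data points are hypothetical.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Tiny k-means: returns a cluster ID per point and the final centroids."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical unlabeled 2-D data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)      # cluster ID for each point
print(centroids)
```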
(d) What is Reinforcement Learning?

Reinforcement Learning (RL) is a branch of machine learning that focuses on how agents can learn to make
decisions through trial and error to maximize cumulative rewards. RL allows machines to learn by interacting with
an environment and receiving feedback based on their actions. This feedback comes in the form of rewards or
penalties.

Reinforcement Learning revolves around the idea that an agent (the learner or decision-maker) interacts with an
environment to achieve a goal. The agent performs actions and receives feedback to optimize its decision-making
over time.
• Agent: The decision-maker that performs actions.
• Environment: The world or system in which the agent operates.
• State: The situation or condition the agent is currently in.
• Action: The possible moves or decisions the agent can make.
• Reward: The feedback or result from the environment based on the agent’s action.
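
The agent, environment, state, action and reward can be seen together in a small Q-learning sketch. Q-learning is one specific RL algorithm chosen here for illustration; the corridor environment, the function names and all parameter values are assumptions, not part of the notes above.

```python
import random

ACTIONS = (-1, +1)   # move left or move right
GOAL = 4             # states are 0..4; reaching state 4 ends the episode

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = max(0, min(GOAL, state + action))
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon, rng):
    """Agent: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted future value
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
# Learned policy: the best action in each non-terminal state
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Through trial and error the agent learns that moving right (+1) from every state leads to the reward.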

(e) What is Regression in Machine Learning?

Regression in machine learning is a supervised learning technique used to predict continuous numerical values by
learning relationships between input variables (features) and an output variable (target). It helps understand how
changes in one or more factors influence a measurable outcome and is widely used in forecasting, risk analysis,
decision-making and trend estimation.
• Works with real-valued output variables.
• Helps to identify the strength and type of relationships between variables.
• Supports both simple and complex predictive models.
• Used for tasks like price prediction, trend forecasting and risk scoring.
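
As a small worked case, simple linear regression with one feature can be fitted by ordinary least squares. This is a sketch with made-up data that lies exactly on the line y = 2x + 1; the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical feature and target values
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)            # learned relationship
print(slope * 6 + intercept)       # prediction for a new input x = 6
```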
(f) What is semi-supervised learning?

Semi-supervised learning is a hybrid machine learning approach which uses both supervised and unsupervised
learning. It uses a small amount of labelled data combined with a large amount of unlabelled data to train models.
The goal is to learn a function that accurately predicts outputs based on inputs, similar to supervised learning, but
with much less labelled data.

Semi-supervised learning is particularly valuable when acquiring labelled data is expensive or time-consuming, yet
unlabelled data is plentiful and easy to collect.
• Supervised learning: Like a student who is taught every concept by a teacher, both in class and at home.
• Unsupervised learning: Like a student who works out concepts, such as how to solve a math problem,
independently and without instruction.
• Semi-supervised learning: A mix where the teacher provides some concepts in class and the student
practices with homework assignments based on those concepts.

Unit-1
2. What is Machine Learning? Discuss the need for and features of Machine Learning.

Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns within
datasets. It allows them to make predictions on new, similar data without explicit programming for each task.
Machine learning finds applications in diverse fields such as image and speech recognition, natural language
processing, recommendation systems, fraud detection, portfolio optimization, and automating tasks.

• Handles Massive Data: Machine learning works well with large data and finds patterns that humans might
miss.
• Adapts Dynamically: Systems evolve with new data, staying relevant in changing environments.
• Drives Smarter Decisions: From predicting customer behavior to detecting fraud, ML enhances
decision-making with data-driven insights.
• Personalizes Experiences: Recommendation systems, like those on Netflix or Amazon, tailor suggestions
to individual preferences.
Need for Machine Learning
Machine Learning is needed because traditional programming methods are often insufficient to handle complex,
large-scale, and dynamic problems. Key reasons include:
1. Handling Large Volumes of Data
Modern systems generate massive amounts of data (big data) that cannot be efficiently analyzed using
manual or rule-based approaches.
2. Automation of Decision-Making
ML enables systems to make decisions automatically without explicit programming, reducing human effort
and errors.
3. Learning from Experience
ML systems improve their performance over time by learning from past data, unlike static traditional
programs.
4. Complex Problem Solving
Problems like speech recognition, image processing, fraud detection, and recommendation systems are too
complex for fixed rules.
5. Adaptability to Change
ML models can adapt to changing environments and patterns, such as changing customer behavior or
market trends.
6. Cost and Time Efficiency
Once trained, ML systems can process data and make predictions faster and at lower cost than manual
methods.
Features of Machine Learning
1. Data-Driven Learning
ML algorithms learn patterns directly from data rather than relying on explicitly written rules.
2. Self-Improvement
Performance improves as the system is exposed to more data and experience.
3. Generalization Ability
ML models can make accurate predictions on unseen or new data.
4. Automation
Reduces the need for human intervention in repetitive or complex tasks.
5. Scalability
ML systems can handle increasing amounts of data efficiently.
6. Pattern Recognition
ML excels at identifying hidden patterns and relationships in data.
7. Multiple Learning Types
Supports supervised, unsupervised, semi-supervised, and reinforcement learning methods.
8. Real-Time Processing
ML can be applied in real-time applications such as spam detection and recommendation engines.

3. (a) Explain the life cycle of Machine Learning

The Machine Learning lifecycle is a structured process that defines how machine learning (ML) models are developed,
deployed and maintained. It consists of a series of steps that ensure the model is accurate, reliable and scalable.
It includes defining the problem, collecting and preparing data, exploring patterns, engineering features, training
and evaluating models, deploying them into production and continuously monitoring performance to handle issues
like data drift and retraining needs. Below are the key steps of the ML lifecycle:
Step 1: Problem Definition
The first step is identifying and clearly defining the business problem. A well-framed problem provides the
foundation for the entire lifecycle. Project objectives, desired outcomes and the scope of the task are defined
carefully during this stage.
• Collaborate with stakeholders to understand business goals
• Define project objectives, scope and success criteria
• Ensure clarity in desired outcomes
Step 2: Data Collection
The data collection phase involves systematically gathering datasets that serve as raw data for training the model.
The quality and variety of the data directly affect the model’s performance.
Here are some basic features of Data Collection:
• Relevance: Collected data should be relevant to the defined problem and include the necessary features.
• Quality: Ensure data quality by considering factors like accuracy and ethical use.
• Quantity: Gather sufficient data volume to train a robust model.
• Diversity: Include diverse datasets to capture a broad range of scenarios and patterns.
Step 3: Data Cleaning and Preprocessing
Raw data is often messy and unstructured, and training on it directly can lead to poor accuracy.
Data cleaning and preprocessing often involve:
• Data Cleaning: Address issues such as missing values, outliers and inconsistencies in the data.
• Data Preprocessing: Standardize formats, scale values and encode categorical variables for consistency.
• Data Quality: Ensure that the data is well-organized and prepared for meaningful analysis.
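
Three of the operations named above (filling missing values, scaling and encoding categories) can be sketched in a few lines. Mean imputation, min-max scaling and one-hot encoding are common choices assumed here, not the only ones; the helper names and sample values are hypothetical.

```python
def fill_missing(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode categories as 0/1 indicator columns (categories in sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

ages = fill_missing([20.0, None, 40.0])     # hypothetical column with a gap
print(ages)
print(min_max_scale(ages))
print(one_hot(["red", "blue", "red"]))      # columns: blue, red
```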
Step 4: Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is used to uncover patterns and characteristics hidden in the data and to understand
the dataset's structure. EDA reveals patterns, trends and insights that may not be visible to the naked eye. These
insights can be used to make informed decisions.
Here are the basic features of Exploratory Data Analysis:
• Exploration: Use statistical and visual tools to explore patterns in data.
• Patterns and Trends: Identify underlying patterns, trends and potential challenges within the dataset.
• Insights: Gain valuable insights for informed decision-making in later stages.
• Decision Making: Use EDA findings to guide feature engineering and model selection.
Step 5: Feature Engineering and Selection
Feature engineering and selection is a transformative process that involves creating informative features and keeping
only the relevant ones, to enhance model efficiency and prediction while reducing complexity.
Here are the basic features of Feature Engineering and Selection:
• Feature Engineering: Create new features or transform existing ones to capture patterns and relationships
better.
• Feature Selection: Identify the subset of features that most significantly impacts the model's performance.
• Domain Expertise: Use domain knowledge to engineer features that contribute meaningfully to prediction.
• Optimization: Balance the set of features for accuracy while minimizing computational complexity.
Step 6: Model Selection
Model selection is an important part of building a good machine learning model: we need to find a model that aligns
with the defined problem, the nature of the data, the complexity of the problem and the desired outcomes.
Here are the basic features of Model Selection:
• Complexity: Consider the complexity of the problem and the nature of the data when choosing a model.
• Decision Factors: Evaluate factors like performance, interpretability and scalability when selecting a
model.
• Experimentation: Experiment with different models to find the best fit for the problem.
Step 7: Model Training
With the model selected, the machine learning lifecycle moves to model training. This process involves exposing the
model to historical data, allowing it to learn patterns, relationships and dependencies within the dataset.
Here are the basic features of Model Training:
• Iterative Process: Train the model iteratively, adjusting parameters to minimize errors and enhance
accuracy.
• Optimization: Fine-tune the model to optimize its predictive capabilities.
• Validation: Validate the model on held-out data during training to check that it stays accurate on new,
unseen data.

Step 8: Model Evaluation and Tuning


Model evaluation involves rigorous testing against validation or test datasets to measure the accuracy of the model on
new, unseen data. It provides insight into the model's strengths and weaknesses. If the model fails to achieve the
desired performance levels, we may need to tune it again and adjust its hyperparameters to enhance predictive accuracy.
Here are the basic features of Model Evaluation and Tuning:
• Evaluation Metrics: Use metrics like accuracy, precision, recall and F1 score to evaluate model
performance.
• Strengths and Weaknesses: Identify the strengths and weaknesses of the model through rigorous testing.
• Iterative Improvement: Initiate model tuning to adjust hyperparameters and enhance predictive accuracy.
• Model Robustness: Repeat tuning until the desired levels of model robustness and reliability are reached.
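
The evaluation metrics listed above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts passed in are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted positives, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of actual positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts from a test set of 100 examples
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```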
Step 9: Model Deployment
Now the model is ready for deployment in a real-world application. Deployment involves integrating the predictive
model with existing systems, allowing the business to use it for informed decision-making.
Here are the basic features of Model Deployment:
• Integrate with existing systems
• Enable decision-making using predictions
• Ensure deployment scalability and security
• Provide APIs or pipelines for production use
Step 10: Model Monitoring and Maintenance
After deployment, models must be monitored to ensure they perform well over time. Regular tracking helps detect
data drift, accuracy drops or changing patterns, and retraining may be needed to keep the model reliable in
real-world use.
Here are the basic features of Model Monitoring and Maintenance:
• Track model performance over time
• Detect data drift or concept drift
• Update and retrain the model when accuracy drops
• Maintain logs and alerts for real-time issues
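
A very simple monitoring rule is to compare recent accuracy with the accuracy measured at deployment time. The sketch below is only one possible check; the function name, the tolerance of 0.05 and the accuracy figures are assumptions for illustration.

```python
def needs_retraining(recent_accuracies, baseline_accuracy, tolerance=0.05):
    """Flag the model when average recent accuracy falls too far below the baseline."""
    recent = sum(recent_accuracies) / len(recent_accuracies)
    return recent < baseline_accuracy - tolerance

# Hypothetical weekly accuracy figures after deployment
print(needs_retraining([0.91, 0.90, 0.92], baseline_accuracy=0.92))  # stable period
print(needs_retraining([0.84, 0.82, 0.80], baseline_accuracy=0.92))  # degraded period
```

In practice such a check would usually feed the logs and alerts mentioned above rather than trigger retraining on its own.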

(b) What are the applications of Machine Learning?

Machine Learning (ML) is one of the most significant advancements in the field of technology. It gives machines
the ability to learn from data and improve over time without being explicitly programmed. ML models identify
patterns from data and use them to make predictions or decisions.
Organizations use machine learning to automate tasks, make smarter decisions and gain valuable insights. ML is
shaping the world around us. Here are a few real-world applications of Machine Learning:

1. Healthcare and Medical Diagnosis


ML algorithms can analyze large volumes of patient data, medical scans and genetic information to aid in diagnosis
and treatment.
Applications:
• Disease Detection: ML models are used to identify diseases like cancer, pneumonia and Parkinson’s from
medical images. On some tasks, reported accuracy is comparable to that of human doctors.
• Predictive Analytics: By analyzing patient history and symptoms, models can predict the risk of certain
diseases or potential complications.
• Drug Discovery: ML accelerates the drug development process by predicting how different compounds
will interact, reducing the time and cost of research.

2. Smart Assistants and Human-Machine Interaction


Virtual assistant systems rely on natural language processing (NLP) and speech recognition to understand
commands and respond intelligently.
Applications:
• Voice Assistants: Tools like Siri, Alexa and Google Assistant convert spoken input into actionable
commands.
• Voice Search & Transcription: ML enables users to perform hands-free web searches and get
transcription during meetings or phone calls.
• Chatbots: Businesses use AI-powered chatbots for 24/7 customer support, helping resolve queries faster
and more efficiently.

3. Personalized Recommendations and User Experience


Modern digital platforms personalize content using recommender systems. Machine learning
models analyze user behavior to deliver relevant content, improving engagement and satisfaction.
Applications:
• Streaming Platforms: Netflix and Spotify suggest shows and songs based on your watching or listening
history.
• E-commerce: Sites like Amazon recommend products tailored to your preferences, browsing patterns and
past purchases.
• Social Media: Algorithms curate content feeds, prioritize posts and suggest friends or pages.
These systems use techniques like collaborative filtering and content-based filtering to create personalized digital
experiences.

4. Fraud Detection and Financial Forecasting


In finance, vast sums of money move digitally, and machine learning plays an important role in fraud detection and
market analysis.
Applications:
• Transaction Monitoring: Banks use ML models to detect unusual spending behavior and flag suspicious
transactions.
• Loan Risk Assessment: Credit scoring models analyze customer profiles and predict the likelihood of
default.
• Stock Market Prediction: ML is used to analyze historical stock data and forecast price movements.
Because stock markets are complex, algorithmic trading uses these predictions to support decision-making.

5. Autonomous Vehicles and Smart Mobility


Self-driving vehicles use ML to understand their environment, navigate safely and make immediate decisions.
Key Components:
• Computer Vision: Recognizing lanes, pedestrians, traffic signals and obstacles.
• Sensor Fusion: Combining data from cameras, LiDAR and radar for a 360-degree view.
• Behavior Prediction: Anticipating how other drivers or pedestrians may act.
Autonomous vehicles are capable of operating with minimal human input. Beyond cars, ML is also being used in
traffic optimization, smart navigation systems and predictive maintenance in transportation.

Unit-II

4. (a) What is the Naïve Bayes Theorem? Explain.

Naive Bayes is a machine learning classification algorithm that predicts the category of a data point using
probability. It assumes that all features are independent of each other given the class. Naive Bayes performs well
in many real-world applications such as spam filtering, document categorisation and sentiment analysis.

Here (for a two-dimensional example with two classes):
• Original data has two classes: green circles ($y = 1$) and red squares ($y = 2$).
• Estimate the probability distribution along the first dimension, i.e., $P(x_1 \mid y=1)$ and $P(x_1 \mid y=2)$
• Estimate the probability distribution along the second dimension, i.e., $P(x_2 \mid y=1)$ and $P(x_2 \mid y=2)$
• Combine both dimensions using conditional independence, i.e., $P(x \mid y) = \prod_{\alpha} P(x_\alpha \mid y)$

Key Features of Naive Bayes Classifiers


The main idea behind the Naive Bayes classifier is to use Bayes' Theorem to classify data based on the
probabilities of different classes given the features of the data. It is used mostly in high-dimensional text
classification.
• The Naive Bayes classifier is a simple probabilistic classifier with very few parameters, so models built
with it can be trained and can predict faster than many other classification algorithms.
• It is a probabilistic classifier because it predicts the class from estimated probabilities. It also assumes
that each feature is independent of every other feature given the class. In other words, each feature contributes
to the prediction with no relation to the others.
• The Naive Bayes algorithm is used in spam filtering, sentiment analysis, article classification and many more.
Why is it Called Naive Bayes?
It is called "Naive" because it assumes that the presence of one feature does not affect the other features. The
"Bayes" part of the name refers to its basis in Bayes’ Theorem.
Consider a fictional dataset that describes the weather conditions for playing a game of golf. Given the weather
conditions, each tuple classifies the conditions as fit(“Yes”) or unfit(“No”) for playing golf. Here is a tabular
representation of our dataset.

Outlook Temperature Humidity Windy Play Golf

0 Rainy Hot High False Yes

1 Rainy Hot High True No

2 Overcast Hot High False Yes

3 Sunny Mild High False No

4 Sunny Cool Normal False Yes

5 Sunny Cool Normal True No

6 Overcast Cool Normal True Yes

7 Rainy Mild High False No

8 Rainy Cool Normal False Yes

9 Sunny Mild Normal False Yes

10 Rainy Mild Normal True Yes

11 Overcast Mild High True Yes

12 Overcast Hot Normal False Yes

13 Sunny Mild High True No

The dataset is divided into two parts, i.e., the feature matrix and the response vector.
• The feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values
of the predictor features. In the above dataset, the features are ‘Outlook’, ‘Temperature’, ‘Humidity’ and ‘Windy’.
• The response vector contains the value of the class variable (prediction or output) for each row of the feature
matrix. In the above dataset, the class variable name is ‘Play Golf’.
Assumption of Naive Bayes
Naive Bayes makes the following assumptions:
• Feature independence: This means that when we are trying to classify something, we assume that each
feature (or piece of information) in the data does not affect any other feature, given the class.
• Continuous features are normally distributed: If a feature is continuous, then it is assumed to be
normally distributed within each class.
• Discrete features have multinomial distributions: If a feature is discrete, then it is assumed to have a
multinomial distribution within each class.
• Features are equally important: All features are assumed to contribute equally to the prediction of the
class label.
• No missing data: The data should not contain any missing values.
Introduction to Bayes' Theorem

Bayes’ Theorem provides a principled way to reverse conditional probabilities. It is defined as:
$$P(y \mid X) = \frac{P(X \mid y) \cdot P(y)}{P(X)}$$
Where:
• $P(y \mid X)$: Posterior probability, probability of class $y$ given features $X$
• $P(X \mid y)$: Likelihood, probability of features $X$ given class $y$
• $P(y)$: Prior probability of class $y$
• $P(X)$: Marginal likelihood or evidence
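
A quick numeric check of the formula, using made-up numbers for a spam filter: suppose 20% of emails are spam, a certain word appears in 60% of spam emails and in 5% of non-spam emails. The sketch computes the evidence by summing over both classes and then applies Bayes' Theorem.

```python
# Hypothetical probabilities (assumptions for illustration only)
prior_spam = 0.20            # P(y = spam)
likelihood_spam = 0.60       # P(word | spam)
likelihood_ham = 0.05        # P(word | not spam)

# Evidence P(X): total probability of seeing the word in any email
evidence = likelihood_spam * prior_spam + likelihood_ham * (1 - prior_spam)

# Posterior P(spam | word) = P(word | spam) * P(spam) / P(word)
posterior_spam = likelihood_spam * prior_spam / evidence
print(round(posterior_spam, 4))
```

Seeing the word raises the probability of spam from the prior of 0.20 to a posterior of 0.75.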
Naive Bayes Working
1. Terminology
Consider a classification problem (like predicting if someone plays golf based on weather). Then:
• $y$ is the class label (e.g. "Yes" or "No" for playing golf)
• $X = (x_1, x_2, \ldots, x_n)$ is the feature vector (e.g. Outlook, Temperature, Humidity, Wind)
A sample row from the dataset (row 0 of the table above):
$X = (\text{Rainy}, \text{Hot}, \text{High}, \text{False}),\quad y = \text{Yes}$
For this row the question is:
What is the probability that someone will play golf given that the weather is Rainy and Hot, with High humidity
and no wind?

2. The Naive Assumption


The "naive" in Naive Bayes comes from the assumption that all features are independent given the class. That is:
$$P(x_1, x_2, \ldots, x_n \mid y) = P(x_1 \mid y) \cdot P(x_2 \mid y) \cdots P(x_n \mid y)$$
Thus, Bayes' theorem becomes:
$$P(y \mid x_1, \ldots, x_n) = \frac{P(y) \cdot \prod_{i=1}^{n} P(x_i \mid y)}{P(X)}$$
Since the denominator is constant for a given input, we can write:
$$P(y \mid x_1, \ldots, x_n) \propto P(y) \cdot \prod_{i=1}^{n} P(x_i \mid y)$$

3. Constructing the Naive Bayes Classifier


We compute the posterior for each class $y$ and choose the class with the highest probability:
$$\hat{y} = \arg\max_{y} \; P(y) \cdot \prod_{i=1}^{n} P(x_i \mid y)$$
This becomes our Naive Bayes classifier.

4. Example: Weather Dataset


Let’s take a dataset used for predicting if golf is played based on:
• Outlook: Sunny, Rainy, Overcast
• Temperature: Hot, Mild, Cool
• Humidity: High, Normal
• Wind: True, False

Example Input: $X = (\text{Sunny}, \text{Hot}, \text{Normal}, \text{False})$

Goal: Predict if golf will be played (Yes or No).
5. Pre-computation from Dataset
Class Probabilities:
From dataset of 14 rows:
• $P(\text{Yes}) = \frac{9}{14}$
• $P(\text{No}) = \frac{5}{14}$
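Conditional Probabilities (Tables 1–4):
Counting rows in the dataset above gives the values needed for the example input:
• Outlook: $P(\text{Sunny} \mid \text{Yes}) = \frac{2}{9}$, $P(\text{Sunny} \mid \text{No}) = \frac{3}{5}$
• Temperature: $P(\text{Hot} \mid \text{Yes}) = \frac{3}{9}$, $P(\text{Hot} \mid \text{No}) = \frac{1}{5}$
• Humidity: $P(\text{Normal} \mid \text{Yes}) = \frac{6}{9}$, $P(\text{Normal} \mid \text{No}) = \frac{1}{5}$
• Wind: $P(\text{False} \mid \text{Yes}) = \frac{6}{9}$, $P(\text{False} \mid \text{No}) = \frac{2}{5}$
6. Calculate Posterior Probabilities
For Class = Yes:
$$P(\text{Yes} \mid \text{today}) \propto \frac{2}{9} \cdot \frac{3}{9} \cdot \frac{6}{9} \cdot \frac{6}{9} \cdot \frac{9}{14}$$
$$P(\text{Yes} \mid \text{today}) \approx 0.02116$$
For Class = No:
$$P(\text{No} \mid \text{today}) \propto \frac{3}{5} \cdot \frac{1}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{14}$$
$$P(\text{No} \mid \text{today}) \approx 0.00343$$

7. Normalize Probabilities
To compare:
$$P(\text{Yes} \mid \text{today}) = \frac{0.02116}{0.02116 + 0.00343} \approx 0.861$$
$$P(\text{No} \mid \text{today}) = \frac{0.00343}{0.02116 + 0.00343} \approx 0.139$$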

8. Final Prediction
Since:
$$P(\text{Yes} \mid \text{today}) > P(\text{No} \mid \text{today})$$
The model predicts: Yes (Play Golf)
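
The whole procedure can be checked with a short script. This is a sketch, not a full implementation: it copies the 14 rows exactly as they are listed in the dataset table, counts the prior and conditional probabilities, multiplies them under the naive assumption and normalizes. The function name `naive_bayes_posteriors` is hypothetical, and no smoothing is applied, so a feature value never seen with a class would give that class a score of zero.

```python
from collections import Counter

# Rows as listed in the table: (Outlook, Temperature, Humidity, Windy, Play Golf)
DATA = [
    ("Rainy", "Hot", "High", False, "Yes"),
    ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Sunny", "Mild", "High", True, "No"),
]

def naive_bayes_posteriors(rows, query):
    """Return normalized P(class | query) for each class label."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)                      # prior P(y)
        for i, value in enumerate(query):
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= matches / count                   # likelihood P(x_i | y)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

today = ("Sunny", "Hot", "Normal", False)
posteriors = naive_bayes_posteriors(DATA, today)
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)
```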

Naive Bayes for Continuous Features


For continuous features, we assume a Gaussian distribution:
$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$
Where:
• $\mu_y$ is the mean of feature $x_i$ for class $y$
• $\sigma_y^2$ is the variance of feature $x_i$ for class $y$
This leads to what is called Gaussian Naive Bayes.
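
The Gaussian likelihood is easy to evaluate in code. The sketch below is a direct translation of the density formula; the temperature figures are hypothetical.

```python
import math

def gaussian_likelihood(x, mean, variance):
    """P(x_i | y) under a normal distribution with the class mean and variance."""
    coefficient = 1.0 / math.sqrt(2 * math.pi * variance)
    exponent = -((x - mean) ** 2) / (2 * variance)
    return coefficient * math.exp(exponent)

# Hypothetical: temperature of 24 degrees, class mean 22, class variance 9
print(gaussian_likelihood(24.0, mean=22.0, variance=9.0))
# The density is highest at the mean and symmetric around it
print(gaussian_likelihood(22.0, mean=22.0, variance=9.0))
```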

Types of Naive Bayes Model


There are three types of Naive Bayes model:

1. Gaussian Naive Bayes


In Gaussian Naive Bayes, continuous values associated with each feature are assumed to be distributed according
to a Gaussian distribution. A Gaussian distribution is also called a Normal distribution. When plotted, it gives a
bell-shaped curve which is symmetric about the mean of the feature values.

2. Multinomial Naive Bayes


Multinomial Naive Bayes is used when features represent the frequency of terms (such as word counts) in a
document. It is commonly applied in text classification, where term frequencies are important.

3. Bernoulli Naive Bayes


Bernoulli Naive Bayes deals with binary features, where each feature indicates whether a word appears or not in a
document. It is suited for scenarios where the presence or absence of terms is more relevant than their frequency.
Both the Multinomial and Bernoulli models are widely used in document classification tasks.

Advantages
• Easy to implement and computationally efficient.
• Effective in cases with a large number of features.
• Performs well even with limited training data.
• Performs well in the presence of categorical features.
• Handles numerical features by assuming they come from normal distributions within each class.
Disadvantages
• Assumes that features are independent, which may not always hold in real-world data.
• Can be influenced by irrelevant attributes.
• May assign zero probability to unseen events, leading to poor generalization.

Applications
• Spam Email Filtering: Classifies emails as spam or non-spam based on features.
• Text Classification: Used in sentiment analysis, document categorization and topic classification.
• Medical Diagnosis: Helps in predicting the likelihood of a disease based on symptoms.
• Credit Scoring: Evaluates creditworthiness of individuals for loan approval.
• Weather Prediction: Classifies weather conditions based on various factors.

(b) Differentiate between linear and logistic regression.

Regression
Here we compare linear regression and logistic regression:
Feature | Linear Regression | Logistic Regression
Type | Supervised regression model | Supervised classification model
Prediction | Continuous numeric values | Categorical values
Activation Function | Not used | Sigmoid function converts linear output to probability
Threshold Value | Not required | Required to classify output into classes
Evaluation Metric | Root Mean Square Error | Precision, Recall, Accuracy, F1-score
Assumption | Dependent variable follows normal/Gaussian distribution | Dependent variable follows binomial distribution
Dependent Variable | Numeric and continuous | Categorical (binary or multinomial)
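
The difference in the "Activation Function" and "Threshold Value" rows can be seen in a few lines. In this sketch both models share the same linear part; the weights, inputs and the 0.5 threshold are arbitrary values chosen for illustration.

```python
import math

def linear_predict(x, w, b):
    """Linear regression: the linear output is the prediction itself."""
    return w * x + b

def logistic_predict(x, w, b, threshold=0.5):
    """Logistic regression: sigmoid turns the linear output into a probability,
    and a threshold turns the probability into a class label."""
    probability = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return probability, int(probability >= threshold)

print(linear_predict(3.0, w=2.0, b=1.0))      # a continuous value
print(logistic_predict(2.0, w=1.0, b=-1.0))   # (probability, class)
print(logistic_predict(-2.0, w=1.0, b=-1.0))
```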

5. What is K-Nearest Neighbours (KNN) and why is it used? Explain.

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm generally used for classification but can
also be used for regression tasks. It works by finding the "k" closest data points (neighbors) to a given input and
makes a prediction based on the majority class (for classification) or the average value (for regression). Since KNN
makes no assumptions about the underlying data distribution, it is a non-parametric and instance-based
learning method.

K-Nearest Neighbors is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead it stores the entire dataset and performs computations only at the time of classification.
For example, consider a plot of data points with two features, showing how KNN predicts the category of a new data
point based on its closest neighbours:
• The red diamonds represent Category 1 and the blue squares represent Category 2.
• The new data point checks its closest neighbors (circled points).
• Since the majority of its closest neighbors are blue squares (Category 2), KNN predicts that the new data point
belongs to Category 2.

KNN works by using proximity and majority voting to make predictions.


What is 'K' in K Nearest Neighbour?
In the k-Nearest Neighbours algorithm, k is just a number that tells the algorithm how many nearby points or
neighbors to look at when it makes a decision.
Example: Imagine you're deciding what kind of fruit something is based on its shape and size. You compare it to
fruits you already know.
• If k = 3, the algorithm looks at the 3 closest fruits to the new one.
• If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an apple because most
of its neighbors are apples.
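
The fruit example can be written as a small classifier. This is a minimal sketch assuming Euclidean distance and simple majority voting; the measurements (width and length in cm) and the function name `knn_predict` are made up.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort known points by distance to the query and keep the k closest
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical known fruits: ((width_cm, length_cm), label)
train = [
    ((7.0, 7.5), "apple"),
    ((7.5, 8.0), "apple"),
    ((8.0, 5.0), "apple"),
    ((4.0, 11.0), "banana"),
    ((3.5, 18.0), "banana"),
]

new_fruit = (7.0, 9.0)
print(knn_predict(train, new_fruit, k=3))
```

Here the 3 closest fruits are two apples and one banana, so the vote goes to apple.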
How to choose the value of k for KNN Algorithm?
 The value of k in KNN decides how many neighbors the algorithm looks at when making a prediction.
 Choosing the right k is important for good results.
 If the data has lots of noise or outliers, using a larger k can make the predictions more stable.
 But if k is too large the model may become too simple and miss important patterns and this is called
underfitting.
 So k should be picked carefully based on the data.
Statistical Methods for Selecting k
 Cross-Validation: A good way to find the best value of k is k-fold cross-validation. (The number of folds is
a separate setting and is not the same as the k in KNN.) The dataset is divided into several parts, or folds.
For each candidate k, the model is trained on all folds but one and tested on the remaining fold, and this is
repeated until every fold has served as the test set. The k value that gives the highest average accuracy
across these tests is usually the best one to use.
 Elbow Method: In Elbow Method we draw a graph showing the error rate or accuracy for different k
values. As k increases the error usually drops at first. But after a certain point error stops decreasing
quickly. The point where the curve changes direction and looks like an "elbow" is usually the best choice
for k.
 Odd Values for k: It’s a good idea to use an odd number for k especially in classification problems. This
helps avoid ties when deciding which class is the most common among the neighbors.
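
The cross-validation procedure above can be sketched as follows. This is an illustrative sketch using only the standard library; the helper names (`knn_predict`, `cross_val_accuracy`), the number of folds, the toy data and the candidate values of k are assumptions chosen for the example.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """`train` is a list of (point, label) pairs; returns the majority label."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(data, k, n_folds=3):
    """Average accuracy of KNN with `k` neighbours over `n_folds` folds."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th item (offset by `fold`) forms the test fold.
        test = data[fold::n_folds]
        train = [item for i, item in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(train, point, k) == label for point, label in test)
        scores.append(correct / len(test))
    return sum(scores) / n_folds

# Hypothetical toy data: two groups, plus one deliberately noisy label.
data = [
    ((1.0, 1.0), "A"), ((6.0, 6.0), "B"), ((1.5, 2.0), "A"), ((7.0, 6.5), "B"),
    ((2.0, 1.0), "A"), ((6.5, 7.0), "B"), ((1.2, 1.8), "A"), ((6.8, 6.2), "B"),
    ((2.2, 2.1), "A"), ((7.2, 7.1), "B"), ((1.8, 1.4), "B"), ((6.1, 6.9), "B"),
]

candidate_ks = [1, 3, 5]  # odd values, to reduce the chance of tied votes
accuracy_by_k = {k: cross_val_accuracy(data, k) for k in candidate_ks}
best_k = max(accuracy_by_k, key=accuracy_by_k.get)
print(accuracy_by_k, best_k)
```

Plotting `accuracy_by_k` (or the error rate, 1 minus accuracy) against k gives the curve used in the elbow method.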

Distance Metrics Used in KNN Algorithm


KNN uses a distance metric to identify the nearest neighbors, which are then used for the classification or
regression task. The following distance metrics are commonly used:

 1. Euclidean Distance

Euclidean distance is defined as the straight-line distance between two points in a plane or space. You can think
of it like the shortest path you would walk if you were to go directly from one point to another.
$$\text{distance}(x, X_i) = \sqrt{\sum_{j=1}^{d} (x_j - X_{ij})^2}$$
 2. Manhattan Distance

This is the total distance you would travel if you could only move along horizontal and vertical lines like a grid
or city streets. It’s also called "taxicab distance" because a taxi can only drive along the grid-like streets of a city.
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

 3. Minkowski Distance

Minkowski distance is like a family of distances, which includes both Euclidean and Manhattan distances as
special cases.
$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
From the formula above, when p=2, it becomes the same as the Euclidean distance formula and when p=1, it
turns into the Manhattan distance formula. Minkowski distance is essentially a flexible formula that can represent
either Euclidean or Manhattan distance depending on the value of p.
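
A short sketch makes the relationship between the three metrics concrete. The function name `minkowski_distance` and the two sample points are assumptions for illustration only.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# Hypothetical sample points.
a, b = (1.0, 2.0), (4.0, 6.0)

manhattan = minkowski_distance(a, b, p=1)  # |1-4| + |2-6| = 7
euclidean = minkowski_distance(a, b, p=2)  # sqrt(3^2 + 4^2) = 5
print(manhattan, euclidean)
```

The same function covers both special cases, so changing the metric in a KNN implementation can be as simple as changing p.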
Applications of KNN
 Recommendation Systems: Suggests items like movies or products by finding users with similar
preferences.
 Spam Detection: Identifies spam emails by comparing new emails to known spam and non-spam
examples.
 Customer Segmentation: Groups customers by comparing their shopping behavior to others.
 Speech Recognition: Matches spoken words to known patterns to convert them into text.

Unit-III

6. (a) How Unsupervised learning Works? Explain.

Unsupervised learning is a type of machine learning in which the model is trained on unlabeled data. This means
the system is not given predefined output values or target labels. Instead, it learns by identifying patterns,
structures, and relationships within the data on its own.

Working of Unsupervised Learning


1. Input Data Collection
The algorithm receives a dataset containing only input features without any corresponding output labels.
2. Data Preprocessing
The data is cleaned, normalized, and transformed to improve the learning process and accuracy.
3. Pattern Discovery
The algorithm analyzes the data to find hidden patterns such as similarities, differences, or correlations
among data points.
4. Grouping or Structuring Data
Based on discovered patterns, the algorithm organizes data into:
o Clusters (grouping similar data points together), or
o Associations (discovering relationships between variables), or
o Dimensional structures (reducing data complexity).
5. Model Output
The result is not a predicted label but insights such as:
o Clusters of similar data
o Frequently occurring patterns
o Reduced-dimensional representations
6. Evaluation and Interpretation
Since no labels exist, results are evaluated using measures like cohesion, separation, or domain knowledge
rather than accuracy.
(b) Differentiate supervised and unsupervised Learning?

Difference between Supervised and Unsupervised Learning

| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Input Data | Uses labeled data (input features + corresponding outputs). | Uses unlabeled data (only input features, no outputs). |
| Goal | Predicts outcomes or classifies data based on known labels. | Discovers hidden patterns, structures, or groupings in data. |
| Computational Complexity | Less complex, as the model learns from labeled data with clear guidance. | More complex, as the model must find patterns without any guidance. |
| Types | Two types: classification (for discrete outputs) or regression (for continuous outputs). | Clustering and association. |
| Testing the Model | Model can be tested and evaluated using labeled test data. | Cannot be tested in the traditional sense, as there are no labels. |

7. Explain K-means Clustering Algorithm.

K-Means Clustering Algorithm


K-means clustering is an unsupervised machine learning algorithm used to group data into K distinct clusters
based on similarity. Each cluster is represented by the mean (centroid) of its data points.

I. Working of K-Means Algorithm


1. Choose the Number of Clusters (K)
The value of K is predefined by the user.
2. Initialize Centroids
Randomly select K data points as the initial centroids.
3. Assign Data Points to Clusters
Each data point is assigned to the nearest centroid using a distance measure (commonly Euclidean
distance).
4. Recalculate Centroids
For each cluster, calculate the new centroid by taking the mean of all data points assigned to that cluster.
5. Repeat the Process
Steps 3 and 4 are repeated until:
o Centroids no longer change, or
o Data points remain in the same clusters.
6. Final Clusters Formed
The algorithm converges and produces K clusters. The result depends on the initial centroids, so different
initializations can lead to different final clusters.

Algorithm (Steps)
1. Select value of K
2. Initialize K centroids randomly
3. Assign each data point to the nearest centroid
4. Update centroids
5. Repeat steps 3 and 4 until convergence
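
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function name `k_means`, the `max_iters` and `seed` parameters, and the toy points are not from the original notes, and Euclidean distance is used for the assignment step.

```python
import math
import random

def k_means(points, k, max_iters=100, seed=0):
    """Return (centroids, assignments) for `points` grouped into k clusters."""
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Step 5: stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical toy data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

Because the initial centroids are chosen at random, a different `seed` may number the clusters differently, and on less clearly separated data it may also change the clusters themselves.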
Advantages of K-Means
 Simple and easy to understand
 Efficient for large datasets
 Fast convergence
Disadvantages of K-Means
 Requires predefined value of K
 Sensitive to initial centroid selection
 Performs poorly with non-spherical clusters and outliers

Applications of K-Means
 Customer segmentation
 Image compression
 Document clustering
 Market analysis
Unit-IV

8. Explain

(i) ROC and AUC

The AUC-ROC curve is a graph used to check how well a binary classification model works. It helps us understand
how well the model separates the positive cases (such as people with a disease) from the negative cases (such as
people without the disease) at different threshold levels. It shows how good the model is at telling the difference
between the two classes by plotting:
 True Positive Rate (TPR): how often the model correctly predicts the positive cases also known as
Sensitivity or Recall.
 False Positive Rate (FPR): how often the model incorrectly predicts a negative case as positive.
 Specificity: measures the proportion of actual negatives that the model correctly identifies. It is calculated
as 1 - FPR.
The higher the curve the better the model is at making correct predictions.

These terms are derived from the confusion matrix which provides the following values:
 True Positive (TP): Correctly predicted positive instances
 True Negative (TN): Correctly predicted negative instances
 False Positive (FP): Incorrectly predicted as positive
 False Negative (FN): Incorrectly predicted as negative

 ROC Curve : It plots TPR vs. FPR at different thresholds. It represents the trade-off between the
sensitivity and specificity of a classifier.
 AUC(Area Under the Curve): measures the area under the ROC curve. A higher AUC value indicates
better model performance as it suggests a greater ability to distinguish between classes. An AUC value of
1.0 indicates perfect performance while 0.5 suggests it is random guessing.
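
To see where the points on a ROC curve come from, the sketch below computes TPR and FPR from the confusion-matrix counts at a few thresholds. The function name `roc_point`, the labels and the predicted scores are hypothetical values chosen for illustration.

```python
def roc_point(labels, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    tpr = tp / (tp + fn)  # sensitivity / recall
    fpr = fp / (fp + tn)  # 1 - specificity
    return fpr, tpr

# Hypothetical data: 3 positives (1) and 3 negatives (0) with predicted scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

# Sweeping the threshold traces out the ROC curve from (0, 0) to (1, 1).
curve = [roc_point(labels, scores, t) for t in (1.0, 0.5, 0.0)]
print(curve)
```

At threshold 0.5 two of the three positives and one of the three negatives are predicted positive, giving TPR = 2/3 and FPR = 1/3.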
How AUC-ROC Works
AUC-ROC curve helps us understand how well a classification model distinguishes between the two classes.
Imagine we have 6 data points and out of these:
 3 belong to the positive class: Class 1 for people who have a disease.
 3 belong to the negative class: Class 0 for people who don’t have disease.

Now the model will give each data point a predicted probability of belonging to Class 1. The AUC measures the
model's ability to assign higher predicted probabilities to the positive class than to the negative class. Here's how it
works:
1. Randomly choose a pair: Pick one data point from the positive class (Class 1) and one from the negative
class (Class 0).
2. Check if the positive point has a higher predicted probability: If the model assigns a higher probability
to the positive data point than to the negative one, the pair is ranked correctly.
3. Repeat for all pairs: We do this for all possible pairs of positive and negative examples (3 × 3 = 9 pairs
here). The AUC is the fraction of pairs that are ranked correctly.
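
The pairwise procedure can be written directly as code. This is a sketch under assumptions: the function name `pairwise_auc` and the predicted probabilities are made up for the example, and tied scores are counted as half a correct pair, which is a common convention rather than something stated in the notes.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5  # assumption: a tie counts as half
    return correct / (len(positives) * len(negatives))

# Hypothetical predictions for 3 positive and 3 negative data points.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = pairwise_auc(labels, scores)
print(auc)  # 8 of the 9 pairs are ranked correctly
```

Only the pair (0.4 positive, 0.7 negative) is ranked the wrong way round, so the AUC is 8/9, or about 0.89.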

When to Use AUC-ROC


AUC-ROC is effective when:
 The dataset is balanced and the model needs to be evaluated across all thresholds.
 False positives and false negatives are of similar importance.
In cases of highly imbalanced datasets AUC-ROC might give overly optimistic results. In such cases the Precision-
Recall Curve is more suitable focusing on the positive class.
Model Performance with AUC-ROC:
 High AUC (close to 1): The model effectively distinguishes between positive and negative instances.
 Low AUC (close to 0): The model ranks the classes in reverse, consistently giving negative instances
higher scores than positive ones.
 AUC around 0.5: The model has not learned any meaningful pattern, i.e. it is effectively guessing at random.

(ii) Median absolute deviation

Median Absolute Deviation (MAD)


Median Absolute Deviation (MAD) is a robust statistical measure of variability that describes how spread out
the data values are around the median. It is especially useful when the data contains outliers, as it is less sensitive
to extreme values than standard deviation.

Definition
The Median Absolute Deviation is defined as:
$$\text{MAD} = \text{median}\left( |x_i - \text{median}(X)| \right)$$
where:
 $x_i$ = each data value
 $X$ = dataset

Steps to Calculate MAD


1. Find the median of the dataset.
2. Calculate the absolute deviation of each data point from the median.
3. Find the median of these absolute deviations.
4. The result is the MAD.

Example
Given data: 2, 4, 5, 6, 9
1. Median = 5
2. Absolute deviations: |2−5| = 3, |4−5| = 1, |5−5| = 0, |6−5| = 1, |9−5| = 4
3. Median of deviations (sorted: 0, 1, 1, 3, 4) = 1
MAD = 1
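
The calculation can be checked with a few lines of Python using the standard `statistics` module. The function name `median_absolute_deviation` and the second dataset (with an extreme value) are assumptions added for illustration.

```python
from statistics import median

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)

mad = median_absolute_deviation([2, 4, 5, 6, 9])
print(mad)  # 1, matching the worked example

# Replacing 9 with an extreme value leaves the MAD unchanged.
mad_with_outlier = median_absolute_deviation([2, 4, 5, 6, 900])
print(mad_with_outlier)  # 1
```

The second call shows the robustness property: the outlier changes neither the median nor the MAD, whereas the standard deviation would increase sharply.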

Features / Characteristics
 Robust to outliers
 Based on median, not mean
 Simple to compute
 Suitable for skewed data

Uses of MAD
 Outlier detection
 Robust statistics
 Data preprocessing in machine learning
 Measuring data variability

9. What is performance measurement of Model in machine learning. Explain

Model performance indicates how well a machine learning (ML) model carries out the task for which it was
designed, based on various metrics. Measuring model performance is essential for optimizing an ML model before
releasing it to production and enhancing it after deployment. Without proper optimization, models might produce
inaccurate or unreliable predictions and suffer from inefficiencies, leading to poor performance.
Assessing model performance happens during the model evaluation and model monitoring stages of a machine
learning pipeline. After artificial intelligence (AI) practitioners work on the initial phases of ML projects, they then
evaluate a model’s performance across multiple datasets, tasks and metrics to gauge its effectiveness. Once the
model is deployed, machine learning operations (MLOps) teams monitor model performance for continuous
improvement.

Factors affecting model performance


An AI model’s performance is generally measured using a test set, comparing the model’s outputs against the
known values in that test set. Insights gained from evaluating performance help determine if a model is ready
for real-world deployment or if it needs tweaking or additional training.
Here are some factors that can impact a machine learning model’s performance:
 Data quality
 Data leakage
 Feature selection
 Model fit
 Model drift
 Bias

Data quality
A model is only as good as the data used to train it. Model performance falls short when its training data is flawed,
containing inaccuracies or inconsistencies like duplicates, missing values and wrong data labels or annotations. A
lack of balance—such as having too many values for one scenario over another or a training dataset that’s not
sufficient or diverse enough to correctly capture correlations—can also lead to skewed results.

Data leakage
Data leakage in machine learning occurs when a model uses information during training that wouldn’t be available
at the time of prediction. This can be caused by data preprocessing errors or contamination due to improper splitting
of data into training, validation and test sets. Data leakage causes a predictive model to struggle when generalizing
on unseen data, yield inaccurate or unreliable results, or inflate or deflate performance metrics.
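
One common preprocessing leak is computing scaling statistics over the whole dataset before splitting it. The sketch below shows a leak-free order of operations: split first, then fit the scaling on the training portion only. The data values, the 80/20 split and the variable names are assumptions for illustration.

```python
import random
from statistics import mean, pstdev

# Hypothetical feature values.
values = [12.0, 15.5, 9.8, 20.1, 17.3, 11.2, 14.8, 19.4, 10.5, 16.7]

# Shuffle, then split BEFORE computing any statistics.
rng = random.Random(0)
shuffled = values[:]
rng.shuffle(shuffled)
split = int(0.8 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]

# Fit the scaling parameters on the training data only.
mu, sigma = mean(train), pstdev(train)

# Apply the same training-derived parameters to both sets.
train_scaled = [(v - mu) / sigma for v in train]
test_scaled = [(v - mu) / sigma for v in test]
print(train_scaled, test_scaled)
```

If `mu` and `sigma` were computed from `values` instead of `train`, information about the test set would influence the training inputs, which is exactly the kind of contamination described above.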

Feature selection
Feature selection involves choosing the most relevant features of a dataset to use for model training. Data features
influence how machine learning algorithms configure their weights during training, which in turn drives
performance. Additionally, reducing the feature space to a selected subset can help improve performance while
lowering computational demands. However, picking irrelevant or insignificant features can weaken model
performance.

Model fit
Overfitting happens when an ML model is too complex and fits too closely or even exactly to its training data, so it
doesn’t generalize well on new data. Conversely, underfitting occurs when a model is so simple that it fails to
capture the underlying patterns in both training and testing data.

Model drift
Model drift refers to a model’s performance degrading because of changes in data or in the relationships between
input and output variables. This decay can negatively impact model performance, leading to faulty decision-making
and bad predictions.

Bias
Bias in AI can be introduced at any phase of a machine learning workflow, but it’s particularly prevalent in the data
processing and model development stages. Data bias occurs when the unrepresentative nature of training and fine-
tuning datasets adversely affects model behavior and performance. Meanwhile, algorithmic bias is not caused by the
algorithm itself but by how data science teams collect and code training data and how AI programmers design and
develop machine learning algorithms. AI bias can lead to inaccurate outputs and potentially harmful outcomes.
