Understanding Bias and Variance in ML

Uploaded by asanetejal

Subject: Machine Learning

Unit 2: Supervised Learning: Regression

Bias
• What is Bias?
• In general, a machine learning model analyses the data, finds patterns in it and makes predictions.
• While training, the model learns these patterns in the dataset and applies them to test data for prediction.
• While making predictions, a difference occurs between the values predicted by the model and the actual/expected values. This systematic difference (how far the model's average prediction is from the true value) is known as bias error, or error due to bias.
• Low Bias: A low-bias model makes fewer assumptions about the form of the target function.
• High Bias: A high-bias model makes more assumptions, and so becomes unable to capture the important features of the dataset. A high-bias model also cannot perform well on new data.
Bias
• Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours and Support Vector Machines. By contrast, algorithms with high bias include Linear Regression, Linear Discriminant Analysis and Logistic Regression.
• Ways to reduce High Bias:
• High bias mainly occurs because the model is too simple. Below are some ways to reduce high bias:
• Add more input features, since the model is underfitting.
• Use a more complex model, for example by including polynomial features.
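The effect of adding polynomial features can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy is available; the quadratic data, noise level and random seed are invented for the example. A straight line cannot follow a curved trend (high bias), whereas adding a squared feature lets the model capture it, so the training error drops.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a quadratic trend plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = x**2 + rng.normal(scale=0.5, size=x.size)

def train_mse(degree):
    """Fit a polynomial of the given degree by least squares; return the training MSE."""
    model = np.polynomial.Polynomial.fit(x, y, deg=degree)
    return float(np.mean((y - model(x)) ** 2))

linear_mse = train_mse(1)     # straight line: too simple, high bias
quadratic_mse = train_mse(2)  # adds an x**2 feature, so bias drops
print(f"linear MSE = {linear_mse:.2f}, quadratic MSE = {quadratic_mse:.2f}")
```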
Variance Error
• What is a Variance Error?
• Variance specifies how much the predictions would change if different training data were used.
• In simple words, variance tells how much a random variable differs from its expected value.
• Ideally, a model should not vary too much from one training dataset to another, which means the algorithm should be good at understanding the hidden mapping between the input and output variables.
• Variance errors are either low variance or high variance.
• Low variance means there is little variation in the prediction of the target function when the training dataset changes. High variance means there is a large variation in the prediction of the target function when the training dataset changes.
• A model that shows high variance learns a lot and performs well on the training dataset, but does not generalize well to unseen data.
• As a result, such a model gives good results on the training dataset but shows high error rates on the test dataset.
• Since a high-variance model learns too much from the dataset, it overfits.
• A model with high variance has the following problems:
• It leads to overfitting.
• It increases model complexity.
• Some examples of machine learning algorithms with low variance are Linear Regression, Logistic Regression and Linear Discriminant Analysis.
• By contrast, algorithms with high variance include Decision Trees, Support Vector Machines and k-Nearest Neighbours.
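Variance, defined above as how much predictions change when different training data is used, can be measured directly: draw many training sets, refit the model on each, and look at how much its prediction at one point varies. The sketch below assumes NumPy; the target function sin(2πx), the noise level, the point x = 0.5 and the use of degree-1 and degree-9 polynomials (standing in for a simple and a flexible model) are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(42)
x_train = np.linspace(0, 1, 20)  # fixed inputs; only the noise differs between training sets

def prediction_variance(degree, n_repeats=200, x0=0.5, noise=0.3):
    """Refit a polynomial of the given degree on many freshly drawn training
    sets and return the variance of its prediction at the point x0."""
    preds = []
    for _ in range(n_repeats):
        # Hypothetical true function sin(2*pi*x) plus new Gaussian noise each time.
        y = np.sin(2 * np.pi * x_train) + rng.normal(scale=noise, size=x_train.size)
        model = np.polynomial.Polynomial.fit(x_train, y, deg=degree)
        preds.append(model(x0))
    return float(np.var(preds))

var_linear = prediction_variance(degree=1)  # simple model
var_poly9 = prediction_variance(degree=9)   # flexible model
print(f"variance at x = 0.5: degree 1 = {var_linear:.4f}, degree 9 = {var_poly9:.4f}")
```

With these settings the degree-9 fit's prediction typically varies several times more than the straight line's, which is the high-variance behaviour described above.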
• The figure demonstrates the three concepts discussed above. On the left, the red line represents a model that is underfitting.
• The model notes that there is some trend in the data, but it is not specific enough to capture relevant information. It is unable to make accurate predictions for training or new data.
• In the middle, the blue line represents a model that is balanced.
• This model notes there is a trend in the data and models it accurately. This middle model will be able to generalize successfully.
• On the right, the blue line represents a model that is overfitting.
• The model notes a trend in the data and models the training data accurately, but it is too specific. It will fail to make accurate predictions on new data because it learned the training data too well.
Ways to Reduce High Variance:
• Reduce the number of input features or parameters, since the model is overfitting.
• Avoid overly complex models.
• Increase the amount of training data.
• Increase the regularization strength.
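The remedy of increasing the training data can be checked with a small experiment. In this sketch (NumPy assumed; the degree-9 model, the sin(2πx) target, the noise level and the sample sizes 20 and 200 are illustrative assumptions), a flexible model's prediction varies far less across training sets once it is fitted on more points.

```python
import numpy as np

rng = np.random.default_rng(7)

def prediction_variance(n_train, degree=9, n_repeats=200, x0=0.5, noise=0.3):
    """Variance of a polynomial's prediction at x0 across training sets of size n_train."""
    x_train = np.linspace(0, 1, n_train)
    preds = []
    for _ in range(n_repeats):
        # Hypothetical data: sin(2*pi*x) plus fresh Gaussian noise for each training set.
        y = np.sin(2 * np.pi * x_train) + rng.normal(scale=noise, size=n_train)
        model = np.polynomial.Polynomial.fit(x_train, y, deg=degree)
        preds.append(model(x0))
    return float(np.var(preds))

var_small = prediction_variance(n_train=20)   # few training points
var_large = prediction_variance(n_train=200)  # ten times more training points
print(f"n = 20: {var_small:.4f}, n = 200: {var_large:.4f}")
```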
Different Combinations of Bias-Variance
• Low-Bias, Low-Variance: The combination of low bias and low variance describes an ideal machine learning model. In practice, however, it is usually not achievable.
• Low-Bias, High-Variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns with a large number of parameters, which leads to overfitting.
• High-Bias, Low-Variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn the training dataset well or uses too few parameters. It leads to underfitting.
• High-Bias, High-Variance: With high bias and high variance, predictions are inconsistent and also inaccurate on average.
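These combinations can be related through the standard bias-variance decomposition of expected squared error. As a sketch, assume the data are generated as y = f(x) + ε, where ε is zero-mean noise with variance σ², and that the model f̂ is trained on a randomly drawn training set; the expectation is taken over training sets and noise at a fixed input x:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Low bias keeps the first term small and low variance keeps the second term small; the noise term σ² cannot be removed by any model, which is one reason a perfect model is not achievable in practice.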
Generalization
• Generalization describes a model's ability to react to new data: after being trained on a training set, a model can digest new data and make accurate predictions.
• If a model has been trained too well on the training data, it will be unable to generalize.
• It will make inaccurate predictions when given new data, making the model useless even though it makes accurate predictions on the training data. This is called overfitting.
• The opposite problem also occurs. Underfitting happens when a model has not learned enough from the data. An underfitted model is just as useless: it cannot make accurate predictions, even on the training data.
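The train/test behaviour described above can be reproduced with a small experiment. This sketch assumes NumPy; the data source, the sample sizes and the polynomial degrees 1, 3 and 12 (standing in for underfitting, balanced and overfitting models) are illustrative assumptions. An underfitted model has high error on both sets, while an overfitted one has very low training error but a clearly higher error on new data.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Hypothetical data source for illustration: sin(2*pi*x) plus Gaussian noise.
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)  # "new data" the model never sees during training

def mse(model, x, y):
    return float(np.mean((y - model(x)) ** 2))

results = {}
for degree in (1, 3, 12):  # underfit, balanced, overfit (illustrative choices)
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```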
Linear Regression
• Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
• It predicts continuous/real values such as temperature, age, salary, price, etc.
• Regression is a supervised learning technique.
• It is mainly used for prediction, forecasting, time series modeling, and exploring cause-and-effect relationships between variables.
• Regression fits a line or curve to the datapoints on a target-predictor graph in such a way that the overall vertical distance between the datapoints and the regression line is minimized (for ordinary least squares, the sum of the squared vertical distances).
Linear Regression
• Some examples of regression are:
• Prediction of rain using temperature and other factors
• Determining Market trends
• Prediction of road accidents due to rash driving.
• Types of Regression
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
Types of Regression
Linear Regression
• Linear regression is a statistical regression method used for predictive analysis.
• It is one of the simplest regression algorithms and shows the relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
• If there is only one input variable (x), it is called simple linear regression; if there is more than one input variable, it is called multiple linear regression.
• The relationship between the variables in a linear regression model can be explained using the image below. Here we predict the salary of an employee on the basis of years of experience.
Linear Regression
• Y = aX + b
• Here, Y = dependent variable (target variable), X = independent variable (predictor variable), a = slope (the linear coefficient of X), and b = intercept.
• Some popular applications of linear regression are:
• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic.
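The coefficients a and b in Y = aX + b are usually found by ordinary least squares. Below is a minimal pure-Python sketch of the closed-form solution; the years-of-experience and salary figures are made up for illustration and do not come from the salary example above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of the slope a and intercept b in Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of X and Y divided by the variance of X ...
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # ... and the intercept makes the line pass through the point (mean_x, mean_y).
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [32, 37, 41, 47, 52]

a, b = fit_line(years, salary)
print(f"Y = {a:.1f}X + {b:.1f}")
print(f"predicted salary for 6 years: {a * 6 + b:.1f}")
```

For this toy data the fitted line is Y = 5.0X + 26.8, so 6 years of experience gives a predicted salary of about 56.8 (in the same hypothetical thousands).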
Simple Linear Regression
• For a simple regression problem (a single x and a single y), the form of the model would be:
y = b0 + b1 * x1
where y is the dependent variable (DV), x1 is the independent variable (IV), b0 is the constant and b1 is the coefficient.
Example-2
• Let's make this concrete with an example. Imagine we are predicting weight (y) from height (x). Our linear regression model representation for this problem would be:
y = B0 + B1 * x1
or
weight = B0 + B1 * height
• Where B0 is the bias (intercept) coefficient and B1 is the coefficient for the height column. We use a learning technique to find a good set of coefficient values. Once found, we can plug in different height values to predict the weight.
• For example, let's use B0 = 0.1 and B1 = 0.5. Let's plug them in and calculate the weight (in kilograms) for a person with a height of 182 centimeters.
weight = 0.1 + 0.5 * 182
weight = 91.1
• You can see that the above equation could be plotted as a line in two dimensions. B0 is our starting point regardless of what height we have.
• We can run through a range of heights from 100 to 250 centimeters, plug them into the equation and get weight values, creating our line.
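The worked example translates directly into code. This sketch reuses the example's coefficients B0 = 0.1 and B1 = 0.5; the 10 cm step between heights is an arbitrary choice for illustration.

```python
B0 = 0.1  # bias (intercept) coefficient from the example
B1 = 0.5  # coefficient for the height column, from the example

def predict_weight(height_cm):
    """weight = B0 + B1 * height, in kilograms."""
    return B0 + B1 * height_cm

print(predict_weight(182))  # about 91.1 kg, matching the hand calculation

# Plug in heights from 100 to 250 cm (10 cm steps chosen for illustration)
# to trace out the regression line.
line = [(h, predict_weight(h)) for h in range(100, 251, 10)]
for height, weight in line:
    print(height, round(weight, 1))
```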
Multiple Linear Regression
y = b0 + b1 * x1 + b2 * x2 + ... + bn * xn
where y is the dependent variable (DV), x1 ... xn are the independent variables (IVs), b0 is the constant and b1 ... bn are the coefficients.
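Multiple linear regression can be fitted the same way, by least squares on a design matrix whose first column is all ones (for b0). A minimal sketch using NumPy's np.linalg.lstsq; the data are hypothetical and generated without noise from y = 1 + 2 * x1 + 3 * x2, so the fit should recover exactly those coefficients.

```python
import numpy as np

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend a column of ones so that the first coefficient plays the role of b0.
design = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1, b2 = coeffs
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```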
Logistic Regression
• Logistic regression is another supervised learning algorithm, used to solve classification problems. In classification problems, the dependent variable is binary or discrete, such as 0 or 1.
• The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
• It is a predictive analysis algorithm that works on the concept of probability.
• Logistic regression is a type of regression, but it differs from linear regression in how it is used: it predicts the probability of a class rather than a continuous value.
• Logistic regression uses the sigmoid function, also called the logistic function, to model the data; it maps any real-valued input to a value between 0 and 1. The function can be represented as:
f(x) = 1 / (1 + e^(-x))
• f(x) = output between 0 and 1.
• x = input to the function.
• e = base of the natural logarithm.
• It uses the concept of a threshold level: values above the threshold are rounded up to 1, and values below the threshold are rounded down to 0.
• There are three types of logistic regression:
• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)
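A minimal sketch of the sigmoid and the threshold rule in plain Python. It assumes the common threshold of 0.5 and treats a probability exactly at the threshold as class 1 (a convention, not something stated above); in a fitted model the input x would be the linear combination b0 + b1 * x1 + ... of the features. The two-branch form avoids overflow in math.exp for large negative inputs.

```python
import math

def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^(-x)); the output lies between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Algebraically identical form for negative x, avoiding overflow in math.exp(-x).
    e = math.exp(x)
    return e / (1.0 + e)

def predict_class(x, threshold=0.5):
    """Round the sigmoid output to 1 at or above the threshold, otherwise to 0."""
    return 1 if sigmoid(x) >= threshold else 0

print(sigmoid(0))           # 0.5
print(predict_class(2.0))   # 1 (sigmoid(2.0) is about 0.88)
print(predict_class(-2.0))  # 0 (sigmoid(-2.0) is about 0.12)
```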
Ridge Regression (L2)
Lasso Regression (L1)
Ridge vs Lasso Regression (L2 vs L1)
Elastic Net Regression
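Ridge regression adds an L2 penalty (alpha times the sum of squared coefficients) to the least-squares loss, lasso adds an L1 penalty (alpha times the sum of absolute coefficients), and elastic net mixes the two. The sketch below assumes NumPy and uses hypothetical data in which the third feature is irrelevant. It solves ridge in closed form and lasso/elastic net by coordinate descent with soft-thresholding; the objective scaling and the parameter names alpha and l1_ratio follow one common convention and are assumptions here. The key difference it illustrates: lasso can set a coefficient exactly to zero, while ridge only shrinks it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y depends on features 0 and 1; feature 2 is irrelevant noise.
n = 100
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Centre the data so that no intercept term is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

def ridge(X, y, alpha):
    """Ridge (L2): minimise ||y - Xw||^2 + alpha * ||w||^2, solved in closed form
    as w = (X^T X + alpha * I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

def soft_threshold(rho, t):
    """Shrink rho towards zero by t, returning exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def elastic_net(X, y, alpha, l1_ratio, n_iter=200):
    """Coordinate descent for
    (1 / 2n) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2.
    l1_ratio = 1.0 gives the lasso (pure L1 penalty)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n_samples
            z = X[:, j] @ X[:, j] / n_samples
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
    return w

w_ridge = ridge(X, y, alpha=10.0)
w_lasso = elastic_net(X, y, alpha=0.5, l1_ratio=1.0)
w_enet = elastic_net(X, y, alpha=0.5, l1_ratio=0.5)
print("ridge:      ", np.round(w_ridge, 3))
print("lasso:      ", np.round(w_lasso, 3))
print("elastic net:", np.round(w_enet, 3))
```

With this data, lasso typically drives the irrelevant third coefficient to exactly 0, while ridge leaves it small but non-zero; a larger alpha shrinks all ridge coefficients further.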
MAE
• The essential step in any machine learning model is to evaluate its accuracy.
• Mean Squared Error, Mean Absolute Error, Root Mean Squared Error, and R-squared (the coefficient of determination) are used to evaluate the performance of a model in regression analysis.
• Mean Absolute Error represents the average of the absolute differences between the actual and predicted values in the dataset.
• It measures the average magnitude of the residuals in the dataset.
• Advantages of MAE
• The MAE is in the same unit as the output variable.
• It is more robust to outliers than MSE and RMSE.
• Disadvantages of MAE
• MAE is not differentiable at zero error, so gradient-based optimizers such as gradient descent have to use a subgradient at that point.
• MSE, the next metric, overcomes this disadvantage of MAE.
MSE
• Mean Squared Error represents the average of the squared differences between the original and predicted values in the dataset.
• It measures the spread of the residuals (it equals their variance when the residuals average to zero).
• Squaring prevents positive and negative errors from cancelling each other out, which is a benefit of MSE.
• Advantages of MSE
• MSE is differentiable, so it can easily be used as a loss function.
• Disadvantages of MSE
• The value of MSE is in the squared unit of the output. For example, if the output variable is in meters (m), MSE is in meters squared.
• If there are outliers in the dataset, MSE penalizes them the most, so the calculated MSE is larger. In short, it is not robust to outliers, which was an advantage of MAE.
RMSE
• Root Mean Squared Error is the square root of Mean Squared Error. It measures the standard deviation of the residuals.
• Advantages of RMSE
• The output value is in the same unit as the output variable, which makes the loss easy to interpret.
• Disadvantages of RMSE
• It is not as robust to outliers as MAE.
• To compute RMSE, apply a square root function (for example NumPy's np.sqrt) to the MSE.
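The three metrics are short enough to write out directly. This plain-Python sketch uses made-up values; the second set of predictions contains one badly wrong value to show the outlier sensitivity discussed above.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of (actual - predicted) ** 2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of MSE, in the units of the output."""
    return math.sqrt(mse(y_true, y_pred))

y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 9]
y_pred_outlier = [2, 5, 8, 19]  # same predictions, except one is off by 10

for name, preds in [("clean", y_pred), ("with outlier", y_pred_outlier)]:
    print(f"{name}: MAE = {mae(y_true, preds)}, MSE = {mse(y_true, preds)}, "
          f"RMSE = {rmse(y_true, preds):.3f}")
```

With these numbers MAE rises from 0.5 to 3.0 when the outlier is introduced, while MSE jumps from 0.5 to 25.5, because the single error of 10 is squared.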
R-squared
• The coefficient of determination, or R-squared, represents the proportion of the variance in the dependent variable that is explained by the linear regression model.
• It is a scale-free score, i.e. irrespective of whether the values are small or large, the value of R-squared will be at most one.
• By contrast, RMSE depends on the scale of the target values. To reduce this effect, RMSLE (Root Mean Squared Logarithmic Error) computes the RMSE on the logarithms of the actual and predicted values, typically log(1 + y), instead of on the raw values.
• To compute RMSLE, apply a NumPy log function such as np.log1p to the actual and predicted values before computing RMSE.
• It is a simple metric that is commonly used for datasets hosted in machine learning competitions.
• So, with the help of R-squared we have a baseline model to compare a model against, which none of the other metrics provides: the mean line, which always predicts the average value of the target.
• This is loosely similar to classification problems, where the decision threshold is often fixed at 0.5. Basically, R-squared calculates how much better the regression line is than the mean line.
• Hence, R-squared is also known as the coefficient of determination, or sometimes as goodness of fit.
• R2 = 1 - (error of the regression line / error of the mean line), where each error is a sum of squared differences from the actual values. Suppose the R2 score is zero: the ratio then equals 1, so R2 = 1 - 1 = 0.
• In this case, both lines effectively overlap: the model performs no better than always predicting the mean and fails to take advantage of the input features. (R2 can even become negative when the model does worse than the mean line.)
• The second case is when the R2 score is 1. This happens when the ratio is zero, i.e. when the regression line makes no mistakes and is perfect. In the real world, this is not achievable.
• So we can conclude that as the regression line moves towards perfection, the R2 score moves towards one and the model performance improves.
• The normal case is when the R2 score is between zero and one, such as 0.8, which means the model explains 80 percent of the variance in the data.
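The formula behind these cases is R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared errors of the regression line and SS_tot is that of the mean line. A plain-Python sketch with made-up values reproduces the cases discussed above; the last call shows R2 dropping below zero for a model that does worse than the mean line.

```python
def r_squared(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot: how much better the model is than always predicting the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # regression-line error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)               # mean-line error
    return 1 - ss_res / ss_tot

y_true = [3, 5, 7, 9]
print(r_squared(y_true, [6, 6, 6, 6]))  # mean line: 0.0
print(r_squared(y_true, [3, 5, 7, 9]))  # perfect predictions: 1.0
print(r_squared(y_true, [2, 5, 8, 9]))  # about 0.9
print(r_squared(y_true, [9, 7, 5, 3]))  # worse than the mean line: -3.0
```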
Adjusted R Squared
• A disadvantage of the R2 score is that when new features are added to the data, R2 either increases or remains constant but never decreases, because an extra feature can only reduce (or leave unchanged) the model's error on the training data.
• The problem is that even an irrelevant feature can increase R2, which wrongly suggests a better model.
• Hence, Adjusted R-squared was introduced to control this situation: it penalizes the score for each additional feature.
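Adjusted R-squared is commonly defined as 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of samples and p the number of features. The sketch below assumes that definition; the R2 value and sample size are hypothetical. Keeping R2 fixed while adding a feature lowers the adjusted score, which is the penalty for extra features.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Hypothetical case: adding a third feature leaves R2 unchanged at 0.90 on 20 samples.
print(adjusted_r_squared(0.90, n_samples=20, n_features=2))  # about 0.888
print(adjusted_r_squared(0.90, n_samples=20, n_features=3))  # about 0.881
```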
Differences among these evaluation metrics
• Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) penalize large prediction errors more heavily than Mean Absolute Error (MAE) does.
• However, RMSE is more widely used than MSE to compare the performance of a regression model with other models, as it has the same units as the dependent variable (Y-axis).
• MSE is a differentiable function, which makes it easier to perform mathematical operations on it than on a non-differentiable function like MAE.
• Therefore, in many models, RMSE is used as the default metric for the loss function, despite being harder to interpret than MAE.
Differences among these evaluation metrics
• A lower value of MAE, MSE, and RMSE implies higher accuracy of a regression model. However, a higher value of R-squared is considered desirable.
• R-squared and Adjusted R-squared are used to explain how well the independent variables in the linear regression model explain the variability in the dependent variable.
• The R-squared value never decreases when independent variables are added, which might lead to redundant variables being added to the model. Adjusted R-squared solves this problem.
• Both RMSE and R-squared quantify how well a linear regression model fits a dataset. RMSE tells how well a regression model can predict the value of a response variable in absolute terms, while R-squared tells how well the predictor variables can explain the variation in the response variable.
