Subject: Machine Learning
Unit 2: Supervised Learning: Regression
Bias
• What is Bias?
• In general, a machine learning model analyses the data, finds
patterns in it and makes predictions.
• While training, the model learns these patterns in the dataset
and applies them to test data for prediction.
• While making predictions, a difference occurs between the
prediction values made by the model and the actual
(expected) values. The systematic part of this difference, caused by
the simplifying assumptions the model makes, is known as bias
error or error due to bias.
• Low Bias: A low bias model will make fewer assumptions
about the form of the target function.
• High Bias: A model with a high bias makes more assumptions,
and the model becomes unable to capture the important
features of our dataset. A high bias model also cannot
perform well on new data.
• Some examples of machine learning algorithms with
low bias are Decision Trees, k-Nearest Neighbours and
Support Vector Machines. In contrast, algorithms
with high bias include Linear Regression, Linear
Discriminant Analysis and Logistic Regression.
• Ways to reduce High Bias:
• High bias mainly occurs when the model is too simple.
Below are some ways to reduce high bias:
• Add more input features, since the model is underfitting.
• Use more complex models, for example by including
polynomial features.
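As a minimal sketch of the second remedy, the snippet below fits a straight line to data that actually follows y = x², then adds the polynomial feature z = x² and fits again with the same one-variable least-squares formulas. The data points and the helper names `fit_simple_linear` and `mse` are hypothetical, chosen only for illustration.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with a single input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data whose true relationship is quadratic: y = x^2
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# High-bias model: a straight line in x cannot bend to follow the curve
b0, b1 = fit_simple_linear(xs, ys)
linear_error = mse(ys, [b0 + b1 * x for x in xs])

# Lower-bias model: the same linear fit, but on the polynomial feature z = x^2
zs = [x ** 2 for x in xs]
c0, c1 = fit_simple_linear(zs, ys)
poly_error = mse(ys, [c0 + c1 * z for z in zs])

print(f"training MSE, straight line:    {linear_error:.2f}")  # 12.00
print(f"training MSE, with x^2 feature: {poly_error:.2f}")    # 0.00
```

The straight line underfits (training error 12), while adding the x² feature lets the same linear method capture the curve exactly.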
Variance Error
• What is a Variance Error?
• Variance specifies how much the model's predictions
would change if different training data were used.
• In simple words, variance measures how far a random
variable tends to deviate from its expected value.
• Ideally, a model should not vary too much from one training
dataset to another, which means the algorithm should be good at
understanding the hidden mapping between input and output
variables.
• A model can have either low variance or high variance.
• Low variance means there is a small variation in the prediction
of the target function with changes in the training dataset. In
contrast, high variance means a large variation in the
prediction of the target function with changes in the training
dataset.
• A model that shows high variance learns a lot and
performs well on the training dataset, but does not
generalize well to unseen data.
• As a result, such a model gives good results on the
training dataset but shows high error rates on the test
dataset.
• Because a high-variance model learns too much
from the training data, including its noise, it overfits.
• A model with high variance has the following problems:
• A high variance model leads to overfitting.
• It increases model complexity.
• Some examples of machine learning algorithms with
low variance are Linear Regression, Logistic
Regression, and Linear Discriminant Analysis.
• In contrast, algorithms with high variance
include Decision Trees, Support Vector Machines, and
k-Nearest Neighbours.
• The figure demonstrates the three concepts discussed above.
On the left, the red line represents a model that is underfitting.
• The model notes that there is some trend in the data, but it is
not specific enough to capture relevant information. It is
unable to make accurate predictions for training or new data.
• In the middle, the blue line represents a model that is
balanced.
• This model notes there is a trend in the data, and accurately
models it. This middle model will be able to generalize
successfully.
• On the right, the blue line represents a model that is
overfitting.
• The model notes a trend in the data, and accurately models the
training data, but it is too specific. It will fail to make accurate
predictions with new data because it learned the training data
too well.
Ways to Reduce High Variance:
• Reduce the number of input features or parameters, since the
model is overfitting.
• Avoid overly complex models.
• Increase the amount of training data.
• Increase the weight of the regularization term.
Different Combinations of Bias-Variance
• Low-Bias, Low-Variance:
The combination of low bias and low variance
represents an ideal machine learning model.
However, it is rarely achievable in practice.
• Low-Bias, High-Variance: With low bias and high
variance, model predictions are inconsistent but
accurate on average. This case occurs when the model
learns with a large number of parameters, and it
leads to overfitting.
• High-Bias, Low-Variance: With high bias and low
variance, predictions are consistent but inaccurate on
average. This case occurs when a model does not learn
well from the training dataset or uses too few
parameters. It leads to underfitting.
• High-Bias, High-Variance:
With high bias and high variance, predictions are
inconsistent and also inaccurate on average.
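These combinations can be explored with a small simulation. The sketch below runs under assumed, hypothetical settings: the true function f(x) = 2x, Gaussian noise with standard deviation 0.5, 20 training points per dataset and 500 repeated datasets. It compares a high-bias, low-variance model (always predict the training mean) with a low-bias, high-variance model (1-nearest neighbour) by estimating the squared bias and the variance of each model's prediction at x = 0.9 across the repeated training sets.

```python
import random
import statistics


def true_f(x):
    """Hypothetical true relationship between input and target."""
    return 2.0 * x


def make_dataset(rng, n=20, noise_sd=0.5):
    """Draw n training points with x ~ U(0, 1) and y = f(x) + Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def mean_model(xs, ys, x0):
    """High-bias model: ignore x and predict the average training target."""
    return sum(ys) / len(ys)


def nearest_neighbour_model(xs, ys, x0):
    """Low-bias model: predict the target of the closest training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]


def bias_variance_at(model, x0, trials=500, seed=0):
    """Estimate squared bias and variance of a model's prediction at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs, ys = make_dataset(rng)  # a fresh training set each trial
        preds.append(model(xs, ys, x0))
    bias_sq = (statistics.mean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance


x0 = 0.9
mean_bias_sq, mean_var = bias_variance_at(mean_model, x0)
nn_bias_sq, nn_var = bias_variance_at(nearest_neighbour_model, x0)
print(f"mean model: bias^2={mean_bias_sq:.3f}, variance={mean_var:.3f}")
print(f"1-NN model: bias^2={nn_bias_sq:.3f}, variance={nn_var:.3f}")
```

The mean model should show a large squared bias with small variance (consistent but inaccurate), and the 1-NN model the reverse (accurate on average but inconsistent).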
Generalization
• Generalization describes a model’s ability to
react to new data. That is, after being trained on a training
set, a model can digest new data and make accurate
predictions.
• If a model fits the training data too closely, it will be
unable to generalize.
• It will make inaccurate predictions when given new data,
making the model useless even though it is able to make
accurate predictions for the training data. This is called
overfitting.
• The inverse is also true. Underfitting happens when a model
has not learned enough from the data. An underfitted model is
just as useless: it cannot make accurate predictions, even on
the training data.
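A quick way to see overfitting in practice is to compare training error with error on held-out data. In this minimal sketch (the data generator y = 3x + 1 plus noise, the sample sizes and the seed are hypothetical), a 1-nearest-neighbour model memorizes the training set, so its training error is zero, while its error on new data from the same source is clearly larger.

```python
import random


def nearest_neighbour_predict(train_xs, train_ys, x):
    """Predict the target of the training point closest to x."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def sample(rng, n):
    """Hypothetical noisy data: y = 3x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [3.0 * x + 1.0 + rng.gauss(0.0, 2.0) for x in xs]
    return xs, ys


rng = random.Random(42)
train_xs, train_ys = sample(rng, 30)
test_xs, test_ys = sample(rng, 30)  # unseen data from the same source

train_preds = [nearest_neighbour_predict(train_xs, train_ys, x) for x in train_xs]
test_preds = [nearest_neighbour_predict(train_xs, train_ys, x) for x in test_xs]
train_error = mse(train_ys, train_preds)
test_error = mse(test_ys, test_preds)
print(f"training MSE: {train_error:.2f}")  # 0.00: the training set is memorized
print(f"test MSE:     {test_error:.2f}")   # larger: poor generalization
```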
Linear Regression
• Regression analysis is a statistical method to model the
relationship between a dependent (target) variable and
one or more independent (predictor) variables.
• It predicts continuous/real values such as temperature,
age, salary, price, etc.
• Regression is a supervised learning technique.
• It is mainly used for prediction, forecasting, time series
modeling, and exploring cause-and-effect
relationships between variables.
• Regression fits a line or curve through the data points on
the target-predictor graph in such a way that the vertical
distances between the data points and the regression line
are minimized (in ordinary least squares, the sum of the
squared vertical distances is minimized).
• Some examples of regression are:
• Prediction of rain using temperature and other factors
• Determining Market trends
• Prediction of road accidents due to rash driving.
• Types of Regression
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
Types of Regression
Linear Regression
• Linear regression is a statistical regression method which is used
for predictive analysis.
• It is one of the simplest regression algorithms and models
the relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression shows the linear relationship between the
independent variable (X-axis) and the dependent variable (Y-axis),
hence called linear regression.
• If there is only one input variable (x), then such linear regression
is called simple linear regression. And if there is more than one
input variable, then such linear regression is called multiple linear
regression.
• The relationship between variables in the linear regression model
can be explained using the image below. Here we are predicting
the salary of an employee on the basis of years of experience.
• Y = aX + b
• Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients (a is the slope and b is the intercept).
• Some popular applications of linear regression are:
• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic.
Simple Linear Regression
• For a simple regression problem (a single x and a
single y), the form of the model would be:
$$ y = b_0 + b_1 x_1 $$
where y is the dependent variable (DV), x1 is the independent
variable (IV), b0 is the constant and b1 is the coefficient.
Example-2
• Let’s make this concrete with an example.
Imagine we are predicting weight (y) from
height (x). Our linear regression model
representation for this problem would be:
y = B0 + B1 * x1
or
weight = B0 + B1 * height
• Where B0 is the bias coefficient and B1 is the coefficient for
the height column. We use a learning technique to find a
good set of coefficient values. Once found, we can plug in
different height values to predict the weight.
• For example, let’s use B0 = 0.1 and B1 = 0.5. Let’s
plug them in and calculate the weight (in kilograms)
for a person with a height of 182 centimeters.
weight = 0.1 + 0.5 * 182
weight = 91.1
• You can see that the above equation could be plotted as a line
in two dimensions. B0 is our starting point regardless of
the height.
• We can run through a range of heights from 100 to 250
centimeters, plug them into the equation and get weight
values, creating our line.
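The text leaves the learning technique unspecified; the sketch below assumes ordinary least squares, a common choice for simple linear regression. It first fits B0 and B1 to hypothetical height/weight pairs generated exactly from weight = 0.1 + 0.5 * height (so the fit should recover those values), then repeats the 182 cm calculation and sweeps heights from 100 to 250 cm in an arbitrarily chosen 50 cm step.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares estimates of B0 (intercept) and B1 (slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # B1 = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict_weight(b0, b1, height_cm):
    """Apply weight = B0 + B1 * height."""
    return b0 + b1 * height_cm


# Hypothetical training data: heights in cm, weights in kg
heights = [150, 160, 170, 180, 190]
weights = [75.1, 80.1, 85.1, 90.1, 95.1]

B0, B1 = fit_simple_linear(heights, weights)
print(f"B0 = {B0:.2f}, B1 = {B1:.2f}")  # B0 = 0.10, B1 = 0.50

print(f"{predict_weight(0.1, 0.5, 182):.1f}")  # 91.1

# Sweep heights from 100 to 250 cm to trace out the regression line
line = [(h, predict_weight(0.1, 0.5, h)) for h in range(100, 251, 50)]
for h, w in line:
    print(f"height {h} cm -> weight {w:.1f} kg")
```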
Multiple Linear Regression
$$ y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n $$
where y is the dependent variable (DV), x1, x2, ..., xn are the
independent variables (IVs), b0 is the constant and b1, ..., bn
are the coefficients.
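With several inputs, the coefficients b0, ..., bn can be estimated by least squares through the normal equations (XᵀX) b = Xᵀy, where X has a leading column of ones for the constant b0. The sketch below solves that small system with Gaussian elimination in plain Python; the helper names `solve` and `fit_multiple_linear` are hypothetical, and the data are made up, generated exactly from y = 1 + 2x1 + 3x2 so the fit should recover those coefficients. In practice a library routine such as `numpy.linalg.lstsq` is the more numerically robust choice.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix so that row operations act on a and b together
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_linear(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1 multiplies the constant b0
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefs = fit_multiple_linear(rows, ys)
print([round(c, 6) for c in coefs])  # [1.0, 2.0, 3.0]
```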
Logistic Regression
• Logistic regression is another supervised learning algorithm,
used to solve classification problems.
In classification problems, the dependent variable takes a
binary or discrete value such as 0 or 1.
• The logistic regression algorithm works with categorical variables
such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
• It is a predictive analysis algorithm which works on the concept
of probability.
• Logistic regression is a type of regression, but it differs from
the linear regression algorithm in how it is used.
• Logistic regression uses the sigmoid function, also called the
logistic function, to map the model's output to a probability.
This sigmoid function is used to model the data in logistic
regression. The function can be represented as:
$$ f(x) = \frac{1}{1 + e^{-x}} $$
• f(x) = output, a value between 0 and 1.
• x = input to the function
• e = base of the natural logarithm.
• It uses the concept of a threshold level: values
above the threshold level are rounded up to 1,
and values below the threshold level are
rounded down to 0.
• There are three types of logistic regression:
• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)
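A minimal sketch of binary logistic regression follows, using the sigmoid f(x) = 1 / (1 + e^(-x)) and a threshold of 0.5. The tiny one-feature dataset, the learning rate and the number of iterations are hypothetical choices. The weights are fitted by plain gradient descent on the log-loss (cross-entropy), which is a common training approach but is an assumption here, since the text does not specify one.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(w, b, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """Gradient descent on the mean log-loss for p = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss: averages of (p - y) * x and (p - y)
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical data: label 1 ("pass") for positive x, 0 ("fail") otherwise
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print(sigmoid(0.0))                          # 0.5
print([predict_label(w, b, x) for x in xs])  # [0, 0, 0, 1, 1, 1]
```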
Ridge Regression (L2)
Lasso Regression (L1)
Ridge vs Lasso Regression (L2 vs L1)
Elastic Net Regression
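The Ridge, Lasso and Elastic Net headings above come without formulas, so the following is only a sketch under stated assumptions. For a single (centered) feature and the assumed objective ½·Σ(yᵢ − b0 − b·xᵢ)² + λ1·|b| + ½·λ2·b², the slope has a closed form: Ridge (λ1 = 0, an L2 penalty) shrinks it toward zero, Lasso (λ2 = 0, an L1 penalty) soft-thresholds it and can set it exactly to zero, and Elastic Net combines both. The penalty scaling, the function names `soft_threshold` and `penalized_slope`, and the data are hypothetical; libraries such as scikit-learn may scale their objectives differently.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def penalized_slope(xs, ys, l1=0.0, l2=0.0):
    """Closed-form (intercept, slope) for one feature under the assumed objective
    0.5 * sum((y - b0 - b*x)^2) + l1 * |b| + 0.5 * l2 * b^2.
    The intercept b0 is not penalized, so the data are centered first."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = soft_threshold(sxy, l1) / (sxx + l2)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

print(penalized_slope(xs, ys))                   # ordinary least squares: slope 2.0
print(penalized_slope(xs, ys, l2=10.0))          # Ridge: slope shrunk to 1.0
print(penalized_slope(xs, ys, l1=5.0))           # Lasso: slope 1.5
print(penalized_slope(xs, ys, l1=25.0))          # Lasso, large penalty: slope 0.0
print(penalized_slope(xs, ys, l1=5.0, l2=10.0))  # Elastic Net: slope 0.75
```

Note how only the L1 penalty can drive the slope to exactly zero, which is why Lasso is often described as performing feature selection.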
MAE
• An essential step in building any machine learning model is to
evaluate its performance.
• The Mean Squared Error, Mean Absolute Error, Root
Mean Squared Error, and R-Squared (Coefficient of
Determination) metrics are used to evaluate the
performance of a model in regression analysis.
• The Mean Absolute Error represents the average of the
absolute differences between the actual and predicted values in
the dataset.
• It measures the average magnitude of the residuals in the dataset.
• Advantages of MAE
• The MAE you get is in the same unit as the
output variable.
• It is more robust to outliers than MSE and RMSE.
• Disadvantages of MAE
• MAE is not differentiable at zero error, so gradient-based
optimizers such as gradient descent need a subgradient
or a smooth approximation at that point.
• MSE, the next metric, overcomes this disadvantage of MAE.
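MAE = (1/n)·Σ|yᵢ − ŷᵢ|. A minimal sketch with hypothetical actual and predicted values:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|, in the same unit as the target."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical house prices (in thousands) and a model's predictions
actual = [200, 250, 300, 350]
predicted = [210, 240, 320, 340]
print(mean_absolute_error(actual, predicted))  # 12.5
```

The absolute errors are 10, 10, 20 and 10, so the MAE is 50 / 4 = 12.5, in thousands, like the prices themselves.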
MSE
• Mean Squared Error represents the average of the
squared differences between the original and predicted
values in the dataset.
• It is closely related to the variance of the residuals (the two
are equal when the residuals average to zero).
• Squaring prevents positive and negative errors from
cancelling each other out, which is a benefit of MSE.
• Advantages of MSE
• MSE is differentiable, so you can
easily use it as a loss function.
• Disadvantages of MSE
• The value you get after calculating MSE is in the
squared unit of the output. For example, if the output
variable is in meters (m), then the MSE is in
square meters (m²).
• If there are outliers in the dataset, MSE penalizes them
the most, and the calculated MSE becomes larger. In short,
it is not robust to outliers, which was an advantage of MAE.
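MSE = (1/n)·Σ(yᵢ − ŷᵢ)². The sketch below, with hypothetical numbers, computes MSE alongside MAE and then makes one prediction badly wrong, which shows how squaring lets a single outlier dominate MSE far more than MAE.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute residual."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared residual; its unit is the square of the target's unit."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 7.0, 9.0]
good = [2.0, 6.0, 6.0, 10.0]     # every residual is +/-1
outlier = [2.0, 6.0, 6.0, 19.0]  # last prediction is off by 10

for name, preds in [("good", good), ("with outlier", outlier)]:
    print(name, mean_absolute_error(actual, preds), mean_squared_error(actual, preds))
# good 1.0 1.0
# with outlier 3.25 25.75
```

One bad prediction raises MAE 3.25 times but MSE 25.75 times.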
RMSE
• Root Mean Squared Error is the square root of the Mean
Squared Error. It can be interpreted as the standard deviation
of the residuals.
• Advantages of RMSE
• The output value you get is in the same unit as the
required output variable, which makes interpretation of the
loss easy.
• Disadvantages of RMSE
• It is less robust to outliers than MAE.
• To compute RMSE, we have to apply a square-root function,
such as NumPy's square root function, to the MSE.
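RMSE is √MSE. The minimal sketch below uses `math.sqrt` from the standard library; NumPy's `numpy.sqrt` would work the same way on arrays. It also computes RMSLE, a related metric that applies the log(1 + y) transform (`math.log1p`) to the actual and predicted values before taking the RMSE. The values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: same unit as the target variable."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def rmsle(actual, predicted):
    """RMSE computed on log(1 + value); assumes all values are > -1 (usually >= 0)."""
    return rmse([math.log1p(a) for a in actual], [math.log1p(p) for p in predicted])


actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 6.0, 6.0, 19.0]
print(rmse(actual, predicted))   # sqrt(25.75)
print(rmsle(actual, predicted))
```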
R-squared
• The coefficient of determination or R-
squared represents the proportion of the variance in
the dependent variable which is explained by the linear
regression model.
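• It is a scale-free score, i.e. irrespective of whether the values
are small or large, the value of R-squared will be at most one
(it can even be negative when a model does worse than the mean line).
• When target values span a wide range, RMSE is dominated by
the errors on the largest values. To reduce this effect, RMSLE
(Root Mean Squared Logarithmic Error) can be used. Note that
RMSLE is not the log of the RMSE: it is the RMSE computed on
log-transformed values, log(1 + y), of the actual and predicted values.
• With NumPy, this transform can be applied with the log1p
function before computing the RMSE.
• RMSLE is a simple metric that is often used in machine
learning competitions.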
• With the help of R-squared, we have a baseline
model (the mean line) to compare a model against, which none
of the other metrics provides.
• This is similar to classification problems, where a
threshold, often fixed at 0.5, serves as the reference point.
Basically, R-squared measures how much better the
regression line is than the mean line.
• R-squared is also known as the Coefficient
of Determination, or sometimes as the
Goodness of Fit.
• R-squared is computed as
$$ R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} $$
where SS_res is the sum of squared errors of the regression line
and SS_tot is the sum of squared errors of the mean line.
• Suppose the R2 score is zero. This happens when the regression
line's error divided by the mean line's error equals 1, so 1 - 1 = 0.
• In this case, both lines perform equally well: the model is no
better than always predicting the mean, so it fails to take
advantage of the input features.
• The second case is when the R2 score is 1. This happens when
the division term is zero, i.e. when the regression line makes no
mistakes at all and is perfect. In the real world, this almost
never happens.
• So we can conclude that as our regression line moves
towards perfection, the R2 score moves towards one, and
model performance improves.
• The normal case is when the R2 score is between zero
and one, like 0.8, which means the model explains
80 percent of the variance in the data.
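Using R² = 1 − SS_res / SS_tot, the sketch below, with hypothetical values, reproduces the cases discussed above: predicting the mean gives 0, perfect predictions give 1, and typical predictions fall in between.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Assumes the actual values are not all equal (otherwise SS_tot is 0)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # regression line
    ss_tot = sum((a - mean_y) ** 2 for a in actual)                # mean line
    return 1.0 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]
mean_line = [6.0] * 4            # baseline that always predicts the mean
perfect = list(actual)           # no mistakes at all
typical = [2.0, 6.0, 6.0, 10.0]  # each prediction off by 1

print(r_squared(actual, mean_line))          # 0.0
print(r_squared(actual, perfect))            # 1.0
print(round(r_squared(actual, typical), 10)) # 0.8
```

Here SS_tot = 20 and the typical predictions have SS_res = 4, so R² = 1 − 4/20 = 0.8.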
Adjusted R Squared
• A disadvantage of the R2 score is that when new features are
added to the data, the R2 score increases or remains constant
but never decreases, because adding a feature can never
increase the residual error of a least-squares fit on the
training data.
• The problem is that even when we add an irrelevant feature
to the dataset, R2 may still increase, which wrongly suggests
a better model.
• Adjusted R Squared was introduced to handle this situation:
it penalizes the score for each additional predictor.
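A commonly used form of adjusted R-squared is 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p is the number of predictors. The sketch below applies that formula to hypothetical numbers: a model whose R² creeps up slightly after adding an irrelevant feature ends up with a lower adjusted score.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors used."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


n = 20
before = adjusted_r_squared(0.80, n, 3)   # 3 useful predictors
after = adjusted_r_squared(0.805, n, 4)   # R-squared rose slightly after an irrelevant 4th feature
print(round(before, 4), round(after, 4))  # 0.7625 0.753
```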
Differences among these evaluation metrics
• Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
penalize large prediction errors more heavily than Mean
Absolute Error (MAE) does.
• RMSE is more widely used than MSE to compare the
performance of a regression model with other
models, as it has the same units as the
dependent variable (Y-axis).
• MSE is a differentiable function, which makes mathematical
operations such as optimization easier than with a
non-differentiable function like MAE.
• Therefore, in many models, RMSE (or its square, MSE) is used
as the default loss function, despite being
harder to interpret than MAE.
• Lower values of MAE, MSE, and RMSE imply higher accuracy
of a regression model. However, a higher value of R-squared is
considered desirable.
• R-squared and Adjusted R-squared are used to explain how
well the independent variables in the linear regression model
explain the variability in the dependent variable.
• The R-squared value never decreases with the addition of
independent variables, which might lead to adding
redundant variables to our model. Adjusted R-squared
solves this problem.
• Both RMSE and R-squared quantify how well a linear
regression model fits a dataset. RMSE tells how well a
regression model can predict the value of a response variable in
absolute terms, while R-squared tells how well the predictor
variables can explain the variation in the response variable.