Week 3
Bias, variance, regularization, platforms, backpropagation, issues, shallow learning
1. BIAS AND VARIANCE
Supervised machine learning algorithms can best be understood through the lens of
the bias-variance trade-off.
In supervised machine learning an algorithm learns a model from training data.
The goal of any supervised machine learning algorithm is to best estimate the
mapping function (f) for the output variable (Y) given the input data (X).
The mapping function is often called the target function because it is the function that
a given supervised machine learning algorithm aims to approximate.
The prediction error for any machine learning algorithm can be broken down into
three parts: bias error, variance error, and irreducible error.
The irreducible error cannot be reduced regardless of what algorithm is used. It is the
error introduced from the chosen framing of the problem and may be caused by
factors like unknown variables that influence the mapping of the input variables to the
output variable.
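For reference, this three-part breakdown is usually written as below. This is a sketch under stated assumptions that are not in the notes: squared-error loss, data generated as Y = f(X) + ε with zero-mean noise of variance σ², and \hat{f} denoting the model learned from a randomly drawn training set (the expectation is taken over training sets and noise).

```latex
\mathbb{E}\big[(Y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```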
Bias Error
Bias is the set of simplifying assumptions made by a model to make the target function
easier to learn.
Generally, parametric algorithms have a high bias making them fast to learn and
easier to understand but generally less flexible. In turn, they have lower predictive
performance on complex problems that fail to meet the simplifying assumptions of the
algorithm's bias.
Low Bias: Suggests fewer assumptions about the form of the target function. EG: Decision
Trees, k-Nearest Neighbors and Support Vector Machines.
High-Bias: Suggests more assumptions about the form of the target function. EG: Linear
Regression, Linear Discriminant Analysis and Logistic Regression.
Variance Error
Variance is the amount that the estimate of the target function will change if different
training data were used.
The target function is estimated from the training data by a machine learning
algorithm, so we should expect the algorithm to have some variance.
Ideally, it should not change too much from one training dataset to the next, meaning
that the algorithm is good at picking out the hidden underlying mapping between the
inputs and the output variables.
Machine learning algorithms that have a high variance are strongly influenced by the
specifics of the training data. This means that the specifics of the training data
influence the number and types of parameters used to characterize the mapping
function.
Low Variance: Suggests small changes to the estimate of the target function with
changes to the training dataset. EG: Linear Regression, Linear Discriminant Analysis
and Logistic Regression.
High Variance: Suggests large changes to the estimate of the target function with
changes to the training dataset. EG: Decision Trees, k-Nearest Neighbors and Support
Vector Machines.
Generally, nonparametric machine learning algorithms that have a lot of flexibility
have a high variance.
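The contrast between a high-bias and a high-variance learner can be estimated by simulation. The sketch below is illustrative only: the target function sin(2πx), the noise level, the sample size and the test point x0 = 0.25 are all assumptions chosen for the demonstration. It refits a straight line (parametric) and a 1-nearest-neighbour model (nonparametric) on many random training sets and measures how the prediction at x0 behaves.

```python
import math
import random


def true_f(x):
    # Assumed target function for the simulation.
    return math.sin(2 * math.pi * x)


def sample_training_set(rng, n=20, noise=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares with one predictor (parametric model).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_1nn(xs, ys):
    # 1-nearest-neighbour regression (nonparametric model).
    return lambda x: min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]


def bias2_and_variance(fit, x0=0.25, repeats=500, seed=0):
    # Refit on many independent training sets and study the prediction at x0.
    rng = random.Random(seed)
    preds = []
    for _ in range(repeats):
        xs, ys = sample_training_set(rng)
        preds.append(fit(xs, ys)(x0))
    mean_pred = sum(preds) / repeats
    variance = sum((p - mean_pred) ** 2 for p in preds) / repeats
    return (mean_pred - true_f(x0)) ** 2, variance


line_bias2, line_var = bias2_and_variance(fit_line)
knn_bias2, knn_var = bias2_and_variance(fit_1nn)
print(f"line: bias^2={line_bias2:.3f} variance={line_var:.3f}")
print(f"1-NN: bias^2={knn_bias2:.3f} variance={knn_var:.3f}")
```

Under these assumptions the line should show the larger squared bias and the 1-nearest-neighbour model the larger variance, which matches the general trend described above.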
Bias-Variance Trade-Off
The goal of any supervised machine learning algorithm is to achieve low bias and low
variance. In turn the algorithm should achieve good prediction performance.
You can see a general trend in the examples above:
Parametric or linear machine learning algorithms often have a high bias but a low
variance.
Non-parametric or non-linear machine learning algorithms often have a low bias but a
high variance.
The parameterization of machine learning algorithms is often a battle to balance out
bias and variance.
Examples of configuring the bias-variance trade-off for specific algorithms:
The k-nearest neighbors algorithm has low bias and high variance, but the trade-off can be
changed by increasing the value of k, which increases the number of neighbors that contribute
to the prediction and in turn increases the bias of the model.
The support vector machine algorithm has low bias and high variance, but the trade-off can
be changed by allowing more violations of the margin in the training data, which increases
the bias but decreases the variance. Where C is defined as a budget for violations, this means
increasing C; where C is a penalty on violations, as in scikit-learn, it means decreasing C.
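The effect of k on the trade-off can be checked with a small simulation. This is a minimal sketch, assuming a sin(2πx) target, Gaussian noise and a test point at x0 = 0.25; none of these come from the notes. It estimates squared bias and variance of a k-nearest-neighbour prediction for several values of k.

```python
import math
import random


def knn_predict(xs, ys, x0, k):
    # Average the targets of the k training points closest to x0.
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def bias2_and_variance(k, x0=0.25, n=30, noise=0.3, repeats=400, seed=1):
    # The same seed is used for every k, so each k sees the same training sets.
    rng = random.Random(seed)
    target = math.sin(2 * math.pi * x0)
    preds = []
    for _ in range(repeats):
        xs = [rng.random() for _ in range(n)]
        ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise) for x in xs]
        preds.append(knn_predict(xs, ys, x0, k))
    mean_pred = sum(preds) / repeats
    variance = sum((p - mean_pred) ** 2 for p in preds) / repeats
    return (mean_pred - target) ** 2, variance


results = {k: bias2_and_variance(k) for k in (1, 5, 15)}
for k, (b2, var) in results.items():
    print(f"k={k:2d}: bias^2={b2:.3f} variance={var:.3f}")
```

With these settings, moving from k = 1 to k = 15 should lower the variance and raise the squared bias, because more distant neighbours are averaged into the prediction.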
There is no escaping the relationship between bias and variance in machine learning.
Increasing the bias will decrease the variance.
Increasing the variance will decrease the bias.
There is a trade-off at play between these two concerns, and the algorithms you choose
and the way you choose to configure them strike different balances in this trade-off
for your problem.
In reality, we cannot calculate the real bias and variance error terms because we do not
know the actual underlying target function. Nevertheless, as a framework, bias and
variance provide the tools to understand the behavior of machine learning algorithms in
the pursuit of predictive performance.
2. REGULARIZATION
One of the major aspects of training your machine learning model is avoiding overfitting.
An overfitted model will have low accuracy on data it has not seen during training.
This happens because your model is trying too hard to capture the noise in your training
dataset.
By noise we mean the data points that don’t really represent the true properties of your
data, but arise from random chance.
Learning such data points makes your model more flexible, at the risk of overfitting.
The concept of balancing bias and variance is helpful in understanding the phenomenon
of overfitting.
The fitting procedure involves a loss function, known as residual sum of squares or RSS.
The coefficients are chosen, such that they minimize this loss function.
Now, this will adjust the coefficients based on your training data.
If there is noise in the training data, then the estimated coefficients won’t generalize well
to the future data. This is where regularization comes in and shrinks or regularizes these
learned estimates towards zero.
Ridge Regression
In ridge regression, the RSS is modified by adding a shrinkage quantity. The coefficients
are then estimated by minimizing this modified function.
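For reference, the ridge objective is usually written as follows. The notation is an assumption consistent with the symbols used in these notes: n observations, p predictors x_{ij}, responses y_i, intercept β0 and coefficients βj.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2
\qquad
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \Big\{ \mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2 \Big\}
```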
Here, λ is the tuning parameter that decides how much we want to penalize the flexibility
of our model.
The increase in flexibility of a model is represented by increase in its coefficients, and if
we want to minimize the above function, then these coefficients need to be small.
This is how the Ridge regression technique prevents coefficients from rising too high.
Also, notice that we shrink the estimated association of each variable with the response,
except the intercept β0. This intercept is a measure of the mean value of the response when
xi1 = xi2 = … = xip = 0.
When λ = 0, the penalty term has no effect, and the estimates produced by ridge regression
will be equal to least squares.
However, as λ→∞, the impact of the shrinkage penalty grows, and the ridge regression
coefficient estimates will approach zero.
As can be seen, selecting a good value of λ is critical. Cross validation comes in handy for
this purpose. The penalty used by this method is based on the L2 norm of the coefficients,
which is why ridge regression is also known as L2 regularization.
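The effect of λ can be seen in the one-predictor case, where the ridge solution has a simple closed form. The sketch below is illustrative: the toy data and the function name `ridge_slope` are hypothetical, and centring the data stands in for leaving the intercept unpenalized.

```python
def ridge_slope(xs, ys, lam):
    # One-predictor ridge regression on centred data:
    # minimise sum((y - b*x)^2) + lam * b^2, which gives b = Sxy / (Sxx + lam).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


# Hypothetical toy data, roughly y = 2x plus noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 2.2, 3.9, 6.1, 7.8]

slopes = {lam: ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 1e6)}
for lam, b in slopes.items():
    print(f"lambda={lam:g}: slope={b:.4f}")
```

At λ = 0 the slope equals the least squares estimate, and it moves steadily toward zero as λ grows.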
The coefficients that are produced by the standard least squares method are scale
equivariant, i.e. if we multiply each input by c then the corresponding coefficients are
scaled by a factor of 1/c.
Therefore, regardless of how the predictor is scaled, the product of predictor and
coefficient (Xjβj) remains the same. However, this is not the case with ridge regression,
and therefore, we need to standardize the predictors or bring the predictors to the same
scale before performing ridge regression. The formula used to do this is given below.
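A common choice for this standardization, assumed here rather than taken from the notes, divides each predictor by its standard deviation so that every standardized predictor has a standard deviation of one:

```latex
\tilde{x}_{ij} = \frac{x_{ij}}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}}
```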
Lasso
Lasso is another variation, in which a similarly penalized function is minimized. It's clear that this
variation differs from ridge regression only in how it penalizes large coefficients.
It uses |βj| (the modulus) instead of the squares of β as its penalty. In statistics, this is known
as the L1 norm.
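For reference, the lasso objective is usually written as follows, using the same assumed notation as for ridge regression (n observations, p predictors x_{ij}, responses y_i):

```latex
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Big\}
```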
Let's take a look at the above methods from a different perspective. The ridge regression
can be thought of as solving an equation, where summation of squares of coefficients
is less than or equal to s.
And the Lasso can be thought of as an equation where summation of modulus of
coefficients is less than or equal to s. Here, s is a constant that exists for each value of
shrinkage factor λ.
These equations are also referred to as constraint functions.
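Written out, the two constrained problems described above take the following form. The notation (RSS, p coefficients βj, budget s) is assumed to match the rest of this section.

```latex
\begin{aligned}
\text{ridge:}\quad & \min_{\beta}\ \mathrm{RSS} \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s \\
\text{lasso:}\quad & \min_{\beta}\ \mathrm{RSS} \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s
\end{aligned}
```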
Consider there are 2 parameters in a given problem. Then according to the above
formulation, the ridge regression is expressed by β1² + β2² ≤ s.
This implies that ridge regression coefficients have the smallest RSS (loss function) for
all points that lie within the circle given by β1² + β2² ≤ s.
Similarly, for lasso, the equation becomes |β1| + |β2| ≤ s. This implies that lasso
coefficients have the smallest RSS (loss function) for all points that lie within the
diamond given by |β1| + |β2| ≤ s.
The image below describes these equations.
The above image shows the constraint functions (green areas) for lasso (left) and ridge
regression (right), along with contours for RSS (red ellipse).
Points on the ellipse share the value of RSS. For a very large value of s, the green
regions will contain the center of the ellipse, making the coefficient estimates of both
regression techniques equal to the least squares estimates. But this is not the case in
the above image.
In this case, the lasso and ridge regression coefficient estimates are given by the first
point at which an ellipse contacts the constraint region.
Since ridge regression has a circular constraint with no sharp points, this intersection
will not generally occur on an axis, and so the ridge regression coefficient estimates
will be exclusively non-zero.
However, the lasso constraint has corners at each of the axes, and so the ellipse will
often intersect the constraint region at an axis.
When this occurs, one of the coefficients will equal zero. In higher dimensions (where
parameters are much more than 2), many of the coefficient estimates may equal zero
simultaneously.
This sheds light on the obvious disadvantage of ridge regression, which is model
interpretability.
It will shrink the coefficients for least important predictors, very close to zero. But it
will never make them exactly zero. In other words, the final model will include all
predictors.
However, in the case of the lasso, the L1 penalty has the effect of forcing some of the
coefficient estimates to be exactly equal to zero when the tuning parameter λ is
sufficiently large. Therefore, the lasso method also performs variable selection and
is said to yield sparse models.
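The difference between shrinking toward zero and reaching exactly zero is easy to see with a single centred predictor, where both estimators have closed forms. This is a minimal sketch: the toy data and the helper names are hypothetical, and the lasso formula is the soft-thresholding solution for the objective RSS + λ|β| with one predictor.

```python
def centred_sums(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, sxy


def ridge_slope(sxx, sxy, lam):
    # Minimises RSS + lam * b^2: shrinks b but does not reach exactly zero
    # unless sxy itself is zero.
    return sxy / (sxx + lam)


def lasso_slope(sxx, sxy, lam):
    # Minimises RSS + lam * |b| (soft-thresholding): b is exactly zero
    # once lam / 2 >= |sxy|.
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (1 if sxy >= 0 else -1) * shrunk / sxx


# Hypothetical toy data with a weak relationship between x and y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.3, 0.9, 1.4, 1.2]
sxx, sxy = centred_sums(xs, ys)

for lam in (0.0, 0.5, 2.0, 10.0):
    print(f"lambda={lam:g}: ridge={ridge_slope(sxx, sxy, lam):.4f} "
          f"lasso={lasso_slope(sxx, sxy, lam):.4f}")
```

For the larger values of λ the lasso slope is exactly zero, which removes the predictor from the model, while the ridge slope stays small but non-zero.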
What does Regularization achieve?
A standard least squares model tends to have some variance in it, i.e. this model won’t
generalize well for a data set different than its training data.
Regularization significantly reduces the variance of the model without a substantial
increase in its bias.
So the tuning parameter λ, used in the regularization techniques described above,
controls the impact on bias and variance.
As the value of λ rises, it reduces the value of the coefficients and thus reduces the
variance. Up to a point, this increase in λ is beneficial as it is only reducing the
variance (hence avoiding overfitting), without losing any important properties in the
data.
But after a certain value, the model starts losing important properties, giving rise to bias
in the model and thus underfitting.
Therefore, the value of λ should be carefully selected.
These are all the basics you will need to get started with regularization. It is a useful
technique that can help in improving the accuracy of your regression models.
3. PLATFORMS FOR DEEP LEARNING
The machine learning paradigm is continuously evolving. Deep learning is what makes
solving complex problems possible.
1. TensorFlow
TensorFlow is arguably one of the best deep learning frameworks and has been
adopted by several giants such as Airbus, Twitter, IBM, and others mainly due to its
highly flexible system architecture.
The most well-known use case of TensorFlow has got to be Google Translate coupled
with capabilities such as natural language processing, text
classification/summarization, speech/image/handwriting recognition, forecasting, and
tagging.
TensorFlow is available on both desktop and mobile and also supports languages such
as Python, C++, and R to create deep learning models along with wrapper libraries.
TensorFlow comes with two tools that are widely used:
1. TensorBoard for the effective data visualization of network modeling and
performance.
2. TensorFlow Serving for the rapid deployment of new algorithms/experiments while
retaining the same server architecture and APIs. It also provides integration with other
TensorFlow models, which is different from conventional practices and can be
extended to serve other model and data types.
2. Caffe
Caffe is a deep learning framework that is supported with interfaces like C, C++,
Python, and MATLAB as well as the command line interface.
It is well known for its speed and transposability and its applicability in modeling
convolutional neural networks (CNNs).
The biggest benefit of using Caffe’s C++ library (comes with a Python interface) is
the ability to access available networks from the deep net repository Caffe Model Zoo
that are pre-trained and can be used immediately.
When it comes to modeling CNNs or solving image processing issues, this should be
your go-to library.
Caffe’s biggest USP is speed. It can process over 60 million images on a daily basis
with a single Nvidia K40 GPU. That’s 1 ms/image for inference and 4 ms/image for
learning — and more recent library versions are faster still.
Caffe is a popular deep learning framework for visual recognition. However, Caffe does
not support fine-granular network layers like those found in TensorFlow or CNTK.
Given the architecture, the overall support for recurrent networks and language
modeling is quite poor, and establishing complex layer types has to be done in a
low-level language.
3. Microsoft Cognitive Toolkit/CNTK
Popularly known for easy training and the combination of popular model types across
servers, the Microsoft Cognitive Toolkit (previously known as CNTK) is an
open-source deep learning framework to train deep learning models.
It efficiently trains convolutional neural networks on image, speech, and
text-based data. Similar to Caffe, it is supported by interfaces such as Python, C++,
and the command line interface.
Given its coherent use of resources, the implementation of reinforcement learning
models or generative adversarial networks (GANs) can be done easily using the
toolkit. It is known to provide higher performance and scalability as compared to
toolkits like Theano or TensorFlow while operating on multiple machines.
Compared to Caffe, when it comes to inventing new complex layer types, users don’t
need to implement them in a low-level language due to the fine granularity of the
building blocks.
The Microsoft Cognitive Toolkit supports both RNN and CNN types of neural models
and thus is capable of handling images, handwriting, and speech recognition
problems. Currently, due to the lack of support on ARM architecture, its capabilities
on mobile are fairly limited.
4. Torch/PyTorch
Torch is a scientific computing framework that offers wide support for machine
learning algorithms.
It is a Lua-based deep learning framework and is used widely amongst industry giants
such as Facebook, Twitter, and Google. It employs CUDA along with C/C++ libraries
for processing and was basically made to scale the production of building models and
provide overall flexibility.
As of late, PyTorch has seen a high level of adoption within the deep learning
framework community and is considered to be a competitor to TensorFlow.
PyTorch is basically a port of the Torch deep learning framework used for
constructing deep neural networks and executing tensor computations that are high in
terms of complexity.
As opposed to Torch, PyTorch runs on Python, which means that anyone with a basic
understanding of Python can get started on building their own deep learning models.
Given PyTorch framework’s architectural style, the entire deep modeling process is
far simpler as well as transparent compared to Torch.
5. MXNet
Designed specifically for the purpose of high efficiency, productivity, and flexibility,
MXNet (pronounced as mix-net) is a deep learning framework supported by Python,
R, C++, and Julia.
The beauty of MXNet is that it gives the user the ability to code in a variety of
programming languages. This means that you can train your deep learning models
with whichever language you are comfortable in without having to learn something
new from scratch.
With the backend written in C++ and CUDA, MXNet is able to scale and work with a
myriad of GPUs, which makes it indispensable to enterprises. Case in point: Amazon
employed MXNet as its reference library for deep learning.
MXNet supports long short-term memory (LSTM) networks along with both RNNs
and CNNs.
This deep learning framework is known for its capabilities in imaging,
handwriting/speech recognition, forecasting, and NLP.
6. Chainer
Highly powerful, dynamic and intuitive, Chainer is a Python-based deep learning
framework for neural networks that is built around the define-by-run strategy.
Compared to other frameworks that use the same strategy, you can modify the
networks during runtime, allowing you to execute arbitrary control flow statements.
Chainer supports CUDA computation as well as multi-GPU setups. This deep learning
framework is utilized mainly for sentiment analysis, machine translation, speech
recognition, etc. using RNNs and CNNs.
7. Keras
Well known for being minimalistic, the Keras neural network library (with a
supporting interface of Python) supports both convolutional and recurrent networks
that are capable of running on either TensorFlow or Theano.
The library is written in Python and was developed keeping quick experimentation as
its USP.
Due to the fact that the TensorFlow interface is a tad bit challenging coupled with the
fact that it is a low-level library that can be intricate for new users, Keras was built to
provide a simplistic interface for the purpose of quick prototyping by constructing
effective neural networks that can work with TensorFlow.
Lightweight, easy to use, and really straightforward when it comes to building a deep
learning model by stacking multiple layers: that is Keras in a nutshell. These are the
very reasons why Keras is a part of TensorFlow’s core API.
The primary usage of Keras is in classification, text generation and summarization,
tagging, and translation, along with speech recognition and more. If you happen to be
a developer with some experience in Python and wish to dive into deep learning,
Keras is something you should definitely check out.
8. Deeplearning4j
Parallel training through iterative reduce, microservice architecture adaptation, and
distributed CPUs and GPUs are some of the salient features of the Deeplearning4j
deep learning framework. It is developed in Java as well as Scala and supports other
JVM languages, too.
Widely adopted as a commercial, industry-focused distributed deep learning platform,
the biggest advantage of this deep learning framework is that you can bring together
the entire Java ecosystem to execute deep learning.
It can also be administered on top of Hadoop and Spark to orchestrate multiple host
threads. DL4J uses MapReduce to train the network while depending on other
libraries to execute large matrix operations.
Deeplearning4j comes with deep network support through RBM, DBN, convolutional
neural networks (CNNs), recurrent neural networks (RNNs), recursive neural tensor
networks (RNTNs), and long short-term memory (LSTM).
Since this deep learning framework is implemented in Java, it can be more efficient
than Python-based implementations.
When it comes to image recognition tasks using multiple GPUs, it is as fast as Caffe.
This framework shows matchless potential for image recognition, fraud detection, text
mining, parts-of-speech tagging, and natural language processing.
With Java as your core programming language, you should certainly opt for this deep
learning framework if you’re looking for a robust and effective method of deploying
your deep learning models to production.
4. SOFTWARE LIBRARIES
NumPy
NumPy is a popular open-source numerical Python library. It can be used to perform
a variety of mathematical operations on arrays and matrices. It’s one of the most used
scientific computing libraries, and it’s often used by scientists for data analysis. Additionally,
its ability to process multidimensional arrays—handling linear algebra and Fourier
transformation—makes it ideal for machine learning and artificial intelligence (AI) projects.
Compared with regular Python lists, NumPy arrays require significantly less storage area.
They’re also much faster and more convenient to use than the former. NumPy allows you to
manipulate the data in the matrix and to transpose and reshape it. Taken together, NumPy’s
capabilities help you improve the performance of your machine learning model without much
hassle.
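The operations mentioned above can be sketched in a few lines. The example assumes NumPy is installed; the array values are arbitrary.

```python
import numpy as np

# A 2 x 3 matrix stored as a NumPy array.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

transposed = a.T                 # shape (3, 2)
reshaped = a.reshape(3, 2)       # same six values, new shape
product = a @ transposed         # matrix product, shape (2, 2)
spectrum = np.fft.fft([1.0, 0.0, 0.0, 0.0])  # Fourier transform of an impulse

print(transposed.shape, reshaped.shape)
print(product)
print(spectrum)
```

Note that transposing and reshaping give different layouts even when the resulting shapes match: `a.T` swaps rows and columns, while `reshape` keeps the values in their original row-major order.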
SciPy
SciPy is a free and open-source library that’s based on NumPy. It can be used to perform
scientific and technical computing on large sets of data. Similar to NumPy, SciPy comes with
embedded modules for array optimization and linear algebra. It’s considered a foundational
Python library due to its critical role in scientific analysis and engineering. SciPy depends
greatly on NumPy for its array manipulation subroutines and includes all of NumPy’s
functions. However, it adds to them to make them full-fledged scientific tools that are still
user-friendly. SciPy is ideal for image manipulation and provides basic processing features of
non-scientific high-level mathematical functions. It’s easy to use and fast to execute. It also
includes high-level commands that play a role in data visualization and manipulation.
Scikit-Learn
Based on NumPy and SciPy, scikit-learn is a free Python library that’s often considered a
direct extension of SciPy. It was specifically designed for data modeling and developing
machine learning algorithms, both supervised and unsupervised. Thanks to its simple,
intuitive, and consistent interface, scikit-learn is both beginner- and user-friendly. Although
the use of scikit-learn is limited because it only excels in data modeling, it does an excellent
job at allowing users to manipulate and share data however they need.
Theano
Theano is a numerical computation Python library made specifically for machine learning. It
allows for efficient definition, optimization, and evaluation of mathematical expressions and
matrix calculations to employ multidimensional arrays to create deep learning models. It’s a
highly specific library and almost exclusively used by ML and DL developers and
programmers. Theano supports integration with NumPy, and when used with a graphics
processing unit (GPU) rather than a central processing unit (CPU), it can perform data-intensive
computations up to 140 times faster. Additionally, Theano has built-in validation and unit testing
tools to avoid bugs and errors later on in the code.
TensorFlow
TensorFlow is a free and open-source Python library that specializes in differentiable
programming. The library offers a collection of tools and resources that help make building
DL and ML models and neural networks straightforward for beginners and professionals.
TensorFlow’s architecture and framework are flexible and allow it to run on several
computational platforms such as CPU and GPU. However, it performs its best when working
on a tensor processing unit (TPU). TensorFlow can be used to implement reinforcement
learning in ML and DL models and allows you to directly visualize your machine learning
models with its built-in tools. TensorFlow isn’t limited to working on desktop devices. It lets
you create and train smart models on servers and smartphones.
Keras
Keras is an open-source Python library designed for developing and evaluating neural
networks within deep learning and machine learning models. It can run on top of Theano and
TensorFlow, making it possible to start training neural networks with a little code. The Keras
library is modular, flexible, and extensible, making it beginner- and user-friendly. It also
offers a fully functioning model for creating neural networks as it integrates with objectives,
layers, optimizers, and activation functions. Keras framework is flexible and portable,
allowing it to operate in multiple environments and work on both CPUs and GPUs. It allows
for fast and efficient prototyping, research work, and data modeling and visualization. Keras
also has one of the widest ranges when it comes to data types because it can work on text
and images to train models.
PyTorch
PyTorch is an open-source machine learning Python library that’s based on Torch, a
framework with a Lua interface and a C backend. PyTorch qualifies as a data science library and can
integrate with other similar Python libraries such as NumPy. It’s able to seamlessly create
computational graphs that can be changed anytime while the Python program is running. It’s
mainly used in ML and DL applications such as computer vision and natural language
processing. PyTorch is known for its high speeds of execution even when it’s handling heavy
and extensive graphs. It’s also highly flexible, which allows it to operate on simplified
processors in addition to CPUs and GPUs. PyTorch comes with a collection of powerful APIs
that lets you expand on the PyTorch library, as well as a natural language toolkit for smoother
processing. It’s compatible with Python’s IDE tools, which makes for an easy debugging
process.
Pandas
Pandas is a data science and analysis Python library that allows developers to build intuitive
and seamless high-level data structures. Built on top of NumPy, Pandas is responsible for
preparing data sets and points for machine training. Pandas uses two types of data structures,
one-dimensional (series) and two-dimensional (DataFrame), which, together, allow Pandas to
be used in a variety of sectors, from science and statistics to finance and engineering. The
Pandas library is flexible and can be used in tandem with other scientific and numerical
libraries. Its data structures are easy to use because they’re highly descriptive, quick, and
compliant. With Pandas, you can manipulate data functionality by grouping, integrating, and
re-indexing it using minimal commands.
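The two data structures and the grouping and re-indexing operations can be sketched as follows. The example assumes pandas is installed; the labels, column names and values are hypothetical.

```python
import pandas as pd

# One-dimensional structure: a Series with its own index labels.
prices = pd.Series([10.0, 12.5, 9.0], index=["a", "b", "c"])

# Two-dimensional structure: a DataFrame with named columns.
df = pd.DataFrame({
    "group": ["x", "y", "x", "y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

group_means = df.groupby("group")["value"].mean()   # grouping
reindexed = prices.reindex(["c", "b", "a", "d"])    # re-indexing; "d" becomes NaN

print(group_means)
print(reindexed)
```

Re-indexing with a label that does not exist in the original Series produces a missing value rather than an error, which is useful when aligning data sets with different keys.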
Matplotlib
Matplotlib is a data visualization library that’s used for making plots and graphs. It’s an
extension of SciPy and is able to handle NumPy data structures as well as complex data
models made by Pandas. Although it focuses mainly on 2D plotting, Matplotlib can
produce high-quality and publish-ready diagrams, graphs, plots, histograms, error charts,
scatter plots and bar charts. Matplotlib is intuitive and easy to use, making it a great choice
for beginners. It’s even easier to use for people with preexisting knowledge in various other
graph-plotting tools. It offers GUI toolkit support, including wxPython, Tkinter, and Qt.
5. BACKPROPAGATION
Backpropagation is the essence of neural network training. It is the method of fine-tuning the
weights of a neural network based on the error rate obtained in the previous epoch (i.e.,
iteration). Proper tuning of the weights allows you to reduce error rates and make the model
reliable by increasing its generalization.
Backpropagation in neural networks is short for “backward propagation of errors.” It is
a standard method of training artificial neural networks. This method helps calculate the
gradient of a loss function with respect to all the weights in the network. The
backpropagation algorithm computes the gradient of the loss function with respect to each
weight by the chain rule. It efficiently computes one layer at a time, unlike a naive
direct computation. It computes the gradient, but it does not define how the gradient is used.
It generalizes the computation in the delta rule.
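The chain-rule step for a single weight can be written out as follows. The notation is an assumption introduced for illustration: w_{ij} is the weight from unit i to unit j, a_i is the activation of unit i, z_j is the weighted input of unit j, φ is the activation function and L is the loss.

```latex
z_j = \sum_i w_{ij} a_i, \qquad a_j = \phi(z_j), \qquad
\frac{\partial L}{\partial w_{ij}}
  = \frac{\partial L}{\partial a_j} \cdot \frac{\partial a_j}{\partial z_j} \cdot \frac{\partial z_j}{\partial w_{ij}}
  = \delta_j \, a_i,
\qquad \delta_j = \frac{\partial L}{\partial a_j} \, \phi'(z_j)
```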
1. Inputs X arrive through the preconnected path.
2. The input is modeled using real weights W. The weights are usually randomly selected.
3. Calculate the output for every neuron from the input layer, to the hidden layers, to the
output layer.
4. Calculate the error in the outputs.
5. Travel back from the output layer to the hidden layer to adjust the weights such that
the error is decreased.
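The steps above can be sketched for a very small network. This is a minimal illustration, not a reference implementation: it assumes two inputs, two sigmoid hidden units, one sigmoid output, a squared-error loss, plain gradient-descent updates with a hypothetical learning rate of 0.5, and the logical OR function as training data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    # Each weight list ends with a bias term.
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
    return hidden, out


def total_error(data, w_hidden, w_out):
    return sum(0.5 * (forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data)


def train_epoch(data, w_hidden, w_out, lr=0.5):
    for x, t in data:
        hidden, out = forward(x, w_hidden, w_out)
        # Output layer: dE/dz = (out - t) * sigmoid'(z).
        delta_out = (out - t) * out * (1 - out)
        # Hidden layer: propagate delta_out back through the output weights.
        delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                        for j in range(len(hidden))]
        # Gradient-descent updates (how the gradient is used is a separate choice).
        for j, h in enumerate(hidden):
            w_out[j] -= lr * delta_out * h
        w_out[-1] -= lr * delta_out
        for j, w in enumerate(w_hidden):
            w[0] -= lr * delta_hidden[j] * x[0]
            w[1] -= lr * delta_hidden[j] * x[1]
            w[2] -= lr * delta_hidden[j]


# Weights are randomly initialised, as in step 2.
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [rng.uniform(-1, 1) for _ in range(3)]

# Hypothetical training set: the logical OR function.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

error_before = total_error(data, w_hidden, w_out)
for _ in range(2000):
    train_epoch(data, w_hidden, w_out)
error_after = total_error(data, w_hidden, w_out)
print(f"error before={error_before:.4f} after={error_after:.4f}")
```

The hidden-layer deltas are computed before any weights are changed, so every update in a step uses gradients from the same forward pass.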
Most prominent advantages of backpropagation are:
Backpropagation is fast, simple and easy to program.
It has no parameters to tune apart from the number of inputs.
It is a flexible method, as it does not require prior knowledge about the network.
It is a standard method that generally works well.
It does not need any special mention of the features of the function to be learned.
6. ISSUES IN DEEP LEARNING
1. Inadequate Training Data
The major issue that comes up while using deep learning algorithms is the lack of quality as
well as quantity of data. Although data plays a vital role in the processing of machine
learning algorithms, many data scientists report that inadequate data, noisy data, and
unclean data severely hamper machine learning algorithms. For example, a
simple task requires thousands of data samples, and an advanced task such as speech or
image recognition needs millions of examples. Further, data quality is also
important for the algorithms to work ideally, but poor data quality is also common
in deep learning applications. Data quality can be affected by some factors as follows:
Noisy Data- It is responsible for an inaccurate prediction that affects the decision
as well as accuracy in classification tasks.
Incorrect data- It is also responsible for faulty programming and results obtained
in machine learning models. Hence, incorrect data may affect the accuracy of the
results also.
Generalizing of output data- Sometimes, it is also found that generalizing output
data becomes complex, which results in comparatively poor future actions.
2. Poor quality of data
Data plays a significant role in machine learning, and it must be of good quality as well.
Noisy data, incomplete data, inaccurate data, and unclean data lead to less accuracy in
classification and low-quality results. Hence, data quality can also be considered as a
major common problem while processing deep learning algorithms.
3. Non-representative training data
To make sure our trained model generalizes well, we have to ensure that the sample
training data is representative of the new cases that we need to generalize to. The training
data must cover all cases that have already occurred as well as those that are occurring. Further, if we
use non-representative training data in the model, it results in less accurate predictions.
A machine learning model is said to be ideal if it predicts well for generalized cases and
provides accurate decisions. If there is too little training data, sampling noise makes the
training set non-representative, and the model won't be accurate in its
predictions. It may also be biased against one class or group. Hence, we
should use representative data in training to protect against bias and make
accurate predictions without any drift.
4. Overfitting and Underfitting
Overfitting:
Overfitting is one of the most common issues faced by deep learning engineers and data
scientists. Whenever a deep learning model fits its training data too closely, it starts
capturing the noise and inaccurate entries in the training data set. This negatively affects the
performance of the model on new data. A common cause of overfitting is the use of highly
flexible non-linear methods, as they can build unrealistic data models. We
can reduce overfitting by using simpler linear and parametric algorithms in the learning
models.
Underfitting:
Underfitting is the opposite of overfitting. It occurs when the model is too simple to
capture the underlying structure of the data, like a pair of undersized pants. This
generally happens when we have limited data in the data set, or when we try to fit a linear
model to non-linear data. In such scenarios the rules learned by the model are too simple
for the data set, so the model makes inaccurate predictions even on the training data.
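The contrast between the two failure modes can be seen by varying the complexity of a
single model. The sketch below uses k-nearest-neighbors regression on noisy sine data,
where k acts as the complexity setting. The data, the seed, and the function names are
assumptions made for illustration, not part of the notes.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def make_data(n):
    """Jittered grid on [0, 2*pi] with noisy sine targets (toy data)."""
    xs = [2 * math.pi * (i + rng.random()) / n for i in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


def knn_predict(x_train, y_train, x, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return sum(y_train[i] for i in order[:k]) / k


def mse(x_train, y_train, xs, ys, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    return sum((knn_predict(x_train, y_train, x, k) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


x_train, y_train = make_data(40)
x_test, y_test = make_data(200)

train_err, test_err = {}, {}
for k in (1, 5, 40):  # very flexible, moderate, very rigid
    train_err[k] = mse(x_train, y_train, x_train, y_train, k)
    test_err[k] = mse(x_train, y_train, x_test, y_test, k)
    print(f"k={k:2d}  train MSE={train_err[k]:.3f}  test MSE={test_err[k]:.3f}")
```

With k = 1 the training error is exactly zero, because each training point is its own
nearest neighbor: the model memorizes the noise, which is the overfitting case. With
k = 40 every prediction is the average of all training targets, so the model is too simple
to follow the sine curve, which is the underfitting case. An intermediate k would be
expected to give the lowest test error, although the exact numbers depend on the seed.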
5. Monitoring and maintenance
A deep learning model must keep producing output that generalizes well, so regular
monitoring and maintenance are compulsory. As the desired results and actions change, the
data must change too; hence the code has to be edited, and resources have to be set aside
for monitoring.
6. Getting bad recommendations
A deep learning model operates under a specific context; when that context changes, the
model can produce bad recommendations. For example, suppose a customer was looking for
certain gadgets at one point in time, but the customer's requirements have since changed.
If the model keeps showing the same recommendations, they no longer match the customer's
expectations. This situation is called drift: data drift when the distribution of the
incoming data changes, and concept drift when the relationship between the inputs and the
desired output changes. It generally occurs when new data is introduced or the
interpretation of the data changes. We can overcome it by regularly monitoring the model
and updating the data to match current expectations.
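Monitoring for drift can start with a comparison between the data the model was trained on
and the data it sees now. The sketch below computes the Population Stability Index (PSI)
over product categories. PSI is one common drift measure, chosen here as an assumption,
and the click data and function names are hypothetical.

```python
import math
from collections import Counter


def category_shares(items, categories, eps=1e-6):
    """Share of each category, floored at eps so the logarithm stays defined."""
    counts = Counter(items)
    return {c: max(counts[c] / len(items), eps) for c in categories}


def population_stability_index(reference, current, eps=1e-6):
    """PSI = sum over categories of (cur - ref) * ln(cur / ref)."""
    categories = sorted(set(reference) | set(current))
    ref = category_shares(reference, categories, eps)
    cur = category_shares(current, categories, eps)
    return sum((cur[c] - ref[c]) * math.log(cur[c] / ref[c]) for c in categories)


# Hypothetical clicks: the customer base used to prefer gadgets, now prefers toys.
old_clicks = ["gadgets"] * 70 + ["books"] * 20 + ["toys"] * 10
new_clicks = ["gadgets"] * 20 + ["books"] * 30 + ["toys"] * 50

psi_same = population_stability_index(old_clicks, old_clicks)
psi_shift = population_stability_index(old_clicks, new_clicks)
print(f"PSI, no change: {psi_same:.3f}")
print(f"PSI, shifted:   {psi_shift:.3f}")
```

A PSI of zero means the two distributions match. Values above roughly 0.25 are often
treated as a sign of a significant shift, but that threshold is a rule of thumb, not a
fixed standard. Note that this check only detects a change in the input data; it does not
by itself show that the model's recommendations have become wrong.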
7. Lack of skilled resources
Although deep learning and artificial intelligence are growing continuously in the market,
these fields are still young compared to others. The shortage of skilled people is
therefore also an issue. We need people with in-depth knowledge of mathematics, science,
and technology to develop and manage machine learning systems.
8. Process Complexity of deep Learning
The deep learning process is very complex, which is another major issue faced by machine
learning engineers and data scientists. Deep learning and artificial intelligence are
relatively new technologies that are still in an experimental phase and keep changing over
time. Much of the work proceeds by trial and error, so the probability of error is higher
than expected. Further, the process includes analyzing the data, removing data bias,
training the model, applying complex mathematical calculations, etc., which makes the
procedure more complicated and quite tedious.
10. Data Bias
Data bias is also a big challenge in deep learning. It arises when certain elements of the
dataset are weighted more heavily, or represented more strongly, than others. Biased data
leads to inaccurate results, skewed outcomes, and other analytical errors. We can address
this by determining where the dataset is actually biased and then taking the necessary
steps to reduce that bias.
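One possible step for reducing such bias is to reweight the classes so that rare ones count
as much as common ones during training. The sketch below is a minimal example under that
assumption. It uses the inverse-frequency formula n_samples / (n_classes * class_count),
and the labels and function name are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


# Toy dataset in which one outcome heavily outweighs the other.
labels = ["approved"] * 90 + ["rejected"] * 10

print(Counter(labels))          # shows where the imbalance is
weights = balanced_class_weights(labels)
print(weights)                  # rare class receives the larger weight
```

Here each "rejected" example receives a weight of 5.0 and each "approved" example a weight
of about 0.56, so both classes contribute equally in total. Reweighting addresses only
imbalance between classes; other sources of bias, such as mislabeled or missing groups,
need separate checks.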
11. Lack of Explainability
This means that a model's outputs cannot be easily understood, because the model is built
to deliver results for certain conditions without showing how it arrived at them. A lack of
explainability therefore reduces the credibility of machine learning algorithms.
12. Slow implementations and results
This issue is also very common in deep learning models. Machine learning models can be
highly effective at producing accurate results, but they are time-consuming. Slow
programs, excessive requirements, and overloaded data mean that accurate results take
longer than expected. The model also needs continuous maintenance and monitoring to keep
delivering accurate results.
7. SHALLOW LEARNING
A shallow neural network has only one (or just a few) hidden layers between the input and
output layers. The input layer receives the data, the hidden layer(s) process it, and the final
layer produces the output. Shallow neural networks are simpler, more easily trained, and have
greater computational efficiency than deep neural networks, which may have thousands of
hidden units in dozens of layers. Shallow networks are typically used for simpler tasks such
as linear regression, binary classification, or low-dimensional feature extraction.
Technically speaking, a neural network with more than one hidden layer is no longer
considered a shallow neural network. In some cases, however, a neural network with 2 or 3
hidden layers, each with a small number of hidden units and simple connectivity between
them, may produce straightforward outputs and may still be considered a "shallow"
network.
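The structure described above (input layer, one hidden layer, output layer) can be written
out in a few lines. The sketch below is a forward pass only, with hand-picked weights
rather than trained ones. The weights, the ReLU activation, and the XOR task are
assumptions chosen because they keep the arithmetic easy to follow.

```python
def relu(values):
    """Rectified linear activation applied to each hidden unit."""
    return [max(0.0, v) for v in values]


def dense(weights, biases, inputs):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hand-picked parameters (illustrative, not learned): 2 inputs, 2 hidden units, 1 output.
W_HIDDEN = [[1.0, 1.0], [1.0, 1.0]]
B_HIDDEN = [0.0, -1.0]
W_OUT = [[1.0, -2.0]]
B_OUT = [0.0]


def shallow_net(inputs):
    """Input layer -> one hidden layer -> output layer."""
    hidden = relu(dense(W_HIDDEN, B_HIDDEN, inputs))
    return dense(W_OUT, B_OUT, hidden)[0]


for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, "->", shallow_net([a, b]))
```

With these weights the network outputs 1.0 when exactly one input is 1.0 and 0.0
otherwise, which is the XOR function. A purely linear model cannot represent XOR, so the
example shows that even a single small hidden layer adds non-linear capacity.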