Introduction to Deep Learning Concepts
Unit 2
Introduction to Deep Learning and Frameworks
Introduction
Deep learning is a branch of machine learning, which is itself a subset of artificial intelligence. Deep learning is built on neural networks, which imitate the human brain, so deep learning imitates the brain as well. In deep learning, nothing is programmed explicitly. It is a class of machine learning that uses numerous nonlinear processing units to perform feature extraction and transformation. Each successive layer takes the output of the preceding layer as its input.
Deep learning models can focus on the right features by themselves, requiring little guidance from the programmer, and they are very helpful in dealing with high-dimensional data (the curse of dimensionality). Deep learning algorithms are used especially when we have a huge number of inputs and outputs.
Deep learning evolved from machine learning, which is a subset of artificial intelligence. Since the idea behind artificial intelligence is to mimic human behavior, the idea of deep learning is likewise "to build such algorithms that can mimic the brain".
Deep learning is implemented with the help of Neural Networks, and the motivation behind Neural Networks is the biological neuron, which is nothing but a brain cell.
Deep learning is a type of machine learning that teaches computers to perform tasks by
learning from examples, much like humans do. Imagine teaching a computer to recognize
cats: instead of telling it to look for whiskers, ears, and a tail, you show it thousands of
pictures of cats. The computer finds the common patterns all by itself and learns how to
identify a cat. This is the essence of deep learning.
In technical terms, deep learning uses something called "neural networks," which are
inspired by the human brain. These networks consist of layers of interconnected nodes that
process information. The more layers, the "deeper" the network, allowing it to learn more
complex features and perform more sophisticated tasks.
Capabilities of DL
Key Features of DL
Deep learning uses feature extraction to recognize similar features of the same label and then uses
decision boundaries to determine which features accurately represent each label. In the cats and dogs
classification, the deep learning models will extract information such as the eyes, face, and body shape of
animals and divide them into two classes.
The deep learning model consists of deep neural networks. A simple neural network consists of an input layer, a hidden layer, and an output layer. Deep learning models consist of multiple hidden layers, and these additional layers can improve the model's accuracy.
AI vs ML vs DL
Artificial Intelligence is basically the mechanism for incorporating human intelligence into machines through a set of rules (algorithms). AI is a combination of two words: “Artificial”, meaning something made by humans or non-natural, and “Intelligence”, meaning the ability to understand or think. Another definition could be that “AI is basically the study of training your machine (computer) to mimic a human brain and its thinking capabilities”. AI focuses on three major aspects (skills): learning, reasoning, and self-correction, to obtain the maximum efficiency possible.
Machine Learning is basically the study/process that enables a system (computer) to learn automatically from its own experience and improve accordingly without being explicitly programmed. ML is an application or subset of AI. ML focuses on developing programs that can access data and use it to learn for themselves. The process makes observations on data to identify the patterns being formed and to make better future decisions based on the examples provided. The major aim of ML is to allow systems to learn by themselves through experience without any kind of human intervention or assistance.
Deep Learning:
Deep Learning is basically a sub-part of the broader family of Machine Learning that uses Neural Networks (similar to the neurons working in our brain) to mimic human brain-like behavior. DL algorithms focus on information-processing patterns to identify patterns just as our human brain does, and classify the information accordingly. DL works on larger datasets than ML, and the prediction mechanism is self-administered by machines.
Machine Learning | Deep Learning
Takes less time to train the model. | Takes more time to train the model.
Less complex and easier to interpret. | More complex; it works like a black box.
Perceptron
Perceptron is a term that almost everyone learning this field encounters. It is the first step in learning Machine Learning and Deep Learning technologies, and it consists of a set of weights, input values or scores, and a threshold.
The Perceptron is a building block of an Artificial Neural Network. In the mid-20th century (the late 1950s), Frank Rosenblatt invented the Perceptron to perform calculations that detect patterns in input data, for example for business intelligence.
The Perceptron model is also treated as one of the best and simplest types of Artificial Neural Network. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.
Frank Rosenblatt invented the perceptron model as a binary classifier that contains three main components. These classifiers can be considered linear classifiers. In simple words, the perceptron is a classification algorithm that makes its predictions using a linear predictor function, which combines a weight vector with the feature vector.
The perceptron works in the following steps:
1. The features of the model we want to train are passed as input to the perceptron.
2. The inputs are multiplied by the weights (weight coefficients), and the products are summed to give the net sum.
3. The bias value is added to move the output function away from the origin.
4. This computed value is fed to the activation function (chosen based on the requirement; a simple perceptron uses a step function).
5. The result value from the activation function is the output value.
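The five steps above can be sketched in Python with NumPy. This is a minimal illustration, not Rosenblatt's original implementation: the step activation, the learning rate, the epoch count, and the AND-gate training data are assumptions chosen for clarity, and the weights are learned with the classic perceptron learning rule (each weight moves by learning rate * error * input).

```python
import numpy as np

def step(z):
    # Step activation: the neuron "fires" (1) when the net sum is >= 0, else 0
    return np.where(z >= 0, 1, 0)

def perceptron_output(x, w, b):
    # Steps 2-5: weighted sum of the inputs, plus bias, through the activation
    net_sum = np.dot(x, w) + b
    return step(net_sum)

def train_perceptron(X, y, lr=0.1, epochs=20):
    # Perceptron learning rule (supervised): nudge weights by the error
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - perceptron_output(xi, w, b)
            w += lr * error * xi
            b += lr * error
    return w, b

# Hypothetical training data: the logical AND of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
print("weights:", w, "bias:", b)
print("predictions:", perceptron_output(X, w, b))
```

Because AND is linearly separable, the learning rule is guaranteed to find a separating line; a single perceptron cannot learn a non-separable function such as XOR.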
○ Input Nodes or Input Layer:
This is the primary component of Perceptron which accepts the initial data
into the system for further processing. Each input node contains a real
numerical value.
○ Weight and Bias:
The weight parameter represents the strength of the connection between units and is another of the most important Perceptron components. A weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, the bias can be considered the intercept in a linear equation.
○ Activation Function:
These are the final and important components that help determine whether the neuron will fire or not. In the basic perceptron, the activation function is primarily a step function.
Multi-layer Perceptron
Multi-layer perceptron is also known as MLP. It consists of fully connected dense layers, which transform any input dimension to the desired dimension. A multi-layer perceptron is a neural network that has multiple layers. To create a neural network, we combine neurons together so that the outputs of some neurons are the inputs of other neurons.
A multi-layer perceptron has one input layer with one neuron (or node) for each input, and one output layer with a single node for each output. It can have any number of hidden layers, and each hidden layer can have any number of nodes. A schematic diagram of a Multi-Layer Perceptron (MLP) is depicted below.
In the multi-layer perceptron diagram above, there are three inputs and thus three input nodes, and the hidden layer has three nodes. The output layer gives two outputs, so there are two output nodes. The nodes in the input layer take the input and forward it for further processing: in the diagram above, each input node forwards its output to each of the three nodes in the hidden layer. In the same way, the hidden layer processes the information and passes it to the output layer.
In this multi-layer perceptron, every node uses a sigmoid activation function. The sigmoid activation function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid formula σ(x) = 1 / (1 + e^(-x)).
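As an illustration of the 3-3-2 network described above, here is a forward pass in NumPy. The weights are random placeholders (an assumption; a trained network would have learned them), and the seed, the input values, and the function name `mlp_forward` are hypothetical. Each layer computes a weighted sum of all outputs from the previous layer, adds a bias, and applies the sigmoid.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
# Hypothetical random parameters for a 3-input, 3-hidden, 2-output MLP
W_hidden = rng.normal(size=(3, 3))   # 3 input nodes -> 3 hidden nodes
b_hidden = np.zeros(3)
W_out = rng.normal(size=(3, 2))      # 3 hidden nodes -> 2 output nodes
b_out = np.zeros(2)

def mlp_forward(x):
    # Every hidden node receives all 3 inputs (fully connected layer)
    hidden = sigmoid(x @ W_hidden + b_hidden)
    # Every output node receives all 3 hidden-node outputs
    return sigmoid(hidden @ W_out + b_out)

x = np.array([0.5, -1.2, 3.0])   # one sample with three input features
y = mlp_forward(x)
print("output shape:", y.shape, "values:", y)
```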
Types of DNN
1. ANN
2. CNN
ANN Architecture: Artificial Neural Networks
Artificial Neural Networks contain artificial neurons, which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network. A layer can have only a dozen units or millions of units, depending on how complex the hidden patterns in the dataset are that the network must learn. Commonly, an Artificial Neural Network has an input layer, an output layer, and hidden layers. The input layer receives data from the outside world that the neural network needs to analyze or learn about. This data then passes through one or more hidden layers that transform the input into data that is valuable for the output layer. Finally, the output layer provides an output in the form of the Artificial Neural Network's response to the input data provided.
In the majority of neural networks, units are interconnected from one layer to
another. Each of these connections has weights that determine the influence of one
unit on another unit. As the data transfers from one unit to another, the neural
network learns more and more about the data which eventually results in an output
from the output layer.
The structures and operations of human neurons serve as the basis for artificial neural networks, which are also known as neural networks or neural nets.
The input layer of an artificial neural network is the first layer; it receives input from external sources and passes it to the hidden layer, which is the second layer. In the hidden layer, each neuron receives input from the neurons of the previous layer, computes the weighted sum, and sends it to the neurons in the next layer. These connections are weighted, meaning that the effect of each input from the previous layer is made stronger or weaker by assigning it a different weight. These weights are adjusted during the training process to improve model performance.
Biological Neuron | Artificial Neuron
Dendrite | Inputs
Cell nucleus or Soma | Nodes
Synapses | Weights
Axon | Output
How do Artificial Neural Networks learn?
Artificial neural networks are trained using a training set. For example, suppose you want to teach an ANN to recognize a cat. It is shown thousands of different images of cats so that the network can learn to identify a cat. Once the neural network has been trained enough using images of cats, you need to check whether it can identify cat images correctly. This is done by making the ANN classify the images it is given, deciding whether each one is a cat image or not. The output of the ANN is compared with a human-provided label stating whether the image is a cat image. If the ANN identifies an image incorrectly, backpropagation is used to adjust what it has learned during training. Backpropagation fine-tunes the weights of the connections between ANN units based on the error obtained. This process continues until the artificial neural network can correctly recognize a cat in an image with the minimum possible error rate.
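The weight fine-tuning described above can be sketched with a tiny two-layer network trained by backpropagation in NumPy. Everything here is an illustrative assumption rather than a prescribed setup: the OR-gate data, the 2-4-1 layer sizes, the sigmoid activations, the binary cross-entropy loss, the learning rate, and the epoch count. Each epoch runs a forward pass, measures the error, propagates it backward with the chain rule, and moves every weight against its gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical labelled training set: the logical OR of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 4))  # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))  # hidden -> output weights
b2 = np.zeros((1, 1))
learning_rate = 1.0

for epoch in range(5000):
    # Forward pass
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass: with a sigmoid output and binary cross-entropy loss,
    # the error at the output layer's weighted sum is (output - y)
    d_output = (output - y) / len(X)
    dW2 = hidden.T @ d_output
    db2 = d_output.sum(axis=0, keepdims=True)
    # Chain rule: send the error back through W2 and the hidden sigmoid
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0, keepdims=True)

    # Gradient-descent update: adjust each weight against its error gradient
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1

final_output = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
predictions = (final_output > 0.5).astype(int).ravel()
print("predictions:", predictions)
```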
What are the types of Artificial Neural Networks?
● Feedforward Neural Network: The feedforward neural network is one of the most basic artificial neural networks. In this ANN, the data or input travels in a single direction: it enters through the input layer and exits through the output layer, while hidden layers may or may not exist. A feedforward network therefore has only a forward-propagating wave and no feedback loops. It is still usually trained with backpropagation, which sends the error backward only during training.
● Convolutional Neural Network: A convolutional neural network has some similarities to the feedforward neural network, in that the connections between units have weights that determine the influence of one unit on another. But a CNN has one or more convolutional layers that apply a convolution operation to the input and pass the result to the next layer. CNNs have applications in speech and image processing and are particularly useful in computer vision.
● Modular Neural Network: A Modular Neural Network contains a collection of different neural
networks that work independently towards obtaining the output with no interaction between
them. Each of the different neural networks performs a different sub-task by obtaining
unique inputs compared to other networks. The advantage of this modular neural network is
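The convolution operation that a CNN layer applies (described in the Convolutional Neural Network item above) can be sketched in plain NumPy. This is a simplified, assumed setup: one channel, stride 1, no padding, and a hand-picked 2x2 kernel, whereas a real CNN learns its kernel weights during training. As in most deep-learning libraries, the kernel is not flipped (strictly speaking, this is cross-correlation). The function name `conv2d_valid` and the toy image are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take
    the sum of element-wise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 4x4 image: dark left half, bright right half (a vertical edge)
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
# Kernel that responds to a left-to-right increase in brightness
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is large only in the column where the edge sits, which is how early CNN layers pick out local features such as edges.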
Applications of Artificial Neural Networks
1. Social Media: Artificial Neural Networks are used heavily in social media. For example, take the ‘People you may know’ feature on Facebook, which suggests people you might know in real life so that you can send them friend requests. This effect is achieved by using Artificial Neural Networks that analyze your profile, your interests, your current friends and their friends, and various other factors to calculate the people you might know. Another common application of Machine Learning in social media is facial recognition. This is done by finding around 100 reference points on a person’s face and then matching them with those already available in the database using convolutional neural networks.
2. Marketing and Sales: When you log onto e-commerce sites like Amazon and Flipkart, they recommend products for you to buy based on your previous browsing history. Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history. This is true across all new-age marketing segments such as book sites, movie services, and hospitality sites, and it is done by implementing personalized marketing. This uses Artificial Neural Networks to identify customer likes, dislikes, previous shopping history, etc., and then tailor the marketing campaigns accordingly.
3. Healthcare: Artificial Neural Networks are used in oncology to train algorithms that can identify cancerous tissue at the microscopic level with accuracy comparable to that of trained physicians. Various rare diseases may manifest in physical characteristics and can be identified in their early stages by using facial analysis on patient photos. So the full-scale implementation of Artificial Neural Networks in the healthcare environment can only enhance the diagnostic abilities of medical experts and ultimately lead to an overall improvement in the quality of medical care all over the world.
4. Personal Assistants: You have probably heard of Siri, Alexa, Cortana, etc., and may have used them, depending on the phone you have. These are personal assistants and an example of speech recognition that uses Natural Language Processing to interact with users and formulate a response accordingly. Natural Language Processing uses artificial neural networks that are made to handle many tasks of these personal assistants, such as managing the language syntax, semantics, correct speech, and the ongoing conversation.
TensorFlow
TensorFlow is a popular framework for machine learning and deep learning. It is a free and open-source library, released on 9 November 2015 and developed by the Google Brain team. It is used mainly through its Python API (its core is implemented in C++) for numerical computation with data-flow graphs, which makes machine learning faster and easier.
TensorFlow can train and run deep neural networks for image recognition, handwritten digit classification, recurrent neural networks, word embeddings, natural language processing, video detection, and much more. TensorFlow runs on multiple CPUs or GPUs and also on mobile operating systems.
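The name TensorFlow combines two words, Tensor and Flow: a tensor is a multidimensional array of data, and flow refers to the graph of operations through which the tensors pass.
A tensor can be generated from the input data or from the result of a computation. In TensorFlow (in the 1.x graph mode), all operations are conducted inside a graph. The graph is a set of computations that take place successively. Each operation is called an op node, and the op nodes are connected to each other.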
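A short sketch of both ideas, assuming TensorFlow 2.x is installed: tensors of increasing rank, and a `tf.function` that traces ordinary Python code into a graph of op nodes through which tensors flow. The function name `affine` and its input values are hypothetical.

```python
import tensorflow as tf

# Tensors of increasing rank (number of dimensions)
scalar = tf.constant(7)                        # rank 0, shape ()
vector = tf.constant([1.0, 2.0, 3.0])          # rank 1, shape (3,)
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])   # rank 2, shape (2, 3)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "rank:", int(tf.rank(t)), "shape:", t.shape)

# tf.function traces this Python function into a graph of op nodes;
# the tensors x, w and b "flow" along the edges between those ops
@tf.function
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.constant([[1.0, 2.0]])
w = tf.constant([[3.0], [4.0]])
b = tf.constant([0.5])
result = affine(x, w, b)
print(result.numpy())   # 1*3 + 2*4 + 0.5 = 11.5

# Inspect the traced graph: the types of the op nodes it contains
graph = affine.get_concrete_function(x, w, b).graph
op_types = [op.type for op in graph.get_operations()]
print(op_types)
```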
Advantages
○ It is designed to run on multiple CPUs or GPUs and on mobile operating systems.
○ The graph is portable, so computations can be saved for current or later use; a saved graph can be executed in the future.
○ All the computation in the graph is done by connecting tensors together.
A variable is a state that holds an initial value, and its values are tensors. Variables can be added to the computational graph by calling the tf.Variable constructor. In TensorFlow 2.x, a variable is initialized as soon as it is created; in 1.x, it must be initialized explicitly before the graph is used. Variables typically hold the weights and biases of a model (in 1.x, during session execution).
Syntax:
tf.Variable(initial_value=None,
trainable=None,
validate_shape=True,
caching_device=None,
name=None,
variable_def=None,
dtype=None,
import_scope=None,
constraint=None,
synchronization=tf.VariableSynchronization.AUTO,
aggregation=tf.compat.v1.VariableAggregation.NONE,
shape=None)
Creating and Manipulating Tensor Variables
Variables
Code with Tensorflow Version 1.x
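# Colab magic command that selects the TensorFlow version
%tensorflow_version 1.x
import tensorflow as tf
print(tf.__version__)
# Variables are defined by providing their initial value and type
variable = tf.Variable([0.9, 0.7], dtype=tf.float32)
print(variable)
# A variable must be initialized before a graph is used for the first time.
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print(sess.run(variable))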
Code with TensorFlow Version 2.x
%tensorflow_version 2.x
import tensorflow as tf
print(tf.__version__)
# Variables are defined by providing their initial value and type
variable = tf.Variable([0.9, 0.7], dtype=tf.float32)
print(variable)
Output - 2.4.1
<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([0.9, 0.7], dtype=float32)>
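Since the heading above also promises manipulation, here is a small TensorFlow 2.x sketch of changing a variable in place and of the automatic gradient tracking that makes variables suitable for holding weights. The values and the toy loss (sum of squares) are illustrative assumptions.

```python
import tensorflow as tf

variable = tf.Variable([0.9, 0.7], dtype=tf.float32)

variable.assign([1.0, 2.0])        # overwrite the stored value
variable.assign_add([0.5, 0.5])    # in-place addition    -> [1.5, 2.5]
variable.assign_sub([1.0, 1.0])    # in-place subtraction -> [0.5, 1.5]
print(variable.numpy())

# Trainable variables are watched automatically by GradientTape,
# which is how weights and biases get their gradients during training
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(variable ** 2)
grad = tape.gradient(loss, variable)
print(grad.numpy())   # d(sum v^2)/dv = 2v -> [1.0, 3.0]
```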
tf.placeholder
As the name suggests, a placeholder is an empty slot: a variable to which the training data is fed later. tf.placeholder lets us build the structure first (set up the computational graph) and then feed the data into it at runtime. Once the session starts, we feed the data into the placeholders. Placeholders belong to the TensorFlow 1.x graph API; in 2.x they are available as tf.compat.v1.placeholder.
Syntax: tf.placeholder(dtype, shape=None, name=None)
# run session: feed values for placeholders a and b, then evaluate c
sess.run(c, feed_dict={a: 1, b: 8})
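Placeholders
import tensorflow as tf
print(tf.__version__)
# Placeholders need TF1-style graph mode, which TF 2.x provides through tf.compat.v1
tf.compat.v1.disable_eager_execution()
a = tf.compat.v1.placeholder(tf.float32)
b = tf.compat.v1.placeholder(tf.float32)
add = a + b
sess = tf.compat.v1.Session()
# Executing add by passing the values [1, 3] [2, 4] for a and b respectively
print(sess.run(add, feed_dict={a: [1, 3], b: [2, 4]}))  # [3. 7.]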
import tensorflow as tf
tensor1 = tf.constant([[1, 2, 3, 4]], dtype=float)
tensor1
Addition of Two numbers
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()   # use TF1-style graphs and sessions
a = tf.constant(5, name="a")
b = tf.constant(15, name="b")
c = tf.add(a, b, name="c")
print("Value of c before running tensor:", c)
Output - Value of c before running tensor: Tensor("c_5:0", shape=(), dtype=int32)
sess = tf.Session()
output = sess.run(c)
print("Value of c after running graph:", output)
Subtraction of two numbers
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()   # use TF1-style graphs and sessions
a = tf.constant(5, name="a")
b = tf.constant(15, name="b")
c = tf.subtract(a, b, name="c")
sess = tf.Session()
output = sess.run(c)
print(output)   # -10
sess.close()
TensorFlow Operations
We can perform addition, subtraction, multiplication, division, and many more operations with TensorFlow variables.
# import packages
import tensorflow as tf
# create two variables
tensor1 = tf.Variable([3, 4])
tensor2 = tf.Variable([5, 6])
print("Addition of tensors", tensor1 + tensor2)
print("Subtraction of tensors", tensor1 - tensor2)
print("Multiplication of tensors", tensor1 * tensor2)
print("Division of tensors", tensor1 / tensor2)
Parameters of tf.placeholder(dtype, shape=None, name=None):
● dtype: the data type of the elements in the tensor that will be fed.
● shape: optional, None by default. The shape of the tensor that will be fed; if the shape isn't specified, a tensor of any shape can be fed.
● name: optional, None by default. The name of the operation.
Returns:
A Tensor that can be used to feed a value but cannot be evaluated directly.
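In TensorFlow 2.x, these TF1-style APIs are available through the compatibility module:
# importing packages
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()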
GPU (Graphics Processing Unit): The GPU, composed of many specialized cores, excels at parallel processing tasks, especially graphics rendering. GPUs come in two types: integrated (built into the processor and sharing system memory) and dedicated (standalone hardware with its own memory). Leading GPU makers include Nvidia and AMD, and GPUs are essential for resource-intensive tasks such as deep learning and high-end gaming.
Key Differences Between CPU and GPU:
1. Processing Speed: CPUs handle general tasks, while GPUs excel at handling
numerous calculations concurrently, making them perfect for tasks like
graphics processing and deep learning model training.
2. Computing Architecture: CPUs process tasks sequentially, whereas GPUs
thrive in parallel processing, ideal for handling large datasets or complex
computations in one go.
3. Number of Cores: CPUs have fewer but powerful cores, while GPUs boast
numerous CUDA cores (Nvidia) or stream processors (AMD), enabling
massive parallelism.
TensorFlow, known for its flexibility and versatility in deep learning, effectively
utilizes CPUs for various tasks. While GPUs are often associated with TensorFlow for
their parallel processing capabilities, CPUs play a crucial role in several aspects of
TensorFlow operations.
Role of CPUs for TensorFlow:
1. Preprocessing and Data Loading: CPUs are instrumental in preprocessing tasks such as data
augmentation, normalization, and data loading. These operations involve manipulating input
data before feeding it into the neural network model. Since these tasks are often sequential and
do not require parallel processing, CPUs are well-suited for handling them efficiently.
2. Inference and Deployment: During inference, when a trained model makes predictions on
new data, CPUs can handle the computational load efficiently. Many deployment scenarios, such
as serving models in production environments or running inference on devices with limited GPU
capabilities (such as mobile devices or edge devices), rely heavily on CPU resources.
3. Training on Small Datasets: For small datasets or simple models, where the computational
requirements are not as high, CPUs can perform adequately for training tasks. TensorFlow’s
ability to distribute computations across multiple CPU cores allows it to efficiently train models
on CPUs, albeit with potentially longer training times compared to GPUs.
4. Debugging and Development: CPUs are invaluable during the development and debugging
phases of deep learning projects. They enable researchers and developers to quickly iterate on
their models without the need for specialized GPU hardware. TensorFlow’s compatibility with
CPUs ensures seamless development and debugging experiences across different computing
environments.
5. Handling Non-Parallelizable Operations: Certain operations within TensorFlow, such as
control flow constructs (e.g., loops, conditionals) or custom operations with complex logic, may
not be easily parallelizable. CPUs excel at handling these non-parallelizable tasks efficiently due
to their sequential processing nature.
Role of GPUs for TensorFlow:
1. Parallel Processing Power: One of the primary reasons for using GPUs in deep learning is their
exceptional parallel processing capabilities. Unlike CPUs, which are optimized for sequential tasks,
GPUs contain thousands of cores that can execute multiple computations simultaneously. This
parallelism is crucial for handling the massive matrix multiplications and convolutions inherent in
deep neural network operations.
2. Accelerated Training Speeds: Deep learning model training involves iterative optimization of
millions of parameters through backpropagation. GPUs excel at processing these computations in
parallel, resulting in significantly faster training speeds compared to CPUs. With GPUs, deep
learning practitioners can train complex models on large datasets in a fraction of the time it would
take with CPUs alone, thereby accelerating the research and development process.
3. Model Complexity and Scale: The scalability of GPUs allows deep learning practitioners to tackle
increasingly complex models and larger datasets. As deep neural networks grow in depth and width
to achieve higher accuracy and handle more intricate tasks, GPUs provide the computational
horsepower required to train these models efficiently. TensorFlow’s compatibility with GPUs enables
researchers and engineers to harness this scalability effectively.
4. High-Performance Computing (HPC) Applications: GPUs are also well-suited for high-
performance computing (HPC) applications beyond deep learning. Tasks such as scientific
simulations, computational fluid dynamics, and molecular dynamics simulations benefit from the
parallel processing capabilities of GPUs. TensorFlow’s integration with GPU-accelerated libraries like
cuDNN (CUDA Deep Neural Network library) further enhances its performance on GPU hardware.
5. Real-time Inference and Deployment: GPUs play a crucial role in real-time inference and
deployment of deep learning models. By leveraging the parallel processing capabilities of GPUs,
TensorFlow can perform rapid predictions on new data, making it suitable for applications such as
image recognition, natural language processing, and autonomous driving. GPUs enable efficient
deployment of TensorFlow models in production environments, ensuring low-latency responses and
high throughput.
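To see which of these devices TensorFlow is actually using, a minimal TensorFlow 2.x sketch is shown below. It lists the visible GPUs and pins one small matrix multiplication to the CPU with an explicit device scope; the matrices are arbitrary example values. Without a device scope, TensorFlow places an operation on a GPU when one is available and otherwise falls back to the CPU.

```python
import tensorflow as tf

# GPUs TensorFlow can see; an empty list means only the CPU will be used
gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", gpus)

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

# Explicitly pin this operation to the first CPU
with tf.device("/CPU:0"):
    cpu_result = tf.matmul(a, b)

# No device scope: TensorFlow chooses (a GPU if present, else the CPU)
default_result = tf.matmul(a, b)

print("CPU result on:", cpu_result.device)
print("Default result on:", default_result.device)
print(cpu_result.numpy())
```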
Logistic Regression Model on TensorFlow
# importing modules
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.preprocessing import OneHotEncoder
Logging and Training the Logistic Regression
Logistic Regression: Logistic Regression is a classification algorithm commonly used in Machine Learning. It categorizes data into discrete classes by learning the relationship from a given set of labeled data. It learns a linear relationship from the given dataset and then introduces a non-linearity in the form of the Sigmoid function. In the case of Logistic Regression, the hypothesis is the Sigmoid of a straight line, i.e.,
h(x) = σ(w·x + b), where σ(z) = 1 / (1 + e^(-z)),
the vector w represents the Weights, and the scalar b represents the Bias of the model. Let us visualize the Sigmoid Function:
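import numpy as np
import matplotlib.pyplot as plt
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
z = np.linspace(-10, 10, 200)   # input values to plot
plt.plot(z, sigmoid(z))
plt.xlabel("z")
plt.ylabel("sigmoid(z)")
plt.title("Sigmoid Function")
plt.show()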
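Training the model means finding w and b that make h(x) = σ(w·x + b) match the labels. The sketch below does this with TensorFlow 2.x: it minimizes binary cross-entropy with plain gradient descent, using tf.GradientTape for the gradients. The one-feature synthetic dataset (label 1 when the feature is positive), the learning rate, and the number of steps are illustrative assumptions, not the dataset used in the original exercise.

```python
import numpy as np
import tensorflow as tf

# Hypothetical synthetic data: one feature, label 1 when the feature > 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1)).astype(np.float32)
y = (X[:, 0] > 0).astype(np.float32).reshape(-1, 1)

# Model parameters: weight vector w and scalar bias b
w = tf.Variable(tf.zeros((1, 1)))
b = tf.Variable(0.0)
learning_rate = 0.5

for step in range(500):
    with tf.GradientTape() as tape:
        logits = tf.matmul(X, w) + b          # the straight line w.x + b
        # Binary cross-entropy computed from logits (numerically stable form)
        loss = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    grad_w, grad_b = tape.gradient(loss, [w, b])
    # Gradient-descent update
    w.assign_sub(learning_rate * grad_w)
    b.assign_sub(learning_rate * grad_b)

probs = tf.sigmoid(tf.matmul(X, w) + b)       # h(x) = sigmoid(w.x + b)
accuracy = np.mean((probs.numpy() > 0.5) == (y > 0.5))
print("final loss:", float(loss), "accuracy:", accuracy)
```

Working from logits and letting `tf.nn.sigmoid_cross_entropy_with_logits` apply the sigmoid internally avoids taking the log of values that round to 0 or 1.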
Introduction of Keras and PyTorch
Keras has a high-level API and can run on top of other frameworks, so it is easier to use and needs less code.
PyTorch has a lower-level API, so it is a little more difficult to use and needs more code for a similar task, but it gives the programmer better control.
Keras and PyTorch are two of the most powerful open-source machine learning libraries. Keras is a Python-based open-source library used in deep learning (for neural networks). It can run on top of TensorFlow, Microsoft CNTK, or Theano. It is very simple to understand and use, and suitable for fast experimentation. Keras models can run on both CPU and GPU. PyTorch is an open-source machine learning library developed by Facebook’s AI Research group. It can be integrated with Python and C++. It is popular because of its efficient
Keras | PyTorch
Keras has a high-level API. | PyTorch has a low-level API.
Keras has a simple architecture, making it more readable and easy to use. | PyTorch has low readability due to a complex architecture.
Keras is mostly used for small datasets due to its slow speed. | PyTorch is preferred for large datasets and high performance.
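To illustrate what "high-level API" means in practice, here is a minimal Keras sketch, assuming TensorFlow 2.x with its bundled tf.keras. It rebuilds the 3-3-2 sigmoid network from the multi-layer perceptron section. The random training data, the SGD optimizer, and the epoch count are placeholders chosen only to show that defining, compiling, training, and predicting each take a single call. An equivalent PyTorch version would typically write the forward pass and the training loop explicitly, which is the extra code (and extra control) the comparison refers to.

```python
import numpy as np
import tensorflow as tf

# 3 inputs -> hidden layer of 3 sigmoid units -> 2 sigmoid outputs
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(3, activation="sigmoid"),
    tf.keras.layers.Dense(2, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

# Hypothetical random data, only to exercise the training call
X = np.random.rand(8, 3).astype("float32")
Y = np.random.randint(0, 2, size=(8, 2)).astype("float32")
model.fit(X, Y, epochs=2, verbose=0)

preds = model.predict(X, verbose=0)
# Parameters: (3*3 + 3) + (3*2 + 2) = 20
print("parameters:", model.count_params(), "prediction shape:", preds.shape)
```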