Deep Learning Activation Functions Explained

HPC

Uploaded by

Aditya Pimpale

Deep Learning Unit 2

A perceptron is a basic unit in a neural network that takes multiple inputs, applies weights to them,
sums them up, and produces an output based on whether the sum exceeds a certain threshold.

Perceptrons are trained using a supervised learning approach.

1. Initialization: Start by assigning random weights and a bias to the perceptron.

2. Forward Pass: For each training example:
   - Calculate the weighted sum of inputs.
   - Apply an activation function to get the perceptron's output.
3. Error Calculation: Compare the perceptron's output to the actual target output and determine the
error.
4. Weight Update: Adjust the weights and bias to reduce the error using the Perceptron Learning
Rule: each weight changes by learning rate x error x input, and the bias by learning rate x error.
5. Repeat Steps 2-4: Keep going through the training data, adjusting weights until the perceptron
achieves a satisfactory accuracy on the training set.
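
The five steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the choice of the AND gate as training data, the zero initialization (used instead of random values so the run is reproducible), and the learning rate of 1.0 are all assumptions made for the example.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Weighted sum of the inputs plus bias, passed through the threshold."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, targets, lr=1.0, max_epochs=100):
    # Step 1: initialization (zeros here for reproducibility; random also works).
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(samples, targets):
            y = predict(weights, bias, x)   # Step 2: forward pass
            error = t - y                   # Step 3: error calculation
            if error != 0:                  # Step 4: perceptron learning rule
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:                   # Step 5: stop when every example is correct
            break
    return weights, bias


# Hypothetical training set: the logical AND of two binary inputs.
AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_TARGETS = [0, 0, 0, 1]

weights, bias = train_perceptron(AND_INPUTS, AND_TARGETS)
predictions = [predict(weights, bias, x) for x in AND_INPUTS]
```

AND is linearly separable, so the loop stops once an epoch passes with no mistakes. On data that is not linearly separable, the loop would simply run until `max_epochs`.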

Activation functions determine each neuron's output from its weighted input sum, introducing
nonlinearity so that the network can learn complex patterns.

Purpose: They transform input signals into output signals, allowing neural networks to model
nonlinear relationships in data.

Common activation functions: sigmoid, tanh, linear, hard tanh, softmax, and rectified linear unit (ReLU).

Sigmoid Function:

1. Range: Outputs values between 0 and 1.

2. Smoothness: It provides smooth and continuous transitions.
3. Vanishing Gradient: Can suffer from the vanishing gradient problem because its slope approaches
zero for large positive or negative inputs, slowing down training.
4. Application: Often used in binary classification tasks for outputting probabilities.
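
As a small sketch of the properties listed above, the sigmoid and its derivative can be written with the standard library alone. The function names are chosen for this example, and the two-branch form is one common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^-x), split into two branches to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), which peaks at 0.25 when x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


mid_value = sigmoid(0.0)        # centre of the (0, 1) range
peak_slope = sigmoid_grad(0.0)  # largest slope the function ever has
far_slope = sigmoid_grad(10.0)  # slope deep in the saturated region
```

The slope far from zero is tiny compared with the peak slope, which is the vanishing gradient behaviour described in item 3.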

Hyperbolic Tangent Function (tanh):

1. Range: Outputs values between -1 and 1.
2. Zero-Centered: Unlike sigmoid, its outputs are centered around zero, which aids convergence.
3. Smoothness: Smooth and continuous like sigmoid.
4. Vanishing Gradient: Also prone to vanishing gradients because it saturates for large positive or
negative inputs, though less severely than sigmoid: its maximum slope is 1, versus 0.25 for sigmoid.
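
A short comparison, assuming a small symmetric set of sample inputs chosen only for illustration, shows the zero-centering and the larger peak slope of tanh next to sigmoid.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which peaks at 1 when x = 0."""
    return 1.0 - math.tanh(x) ** 2


# Hypothetical inputs, symmetric around zero.
inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]

tanh_mean = sum(math.tanh(x) for x in inputs) / len(inputs)     # close to 0
sigmoid_mean = sum(sigmoid(x) for x in inputs) / len(inputs)    # close to 0.5

peak_tanh_slope = tanh_grad(0.0)
saturated_tanh_slope = tanh_grad(5.0)
```

The two functions are closely related: tanh(x) equals 2 * sigmoid(2x) - 1, so tanh is a rescaled and shifted sigmoid.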

Hyperparameters

1. Unlike weights and biases, hyperparameters are set before training.

2. Network Structure: Define number of layers and neurons (think model complexity).
3. Learning Rate: Controls how much weights are adjusted during training (like step size).
4. Batch Size: Number of data points used to update weights at once (efficiency vs. noise).
5. Epochs: Number of times the entire training set goes through the network (full passes).
6. Regularization: Techniques like L1/L2 or dropout (prevent overfitting).
7. Optimization: Choice of optimizer algorithm (e.g., Adam, SGD) to update weights.
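
As a sketch of how these settings fit together, the hyperparameters can be collected in one place before training starts. Every key and value below is a hypothetical choice for illustration, not a recommendation. The helper shows how batch size and epochs jointly determine the number of weight updates.

```python
import math

# Hypothetical hyperparameter choices, fixed before training begins.
hyperparams = {
    "hidden_layers": [16, 8],   # network structure: two hidden layers
    "learning_rate": 0.01,      # step size for each weight update
    "batch_size": 32,           # examples per weight update
    "epochs": 10,               # full passes over the training set
    "l2_strength": 1e-4,        # regularization strength
    "optimizer": "sgd",         # optimizer choice
}


def count_weight_updates(num_examples, batch_size, epochs):
    """One update per batch; the last batch of an epoch may be smaller."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return batches_per_epoch * epochs


# Assumed dataset size of 1000 examples.
updates = count_weight_updates(1000, hyperparams["batch_size"], hyperparams["epochs"])
```

With 1000 examples and a batch size of 32 there are 32 batches per epoch, so 10 epochs give 320 updates. Halving the batch size would roughly double that count.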

Forward Propagation:
1. Input: Data point enters the network's first layer.
2. Activation: Each neuron in a layer calculates a weighted sum of its inputs and applies an
activation function.
3. Propagation: This activated value becomes the input for the next layer's neurons.
4. Output: The final layer produces the network's prediction.
5. Comparison: The prediction is compared to the actual target value (error calculation).
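
The steps above can be sketched for a tiny network with two inputs, two hidden neurons, and one output neuron. The weights, biases, input, and target are made-up values for illustration, and sigmoid is assumed as the activation in every layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of all inputs, then activates."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Feed the activations of each layer into the next one."""
    activations = x
    for weights, biases in layers:
        activations = dense_forward(activations, weights, biases)
    return activations


# Hypothetical parameters: (weights, biases) per layer.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer, 2 neurons
    ([[1.0, -1.0]], [0.0]),                    # output layer, 1 neuron
]

prediction = forward([1.0, 1.0], layers)[0]  # steps 1-4
target = 1.0
error = target - prediction                  # step 5: comparison
```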

Backpropagation:
1. Initialize Weights: Start with random weights and biases for the neural network.

2. Forward Pass: Pass input data through the network to make predictions.

3. Calculate Error: Compare predicted outputs with actual outputs to compute the error.

4. Backward Pass: Propagate the error backward through the network to compute how much each weight
contributed to it (the gradient of the loss with respect to each weight).

5. Update Weights: Use the error information to update weights and biases, refining the network's
predictions.

6. Repeat steps 2-5 until the network learns the patterns in the data.
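
The six steps can be sketched for a network with one hidden layer. This is a minimal illustration under stated assumptions: sigmoid activations, a squared-error loss on one example at a time, three hidden neurons, the OR function as toy data, a fixed random seed, and a learning rate of 0.5. All names are chosen for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, rng):
    """Step 1: random weights, zero biases."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Step 2: forward pass; returns the hidden activations and the output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y = sigmoid(sum(w * hi for w, hi in zip(p["W2"], h)) + p["b2"])
    return h, y


def loss(p, x, t):
    """Step 3: squared error for one example."""
    return 0.5 * (forward(p, x)[1] - t) ** 2


def backward(p, x, t):
    """Step 4: chain rule from the output back to the first layer."""
    h, y = forward(p, x)
    delta_out = (y - t) * y * (1.0 - y)            # gradient at the output pre-activation
    delta_hid = [delta_out * w * hi * (1.0 - hi)   # gradient at each hidden pre-activation
                 for w, hi in zip(p["W2"], h)]
    return {
        "W1": [[d * xi for xi in x] for d in delta_hid],
        "b1": delta_hid,
        "W2": [delta_out * hi for hi in h],
        "b2": delta_out,
    }


def sgd_step(p, grads, lr):
    """Step 5: move each parameter against its gradient."""
    p["W1"] = [[w - lr * g for w, g in zip(w_row, g_row)]
               for w_row, g_row in zip(p["W1"], grads["W1"])]
    p["b1"] = [b - lr * g for b, g in zip(p["b1"], grads["b1"])]
    p["W2"] = [w - lr * g for w, g in zip(p["W2"], grads["W2"])]
    p["b2"] = p["b2"] - lr * grads["b2"]


rng = random.Random(0)
params = init_params(2, 3, rng)
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

loss_before = sum(loss(params, x, t) for x, t in data)
for _ in range(200):                               # Step 6: repeat steps 2-5
    for x, t in data:
        sgd_step(params, backward(params, x, t), lr=0.5)
loss_after = sum(loss(params, x, t) for x, t in data)
```

A useful sanity check for any hand-written backward pass is to compare one of its gradients against a finite-difference estimate of the loss.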

Loss functions measure how well a neural network performs on a task.

1. They quantify the difference between the network's predictions and the actual target values.
2. The goal during training is to minimize this loss function.
3. Common loss functions include mean squared error (MSE), mean absolute error (MAE), binary
cross-entropy, and categorical cross-entropy.

Mean Absolute Error (MAE):

1. Definition: Measures average absolute difference between predicted and actual values.

2. Interpretation: More robust to outliers than MSE because each error counts in proportion to its
size; treats all errors equally regardless of direction.

3. Calculation: Compute average of absolute differences between predictions and actual values.

Mean Squared Error (MSE):

1. Definition: Measures average squared difference between predicted and actual values.

2. Interpretation: Emphasizes larger errors, sensitive to outliers due to squaring.

3. Calculation: Compute average of squared differences between predictions and actual values.
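
Both calculations fit in a few lines. The sample values below are invented so that two sets of predictions have the same MAE but different MSE, which makes the outlier sensitivity of MSE visible.

```python
def mean_absolute_error(predictions, targets):
    """Average of |prediction - target|."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def mean_squared_error(predictions, targets):
    """Average of (prediction - target)^2."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical data.
targets = [3.0, 5.0, 2.0, 7.0]
evenly_off = [2.0, 6.0, 3.0, 6.0]    # every prediction is off by 1
one_outlier = [3.0, 5.0, 2.0, 11.0]  # three exact, one off by 4

mae_even = mean_absolute_error(evenly_off, targets)
mae_outlier = mean_absolute_error(one_outlier, targets)
mse_even = mean_squared_error(evenly_off, targets)
mse_outlier = mean_squared_error(one_outlier, targets)
```

Both prediction sets have an MAE of 1.0, but the MSE rises from 1.0 to 4.0 when the same total error is concentrated in a single outlier.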

Sentiment analysis, also known as opinion mining, is a technique used to understand the emotional
tone of text data.

1. Input: Text data like reviews, social media posts, or articles is fed into the system.

2. Analysis: Techniques include lexicon-based (lists of positive/negative words) and machine
learning (trained models to analyze structure and context).

3. Output: Text is categorized as positive (happiness), negative (anger), or neutral (lack of strong
sentiment).

4. Applications: Used by businesses to gauge customer satisfaction, monitor brand reputation,
understand public opinion, and improve marketing.

5. Benefits: Offers insight beyond the literal content of text, helping to understand opinions and emotions.

6. Limitations: Struggles with sarcasm, slang, and complex emotions, so nuances can be missed.
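
The lexicon-based technique from item 2 can be sketched as a word count. The two word lists are tiny, made-up lexicons for illustration; real lexicons are far larger. The last example shows the sarcasm limitation from item 6.

```python
import re

# Hypothetical miniature lexicons.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "angry"}


def classify_sentiment(text):
    """Count positive and negative words and label the text by the balance."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = classify_sentiment("I love this phone, the camera is great")
complaint = classify_sentiment("Terrible battery and poor support")
statement = classify_sentiment("The package arrived on Tuesday")
sarcasm = classify_sentiment("Oh great, it broke again")
```

The sarcastic sentence is labelled positive because the word "great" is counted without regard to context, which is exactly the kind of nuance a word list cannot capture.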

In deep learning, regularization refers to techniques that prevent overfitting by constraining the
complexity of the model. This helps the model generalize better to unseen data.
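
One common way to constrain complexity is to add a weight penalty to the loss, as in the L1/L2 techniques listed under hyperparameters. The sketch below assumes a flat list of weights and a penalty strength named `lam`. Both names and all values are illustrative.

```python
def l1_penalty(weights):
    """Sum of absolute weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weights."""
    return sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam, penalty=l2_penalty):
    """Training objective: fit to the data plus a cost for large weights."""
    return data_loss + lam * penalty(weights)


# Hypothetical weight vectors with the same L1 norm but different L2 norm.
spread_out = [0.5, -0.5, 0.5, -0.5]
concentrated = [2.0, 0.0, 0.0, 0.0]

loss_spread = regularized_loss(0.3, spread_out, lam=0.1)
loss_concentrated = regularized_loss(0.3, concentrated, lam=0.1)
```

With the same data loss, the L2 penalty charges the single large weight more than the four small ones, so minimizing the regularized loss pushes the model toward smaller weights.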

Single-layer feedforward network:

1. Input: Data enters through input neurons, representing features.

2. Weighting: Each input is multiplied by a weight to adjust its influence.

3. Summing: Weighted inputs are summed in each neuron.

4. Activation: Neurons use an activation function (like a threshold) to determine output.

5. Output: Final output is the result of the activation function in each neuron.

6. Learning: During training, weights are adjusted based on prediction errors to improve mapping of
inputs to outputs.
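
Steps 1 to 5 can be sketched for a single layer with two output neurons. The weights and biases are hand-picked for illustration so that one neuron computes logical AND and the other logical OR. In practice they would be learned as in step 6.

```python
def step(z):
    """Threshold activation."""
    return 1 if z >= 0 else 0


def single_layer_forward(x, weights, biases):
    """Each output neuron weights the inputs, sums them, and applies the threshold."""
    return [
        step(sum(w * xi for w, xi in zip(neuron_weights, x)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical hand-set parameters.
WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
BIASES = [-1.5, -0.5]  # neuron 0 acts as AND, neuron 1 acts as OR

outputs = {x: single_layer_forward(x, WEIGHTS, BIASES)
           for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```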

Multi-layer feedforward network:

1. Input: Data enters through the first layer's neurons, representing features.

2. Propagation: Neurons in each layer calculate weighted sums of inputs and apply activation
functions.

3. Hidden Layers: The activated outputs of each layer become the inputs for the next hidden layer.

4. Multiple Layers: Network complexity and learning capability depend on the number of hidden
layers and neurons.

5. Output Layer: Final layer's neurons generate predictions (e.g., image classification).

6. Learning: Weights are adjusted during training based on prediction errors, propagating back
through layers (backpropagation) to improve learning.
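
A classic illustration of what a hidden layer adds is XOR, which is not linearly separable and so cannot be computed by a single threshold neuron. The sketch below uses hand-set weights chosen for this example rather than learned ones: the hidden layer computes OR and AND, and the output neuron fires when OR is true but AND is not.

```python
def step(z):
    """Threshold activation."""
    return 1 if z >= 0 else 0


def layer(x, weights, biases):
    """Weighted sum plus bias for each neuron, followed by the threshold."""
    return [
        step(sum(w * xi for w, xi in zip(neuron_weights, x)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def xor_network(x):
    # Hidden layer: neuron 0 computes OR, neuron 1 computes AND.
    hidden = layer(x, [[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5])
    # Output layer: fires when OR is 1 and AND is 0.
    return layer(hidden, [[1.0, -1.0]], [-0.5])[0]


xor_outputs = [xor_network(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```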

Common questions

Forward propagation calculates the network's predictions by passing input data through each layer, applying weighted sums and activation functions. This process translates raw input into structured output, which is then evaluated against target values to guide error correction through backpropagation. Each forward pass informs model adjustments, improving the prediction accuracy over successive iterations.

Single-layer feedforward networks process inputs through a single layer, limiting them to linearly separable tasks. In contrast, multi-layer networks employ one or more hidden layers, allowing them to learn more complex, non-linear patterns due to increased depth and computational capability. This multi-layer structure enhances their ability to approximate diverse functions and generalize to broader contexts.

Backpropagation efficiently computes gradients of the loss function with respect to weights, enabling weight adjustments that minimize the error iteratively across layers. This method allows neural networks to update weights in a way that optimizes the final output against the training data, refining predictions through continuous error correction.

Sentiment analysis uses machine learning models trained on labeled data to detect patterns in the textual context, identifying positive, negative, or neutral sentiments. This approach offers nuanced insights into consumer opinions and emotions. However, it struggles with sarcasm, slang, and complex emotional expressions, potentially missing subtleties in natural language interactions.

The initialization of weights and biases is crucial in determining the starting point for gradient descent during training. Random starting values can lead to differences in the convergence speed and the potential to get stuck in local minima. Proper initialization can help avoid issues such as the vanishing gradient problem, especially in deep networks, by keeping the variance of activations roughly constant across layers.

Activation functions introduce non-linearity by transforming linear inputs into non-linear outputs, which allows neural networks to model complex patterns and relationships within data. Functions like sigmoid and tanh enable smooth mapping of inputs to outputs, assisting in handling non-linear separations. This capacity is essential for neural networks to approximate any continuous function, underpinning their superiority over linear models in pattern recognition tasks.

A loss function quantifies the difference between predicted outputs and actual target values, guiding the optimization process to minimize this discrepancy during training. MAE measures the average absolute difference and treats all errors equally, offering robustness to outliers but potentially leading to slower convergence. MSE emphasizes larger errors by squaring them, aiding faster gradient descent convergence but being sensitive to outliers.

Hyperparameters, such as learning rate, batch size, and epochs, critically affect the training efficiency and performance. The learning rate determines the step size for weight updates: too high a value leads to instability, and too low a value leads to slow convergence. Batch size influences the trade-off between noisy estimates of the gradient and efficient computation, while the number of epochs affects the extent of learning from the data, potentially leading to overfitting if excessive.

The vanishing gradient problem occurs when gradients become too small, preventing effective weight updates in deep networks. The Hyperbolic Tangent (tanh) is zero-centered and has a larger maximum slope than sigmoid (1 versus 0.25), which eases the issue but does not remove it. Both tanh and sigmoid still suffer from vanishing gradients, because they squash inputs into small ranges and their slopes approach zero for large positive or negative inputs, so gradients shrink as they are multiplied across layers.

Regularization reduces overfitting by constraining the model complexity, which leads to better generalization on unseen data. Common techniques include L1 and L2 regularization that add penalties for larger weights, forcing the model to focus on a simpler hypothesis space. Dropout randomly turns off neurons during training, further discouraging complex co-adaptations between neurons.
