
Deep Learning Activation Functions Explained

HPC

Uploaded by

Aditya Pimpale
Copyright
© All Rights Reserved

Deep Learning Unit 2

A perceptron is a basic unit in a neural network that takes multiple inputs, applies weights to them,
sums them up, and produces an output based on whether the sum exceeds a certain threshold.

Perceptrons are trained using a supervised learning approach.

1. Initialization: Start by assigning random weights and a bias to the perceptron.
2. Forward Pass: For each training example:
   - Calculate the weighted sum of inputs.
   - Apply an activation function to get the perceptron's output.
3. Error Calculation: Compare the perceptron's output to the actual target output and determine the
error.
4. Weight Update: Adjust the weights and bias to reduce the error using the Perceptron Learning
Rule: add (learning rate × error × input) to each weight and (learning rate × error) to the bias.
5. Repeat Steps 2-4: Keep going through the training data, adjusting weights until the perceptron
achieves a satisfactory accuracy on the training set.
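
The training steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the AND-gate data set, the learning rate of 0.1, the seed, and the function names are all assumptions chosen for the example.

```python
import random

def step(z):
    """Threshold activation: output 1 when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def predict(weights, bias, x):
    """Forward pass: weighted sum of inputs followed by the threshold."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(data, lr=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial weights and bias (range is an assumption).
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(data[0][0]))]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in data:
            # Steps 2-3: forward pass and error calculation.
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                # Step 4: Perceptron Learning Rule.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # Step 5: stop once every example is classified correctly.
            break
    return weights, bias

# Hypothetical training set: the logical AND of two binary inputs.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_DATA)
```

Because AND is linearly separable, the loop is expected to stop with all four examples classified correctly.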

Activation functions determine the output of each neuron, introducing nonlinearity so that the
network can learn complex patterns.

Purpose: They transform a neuron's weighted input into its output signal, allowing neural networks
to model nonlinear relationships in data.

List of activation functions:

1. Sigmoid
2. Tanh (hyperbolic tangent)
3. Linear
4. Hard tanh
5. Softmax
6. Rectified linear (ReLU)
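
Sigmoid and tanh are described in detail further on. The other functions in the list can be sketched as follows, using their standard textbook definitions; the function names are assumptions made for this example.

```python
import math

def linear(x):
    """Identity: passes the weighted sum through unchanged."""
    return x

def relu(x):
    """Rectified linear: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def hard_tanh(x):
    """Piecewise-linear version of tanh: clips the input to [-1, 1]."""
    return max(-1.0, min(1.0, x))

def softmax(scores):
    """Turns a list of scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Softmax differs from the others in that it acts on a whole layer of scores at once, which is why it usually appears in the output layer of multi-class classifiers.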

Sigmoid Function:

1. Range: Outputs values between 0 and 1.
2. Smoothness: It provides smooth and continuous transitions.
3. Vanishing Gradient: Its gradient approaches zero for large positive or negative inputs, which
can cause the vanishing gradient problem and slow down training.
4. Application: Often used in binary classification tasks for outputting probabilities.
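
A small sketch of the sigmoid and its derivative makes the vanishing-gradient point concrete. It uses the standard formula 1 / (1 + e^(-x)); the function names are assumptions for this example.

```python
import math

def sigmoid(x):
    """Sigmoid 1 / (1 + e^(-x)), written to avoid overflow for very negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), which peaks at 0.25 when x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Gradient at the centre versus in the saturated region.
grad_centre = sigmoid_grad(0.0)
grad_saturated = sigmoid_grad(10.0)
```

Since the derivative never exceeds 0.25, multiplying it across many layers during training shrinks the gradient quickly, and the effect is far stronger when neurons saturate.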

Hyperbolic Tangent Function (tanh):


1. Range: Outputs values between -1 and 1.
2. Zero-Centered: Unlike sigmoid, its outputs are centered around zero, which aids convergence.
3. Smoothness: Smooth and continuous like sigmoid.
4. Vanishing Gradient: Also prone to the vanishing gradient problem because it saturates for large
inputs, although its gradient near zero (up to 1) is larger than sigmoid's (up to 0.25).
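
The properties listed for tanh can be checked with Python's built-in `math.tanh`. The helper name `tanh_grad` is an assumption for this sketch; the derivative 1 - tanh(x)^2 is the standard one.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which peaks at 1 when x = 0."""
    return 1.0 - math.tanh(x) ** 2

# Zero-centred: equal and opposite outputs for equal and opposite inputs.
left, right = math.tanh(-2.0), math.tanh(2.0)

# Gradient at the centre versus in the saturated region.
grad_centre = tanh_grad(0.0)
grad_saturated = tanh_grad(5.0)
```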

Hyperparameters:

Unlike weights and biases, which are learned during training, hyperparameters are set before
training.

1. Network Structure: Number of layers and neurons per layer (controls model complexity).
2. Learning Rate: Controls how much weights are adjusted at each update (the step size).
3. Batch Size: Number of data points used for each weight update (trades efficiency against
gradient noise).
4. Epochs: Number of full passes of the training set through the network.
5. Regularization: Techniques such as L1/L2 penalties or dropout that help prevent overfitting.
6. Optimization: Choice of optimizer algorithm (e.g., Adam, SGD) used to update weights.
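
One way to keep these settings together is a small configuration object fixed before training starts. The sketch below is only illustrative: the class name, field names, and every default value are assumptions, not recommended settings.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyperparameters:
    """Settings chosen before training; all defaults here are placeholders."""
    hidden_layers: tuple = (16, 8)   # neurons in each hidden layer
    learning_rate: float = 0.01      # step size for weight updates
    batch_size: int = 32             # examples per weight update
    epochs: int = 10                 # full passes over the training set
    l2_strength: float = 1e-4        # regularization strength
    optimizer: str = "sgd"           # name of the update rule

def total_weight_updates(n_samples, hp):
    """Batch size and epochs together fix how many weight updates happen."""
    batches_per_epoch = math.ceil(n_samples / hp.batch_size)
    return batches_per_epoch * hp.epochs

hp = Hyperparameters()
updates = total_weight_updates(1000, hp)
```

With 1000 training examples, a batch size of 32 gives 32 batches per epoch (the last one partial), so 10 epochs perform 320 weight updates.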

Forward Propagation:
1. Input: Data point enters the network's first layer.
2. Activation: Each neuron in a layer calculates a weighted sum of its inputs and applies an
activation function.
3. Propagation: This activated value becomes the input for the next layer's neurons.
4. Output: The final layer produces the network's prediction.
5. Comparison: The prediction is compared to the actual target value (error calculation).
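
The five steps above can be traced on a tiny network. This is a sketch under stated assumptions: the 2-2-1 layout, the sigmoid activation, the weight values, and the target are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """Step 2: each neuron takes a weighted sum of its inputs, then activates."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Steps 1-4: feed the input through every layer in order."""
    values = x
    for weights, biases in layers:
        values = dense(values, weights, biases)  # Step 3: output feeds the next layer
    return values

# Hypothetical weights: one hidden layer with 2 neurons, one output neuron.
LAYERS = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]

prediction = forward([1.0, 1.0], LAYERS)[0]
target = 1.0
error = target - prediction  # Step 5: compare prediction with the target
```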

Back Propagation:
1. Initialize Weights: Start with random weights and biases for the neural network.

2. Forward Pass: Pass input data through the network to make predictions.

3. Calculate Error: Compare predicted outputs with actual outputs to compute the error.

4. Backward Pass: Propagate the error backward through the network, using the chain rule to compute
the gradient of the loss with respect to each weight and bias.

5. Update Weights: Use these gradients to update weights and biases (for example, by gradient
descent), refining the network's predictions.

6. Repeat steps 2-5 until the loss stops improving or a set number of epochs is reached.
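
As a rough sketch of these steps, the code below trains a 2-2-1 sigmoid network on a single example with a squared-error loss. The network size, starting weights, learning rate, and training example are assumptions chosen to keep the example short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass: returns hidden activations and the output."""
    w1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, y

def loss(params, x, t):
    """Squared error for one example: 0.5 * (y - t)^2."""
    return 0.5 * (forward(params, x)[1] - t) ** 2

def backward(params, x, t):
    """Backward pass: chain rule from the output back to the first layer."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - t) * y * (1.0 - y)  # error signal at the output neuron
    delta_hidden = [delta_out * w * hi * (1.0 - hi) for w, hi in zip(w2, h)]
    grad_w2 = [delta_out * hi for hi in h]
    grad_w1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w1, delta_hidden, grad_w2, delta_out

def update(params, grads, lr):
    """Gradient-descent step: move each parameter against its gradient."""
    w1, b1, w2, b2 = params
    g_w1, g_b1, g_w2, g_b2 = grads
    return (
        [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(w1, g_w1)],
        [b - lr * g for b, g in zip(b1, g_b1)],
        [w - lr * g for w, g in zip(w2, g_w2)],
        b2 - lr * g_b2,
    )

# Hypothetical starting weights and a single training example.
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.05)
x, t = [1.0, 0.5], 1.0
loss_before = loss(params, x, t)
for _ in range(100):  # repeat forward pass, backward pass, and update
    params = update(params, backward(params, x, t), lr=0.5)
loss_after = loss(params, x, t)
```

A common sanity check is to compare a gradient returned by `backward` with a finite-difference estimate of the same derivative; the two should agree closely.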

Loss functions measure how well a neural network performs on a task.

1. They quantify the difference between the network's predictions and the actual target values.
2. The goal during training is to minimize this loss function.
3. Common loss functions: mean squared error (MSE), mean absolute error (MAE), binary cross-entropy,
and categorical cross-entropy.
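
The two cross-entropy losses are not expanded on in these notes, so here is a minimal sketch using their standard definitions. The function names and the `eps` clipping value are assumptions for this example.

```python
import math

def binary_cross_entropy(targets, probs, eps=1e-12):
    """Average of -[t*log(p) + (1-t)*log(1-p)] over all examples (t is 0 or 1)."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)

def categorical_cross_entropy(one_hot_targets, prob_rows, eps=1e-12):
    """Average of -sum(t_k * log(p_k)) over examples with one-hot targets."""
    total = 0.0
    for target_row, prob_row in zip(one_hot_targets, prob_rows):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(target_row, prob_row))
    return total / len(one_hot_targets)

# Confident correct predictions give a small loss, confident wrong ones a large loss.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
```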

Mean Absolute Error (MAE):

1. Definition: Measures average absolute difference between predicted and actual values.

2. Interpretation: More robust to outliers than MSE because errors are not squared; treats all
errors equally regardless of direction.

3. Calculation: Compute average of absolute differences between predictions and actual values.

Mean Squared Error (MSE):

1. Definition: Measures average squared difference between predicted and actual values.

2. Interpretation: Emphasizes larger errors, sensitive to outliers due to squaring.

3. Calculation: Compute average of squared differences between predictions and actual values.
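
The difference between MAE and MSE shows up clearly on a small example. The numbers below are made up for illustration: both sets of predictions have the same total absolute error, but in the second set it is concentrated in one large miss.

```python
def mean_absolute_error(targets, predictions):
    """Average of |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

def mean_squared_error(targets, predictions):
    """Average of (target - prediction)^2."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

targets = [3.0, 5.0, 2.0, 7.0]
spread = [2.0, 6.0, 3.0, 6.0]     # every prediction is off by 1
one_miss = [3.0, 5.0, 2.0, 11.0]  # three exact predictions, one off by 4

mae_spread = mean_absolute_error(targets, spread)      # 1.0
mae_one_miss = mean_absolute_error(targets, one_miss)  # 1.0
mse_spread = mean_squared_error(targets, spread)       # 1.0
mse_one_miss = mean_squared_error(targets, one_miss)   # 4.0
```

MAE rates both sets of predictions the same, while MSE is four times larger for the set with the single large error.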

Sentiment analysis, also known as opinion mining, is a technique used to understand the emotional
tone of text data.

1. Input: Text data like reviews, social media posts, or articles is fed into the system.

2. Analysis: Techniques include lexicon-based (lists of positive/negative words) and machine
learning (trained models to analyze structure and context).

3. Output: Text is categorized as positive (happiness), negative (anger), or neutral (lack of strong
sentiment).

4. Applications: Used by businesses to gauge customer satisfaction, monitor brand reputation,
understand public opinion, and improve marketing.

5. Benefits: Offers insight into the opinions and emotions behind a text, beyond what basic text
analysis provides.

6. Limitations: Sarcasm, slang, and complex emotions are hard to detect, so nuances can be
missed.
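
The lexicon-based technique can be sketched in a few lines. The word lists here are tiny, made-up examples and the scoring rule (positive count minus negative count) is an assumption; real lexicons are far larger and often weighted.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "angry"}

def lexicon_sentiment(text):
    """Classify text by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review = lexicon_sentiment("I love this phone, the camera is great")
complaint = lexicon_sentiment("Terrible battery and poor support")
factual = lexicon_sentiment("The package arrived on Tuesday")
sarcastic = lexicon_sentiment("Oh great, it broke again")
```

The last call illustrates the sarcasm limitation: the sentence is a complaint, but the word "great" makes a simple word count label it positive.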

In deep learning, regularization refers to techniques that prevent overfitting by constraining the
complexity of the model. This helps the model generalize better to unseen data.
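
As a sketch of how an L1 or L2 penalty constrains a model, the code below adds a weight penalty to the loss and shows its effect on a gradient-descent step. The function names, the `strength` parameter, and the sample values are assumptions for illustration.

```python
def l1_penalty(weights, strength):
    """L1 penalty: strength * sum of |w|."""
    return strength * sum(abs(w) for w in weights)

def l2_penalty(weights, strength):
    """L2 penalty: strength * sum of w^2."""
    return strength * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, strength):
    """Total loss = loss on the data + penalty for large weights (L2 here)."""
    return data_loss + l2_penalty(weights, strength)

def sgd_step_with_l2(weights, data_grads, lr, strength):
    """The L2 penalty adds 2 * strength * w to each gradient, shrinking weights."""
    return [w - lr * (g + 2.0 * strength * w) for w, g in zip(weights, data_grads)]

weights = [3.0, -4.0]
total = regularized_loss(0.5, weights, strength=0.01)
shrunk = sgd_step_with_l2([1.0, -2.0], [0.0, 0.0], lr=0.1, strength=0.5)
```

Even with a zero data gradient, the update pulls every weight toward zero, which is how the penalty keeps the model from relying on very large weights.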

Single-layer feedforward network:

1. Input: Data enters through input neurons, representing features.

2. Weighting: Each input is multiplied by a weight to adjust its influence.

3. Summing: Weighted inputs are summed in each neuron.

4. Activation: Neurons use an activation function (like a threshold) to determine output.

5. Output: Final output is the result of the activation function in each neuron.

6. Learning: During training, weights are adjusted based on prediction errors to improve mapping of
inputs to outputs.
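
Steps 1 to 5 can be sketched as one layer of threshold neurons. The weights and biases below are hand-picked assumptions so that the first neuron computes AND and the second computes OR; in practice they would be learned as in step 6.

```python
def threshold(z):
    """Activation: output 1 when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def single_layer(inputs, weights, biases):
    """One output per neuron: weight the inputs, sum them, then apply the threshold."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(threshold(weighted_sum))
    return outputs

# Hypothetical hand-set parameters: neuron 0 acts as AND, neuron 1 as OR.
WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
BIASES = [-1.5, -0.5]

both_on = single_layer([1, 1], WEIGHTS, BIASES)
one_on = single_layer([1, 0], WEIGHTS, BIASES)
both_off = single_layer([0, 0], WEIGHTS, BIASES)
```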

Multi-layer feedforward network:

1. Input: Data enters through the first layer's neurons, representing features.

2. Propagation: Neurons in each layer calculate weighted sums of inputs and apply activation
functions.

3. Hidden Layers: The activated outputs of one layer become the inputs for the next hidden
layer.

4. Multiple Layers: Network complexity and learning capability depend on the number of hidden
layers and neurons.

5. Output Layer: Final layer's neurons generate predictions (e.g., image classification).

6. Learning: Weights are adjusted during training based on prediction errors, propagating back
through layers (backpropagation) to improve learning.
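
A classic illustration of what a hidden layer adds is XOR, which no single threshold neuron can compute because it is not linearly separable. The sketch below uses hand-set weights (an assumption for illustration, not learned values): the hidden layer computes OR and AND, and the output neuron fires when OR is on but AND is off.

```python
def threshold(z):
    """Activation: output 1 when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def layer(inputs, weights, biases):
    """Weighted sums followed by the threshold, one output per neuron."""
    return [threshold(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical hand-set parameters.
HIDDEN = ([[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5])  # neuron 0: OR, neuron 1: AND
OUTPUT = ([[1.0, -1.0]], [-0.5])                   # fires for OR and not AND

def xor_network(x):
    hidden = layer(x, *HIDDEN)         # hidden layer activations
    return layer(hidden, *OUTPUT)[0]   # output layer prediction

results = [xor_network(x) for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
```

In a trained network these weights would be found by backpropagation (step 6) rather than set by hand.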
