
Activation Functions in Neural Networks

The document is an assignment for an Introduction to Machine Learning course, covering various concepts such as gradient descent, activation functions, and maximum likelihood estimation. It includes multiple-choice questions with solutions provided for each question. Key topics discussed include the effects of activation functions in neural networks, transformations for linear separability, and the relationship between MLE and MAP.

Uploaded by

Vijay


Assignment 5

Introduction to Machine Learning

Prof. B. Ravindran
1. If the step size in gradient descent is too large, what can happen?
(a) Overfitting
(b) The model will not converge
(c) We can reach maxima instead of minima
(d) None of the above
Sol. (b)
Ref. lecture
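The sketch below illustrates option (b) with a toy objective, $f(x) = x^2$, which is an assumption and not from the assignment. Each update is $x \leftarrow x - \eta \cdot 2x = x(1 - 2\eta)$. A small step size shrinks $x$ towards the minimum at 0. For $\eta > 1$, each step overshoots the minimum by more than the last, so the iterates diverge.

```python
def gradient_descent(lr, x0=1.0, steps=50):
    """Plain gradient descent on the toy objective f(x) = x**2 (illustrative only)."""
    x = x0
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2
        x = x - lr * grad     # same as x * (1 - 2 * lr)
    return x


small = gradient_descent(lr=0.1)   # factor 0.8 per step: |x| shrinks towards 0
large = gradient_descent(lr=1.1)   # factor -1.2 per step: sign flips and |x| grows

print(f"lr=0.1 -> x = {small:.2e}")
print(f"lr=1.1 -> x = {large:.2e}")
```

On a real loss surface, the threshold for "too large" depends on the curvature. The quadratic case is simply where the effect is easiest to see.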

2. Recall the XOR example (tabulated below) from class, where we transformed the features to make the data linearly separable. Which of the following transformations can also work?

X1 X2 Y
-1 -1 -1
1 -1 1
-1 1 1
1 1 -1

(a) $X_1' = X_1^2$, $X_2' = X_2^2$
(b) $X_1' = 1 + X_1$, $X_2' = 1 - X_2$
(c) $X_1' = X_1 X_2$, $X_2' = -X_1 X_2$
(d) $X_1' = (X_1 - X_2)^2$, $X_2' = (X_1 + X_2)^2$
Sol. (c), (d)
(c)

X1′ X2′ Y
1 -1 -1
-1 1 1
-1 1 1
1 -1 -1

(d)

X1′ X2′ Y
0 4 -1
4 0 1
4 0 1
0 4 -1

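Both transformed datasets above are linearly separable. In (c), $Y = 1$ exactly when $X_1' < 0$. In (d), $Y = 1$ exactly when $X_1' > X_2'$. Transformation (a) maps every point to $(1, 1)$, so the classes collide. Transformation (b) only shifts the inputs, so the XOR pattern remains.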
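The check can be automated with a perceptron. The perceptron convergence theorem guarantees that it finds a separating line on linearly separable data. On non-separable data, it never completes an error-free pass. The code below is a minimal sketch. The perceptron and the `max_epochs` budget are our own choices for illustration and are not the method used in class. A `False` result means only that no separating line was found within the budget. For the non-separable options (a) and (b), however, `False` is guaranteed.

```python
# XOR data from the table: ((X1, X2), Y)
XOR = [((-1, -1), -1), ((1, -1), 1), ((-1, 1), 1), ((1, 1), -1)]

# The four candidate transformations from the question.
TRANSFORMS = {
    "a": lambda x1, x2: (x1 ** 2, x2 ** 2),
    "b": lambda x1, x2: (1 + x1, 1 - x2),
    "c": lambda x1, x2: (x1 * x2, -x1 * x2),
    "d": lambda x1, x2: ((x1 - x2) ** 2, (x1 + x2) ** 2),
}


def perceptron_separates(points, max_epochs=1000):
    """Return True if the perceptron completes an error-free epoch within max_epochs."""
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (z1, z2), y in points:
            if y * (w1 * z1 + w2 * z2 + b) <= 0:   # misclassified or on the boundary
                w1 += y * z1
                w2 += y * z2
                b += y
                mistakes += 1
        if mistakes == 0:   # every point is strictly on its correct side
            return True
    return False


results = {
    name: perceptron_separates([(f(x1, x2), y) for (x1, x2), y in XOR])
    for name, f in TRANSFORMS.items()
}
print(results)
```

Only options (c) and (d) should report `True`, which matches the solution.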

3. What is the effect of using the activation function f(x) = x for hidden layers in an ANN?

(a) No effect. It’s as good as any other activation function (sigmoid, tanh etc).
(b) The ANN is equivalent to doing multi-output linear regression.
(c) Backpropagation will not work.
(d) We can model highly complex non-linear functions.

Sol. (b)
Ref. lecture
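The reasoning behind (b) is that a composition of affine layers is itself affine. With $f(x) = x$, we get $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$. The sketch below checks this numerically. The weights and layer sizes are hypothetical, and plain Python lists are used to avoid external dependencies.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A, B):
    """Multiply matrices A and B (both given as lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def identity(v):
    """The activation f(x) = x, applied elementwise."""
    return list(v)


# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
b1 = [0.1, -0.2, 0.3]
W2 = [[1.0, -1.0, 2.0]]
b2 = [0.5]


def two_layer(x):
    """Hidden layer with identity activation, followed by a linear output layer."""
    h = identity([z + bi for z, bi in zip(matvec(W1, x), b1)])
    return [z + bi for z, bi in zip(matvec(W2, h), b2)]


# Collapse both layers into one affine map: W = W2 W1, b = W2 b1 + b2.
W_total = matmul(W2, W1)
b_total = [z + c for z, c in zip(matvec(W2, b1), b2)]


def single_linear(x):
    """A single linear (regression) layer with the collapsed parameters."""
    return [z + c for z, c in zip(matvec(W_total, x), b_total)]


for x in ([1.0, 2.0], [-3.0, 0.5], [0.0, 0.0]):
    print(x, two_layer(x), single_linear(x))
```

Adding more identity-activated hidden layers changes nothing: the product of the weight matrices is still a single matrix.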

4. Which of the following functions can be used on the last layer of an ANN for classification?

(a) Softmax
(b) Sigmoid
(c) Tanh
(d) Linear
Sol. (a), (b), (c)
Ref. lecture
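The following sketch shows why (a) to (c) suit a classification output while a linear output does not. Each of these functions maps the final logits into a bounded range that a class decision can be read from. Softmax and sigmoid give probabilities, and tanh gives a value in $(-1, 1)$ that can be thresholded for $\pm 1$ labels, as in the XOR table above. The helper names and inputs are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Maps a single logit to (0, 1), read as P(class = 1) in binary classification."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh_label(z):
    """tanh maps to (-1, 1); thresholding at 0 gives a +1/-1 class label."""
    return 1 if math.tanh(z) >= 0 else -1


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
print(sigmoid(3.0))
print(tanh_label(-0.7), tanh_label(1.2))
```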

5. Statement: A threshold function cannot be used as an activation function for hidden layers.
Reason: Threshold functions do not introduce non-linearity.

(a) Statement is true and reason is false.

(b) Statement is false and reason is true.
(c) Both are true and the reason explains the statement.
(d) Both are true and the reason does not explain the statement.
Sol. (a)
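The statement is true: the threshold function is not differentiable at the threshold and has zero gradient everywhere else, so backpropagation cannot compute useful gradients. The reason is false, because the threshold function is non-linear.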
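The sketch below estimates derivatives with central finite differences to show why gradient-based training fails with a threshold activation. The step function, unlike the sigmoid, gives exactly zero gradient away from the threshold and a meaningless spike of $1/(2h)$ at the threshold. Here, `step` is assumed to be the Heaviside function with threshold 0, and `h` is an arbitrary small increment.

```python
import math


def step(x):
    """Threshold (Heaviside) activation: 1 if x >= 0 else 0."""
    return 1.0 if x >= 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def numeric_grad(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-2.0, -0.5, 0.5, 2.0):
    print(f"x={x:+.1f}  step'={numeric_grad(step, x):.3f}  "
          f"sigmoid'={numeric_grad(sigmoid, x):.3f}")

# At the threshold the estimate is 1 / (2h): it grows without bound as h shrinks.
print(numeric_grad(step, 0.0))
```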

6. We use several techniques to ensure the weights of the neural network are small (such as
random initialization around 0 or regularisation). What conclusions can we draw if weights of
our ANN are high?
(a) Model has overfitted.
(b) It was initialized incorrectly.
(c) At least one of (a) or (b).
(d) None of the above.
Sol. (d)
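High weights can accompany overfitting, but the two are not always associated, and high weights alone do not show that the network was initialized incorrectly either. Neither conclusion follows.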

7. On different initializations of your neural network, you get significantly different values of loss.
What could be the reason for this?
(a) Overfitting
(b) Some problem in the architecture
(c) Incorrect activation function
(d) Multiple local minima
Sol. (d)
Ref. lecture
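The sketch below illustrates option (d) on a one-dimensional problem. The loss $x^4 - 3x^2 + x$ is a hypothetical non-convex function with two local minima of different depths. Plain gradient descent started from two different initial points settles into different minima and therefore reaches different final losses, even though nothing else about the setup changes.

```python
def loss(x):
    """Hypothetical non-convex loss with two local minima."""
    return x**4 - 3 * x**2 + x


def grad(x):
    """Derivative of loss(x)."""
    return 4 * x**3 - 6 * x + 1


def descend(x0, lr=0.01, steps=2000):
    """Plain gradient descent from initial point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


for x0 in (-2.0, 2.0):
    x_final = descend(x0)
    print(f"init {x0:+.1f} -> x = {x_final:.3f}, loss = {loss(x_final):.3f}")
```

In a real network, the same effect appears across random weight initializations, with many more dimensions and minima.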

8. The likelihood L(θ|X) is given by:

(a) $P(\theta \mid X)$
(b) $P(X \mid \theta)$
(c) $P(X)\,P(\theta)$
(d) $\frac{P(\theta)}{P(X)}$

Sol. (b)
Ref. lecture

9. You are trying to estimate the probability of it raining today using maximum likelihood estimation. Given that in $n$ days it rained $n_r$ times, what is the probability of it raining today?
(a) $\frac{n_r}{n}$
(b) $\frac{n_r}{n_r + n}$
(c) $\frac{n}{n_r + n}$
(d) None of the above.

Sol. (a)
The question follows the same idea as the coin example discussed in class: maximizing the likelihood $p^{n_r}(1 - p)^{n - n_r}$ over $p$ gives $\hat{p} = n_r / n$.
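As a sanity check on answer (a), the sketch below maximizes the Bernoulli log-likelihood by brute force over a grid of candidate probabilities. It lands on $n_r / n$. The counts and the grid resolution are hypothetical. A grid search is used here only because it is easy to verify, not because it is how MLE is normally computed.

```python
import math


def log_likelihood(p, n_r, n):
    """Bernoulli log-likelihood of observing rain on n_r out of n days."""
    return n_r * math.log(p) + (n - n_r) * math.log(1 - p)


def mle_by_grid(n_r, n, resolution=1000):
    """Brute-force search for the p in (0, 1) that maximizes the log-likelihood."""
    grid = [i / resolution for i in range(1, resolution)]
    return max(grid, key=lambda p: log_likelihood(p, n_r, n))


n, n_r = 10, 3  # hypothetical counts: rain on 3 of 10 days
print(mle_by_grid(n_r, n), n_r / n)
```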
10. Choose the correct statement (multiple may be correct):
(a) MLE is a special case of MAP when prior is a uniform distribution.
(b) MLE acts as regularisation for MAP.
(c) MLE is a special case of MAP when prior is a beta distribution.
(d) MAP acts as regularisation for MLE.
Sol. (a), (d)
Ref. lecture
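Options (a) and (d) can be illustrated with the rain example from question 9. We assume a Beta$(\alpha, \beta)$ prior, which is conjugate to the Bernoulli likelihood. The MAP estimate is then the posterior mode, $(n_r + \alpha - 1)/(n + \alpha + \beta - 2)$. Beta$(1, 1)$ is exactly the uniform distribution, and with it MAP reduces to MLE. A more informative prior pulls the estimate towards the prior's mode, which is the regularizing effect. The specific prior parameters below are hypothetical.

```python
def mle(n_r, n):
    """Maximum likelihood estimate of the Bernoulli parameter."""
    return n_r / n


def map_estimate(n_r, n, alpha, beta):
    """Mode of the Beta(alpha + n_r, beta + n - n_r) posterior (assumes alpha, beta >= 1, n >= 1)."""
    return (n_r + alpha - 1) / (n + alpha + beta - 2)


# Uniform prior Beta(1, 1): MAP coincides with MLE.
print(map_estimate(3, 10, 1, 1), mle(3, 10))

# Informative prior Beta(2, 2) with very little data: MAP is pulled towards 0.5
# instead of jumping to the extreme estimate 0 that MLE gives.
print(map_estimate(0, 2, 2, 2), mle(0, 2))
```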

Common questions

Using MLE, the probability of an event such as rain is estimated by dividing the number of times the event occurred (nr) by the total number of observations (n), which gives nr/n. This mirrors the coin-flipping example, which is likewise based on observed frequencies.

Different initializations can lead to different loss values because the loss landscape has multiple local minima. The starting point strongly influences the path the optimizer takes, so different runs can settle into different local minima. Better initialization schemes and optimization techniques may help alleviate this issue.

MLE is a special case of MAP in which the prior is a uniform distribution. MAP incorporates prior information into the estimate, whereas MLE ignores priors. Therefore, if the prior is non-informative (uniform), MAP and MLE give the same result, and MLE serves as the baseline without any regularizing effect from a prior.

If the step size in gradient descent is too large, the model may not converge. Instead of gradually approaching the minimum, each update can overshoot it, so the iterates oscillate or diverge rather than settling into the minimum.

For the final layer of an ANN used for classification, appropriate functions include softmax, sigmoid, and tanh. These functions squash the outputs into a bounded range: softmax and sigmoid produce values that can be read as class probabilities, and tanh produces values in (-1, 1) that can be thresholded for labels such as ±1. A linear output is unbounded and is more typical of regression.

High weights in a neural network do not necessarily indicate overfitting. High weights can accompany overfitting, but the two are not always associated: overfitting can have other causes, and high weights might also arise from factors such as initialization or architecture.

Threshold functions are unsuitable as activation functions in the hidden layers of neural networks because they are not differentiable at the threshold and have zero gradient everywhere else. Backpropagation relies on gradients to update the weights, so a threshold activation provides no useful gradient signal, which prevents effective training of the network.

The XOR problem can be made linearly separable with the transformation X'1 = X1X2, X'2 = -X1X2 or with X'1 = (X1 - X2)^2, X'2 = (X1 + X2)^2. More generally, data that are not linearly separable in the original features can sometimes be made separable by an appropriate non-linear feature transformation.

Using the activation function f(x) = x in the hidden layers of an ANN makes the network equivalent to multi-output linear regression. A composition of linear (affine) layers is itself linear, so the identity activation adds no non-linearity and the network cannot capture non-linear patterns in the data.

The likelihood function L(θ|X) is defined as P(X|θ), the probability of the observed data given the parameters, viewed as a function of θ. In Bayesian inference it is combined with the prior P(θ) to update beliefs about the parameters in light of the observed data.

