
Neural Network Assignment Insights

The document outlines an assignment focused on neural networks and perceptrons, including questions on their capabilities, architectures, and evaluation metrics. It covers topics such as Boolean functions, gradient descent, and the Mean Squared Error (MSE) loss function. The assignment consists of multiple-choice questions with specific scenarios and calculations related to neural network design and performance.

Uploaded by

sharmi12

Week 2 : Assignment 2

Due date: 2025-08-06, 23:59 IST.


1 point

Consider the single perceptron shown below, with w = 1 and b = −0.5. The perceptron uses a step activation
function defined as
$$f(x) = \begin{cases} 1 & \text{if } wx + b \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Predict the output for input values 0.51 and 0.49.

1, 0

0, 1

1, 1

0, 0
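
The threshold rule in this question can be checked numerically. The following is a minimal sketch; the function name `perceptron_step` is hypothetical and not part of the assignment.

```python
def perceptron_step(x, w=1.0, b=-0.5):
    """Step-activation perceptron: returns 1 if w*x + b >= 0, else 0."""
    return 1 if w * x + b >= 0 else 0

# Evaluate the two inputs named in the question.
outputs = [perceptron_step(x) for x in (0.51, 0.49)]
print(outputs)
```

With w = 1 and b = −0.5 the decision threshold sits at x = 0.5, so the two inputs fall on opposite sides of it.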

1 point

You are given a Boolean function that is not linearly separable. Which of the following is true
regarding its representation using a perceptron-based network?

It can be represented using a single-layer perceptron if you increase the number of perceptrons.

It requires at least one hidden layer in the network.

It cannot be represented by any feedforward neural network.

It can only be represented by a network with more than $2^n$ perceptrons.

1 point

As 𝑛 increases, representing all Boolean functions using a 2-layer perceptron becomes impractical
due to:

Increase in training data size

Exponential increase in required hidden layer neurons

Limitation in backpropagation algorithm

Decrease in classification accuracy

1 point
You are designing neural networks to represent Boolean functions. Consider the capabilities of
single-layer and multi-layer perceptrons.
Which of the following statements are true?

A single-layer perceptron can represent all linearly separable Boolean functions.

XOR requires at least one hidden layer to be represented.

A network with $2^n$ hidden neurons and one output neuron can represent all Boolean functions
over n inputs.

A single-layer perceptron can represent the XOR function if the learning rate is set appropriately.

1 point

You are given a neural network with 2 inputs, a hidden layer with 4 perceptrons, and one output
neuron. The hidden neurons are designed to fire for specific input patterns like {–1, +1}, etc.
Which of the following are true about such a network?

It can represent linearly non-separable functions like XOR.

The network uses hidden neurons to convert a non-linearly separable function into linearly
separable subproblems.

Removing any one hidden neuron will not affect the network's ability to represent XOR.

This network must use sigmoid activation in the hidden layer to implement XOR.
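
One way to picture the network described in this question is sketched below. It assumes inputs in {−1, +1} with +1 read as "true", one hidden perceptron per input pattern, and hand-picked weights and thresholds; these choices are illustrative assumptions, not the assignment's official construction.

```python
from itertools import product

# The four possible inputs over {-1, +1}.
PATTERNS = list(product((-1, 1), repeat=2))

def hidden_layer(x):
    # Hidden neuron k uses weights equal to pattern k, so w.x = 2 only when
    # x matches that pattern (otherwise w.x is 0 or -2). Threshold is 2.
    return [1 if p[0] * x[0] + p[1] * x[1] >= 2 else 0 for p in PATTERNS]

def xor_network(x):
    h = hidden_layer(x)
    # Output weight is 1 for patterns where the inputs differ (XOR true), else 0.
    out_w = [1 if p[0] != p[1] else 0 for p in PATTERNS]
    return 1 if sum(w * hi for w, hi in zip(out_w, h)) >= 1 else 0

xor_table = {x: xor_network(x) for x in PATTERNS}
print(xor_table)
```

Exactly one hidden neuron fires per input, so the output neuron only has to solve a linearly separable problem over the hidden activations. Step activations suffice here; no sigmoid is involved.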

1 point

You are designing a spam filter using a perceptron. Some input features (like the presence of the
word “FREE”) are not linearly separable from others. Which architecture is most appropriate for
learning from such data?

Single-layer perceptron with more training data

Multi-layer perceptron with hidden neurons

Removing the non-linearly separable features

Output layer with more neurons

1 point

You are given an arbitrary Boolean function defined over 4 binary inputs. Which of the following
neural network architectures is guaranteed to represent this function?

One perceptron

A network with 4 hidden neurons

A network with 16 hidden neurons and one output perceptron

A network with 5 output neurons

1 point

For a single input value x = 1.5 with w = 2 and b = −1, compute the output of the sigmoid neuron up to 2 decimal
places.
Fill the blank: _________
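
The sigmoid computation for x = 1.5, w = 2, b = −1 can be sketched as follows, assuming the standard logistic form σ(z) = 1/(1 + e^(−z)); the helper name `sigmoid` is an illustrative choice.

```python
import math

def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

x, w, b = 1.5, 2.0, -1.0
z = w * x + b          # pre-activation: 2 * 1.5 - 1 = 2
y = sigmoid(z)
print(round(y, 2))
```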

1 point

Consider a sigmoid neuron defined as $f(x) = \sigma(wx + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$, and suppose the weight w is positive.
If the bias b is increased, in which direction does the sigmoid curve shift along the x-axis?

Upwards

Leftwards

Downwards

Rightwards
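
To see how the bias moves the curve, it helps to track the point where the sigmoid output equals 0.5, which is where wx + b = 0. The sketch below assumes w > 0, as in the question; the function name `midpoint` and the sample values are hypothetical.

```python
def midpoint(w, b):
    """x at which sigma(w*x + b) = 0.5, i.e. the solution of w*x + b = 0."""
    return -b / w

w = 2.0  # assumed positive, as stated in the question
for b in (0.0, 1.0, 2.0):
    # As b grows, the midpoint -b/w moves toward more negative x.
    print(f"b = {b}: midpoint at x = {midpoint(w, b)}")
```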

1 point

Which of the following statements are true?


I. Logistic function is smooth and continuous
II. Logistic function is differentiable.

Only Statement I is true

Only Statement II is true

Both statements I and II are true

None of the above

1 point

Which of the following statements are true about learning algorithms?

I. Learning algorithms always maximize a loss function


II. Learning algorithms learn parameters from data

Only Statement I is true

Only Statement II is true

Both statements I and II are true

None of the above

1 point

Consider a neural network with 12 input features, a hidden layer with 8 neurons, and a single output
neuron. All layers are fully connected, and biases are included in both the hidden and output layers.

How many gradients must be computed during backpropagation?

101

110

105

113
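
Assuming one gradient is computed per trainable parameter (every weight and every bias), the count for this architecture can be sketched as follows. The helper `count_parameters` is a hypothetical name.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected feedforward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weights between consecutive layers
        total += n_out         # one bias per neuron in the receiving layer
    return total

# 12 inputs -> 8 hidden neurons -> 1 output neuron
n_params = count_parameters([12, 8, 1])
print(n_params)
```

The breakdown is 12 × 8 = 96 hidden weights, 8 hidden biases, 8 output weights, and 1 output bias.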
1 point

You are evaluating a regression model on a dataset of 3 points. The actual target values and
predicted outputs from your model are given below.

Using the Mean Squared Error (MSE) loss function defined as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2$$
What is the MSE for this model on the given dataset?

1.00

0.67

0.33

2.00

1 point

You are given a model defined as $\hat{y} = f(x) = 2x + 3$. For three input-output pairs, the inputs
are x = 0, 1, 2, and the corresponding actual target values are y = 4, 6, 9 respectively.
Using the Mean Squared Error (MSE) loss function defined as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2$$
What is the value of the MSE for this model on the given dataset?

1.00

1.33

2.00

3.00
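
The MSE for the model f(x) = 2x + 3 on the three given points can be worked through in a few lines. This is a minimal sketch that uses only the numbers stated in the question.

```python
xs = [0, 1, 2]
ys = [4, 6, 9]

preds = [2 * x + 3 for x in xs]                    # model outputs: 3, 5, 7
sq_errors = [(y - p) ** 2 for y, p in zip(ys, preds)]  # 1, 1, 4
mse = sum(sq_errors) / len(ys)
print(preds, sq_errors, mse)
```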

1 point

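A regression model can be evaluated using the Mean Absolute Error (MAE), which is defined as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert$$
The model was tested on 4 data points. The actual and predicted values are:
for input 1, $y = 3$ and $\hat{y} = 2.5$
for input 2, $y = 7$ and $\hat{y} = 6$
for input 3, $y = 4$ and $\hat{y} = 4.5$
for input 4, $y = 6$ and $\hat{y} = 7$
What is the MAE of the model on these data points?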

0.75

1.00

1.25
1.50
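
The MAE over the four listed data points can be checked with a short sketch that takes the actual and predicted values directly from the question.

```python
actual = [3, 7, 4, 6]
predicted = [2.5, 6, 4.5, 7]

abs_errors = [abs(y - p) for y, p in zip(actual, predicted)]  # 0.5, 1, 0.5, 1
mae = sum(abs_errors) / len(actual)
print(abs_errors, mae)
```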

1 point

You are comparing two models for different function learning tasks:

Model A: A multilayer network of perceptrons


Model B: A multilayer network of sigmoid neurons

Task 1: Learn a Boolean function (like XOR)


Task 2: Learn a continuous function (like sin(x))

Which of the following statements is most appropriate?

Model A can represent both tasks with high precision

Model A is better for Task 1, Model B is better for Task 2

Model B can approximate both Task 1 and Task 2 outputs, but not represent Task 1 exactly

Both models are equivalent in their representation abilities

1 point

A neural network is trained to predict customer churn based on multiple features: age, contract
duration, and monthly charges. After training, you observe that the weight associated with the
monthly charges feature is close to zero, while the others have larger magnitudes.

What is the most reasonable inference?

Monthly charges had missing values in training data

Monthly charges were not normalized correctly

Monthly charges may not have contributed significantly to the model’s prediction

The learning rate was too high for that feature

1 point

You are building a neural network-based fraud detection system. A sigmoid neuron receives three
inputs:

x1: transaction amount


x2: number of transactions in last hour
x3: time of transaction

After training, the learned weights are:

w1=3.2,w2=0.05,w3=−0.02

Assume all input features have been scaled to a similar range (for example, between 0 and 1).
Which of the following is the most reasonable conclusion?
The time of transaction is the most important feature

The number of transactions in the last hour is the most important feature

The transaction amount is a highly influential feature

The sigmoid neuron is not functioning properly

1 point

You are optimizing a function $f(x) = x^2 - x + 2$ using gradient descent. Let the learning rate be $\eta = 0.01$,
and the value of x at a step t be $x_t$. Which of the following gives the correct value of x at
step t+1 after one update using gradient descent?

$x_{t+1} = x_t - 0.01(2x_t - 1)$

$x_{t+1} = x_t + 0.01(2x_t)$

$x_{t+1} = x_t - (2x_t - 1)$

$x_{t+1} = x_t - 0.01(x_t - 1)$
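
Gradient descent moves x against the derivative, scaled by the learning rate. For f(x) = x² − x + 2 the derivative is f′(x) = 2x − 1. The sketch below applies that rule; the starting point x = 3.0 and the function names are hypothetical choices for illustration.

```python
def grad_f(x):
    """Derivative of f(x) = x**2 - x + 2."""
    return 2 * x - 1

def gd_step(x, lr=0.01):
    """One gradient descent update: x <- x - lr * f'(x)."""
    return x - lr * grad_f(x)

x = 3.0                # hypothetical starting point
x_next = gd_step(x)    # 3 - 0.01 * (2*3 - 1)
print(x_next)

# Repeating the update drives x toward the minimizer x = 0.5.
for _ in range(2000):
    x = gd_step(x)
print(x)
```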

1 point

Let $f(x) = x^3 - 4x + 1$. You are using gradient descent with learning rate $\eta = 0.1$.
What is the correct update rule for x at step t+1, given that $x_t$ is the current value?

$x_{t+1} = x_t - 0.1 \cdot (3x_t^2 - 4)$

$x_{t+1} = x_t - 0.1 \cdot (3x_t^2 + 4)$

$x_{t+1} = x_t + 0.1 \cdot (3x_t^2 - 4)$

$x_{t+1} = x_t + 0.1 \cdot (3x_t^2 + 4)$

1 point

In a temperature calibration model, the function $f(T, x) = T^2 + 5x + 20$ models the system deviation,
where T is the temperature input and x is a sensor setting. Suppose gradient descent with a learning
rate of 1 is used to minimize the deviation. The process starts at (T,x)=(0,0).
What will be the value of T after 10 iterations?

50

-10

1 point

You are minimizing the function


$f(x_1, x_2) = 4x_1^2 + 5x_2 + 9$
with learning rate η=0.5, and starting from (x1,x2)=(0,0).
What is the value of x2 after 5 iterations?

-2.5
-12.5

-1

-0.5
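
The iterations in this question can be traced with a short loop. The sketch assumes the function is f(x1, x2) = 4·x1² + 5·x2 + 9, so the gradient is (8·x1, 5).

```python
def grad(x1, x2):
    """Gradient of f(x1, x2) = 4*x1**2 + 5*x2 + 9."""
    return 8 * x1, 5.0

lr = 0.5
x1, x2 = 0.0, 0.0
history = []
for _ in range(5):
    g1, g2 = grad(x1, x2)
    x1, x2 = x1 - lr * g1, x2 - lr * g2
    history.append((x1, x2))
print(history)
```

Because the gradient with respect to x2 is the constant 5, every step lowers x2 by the same amount, while x1 starts at its stationary point and does not move. The function is unbounded below in x2, so the iterates never converge.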

1 point

Let $f(x_1, x_2) = x_1^2 + x_2^2$. Apply gradient descent with learning rate $\eta = 0.1$ starting from $(x_1, x_2) = (1, 2)$.
What is the updated value of x1 and x2 after one iteration?

(x1,x2)=(0.9,1.9)

(x1,x2)=(0.8,1.6)

(x1,x2)=(1.1,2.1)

(x1,x2)=(1.0,2.0)
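
A single update for f(x1, x2) = x1² + x2² uses the gradient (2·x1, 2·x2). The following minimal sketch applies one step from the stated starting point.

```python
lr = 0.1
x1, x2 = 1.0, 2.0

g1, g2 = 2 * x1, 2 * x2                 # gradient at (1, 2) is (2, 4)
x1_new, x2_new = x1 - lr * g1, x2 - lr * g2
print(round(x1_new, 2), round(x2_new, 2))
```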

1 point

You train a logistic regression model for spam classification with labels 1 (spam) and 0 (not spam).
After training, the model has learned a weight vector such that
$w^T x = 2.5$
Which of the following can be correctly inferred about the model’s prediction?

The predicted probability of class 1 is greater than 0.5

The predicted label is 1

The predicted label is 0

The value of $w^T x$ is irrelevant to prediction

1 point

You are designing a binary classifier using logistic regression. The model has learned the weight
vector w=[−3,4] and no bias term is used.
If a new point x=[1,1] is evaluated, what will be the model output and prediction?

The predicted label is 1

The predicted label is 0

The model output cannot be determined without a bias term

The model output is undefined for input [1, 1]
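
The model output for w = [−3, 4] and x = [1, 1] can be computed directly. The sketch assumes the usual convention of predicting label 1 when the sigmoid output is at least 0.5; that threshold is not stated in the question.

```python
import math

w = [-3.0, 4.0]
x = [1.0, 1.0]

z = sum(wi * xi for wi, xi in zip(w, x))   # w^T x = -3 + 4 = 1, no bias term
p = 1.0 / (1.0 + math.exp(-z))             # sigmoid output, about 0.73
label = 1 if p >= 0.5 else 0               # assumed 0.5 decision threshold
print(z, round(p, 3), label)
```

A missing bias term just means b = 0, so the output is still well defined.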

1 point

You're building a machine learning model to predict housing prices. Your teammate proposes several
functions to use as the model's output. You are asked to identify which function is not suitable to use
directly as an output function in a standard supervised learning model.
Which of the following output functions is least appropriate in this setting?

$\hat{y} = w^T x$

$\hat{y} = \log(1 + e^{w^T x})$

$\hat{y} = \sin(w^T x)$

$\hat{y} = \frac{1}{1 + e^{-w^T x}}$

1 point

The plot shows a logistic function $\sigma(x) = \frac{1}{1 + e^{-(wx + b)}}$ with a sharp transition from 0 to 1 near a point on
the x-axis.

Based on the curve, what can you infer about the parameters w and b?

w is close to 0 and b is large

w is large and b is small

w is large and b is large

w is small and b is negative

1 point

You are experimenting with the sigmoid neuron $\sigma(x) = \frac{1}{1 + e^{-(wx + b)}}$. You observe that the curve
transitions very gradually across a wide range of x-values.

Which of the following changes would make the curve transition more sharply (closer to a step
function)?

Increase b

Increase w

Decrease w

Set b=0
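
One way to quantify how sharply the sigmoid transitions is the width of the x-interval over which the output rises from 0.1 to 0.9. Solving wx + b = ±ln 9 gives a width of 2·ln(9)/w, which does not depend on b. The sketch below assumes w > 0; the function name `transition_width` is hypothetical.

```python
import math

def transition_width(w):
    """Width of the x-interval where sigma(w*x + b) goes from 0.1 to 0.9 (w > 0)."""
    return 2 * math.log(9) / w

for w in (0.5, 1.0, 10.0):
    # Larger w gives a narrower transition, i.e. a more step-like curve.
    print(f"w = {w}: width = {transition_width(w):.3f}")
```

Changing b only moves the location of the transition, not its width.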
1 point

Why is Sum of Squared Errors (SSE) considered better than Sum of Errors (SE) in many learning
scenarios?

SSE ensures that positive and negative errors do not cancel each other out

SSE magnifies larger errors, making the model more sensitive to outliers

The derivative of SSE with respect to prediction is simple and continuous

SSE always leads to better accuracy than SE

Sum of errors can be zero even when individual predictions are wrong
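
A small numeric example shows how signed errors can cancel in the plain sum while the squared sum still registers them. The target and prediction values below are made up for illustration.

```python
targets = [2.0, 4.0]       # hypothetical target values
predictions = [3.0, 3.0]   # hypothetical predictions, both off by 1

errors = [y - p for y, p in zip(targets, predictions)]  # -1 and +1
se = sum(errors)                    # signed errors cancel
sse = sum(e ** 2 for e in errors)   # squared errors cannot cancel
print(se, sse)
```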

1 point

Statement I: Any linearly separable function can be represented using a single-layer perceptron.
Statement II: A single sigmoid neuron can approximate any Boolean function with zero error.
Which of the above statements is/are correct?

Only I

Only II

Both I and II

None

1 point

You are given a multi-layer perceptron with one hidden layer consisting of 8 perceptrons and a single
output neuron. Each perceptron in the hidden layer outputs either 0 or 1 based on its input.
Which of the following statements is true about the function capacity of this network?

The network is capable of implementing $2^8$ Boolean functions

The network is capable of implementing $2^{64}$ Boolean functions

The output neuron receives a continuous-valued input

Each hidden neuron produces 64 possible outputs

You may submit any number of times before the due date. The final submission will be considered
for grading.

Common questions


Because the logistic function is smooth and differentiable everywhere, models that use it can be trained with gradient-based methods.

Representing the XOR Boolean function requires at least one hidden layer in the network, as it is not linearly separable.

For a positive weight w, increasing the bias b in a sigmoid function shifts the curve to the left along the x-axis.

With a logistic regression model using weight vector w = [−3, 4] and input x = [1, 1], $w^T x = 1 > 0$, so the sigmoid output exceeds 0.5 and the predicted label is 1.

The correct update rule for x at step t+1 is $x_{t+1} = x_t - 0.01(2x_t - 1)$, using gradient descent with a learning rate of η = 0.01.

113 gradients need to be computed during backpropagation for the described neural network architecture: 12 × 8 = 96 hidden-layer weights, 8 hidden biases, 8 output weights, and 1 output bias.

Single-layer perceptrons can only represent linearly separable Boolean functions. Non-linearly separable functions, like XOR, require a multi-layer architecture with hidden neurons.

Model A, a multilayer network of perceptrons, is better suited for representing Boolean functions (like XOR), while Model B, a multilayer network of sigmoid neurons, is better for representing continuous functions like sin(x).

SSE ensures that positive and negative errors do not cancel each other out and magnifies larger errors, making models more sensitive to outliers.

A feature with a weight close to zero may not have contributed significantly to the model's prediction.
