Understanding Neural Networks: From Single-Layer to Multi-Layer Perceptron
PART 1: THE SINGLE LAYER PERCEPTRON
What is a Perceptron?
Imagine you're trying to decide whether to go outside today. You might
consider:
Is it sunny? (Yes=1, No=0)
Is it warm? (Yes=1, No=0)
Your brain weighs these factors and makes a decision. A perceptron works
in a similar way: it is the simplest form of an artificial "neuron", and it
makes a decision by weighing its inputs.
Real-Life Example: The Ice Cream Decision
Let's say you'll only go out for ice cream if it's sunny AND warm (AND gate).
This is a perfect example to understand perceptrons.
Breaking Down the Perceptron's Anatomy
Inputs (x₁, x₂) → Weights (w₁, w₂) → Sum (z) → Activation → Output (y)
Think of it like this:
Inputs (x): The information you receive (like "Is it sunny?")
Weights (w): How important each piece of information is
Bias (b): Your natural tendency (like being generally optimistic or
pessimistic)
Sum (z): Adding up all the weighted information
Activation (σ): The final decision-making step
The Mathematics Made Simple
Step 1: Weighted Sum (The "Thinking" Part)
Imagine you're shopping for a laptop:
Battery life is very important to you (weight = 0.8)
Price is somewhat important (weight = 0.5)
Color doesn't matter (weight = 0.1)
When you look at a laptop:
Great battery? (x₁ = 1)
Good price? (x₂ = 1)
Nice color? (x₃ = 1)
Your thinking process:
(1 × 0.8) + (1 × 0.5) + (1 × 0.1) = 1.4
Formula:
z = x₁w₁ + x₂w₂ + x₃w₃ = wᵀx (the dot product of the weights and the inputs)
Step 2: Adding Bias (Your Natural Tendency)
Maybe you're naturally excited about any new laptop (bias = +0.2):
z = (x₁w₁ + x₂w₂ + x₃w₃) + b
z = 1.4 + 0.2 = 1.6
Step 3: Activation Function (Making the Final Decision)
Think of this as your "YES/NO" threshold. In a simple perceptron:
If z ≥ 0.5 → YES (output = 1)
If z < 0.5 → NO (output = 0)
σ(z) = 1 if z ≥ 0.5, and σ(z) = 0 if z < 0.5
So with z = 1.6 ≥ 0.5 → You buy the laptop!
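The three steps above can be sketched in a few lines of Python. This is only an illustration of the laptop example: the function names step and perceptron_output are made up for this sketch, and the threshold of 0.5 is the one used in the text.

```python
def step(z, threshold=0.5):
    """Step activation from the text: 1 if z >= threshold, else 0."""
    return 1 if z >= threshold else 0


def perceptron_output(inputs, weights, bias, threshold=0.5):
    """Step 1 and 2: weighted sum plus bias. Step 3: step activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return z, step(z, threshold)


# Laptop example: great battery, good price, nice color
inputs = [1, 1, 1]
weights = [0.8, 0.5, 0.1]
bias = 0.2

z, decision = perceptron_output(inputs, weights, bias)
print(f"z = {z:.1f}, decision = {decision}")  # z = 1.6, decision = 1
```

A laptop with none of the three qualities gives z = 0.2 (only the bias), which is below 0.5, so the decision is 0.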
How the Perceptron Learns: The AND Gate Example
Let's teach a perceptron to understand "AND" logic:
Input 1 Input 2 Expected Output
0 0 0
0 1 0
1 0 0
1 1 1
Learning Process (Like Teaching a Child)
Round 1 - First Attempt:
We start with arbitrary weights w₁ = 0.9, w₂ = 0.9, a fixed threshold of 0.5, and no bias
Example 1: (0,0) → Should be 0
z = (0×0.9) + (0×0.9) = 0
Since 0 < 0.5 → Output = 0 ✓ Correct! No learning needed.
Example 2: (0,1) → Should be 0
z = (0×0.9) + (1×0.9) = 0.9
Since 0.9 ≥ 0.5 → Output = 1 ✗ Wrong!
Error = Expected - Actual = 0 - 1 = -1
Learning happens when we make mistakes!
The Learning Rule (How We Fix Mistakes)
When we're wrong, we adjust our thinking (weights):
New Weight = Old Weight + (Learning Rate × Error × Input)
Think of learning rate as "how quickly you learn from mistakes":
High learning rate (0.9): Dramatic changes, might overreact
Low learning rate (0.1): Small, careful adjustments
Fixing our mistake:
Learning rate = 0.5
w₁ (new) = 0.9 + 0.5 × (-1) × 0 = 0.9 (no change since input was 0)
w₂ (new) = 0.9 + 0.5 × (-1) × 1 = 0.4
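Example 3: (1,0) → Should be 0
z = (1×0.9) + (0×0.4) = 0.9
Since 0.9 ≥ 0.5 → Output = 1 ✗ Wrong!
Error = Expected - Actual = 0 - 1 = -1
w₁ (new) = 0.9 + 0.5 × (-1) × 1 = 0.4
w₂ (new) = 0.4 + 0.5 × (-1) × 0 = 0.4 (no change since input was 0)
Example 4: (1,1) → Should be 1
z = (1×0.4) + (1×0.4) = 0.8
Since 0.8 ≥ 0.5 → Output = 1 ✓ Correct!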
Round 2 - Testing Our Learning:
Now with updated weights: w₁ = 0.4, w₂ = 0.4
Example 1: (0,0) → z = 0 → Output 0 ✓
Example 2: (0,1) → z = 0.4 → Output 0 ✓
Example 3: (1,0) → z = 0.4 → Output 0 ✓
Example 4: (1,1) → z = 0.8 → Output 1 ✓
Success! The perceptron has learned AND logic!
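The whole learning process can be replayed in code. The sketch below assumes the same setup as the walkthrough (starting weights 0.9 and 0.9, threshold 0.5, learning rate 0.5, no bias) and updates the weights right after each mistake. The names predict and train_perceptron are chosen for this sketch only.

```python
def predict(x, w, threshold=0.5):
    """Weighted sum followed by the step activation."""
    z = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z >= threshold else 0


def train_perceptron(data, w, learning_rate=0.5, max_rounds=10):
    """Repeat rounds over the data until one full round has no mistakes."""
    w = list(w)
    for round_number in range(1, max_rounds + 1):
        mistakes = 0
        for x, expected in data:
            error = expected - predict(x, w)
            if error != 0:
                mistakes += 1
                # New Weight = Old Weight + (Learning Rate x Error x Input)
                w = [wi + learning_rate * error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return w, round_number
    return w, max_rounds


and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, rounds = train_perceptron(and_gate, [0.9, 0.9])
print([round(w, 2) for w in weights], rounds)  # [0.4, 0.4] 2
```

Round 1 makes two mistakes and corrects both weights; round 2 gets all four examples right, so training stops there.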
PART 2: MULTI-LAYER PERCEPTRON
Why Do We Need Multiple Layers?
The Limitation of Single Layer
A single perceptron can only solve "linearly separable" problems. Imagine
drawing a straight line to separate answers:
AND Gate is easy: You can draw one line
XOR Gate is tricky: You cannot separate with one line!
XOR: Output 1 when inputs are different (0,1 or 1,0)
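One way to get a feel for this limitation is to search for a single perceptron that reproduces each gate. The sketch below tries every combination of w₁, w₂ and bias on a coarse grid (an arbitrary choice made for this illustration) with the same 0.5 threshold used earlier. A grid search is not a proof, but it shows the contrast: a solution for AND turns up, and none does for XOR.

```python
from itertools import product

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def find_single_perceptron(truth_table, threshold=0.5):
    """Search a coarse grid for (w1, w2, b) that reproduces the truth table."""
    grid = [i / 4 for i in range(-8, 9)]  # -2.0, -1.75, ..., 2.0
    for w1, w2, b in product(grid, repeat=3):
        if all(
            (1 if w1 * x1 + w2 * x2 + b >= threshold else 0) == y
            for (x1, x2), y in truth_table.items()
        ):
            return w1, w2, b
    return None


print("AND:", find_single_perceptron(AND))  # some working (w1, w2, b)
print("XOR:", find_single_perceptron(XOR))  # None
```

The XOR result is not an accident of the grid. A single perceptron would need b < 0.5, w₁ + b ≥ 0.5 and w₂ + b ≥ 0.5, which together force w₁ + w₂ + b above 0.5, so (1,1) would always output 1.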
Real-Life Example: Hiring Decision
Imagine you're hiring for a job. The decision is too complex for simple
rules (AND/OR), so you need multiple perspectives:
First layer (Basic screening): Looks at education and experience
Second layer (Skills assessment): Evaluates technical skills
Third layer (Culture fit): Considers personality and values
Final layer: Makes the hiring decision based on all factors
The Multi-Layer Architecture
Input Layer → Hidden Layer (middle-level thinking) → Output Layer
Think of it like:
Input Layer: Raw data (resume, test scores)
Hidden Layer: Intermediate concepts (qualified candidate?, skilled
candidate?)
Output Layer: Final decision (hire = 1, don't hire = 0)
The XOR Problem: A Perfect Example
XOR (exclusive OR) returns 1 when inputs are different:
Input 1 Input 2 XOR Output
0 0 0
0 1 1
1 0 1
1 1 0
Why XOR Needs Multiple Layers
Think of XOR as a combination of simpler operations:
XOR = (A AND NOT B) OR (NOT A AND B)
This requires two levels of thinking:
1. First, recognize patterns "A AND NOT B" and "NOT A AND B"
2. Then, combine these patterns
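The two levels of thinking can be wired up by hand. In the sketch below, the weights are picked manually for illustration (they are not learned and are not the only possible choice), and every neuron uses the step activation with the 0.5 threshold from Part 1.

```python
def step(z, threshold=0.5):
    return 1 if z >= threshold else 0


def neuron(inputs, weights, bias=0.0):
    """One perceptron: weighted sum plus bias, then the step activation."""
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)


def xor(a, b):
    # Level 1: two hidden neurons, one per pattern
    h1 = neuron((a, b), (1, -1))   # fires only for "A AND NOT B"
    h2 = neuron((a, b), (-1, 1))   # fires only for "NOT A AND B"
    # Level 2: combine the patterns with an OR
    return neuron((h1, h2), (1, 1))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

Each neuron on its own is still a simple straight-line separator. XOR only becomes possible because the output neuron works on the hidden neurons' answers instead of the raw inputs.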
How Information Flows Through Layers
Forward Propagation (The Thinking Process)
Step 1: Input to Hidden Layer
z₁ = w₁₁x₁ + w₁₂x₂ + b₁ (First hidden neuron's raw thinking)
z₂ = w₂₁x₁ + w₂₂x₂ + b₂ (Second hidden neuron's raw thinking)
z₃ = w₃₁x₁ + w₃₂x₂ + b₃ (Third hidden neuron's raw thinking)
a₁ = σ(z₁) (First neuron's activated output)
a₂ = σ(z₂) (Second neuron's activated output)
a₃ = σ(z₃) (Third neuron's activated output)
Step 2: Hidden to Output Layer
z_final = v₁a₁ + v₂a₂ + v₃a₃ + b_final
y = σ(z_final)
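The two steps translate almost directly into code. The sketch below assumes σ is the sigmoid function and borrows its weights from the worked XOR example later in this document. The function name forward and the list layout (one row of W per hidden neuron) are choices made for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b, v, b_final):
    """W[j] holds the input weights of hidden neuron j, b[j] its bias."""
    # Step 1: input to hidden layer
    z_hidden = [sum(w * xi for w, xi in zip(W[j], x)) + b[j] for j in range(len(W))]
    a_hidden = [sigmoid(z) for z in z_hidden]
    # Step 2: hidden to output layer
    z_final = sum(vj * aj for vj, aj in zip(v, a_hidden)) + b_final
    return a_hidden, sigmoid(z_final)


W = [[0.5, 0.3], [0.2, 0.8], [0.7, 0.4]]
b = [0.1, 0.1, 0.1]
v = [0.8, 0.5, 0.6]
b_final = 0.2

a_hidden, y = forward([0, 1], W, b, v, b_final)
print([round(a, 2) for a in a_hidden], round(y, 2))  # [0.6, 0.71, 0.62] 0.8
```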
The Sigmoid Activation Function (The "Squishing" Function)
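In multi-layer networks, we replace the hard step with a smooth activation
function such as the sigmoid, because learning by backpropagation needs a
function whose slope can be computed: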
σ(z) = 1 / (1 + e⁻ᶻ)
This "squishes" any number into a value between 0 and 1:
Input (z) Output σ(z)
-∞ 0
-10 0.000045
-5 0.0067
0 0.5
5 0.9933
10 0.999955
+∞ 1
Think of it as "certainty level":
Very negative input → Very certain it's NO
Zero input → Completely uncertain (50/50)
Very positive input → Very certain it's YES
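The values in the table can be checked with a few lines of Python. This is a minimal sketch using only the standard math module.

```python
import math


def sigmoid(z):
    """Squish any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-10, -5, 0, 5, 10):
    print(f"{z:>4}  {sigmoid(z):.6f}")
#  -10  0.000045
#   -5  0.006693
#    0  0.500000
#    5  0.993307
#   10  0.999955
```

The outputs 0 and 1 themselves are limits that the formula approaches without reaching. Note that this simple version raises OverflowError for extremely negative inputs (around z = -710 and below), because math.exp(-z) becomes too large for a float.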
Learning in Multiple Layers (Backpropagation)
The Challenge
In a single layer, we knew exactly which weight caused an error. In multiple
layers, how do we know:
Was the error caused by output layer weights?
Or hidden layer weights?
Or maybe the bias?
The Solution: Backpropagation
Think of it like a team project where blame (error) flows backward:
1. Output layer error: "We got the final decision wrong"
2. Each output neuron says: "My error came from these hidden
neurons"
3. Hidden neurons say: "Our errors came from these inputs"
The Mathematics (Made Simple)
Step 1: Calculate output layer error
δ_output = (y_predicted - y_actual) × σ'(z_final)
Where σ'(z) is the derivative of sigmoid: σ(z) × (1 - σ(z))
Step 2: Calculate hidden layer error
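δ_hidden = (δ_output × v) × σ'(z_hidden)
This is computed once per hidden neuron, using that neuron's own output
weight v (its value before the update) and its own z.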
Step 3: Update weights
v_new = v_old - learning_rate × δ_output × a_hidden
w_new = w_old - learning_rate × δ_hidden × x_input
Complete XOR Example with Numbers
Let's walk through a complete example:
Initial Setup
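Inputs: x₁, x₂ (0 or 1)
Hidden layer: 3 neurons
Learning rate: 0.5
Starting hidden weights (for x₁, for x₂): neuron 1 = (0.5, 0.3), neuron 2 = (0.2, 0.8), neuron 3 = (0.7, 0.4)
Hidden biases: b₁ = b₂ = b₃ = 0.1
Starting output weights: v₁ = 0.8, v₂ = 0.5, v₃ = 0.6, with b_final = 0.2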
Forward Pass for (0,1):
Hidden Layer Calculations:
z₁ = (0×0.5) + (1×0.3) + 0.1 = 0.4
a₁ = σ(0.4) = 1/(1+e⁻⁰·⁴) = 0.60
z₂ = (0×0.2) + (1×0.8) + 0.1 = 0.9
a₂ = σ(0.9) = 0.71
z₃ = (0×0.7) + (1×0.4) + 0.1 = 0.5
a₃ = σ(0.5) = 0.62
Output Layer:
z_final = (0.60×0.8) + (0.71×0.5) + (0.62×0.6) + 0.2
= 0.48 + 0.355 + 0.372 + 0.2 = 1.407
y = σ(1.407) = 0.80
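Error: For XOR, (0,1) should output 1, so error = predicted - actual = 0.80 - 1 = -0.20
(Part 1 used expected - actual. The sign is flipped here, which is why the
backpropagation update rules subtract where the perceptron rule adds.)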
Backward Pass (Learning from Error)
Output Layer Update:
δ_output = -0.20 × σ'(1.407)
= -0.20 × [0.80 × (1-0.80)]
= -0.20 × 0.16 = -0.032
v₁_new = 0.8 - 0.5 × (-0.032) × 0.60 = 0.8096
v₂_new = 0.5 - 0.5 × (-0.032) × 0.71 = 0.5114
v₃_new = 0.6 - 0.5 × (-0.032) × 0.62 = 0.6099
Hidden Layer Updates (simplified):
Each hidden weight gets updated based on how much it contributed to the
error.
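To make the hidden layer updates concrete, the sketch below applies the three steps from "The Mathematics (Made Simple)" to the numbers above. It deliberately uses the rounded values from the text (0.60, 0.71, 0.62 and 0.80) so the output weights match the hand calculation. The hidden weight results are not given in the original walkthrough and simply follow from the same formulas. Bias updates are left out because the text does not define them.

```python
learning_rate = 0.5
x = [0, 1]
W = [[0.5, 0.3], [0.2, 0.8], [0.7, 0.4]]  # hidden weights used in the forward pass
v = [0.8, 0.5, 0.6]                        # output weights before the update
a_hidden = [0.60, 0.71, 0.62]              # rounded activations from the text
y, target = 0.80, 1

# Step 1: output layer error, using sigma'(z) = y * (1 - y)
delta_output = (y - target) * y * (1 - y)

# Step 2: hidden layer error, one value per hidden neuron
delta_hidden = [delta_output * vj * aj * (1 - aj) for vj, aj in zip(v, a_hidden)]

# Step 3: update weights
v_new = [vj - learning_rate * delta_output * aj for vj, aj in zip(v, a_hidden)]
W_new = [
    [w - learning_rate * dj * xi for w, xi in zip(row, x)]
    for row, dj in zip(W, delta_hidden)
]

print(round(delta_output, 3))                          # -0.032
print([round(val, 4) for val in v_new])                # [0.8096, 0.5114, 0.6099]
print([[round(w, 4) for w in row] for row in W_new])   # [[0.5, 0.3031], [0.2, 0.8016], [0.7, 0.4023]]
```

The weights attached to x₁ do not move at all, because x₁ = 0 for this input: a weight only changes when its input took part in the decision.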
Training Process Over Multiple Rounds
Round  Input  Predicted  Actual  Error  Notes
1      (0,0)  0.32       0       +0.32  Learning
1      (0,1)  0.80       1       -0.20  Learning
1      (1,0)  0.75       1       -0.25  Learning
1      (1,1)  0.45       0       +0.45  Learning
After many rounds:
Round  Input  Predicted  Actual  Error  Notes
100    (0,0)  0.12       0       +0.12  Improving
100    (0,1)  0.91       1       -0.09  Improving
100    (1,0)  0.89       1       -0.11  Improving
100    (1,1)  0.23       0       +0.23  Improving
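Eventually, the predictions move close to the XOR targets. A sigmoid output
never reaches exactly 0 or 1, so we read anything above 0.5 as 1 and anything
below 0.5 as 0.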
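A full training loop might look like the sketch below. It starts from the weights used in the worked example and repeats the forward and backward pass over all four XOR inputs. Two things here are assumptions that the text does not spell out: the biases are updated with the same rule as the weights (with an input of 1), and the number of rounds (10000) is an arbitrary choice. How close the outputs get to the targets depends on the starting weights and the learning rate, so no results are quoted here; run it to see.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b, v, b_final):
    a = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]
    y = sigmoid(sum(vj * aj for vj, aj in zip(v, a)) + b_final)
    return a, y


def mean_squared_error(data, W, b, v, b_final):
    return sum((forward(x, W, b, v, b_final)[1] - t) ** 2 for x, t in data) / len(data)


def train(data, W, b, v, b_final, learning_rate=0.5, rounds=10000):
    W = [row[:] for row in W]  # work on copies so the starting values survive
    b, v = b[:], v[:]
    for _ in range(rounds):
        for x, t in data:
            a, y = forward(x, W, b, v, b_final)
            delta_output = (y - t) * y * (1 - y)
            # Hidden errors use the output weights from before this update
            delta_hidden = [delta_output * vj * aj * (1 - aj) for vj, aj in zip(v, a)]
            for j in range(len(v)):
                v[j] -= learning_rate * delta_output * a[j]
                b[j] -= learning_rate * delta_hidden[j]  # assumed bias rule
                for i in range(len(x)):
                    W[j][i] -= learning_rate * delta_hidden[j] * x[i]
            b_final -= learning_rate * delta_output      # assumed bias rule
    return W, b, v, b_final


xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
start = ([[0.5, 0.3], [0.2, 0.8], [0.7, 0.4]], [0.1, 0.1, 0.1], [0.8, 0.5, 0.6], 0.2)

before = mean_squared_error(xor_data, *start)
trained = train(xor_data, *start)
after = mean_squared_error(xor_data, *trained)

print(f"Mean squared error before: {before:.3f}, after: {after:.3f}")
for x, t in xor_data:
    print(x, "target", t, "predicted", round(forward(x, *trained)[1], 2))
```

If the predictions stall far from the targets, trying different starting weights or more rounds is the usual first step.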
Key Takeaways for Beginners
1. Single Layer Perceptron = One decision-maker, good for simple
yes/no problems that a single straight line can separate
2. Multi-Layer Perceptron = Multiple decision-makers working
together, can solve complex problems
3. Weights = Importance of each piece of information
4. Bias = Natural tendency or baseline
5. Activation Function = Decision rule (step function for simple,
sigmoid for complex)
6. Learning = Adjusting weights based on errors
7. Forward Propagation = Making a decision
8. Backpropagation = Learning from mistakes