
Week 5

The document explains the concepts of single-layer and multi-layer perceptrons in neural networks. It details how perceptrons make decisions based on inputs, weights, and biases, and illustrates learning through examples like the AND and XOR gates. The document also covers the forward propagation and backpropagation processes essential for training multi-layer networks.

Uploaded by notforu567
Copyright © All Rights Reserved

Understanding Neural Networks: From Single-Layer to Multi-Layer Perceptrons

PART 1: THE SINGLE LAYER PERCEPTRON

What is a Perceptron?

Imagine you're trying to decide whether to go outside today. You might consider:

• Is it sunny? (Yes=1, No=0)

• Is it warm? (Yes=1, No=0)

Your brain weighs these factors and makes a decision. A perceptron works in much the same way: it is the simplest form of artificial "neuron", and it makes a yes/no decision from its inputs.

Real-Life Example: The Ice Cream Decision

Let's say you'll only go out for ice cream if it's sunny AND warm (an AND gate).
This makes a good first example for understanding perceptrons.

Breaking Down the Perceptron's Anatomy

Inputs (x₁, x₂) → Weights (w₁, w₂) → Sum (z) → Activation → Output (y)
Think of it like this:

 Inputs (x): The information you receive (like "Is it sunny?")

 Weights (w): How important each piece of information is

 Bias (b): Your natural tendency (like being generally optimistic or


pessimistic)

 Sum (z): Adding up all the weighted information

 Activation (σ): The final decision-making step

The Mathematics Made Simple

Step 1: Weighted Sum (The "Thinking" Part)

Imagine you're shopping for a laptop:

• Battery life is very important to you (weight = 0.8)

• Price is somewhat important (weight = 0.5)

• Color barely matters (weight = 0.1)

When you look at a laptop:

• Great battery? (x₁ = 1)

• Good price? (x₂ = 1)

• Nice color? (x₃ = 1)

Your thinking process:


(1 × 0.8) + (1 × 0.5) + (1 × 0.1) = 1.4

Formula:

z = x₁w₁ + x₂w₂ + x₃w₃ = wᵀx (the dot product of the weight vector w and the input vector x)
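The weighted sum is just a dot product. A minimal Python sketch using the laptop numbers above (the function name `weighted_sum` is our own, not from any library):

```python
def weighted_sum(inputs, weights):
    """Return z = x1*w1 + x2*w2 + ..., the dot product of inputs and weights."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


# Laptop example: great battery, good price, nice color (1 = yes, 0 = no)
inputs = [1, 1, 1]
weights = [0.8, 0.5, 0.1]

z = weighted_sum(inputs, weights)
print(round(z, 2))  # 1.4
```

The value is rounded for display because floating-point sums of decimals like 0.1 are not exact.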

Step 2: Adding Bias (Your Natural Tendency)

Maybe you're naturally excited about any new laptop (bias = +0.2):

z = (x₁w₁ + x₂w₂ + x₃w₃) + b

z = 1.4 + 0.2 = 1.6

Step 3: Activation Function (Making the Final Decision)

Think of this as your "YES/NO" threshold. In a simple perceptron:

• If z ≥ 0.5 → YES (output = 1)

• If z < 0.5 → NO (output = 0)

σ(z) = { 1 if z ≥ 0.5;  0 if z < 0.5 }

So with z = 1.6 ≥ 0.5 → You buy the laptop!
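Adding the bias and the step activation gives the full single-perceptron decision. A sketch under the same assumptions as the notes (threshold 0.5); the names `step` and `perceptron_output` are illustrative:

```python
def step(z, threshold=0.5):
    """Step activation from these notes: 1 (YES) if z >= threshold, else 0 (NO)."""
    return 1 if z >= threshold else 0


def perceptron_output(inputs, weights, bias=0.0, threshold=0.5):
    """Weighted sum plus bias, passed through the step activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return z, step(z, threshold)


# Laptop example: weights 0.8, 0.5, 0.1 and a bias of +0.2
z, decision = perceptron_output([1, 1, 1], [0.8, 0.5, 0.1], bias=0.2)
print(round(z, 2), decision)  # 1.6 1

# A laptop with none of the features: z = 0.2 < 0.5, so the answer is NO
print(perceptron_output([0, 0, 0], [0.8, 0.5, 0.1], bias=0.2)[1])  # 0
```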

How the Perceptron Learns: The AND Gate Example

Let's teach a perceptron to understand "AND" logic:

| Input 1 | Input 2 | Expected Output |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Learning Process (Like Teaching a Child)


Round 1 - First Attempt:

We start with arbitrary initial weights and no bias: w₁ = 0.9, w₂ = 0.9, threshold = 0.5

Example 1: (0,0) → Should be 0

• z = (0×0.9) + (0×0.9) = 0

• Since 0 < 0.5 → Output = 0 ✓ Correct! No learning needed.

Example 2: (0,1) → Should be 0

• z = (0×0.9) + (1×0.9) = 0.9

• Since 0.9 ≥ 0.5 → Output = 1 ✗ Wrong!

• Error = Expected - Actual = 0 - 1 = -1

Learning happens when we make mistakes!

The Learning Rule (How We Fix Mistakes)

When we're wrong, we adjust our thinking (weights):

New Weight = Old Weight + (Learning Rate × Error × Input)

Think of learning rate as "how quickly you learn from mistakes":

• High learning rate (0.9): Dramatic changes, might overreact

• Low learning rate (0.1): Small, careful adjustments

Fixing our mistake:


Learning rate = 0.5

w₁ (new) = 0.9 + 0.5 × (-1) × 0 = 0.9 (no change since input was 0)
w₂ (new) = 0.9 + 0.5 × (-1) × 1 = 0.4
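The learning rule translates directly into code. A minimal sketch (the function name `update_weights` is ours) that reproduces the fix above:

```python
def update_weights(weights, inputs, expected, actual, learning_rate):
    """Perceptron learning rule: new weight = old weight + learning_rate * error * input."""
    error = expected - actual
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]


# Example 2: input (0, 1) should give 0, but the perceptron output 1
new_weights = update_weights([0.9, 0.9], [0, 1], expected=0, actual=1, learning_rate=0.5)
print([round(w, 2) for w in new_weights])  # [0.9, 0.4]
```

A weight whose input was 0 is multiplied by 0 and stays put, which is why only w₂ moved.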

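Example 3: (1,0) → Should be 0

• z = (1×0.9) + (0×0.4) = 0.9

• Since 0.9 ≥ 0.5 → Output = 1 ✗ Wrong! Error = 0 - 1 = -1, so we update again:

w₁ (new) = 0.9 + 0.5 × (-1) × 1 = 0.4
w₂ (new) = 0.4 + 0.5 × (-1) × 0 = 0.4 (no change since input was 0)

Example 4: (1,1) → Should be 1

• z = (1×0.4) + (1×0.4) = 0.8

• Since 0.8 ≥ 0.5 → Output = 1 ✓ Correct!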

Round 2 - Testing Our Learning:

Now with updated weights: w₁ = 0.4, w₂ = 0.4

Example 1: (0,0) → z = 0 → Output 0 ✓
Example 2: (0,1) → z = 0.4 → Output 0 ✓
Example 3: (1,0) → z = 0.4 → Output 0 ✓
Example 4: (1,1) → z = 0.8 → Output 1 ✓

Success! The perceptron has learned AND logic!
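The whole walkthrough can be replayed as a loop that cycles through the four examples until a full round has no mistakes. A sketch assuming the same setup as above (no bias, threshold 0.5, learning rate 0.5, starting weights 0.9 and 0.9, weights updated right after each mistake); `predict` and `train_perceptron` are illustrative names:

```python
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(inputs, weights, threshold=0.5):
    """Step-activated perceptron without a bias, as in the walkthrough."""
    z = sum(x * w for x, w in zip(inputs, weights))
    return 1 if z >= threshold else 0


def train_perceptron(data, weights, learning_rate=0.5, max_rounds=100):
    """Apply the learning rule example by example until a round has no mistakes.

    Returns the final weights and the number of rounds used.
    """
    weights = list(weights)
    for round_number in range(1, max_rounds + 1):
        mistakes = 0
        for inputs, expected in data:
            error = expected - predict(inputs, weights)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
        if mistakes == 0:
            return weights, round_number
    return weights, max_rounds


weights, rounds = train_perceptron(AND_DATA, [0.9, 0.9])
print([round(w, 2) for w in weights], rounds)  # [0.4, 0.4] 2
```

Round 1 makes the two mistakes worked through above; round 2 is mistake-free, so training stops there.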


PART 2: MULTI-LAYER PERCEPTRON

Why Do We Need Multiple Layers?

The Limitation of Single Layer

A single perceptron can only solve "linearly separable" problems. Imagine


drawing a straight line to separate answers:

AND Gate is easy: You can draw one line

XOR Gate is tricky: You cannot separate with one line!

XOR: Output 1 when inputs are different (0,1 or 1,0)
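One way to see this limitation is to run the same learning rule on both gates and count the mistakes in each round. In this sketch the perceptron also learns a bias and starts from zero weights (both our choices, not part of the AND walkthrough), so it is free to place its line anywhere. On AND the mistakes drop to zero; on XOR every round has at least one mistake, because a mistake-free round would mean the current weights already separate XOR with one line:

```python
def predict(inputs, weights, bias, threshold=0.5):
    """Step-activated perceptron: 1 if weighted sum + bias >= threshold, else 0."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z >= threshold else 0


def mistakes_per_round(data, rounds=50, learning_rate=0.5):
    """Train one perceptron (two weights plus a bias) and count mistakes in each round."""
    weights, bias = [0.0, 0.0], 0.0
    history = []
    for _ in range(rounds):
        mistakes = 0
        for inputs, expected in data:
            error = expected - predict(inputs, weights, bias)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
                bias += learning_rate * error  # the bias acts like a weight on a constant input of 1
        history.append(mistakes)
    return history


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_history = mistakes_per_round(AND_DATA)
xor_history = mistakes_per_round(XOR_DATA)
print("AND: mistakes in the last round =", and_history[-1])       # 0
print("XOR: rounds with zero mistakes =", xor_history.count(0))  # 0
```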

Real-Life Example: Hiring Decision

Imagine you're hiring for a job. The decision is complex:

Simple rules (AND/OR) are not enough. You need multiple perspectives:

• First layer (Basic screening): Looks at education and experience

• Second layer (Skills assessment): Evaluates technical skills

• Third layer (Culture fit): Considers personality and values

• Final layer: Makes the hiring decision based on all factors

The Multi-Layer Architecture

Input Layer → Hidden Layer (middle-level thinking) → Output Layer

Think of it like:

• Input Layer: Raw data (resume, test scores)

• Hidden Layer: Intermediate concepts (qualified candidate? skilled candidate?)

• Output Layer: Final decision (hire = 1, don't hire = 0)

The XOR Problem: A Perfect Example

XOR (exclusive OR) returns 1 when inputs are different:

| Input 1 | Input 2 | XOR Output |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Why XOR Needs Multiple Layers

Think of XOR as a combination of simpler operations:

• XOR = (A AND NOT B) OR (NOT A AND B)

This requires two levels of thinking:

1. First, recognize patterns "A AND NOT B" and "NOT A AND B"

2. Then, combine these patterns
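These two levels can be wired by hand from the same kind of step perceptron used for AND (threshold 0.5, no bias). The weights below are hand-picked for illustration; a trained network would find its own:

```python
def step(z, threshold=0.5):
    return 1 if z >= threshold else 0


def neuron(inputs, weights):
    """One step-activated perceptron with no bias (threshold 0.5)."""
    return step(sum(x * w for x, w in zip(inputs, weights)))


def xor_network(a, b):
    # Level 1: recognize the two patterns
    a_and_not_b = neuron((a, b), (1, -1))   # fires only for (1, 0)
    not_a_and_b = neuron((a, b), (-1, 1))   # fires only for (0, 1)
    # Level 2: combine them with OR
    return neuron((a_and_not_b, not_a_and_b), (1, 1))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
# prints the XOR truth table: 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```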

How Information Flows Through Layers

Forward Propagation (The Thinking Process)

Step 1: Input to Hidden Layer

z₁ = w₁₁x₁ + w₁₂x₂ + b₁ (First hidden neuron's raw thinking)

z₂ = w₂₁x₁ + w₂₂x₂ + b₂ (Second hidden neuron's raw thinking)

z₃ = w₃₁x₁ + w₃₂x₂ + b₃ (Third hidden neuron's raw thinking)


a₁ = σ(z₁) (First neuron's activated output)

a₂ = σ(z₂) (Second neuron's activated output)

a₃ = σ(z₃) (Third neuron's activated output)

Step 2: Hidden to Output Layer

z_final = v₁a₁ + v₂a₂ + v₃a₃ + b_final

y = σ(z_final)
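The same two steps can be written for any number of hidden neurons. A sketch in plain Python (only the standard `math` module); the argument names are illustrative, with `hidden_weights[j]` holding (w_j1, w_j2) for hidden neuron j and `output_weights[j]` holding v_j. σ here is the sigmoid introduced in the next section:

```python
import math


def sigmoid(z):
    """σ(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward propagation for a 2-input, n-hidden, 1-output network.

    Returns the hidden activations [a_1, ..., a_n] and the final output y.
    """
    # Step 1: input to hidden layer (z_j = w_j1*x1 + w_j2*x2 + b_j, then a_j = σ(z_j))
    z_hidden = [w1 * x[0] + w2 * x[1] + b
                for (w1, w2), b in zip(hidden_weights, hidden_biases)]
    a_hidden = [sigmoid(z) for z in z_hidden]
    # Step 2: hidden to output layer (z_final = Σ v_j*a_j + b_final, then y = σ(z_final))
    z_final = sum(v * a for v, a in zip(output_weights, a_hidden)) + output_bias
    return a_hidden, sigmoid(z_final)


# With every weight and bias at zero, each neuron sees z = 0 and outputs σ(0) = 0.5
a, y = forward((0, 1), [(0.0, 0.0)] * 3, [0.0] * 3, [0.0] * 3, 0.0)
print(a, y)  # [0.5, 0.5, 0.5] 0.5
```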

The Sigmoid Activation Function (The "Squishing" Function)

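In multi-layer networks, we use a smooth activation function instead of the step function. Backpropagation (below) adjusts each weight using the activation's derivative, and the step function's derivative is zero everywhere except at the threshold, where it is undefined, so it gives no signal about which way to move a weight. One such smooth function is the sigmoid: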

σ(z) = 1 / (1 + e⁻ᶻ)

This "squishes" any number into a value between 0 and 1:

| Input (z) | Output σ(z) |
| -∞ | 0 |
| -10 | 0.000045 |
| -5 | 0.0067 |
| 0 | 0.5 |
| 5 | 0.9933 |
| 10 | 0.999955 |
| +∞ | 1 |
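A sketch of the sigmoid in Python. Writing it directly as 1 / (1 + math.exp(-z)) works for everyday values, but `math.exp` raises `OverflowError` once its argument exceeds roughly 709, i.e. for z below about -709; the branch below avoids that by never exponentiating a large positive number:

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # small positive number when z is very negative
    return e / (1.0 + e)


for z in (-10, -5, 0, 5, 10):
    print(f"{z:>4}  {sigmoid(z):.6f}")
```

Rounded, the printed values match the table above.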

Think of it as "certainty level":

• Very negative input → Very certain it's NO

• Zero input → Completely uncertain (50/50)

• Very positive input → Very certain it's YES

Learning in Multiple Layers (Backpropagation)


The Challenge

In a single layer, each weight connects an input directly to the output, so the output error tells us directly how to adjust it. In multiple layers, how do we know:

• Was the error caused by output layer weights?

• Or hidden layer weights?

• Or maybe the biases?

The Solution: Backpropagation

Think of it like a team project where blame (error) flows backward:

1. Output layer error: "We got the final decision wrong"

2. Each output neuron says: "My error came from these hidden
neurons"

3. Hidden neurons say: "Our errors came from these inputs"

The Mathematics (Made Simple)

Step 1: Calculate output layer error (assuming the squared-error loss E = ½(y_predicted - y_actual)²)

δ_output = (y_predicted - y_actual) × σ'(z_final)

Where σ'(z) is the derivative of sigmoid: σ(z) × (1 - σ(z))
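The derivative formula can be checked numerically against a finite-difference estimate. A small sketch (the helper names are ours):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Analytic derivative: σ'(z) = σ(z) * (1 - σ(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def finite_difference(f, z, h=1e-6):
    """Central-difference estimate of f'(z), used only as a check."""
    return (f(z + h) - f(z - h)) / (2 * h)


for z in (-2.0, 0.0, 1.407):
    analytic = sigmoid_derivative(z)
    numeric = finite_difference(sigmoid, z)
    print(f"z = {z:6.3f}   formula = {analytic:.6f}   estimate = {numeric:.6f}")
```

σ'(z) is largest at z = 0, where it equals 0.25, and shrinks toward 0 for large |z|, so a very "certain" neuron only changes slowly.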

Step 2: Calculate hidden layer error (one δ for each hidden neuron j)

δ_hidden,j = δ_output × v_j × σ'(z_j)

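Step 3: Update weights (and biases)

v_j,new = v_j,old - learning_rate × δ_output × a_j

w_jk,new = w_jk,old - learning_rate × δ_hidden,j × x_k

b_final,new = b_final,old - learning_rate × δ_output, and b_j,new = b_j,old - learning_rate × δ_hidden,j

Compute every δ_hidden,j with the old v_j before changing the output weights.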
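A sketch of Steps 1 to 3 for a single training example, assuming the squared-error loss above. To show that the δ formulas really are the slope of the loss, it compares two of them against finite-difference estimates and checks that one update lowers the loss. The network size and parameter values are hypothetical, chosen only to exercise the code; Python indices start at 0, so v_1 is `v[0]` and w_21 is `W[1][0]`:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b, v, b_final):
    """W[j] = (w_j1, w_j2) and b[j] feed hidden neuron j; v[j] is its output weight."""
    a = [sigmoid(W[j][0] * x[0] + W[j][1] * x[1] + b[j]) for j in range(len(W))]
    y = sigmoid(sum(v[j] * a[j] for j in range(len(v))) + b_final)
    return a, y


def loss(x, target, W, b, v, b_final):
    """Squared-error loss E = ½(y_predicted - y_actual)², which the δ formulas assume."""
    return 0.5 * (forward(x, W, b, v, b_final)[1] - target) ** 2


def backprop_step(x, target, W, b, v, b_final, learning_rate):
    """One gradient-descent update on a single example; returns new parameters."""
    a, y = forward(x, W, b, v, b_final)
    delta_out = (y - target) * y * (1 - y)                        # Step 1
    delta_hidden = [delta_out * v[j] * a[j] * (1 - a[j])          # Step 2 (old v_j)
                    for j in range(len(v))]
    new_v = [v[j] - learning_rate * delta_out * a[j] for j in range(len(v))]  # Step 3
    new_W = [(W[j][0] - learning_rate * delta_hidden[j] * x[0],
              W[j][1] - learning_rate * delta_hidden[j] * x[1]) for j in range(len(W))]
    new_b = [b[j] - learning_rate * delta_hidden[j] for j in range(len(b))]
    new_b_final = b_final - learning_rate * delta_out
    return new_W, new_b, new_v, new_b_final


# Hypothetical 2-input, 2-hidden, 1-output network and one training example
x, target = (1.0, 0.0), 1.0
W, b = [(0.2, -0.4), (0.7, 0.1)], [0.0, -0.3]
v, b_final = [0.5, -0.6], 0.1

a, y = forward(x, W, b, v, b_final)
delta_out = (y - target) * y * (1 - y)
h = 1e-6
grad_v1 = delta_out * a[0]                                   # ∂E/∂v_1
est_v1 = (loss(x, target, W, b, [v[0] + h, v[1]], b_final)
          - loss(x, target, W, b, [v[0] - h, v[1]], b_final)) / (2 * h)
grad_w21 = delta_out * v[1] * a[1] * (1 - a[1]) * x[0]      # ∂E/∂w_21
est_w21 = (loss(x, target, [W[0], (W[1][0] + h, W[1][1])], b, v, b_final)
           - loss(x, target, [W[0], (W[1][0] - h, W[1][1])], b, v, b_final)) / (2 * h)
print(grad_v1, est_v1)
print(grad_w21, est_w21)

before = loss(x, target, W, b, v, b_final)
after = loss(x, target, *backprop_step(x, target, W, b, v, b_final, learning_rate=0.5))
print(before, after)  # the loss should drop after one step
```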

Complete XOR Example with Numbers

Let's walk through a complete example:

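Initial Setup

• Inputs: x₁, x₂ (0 or 1)

• Hidden layer: 3 neurons

• Learning rate: 0.5

• Starting weights (as used in the calculations below): w₁₁ = 0.5, w₁₂ = 0.3, w₂₁ = 0.2, w₂₂ = 0.8, w₃₁ = 0.7, w₃₂ = 0.4; hidden biases b₁ = b₂ = b₃ = 0.1; output weights v₁ = 0.8, v₂ = 0.5, v₃ = 0.6; output bias b_final = 0.2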


Forward Pass for (0,1):

Hidden Layer Calculations:

z₁ = (0×0.5) + (1×0.3) + 0.1 = 0.4

a₁ = σ(0.4) = 1/(1+e⁻⁰·⁴) = 0.60

z₂ = (0×0.2) + (1×0.8) + 0.1 = 0.9

a₂ = σ(0.9) = 0.71

z₃ = (0×0.7) + (1×0.4) + 0.1 = 0.5

a₃ = σ(0.5) = 0.62

Output Layer:

z_final = (0.60×0.8) + (0.71×0.5) + (0.62×0.6) + 0.2

= 0.48 + 0.355 + 0.372 + 0.2 = 1.407

y = σ(1.407) = 0.80

Error: For XOR, (0,1) should output 1, so error = 0.80 - 1 = -0.20

Backward Pass (Learning from Error)

Output Layer Update:

δ_output = -0.20 × σ'(1.407)

= -0.20 × [0.80 × (1-0.80)]

= -0.20 × 0.16 = -0.032

v₁_new = 0.8 - 0.5 × (-0.032) × 0.60 = 0.8096

v₂_new = 0.5 - 0.5 × (-0.032) × 0.71 = 0.5114

v₃_new = 0.6 - 0.5 × (-0.032) × 0.62 = 0.6099

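Hidden Layer Updates (simplified):

Each hidden weight gets updated based on how much it contributed to the error: hidden neuron j gets δ_hidden,j = δ_output × v_j × a_j × (1 - a_j), using the old output weights, and each of its incoming weights w_jk changes by -0.5 × δ_hidden,j × x_k. For the input (0,1), x₁ = 0, so only the weights coming from x₂ (and the hidden biases) change.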
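To check the arithmetic above, here is a sketch that redoes one forward and backward pass for the input (0,1) with the same weights, assuming squared-error loss (which the δ formulas imply). It keeps full precision instead of rounding each activation to two decimals, so δ_output comes out near -0.031 rather than -0.032 and the new output weights differ from the hand-calculated ones in the fourth decimal place. It also applies the hidden-layer weight update described in words above:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


x, target = (0, 1), 1   # XOR: (0,1) should output 1
learning_rate = 0.5

# Weights from the worked example: W[j] = (w_j1, w_j2) for hidden neuron j
W = [(0.5, 0.3), (0.2, 0.8), (0.7, 0.4)]
b = [0.1, 0.1, 0.1]
v = [0.8, 0.5, 0.6]
b_final = 0.2

# Forward pass
a = [sigmoid(w1 * x[0] + w2 * x[1] + bj) for (w1, w2), bj in zip(W, b)]
y = sigmoid(sum(vj * aj for vj, aj in zip(v, a)) + b_final)

# Backward pass
delta_out = (y - target) * y * (1 - y)
delta_hidden = [delta_out * vj * aj * (1 - aj) for vj, aj in zip(v, a)]  # old v_j

v_new = [vj - learning_rate * delta_out * aj for vj, aj in zip(v, a)]
W_new = [(w1 - learning_rate * dj * x[0], w2 - learning_rate * dj * x[1])
         for (w1, w2), dj in zip(W, delta_hidden)]

print("a =", [round(aj, 2) for aj in a])   # a = [0.6, 0.71, 0.62]
print("y =", round(y, 2))                   # y = 0.8
print("delta_out =", round(delta_out, 4))
print("v_new =", [round(vj, 4) for vj in v_new])
print("W_new =", [tuple(round(w, 4) for w in pair) for pair in W_new])
```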

Training Process Over Multiple Rounds

The values below are illustrative: of the round 1 rows, only (0,1) matches the starting weights above. Error = Predicted - Actual.

| Round | Input | Predicted | Actual | Error | Notes |
| 1 | (0,0) | 0.32 | 0 | +0.32 | Learning |
| 1 | (0,1) | 0.80 | 1 | -0.20 | Learning |
| 1 | (1,0) | 0.75 | 1 | -0.25 | Learning |
| 1 | (1,1) | 0.45 | 0 | +0.45 | Learning |
| 100 | (0,0) | 0.12 | 0 | +0.12 | Improving |
| 100 | (0,1) | 0.91 | 1 | -0.09 | Improving |
| 100 | (1,0) | 0.89 | 1 | -0.11 | Improving |
| 100 | (1,1) | 0.23 | 0 | +0.23 | Improving |

With enough rounds, the outputs usually move close to 0 and 1, and the network approximates XOR closely (a sigmoid output never reaches exactly 0 or 1).
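A compact sketch of the full training loop, starting from the worked example's weights with learning rate 0.5 and squared-error loss as in the notes; the 5,000 rounds are our own choice. It prints the loss before and after training and the final outputs. We do not quote those outputs here: how close they get to 0 and 1 depends on the starting weights, learning rate and number of rounds, and some starting points stall on a plateau, so it is worth trying other starting weights or more rounds:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def forward(x, W, b, v, b_final):
    a = [sigmoid(w1 * x[0] + w2 * x[1] + bj) for (w1, w2), bj in zip(W, b)]
    return a, sigmoid(sum(vj * aj for vj, aj in zip(v, a)) + b_final)


def total_loss(W, b, v, b_final):
    """Sum of ½(y_predicted - y_actual)² over the four XOR examples."""
    return sum(0.5 * (forward(x, W, b, v, b_final)[1] - t) ** 2 for x, t in XOR_DATA)


def train(W, b, v, b_final, learning_rate=0.5, rounds=5000):
    """Online backpropagation: update after every example, for many rounds."""
    W, b, v = [list(w) for w in W], list(b), list(v)   # work on copies
    for _ in range(rounds):
        for x, t in XOR_DATA:
            a, y = forward(x, W, b, v, b_final)
            d_out = (y - t) * y * (1 - y)
            d_hid = [d_out * v[j] * a[j] * (1 - a[j]) for j in range(len(v))]  # old v
            for j in range(len(v)):
                v[j] -= learning_rate * d_out * a[j]
                W[j][0] -= learning_rate * d_hid[j] * x[0]
                W[j][1] -= learning_rate * d_hid[j] * x[1]
                b[j] -= learning_rate * d_hid[j]
            b_final -= learning_rate * d_out
    return W, b, v, b_final


# Starting point: the weights from the worked example above
start = ([(0.5, 0.3), (0.2, 0.8), (0.7, 0.4)], [0.1, 0.1, 0.1], [0.8, 0.5, 0.6], 0.2)
before = total_loss(*start)
trained = train(*start)
after = total_loss(*trained)

print(f"loss before: {before:.3f}, after: {after:.3f}")
for x, t in XOR_DATA:
    print(x, "target", t, "output", round(forward(x, *trained)[1], 2))
```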

Key Takeaways for Beginners

1. Single Layer Perceptron = One decision-maker, good for simple yes/no problems that are linearly separable (like AND)

2. Multi-Layer Perceptron = Multiple decision-makers working together; can solve problems that are not linearly separable (like XOR)

3. Weights = Importance of each piece of information

4. Bias = Natural tendency or baseline

5. Activation Function = Decision rule (step function for a single perceptron, sigmoid for multi-layer networks)

6. Learning = Adjusting weights based on errors

7. Forward Propagation = Making a decision

8. Backpropagation = Learning from mistakes
