
Understanding Planning in AI Systems

Planning involves creating a sequence of actions to achieve a goal, represented through states, goals, and actions using the STRIPS framework. Plans can be linear, partially ordered, conditional, or hierarchical, and can be generated through forward or backward state-space search methods. The document also discusses the air cargo problem as an example of applying these planning concepts and highlights the advantages of partial ordering in planning to manage independent actions.

Uploaded by

akashlagudu

Planning:

The task of coming up with a sequence of actions that will achieve a goal is called Planning.
● “Deciding in ADVANCE what is to be done”
● A problem solving methodology
● Generating a set of actions that are likely to lead to a goal
● Deciding on a course of actions before acting

Representation of states and goals: -

In the STRIPS (STanford Research Institute Problem Solver) language, states
are represented by conjunctions of function-free ground literals, that is,
predicates applied to constant symbols, possibly negated.

For example,
At(Home) ∧ ¬ Have(Milk) ∧ ¬ Have(Bananas) ∧ ¬ Have(Drill) ∧….

Goals are also described by conjunctions of literals.


For example,
At(Home) ∧Have(Milk) ∧ Have(Bananas) ∧ Have(Drill)
represents a goal state in which the agent is at home and possesses milk,
bananas, and a drill.
Goals can also contain variables.​
For example, the goal of being at a store that sells milk would be represented
as:
At(x) ∧ Sells(x, Milk)
where x is a variable representing a store. The specific store is not known yet;
the goal is simply to be at some place x which satisfies the condition Sells(x,
Milk).

Representation of Actions (STRIPS Operators):


In the STRIPS framework, an operator is a structured representation of an
agent's action. It defines what the action is, the conditions required for
performing the action, and the changes it causes in the world.
A STRIPS operator consists of three main components:

1. Action Description:
This specifies the name of the action and its parameters.
It represents what the agent actually performs in the environment.
e.g., Go(there).

2. Precondition (Precond):
A conjunction of positive literals (atoms) that must be true in the current state
for the action to be legally applied.
It describes the conditions that must hold before the action can be executed.
e.g., At(here) ∧ Path(here, there).

3. Effects:
A conjunction of literals (positive or negative) that describes how the world
changes as a result of performing the action — what becomes true and what
becomes false.

Example Operator: Going from One Place to Another


The operator for an agent moving from one location (here) to another (there) is
defined as:
Op(Action: Go(there), Precond: At(here)∧Path(here,there), Effect:
At(there)∧¬At(here))

This operator represents the action of moving from here to there, requiring a
path between the two locations as a precondition, and resulting in the agent's
location changing to there while negating the previous location (here) denoting
that the agent is no longer here.
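
The Go operator above can be sketched in Python as follows. This is only an illustrative encoding, not part of the STRIPS definition: the class name Operator, its field names, and the helper go() are assumptions, and a state is taken to be the set of literals that are currently true (anything absent is treated as false).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A ground STRIPS operator; field names are illustrative assumptions."""
    name: str
    precond: frozenset   # positive literals that must hold before the action
    add: frozenset       # literals the effect makes true
    delete: frozenset    # literals the effect makes false (the negated ones)

    def applicable(self, state):
        return self.precond <= state

    def apply(self, state):
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable in this state")
        return (state - self.delete) | self.add


def go(here, there):
    """Build the ground instance of Go(there) for two concrete locations."""
    return Operator(
        name=f"Go({there})",
        precond=frozenset({f"At({here})", f"Path({here},{there})"}),
        add=frozenset({f"At({there})"}),
        delete=frozenset({f"At({here})"}),
    )


state = frozenset({"At(Home)", "Path(Home,Store)"})
new_state = go("Home", "Store").apply(state)
print(sorted(new_state))  # ['At(Store)', 'Path(Home,Store)']
```

Splitting the effect into an add set and a delete set is one common way to store positive and negative effect literals separately.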

Representation of Plans: -
In Artificial Intelligence, a plan is a sequence of actions designed to achieve a
specific goal from a given initial state.​
Just as we represent states and actions formally, we also need a clear way to
represent plans so that a planning system can reason about them, execute them,
or modify them when needed.

Definition of a Plan
A plan can be represented as an ordered or partially ordered set of actions that
transforms the initial state into a goal state.

Formally:
Plan = [A1, A2, A3, …, An]
where
●​ each Ai is an action operator (such as Go(Home) or Buy(Milk)), and
●​ the sequence of actions is chosen so that performing them in order
satisfies the goal conditions.
For example, a plan to go to the store and buy milk might be:
Plan = [Go(Store), Buy(Milk)]
This plan consists of two actions: going to the store and buying milk. The plan's
success depends on the preconditions of each action being met and the effects
achieving the desired goal.

Components of a Plan
A typical plan representation includes:
1.​ Initial State (S₀):​
Describes the situation before any action is performed.
2.​ Goal State (G):​
Describes the desired conditions to be achieved.
3.​ Sequence of Actions:​
The list or structure of actions that, when applied in order starting from
S₀, lead to a state satisfying G.
Formally:
Plan = <S₀, {A1, A2, …, An}, G>

Types of Plan Representations


(a) Linear Plans (Sequential Plans)
●​ A sequence of actions that are executed one after another.
●​ Each action is executed exactly once in the given order.
●​ Representation:
●​ [A1 → A2 → A3 → … → An]
●​ Example:
[Go(Store), Buy(Milk), Go(Home)]

(b) Partial-Order Plans


●​ Actions are partially ordered, meaning some actions can occur in any
order as long as their dependencies are respected.
●​ This allows flexibility where actions that do not interfere with each other
can be performed in any order or even simultaneously.
●​ Represented as:
●​ (A1 < A3), (A2 < A3)
which means A1 and A2 must occur before A3, but A1 and A2 can occur
in any order.
●​ Useful for parallel or flexible planning.
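
As a small illustration of the ordering constraints (A1 < A3), (A2 < A3), the sketch below lists every total order that respects them. The function name linearizations is an assumption, and brute-force enumeration of permutations is only sensible for tiny plans.

```python
from itertools import permutations


def linearizations(actions, before):
    """Yield every total order of `actions` consistent with the pairs in `before`."""
    for order in permutations(actions):
        position = {action: index for index, action in enumerate(order)}
        if all(position[a] < position[b] for a, b in before):
            yield list(order)


actions = ["A1", "A2", "A3"]
before = [("A1", "A3"), ("A2", "A3")]   # A1 and A2 must both precede A3
valid_orders = list(linearizations(actions, before))
print(valid_orders)  # [['A1', 'A2', 'A3'], ['A2', 'A1', 'A3']]
```

Both orders are valid executions of the same partial-order plan, which is the flexibility described above.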

(c) Conditional Plans


●​ These plans incorporate tests or conditions and include different branches
of actions for different possible outcomes (i.e., If X is true, do A; else, do
B).
●​ Conditional plans can be represented using:
▪​ If-then statements: "If condition X is true, then take action A."
▪​ Decision trees: Visual representations of possible conditions and
actions.
●​ Used when actions have uncertain effects or incomplete information.

(d) Hierarchical Plans


The hierarchical plan is a tree structure where abstract tasks (like 'Make dinner')
are iteratively decomposed into sub-tasks until only primitive actions (like
'Chop Carrots') remain.
Example of a Simple Plan
Goal: At(Home) ∧ Have(Milk)
Plan: [Go(Store), Buy(Milk), Go(Home)]
Explanation:
●​ Initially, the agent is at home without milk.
●​ After performing these actions in sequence, the agent ends up at home
with milk, satisfying the goal.

Planning with State-Space Search:


Planning with state-space search treats planning as a search problem, in
which the planner searches through a space of possible world states to find a
sequence of actions that achieves the goal.

Each node in the search space represents a state of the world, and each edge
represents an action that transforms one state into another.

Key Concepts
1. State Space: The state space is a set of all possible states a problem can be
in.
●​ State: A complete description of the world for the purpose of the problem
(e.g., the position of all tiles in the 8-puzzle, or the contents of a robot's
current location).
●​ Initial State: The starting configuration of the problem.
●​ Goal State: The desired configuration or a set of states that satisfy the
goal condition.
2. Operators (Actions)
These are the actions that can transition the problem from one state to another.
●​ Each operator has preconditions (what must be true to execute the
action) and effects (how the state changes after execution).
●​ The application of an operator corresponds to moving along an edge in
the search graph.
3. Plan
A plan is a sequence of operators that leads from the initial state to a goal state.
The search process aims to discover this sequence.

This is analogous to problem-solving search, but the difference is that planning
explicitly uses logical representations of states, actions, and goals.

Types of State-Space Search in Planning


There are two main approaches:
(a) Forward State-Space Search (Progression Search)
●​ Starts from the initial state and applies actions forward until a goal state
is reached.
●​ At each step, the planner applies all actions whose preconditions are
satisfied in the current state.
●​ Generates successor states by applying the effects of those actions.

Algorithm Outline:
Initial State → Apply applicable actions → Generate new states →
Continue until Goal State is reached

Example:​
If the initial state is At(Home) and the goal is Have(Milk),​
the planner might generate:
At(Home) → Go(Store) → At(Store) → Buy(Milk) → Have(Milk)
Advantages:
●​ Simple and intuitive.
●​ Suitable when the initial state is fully known.
Disadvantages:
●​ The search space can be very large.
●​ May explore many irrelevant states before reaching the goal.
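
Progression search can be sketched as a breadth-first search over states. The three ground actions and the tuple layout (name, preconditions, add list, delete list) are assumptions made for this example; they are a simplified version of the Go and Buy operators used in these notes.

```python
from collections import deque

# Each ground action: (name, preconditions, add list, delete list).
ACTIONS = [
    ("Go(Store)", {"At(Home)"}, {"At(Store)"}, {"At(Home)"}),
    ("Go(Home)", {"At(Store)"}, {"At(Home)"}, {"At(Store)"}),
    ("Buy(Milk)", {"At(Store)"}, {"Have(Milk)"}, set()),
]


def forward_search(initial, goal, actions):
    """Breadth-first progression search; returns a list of action names or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                      # every goal literal holds
            return plan
        for name, precond, add, delete in actions:
            if precond <= state:               # action is applicable
                successor = frozenset((state - delete) | add)
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, plan + [name]))
    return None


milk_plan = forward_search({"At(Home)"}, {"Have(Milk)"}, ACTIONS)
home_plan = forward_search({"At(Home)"}, {"At(Home)", "Have(Milk)"}, ACTIONS)
print(milk_plan)  # ['Go(Store)', 'Buy(Milk)']
print(home_plan)  # ['Go(Store)', 'Buy(Milk)', 'Go(Home)']
```

The visited set keeps the search from revisiting states, which matters because Go(Store) and Go(Home) undo each other.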

(b) Backward State-Space Search (Regression Search)


●​ Starts from the goal state and works backward toward the initial state.
●​ At each step, the planner selects an action that could achieve part of the
goal and then replaces the goal with that action’s preconditions.
●​ This continues until the current goal description is satisfied by the initial
state.
Algorithm Outline:
Goal State → Find an action that achieves it → Replace goal with action’s
preconditions →
Continue until Initial State satisfies current goals

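Example:
Goal: At(Home) ∧ Have(Milk)
At(Home) ∧ Have(Milk)
→ (Go(Home)) achieves At(Home); regressed goal: At(Store) ∧ Have(Milk)
→ (Buy(Milk)) achieves Have(Milk); regressed goal: At(Store)
→ (Go(Store)) achieves At(Store); regressed goal: At(Home)
→ Satisfied by the initial state: At(Home)
Reading the actions in reverse order of discovery gives the plan [Go(Store), Buy(Milk), Go(Home)].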
Advantages:
●​ Focuses search directly on actions relevant to the goal.
●​ Reduces branching compared to forward search.
Disadvantages:
●​ Complex to manage when multiple subgoals interact.
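
Regression search can be sketched in the same style. The rule used here is the usual one for STRIPS actions with add and delete lists: an action is relevant if it adds some goal literal and deletes none, and the regressed goal is (goal − add list) ∪ preconditions. The action encoding and the function names are assumptions for this example.

```python
from collections import deque

# Each ground action: (name, preconditions, add list, delete list).
ACTIONS = [
    ("Go(Store)", {"At(Home)"}, {"At(Store)"}, {"At(Home)"}),
    ("Go(Home)", {"At(Store)"}, {"At(Home)"}, {"At(Store)"}),
    ("Buy(Milk)", {"At(Store)"}, {"Have(Milk)"}, set()),
]


def regress(goal, action):
    """Return the subgoal that must hold before `action`, or None if irrelevant."""
    _, precond, add, delete = action
    if not (add & goal) or (delete & goal):
        return None
    return frozenset((goal - add) | precond)


def backward_search(initial, goal, actions):
    """Breadth-first regression search; returns a list of action names or None."""
    start = frozenset(goal)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        subgoal, plan = frontier.popleft()
        if subgoal <= initial:                 # initial state satisfies the subgoal
            return plan
        for action in actions:
            previous = regress(subgoal, action)
            if previous is not None and previous not in visited:
                visited.add(previous)
                frontier.append((previous, [action[0]] + plan))
    return None


plan = backward_search({"At(Home)"}, {"At(Home)", "Have(Milk)"}, ACTIONS)
print(plan)  # ['Go(Store)', 'Buy(Milk)', 'Go(Home)']
```

Each action is prepended to the plan because regression discovers the last action first.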
Example: Air Cargo Problem:
Forward Approach in the Air Cargo Problem
Given scenario:
●​ 10 airports: A₁ to A₁₀
●​ Each airport has 5 planes (total 50 planes).
●​ 20 cargo pieces per airport (total 200 cargos).
●​ Goal: Move all 20 cargos from Airport A to Airport B.

Initial State (S₀):


●​ At(C1, A), At(C2, A), …, At(C20, A)
●​ At(P1, A), At(P2, A), …, At(P5, A)
●​ (Other planes and cargos at their respective airports.)
Goal State (G):
●​ At(C1, B) ∧ At(C2, B) ∧ … ∧ At(C20, B)

Applicable Actions in the Initial State:


At the beginning, many actions are possible:
●​ Load actions:​
For each cargo Ci at airport A and each plane Pj at airport A,​
Load(Ci, Pj, A) is applicable if both At(Ci, A) and At(Pj, A) are true.
●​ Fly actions:​
For each plane Pj at any airport, and each possible destination (9 other
airports), Fly(Pj, from, to) is applicable if At(Pj, from) is true.
●​ Unload actions: (Ex: Unload(Ci, Pj, B) )
Applicable only if a cargo is already loaded in a plane at some airport.

Hence, even at the first step, there can be well over a thousand applicable actions:
● (200 cargos × 5 planes per airport) = up to 1,000 Load actions.
● (50 planes × 9 destinations) = 450 Fly actions.
● No Unload actions yet, since nothing has been loaded; these become applicable later.
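
The counts above can be reproduced with a few lines of arithmetic. The variable names are illustrative; the numbers come from the scenario (10 airports, 5 planes and 20 cargo pieces per airport).

```python
airports = 10
planes_per_airport = 5
cargos_per_airport = 20

# Load(Ci, Pj, A): every cargo can be loaded onto every plane at the same airport.
load_actions = airports * cargos_per_airport * planes_per_airport

# Fly(Pj, from, to): every plane can fly to any of the other airports.
fly_actions = airports * planes_per_airport * (airports - 1)

# Unload needs a cargo that is already inside a plane, so none apply initially.
unload_actions = 0

total_actions = load_actions + fly_actions + unload_actions
print(load_actions, fly_actions, total_actions)  # 1000 450 1450
```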

Backward Approach in the Air Cargo Problem

Goal State (G):


All 20 cargos must be at Airport B.
G= At(C1, B) ∧ At(C2, B) ∧ … ∧ At(C20, B)

Step 1: Identify Actions That Can Achieve the Goal


To make At(Ci, B) true, the relevant action is:
Unload(Ci, Pj, B)
since its effect includes At(Ci, B).

Step 2: Regress Goal Through the Unload Action


The preconditions of Unload(Ci, Pj, B) are:
In(Ci, Pj) ∧ At(Pj, B)
So, to achieve At(Ci, B), we now need these new subgoals:
●​ Cargo Ci is inside plane Pj, and
●​ Plane Pj is at airport B.
Thus, regressing the goal gives us:
G1={In(C1,Pj), At(Pj,B),…,In(C20,Pj), At(Pj,B)}

Step 3: Regress the Subgoal In(Ci, Pj)


To make In(Ci, Pj) true, the action is:
Load(Ci, Pj, A)
with preconditions:
At(Ci,A) ∧ At(Pj, A)

Step 4: Regress the Subgoal At(Pj, B)


To have At(Pj, B) true, the action is:
Fly(Pj, A, B)
with precondition:
At(Pj, A)

Step 5: Combine the Regression Results


After regressing all goals, the final set of required conditions becomes:
Gfinal={At(C1,A),At(C2,A),…,At(C20,A),At(Pj,A)}
which is satisfied by the initial state for any plane Pj that starts at Airport A
(for example, P1), since all 20 cargos and planes P1 to P5 start there.

Key Insight
Backward search avoids exploring irrelevant actions (like flying planes between
other airports) because it only considers actions that can achieve current goals.
Thus, the branching factor is much smaller than in forward search.

Partial Ordering Planning:


Forward and backward state-space searches construct totally ordered plans consisting
of strictly linear sequences of actions.
This representation ignores the fact that many subproblems are independent.

A solution to an air cargo problem consists of a totally ordered sequence of
actions, yet if 30 packages are being loaded onto one plane in one airport and 50
packages are being loaded onto another at another airport, it seems pointless to
come up with a strict linear ordering of 80 load actions; the two subsets of
actions should be thought of independently.

An alternative is to represent plans as partially ordered structures: a plan is a set
of actions and a set of constraints of the form Before(ai, aj) saying that one
action occurs before another.
Example: The Spare Tire Problem
Initial State
The car has a flat tire on the axle and the spare tire in the trunk.
At(Flat, Axle)∧ At(Spare, Trunk)

Goal State
The spare tire is on the axle.
At(Spare, Axle)
Relevant Actions
There are three main actions needed for the solution:
Action                  Preconditions (must be TRUE)          Effects (changes to the state)
Remove(Flat, Axle)      At(Flat, Axle)                        ¬At(Flat, Axle) ∧ At(Flat, Ground)
Remove(Spare, Trunk)    At(Spare, Trunk)                      ¬At(Spare, Trunk) ∧ At(Spare, Ground)
PutOn(Spare, Axle)      At(Spare, Ground) ∧ ¬At(Flat, Axle)   At(Spare, Axle) ∧ ¬At(Spare, Ground)

A solution to the problem is [Remove(Flat, Axle), Remove(Spare, Trunk),
PutOn(Spare, Axle)].

Note that Remove(Spare, Trunk ) and Remove(Flat, Axle) can be done in either
order as long as they are both completed before the PutOn(Spare, Axle) action.
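
The claim that either order of the two Remove actions works can be checked by executing the plan. The sketch below is illustrative only: each action is written as a tuple (name, positive preconditions, negative preconditions, add list, delete list), which is an assumed encoding chosen so that PutOn can require ¬At(Flat, Axle).

```python
# (name, positive preconditions, negative preconditions, add list, delete list)
REMOVE_FLAT = ("Remove(Flat, Axle)", {"At(Flat, Axle)"}, set(),
               {"At(Flat, Ground)"}, {"At(Flat, Axle)"})
REMOVE_SPARE = ("Remove(Spare, Trunk)", {"At(Spare, Trunk)"}, set(),
                {"At(Spare, Ground)"}, {"At(Spare, Trunk)"})
PUT_ON = ("PutOn(Spare, Axle)", {"At(Spare, Ground)"}, {"At(Flat, Axle)"},
          {"At(Spare, Axle)"}, {"At(Spare, Ground)"})

INITIAL = {"At(Flat, Axle)", "At(Spare, Trunk)"}
GOAL = {"At(Spare, Axle)"}


def execute(initial, plan):
    """Apply the actions in order; return the final state, or None if one fails."""
    state = set(initial)
    for _, pos, neg, add, delete in plan:
        if not (pos <= state) or (neg & state):
            return None                      # a precondition is violated
        state = (state - delete) | add
    return state


def achieves_goal(plan):
    final = execute(INITIAL, plan)
    return final is not None and GOAL <= final


print(achieves_goal([REMOVE_FLAT, REMOVE_SPARE, PUT_ON]))   # True
print(achieves_goal([REMOVE_SPARE, REMOVE_FLAT, PUT_ON]))   # True
print(achieves_goal([REMOVE_SPARE, PUT_ON, REMOVE_FLAT]))   # False
```

The third ordering fails because PutOn(Spare, Axle) is attempted while the flat tire is still on the axle.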

Partially ordered plans are created by a search through the space of plans rather
than through the state space.
We start with the empty plan consisting of just the initial state and the goal, with
no actions in between, as shown in the following figure:

Fig: The tire problem expressed as an empty plan

The search procedure then looks for a flaw in the plan, and makes an addition to
the plan to correct the flaw (or if no correction can be made, the search
backtracks and tries something else). A flaw is anything that keeps the partial
plan from being a solution.
For example, one flaw in the empty plan is that no action achieves At(Spare,
Axle). One way to correct the flaw is to insert into the plan the action
PutOn(Spare, Axle).

Fig: An incomplete partially ordered plan for the tire problem. Boxes represent
actions and arrows indicate that one action must occur before another.

Of course that introduces some new flaws: the preconditions of the new action
(i.e., At(Spare, Ground) ∧ ¬At(Flat, Axle)) are not achieved. The search keeps
adding to the plan (backtracking if necessary) until all flaws are resolved, as
shown in the following figure:

Fig: A complete partially-ordered solution

At every step, we make the least commitment possible to fix the flaw.

For example, in adding the action Remove(Spare, Trunk) we need to commit to
having it occur before PutOn(Spare, Axle), but we make no other commitment
that places it before or after other actions.
In the above figure, we can see a partially ordered plan that is a solution to the
spare tire problem. Actions are boxes and ordering constraints are arrows.
Note that Remove(Spare, Trunk ) and Remove(Flat, Axle) can be done in either
order as long as they are both completed before the PutOn(Spare, Axle) action.
ARTIFICIAL NEURAL NETWORKS:
The most fundamental unit of an artificial neural network is called an artificial
neuron.

Why is it called a neuron ? Where does the inspiration come from?


The inspiration comes from biology (more specifically, from the brain)
biological neurons = neural cells = neural processing units

The term neural network is generally used to describe a network of biological
neurons that are functionally connected, as in the central nervous system of
most animals.

A biological neuron looks as follows:

A neuron is an electrically excitable cell that receives, processes, and
transmits information through electrochemical signals.

A typical neuron consists of a cell body (soma), dendrites (input
channels), and an axon (output channel). The excitation of a signal is called
'firing'.
Dendrite: Receives signals from other neurons
Soma: Processes the information
Axon: Transmits the output of this neuron
Synapse: Point of connection to other neurons

There is a massively parallel interconnected network of neurons.


An average human brain has around 10¹¹ (roughly 100 billion) neurons!

An artificial neural network (ANN) is composed of artificial neurons or nodes
that are connected together to form a network.

An Artificial Neural Network is a computational model composed of
interconnected nodes (or "neurons") organized in layers. These nodes take
input, process it, and pass it to subsequent nodes.
ANN is a machine learning approach which is used to solve complex problems
like pattern recognition, classification, and prediction.

ANNs are inspired by the biological brain's structure but are mathematical
models implemented in software.

Each neuron in an ANN is a processing element that receives a number of inputs,
combines them into a weighted sum, and applies an activation function to that
sum to obtain the output value of the neuron.

Abstract model of a single neuron (or) Structure of neuron:


Each neuron consists of the following:
●​ Inputs: The neuron receives multiple input signals, x1​, x2​,…, xn.

●​ Weights (wi): Each input is associated with a weight, which determines


the influence of that input on the final output. The weight corresponds to
the strength of the synaptic connection in the biological brain.

● Summation Function: The weighted inputs are summed up, often with
an added bias term (b):
Weighted Sum (or) Net Input (or) $Net = \sum_{i=1}^{n} w_i x_i + b$

● Activation Function: The net input is passed through an activation
function (or transfer function or squashing function) to produce the
neuron's output.
Output = f(Net)
This function introduces non-linearity, which is crucial for the network to
learn complex patterns. Common activation functions include the
Sigmoid, ReLU (Rectified Linear Unit), and Tanh.
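
The neuron model above (weighted sum plus bias, then an activation function) can be sketched as a single function. The name neuron_output and the example weights are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import math


def neuron_output(inputs, weights, bias, activation):
    """Compute Output = f(Net), where Net = sum(w_i * x_i) + b."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


def relu(net):
    return max(0.0, net)


x = [1.0, 2.0]          # inputs x1, x2
w = [0.5, -0.25]        # weights w1, w2 (illustrative values)
b = 0.5                 # bias

# Net = 0.5*1.0 + (-0.25)*2.0 + 0.5 = 0.5
relu_out = neuron_output(x, w, b, relu)
tanh_out = neuron_output(x, w, b, math.tanh)
print(relu_out)  # 0.5
print(tanh_out)
```

Passing the activation as a parameter makes it easy to swap Sigmoid, ReLU, or Tanh without touching the summation step.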

Activation Functions
Activation functions are essential components of neural networks that introduce
non-linearity to the model, allowing it to learn complex patterns and map
inputs to outputs in a sophisticated way.

Some of the commonly used activation functions are as follows:


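1. Step function:
A step function is characterized by having a binary output based on whether the
input value reaches a certain threshold, θ. It is also known as the threshold function
or hard limiter. (The related sign function outputs −1 or +1 instead of 0 or 1.)
It is used in perceptrons and single-layer feed-forward networks.

$f(net) = \begin{cases} 1, & \text{if } net \ge \theta \\ 0, & \text{if } net < \theta \end{cases}$

or equivalently (with the threshold absorbed into the bias, b = −θ):
$f(net) = \begin{cases} 1, & \text{if } net \ge 0 \\ 0, & \text{otherwise} \end{cases}$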

2. Sigmoid function:
$f(net) = \frac{1}{1 + e^{-net}}$

It is smooth and differentiable. It is used in multilayer networks (for
backpropagation).
The output is always a value between 0 and 1, i.e., in the open interval (0, 1).

3. Hyperbolic Tangent (Tanh) Function


Alternative to sigmoid, with outputs between −1 and +1:
$f(net) = \tanh(net) = \frac{e^{net} - e^{-net}}{e^{net} + e^{-net}}$

4. ReLU (Rectified Linear Unit):


Common in modern deep learning.
ReLU is currently the most popular choice for hidden layers in deep neural
networks due to its simplicity and computational efficiency.
f(net)=max(0, net)
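
The four activation functions listed above can be written directly from their formulas. This is a minimal sketch using only the standard library; the function names and the default threshold theta=0.0 are choices made for this example.

```python
import math


def step(net, theta=0.0):
    """Binary step: 1 if net >= theta, else 0."""
    return 1 if net >= theta else 0


def sigmoid(net):
    """Logistic sigmoid, output in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def tanh(net):
    """Hyperbolic tangent, output in the open interval (-1, 1)."""
    return (math.exp(net) - math.exp(-net)) / (math.exp(net) + math.exp(-net))


def relu(net):
    """Rectified linear unit: max(0, net)."""
    return max(0.0, net)


for net in (-2.0, 0.0, 2.0):
    print(net, step(net), round(sigmoid(net), 4), round(tanh(net), 4), relu(net))
```

For large |net| the explicit exponentials can overflow, so in practice math.tanh is the safer choice; the explicit form is shown only to mirror the formula.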
-----------------------------------------------------------------------------------------------
Learning: The process of adjusting the weights and biases to minimize the
difference between the network's output and the desired output for a given
training set is called learning or training.

Neural Network Architectures:


The architecture of a neural network is associated with the learning algorithm
used to train it.
Broadly, there are three classes of network architectures as listed below:
1. Single-Layer Feed-Forward Network
2. Multi-Layer Feed-Forward Network
3. Recurrent Network
Single-Layer Feed-Forward Networks:
A feed-forward network is one where connections between the nodes do not
form a cycle; information moves in only one direction—forward—from the
input layer to the output layer.
The simplest type is the Single-Layer Feed-Forward Network.
● Architecture:
It consists of only two layers: an input layer and an output layer (which
contains the processing elements). It has no hidden layers.
The network is called single-layer because the input layer is not counted:
no computation is performed in it.
●​ Operation:
Each input node is connected to each output node via a link having a
weight.
The input nodes simply distribute the input values to the output nodes.
The output nodes perform the weighted summation on the input
parameters and apply the activation function to produce the final output.

The following figure shows a Single-Layer Feed-Forward Network with
two input nodes {X1, X2} and four output nodes. The link from input
node i to output node j has weight Wij, where i=1, 2 and j=1, …, 4.
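
A network of this shape (two inputs, four outputs) can be sketched as below. The weight values, the per-output bias terms, and the choice of a step activation are all assumptions made for illustration; the figure itself only specifies the weights Wij.

```python
def step(net):
    return 1 if net >= 0 else 0


def single_layer_forward(x, W, b, activation):
    """W[i][j] is the weight on the link from input node i to output node j."""
    outputs = []
    for j in range(len(b)):
        net = sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
        outputs.append(activation(net))
    return outputs


# Hypothetical weights: 2 input nodes (rows) by 4 output nodes (columns).
W = [[1, -1, 1, 0],
     [1, 1, -1, 0]]
b = [-2, 0, 0, -1]      # hypothetical bias for each output node

out_10 = single_layer_forward([1, 0], W, b, step)
out_11 = single_layer_forward([1, 1], W, b, step)
print(out_10)  # [0, 0, 1, 0]
print(out_11)  # [1, 1, 1, 0]
```

Each output node computes its own weighted sum independently, so the four outputs could be evaluated in any order.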

Perceptron:
A perceptron is a special form of single-layer feed-forward network and was
proposed by Frank Rosenblatt, an American psychologist, in the year 1958.
The perceptron performs binary classification, i.e., it decides whether an input
belongs to one class or another.

Architecture
The classic Perceptron is a Single-Layer Feed-Forward Network, meaning it has
only:
●​ An Input Layer (to receive data).
●​ An Output Layer (which contains the single processing
element/neuron).
●​ No hidden layers.

The Perceptron mimics a single biological neuron and consists of the following:
1.​ Input Signals (xi): It receives multiple input features (numerical values)
from the outside.
2.​ Weights (wi): Each input xi is multiplied by an associated weight wi,
which represents the importance or strength of that input.
3. Weighted Sum and Bias: The perceptron calculates the weighted sum
of the inputs and adds a bias term (b):
$Net = \sum_{i=1}^{n} w_i x_i + b$
The bias allows the classifier line to be shifted away from the origin.
4. Activation Function: The sum (Net) is passed through an activation
function called the step function (a hard threshold function) to produce a
final binary output:
$Output = f(net) = \begin{cases} 1, & \text{if } net \ge \theta \\ 0, & \text{if } net < \theta \end{cases}$
where θ is the threshold value.
Because the bias b is already included in Net, the threshold θ is typically taken to be 0.

Working of the Perceptron


Each input xᵢ is multiplied by its corresponding weight wᵢ. The weighted inputs
are summed and bias b is added. The activation function checks whether the
total input crosses the threshold θ. If yes, output is 1; otherwise, output is 0.

Learning Rule/Training Rule


The perceptron learns by adjusting the weights and bias based on the error
between the desired output and actual output.

Weight update rule:


wᵢ(new) = wᵢ(old) + η (t − y) xᵢ
b(new) = b(old) + η (t − y)
where
●​ η = learning rate (0 < η ≤ 1)
●​ t = target (desired output)
●​ y = perceptron output

Learning Algorithm Steps


1. Initialize weights and bias to small random values.
2. For each training example:
a. Compute the output using current weights.
b. Compare the output with the target output.
c. Update weights using the learning rule.
3. Repeat until convergence or all examples are correctly classified.

Perceptron Convergence Theorem


The perceptron learning algorithm is guaranteed to find a set of weights that
correctly classifies the training examples if the data is linearly separable.
However, if the data is not linearly separable (like XOR), the perceptron fails to
converge.

Pseudocode
Initialize weights and bias
Repeat until convergence:
    For each training example (x1, x2, ..., xn, t):
        net = Σ (wi * xi) + b
        if net ≥ 0 then y = 1 else y = 0
        Δwi = η * (t - y) * xi
        wi = wi + Δwi
        b = b + η * (t - y)
End
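
The pseudocode translates almost line for line into Python. In this sketch, "convergence" is taken to mean one full pass over the data with no misclassification, and max_epochs is an added safeguard for data that is not linearly separable; both are assumptions, as are the function names. It is run on the AND and XOR truth tables with zero initial weights and η = 1.

```python
def predict(weights, bias, x):
    """Step activation applied to net = sum(w_i * x_i) + b."""
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if net >= 0 else 0


def train_perceptron(samples, weights, bias, eta=1, max_epochs=100):
    """Return (weights, bias, epochs_run, converged)."""
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, t in samples:
            y = predict(weights, bias, x)
            if y != t:
                mistakes += 1
                weights = [w + eta * (t - y) * xi for w, xi in zip(weights, x)]
                bias = bias + eta * (t - y)
        if mistakes == 0:                  # a full pass with no updates
            return weights, bias, epoch, True
    return weights, bias, max_epochs, False


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b, epochs, converged = train_perceptron(AND_DATA, [0, 0], 0)
print(w, b, epochs, converged)  # [2, 1] -3 6 True

_, _, _, xor_converged = train_perceptron(XOR_DATA, [0, 0], 0)
print(xor_converged)  # False
```

The XOR run stops only because of max_epochs, which illustrates the non-separable case of the convergence theorem.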

Example (Numerical Illustration):


Let’s train a perceptron for the logical AND function.

x₁  x₂  Target (t)
0  0  0
0  1  0
1  0  0
1  1  1
Given Parameters
Initial Weights: w₁ = 0, w₂ = 0, b = 0
Learning Rate (η): 1
Activation Function: f(net) = 1 if net ≥ 0 else 0
Update Rule:
wᵢ(new) = wᵢ(old) + η (t − y) xᵢ
b(new) = b(old) + η (t − y)

Epoch 1:
Input (x₁,x₂)  Target (t)  net  Output (y)  Error (t−y)  Updated w₁  Updated w₂  Updated b
(0,0) 0 0 1 -1 0 0 -1
(0,1) 0 -1 0 0 0 0 -1
(1,0) 0 -1 0 0 0 0 -1
(1,1) 1 -1 0 1 1 1 0
Final weights after Epoch 1: w₁ = 1, w₂ = 1, b = 0

Epoch 2:
Input (x₁,x₂)  Target (t)  net  Output (y)  Error (t−y)  Updated w₁  Updated w₂  Updated b
(0,0) 0 0 1 -1 1 1 -1
(0,1) 0 0 1 -1 1 0 -2
(1,0) 0 -1 0 0 1 0 -2
(1,1) 1 -1 0 1 2 1 -1
Final weights after Epoch 2: w₁ = 2, w₂ = 1, b = -1
Epoch 3:
Input (x₁,x₂)  Target (t)  net  Output (y)  Error (t−y)  Updated w₁  Updated w₂  Updated b
(0,0) 0 -1 0 0 2 1 -1
(0,1) 0 0 1 -1 2 0 -2
(1,0) 0 0 1 -1 1 0 -3
(1,1) 1 -2 0 1 2 1 -2
Final weights after Epoch 3: w₁ = 2, w₂ = 1, b = -2
Epoch 4:
Input (x₁,x₂)  Target (t)  net  Output (y)  Error (t−y)  Updated w₁  Updated w₂  Updated b
(0,0) 0 -2 0 0 2 1 -2
(0,1) 0 -1 0 0 2 1 -2
(1,0) 0 0 1 -1 1 1 -3
(1,1) 1 -1 0 1 2 2 -2
Final weights after Epoch 4: w₁ = 2, w₂ = 2, b = -2
Epoch 5:
Input (x₁,x₂)  Target (t)  net  Output (y)  Error (t−y)  Updated w₁  Updated w₂  Updated b
(0,0) 0 -2 0 0 2 2 -2
(0,1) 0 0 1 -1 2 1 -3
(1,0) 0 -1 0 0 2 1 -3
(1,1) 1 0 1 0 2 1 -3
Final weights after Epoch 5: w₁ = 2, w₂ = 1, b = -3

Epoch 6:
Input (x₁,x₂)  Target (t)  net  Output (y)  Error (t−y)  Updated w₁  Updated w₂  Updated b
(0,0) 0 -3 0 0 2 1 -3
(0,1) 0 -2 0 0 2 1 -3
(1,0) 0 -1 0 0 2 1 -3
(1,1) 1 0 1 0 2 1 -3
Final weights after Epoch 6: w₁ = 2, w₂ = 1, b = -3
Final Converged Values
After 6 epochs, the perceptron converges with:
w₁ = 2, w₂ = 1, b = -3
These weights correctly classify all inputs of the AND gate.

Final Decision Boundary Equation:


Net equation:
net=2x1+x2 -3
Decision boundary equation (set net = 0)
2x1+x2 -3=0
Example: Implement OR gate with Perceptron

OR gate Truth Table


Input x₁ Input x₂ Target Output (t)
0 0 0
0 1 1
1 0 1
1 1 1

Given Parameters
Initial Weights: w₁ = 0.2, w₂ = 0.4, b = 0.2
Learning Rate (η): 0.2
Activation Function: f(net) = 1 if net ≥ 0 else 0
Update Rule: wᵢ(new) = wᵢ(old) + η (t − y) xᵢ
b(new) = b(old) + η (t − y)

Epoch 1:
Input (x₁,x₂)  Target (t)  net  Output (y)  Error (t−y)  Updated w₁  Updated w₂  Updated b
(0,0) 0 0.2 1 -1 0.2 0.4 0.0
(0,1) 1 0.4 1 0 0.2 0.4 0.0
(1,0) 1 0.2 1 0 0.2 0.4 0.0
(1,1) 1 0.6 1 0 0.2 0.4 0.0
Final weights after Epoch 1: w₁ = 0.2, w₂ = 0.4, b = 0.0
Epoch 2:
Input (x₁,x₂)  Target (t)  net  Output (y)  Error (t−y)  Updated w₁  Updated w₂  Updated b
(0,0) 0 0.0 1 -1 0.2 0.4 -0.2
(0,1) 1 0.2 1 0 0.2 0.4 -0.2
(1,0) 1 0.0 1 0 0.2 0.4 -0.2
(1,1) 1 0.4 1 0 0.2 0.4 -0.2
Final weights after Epoch 2: w₁ = 0.2, w₂ = 0.4, b = -0.2
Epoch 3:
Input (x₁,x₂)  Target (t)  net  Output (y)  Error (t−y)  Updated w₁  Updated w₂  Updated b
(0,0) 0 -0.2 0 0 0.2 0.4 -0.2
(0,1) 1 0.2 1 0 0.2 0.4 -0.2
(1,0) 1 0.0 1 0 0.2 0.4 -0.2
(1,1) 1 0.4 1 0 0.2 0.4 -0.2
Final weights after Epoch 3: w₁ = 0.2, w₂ = 0.4, b = -0.2
Final Converged Values
After 3 epochs, the perceptron converges with:
w₁ = 0.2, w₂ = 0.4, b = -0.2
These weights correctly classify all inputs of the OR gate.

Final Decision Boundary Equation


Net equation
net=0.2x1+0.4x2-0.2
Decision boundary (set net = 0)
0.2x1+0.4x2-0.2=0
Multiply by 5 to avoid decimals:
x1+2x2-1=0
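
As a quick check, the scaled boundary x1 + 2x2 − 1 = 0 can be tested against the OR truth table. Scaling by a positive constant does not change which side of the line a point falls on, and the integer form avoids floating-point rounding at the point (1, 0), which lies exactly on the boundary. The names below are illustrative.

```python
OR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def classify(x1, x2):
    """Output 1 when the point lies on or above the line x1 + 2*x2 - 1 = 0."""
    return 1 if x1 + 2 * x2 - 1 >= 0 else 0


results = [(x, classify(*x), t) for x, t in OR_TABLE]
for x, y, t in results:
    print(x, "output:", y, "target:", t)

all_correct = all(y == t for _, y, t in results)
print(all_correct)  # True
```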

Multilayer Feed Forward Networks


Multilayer Feed Forward Networks are used to handle data that is not linearly
separable, i.e., data whose classes cannot be separated by a single hyperplane.
A Multi-Layer Feed-Forward Neural Network (MLFFNN) is an extension
of the basic perceptron (single-layer neural network) that can model complex,
non-linear relationships.
It is characterized by its structure where information flows in only one
direction—forward—from the input layer, through one or more hidden layers, to
the output layer, without any loops or feedback connections.

Architecture:
A typical MLFFNN consists of at least three layers:
1. Input Layer
●​ Receives the initial data (i.e., features from the dataset.)
●​ Contains neurons (or, nodes) equal to the number of features or
dimensions in the input data. These nodes simply pass the input values to
the next layer.
●​ No computation occurs here.
2. Hidden Layers (One or More)
●​ Perform intermediate computations and learn complex patterns from the
input data.
●​ These layers consist of fully connected neurons. Each neuron in a hidden
layer is connected to every neuron in the preceding layer and every
neuron in the subsequent layer (hence, "fully connected").
●​ More hidden layers allow the network to capture more complex patterns.
●​ Key Operation (Per Neuron):
1.​ Weighted Sum (z): It calculates the sum of the products of inputs
(xi) from the previous layer and their corresponding weights (wi),
and then adds a bias (b).
z = Σwi xi + b
2. Activation: This sum is then passed through a non-linear
activation function (g) (like sigmoid, ReLU, tanh), which
introduces non-linearity. With enough hidden neurons, this allows the
network to approximate any continuous function on a bounded input region
to arbitrary accuracy (Universal Approximation Theorem).
a = g(z).

3. Output Layer
●​ Produces the final result of the network, which is the prediction.
●​ The number of neurons depends on the task (e.g., one neuron for binary
classification/regression, multiple neurons for multi-class classification).
●​ Activation: The activation function here is chosen based on the task:
o​ Regression: Often uses a linear (identity) activation function.
o​ Binary Classification: Often uses the Sigmoid function to output
a probability between 0 and 1.
o​ Multi-Class Classification: Often uses the Softmax function to
output a probability distribution over the classes.

Working of a Multi-Layer Feed-Forward Network


The function of an MLFFNN is to find the optimal set of weights and biases
that minimize the difference between the network's output and the true target
output. This is achieved through a supervised learning process involving two
main phases: Forward Propagation and Backpropagation.

1. Forward Propagation
This is the process of generating a prediction from a given input. The
computation moves forward from input to output:
1.​ Input data enters the Input Layer.
2.​ Data is passed forward to the first Hidden Layer. Each neuron computes
its weighted sum and applies its activation function.
3.​ The output of the first hidden layer becomes the input for the next hidden
layer, and this process continues until the data reaches the Output Layer.
4.​ The output layer produces the network's final Prediction (ŷ).
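
The forward pass described in these steps can be sketched for a tiny network with one hidden layer. The weights below are hand-picked (not learned) so that the network computes XOR, a function a single perceptron cannot represent; the layer layout and function names are assumptions for this example.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j] holds the incoming weights of neuron j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


def forward(x, layers):
    """Feed x through each (weights, biases, activation) layer in turn."""
    a = x
    for weights, biases, activation in layers:
        a = dense(a, weights, biases, activation)
    return a


# Hand-picked parameters: 2 inputs -> 2 hidden ReLU neurons -> 1 linear output.
LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),
    ([[1.0, -2.0]], [0.0], identity),
]

predictions = {x: forward(list(x), LAYERS)[0] for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(predictions)  # {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
```

The output of each layer becomes the input of the next, exactly as in steps 2 and 3 above.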

2. Loss (or, Error) Calculation


After forward propagation, a Loss Function is used to quantify the difference
between the network's prediction (ŷ) and the actual true value (y).
●​ Example Loss Functions: Mean Squared Error (MSE) for regression,
Cross-Entropy for classification.
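
Both example loss functions are short to write out. The sketch below uses the binary form of cross-entropy and clips predictions by a small eps to avoid log(0); the function names and the eps value are assumptions for this example.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: average of (y - y_hat)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average of -[y*log(y_hat) + (1 - y)*log(1 - y_hat)]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


mse_value = mse([1.0, 0.0], [0.5, 0.5])
bce_value = binary_cross_entropy([1, 0], [0.5, 0.5])
print(mse_value)  # 0.25
print(bce_value)  # ln 2, about 0.6931
```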

3. Backpropagation (The Learning Mechanism)


Backpropagation is the algorithm that computes the gradient of the loss function
with respect to every weight in the network. This involves:
1.​ Backward Error Propagation: The error is calculated at the output layer
and then propagated backward through the network, layer by layer, using
the chain rule of calculus. This determines how much each individual
weight contributed to the final error.
2.​ Weight Update: An Optimization Algorithm (like Gradient Descent or
Adam) uses the calculated gradients to slightly adjust the weights and
biases. The adjustment is made in the direction that reduces the loss (or,
error). The size of this adjustment is controlled by the learning rate (η):
$w_{new} = w_{old} - \eta \frac{\partial E}{\partial w}$
where E is the error (or loss).

This entire process (Forward Pass, Loss Calculation, Backpropagation, Weight
Update) is repeated iteratively over the entire training dataset for many epochs
until the network's predictions are sufficiently accurate (i.e., the loss is
minimized).
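
The whole cycle can be sketched on the smallest possible network: one input, one sigmoid hidden neuron, and one linear output, trained on a single example with E = ½(ŷ − y)². The network shape, starting parameters, learning rate, and step count are all assumptions for illustration; the finite-difference function is included only to check the chain-rule gradients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    w1, b1, w2, b2 = params
    a = sigmoid(w1 * x + b1)          # hidden activation
    return a, w2 * a + b2             # (hidden activation, prediction)


def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def gradients(params, x, y):
    """Backpropagation: chain rule from the output back to each parameter."""
    w1, b1, w2, b2 = params
    a, y_hat = forward(params, x)
    d_out = y_hat - y                         # dE/dy_hat
    d_z = d_out * w2 * a * (1.0 - a)          # dE/dz through the sigmoid
    return [d_z * x, d_z, d_out * a, d_out]   # dE/dw1, dE/db1, dE/dw2, dE/db2


def numerical_gradients(params, x, y, h=1e-6):
    """Central finite differences, used only to check the backprop result."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += h
        down[i] -= h
        grads.append((loss(up, x, y) - loss(down, x, y)) / (2 * h))
    return grads


def train(params, x, y, eta=0.1, steps=200):
    """Gradient descent: w_new = w_old - eta * dE/dw for every parameter."""
    for _ in range(steps):
        grads = gradients(params, x, y)
        params = [p - eta * g for p, g in zip(params, grads)]
    return params


params = [0.5, 0.0, -0.5, 0.0]        # hypothetical starting values
x, y = 1.0, 1.0
initial_loss = loss(params, x, y)
final_loss = loss(train(params, x, y), x, y)
print(initial_loss, final_loss)
```

Real networks apply the same update to every weight in every layer and average the gradients over many training examples rather than one.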
