0% found this document useful (0 votes)
18 views151 pages

Understanding Intelligent Agents in AI

The document outlines the fundamentals of Artificial Intelligence (AI), focusing on intelligent agents, their characteristics, and various search algorithms. It defines key concepts such as rationality, types of agents, and advantages and disadvantages of AI, while also detailing the structure of intelligent agents using the PEAS model. Additionally, it discusses specific search algorithms like Breadth-First Search and Bidirectional Search, including their advantages, disadvantages, and examples.

Uploaded by

michal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views151 pages

Understanding Intelligent Agents in AI

The document outlines the fundamentals of Artificial Intelligence (AI), focusing on intelligent agents, their characteristics, and various search algorithms. It defines key concepts such as rationality, types of agents, and advantages and disadvantages of AI, while also detailing the structure of intelligent agents using the PEAS model. Additionally, it discusses specific search algorithms like Breadth-First Search and Bidirectional Search, including their advantages, disadvantages, and examples.

Uploaded by

michal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Course Code/Course Name: AL3391/ ARTIFICIAL INTELLIGENCE

UNIT I INTELLIGENT AGENTS

Introduction to AI – Agents and Environments – Concept of rationality – Nature of


environments Structure of agents. Problem solving agents – search algorithms –
uninformed search strategies.

PART-A (2 Marks)
1. Define Artificial intelligence.
"It is a branch of computer science by which we can create intelligent
machines whichcan behave like a human, think like humans, and able to
make decisions."
2. Define Rationality.

 Rationality is nothing but status of being reasonable, sensible, and


having goodsense of judgment.

 Rationality is concerned with expected actions and results depending


upon whatthe agent has perceived.

 Performing actions with the aim of obtaining useful information is


an importantpart of rationality.

3. List out the types of uninformed search Algorithm.


o Breadth-first search
o Uniform cost search
o Depth-first search
o Iterative deepening depth-first search
o Bidirectional Search

4. Define Agent.
In artificial intelligence, an agent is a computer program or system that is
designed to perceive its environment, make decisions and take actions to
achieve a specific goal or set of goals. The agent operates autonomously,
meaning it is not directly controlled by a human operator.

5. Define utility-based agents


A utility-based agent is an agent that acts based not only on what the goal is,
but the best way to reach that goal.

6. Define payoff?
Payoff function is used for modeling human behavior. Payoff function for a
player is a mapping from the cross-product of players strategy spaces to the
players set of payoffs.

7. Write advantages and disadvantages of AI.


Following are some main advantages of Artificial Intelligence:
 High Accuracy with less errors: AI machines or systems are prone

to less errors and high accuracy as it takes decisions as per pre-


experience or information.
 High-Speed: AI systems can be of very high-speed and fast-
decision making, because of that AI systems can beat a chess
champion in the Chess game.
 High reliability: AI machines are highly reliable and can perform
the same action multiple times with high accuracy.
 Useful for risky areas: AI machines can be helpful in situations
such as defusing a bomb, exploring the ocean floor, where to
employ a human can be risky.
 Digital Assistant: AI can be very useful to provide digital assistant
to the users such as AI technology is currently used by various E-
commerce websites to show the products as per customer
requirement.
 Useful as a public utility: AI can be very useful for public utilities
such as a self-driving car which can make our journey safer and
hassle-free, facial recognition for security purpose, Natural
language processing to communicate with the human in human-
language, etc.

Following are the disadvantages of AI:


 High Cost: The hardware and software requirement of AI is very
costly as it requires lots of maintenance to meet current world
requirements.
 Can't think out of the box: Even we are making smarter machines
with AI, but still they cannot work out of the box, as the robot will
only do that work for which they are trained, or programmed.
 No feelings and emotions: AI machines can be an outstanding
performer, but still it does not have the feeling so it cannot make
any kind of emotional attachment with human, and may
sometime be harmful for users if the proper care is not taken.
 Increase dependency on machines: With the increment of
technology, people are getting more dependent on devices and
hence they are losing their mental capabilities.

8. List out agent terminologies.


 Performance Measure of Agent − It is the criteria, which
determines howsuccessful an agent is.
 Behavior of Agent − It is the action that agent performs after
any given sequenceof percepts.
 Percept − It is agent’s perceptual inputs at a given instance.
 Percept Sequence − It is the history of all that an agent has perceived
till date.
 Agent Function − It is a map from the precept sequence to an action.

9. Define Rational agent.


A rational agent is an artificial intelligence (AI) component. It applies AI to
different real-world problems. As such, it chooses an action from a set of
distinct options. It has models that allow it to deal with unexpected variables
and always selects the best possible outcome from all the available options.

The term “rational agent,” however, is not only applied to a system. It can also
refer to a person, a company, or an application, practically anything or anyone
that makes rational decisions.
10. What is meant by informed search algorithm.
The algorithms of an informed search contain information regarding the
goal state. It helps an AI make more efficient and accurate searches. A
function obtains this data/info to estimate the closeness of a state to its goal in
the system. For example, Graph Search and Greedy Search.

11. Define DFS.


Depth-first search (DFS) is an algorithm for traversing or searching tree or
graph data structures. The algorithm starts at the root node (selecting some
arbitrary node as the root node in the case of a graph) and explores as far as
possible along each branch before backtracking.

12. Define BFS.


BFS, or Breadth-First Search, is a node method for obtaining the graph's
shortest path. It makes use of a queue data structure with FIFO (first in, first
out) ordering.
13. List down the characteristics of intelligent agent
 Learning
 Autonomy
 Informed decision-making
 Simple reflex agents
 Characteristics of intelligent agents
 Goals
 Perception
 Reduced costs
 The power of continuous adaptation
 Problem solving
 Reactivity
 Scalable and adaptable performance
 Actions

14. What are the steps involved to solve a problem in AI?


To solve a problem, an AI agent goes through a structured process that involves defining
the problem, gathering knowledge, exploring possible solutions, and executing the best
strategy.

15. Differentiate between Intelligence and Artificial Intelligence.


Intelligence (Human Intelligence)
 Source: Arises from biological evolution and development within the human brain.

 Cognitive Aspects: Involves consciousness, emotions, intuition, empathy, creativity, and complex
judgment.

 Learning: Humans learn through a combination of education, experience, and exposure, with the
ability to generalize knowledge and learn from fewer examples.

 Adaptability: Can adapt to new and unexpected situations by generalizing knowledge and using
common sense.

 Strengths: Excels in creative thinking, emotional understanding, ethical decision-making, and


complex problem-solving.

Artificial Intelligence (AI)

 Source: Machine-based, created by humans and driven by data, algorithms, and programming.

 Cognitive Aspects: Lacks consciousness, emotions, intuition, and self-awareness.

 Learning: Learns and improves performance through data and algorithms, requiring large amounts of
training.

 Adaptability: Limited to the data it's trained on; while it can adapt to new data, it can't truly generalize
across vastly different domains without specific programming.

 Strengths: Excels at high-speed data processing, performing repetitive tasks, and recognizing
patterns in vast datasets.

PART-B (15 Marks)


1. Explain with example about Bidirectional search Algorithm.

Bidirectional Search Algorithm:

Bidirectional search algorithm runs two simultaneous searches, one form initial state
called as forward-search and other from goal node called as backward-search, to find
the goal node. Bidirectional search replaces one single search graph with two small
subgraphs in which one starts the search from an initial vertex and other starts from
goal vertex. The search stops when these two graphs intersect each other.

Bidirectional search can use search techniques such as BFS, DFS, DLS, etc.
Advantages:
o Bidirectional search is fast.
o Bidirectional search requires less memory

Disadvantages:
o Implementation of the bidirectional search tree is difficult.
o In bidirectional search, one should know the goal state in advance .

Example

In the below search tree, bidirectional search algorithm is applied. This algorithm
divides one graph/tree into two sub-graphs. It starts traversing from node 1 in the
forward direction and starts from goal node 16 in the backward direction.

The algorithm terminates at node 9 where two searches meet.

Completeness:
Bidirectional Search is complete if we use BFS in both searches.
Time Complexity: Time complexity of bidirectional search using BFS is O(bd ).
Space Complexity: Space complexity of bidirectional search is O(bd ).
Optimal: Bidirectional search is Optimal.

[Link] detail about structure and nature of environments.


Nature of Environments:
The environment is the Task Environment (problem) for which the Rational Agent is the
solution. Any task environment is characterised on the basis of PEAS.
1. Performance – What is the performance characteristic which would either make
the agent successful or not.
For example, as per the previous example clean floor, optimal energy consumption
might be performance measures.
2. Environment – Physical characteristics and constraints expected. For example,
wood floors, furniture in the way etc
3. Actuators – The physical or logical constructs which would take action. For
example for the vacuum cleaner, these are the suction pumps
4. Sensors – Again physical or logical constructs which would
sense the environment.

Rational Agents could be physical agents like the one described above or it could also be a
program that operates in a non-physical environment like an operating system. Imagine a bot
web site operator designed to scan Internet news sources and show the interesting items to its
users, while selling advertising space to generate revenue.
As another example, consider an online tutoring system

Agent Performance Environment Actuator Sensor

Math E Computer display system for


SLA defined score Student, Teacher, Keyboard,
learning exercises, corrections,
on the test parents Mouse
system feedback

 Observable – Full or Partial? If the agents sensors get full access then they do not need to
pre- store any information. Partial may be due to inaccuracy of sensors or incomplete
information about an environment, like limited access to enemy territory
 Number of Agents – For the vacuum cleaner, it works in a single agent environment but for
driver-less taxis, every driver-less taxi is a separate agent and hence multi agent
environment
 Deterministic – The number of unknowns in the environment which affect the predictability
of the environment. For example, floor
space for cleaning is mostly
deterministic, the furniture is where it is
most of the time but taxi driving on a
road is non-deterministic.
 Discrete – Does the agent respond
when needed or does it have to
continuously scan the environment.
Driver-less is continuous, online tutor is
discrete
 Static – How often does the
environment change. Can the agent
learn about the environment and
always do the same thing?
 Episodic – If the response to a certain
precept is not dependent on the previous one i.e. it is stateless (static methods in Java)
then it is discrete. If the decision taken now influences the future decisions then it is a
sequential environment.

[Link] structure of Intelligent agents in detail.


Intelligent Agents:

An intelligent agent is an autonomous entity which act upon an environment using sensors and
actuators for achieving goals. An intelligent agent may learn from the environment to achieve
their goals. A thermostat is an example of an intelligent agent.

Following are the main four rules for an AI agent:

o Rule 1: An AI agent must have the ability to perceive the environment.


o Rule 2: The observation must be used to make decisions.
o Rule 3: Decision should result in an action.
o Rule 4: The action taken by an AI agent must be a rational action.

Rational Agent:

A rational agent is an agent which has clear preference, models uncertainty, and acts in a way to
maximize its performance measure with all possible actions.

A rational agent is said to perform the right things. AI is about creating rational agents to use for
game theory and decision theory for various real-world scenarios.

For an AI agent, the rational action is most important because in AI reinforcement learning
algorithm, for each best possible action, agent gets the positive reward and for each wrong
action, an agent gets a negative reward.

Note: Rational agents in AI are very similar to intelligent agents.

Rationality:

The rationality of an agent is measured by its performance measure. Rationality can be judged on
the basis of following points:

o Performance measure which defines the success criterion.


o Agent prior knowledge of its environment.
o Best possible actions that an agent can perform.
o The sequence of percepts.

Note: Rationality differs from Omniscience because an Omniscient agent knows the actual
outcome of its action and act accordingly, which is not possible in reality.

Structure of an AI Agent
The task of AI is to design an agent program which implements the agent function. The structure
of an intelligent agent is a combination of architecture and agent program. It can be viewed as:

Agent = Architecture + Agent program

Following are the main three terms involved in the structure of an AI agent:

Architecture: Architecture is machinery that an AI agent executes on.

Agent Function: Agent function is used to map a percept to an action

f:P* → A

Agent program: Agent program is an implementation of agent function. An agent program


executes on the physical architecture to produce function f.

PEAS Representation

PEAS is a type of model on which an AI agent works upon. When we define an AI agent or
rational agent, then we can group its properties under PEAS representation model. It is made up
of four words:

o P: Performance measure
o E: Environment
o A: Actuators
o S: Sensors

Here performance measure is the objective for the success of an agent's behavior.

PEAS for self-driving cars:

Let's suppose a self-driving car then PEAS representation will be:

Performance: Safety, time, legal drive, comfort Environment:

Roads, other vehicles, road signs, pedestrian Actuators:

Steering, accelerator, brake, signal, horn

Sensors: Camera, GPS, speedometer, odometer, accelerometer, sonar.

Example of Agents with their PEAS representation

Agent Performance measure Environment Actuators Sensors


1. Keyboard
o Healthy patient o Patient o Tests
Medical (Entry of symptoms)
Diagnose o Minimized cost o Hospital o Treatments
o Staff

2.
o Cleanness o Room o Wheels o Camera
Vacuum
Cleaner
o Efficiency o Table o Brushes o Dirt detection
o Battery life o Wood floor o Vacuum sensor
o Security o Carpet Extractor o Cliff sensor
o Various o Bump Sensor
obstacles o Infrared Wall
Sensor

3. Part - o Percentage of o Conveyor


parts in correct with parts,
belt o Jointed Arms o Camera
picking
Robot bins. o Hand o Joint angle
o Bins
sensors.
[Link] detail about BFS search algorithm
Breadth-first Search:

a. Breadth-first search is the most common search strategy for traversing a


tree or graph. This algorithm searches breadthwise in a tree or graph, so it is
called breadth-first search.
b. BFS algorithm starts searching from the root node of the tree and expands all
successor node at the current level before moving to nodes of next level.
c. The breadth-first search algorithm is an example of a general-graph search algorithm.
d. Breadth-first search implemented using FIFO queue data structure.

Example:

In the below tree structure, we have shown the traversing of the tree using BFS algorithm from
the root node S to goal node K. BFS search algorithm traverse in layers, so it will follow the path
which is shown by the dotted arrow, and the traversed path will be:
S---> A--->B---->C--->D---->G--->H--->E---->F---->I ---- >K

Advantages:

BFS will provide a solution if any solution exists.


If there are more than one solutions for a given problem, then BFS will provide the minimal solution which
requires the least number of steps.
Disadvantages:
It requires lots of memory since each level of the tree must be saved into memory to expand
the next level.
BFS needs lots of time if the solution is far away from the root node.
Time Complexity: Time Complexity of BFS algorithm can be obtained by the number of nodes
traversed in BFS until the shallowest Node. Where the d= depth of shallowest solution and b is a
node at every state.

T (b) = 1+b2+b3+ ..... + bd= O (bd)

Space Complexity: Space complexity of BFS algorithm is given by the Memory size of frontier
which is O(bd).

Completeness: BFS is complete, which means if the shallowest goal node is at some finite
depth, then BFS will find a solution.

Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the node.

4. Explain detail about DFS search algorithm.


1. Depth-first Search

o Depth-first search isa recursive algorithm for traversing a tree or graph data structure.
o It is called the depth-first search because it starts from the root node and follows
each path to its greatest depth node before moving to the next path.
o DFS uses a stack data structure for its implementation.
o The process of the DFS algorithm is similar to the BFS algorithm.

Note: Backtracking is an algorithm technique for finding all possible solutions using recursion.

Advantage:

o DFS requires very less memory as it only needs to store a stack of the nodes on the path
from root node to the current node.
o It takes less time to reach to the goal node than BFS algorithm (if it traverses in the right
path).

Disadvantage:

o There is the possibility that many states keep re-occurring, and there is no guarantee of
finding the solution.
o DFS algorithm goes for deep down searching and sometime it may go to the infinite loop.
Completeness: DFS search algorithm is complete within finite state space as it will expand
every node within a limited search tree.

Time Complexity: Time complexity of DFS will be equivalent to the node traversed by the
algorithm. It is given by:

T(n)= 1+ n2+ n3 + ....... + nm=O(nm)

Where, m= maximum depth of any node and this can be much larger than d (Shallowest
solution depth)

Space Complexity: DFS algorithm needs to store only single path from the root node, hence
space complexity of DFS is equivalent to the size of the fringe set, which is O(bm).

Optimal: DFS search algorithm is non-optimal, as it may generate a large number of steps or
high cost to reach to the goal node.

Example:

In the below search tree, we have shown the flow of depth-first search, and it will follow the
order as:

Root node--->Left node ---- > right node.

It will start searching from root node S, and traverse A, then B, then D and E, after traversing E,
it will backtrack the tree as E has no other successor and still goal node is not found. After
backtracking it will traverse node C and then G, and here it will terminate as it found goal node.
5. Explain detail about properties of agent Environments.

Agent Environment in AI:

An environment is everything in the world which surrounds the agent, but it is not a part of an
agent itself. An environment can be described as a situation in which an agent is present.

The environment is where agent lives, operate and provide the agent with something to sense and
act upon it. An environment is mostly said to be non-feministic.

Features of Environment

Environment can have various features from the point of view of an agent:

1. Fully observable vs Partially Observable


2. Static vs Dynamic
3. Discrete vs Continuous
4. Deterministic vs Stochastic
5. Single-agent vs Multi-agent
6. Episodic vs sequential
7. Known vs Unknown
8. Accessible vs Inaccessible

1. Fully observable vs Partially Observable:

o If an agent sensor can sense or access the complete state of an environment at each
point of time then it is a fully observable environment, else it is partially observable.
o A fully observable environment is easy as there is no need to maintain the internal state
to keep track history of the world.
o An agent with no sensors in all environments then such an environment is
called as unobservable.

2. Deterministic vs Stochastic:

o If an agent's current state and selected action can completely determine the next state
of the environment, then such environment is called a deterministic environment.
o A stochastic environment is random in nature and cannot be determined completely by
an agent.
o In a deterministic, fully observable environment, agent does not need to worry
about uncertainty.

3. Episodic vs Sequential:

o In an episodic environment, there is a series of one-shot actions, and only the


current percept is required for the action.
o However, in Sequential environment, an agent requires memory of past actions
to determine the next best actions.

4. Single-agent vs Multi-agent

o If only one agent is involved in an environment, and operating by itself then such
an environment is called single agent environment.
o However, if multiple agents are operating in an environment, then such an environment
is called a multi-agent environment.
o The agent design problems in the multi-agent environment are different from single
agent environment.

5. Static vs Dynamic:

o If the environment can change itself while an agent is deliberating then such
environment is called a dynamic environment else it is called a static environment.
o Static environments are easy to deal because an agent does not need to continue
looking at the world while deciding for an action.
o However for dynamic environment, agents need to keep looking at the world at
each action.
o Taxi driving is an example of a dynamic environment whereas Crossword puzzles are an
example of a static environment.

6. Discrete vs Continuous:

o If in an environment there are a finite number of percepts and actions that can be
performed within it, then such an environment is called a discrete environment else it is
called continuous environment.
o A chess gamecomes under discrete environment as there is a finite number of moves
that can be performed.
o A self-driving car is an example of a continuous environment.

7. Known vs Unknown

o Known and unknown are not actually a feature of an environment, but it is an


agent's state of knowledge to perform an action.
o In a known environment, the results for all actions are known to the agent. While
in unknown environment, agent needs to learn how it works in order to perform an
action.
o It is quite possible that a known environment to be partially observable and an
Unknown environment to be fully observable.
8. Accessible vs Inaccessible

o If an agent can obtain complete and accurate information about the state's
environment, then such an environment is called an Accessible environment else it is
called inaccessible.
o An empty room whose state can be defined by its temperature is an example of an
accessible [Link] about an event on earth is an example of
Inaccessible environment.
[Link] details about any two uninformed search algorithm with an example
2. Depth-Limited Search Algorithm:
A depth-limited search algorithm is similar to depth-first search with a predetermined limit. Depth-
limited search can solve the drawback of the infinite path in the Depth-first search. In this algorithm,
the node at the depth limit will treat as it has no successor nodes further.
Depth-limited search can be terminated with two Conditions of failure:

o Standard failure value: It indicates that problem does not have any solution.
o Cutoff failure value: It defines no solution for the problem within a given depth limit.

Advantages:
Depth-limited search is Memory efficient.

Disadvantages:

o Depth-limited search also has a disadvantage of incompleteness.

Example:

Completeness: DLS search algorithm is complete if the solution is above the depth-limit.

Time Complexity: Time complexity of DLS algorithm is O(bℓ). Space

Complexity: Space complexity of DLS algorithm is O(b×ℓ).

Optimal: Depth-limited search can be viewed as a special case of DFS, and it is also not optimal
even if ℓ>d.

3. Uniform-cost Search Algorithm:


Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This
algorithm comes into play when a different cost is available for each edge. The primary goal of
the uniform-cost search is to find a path to the goal node which has the lowest cumulative cost.
Uniform-cost search expands nodes according to their path costs form the root node. It can be
used to solve any graph/tree where the optimal cost is in demand. A uniform-cost search
algorithm is implemented by the priority queue. It gives maximum priority to the lowest
cumulative cost. Uniform cost search is equivalent to BFS algorithm if the path cost of all edges
is the same.
Advantages:

o Uniform cost search is optimal because at every state the path with the least cost
is chosen.

Disadvantages:

o It does not care about the number of steps involve in searching and only concerned
about path cost. Due to which this algorithm may be stuck in an infinite loop.

Example:

Completeness:
Uniform-cost search is complete, such as if there is a solution, UCS will find it.

Time Complexity:

Let C* is Cost of the optimal solution, and ε is each step to get closer to the goal node. Then
the number of steps is = C*/ε+1. Here we have taken +1, as we start from state 0 and end to
C*/ε.

Hence, the worst-case time complexity of Uniform-cost search isO(b1 + [C*/ε])/.

Space Complexity:
The same logic is for space complexity so, the worst-case space complexity of Uniform-cost
search is O(b1 + [C*/ε]).

Optimal:

Uniform-cost search is always optimal as it only selects a path with the lowest path cost.
UNIT II PROBLEM SOLVING
Heuristic search strategies – Heuristic functions. Local search and optimization problems –
local search in continuous space – search with non-deterministic actions – search in partially
observable environments – online search agents and unknown environments.
PART-A (2 Marks)
1. What is heuristic functions?
 Heuristics function: Heuristic is a function which is used in
Informed Search, andit finds the most promising path.
 It takes the current state of the agent as its input and
produces the estimation ofhow close agent is from the goal.
 The heuristic method, however, might not always give the
best solution, but itguaranteed to find a good solution in
reasonable time.
 Heuristic function estimates how close a state is to the goal.
 It is represented by h(n), and it calculates the cost of an optimal
path between thepair of states. The value of the heuristic function
 is always positive.
 Admissibility of the heuristic
function is given as:h(n) <=
h*(n)
 Here h(n) is heuristic cost, and h*(n) is the estimated cost. Hence
heuristic costshould be less than or equal to the estimated cost.

2. Define greedy search.


 Greedy best-first search algorithm always selects the path which appears best
atthat moment.
 It is the combination of depth-first search and breadth-first search algorithms.
 It uses the heuristic function and search.
 Best-first search allows us to take the advantages of both algorithms.
 With the help of best-first search, at each step, we can choose the most promising
node. In the best first search algorithm, we expand the node which is closest to the
goal node and the closest cost is estimated by heuristic function, i.e.
f(n)= g(n).
Where, h(n)= estimated cost from node n to the goal.
The greedy best first algorithm is implemented by the priority queue.
3. What is A*search.
 A* search is the most commonly known form of best-first search.
 It uses heuristic function h(n), and cost to reach the node n from the start state g(n).
 It has combined features of UCS and greedy best-first search, by which it solve the
problem efficiently.
 A* search algorithm finds the shortest path through the search space using the
heuristic function.
 This search algorithm expands less search tree and provides optimal result faster. A*
algorithm is similar to UCS except that it uses g(n)+h(n) instead of g(n).
In A* search algorithm, we use search heuristic as well as the cost to reach the node. Hence
we

4. Write a advantages and disadvantages of A* search.

Advantages:
 A* search algorithm is the best algorithm than other search
algorithms.
 A* search algorithm is optimal and complete.
 This algorithm can solve very complex problems.

Disadvantages
:
 It does not always produce the shortest path as it mostly
based on heuristics andapproximation.
 A* search algorithm has some complexity issues.
 The main drawback of A* is memory requirement as
it keeps all generatednodes in the memory, so it is not
practical for various large-scale problems.
5. Write a advantages and disadvantages of greedy search.
The main advantage of the greedy method is that it is relatively easy to
implement and understand. However, there are some disadvantages to
using this method. First, the greedy method is not guaranteed to find the
best solution. Second, it can be quite slow.

6. Define pure heuristic search


 Pure heuristic search is the simplest form of heuristic search
algorithms.
 It expands nodes based on their heuristic value h(n). It
maintains two lists, OPEN and CLOSED list. In the CLOSED list,
it places those nodes which have already expanded and in the
OPEN list, it places nodes which have yet not been expanded.
 On each iteration, each node n with the lowest heuristic value
is expanded and generates all its successors and n is placed to
the closed list.
 The algorithm continues unit a goal state is found.
 In the informed search we will discuss two main algorithms
which are given below:
 Best First Search Algorithm(Greedy search)
 A* Search Algorithm

7. Write a properties of heuristic search algorithm.


Use of heuristic function in a heuristic search algorithm leads to
following propertiesof a heuristic search algorithm:
 Admissible Condition: An algorithm is said to be admissible,
if it returns anoptimal solution.
 Completeness: An algorithm is said to be complete, if it
terminates with asolution (if the solution exists).

Dominance Property: If there are two admissible heuristic


algorithms A1 and A2 having h1 and h2 heuristic
functions, then A1 is said to dominate A2 if h1 is better than h2
for all the values of node n.
 Optimality Property: If an algorithm is complete, admissible,
and dominating other algorithms, it will be the best one and will
definitely give an optimal solution.

8. Define hill climbing algorithm.


 Hill climbing algorithm is a local search algorithm which
continuously moves inthe direction of increasing elevation/value to find
the peak of the mountain or best solution to the problem.
 It terminates when it reaches a peak value where no neighbor has a higher
value.
 Hill climbing algorithm is a technique which is used for optimizing the
mathematical problems. One of the widely discussed examples of Hill
climbing algorithm is Traveling-salesman Problem in which we need to
minimize the distance traveled by the salesman.
 It is also called greedy local search as it only looks to its good
immediate neighbor state and not beyond that.
 A node of hill climbing algorithm has two components which are state and
value.
 Hill Climbing is mostly used when a good heuristic is available.
 In this algorithm, we don't need to maintain and handle the search tree
or graph asit only keeps a single current state.

9. List out the features of hill climbing.


Following are some main features of Hill Climbing Algorithm:
 Generate and Test variant: Hill Climbing is the variant of Generate
and Test method. The Generate and Test method produce feedback
which helps to decide which direction to
move in the search space.

 Greedy approach: Hill-climbing algorithm search moves in the direction


which optimizes the cost.
 No backtracking: It does not backtrack the search space, as it does not
remember the previous states.
10. List out types of hill climbing algorithm.
 Simple hill Climbing:
 Steepest-Ascent hill-climbing:
 Stochastic hill Climbing:
11. Define online search Agent.
 After each action, an online agent receives a percept telling it
what state it has reached; from this information, it can
augment its map of the environment.
 The current map is used to decide where to go next.
 This interleaving of planning and action means that online
search algorithms are quite different from the offline search
algorithms we have seen previously.
 For example, offline algorithms such as A can expand a node in
one part of the space and then immediately expand a node in
another part of the space, because node expansion involves
simulated rather than real actions.
 An online algorithm, on the other hand, can discover
successors only for a node that it physically occupies.
 To avoid traveling all the way across the tree to expand the
next node, it seems better to expand nodes in a local order.
 Depth-first search has exactly this property because (except
when backtracking) the next node expanded is a child of the
previous node expanded.

12. What is Genetic algorithm?


A genetic algorithm (GA) is a heuristic search algorithm used to solve
search and optimization problems. This algorithm is a subset of
evolutionary algorithms, which are used in computation. Genetic
algorithms employ the
concept of genetics and natural selection to provide solutions to problems.

13. Define annealing.


Annealing is a heat treatment process that alters the microstructure of a material,
most commonly metal or glass, to change its properties. The process involves
heating the material above its recrystallization temperature, holding it at that
temperature for a specific duration, and then slowly cooling it.
14. Define Global minimum and Global maximum.
A global minimum is the lowest value a function or a set of numbers reaches across
its entire domain, while a global maximum is the highest value it reaches over the
same domain. Unlike local extrema, which are the highest or lowest points in a
small interval, global extrema represent the absolute highest and lowest
points, respectively, for the entire set or function.

15. List the various search strategies.


Search strategies are broadly categorized into three main types: uninformed,
informed, and adversarial.
Uninformed (Blind) search strategies
Uninformed search algorithms use no additional information about the goal's
location beyond the problem's definition. They explore the search space
systematically but blindly.

 Breadth-First Search (BFS): This algorithm explores the search space


level by level, expanding all nodes at the present depth before moving to the
next level.
o Best for: Finding the shortest path in an unweighted graph.
o Downsides: Can be memory-intensive for large search spaces.
 Depth-First Search (DFS): This recursive algorithm explores as far as
possible along each branch before backtracking.
o Best for: Finding a solution deep within a large search tree with low
memory usage.
o Downsides: Can get stuck in infinite loops and is not guaranteed to
find the shortest path.
 Depth-Limited Search (DLS): A variation of DFS that limits the search
depth to avoid infinite loops.
o Downsides: Can miss solutions that lie beyond the set depth limit.
 Iterative Deepening Depth-First Search (IDDFS): This combines the
benefits of BFS and DFS by repeatedly running DLS with an increasing
depth limit.
o Best for: Optimality and completeness, with low memory use.
 Uniform-Cost Search (UCS): This algorithm expands the node with the
lowest cumulative path cost from the start.
o Best for: Finding the least-cost path in a weighted graph.
 Bidirectional Search: This runs two simultaneous searches, one forward
from the initial state and one backward from the goal state, meeting in the
middle.
o Best for: Significantly reducing the search space and time
complexity.

Informed (Heuristic) search strategies


Informed search strategies use a heuristic function, or a "rule of thumb," to estimate
the cost or distance to the goal. This extra knowledge helps guide the search more
efficiently.

 Greedy Best-First Search: This algorithm expands the node that appears
closest to the goal based only on the heuristic function,

h(n)h of n
ℎ(𝑛)

o Best for: Quickly finding a solution in large state spaces.


o Downsides: It does not guarantee an optimal solution.
 A Search:* This widely used algorithm is a combination of UCS and
Greedy search. It evaluates nodes based on the function

f(n)=g(n)+h(n)f of n equals g of n plus h of n

𝑓(𝑛)=𝑔(𝑛)+ℎ(𝑛)

, where

g(n)g of n

𝑔(𝑛)

is the cost from the start and

h(n)h of n

ℎ(𝑛)

is the heuristic estimate to the goal.

o Best for: Finding the optimal path in a weighted graph.


o Downsides: Can require significant memory for large search spaces.
 Recursive Best-First Search (RBFS): A memory-efficient version of A*
that uses recursion to explore the most promising path first.
 Iterative Deepening A (IDA):** A memory-efficient variation of A* that
performs a series of depth-first searches with increasing cost thresholds.
 Beam Search: An optimization of BFS that limits the number of nodes
explored at each level to a "beam width," trading optimality for efficiency.
 Hill Climbing: A local search algorithm that iteratively moves towards a
better state by changing a single element of the solution.
o Downsides: It can get stuck in local optima.
 Simulated Annealing: A variant of hill climbing that allows for occasional
"bad" moves to escape local optima.
PART-B (13 Marks)
1. Explain details about greedy search algorithm in detail.

1.) Best-first Search Algorithm (Greedy Search):

Greedy best-first search algorithm always selects the path which appears best at that moment. It
is the combination of depth-first search and breadth-first search algorithms. It uses the heuristic
function and search. Best-first search allows us to take the advantages of both algorithms. With
the help of best-first search, at each step, we can choose the most promising node. In the best
first search algorithm, we expand the node which is closest to the goal node and the closest cost
is estimated by heuristic function, i.e.

f(n)= g(n).

Were, h(n)= estimated cost from node n to the goal.

The greedy best first algorithm is implemented by the priority queue.

Best first search algorithm:


o Step 1: Place the starting node into the OPEN list.
o Step 2: If the OPEN list is empty, Stop and return failure.
o Step 3: Remove the node n, from the OPEN list which has the lowest value of h(n), and
places it in the CLOSED list.
o Step 4: Expand the node n, and generate the successors of node n.
o Step 5: Check each successor of node n, and find whether any node is a goal node or
not. If any successor node is goal node, then return success and terminate the search,
else proceed to Step 6.
o Step 6: For each successor node, algorithm checks for evaluation function f(n), and then
check if the node has been in either OPEN or CLOSED list. If the node has not been in
both list, then add it to the OPEN list.
o Step 7: Return to Step 2.
Advantages:
o Best first search can switch between BFS and DFS by gaining the advantages of both the
algorithms.
o This algorithm is more efficient than BFS and DFS algorithms.

Disadvantages:
o It can behave as an unguided depth-first search in the worst case scenario.
o It can get stuck in a loop as DFS.
o This algorithm is not optimal.

Example:

Consider the below search problem, and we will traverse it using greedy best-first search. At
each iteration, each node is expanded using evaluation function f(n)=h(n) , which is given in the
below table.
In this search example, we are using two lists which are OPEN and CLOSED Lists.
Following are the iteration for traversing the above example.

Expand the nodes of S and put in the CLOSED list

Initialization: Open [A, B], Closed [S]

Iteration 1: Open [A], Closed [S, B]

Iteration 2: Open [E, F, A], Closed [S,


B]

: Open [E, A], Closed [S, B, F]

Iteration 3: Open [I, G, E, A], Closed [S, B, F]

: Open [I, E, A], Closed [S, B, F, G]

Hence the final solution path will be: S----> B----->F------- > G

Time Complexity: The worst case time complexity of Greedy best first search is O(bm).

Space Complexity: The worst case space complexity of Greedy best first search is
O(bm). Where, m is the maximum depth of the search space.

Complete: Greedy best-first search is also incomplete, even if the given state space is finite.

Optimal: Greedy best first search algorithm is not optimal.


2. Explain detail about A*search algorithm with an example.
A* Search Algorithm:
A* search is the most commonly known form of best-first search. It uses heuristic function h(n),
and cost to reach the node n from the start state g(n). It has combined features of UCS and
greedy best-first search, by which it solve the problem efficiently. A* search algorithm finds the
shortest path through the search space using the heuristic function. This search algorithm

expands less search tree and provides optimal result faster. A* algorithm is similar to UCS
except that it uses g(n)+h(n) instead of g(n).

In A* search algorithm, we use search heuristic as well as the cost to reach the node. Hence we
can combine both costs as following, and this sum is called as a fitness number.

Algorithm of A* search:

Step1: Place the starting node in the OPEN list.

Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure and stops.

Step 3: Select the node from the OPEN list which has the smallest value of evaluation function
(g+h), if node n is goal node then return success and stop, otherwise

Step 4: Expand node n and generate all of its successors, and put n into the closed list. For each
successor n', check whether n' is already in the OPEN or CLOSED list, if not then compute
evaluation function for n' and place into Open list.

Step 5: Else if node n' is already in OPEN and CLOSED, then it should be attached to the back
pointer which reflects the lowest g(n') value.

Step 6: Return to Step 2.

Advantages:

o A* search algorithm is the best algorithm than other search algorithms.


o A* search algorithm is optimal and complete.
o This algorithm can solve very complex problems.

Disadvantages:
o It does not always produce the shortest path as it mostly based on heuristics
and approximation.
o A* search algorithm has some complexity issues.
o The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.

Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of
all states is given in the below table so we will calculate the f(n) of each state using the formula
f(n)= g(n) + h(n), where g(n) is the cost to reach any node from start state.

Here we will use OPEN and CLOSED list.

Solution:
Initialization: {(S, 5)}

Iteration1: {(S--> A, 4), (S-->G, 10)}

Iteration2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}

Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}

Iteration 4 will give the final result, as S--->A--->C--->G it provides the optimal path with cost 6.

Points to remember:

o A* algorithm returns the path which occurred first, and it does not search for
all remaining paths.
o The efficiency of A* algorithm depends on the quality of heuristic.
o A* algorithm expands all nodes which satisfy the condition f(n)<="" li="">

Complete: A* algorithm is complete as long as:

o Branching factor is finite.


o Cost at every action is fixed.

Optimal: A* search algorithm is optimal if it follows below two conditions:

o Admissible: the first condition requires for optimality is that h(n) should be an
admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature.
o Consistency: Second required condition is consistency for only A* graph-search.

If the heuristic function is admissible, then A* tree search will always find the least cost path.

Time Complexity: The time complexity of A* search algorithm depends on heuristic function,
and the number of nodes expanded is exponential to the depth of solution d. So the time
complexity is O(b^d), where b is the branching factor.

Space Complexity: Th
.3. Explain detail about Heuristic functions in detail.

Heuristic Functions in AI: As we have already seen that an informed search make use of
heuristic functions in order to reach the goal node in a more prominent way. Therefore, there
are several pathways in a search tree to reach the goal node from the current node. The
selection of a good heuristic function matters certainly. A good heuristic function is determined
by its efficiency. More is the information about the problem, more is the processing time.

Some toy problems, such as 8-puzzle, 8-queen, tic-tac-toe, etc., can be solved more efficiently
with the help of a heuristic function. Let’s see how
Consider the following 8-puzzle problem where we have a start state and a goal state. Our task is
to slide the tiles of the current/start state and place it in an order followed in the goal state. There
can be four moves either left, right, up, or down. There can be several ways to convert the
current/start state to the goal state, but, we can use a heuristic function h(n) to solve the problem
more efficiently.

A heuristic function for the 8-puzzle problem is defined below:

h(n)=Number of tiles out of position.


So, there is total of three tiles out of position i.e., 6,5 and 4. Do not count the empty tile present
in the goal state). i.e. h(n)=3. Now, we require to minimize the value of h(n) =0.
We can construct a state-space tree to minimize the h(n) value to 0, as shown below:
It is seen from the above state space tree that the goal state is minimized from h(n)=3 to h(n)=0. However, we can create
and use several heuristic functions as per the reqirement. It is also clear from the above example that a heuristic
function h(n) can be defined as the information required to solve a given problem more efficiently. The information can
be related to the nature of the state, cost of transforming from one state to another, goal node characterstics, etc.,
which is expressed as a heuristic function//.

4. Explain detail about Hill Climbing Algorithm with an example.

Hill Climbing Algorithm

It is a technique for optimizing the mathematical problems. Hill Climbing is widely used when a
good heuristic is available.

It is a local search algorithm that continuously moves in the direction of increasing


elevation/value to find the mountain's peak or the best solution to the problem. It terminates
when it reaches a peak value where no neighbor has a higher value. Traveling-salesman Problem
is one of the widely discussed examples of the Hill climbing algorithm, in which we need to
minimize the distance traveled by the salesman.

It is also called greedy local search as it only looks to its good immediate neighbor state and not
beyond that. The steps of a simple hill-climbing algorithm are listed below:
Step 1: Evaluate the initial state. If it is the goal state, then return success and Stop.

Step 2: Loop Until a solution is found or there is no new operator left to apply.

Step 3: Select and apply an operator to the current state.

Step 4: Check new state:

If it is a goal state, then return to success and quit.

Else if it is better than the current state, then assign a new state as a current state.

Else if not better than the current state, then return to step2.

Step 5: Exit.
5. Explain detail about Steepest-Ascent hill climbing and stochastic hill climbing algorithm.

Hill Climbing Algorithm in AI

Hill Climbing Algorithm: Hill climbing search is a local search problem. The purpose of the hill
climbing search is to climb a hill and reach the topmost peak/ point of that hill. It is based on
the heuristic search technique where the person who is climbing up on the hill estimates the
direction which will lead him to the highest peak.
State-space Landscape of Hill climbing algorithm
To understand the concept of hill climbing algorithm, consider the below landscape representing
the goal state/peak and the current state of the climber. The topographical regions shown in the
figure can be defined as:

 Global Maximum: It is the highest point on the hill, which is the goal state.
 Local Maximum: It is the peak higher than all other peaks but lower than the global
maximum.
 Flat local maximum: It is the flat area over the hill where it has no uphill or downhill. It
is a saturated point of the hill.
 Shoulder: It is also a flat area where the summit is possible.
 Current state: It is the current position of the person.

Types of Hill climbing search algorithm


There are following types of hill-climbing search:

 Simple hill climbing


 Steepest-ascent hill climbing
 Stochastic hill climbing
 Random-restart hill climbing
Simple hill climbing search
Simple hill climbing is the simplest technique to climb a hill. The task is to reach the highest
peak of the mountain. Here, the movement of the climber depends on his move/steps. If he finds
his next step better than the previous one, he continues to move else remain in the same state.
This search focus only on his previous and next step.
Simple hill climbing Algorithm

1. Create a CURRENT node, NEIGHBOUR node, and a GOAL node.


2. If the CURRENT node=GOAL node, return GOAL and terminate the search.
3. Else CURRENT node<= NEIGHBOUR node, move ahead.
4. Loop until the goal is not reached or a point is not found.

Steepest-ascent hill climbing


Steepest-ascent hill climbing is different from simple hill climbing search. Unlike simple hill
climbing search, It considers all the successive nodes, compares them, and choose the node
which is closest to the solution. Steepest hill climbing search is similar to best-first
search because it focuses on each node instead of one.
Note: Both simple, as well as steepest-ascent hill climbing search, fails when there is no closer
node.
Steepest-ascent hill climbing algorithm

1. Create a CURRENT node and a GOAL node.


2. If the CURRENT node=GOAL node, return GOAL and terminate the search.
3. Loop until a better node is not found to reach the solution.
4. If there is any better successor node present, expand it.
5. When the GOAL is attained, return GOAL and terminate.

Stochastic hill climbing


Stochastic hill climbing does not focus on all the nodes. It selects one node at random and
decides whether it should be expanded or search for a better one.

Random-restart hill climbing


Random-restart algorithm is based on try and try strategy. It iteratively searches the node and
selects the best one at each step until the goal is not found. The success depends most commonly
on the shape of the hill. If there are few plateaus, local maxima, and ridges, it becomes easy to
reach the destination.

Limitations of Hill climbing algorithm


Hill climbing algorithm is a fast and furious approach. It finds the solution state rapidly because
it is quite easy to improve a bad state. But, there are following limitations of this search:

 Local Maxima: It is that peak of the mountain which is highest than all its neighboring
states but lower than the global maxima. It is not the goal peak because there is another peak higher than
it.

 Plateau: It is a flat surface area where no uphill exists. It becomes difficult for the
climber to decide that in which direction he should move to reach the goal point.
Sometimes, the person gets lost in the flat area.

 Ridges: It is a challenging problem where the person finds two or more local maxima of
the same height commonly. It becomes difficult for the person to navigate the right
point and stuck to that point itself.
6. Explain detail about Online search Agent with an example.
2. Online Search Agents and Unknown Environments:
7. Write brief about following local search algorithms: (i)Simulated annealing
(ii) Local beam search

3. Local Search Algorithms and Optimization Problem:


The informed and uninformed search expands the nodes systematically in two ways:

 keeping different paths in the memory and


 selecting the best suitable path,

Which leads to a solution state required to reach the goal node. But beyond these “classical
search algorithms," we have some “local search algorithms” where the path cost does not
matters, and only focus on solution-state needed to reach the goal node.

A local search algorithm completes its task by traversing on a single current node rather than
multiple paths and following the neighbors of that node generally.

Although local search algorithms are not systematic, still they have the
following two advantages:

 Local search algorithms use a very little or constant amount of memory as they
operate only on a single path.
 Most often, they find a reasonable solution in large or infinite state spaces where
the classical or systematic algorithms do not work.

Does the local search algorithm work for a pure optimized problem?
Yes, the local search algorithm works for pure optimized problems. A pure optimization problem
is one where all the nodes can give a solution. But the target is to find the best state out of all
according to the objective function. Unfortunately, the pure optimization problem fails to find
high-quality solutions to reach the goal state from the current state.
Note: An objective function is a function whose value is either minimized or maximized in
different contexts of the optimization problems. In the case of search algorithms, an objective
function can be the path cost for reaching the goal node, etc.
Working of a Local search algorithm
Let's understand the working of a local search algorithm with the help of an example:
Consider the below state-space landscape having both:

 Location: It is defined by the state.


Elevation: It is defined by the value of the objective function or heuristic cost function
 .

The local search algorithm explores the above landscape by finding the following two points:

 Global Minimum: If the elevation corresponds to the cost, then the task is to find
the lowest valley, which is known as Global Minimum.
 Global Maxima: If the elevation corresponds to an objective function, then it finds the
highest peak which is called as Global Maxima. It is the highest point in the valley.

We will understand the working of these points better in Hill-climbing search.


Below are some different types of local searches:

 Hill-climbing Search
 Simulated Annealing
 Local Beam Search

Hill Climbing Algorithm in AI

Hill Climbing Algorithm: Hill climbing search is a local search problem. The purpose of the hill
climbing search is to climb a hill and reach the topmost peak/ point of that hill. It is based on

the heuristic search technique where the person who is climbing up on the hill estimates the
direction which will lead him to the highest peak.
State-space Landscape of Hill climbing algorithm
To understand the concept of hill climbing algorithm, consider the below landscape representing
the goal state/peak and the current state of the climber. The topographical regions shown in the
figure can be defined as:

 Global Maximum: It is the highest point on the hill, which is the goal state.
 Local Maximum: It is the peak higher than all other peaks but lower than the global
maximum.
 Flat local maximum: It is the flat area over the hill where it has no uphill or downhill. It
is a saturated point of the hill.
 Shoulder: It is also a flat area where the summit is possible.
 Current state: It is the current position of the person.
Types of Hill climbing search algorithm
There are following types of hill-climbing search:

 Simple hill climbing


 Steepest-ascent hill climbing
 Stochastic hill climbing
 Random-restart hill climbing


Simple hill climbing search
Simple hill climbing is the simplest technique to climb a hill. The task is to reach the highest
peak of the mountain. Here, the movement of the climber depends on his move/steps. If he finds
his next step better than the previous one, he continues to move else remain in the same state.
This search focus only on his previous and next step.
Simple hill climbing Algorithm

5. Create a CURRENT node, NEIGHBOUR node, and a GOAL node.


6. If the CURRENT node=GOAL node, return GOAL and terminate the search.
7. Else CURRENT node<= NEIGHBOUR node, move ahead.
8. Loop until the goal is not reached or a point is not found.

Steepest-ascent hill climbing


Steepest-ascent hill climbing is different from simple hill climbing search. Unlike simple hill
climbing search, It considers all the successive nodes, compares them, and choose the node
which is closest to the solution. Steepest hill climbing search is similar to best-first
search because it focuses on each node instead of one.
Note: Both simple, as well as steepest-ascent hill climbing search, fails when there is no closer
node.
Steepest-ascent hill climbing algorithm

6. Create a CURRENT node and a GOAL node.


7. If the CURRENT node=GOAL node, return GOAL and terminate the search.
8. Loop until a better node is not found to reach the solution.
9. If there is any better successor node present, expand it.
10. When the GOAL is attained, return GOAL and terminate.

Stochastic hill climbing


Stochastic hill climbing does not focus on all the nodes. It selects one node at random and
decides whether it should be expanded or search for a better one.

Random-restart hill climbing


Random-restart algorithm is based on try and try strategy. It iteratively searches the node and
selects the best one at each step until the goal is not found. The success depends most commonly
on the shape of the hill. If there are few plateaus, local maxima, and ridges, it becomes easy to
reach the destination.

Limitations of Hill climbing algorithm


Hill climbing algorithm is a fast and furious approach. It finds the solution state rapidly because
it is quite easy to improve a bad state. But, there are following limitations of this search:

 Local Maxima: It is that peak of the mountain which is highest than all its neighboring
states but lower than the global maxima. It is not the goal peak because there is another peak higher than
it.

 Plateau: It is a flat surface area where no uphill exists. It becomes difficult for the
climber to decide that in which direction he should move to reach the goal point.
Sometimes, the person gets lost in the flat area.

 Ridges: It is a challenging problem where the person finds two or more local maxima of
the same height commonly. It becomes difficult for the person to navigate the right
point and stuck to that point itself.
Simulated Annealing
Simulated annealing is similar to the hill climbing algorithm. It works on the current situation. It
picks a random move instead of picking the best move. If the move leads to the improvement
of the current situation, it is always accepted as a step towards the solution state, else it accepts
the move having a probability less than 1. This search technique was first used in 1980 to
solve VLSI layout problems. It is also applied for factory scheduling and other large
optimization tasks.
Local Beam Search
Local beam search is quite different from random-restart search. It keeps track of k states instead
of just one. It selects k randomly generated states, and expand them at each step. If any state is a
goal state, the search stops with success. Else it selects the best k successors from the complete
list and repeats the same process. In random-restart search where each search process runs
independently, but in local beam search, the necessary information is shared between the parallel
search processes.
Disadvantages of Local Beam search

 This search can suffer from a lack of diversity among the k states.
 It is an expensive version of hill climbing search.
UNIT III GAME PLAYING AND CSP
Game theory – optimal decisions in games – Alpha-beta search – Monte-carlo tree search –
Stochastic games – partially observable games. Constraint satisfaction problems – constraint
propagation backtracking search for CSP – local search for CSP – structure of CSP.
PART-A (2 Marks)
1. Define game theory.
Game : Any set of circumstances that has a result dependent on
the actions of two or moredecision-makers (players)

 Players : A strategic decision-maker within the context of the game


 Strategy: A complete plan of action a player will take given the set of

circumstances thatmight arise within the game


 Payoff : The payout a player receives from arriving at a particular
outcome (The payoutcan be in any quantifiable form, from
dollars to utility.)

 We first consider games with two players, whom we call MAX and
MIN for reasons thatwill soon become obvious.
 MAX moves first, and then they take turns moving until the game is over.

 At the end of the game, points are awarded to the winning player and
penalties are givento the loser.
2. List out the types of games in AI.
 Perfect information: A game with the perfect information is that in
which agents can look into the complete board. Agents have all the
information about the game, and they can see each other moves also.
Examples are Chess, Checkers, Go, etc.

 Imperfect information: If in a game agents do not have all information


about the game and not aware with what's going on, such type of games
are called the game with imperfect information, such as tic-tac-toe,
Battleship, blind, Bridge, etc.

 Deterministic games: Deterministic games are those games which


follow a strict pattern and set of rules for the games, and there is no
randomness associated with them. Examples are chess, Checkers, Go, tic-
tac-toe, etc.

 Non-deterministic games: Non-deterministic are those games which have


various unpredictable events and has a factor of chance or luck. This
factor of chance or luck is introduced by either dice or cards. These are
random, and each action response is not fixed. Such games are also called
as stochastic games. Example: Backgammon, Monopoly, Poker, etc.

3. Write a applications of game theory.


The game theory is widely applied to study human as well as animal behaviors. It is
utilized in economics to understand the economic behaviors, such as behaviors of
consumers, markets and firms. Game theory has been commonly used in social
sciences as well. It is applied in the study of sociological, political and psychological
behaviors. The use of analysis based on game theory is seen in biology too. In addition
to behavioral prediction, game theory utilized in the development of theories of
normative or ethical behavior.

 The following are just a few examples of game theory applications:


 Stock trades and the investors’ reactions and
decisions against

 stock marketdevelopments and the behaviors and


decisions of other investors
 OPEC member countries’ decision to change the
amount of oil extraction and sale andtheir
compliance or non-compliance with quota
arrangements
 Corporate behavior regarding product pricing in
monopoly or multilateral competition markets
 Animal interaction with one another in social
life (hunting or sharing achievements
orsupporting each other)

4. List out the elements of game theory.


 A game can be defined as a type of search in AI which
can be formalized of thefollowing elements:
 Initial state: It specifies how the game is set up at the start.
 Player(s): It specifies which player has moved in the state space.
 Action(s): It returns the set of legal moves in state space.
 Result(s, a): It is the transition model, which specifies the
result of moves in the statespace.
 Terminal-Test(s): Terminal test is true if the game is over,
else it is false at any case. The state where the game ends is
called terminal states.
 Utility(s, p): A utility function gives the final numeric value
for a game that ends interminal states s for player p. It is also
called payoff function. For Chess, the outcomes are a win, loss,
or draw and its payoff values are +1, 0, ½. And for tic-tac-
toe, utility values are +1, -1, and 0.

.5 . What is maximizer and minimizer.


The two participants in Minimax are referred to as the maximizer and
minimizer. The maximizer strives to achieve the maximum score, whereas the

minimizer seeks to achieve the lowest score. A value accompanies


each board state.

6. What is Tic -Tac -To.


 Mini-max algorithm is a recursive or backtracking algorithm which
is used indecision-making and game theory.
 It provides an optimal move for the player assuming that
opponent is also playingoptimally.
 Mini-Max algorithm uses recursion to search through the [Link]-
Max algorithm is mostly used for game playing in AI.
 Such as Chess, Checkers, tic-tac-toe, go, and various tow-players game.
 This Algorithm computes the minimax decision for the current state.

7. What is Alpha-Beta pruning.


 Alpha-beta pruning is a modified version of the minimax algorithm.
 It is an optimization technique for the minimax algorithm.
 As we have seen in the minimax search algorithm that the number
of game statesit has to examine are exponential in depth of the
tree.
 There is a technique by which without checking each node of the
game tree wecan compute the correct minimax decision, and this
technique is called pruning.
 This involves two threshold parameter Alpha and beta for
future expansion, so itis called alpha-beta pruning.
 It is also called as Alpha-Beta Algorithm.
 Alpha-beta pruning can be applied at any depth of a tree, and
sometimes it notonly prune the tree leaves but also entire sub-
tree.

8. Define Monte Carlo Tree Search algorithm.


 The Games like Tic-Tac-Toe, Rubik’s Cube, Sudoku, Chess, Go and
many others have common property that lead to exponential increase in
the number of possible actions thatcan be played.
 These possible steps increase exponentially as the game goes forward.
 Ideally if you can predict every possible move and its result that may occur in the
future.
 You can increase your chance of winning.
 But since the moves increase exponentially — the computation power
that is required to calculate the moves also goes through the roof.
 Monte Carlo Tree Search is a method usually used in games to predict
the path (moves)that should be taken by the policy to reach the final
winning solution

9. Write a advantages and disadvantages of Monte Carlo search algorithm.


Advantages of Monte Carlo Tree Search:

1. MCTS is a simple algorithm to implement.


2. Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate
effectively without any knowledge in the particular domain, apart from the rules and
end conditions, and can find its own moves and learn from them by playing
random playouts.
3. The MCTS can be saved in any intermediate state and that state can be
used in futureuse cases whenever required.
4. MCTS supports asymmetric expansion of the search tree based on the
circumstances in which it is operating.

Disadvantages of Monte Carlo Tree Search:

1. As the tree growth becomes rapid after a few iterations, it requires a huge
amount ofmemory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search. In
certain scenarios,there might be a single branch or path, that might
lead to loss against the opposition

when implemented for those turn-based games. This is mainly due to the vast
amount ofcombinations and each of the nodes might not be visited enough
number of times to understand its result or outcome in the long run.
3. MCTS algorithm needs a huge number of iterations to be able to effectively
decide the most efficient path. So, there is a bit of a speed issue there.

10. Define stochastic games.


 In real life, many unpredictable external events can put us into
unforeseensituations.
 Many games mirror this unpredictability by including a random element,
such asthe throwing of dice.
 We call these stochastic games.
 Backgammon is a typical game that combines luck and skill.

 Dice are rolled at the beginning of a player’s turn to determine the legal moves.

11. Define card games.

Card games provide many examples of stochastic partial observability, where the
missing information is generated randomly. For example, in many games, cards
are dealt randomly at the

beginning of the game, with each player receiving a hand that is not
visible to the other players. Such games include bridge, whist, hearts, and
some forms of poker.
At first sight, it might seem that these card games are just like dice games: the
cards are dealt randomly and determine the moves available to each player,
but all the “dice” are rolled at the beginning! Even though this analogy turns
out to be incorrect, it suggests an effective algorithm: consider all possible
deals of the invisible cards; solve each one as if it were a fully observable
game; and then choose the move that has the best outcome averaged over all
the deals. Suppose that each deal s occurs with probability P(s); then the
move we want is

12. Define constraint satisfaction problem.


Constraint satisfaction is a technique where a problem is solved when its
values satisfy certain constraints or rules of the problem. Such type of
technique leads to a deeper understanding of the problem structure as well as
its complexity.

13. List out types domains in CSP.


There are following two types of domains which are used by the variables :
 Discrete Domain: It is an infinite domain which can have
one state for multiple variables. For example, a start state
can be allocated infinite times for each variable.
 Finite Domain: It is a finite domain which can have
continuous states describing one domain for one specific
variable. It is also called a continuous domain.

14. Define backtracking search.


A backtracking search algorithm performs a depth-first traversal of a search
tree, where the branches out of a node represent alternative choices that may
have to be examined in order to find a solution, and the constraints
are used to prune subtrees containing no solutions.

[Link] is Tree Decomposition.

In AI, a decomposition tree can refer to two main concepts: a visual analysis tool that helps
users explore data hierarchically and perform root cause analysis, particularly in Power BI, and a graph theory
technique where a graph is mapped to a tree structure to simplify complex problems. The visual tool
uses artificial intelligence to automatically suggest dimensions to drill into, while the graph theory
technique is applied to algorithms for problems like NP-hard problems and constraint satisfaction
PART-B (15 Marks)
1. Explain detail about game theory and its applications.

1. Game Theory:
Game theory is basically a branch of mathematics that is used to typical strategic interaction
between different players (agents), all of which are equally rational, in a context with
predefined rules (of playing or maneuvering) and outcomes. Every player or agent is a rational
entity who is selfish and tries to maximize the reward to be obtained using a particular
strategy. All the players abide by certain rules in order to receive a predefined playoff- a
reward after a certain outcome. Hence, a GAME can be defined as a set of players, actions,
strategies, and a final playoff for which all the players are competing.
Game Theory has now become a describing factor for both Machine Learning algorithms and
many daily life situations.

Consider the SVM (Support Vector Machine) for instance. According to Game Theory, the
SVM is a game between 2 players where one player challenges the other to find the best hyper-
plane after providing the most difficult points for classification. The final playoff of this game
is a solution that will be a trade-off between the strategic abilities of both players competing.

Nash equilibrium:
Nash equilibrium can be considered the essence of Game Theory. It is basically a state, a point
of equilibrium of collaboration of multiple players in a game. Nash Equilibrium guarantees
maximum profit to each player.
Let us try to understand this with the help of Generative Adversarial Networks (GANs).

What is GAN?
It is a combination of two neural networks: the Discriminator and the Generator. The
Generator Neural Network is fed input images which it analyzes and then produces new
sample images, which are made to represent the actual input images as close as possible. Once
the images have been produced, they are sent to the Discriminator Neural Network. This neural
network judges the images sent to it and classifies them as generated images and actual input
images. If the image is classified as the original image, the DNN changes its parameters of
judging. If the image is classified as a generated image, the image is rejected and returned to
the GNN. The GNN then alters its parameters in order to improve the quality of the image
produced.
This is a competitive process which goes on until both neural networks do not require to make
any changes in their parameters and there can be no further improvement in both neural
networks. This state of no further improvement is known as NASH EQUILIBRIUM. In other
words, GAN is a 2-player competitive game where both players are continuously optimizing
themselves to find a Nash Equilibrium.

But how do we know if the game has reached Nash Equilibrium?


In any game, one of the agents is required to disclose their strategy in front of the other agents.
After the revelation, if none of the players changes their strategies, it is understood that the
game has reached Nash Equilibrium.

Now that we are aware of the basics of Game Theory, let us try to understand how Nash
Equilibrium is attained in a simultaneous game. There are many examples but the most famous
is the Prisoner’s Dilemma. There are some more examples such as the Closed-bag exchange
Game, the Friend or For Game, and the iterated Snowdrift Game.

In all these games, two players are involved and the final playoff is a result of a decision that
has to be made by both players. Both players have to make a choice between defection and co-
operation. If both players cooperate, the final playoff will turn out to be positive for both.
However, if both defect, the final playoff will be negative for both players. If there is a
combination of one player defecting and the other co-operating, the final playoff will be
positive for one and negative for another.

Here, Nash Equilibrium plays an important role. Only if both players jot out a strategy that
benefits each other and provide both with a positive playoff, the solution to this problem will
be optimal.

There are many more real examples and a number of pieces of code that try to solve this
dilemma. The basic essence, however, is the attainment of the Nash Equilibrium in an
uncomfortable situation.

Where is GAME THEORY now?


Game Theory is increasingly becoming a part of the real-world in its various applications in
areas like public health services, public safety, and wildlife. Currently, game theory is being
used in adversary training in GANs, multi-agent systems, and imitation and reinforcement
learning. In the case of perfect information and symmetric games, many Machine Learning and
Deep Learning techniques are applicable. The real challenge lies in the development of
techniques to handle incomplete information games, such as Poker. The complexity of the
game lies in the fact that there are too many combinations of cards and the uncertainty of the
cards being held by the various players.

Types of Games:
Currently, there are about 5 types of classification of games. They are as follows:
1. Zero-Sum and Non-Zero Sum Games: In non-zero-sum games, there are multiple players
and all of them have the option to gain a benefit due to any move by another player. In zero-
sum games, however, if one player earns something, the other players are bound to lose
a key playoff.
2. Simultaneous and Sequential Games: Sequential games are the more popular games where
every player is aware of the movement of another player. Simultaneous games are more
difficult as in them, the players are involved in a concurrent game. BOARD GAMES are the
perfect example of sequential games and are also referred to as turn-based or extensive-
form games.

Perfect Information game.


4. Asymmetric and Symmetric Games: Asymmetric games are those win in which each player
has a different and usually conflicting final goal. Symmetric games are those in which all
players have the same ultimate goal but the strategy being used by each is completely
different.
5. Co-operative and Non-Co-operative Games: In non-co-operative games, every player plays
for himself while in co-operative games, players form alliances in order to achieve the final
goal.

2. Explain detail about MINIMAX algorithm.

Humans’ intellectual capacities have been engaged by games for as long as civilization has
existed, sometimes to an alarming degree. Games are an intriguing subject for AI researchers
because of their abstract character. A game’s state is simple to depict, and actors are usually
limited to a small number of actions with predetermined results. Physical games, such as
croquet and ice hockey, contain significantly more intricate descriptions, a much wider variety
of possible actions, and rather ambiguous regulations defining the legality of activities. With
the exception of robot soccer, these physical games have not piqued the AI community’s
interest.
Games are usually intriguing because they are difficult to solve. Chess, for example, has an
average branching factor of around 35, and games frequently stretch to 50 moves per player,
therefore the search tree has roughly 35100 or 10154 nodes (despite the search graph having
“only” about 1040 unique nodes). As a result, games, like the real world, necessitate the ability
to make some sort of decision even when calculating the best option is impossible.
Inefficiency is also heavily punished in games. Whereas a half-efficient implementation of A
search will merely take twice as long to complete, a chess software that is half as efficient in
utilizing its available time will almost certainly be beaten to death, all other factors being
equal. As a result of this research, a number of intriguing suggestions for making the most use
of time have emerged.

Optimal Decision Making in Games

Let us start with games with two players, whom we’ll refer to as MAX and MIN for obvious
reasons. MAX is the first to move, and then they take turns until the game is finished. At the
conclusion of the game, the victorious player receives points, while the loser receives
penalties. A game can be formalized as a type of search problem that has the following
elements:
 S0: The initial state of the game, which describes how it is set up at the start.
 Player (s): Defines which player in a state has the move.
 Actions (s): Returns a state’s set of legal moves.
 Result (s, a): A transition model that defines a move’s outcome.
 Terminal-Test (s): A terminal test that returns true if the game is over but false otherwise.
Terminal states are those in which the game has come to a conclusion.
 Utility (s, p): A utility function (also known as a payout function or objective function )
determines the final numeric value for a game that concludes in the terminal state s for
player p. The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2.
Backgammon’s payoffs range from 0 to +192, but certain games have a greater range of
possible outcomes. A zero-sum game is defined (confusingly) as one in which the total
reward to all players is the same for each game instance. Chess is a zero-sum game because
each game has a payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. “Constant-sum” would have been a
preferable name, 22 but zero-sum is the usual term and makes sense if each participant is
charged 1.

The game tree for the game is defined by the beginning state, ACTIONS function, and
RESULT function—a tree in which the nodes are game states and the edges represent
movements. The figure below depicts a portion of the tic-tac-toe game tree (noughts and
crosses). MAX may make nine different maneuvers from his starting position. The game
alternates between MAXs setting an X and MINs placing an O until we reach leaf nodes
corresponding to terminal states, such as one player having three in a row or all of the
squares being filled. The utility value of the terminal state from the perspective of MAX is
shown by the number on each leaf node; high values are thought to be beneficial for MAX
and bad for MIN
The game tree for tic-tac-toe is relatively short, with just 9! = 362,880 terminal nodes.
However, because there are over 1040 nodes in chess, the game tree is better viewed as a
theoretical construct that cannot be realized in the actual world. But, no matter how big the
game tree is, MAX’s goal is to find a solid move. A tree that is superimposed on the whole
game tree and examines enough nodes to allow a player to identify what move to make is
referred to as a search tree.

A sequence of actions leading to a goal state—a terminal state that is a win—would be the best
solution in a typical search problem. MIN has something to say about it in an adversarial
search. MAX must therefore devise a contingent strategy that specifies M A X’s initial state
move, then MAX’s movements in the states resulting from every conceivable MIN response,
then MAX’s moves in the states resulting from every possible MIN reaction to those moves,
and so on. This is quite similar to the AND-OR search method, with MAX acting as OR and
MIN acting as AND. When playing an infallible opponent, an optimal strategy produces results
that are as least as excellent as any other plan. We’ll start by demonstrating how to find the
best plan.
We’ll move to the trivial game in the figure below since even a simple game like tic-tac-toe is
too complex for us to draw the full game tree on one page. MAX’s root node moves are
designated by the letters a1, a2, and a3. MIN’s probable answers to a1 are b1, b2, b3, and so
on. This game is over after MAX and MIN each make one move. (In game terms, this tree
consists of two half-moves and is one move deep, each of which is referred to as a ply.) The
terminal states in this game have utility values ranging from 2 to 14.

Game’s Utility Function

The optimal strategy can be found from the minimax value of each node, which we express as
MINIMAX, given a game tree (n). Assuming that both players play optimally from there
through the finish of the game, the utility (for MAX) of being in the corresponding state is the
node’s minimax value. The usefulness of a terminal state is obviously its minimax value.
Furthermore, if given the option, MAX prefers to shift to a maximum value state, whereas
MIN wants to move to a minimum value state. So here’s what we’ve got:

Let’s use these definitions to analyze the game tree shown in the figure above. The game’s
UTILITY function provides utility values to the terminal nodes on the bottom level. Because
the first MIN node, B, has three successor states with values of 3, 12, and 8, its minimax value
is 3. Minimax value 2 is also used by the other two MIN nodes. The root node is a MAX node,
with minimax values of 3, 2, and 2, resulting in a minimax value of 3. We can also find the
root of the minimax decision: action a1 is the best option for MAX since it leads to the highest
minimax value.
This concept of optimal MAX play requires that MIN plays optimally as well—it maximizes
MAX’s worst-case outcome. What happens if MIN isn’t performing at its best? Then it’s a
simple matter of demonstrating that MAX can perform even better. Other strategies may
outperform the minimax method against suboptimal opponents, but they will always
outperform optimal opponents.

[Link] detail about alpha-beta search algorithm.

3. Alpha-Beta Pruning Search:

o Alpha-beta pruning is a modified version of the minimax algorithm. It is an


optimization technique for the minimax algorithm.
o As we have seen in the minimax search algorithm that the number of game states it has
to examine are exponential in depth of the tree. Since we cannot eliminate the
exponent, but we can cut it to half. Hence there is a technique by which without
checking each node of the game tree we can compute the correct minimax decision,
and this technique is called pruning. This involves two threshold parameter Alpha and
beta for future expansion, so it is called alpha-beta pruning. It is also called as Alpha-
Beta Algorithm.
o Alpha-beta pruning can be applied at any depth of a tree, and sometimes it not only
prune the tree leaves but also entire sub-tree.
o The two-parameter can be defined as:
a. Alpha: The best (highest-value) choice we have found so far at any point along
the path of Maximizer. The initial value of alpha is -∞.
b. Beta: The best (lowest-value) choice we have found so far at any point along the
path of Minimizer. The initial value of beta is +∞.
The Alpha-beta pruning to a standard minimax algorithm returns the same move as the
standard algorithm does, but it removes all the nodes which are not really affecting the final
decision but making algorithm slow. Hence by pruning these nodes, it makes the algorithm
fast.

Condition for Alpha-beta pruning:

The main condition which required for alpha-beta pruning

α>=β

Key points about alpha-beta pruning:


o The Max player will only update the value of alpha.
o The Min player will only update the value of beta.
o While backtracking the tree, the node values will be passed to upper nodes instead
of values of alpha and beta.
o We will only pass the alpha, beta values to the child nodes.

Working of Alpha-Beta Pruning:

Let's take an example of two-player search tree to understand the working of Alpha-beta
pruning

Step 1: At the first step the, Max player will start first move from node A where α= -∞ and
β= +∞, these value of alpha and beta passed down to node B where again α= -∞ and β= +∞,
and Node B passes the same value to its child D.
Step 2: At Node D, the value of α will be calculated as its turn for Max. The value of α is
compared with firstly 2 and then 3, and the max (2, 3) = 3 will be the value of α at node D
and node value will also 3.

Step 3: Now algorithm backtrack to node B, where the value of β will change as this is a
turn of Min, Now β= +∞, will compare with the available subsequent nodes value, i.e. min
(∞, 3) = 3, hence at node B now α= -∞, and β= 3.
In the next step, algorithm traverse the next successor of Node B which is node E, and the
values of α= -∞, and β= 3 will also be passed.

Step 4: At node E, Max will take its turn, and the value of alpha will change. The current
value of alpha will be compared with 5, so max (-∞, 5) = 5, hence at node E α= 5 and β= 3,
where α>=β, so the right successor of E will be pruned, and algorithm will not traverse it,
and the value at node E will be 5.
Step 5: At next step, algorithm again backtrack the tree, from node B to node A. At node
A, the value of alpha will be changed the maximum available value is 3 as max (-∞, 3)= 3,
and β= +∞, these two values now passes to right successor of A which is Node C.

At node C, α=3 and β= +∞, and the same values will be passed on to node F.

Step 6: At node F, again the value of α will be compared with left child which is 0, and
max(3,0)= 3, and then compared with right child which is 1, and max(3,1)= 3 still α
remains 3, but the node value of F will become 1.
Step 7: Node F returns the node value 1 to node C, at C α= 3 and β= +∞, here the value of
beta will be changed, it will compare with 1 so min (∞, 1) = 1. Now at C, α=3 and β= 1,
and again it satisfies the condition α>=β, so the next child of C which is G will be pruned,
and the algorithm will not compute the entire sub-tree G.
Step 8: C now returns the value of 1 to A here the best value for A is max (3, 1) = 3.
Following is the final game tree which is the showing the nodes which are computed and
nodes which has never computed. Hence the optimal value for the maximizer is 3 for this
example.

Move Ordering in Alpha-Beta pruning:

The effectiveness of alpha-beta pruning is highly dependent on the order in which each
node is examined. Move order is an important aspect of alpha-beta pruning.

It can be of two types:

Worst ordering: In some cases, alpha-beta pruning algorithm does not prune any of the
leaves of the tree, and works exactly as minimax algorithm. In this case, it also consumes
more time because of alpha-beta factors, such a move of pruning is called worst ordering.
In this case, the best move occurs on the right side of the tree. The time complexity for
such an order is O(bm).

Ideal ordering: The ideal ordering for alpha-beta pruning occurs when lots of pruning
happens in the tree, and best moves occur at the left side of the tree. We apply DFS hence it
first search left of the tree and go deep twice as minimax algorithm in the same amount of
time. Complexity in ideal ordering is O(bm/2).

Rules to find good ordering:

Following are some rules to find good ordering in alpha-beta pruning:

o Occur the best move from the shallowest node.


o Order the nodes in the tree such that the best nodes are checked first.
o Use domain knowledge while finding the best move. Ex: for Chess, try order: captures
first, then threats, then forward moves, backward moves.
We can bookkeep the states, as there is a possibility that states may repeat.

[Link] detail about Monte Carlo tree search algorithm.

4. Monte Carlo Tree Search (MCTS):

Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial Intelligence
(AI). It is a probabilistic and heuristic driven search algorithm that combines the classic tree
search implementations alongside machine learning principles of reinforcement learning.
In tree search, there’s always the possibility that the current best action is actually not the most
optimal action. In such cases, MCTS algorithm becomes useful as it continues to evaluate other
alternatives periodically during the learning phase by executing them, instead of the current
perceived optimal strategy. This is known as the ” exploration-exploitation trade-off “. It
exploits the actions and strategies that is found to be the best till now but also must continue to
explore the local space of alternative decisions and find out if they could replace the current
best.

Exploration helps in exploring and discovering the unexplored parts of the tree, which could
result in finding a more optimal path. In other words, we can say that exploration expands the
tree’s breadth more than its depth. Exploration can be useful to ensure that MCTS is not
overlooking any potentially better paths. But it quickly becomes inefficient in situations with
large number of steps or repetitions. In order to avoid that, it is balanced out by exploitation.
Exploitation sticks to a single path that has the greatest estimated value. This is a greedy
approach and this will extend the tree’s depth more than its breadth. In simple words, UCB
formula applied to trees helps to balance the exploration-exploitation trade-off by periodically
exploring relatively unexplored nodes of the tree and discovering potentially more optimal
paths than the one it is currently exploiting.
For this characteristic, MCTS becomes particularly useful in making optimal decisions in
Artificial Intelligence (AI) problems.

Monte Carlo Tree Search (MCTS) algorithm:


In MCTS, nodes are the building blocks of the search tree. These nodes are formed based on
the outcome of a number of simulations. The process of Monte Carlo Tree Search can be
broken down into four distinct steps, viz., selection, expansion, simulation and
backpropagation. Each of these steps is explained in details below:

 Selection: In this process, the MCTS algorithm traverses the current tree from the root node
using a specific strategy. The strategy uses an evaluation function to optimally select nodes
with the highest estimated value. MCTS uses the Upper Confidence Bound (UCB) formula
applied to trees as the strategy in the selection process to traverse the tree. It balances the
exploration-exploitation trade-off. During tree traversal, a node is selected based on some
parameters that return the maximum value. The parameters are characterized by the
formula that is typically used for this purpose is given below.

 where;
Si = value of a node i
xi = empirical mean of a node i
C = a constant
t = total number of simulations
When traversing a tree during the selection process, the child node that returns the greatest
value from the above equation will be one that will get selected. During traversal, once a
child node is found which is also a leaf node, the MCTS jumps into the expansion step.
 Expansion: In this process, a new child node is added to the tree to that node which was
optimally reached during the selection process.
 Simulation: In this process, a simulation is performed by choosing moves or strategies until
a result or predefined state is achieved.
 Backpropagation: After determining the value of the newly added node, the remaining tree
must be updated. So, the backpropagation process is performed, where it backpropagates
from the new node to the root node. During the process, the number of simulation stored in
each node is incremented. Also, if the new node’s simulation results in a win, then the
number of wins is also incremented.
The above steps can be visually understood by the diagram given below:
These types of algorithms are particularly useful in turn based games where there is no element
of chance in the game mechanics, such as Tic Tac Toe, Connect 4, Checkers, Chess, Go, etc.
This has recently been used by Artificial Intelligence Programs like AlphaGo, to play against
the world’s top Go players. But, its application is not limited to games only. It can be used in
any situation which is described by state-action pairs and simulations used to forecast
outcomes.
As we can see, the MCTS algorithm reduces to a very few set of functions which we can use
any choice of games or in any optimizing strategy.

Advantages of Monte Carlo Tree Search:

1. MCTS is a simple algorithm to implement.


2. Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate effectively without
any knowledge in the particular domain, apart from the rules and end conditions, and
can find its own moves and learn from them by playing random playouts.
3. The MCTS can be saved in any intermediate state and that state can be used in future
use cases whenever required.
4. MCTS supports asymmetric expansion of the search tree based on the circumstances in
which it is operating.
Disadvantages of Monte Carlo Tree Search:

1. As the tree growth becomes rapid after a few iterations, it requires a huge amount
of memory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search. In certain scenarios, there
might be a single branch or path, that might lead to loss against the opposition when
implemented for those turn-based games. This is mainly due to the vast amount of
combinations and each of the nodes might not be visited enough number of times to
understand its result or outcome in the long run.
3. MCTS algorithm needs a huge number of iterations to be able to effectively decide the most
efficient path. So, there is a bit of a speed issue there.

4. Explain detail about constraint satisfaction problem(CSP).

5. Constraint satisfaction problems:


We have seen so many techniques like Local search, Adversarial search to solve different
problems. The objective of every problem-solving technique is one, i.e., to find a solution to
reach the goal. Although, in adversarial search and local search, there were no constraints on
the agents while solving the problems and reaching to its solutions.

Constraint satisfaction technique. By the name, it is understood that constraint satisfaction


means solving a problem under certain constraints or rules.
Constraint satisfaction is a technique where a problem is solved when its values satisfy certain
constraints or rules of the problem. Such type of technique leads to a deeper understanding of
the problem structure as well as its complexity.
Constraint satisfaction depends on three components, namely:
X: It is a set of variables.
D: It is a set of domains where the variables reside. There is a specific domain for each
variable.
C: It is a set of constraints which are followed by the set of variables.
In constraint satisfaction, domains are the spaces where the variables reside, following the
problem specific constraints. These are the three main elements of a constraint satisfaction
technique. The constraint value consists of a pair of {scope, rel}. The scope is a tuple of
variables which participate in the constraint and rel is a relation which includes a list of values
which the variables can take to satisfy the constraints of the problem.
Solving Constraint Satisfaction Problems
The requirements to solve a constraint satisfaction problem (CSP) is:

 A state-space
 The notion of the solution.

A state in state-space is defined by assigning values to some or all variables such as


{X1=v1, X2=v2, and so on…}.
An assignment of values to a variable can be done in three ways:
 Consistent or Legal Assignment: An assignment which does not violate any constraint
or rule is called Consistent or legal assignment.
 Complete Assignment: An assignment where every variable is assigned with a value,
and the solution to the CSP remains consistent. Such assignment is known as Complete
assignment.
 Partial Assignment: An assignment which assigns values to some of the variables only.
Such type of assignments are called Partial assignments.

Types of Domains in CSP


There are following two types of domains which are used by the variables :

 Discrete Domain: It is an infinite domain which can have one state for multiple
 variables. For example, a start state can be allocated infinite times for each variable.
 Finite Domain: It is a finite domain which can have continuous states describing one
domain for one specific variable. It is also called a continuous domain. 

Constraint Types in CSP


With respect to the variables, basically there are following types of constraints:

 Unary Constraints: It is the simplest type of constraints that restricts the value of a
single variable.
 Binary Constraints: It is the constraint type which relates two variables. A value x2 will
contain a value which lies between x1 and x3. 
 Global Constraints: It is the constraint type which involves an arbitrary number of
variables.
 Some special types of solution algorithms are used to solve the following types of
constraints:
 Linear Constraints: These type of constraints are commonly used in linear
programming where each variable containing an integer value exists in linear form
only.
 Non-linear Constraints: These type of constraints are used in non-linear programming
where each variable (an integer value) exists in a non-linear form. 

Note: A special constraint which works in real-world is known as Preference constraint.

Constraint Propagation

In local state-spaces, the choice is only one, i.e., to search for a solution. But in CSP, we have
two choices either:

 We can search for a solution or


 We can perform a special type of inference called constraint propagation.
Constraint propagation is a special type of inference which helps in reducing the legal number
of values for the variables. The idea behind constraint propagation is local consistency.
In local consistency, variables are treated as nodes, and each binary constraint is treated as
an arc in the given problem. There are following local consistencies which are discussed
below:

 Node Consistency: A single variable is said to be node consistent if all the values in the

variable’s domain satisfy the unary constraints on the variables.

 Arc Consistency: A variable is arc consistent if every value in its domain satisfies the
binary constraints of the variables.
 Path Consistency: When the evaluation of a set of two variable with respect to a third
variable can be extended over another variable, satisfying all the binary constraints. It is
similar to arc consistency.
k-consistency: This type of consistency is used to define the notion of stronger forms of propagation.
Here, we examine the k-consistency of the variables

6. . Explain detail about constraint propagation.

Constraint Propagation

In local state-spaces, the choice is only one, i.e., to search for a solution. But in CSP, we have
two choices either:

 We can search for a solution or


 We can perform a special type of inference called constraint propagation.
Constraint propagation is a special type of inference which helps in reducing the legal number
of values for the variables. The idea behind constraint propagation is local consistency.
In local consistency, variables are treated as nodes, and each binary constraint is treated as
an arc in the given problem. There are following local consistencies which are discussed
below:

 Node Consistency: A single variable is said to be node consistent if all the values in the

variable’s domain satisfy the unary constraints on the variables.

 Arc Consistency: A variable is arc consistent if every value in its domain satisfies the
binary constraints of the variables.
 Path Consistency: When the evaluation of a set of two variable with respect to a third
variable can be extended over another variable, satisfying all the binary constraints. It is
similar to arc consistency.
 k-consistency: This type of consistency is used to define the notion of stronger forms of
propagation. Here, we examine the k-consistency of the variables.

CSP Problems

Constraint satisfaction includes those problems which contains some constraints while solving
the problem. CSP includes the following problems:

 Graph Coloring: The problem where the constraint is that no adjacent sides can have
the same color.

 Sudoku Playing: The gameplay where the constraint is that no number from 0-9 can be
repeated in the same row or column.
 n-queen problem: In n-queen problem, the constraint is that no queen should be
placed either diagonally, in the same row or column. 

Note: The n-queen problem is already discussed in Problem-solving in AI section.

 Crossword: In crossword problem, the constraint is that there should be the


correct formation of the words, and it should be meaningful. 
 Latin square Problem: In this game, the task is to search the pattern which is occurring
several times in the game. They may be shuffled but will contain the same digits. 

 Cryptarithmetic Problem: This problem has one most important constraint that is, we
cannot assign a different digit to the same character. All digits should contain a unique
alphabet.
[Link] detail about Monte Carlo Tree search algorithm

7. . Monte Carlo Tree Search (MCTS):

Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial Intelligence
(AI). It is a probabilistic and heuristic driven search algorithm that combines the classic tree
search implementations alongside machine learning principles of reinforcement learning.
In tree search, there’s always the possibility that the current best action is actually not the most
optimal action. In such cases, MCTS algorithm becomes useful as it continues to evaluate other
alternatives periodically during the learning phase by executing them, instead of the current
perceived optimal strategy. This is known as the ” exploration-exploitation trade-off “. It
exploits the actions and strategies that is found to be the best till now but also must continue to
explore the local space of alternative decisions and find out if they could replace the current
best.

Exploration helps in exploring and discovering the unexplored parts of the tree, which could
result in finding a more optimal path. In other words, we can say that exploration expands the
tree’s breadth more than its depth. Exploration can be useful to ensure that MCTS is not
overlooking any potentially better paths. But it quickly becomes inefficient in situations with
large number of steps or repetitions. In order to avoid that, it is balanced out by exploitation.
Exploitation sticks to a single path that has the greatest estimated value. This is a greedy
approach and this will extend the tree’s depth more than its breadth. In simple words, UCB
formula applied to trees helps to balance the exploration-exploitation trade-off by periodically
exploring relatively unexplored nodes of the tree and discovering potentially more optimal
paths than the one it is currently exploiting.
For this characteristic, MCTS becomes particularly useful in making optimal decisions in
Artificial Intelligence (AI) problems.

Monte Carlo Tree Search (MCTS) algorithm:


In MCTS, nodes are the building blocks of the search tree. These nodes are formed based on
the outcome of a number of simulations. The process of Monte Carlo Tree Search can be
broken down into four distinct steps, viz., selection, expansion, simulation and
backpropagation. Each of these steps is explained in details below:

 Selection: In this process, the MCTS algorithm traverses the current tree from the
root node using a specific strategy. The strategy uses an evaluation function to
optimally select nodes with the highest estimated value. MCTS uses the Upper
Confidence Bound (UCB) formula applied to trees as the strategy in the selection
process to traverse the tree. It balances the exploration-exploitation trade-off.
During tree traversal, a node is selected based on some parameters that return the
maximum value. The parameters are characterized by the formula that is typically
used for this purpose is given below.

 where;
Si = value of a node i
xi = empirical mean of
a node i C = a constant
t = total number of simulations
When traversing a tree during the selection process, the child node that returns the
greatest value from the above equation will be one that will get selected. During
traversal, once a child node is found which is also a leaf node, the MCTS jumps
into the expansion step.
 Expansion: In this process, a new child node is added to the tree to that node which
was optimally reached during the selection process.
 Simulation: In this process, a simulation is performed by choosing moves or
strategies until a result or predefined state is achieved.
 Backpropagation: After determining the value of the newly added node, the
remaining tree must be updated. So, the backpropagation process is performed,
where it backpropagates from the new node to the root node. During the process,
the number of simulation stored in each node is incremented. Also, if the new
node’s simulation results in a win, then the number of wins is also incremented.
The above steps can be visually understood by the diagram given below:
UNIT IV LOGICAL REASONING
Knowledge-based agents – propositional logic – propositional theorem proving –
propositional model checking – agents based on propositional logic. First-order
logic – syntax and semantics Knowledge representation and engineering –
inferences in first-order logic – Forward chaining Backward chaining –
resolution.

PART-A (2 Marks)
1. Define reasoning.
 The reasoning is the mental process of deriving logical conclusion
and making predictions from available knowledge, facts, and
beliefs.
Or we can say, "Reasoning is a way to infer facts from existing data."
 It is a general process of thinking rationally, to find valid conclusions.
 In artificial intelligence, the reasoning is essential so that the machine can also
think rationally as a human brain, and can perform like a human.
 When a system is required to do something, that it has not been explicitly
told how to do,, it must figure out what it needs to know from what it already
knows.
Fact 1 : Robins are Birds
Fact 2 : All birds have wings
Question : Do Robins have wings?

 Hence Reasoning system, must find out, what it needs to know


from what italready knows.
 Logic is a language of reasoning. It is a collection of rules called logic
arguments,we use when doing logic reasoning.

Logical Reasoning is a process of drawing conclusions from premises using rule


ofinference.
2. Draw an architecture of knowledge based agents.

The architecture of knowledge-based agent:

 The above diagram is representing a generalized


architecture for a knowledge-based agent.
 The knowledge-based agent (KBA) take input from the
environment by perceivingthe environment.
 The input is taken by the inference engine of the agent and
which alsocommunicate with KB to decide as per the
knowledge store in KB.
 The learning element of KBA regularly updates the KB by learning
newknowledge.

3. Define knowledge based agents.


o An intelligent agent needs knowledge about the real

world for taking decisions and reasoning to act efficiently.


o Knowledge-based agents are those agents who have the
capability of maintaining an internal state of
knowledge, reason over that knowledge, update their
knowledge after observations and take actions. These
agents can represent the world with some formal
representation and act intelligently.
Knowledge-based agents

4. What are the operations performed by KBA.


Following are three operations which are performed by KBA in
order to show theintelligent behavior:
1. TELL: This operation tells the knowledge base what it
perceives from theenvironment.
2. ASK: This operation asks the knowledge base what action it
should perform.
3. Perform: It performs the selected action.
5. What is propositional logic. Give an example?
 Propositional logic (PL) is the simplest form of logic where
all the statements aremade by propositions.
 A proposition is a declarative statement which is either true or false.
 It is a technique of knowledge representation in logical and
mathematical form

Example:
a) It is Sunday.
b) The Sun rises from West (False proposition)
c) 3+3= 7(False proposition)
d) 5 is a prime number.
6. Define Davis-Putnam algorithm.
The Davis–Putnam algorithm was developed by Martin Davis and Hilary Putnam for checking the
validity of a first-order logic formula using a resolution-based decision procedure for propositional
logic.
7. Define hybrid agent.
A Hybrid Agent is a new kind of real estate professional that utilizes artificial intelligence to automate,
digitize and increase daily production and income.

8. Define first order logic.


 First-order logic is symbolized reasoning in which each sentence, or statement,
isbroken down into a subject and a predicate.
 First-order logic can be useful in the creation of computer programs. It is also
ofinterest to researchers in artificial intelligence.

9. Define syntax and sematics.


The syntax and semantics of first-order (FO) logic allow us to explicitly represent objects and
relationships among object, which provides us with much more representational power than the
propositional case
10. Define inferences.
 Inference in First-Order Logic is used to deduce new facts or sentences
fromexisting sentences.
 Before understanding the FOL inference rule, let's understand some basicterminologies
used in FOL.

11. Define forward chaining


Forward chaining is a method of reasoning in artificial intelligence in which inference rules
are applied to existing data to extract additional data until an endpoint (goal) is achieved.

In this type of chaining, the inference engine starts by evaluating existing facts, derivations, and
conditions before deducing new information. An endpoint (goal) is achieved through the
manipulation of knowledge that exists in the knowledge base. Forward chaining can be used in
planning, monitoring, controlling, and interpretingapplications.
12. Define backward chaining.
 Backward chaining is a concept in artificial intelligence that involves
backtrackingfrom the endpoint or goal to steps that led to the endpoint.

 This type of chaining starts from the goal and moves


backward to comprehend thesteps that were taken to
attain this goal.
 The backtracking process can also enable a person establish
logical steps that canbe used to find other important
solutions.
 Backward chaining can be used in debugging, diagnostics,
and prescriptionapplications.

13. Define Resolution.


 Resolution is a theorem proving technique that proceeds by
building refutation proofs, i.e., proofs by contradictions.
 It was invented by a Mathematician John Alan Robinson in the year
1965.
 Resolution is used, if there are various statements are given,
and we need to provea conclusion of those statements.
 Unification is a key concept in proofs by resolutions. Resolution is a
single inference rule which can efficiently operate on the
conjunctive normal form or clausal form.
[Link] various inference rules in Propositiona logic.

In propositional logic for AI, common inference rules include Modus Ponens (If A implies
B, and A is true, then B is true), Modus Tollens (If A implies B, and B is false, then A is
false), and Hypothetical Syllogism (If A implies B, and B implies C, then A implies C). Other
rules are Disjunctive Syllogism (If A or B is true, and A is false, then B is true)
and Conjunction (If A is true and B is true, then A and B are true).

15. List various inference rules in Predicate logic


Predicate logic includes standard Propositional Logic inference rules (like Modus
Ponens and Modus Tollens) and specific rules for quantifiers: Universal
Instantiation (applying a universal statement to a specific instance), Universal
Generalization (inferring a universal statement from an arbitrary specific
instance), Existential Instantiation (giving a name to an existing individual),
and Existential Generalization (inferring existence from a specific true instance)
PART-B (15 MARKS)

1. Explain detail about knowledge based agents andits operations.

1. Knowledge-based agent in Artificial intelligence

o An intelligent agent needs knowledge about the real world for taking decisions and
reasoning to act efficiently.
o Knowledge-based agents are those agents who have the capability of maintaining
an internal state of knowledge, reason over that knowledge, update their
knowledge after observations and take actions. These agents can represent the
world with some formal representation and act intelligently.
o Knowledge-based agents are composed of two main parts:
o Knowledge-base and
o Inference system.

A knowledge-based agent must able to do the following:

o An agent should be able to represent states, actions, etc.


o An agent Should be able to incorporate new percepts
o An agent can update the internal representation of the world
o An agent can deduce the internal representation of the world
o An agent can deduce appropriate actions.

The architecture of knowledge-based agent:

The above diagram is representing a generalized architecture for a knowledge-based agent.


The knowledge-based agent (KBA) take input from the environment by perceiving the
environment. The input is taken by the inference engine of the agent and which also
communicate with KB to decide as per the knowledge store in KB. The learning element of
KBA regularly updates the KB by learning new knowledge.

Knowledge base: Knowledge-base is a central component of a knowledge-based agent, it is


also known as KB. It is a collection of sentences (here 'sentence' is a technical term and it is
not identical to sentence in English). These sentences are expressed in a language which is
called a knowledge representation language. The Knowledge-base of KBA stores fact about
the world.

Why use a knowledge base?

Knowledge-base is required for updating knowledge for an agent to learn with experiences
and take action as per the knowledge.

Inference system

Inference means deriving new sentences from old. Inference system allows us to add a new
sentence to the knowledge base. A sentence is a proposition about the world. Inference
system applies logical rules to the KB to deduce new information.

Inference system generates new facts so that an agent can update the KB. An inference
system works mainly in two rules which are given as:

o Forward chaining
o Backward chaining

Operations Performed by KBA

Following are three operations which are performed by KBA in order to show the intelligent behavior:

1. TELL: This operation tells the knowledge base what it perceives from the environment.
2. ASK: This operation asks the knowledge base what action it should perform.
3. Perform: It performs the selected action.

A generic knowledge-based agent:

Following is the structure outline of a generic knowledge-based agents program:

function KB-AGENT(percept):

persistent: KB, a knowledge base

t, a counter, initially
0, indicating time TELL(KB,
MAKE-PERCEPT-
SENTENCE(percept, t))
Action = ASK(KB, MAKE-
ACTION-QUERY(t))
TELL(KB, MAKE-ACTION-
SENTENCE(action, t))

t=t+1

return action
The knowledge-based agent takes percept as input and returns an action as output. The
agent maintains the knowledge base, KB, and it initially has some background knowledge of
the real world. It also has a counter to indicate the time for the whole process, and this
counter is initialized with zero.

Each time when the function is called, it performs its three operations:

o Firstly it TELLs the KB what it perceives.


o Secondly, it asks KB what action it should take
Third agent program TELLS the KB that which action was chosen

The MAKE-PERCEPT-SENTENCE generates a sentence as setting that the agent perceived


the given percept at the given time.

The MAKE-ACTION-QUERY generates a sentence to ask which action should be

done at the current time. MAKE-ACTION-SENTENCE generates a sentence which

asserts that the chosen action was executed.

Various levels of knowledge-based agent:

A knowledge-based agent can be viewed at different levels which are given below:

1. Knowledge level

Knowledge level is the first level of knowledge-based agent, and in this level, we need to
specify what the agent knows, and what the agent goals are. With these specifications, we can
fix its behavior. For example, suppose an automated taxi agent needs to go from a station A
to station B, and he knows the way from A to B, so this comes at the knowledge level.

2. Logical level:

are encoded into different logics. At the logical level, an encoding of knowledge into logical
sentences occurs. At the logical level we can expect to the automated taxi agent to reach to
the destination B.

3. Implementation level:

This is the physical representation of logic and knowledge. At the implementation level agent
perform actions as per logical and knowledge level. At this level, an automated taxi agent
actually implement his knowledge and logic so that he can reach to the destination.

Approaches to designing a knowledge-based agent:

There are mainly two approaches to build a knowledge-based agent:


1. 1. Declarative approach: We can create a knowledge-based agent by initializing
with an empty knowledge base and telling the agent all the sentences with which we
want to start with. This approach is called Declarative approach.
2. 2. Procedural approach: In the procedural approach, we directly encode desired
behavior as a program code. Which means we just need to write a program that
already encodes the desired behavior or agent.

However, in the real world, a successful agent can be built by combining both declarative and
procedural approaches, and declarative knowledge can often be compiled into more efficient
procedural code.

[Link] detail about Propositional logic.

2. Propositional logic

Propositional logic (PL) is the simplest form of logic where all the statements are made by
propositions. A proposition is a declarative statement which is either true or false. It is a
technique of knowledge representation in logical and mathematical form.

Example:

a) It is Sunday.
b) The Sun rises from West (False proposition)
c) 3+3= 7(False proposition)
d) 5 is a prime number.

Following are some basic facts about propositional logic:

o Propositional logic is also called Boolean logic as it works on 0 and 1.


o In propositional logic, we use symbolic variables to represent the logic, and we
can use any symbol for a representing a proposition, such A, B, C, P, Q, R, etc.
o Propositions can be either true or false, but it cannot be both.
o Propositional logic consists of an object, relations or function, and logical connectives.
o These connectives are also called logical operators.
o The propositions and connectives are the basic elements of the propositional logic.
o Connectives can be said as a logical operator which connects two sentences.
o A proposition formula which is always true is called tautology, and it is also called a valid sentence.
o A proposition formula which is always false is called Contradiction.
o A proposition formula which has both true and false values is called
o Statements which are questions, commands, or opinions are not propositions such
as "Where is Rohini", "How are you", "What is your name", are not propositions.

Syntax of propositional logic:

The syntax of propositional logic defines the allowable sentences for the knowledge
representation. There are two types of Propositions:

a. Atomic Propositions
b. Compound propositions

o Atomic Proposition: Atomic propositions are the simple propositions. It consists


of a single proposition symbol. These are the sentences which must be either true
or false.

Example:

a) 2+2 is 4, it is an atomic proposition as it is a true fact.


b) "The Sun is cold" is also a proposition as it is a false fact.

o Compound proposition: Compound propositions are constructed by


combining simpler or atomic propositions, using parenthesis and logical
connectives.

Example:

a) "It is raining today, and street is wet."


b) "Ankit is a doctor, and his clinic is in Mumbai."

Logical Connectives:

Logical connectives are used to connect two simpler propositions or representing a sentence
logically. We can create compound propositions with the help of logical connectives. There
are mainly five connectives, which are given as follows:

1. Negation: A sentence such as ¬ P is called negation of P. A literal can be either


Positive literal or negative literal.
2. Conjunction: A sentence which has ∧ connective such as, P ∧ Q is called a conjunction.
Example: Rohan is intelligent and hardworking. It can be written as,

P= Rohan is intelligent,

Q= Rohan is hardworking. → P∧ Q.

3. Disjunction: A sentence which has ∨ connective, such as P ∨ Q. is called


disjunction, where P and Q are the propositions.
Example: "Ritika is a doctor or Engineer",

Here P= Ritika is Doctor. Q= Ritika is Doctor, so we can write it as P ∨ Q.

4. Implication: A sentence such as P → Q, is called an implication. Implications are


also known as if-then rules. It can be represented as
If it is raining, then the street is wet.

Let P= It is raining, and Q= Street is wet, so it is represented as P → Q

5. Biconditional: A sentence such as P⇔ Q is a Biconditional sentence, example If


I am breathing, then I am alive
P= I am breathing, Q= I am alive, it can be represented as P ⇔ Q.

Following is the summarized table for Propositional Logic Connectives:


Truth Table:

In propositional logic, we need to know the truth values of propositions in all possible
scenarios. We can combine all the possible combination with logical connectives, and the
representation of these combinations in a tabular

called Truth table. Following are the truth table for all logical
connectives:
Truth table with three propositions:

We can build a proposition composing three propositions P, Q, and R. This truth table is
made-up of 8n Tuples as we have taken three proposition symbols.

Precedence of connectives:
Just like arithmetic operators, there is a precedence order for propositional connectors or
logical operators. This order should be followed while evaluating a propositional problem.
Following is the list of the precedence order for operators:

Precedence Operators

First Precedence Parenthesis

Second Precedence Negation

Third Precedence Conjunction(AND)

Fourth Precedence Disjunction(OR)

Fifth Precedence Implication

Six Precedence Biconditional

Note: For better understanding use parenthesis to make sure of the correct
interpretations. Such as ¬R∨ Q, It can be interpreted as (¬R) ∨ Q

Logical equivalence:

Logical equivalence is one of the features of propositional logic. Two propositions are said to
be logically equivalent if and only if the columns in the truth table are identical to each other.

Let's take two propositions A and B, so for logical equivalence, we can write it as A⇔B. In
below truth table we can see that column for ¬A∨ B and A→B, are identical hence A is
Equivalent to B

Properties of Operators:

o Commutativity:
o P∧ Q= Q ∧ P, or
o P ∨ Q = Q ∨ P.
o Associativity:
o (P ∧ Q) ∧ R= P ∧ (Q ∧ R),
o (P ∨ Q) ∨ R= P ∨ (Q ∨ R)
o Identity element:
o P ∧ True = P,
o P ∨ True= True.
o Distributive:
o P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
o P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
o DE Morgan's Law:
o ¬ (P ∧ Q) = (¬P) ∨ (¬Q)
o ¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
o Double-negation elimination:
o ¬ (¬P) = P.

Limitations of Propositional logic:

o We cannot represent relations like ALL, some, or none with propositional logic. Example:
a. All the girls are intelligent.
b. Some apples are sweet.
Propositional logic has limited expressive power.

In propositional logic, we cannot describe statements in terms of their properties or logical relationships.

[Link] brief notes about propositional model checking

3. Propositional model checking

The set of possible models, given a fixed propositional vocabulary, is finite, so


entailment can be checked by enumerating models. Efficient model-checking inference
algorithms for propositional logic include backtracking and local search methods and can
often solve large problems quickly.

2 families of algorithms for the SAT problem based on model checking:

a. based on backtracking
b. based on local hill-climbing search
1. A complete
backtracking
algorithm
David-Putnam
algorithm
(DPLL):

DPLL embodies 3 improvements over the scheme of TT-ENTAILS?: Early termination, pure
symbol heuristic, unit clause heuristic.

Tricks that enable SAT solvers to scale up to large variable and value
problems: Component analysis, ordering, intelligent
backtracking, random restarts, clever indexing.

Local search algorithms

Local search algorithms can be applied directly to the SAT problem, provided that choose
the right evaluation function. (We can choose an evaluation function that counts the number
of unsatisfied clauses.)

These algorithms take steps in the space of complete assignments, flipping the truth value of
one symbol at a time. The space usually contains many local minima, to escape from which
various forms of randomness are required. Local search methods such as WALKSAT can be
used to find solutions. Such algorithm are sound but not complete. WALKSAT: one of the
simplest and most effective algorithms.

On every iteration, the algorithm picks an unsatisfied clause, and chooses randomly between
2 ways to pick a symbol to flip:

Either a. a “min-conflicts” step that minimizes the number of unsatisfied clauses in the new state;
Or b. a “random walk” step that picks the symbol randomly.
When the algorithm returns a model, the input
sentence is indeed satifiable; When the algorithm
Either a. The sentence is unsatisfiable;
returns failure, there are 2 possible causes:
Or b. We need to give the algorithm more time.

If we set max_flips=∞, p>0, the algorithm will:

Either a. eventually return a model if one exists


Or b. never terminate if the sentence is unsatifiable.
Thus WALKSAT is useful when we expect a solution to exist, but cannot always detect unsatisfiability.

The landscape of random SAT problems

Underconstrained problem: When we look at satisfiability problems in CNF, an


underconstrained problem is one with relatively few clauses constraining the variables.

An overconstrained problem has many clauses relative to the number of variables and
is likely to have no solutions.

The notation CNFk(m, n) denotes a k-CNF sentence with m clauses and n symbols. (with n
variables and k literals per clause).

Given a source of random sentences, where the clauses are chosen uniformly,
independently and without replacement from among all clauses with k different literals,
which are positive or negative at random.
Hardness: problems right at the threshold > overconstrained problems > underconstrained p

Satifiability threshold conjecture: A theory says that for every k≥3, there is a threshold ratio
rk, such that as n goes to infinity, the probability that CNFk(n, rn) is satisfiable becomes 1 for
all values or r below the threshold, and 0 for all values above. (remains unproven)

[Link] an algorithm for Hybrid Wumpus -Agent algorithm.

4. Agents based on propositional logic


1. The current state of the world
We can associate proposition with timestamp to avoid contradiction.

e.g. ¬Stench3, Stench4

fluent: refer an aspect of the world that changes. (E.g. Ltx,y)

atemporal variables: Symbols associated with permanent aspects of the world do not need a time
superscript.

Effect axioms: specify the outcome of an action at the next time step.

Frame problem: some information lost because the effect axioms fails to state what
remains unchanged as the result of an action.

Solution: add frame axioms explicity asserting all the propositions that remain the same.

Representation frame problem: The proliferation of frame axioms is inefficient, the set of
frame axioms will be O(mn) in a world with m different actions and n fluents.

Solution: because the world exhibits locaility (for humans each action typically changes no
more than some number k of those fluents.) Define the transition model with a set of axioms
of size O(mk) rather than size O(mn).

Inferential frame problem: The problem of projecting forward the results of a t step lan of
action in time O(kt) rather than O(nt).
Solution: change one’s focus from writing axioms about actions to writing axioms about fluents.

For each fluent F, we will have an axiom that defines the truth value of Ft+1 in terms of fluents
at time t and the action that may have occurred at time t.

The truth value of Ft+1 can be set in one of 2 ways:

Either a. The action at time t cause F to be true at t+1

Or b. F was already true at time t and the action at time t does


not cause it to be false. An axiom of this form is called a
successor-state axiom and has this schema:

Qualification problem: specifying all unusual exceptions that could cause the action to fail.

2. A hybrid agent

Hybrid agent: combines the ability to deduce various aspect of the state of the world with
condition-action rules, and with problem-solving algorithms.

The agent maintains and update KB as a current plan.

The initial KB contains the atemporal axioms. (don’t depend on t)

At each time step, the new percept sentence is added along with all the axioms that
depend on t (such as the successor-state axioms).

Then the agent use logical inference by ASKING questions of the KB (to work out which
squares are safe and which have yet to be visited).

The main body of the agent program constructs a plan based on a decreasing priority of goals:

1. If there is a glitter, construct a plan to grab the gold, follow a route back to the initial
location and climb out of the cave;
2. Otherwise if there is no current plan, plan a route (with A* search) to the closest safe
square unvisited yet, making sure the route goes through only safe squares;
3. If there are no safe squares to explore, if still has an arrow, try to make a safe square
by shooting at one of the possible wumpus locations.
4. If this fails, look for a square to explore that is not provably unsafe.
5. If there is no such square, the mission is impossible, then retreat to the initial location and climb out of the
cave.
Weakness: The computational expense goes up as time goes by.

3. Logical state estimation


To get a constant update time, we need to cache the result of inference.

Belief state: Some representation of the set of all possible current state of the world. (used to replace the past history of
percepts and all their ramifications)

We use a logical sentence involving the proposition symbols associated with the current time step and the temporal
symbols.

Logical state estimation involves maintaining a logical sentence that describes the set of possible states consistent with
the observation history. Each update step requires inference using the transition model of the environment, which is built
from successor-state axioms that specify how each fluent changes.

State estimation: The process of updating the belief state as new percepts arrive.

Exact state estimation may require logical formulas whose size is exponential in the number of symbols.
One common scheme for approximate state estimation: to represent belief state as conjunctions of literals (1-CNF
formulas).

The agent simply tries to prove Xt and ¬Xt for each symbol Xt, given the belief state at t-1.

The conjunction of provable literals becomes the new belief state, and the previous belief state is discarded. (This
scheme may lose some information as time goes along.)

The set of possible states represented by the 1-CNF belief state includes all states that are in fact possible given the full
percept history. The 1-CNF belief state acts as a simple outer envelope, or conservative approximation.

4. Making plans by propositional inference


We can make plans by logical inference instead of A* search in Figure 7.20.
Basic idea:

1. Construct a sentence that includes:


a) Init0: a collection of assertions about the initial state;
b) Transition1, …, Transitiont: The successor-state axioms for all possible actions at each time up to some maximum time
t;
c) HaveGoldt∧ClimbedOutt: The assertion that the goal is achieved at time t.
2. Present the whole sentence to a SAT solver. If the solver finds a satisfying model, the goal is achievable; else the
planning is impossible.
3. Assuming a model is found, extract from the model those variables that represent actions and are assigned true.

Together they represent a plan to ahieve the goals.

Decisions within a logical agent can be made by SAT solving: finding possible models specifying future action sequences
that reach the goal. This approach works only for fully observable or sensorless environment.

SATPLAN: A propositional planning. (Cannot be used in a partially observable environment)

SATPLAN finds models for a sentence containing the initial sate, the goal, the successor-state axioms, and the action
exclusion axioms.

(Because the agent does not know how many steps it will take to reach the goal, the algorithm tries each possible number
of steps t up to some maximum conceivable plan length Tmax.)
Precondition axioms: stating that an action occurrence requires the preconditions to be satisfied, added to avoid
generating plans with illegal actions.

Action exclusion axioms: added to avoid the creation of plans with multiple simultaneous actions that interfere with each
other.

Propositional logic does not scale to environments of unbounded size because it lacks the expressive power to deal
concisely with time, space and universal patterns of relationshipgs among objects.

[Link] an brief notes about First Order Logic


5. First-order logic

First-Order Logic in Artificial intelligence

In the topic of Propositional logic, we have seen that how to represent statements using propositional logic. But
unfortunately, in propositional logic, we can only represent the facts, which are either true or false. PL is not sufficient to
represent the complex sentences or natural language statements. The propositional logic has very limited expressive
power. Consider the following sentence, which we cannot represent using PL logic.

o "Some humans are intelligent", or


o "Sachin likes cricket."

To represent the above statements, PL logic is not sufficient, so we required some more powerful logic, such as first- order
logic.

First-Order logic:

o First-order logic is another way of knowledge representation in artificial intelligence. It is an extension to


propositional logic.

o FOL is sufficiently expressive to represent the natural language statements in a concise way.
o First-order logic is also known as Predicate logic or First-order predicate logic. First-order logic is a powerful language
that develops information about the objects in a more easy way and can also express the relationship between those
objects.
o First-order logic (like natural language) does not only assume that the world contains facts like propositional logic but
also assumes the following things in the world:
o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
o Relations: It can be unary relation such as: red, round, is adjacent, or n-any relation such as: the
sister of, brother of, has color, comes between
o Function: Father of, best friend, third inning of, end of, ......
o As a natural language, first-order logic also has two main parts:
a. Syntax
b. Semantics

Syntax of First-Order logic:

The syntax of FOL determines which collection of symbols is a logical expression in first-order logic. The basic
syntactic elements of first-order logic are symbols. We write statements in short-hand notation in FOL.

Basic Elements of First-order logic:

Following are the basic elements of FOL syntax:

Constant 1, 2, A, John, Mumbai, cat,....

Variables x, y, z, a, b,....

Predicates Brother, Father, >,....

Function sqrt, LeftLegOf, ....

Connectives ∧, ∨, ¬, ⇒, ⇔

Equality ==

Quantifier ∀, ∃

Atomic sentences:

o Atomic sentences are the most basic sentences of first-order logic. These sentences are formed from a predicate symbol
followed by a parenthesis with a sequence of terms.
o We can represent atomic sentences as Predicate (term1, term2, ....... , term n).

Example: Ravi and Ajay are brothers: => Brothers(Ravi, Ajay).

Chinky is a cat: => cat (Chinky).

Complex Sentences:

o Complex sentences are made by combining atomic sentences using connectives.

First-order logic statements can be divided into two parts:

o Subject: Subject is the main part of the statement.


Predicate: A predicate can be defined as a relation, which binds two atoms together in a statement

Consider the statement: "x is an integer.", it consists of two parts, the first part x is the subject of the statement and
second part "is an integer," is known as a predicate.
Quantifiers in First-order logic:

o A quantifier is a language element which generates quantification, and quantification specifies the quantity of specimen
in the universe of discourse.
o These are the symbols that permit to determine or identify the range and scope of the variable in the logical expression.
There are two types of quantifier:
a. Universal Quantifier, (for all, everyone, everything)
b. Existential quantifier, (for some, at least one).

Universal Quantifier:

Universal quantifier is a symbol of logical representation, which specifies that the statement within its range is true for
everything or every instance of a particular thing.

The Universal quantifier is represented by a symbol ∀, which resembles an inverted A.

Note: In universal quantifier we use implication "→".

If x is a variable, then ∀x is read as:

o For all x
o For each x
o For every x.

Example:

All man drink coffee.

Let a variable x which refers to a cat so all x can be represented in UOD as below:
∀x man(x) → drink (x, coffee).

It will be read as: There are all x where x is a man who drink coffee.

Existential Quantifier:

Existential quantifiers are the type of quantifiers, which express that the statement within its scope is true for at least one
instance of something.

It is denoted by the logical operator ∃, which resembles as inverted E. When it is used with a predicate variable then it is
called as an existential quantifier.

Note: In Existential quantifier we always use AND or Conjunction symbol (∧).

If x is a variable, then existential quantifier will be ∃x or ∃(x). And it will be read as:

o There exists a 'x.'


o For some 'x.'
o For at least one 'x.'

Example:

Some boys are intelligent.


∃x: boys(x) ∧ intelligent(x)

It will be read as: There are some x where x is a boy who is intelligent.

Points to remember:

o The main connective for universal quantifier ∀ is implication →.


o The main connective for existential quantifier ∃ is and ∧.

Properties of Quantifiers:

o In universal quantifier, ∀x∀y is similar to ∀y∀x.


o In Existential quantifier, ∃x∃y is similar to ∃y∃x.
o ∃x∀y is not similar to ∀y∃x.

Some Examples of FOL using quantifier:

1. All birds fly.


In this question the predicate is "fly(bird)."

And since there are all birds who fly so it will be represented as follows.

∀x bird(x) →fly(x).

2. Every man respects his parent.


In this question, the predicate is "respect(x, y)," where x=man, and y= parent. Since
there is every man so will use ∀, and it will be represented as follows:

∀x man(x) → respects (x, parent).

3. Some boys play cricket.


In this question, the predicate is "play(x, y)," where x= boys, and y= game. Since there are some boys so we will use ∃,
and it will be represented as:

∃x boys(x) → play(x, cricket).

4. Not all students like both Mathematics and Science.


In this question, the predicate is "like(x, y)," where x= student, and y= subject.
Since there are not all students, so we will use ∀ with negation, so following representation for this:

¬∀ (x) [ student(x) → like(x, Mathematics) ∧ like(x, Science)].

5. Only one student failed in Mathematics.


In this question, the predicate is "failed(x, y)," where x= student, and y= subject.

Since there is only one student who failed in Mathematics, so we will use following representation for this:

∃(x) [ student(x) → failed (x, Mathematics) ∧∀ (y) [¬(x==y) ∧ student(y) → ¬failed (x, Mathematics)].

Free and Bound Variables:

The quantifiers interact with variables which appear in a suitable way. There are two types of variables in First-order logic
which are given below:

Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope of the quantifier.

Example: ∀x ∃(y)[P (x, y, z)], where z is a free variable.

Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the scope of the quantifier.

Example: ∀x [A (x) B( y)], here x and y are the bound variables.

6. Explain detail about forward and backward chaining algorithm with an example.

6. Forward chaining and Backward chaining

Forward Chaining and backward chaining in AI

In artificial intelligence, forward and backward chaining is one of the important topics, but before understanding
forward and backward chaining lets first understand that from where these two terms came.

Inference engine:
The inference engine is the component of the intelligent system in artificial intelligence, which applies logical rules to
the knowledge base to infer new information from known facts. The first inference engine was part of the expert
system. Inference engine commonly proceeds in two modes, which are:

A. Forward chaining
B. Backward chaining

Horn Clause and Definite clause:

Horn clause and definite clause are the forms of sentences, which enables knowledge base to use a more restricted
and efficient inference algorithm. Logical inference algorithms use forward and backward chaining approaches, which
require KB in the form of the first-order definite clause.

Definite clause: A clause which is a disjunction of literals with exactly one positive literal is known as a definite
clause or strict horn clause.

Horn clause: A clause which is a disjunction of literals with at most one positive literal is known as horn clause.
Hence all the definite clauses are horn clauses.

Example: (¬ p V ¬ q V k). It has only one positive literal k.

It is equivalent to p ∧ q → k.

A. Forward Chaining

Forward chaining is also known as a forward deduction or forward reasoning method when using an inference engine.
Forward chaining is a form of reasoning which start with atomic sentences in the knowledge base and applies
inference rules (Modus Ponens) in the forward direction to extract more data until a goal is reached.

The Forward-chaining algorithm starts from known facts, triggers all rules whose premises are satisfied, and add their
conclusion to the known facts. This process repeats until the problem is solved.

Properties of Forward-Chaining:

o It is a down-up approach, as it moves from bottom to top.


o It is a process of making a conclusion based on known facts or data, by starting from the initial state and
reaches the goal state.
o Forward-chaining approach is also called as data-driven as we reach to the goal using available data.
o Forward -chaining approach is commonly used in the expert system, such as CLIPS, business, and production
rule systems.

Consider the following famous example which we will use in both approaches:

Example:
"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."

Prove that "Robert is criminal."

To solve the above problem, first, we will convert all the above facts into first-order definite clauses, and then we will
use a forward-chaining algorithm to reach the goal.

Facts Conversion into FOL:

o It is a crime for an American to sell weapons to hostile nations. (Let's say p, q, and r are variables)
American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p) ...(1)

o Country A has some missiles. ?p Owns(A, p) ∧ Missile(p). It can be written in two definite clauses by using
Existential Instantiation, introducing new Constant T1.
Owns(A, T1) ...................(2)

Missile(T1).................... (3)

o All of the missiles were sold to country A by Robert.


?p Missiles(p) ∧ Owns (A, p) → Sells (Robert, p, A) ..............(4)

o Missiles are weapons.


Missile(p) → Weapons (p) .................... (5)

o Enemy of America is known as hostile.


Enemy(p, America) →Hostile(p) ..................... (6)

o Country A is an enemy of America.


Enemy (A, America) ...................... (7)

o Robert is American
American(Robert).........................(8)

Forward chaining proof:

Step-1:

In the first step we will start with the known facts and will choose the sentences which do not have implications, such
as: American(Robert), Enemy(A, America), Owns(A, T1), and Missile(T1). All these facts will be represented as
below.

Step-2:

At the second step, we will see those facts which infer from available facts and with satisfied premises.

Rule-(1) does not satisfy premises, so it will not be added in the first iteration.

Rule-(2) and (3) are already added.


Rule-(4) satisfy with the substitution {p/T1}, so Sells (Robert, T1, A) is added, which infers from the conjunction of
Rule (2) and (3).

Rule-(6) is satisfied with the substitution(p/A), so Hostile(A) is added and which infers from Rule-(7).

Step-3:

At step-3, as we can check Rule-(1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we can add
Criminal(Robert) which infers all the available facts. And hence we reached our goal statement.

Hence it is proved that Robert is Criminal using forward chaining approach.

B. Backward Chaining:

Backward-chaining is also known as a backward deduction or backward reasoning method when using an inference
engine. A backward chaining algorithm is a form of reasoning, which starts with the goal and works backward,
chaining through rules to find known facts that support the goal.

Properties of backward chaining:

o It is known as a top-down approach.


o Backward-chaining is based on modus ponens inference rule.

o In backward chaining, the goal is broken into sub-goal or sub-goals to prove the facts true.
o It is called a goal-driven approach, as a list of goals decides which rules are selected and used.
o Backward -chaining algorithm is used in game theory, automated theorem proving tools, inference engines,
proof assistants, and various AI applications.
o The backward-chaining method mostly used a depth-first search strategy for proof.

Example:

In backward-chaining, we will use the same above example, and will rewrite all the rules.

o American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) → Criminal(p) ...(1)


Owns(A, T1) ......................... (2)

o Missile(T1)
o ?p Missiles(p) ∧ Owns (A, p) → Sells (Robert, p, A) .................. (4)
o Missile(p) → Weapons (p) ........................ (5)
o Enemy(p, America) →Hostile(p) ......................... (6)
o Enemy (A, America) .......................... (7)
o American(Robert)............................. (8)

Backward-Chaining proof:

In Backward chaining, we will start with our goal predicate, which is Criminal(Robert), and then infer further rules.

Step-1:

At the first step, we will take the goal fact. And from the goal fact, we will infer other facts, and at last, we will prove
those facts true. So our goal fact is "Robert is Criminal," so following is the predicate of it.

Step-2:

At the second step, we will infer other facts form goal fact which satisfies the rules. So as we can see in Rule-1, the
goal predicate Criminal (Robert) is present with substitution {Robert/P}. So we will add all the conjunctive facts below
the first level and will replace p with Robert.

Here we can see American (Robert) is a fact, so it is proved here.


Step-3:t At step-3, we will extract further fact Missile(q) which infer from Weapon(q), as it satisfies Rule-(5). Weapon

(q) is also true with the substitution of a constant T1 at q.

Step-4:

At step-4, we can infer facts Missile(T1) and Owns(A, T1) form Sells(Robert, T1, r) which satisfies the Rule- 4, with the
substitution of A in place of r. So these two statements are proved here.
Step-5:

At step-5, we can infer the fact Enemy(A, America) from Hostile(A) which satisfies Rule- 6. And hence all the
statements are proved true using backward chaining.

[Link] detail about knowledge representation and engineering

1. Knowledge representation and engineering

Knowledge Engineering in First-order logic

What is knowledge-engineering?

The process of constructing a knowledge-base in first-order logic is called as knowledge- engineering. In knowledge-
engineering, someone who investigates a particular domain, learns important concept of that domain, and generates
a formal representation of the objects, is known as knowledge engineer.

In this topic, we will understand the Knowledge engineering process in an electronic circuit domain, which is already
familiar. This approach is mainly suitable for creating special-purpose knowledge base.

The knowledge-engineering process:

Following are some main steps of the knowledge-engineering process. Using these steps, we will develop a
knowledge base which will allow us to reason about digital circuit (One-bit full adder) which is given below
1. Identify the task:

The first step of the process is to identify the task, and for the digital circuit, there are various reasoning tasks

At the first level or highest level, we will examine the functionality of the circuit:

o Does the circuit add properly?


o What will be the output of gate A2, if all the inputs are high?

At the second level, we will examine the circuit structure details such as:

o Which gate is connected to the first input terminal?


o Does the circuit have feedback loops?

2. Assemble the relevant knowledge:

In the second step, we will assemble the relevant knowledge which is required for digital circuits. So for digital circuits,
we have the following required knowledge:

o Logic circuits are made up of wires and gates.


o Signal flows through wires to the input terminal of the gate, and each gate produces the corresponding
output which flows further.
o In this logic circuit, there are four types of gates used: AND, OR, XOR, and NOT.
o All these gates have one output terminal and two input terminals (except NOT gate, it has one input
terminal).

3. Decide on vocabulary:

The next step of the process is to select functions, predicate, and constants to represent the circuits, terminals, signals,
and gates. Firstly we will distinguish the gates from each other and from other objects. Each gate is represented as an
object which is named by a constant, such as, Gate(X1). The functionality of each gate is determined by its type,
which is taken as constants such as AND, OR, XOR, or NOT. Circuits will be identified by a predicate: Circuit (C1).

For the terminal, we will use predicate: Terminal(x).

For gate input, we will use the function In(1, X1) for denoting the first input terminal of the gate, and for output
terminal we will use Out (1, X1).
The function Arity(c, i, j) is used to denote that circuit c has i input, j output.

The connectivity between gates can be represented by predicate Connect(Out(1, X1), In(1, X1)).

We use a unary predicate On (t), which is true if the signal at a terminal is on.

4. Encode general knowledge about the domain:

To encode the general knowledge about the logic circuit, we need some following rules:

o If two terminals are connected then they have the same input signal, it can be represented as:
∀ t1, t2 Terminal (t1) ∧ Terminal (t2) ∧ Connect (t1, t2) → Signal (t1) = Signal (2).

o Signal at every terminal will have either value 0 or 1, it will be represented as:

∀ t Terminal (t) →Signal (t) = 1 ∨Signal (t) = 0.

o Connect predicates are commutative:

∀ t1, t2 Connect(t1, t2) → Connect (t2, t1).

o Representation of types of gates:

∀ g Gate(g) ∧ r = Type(g) → r = OR ∨r = AND ∨r = XOR ∨r = NOT.

o Output of AND gate will be zero if and only if any of its input is zero.

∀ g Gate(g) ∧ Type(g) = AND →Signal (Out(1, g))= 0 ⇔ ∃n Signal (In(n, g))= 0.

o Output of OR gate is 1 if and only if any of its input is 1:

∀ g Gate(g) ∧ Type(g) = OR → Signal (Out(1, g))= 1 ⇔ ∃n Signal (In(n, g))= 1

o Output of XOR gate is 1 if and only if its inputs are different:

∀ g Gate(g) ∧ Type(g) = XOR → Signal (Out(1, g)) = 1 ⇔ Signal (In(1, g)) ≠ Signal (In(2, g)).

o Output of NOT gate is invert of its input:

∀ g Gate(g) ∧ Type(g) = NOT → Signal (In(1, g)) ≠ Signal (Out(1, g)).

o All the gates in the above circuit have two inputs and one output (except NOT gate).

∀ g Gate(g) ∧ Type(g) = NOT → Arity(g, 1, 1)

∀ g Gate(g) ∧ r =Type(g) ∧ (r= AND ∨r= OR ∨r= XOR) → Arity (g, 2, 1).

o All gates are logic circuits:

∀ g Gate(g) → Circuit (g).

5. Encode a description of the problem instance:

Now we encode problem of circuit C1, firstly we categorize the circuit and its gate components. This step is
easy if ontology about the problem is already thought. This step involves the writing simple atomics
sentences of instances of concepts, which is known as ontology.

For the given circuit C1, we can encode the problem instance in atomic sentences as below:

Since in the circuit there are two XOR, two AND, and one OR gate so atomic sentences for these gates will be:
For XOR gate: Type(x1)= XOR,
Type(X2) = XOR For AND gate:
Type(A1) = AND, Type(A2)= AND
For OR gate: Type (O1) = OR.

And then represent the connections between all the gates.

Note: Ontology defines a particular theory of the nature of existence.

1. Pose queries to the inference procedure and get answers:

In this step, we will find all the possible set of values of all the terminal for the adder circuit. The first query will be:

What should be the combination of input which would generate the first output of circuit C1, as 0
and a second output to be 1?

∃ i1, i2, i3 Signal (In(1, C1))=i1 ∧ Signal (In(2, C1))=i2 ∧ Signal (In(3, C1))= i3

∧ Signal (Out(1, C1)) =0 ∧ Signal (Out(2, C1))=1

2. Debug the knowledge base:

Now we will debug the knowledge base, and this is the last step of the complete process. In this step, we
will try to debug the issues of knowledge base.

In the knowledge base, we may have omitted assertions like 1 ≠ 0.

UNIT V PROBABILISTIC REASONING


Acting under uncertainty – Bayesian inference – Naïve Bayes models. Probabilistic reasoning
Bayesian networks – Exact inference in BN – Approximate inference in BN – Causal
networks.
PART-A (2 Marks)
1. Define uncertainty. Give an example?
 Till now, we have learned knowledge representation using first- order
logic and propositional logic with certainty, which means we were sure
about the predicates.
 With this knowledge representation, we might write A→B, which means if
A istrue then B is true, but consider a situation where we are not sure
about whether A is true or not then we cannot express this statement,
this situation is called uncertainty.
 So to represent uncertain knowledge, where we are not sure about the
predicates, we need uncertain reasoning or probabilistic reasoning.

2. Define Belief state.


Problem-solving agents and Logical agents designed to handle uncertainty
bykeeping track of a belief state—a representation of the set of all
possible world states that it might be in—and generating a contingency
plan that handles everypossible eventuality that its sensors may report
during execution

3. Define Bayes rule.


 Bayes' rule allows us to compute the single term P(B|A) in
terms of P(A|B), P(B),and P(A).
 This is very useful in cases where we have a good probability of
these three termsand want to determine the fourth one.
 Suppose we want to perceive the effect of some unknown
cause, and want tocompute that cause, then the Bayes' rule
becomes:

Application of Bayes' theorem in Artificial intelligence


4. Define Bayesian network.
 Bayesian belief network is key computer technology for
dealing with probabilistic events and to solve a problem
which has uncertainty. We can define a Bayesian network
as:

"A Bayesian network is a probabilistic graphical model which


represents a set of variables and their conditional dependencies
using a directed acyclic graph."

 It is also called a Bayes network, belief network,


decision network, or Bayesian model.
Bayesian networks are probabilistic, because these
networks are built from a probability distribution, and also use
probability theory for prediction and anomalydetection.

5. List out the components of Bayesian network.

The Bayesian network has mainly two components:


 Causal Component

 Actual numbers
 Each node in the Bayesian network has condition
probabil
distribution P(Xi |Parent(Xi) ), which determines the effect
of parent on that node.
 Bayesian network is based on Joint probability
distribution a conditional probability.
 So let's first understand the joint probability distribution:

Joint Probability – It is the measure of two events happening at t


same time. It can be written as P(A∩B)

Conditional Probability– It is the measure of the probability of an eve


occurring given that another event has occurred. In other terms, t
conditional probability of an event X is the probability that the eve
will occur given that event Y has already occurred.

P(X|Y): Probability of event X occurring given that event Y already


[Link] X and Y are dependent events, then P(X|Y) = P(X∩Y)/P(Y)

If X and Y are independent events, then P(X∩Y) = 0So, P(X|Y) = P(X)

6. Write a Applications of Bayesian network.


Healthcare Industry: The Bayesian network is used in the
healthcare industry for the detection and prevention of diseases.
Based on models built, we can find out the likelysymptoms and
predict whether a person will be diseased or not. For instance, if a
person has cholesterol, then there are high chances that the
person gets a heart problem. With this information, a person can
take preventive measures.

Web Search: Bayesian Network modes can be used for search


accuracy based on userintent. Based on the user’s intent, these
models show things that are relevant to the person. For instance,
when we search for Python functions most of the time, then the
web search model activates our intent and it makes sure to show
relevant things to the user.

Mail Spam Filtering: Gmail is using Bayesian Models to filter the


mails by readingor understanding the context of mail. For
instance, we may have observed spamemails in the spam folder
in Gmail. So, how are these emails classified as spams? Using
the Bayesian model, which observes the mail and based on the
previousexperience and the probability it classifies mail as spam
or not.
Biomonitoring: Bayesian Models are used to quantify the
concentration of chemicals in blood and human tissues. These use
indicators to measure blood and urine. To find the level of ECCs,
one can conduct biometric studies.

Information Retrieval: Models can be used for retrieving


information from the database. During this process, we refine our
problem multiple times. It is used to reduce information overload.
7. Define causal networks. Give an example?
Causal reasoning is a crucial element to how humans understand, explain, and
make decisions about the world. Causal AI means automating causal reasoning with
machine learning. Today’s learning machines have superhuman predictionability but aren't
particularly good at causal reasoning, even when we train them on obscenely large
amounts of data. In this book, you will learn how to write algorithms that capture causal
reasoning in the context of machine learning and automated data science.

8. Define Exact inference. Give an example?


 It is the term used when inference is performed exactly (subject to
standardnumerical rounding errors).
 Exact inference is applicable to a large range of problems, but may not
bepossible when combinations/paths get large.

9. What is approximate Inference. Give an example?

 Wider class of problems


 Non deterministic
 No guarantee of correct answer

10. Define Naïve Bayes model.


 It is a classification technique based on Bayes’ Theorem with an assumption of
independence among predictors. In simple terms, a Naive Bayes classifier assumes that
the presence of a particular feature in a class is unrelated to the presence of any other
feature.
 For example, a fruit may be considered to be an apple if it is red, round, and about 3
inches in diameter.
 Even if these features depend on each other or upon the existence of the other
features, all of these properties independently contribute to the probability that this
fruit is an apple and that is why it is known as ‘Naive’.
 Naive Bayes model is easy to build and particularly useful for
very large data sets. Along with simplicity, Naive Bayes is known to
outperform even highly sophisticated classification methods.
 Bayes theorem provides a way of calculating posterior probability P(c|x) from
P(c), P(x) and P(x|c). Look at the equation below:

Above,

 P(c|x) is the posterior probability of class (c, target) given predictor (x,
attributes).

 P(c) is the prior probability of class.


 P(x|c) is the likelihood which is the probability of predictor given class.
 P(x) is the prior probability of predictor.

[Link] are the logic used in reasoning with uncertain information.

Logic for reasoning with uncertain information includes Probabilistic Reasoning, which uses probabilities to
represent belief and likelihood, and Fuzzy Logic, which handles vagueness and imprecision by allowing degrees
of truth rather than strict true/false values. Non-monotonic Reasoning extends rules to reason with
incomplete information, where conclusions can be retracted with new evidence

12. What is causal network.

A causal network is a graphical model using nodes (representing variables) and directed edges (arrows) to
show cause-and-effect relationships between them. These networks visually map how changes in one
variable directly influence another, helping to understand the underlying mechanisms of a complex
system, and are used in fields like artificial intelligence, data science, and epidemiology for prediction
and decision-making

PART-B (15 Marks)

1. Explain detail about Exact inference in BN with example.


Exact inference:

In exact inference, we analytically compute the conditional probability distribution over


the variables of interest.
But sometimes, that’s too hard to do, in which case we can use approximation techniques based
on statistical sampling

Given a Bayesian network, what questions might we want to ask?

a. Conditional probability query: P(x | e)

b. Maximum a posteriori probability: What value of x maximizes P(x|e) ?

General question: What’s the whole probability distribution over variable X given evidence e, P(X |

e)?

In our discrete probability situation, the only way to answer a MAP query is to compute
the probability of x given e for all possible values of x and see which one is greatest

So, in general, we’d like to be able to compute a whole probability distribution over some variable or

variables X, given instantiations of a set of variables e

Using the joint distribution

To answer any query involving a conjunction of variables, sum over the variables not involved in
the query

Given the joint distribution over the variables, we can easily answer any question about the
value of a single variable by summing (or marginalizing) over the other variables.

So, in a domain with four variables, A, B, C, and D, the probability that variable D has value d is
the sum over all possible combinations of values of the other three variables of the joint
probability of all four values. This is exactly the same as the procedure we went through in the
last lecture, where to compute the probability of cavity, we added up the probability of cavity
and toothache and the probability of cavity and not toothache.

In general, we’ll use the first notation, with a single summation indexed by a list of variable
names, and a joint probability expression that mentions values of those variables. But here we
can see the completely written-out definition, just so we all know what the shorthand is
supposed to mean.

To compute a conditional probability, we reduce it to a ratio of conjunctive queries using the


definition of conditional probability, and then answer each of those queries by marginalizing out
the variables not mentioned.

In the numerator, here, you can see that we’re only summing over variables A and C, because b
and d are instantiated in the query.

We’re going to learn a general purpose algorithm for answering these joint queries fairly
efficiently. We’ll start by looking at a very simple case to build up our intuitions, then we’ll write
down the algorithm, then we’ll apply it to a more complex case.

Here’s our very simple case. It’s a bayes net with four nodes, arranged in a chain.

So, we know from before that the probability that variable D has some value little d is the sum
over A, B, and C of the joint distribution, with d fixed.
Now, using the chain rule of Bayesian networks, we can write down the joint probability as a
product over the nodes of the probability of each node’s value given the values of its parents. So,
in this case, we get P(d|c) times P(c|b) times P(b|a) times P(a).

This expression gives us a method for answering the query, given the conditional probabilities
that are stored in the net. And this method can be applied directly to any other bayes net. But
there’s a problem with it: it requires enumerating all possible combinations of assignments to A,
B, and C, and then, for each one, multiplying the factors for each node. That’s an enormous
amount of work and we’d like to avoid it if at all possible.

So, we’ll try rewriting the expression into something that might be more efficient to evaluate.
First, we can make our summation into three separate summations, one over each variable.

Then, by distributivity of addition over multiplication, we can push the summations in, so that
the sum over A includes all the terms that mention A, but no others, and so on. It’s pretty clear
that this expression is the same as the previous one in value, but it can be evaluated more
efficiently. We’re still, eventually, enumerating all assignments to the three variables, but we’re
doing somewhat fewer multiplications than before. So this is still not completely satisfactory.

If you look, for a minute, at the terms inside the summation over A, you’ll see that we’re doing
these multiplications over for each value of C, which isn’t necessary, because they’re
independent of C. Our idea, here, is to do the multiplications once and store them for later use.
So, first, for each value of A and B, we can compute the product, generating a two dimensional
matrix.
Then, we can sum over the rows of the matrix, yielding one value of the sum for each possible
value of b.

We’ll call this set of values, which depends on b, f1 of b.

Now, we can substitute f1 of b in for the sum over A in our previous expression. And, effectively,
we can remove node A from our diagram. Now, we express the contribution of b, which takes
the contribution of a into account, as f_1 of b.

We can continue the process in basically the same way. We can look at the summation over b
and see that the only other variable it involves is c. We can summarize those products as a set of
factors, one for each value of c. We’ll call those factors f_2 of c.

We substitute f_2 of c into the formula, remove node b from the diagram, and now we’re down to a

simple expression in which d is known and we have to sum over values of c.


Variable Elimination Algorithm

Given a Bayesian network, and an elimination order for the non-query variables , compute

For i = m downto 1

 remove all the factors that mention Xi


 multiply those factors, getting a value for each combination of mentioned variables
 sum over Xi
 put this new factor into the factor set

That was a simple special case. Now we can look at the algorithm in the general case. Let’s
assume that we’re given a Bayesian network and an ordering on the variables that aren’t fixed in
the query. We’ll come back later to the question of the influence of the order, and how we might
find a good one.

We can express the probability of the query variables as a sum over each value of each of the
non- query variables of a product over each node in the network, of the probability that that
variable has the given value given the values of its parents.

So, we’ll eliminate the variables from the inside out. Starting with variable Xm and finishing with

variable X1.

To eliminate variable Xi, we start by gathering up all of the factors that mention Xi, and removing

them from our set of factors. Let’s say there are k such factors.

Now, we make a k+1 dimensional table, indexed by Xi as well as each of the other variables that
is mentioned in our set of factors.

We then sum the table over the Xi dimension, resulting in a k-dimensional

table. This table is our new factor, and we put a term for it back into our set

of factors. Once we’ve eliminated all the summations, we have the desired

value.

One more example


Here’s a more complicated example, to illustrate the variable elimination algorithm in a more
general case. We have this big network that encodes a domain for diagnosing lung disease.
(Dyspnea, as I understand it, is shortness of breath).

We’ll do variable elimination on this graph using elimination order A, B, L, T, S, X, V.

So, we start by eliminating V. We gather the two terms that mention V and see that they also
involve variable T. So, we compute the product for each value of T, and summarize those in the
factor f1 of T.

Now we can substitute that factor in for the summation, and remove the node from the network.

The next variable to be eliminated is X. There is actually only one term involving X, and it also
involves variable A. So, for each value of A, we compute the sum over X of P(x|a). But wait! We
know what this value is! If we fix a and sum over x, these probabilities have to add up to 1.

So, rather than adding another factor to our expression, we can just remove the whole sum. In
general, the only nodes that will have an influence on the probability of D are its ancestors.

Now, it’s time to eliminate S. We find that there are three terms involving S, and we gather them
into the sum. These three terms involve two other variables, B and L. So we have to make a
factor that specifies, for each value of B and L, the value of the sum of products.

We’ll call that factor f_2 of b and l.

Now we can substitute that factor back into our expression. We can also eliminate node S. But in
eliminating S, we’ve added a direct dependency between L and B (they used to be dependent via
S, but now the dependency is encoded explicitly in f2(b). We’ll show that in the graph by drawing
a line between the two nodes. It’s not exactly a standard directed conditional dependence, but
it’s still useful to show that they’re coupled.

Now we eliminate T. It involves two terms, which themselves involve variables A and L. So we
make a new factor f3 of A and L.

We can substitute in that factor and eliminate T. We’re getting close!

Next we eliminate L. It involves these two factors, which depend on variables A and B. So we
make a new factor, f4 of A and B, and substitute it in. We remove node L, but couple A and B.

At this point, we could just do the summations over A and B and be done. But to finish out the

algorithm the way a computer would, it’s time to eliminate variable B.


It involves both of our remaining terms, and it seems to depend on variables A and D. However,
in this case, we’re interested in the probability of a particular value, little d of D, and so the
variable d is instantiated. Thus, we can treat it as a constant in this expression, and we only need
to generate a factor over a, which we’ll call f5 of a. And we can now, in some sense, remove D
from our network as well (because we’ve already factored it into our answer).

Finally, to get the probability that variable D has value little d, we simply sum factor f5 over all
values of a. Yay! We did it.

Properties of Variable Elimination

Let’s see how the variable elimination algorithm performs, both in theory and in practice.

 Time is exponential in size of largest factor


 Bad elimination order can generate huge factors
 NP Hard to find the best elimination order
 Even the best elimination order may generate large factors
 There are reasonable heuristics for picking an elimination order (such as choosing
the variable that results in the smallest next factor)
 Inference in polytrees (nets with no cycles) is linear in size of the network (the largest CPT)
 Many problems with very large nets have only small factors, and thus efficient inference

First of all, it’s pretty easy to see that it runs in time exponential in the number of variables involved
in the largest factor. Creating a factor with k variables involves making a k+1 dimensional table. If you
have b values per variable, that’s a table of size b^(k+1). To make each entry, you have to multiply at
most n numbers, where n is the number of nodes. We have to do this for each variable to be
eliminated (which is usually close to n). So we have something like time = O(n^2 b^k).

How big the factors are depends on the elimination order. You’ll see in one of the recitation exercises
just how dramatic the difference in factor sizes can be. A bad elimination order can generate huge
factors.

So, we’d like to use the elimination order that generates the smallest factors. Unfortunately, it turns
out to be NP hard to find the best elimination order.

At least, there are some fairly reasonable heuristics for choosing an elimination order. It’s usually
done dynamically. So, rather than fixing the elimination order in advance, as we suggested in the
algorithm description, you can pick the next variable to be eliminated depending on the situation. In
particular, one reasonable heuristic is to pick the variable to eliminate next that will result in the
smallest factor. This greedy approach won’t always be optimal, but it’s not usually too bad.

There is one case where Bayes net inference in general, and the variable elimination algorithm in
particular is fairly efficient, and that’s when the network is a polytree. A polytree is a network with no
cycles. That is, a network in which, for any two nodes, there is only one path between them. In a
polytree, inference is linear in the size of the network, where the size of the network is defined to be
the size of the largest conditional probability table (or exponential in the maximum number of
parents of any node). In a polytree, the optimal elimination order is to start at the root nodes, and
work downwards, always eliminating a variable that no longer has any parents. In doing so, we never
introduce additional connections into the network.

So, inference in polytrees is efficient, and even in many large non-polytree networks, it’s possible to

keep the factors small, and therefore to do inference relatively efficiently.

When the network is such that the factors are, of necessity, large, we’ll have to turn to a different class

of methods.

. 2. Explain about Bayesian network with an example.


1. Bayesian networks or Belief networks

Bayesian Belief Network in artificial intelligence

Bayesian belief network is key computer technology for dealing with probabilistic events and to solve a
problem which has uncertainty. We can define a Bayesian network as:

"A Bayesian network is a probabilistic graphical model which represents a set of variables and their
conditional dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network, or Bayesian model.

Bayesian networks are probabilistic, because these networks are built from a probability distribution, and also
use probability theory for prediction and anomaly detection.

Real world applications are probabilistic in nature, and to represent the relationship between multiple events, we
need a Bayesian network. It can also be used in various tasks including prediction, anomaly detection,
diagnostics, automated insight, reasoning, time series prediction, and decision making under uncertainty.

Bayesian Network can be used for building models from data and experts opinions, and it consists of two parts:

o Directed Acyclic Graph


o Table of conditional probabilities.

The generalized form of Bayesian network that represents and solve decision problems under uncertain
knowledge is known as an Influence diagram.

A Bayesian network graph is made up of nodes and Arcs (directed links), where:

o Each node corresponds to the random variables, and a variable can be continuous or discrete.
o Arc or directed arrows represent the causal relationship or conditional probabilities between random
variables. These directed links or arrows connect the pair of nodes in the
graph. These links represent that one node directly influence the other node, and if there is no directed
link that means that nodes are independent with each other
o In the above diagram, A, B, C, and D are random variables represented by the nodes of
the network graph.
o If we are considering node B, which is connected with node A by a directed arrow, then
node A is called the parent of Node B.
o Node C is independent of node A.

Note: The Bayesian network graph does not contain any cyclic graph. Hence, it is known as a directed
acyclic graph or DAG

The Bayesian network has mainly two components:

o Causal Component
o Actual numbers

Each node in the Bayesian network has condition probability distribution P(Xi |Parent(Xi) ), which determines
the effect of the parent on that node.

Bayesian network is based on Joint probability distribution and conditional probability. So let's first understand
the joint probability distribution:

Joint probability distribution:

If we have variables x1, x2, x3, , xn, then the probabilities of a different combination of x1, x2, x3.. xn, are known as

Joint probability distribution.

P[x1, x2, x3, , xn], it can be written as the following way in terms of the joint probability distribution.

= P[x1| x2, x3,....., xn]P[x2, x3, , xn]

= P[x1| x2, x3,....., xn]P[x2|x3,....., xn]. P[xn-1|xn]P[xn].

In general for each variable Xi, we can write the equation as:

P(Xi|Xi-1,........., X1) = P(Xi |Parents(Xi ))

Explanation of Bayesian network:

Let's understand the Bayesian network through an example by creating a directed acyclic graph:

Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds at
detecting a burglary but also responds for minor earthquakes. Harry has two neighbors David and Sophia, who
have taken a responsibility to inform Harry at work when they hear the alarm. David always calls Harry when he
hears the alarm, but sometimes he got confused with the phone ringing and calls at that time too. On the other
hand, Sophia likes to listen to high music, so sometimes she misses to hear the alarm. Here we would like to
compute the probability of Burglary Alarm.

Problem:

Calculate the probability that alarm has sounded, but there is neither a burglary, nor an earthquake occurred, and David and Sophia both called
the Harry.

Solution:

o The Bayesian network for the above problem is given below. The network structure is showing that
burglary and earthquake is the parent node of the alarm and directly affecting the probability of alarm's
going off, but David and Sophia's calls depend on alarm probability.
o The network is representing that our assumptions do not directly perceive the burglary and also do not
notice the minor earthquake, and they also not confer before calling.
o The conditional distributions for each node are given as conditional probabilities table or CPT.
o Each row in the CPT must be sum to 1 because all the entries in the table represent an exhaustive set of
cases for the variable.
o In CPT, a boolean variable with k boolean parents contains 2K probabilities. Hence, if there are two
parents, then CPT will contain 4 probability values

List of all events occurring in this network:

o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)

o Sophia calls(S)

We can write the events of problem statement in the form of probability: P[D, S, A, B, E], can rewrite the
above probability statement using joint probability distribution:

P[D, S, A, B, E]= P[D | S, A, B, E]. P[S, A, B, E]

=P[D | S, A, B, E]. P[S | A, B, E]. P[A, B, E]

= P [D| A]. P [ S| A, B, E]. P[ A, B, E]

= P[D | A]. P[ S | A]. P[A| B, E]. P[B, E]

= P[D | A ]. P[S | A]. P[A| B, E]. P[B |E]. P[E]

Let's take the observed probability for the Burglary and earthquake component:
P(B= True) = 0.002, which is the probability of burglary.

P(B= False)= 0.998, which is the probability of no burglary.

P(E= True)= 0.001, which is the probability of a minor earthquake

P(E= False)= 0.999, Which is the probability that an earthquake not

occurred. We can provide the conditional probabilities as per the below

tables: Conditional probability table for Alarm A:

The Conditional probability of Alarm A depends on Burglar and earthquake:

B E P(A= True) P(A= False)

True True 0.94 0.06

True False 0.95 0.04

False True 0.31 0.69

False False 0.001 0.999

Conditional probability table for David Calls:

The Conditional probability of David that he will call depends on the probability of Alarm.

A P(D= True) P(D= False)

True 0.91 0.09

False 0.05 0.95

Conditional probability table for Sophia Calls:

The Conditional probability of Sophia that she calls is depending on its Parent Node "Alarm."

A P(S= True) P(S= False)

True 0.75 0.25

False 0.02 0.98

From the formula of joint distribution, we can write the problem statement in the form of probability distribution:

P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).

= 0.75* 0.91* 0.001* 0.998*0.999


= 0.00068045.

Hence, a Bayesian network can answer any query about the domain by using Joint

distribution. The semantics of Bayesian Network

There are two ways to understand the semantics of the Bayesian network, which is given below:

1. To understand the network as the representation of the Joint probability distribution.

It is helpful to understand how to construct the network.

2. To understand the network as an encoding of a collection of conditional independence statements.

It is helpful in designing inference procedure.

3. Explain detail about Approximate inference in BN with example.


1. Approximate inference:

Sampling

To get approximate answer we can do stochastic simulation (sampling).

Another strategy, which is a theme that comes up also more and more in AI actually, is to say, well,
we didn't really want the right answer anyway. Let's try to do an approximation. And you can also
show that it's computationally hard to get an approximation that's within epsilon of the answer that
you want, but again that doesn't keep us from trying.
So, the other thing that we can do is the stochastic simulation or sampling. In sampling, what we do
is we look at the root nodes of our graph, and attached to this root node is some probability that A is
going to be true, right? Maybe it's .4. So we flip a coin that comes up heads with probability .4 and
see if we get true or false.

We flip our coin, let's say, and we get true for A -- this time. And now, given the assignment of true to
A, we look in the conditional probability table for B given A = true, and that gives us a probability for
B.

Now, we flip a coin with that probability. Say we get False. We enter that into the table.

We do the same thing for C, and let’s say we get True.

Now, we look in the CPT for D given B and C, for the case where B is false and C is true, and we flip a
coin with that probability, in order to get a value for D.

So, there's one sample from the joint distribution of these four variables. And you can just keep
doing this, all day and all night, and generate a big pile of samples, using that algorithm. And now you
can ask various questions.

Estimate:

P*(D|A) = #D,A / #A

Let's say you want to know the probability of D given A. How would you answer - - given all the
examples -- what would you do to compute the probability of D given A? You would just count. You’d
count the number of cases in which A and D were true, and you’d divide that by the number

of cases in which A was true, and that would give you an unbiased estimate of the probability of D given A.
The

more samples, the more confidence you’d have that the estimated probability is close to the true one.

Estimation

 Some probabilities are easier than others to estimate


 In generating the table, the rare events will not be well represented
 P(Disease| spots-on-your-tongue, sore toe)
 If spots-on-your-tongue and sore toe are not root nodes, you would generate a huge table
but the cases of interest would be very sparse in the table
 Importance sampling lets you focus on the set of cases that are important to answering
your question
It's going to turn out that some probabilities are easier than other ones to estimate.
Exactly because of the process we’re using to generate the samples, the majority of them will be the
typical cases. Oh, it's someone with a cold, someone with a cold, someone with a cold, someone with
a cold, someone with a cold, someone with malaria, someone with a cold, someone with a cold. So
the rare results are not going to come up very often. And so doing this sampling naively can make it
really hard to estimate the probability of a rare event. If it's something that happens one in ten
thousand times, well, you know for sure you're going to need, some number of tens of thousands of
samples to get even a reasonable estimate of that probability.

Imagine that you want to estimate the probability of some disease given -- oh, I don't know -- spots
on your tongue and a sore toe. Somebody walks in and they have a really peculiar set of symptoms,
and you want to know what's the probability that they have some disease.

Well, if the symptoms are root nodes, it's easy. If the symptoms were root nodes, you could just
assign the root nodes to have their observed values and then simulate the rest of the network as
before.

But if the symptoms aren't root nodes then if you do naïve sampling, you would generate a giant
table of samples, and you'd have to go and look and say, gosh, how many cases do I have where
somebody has spots on their tongue and a sore toe; and the answer would be, well, maybe zero or
not very many.

There’s a technique called importance sampling, which allows you to draw examples from a
distribution that’s going to be more helpful and then reweight them so that you can still get an
unbiased estimate of the desired conditional probability. It’s a bit beyond the scope of this class to
get into the details, but it’s an important and effective idea.

Recitation Problem

• Do the variable elimination algorithm on the net below using the elimination order A,B,C (that
is, eliminate node C first). In computing P(D=d), what factors do you get?

• What if you wanted to compute the whole marginal distribution P(D)?

Here’s the network we started with. We used elimination order C, B, A (we eliminated A first). Now
we’re going to explore what happens when we eliminate the variables in the opposite order. First,
work on the case we did, where we’re trying to calculate the probability that node D takes on a
particular value, little d. Remember that little d is a constant in this case. Now, do the case where
we’re trying to find the whole distribution over D, so we don’t know a particular value for little d.
Another Recitation Problem

Find an elimination order that keeps the factors small for the net below, or show that there is no
such order.

Here’s a pretty complicated graph. But notice that no node has more than 2 parents, so none of the
CPTs are huge. The question is, is this graph hard for variable elimination? More concretely, can you
find an elimination order that results only in fairly small factors? Is there an elimination order that
generates a huge factor?

The Last Recitation Problem

Bayesian networks (or related models) are often used in computer vision, but they almost always
require sampling. What happens when you try to do variable elimination on a model like the grid
below?

[Link] about Bayesian inference with an example

1. Bayesian inference

Bayes' theorem in Artificial intelligence

Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which determines the
probability of an event with uncertain knowledge.

In probability theory, it relates the conditional probability and marginal probabilities of two random events.

Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is an
application of Bayes' theorem, which is fundamental to Bayesian statistics.

It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).

Bayes' theorem allows updating the probability prediction of an event by observing new information of the real world.

Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine the probability of
cancer more accurately with the help of age.

Bayes' theorem can be derived using product rule and conditional probability of event A with known

event B: As from product rule we can write:

P(A ⋀ B)= P(A|B) P(B) or

Similarly, the probability of event B with known event A:

P(A ⋀ B)= P(B|A) P(A)

Equating right hand side of both the equations, we will get:

The above equation (a) is called as Bayes' rule or Bayes' theorem. This equation is basic of most modern AI
systems for probabilistic inference.

It shows the simple relationship between joint and conditional probabilities. Here,

P(A|B) is known as posterior, which we need to calculate, and it will be read as Probability of hypothesis A when
we have occurred an evidence B.

P(B|A) is called the likelihood, in which we consider that hypothesis is true, then we calculate the probability of
evidence.

P(A) is called the prior probability, probability of hypothesis before considering the

evidence P(B) is called marginal probability, pure probability of an evidence.


In the equation (a), in general, we can write P (B) = P(A)*P(B|Ai), hence the Bayes' rule can be written as:

Where A1, A2, A3,. , An is a set of mutually exclusive and exhaustive events.

Applying Bayes' rule:

Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful in
cases where we have a good probability of these three terms and want to determine the fourth one. Suppose we
want to perceive the effect of some unknown cause, and want to compute that cause, then the Bayes' rule
becomes:

Example-1:

Question: what is the probability that a patient has diseases meningitis with a stiff neck?

Given Data:

A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it occurs 80% of the time.
He is also aware of some more facts, which are given as follows:

o The Known probability that a patient has meningitis disease is 1/30,000.


o The Known probability that a patient has a stiff neck is 2%.

Let a be the proposition that patient has stiff neck and b be the proposition that patient has meningitis. , so we
can calculate the following as:

P(a|b) = 0.8

P(b) =

1/30000

P(a)= .02

Hence, we can assume that 1 patient out of 750 patients has meningitis disease with a stiff neck.
Example-2:

Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is
king is 4/52, then calculate posterior probability P(King|Face), which means the drawn face card is a king
card.

Solution:

P(king): probability that the card is King= 4/52= 1/13

P(face): probability that a card is a face card= 3/13

P(Face|King): probability of face card when we assume it is a king = 1

Putting all values in equation (i) we will get:

Application of Bayes' theorem in Artificial intelligence:

Following are some applications of Bayes' theorem:

o It is used to calculate the next step of the robot when the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.

[Link] about propabilistic reasoning with an example

1. Probabilistic reasoning

Probabilistic reasoning:

Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to indicate
the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to handle the
uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that is the result of
someone's laziness and ignorance.

In the real world, there are lots of scenarios, where the certainty of something is not confirmed, such as "It will rain
today," "behavior of someone for some situations," "A match between two teams or two players." These are probable
sentences for which we can assume that it will happen but not sure about it, so here we use probabilistic reasoning.

Need of probabilistic reasoning in AI:

o When there are unpredictable outcomes.


o When specifications or possibilities of predicates becomes too large to handle.
o When an unknown error occurs during an experiment.

In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:

o Bayes' rule
o Bayesian Statistics

As probabilistic reasoning uses probability and related terms, so before understanding probabilistic reasoning, let's
understand some common terms:

Probability: Probability can be defined as a chance that an uncertain event will occur. It is the numerical measure of
the likelihood that an event will occur. The value of probability always remains between 0 and 1 that represent ideal
uncertainties.

0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.

P(A) = 0, indicates total uncertainty in an event A.

P(A) =1, indicates total certainty in an event A.

We can find the probability of an uncertain event by using the below formula.

o P(¬A) = probability of a not happening event.


o P(¬A) + P(A) = 1.

Event: Each possible outcome of a variable is called an event.

Sample space: The collection of all possible events is called sample space.

Random variables: Random variables are used to represent the events and objects in the real world.

Prior probability: The prior probability of an event is probability computed before observing new information.
Posterior Probability: The probability that is calculated after all evidence or information has taken into account. It is
a combination of prior probability and new information.

Conditional probability:

Conditional probability is a probability of occurring an event when another event has already happened.

Let's suppose, we want to calculate the event A when event B has already occurred, "the probability of A under the

conditions of B", it can be written as:

Where P(A⋀B)= Joint probability of a and B

P(B)= Marginal probability of B.

If the probability of A is given and we need to find the probability of B, then it will be given as:

It can be explained by using the below Venn diagram, where B is occurred event, so sample space will be reduced to
set B, and now we can only calculate event A when event B is already occurred by dividing the probability of P(A⋀B)
by P( B ).

Example:

In a class, there are 70% of the students who like English and 40% of the students who likes English and mathematics,
and then what is the percent of students those who like English also like mathematics?

Solution:

Let, A is an event that a student likes Mathematics

B is an event that a student likes English.


Hence, 57% are the students who like English also like Mathematics.

6. Define causal networks. Analyze the causal networks with an example.

1. Casual Networks:

A causal network is an acyclic digraph arising from an evolution of a substitution system, and
representing its history. The illustration above shows a causal network corresponding to the
rules

(applied in a left-to-right scan) and initial condition .

The figure above shows the procedure for diagrammatically creating a causal network from
a mobile automaton.
In an evolution of a multiway system, each substitution event is a vertex in a causal network.
Two events which are related by causal dependence, meaning one occurs just before the other,
have an edge between the corresponding vertices in the causal network. More precisely, the edge
is a directed edge leading from the past event to the future event.
Some causal networks are independent of the choice of evolution, and these are called causally
invariant.

[Link] the detail about Inference in Bayesian Network

1. Inference in Bayesian Networks


1. Exact inference
2. Approximate inference

1. Exact inference:

In exact inference, we analytically compute the conditional probability distribution over the
variables of interest.

But sometimes, that’s too hard to do, in which case we can use approximation techniques based on
statistical sampling

Given a Bayesian network, what questions might we want to ask?

• Conditional probability query: P(x | e)

• Maximum a posteriori probability: What value of x maximizes P(x|e) ?

General question: What’s the whole probability distribution over variable X given evidence e, P(X |

e)?

In our discrete probability situation, the only way to answer a MAP query is to compute the
probability of x given e for all possible values of x and see which one is greatest

So, in general, we’d like to be able to compute a whole probability distribution over some variable or

variables X, given instantiations of a set of variables e

Using the joint distribution

To answer any query involving a conjunction of variables, sum over the variables not involved in the
query
Given the joint distribution over the variables, we can easily answer any question about the value of
a single variable by summing (or marginalizing) over the other variables.

So, in a domain with four variables, A, B, C, and D, the probability that variable D has value d is the
sum over all possible combinations of values of the other three variables of the joint probability of
all four values. This is exactly the same as the procedure we went through in the last lecture, where
to compute the probability of cavity, we added up the probability of cavity and toothache and the
probability of cavity and not toothache.

In general, we’ll use the first notation, with a single summation indexed by a list of variable names,
and a joint probability expression that mentions values of those variables. But here we can see the
completely written-out definition, just so we all know what the shorthand is supposed to mean.

To compute a conditional probability, we reduce it to a ratio of conjunctive queries using the


definition of conditional probability, and then answer each of those queries by marginalizing out the
variables not mentioned.

In the numerator, here, you can see that we’re only summing over variables A and C, because b and
d are instantiated in the query.
We’re going to learn a general purpose algorithm for answering these joint queries fairly efficiently.
We’ll start by looking at a very simple case to build up our intuitions, then we’ll write down the
algorithm, then we’ll apply it to a more complex case.

Here’s our very simple case. It’s a bayes net with four nodes, arranged in a chain.

So, we know from before that the probability that variable D has some value little d is the sum over
A, B, and C of the joint distribution, with d fixed.

Now, using the chain rule of Bayesian networks, we can write down the joint probability as a
product over the nodes of the probability of each node’s value given the values of its parents. So, in
this case, we get P(d|c) times P(c|b) times P(b|a) times P(a).
This expression gives us a method for answering the query, given the conditional probabilities that
are stored in the net. And this method can be applied directly to any other bayes net. But there’s a
problem with it: it requires enumerating all possible combinations of assignments to A, B, and C, and
then, for each one, multiplying the factors for each node. That’s an enormous amount of work and
we’d like to avoid it if at all possible.

So, we’ll try rewriting the expression into something that might be more efficient to evaluate. First,
we can make our summation into three separate summations, one over each variable.

Then, by distributivity of addition over multiplication, we can push the summations in, so that the
sum over A includes all the terms that mention A, but no others, and so on. It’s pretty clear that this
expression is the same as the previous one in value, but it can be evaluated more efficiently. We’re
still, eventually, enumerating all assignments to the three variables, but we’re doing somewhat
fewer multiplications than before. So this is still not completely satisfactory.

If you look, for a minute, at the terms inside the summation over A, you’ll see that we’re doing these
multiplications over for each value of C, which isn’t necessary, because they’re independent of C.
Our idea, here, is to do the multiplications once and store them for later use. So, first, for each value
of A and B, we can compute the product, generating a two dimensional matrix.
Then, we can sum over the rows of the matrix, yielding one value of the sum for each possible value
of b.

We’ll call this set of values, which depends on b, f1 of b.

Now, we can substitute f1 of b in for the sum over A in our previous expression. And, effectively, we
can remove node A from our diagram. Now, we express the contribution of b, which takes the
contribution of a into account, as f_1 of b.

We can continue the process in basically the same way. We can look at the summation over b and
see that the only other variable it involves is c. We can summarize those products as a set of factors,
one for each value of c. We’ll call those factors f_2 of c.

We substitute f_2 of c into the formula, remove node b from the diagram, and now we’re down to a
simple expression in which d is known and we have to sum over values of c.

Variable Elimination Algorithm

Given a Bayesian network, and an elimination order for the non-query variables , compute

For i = m downto 1

 remove all the factors that mention Xi


 multiply those factors, getting a value for each combination of mentioned variables
 sum over Xi
 put this new factor into the factor set

That was a simple special case. Now we can look at the algorithm in the general case. Let’s assume
that we’re given a Bayesian network and an ordering on the variables that aren’t fixed in the query.
We’ll come back later to the question of the influence of the order, and how we might find a good
one.

We can express the probability of the query variables as a sum over each value of each of the non-
query variables of a product over each node in the network, of the probability that that variable has
the given value given the values of its parents.

So, we’ll eliminate the variables from the inside out. Starting with variable Xm and finishing with

variable X1.

To eliminate variable Xi, we start by gathering up all of the factors that mention Xi, and removing

them from our set of factors. Let’s say there are k such factors.

Now, we make a k+1 dimensional table, indexed by Xi as well as each of the other variables that is
mentioned in our set of factors.

We then sum the table over the Xi dimension, resulting in a k-dimensional table.

This table is our new factor, and we put a term for it back into our set of factors.

Once we’ve eliminated all the summations, we have the desired [Link]

more example
Here’s a more complicated example, to illustrate the variable elimination algorithm in a more
general case. We have this big network that encodes a domain for diagnosing lung disease.
(Dyspnea, as I understand it, is shortness of breath).

We’ll do variable elimination on this graph using elimination order A, B, L, T, S, X, V.

So, we start by eliminating V. We gather the two terms that mention V and see that they also
involve variable T. So, we compute the product for each value of T, and summarize those in the
factor f1 of T.

Now we can substitute that factor in for the summation, and remove the node from the network.

The next variable to be eliminated is X. There is actually only one term involving X, and it also
involves variable A. So, for each value of A, we compute the sum over X of P(x|a). But wait! We
know what this value is! If we fix a and sum over x, these probabilities have to add up to 1.

So, rather than adding another factor to our expression, we can just remove the whole sum. In
general, the only nodes that will have an influence on the probability of D are its ancestors.

Now, it’s time to eliminate S. We find that there are three terms involving S, and we gather them
into the sum. These three terms involve two other variables, B and L. So we have to make a factor
that specifies, for each value of B and L, the value of the sum of products.

We’ll call that factor f_2 of b and l.

Now we can substitute that factor back into our expression. We can also eliminate node S. But in
eliminating S, we’ve added a direct dependency between L and B (they used to be dependent via S,
but now the dependency is encoded explicitly in f2(b). We’ll show that in the graph by drawing a line
between the two nodes. It’s not exactly a standard directed conditional dependence, but it’s still
useful to show that they’re coupled.

Now we eliminate T. It involves two terms, which themselves involve variables A and L. So we make
a new factor f3 of A and L.
We can substitute in that factor and eliminate T. We’re getting close!

Next we eliminate L. It involves these two factors, which depend on variables A and B. So we make a
new factor, f4 of A and B, and substitute it in. We remove node L, but couple A and B.

At this point, we could just do the summations over A and B and be done. But to finish out the

algorithm the way a computer would, it’s time to eliminate variable B.

It involves both of our remaining terms, and it seems to depend on variables A and D. However, in
this case, we’re interested in the probability of a particular value, little d of D, and so the variable d
is instantiated. Thus, we can treat it as a constant in this expression, and we only need to generate a
factor over a, which we’ll call f5 of a. And we can now, in some sense, remove D from our network
as well (because we’ve already factored it into our answer).
Finally, to get the probability that variable D has value little d, we simply sum factor f5 over all
values of a. Yay! We did it.

Properties of Variable Elimination

Let’s see how the variable elimination algorithm performs, both in theory and in practice.

 Time is exponential in size of largest factor


 Bad elimination order can generate huge factors
 NP Hard to find the best elimination order

 Even the best elimination order may generate large factors


 There are reasonable heuristics for picking an elimination order (such as choosing the
variable that results in the smallest next factor)
 Inference in polytrees (nets with no cycles) is linear in size of the network (the largest CPT)
 Many problems with very large nets have only small factors, and thus efficient inference

First of all, it’s pretty easy to see that it runs in time exponential in the number of variables involved in
the largest factor. Creating a factor with k variables involves making a k+1 dimensional table. If you have
b values per variable, that’s a table of size b^(k+1). To make each entry, you have to multiply at most n
numbers, where n is the number of nodes. We have to do this for each variable to be eliminated (which
is usually close to n). So we have something like time = O(n^2 b^k).

How big the factors are depends on the elimination order. You’ll see in one of the recitation exercises
just how dramatic the difference in factor sizes can be. A bad elimination order can generate huge
factors.

So, we’d like to use the elimination order that generates the smallest factors. Unfortunately, it turns out
to be NP hard to find the best elimination order.

At least, there are some fairly reasonable heuristics for choosing an elimination order. It’s usually done
dynamically. So, rather than fixing the elimination order in advance, as we suggested in the algorithm
description, you can pick the next variable to be eliminated depending on the situation. In particular,
one reasonable heuristic is to pick the variable to eliminate next that will result in the smallest factor.
This greedy approach won’t always be optimal, but it’s not usually too bad.

There is one case where Bayes net inference in general, and the variable elimination algorithm in
particular is fairly efficient, and that’s when the network is a polytree. A polytree is a network with no
cycles. That is, a network in which, for any two nodes, there is only one path between them. In a
polytree, inference is linear in the size of the network, where the size of the network is defined to be the
size of the largest conditional probability table (or exponential in the maximum number of parents of
any node). In a polytree, the optimal elimination order is to start at the root nodes, and work
downwards, always eliminating a variable that no longer has any parents. In doing so, we never
introduce additional connections into the network.

So, inference in polytrees is efficient, and even in many large non-polytree networks, it’s possible to

keep the factors small, and therefore to do inference relatively efficiently.

When the network is such that the factors are, of necessity, large, we’ll have to turn to a different class

of methods.

1. Approximate inference:

Sampling

To get approximate answer we can do stochastic simulation (sampling).


Another strategy, which is a theme that comes up also more and more in AI actually, is to say, well, we
didn't really want the right answer anyway. Let's try to do an approximation. And you can also show that
it's computationally hard to get an approximation that's within epsilon of the answer that you want, but
again that doesn't keep us from trying.

So, the other thing that we can do is the stochastic simulation or sampling. In sampling, what we do is
we look at the root nodes of our graph, and attached to this root node is some probability that A is going
to be true, right? Maybe it's .4. So we flip a coin that comes up heads with probability .4 and see if we
get true or false.

We flip our coin, let's say, and we get true for A -- this time. And now, given the assignment of true to A,
we look in the conditional probability table for B given A = true, and that gives us a probability for B.

Now, we flip a coin with that probability. Say we get False. We enter that into the table.

We do the same thing for C, and let’s say we get True.

Now, we look in the CPT for D given B and C, for the case where B is false and C is true, and we flip a coin
with that probability, in order to get a value for D.

So, there's one sample from the joint distribution of these four variables. And you can just keep doing
this, all day and all night, and generate a big pile of samples, using that algorithm. And now you can ask
various questions.

Estimate:

P*(D|A) = #D,A / #A

Let's say you want to know the probability of D given A. How would you answer - - given all the
examples -- what would you do to compute the probability of D given A? You would just count. You’d
count the number of cases in which A and D were true, and you’d divide that by the number of cases in
which A was true, and that would give you an unbiased estimate of the probability of D given A. The

more samples, the more confidence you’d have that the estimated probability is close to the true one.

Estimation

 Some probabilities are easier than others to estimate


 In generating the table, the rare events will not be well represented
 P(Disease| spots-on-your-tongue, sore toe)
 If spots-on-your-tongue and sore toe are not root nodes, you would generate a huge table but
the cases of interest would be very sparse in the table
 Importance sampling lets you focus on the set of cases that are important to answering your
question
It's going to turn out that some probabilities are easier than other ones to estimate.

Exactly because of the process we’re using to generate the samples, the majority of them will be the
typical cases. Oh, it's someone with a cold, someone with a cold, someone with a cold, someone with a
cold, someone with a cold, someone with malaria, someone with a cold, someone with a cold. So the
rare results are not going to come up very often. And so doing this sampling naively can make it really
hard to estimate the probability of a rare event. If it's something that happens one in ten thousand
times, well, you know for sure you're going to need, some number of tens of thousands of samples to
get even a reasonable estimate of that probability.

Imagine that you want to estimate the probability of some disease given -- oh, I don't know -- spots on
your tongue and a sore toe. Somebody walks in and they have a really peculiar set of symptoms, and
you want to know what's the probability that they have some disease.

Well, if the symptoms are root nodes, it's easy. If the symptoms were root nodes, you could just assign
the root nodes to have their observed values and then simulate the rest of the network as before.

But if the symptoms aren't root nodes then if you do naïve sampling, you would generate a giant table
of samples, and you'd have to go and look and say, gosh, how many cases do I have where somebody
has spots on their tongue and a sore toe; and the answer would be, well, maybe zero or not very many.

There’s a technique called importance sampling, which allows you to draw examples from a distribution
that’s going to be more helpful and then reweight them so that you can still get an unbiased estimate of
the desired conditional probability. It’s a bit beyond the scope of this class to get into the details, but it’s
an important and effective idea.

Recitation Problem

• Do the variable elimination algorithm on the net below using the elimination order A,B,C (that is,
eliminate node C first). In computing P(D=d), what factors do you get?

• What if you wanted to compute the whole marginal distribution P(D)?


Here’s the network we started with. We used elimination order C, B, A (we eliminated A first).
Now we’re going to explore what happens when we eliminate the variables in the opposite
order. First, work on the case we did, where we’re trying to calculate the probability that node
D takes on a particular value, little d. Remember that little d is a constant in this case. Now, do
the case where we’re trying to find the whole distribution over D, so we don’t know a
particular value for little d.

Another Recitation Problem

Find an elimination order that keeps the factors small for the net below, or show that there is
no such order.

Here’s a pretty complicated graph. But notice that no node has more than 2 parents, so none
of the CPTs are huge. The question is, is this graph hard for variable elimination? More
concretely, can you find an elimination order that results only in fairly small factors? Is there
an elimination order that generates a huge factor?

The Last Recitation Problem

Bayesian networks (or related models) are often used in computer vision, but they almost
always require sampling. What happens when you try to do variable elimination on a model
like the grid below?

You might also like