Path vs.
State Optimization
• Previous lecture: path to goal is solution to
problem
Local Search and Optimization – systematic exploration of search space.
Chapter 4
• This lecture: a state is solution to problem
– for some problems path is irrelevant.
Adapted from IIT Delhi AI course, Berkley AI course, UW AI course
– E.g., 8-queens
• Different algorithms can be used
– Local search
1 5
Local search and optimization
Satisfaction vs. Optimization • Local search
– Keep track of single current state
Goal Satisfaction Optimization – Move only to neighboring states
– Ignore paths
reach the goal node Constraint optimize(objective function)
satisfaction Constraint Optimization
• Advantages:
– Use very little memory
– Can often find reasonable solutions in large or infinite (continuous)
state spaces.
You can go back and forth between the two problems
Typically in the same complexity class
• “Pure optimization” problems
– All states have an objective function
– Goal is to find state with max (or min) objective value
– Does not quite fit into path-cost/goal-state formulation
– Local search can do quite well on these problems. 4
Example: n-queens
Example: n-queens
Put n queens on an n×n board with no two queens on the
• Put n queens on an n x n board with no two same row, column, or diagonal.
queens on the same row, column, or diagonal
Move a queen to reduce number of conflicts.
• Is it a satisfaction problem or optimization?
5
Hill-climbing search: 8-queens problem Search Space
• State
– All 8 queens on the board in some configuration
• Successor function
– move a single queen to another square in the same column.
• Example of a heuristic function h(n):
– the number of pairs of queens that are attacking each other
• Need to convert to an optimization problem – (so we want to minimize this)
• h = number of pairs of queens that are attacking each other
• h = 17 for the above state
7
Hill-climbing search: 8-queens problem Trivial Algorithms
• Random Sampling
–Generate a state randomly
• Random Walk
–Randomly pick a neighbor of the current state
• Is this a solution?
• Both algorithms asymptotically complete.
• What is h?
11
Hill-climbing (Greedy Local Search) max version Hill-climbing search
function HILL-CLIMBING( problem) return a state that is a local maximum • “a loop that continuously moves towards increasing value”
input: problem, a problem – terminates when a peak is reached
local variables: current, a node. – Aka greedy local search
neighbor, a node.
• Value can be either
current MAKE-NODE(INITIAL-STATE[problem]) – Objective function value
loop do – Heuristic function value (minimized)
neighbor a highest valued successor of current • Hill climbing does not look ahead of the immediate neighbors
if VALUE [neighbor] ≤ VALUE[current] then return STATE[current]
current neighbor • Can randomly choose among the set of best successors
– if multiple have the best value
min version will reverse inequalities and look for lowest valued successor • “climbing Mount Everest in a thick fog with amnesia”
1
2
“Landscape” of search
Search Landscape (three-dimensions)
Hill Climbing gets stuck in local minima
depending on?
1
3
Hill-climbing on 8-queens Hill Climbing Drawbacks
• Randomly generated 8-queens starting states… • Local maxima
• 14% the time it solves the problem
• 86% of the time it get stuck at a local minimum
• Plateaus
• However…
– Takes only 4 steps on average when it succeeds • Diagonal ridges
– And 3 on average when it gets stuck
1
– (for a state space with 8^8 =~17 million states) 6
1
5
Escaping Shoulders: Sideways Move Tabu Search
• If no downhill (uphill) moves, allow sideways moves in hope
that algorithm can escape • prevent returning quickly to the same state
– Need to place a limit on the possible number of sideways moves to • Keep fixed length queue (“tabu list”)
avoid infinite loops • add most recent state to queue; drop oldest
• For 8-queens • Never make the step that is currently tabu’ed
– Now allow sideways moves with a limit of 100 • Properties:
– Raises percentage of problem instances solved from 14 to 94% – As the size of the tabu list grows, hill-climbing will
asymptotically become “non-redundant” (won’t look at the same state
– However…. twice)
• 21 steps for every successful solution – In practice, a reasonable sized tabu list (say 100 or so) improves the
• 64 for each failure 1 performance of hill climbing in many problem 19 s
7
Escaping Shoulders/local Optima Enforced Hill Climbing Hill-climbing: stochastic variations
• Perform breadth first search from a local optima • Stochastic hill-climbing
– to find the next state with better h function – Random selection among the uphill moves.
– The selection probability can vary with the steepness of the uphill
move.
• Typically,
– prolonged periods of exhaustive search
• To avoid getting stuck in local minima
– bridged by relatively quick periods of hill-climbing
–Random-walk hill-climbing
• Middle ground b/w local and systematic search –Random-restart hill-climbing
–Hill-climbing with both
21
Hill Climbing with random walk Hill-climbing with random restarts
When the state-space landscape has local minima, any • If at first you don’t succeed, try, try again!
search that moves only in the greedy direction cannot be • Different variations
complete – For each restart: run until termination vs. run for a fixed time
Random walk, on the other hand, is asymptotically – Run a fixed number of restarts or run indefinitely
complete • Analysis
– Say each search has probability p of success
• E.g., for 8-queens, p = 0.14 with no sideways moves
Idea: Put random walk into greedy hill-climbing
– Expected number of restarts?
• At each step do one of the two
– Expected number of steps taken?
– Greedy: With prob p move to the neighbor with largest value
• If you want to pick one local search algorithm, learn this one!!
– Random: With prob 1-p move to a random neighbor 22 2
2
Hill-climbing with random walk Simulated Annealing
At each step do one of the two • Simulated Annealing = physics inspired twist on random walk
Greedy: With prob p move to the neighbor with largest value • Basic ideas:
Random: With prob 1-p move to a random neighbor – like hill-climbing identify the quality of the local improvements
– instead of picking the best move, pick one randomly
Hill-climbing with both – say the change in objective function is
– if is positive, then move to that state
• At each step do one of the three – otherwise:
• move to this state with probability proportional to
– Greedy: move to the neighbor with largest value • thus: worse moves (very large negative ) are executed less often
– Random Walk: move to a random neighbor – however, there is always a chance of escaping from local maxima
– Random Restart: Resample a new current state – over time, make it less likely to accept locally bad moves
– (Can also make the size of the move random as well, i.e., allow “large” steps in state
2 2
3 space) 4
Physical Interpretation of Simulated Annealing Simulated annealing
• A Physical Analogy: function SIMULATED-ANNEALING( problem, schedule) return a solution state
• imagine letting a ball roll downhill on the function surface input: problem, a problem
• this is like hill-climbing (for minimization) schedule, a mapping from time to temperature
• now imagine shaking the surface, while the ball rolls, gradually reducing the local variables: current, a node.
amount of shaking next, a node.
• this is like simulated annealing T, a “temperature” controlling the prob. of downward steps
current MAKE-NODE(INITIAL-STATE[problem])
• Annealing = physical process of cooling a liquid or metal until for t 1 to ∞ do
particles achieve a certain frozen crystal state T schedule[t]
• simulated annealing: if T = 0 then return current
• free variables are like particles next a randomly selected successor of current
• seek “low energy” (high quality) configuration ∆E VALUE[next] - VALUE[current]
• slowly reducing temp. T with particles moving around randomly if ∆E > 0 then current next
else current next only with probability e∆E /T
25 2
6
Temperature T Simulated Annealing in Practice
– method proposed in 1983 by IBM researchers for solving VLSI layout problems
• high T: probability of “locally bad” move is higher
(Kirkpatrick et al, Science, 220:671-680, 1983).
• low T: probability of “locally bad” move is lower • theoretically will always find the global optimum
• typically, T is decreased as the algorithm runs longer
• i.e., there is a “temperature schedule” – Other applications: Traveling salesman, Graph partitioning, Graph coloring,
Scheduling, Facility Layout, Image Processing, …
– useful for some problems, but can be very slow
• slowness comes about because T must be decreased very gradually to retain optimality
2 2
7 8
Local beam search Local Beam Search (contd)
• Idea: Keeping only one node in memory is an extreme
• Not the same as k random-start searches run in parallel!
reaction to memory problems.
• Searches that find good states recruit other searches to join them
• Keep track of k states instead of one
• Problem: quite often, all k states end up on same local hill
– Initially: k randomly selected states
• Idea: Stochastic beam search
– Next: determine all successors of k states – Choose k successors randomly, biased towards good ones
– If any of successors is goal finished
– Else select k best from successors and repeat
• Observe the close analogy to natural selection!
2 3
9 0
Genetic algorithms
• Twist on Local Search: successor is generated by combining two parent states 8
7
• A state is represented as a string over a finite alphabet (e.g. binary)
– 8-queens 6 String representation
16257483
• State = position of 8 queens each in a column 5
• Start with k randomly generated states (population)
4
3
• Evaluation function (fitness function): 2
– Higher values for better states. 1
– Opposite to heuristic function, e.g., # non-attacking pairs in 8-queens
• Produce the next generation of states by “simulated evolution” Can we evolve 8-queens through genetic algorithms?
– Random selection
– Crossover
– Random mutation
3 3
1 2
Genetic algorithms Genetic algorithms
4 states for 2 pairs of 2 states New states Random
8-queens randomly selected based after crossover mutation
problem on fitness. Random applied Has the effect of “jumping” to a completely different new
crossover points selected part of the search space (quite non-local)
• Fitness function: number of non-attacking pairs of queens
(min = 0, max = 8 × 7/2 = 28)
• 24/(24+23+20+11) = 31%
• 23/(24+23+20+11) = 29% etc 3
3
3
4
Comments on Genetic Algorithms Optimization of Continuous Functions
• Genetic algorithm is a variant of “stochastic beam search”
• Positive points • Discretization
– Random exploration can find solutions that local search can’t
• (via crossover primarily)
– use hill-climbing
– Appealing connection to human evolution
• “neural” networks, and “genetic” algorithms are metaphors!
• Gradient descent
• Negative points – make a move in the direction of the gradient
– Large number of “tunable” parameters
• gradients: closed form or empirical
– Difficult to replicate performance from one problem to another
– Lack of good empirical studies comparing to simpler methods
– Useful on some (small?) set of problems but no convincing evidence that GAs are better than hill-
climbing w/random restarts in general
• Question: are GAs really optimizing the individual fitness function? Mixability? 3
6
Gradient Descent
Assume we have a continuous function: f(x1,x2,…,xN)
and we want minimize over continuous variables X1,X2,..,Xn
1. Compute the gradients for all i: f(x1,x2,…,xN) /xi
2. Take a small step downhill in the direction of the gradient:
xi xi - λf(x1,x2,…,xN) /xi
3. Repeat.
• How to select λ
– Line search: successively double
– until f starts to increase again
3 3
7 8