0% found this document useful (0 votes)
10 views10 pages

Local Search Optimization Techniques

Uploaded by

Agam Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views10 pages

Local Search Optimization Techniques

Uploaded by

Agam Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Path vs.

State Optimization
• Previous lecture: path to goal is solution to
problem
Local Search and Optimization – systematic exploration of search space.
Chapter 4
• This lecture: a state is solution to problem
– for some problems path is irrelevant.
Adapted from IIT Delhi AI course, Berkley AI course, UW AI course
– E.g., 8-queens

• Different algorithms can be used


– Local search

1 5

Local search and optimization


Satisfaction vs. Optimization • Local search
– Keep track of single current state
Goal Satisfaction Optimization – Move only to neighboring states
– Ignore paths
reach the goal node Constraint optimize(objective function)
satisfaction Constraint Optimization
• Advantages:
– Use very little memory
– Can often find reasonable solutions in large or infinite (continuous)
state spaces.
You can go back and forth between the two problems
Typically in the same complexity class
• “Pure optimization” problems
– All states have an objective function
– Goal is to find state with max (or min) objective value
– Does not quite fit into path-cost/goal-state formulation
– Local search can do quite well on these problems. 4
Example: n-queens
Example: n-queens
Put n queens on an n×n board with no two queens on the
• Put n queens on an n x n board with no two same row, column, or diagonal.
queens on the same row, column, or diagonal
Move a queen to reduce number of conflicts.

• Is it a satisfaction problem or optimization?


5

Hill-climbing search: 8-queens problem Search Space


• State
– All 8 queens on the board in some configuration

• Successor function
– move a single queen to another square in the same column.

• Example of a heuristic function h(n):


– the number of pairs of queens that are attacking each other
• Need to convert to an optimization problem – (so we want to minimize this)
• h = number of pairs of queens that are attacking each other
• h = 17 for the above state
7
Hill-climbing search: 8-queens problem Trivial Algorithms
• Random Sampling
–Generate a state randomly

• Random Walk
–Randomly pick a neighbor of the current state

• Is this a solution?
• Both algorithms asymptotically complete.
• What is h?
11

Hill-climbing (Greedy Local Search) max version Hill-climbing search


function HILL-CLIMBING( problem) return a state that is a local maximum • “a loop that continuously moves towards increasing value”
input: problem, a problem – terminates when a peak is reached
local variables: current, a node. – Aka greedy local search
neighbor, a node.
• Value can be either
current  MAKE-NODE(INITIAL-STATE[problem]) – Objective function value
loop do – Heuristic function value (minimized)
neighbor  a highest valued successor of current • Hill climbing does not look ahead of the immediate neighbors
if VALUE [neighbor] ≤ VALUE[current] then return STATE[current]
current  neighbor • Can randomly choose among the set of best successors
– if multiple have the best value

min version will reverse inequalities and look for lowest valued successor • “climbing Mount Everest in a thick fog with amnesia”
1
2
“Landscape” of search
Search Landscape (three-dimensions)

Hill Climbing gets stuck in local minima


depending on?
1
3

Hill-climbing on 8-queens Hill Climbing Drawbacks


• Randomly generated 8-queens starting states… • Local maxima
• 14% the time it solves the problem
• 86% of the time it get stuck at a local minimum
• Plateaus

• However…
– Takes only 4 steps on average when it succeeds • Diagonal ridges
– And 3 on average when it gets stuck
1
– (for a state space with 8^8 =~17 million states) 6

1
5
Escaping Shoulders: Sideways Move Tabu Search
• If no downhill (uphill) moves, allow sideways moves in hope
that algorithm can escape • prevent returning quickly to the same state
– Need to place a limit on the possible number of sideways moves to • Keep fixed length queue (“tabu list”)
avoid infinite loops • add most recent state to queue; drop oldest
• For 8-queens • Never make the step that is currently tabu’ed
– Now allow sideways moves with a limit of 100 • Properties:
– Raises percentage of problem instances solved from 14 to 94% – As the size of the tabu list grows, hill-climbing will
asymptotically become “non-redundant” (won’t look at the same state
– However…. twice)
• 21 steps for every successful solution – In practice, a reasonable sized tabu list (say 100 or so) improves the
• 64 for each failure 1 performance of hill climbing in many problem 19 s
7

Escaping Shoulders/local Optima Enforced Hill Climbing Hill-climbing: stochastic variations

• Perform breadth first search from a local optima • Stochastic hill-climbing


– to find the next state with better h function – Random selection among the uphill moves.
– The selection probability can vary with the steepness of the uphill
move.
• Typically,
– prolonged periods of exhaustive search
• To avoid getting stuck in local minima
– bridged by relatively quick periods of hill-climbing
–Random-walk hill-climbing
• Middle ground b/w local and systematic search –Random-restart hill-climbing
–Hill-climbing with both
21
Hill Climbing with random walk Hill-climbing with random restarts
When the state-space landscape has local minima, any • If at first you don’t succeed, try, try again!
search that moves only in the greedy direction cannot be • Different variations
complete – For each restart: run until termination vs. run for a fixed time
Random walk, on the other hand, is asymptotically – Run a fixed number of restarts or run indefinitely
complete • Analysis
– Say each search has probability p of success
• E.g., for 8-queens, p = 0.14 with no sideways moves
Idea: Put random walk into greedy hill-climbing
– Expected number of restarts?
• At each step do one of the two
– Expected number of steps taken?
– Greedy: With prob p move to the neighbor with largest value
• If you want to pick one local search algorithm, learn this one!!
– Random: With prob 1-p move to a random neighbor 22 2
2

Hill-climbing with random walk Simulated Annealing


At each step do one of the two • Simulated Annealing = physics inspired twist on random walk
Greedy: With prob p move to the neighbor with largest value • Basic ideas:
Random: With prob 1-p move to a random neighbor – like hill-climbing identify the quality of the local improvements
– instead of picking the best move, pick one randomly
Hill-climbing with both – say the change in objective function is 
– if  is positive, then move to that state
• At each step do one of the three – otherwise:
• move to this state with probability proportional to 
– Greedy: move to the neighbor with largest value • thus: worse moves (very large negative ) are executed less often
– Random Walk: move to a random neighbor – however, there is always a chance of escaping from local maxima
– Random Restart: Resample a new current state – over time, make it less likely to accept locally bad moves
– (Can also make the size of the move random as well, i.e., allow “large” steps in state
2 2
3 space) 4
Physical Interpretation of Simulated Annealing Simulated annealing
• A Physical Analogy: function SIMULATED-ANNEALING( problem, schedule) return a solution state
• imagine letting a ball roll downhill on the function surface input: problem, a problem
• this is like hill-climbing (for minimization) schedule, a mapping from time to temperature
• now imagine shaking the surface, while the ball rolls, gradually reducing the local variables: current, a node.
amount of shaking next, a node.
• this is like simulated annealing T, a “temperature” controlling the prob. of downward steps

current  MAKE-NODE(INITIAL-STATE[problem])
• Annealing = physical process of cooling a liquid or metal until for t  1 to ∞ do
particles achieve a certain frozen crystal state T  schedule[t]
• simulated annealing: if T = 0 then return current
• free variables are like particles next  a randomly selected successor of current
• seek “low energy” (high quality) configuration ∆E  VALUE[next] - VALUE[current]
• slowly reducing temp. T with particles moving around randomly if ∆E > 0 then current  next
else current  next only with probability e∆E /T
25 2
6

Temperature T Simulated Annealing in Practice


– method proposed in 1983 by IBM researchers for solving VLSI layout problems
• high T: probability of “locally bad” move is higher
(Kirkpatrick et al, Science, 220:671-680, 1983).
• low T: probability of “locally bad” move is lower • theoretically will always find the global optimum
• typically, T is decreased as the algorithm runs longer
• i.e., there is a “temperature schedule” – Other applications: Traveling salesman, Graph partitioning, Graph coloring,
Scheduling, Facility Layout, Image Processing, …

– useful for some problems, but can be very slow


• slowness comes about because T must be decreased very gradually to retain optimality

2 2
7 8
Local beam search Local Beam Search (contd)
• Idea: Keeping only one node in memory is an extreme
• Not the same as k random-start searches run in parallel!
reaction to memory problems.
• Searches that find good states recruit other searches to join them

• Keep track of k states instead of one


• Problem: quite often, all k states end up on same local hill
– Initially: k randomly selected states
• Idea: Stochastic beam search
– Next: determine all successors of k states – Choose k successors randomly, biased towards good ones
– If any of successors is goal  finished
– Else select k best from successors and repeat
• Observe the close analogy to natural selection!
2 3
9 0

Genetic algorithms
• Twist on Local Search: successor is generated by combining two parent states 8
7
• A state is represented as a string over a finite alphabet (e.g. binary)
– 8-queens 6 String representation
16257483
• State = position of 8 queens each in a column 5
• Start with k randomly generated states (population)
4
3
• Evaluation function (fitness function): 2
– Higher values for better states. 1
– Opposite to heuristic function, e.g., # non-attacking pairs in 8-queens

• Produce the next generation of states by “simulated evolution” Can we evolve 8-queens through genetic algorithms?
– Random selection
– Crossover
– Random mutation
3 3
1 2
Genetic algorithms Genetic algorithms

4 states for 2 pairs of 2 states New states Random


8-queens randomly selected based after crossover mutation
problem on fitness. Random applied Has the effect of “jumping” to a completely different new
crossover points selected part of the search space (quite non-local)
• Fitness function: number of non-attacking pairs of queens
(min = 0, max = 8 × 7/2 = 28)
• 24/(24+23+20+11) = 31%
• 23/(24+23+20+11) = 29% etc 3
3
3
4

Comments on Genetic Algorithms Optimization of Continuous Functions


• Genetic algorithm is a variant of “stochastic beam search”
• Positive points • Discretization
– Random exploration can find solutions that local search can’t
• (via crossover primarily)
– use hill-climbing
– Appealing connection to human evolution
• “neural” networks, and “genetic” algorithms are metaphors!
• Gradient descent
• Negative points – make a move in the direction of the gradient
– Large number of “tunable” parameters
• gradients: closed form or empirical
– Difficult to replicate performance from one problem to another
– Lack of good empirical studies comparing to simpler methods
– Useful on some (small?) set of problems but no convincing evidence that GAs are better than hill-
climbing w/random restarts in general

• Question: are GAs really optimizing the individual fitness function? Mixability? 3
6
Gradient Descent
Assume we have a continuous function: f(x1,x2,…,xN)
and we want minimize over continuous variables X1,X2,..,Xn

1. Compute the gradients for all i: f(x1,x2,…,xN) /xi

2. Take a small step downhill in the direction of the gradient:

xi  xi - λf(x1,x2,…,xN) /xi
3. Repeat.
• How to select λ
– Line search: successively double
– until f starts to increase again
3 3
7 8

You might also like