0% found this document useful (0 votes)
4 views68 pages

Overview of Evolutionary Computation

Uploaded by

ziadhani43
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views68 pages

Overview of Evolutionary Computation

Uploaded by

ziadhani43
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

Lecture #5

• Review of the EC field


• Provide info to use EC tools to solve practical
problems
• Terminology and key concepts Paradigms
Review history
• Discuss main areas of EC
•We will examine five areas of
evolutionary computation:
Genetic algorithms*
Evolutionary
programming
Evolution strategies
Particle swarm
optimization*

•Most work has been done in genetic algorithms


(GAs), so this history (arbitrarily) concentrates
there.

•Focus is on people:
* A broad sample, not exhaustive
• Has had more influence on GA field than any other person
• Has been at the Univ. of Michigan since the early 1960s
• Holland “… created the genetic algorithm field …”
• Published and taught in the field of adaptive systems in 1960s
• Was a pioneer in the use of a population of individuals to do
search
• Derived schema theorem which shows that more fit schema are
more likely
to reproduce
• Used reproduction, crossover, and mutation as early as 1960s
• GA was originally called a “genetic plan”
•Holland published Adaptation in Natural and Artificial Systems
(2nd Edition in 1992)
• DeJong’s dissertation
Five test functions for GAs
Metrics for convergence and ongoing performance
Measured effects of varying population size, crossover
probability, mutation rate, and generation gap.
• Student of John Holland
• Dissertation on gas pipeline project
• Published landmark book, “Genetic Algorithms in
Search,
Optimization, and Machine Learning” (1989)
• Self-taught in GAs
• Used GAs for 2-D bin packing in chip layout
at TI
• Edited “Handbook of Genetic
Algorithms” Tutorial
Case studies
Software diskette: OOGA (Lisp)
and Genesis ©
Larry J. Fogel
• Developed evolutionary programming (EP) in 1960s
•EP utilizes survival of fittest (or “survival of more skillful”),
but only operation on structures is mutation
• Interests in finite state machines and machine intelligence
•EP “… abstracts evolution as a top-down process of adaptive
behavior, rather than a bottom-up process of adaptive
genetics …”
• 1960s book was controversial
•EP suffered from lack of funding due to numerics versus
symbolics controversy
• David Fogel and others now carrying on the work
• Latané’s dynamic social impact
theory
• Axelrod’s culture model
• Kennedy’s adaptive culture model
•Behaviors of individuals can be explained in terms
of the self-organizing properties of their social
systems
• Clusters of individuals develop similar beliefs
• Subpopulations diverge from one another
(polarization)
• Consolidation: Opinion diversity is reduced as
individuals are
exposed to majority arguments
• Clustering: Individuals become more like their
neighbors in social space
• Correlation: Attitudes that were originally
independent tend to
become associated
• Continuing diversity: Clustering prevents minority
views form complete consolidation
• Individuals influence one another, and in doing so
become more similar
• Patterns of belief held by individuals tend to
correlate within regions of a population
• This model is consistent with findings in the fields
of social psychology, sociology, economics, and
anthropology
• Populations of individuals are pictured as strings of
symbols, or “features”
• Probability of interaction between two individuals is a
function of
their similarity
• Individuals become more similar as a result of
interactions
• The observed dynamic is polarization, homogeneous
subpopulations that differ from one another
• No effect of similarity on probability of interaction

•The effect of similarity is negative in that its


dissimilarity that creates boundaries between
cultural regions

•Interaction occurs if fitnesses are different


• Individuals searching for solutions learn from the
experiences of
others (individuals learn from their neighbors)

•An observer of the population perceives phenomena of


which the individuals are the parts (individuals that
interact frequently become similar)

•Culture affects the performance of individuals that


comprise it (individuals gain benefit by imitating their
neighbors)
• Social behavior increases the ability of an individual
to adapt

•There is a relationship between adaptability and


intelligence

•Intelligence arises from interactions among


individuals
• 1994 IEEE World Congress on Computational Intelligence
(WCCI)
First ICEC chaired by Zbigniew Michalewicz

• 1998, 2002 and 2006 IEEE WCCIs


Workers in different groups now working together
• Cover evolutionary computation theory and paradigms

•Emphasize use of EC to solve practical problems

•Compare with other techniques - see how EC fits in with other


approaches
Evolutionary computation consists of machine
learning optimization and classification paradigms
that are roughly based on evolution mechanisms
such as biological genetics and natural selection.
The EC field comprises four
main areas: genetic algorithms, evolutionary
programming, evolution strategies and genetic
programming.
EC paradigms differ from traditional search and
optimization paradigms in that EC paradigms:

1) Use a population of points in their search,

2) Use direct “fitness” information, instead of function


derivatives or other related knowledge, and ,

3) Use probabilistic rather than deterministic transition


rules.
• Each member is a point in the hyperspace problem domain,
and thus is a potential solution

•A new population is generated each epoch (generation)

•Population typically remains the same size

•Operators such as crossover and mutation significantly


enhance parallel search capabilities
•Auxiliary information such as derivatives used to
minimize sum-squared error in neural nets is not
used

•The fitness value optimized is directly


proportional to the function value being optimized

•If fitness is proportional to profit, for example,


then the
fitness rises as the profit rises

•Provides environmental influence on individuals


• Search is not random

•Search is directed toward regions that are likely to


have higher
fitness values

•Different EC paradigms make different uses of


stochasticity
•Here Parameters means components of the solution, for
example each disease is a bit; we prefer to call user defined
inputs to the GA the Parameters

•Often encoded as binary strings

•Any finite alphabet can be used

•Typically, population member string is of fixed length


1. Initialize the population

2. Calculate the fitness for each individual

3. Perform evolutionary operations (e.g., mate


selection,
crossover, mutation)

4. Go to step 2 until some condition is met


•Initialization often done by randomizing initial population

•Fitness value is proportional to value of function being


optimized (often scaled 0–1)

•Selection for reproduction based on fitness values


(some
paradigms such as PSO retain all population members)
• Optimization and classification

•Mostly
optimization non-
differentiable
many local optima
may not know
optimum
system may be
Note: EC is usually the second-best way to solve a
dynamic,
problem!
changing with
time, or even
chaotic
• Can be used to solve many problems, and many kinds of
problems,
with minimal adjustments

• Are fast and easy to implement


• Overview

•Terminology

•GA example problem

•Review of GA operations

•Schemata and the schema


theorem
• Reflect in a primitive way some of the natural processes of
evolution

•GAs perform highly efficient search of the problem


hyperspace

•GAs work with a population of individuals (chromosomes)

•Number of elements in each individual equals the number of


parameters in the optimization problem

•If w is the number of parameters, and we have b bits per


parameter,
then the search space is 2wb

The variables being optimized comprise the phenotype


space. The binary strings upon which the operators
1. Initialize the population (usually randomized binary
strings)

2. Calculate the fitness for each individual in the


population

3. Reproduce selected individuals to form new


population

4. Perform crossover and mutation (or other


operations)

5. Loop to step 2 until some condition is met


• Fitness evaluation

•Copying strings

•Exchanging portions of
strings

•Flipping bits in strings


Optimize value f (x) 
of
 x
256 ove
sin to integers.r
0x
where x is restricted 255
The maximum value of 1occurs at x
= 128.

Note: Function space is identical to fitness


• One variable

•Use binary alphabet

•Therefore, represent each individual as 8-bit binary


string

•Set number of population members to (artificially


low) 8

•Randomize population

•Calculate fitness for each member of (initialized)


population
Reproduction is used to form new population of n
individuals Select members of current population
Use stochastic process based on fitnesses

First, compute normalized fitness value for each individual


by dividing individual fitness by sum of all fitnesses
(fi / 5.083 in the example case)

Generate a random number between 0 and 1 n times to


form new population (sort of like spinning roulette wheel)
The eight random numbers generated, presented in
random order,
are .293, .971, .160, .469, .664, .568, .371, and 109.
This result in initial population member numbers 3, 8, 2,
5, 6, 5, 3, and 1 being chosen to make up the population
after reproduction, as shown below.
• Crossover: the exchange of portions of two individuals (the
“parents”)

•Probability of crossover usually ~ .65 - .80, .75 used here

•Randomly pair off population, determine whether or not


crossover will occur (in example, 3 of 4 crossover)

•Randomly select two crossover points for pairs undergoing


crossover

•Exchange portions of strings between 1st and 2nd crossover


points,
treating individuals as “ring” structures
• Mutation is final operation in Gas

•Consists of flipping bits with probability of flip ~ .001


- .01

•Example problem, assume no bit flips (64 bits total in


population)

•Now have first generation, with total fitness of 6.313,


and two individuals each with fitness > .99

•Now ready for more generations until stopping


condition met:
* Number of generations
* Population member(s) with > specified fitness
* No change in max fitness in n generations
1
.

2
.
3
.

4
.
5
.
• Representation of variables
• Population size
• Population initialization
• Fitness calculation
• Reproduction
• Crossover
• Inversion
• Mutation
• Selecting number of
generations
Consider example problem,
where 127 is 01111111 and 128 is 10000000

• The smallest fitness change requires change in every bit

•Gray coding has representation such that adjacent


values vary by a
single bit

•Some software converts dynamic range and resolution


into
appropriate bit strings

•Different alphabets possible


To get Gray code from binary code, the leftmost bit is the same, then
Gi = XOR(Bi, Bi-1) for i>=2, where Gi is the ith Gray code bit, Bi is ith
binary bit.
Some software allows you to set the dynamic range
and
resolution for each variable.

Example: Dynamic range is 2.5 to 4.5, and you want


a resolution of 3 decimal places (1 in 1000), you
need 11 bits.

Problems can occur: If we have a dynamic range of 5


and a resolution of 3 decimal places, then we need
13 bits, the same as for a dynamic range of 8. We
can now get numbers from crossover and mutation
out of range, and repairs or penalties must be
implemented.
• Start with moderate sized population

•50-500 is often a good starting place for a GA

•Population size tend to increase linearly with individual


string length (not exponentially)
• Usually selected stochastically
• Sometimes seeded with a few promising
individuals
• Do not skew population significantly
• Most GAs scale fitness values
• Example problem illustrates two common problems:
1. “Bunching” of fitness values near top of
scale, thereby lowering fitness
differentials
2. Fitness values often are nearly constant
•Solve bunching problem by:
*Equally spacing fitness values (don’t use 0, and keep
track of f(x) for evaluation purposes)
*Use “scaling” window over last w (~5-10) generations
that keeps
track of minimum fitness value

•defining a new
Solve “flat” f (x)
fitness new  f (inold
problem problems like example
f(x):
problem) by (x)n
•Accomplish minimization with a GA by setting fitness
equal to fmax
- f(x), where fmax is set by scaling
• Reproduction often done by normalizing fitnesses and
generating
n random numbers between 0 and 1

• Baker’s method: Use roulette wheel with n pointers


spaced 1/n
apart; use normalized fitness; spin wheel once.

•“Tournament selection” - Select two individuals at


random; the individual with the higher fitness is
selected for the next population

•Generation gap approach: Replace x percent that have


worst
fitness values(x is defined as the generation gap)

•“Elitist strategy” ensures that individual with highest


• Two-point crossover, with p(c) of 60-80% is common

•Often start with relatively high crossover rate, and reduce


it during the run

•The most basic crossover is one-point:


10110|010 > 10110100
01001|100 > 01001010

•Uniform crossover (Syswerda)


* Randomly choose 2 parents & stochastically decide
whether to do crossover
* At each bit position, exchange bits between the 2
strings
with p(c) <= 50% [constant for run]
• Example of matrix crossover after Bezdek (1994)

•Working with a population of c by n matrices,


where c =
no. of classes and n is number of patterns

•Each matrix represents classes assigned to all


patterns (each column represents a pattern; only
one 1 is allowed in each column)

•This case is analogous to working with strings that


have
an alphabet of c characters
c=
3
n=
7
• Single parent produces single child

•Not used much today...it destroys “schemata”


(defined later)

•Example: 10|01101|0 -> 10101100


• Stochastically flipping bits often with p(m) ~ .001

•If real-valued parameters used, mutation can


assign any value in parameters allowed range

•Probability of mutation usually held constant or


increased during run

•Can increase mutation rate when fitness


variablility drops below some threshold
• Often a trial and error process

•Optimum value (if known) a function of the


problem

•Multiple runs often done; overall best solution


selected
Schemata are similarity templates for strings (features of
strings)

Schema: a subset of strings with identical values at


specified string locations

A “don’t care” or “wildcard” symbol such as “*” or “#”


is used
in schema locations where the value doesn’t matter

The schema 1**0 has 4 matching strings:


1000, 1010, 1100, 1110
For a string of length l and alphabet of a
characters, the
total possible number of schemata is (a+1) l

For a population of n individuals, there are


between 2l and n2l unique schemata

Schemata in highly fit individuals are more


likely to be
reproduced in the new population

If 2l then all populatoin members are identical


If n2l thenno two schemata are the same

Population diversity is proportional to the


number of schemata
Crossover and mutation are needed to guide search
into new regions

Effect of crossover dependent on “defining length” of


schemata In case of 1****0 and **10**, the
first is more likely to be disrupted by
crossover

Mutation is as likely to disrupt one as the other

Short, highly fit, schemata will appear in increasing


numbers in successive generations
The Schema Theorem predicts the number of times a
specific schema will appear in the next generation of
a GA, given the average fitness of individuals
containing the schema, the average fitness of the
population, and other parameters.
• A genetic algorithm works with all schemata
simultaneously, an attribute named intrinsic
parallelism by Holland.

• The most rapidly increasing representation in any


population will be of highly fit, short schemata,
which will experience growth approaching
exponential.

• The Schema Theorem does not indicate how well a


GA will solve a particular problem.
“…[T]here is something profoundly moving about
linking a genetic algorithm to a difficult problem and
returning later to find that the algorithm has evolved
a solution that is better than the one a human found.
With genetic algorithms we are not optimizing;
we are creating conditions in which optimization
occurs, as it may have occurred in the natural world.
One feels a kind of resonance at such times
that is uncommon and profound.”
Dave Davis (1991)

(This could apply to other EC paradigms as


well.)
Similar to genetic algorithms
(GAs) Uses population
of solutions
Evolves a solution using
survival of the fittest
Uses mutation (slightly
differently than GAs)

Different from GAs


Concentrates on “top-down”
processes of adaptive
behavior
Simulates evolution
(phenotypic behavior)
rather than genetics
(genotypic
behavior) Does not use
1. Initialize the population
2. Expose the population to the environment
3. Calculate the fitness for each member
4. Randomly mutate each “parent” population
member
5. Evaluate parents and children
6. Select members of new population
7. Go to step 2 until some condition is met
Component variables are usually real-valued

Dynamic range constraints usually exist and are

observed Random initial values within dynamic

ranges are used Population is often a few dozen

to a few hundred

You might also like