Lecture #5
• Review of the EC field
• Provide info to use EC tools to solve practical
problems
• Terminology and key concepts Paradigms
Review history
• Discuss main areas of EC
•We will examine five areas of
evolutionary computation:
Genetic algorithms*
Evolutionary
programming
Evolution strategies
Particle swarm
optimization*
•Most work has been done in genetic algorithms
(GAs), so this history (arbitrarily) concentrates
there.
•Focus is on people:
* A broad sample, not exhaustive
• Has had more influence on GA field than any other person
• Has been at the Univ. of Michigan since the early 1960s
• Holland “… created the genetic algorithm field …”
• Published and taught in the field of adaptive systems in 1960s
• Was a pioneer in the use of a population of individuals to do
search
• Derived schema theorem which shows that more fit schema are
more likely
to reproduce
• Used reproduction, crossover, and mutation as early as 1960s
• GA was originally called a “genetic plan”
•Holland published Adaptation in Natural and Artificial Systems
(2nd Edition in 1992)
• DeJong’s dissertation
Five test functions for GAs
Metrics for convergence and ongoing performance
Measured effects of varying population size, crossover
probability, mutation rate, and generation gap.
• Student of John Holland
• Dissertation on gas pipeline project
• Published landmark book, “Genetic Algorithms in
Search,
Optimization, and Machine Learning” (1989)
• Self-taught in GAs
• Used GAs for 2-D bin packing in chip layout
at TI
• Edited “Handbook of Genetic
Algorithms” Tutorial
Case studies
Software diskette: OOGA (Lisp)
and Genesis ©
Larry J. Fogel
• Developed evolutionary programming (EP) in 1960s
•EP utilizes survival of fittest (or “survival of more skillful”),
but only operation on structures is mutation
• Interests in finite state machines and machine intelligence
•EP “… abstracts evolution as a top-down process of adaptive
behavior, rather than a bottom-up process of adaptive
genetics …”
• 1960s book was controversial
•EP suffered from lack of funding due to numerics versus
symbolics controversy
• David Fogel and others now carrying on the work
• Latané’s dynamic social impact
theory
• Axelrod’s culture model
• Kennedy’s adaptive culture model
•Behaviors of individuals can be explained in terms
of the self-organizing properties of their social
systems
• Clusters of individuals develop similar beliefs
• Subpopulations diverge from one another
(polarization)
• Consolidation: Opinion diversity is reduced as
individuals are
exposed to majority arguments
• Clustering: Individuals become more like their
neighbors in social space
• Correlation: Attitudes that were originally
independent tend to
become associated
• Continuing diversity: Clustering prevents minority
views form complete consolidation
• Individuals influence one another, and in doing so
become more similar
• Patterns of belief held by individuals tend to
correlate within regions of a population
• This model is consistent with findings in the fields
of social psychology, sociology, economics, and
anthropology
• Populations of individuals are pictured as strings of
symbols, or “features”
• Probability of interaction between two individuals is a
function of
their similarity
• Individuals become more similar as a result of
interactions
• The observed dynamic is polarization, homogeneous
subpopulations that differ from one another
• No effect of similarity on probability of interaction
•The effect of similarity is negative in that its
dissimilarity that creates boundaries between
cultural regions
•Interaction occurs if fitnesses are different
• Individuals searching for solutions learn from the
experiences of
others (individuals learn from their neighbors)
•An observer of the population perceives phenomena of
which the individuals are the parts (individuals that
interact frequently become similar)
•Culture affects the performance of individuals that
comprise it (individuals gain benefit by imitating their
neighbors)
• Social behavior increases the ability of an individual
to adapt
•There is a relationship between adaptability and
intelligence
•Intelligence arises from interactions among
individuals
• 1994 IEEE World Congress on Computational Intelligence
(WCCI)
First ICEC chaired by Zbigniew Michalewicz
• 1998, 2002 and 2006 IEEE WCCIs
Workers in different groups now working together
• Cover evolutionary computation theory and paradigms
•Emphasize use of EC to solve practical problems
•Compare with other techniques - see how EC fits in with other
approaches
Evolutionary computation consists of machine
learning optimization and classification paradigms
that are roughly based on evolution mechanisms
such as biological genetics and natural selection.
The EC field comprises four
main areas: genetic algorithms, evolutionary
programming, evolution strategies and genetic
programming.
EC paradigms differ from traditional search and
optimization paradigms in that EC paradigms:
1) Use a population of points in their search,
2) Use direct “fitness” information, instead of function
derivatives or other related knowledge, and ,
3) Use probabilistic rather than deterministic transition
rules.
• Each member is a point in the hyperspace problem domain,
and thus is a potential solution
•A new population is generated each epoch (generation)
•Population typically remains the same size
•Operators such as crossover and mutation significantly
enhance parallel search capabilities
•Auxiliary information such as derivatives used to
minimize sum-squared error in neural nets is not
used
•The fitness value optimized is directly
proportional to the function value being optimized
•If fitness is proportional to profit, for example,
then the
fitness rises as the profit rises
•Provides environmental influence on individuals
• Search is not random
•Search is directed toward regions that are likely to
have higher
fitness values
•Different EC paradigms make different uses of
stochasticity
•Here Parameters means components of the solution, for
example each disease is a bit; we prefer to call user defined
inputs to the GA the Parameters
•Often encoded as binary strings
•Any finite alphabet can be used
•Typically, population member string is of fixed length
1. Initialize the population
2. Calculate the fitness for each individual
3. Perform evolutionary operations (e.g., mate
selection,
crossover, mutation)
4. Go to step 2 until some condition is met
•Initialization often done by randomizing initial population
•Fitness value is proportional to value of function being
optimized (often scaled 0–1)
•Selection for reproduction based on fitness values
(some
paradigms such as PSO retain all population members)
• Optimization and classification
•Mostly
optimization non-
differentiable
many local optima
may not know
optimum
system may be
Note: EC is usually the second-best way to solve a
dynamic,
problem!
changing with
time, or even
chaotic
• Can be used to solve many problems, and many kinds of
problems,
with minimal adjustments
• Are fast and easy to implement
• Overview
•Terminology
•GA example problem
•Review of GA operations
•Schemata and the schema
theorem
• Reflect in a primitive way some of the natural processes of
evolution
•GAs perform highly efficient search of the problem
hyperspace
•GAs work with a population of individuals (chromosomes)
•Number of elements in each individual equals the number of
parameters in the optimization problem
•If w is the number of parameters, and we have b bits per
parameter,
then the search space is 2wb
The variables being optimized comprise the phenotype
space. The binary strings upon which the operators
1. Initialize the population (usually randomized binary
strings)
2. Calculate the fitness for each individual in the
population
3. Reproduce selected individuals to form new
population
4. Perform crossover and mutation (or other
operations)
5. Loop to step 2 until some condition is met
• Fitness evaluation
•Copying strings
•Exchanging portions of
strings
•Flipping bits in strings
Optimize value f (x)
of
x
256 ove
sin to integers.r
0x
where x is restricted 255
The maximum value of 1occurs at x
= 128.
Note: Function space is identical to fitness
• One variable
•Use binary alphabet
•Therefore, represent each individual as 8-bit binary
string
•Set number of population members to (artificially
low) 8
•Randomize population
•Calculate fitness for each member of (initialized)
population
Reproduction is used to form new population of n
individuals Select members of current population
Use stochastic process based on fitnesses
First, compute normalized fitness value for each individual
by dividing individual fitness by sum of all fitnesses
(fi / 5.083 in the example case)
Generate a random number between 0 and 1 n times to
form new population (sort of like spinning roulette wheel)
The eight random numbers generated, presented in
random order,
are .293, .971, .160, .469, .664, .568, .371, and 109.
This result in initial population member numbers 3, 8, 2,
5, 6, 5, 3, and 1 being chosen to make up the population
after reproduction, as shown below.
• Crossover: the exchange of portions of two individuals (the
“parents”)
•Probability of crossover usually ~ .65 - .80, .75 used here
•Randomly pair off population, determine whether or not
crossover will occur (in example, 3 of 4 crossover)
•Randomly select two crossover points for pairs undergoing
crossover
•Exchange portions of strings between 1st and 2nd crossover
points,
treating individuals as “ring” structures
• Mutation is final operation in Gas
•Consists of flipping bits with probability of flip ~ .001
- .01
•Example problem, assume no bit flips (64 bits total in
population)
•Now have first generation, with total fitness of 6.313,
and two individuals each with fitness > .99
•Now ready for more generations until stopping
condition met:
* Number of generations
* Population member(s) with > specified fitness
* No change in max fitness in n generations
1
.
2
.
3
.
4
.
5
.
• Representation of variables
• Population size
• Population initialization
• Fitness calculation
• Reproduction
• Crossover
• Inversion
• Mutation
• Selecting number of
generations
Consider example problem,
where 127 is 01111111 and 128 is 10000000
• The smallest fitness change requires change in every bit
•Gray coding has representation such that adjacent
values vary by a
single bit
•Some software converts dynamic range and resolution
into
appropriate bit strings
•Different alphabets possible
To get Gray code from binary code, the leftmost bit is the same, then
Gi = XOR(Bi, Bi-1) for i>=2, where Gi is the ith Gray code bit, Bi is ith
binary bit.
Some software allows you to set the dynamic range
and
resolution for each variable.
Example: Dynamic range is 2.5 to 4.5, and you want
a resolution of 3 decimal places (1 in 1000), you
need 11 bits.
Problems can occur: If we have a dynamic range of 5
and a resolution of 3 decimal places, then we need
13 bits, the same as for a dynamic range of 8. We
can now get numbers from crossover and mutation
out of range, and repairs or penalties must be
implemented.
• Start with moderate sized population
•50-500 is often a good starting place for a GA
•Population size tend to increase linearly with individual
string length (not exponentially)
• Usually selected stochastically
• Sometimes seeded with a few promising
individuals
• Do not skew population significantly
• Most GAs scale fitness values
• Example problem illustrates two common problems:
1. “Bunching” of fitness values near top of
scale, thereby lowering fitness
differentials
2. Fitness values often are nearly constant
•Solve bunching problem by:
*Equally spacing fitness values (don’t use 0, and keep
track of f(x) for evaluation purposes)
*Use “scaling” window over last w (~5-10) generations
that keeps
track of minimum fitness value
•defining a new
Solve “flat” f (x)
fitness new f (inold
problem problems like example
f(x):
problem) by (x)n
•Accomplish minimization with a GA by setting fitness
equal to fmax
- f(x), where fmax is set by scaling
• Reproduction often done by normalizing fitnesses and
generating
n random numbers between 0 and 1
• Baker’s method: Use roulette wheel with n pointers
spaced 1/n
apart; use normalized fitness; spin wheel once.
•“Tournament selection” - Select two individuals at
random; the individual with the higher fitness is
selected for the next population
•Generation gap approach: Replace x percent that have
worst
fitness values(x is defined as the generation gap)
•“Elitist strategy” ensures that individual with highest
• Two-point crossover, with p(c) of 60-80% is common
•Often start with relatively high crossover rate, and reduce
it during the run
•The most basic crossover is one-point:
10110|010 > 10110100
01001|100 > 01001010
•Uniform crossover (Syswerda)
* Randomly choose 2 parents & stochastically decide
whether to do crossover
* At each bit position, exchange bits between the 2
strings
with p(c) <= 50% [constant for run]
• Example of matrix crossover after Bezdek (1994)
•Working with a population of c by n matrices,
where c =
no. of classes and n is number of patterns
•Each matrix represents classes assigned to all
patterns (each column represents a pattern; only
one 1 is allowed in each column)
•This case is analogous to working with strings that
have
an alphabet of c characters
c=
3
n=
7
• Single parent produces single child
•Not used much today...it destroys “schemata”
(defined later)
•Example: 10|01101|0 -> 10101100
• Stochastically flipping bits often with p(m) ~ .001
•If real-valued parameters used, mutation can
assign any value in parameters allowed range
•Probability of mutation usually held constant or
increased during run
•Can increase mutation rate when fitness
variablility drops below some threshold
• Often a trial and error process
•Optimum value (if known) a function of the
problem
•Multiple runs often done; overall best solution
selected
Schemata are similarity templates for strings (features of
strings)
Schema: a subset of strings with identical values at
specified string locations
A “don’t care” or “wildcard” symbol such as “*” or “#”
is used
in schema locations where the value doesn’t matter
The schema 1**0 has 4 matching strings:
1000, 1010, 1100, 1110
For a string of length l and alphabet of a
characters, the
total possible number of schemata is (a+1) l
For a population of n individuals, there are
between 2l and n2l unique schemata
Schemata in highly fit individuals are more
likely to be
reproduced in the new population
If 2l then all populatoin members are identical
If n2l thenno two schemata are the same
Population diversity is proportional to the
number of schemata
Crossover and mutation are needed to guide search
into new regions
Effect of crossover dependent on “defining length” of
schemata In case of 1****0 and **10**, the
first is more likely to be disrupted by
crossover
Mutation is as likely to disrupt one as the other
Short, highly fit, schemata will appear in increasing
numbers in successive generations
The Schema Theorem predicts the number of times a
specific schema will appear in the next generation of
a GA, given the average fitness of individuals
containing the schema, the average fitness of the
population, and other parameters.
• A genetic algorithm works with all schemata
simultaneously, an attribute named intrinsic
parallelism by Holland.
• The most rapidly increasing representation in any
population will be of highly fit, short schemata,
which will experience growth approaching
exponential.
• The Schema Theorem does not indicate how well a
GA will solve a particular problem.
“…[T]here is something profoundly moving about
linking a genetic algorithm to a difficult problem and
returning later to find that the algorithm has evolved
a solution that is better than the one a human found.
With genetic algorithms we are not optimizing;
we are creating conditions in which optimization
occurs, as it may have occurred in the natural world.
One feels a kind of resonance at such times
that is uncommon and profound.”
Dave Davis (1991)
(This could apply to other EC paradigms as
well.)
Similar to genetic algorithms
(GAs) Uses population
of solutions
Evolves a solution using
survival of the fittest
Uses mutation (slightly
differently than GAs)
Different from GAs
Concentrates on “top-down”
processes of adaptive
behavior
Simulates evolution
(phenotypic behavior)
rather than genetics
(genotypic
behavior) Does not use
1. Initialize the population
2. Expose the population to the environment
3. Calculate the fitness for each member
4. Randomly mutate each “parent” population
member
5. Evaluate parents and children
6. Select members of new population
7. Go to step 2 until some condition is met
Component variables are usually real-valued
Dynamic range constraints usually exist and are
observed Random initial values within dynamic
ranges are used Population is often a few dozen
to a few hundred