0% found this document useful (0 votes)
11 views27 pages

Genetic Algorithm For Variable Selection: Jennifer Pittman

Uploaded by

Taimoor Shakeel
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views27 pages

Genetic Algorithm For Variable Selection: Jennifer Pittman

Uploaded by

Taimoor Shakeel
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

Genetic Algorithm

for Variable Selection


Jennifer Pittman
ISDS
Duke University
Genetic Algorithms
Step by Step
Jennifer Pittman
ISDS
Duke University
Example: Protein Signature Selection in Mass Spectrometry

h
t
t
p
:
/
/
w
w
w
.
u
n
i
-
m
a
i
n
z
.
d
e
/
~
f
r
o
s
c
0
0
0
/
f
b
g
_
p
o
3
.
h
t
m
l

molecular weight
r
e
l
a
t
i
v
e

i
n
t
e
n
s
i
t
y

Genetic Algorithm (Holland)
heuristic method based on survival of the fittest
in each iteration (generation) possible solutions or
individuals represented as strings of numbers
useful when search space very large or too complex
for analytic treatment
00010101 00111010 11110000
00010001 00111011 10100101
00100100 10111001 01111000
11000101 01011000 01101010
3021 3058 3240
Flowchart of GA


h
t
t
p
:
/
/
w
w
w
.
s
p
e
c
t
r
o
s
c
o
p
y
n
o
w
.
c
o
m

individuals allowed to
reproduce (selection),
crossover, mutate

all individuals in population
evaluated by fitness function

h
t
t
p
:
/
/
i
b
-
p
o
l
a
n
d
.
v
i
r
t
u
a
l
a
v
e
.
n
e
t
/
e
e
/
g
e
n
e
t
i
c
1
/
3
g
e
n
e
t
i
c
a
l
g
o
r
i
t
h
m
s
.
h
t
m

Initialization
proteins corresponding to 256 mass spectrometry
values from 3000-3255 m/z
assume optimal signature contains 3 peptides
represented by their m/z values in binary encoding
population size ~M=L/2 where L is signature length
(a simplified example)
00010101 00111010 11110000
00010101 00111010 11110000
00010001 00111011 10100101
00100100 10111001 01111000
11000101 01011000 01101010
Initial
Population
M = 12
L = 24
Searching
search space defined by all possible encodings of
solutions
selection, crossover, and mutation perform
pseudo-random walk through search space
operations are non-deterministic yet directed
Phenotype Distribution
[Link]
Evaluation and Selection
evaluate fitness of each solution in current
population (e.g., ability to classify/discriminate)
[involves genotype-phenotype decoding]
selection of individuals for survival based on
probabilistic function of fitness
may include elitist step to ensure survival of
fittest individual
on average mean fitness of individuals increases
Roulette Wheel Selection
[Link]
Crossover
combine two individuals to create new individuals
for possible inclusion in next generation
main operator for local search (looking close to
existing solutions)
perform each crossover with probability p
c
{0.5,,0.8}
crossover points selected at random
individuals not crossed carried over in population
Initial Strings Offspring
Single-Point
Two-Point
Uniform
11000101 01011000 01101010
00100100 10111001 01111000
11000101 01011000 01101010
11000101 01011000 01101010
00100100 10111001 01111000
00100100 10111001 01111000 10100100 10011001 01101000
00100100 10011000 01111000
00100100 10111000 01101010
11000101 01011001 01111000
11000101 01111001 01101010
01000101 01111000 01111010
Mutation
each component of every individual is modified with
probability p
m

main operator for global search (looking at new
areas of the search space)
individuals not mutated carried over in population
p
m
usually small {0.001,,0.01}
rule of thumb = 1/no. of bits in chromosome
[Link]
00010101 00111010 11110000
00010001 00111011 10100101
00100100 10111001 01111000
11000101 01011000 01101010
3021 3058 3240
3017 3059 3165
3036 3185 3120
3197 3088 3106
0.67
0.23
0.45
0.94
phenotype genotype fitness
1
4
2
3
1
3
4
4
00010101 00111010 11110000
00100100 10111001 01111000
11000101 01011000 01101010
11000101 01011000 01101010
selection
00010101 00111010 11110000
00100100 10111001 01111000
11000101 01011000 01101010
11000101 01011000 01101010
one-point crossover (p=0.6)

0.3

0.8
00010101 00111001 01111000
00100100 10111010 11110000
11000101 01011000 01101010
11000101 01011000 01101010
mutation (p=0.05)
00010101 00111001 01111000
00100100 10111010 11110000
11000101 01011000 01101010
11000101 01011000 01101010
00010101 00110001 01111010
10100110 10111000 11110000
11000101 01111000 01101010
11010101 01011000 00101010
3021 3058 3240
3017 3059 3165
3036 3185 3120
3197 3088 3106
00010101 00111010 11110000
00010001 00111011 10100101
00100100 10111001 01111000
11000101 01011000 01101010
0.67
0.23
0.45
0.94
00010101 00110001 01111010
10100110 10111000 11110000
11000101 01111000 01101010
11010101 01011000 00101010
starting generation
next generation
phenotype genotype fitness
3021 3049 3122
3166 3184 3240
3197 3120 3106
3213 3088 3042
0.81
0.77
0.42
0.98
0 20 40 60 80 100 120
10
50
100
GA Evolution
Generations
A
c
c
u
r
a
c
y

i
n

P
e
r
c
e
n
t

[Link]
genetic algorithm learning
[Link]
0 50 100 150 200
-
7
0





-
6
0











-
5
0












-
4
0














Generations
F
i
t
n
e
s
s

c
r
i
t
e
r
i
a

F
i
t
n
e
s
s

v
a
l
u
e

(
s
c
a
l
e
d
)

iteration
Holland, J. (1992), Adaptation in natural and
artificial systems , 2
nd
Ed. Cambridge: MIT Press.
Davis, L. (Ed.) (1991), Handbook of genetic algorithms.
New York: Van Nostrand Reinhold.
Goldberg, D. (1989), Genetic algorithms in search,
optimization and machine learning. Addison-Wesley.
References
Fogel, D. (1995), Evolutionary computation: Towards a
new philosophy of machine intelligence. Piscataway:
IEEE Press.
Bck, T., Hammel, U., and Schwefel, H. (1997),
Evolutionary computation: Comments on the history and
the current state, IEEE Trans. On Evol. Comp. 1, (1)
[Link]
[Link]

IlliGAL ([Link]
Online Resources
GAlib ([Link]
iteration
P
e
r
c
e
n
t

i
m
p
r
o
v
e
m
e
n
t

o
v
e
r

h
i
l
l
c
l
i
m
b
e
r

Schema and GAs
a schema is template representing set of bit strings
1**100*1 { 10010011, 11010001, 10110001, 11110011, }
every schema s has an estimated average fitness f(s):
E
t+1
k [f(s)/f(pop)] E
t

schema s receives exponentially increasing or decreasing
numbers depending upon ratio f(s)/f(pop)
above average schemas tend to spread through
population while below average schema disappear
(simultaneously for all schema implicit parallelism)
MALDI-TOF
[Link]/pics/main/[Link]

You might also like