0% found this document useful (0 votes)
21 views7 pages

Genetic Algorithms in Machine Learning

Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views7 pages

Genetic Algorithms in Machine Learning

Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

COS4852/U5/0/2024

UNIT 5 U5/0/2024

Machine Learning
COS4852

Year module

Department of Computer Science

School of Computing

CONTENTS

This document contains the material for UNIT 5 for COS4852 for 2024.

university
Define tomorrow. of south africa
1 OUTCOMES

In this Unit you will learn about G ENETIC A LGORITHMS. Specifically you will study:

1. A basic class of G ENETIC A LGORITHMS.

2. The encoding of the chromosomes from a problem.

3. The function and structure of the fitness function and the various operators used in G ENETI -
C A LGORITHMS .

4. The theoretical basis of G ENETIC A LGORITHMS.

5. Different variants of G ENETIC A LGORITHMS and other types of genetic operators.

6. More advanced versions of G ENETIC A LGORITHMS, specifically their application to real-valued


problems.

After completion of this Unit you will be able to:

1. Understand and recognise appropriate learning problems that can be solved using genetic
algorithms.

2. Encode a learning problem for use in genetic algorithms.

3. Understand and describe genetic operators.

4. Understand and describe the fitness function and the process of selection.

5. Solve a given problem using the genetic algorithm technique.

6. Understand and describe how genetic algorithms search the hypothesis space.

2 PREPARATION

2.1 Textbooks

One of the definitive textbooks on G ENETIC A LGORITHMS was written by Melanie Mitchell. The book
goes into a lot of detail, and will serve well in a standalone course on G ENETIC A LGORITHMS.

2
COS4852/U5

2.2 Online material

The notes from the Johannes Kepler University Institute for Mathematical Methods in Medicine and
Databased Modeling provides another excellent, detailed set of notes on G ENETIC A LGORITHMS.
This is much shorter than Melanie Mitchell’s book and was developed for a course on G ENETI -
C A LGORITHMS . It starts with a simple class of G ENETIC A LGORITHMS (very similar to the example
below), with four detailed examples. It continues into more variants of G ENETIC A LGORITHMS, and
discusses G ENETIC A LGORITHMS for continuous-valued problems in Chapter 5. The rest of the
notes are more advanced, and is not covered in this module.

3 INTRODUCTION

G ENETIC A LGORITHMS are a type of optimisation algorithm, which attempt top find the minimum (or
maximum) of a function. G ENETIC A LGORITHMS are part of a broader field in Machine Learning,
called Evolutionary Computing. G ENETIC A LGORITHMS imitate the concept of biological evolution
and natural selection to find individual solutions that are the ‘fittest’. The process in G ENETIC A L -
GORITHMS are essentially random, but with control over the structure, population, which random
processes to use, and measures to search for better solutions.

3.1 A basic algorithm for G ENETIC A LGORITHMS

Most G ENETIC A LGORITHMS consist of the following components:

• A population of candidate solutions, encoded in what are termed chromosomes, which can
reproduce to create new members of the population.

• A fitness function that gives a measure of how close to a good solution a chromosome is.

• A selection mechanism to choose chromosomes that will reproduce to create new population
members. This relies on the fitness function.

• Techniques to introduce randomness into the new generations. The most common techniques
are crossover and mutation that either swop bits of chromosome from the parent, or randomly
change bits of the chromosome. These have the effect of creating new candidate solution.

A basic algorithm for G ENETIC A LGORITHMS is given in Figure 1.

To get from one generation to the next, there are four basic steps, called operators:

Selection: Use the fitness function to pick a number of individuals in the population for reproduction.

Crossover: Merge the chromosomes of two parent individuals to generate two new individuals.
There are various mechanisms to do this.

3
Figure 1: A basic algorithm for G ENETIC A LGORITHMS

t ←0
compute initial population β0
while stopping condition not met do
select individuals for reproduction
create offspring by crossover
select and mutate some individuals
generate and select new generation
end while

Mutation: As in real evolution, there are essentially random processes at work (radiation, cancer,
viruses, aging,etc.)that cause mutations in genetic material. This same concept is used in
G ENETIC A LGORITHMS to randomly make changes in chromosomes to enhance diversity, and
thereby increase the probability of finding a solution.

Sampling: Again, as in the real world, individuals die. Sampling is another random process. To
keep the algorithm efficient the population is usually kept at a constant number, so that as
many individuals are removed as were created during the crossover operation.

3.2 A simple example

Let’s look at a simple optimisation task to see how G ENETIC A LGORITHMS works. Consider the
problem of maximising (finding the maximum value) the following function (example from Goldberg):

−x 2
f (x) = + 3x
10
where,
x ∈Z
x ∈ [0, 31]
(x is an integer between 0 and 31).

Figure 2 shows this function.

The choice of range and integer numbers here is deliberate. This is to illustrate that you can use a
binary encoding of the chromosome. We can encode the x-values as binary strings of 5 bits, from
00000 to 11111, which covers the complete domain of x ∈ [0, 31]. For example, x = 5 encodes to
the chromosome 00011. Every one of the 32 chromosomes in this case can be a possible solution
to this optimisation task.

The first step in the algorithm is to create an initial population. Set the population size to 10, and
select 10 chromosomes at random for the initial population.

Table 1 shows the initial population in the population column.

4
COS4852/U5

f(x)
A = (15, 22.5)

20

15

10

x
0 5 10 15 20 25 30

−x 2
Figure 2: f (x) = 10
+ 3x

We also define a fitness function that determine how close an individual is to the solution. In this
case the fitness function is simply the function we are trying to maximise, fit(x) = f (x). We also need
to select which chromosomes will reproduce. Fitter individuals should have a higher probability of
being selected. A simple way to do that is to weight the probability by the fitness value:
fit(xi )
P(i) = P10
k=1 fit(xi )

The fitness values and the likelihood of an individual being selected to create the next generation is
also shown in Table 1.

We can set a parameter to determine the ratio of individuals to select for reproduction. For simplicity
again, we can choose that we want to replace the entire population. To generate a new population
of 10 members, and each pairing produces 2 new members, we create 5 pairs randomly. The first
column in Table 2 shows the chromosome pairs selected to reproduce.

The next thing to do is to decide on the crossover point in each pair of chromosomes. In this case
this is a number from 0 to 4, indicating the position in the chromosome. A crossover point at 0

5
i chromosome xi fit(xi ) P(xi )
1 01011 11 20.9 0.1416
2 11010 26 10.4 0.0705
3 00010 2 5.6 0.0379
4 01110 14 22.4 0.1518
5 01100 12 21.6 0.1463
6 11110 30 0 0
7 10110 22 17.6 0.1192
8 01001 9 18.9 0.1280
9 00011 3 8.1 0.0549
10 10001 17 22.1 0.1497
Sum 147.6
Average 14.76
Max 22.4

Table 1: The initial population and calculations.

crossover mating new


i point pairs population xi fit(xi )
5 2 01|100 01010 10 20.0
2 11|010 01000 28 5.6
4 4 0111|0 01111 15 22.5
8 0100|1 01000 8 17.6
9 4 0001|1 00110 10 20.0
2 1101|0 11011 27 8.1
7 0 10110 10110 22 17.6
4 01110 01110 14 22.4
10 3 100|01 10010 17 22.1
8 010|01 01010 9 18.9
Sum 174.8
Average 17.48
Max 22.5

Table 2: The next generation

means that the parents are effectively cloned (as with chromosomes 7 and 4 here) (or this could be
interpreted as the parents not having any offspring, but surviving to the next generation – the result
is the same).

Once we have generated our new population we need to mutate some of them. This is also a
parameter that we can set - 0.001 is often a good value here, meaning that 0.001 of the available
bits will be swopped from a 0 to 1, or vice versa. This is another random choice. For a population of
10 with 5-bit strings there are 50 bits available to mutate. For our ratio of 0.001 that equals 0.05

6
COS4852/U5

bits. To maintain this mutation rate we have to randomly flip a bit once every 20 generations, on
average. For illustration purposes, we will flip a random bit in the first generation – the third bit of
chromosome 9, in this case.

Here we also see that random selection may result in the same chromosome being selected as the
parent of more than one offspring (chromosomes 2 and 8 here), while some don’t get selected at all
(chromosomes 1, 3 and 6).

In G ENETIC A LGORITHMS with more complex problems, and larger chromosomes the algorithm
would typically run for thousands of generations until it is stopped. One stopping criterion could
simply be a certain number of generations. A more appropriate criterion would be if the fitness
values does not improve for a number of generations. In this case we see that by the second
generation both the maximum fitness and the average fitness has increased. The maximum fitness
value here is fit(x4 ) = 22.5, which happens to be the solution (in a real problem we won’t know this).
So, running the algorithm for more generations will not increase the maximum fitness, but is likely to
improve the average fitness. This could be another stopping criterion – when the average fitness
approaches the maximum fitness arbitrarily close.

4 ACTIVITIES

4.1 TASK 1 - STUDY THE NOTES

Download the notes mentioned under Online Material above, and do the following:

• Study Chapters 1 and 2 in detail.

• Read Chapter 3 and discuss the relation between the analysis done there and Tom Mitchell’s
approach to hypotheses.

• Read Chapter 4 to learn more about other variants of G ENETIC A LGORITHMS.

• Study Chapter 5, Sections 1 and 2 on how to implement G ENETIC A LGORITHMS for real-valued
problems.

4.2 TASK 2 - IMPLEMENT G ENETIC A LGORITHMS

Find a number of integer-valued and real-valued problems that will be suitable for G ENETIC A L -
GORITHMS and implement the algorithm to solve these. Pay specific attention to the design of
chromosome, the fitness function, and the choice of operators.

© UNISA 2024

Common questions

Powered by AI

Maintaining population diversity in a genetic algorithm is crucial to prevent premature convergence and ensure robust exploration of the search space. This can be achieved through strategies such as adjusting crossover and mutation rates, employing diverse initial populations, and utilizing methods like fitness sharing or niching to preserve multiple optimal solutions. Higher mutation rates can increase diversity by introducing new genetic material, while mechanisms like fitness sharing ensure that diverse solutions contribute to evolutionary progress. These strategies are important to avoid stagnation in local optima and ensure adaptation to dynamic environments .

The fitness function in a genetic algorithm evaluates how well a candidate solution meets the desired criteria of the problem being solved. It influences the algorithm's effectiveness by providing a quantitative measure to compare different solutions, guiding the selection of chromosomes for reproduction. A well-designed fitness function effectively discriminates between better and worse solutions, ensuring that advantageous traits are propagated through generations. It ensures that the algorithm focuses on promising areas of the search space, directly impacting the efficiency and quality of the solutions obtained .

The selection process in genetic algorithms is vital because it determines which individuals get to reproduce and contribute their genes to the next generation. This process directly influences the convergence and quality of solutions. Typically, fitter individuals have a higher chance of being selected, which can be optimized using criteria such as fitness proportionate selection (roulette wheel) or tournament selection. These criteria enhance selective pressure, increasing the likelihood of preserving good solutions and driving the population toward optimal or near-optimal solutions. Proper adjustment of selection intensity is crucial to balance exploration and exploitation .

A genetic algorithm consists of several key components: a population of candidate solutions encoded as chromosomes, a fitness function, selection mechanisms, crossover, and mutation techniques. The chromosomes represent potential solutions and evolve over generations. The fitness function evaluates how close a chromosome is to an optimal solution, guiding the selection process for reproduction. Crossover merges chromosomes from parents, creating diverse offspring, while mutation introduces random changes to maintain genetic variability. These components work together to explore the solution space and optimize the function being addressed .

Crossover and mutation are crucial for enhancing the genetic algorithm's search capability. Crossover combines the genetic information of two parent chromosomes to produce new offspring. This operation explores new areas of the solution space by mixing existing solutions and introducing variations. Mutation introduces random changes to individual chromosomes, ensuring genetic diversity and preventing premature convergence on suboptimal solutions. It allows exploration of the solution space by altering individual traits, thus increasing the probability of escaping local optima .

Genetic algorithms can optimize the function f(x) = -x^2 + 3x by representing x as a binary chromosome of 5 bits, covering the range 0 to 31. The algorithm initializes a population with random binary strings, each representing a possible x value. Using a fitness function defined as the function being optimized, the algorithm selects individuals based on fitness, applies crossover and mutation to create offspring, and iterates through generations. By evaluating fitness values and generating diverse solutions, the algorithm hones in on optimal x values that maximize f(x), such as x = 15, yielding fit(x) values closer to the maximum .

Genetic algorithms search the hypothesis space by evolving a population of candidate solutions across generations. Each candidate, encoded as a chromosome, represents a potential hypothesis. The algorithm applies selection, crossover, and mutation to explore this space, guided by the fitness function to retain high-quality solutions while introducing diversity. This process is significant because it allows the algorithm to efficiently search complex and large hypothesis spaces, identifying optimal or near-optimal solutions without exhaustive search. By balancing exploration and exploitation, genetic algorithms can solve problems where traditional methods may struggle .

The initial population in a genetic algorithm is crucial because it establishes the baseline genetic diversity from which solutions evolve. A well-chosen initial population, representing a broad spectrum of potential solutions, enhances the algorithm's ability to explore the search space effectively. Optimization can be achieved by incorporating domain knowledge to generate more promising starting solutions or by ensuring randomness to capture diverse possibilities. A diverse and well-structured initial population aids in avoiding premature convergence and accelerates convergence on optimal solutions, significantly impacting the effectiveness of the algorithm .

When solving real-valued problems, genetic algorithms need to handle continuous variables, unlike integer-valued problems where discrete values are involved. This requires a different encoding strategy, often using floating-point representation rather than binary strings. Operators like crossover and mutation also need to be adapted to maintain precision and continuity. Real-valued genetic algorithms may use techniques such as arithmetic crossover or Gaussian mutation to effectively navigate the continuous search space. These adaptations allow the algorithm to efficiently explore and exploit real-valued problem domains .

Stopping criteria for genetic algorithms can include a fixed number of generations, stagnation in fitness values, or achieving a predefined fitness threshold. These criteria affect the algorithm's outcome by determining the point at which the search process concludes. A fixed generation count provides a predictable runtime, while monitoring fitness stagnation allows the algorithm to terminate once improvements plateau. Achieving a fitness threshold can ensure that only sufficiently high-quality solutions are accepted. The choice of stopping criteria balances computational efficiency with solution quality, influencing the algorithm's effectiveness and resource consumption .

You might also like