Genetic Programming: an introduction

(The following is a synopsis of chapters 5 and 6 of John R. Koza’s “Genetic Programming”)

OVERVIEW:

Genetic Programming is derived from Genetic Algorithms, and in many ways involves the same process of evolutionary natural selection. The core difference between Genetic Algorithms and Genetic Programming is the program structure. Genetic Algorithms (GAs) operate on fixed length binary strings representing a biological chromosome, while Genetic Programming (GP) instead operates on dynamic hierarchical programs. While in GA, the size and shape of a binary string are fixed, GP programs vary in size and shape. This makes GP programs more complex and, in many ways, more powerful.

The more complex and dynamic program structure of GP results in a cascade of other differences which separate it from GA. Because of this, some problems may be better suited to one method or the other. Generally the more complex the problem or the larger the search space for a solution, the more likely the problem should be solved by GP.

GP solves problems by breeding computer programs, then using reproduction and crossover in iterations through generations of programs, until a solution to the problem is found or some other set parameter is satisfied (such as number of generations run).

GP
Koza’s GP, Figure 5.1

Koza references Holland’s (1975) list of key features of all adaptive systems. For GP to operate successfully, the following must be defined:
· Initial Structures
· Fitness Measures
· Genetic Processes
· Program Memory
· Termination Criterion
· Designating a result

INITIAL STRUCTURES:

GP programs are best represented by expression trees, which are hierarchies containing functions and terminals.

Expression Tree

The above diagram represents the program string (in pseudocode):

if{x<4}then{t}else{r}}

Any of the functions if, <, then, else or terminals x, 4, t, r will be subject to change as GP progresses through generations.

It is important to select a sufficient set of primitive functions for a problem. For instance, with an ant trying to maneuver through a maze for food, MOVE, RIGHT, and LEFT are sufficient. Anything less is insufficient, while anything more may or may not be useful.

Keep in mind the set of functions and terminals must be sufficient, but extraneous functions and terminals will often degrade performance.

After the sufficient pool of primitive functions has been determined, the initial population (generation 0) is generated with expression trees containing random combinations of these functions and terminals.

A uniform random probability distribution is responsible for arranging the the expression trees within the initial population, using only the sufficient function set determined ahead of time.

There are three types of expression trees (pp. 92-93):
· Full (every branch of the tree is equal length)
· Grow (variably shaped trees bound only by maximum branch length)
· Ramped half-and-half (combination of full and grow)

Koza prefers to use the ramped half-and-half method for all of his GP demonstrations.

FITNESS MEASURES:

Defining fitness is important because fitness directly affects the primary GP operations of reproduction and crossover. There are many ways to accomplish this.

“The most common approach to measuring fitness is to create an explicit fitness measure for each individual in the population… Each individual in a population is assigned a scalar fitness value by means of some well-defined explicit evaluative procedure” (p. 94).

Koza describes four kinds of explicit fitness measures used in GP programs:

1. Raw Fitness (The “measurement of fitness that is stated in the natural terminology of the problem itself” [p. 95].)
2. Standardized Fitness (The lower the number the better. If the raw fitness is also striving for the lowest number, standardized = raw. Otherwise standardized is the inverse of raw.)
3. Adjusted Fitness (Computed from the standard fitness, adjusted fitness is always between 0 and 1. The bigger the adjusted fitness, the more fit the individual.)
4. Normalized Fitness (Computed from adjusted fitness, the sum of normalized fitness values equals 1.)

GENETIC PROCESSES:

The genetic processes in GP consist of two primary operations: Reproduction and Crossover.

Reproduction:
Reproduction in GP is asexual (one parent, one offspring).

“The operation of reproduction consists of two steps. First, a single [program] is selected from the population according to some selection method based on fitness. Second, the selected individual is copied, without alteration, from the current population into the new population (i.e., the new generation)” (p. 99).

Three methods of reproduction include:
· Fitness-proportionate selection (The most popular method–uses the fitness measure to determine how many of which programs are allowed to reproduce)
· Tournament selection
· Rank selection

Crossover:
Crossover in GP is sexual recombination (two parents, two offspring). It is, essentially, a swapping of subtrees.

Which parents are chosen for crossover is determined by the fitness-based selection method.

Random crossover points are chosen (again determined by using a uniform probability distribution) from each parent. Everything in the subtree below the crossover point is swapped with the subtree underneath the crossover point in the other parent.

Swapping of simple terminals (instead of entire subtrees) is common in GP, and this is the same as point mutation in GA–so mutation is inherent in GP crossover.

Another Crossover difference between GA and GP: In GP the two crossover points in the two parents are usually different; in GA the two points are always identical because crossover is operating on a fixed-length character string. This leads to better capacity for variety in GP:

“In genetic programming, when an individual incestuously mates with itself (or copies of itself), the two resulting offspring will, in general, be different (except in the relatively infrequent case when the crossover points are the same). As before, the Darwinian reproduction operation creates a tendency toward convergence; however, in genetic programming, the crossover operation exerts a counterbalancing pressure away from convergence. Thus, convergence of the population is unlikely in genetic programming” (p. 104).

Also important for GP crossover is the maximum permissible size, which is measured by the depth of the tree. Regulating the maximum depth keeps crossover from creating oversized trees and saves computing time.

There are also secondary operations in GP, all of which are optional and are probably best reserved for specific situations:
· Mutation (Asexual changing of an individual terminal, but occurs occasionally through sexual crossover in GP anyway. Useful in moderation in GAs, but really not necessary in most GP applications.)
· Permutation (Asexual inversion–swapping terminals of a function.)
· Editing (Follows editing rules such as shortening “(NOT ( NOT X))” to “X.” This can be used for cosmetic purposes and for simplification or improvement of GP output.)
· Encapsulation (Protection from crossover for ideal offspring subtrees, encapsulation offers “a means for automatically identifying a potentially useful subtree and giving it a name so that it can be referenced and used later” [p. 110].)
· Decimation (Deleting a portion of population, controlled by two parameters [p. 112]: “a percentage and a condition specifying when the operation is to be invoked.”)

MEMORY:

It is only necessary for the current generation to be maintained in memory for GP operations. Although the individuals in a population represent only a small set of possibilities in the realm of all possible programs, the genetic processes ensure the search will keep moving in the right direction–toward a solution.

TERMINATION CRITERION:

This is not nature and the computer will not run the program forever, so a termination criterion must also be set. This may be a set number of generations or a 100% exact solution to a problem, which, when satisfied, ends the program.

DESIGNATING A RESULT:

Results come in different forms. There may be a 100% exact solution, but maybe there is no single best solution. If the program is terminated at a set number of generations, then the best-so-far individual is usually chosen. The fitness parameters are used to determine what is the best-so-far individual, what is the 100% exact solution to the problem, or what percentage of correctness is close enough to suffice.

Tagged with:
 

Genetic Algorithms: an introduction

The following is a synopsis of chapter 3 of John R. Koza’s “Genetic Programming”:

Genetic algorithms provide a way to search a large space of possible solutions in a near-optimal way, using 4 Bio-Darwinian methods:

1. Natural Selection (find the members of the population with the best fitness)
2. Reproduction (replicate those members of the population identified to have the best fitness)
3. Crossover (cross-breed two parent chromosomes at a random point to produce new offspring)
4. Mutation (randomly change bits within a chromosome)

Genetic Algorithms
Koza’s Figure 3.1

Natural selection and reproduction result in a definite performance improvement within the population, but just using these two methods limits the potential players in the realm of all possible solutions. Basically the existing best persist and reproduce and the not-so-good go extinct, but if the optimal solution was not in the population to begin with, it could never be found because there is no variation.

“Natural selection does not create variety. It merely selects from whatever variety is already present in the population in order to increase the average fitness of the population as a whole” (pg. 48).

Crossover and mutation are necessary evils in that they slightly degrade the overall optimization, while opening up the possibilities toward discovering better solutions that were not present in the original population.

“The genetic crossover operation serves the necessary function of creating promising new individuals in the search space” (pg. 48).

So, by keeping relatively low levels of crossover and mutation within the GA architecture, new solutions can be discovered while still allowing natural selection and reproduction to keep the balance tilting toward an optimal solution.

“The allocation of future trials is most nearly optimal when [crossover and mutation rates] are both small. A schema with a relatively short defining length and a relatively few defined positions is a building block which will be propagated from generation to generation at close to the near-optimal rate. The genetic algorithm processes such schema most favorably. A problem whose solution can be incrementally built up from schemata of relatively short defining length and relatively few defined positions is handled by genetic algorithms in a near-optimal way” (pg. 49).

The interesting thing about the genetic algorithm is that it is a domain independent process and can find near-optimal solutions without even storing a memory of the solution’s lineage. The only prerequisite is for the problem to be defined in terms the GA can “understand.”

This prerequisite can be fulfilled by applying a simple 4-step process to any problem (pp. 53-54):

1. Select the representation scheme.

This often includes transitioning problem parameters to binary strings. This is an ideal way to present the problem to the GA as a “chromosome” ripe for reproduction, crossover and mutation.
2. Identify the fitness measure.
This step is crucial to defining what the problem is seeking to achieve.
3. Define control parameters and variables.
Here is where the population, bit string length, and generations to be run are defined.
4. Define when / how the run will be terminated and what will be designated as the best result.

Koza proves the point of how domain independent the GA is by giving several different in-depth examples in this chapter: among them a restaurant-burger conundrum, a load-bearing optimizer, and the ever-popular ant-food path optimization problem. All of these problems can be broken down into binary strings allowing solutions to be figured out through the use of genetic algorithms. A good read, for sure.

Tagged with: