Defining Fitness

Keeping Fit
What we have so far:
Having a good definition of fitness is going to be crucial to our success with Alphabet Soup, and any of our games for that matter. So far we have only one fitness factor, a rather rudimentary “meatiness” goal: the more 1s, the better.

Aiming for the maximum fitness of 1, we define an ideally fit chromosome of 16 binary “alleles” as 16/16 =1, or:

chromosome = 1111111111111111 =
1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1 = 16/16 = 1 = 100% fit.

Anything less than ideal is given a fitness value such as 13/16 = 0.81, or:

chromosome = 1111110111011101 =
1+1+1+1+1+1+0+1+1+1+0+1+1+1+0+1 = 13/16 = 0.81 = 81% fit.

This is all very simple. Too simple. But it’s a start. The question is, where do we go from here?

In progress:
The next step will be to include the schemata for the letters. Fitness will be a combined value of the “meatiness” and the “likeness” defined by a comparator function. Here is how it will all work:

First we will have the letter genomes in the Binary Letter Library broken down and defined into the final schemata:

charLine[0] = 0*****0000000000

lines 1 thru 14

charLine[15]=0******0000***00

Where there is a “0″, the value is always zero–in any letter in our population. Why calculate that if it is never going to change? The remaining values then, represented by asterisks, will be either 1s or 0s, depending on the letter. The results of the GA generations will be run through the schemata, with the remaining asterisked positions matched up against the letters in the library. Matching to the nearest letter will be easy. Say there are 125 asterisked positions. After the first generation of GA takes place, the fittest-so-far (meatiest individual) is passed through the schemata. All zero-only spots (such as charLine[0][0], charLine[0][6] are dropped immediately. All asterisk “either-or” positions are evaluated. If charLine[0][1] is a ‘1′, it is matched to all letters with a value of ‘1′, and so on. The genome will then take the shape of the letter with the most matching values….

An important technical aspect of this that we have agreed on is to have the comparator only change the way the letters are displayed, while leaving their array intact–aside from the schemata. In other words, the original GA-defined chromosome/genome is maintained, while a copy of itself is stripped to be matched against the schemata. This allows it to maintain its evolutionary integrity with a level of abstraction so more fitness functions can be performed. I will deal more with this later.

So how do we define the new fitness value, with both the schemata (likeness) and meatiness in mind? Here is a 5-step process that should do the job:

1. Take value of current generation individual. Let’s use for genInd[5], say: 0000101011101110

2. Pass through schemata. Let’s use generic letter line 5, which might be 00*****0******00, making genInd[5] now equal 0000101011101100
(the only bit that changes is the second-to-last one, because based on the schema that bit always equals 0).

3. Define “likeness value.” For simplicity’s sake let’s use charAline[5] which equals 0000010001000000. For all bits of genInd[5] that match charAline[5], add a 1:

So, comparing 0000101011101100 (genInd[5]) to 0000010001000000 (charAline[5]), and disregarding schemata characters that are always 0 (blue), we find only 4 bits that match (red), giving genInd[5] the “likeness” value of 4 as compared to charAline[5].

4. Define the “meatiness value.” To incorporate the “meatiness” factor, let’s add an additional +1 to each value of matching letter that is, actually, “1″ (meat). When we add up all the matching value of 1s between genInd[5] and charAline[5], we only get +1: 0000101011101100 compared to 0000010001000000. So, for this particular example, the meatiness value = 1.

5. Add “likeness value” and “meatiness value”.To get the total fitness value, add the likeness value (4) to the meatiness value (1) for a total value of 5.

Note: since this schemata line has 5 characters that are always equal to 0 (shown here: 00*****0******00), they are subtracted from the maximum fitness amount. So, the maximum fitness of this line is 16-5 = 11 possible values that match (“likeness” value), plus the possibility that all of those values are 1s (“meatiness” value), making the total maximum fitness of this line equal to:

11 (max “likeness”) + 11 (max “meatiness”) = 22. So in our example,
4 (likeness) + 1 (meatiness) = 5 was not all that great (5/22 = 23%). Other letters in the alphabet will much likely have higher fitness on this line.

I think this will be a good way to determine total fitness–at least in early versions of Alphabet Soup.

Future awesomeness:
All technicalities aside, this is where Alphabet Soup can get really interesting. This will probably be around… version 3.0 or so. Right now, these are just vague ideas.

Fitness based on letter shape was the original goal of Alphabet Soup all along:

“Hark! Here comes a second generation character, ‘i’, who has lost the vestigial tittle (the dot). Most other letters are unchanged or changed in a negative or neutral way. The lower mass of the ‘i’ makes it faster and take in less calories. Alas, it’s lower weight makes it do less damage and is losing ground to the dastardly ‘T’s and their hearty top most shield. Will the character genes be favorable enough to carry on or will they be lost and another random variation will make itself dominant. At the front of ‘i’s’ and ‘T’s’ war, a fork occurs with the ‘w’s. The ‘w’s strike fast coming down but their double point distributes the impacts, lessoning the effect on harder shelled letters like the ‘T’.”

With this, things like food, environment and the ability to reproduce/fight/defend are all important.

Another idea is this: fitness is dynamic, determined by the players. They change the landscape–by adding obstacles, giving new goals (such as winning a race, building a tower, et cetera). Different letters will be more suited to different tasks, so fitness will vary from task to task. This might keep it more interesting than if fitness was just based on one thing (like meatiness or shape). Also, on the note of shape mattering–maybe shape won’t matter as much as far as fitness value is concerned, however shape may be important for movement in Farseer if we can get the physics of it right. So… if the goal is to get over a mountain or something, the shape and physics will be important to the letter’s success, indirectly having an effect on the fitness anyway.

A good comparison is to think of how to keep a Sim alive: by balancing a variety of values that are important to the character’s success at staying alive, such as social value, food, health, activity, recreation, knowledge, and so on. Any of these kinds of ideas could be incorporated into the fitness value to make it more interesting and keep the player more involved in what is happening in the Soup. One important key to this is to allow the Evolutionary Computation to do its thing without interferring. If we can hone that delicate balance between the EC and the player-letter interaction, Alphabet Soup could be great.

The future awesomeness of Alphabet Soup depends on our ability to come up with more fitness ideas, and to make them actually work.

Other considerations:
We have talked a lot about including sensing organs for letters to be able to interact with their environment on a biological level. Also in discussion are color values which may or may not be related to fitness, sensing ability, or overall health of a letter. We will have to deal more with these ideas and see what we can agree on.

One thing we have reached consensus on is the level of abstraction that will be present in our final product. Overall, we are going to let the GA and GP do their things in a very precise way, keeping the Evolutionary Computation aspect as pure as possible–but this will all be going on behind the scenes. The precision in the background will be modified and “rounded” (such as with the schemata) before reaching the eyes of the player. This will allow the letters to maintain their forms, allow the Physics Engine to operate on fixed shapes, and so on. However there will likely be a more abstract level of EC going on as well, such as with the shapes present in the physics engine and how they react to one another. The more levels of EC operating within the game, the more rich the environment and the more capable it will be of surprising us.

But we don’t want the abstraction to completely cover up what is going on behind the scenes.

One way to be able to unite the abstraction with the player’s awareness of what is going on is through keeping a detailed history of each genome in the population. We should somehow tie the chromosomes to their respective genome so that the genome holds its identity and evolutionary lifespan throughout its own history. Maybe stamp each chromosome with some sort of hidden i.d. assigning it to its current genome. These generational histories will provide the owner of the letter with information showing how old the letter is (how many generations it has survived) and each detailed transformation it has undergone during its evolution to its current state. This could be useful for farming, trading or even–who knows–combat? Dog-fighting is illegal, but how about pitting our letters against each other? There are all kinds of scenarios and places we could go with Alphabet Soup. And as the creators of Dwarf Fortress believe, the more ideas we integrate, the better the game might become.

Tagged with:
 

Development Notes

There have been some exciting developments in Alphabet Soup. We now have a functioning Binary-Bitmap Converter, and a working Genetic Algorithm. I thought I’d make some notes on the first few successful runs of the GA, so here goes:

For the simplest Alphabet Soup chromosome, we are using a 16-bit binary string which, while all 1s and 0s, isn’t functioning as a true binary string. Instead, for our simplest calculation of fitness, we are assigning the “meatiest” chromosome (the one with the most 1s) as the fittest individual. So instead of going by the decimal value of 0000001000100101 (which is 549) we are adding the 1s for a value of 4. This cuts back on processing power, especially when dealing with a full 256 bits, and it allows us some easy numbers to work with for churning out a fitness value of any given chromosome. So, the optimal value is 16, represented as: 1111111111111111.

Success!
Success!

Our first run of the GA striving to reach this value with a random starting population has been a surprising success. Reproduction and crossover rapidly increase the fitness no matter what the starting population. The GA often converges on not-quite ideal solutions, but with enough generations the mutations do eventually converge on the correct string. Our mutation rate is set to 10%. The other parameters are: min:0, max:2, crossover:80%, length:16.

These parameters seem to work quite well. I changed the population paramater to see how that affected the runs, and the results were interesting. The following chart describes our first six runs of the “Alpha” Alphabet Soup Genetic Algorithm:

Population     Generations to converge     Total number of individuals    
2 520 1040
2 791 1582
4 273 1092
4 376 1504
50 42 (w/ 2 mutants) 2100
50 63 (w/ 1 mutant) 3150

I found it interesting that convergence seems to happen faster with a smaller population. I was especially surprised at the efficiency with which only two parents managed to converge on the correct solutions. The least number of individuals required to converge was 1,040, which happened with only two in the population, while the longest run required 3,150 individuals–using a population of 50. That’s still not bad, considering the search space contains 65,536 possibilities.

Tagged with:
 

Genetic Algorithms: an introduction

The following is a synopsis of chapter 3 of John R. Koza’s “Genetic Programming”:

Genetic algorithms provide a way to search a large space of possible solutions in a near-optimal way, using 4 Bio-Darwinian methods:

1. Natural Selection (find the members of the population with the best fitness)
2. Reproduction (replicate those members of the population identified to have the best fitness)
3. Crossover (cross-breed two parent chromosomes at a random point to produce new offspring)
4. Mutation (randomly change bits within a chromosome)

Genetic Algorithms
Koza’s Figure 3.1

Natural selection and reproduction result in a definite performance improvement within the population, but just using these two methods limits the potential players in the realm of all possible solutions. Basically the existing best persist and reproduce and the not-so-good go extinct, but if the optimal solution was not in the population to begin with, it could never be found because there is no variation.

“Natural selection does not create variety. It merely selects from whatever variety is already present in the population in order to increase the average fitness of the population as a whole” (pg. 48).

Crossover and mutation are necessary evils in that they slightly degrade the overall optimization, while opening up the possibilities toward discovering better solutions that were not present in the original population.

“The genetic crossover operation serves the necessary function of creating promising new individuals in the search space” (pg. 48).

So, by keeping relatively low levels of crossover and mutation within the GA architecture, new solutions can be discovered while still allowing natural selection and reproduction to keep the balance tilting toward an optimal solution.

“The allocation of future trials is most nearly optimal when [crossover and mutation rates] are both small. A schema with a relatively short defining length and a relatively few defined positions is a building block which will be propagated from generation to generation at close to the near-optimal rate. The genetic algorithm processes such schema most favorably. A problem whose solution can be incrementally built up from schemata of relatively short defining length and relatively few defined positions is handled by genetic algorithms in a near-optimal way” (pg. 49).

The interesting thing about the genetic algorithm is that it is a domain independent process and can find near-optimal solutions without even storing a memory of the solution’s lineage. The only prerequisite is for the problem to be defined in terms the GA can “understand.”

This prerequisite can be fulfilled by applying a simple 4-step process to any problem (pp. 53-54):

1. Select the representation scheme.

This often includes transitioning problem parameters to binary strings. This is an ideal way to present the problem to the GA as a “chromosome” ripe for reproduction, crossover and mutation.
2. Identify the fitness measure.
This step is crucial to defining what the problem is seeking to achieve.
3. Define control parameters and variables.
Here is where the population, bit string length, and generations to be run are defined.
4. Define when / how the run will be terminated and what will be designated as the best result.

Koza proves the point of how domain independent the GA is by giving several different in-depth examples in this chapter: among them a restaurant-burger conundrum, a load-bearing optimizer, and the ever-popular ant-food path optimization problem. All of these problems can be broken down into binary strings allowing solutions to be figured out through the use of genetic algorithms. A good read, for sure.

Tagged with:
 

Character Collisions

The binary-bitmap conversion portion of the BBC has already been discussed and is in development. But there are many other aspects of this part of the program that need to be considered, including a physics engine and the motion of the letters. One thing that’s been bugging me is how the letters will react when they run into one another.

Let’s assume that two letters, an ‘A’ and a ‘B’ are floating around in their predetermined environment. They run into each other–what happens?

First of all, we need to clearly define what consitutes a character. There are 256 pixels in the 16×16 square, yes, but most of the border consists of ‘0’s–white space. Is it going to be considered a collision when the outer white pixels of the bitmap collide, or are we going to go with a more literal description of the letters running into each other, i.e. the black pixels (‘1’s)?

I think the second idea makes more sense, even though in computer logic the bitmap touches the other bitmap when the 16×16 squares meet, which causes a kind of predicament for us. The crazy thing about going with option 2 is that it will cause a morphing in the bitmap of at least one of the letters, even before the actual collision takes place. In other words, the ‘A’ bitmap will already change before the genetic code is activated.

                    Purple shows changed pixels:
                    0000000000000000
                    0000000000000000
                    0001111111000000
                    0001111111100000
                    0001000000110000
                    0001000000010000
                    0001000000010000
                    0001000000110000
                    0001111111100000
                    0001111111110000
                    0001000000010000     ‘A’ bitmap becomes:
00000000000001000000010000     0000000000000100
00000001000001000000110000     0000000100000100
00000011100001111111100000     0000001110000111
00000010100001111111000000     0000001010000111
00000110110000000000000000     0000011011000000
0000010001000000                         0000010001000000
0000110001100000                         0000110001100000
0000100000100000                         0000100000100000
0000100000100000                         0000100000100000
0001111111110000                         0001111111110000
0001111111110000                         0001111111110000
0001000000010000                         0001000000010000
0011000000011000                         0011000000011000
0010000000001000                         0010000000001000
0110000000001100                         0110000000001100
0000000000000000                         0000000000000000

One way around this is to make the border ‘0’s (white pixels) invisible, or transparent to the other letters. Also, we could somehow use them so that when the outer pixels of two letters collide, they activate some sort of “signal” (probably a specific function in the code) that a collision is imminent and that the decision-making procedure needs to occur. Maybe this could even change the white pixels to a different color as an extra visual aid to show that a collision is about to take place somewhere within the soup–a way to highlight the action. Just a thought.

Before:                                             Collision Imminent!
0000000000000000                         0000000000000100
0000000100000000                         0000000100000100
0000001110000000                         0000001110000111
0000001010000000                         0000001010000111
0000011011000000                         0000011011000000
0000010001000000                         0000010001000000
0000110001100000                         0000110001100000
0000100000100000                         0000100000100000
0000100000100000                         0000100000100000
0001111111110000                         0001111111110000
0001111111110000                         0001111111110000
0001000000010000                         0001000000010000
0011000000011000                         0011000000011000
0010000000001000                         0010000000001000
0110000000001100                         0110000000001100
0000000000000000                         0000000000000000

The fight-or-flight function, another good chance for GP to help with decision-making, will then determine: Do I, as an ‘A’…
1. continue toward the ‘B’ (attack),
2. change direction to avoid ‘B’ (retreat),
3. breed with ‘B’ (perform GA/GP)?

The thing is, both letters will be running this function to determine the best course of action. It would be interesting in this scenario to see what would happen if ‘A’ tried attacking while ‘B’ decided to activate the genetic algorithm. What would happen? Who would win? Maybe the program would crash and we’d need to tweak it, but whatever–this is just a brainstorming session.

All this can be figured out. I think the main problem with the white borders in a bitmap will be aesthetics: the collision wouldn’t really look like a collision if just the outer empty sections of the letter boxes are running into each other. We can go with the highlighting/color-changing idea, or maybe we can implement some sort of overlap in the outer white pixels on the bitmaps to show more precise-looking collisions. The speed of the letter movement could also be enough to distract the user’s eye from the extra white space. Then again, as small as a 16×16 bitmap is, the difference is probably subtle enough that it won’t even be noticeable.

Tagged with:
 

*Define Initial Population*

The initial population in the final version of Alphabet Soup may likely contain one of each keyboard character available. For now, for simplicity’s sake, let’s begin with 4 characters: A,B,a,b. In our game, all 4 characters decide to run into each other at once.

First we will convert the 16×16 characters into binary, 0 for gray and 1 for yellow:

A:
0000000000000000
0000000100000000
0000001110000000
0000001010000000
0000011011000000
0000010001000000
0000110001100000
0000100000100000
0000100000100000
0001111111110000
0001111111110000
0001000000010000
0011000000011000
0010000000001000
0110000000001100
0000000000000000

A, Binary String: 0000000000000000000000010000000000000011100000000000001010000000000001101100000000000100010000000000110001100000000010000010000000001000001000000001111111110000000111111111000000010000000100000011000000011000001000000000100001100000000011000000000000000000
A, Binary-Decimal conversion: 95786086615602871731981918389383308042547593781186608

B:
0000000000000000
0000000000000000
0001111111000000
0001111111100000
0001000000110000
0001000000010000
0001000000010000
0001000000110000
0001111111100000
0001111111110000
0001000000010000
0001000000010000
0001000000110000
0001111111100000
0001111111000000
0000000000000000

B, Binary String: 0000000000000000000000000000000000011111110000000001111111100000000100000011000000010000000100000001000000010000000100000011000000011111111000000001111111110000000100000001000000010000000100000001000000110000000111111110000000011111110000000000000000000000
B, Binary-Decimal conversion: 46403387827017775033653233413204161160929512263696

a:
0000000000000000
0000000000000000
0000000000000000
0000001111100000
0000011111110000
0000110000110000
0000000000010000
0000000111110000
0000011111010000
0000110000010000
0000110000110000
0000111001110000
0000011111011000
0000000000000000
0000000000000000
0000000000000000

a, Binary String: 0000000000000000000000000000000000000000000000000000001111100000000001111111000000001100001100000000000000010000000000011111000000000111110100000000110000010000000011000011000000001110011100000000011111011000000000000000000000000000000000000000000000000000
a, Binary-Decimal conversion: 86418088698874722540000455600782178239016967

b:
0000000000000000
0000100000000000
0000100000000000
0000100000000000
0000101111000000
0000111111100000
0000110000110000
0000100000010000
0000100000010000
0000100000010000
0000100000010000
0000110000110000
0000111011100000
0000101111000000
0000000000000000
0000000000000000

b, Binary String: 0000000000000000000010000000000000001000000000000000100000000000000010111100000000001111111000000000110000110000000010000001000000001000000100000000100000010000000010000001000000001100001100000000111011100000000010111100000000000000000000000000000000000000
b, Binary-Decimal conversion: 766259462624453036391004127445265907255673543052767246

So, our Initial Population (POPinit) consists of the following four characters and their decimal representation:
1. (A) 95786086615602871731981918389383308042547593781186608
2. (B) 46403387827017775033653233413204161160929512263696
3. (a) 86418088698874722540000455600782178239016967
4. (b) 766259462624453036391004127445265907255673543052767246

*Define fitness of POPinit*

First, we establish the Total, the Best, the Worst and the Average based on their decimal representations:
Total: (A+B+a+b) 862091952714301014596894421608062875060164244585234517
Best: (b) 766259462624453036391004127445265907255673543052767246
Worst: (a) 86418088698874722540000455600782178239016967
Average: 23958122541618481117083899450474345137254323064508656.25

Based on these findings, we can calculate the chance of Reproduction / Survival for each character.
A: 0.1111088977388315456010566688830799217993208329796695901253… = 11% chance of reproducing
B: 0.0000538264945878643991068686448919016779835612586246604953… = <1% chance of reproducing
a: 0.0000000001002423099146058828565205588056220034311729841463397… = MUCH <1% chance of reproducing
b: 0.8888372756663382800852305796155076177170736023305327652329… = 88% chance of reproducing

*Perform Reproduction Function*

As you can tell from the fitness of each character, the character “b” has by far the best chance at survival, while character “A” is the only other letter with much of a chance at all. In the actual game, the reproduction function will decide who actually survives reproduction so it is impossible to say for sure, but based on these percentages we will go with three b’s and one A surviving to population generation 0.

So, the result of the reproduction function should look something like:

1. (b) 766259462624453036391004127445265907255673543052767246
2. (b) 766259462624453036391004127445265907255673543052767246
3. (b) 766259462624453036391004127445265907255673543052767246
4. (A) 95786086615602871731981918389383308042547593781186608

*Define fitness of gen0″

Again, we determine the total, best, worst and average fitness of the new population, gen0:

Total, better: (b+b+b+A) 2394564474488961980904994300725181029809568222939488346
Best, same: (b) 766259462624453036391004127445265907255673543052767246
Worst, better: (A) 95786086615602871731981918389383308042547593781186608
Average, better: 598641118622240495226248575181295257452392055734872086.5

Note that the Total, Worst, and Average fitness of the population all improved as an expected result of natural selection. The Best fitness, meanwhile stays the same–and it has no chance for improvement until running the next stage of the GA…

*Perform genetic crossover operation on gen0*

Start with 2 parents. Let’s use 3 & 4 as an example with an arbitrary crossover point (we’ll say 170 digits in–though this would actually be determined by the randomly generated genetic operation):

3. (b) 766259462624453036391004127445265907255673543052767246 =
00000000000000000000100000000000000010000000000000001000000000000000101111000000000011111110000000001100001100000000100000010000000010000001000000001000000100000000100000 | (crossover point) 01000000001100001100000000111011100000000010111100000000000000000000000000000000000000

4. (A) 957860866156028717319819183893833080425475937811868 =
00000000000000000000000100000000000000111000000000000010100000000000011011000000000001000100000000001100011000000000100000100000000010000010000000011111111100000001111111 | (crossover point) 11000000010000000100000011000000011000001000000000100001100000000011000000000000000000

From this crossover, we get 2 new offspring characters:

offspring1 (o1):
0000000000000000
0000100000000000
0000100000000000
0000100000000000
0000101111000000
0000111111100000
0000110000110000
0000100000010000
0000100000010000
0000100000010000
0000100000110000
0001000000010000
0011000000011000
0010000000001000
0110000000001100
0000000000000000

o1, Binary String: 0000000000000000000010000000000000001000000000000000100000000000000010111100000000001111111000000000110000110000000010000001000000001000000100000000100000010000000010000011000000010000000100000011000000011000001000000000100001100000000011000000000000000000
o1, Binary-Decimal conversion: 766259462624453036391004127445265907255673543589892144

offspring2 (o2):
0000000000000000
0000000100000000
0000001110000000
0000001010000000
0000011011000000
0000010001000000
0000110001100000
0000100000100000
0000100000100000
0001111111110000
0001111111010000
0000110000110000
0000111011100000
0000101111000000
0000000000000000
0000000000000000

o2, Binary String: 0000000000000000000000010000000000000011100000000000001010000000000001101100000000000100010000000000110001100000000010000010000000001000001000000001111111110000000111111101000000001100001100000000111011100000000010111100000000000000000000000000000000000000
o2, Binary-Decimal conversion: 95786086615602871731981918389383308042547593244061710

So gen1 looks like:
1,2 are the same as before; 3,4 are the offspring of crossover:
1. (b) 766259462624453036391004127445265907255673543052767246
2. (b) 766259462624453036391004127445265907255673543052767246
3. (o1) 766259462624453036391004127445265907255673543589892144
4. (o2) 95786086615602871731981918389383308042547593244061710

*Convert gen1 offspring to valid characters*

Now, 3 and 4 are new shapes that do not correspond with any of our 4 letters. The BBC (Bitmap-to-Binary Converter) will be used to “read” these bitmaps and find the closest letter in the space of possible letters which we will define. So, the offspring will turn into something like “h” and “8″, because those are the closest possibilities within the alphabet.

I was going to include a range in decimal numbers and apply that to the nearest alphabet character within that range, but I don’t think that will work because more often than not the offspring will just end up reverting to one of the two characters from which it was created. From this perspective, the BBC may be a very useful portion of the AS program.

So, o1 become “h” and o2 becomes “8″, converting gen1 characters to:

o1=h:
0000000000000000
0000110000000000
0000110000000000
0000110000000000
0000110111000000
0000111111100000
0000111000110000
0000110000110000
0000110000110000
0000110000110000
0000110000110000
0000110000110000
0000110000110000
0000110000110000
0000000000000000
0000000000000000

h, Binary String: 0000000000000000000011000000000000001100000000000000110000000000000011011100000000001111111000000000111000110000000011000011000000001100001100000000110000110000000011000011000000001100001100000000110000110000000011000011000000000000000000000000000000000000
h, Binary-Decimal conversion: 1149389193936678235951120191876555791760811195370319884

o2=8:
0000001111000000
0000011111100000
0000110000110000
0000100000010000
0000100000010000
0000110000110000
0000011111000000
0000011111100000
0000110000110000
0000100000010000
0000100000010000
0000100000010000
0000110000110000
0000011111100000
0000001111000000
0000000000000000

8, Binary String: 0000001111000000000001111110000000001100001100000000100000010000000010000001000000001100001100000000011111000000000001111110000000001100001100000000100000010000000010000001000000001000000100000000110000110000000001111110000000000011110000000000000000000000
8, Binary-Decimal conversion: 23539885800661303801528815919744719668529798326595472592908

So the gen1 now looks like:
1. (b) 766259462624453036391004127445265907255673543052767246
2. (b) 766259462624453036391004127445265907255673543052767246
3. (h) 1149389193936678235951120191876555791760811195370319884
4. (8) 23539885800661303801528815919744719668529798326595472592908

From which we evaluate the fitness again (“8″ clearly will be the new dominate gene) and so on, generation after generation…

*Conclusion*

This is the basic gist of Alphabet Soup by GA. The final aspect of using GA is to incorporate mutations: the random changing of various pixels within the letters, which will obviously introduce the same effect of crossover in adding to the population of letters. Mutations will only be applied to a small portion of the entire population in each generation, just like in nature. With only 4 characters to choose from, let’s say there are no mutations in this particular generation. That gives us the current pool of characters, including the initial population, gen0 and the current generation (gen1):

[POPinit][gen0] [gen1]
[A,B,a,b][b,b,b,A][b,b,h,8]

So, we have 6 “b”s, 2 “A”s, 1 “a”, 1 “h”, 1 “B” and 1 “8″ floating around. Granted this is an example run, but it plays out pretty realistically: note that the dominate gene (b) is taking up quite a bit of the population, while a and B have not grown at all in population. “h” and “8″ are the new kids on the block, and they will have a direct impact on the future generations, again based on their fitness for reproduction and crossover–with a little bit of mutation thrown in here and there.

One of the core questions remains: How do we make the shapes of the characters matter? In other words, how do we specify the importance of various pixels? This, I believe, can also be accomplished through the BBC, and I have some ideas for that. But more on that later.

Over six months have passed since The Project began, and it was long before that the ideas were seeded. Pariahpism published his version of The Project origins right away. While this was an accurate description, I feel it is time to post my view on where it all started, what the past six months have involved, and where we might be going with The Project.

It began with an article published in 2007 by Alan Bellows on DamnInteresting.com about hardware evolution, “On the Origin of Circuits”. A comment in this article led me to Kevin Kelly’s book, Out of Control which is fully readable online. Now normally in my internet browsing I am not going to follow a link to a book and end up reading it in its entirety, but that is exactly what happened with Out of Control. The book starts out in Biosphere 2 in Arizona, and I thought, “Hey, I’ve been there!” And it only got more interesting as it went. Who would have thought ecology, philosophy, technology, economies and evolution had so many things in common.

Turns out Kevin Kelly is no scientist or engineer, but he is a hub of information, finding new meaning and understanding in many subjects, especially technology, with direct access to the top thinkers of our time. His blog The Technium continues to amaze and inspire me, while helping me think outside the box.

Out of Control covered a broad range of subjects, but some information seemed to stick out from the rest. I distinctly recall being impressed by the following, from Chapter 2:

In the film Batman Returns a horde of large black bats swarmed through flooded tunnels into downtown Gotham. The bats were computer generated. A single bat was created and given leeway to automatically flap its wings. The one bat was copied by the dozens until the animators had a mob. Then each bat was instructed to move about on its own on the screen following only a few simple rules encoded into an algorithm: don’t bump into another bat, keep up with your neighbors, and don’t stray too far away. When the algorithmic bats were run, they flocked like real bats.

It all came down to simple mathematical algorithms. Throw enough math into the software and, TADA! Biology plays out on your computer screen! Then, Chapter 15 dealt specifically with Artificial Evolution, while Chapter 17 “An Open Universe” expounded on that. Tom Ray and his game Tierra were highlighted, as was the GP master John R. Koza. These would soon become household names to The Project.

Fast-forward to July 2009. Pariahpism had taken to game journaling, and while not being interested in gaming I was surprisingly enthralled by his stories. My experience with software engineering combined with the memories of Origin of the Circuits and Out of Control prompted a sudden want to try to develop something of the same substance, but on a different scale. Imagine gaming journals about a self-evolving, unpredictable game! Immediately ideas started to take hold. We could provide the goals and the necessary environments, but let the games do the work!

After brief but furious discussions on the matter, Pariahpism proceeded to write about a number of game ideas brewing in our heads:

Alphabet Soup
The Field of Blobs
The Arti-Field of Blobs – The Field of Blobs Side-Quest
Game Untitled

And then the serious research began to see how to go about bringing such ideas to fruition. Little did we know everything that was already out there. Computational Intelligence is a huge realm under AI which includes, among others, Genetic Programming, Genetic Algorithms, A-life, Neural Networks, and so on. Books, websites and whole development environments are relegated to these subjects, prompting us to back up a step and begin collecting information at a rapid pace.

And here we are—still collecting. I have decided to completely overhaul the old website and devote it to The Project, and video gaming in general. Why dailyvillain.com? It started in 2004 as an inside joke, turned nickname, turned URL. It even stood as an acronym on the website for a while—something like Vagabond Interpreting Life’s Lessons and Idiosyncratic Nuances—but that was way too long. Anyway, the web space was already there and now it exists and persists as The Project’s home.

So where is The Project headed? Our original goals still stand. Early versions of Alphabet Soup and various programming languages are the focus of most of our attention for the moment, but concerning the future, anything is fair game at this point. The nice thing is that the interwebs allows the unprecedented vast sharing of information, letting us build on what others have already created. The beauty of Open Source is that it is more than just convenient, it also allows for the acceleration of information sharing, and of course, innovation. That’s what The Project is all about.

For a tad more organization… This blog will help to track our thoughts and progress, while this blog has become the home for the aforementioned gaming journals, other game reviews and pretty much anything game-related.

Tagged with:
 

Alphabet Soup, 0.01

(Originally published December 29, 2009)

The Project still requires an immense amount of reading and research before it can produce much fruit, but there is nothing to keep us from starting with the basics (except of course the risk of extinction like everyone who has ventured before us).

We obviously cannot do this completely from scratch, so to find out where to begin, we look to simple GP and GA examples. Pariahpism discovered MSDN’s example program using ants to find the most efficient way to follow a trail of food in a 32×32 square matrix. The program uses C# in the .NET framework, along with Microsoft’s System.CodeDom Namespace.

Full source code for the example program is available here.

Now, to apply the principles of the Ant/Food program to Alphabet soup, we begin by breaking the program into the comparable classes. I consider the User Interface a separate aspect of the program to be built later, but additionally there are two necessary sets of high level classes.

The first set is the problem set, which includes the customized goals and limitations. Basically, these are the details specific to our program (letter class, number class, case class, goals, movements, starting population, and so on). While the article’s program describes an ant following a path of food, Alphabet Soup will involve alphabet characters reacting to collisions with one another. A secondary but important aspect of this set of classes is a private method (or methods) included to record what happens within the program–the generational “history” which is saved to an array so the array instances can be compared for fitness level. This allows the program to use natural selection for determining “who” survives to the next generation–and, in the case of Alphabet Soup–what the future holds for those letters which collide, based in part on the fitness of past results. For the early versions of our program, the problem set and UI will take up the bulk of our customized coding.

The GP framework, on the other hand, already exists more or less. It does still have to be customized as far as the variable names are concerned, but the basic set of code instructions will be very similar to what the article defines–especially if we create our first program in C#. In other words, the GP framework is a standardized code including the necessary algorithms for mutations and inheritance.

At the core of the GP framework, the Ant/Food program uses the Generator class, which the article refers to as the “kernel” of the GP implementation. It does, after all, perform: “selection, breeding, code generation, execution, and result sorting.”

Something I’ve been seeing over and over again in GP is Koza’s “tree” full of terminals and functions. For the ant program, functions include “FacingFood” and terminals include “Left” and “Right.” For Alphabet Soup, let’s use the examples of function: “CollidingLetter” with terminals “Avoid” and “Attack.” Random expressions are created by the algorithm by combining available functions and terminals from the tree. For instance, “if CollidingLetter, then Attack else Avoid.” These random combinations are weighed against a fitness measure (defined elsewhere in the program), which helps determine the outcome of the future generations. In following traditional GP, the selected alphabet characters (a certain percentage) are copied for crossover and reproduction while the rest are discarded–depending on how exactly we choose to do it. Again, this code should be easily modified from MSDN’s version to what we need for Alphabet Soup.

The article also addresses the role of mutation in the ant program. Mutation will be crucial to how Alphabet Soup plays out:

“[Mutation] involves making a random change in a small number of offspring in order to maintain diversity in the population. Mutation is essential because otherwise the population becomes completely dominated by a few good bloodlines, which makes it difficult to continue progress beyond a certain point. [The Ant/Food program] performs a simple mutation step during the crossover operation to introduce randomness. When a terminal node is copied, a random terminal is substituted two percent of the time.”

So these are the basics of the basics to get us started. Another note to keep in mind is the usefulness of XML. The Ant program copies the ant path to an XML file which can later be reloaded. This may not have much use for the application itself, but it can be interesting for reviewing past generations and possible patterns/flaws/suggestions in the code.

Also, check out another simple example in C.

Tagged with:
 

Methods for Meaning

(Originally published December 8, 2009)

Finding meaning:
Chapter VI of “Godel, Escher, Bach” begins to delve into trying to pin down the location of meaning. It discusses the recursive enumerable set: something being defined in terms of simpler versions of itself (such as Fibonacci numbers, or cells in a body).

Small, simple parts make up complexity, out of which meaning is drawn. Levels of complication increase until the recursive system might be “strong enough to break out of any predetermined patterns… Instead of just considering programs composed of procedures which can recursively call themselves, why not get really sophisticated, and invent programs which can modify themselves—programs which can act on programs, extending them, improving them, generalizing them, fixing them, and so on? This kind of ‘tangled recursion’ lies at the heart of intelligence.” (p. 152).

“…meaning is part of an object to the extent that it acts upon intelligence in a predictable way” (p. 165) – In context, deciphering may take more or less effort but the eventual result is in essence inevitable: the intended interpretation remains the same no matter what. We are just “drawing out” the information that is inherent in the message. (Consider elemental DNA being chemically drawn out to establish the “who” of the overall person). It is like the principle that “information is in the air” waiting to be grasped as it inevitably will be when the right condition, place, time and intelligence exists to grasp it. The Project should be able to grasp such ideas using the power of evolutionary computation and memristors beyond human brain capability. So, a little more about grasping meaning:

According to GEB, 3 mechanisms are needed to draw meaning from any message (p.167):

Message: 1+1=2
(just the “meaningless” figures themselves) / GEB’s FRAME MESSAGE

Core message: 1+1=2
(meaning: start with one, add one, end up with two) / GEB’s INNER MESSAGE

Trigger: 1+1=2 is understood for what it is to be important through the use of mathematics / GEB’s OUTER MESSAGE

The trigger is the interesting part. The message can be anything. So the question for the trigger becomes, “How do I interpret the message? What method do I use to accomplish this?” This can require a lot of information, and it acts as the “schematic,” the barebones algorithm used to draw meaning from the meaningless. Where does it come from, how is it determined?

The way this triggering occurs may be a universal dialect (algorithm?) which can be applied to biology, technology, and so on. From GEB: “We could ascribe meanings… of a message to the message itself, because of the fact that decipher mechanisms are themselves universal—that is, they are fundamental forms of nature which arise in the same way in diverse contexts” (p. 171) .

But when we really try to pin it down, no meaning is absolute; at its core, meaning is accepted by what makes the most sense and there can be no absolute certainty. “…one can never give an ultimate, absolute proof that a proof in some system is correct. Of course, one can give a proof of a proof, or a proof of a proof of a proof—but the validity of the outermost system always remains an unproven assumption, accepted on faith.” (pp. 192-193).

We as humans recognize these patterns subconsciously; our intelligence rises above the inherent contradictions and endless downward spiral of circular reasoning. Machines need to be trained to do the same for effectively drawing meaning from patterns in data. Computers already have more processing capability than our brains; we simply need to learn how to mold those capabilities.

Methods:
Here are some methods for extracting data in a meaningful way, mostly drawing from what we know about the human brain, as consolidated in Kurzweil’s “AI toolkit” (found here):

Expert systems: Using human-like decision making with not only blatant logic, but also underlying logic which our biological brains take for granted (something that can be spelled out in the Propositional Calculus [GEB, chapter VII]): Basically determines things based on small bits of logic combined into large decision-making systems.

Bayesian Nets: A self-training system, learning from past experience. “Many expert systems based on Bayesian techniques gather data from experience in an ongoing fashion, thereby continually learning and improving their decision making.” (i.e., spam filters).

Markov Models: Useful for application in speech recognition, including word patterns and grammar rules discovered through machines “listening to” and analyzing human speech.

Neural Nets: Many dumb parts (neurons) organizing in layers and patterned configurations. Meant to replicate the neurons in a brain. Ignorance is improved through “coaching” (experience) from one or more sources, until eventually the neural net makes its own choices based on experience. “A powerful, well-taught neural net can emulate a wide range of human pattern-recognition faculties. Systems using multilayer neural nets have shown impressive results in a wide variety of pattern-recognition tasks, including recognizing handwriting, human faces, fraud in commercial transactions such as credit-card charges, and many others. In my own experience in using neural nets in such contexts, the most challenging engineering task is not coding the nets but in providing automated lessons for them to learn their subject matter.”

Genetic Algorithms: Given a goal, the GA makes use of mutations and sexual reproduction (inheriting parts of two chosen parents) to feel out the best paths towards this goal while discarding bad paths along the way. Basically, natural selection in code. “Like neural nets GAs are a way to harness the subtle but profound patterns that exist in chaotic data. A key requirement for their success is a valid way of evaluating each possible solution. This evaluation needs to be fast because it must take account of many thousands of possible solutions for each generation of simulated evolution.”

Recursive Search: This involves analyzing a tree of every possible outcome, and narrowing the search by eliminating bad or unlikely branches to more quickly find adequate solutions. This recursive algorithm was used by Deep Blue and Deep Fritz for analyzing chess positions. “Key to the success of the recursive formula is pruning this huge tree of possibilities and ultimately stopping its growth.”

So what is the best AI methodology? Use a combination of all of the above, depending on the application. Generally, the more varied the approach, the more robust the processing and analyzing power, along with better potential for evolutionary growth. And this still doesn’t touch on the use of memristors (et cetera). There are more tools to come as the workings of the brain become more understood.

Tagged with: