Development Notes

There have been some exciting developments in Alphabet Soup. We now have a functioning Binary-Bitmap Converter, and a working Genetic Algorithm. I thought I’d make some notes on the first few successful runs of the GA, so here goes:

For the simplest Alphabet Soup chromosome, we are using a 16-bit binary string which, while all 1s and 0s, isn’t functioning as a true binary string. Instead, for our simplest calculation of fitness, we are assigning the “meatiest” chromosome (the one with the most 1s) as the fittest individual. So instead of going by the decimal value of 0000001000100101 (which is 549) we are adding the 1s for a value of 4. This cuts back on processing power, especially when dealing with a full 256 bits, and it allows us some easy numbers to work with for churning out a fitness value of any given chromosome. So, the optimal value is 16, represented as: 1111111111111111.

Success!
Success!

Our first run of the GA striving to reach this value with a random starting population has been a surprising success. Reproduction and crossover rapidly increase the fitness no matter what the starting population. The GA often converges on not-quite ideal solutions, but with enough generations the mutations do eventually converge on the correct string. Our mutation rate is set to 10%. The other parameters are: min:0, max:2, crossover:80%, length:16.

These parameters seem to work quite well. I changed the population paramater to see how that affected the runs, and the results were interesting. The following chart describes our first six runs of the “Alpha” Alphabet Soup Genetic Algorithm:

Population     Generations to converge     Total number of individuals    
2 520 1040
2 791 1582
4 273 1092
4 376 1504
50 42 (w/ 2 mutants) 2100
50 63 (w/ 1 mutant) 3150

I found it interesting that convergence seems to happen faster with a smaller population. I was especially surprised at the efficiency with which only two parents managed to converge on the correct solutions. The least number of individuals required to converge was 1,040, which happened with only two in the population, while the longest run required 3,150 individuals–using a population of 50. That’s still not bad, considering the search space contains 65,536 possibilities.

Tagged with:
 

Character Collisions

The binary-bitmap conversion portion of the BBC has already been discussed and is in development. But there are many other aspects of this part of the program that need to be considered, including a physics engine and the motion of the letters. One thing that’s been bugging me is how the letters will react when they run into one another.

Let’s assume that two letters, an ‘A’ and a ‘B’ are floating around in their predetermined environment. They run into each other–what happens?

First of all, we need to clearly define what consitutes a character. There are 256 pixels in the 16×16 square, yes, but most of the border consists of ‘0’s–white space. Is it going to be considered a collision when the outer white pixels of the bitmap collide, or are we going to go with a more literal description of the letters running into each other, i.e. the black pixels (‘1’s)?

I think the second idea makes more sense, even though in computer logic the bitmap touches the other bitmap when the 16×16 squares meet, which causes a kind of predicament for us. The crazy thing about going with option 2 is that it will cause a morphing in the bitmap of at least one of the letters, even before the actual collision takes place. In other words, the ‘A’ bitmap will already change before the genetic code is activated.

                    Purple shows changed pixels:
                    0000000000000000
                    0000000000000000
                    0001111111000000
                    0001111111100000
                    0001000000110000
                    0001000000010000
                    0001000000010000
                    0001000000110000
                    0001111111100000
                    0001111111110000
                    0001000000010000     ‘A’ bitmap becomes:
00000000000001000000010000     0000000000000100
00000001000001000000110000     0000000100000100
00000011100001111111100000     0000001110000111
00000010100001111111000000     0000001010000111
00000110110000000000000000     0000011011000000
0000010001000000                         0000010001000000
0000110001100000                         0000110001100000
0000100000100000                         0000100000100000
0000100000100000                         0000100000100000
0001111111110000                         0001111111110000
0001111111110000                         0001111111110000
0001000000010000                         0001000000010000
0011000000011000                         0011000000011000
0010000000001000                         0010000000001000
0110000000001100                         0110000000001100
0000000000000000                         0000000000000000

One way around this is to make the border ‘0’s (white pixels) invisible, or transparent to the other letters. Also, we could somehow use them so that when the outer pixels of two letters collide, they activate some sort of “signal” (probably a specific function in the code) that a collision is imminent and that the decision-making procedure needs to occur. Maybe this could even change the white pixels to a different color as an extra visual aid to show that a collision is about to take place somewhere within the soup–a way to highlight the action. Just a thought.

Before:                                             Collision Imminent!
0000000000000000                         0000000000000100
0000000100000000                         0000000100000100
0000001110000000                         0000001110000111
0000001010000000                         0000001010000111
0000011011000000                         0000011011000000
0000010001000000                         0000010001000000
0000110001100000                         0000110001100000
0000100000100000                         0000100000100000
0000100000100000                         0000100000100000
0001111111110000                         0001111111110000
0001111111110000                         0001111111110000
0001000000010000                         0001000000010000
0011000000011000                         0011000000011000
0010000000001000                         0010000000001000
0110000000001100                         0110000000001100
0000000000000000                         0000000000000000

The fight-or-flight function, another good chance for GP to help with decision-making, will then determine: Do I, as an ‘A’…
1. continue toward the ‘B’ (attack),
2. change direction to avoid ‘B’ (retreat),
3. breed with ‘B’ (perform GA/GP)?

The thing is, both letters will be running this function to determine the best course of action. It would be interesting in this scenario to see what would happen if ‘A’ tried attacking while ‘B’ decided to activate the genetic algorithm. What would happen? Who would win? Maybe the program would crash and we’d need to tweak it, but whatever–this is just a brainstorming session.

All this can be figured out. I think the main problem with the white borders in a bitmap will be aesthetics: the collision wouldn’t really look like a collision if just the outer empty sections of the letter boxes are running into each other. We can go with the highlighting/color-changing idea, or maybe we can implement some sort of overlap in the outer white pixels on the bitmaps to show more precise-looking collisions. The speed of the letter movement could also be enough to distract the user’s eye from the extra white space. Then again, as small as a 16×16 bitmap is, the difference is probably subtle enough that it won’t even be noticeable.

Tagged with:
 

Bitmap-Binary Basics

The Bitmap-Binary Converter will likely be the core engine for at least the early versions of Alphabet Soup. To create it, there are a few things we need to consider. The two most glaring questions are how will we read individual pixels, and how will the actual conversion between pixel color and binary code take place? An even more basic consideration is which programming language we will use. I am leaning heavily toward C#, as it seems fully capable of tackling both problems.

There are a couple of ways we could do this. Let’s start with the assumption we begin with the bitmap and define all the colors for a given letter (“A”, for example). This website explains how to access each pixel individually–exactly what we are trying to do.

This is the core of the code allowing the color definition for each pixel based on location within the bitmap:

public void SetPixel(int x, int y, PixelData color)
    {
        PixelData* pixel = PixelAt(x, y);
        *pixel = color;
    }

However this is using ‘unsafe’ code to access each individual pixel. To use ’safe’ code the actual program execution would take forever. So there may be a third alternative worth looking into. And here are more thoughts about this. This is something we are going to have to experiment with to find the best method, or we might have to figure out our own.

To save the bitmap code for future use, we will have to set the black pixels to 1 and white pixels to 0. This will produce 256 characters, ranging from charA[0] to charA[255]. We might have the program write this data to a separate file, or just store it within its own memory.

For visual reference, here is the bitmap in actual size:
A

Here is the ‘A’ bitmap represented in binary:

0000000000000000
0000000100000000
0000001110000000
0000001010000000
0000011011000000
0000010001000000
0000110001100000
0000100000100000
0000100000100000
0001111111110000
0001111111110000
0001000000010000
0011000000011000
0010000000001000
0110000000001100
0000000000000000

Here is the above bitmap referenced in pixel notation (shortened to describe the four corners):

(0,0) through (0,15)

(15,0) through (15,15)

And here is the character equivalent to the above pixels:
charA[0] = (0,0) = value of ‘0′
charA[15] = (0, 15) = value of ‘0′
charA[240] = (15,0) = value of ‘0′
charA[255] = (15,15) = value of ‘0′

Or, we do it the other way: Start with 62 ‘.txt’ files (52 letters and numbers 0-9, or whatever amount we end up with for the base set of characters), each with 256 “0″ and “1″ characters representing the 16×16 bitmap.

Read the text from the file and use the code “if charA[0]=’0′…” / “if charA[0]=’1′…” and the earlier bitmap code to convert each corresponding pixel to black or white by referencing the corresponding pixel within the bitmap. Whenever crossover occurs in the program, it creates a new binary (or .txt) file to represent that organism. I think a thorough understanding of the C# StringBuilder Class is going to come in handy for this, in splicing strings of binary characters and performing other modifications. Again, this will be closely intertwined with the GP or GA code, which will be the determining factor for where the splicing will occur each time.

Basically we’re accomplishing the same thing but from two different directions: bitmap-to-binary or binary-to-bitmap. It will probably be valuable to know how to do this in both directions for purposes of the program anyway. Basic C# data type conversions (string-to-binary-to-decimal-to-chars-to-string and so on) will be instrumental in making this work properly. This should actually be pretty simple; it will just take some time to sort out the details.

What I described above should accomplish the basic functionality of the BBC; once we get this working we can move onto the engine’s more complex functions, such as attributing higher “sensitivity values” to certain pixels and recognizing “contact” with other letters.

*Define Initial Population*

The initial population in the final version of Alphabet Soup may likely contain one of each keyboard character available. For now, for simplicity’s sake, let’s begin with 4 characters: A,B,a,b. In our game, all 4 characters decide to run into each other at once.

First we will convert the 16×16 characters into binary, 0 for gray and 1 for yellow:

A:
0000000000000000
0000000100000000
0000001110000000
0000001010000000
0000011011000000
0000010001000000
0000110001100000
0000100000100000
0000100000100000
0001111111110000
0001111111110000
0001000000010000
0011000000011000
0010000000001000
0110000000001100
0000000000000000

A, Binary String: 0000000000000000000000010000000000000011100000000000001010000000000001101100000000000100010000000000110001100000000010000010000000001000001000000001111111110000000111111111000000010000000100000011000000011000001000000000100001100000000011000000000000000000
A, Binary-Decimal conversion: 95786086615602871731981918389383308042547593781186608

B:
0000000000000000
0000000000000000
0001111111000000
0001111111100000
0001000000110000
0001000000010000
0001000000010000
0001000000110000
0001111111100000
0001111111110000
0001000000010000
0001000000010000
0001000000110000
0001111111100000
0001111111000000
0000000000000000

B, Binary String: 0000000000000000000000000000000000011111110000000001111111100000000100000011000000010000000100000001000000010000000100000011000000011111111000000001111111110000000100000001000000010000000100000001000000110000000111111110000000011111110000000000000000000000
B, Binary-Decimal conversion: 46403387827017775033653233413204161160929512263696

a:
0000000000000000
0000000000000000
0000000000000000
0000001111100000
0000011111110000
0000110000110000
0000000000010000
0000000111110000
0000011111010000
0000110000010000
0000110000110000
0000111001110000
0000011111011000
0000000000000000
0000000000000000
0000000000000000

a, Binary String: 0000000000000000000000000000000000000000000000000000001111100000000001111111000000001100001100000000000000010000000000011111000000000111110100000000110000010000000011000011000000001110011100000000011111011000000000000000000000000000000000000000000000000000
a, Binary-Decimal conversion: 86418088698874722540000455600782178239016967

b:
0000000000000000
0000100000000000
0000100000000000
0000100000000000
0000101111000000
0000111111100000
0000110000110000
0000100000010000
0000100000010000
0000100000010000
0000100000010000
0000110000110000
0000111011100000
0000101111000000
0000000000000000
0000000000000000

b, Binary String: 0000000000000000000010000000000000001000000000000000100000000000000010111100000000001111111000000000110000110000000010000001000000001000000100000000100000010000000010000001000000001100001100000000111011100000000010111100000000000000000000000000000000000000
b, Binary-Decimal conversion: 766259462624453036391004127445265907255673543052767246

So, our Initial Population (POPinit) consists of the following four characters and their decimal representation:
1. (A) 95786086615602871731981918389383308042547593781186608
2. (B) 46403387827017775033653233413204161160929512263696
3. (a) 86418088698874722540000455600782178239016967
4. (b) 766259462624453036391004127445265907255673543052767246

*Define fitness of POPinit*

First, we establish the Total, the Best, the Worst and the Average based on their decimal representations:
Total: (A+B+a+b) 862091952714301014596894421608062875060164244585234517
Best: (b) 766259462624453036391004127445265907255673543052767246
Worst: (a) 86418088698874722540000455600782178239016967
Average: 23958122541618481117083899450474345137254323064508656.25

Based on these findings, we can calculate the chance of Reproduction / Survival for each character.
A: 0.1111088977388315456010566688830799217993208329796695901253… = 11% chance of reproducing
B: 0.0000538264945878643991068686448919016779835612586246604953… = <1% chance of reproducing
a: 0.0000000001002423099146058828565205588056220034311729841463397… = MUCH <1% chance of reproducing
b: 0.8888372756663382800852305796155076177170736023305327652329… = 88% chance of reproducing

*Perform Reproduction Function*

As you can tell from the fitness of each character, the character “b” has by far the best chance at survival, while character “A” is the only other letter with much of a chance at all. In the actual game, the reproduction function will decide who actually survives reproduction so it is impossible to say for sure, but based on these percentages we will go with three b’s and one A surviving to population generation 0.

So, the result of the reproduction function should look something like:

1. (b) 766259462624453036391004127445265907255673543052767246
2. (b) 766259462624453036391004127445265907255673543052767246
3. (b) 766259462624453036391004127445265907255673543052767246
4. (A) 95786086615602871731981918389383308042547593781186608

*Define fitness of gen0″

Again, we determine the total, best, worst and average fitness of the new population, gen0:

Total, better: (b+b+b+A) 2394564474488961980904994300725181029809568222939488346
Best, same: (b) 766259462624453036391004127445265907255673543052767246
Worst, better: (A) 95786086615602871731981918389383308042547593781186608
Average, better: 598641118622240495226248575181295257452392055734872086.5

Note that the Total, Worst, and Average fitness of the population all improved as an expected result of natural selection. The Best fitness, meanwhile stays the same–and it has no chance for improvement until running the next stage of the GA…

*Perform genetic crossover operation on gen0*

Start with 2 parents. Let’s use 3 & 4 as an example with an arbitrary crossover point (we’ll say 170 digits in–though this would actually be determined by the randomly generated genetic operation):

3. (b) 766259462624453036391004127445265907255673543052767246 =
00000000000000000000100000000000000010000000000000001000000000000000101111000000000011111110000000001100001100000000100000010000000010000001000000001000000100000000100000 | (crossover point) 01000000001100001100000000111011100000000010111100000000000000000000000000000000000000

4. (A) 957860866156028717319819183893833080425475937811868 =
00000000000000000000000100000000000000111000000000000010100000000000011011000000000001000100000000001100011000000000100000100000000010000010000000011111111100000001111111 | (crossover point) 11000000010000000100000011000000011000001000000000100001100000000011000000000000000000

From this crossover, we get 2 new offspring characters:

offspring1 (o1):
0000000000000000
0000100000000000
0000100000000000
0000100000000000
0000101111000000
0000111111100000
0000110000110000
0000100000010000
0000100000010000
0000100000010000
0000100000110000
0001000000010000
0011000000011000
0010000000001000
0110000000001100
0000000000000000

o1, Binary String: 0000000000000000000010000000000000001000000000000000100000000000000010111100000000001111111000000000110000110000000010000001000000001000000100000000100000010000000010000011000000010000000100000011000000011000001000000000100001100000000011000000000000000000
o1, Binary-Decimal conversion: 766259462624453036391004127445265907255673543589892144

offspring2 (o2):
0000000000000000
0000000100000000
0000001110000000
0000001010000000
0000011011000000
0000010001000000
0000110001100000
0000100000100000
0000100000100000
0001111111110000
0001111111010000
0000110000110000
0000111011100000
0000101111000000
0000000000000000
0000000000000000

o2, Binary String: 0000000000000000000000010000000000000011100000000000001010000000000001101100000000000100010000000000110001100000000010000010000000001000001000000001111111110000000111111101000000001100001100000000111011100000000010111100000000000000000000000000000000000000
o2, Binary-Decimal conversion: 95786086615602871731981918389383308042547593244061710

So gen1 looks like:
1,2 are the same as before; 3,4 are the offspring of crossover:
1. (b) 766259462624453036391004127445265907255673543052767246
2. (b) 766259462624453036391004127445265907255673543052767246
3. (o1) 766259462624453036391004127445265907255673543589892144
4. (o2) 95786086615602871731981918389383308042547593244061710

*Convert gen1 offspring to valid characters*

Now, 3 and 4 are new shapes that do not correspond with any of our 4 letters. The BBC (Bitmap-to-Binary Converter) will be used to “read” these bitmaps and find the closest letter in the space of possible letters which we will define. So, the offspring will turn into something like “h” and “8″, because those are the closest possibilities within the alphabet.

I was going to include a range in decimal numbers and apply that to the nearest alphabet character within that range, but I don’t think that will work because more often than not the offspring will just end up reverting to one of the two characters from which it was created. From this perspective, the BBC may be a very useful portion of the AS program.

So, o1 become “h” and o2 becomes “8″, converting gen1 characters to:

o1=h:
0000000000000000
0000110000000000
0000110000000000
0000110000000000
0000110111000000
0000111111100000
0000111000110000
0000110000110000
0000110000110000
0000110000110000
0000110000110000
0000110000110000
0000110000110000
0000110000110000
0000000000000000
0000000000000000

h, Binary String: 0000000000000000000011000000000000001100000000000000110000000000000011011100000000001111111000000000111000110000000011000011000000001100001100000000110000110000000011000011000000001100001100000000110000110000000011000011000000000000000000000000000000000000
h, Binary-Decimal conversion: 1149389193936678235951120191876555791760811195370319884

o2=8:
0000001111000000
0000011111100000
0000110000110000
0000100000010000
0000100000010000
0000110000110000
0000011111000000
0000011111100000
0000110000110000
0000100000010000
0000100000010000
0000100000010000
0000110000110000
0000011111100000
0000001111000000
0000000000000000

8, Binary String: 0000001111000000000001111110000000001100001100000000100000010000000010000001000000001100001100000000011111000000000001111110000000001100001100000000100000010000000010000001000000001000000100000000110000110000000001111110000000000011110000000000000000000000
8, Binary-Decimal conversion: 23539885800661303801528815919744719668529798326595472592908

So the gen1 now looks like:
1. (b) 766259462624453036391004127445265907255673543052767246
2. (b) 766259462624453036391004127445265907255673543052767246
3. (h) 1149389193936678235951120191876555791760811195370319884
4. (8) 23539885800661303801528815919744719668529798326595472592908

From which we evaluate the fitness again (“8″ clearly will be the new dominate gene) and so on, generation after generation…

*Conclusion*

This is the basic gist of Alphabet Soup by GA. The final aspect of using GA is to incorporate mutations: the random changing of various pixels within the letters, which will obviously introduce the same effect of crossover in adding to the population of letters. Mutations will only be applied to a small portion of the entire population in each generation, just like in nature. With only 4 characters to choose from, let’s say there are no mutations in this particular generation. That gives us the current pool of characters, including the initial population, gen0 and the current generation (gen1):

[POPinit][gen0] [gen1]
[A,B,a,b][b,b,b,A][b,b,h,8]

So, we have 6 “b”s, 2 “A”s, 1 “a”, 1 “h”, 1 “B” and 1 “8″ floating around. Granted this is an example run, but it plays out pretty realistically: note that the dominate gene (b) is taking up quite a bit of the population, while a and B have not grown at all in population. “h” and “8″ are the new kids on the block, and they will have a direct impact on the future generations, again based on their fitness for reproduction and crossover–with a little bit of mutation thrown in here and there.

One of the core questions remains: How do we make the shapes of the characters matter? In other words, how do we specify the importance of various pixels? This, I believe, can also be accomplished through the BBC, and I have some ideas for that. But more on that later.