
This article is a great example of AI misuse: http://www.c-sharpcorner.com/UploadFile/mgold/GAAnagramSolver03262006194953PM/GAAnagramSolver.aspx

The article uses genetic algorithms to solve anagrams (much like unscrambling a rack of letters in Scrabble).

It also shows a lack of understanding of where genetic algorithms are strong and where they are weak (especially in the discussion of using them to find the best possible answer under a letter-based scoring system such as Scrabble's).

I will list the three major reasons here and then detail each:

1) A brute force search is reasonable (a word has at most, say, 20 unique characters, and one could loop through the permutations quickly)
2) The search is non-deterministic
3) It has a yes or no scoring system

The search space is definitely reasonable: a large word may have a few thousand permutations, but we are not talking about a significantly large search space. Worse, because we are using a genetic search, the search is non-deterministic and we quite often repeat previous positions (see Listing 3 from the article; a quick sketch after the listing puts numbers on the size of the search space).

Listing 3 - Output of the console showing all anagrams for the word rose:

Genome 0: ores --> 4
Genome 1: rose --> 4
Genome 2: rose --> 4
Genome 3: sore --> 4
Genome 4: ores --> 4
Genome 5: ores --> 4
Genome 6: sore --> 4
Genome 7: roes --> 4
Genome 8: rose --> 4
Genome 9: rose --> 4
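
To put numbers on that claim: here is a quick sketch of mine (not from the article) that counts the distinct arrangements of a word's letters, which is n! divided by the factorial of each repeated letter's count. Even a fairly long word stays well within brute force range.

using System;
using System.Collections.Generic;

class SearchSpaceSize
{
    static long Factorial(int n)
    {
        long result = 1;
        for (int i = 2; i <= n; i++) result *= i;
        return result;
    }

    // Distinct permutations = n! / (c1! * c2! * ... * ck!)
    // where ci is how many times the i-th letter repeats.
    static long DistinctPermutations(string word)
    {
        var counts = new Dictionary<char, int>();
        foreach (char c in word)
            counts[c] = counts.TryGetValue(c, out int n) ? n + 1 : 1;

        long result = Factorial(word.Length);
        foreach (int count in counts.Values)
            result /= Factorial(count);
        return result;
    }

    static void Main()
    {
        Console.WriteLine(DistinctPermutations("rose"));     // 24
        Console.WriteLine(DistinctPermutations("scrabble")); // 8!/2! = 20160
    }
}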


“The fitness function turns out to be simple in this case because we can only accept a perfect fitness.”

The biggest problem I have with this particular use of genetic algorithms is that their largest benefit is being ignored. Genetic algorithms are wonderful at a task called hill climbing: changing themselves slightly, checking whether the change does slightly better, and keeping it if so. The big problem here is that the scoring system returns only “correct” or “incorrect”; there are no gradients of correctness, so the genetic algorithm has no hill to climb and becomes nothing more than a random permutation search. (I would actually wager that a plain random permutation search would do better in the long run due to a lower likelihood of duplication.)
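
To make this concrete, here is a minimal sketch (my own code, with a toy IsWord lookup standing in for a real dictionary). With a graded score the "keep it if it does better" step does real work; with a binary score it almost never fires, and the loop degenerates into the random permutation search just described.

using System;

class FlatLandscape
{
    static readonly Random Rng = new Random();

    // Binary fitness: 1 for a dictionary word, 0 for anything else.
    static int Fitness(string s) => IsWord(s) ? 1 : 0;

    // Transpose two random characters -- the classic mutation move.
    static string Mutate(string s)
    {
        char[] c = s.ToCharArray();
        int i = Rng.Next(c.Length), j = Rng.Next(c.Length);
        (c[i], c[j]) = (c[j], c[i]);
        return new string(c);
    }

    static string HillClimb(string start, int iterations)
    {
        string best = start;
        for (int k = 0; k < iterations; k++)
        {
            string candidate = Mutate(best);
            // With a binary fitness this comparison is almost always
            // 0 > 0 (false), so there is no gradient to follow: the
            // loop is a pure random search until the answer appears.
            if (Fitness(candidate) > Fitness(best))
                best = candidate;
        }
        return best;
    }

    // Stand-in for a real dictionary lookup.
    static bool IsWord(string s) =>
        s == "rose" || s == "sore" || s == "ores" || s == "roes";

    static void Main() => Console.WriteLine(HillClimb("esor", 1000));
}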

To assess this better, let us presume the search never duplicates at all, through some magical mechanism (most likely a very large in-memory cache it would have to hit, but let us forget about that overhead and focus on the search itself). The key to judging its worth against a random search would be a proof that transposing a character in a known good word is more likely to produce another word than randomly generating a new arrangement of the characters. For example, it would have to be more likely to find another word by transposing two characters of “rose” than by randomly generating a new ordering (I have yet to see a proof that this is more likely; have you?).

As such we are left with a random search.

The classic alternative would be a brute force search, which would actually work quite well here (a sketch follows the list) in that it is:

1) Deterministic
2) Much less likely to duplicate (it could even be written to never duplicate at all)
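
A sketch of that brute force alternative (again my code, with a toy word list standing in for a real dictionary): sorting the letters first and skipping any letter identical to an unused predecessor guarantees each distinct permutation is generated exactly once, so the search is deterministic and never duplicates.

using System;
using System.Collections.Generic;

class BruteForceAnagrams
{
    static IEnumerable<string> DistinctPermutations(string letters)
    {
        char[] pool = letters.ToCharArray();
        Array.Sort(pool);                      // sorted input makes dedup trivial
        bool[] used = new bool[pool.Length];
        var current = new char[pool.Length];
        return Recurse(pool, used, current, 0);
    }

    static IEnumerable<string> Recurse(char[] pool, bool[] used, char[] current, int depth)
    {
        if (depth == pool.Length)
        {
            yield return new string(current);
            yield break;
        }
        for (int i = 0; i < pool.Length; i++)
        {
            if (used[i]) continue;
            // Skip a letter identical to its unused predecessor: that
            // branch was already explored, so no duplicates are emitted.
            if (i > 0 && pool[i] == pool[i - 1] && !used[i - 1]) continue;
            used[i] = true;
            current[depth] = pool[i];
            foreach (string s in Recurse(pool, used, current, depth + 1))
                yield return s;
            used[i] = false;
        }
    }

    static void Main()
    {
        // Toy word list in place of a real dictionary.
        var dictionary = new HashSet<string> { "rose", "sore", "ores", "roes" };
        foreach (string candidate in DistinctPermutations("rose"))
            if (dictionary.Contains(candidate))
                Console.WriteLine(candidate);
    }
}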


Posted on Thursday, March 30, 2006 1:12 AM


Comments on this post: AI misuse

# re: AI misuse
Hi,

I'm the writer of this article, and you are right; although the algorithm is itself genetic, it is not actually performing a genetic algorithm because the fitness function is not deterministic. I do point this out in my article, though, as seen in the quote below:

"The fitness function turns out to be simple in this case because we can only accept a perfect fitness. Note that in some ways this program does not really behave like a genetic algorithm—it does not approach an elite population since the fitness does not gradually change. The program is more of a generator of random combinations of genomes. If we were to change the fitness function to score words based on the scrabble score, then we could utilize the program more as a genetic algorithm so that the population approaches the highest scrabble score. "

One could argue, however, that in some ways it still is a genetic algorithm, because it mutates, crosses over, and spawns generations; it just doesn't really have anything to learn <g>.

-Mike
Left by Michael Gold on Mar 30, 2006 7:52 PM

# re: AI misuse
First of all, I don't want you to think I am downgrading your work; I am quite happy to see people using genetic algorithms, as they are a powerful and underused concept.

"I'm the writer of this article, and you are write, although the algorithm is itelf genetic, it is not actually performing a genetic algorithm because the fitness function is not deterministic."

I am saying that the search functionality is non-deterministic, as it is based upon random permutations, which is the case with ALL genetic algorithms. The fitness function is by all means deterministic, as it is assured to be repeatable: given the same input twice, it will return the same output.

On a more general note, GAs are good at finding a "good" answer, but in most problem spaces they are terrible at finding the best answer. For some very large problem domains this is actually acceptable behavior, which is why they are used (the travelling salesman problem is a classic example).

The way they get these "good" results is by hill climbing to local maxima. If there is little or no correlation between the maxima, then genetic algorithms have trouble. Many implementations handle the local maximum case by increasing the level of mutation when a local maximum is found; the opposite holds while you are hill climbing, where you often reduce the amount of mutation (a sketch follows).
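
Something like this sketch (purely illustrative, not from any particular library) is what I mean:

using System;

class AdaptiveMutation
{
    double mutationRate = 0.05;
    double bestFitness = double.MinValue;
    int stagnantGenerations = 0;

    // Call once per generation with that generation's best fitness;
    // returns the mutation rate to use for the next generation.
    public double Adapt(double currentBest)
    {
        if (currentBest > bestFitness)
        {
            bestFitness = currentBest;
            stagnantGenerations = 0;
            // Actively climbing: reduce mutation so small steps refine the hill.
            mutationRate = Math.Max(0.01, mutationRate * 0.9);
        }
        else if (++stagnantGenerations > 20)
        {
            // Stuck on a local maximum: increase mutation to jump off it.
            mutationRate = Math.Min(0.5, mutationRate * 2.0);
        }
        return mutationRate;
    }
}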

The addition of a "word score" to the problem will not help much in this case, as it is still a yes or no question (with a score attached).

Example problem space:

0000000000
0000000800
0000000000
0000000000
0000000000

The GAs will essentially have to wade through vast wastelands of 0s (no hill climbing is possible).

One could also attempt to use, say, the score of the letters, but again, all the GAs will "learn" is which letters are the most valuable (and they will likely get caught quickly in a local maximum that does not contain a solution).

Example with letter scores:

5 6 8 9
9 8 3 2
2 2 79 1
1 1 8 1
9 8 7 7

Because it is searching for a "yes/no" type answer, there is not a good chance that the GAs will climb towards it; it is instead by random chance that they will find it. My guess as to the behavior here is that the GA would quickly find the most valuable letter combinations (through hill climbing) and then get stuck, or it might find a valued word and then get stuck on that word; in either case it would at that point become a random search. This is also known as a deceptive problem, as the landscape could just as easily look like this:

9 8 7 1 79
11 9 8 2 1
15 12 9 2 1
14 11 9 2 1
10 10 9 1 1

In this case the population will be pushed away from the answer (towards the 15).

It's not the implementation that's bad, it's the problem it is trying to solve; it is just a very bad example for showing the true strength of genetic algorithms.

When dealing with GA-based work, it is standard to include a test against a simple random generator in order to show that the GA is not in fact operating in the same manner. I would be interested in seeing those results in this problem domain, as I think they would be roughly equivalent. A more interesting result across a large test set would be if the GA actually came out better, as that would suggest one is more likely to form a new word by transposing an existing word than by complete random generation, which I would find rather interesting. (A sketch of such a test follows.)
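
A rough sketch of such a test (mine, with a toy word list in place of a real dictionary): over many trials, compare how often transposing two letters of a known word produces a different word with how often a completely random arrangement of the same letters does.

using System;
using System.Collections.Generic;

class BaselineComparison
{
    static readonly Random Rng = new Random();

    // Toy word list standing in for a real dictionary.
    static readonly HashSet<string> Words =
        new HashSet<string> { "rose", "sore", "ores", "roes" };

    // GA-style move: swap two randomly chosen characters.
    static string Transpose(string s)
    {
        char[] c = s.ToCharArray();
        int i = Rng.Next(c.Length), j = Rng.Next(c.Length);
        (c[i], c[j]) = (c[j], c[i]);
        return new string(c);
    }

    // Random baseline: a full Fisher-Yates shuffle of the letters.
    static string Shuffle(string s)
    {
        char[] c = s.ToCharArray();
        for (int i = c.Length - 1; i > 0; i--)
        {
            int j = Rng.Next(i + 1);
            (c[i], c[j]) = (c[j], c[i]);
        }
        return new string(c);
    }

    static void Main()
    {
        const int Trials = 100000;
        int fromKnownWord = 0, fromRandom = 0;
        for (int t = 0; t < Trials; t++)
        {
            // Ignore the trivial case where the result is the starting word.
            string viaTransposition = Transpose("rose");
            if (viaTransposition != "rose" && Words.Contains(viaTransposition)) fromKnownWord++;

            string viaShuffle = Shuffle("rose");
            if (viaShuffle != "rose" && Words.Contains(viaShuffle)) fromRandom++;
        }
        Console.WriteLine($"transposition hits: {fromKnownWord}, random hits: {fromRandom}");
    }
}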
Left by Greg Young on Mar 30, 2006 10:13 PM


