I wrote a bot that plays PopCap's Bookworm Deluxe. It's pretty good at it. As a video is worth about 21.6 million words, see for yourself:
The rest of this article is relatively techical, so if that's not your thing, you can stop now.
First problem to solve was how to get the image from the game and how to send mouse clicks back in. These were pretty easy to solve, assuming you have experience with win32 programming.
I started from the template in my ancient win32 tutorial (after fixing a few things that don't work anymore, mostly casting variables and resolving between A vs W versions of functions). This gave me a win32 app with a windows bitmap I could poke directly.
To get the pixels from the game's window, I asked window to give me the window handle, got the device context for the window and after that I could simply bitblt the bits over to my bitmap. Easy peasy.
bookwormwin = FindWindowA(NULL, "BookWorm Deluxe 1.13");
bookwormdc = GetWindowDC(bookwormwin);
...
BitBlt(pDC,0,0,WIDTH,HEIGHT,bookwormdc,0,0,SRCCOPY);
Sending clicks to the game was done via mouse_event (which is deprecated, but works, and is likely to be simpler than the currently-official method). To position the clicks properly, I needed to query the window's position.
GetWindowPlacement(bookwormwin,&bookwormwinpos);
...
void click(int x, int y)
{
SetCursorPos(bookwormwinpos.rcNormalPosition.left + x, bookwormwinpos.rcNormalPosition.top + y);
mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0);
Sleep(10);
mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, 0);
}
There's a danger in doing controls this way, namely, while your bot is in control, there's fairly little you can do to stop it. I experimented with detecting keyboard presses to kill the bot, but with the focus on the game this is rather tricky. Eventually I ended up setting up another administrator account on my PC so I can control-alt-del and switch users and then kill the bot from the other account.
I'm fully aware I could have made the bot more robust and not get into such situations but this was good enough solution, and those out-of-control bot situations ended up being rather rare.
Now that the bot can see the game and click on it, I needed to know which state the game was in. This I did simply by looking at the values of a couple of pixels on the screen; this pixel has this value only when in the main menu; that pixel has this value when asking to start a new game or continue previous one; that pixel has this value when in-game.
That done, I needed to get the state of the board.
OCR is the hardest and the least working part of the bot. To begin, I took a few screen shots and got the exact glyphs the game used.
My first stab at the OCR was to take a few pixel values from each letter, and ran the game until it didn't detect several letters (by painstakingly comparing the two glyphs and adding another detected pixel). This didn't work horribly well, but let me get started with the other parts of the bot.
Later on, I extracted the actual glyph data from my screen shots, loaded them up in the bot and started comparing the dark pixels directly. This worked slightly better.
However, due to all of the particle systems, glow effects, shaking tiles, floating scores, etc, it was far from perfect.
The final iteration first runs the screen capture through a preprocess where the pixel values are first multiplied by themselves four times to crank up contrast, and then pixels are considered "dark" if their value is less than 1/6 of the brightest pixel of the tile. This let me get mostly legible glyphs out of even the brightest and most particle system-filled tiles.
Looking at what the bot sees, the tiles which are the most difficult to figure out are actually the tiles that are about to become fire tiles, due to all of the particle system activity.
When matching the glyphs, first the tile is scanned for "dark" pixels, and if the dark pixel matches the glyph's dark pixels, the glyph gets a point; if it doesn't, the glyph loses a point. Finally, the glyph score is divided by the number of dark pixels the glyph is expected to have. The glyph with the highest score is selected.
This process is not, by any means, perfect, but it performs well enough.
If you were to load up the glyphs in photoshop, you'd be surprised how few pixels differ between some of the glyphs, such as O, C, G. Another times, one glyph pretty much completely covers another, such as with M and V, or B and E. Hats off to any "real" OCR software - mine was working in more or less optimal starting point (ignoring the particles and such, of course).
Now that the game board is known (or close enough, anyway), I needed to find the words. In the first few versions I used a big word list I've used in the past, but that contains so many words bookworm doesn't recognize that the bot wasted a lot of time just trying various words.
I figured I could make the bot learn that certain words are not valid, but since the OCR failures are also an option, that would have led the bot to discard its whole dictionary eventually.
So I figured it would be best to use the same dictionary as the game. I googled to find whether someone has extracted it already, but all I found was someones explanation on how to add words to the word list. I shrugged and took a glance at the word list file.
The wordlist.txt starts with:
aa
2h
3ed
ing
s
2l
3iis
s
2rdvark
8s
4wolf
7ves
3gh
rgh
I looked at this for a while and figured it out, and wrote a program to turn this into a plain word list so I can just swap my original dictionary to it.
The rules are as follows:
[offset]string
And that's it.
Following the above rules, we get:
aa -> aa
2h -> aah
3ed -> aahed
ing -> aahing
s -> aahs
2l -> aal
3iis -> aaliis
s -> aals
2rdvark -> aardvark
8s -> aardvarks
4wolf -> aardwolf
7ves -> aardwolves
3gh -> aargh
rgh -> aarrgh
I loaded the dictionary into a tree structure, which looks something like..
a a h*e d
| i n g
| s
l*i i s
| s
r d v a r k*s
| w o l f
| v e s
g h
r g h
Then I just went through the game field recursively, traversing the tree whenever a valid branch was found, and whenever a full word was found I recorded it. (so for "books", "boo", "book" and "books" words would be noted).
So if we were tracing that graph and we have already found two 'a' letters, the next possible letters would be 'h', 'l' or 'r'. If none of those tiles are available, we're done. If we do find a 'h', we record "aah" as one word, and then start looking for 'e', 'i' or 's', and so forth.
Each found word is given a score based on the difficulty of the characters (so that 'e' gets one point while 'Qu' gets four), multiplied by any special tiles the word crosses. Normal tiles have multiplier of 1, special tiles have multiplier of 2, while fire tiles have a multiplier of 100. I really, really wanted the bot to concentrate on killing the fire tiles.
With the potential word selected, the bot clicks on the tiles of the potential word, and then reads the screen to see whether Bookworm gives a word score (meaning bookworm considers it a valid word). If so, the bot clicks on submit. Sometimes the particles that fly from the total score hit the area where the word score appears, causing the bot to think it has a valid word =)
If the word isn't valid, the tiles are unselected by clicking on the last tile of the board once and the first tile twice. This will always leave the board in a state where no tiles are selected.
Whether the word was valid or not, the bot will then take another screen shot and tries to figure out the tiles. With all the noise going on, it's best to just retry the OCR a lot. To avoid trying to enter the same failed word over and over, whenever a word fails, the bot will skip a word from the best-scored list. This counter is reset on a valid word.
The bot also keeps track of average word length, and if this falls low enough, it will trigger a shuffle.
It's still possible for the bot to lose. If a fire tile appears at the bottom (or gets there by accident), and is surrounded by incompatible letters (such as 'V' surrounded by 'Qu' tiles), it will never get included in a potential word and this won't be scored!
I considered adding detection for this kind of case and trigger a shuffle, but it ended up being such an improbable case that I didn't bother with it.
The OCR could be better. For instance, I could take several screen shots and average out the noise.
I never tried to trim down all of the sleeps I have in the bot's behavior, so it could actually be even faster.
Detection of game state (including the "level up" dialogs) could be more robust. I didn't set the game to click on the "level up" dialogs by design so that I'd get control back every now and then. As a stand-alone demo that would be easy to add.
Only the first 1000 found words are considered. I haven't checked whether the bot hits this limit often. If it does, a lot of better words are missed.
The bot doesn't care about the bonus words at all.
A smart human player can pick a low-scoring word just to get the tiles into a configuration where a high-scoring word is available. The bot doesn't even consider this. Another human strategy is to save the bonus-multiplier tiles until a long word grabs many of them and scores even higher. This would be easier to implement.
In the end, I only ever ran the debug build of the bot - the processing power of modern computers is downright crazy, considering all of the brute-force pixel-crunching and recursive word searching that it has to perform. The dictionary, in it's rich and expensive tree form, takes about 25 megabytes of memory.
It was a fun little project to do, but I'm not even going to release the source code (for obvious cheating reasons). I never cheat in multiplayer games; someone might use this "engine" to do just that.
I hope you found it as funny as I did.
As always, comments are appreciated.