Search the web
Sign In
New User? Sign Up
wordgame-programmers · A list for programmers of word games. Ve
? Already a member? Sign in to Yahoo!

Yahoo! Groups Tips

Did you know...
Want to share photos of your group with the world? Add a group photo to Flickr.

Best of Y! Groups

   Check them out and nominate your group.
Having problems with message search? Fill out this form to ensure your group is one of the first to be migrated to the new message search system.

Messages

  Messages Help
Advanced
Fast Pattern Matching for Crossword Filling   Message List  
Reply | Forward Message #956 of 963 |
Re: [wgp] Fast Pattern Matching for Crossword Filling

On Mon, Mar 9, 2009 at 7:58 PM, Andy Cook wrote:
>
> Such things are all strongly dependent on the word list, of course.
> Number 2 seems relatively easy using Sowpods:
>
> ULTRAHEATED
> MORIBUNDITY
> BRIGANDS AE
> RICOTTA OMS
> ACARI RELIT
> GAR SETLINE
> ET BEREAVER
> DEBASEDNESS
>
> TATTLETALES
> HEREINABOVE
> ERECTERS UP
> MOSTEST ALT
> ALTAR RANGI
> TIS AMALGAM
> IT ETATISTE
> CHEMISETTES
>
Very nice !

My french lexicon for such grids contains 120000 words (with no
conjugation), and I have been unable to find such a grid.

BTW, there is a crossword benchmark set here:

http://4c.ucc.ie/~hcambaza/page2/page9/page9.html


> I'd be interested in more info on how you use CSP for crosswords, if you
> don't mind.
>
First, I recommend that you take a look at Madore's program:
http://www.madore.org/~david/programs/#prog_crosswords

It's the easiest approach I found on CSP applied to crosswords.
I found 4 or 5 other sources, but they are bloated and cryptic.

Here is how to apply CSP to crosswords:

1) each square is represented by a set of letters A to Z (I personally
prefer to represent it as an integer, with one bit per letter)

2) At the beginning of the grid filling, every square contains all
possibles values A-Z.
(programmatically, this is equivalent to setting all 26 bits to 1).

3) Reduction
At the first iteration, you check every possible word horizontally
then vertically, and remove the impossible letters.
For example, if the first horizontal word cannot start with Z, you can
remove the Z from the first square, and this will remove the Z in the
vertical words.
You continue to perform this 'reduction' as long as letters are removed.

4) Cheapest-first heuristic
When there is no possible reduction, you select the square that
contains the least possible letters.
At this moment, I recommend using some weighting, like choosing the
square with the longest word and the least possible letters.

5) Once you chose a square, you enumerate all possible values, and
call the same routine recursively (at the above point 3).
Madore enumerates the values randomly, but you may apply some other
weighting, like choosing the letter that gives the least number of
words.

I obtained a major boost in the grid filling by using a bits table for
every word in the crossword.
Originally, all the bits in the table are set to 1, meaning that the
words can be placed.
When a word is impossible to place, the bit is set to 0.
This way, when you try to check all words that fit in one place, a bit
equal to 0 means that it's impossible to place.

I hope I have been clear.
The algorithm can be programmed in around 200 lines of code.

Another approach is to fill grid with words, instead of letters, but
it's basically the same technique.

For more information, I recommend the following articles:
Search Lessons Learned from Crossword Puzzles (Ginsberg & al)
Constructing Crossword Grids: Use of Heuristics vs Constraints (Meehan & Gray)
Solving Crossword Puzzles as Probabilistic Constraint Satisfaction
(Shazeer, Littman & Keim)
Constraint Programming Lessons Learned from Crossword Puzzles
(Beacham, Chen, Sillito and van Beek)
Crossword Puzzles as a Constraint Problem (Anbulagan & Adi Botea)
Crossword Grid Composition with a Hierarchical CSP Encoding (Adi Botea)
Generalized NoGoods in CSP (Katsileros & Bacchus)
CS 227 Programming Assignment (Sullivan, Robertson & Vantimitta)
Practical Crossword Generation with Checkpoint Search (Arbiser)

If somebody is interested, I can provide links about themed
crosswords, how to solve crossword with a program, and clue generation
!

JC



Mon Mar 9, 2009 10:12 pm

jcmeyrignac
Offline Offline
Send Email Send Email

Forward
Message #956 of 963 |
Expand Messages Author Sort by Date

I'm currently investigating crossword filling with a CSP approach. The CSP approach allows to fill the cells in an 'unordered' order, so we have to use a...
Jean-Charles Meyrignac
jcmeyrignac
Offline Send Email
Mar 3, 2009
9:15 am

Last time I looked at this, the 'obvious' algorithm was woefully expensive in terms of storage - maybe nowadays it's not. That would be to have a lookup which...
Graham Toal
graham_toal
Offline Send Email
Mar 3, 2009
3:13 pm

... Frankly, I'm only limited by the memory, since filling a crossword doesn't require any storage, outside of the lexicon. Your approach is interesting, but I...
Jean-Charles Meyrignac
jcmeyrignac
Offline Send Email
Mar 3, 2009
3:55 pm

Oups, here is the real distribution of my wordlist: size 2: 93 words size 3: 538 words size 4: 1745 words size 5: 4244 words size 6: 7757 words size 7: 11117...
Jean-Charles Meyrignac
jcmeyrignac
Offline Send Email
Mar 3, 2009
3:57 pm

The latest version of the Australian program "Crossword Express" seems to be able to fill a grid from any given dictionary very quickly. Do any of you have any...
David Levy
davidlevylondon
Offline Send Email
Mar 3, 2009
5:30 pm

I haven't tried running them for 10 years or so, but I would imagine that the old C command-line programs that used to take an hour or so would be pretty...
Graham Toal
graham_toal
Offline Send Email
Mar 3, 2009
6:00 pm

I'm currently trying the CSP approach, but previously worked with the program from Laeuter, which fills the grid letter by letter. CSP means Constraint...
Jean-Charles Meyrignac
jcmeyrignac
Offline Send Email
Mar 3, 2009
7:22 pm

... BTW, here are some new links: Filling with dancing links: http://www.club.cc.cmu.edu/~ajo/free-software/xword/ A freeware Linux solver (seems to fill with...
Jean-Charles Meyrignac
jcmeyrignac
Offline Send Email
Mar 3, 2009
7:58 pm

... Indeed, filling something like: .....#....... .....#.#.#... .......#..... ##.#.....#.#. .....#.#..... .#.#.....#.## .....#....... ...#.#.#..... ...
Warwick Allison
warwickallison
Offline Send Email
Mar 9, 2009
5:23 am

... Re-reading my code, it's not *purely* brute force - it has one heuristic optimization: as the search progresses, it tries the possible words in order of...
Warwick Allison
warwickallison
Offline Send Email
Mar 9, 2009
6:06 am

... Hum, you are right, your patterns are too simple. First, notice that I'm french, and we use different rules for crossword filling. In France, it's not good...
Jean-Charles Meyrignac
jcmeyrignac
Offline Send Email
Mar 9, 2009
8:32 am

... Such things are all strongly dependent on the word list, of course. Number 2 seems relatively easy using Sowpods: ULTRAHEATED MORIBUNDITY BRIGANDS AE ...
Andy Cook
andrucook
Offline Send Email
Mar 9, 2009
6:58 pm

... Very nice ! My french lexicon for such grids contains 120000 words (with no conjugation), and I have been unable to find such a grid. BTW, there is a...
Jean-Charles Meyrignac
jcmeyrignac
Offline Send Email
Mar 9, 2009
10:12 pm

BTW, the current record for a 15x15 american crossword is 19 cells: http://www.mathpuzzle.com/Manny19.gif I'm interested with a better result. Any volunteer ? ...
Jean-Charles Meyrignac
jcmeyrignac
Offline Send Email
Mar 9, 2009
10:22 pm

... I finally found a way to perform a very fast words checking. For every word of the grid, I keep a table of bits representing all the words that can fit...
Jean-Charles Meyrignac
jcmeyrignac
Offline Send Email
Mar 5, 2009
9:04 am
Advanced

Copyright © 2009 Yahoo! Inc. All rights reserved.
Privacy Policy - Terms of Service - Guidelines - Help