On Mon, Mar 9, 2009 at 7:58 PM, Andy Cook wrote:
>
> Such things are all strongly dependent on the word list, of course.
> Number 2 seems relatively easy using Sowpods:
>
> ULTRAHEATED
> MORIBUNDITY
> BRIGANDS AE
> RICOTTA OMS
> ACARI RELIT
> GAR SETLINE
> ET BEREAVER
> DEBASEDNESS
>
> TATTLETALES
> HEREINABOVE
> ERECTERS UP
> MOSTEST ALT
> ALTAR RANGI
> TIS AMALGAM
> IT ETATISTE
> CHEMISETTES
>
Very nice !
My french lexicon for such grids contains 120000 words (with no
conjugation), and I have been unable to find such a grid.
BTW, there is a crossword benchmark set here:
http://4c.ucc.ie/~hcambaza/page2/page9/page9.html
> I'd be interested in more info on how you use CSP for crosswords, if you
> don't mind.
>
First, I recommend that you take a look at Madore's program:
http://www.madore.org/~david/programs/#prog_crosswords
It's the easiest approach I found on CSP applied to crosswords.
I found 4 or 5 other sources, but they are bloated and cryptic.
Here is how to apply CSP to crosswords:
1) each square is represented by a set of letters A to Z (I personally
prefer to represent it as an integer, with one bit per letter)
2) At the beginning of the grid filling, every square contains all
possibles values A-Z.
(programmatically, this is equivalent to setting all 26 bits to 1).
3) Reduction
At the first iteration, you check every possible word horizontally
then vertically, and remove the impossible letters.
For example, if the first horizontal word cannot start with Z, you can
remove the Z from the first square, and this will remove the Z in the
vertical words.
You continue to perform this 'reduction' as long as letters are removed.
4) Cheapest-first heuristic
When there is no possible reduction, you select the square that
contains the least possible letters.
At this moment, I recommend using some weighting, like choosing the
square with the longest word and the least possible letters.
5) Once you chose a square, you enumerate all possible values, and
call the same routine recursively (at the above point 3).
Madore enumerates the values randomly, but you may apply some other
weighting, like choosing the letter that gives the least number of
words.
I obtained a major boost in the grid filling by using a bits table for
every word in the crossword.
Originally, all the bits in the table are set to 1, meaning that the
words can be placed.
When a word is impossible to place, the bit is set to 0.
This way, when you try to check all words that fit in one place, a bit
equal to 0 means that it's impossible to place.
I hope I have been clear.
The algorithm can be programmed in around 200 lines of code.
Another approach is to fill grid with words, instead of letters, but
it's basically the same technique.
For more information, I recommend the following articles:
Search Lessons Learned from Crossword Puzzles (Ginsberg & al)
Constructing Crossword Grids: Use of Heuristics vs Constraints (Meehan & Gray)
Solving Crossword Puzzles as Probabilistic Constraint Satisfaction
(Shazeer, Littman & Keim)
Constraint Programming Lessons Learned from Crossword Puzzles
(Beacham, Chen, Sillito and van Beek)
Crossword Puzzles as a Constraint Problem (Anbulagan & Adi Botea)
Crossword Grid Composition with a Hierarchical CSP Encoding (Adi Botea)
Generalized NoGoods in CSP (Katsileros & Bacchus)
CS 227 Programming Assignment (Sullivan, Robertson & Vantimitta)
Practical Crossword Generation with Checkpoint Search (Arbiser)
If somebody is interested, I can provide links about themed
crosswords, how to solve crossword with a program, and clue generation
!
JC