CPT - The Primer

1. Introduction
    1.1 What is CPT for? 1.2 Who will need it?
    1.3 Multi-platform support? 1.4 What is the primer for?
2. The Crossword Power Tools Ideology
3. Crossword Set
    3.1 Header Elements 3.2 Crossword Elements
4. The Source-Target Paradigm
5. The Data
    5.1 Data Folders 5.2 Data Formats 5.3 Encoding
    5.4 Custom Composition
6. Crossword Properties
    6.1 Styles 6.2 Words
    6.3 Blacks and Whites 6.4 Symmetries
    6.5 Structure 6.6 Layout and Printing
7. Sudoku Properties
8. General Notes about the GUI
1. Introduction

1.1 What is CPT for?

The Crossword Power Tools (CPT for short) are programs for the creation of crosswords, sudoku puzzles, and dictionaries from scratch.

CPT Crosswords is a collection of tools that lets you get to the printout of a newly generated puzzle in a small number of mouse clicks.

The features include:

1.2 Who will need it?

It is for all crossword/sudoku setters and publishers. The CPT programs are not resource-intensive and can be run on a home PC. The dictionary tools are standalone applications which can be used by anyone.

1.3 Multi-platform support?

CPT modules are written in Java and in ANSI C, but the current programs run only on x86 PCs under Windows and Linux. Java versions 1.1 and above are supported.

1.4 What is the primer for?

First of all, it is an overview of the current version of CPT. Second, it is a 'glossary' - all basic notions are described here. Each program has its own manual, which should be used together with this document.

The version of this document is 1.3.

 

2. The Crossword Power Tools Ideology

We do not think that software alone can produce high-quality crosswords. This is a creative human task. Our goal is to support the users with flexible tools.

Creating a crossword in CPT has three major stages: creating the layout of the diagram, filling the grid with words, and adding the clues.

In the case of sudoku, a mask has to be prepared and then the puzzle is generated. The unconstrained approach is also supported - no initial mask is used, as in conventional sudoku generators.

[Figure: Block diagram of CPT Crosswords]

 

All steps can be done by hand (with the Editor) or with the generators. The generators produce output in different data formats called 'diagrams', 'grids', 'crosswords', and 'sudoku puzzles'. They differ in their contents and in how the diagrams are represented. For example, B&W diagrams and sudoku masks are simple bitmaps. The 'grids' hold the diagrams and the words (or the masks and the digits). The 'crosswords' contain all the information of a complete crossword. The 'sudoku puzzles' contain the mask and the givens.

The 'minor' stages include handling the word lists and the clues. The dictionary tools include the programs CPT Word Lists and CPT Dictionary. The first creates highly compressed word lists/dictionaries and the second is a dictionary browser.

CPT Crosswords includes the modules: CPT-Diagrams generator, CPT-Words generator, CPT-Clues generator, CPT-Sudoku generator, CPT-Editor, CPT-Wizard, and a subset of the dictionary tools.

The data processing in CPT is pipelined. For most tasks the input is a collection of items (a set), and the output is a new set. The unit of processing is therefore not a single crossword but a crossword set, handled as a library file.

 

3. Crossword Set

Strictly speaking, a crossword set is a collection of items having the same size (columns x rows) and the same data format. Here 'set' is used in the mathematical sense - the software ensures that no item is repeated when the set is saved after any operation. The set has a header and contains one or more puzzles.
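The 'set' semantics described above can be sketched in a few lines of Python. This is only an illustration: the `CrosswordSet` class, its field names, and the use of tuples of row strings as items are assumptions made for the example, not CPT's real data structures.

```python
class CrosswordSet:
    """Hypothetical model of a crossword set: one size, one format, no repeats."""

    def __init__(self, columns, rows, data_format):
        self.columns = columns
        self.rows = rows
        self.data_format = data_format
        self._items = []      # keeps the insertion order
        self._seen = set()    # used to reject duplicates

    def add(self, item):
        """Add an item (a tuple of row strings); return False for a duplicate."""
        if len(item) != self.rows or any(len(r) != self.columns for r in item):
            raise ValueError("item size does not match the set size")
        if item in self._seen:
            return False
        self._seen.add(item)
        self._items.append(item)
        return True

    def __len__(self):
        return len(self._items)


diagrams = CrosswordSet(columns=3, rows=3, data_format="diagram")
diagram = ("#..", "...", "..#")
added_first = diagrams.add(diagram)   # True: new item
added_again = diagrams.add(diagram)   # False: the repetition is dropped
```

Rejecting items of a different size at insertion time mirrors the rule that all items of a set share the same dimensions.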

3.1 Header Elements

3.2 Crossword Elements

The elements described here do not reflect the real data structures but they appear this way in the dialogs where the user can see or edit the data.

 

4. The Source-Target Paradigm

The Source and the Target are merely working files that serve all modules in the system. For example, for any generator you select the input data into a Source, and the result of the generation is written into a Target. The results of automated queries and manual selections are put into a Source. Here is a simple picture of these operations:

[Figure: Operations in CPT]

Each of these working files is a 'crossword set', as described in the previous chapter. Once again, they are working files: if you want to keep the result of an operation, save it into a library file, because the next operation might overwrite it. The library files are also sets and they hold your data. During the save operation the Source/Target contents are added to the selected library, with conversion if needed, and all duplicates are removed.

When the target operation includes supporting data (dictionaries), this data is called 'Base'. The notion of 'Base' is used in the CPT Word Lists and CPT Crosswords programs.

 

5. The Data

Data management involves numerous details that are of no interest to the user, but the notions described here should be clear.

5.1 Data Folders

In general, a Data Folder is a filtered set of the files in a directory defined by the user. The Data Folder types are:

To browse or edit something, first select the folder type, define the directory and filters if necessary, and then start the CPT Editor.

5.2 Data Formats

The Data Formats of crossword library files are:

The Data Formats of sudoku library files are:

Why so many data formats? First, because this way we can hold, for example, 10000 B&W diagrams of size 15x15 in a library file of just 120 KB, and second, for efficient data processing. The penalty is that conversions have to be done. When the user follows the standard steps, the conversions are done automatically, but there are cases where the user has to specify the conversion.
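A back-of-the-envelope check of the figure above, written as a short Python calculation. It assumes one bit per cell for a plain bitmap and 1 KB = 1024 bytes; how CPT actually packs its library files is not described in this primer.

```python
cells = 15 * 15                           # 225 cells per diagram
raw_bytes = -(-cells // 8)                # one bit per cell, rounded up to bytes
count = 10000

raw_total_kb = raw_bytes * count / 1024   # size of plain bitmaps, in KB
per_diagram = 120 * 1024 / count          # bytes per diagram in a 120 KB file

print(f"plain bitmaps: {raw_bytes} bytes each, {raw_total_kb:.0f} KB in total")
print(f"120 KB library: {per_diagram:.1f} bytes per diagram")
```

Under these assumptions a plain bitmap needs 29 bytes per diagram (about 283 KB in total), so a 120 KB library (about 12 bytes per diagram) implies a representation more compact than one bit per cell.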

Single crossword/sudoku files

The files in a Files type folder can have different data formats as well. Files with the extensions INI, PAT, LAT, and XBM can hold diagrams only. The others can hold diagrams or complete crosswords/sudokus. Our native single crossword/sudoku text file formats/extensions are INI and CPT. They can hold the data elements described in the Crossword Set chapter. This may not be true for external files produced by programs similar to the CPT kit. The only file format/extension that supports Locale and Encoding is CPT. On the other hand, non-native external files might contain properties that are not supported and are ignored by CPT modules. These are the scrambling of answers, non-standard word numbers, and more than one letter in a cell.

The supported non-native formats for crosswords are Across Lite TXT (v.1 and v.2) and PUZ (v.1, read only), and SYM (Sympathy). The supported non-native formats for sudoku are SS (Simple Sudoku, read only), SDK (Sudo Cue, SadMan, read only), SUD (Sudoku Puzzle Generator, read only), and TXT.

The Data Formats of dictionary files are:

The CTree is a binary dictionary file supported by all CPT programs. 'Crossword form' means that the words are converted to lower case and all non-letter characters are ignored during creation (unless custom composition is used). The data attached to a word can be one or more clues with the 'xc' tag, and one or more answers with the 'xa' tag.
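The 'crossword form' rule can be illustrated with a small Python sketch. The function name `crossword_form` is hypothetical, and Python's `str.isalpha` is assumed as the definition of a letter; CPT's own rule may differ in detail.

```python
def crossword_form(word):
    """Lower-case the word and drop every character that is not a letter."""
    return "".join(ch for ch in word.lower() if ch.isalpha())


print(crossword_form("Ice-Cream"))     # icecream
print(crossword_form("O'Neill 2nd"))   # oneillnd
```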

5.3 Encoding

Computers support the writing systems of different languages through hundreds of encoding schemes. These are classified by the scripts they support and by the number of bytes used per character. For example, the European languages can use one byte, while some of the Asian languages and the Unicode schemes use two or more bytes.

Text in different encodings is processed through a pair of converters. The input converter is byte-to-character (btc), which translates bytes from the source encoding to Unicode characters. The output converter is character-to-byte (ctb), and its task is to translate Unicode characters to the target encoding. Most of the names of the encoding converters from Sun's Java international RTE are built into the CPT programs. A 'User Defined Encoding' mechanism lets you use any available converter that is not in the built-in list. The same mechanism is used to select the custom converters developed for CPT. In our dialog boxes, where the built-in list is shown, the encoding names are ordered as follows:
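The converter pair can be illustrated with Python's standard codecs standing in for the Java converters. This is a sketch of the idea only: the function name `convert` is hypothetical, and the encoding names are Python's, not the names shown in the CPT dialogs.

```python
def convert(data, source_encoding, target_encoding):
    """Re-encode bytes by going through Unicode, as a converter pair does."""
    text = data.decode(source_encoding)    # byte-to-character (btc) step
    return text.encode(target_encoding)    # character-to-byte (ctb) step


latin1_bytes = b"caf\xe9"                   # "café" in ISO 8859-1, one byte per letter
utf8_bytes = convert(latin1_bytes, "iso-8859-1", "utf-8")
print(utf8_bytes)                           # b'caf\xc3\xa9'
```

Going through Unicode in the middle means that any source encoding can be paired with any target encoding without a dedicated converter for each combination.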

[Figure: List of converters]

The input files can be in any encoding, but internally our programs work only in 'one-byte' and 'two-byte Unicode' modes. This means that the crosswords and the dictionaries can be in any one-byte encoding or in Unicode, and in some dialogs the bottom half of the list will contain only one entry - Unicode.

When a crossword in Unicode is saved as a text file (CPT format), it is converted to the custom UnicodeASCII encoding.

The crossword modules support only a single character (one-byte or two-byte Unicode) in a letter cell. This means that an encoding that uses two or more characters per letter should not be used. If there is no suitable encoding converter, a new converter can be created (like the Vietnamese VN1 converter) or a custom composition should be used.

5.4 Custom Composition

The custom composition is intended to solve the 'single character per letter cell' problem. It is an encoding scheme similar to Unicode normalization - a character that is not in the alphabet (called a marker) is used to encode a sequence of characters from the alphabet.

For Thai and Hindi crosswords (a syllable in a cell) the program uses a custom Unicode normalization:

[Figure: Hindi crossword]

For rebus type crosswords (several letters in a cell) you have to use 'rebus definitions' to create the Base Words and then use this encoded word list for the crossword generation/editing. In this method the new encoded words are added to the source word list, while in the Thai/Hindi composition the source words are replaced by the encoded words.
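The marker idea behind custom composition can be sketched as follows. Everything specific here is an assumption made for illustration: the function names, the choice of markers from the Unicode Private Use Area, and the sample sequences. The format of CPT's rebus definitions and its real marker characters are not described in this primer.

```python
MARKER_BASE = 0xE000   # start of the Unicode Private Use Area (assumed choice)


def build_markers(sequences):
    """Assign one marker character to each multi-letter sequence."""
    return {seq: chr(MARKER_BASE + i) for i, seq in enumerate(sequences)}


def encode(word, markers):
    """Replace each defined sequence with its single marker character."""
    for seq in sorted(markers, key=len, reverse=True):   # longest first
        word = word.replace(seq, markers[seq])
    return word


def decode(word, markers):
    """Expand marker characters back into their letter sequences."""
    reverse = {marker: seq for seq, marker in markers.items()}
    return "".join(reverse.get(ch, ch) for ch in word)


markers = build_markers(["heart", "art"])
encoded = encode("heartland", markers)    # marker + "land": 5 cells instead of 9
restored = decode(encoded, markers)       # "heartland"
```

After encoding, every cell again holds exactly one character, which is what the crossword modules require.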

 

6. Crossword Properties

For any crossword, in any data format, the CPT modules maintain a list of attributes or properties, which are shown in the windows, used in queries, or used to evaluate the quality of the crossword.

6.1 Styles

Tradition and the crossword publishers have imposed some restrictions on the structure of the diagrams. We have summarized these restrictions in several styles and built them into the software. Note that the responsibility for high-quality crosswords rests with the designers; we support them only with low-level technical details.

NY Times

According to this style the minimum word length is 3. The diagram should be square and should have standard symmetry. There should be no unches (unchecked letters), and there are restrictions on the number of words and blacks depending on the grid size. The usual standard sizes are 15x15, 19x19, and 21x21. In CPT this restriction is relaxed and sizes such as 12x12, 30x30, etc., are accepted as well.
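Some of these rules are easy to check mechanically. The Python sketch below is an illustration, not CPT's implementation: it represents a diagram as a list of strings with '#' for black and '.' for white, takes 'standard symmetry' to mean 180-degree rotational symmetry, and leaves out the limits on the number of words and blacks because their values are not given here.

```python
def runs(line):
    """Lengths of the maximal runs of white cells in one row or column."""
    return [len(part) for part in line.split("#") if part]


def check_nyt_basics(grid):
    """Return a list of violated rules; an empty list means none were found."""
    size = len(grid)
    if any(len(row) != size for row in grid):
        return ["not square"]
    problems = []
    # Standard symmetry: the grid equals itself rotated by 180 degrees.
    if [row[::-1] for row in reversed(grid)] != list(grid):
        problems.append("no standard symmetry")
    columns = ["".join(row[c] for row in grid) for c in range(size)]
    # If every white run in both directions has length >= 3, every white
    # cell lies in an across word and a down word, so there are no unches.
    if any(n < 3 for line in list(grid) + columns for n in runs(line)):
        problems.append("white run shorter than 3 (short word or unch)")
    return problems


good = ["...##",
        ".....",
        ".....",
        ".....",
        "##..."]
bad = ["..#..",
       ".....",
       ".....",
       ".....",
       "..#.."]
```

Here `check_nyt_basics(good)` finds no problems, while `bad` is symmetric but has two-letter runs in its first and last rows.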

[Figures: 15x15 NY Times diagram; 45x45 NY Times diagram]

Scandy

This style has very few restrictions. The top row and the left column should start with a black, and every odd position on these lines should contain a black. The structure of the diagram should allow the clues to be drawn inside the black boxes, and the clue arrows may point only right and down.

[Figure: 11x16 Scandy diagram]

Clues Inside

The only restriction of this style is that the structure of the diagram must allow the clues to be allocated inside the black boxes. The Scandy style is a subset of this one, but the diagrams can have a very different layout and the algorithms for allocating clue positions are different (see below).

[Figure: 16x16 Clues Inside diagram]

Black Grid

The diagram contains blacks in even rows in every even column (as a minimum). These diagrams are usually used for cryptic crosswords. The style is supported only by the CPT-Diagrams generator.

[Figure: 11x11 Black Grid diagram]

Free

This is the style where no built-in restrictions are imposed. The diagram generation process depends only on the parameters given by the user.

Clue Allocation

Clue allocation inside black boxes is supported for B&W diagrams that have enough blacks. These diagrams are named here Scandy and Clues Inside (in other sources they are also called Scandinavian or Swedish). Any black cell can contain one or two clues. The program uses two algorithms: a 'light' one for the Scandy style and a 'heavy' one for the Clues Inside style. The user can also start the algorithms manually for any diagram. The first one is relatively fast because the clue arrow points only down or right - for any clue the black box used is above or to the left of the word's starting cell (and similarly for reversed words: for a horizontal word - above or to the right, and for a vertical word - below or to the right). The second algorithm can take a very long time because all possible combinations are checked - the clue arrow can point in any direction and can be at the word's ending cell as well.

The allocated clue positions can be saved if the crossword contains clues. In all other cases they are used only for display and printing.

6.2 Words

The structural properties of the words are:



6.3 Blacks and Whites

These properties include:

6.4 Symmetries

The list includes:

A diagram that has both horizontal and vertical symmetry also has standard symmetry, but the reverse is not always true.
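The relation between these three symmetries can be checked on small examples. The Python sketch below assumes that horizontal and vertical symmetry are mirror symmetries about the horizontal and vertical axes, and that standard symmetry is 180-degree rotational symmetry; the function names and sample grids are illustrative only.

```python
def horizontal(grid):
    """Mirror symmetry about the horizontal axis (top matches bottom)."""
    return list(grid) == list(reversed(grid))


def vertical(grid):
    """Mirror symmetry about the vertical axis (left matches right)."""
    return all(row == row[::-1] for row in grid)


def standard(grid):
    """Standard symmetry, assumed here to mean rotation by 180 degrees."""
    return [row[::-1] for row in reversed(grid)] == list(grid)


both_mirrors = ["#.#",
                "...",
                "#.#"]
rotation_only = ["#..",
                 "...",
                 "..#"]
```

`both_mirrors` passes all three checks, because two mirror reflections combine into a half turn. `rotation_only` passes only `standard`, which is the case where the reverse does not hold.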

6.5 Structure

These properties include:



6.6 Layout and Printing

Most of these properties are temporarily set in the Editor (not saved in files), and used only for printing.

 

7. Sudoku Properties

Dimensions

This property includes the puzzle size (columns by rows) and the sudoku block size (columns by rows). The program supports only square sudoku grids whose blocks are rectangles. Of the possible rectangles, the one supported is the one whose sides are closest in length, and the longer side of the block defines the block columns.

The following dimensions are supported: 4x4 block 2x2; 6x6 block 3x2; 8x8 block 4x2; 9x9 block 3x3; 10x10 block 5x2; 12x12 block 4x3; 14x14 block 7x2; 15x15 block 5x3; 16x16 block 4x4; 18x18 block 6x3; 20x20 block 5x4; 21x21 block 7x3; 22x22 block 11x2; 24x24 block 6x4; 25x25 block 5x5; 26x26 block 13x2; 27x27 block 9x3; 28x28 block 7x4; 30x30 block 6x5; 32x32 block 8x4.
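The rule for choosing the block can be written as a short Python function that reproduces the list above. The name `block_size` is hypothetical, and the exclusion of sizes that would need a block one row high (the prime sizes) is inferred from the list rather than stated in the text.

```python
import math


def block_size(n):
    """Return (block_columns, block_rows) for an n x n grid, or None."""
    best = None
    for rows in range(2, math.isqrt(n) + 1):
        if n % rows == 0:
            best = (n // rows, rows)   # later matches have closer sides
    return best


supported = [n for n in range(4, 33) if block_size(n)]
print(block_size(12))   # (4, 3)
print(block_size(32))   # (8, 4)
print(block_size(7))    # None
```

Because the loop stops at the square root of the size, the number of block columns is always greater than or equal to the number of block rows.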

These are the characters used as digits (up to size 32x32):
123456789ABCDEFGHIJKLMNOPQRSTUVW

[Figure: Part of 32x32 sudoku]


Ambiguous

The sudoku puzzle should have only one (unique) solution. If the puzzle has two or more solutions, it is ambiguous.
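One simple way to test for ambiguity is to count solutions by backtracking and stop as soon as a second one is found. The Python sketch below is a plain illustration of that idea, not the method used by the CPT-Sudoku generator; the function names and the use of '.' for an empty cell are assumptions, and the givens are assumed not to conflict with each other.

```python
DIGITS = "123456789ABCDEFGHIJKLMNOPQRSTUVW"


def count_solutions(grid, block_cols, block_rows, limit=2):
    """Count solutions by backtracking, stopping once `limit` is reached.

    `grid` is a list of lists of characters; '.' marks an empty cell.
    The grid is modified during the search and restored before returning.
    """
    n = block_cols * block_rows
    for r in range(n):
        for c in range(n):
            if grid[r][c] != ".":
                continue
            br, bc = r - r % block_rows, c - c % block_cols
            used = set(grid[r]) | {grid[i][c] for i in range(n)}
            used |= {grid[i][j]
                     for i in range(br, br + block_rows)
                     for j in range(bc, bc + block_cols)}
            found = 0
            for d in DIGITS[:n]:
                if d not in used:
                    grid[r][c] = d
                    found += count_solutions(grid, block_cols, block_rows,
                                             limit - found)
                    if found >= limit:
                        break
            grid[r][c] = "."
            return found
    return 1   # no empty cell left: the grid itself is one solution


def is_ambiguous(grid, block_cols, block_rows):
    """True if the puzzle has two or more solutions."""
    return count_solutions(grid, block_cols, block_rows) >= 2


solved = [list("1234"), list("3412"), list("2143"), list("4321")]
one_gap = [row[:] for row in solved]
one_gap[0][0] = "."                         # a single missing digit
empty = [["."] * 4 for _ in range(4)]       # no givens at all
```

With 2x2 blocks, `one_gap` has exactly one solution, while `empty` is ambiguous. Stopping at the second solution keeps the check cheap for puzzles that have very many solutions.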

Difficulty

This property defines how easy the puzzle is to solve: easy, medium, or hard.

Number of Givens (Clues)

The number of given digits.

One Rule

In the solved sudoku every digit should be used once (and only once) in a row, in a column, and in a block.
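The One Rule can be verified directly on a solved grid. The Python sketch below is illustrative: the function name `obeys_one_rule` is hypothetical, the grid is assumed to be a list of equal-length strings, and the digit characters are those listed under Dimensions.

```python
DIGITS = "123456789ABCDEFGHIJKLMNOPQRSTUVW"


def obeys_one_rule(grid, block_cols, block_rows, digits=DIGITS):
    """Check that each row, column, and block holds every digit exactly once."""
    n = block_cols * block_rows
    wanted = set(digits[:n])
    units = [list(row) for row in grid]                           # rows
    units += [[grid[r][c] for r in range(n)] for c in range(n)]   # columns
    for br in range(0, n, block_rows):                            # blocks
        for bc in range(0, n, block_cols):
            units.append([grid[r][c]
                          for r in range(br, br + block_rows)
                          for c in range(bc, bc + block_cols)])
    return all(len(unit) == n and set(unit) == wanted for unit in units)


valid = ["1234", "3412", "2143", "4321"]
invalid = ["1234", "3412", "2143", "4312"]   # last two digits swapped
```

`obeys_one_rule(valid, 2, 2)` holds, while `invalid` fails because its last two columns now repeat a digit.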

Symmetries

The generator of sudoku masks supports:

[Figure: Sudoku symmetrical on both diagonals]

 

8. General Notes about the GUI

When you start the program, at first you will see only the small top window with the tabs Source, Base, Target, and Browse, and a button bar. Very soon, as more windows open, the screen will not be big enough.

There are no menus (with small exceptions in CPT Editor and CPT Word Lists). All options are selected via numerous dialogs started by buttons. There is a general rule about the numeric parameters: the value of -1 means "set the default value or ignore". For non-modal dialogs the OK button will only save the data but will not close the window. In these cases the Dismiss button will hide the window. The OK button in the top window will save the current state of all parameters you have set.

The buttons show only pictures (no text), and in the documentation they are referred to by the contents of their tool tips, which are shown when the mouse is over a button. Some of the buttons are context sensitive - especially the Start button in the top window, which can run many different operations depending on the current tab and the options set.

The font used by all modules (except Print-Preview) is defined in the View Options dialog in CPT Editor.

The control background color for all modules is defined in CPT Editor: File | Set Control Background Color.

In all windows the input focus follows the mouse movements. The 'keyboard focus' (using the Tab key) is also supported. When the focus is on a button, the Enter key or the Space Bar can be used to 'press' the button.

The communication with the clipboard is in Unicode and in logical order for the RTL scripts (under Linux you can use a one-byte keyboard/clipboard encoding as well).

In most text fields/areas of the programs you can click with the right mouse button to show a pop-up menu that contains, depending on the context, the following items:

Note: In the earlier Java versions for Linux, to select an item from the pop-up menu, move the mouse pointer over the item and then release the right button.

