alexdej / puzpy

Python library for reading and writing Across Lite crossword puzzle .puz files.

License: MIT License

puzpy's Introduction

puz.py: python crossword puzzle library (.puz file parser)

Implementation of a .puz crossword puzzle file parser, based on the .puz file format documentation at http://code.google.com/p/puz/wiki/FileFormat

Examples

Load a puzzle file:

import puz
p = puz.read('testfiles/washpost.puz')

Print clues with answers:

numbering = p.clue_numbering()

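print('Across')
for clue in numbering.across:
    # Across answers occupy consecutive cells in the flat solution string.
    answer = ''.join(
        p.solution[clue['cell'] + i]
        for i in range(clue['len']))
    print('{} {} - {}'.format(clue['num'], clue['clue'], answer))

print('Down')
for clue in numbering.down:
    # Down answers advance one full row (numbering.width cells) per letter.
    answer = ''.join(
        p.solution[clue['cell'] + i * numbering.width]
        for i in range(clue['len']))
    print('{} {} - {}'.format(clue['num'], clue['clue'], answer))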

Print the grid:

for row in range(p.height):
    cell = row * p.width
    # Substitute p.solution for p.fill to print the answers
    print(' '.join(p.fill[cell:cell + p.width]))

Unlock a scrambled solution:

p.unlock_solution(7844)
# p.solution is now unscrambled

Save a puzzle with modifications:

p.fill = 'LAMB' + p.fill[4:]
p.save('mine.puz')

Notes

The parser is as strict as Across Lite, enforcing internal checksums and magic strings. The parser is designed to round-trip all data in the file, even fields whose utility is unknown. This makes testing easier. It is resilient to garbage at the beginning and end of the file (for example, some publishers put the filename on the first line and some files have a \r\n at the end).
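
The tolerance for leading garbage can be pictured as a search for the magic string. This is a minimal sketch, not the library's code: it assumes the layout in the file format documentation (a 2-byte checksum followed directly by the ACROSS&DOWN magic string), and find_puzzle_start is a hypothetical helper name.

```python
MAGIC = b'ACROSS&DOWN'  # file magic, per the .puz format documentation


def find_puzzle_start(data):
    """Return the offset at which the puzzle header begins inside data.

    Hypothetical helper. Assumption: the header starts with a 2-byte
    checksum followed directly by the magic string, so the header begins
    two bytes before the magic.
    """
    magic_at = data.find(MAGIC)
    if magic_at < 2:
        # Covers "magic not found" (-1) and "no room for the checksum".
        raise ValueError('no .puz header found')
    return magic_at - 2


# A publisher-added filename line ahead of the real header is set aside.
raw = b'washpost.puz\n' + b'\x12\x34' + MAGIC + b'\x00rest-of-file'
start = find_puzzle_start(raw)
leading_garbage, puzzle = raw[:start], raw[start:]
```

Keeping the skipped bytes, rather than discarding them, is what would let a file be written back out byte for byte.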

In addition to the handful of tests checked in here, the library has been tested on over 9700 crossword puzzles in .puz format drawn from the archives of several publications, including The New York Times, The Washington Post, The Onion, and The Wall Street Journal. As of this writing, it can round-trip 100% of them with full fidelity.

Running tests

python tests.py

License

MIT License.

puzpy's Issues

Creating brand new .puz files

Alternate title of this issue: calling Puzzle.save without having called Puzzle.load

I may be the first consumer of puzpy to try to create a .puz file from scratch using the provided API. The idea is to create a "blank" new Puzzle object and fill in the puzzle metadata (title, rows, cols, etc.), the puzzle solution, and the puzzle clues. My specific use case is converting another puzzle format to .puz format.

In attempting this, I ran into a few problems:

The default value of the preamble attribute cannot be written out
There is no provided API to populate rebus table or rebus solution
The rebus helper produces invalid .puz files because it forces the inclusion of the (unnecessary) RebusFill extension and decodes/encodes it incorrectly
Round-trip testing of puzzles with rebus entries does not exercise the encoding of any of the rebus extension sections because loading a .puz file with rebus extensions does not cause the rebus helper to be added

These are relatively easy issues to fix, and I would be willing to submit a pull request to address them, assuming you agree in spirit with the issues I described.

correctness check

Checking correctness is typically pretty simple: just compare to Puzzle.answers. But when the answers are scrambled, it is useful to compare against the special answer checksum included in the file. We should add a function that does the right thing depending on whether the answers are scrambled.
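
A rough sketch of such a function follows. It is not the library's implementation: is_correct and its parameters are hypothetical, and it assumes that the stored checksum is the 16-bit rotate-and-add checksum from the file format documentation, taken over the non-black cells in column-major order.

```python
BLACK = '.'  # black-square marker in .puz grids


def checksum(data, cksum=0):
    """16-bit rotate-right-then-add checksum over a byte string."""
    for byte in bytearray(data):
        # Rotate right by one bit within 16 bits, then add the next byte.
        cksum = ((cksum >> 1) | ((cksum & 1) << 15)) & 0xFFFF
        cksum = (cksum + byte) & 0xFFFF
    return cksum


def is_correct(fill, solution, width, scrambled_cksum=0):
    """Hypothetical check: compare directly, or by checksum if scrambled."""
    if not scrambled_cksum:
        return fill == solution
    # Assumption: the checksum covers non-black cells, column by column.
    columns = ''.join(fill[col::width] for col in range(width))
    letters = columns.replace(BLACK, '')
    return checksum(letters.encode('latin-1')) == scrambled_cksum


# 2x2 demo grid "AB" / "C." whose real solution is known only by checksum.
stored = checksum(b'ACB')
right = is_correct('ABC.', None, 2, scrambled_cksum=stored)
wrong = is_correct('ABD.', None, 2, scrambled_cksum=stored)
```

A caller never needs to unscramble the solution to use this path, which is the point of storing the checksum.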

scramble/unscramble

Some .puz files have scrambled answers. I have code to do the scramble and unscramble, just need to incorporate it into this library.

Parsing fails when there is a notes section (global checksum does not match)

Some examples are:

https://www.nytimes.com/crosswords/game/daily/2019/03/31
https://www.nytimes.com/crosswords/game/daily/2019/04/02
https://www.nytimes.com/crosswords/game/daily/2019/04/09

I attached .puz and .pdf files. The .puz files are gzipped since GitHub won't let me attach .puz files directly.

Mar3119.puz.gz
Apr0919.puz.gz
Apr0219.puz.gz
Mar3119.pdf
Apr0219.pdf
Apr0919.pdf

setup.py / upload on PyPI

A setup.py file would be nice, so that this lib could be uploaded to PyPI. It is not easily distributable without it.

Sample code for reading .puz files

Based on some emails I've gotten, it sounds like folks could use a bit of sample code in the README to help them get started with reading .puz files and working with clue numbering, answers, etc.

For example:

Poking around, I have been able to write out the clues and some of the answers. Here is what I am stuck on.

Printing out a CSV listing with the clues in one column and the answers in the other.
I believe I can get the across clues and answers to print out, but I am having trouble with the down answers. There is a method to get the down clues, but I am not finding one for the down answers.
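
One way to approach this request, sketched with plain data so it runs without a puzzle file. The helper names are hypothetical; the clue dictionaries use the 'clue', 'cell' and 'len' keys shown in the README examples, and the solution is the flat row-major string.

```python
import csv
import io


def across_answer(solution, clue):
    # Across entries are consecutive cells in the flat solution string.
    start = clue['cell']
    return solution[start:start + clue['len']]


def down_answer(solution, clue, width):
    # Down entries step one full row (width cells) per letter.
    start = clue['cell']
    return solution[start:start + clue['len'] * width:width]


def write_clue_csv(out, solution, width, across, down):
    writer = csv.writer(out)
    writer.writerow(['clue', 'answer'])
    for clue in across:
        writer.writerow([clue['clue'], across_answer(solution, clue)])
    for clue in down:
        writer.writerow([clue['clue'], down_answer(solution, clue, width)])


# Demo on a 3x3 grid:  CAT / A.O / BEE
solution, width = 'CATA.OBEE', 3
across = [{'num': 1, 'clue': 'Feline', 'cell': 0, 'len': 3},
          {'num': 3, 'clue': 'Honey maker', 'cell': 6, 'len': 3}]
down = [{'num': 1, 'clue': 'Taxi', 'cell': 0, 'len': 3},
        {'num': 2, 'clue': 'Foot digit', 'cell': 2, 'len': 3}]
out = io.StringIO()
write_clue_csv(out, solution, width, across, down)
```

With a loaded puzzle, the same function could presumably be called with p.solution, p.width, and the across and down lists from p.clue_numbering().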

Need to set preamble in Python 3.6.7

In Python 3.6.7 I can't write a .puz file unless I explicitly set the preamble to b''. Is there a workaround that will work in Python 2 and Python 3?

UTF-8 characters in puzzles not readable

First off, good work! I found an "edge" case with the ISO-8859-1 encoding reads. Some puzzles have multi-byte characters in the clues. This is probably even more true for other languages. https://ivan.mclauthlin.com/test/23-11-13-atl.puz is a small example. Clues 4 and 6 have emoji in them that look like they're UTF-8. I didn't debug into it, as I didn't see a super quick way to check. Thought I'd pass it along.
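
The symptom can be reproduced without a puzzle file. The snippet below shows UTF-8 bytes being misread as ISO-8859-1, and a fallback decoder that tries UTF-8 first. decode_text is a hypothetical heuristic offered as one possible approach, not something puz.py is known to do.

```python
text = 'Caf\u00e9 order \U0001F600'   # one accented letter and one emoji

raw = text.encode('utf-8')            # bytes as a UTF-8 writer stores them
misread = raw.decode('iso-8859-1')    # one character per byte: mojibake

# ISO-8859-1 maps every byte to a character, so the damage is reversible.
repaired = misread.encode('iso-8859-1').decode('utf-8')


def decode_text(raw_bytes):
    """Heuristic (an assumption): prefer UTF-8, fall back to ISO-8859-1."""
    try:
        return raw_bytes.decode('utf-8')
    except UnicodeDecodeError:
        return raw_bytes.decode('iso-8859-1')
```

The fallback works because genuine ISO-8859-1 text with accented letters is rarely valid UTF-8, though a short string could in principle be ambiguous.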

Version number drift with PyPI

Hey there! Thank you thank you for this library, and I'm delighted to see some action in it lately. I wanted to flag that it looks like the release numbers here on GitHub have maybe drifted a bit from the release history over on PyPI. For my purposes (I use puzpy as a library in a terminal-based solving interface) the PyPI release is pretty key, so if there's anything I can do to help get those in sync please let me know!

clue numbering

.puz files rely on implicit clue numbering. I have clue numbering code, just need to incorporate it into this library.
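
For readers unfamiliar with the convention, here is a sketch of the usual numbering rule: scan the grid in row-major order and give the next number to any cell that starts an across or down entry. number_clues is a hypothetical function, and the requirement that an entry be at least two letters long is an assumption.

```python
BLACK = '.'  # black-square marker in .puz grids


def number_clues(grid, width):
    """Number a flat, row-major grid string.

    Returns (across, down), each a list of (number, cell, length) tuples.
    """
    height = len(grid) // width

    def is_open(row, col):
        in_grid = 0 <= row < height and 0 <= col < width
        return in_grid and grid[row * width + col] != BLACK

    def run_length(row, col, d_row, d_col):
        length = 0
        while is_open(row, col):
            length += 1
            row, col = row + d_row, col + d_col
        return length

    across, down, number = [], [], 0
    for cell, square in enumerate(grid):
        if square == BLACK:
            continue
        row, col = divmod(cell, width)
        # An entry starts where the previous cell is black or off the grid.
        a_len = 0 if is_open(row, col - 1) else run_length(row, col, 0, 1)
        d_len = 0 if is_open(row - 1, col) else run_length(row, col, 1, 0)
        # Assumption: only entries of two or more letters are numbered.
        if a_len >= 2 or d_len >= 2:
            number += 1
            if a_len >= 2:
                across.append((number, cell, a_len))
            if d_len >= 2:
                down.append((number, cell, d_len))
    return across, down


# Demo on a 3x3 grid:  CAT / A.O / BEE
across, down = number_clues('CATA.OBEE', 3)
```

Because the numbers follow from the grid alone, the file only needs to store the clue texts in order.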

Make .encode(ENCODING) more robust

Using Python 3.7.9 with import html and from puz import (Puzzle, DefaultClueNumbering, BLACKSQUARE)

I'm converting files from the web into PUZ format with puzpy. The files contain HTML entities, which I'm unescaping with html.unescape.

So, for example:

&mdash; --> \u2014
&rsquo; --> \u2019

This causes:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2014' in position 28: ordinal not in range(256)

In puz.py, replacing every instance of
.encode(ENCODING)
with
.encode(ENCODING, 'replace')
solves the problem.
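
The failure and the proposed fix can be reproduced with the standard library alone. In this sketch ENCODING is set to 'latin-1' to match the codec named in the traceback, and ASCII_FALLBACKS is a hypothetical alternative that substitutes readable ASCII instead of question marks.

```python
import html

ENCODING = 'latin-1'  # assumed value, matching the codec in the traceback

clue = html.unescape('Yes &mdash; it&rsquo;s here')
# clue now contains U+2014 and U+2019, which do not exist in Latin-1.

try:
    clue.encode(ENCODING)
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The proposed fix: unencodable characters are replaced with b'?'.
lossy = clue.encode(ENCODING, 'replace')

# Alternative: map common typographic characters to ASCII before encoding.
ASCII_FALLBACKS = {0x2014: '--', 0x2019: "'"}
mapped = clue.translate(ASCII_FALLBACKS).encode(ENCODING)
```

The 'replace' handler never raises but loses the punctuation, while a translation table preserves meaning for the characters it lists and still needs 'replace' as a backstop for anything else.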

jpz support

Factor Puzzle so that it can support the .jpz format as well, and convert between .jpz and .puz.

`postscript` is read in as bytes but expects to be written out as a string

When reading a file that ends with "extra garbage" (as described in L207-208), that data is stored internally as bytes. It appears that the tobytes function later expects it to be a string.

I'm not sure what the purpose of postscript is—whether it's just for roundtripping insignificant data, or what—so I don't know for sure whether the best approach is

to change the return value of PuzzleBuffer.read_to_end() to be a string (as is the case with PuzzleBuffer.read_until())
to convert postscript to a string when it's assigned in Puzzle.load(), or
to continue to store it as bytes and just drop the .encode(ENCODING) from the tobytes() step on L273.

If one of these is preferable, I'd be happy to submit a PR to accomplish it. Even if none are suitable, I would still encourage moving the self.tobytes() step outside of the open context manager in Puzzle.save(), because in this case a failure at that step leaves the file opened for writing and truncated to empty, which can result in data loss. (I've got a user experiencing that on an older set of puzzles.)
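
The ordering suggested above can be sketched as follows. save_bytes is a hypothetical stand-in for Puzzle.save(), and serialize stands in for a method such as tobytes(); the only point is that serialization happens before the file is opened for writing.

```python
import os
import tempfile


def save_bytes(path, serialize):
    """Write serialize() to path without truncating the file on failure."""
    data = serialize()           # may raise; nothing has been opened yet
    with open(path, 'wb') as f:  # the file is truncated only at this point
        f.write(data)


def failing_serializer():
    # Simulates the bytes/str mismatch described in this issue.
    raise TypeError('postscript is bytes, not str')


path = os.path.join(tempfile.mkdtemp(), 'mine.puz')
save_bytes(path, lambda: b'original contents')

try:
    save_bytes(path, failing_serializer)
except TypeError:
    pass

with open(path, 'rb') as f:
    survived = f.read()          # the earlier contents are still intact
```

Writing to a temporary file and renaming it over the target would additionally protect against failures during the write itself.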

Add Python 3 support

The library was written a few years ago and relies on Python 2's use of str to represent bytes. This should be adapted to work for both Python 2 and 3.