

Traveller Data Utils

Python library and assorted tools for assisting with the Mongoose Traveller TTRPG system.

The extracted data is not for redistribution, as it is almost certainly subject to copyright (I am not a lawyer, but it is safer to err on the side of caution). This utility (and its output) is intended as an aid to those who legally own a copy of the Mongoose Traveller materials and wish to make use of the data for their own direct purposes. It is the sole responsibility of the user of this program to use the extracted data in a manner that respects the publisher’s IP rights.

Important
Do not distribute the data extracted from PDF files without explicit permission from the copyright holder.

The purpose of this tool is to extract the data for use by the legal owner of a copy of the original material from which it was extracted.

Reporting issues

Report any problems you encounter or feature requests to https://github.com/huin/travdata/issues.

Please include:

  • information about which operating system you are using the program on,

  • steps to reproduce the problem,

  • what you expected to happen,

  • what actually happened.

Ideally, include the text output of any error messages from the program, and/or screenshots to demonstrate the problem where text output is not applicable.

Usage

This package is primarily intended for the provided CLI tools, but API access is also possible.

Requirements

Java Runtime Environment (JRE) must be installed. This is required by the code that extracts tables from PDFs. If not already installed, get it from java.com.

Installation

Prebuilt

You can download an executable version of the application for your platform from github.com/huin/travdata/releases. Currently, executables are only generated for Linux and Windows, and they appear to work on the author’s machines. A macOS binary is also released, but it has not been tested.

Once downloaded, extract the .zip file to a suitable location. You can most easily use the command line interface from the directory that it was unpacked to.

You may also wish to make a shortcut to the travdata_gui executable, but it can also be run directly from the unzipped directory.

Pip install

This may work on platforms that have no prebuilt executable. Assuming that you have Python 3.11 or later installed and are running something similar to Linux, run the following commands to install into a Python virtual environment:

mkdir travdata
cd travdata
python -m venv venv
source ./venv/bin/activate
pip install travdata

You will also need to download a copy of the source code, in order to get a copy of the configuration. Visit releases, pick a recent release, and download the "Source code" zip file. Extract the config directory from it, and place it in the travdata directory you created earlier, such that the travdata directory contains two subdirectories:

  • config

  • venv

At this point, you can run python -m travdata.cli.cli in place of travdata_cli in the examples elsewhere in this document.

GUI travdata_gui

The GUI binary provides a user interface to aid in extracting CSV files, performing the same task as travdata_cli extractcsvtables.

Screenshot of extraction configuration GUI
  1. "Extraction configuration" selects the configuration for extraction from PDFs. It should detect its own configuration automatically. If it does not, click "Select configuration" and select the _internal/config directory that should have come with the download of the program.

  2. "Input PDF" selects the PDF to extract from, and the book that that PDF corresponds to.

    Note: only a very limited number of books are supported - at the time of writing, only the Core Rulebook 2022 update.

    Click "Select PDF" and choose the input PDF file.

  3. If selecting a PDF file did not automatically choose the correct book (based on its filename), choose it from the drop-down box below "Select PDF".

  4. "Output directory" selects a directory to write the extracted CSV data into. Choose an empty directory.

  5. "Extract" should now be enabled. Click it to start extraction. It will open a window to display progress and any errors.

  6. Close the extraction window once extraction is completed (and if you no longer need the output).

Note: the program will not extract the same table again if it sees that the CSV file is present in the output directory. If you wish to force re-extraction, delete some or all files from the output directory.
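The skip logic amounts to an existence check on each output path; a minimal sketch (hypothetical helper, not the project's actual code):

```python
from pathlib import Path


def should_extract(out_path: Path, overwrite: bool = False) -> bool:
    """Return True when a table should be (re-)extracted.

    Extraction is skipped when the CSV output already exists, unless
    overwriting is explicitly requested.
    """
    return overwrite or not out_path.exists()
```

Deleting the relevant .csv files therefore forces re-extraction on the next run.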

CLI travdata_cli extractcsvtables

This tool extracts CSV files from tables in the given PDF, based on configuration files that specify how those tables can be turned into useful CSV data. As such, it only supports extraction of tables from known PDF files, where the individual tables have been configured.

The general form of the command is:

travdata_cli extractcsvtables BOOK_NAME INPUT.PDF OUT_DIR

Where:

BOOK_NAME

is the identifier for the book to extract tables from. This selects the correct book’s configuration from the configuration files. Use travdata_cli listbooks to list accepted values for this argument.

INPUT.PDF

is the path to the PDF file to read tables from.

OUT_DIR

is the path to a (possibly not-yet-existing) directory to output the resulting CSV files into. The output will contain a directory and file structure that mirrors that in CONFIG_DIR, but containing .csv rather than .tabula-template.json files.
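For example, the template-to-CSV name mapping can be sketched as follows (hypothetical helper, illustrative only):

```python
from pathlib import PurePosixPath


def csv_output_path(template_rel_path: str) -> str:
    """Map a template path (relative to CONFIG_DIR) to the mirrored
    CSV path (relative to OUT_DIR)."""
    p = PurePosixPath(template_rel_path)
    stem = p.name.removesuffix(".tabula-template.json")
    return str(p.parent / (stem + ".csv"))
```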

At the present time, the only supported input PDF file is the Mongoose Traveller Core Rulebook 2022, and not all tables are yet supported for extraction.

Example:

travdata_cli extractcsvtables \
    core_rulebook_2022 path/to/update_2022_core_rulebook.pdf \
    path_to_output_dir

Developing

See development.adoc for more information on developing and adding more tables to the configuration.

Issues

Attribute error running CLI

Windows 11, using the 0.4.1 version of the windows executable; unable to start CLI executable.

D:\Projects\RPG\Traveller Mg2\TravData>travdata_cli.exe listbooks
Traceback (most recent call last):
  File "travdata\cli\cli.py", line 18, in <module>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "travdata\cli\cliutil.py", line 13, in <module>
  File "travdata\cli\cliutil.py", line 16, in UsageError
AttributeError: module 'os' has no attribute 'EX_USAGE'
[15556] Failed to execute script 'cli' due to unhandled exception!
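For context, os.EX_USAGE is only defined on POSIX platforms, which is why the attribute lookup fails on Windows. A portable workaround sketch (not necessarily how the project resolved it):

```python
import os

# os.EX_USAGE exists only on POSIX; fall back to the conventional
# BSD sysexits value (64) on platforms such as Windows where the
# attribute is missing.
EX_USAGE = getattr(os, "EX_USAGE", 64)
```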

Extract skills and specialisations from the core rulebook 2022

Proposed extracted data structures as two types of table:

  1. A table containing all top-level skills, with columns:
    • skill name,
    • skill description.
  2. One table per top-level skill that has specialities, with columns:
    • speciality name,
    • description and examples of a skill check combined.

GUI not finding config files

When trying to run an extract through the GUI, it can't find a book.yaml file that's present in the correct location. Windows 11, 0.4.1 release, using config.zip from the _internal directory in the install folder.

Traceback (most recent call last):
  File "travdata\filesio.py", line 364, in open_read
  File "zipfile.py", line 1563, in open
  File "zipfile.py", line 1492, in getinfo
KeyError: "There is no item named 'core_rulebook_2022\\book.yaml' in the archive"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "travdata\gui\extraction\runnerwin.py", line 46, in run
  File "travdata\extraction\bookextract.py", line 217, in extract_book
  File "travdata\config\__init__.py", line 90, in load_group
  File "travdata\config\__init__.py", line 244, in load_book
  File "contextlib.py", line 137, in __enter__
  File "travdata\filesio.py", line 366, in open_read
travdata.filesio.NotFoundError: core_rulebook_2022\book.yaml

Stopped.
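The backslash in the reported member name is suggestive: zip member names always use forward slashes, even when the archive is read on Windows, so a Windows-style relative path must be normalised before lookup. A sketch of that normalisation (hypothetical helper, not the project's code):

```python
from pathlib import PurePosixPath, PureWindowsPath


def zip_member_name(rel_path: str) -> str:
    """Normalise a relative path to a zip member name.

    Zip archives store member names with '/' separators, so a
    backslash-joined Windows path must be converted before it is
    looked up in the archive.
    """
    return str(PurePosixPath(*PureWindowsPath(rel_path).parts))
```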


Consider using camelot instead of tabula

Camelot is a Python-based library that would remove the need to have Java installed on the system. It still requires an external program (Ghostscript), which should be weighed against the requirement for Java.

travdata_cli should be released as a self-contained executable

Users shouldn't have to install Python and use pip or similar to install it.

Ideally the solution should be:

  • compatible with: Linux, OSX, Windows.
  • compatible with Github actions (to auto-release), but actually doing this is potentially a separate followup issue.

Provide a GUI/WUI

Not everyone likes to use or has the knowledge to use the CLI. Provide a GUI or web-UI to use the key features provided by the CLI:

  • CSV extraction
  • tradetable

Consider rewriting table extraction in Jsonnet

Jsonnet would allow reasonably sandboxed logic to be embedded with the configuration files, and help decouple transformation specific features in the binary from the configuration versioning.

It would also allow for much more flexible transformations.

tradetable should infer Ht and Lt from UWP

Ht and Lt trade codes may not always be present in sector data. The tradetable subcommand should look at the tech level from the UWP to determine whether the criteria for these trade codes are met.
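Under the commonly used thresholds (assumed here: Ht at TL 12+, Lt at TL 5 or lower on a populated world), the inference might look like this sketch:

```python
# Ehex digits: the extended-hex notation used in UWPs, which skips
# I and O to avoid confusion with 1 and 0.
EHEX = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ"


def infer_tech_trade_codes(uwp: str) -> set[str]:
    """Infer Ht/Lt trade codes from a UWP string such as 'A788899-C'.

    Assumed thresholds: Ht at tech level 12+; Lt at tech level 5 or
    lower on a world with a nonzero population digit (UWP index 4).
    """
    tech_level = EHEX.index(uwp.split("-")[1][0])
    population = EHEX.index(uwp[4])
    codes: set[str] = set()
    if tech_level >= 12:
        codes.add("Ht")
    if tech_level <= 5 and population >= 1:
        codes.add("Lt")
    return codes
```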

Configure extraction of traveller creation tables

These are currently disabled for extraction. Templates for them exist, but each career’s template combines all tables into a single template, which does not work well with the current extraction model of one template per logical table.

Likely the best solution is to simply split up the templates into one per table, which should be fairly straightforward, but laborious. Probably make use of jq or something to streamline the process.
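The splitting step could also be done in Python rather than jq; a rough sketch, assuming a combined template is a JSON array of table regions as Tabula exports them:

```python
import json
from pathlib import Path


def split_template(combined: Path, out_dir: Path) -> list[Path]:
    """Split a combined Tabula template (a JSON array of table
    regions) into one single-region template file per table."""
    regions = json.loads(combined.read_text())
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = combined.name.removesuffix(".tabula-template.json")
    written = []
    for i, region in enumerate(regions, start=1):
        path = out_dir / f"{stem}-{i}.tabula-template.json"
        # Each output file remains a JSON array, but with one region.
        path.write_text(json.dumps([region], indent=2))
        written.append(path)
    return written
```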

Use PEX instead of PyInstaller for POSIX-alike platforms

The PyInstaller build doesn't seem especially happy on Linux versions that I've tried.

The reason for using PyInstaller was that it supported Windows, where PEX did not. However, if PyInstaller is not supporting Linux (unclear on MacOS status), then we might need to use PEX on the non-Windows platforms anyway.

Include tagging metadata in configuration

The configuration should support tagging of tables, or entire groups of tables (recursively), to support a more logical way of referring to tables without tying client code to the slightly arbitrary group hierarchy.

Consider rewriting in Rust

Python has served fairly well, but it doesn't lend itself well to portable distribution as an executable. It's possible, but carries a lot of dead weight.

This is purely a speculative issue, and a place to gather notes for if this work can take place.
