docs-automation's Introduction

This repository contains the code that automates page generation for universaldependencies.org.

Getting all language repositories

The scripts below assume that all UD treebank repositories are cloned locally. The following script fetches them:

./fetch_ud_repos.sh

Treebank and language metadata and stats

Several components of the automation, especially the main language table, need metadata and statistics gathered from the treebank data and from the machine-readable section of the language README. These are saved as JSON, one file per treebank, and regenerated every time anything changes in the corresponding treebank repository. All auto-generation scripts read from these JSON files.

To run the following code, you need Python 3 with the 'yaml' (PyYAML) and 'jinja2' modules installed. If Python complains that either module is missing, install them using:

pip3 install --user pyyaml
pip3 install --user jinja2

and then run:

repo_dir="/some/path/UD_Finnish-TDT"
OUTDIR="_corpus_metadata"
python3 corpus_stats.py --readme-dir "$repo_dir" --repo-name "$(basename "$repo_dir")" --codes-flags ./codes_and_flags.yaml --json "$repo_dir"/*-ud-{train,dev,test}*.conllu > "$OUTDIR/$(basename "$repo_dir").json"

or use the wrapper script, passing one or more treebank repository paths:

./refresh_corpus_data.sh /some/path/UD_Lang1 /some/path/UD_Lang2
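
As a rough illustration of what refreshing several treebanks involves, the sketch below assembles the corpus_stats.py command line shown above for a given repository path. It is not the contents of refresh_corpus_data.sh, which are not reproduced here; the function names build_stats_command, output_path and refresh are hypothetical, and the default paths simply mirror the example values used above.

```python
import glob
import os
import subprocess


def build_stats_command(repo_dir, codes_flags="./codes_and_flags.yaml"):
    """Return the corpus_stats.py argument list for one treebank repository."""
    repo_dir = os.path.normpath(repo_dir)  # drops a trailing slash, if any
    conllu_files = []
    # Mirror the shell pattern *-ud-{train,dev,test}*.conllu, in that order.
    # Unlike the shell, a part with no matching file is silently skipped.
    for part in ("train", "dev", "test"):
        pattern = os.path.join(repo_dir, f"*-ud-{part}*.conllu")
        conllu_files.extend(sorted(glob.glob(pattern)))
    return [
        "python3", "corpus_stats.py",
        "--readme-dir", repo_dir,
        "--repo-name", os.path.basename(repo_dir),
        "--codes-flags", codes_flags,
        "--json",
    ] + conllu_files


def output_path(repo_dir, outdir="_corpus_metadata"):
    """Return the JSON file path that the command's stdout is redirected to."""
    repo_name = os.path.basename(os.path.normpath(repo_dir))
    return os.path.join(outdir, repo_name + ".json")


def refresh(repo_dirs, outdir="_corpus_metadata"):
    """Run corpus_stats.py once per repository, one JSON file per treebank."""
    os.makedirs(outdir, exist_ok=True)
    for repo_dir in repo_dirs:
        with open(output_path(repo_dir, outdir), "w", encoding="utf-8") as out:
            subprocess.run(build_stats_command(repo_dir), stdout=out, check=True)
```

Calling refresh(["/some/path/UD_Lang1", "/some/path/UD_Lang2"]) from the directory containing corpus_stats.py would then correspond to the shell invocation above, under the assumptions stated.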

Languages at a glance table

The accordion table on the UD index page is included from _includes/at_glance.html by Jekyll. That file is produced by the at_glance.py script:

python3 at_glance.py --codes codes_and_flags.yaml --genre genre_symbols.json --docs-dir docs-src _corpus_metadata/*.json > docs-src/_includes/at_glance.html
  • codes_and_flags.yaml and genre_symbols.json are self-explanatory
  • docs-src tells the script where to look for source files of the documentation, so that it can avoid generating dead links
  • _corpus_metadata/*.json are the per-treebank JSON files produced as described above
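
The dead-link check mentioned in the list above can be pictured as a file-existence test against the documentation source tree. The following is only a sketch under assumed conventions: the helper name link_or_text is hypothetical, and the idea that a page's URL corresponds to a Markdown file at the same relative path under the docs directory is an assumption, not something taken from at_glance.py.

```python
from pathlib import Path


def link_or_text(docs_dir, rel_page, label):
    """Return an HTML link to rel_page if its source exists, else the plain label.

    Assumption: a page such as "fi/index.html" is built from "fi/index.md"
    under docs_dir. The real layout used by at_glance.py may differ.
    """
    source = Path(docs_dir) / Path(rel_page).with_suffix(".md")
    if source.is_file():
        return f'<a href="{rel_page}">{label}</a>'
    # No source file found: emit plain text rather than a link that would 404.
    return label
```

The design point is that the generator decides at build time whether to emit a link at all, so the published table never points at a page that Jekyll will not produce.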

The script also accepts --skip empty (leave out languages without data) or --skip withdata (leave out languages with data), which makes it possible to generate separate tables for the two groups:

python3 at_glance.py --codes codes_and_flags.yaml --genre genre_symbols.json --docs-dir ../docs _corpus_metadata/*.json --skip empty > ../docs/_includes/at_glance.html
python3 at_glance.py --codes codes_and_flags.yaml --genre genre_symbols.json --docs-dir ../docs _corpus_metadata/*.json --skip withdata > ../docs/_includes/at_glance_empty.html

The table is generated from the Jinja2 template in templates/language.md, using several custom filters defined in at_glance.py. The filters receive the data straight from the treebank JSON files and depend on their structure. If you add functionality, please keep the template tidy: avoid complex if-conditions in the template and implement any non-trivial logic as a custom filter.
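
To illustrate the guideline of keeping logic in filters rather than in the template, here is a minimal sketch of a custom Jinja2 filter. The filter name thousands, the token_count key and the inline template are hypothetical; they are not taken from at_glance.py or from the real treebank JSON structure. The jinja2 package must be installed for render_demo to run.

```python
def thousands(count):
    """Format a token count compactly: 999 -> "<1K", 202000 -> "202K"."""
    if count < 1000:
        return "<1K"
    # Integer division truncates, so 1999 is shown as "1K".
    return f"{count // 1000:,}K"


def render_demo(treebank):
    """Render a one-line template that delegates formatting to the filter."""
    import jinja2  # imported here so that thousands() itself needs no dependency

    env = jinja2.Environment()
    # Register the Python function under the name used in the template.
    env.filters["thousands"] = thousands
    template = env.from_string("{{ name }}: {{ token_count | thousands }} tokens")
    return template.render(**treebank)
```

With made-up input, render_demo({"name": "UD_Example", "token_count": 202000}) gives "UD_Example: 202K tokens". The template contains no conditionals; the branching lives entirely in the filter.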

Contributors

dan-zeman, fginter, jnivre
