We need to decide on a good way to store the gwadoc data, but it's not yet clear what the intended uses are, or who the intended users are, beyond generating the HTML documentation.
The current (not checked-in) data is a Python file that fills dictionaries with data. If generating documentation is the only use, we may as well put it directly into reStructuredText. If we want a Python API, e.g., to request the localized name, definition, reverse, etc. from OMW, then it might make sense to make Python classes (in that case, Sphinx's autodoc could possibly be used to generate the docs).
In either case, we could store the data in a data file and transform it (perhaps with validation) into the target representation. I propose using TOML. Even though it is relatively new and not in the standard library, it was chosen for Rust's package manager (Cargo) and for the future of Python packaging (see PEP 518), so it is supported by major projects.
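As a rough sketch of the "transform with validation" step, assume the data has already been parsed into nested dicts of the form `{relation: {"name": {lang: text}, "def": {lang: text}}}` (this layout, and the helper names `validate` and `to_rst`, are hypothetical). A validator plus a minimal reStructuredText renderer might look like this:

```python
REQUIRED_FIELDS = ("name", "def")  # assumed required sub-tables per relation


def validate(data: dict, required_lang: str = "en") -> list:
    """Return a list of problems; an empty list means the data looks usable."""
    problems = []
    for rel, entry in data.items():
        for fld in REQUIRED_FIELDS:
            values = entry.get(fld)
            if not isinstance(values, dict):
                problems.append(f"{rel}: missing table '{fld}'")
            elif required_lang not in values:
                problems.append(f"{rel}: '{fld}' has no '{required_lang}' value")
    return problems


def to_rst(data: dict, lang: str = "en") -> str:
    """Render one reStructuredText definition-list item per relation."""
    lines = []
    for rel, entry in sorted(data.items()):
        # Fall back to the relation key if there is no localized name
        name = entry["name"].get(lang, rel)
        definition = entry["def"].get(lang, "")
        lines.append(f"{name} (``{rel}``)")  # the term
        lines.append(f"   {definition}")      # indented definition body
    return "\n".join(lines)


data = {
    "hypernym": {
        "name": {"en": "Hypernym", "ja": "上位語"},
        "def": {"en": "a word that is more general than a given word"},
    }
}
assert validate(data) == []
print(to_rst(data))
```

Keeping validation separate from rendering means the same check could run in CI regardless of whether the target ends up being reStructuredText or Python objects.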
Here's what (part of) `hypernym` would look like:
```toml
[hypernym]
[hypernym.name]
en = "Hypernym"
symbol = "⊃"
ja = "上位語"
[hypernym.def]
en = "a word that is more general than a given word"
pl = "Relacja łącząca znaczenie z drugim, ogólniejszym, niż to pierwsze, ale należącym do tej samej części mowy, co ono"
ja = "当該synsetが相手synsetに包含される"
```
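Reading such a file from Python takes very little code. The sketch below uses `tomllib`, which has been in the standard library since Python 3.11 (the third-party `tomli` package provides the same `load`/`loads` API for older versions); the inline string stands in for the real data file.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

GWADOC_TOML = """
[hypernym]
[hypernym.name]
en = "Hypernym"
symbol = "⊃"
ja = "上位語"
[hypernym.def]
en = "a word that is more general than a given word"
ja = "当該synsetが相手synsetに包含される"
"""

# tomllib.loads parses a str; tomllib.load expects a binary file object
data = tomllib.loads(GWADOC_TOML)

print(data["hypernym"]["name"]["ja"])  # 上位語
print(data["hypernym"]["def"]["en"])
```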
TOML allows some flexibility (though not as much as YAML, which is a good thing). For example, the following is equivalent and may be handier if you want to group all attributes by language:
```toml
[hypernym]
name.en = "Hypernym"
def.en = "a word that is more general than a given word"
# etc...
```
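One way to confirm that the two layouts are equivalent is to parse both and compare the resulting dicts. A small sketch, with both snippets abbreviated to `name.en` and `def.en`:

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Layout 1: one [header] table per attribute
by_table = tomllib.loads("""
[hypernym]
[hypernym.name]
en = "Hypernym"
[hypernym.def]
en = "a word that is more general than a given word"
""")

# Layout 2: dotted keys inside a single [hypernym] table
by_dotted_key = tomllib.loads("""
[hypernym]
name.en = "Hypernym"
def.en = "a word that is more general than a given word"
""")

# Both produce {"hypernym": {"name": {"en": ...}, "def": {"en": ...}}}
assert by_table == by_dotted_key
```

One caveat: the two styles can't be mixed arbitrarily for the same table. TOML does not allow re-opening `[hypernym.name]` with a header once `name.en` has already created that table via a dotted key.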
And while I would like to place this file (`gwadoc.toml` or whatever) at the top level so it's more prominent for non-Python users and contributors, that would make it much more difficult to distribute with the project and for the Python code to find at runtime. So it might go under `gwadoc/gwadoc.toml` instead.
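If the file does live at `gwadoc/gwadoc.toml`, the Python code can locate it through `importlib.resources` instead of relying on the current working directory. A sketch, assuming the package is importable as `gwadoc` and the TOML file is shipped as package data (e.g., declared via setuptools' `package_data`); `load_gwadoc` is a hypothetical function name:

```python
from importlib.resources import files

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API


def load_gwadoc(package: str = "gwadoc") -> dict:
    """Parse the gwadoc.toml file bundled inside `package`."""
    # files() resolves the location of the imported package, so this
    # does not depend on where the calling script is run from
    resource = files(package).joinpath("gwadoc.toml")
    with resource.open("rb") as f:  # tomllib.load needs a binary file
        return tomllib.load(f)
```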
As an alternative, if we don't care much about non-Python users, we could make a Python class like `Relation` and do things like this:
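```python
rels = {}  # hypothetical registry of relation docs
rels['hypernym'] = Relation(
    name={
        "en": "Hypernym",
        "ja": "上位語",
    },
    # `def` is a reserved word in Python, so the keyword needs another name
    definition={
        "en": "a word that is more general than a given word",
    },
)
```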
Then query it like this:
```pycon
>>> hypernym = rels['hypernym']
>>> hypernym.name['en']
'Hypernym'
```
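The `Relation` class itself is still hypothetical. A minimal sketch, assuming a frozen dataclass holding per-language dicts: because `def` is a reserved word in Python, the definition field is called `definition` here, and the `localized` helper with an English fallback is likewise an assumption rather than an existing API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """Documentation for one GWA relation, keyed by language code."""
    name: dict[str, str] = field(default_factory=dict)
    # `def` is a Python keyword, so the field is named `definition`
    definition: dict[str, str] = field(default_factory=dict)

    def localized(self, attr: str, lang: str, fallback: str = "en") -> str | None:
        """Return `attr` in `lang`, falling back to `fallback` if missing."""
        values = getattr(self, attr)
        return values.get(lang, values.get(fallback))


rels = {
    "hypernym": Relation(
        name={"en": "Hypernym", "ja": "上位語"},
        definition={"en": "a word that is more general than a given word"},
    )
}

hypernym = rels["hypernym"]
print(hypernym.name["ja"])                     # 上位語
print(hypernym.localized("definition", "ja"))  # falls back to the English text
```

With classes like this, Sphinx's autodoc could pick up the docstrings, while the per-language data itself could still be loaded from the TOML file.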