globalwordnet / gwadoc


11.0 9.0 6.0 3.33 MB

documentation for things like relations and parts of speech used by wordnets

Home Page: https://globalwordnet.github.io/gwadoc/

License: Creative Commons Attribution 4.0 International

Python 92.70% HTML 7.11% Shell 0.20%

wordnets language

gwadoc's People

Forkers

mayankgoel28 yoyo-go chaytanyasinha gconnect tejasvicsr1 damn-dvlpr

gwadoc's Issues

No documentation for how to install or build the docs

It would be good to have a simple explanation of how to install gwadoc, as well as how to build the docs.

Inconsistencies for Meronymy/Holonymy

In general, when reading through the documentation, I am a bit uncertain what Concepts A and B really refer to. I would expect Concept A to always be the first entity in an outgoing relationship with Concept B, such that B satisfies some attribute of A, i.e.

A   :has_some_named_attribute_satisfied_by   B

relation(A, B)

However, in some cases this does not seem to be the case.

In certain cases, A and B seem to be back-references to similarly-named entities A and B in their opposite relation entries. The order of mention seems to be the thing to go by in those cases (i.e. the first mentioned entity is the first entity in the triple rather than A), but this is not consistent either.

As a side note, A and B are often arbitrarily called X and Y throughout the documentation before going back to being called A and B again. My intuition tells me to replace X with A and Y with B, but that introduces contradictions in certain places (see examples further down).

Basically, the naming of the entities seems quite random and it makes using this resource as a source of truth for my work a bit hard. Right now I have to resort to deducing the intended usage from the published XML files of the English Wordnet instead. Hopefully, creating GitHub issues such as this one will help make the resource more consistent. It is still a wonderful piece of documentation, but it seems to need a little work.

Examples of what I find confusing:

Meronym: The short description says "Y makes up a part of X" but the definition says "concept A makes up a part of concept B". So which is it?
Holonym: Has the reverse short description of Meronym which is to be expected, but the definition further down is the exact same as Meronym.
Location Meronym: The description says "X is a place located in Y" and this is consistent with the definition further down which says "concept A is a place located in concept B". However, this seems inconsistent with the Meronym entry above which states that "Y makes up a part of X".
Location Holonym: "Y is a place located in X" - again this seems inconsistent with the description of the Holonym entry which states "X makes up a part of Y".
Member Meronym: "Concept A is a member of Concept B" - seems inconsistent with Meronym which states "Y makes up a part of X". The canonical example provided also contradicts the single example under the Examples heading, i.e. "player has member-meronym team" is the opposite direction of "fleet has member-meronym ship".
Member Holonym: "Concept B is a member of Concept A" - seems inconsistent with Holonym which states "X makes up a part of Y". Like with Member Meronym, the canonical example provided also contradicts the single example under the Examples heading.
Part Meronym: "concept A is a component of concept B" - seems inconsistent with Meronym which states "Y makes up a part of X". Both examples seem to contradict the definition (opposite direction).
Part Holonym: "Concept B is the whole where Concept A is a part" - so this one is actually consistent with Holonym which states "X makes up a part of Y", although the description itself mentions the entities in opposite order which is a bit confusing. We could perhaps rewrite it as "concept A is a component of concept B" instead which is the same meaning, but unfortunately also the exact same description as Part Meronym which is obviously contradictory.
Portion Meronym: "X is an amount/piece/portion of Y" - inconsistent with Meronym which states "Y makes up a part of X". Furthermore, the canonical example and the example under the Examples heading seem contradictory (opposite directions).
Portion Holonym: Same issues as Portion Meronym.
Substance Meronym: "Concept A is made of concept B." - inconsistent with Meronym which states "Y makes up a part of X".
Substance Holonym: "Concept-B is a substance of Concept-A" - seems to be semantically the same as the definition for Substance Meronym (B is the substance in both cases).

In general, the directionality of many of these relations seems to be opposite of the prior usages they are intending to subsume, e.g. in DanNet has_holo_member goes from member to group and has_holo_madeof goes from substance to whole, while has_mero_madeof and has_mero_member go in the opposite direction. Some quick googling tells me that is also how these attributes have been used in other WordNets, see e.g. descriptions for the Estonian Wordnet.
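To make the "relation(A, B)" reading concrete, here is a minimal sketch. It is not gwadoc code. It treats each relation as a directed triple asserted on its source, read as "A has <relation> B", and derives the inverse triple through a reverse-name table. The two-entry table is an illustrative assumption using the WN-LMF identifiers mero_member/holo_member. The example follows the "fleet has member-meronym ship" reading from the Examples heading.

```python
from typing import NamedTuple

# Hypothetical reverse table; only one pair is shown for illustration.
REVERSE = {
    "mero_member": "holo_member",
    "holo_member": "mero_member",
}


class Triple(NamedTuple):
    source: str    # concept A, on which the relation is asserted
    relation: str  # read as "A has <relation> B"
    target: str    # concept B


def invert(triple: Triple) -> Triple:
    """State the same fact from the target's side, using the reverse relation."""
    return Triple(triple.target, REVERSE[triple.relation], triple.source)


fact = Triple("fleet", "mero_member", "ship")  # fleet has member meronym ship
print(invert(fact))  # Triple(source='ship', relation='holo_member', target='fleet')
```

Under this convention, inverting twice returns the original triple. Documentation that fixes one such reading for every A/B pair would make entries like Member Meronym and Member Holonym checkable against each other.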

Contributing

The contributing guide is still a little hard to read, and it does not give accurate information about how to give examples.

In particular, the Relation Style Guide is really the most important part.

Maybe instead of a table, it would be better to have a list or collection of subheadings with explanations?

The documentation should end with a shared references section

This can then be referred to by the phenomenon specific documentation.

This would require a new template

There are currently some examples of citations in doc_en.py.

It would be good to have a script to build index.lang.html for all languages

Currently we build by hand:

python docs/build.py --lang ja html > docs/index.ja.html

But it would be good to build an index for each lang in gwadoc/doc_LANG.py.

Maybe a bash script would be easiest?
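A small Python script would work too, and it avoids shell quoting issues. The sketch below makes several assumptions:
- It runs against the repository root.
- Language modules are named doc_LANG.py with 2-3 letter codes (optionally followed by a _Script suffix), so doc_basic.py is skipped.
- docs/build.py keeps the command-line interface shown above.

The function names are hypothetical.

```python
import re
import subprocess
import sys
from pathlib import Path

# Matches doc_en.py, doc_ja.py, doc_zh_Hans.py, ...; skips doc_basic.py.
# Assumption: language modules use 2-3 letter codes plus an optional _Script suffix.
DOC_MODULE = re.compile(r"^doc_([a-z]{2,3}(?:_[A-Za-z]+)?)\.py$")


def find_languages(pkg_dir: Path) -> list[str]:
    """Return the sorted language codes that have a gwadoc/doc_LANG.py module."""
    langs = []
    for path in pkg_dir.iterdir():
        match = DOC_MODULE.match(path.name)
        if match:
            langs.append(match.group(1))
    return sorted(langs)


def build_command(lang: str) -> list[str]:
    """Equivalent of: python docs/build.py --lang LANG html"""
    return [sys.executable, "docs/build.py", "--lang", lang, "html"]


def build_all(repo: Path) -> None:
    """Write docs/index.LANG.html for every language module found."""
    for lang in find_languages(repo / "gwadoc"):
        out = repo / "docs" / f"index.{lang}.html"
        with out.open("w", encoding="utf-8") as fh:
            # Send stdout to the file, like "> docs/index.ja.html" in the manual command.
            subprocess.run(build_command(lang), cwd=repo, stdout=fh, check=True)
```

From the repository root, calling build_all(Path(".")) would then regenerate every index page. A GitHub action could run the same function.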

No symbols for the sense-sense relations

We want to use these in the OMW interface:
{% set sl2sym = {1:'⇔', 4:'⊞', 5:'⊳'} %}
We only have symbols for antonym, pertainym, and derivation.

is_entailed_by showing incorrect reversed relation

The wrong reverse relation is being shown for is_entailed_by. It is originally defined correctly:

gwadoc/gwadoc/doc_basic.py

Line 1191 in ee7a269

relations.is_entailed_by.fa.reverse = 'entails'

But then the data for other incorrectly uses is_entailed_by when defining its reverse, thus shadowing the original info:

gwadoc/gwadoc/doc_basic.py

Lines 1200 to 1207 in ee7a269

### Relation: other

relations.other.fa.parent = None
relations.other.fa.synset_synset = True
relations.other.fa.sense_synset = True
relations.other.fa.sense_sense = True
relations.other.fa.inOMW = True
relations.is_entailed_by.fa.reverse = 'also'

But what is correct for other? Does it have a reverse, or should none be declared?
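A check like the following could catch this kind of shadowing automatically. It is only a sketch. It assumes the reverse declarations can be collected as (relation, reverse) pairs in the order they appear in doc_basic.py. The function name and data layout are hypothetical, not part of gwadoc.

```python
def check_reverses(assignments):
    """Report reverse declarations that conflict or are not mutual.

    `assignments` is a sequence of (relation, reverse) pairs in source order.
    """
    problems = []
    reverse = {}
    for rel, rev in assignments:
        if rel in reverse and reverse[rel] != rev:
            problems.append(f"{rel}: reverse redeclared ({reverse[rel]!r} -> {rev!r})")
        reverse[rel] = rev  # the later assignment wins, as with attribute assignment
    for rel, rev in reverse.items():
        back = reverse.get(rev)
        if back is not None and back != rel:
            problems.append(f"{rel}: reverse is {rev!r}, but {rev}'s reverse is {back!r}")
    return problems


# The two assignments quoted above: the second silently replaces the first.
pairs = [("is_entailed_by", "entails"), ("is_entailed_by", "also")]
for problem in check_reverses(pairs):
    print(problem)
```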

Open Wordnet Documentation website redesign

Change the existing design

Old Design

New Design

We should have a guideline for writing the documentation for a relation

This should cover things like how the short definition is written, how to reference, the format of tests and so forth.

Currently examples are not linked to the wordnets

I propose we add a macro or two:

Linked to omw 1.0 for now, but can change to omw 2.0 soon:
{%- macro pwn(name, key) -%}
{{ name }}
{%- endmacro %}

And maybe: for named ili links:
{%- macro ilin(name, symbol) -%}
⟪{{ name }}⟫
{%- endmacro %}

We could also think of trying to get the name with a language parameter from OMW in the future, not quite sure how, ...

@goodmami does this look good to you? I have some examples and would like to get a student to add them in, with links, ...

Create documentation for Other and its sub-members

Other,
See also,
State Of - Be In State,
Causes - Is Caused By,
In Manner - Manner Of,
Attribute,
Subevent - Is Subevent Of,
Restricts - Restricted By,
Classifies - Classified By,
Entails - Is Entailed By,

Create documentation for Meronym - Holonym sub-members

Location Meronym - Location Holonym
Member Meronym - Member Holonym
Part Meronym - Part Holonym
Portion Meronym - Portion Holonym
Substance Meronym - Substance Holonym

Filter relations by project

Showing all possible relations all the time can be overwhelming and confusing for someone who is interested in only one wordnet, because a large percentage of the relations are irrelevant. We could add some feature to filter out those that are not used by a particular wordnet, perhaps by adding HTML classes, CSS, and a bit of JavaScript, or through custom builds that use the project information.
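For the HTML-classes approach, the build could emit one class per project on each relation's section. A stylesheet rule or a few lines of JavaScript could then hide everything outside the selected project. This is a minimal sketch under these assumptions:
- The relation-to-project mapping is hypothetical data.
- The proj- class prefix and the function names are illustrative.

The used_by helper shows the custom-build alternative.

```python
import html


def project_classes(projects):
    """Build a class attribute value such as 'relation proj-dannet proj-pwn'."""
    # Sorted for stable output; spaces replaced so names are valid CSS classes.
    names = sorted(p.lower().replace(" ", "-") for p in projects)
    return " ".join(["relation"] + [f"proj-{n}" for n in names])


def relation_section(rel_id, projects):
    """Opening tag for a relation's section, tagged with its projects."""
    cls = html.escape(project_classes(projects))
    return f'<section id="{html.escape(rel_id)}" class="{cls}">'


def used_by(relations, project):
    """Custom-build alternative: keep only the relations a project uses."""
    return [rel for rel, projects in relations.items() if project in projects]


# Hypothetical usage data.
relations = {"hypernym": {"pwn", "dannet"}, "mero_location": {"pwn"}}
print(relation_section("mero_location", relations["mero_location"]))
print(used_by(relations, "dannet"))
```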

Create GitHub action to autogenerate HTML files

The HTML would best not be included in the repository directly, but generated and deployed on pushes to the main branch. We could set up a GitHub action that runs the build steps and copies the output to a gh-pages branch, for instance.

ili links to compling, which is down

It should link to the new server on lr:

e.g., meronym: https://lr.soh.ntu.edu.sg/omw/omw/concepts/ili/69575
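The short examples store their links with an ILIURL placeholder, e.g. `dog <ILIURL/46360>`_ in doc_en.py. Switching servers may therefore only require changing where that placeholder is expanded. This is a sketch under two assumptions:
- The placeholder is expanded by plain substitution.
- Concept pages on the new server follow the pattern of the meronym link above.

How the current build expands ILIURL is not shown here.

```python
import re

# Assumption: base URL taken from the meronym link above.
ILI_BASE = "https://lr.soh.ntu.edu.sg/omw/omw/concepts/ili/"

ILIURL = re.compile(r"<ILIURL/(\d+)>")


def expand_ili_links(text, base=ILI_BASE):
    """Replace reST link targets like <ILIURL/46360> with full concept URLs."""
    return ILIURL.sub(lambda m: f"<{base}{m.group(1)}>", text)


example = "`dog <ILIURL/46360>`_ is a hyponym of `animal <ILIURL/35563>`_"
print(expand_ili_links(example))
```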

No inverse relation for similar/near_synonym (i.e. near_antonym)

I noticed that similar is basically the GWA name for the relation called near_synonym elsewhere.

However, what about near_antonym? I have multiple near_antonym relations in the old Danish wordnet dataset and I am not sure how to represent them when there's only similar/near_synonym.

There's a selection of antonym relations available in the form of...

Antonym (antonym)
Gradable Antonym (anto_gradable)
Simple Antonym (anto_simple)
Converse Antonym (anto_converse)

... but none of them seems to describe near antonyms.

I guess the general antonym relation is most applicable, but I think using it would result in some information loss...? I would be grateful to hear any better suggestions you may have.

Google Season of Docs

I find your project really exciting and want to contribute as a technical writer.
I meet the eligibility criteria and would like to give it a try.
Could you please explain the contribution process in detail?
Thanks

Create documentation for Co-role relation groups and the last three

Co Role

Co Agent Patient - Co Patient Agent
Co Agent Instrument - Co Instrument Agent
Co Agent Result - Co Result Agent
Co Patient Instrument - Co Instrument Patient
Co Result Instrument - Co Instrument Result

Participle
Pertainym
Derivation

Change master branch to main

We didn't want to change the 'master' branch name to 'main' during GSOD because of the potential for confusion, but once it wraps up it would be good to make the switch.

The group page should be made by the build script

I just did it by hand, but it would be good if it could be built automatically.

schemas vs gwadoc

https://globalwordnet.github.io/schemas/ also has lists of relations. Maybe that page should be reformulated to discuss only the encoding and drop its list of relations, which should live only here, for simplicity and consistency.

Create Chinese documentation

Create a Simplified Chinese version of the wordnet docs.

Suppress empty fields

If a relation has no tests or examples, then maybe we should just show nothing, rather than explicitly say "no tests", ...

What do you all think?
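The rule itself is simple: treat None, an empty string, and an empty list alike, and emit nothing for them. This is a sketch in plain Python with hypothetical field names. In the Jinja templates, the equivalent would be wrapping each field in an {% if ... %} block.

```python
def render_fields(fields):
    """Render label/value pairs, skipping empty fields instead of saying "no tests"."""
    lines = []
    for label, value in fields.items():
        if not value:  # None, "" and [] all count as empty
            continue
        if isinstance(value, (list, tuple)):
            value = "; ".join(value)
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


# Hypothetical relation record.
relation = {
    "Short definition": "a concept that is more specific than a given concept",
    "Tests": [],  # nothing recorded yet, so the field is omitted
    "Examples": ["beef is a hyponym of meat", "pear is a hyponym of edible fruit"],
}
print(render_fields(relation))
```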

Normalize short description and short examples

Because we want the short definitions and examples to be useful for quick reference (e.g., on tooltips), we should ensure that the short definition is a single short sentence or phrase, not multiple sentences, and the example gives a single easy-to-understand example of the relation.

We should also normalize the way we talk about things, e.g., for hypernym we say "a word that is more general than a given word" but for meronym we say "Y makes up a part of X". We should be consistent in the way they are described.

Similarly for examples, for hypernym we have "animal is a hypernym of dog" but for meronym we have "hand/finger".

"hammer classifies teapot"

Taken from: https://globalwordnet.github.io/gwadoc/#classifies

This example caused me to pause and wonder how on earth a hammer might classify a teapot. Is it because hammers can break teapots..? Is it because some metal teapots are made in part using a hammer...?

Here are two things that could be the case:

My understanding of what classifies means is lacking.
The example is simply wrong.

Create documentation for Constitutive, Instance Hyponym, Instance Hypernym, Antonym, Equal Synonym, Similar

Create documentation for:

Constitutive,
Instance Hyponym,
Instance Hypernym,
Antonym,
Equal Synonym,
Similar

Change symbol from a LANGUAGE to a PART

This will make it less confusing.

Hyponym example is wrong

gwadoc/gwadoc/doc_en.py

Line 113 in b63abf8

 relations.hyponym.ex.en = "`dog <ILIURL/46360>`_ has hyponym `animal <ILIURL/35563>`_" 

This should be:

relations.hyponym.ex.en = "`dog <ILIURL/46360>`_ is a hyponym of `animal <ILIURL/35563>`_"

Besides being correct, this is also more consistent with the other examples.

Add license text

The license is only mentioned in the README and it therefore isn't automatically recognized by GitHub. If we add a LICENSE file with the CC license text then GitHub should be able to detect that automatically (which can be good for searching, or for people accustomed to looking for license info in the project meta info, etc.).

Link External Projects

It would be good to have a URL for each of the projects (defined in inventories), and then link each project to that URL when we present it.

Clarify the description of underspecified relations

Some underspecified relations, such as constitutive, are not used directly. Others, such as meronym, may be (I'm not really sure) but are usually given by subtypes (mero_location, mero_substance, etc.). Providing examples for things like meronym that are actually subtypes would be confusing, so these need to be more fully specified. Also the comment at the bottom of, e.g., meronym, is inaccurate:

This is an unspecified relation that covers all the relations below. This can be computed automatically, it shouldn't be a special relation.

First, I think it's "underspecified" rather than "unspecified", and second "all the relations below" is not right because there is no clear list or hierarchy, so a reader might think it includes the rest of the relations on the page. The final sentence isn't clear, either. How about:

"This is an underspecified relation that covers Location Meronym, Member Meronym, Part Meronym, Portion Meronym, and Substance Meronym. As such, it is not specified as a relation directly by wordnets, but a wordnet application may employ it as a general relation covering all its subtypes."

(and so on for other underspecified relations)

Look into using standard tooling for translations

The MultiString class for managing translations of the documentation was intended for use with a templating system like Jinja2. This works fine for now (although we haven't made much use of it yet), but we may want to eventually consider a more standard tool for this, such as the gettext module in the standard library. Maybe we could wrap gettext's class-based API with something to maintain a similar usage to MultiString.
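One way to keep MultiString-style lookups is a thin wrapper over gettext, so that s[lang] returns a translated message. This is a sketch, not gwadoc's API. The class name, the "gwadoc" domain, and the catalog location (LOCALEDIR/<lang>/LC_MESSAGES/gwadoc.mo) are all assumptions.

```python
import gettext


class GettextString:
    """Hypothetical MultiString-like wrapper: s[lang] returns the translated msgid."""

    def __init__(self, msgid, domain="gwadoc", localedir=None):
        self.msgid = msgid
        self.domain = domain
        self.localedir = localedir  # None means gettext's default locale directory

    def __getitem__(self, lang):
        # fallback=True returns a NullTranslations, which echoes the msgid,
        # when no catalog exists for the requested language.
        catalog = gettext.translation(
            self.domain, localedir=self.localedir, languages=[lang], fallback=True
        )
        return catalog.gettext(self.msgid)


name = GettextString("Hypernym")
print(name["ja"])  # the Japanese translation if a catalog is installed, else "Hypernym"
```

With this design, the English text doubles as the message id, which is the usual gettext convention. The wrapper could later cache catalogs per language if lookups become frequent.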

Add documentation for Paninian Syntacto-Semantic relations

https://ltrc.iiit.ac.in/Publications/pan_english.html
https://cdn.iiit.ac.in/cdn/ltrc.iiit.ac.in/downloads/nlpbook/nlp-panini.pdf
https://semioticon.com/sio/courses/dynamical-models-in-semiotics-semantic/indian-grammatical-theory/

The karak semantic relations are commonly used for Indian Languages. This is essentially a feature request, but I'd like to work on it.

Do we mean syntactic or semantic diminutives?

I was looking at the diminutive relation and it does not seem clear to me when to apply it.

For example, the German "Mädchen" (girl) is a syntactic diminutive of "Magd" (maid) by the addition of the (regular) -chen suffix, but does not mean 'little maid'. In contrast, 'cottage' is a small 'house' in English (as defined by PWN) but these words have no morphological relation.

Currently, the definition (below) suggests that we are only dealing with semantic diminutives.

A concept used to refer to generally smaller members of a class

However, allowing the relationship between senses (which also contradicts them being a kind of hyponym) suggests that we do in fact care about the morphological process. I also suspect that most users want this to be able to record the addition of regular suffixes to nouns with this property. I would suggest that we instead consider this as a kind of derivation and allow it only between senses, so that we capture the change properly (I also guess this is why the definition above uses 'generally').

Alternatively, we could allow this relationship to be ambiguous and represent hyponymy and/or derivation.

This probably also applies to the other subtypes of hyponymy (feminine, masculine, young form and augmentative)

'other' on sense–synset relations

The WN-LMF-1.0.dtd has lists for "synset relations" and "sense relations", but it does not distinguish sense–sense and sense–synset relations. According to this repository, there are just 3 sense–synset relation types: domain_topic, domain_region, and exemplifies, although I don't recall seeing this info codified anywhere else.

It seems like the other relation, which is a catch-all for undefined relation types, might be appropriate for sense–synset relations as well. Should it be documented as such?

Add link to short examples

Some of the short examples do not have links.
Ensure consistency across all short examples

Some are like this
relations.domain_topic.ex.en = "`computer science <ILIURL/68812>`_ is a domain topic of `CPU <ILIURL/51710>`_ "

And some are like this
relations.antonym.ex.en = "Smart has antonym Stupid"
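A simple lint pass could flag short examples that do not link both related concepts. This is a sketch. It assumes the linked form uses the reST pattern `name <ILIURL/12345>`_ seen in doc_en.py, and that each short example should mention exactly two concepts. The function names are hypothetical.

```python
import re

# A linked concept in reST: `name <ILIURL/12345>`_
LINK = re.compile(r"`([^`<]+?)\s*<ILIURL/(\d+)>`_")


def example_links(example):
    """Return (name, ili_id) for each linked concept in a short example."""
    return [(name, int(ili)) for name, ili in LINK.findall(example)]


def check_example(rel_id, example):
    """Return a message if the example does not link two concepts, else None."""
    links = example_links(example)
    if len(links) < 2:
        return f"{rel_id}: expected 2 linked concepts, found {len(links)}"
    return None


examples = {
    "domain_topic": "`computer science <ILIURL/68812>`_ is a domain topic of `CPU <ILIURL/51710>`_",
    "antonym": "Smart has antonym Stupid",
}
for rel_id, ex in examples.items():
    problem = check_example(rel_id, ex)
    if problem:
        print(problem)
```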

Create documentation for Role and its sub members

Role (role) ⇔ Involved (involved)

Agent (agent) ⇔ Involved Agent (involved_agent)
Patient (patient) ⇔ Involved Patient (involved_patient)
Result (result) ⇔ Involved Result (involved_result)
Instrument (instrument) ⇔ Involved Instrument (involved_instrument)
Location (location) ⇔ Involved Location (involved_location)
Direction (direction) ⇔ Involved Direction (involved_direction)
Target Direction (target_direction) ⇔ Involved Target Direction (involved_target_direction)
Source Direction (source_direction) ⇔ Involved Source Direction (involved_source_direction)

Copy-paste error in instance_hyponym?

These are the long definitions for instance_hypernym and instance_hyponym.

rels.instance_hypernym.dfn.en = """
A relation between two concepts where concept X (``instance_hyponym``)
is a type of concept Y (``instance_hypernym``), and where X is an
individual entity.  X will be a terminal node in the hierarchy.
Instances are expressed by proper nouns.
An ``instance hypernym`` can also be referred to as a ``type``
"""
[...]
rels.instance_hyponym.dfn.en = """
A relation between two concepts where concept X (``instance_hyponym``)
is a type of concept Y (``instance_hypernym``), and where X is an
individual entity.  X will be a terminal node in the hierarchy.
Instances are expressed by proper nouns.
An ``instance hypernym`` can also be referred to as a ``type``
"""

Should these be different somehow?
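Copy-paste duplicates like this are easy to miss, so a small check could flag relations that share an identical definition. This is a sketch. It assumes the English definitions can be gathered into a dict keyed by relation id (for example, from rels.*.dfn.en), which is not shown here. The sample definitions are shortened from the ones quoted above.

```python
from collections import defaultdict


def duplicate_definitions(definitions):
    """Group relation ids whose definitions match after normalizing whitespace."""
    groups = defaultdict(list)
    for rel_id, text in definitions.items():
        key = " ".join(text.split())  # ignore line breaks and indentation
        groups[key].append(rel_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


shared = ("A relation between two concepts where concept X (``instance_hyponym``)\n"
          "is a type of concept Y (``instance_hypernym``), and where X is an\n"
          "individual entity.")
definitions = {
    "instance_hypernym": shared,
    "instance_hyponym": shared.replace("\n", " "),  # same text, different wrapping
    "hypernym": "a word that is more general than a given word",
}
print(duplicate_definitions(definitions))
```

Reverse pairs such as instance_hypernym/instance_hyponym are expected to read differently, so any hit is worth a second look.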

How to store the data

We need to decide a good way to store the gwadoc data, but it's not yet clear what the intended uses are or who the intended users are beyond generating the HTML documentation.
The current (not checked-in) data is a Python file that fills dictionaries with data. If generating documentation is the only use, we may as well put it directly into reStructuredText. If we want a Python API, e.g., to request the localized name, definition, reverse, etc. from OMW, then it might make sense to make Python classes (Sphinx's autodoc could possibly be used to generate the docs, then).

In either case we could store the data in a data file and transform it (perhaps with validation) into the target representation. I propose using TOML. Even though it is relatively new and not in the standard library, it was chosen for Rust's package manager and for the future of Python packaging (see PEP-0518), so it has support by major projects.

Here's what (part of) hypernym would look like:

[hypernym]

  [hypernym.name]
    en = "Hypernym"
    symbol = "⊃"
    ja = "上位語"

  [hypernym.def]
    en = "a word that is more general than a given word"
    pl = "Relacja łącząca znaczenie z drugim, ogólniejszym, niż to pierwsze, ale należącym do tej samej części mowy, co ono"
    ja = "当該synsetが相手synsetに包含される"

There's some flexibility in TOML (but not as flexible as YAML, which is a good thing). Something like this would be equivalent, e.g., if you want to group all attributes by language:

[hypernym]
name.en = "Hypernym"
def.en = "a word that is more general than a given word"
# etc...

And while I would like to place this file (gwadoc.toml or whatever) at the top level so it's more prominent for non-Python users/contributors, that would make it much more difficult to distribute with the project and for the python code to find when run. So it might go under gwadoc/gwadoc.toml instead.

As an alternative, if we don't care much about non-Python users, we could make a Python class like Relation and do things like this:

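from dataclasses import dataclass, field

@dataclass
class Relation:
    name: dict = field(default_factory=dict)
    dfn: dict = field(default_factory=dict)  # 'def' is a reserved word, so it can't be a keyword argument

rels = {}
rels['hypernym'] = Relation(
    name={
        "en": "Hypernym",
        "ja": "上位語",
    },
    dfn={
        "en": "a word that is more general than a given word",
    },
)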

Then query it like this:

>>> hypernym = rels['hypernym']
>>> hypernym.name['en']
'Hypernym'

Add some new relations

We want to add some new relations, these are all already in use by some wordnets.

Simple Aspect
Secondary Aspect
Feminine form
Masculine Form
Young Form
Diminutive
Augmentative
Gradable Antonym
Simple Antonym
Converse Antonym
Inter-register Synonym

Most of these are used by the Polish Wordnet Project, some by Czech and Bulgarian.

Create documentation for Domain - In Domain sub members

Domain Topic - Has Domain Topic
Domain Region - Has Domain Region
Exemplifies - Is Exemplified By

relations with no tests

Many relations are missing tests. How can we differentiate documentation that is still pending from cases where no tests exist in the literature?

participle, pertainym, and derivation need some more info

I've created the data structures to hold documentation for participle, pertainym, and derivation, but they are pretty bare. We should add, at least, a short English definition for each in doc_en.py, but maybe also project names in doc_basic.py.

Francis, I'm assigning this to you but I'm happy to do it if you can point me to the relevant information for these fields.

Make descriptions even more clear

A persistent problem with our documentation is that it fails to clarify the thing it describes. For instance, here are some fields for "hyponym":

short definition: "a word that is more specific than a given word"
long definition: "A relation between two concepts where concept B is a type of concept A."
Examples:
- beef hyponym meat
- pear hyponym edible fruit
- dictionary hyponym wordbook

The short definition isn't so bad, but the long definition depends on the understanding that "A" is meant to be the hyponym, not "B". Also, the short definition says that a hyponym is a word while the long definition says it is a relation between concepts. We should clarify these to be consistent such that hyponym is the concept and hyponymy is the relation. Then the examples are terrible because it's often unclear whether that short form means "beef is a hyponym of meat" (correct) or "beef has hyponym meat" (incorrect).

I propose the following template:

short definition: "a concept that is more specific than a given concept"
long definition: "Concept A is a hyponym of a given concept B when A is a subtype of B."
long definition (alternative): "Hyponymy is a relation between concepts where A is a hyponym of B when A is a subtype of B."
Examples:
- beef is a hyponym of meat
- pear is a hyponym of edible fruit
- dictionary is a hyponym of wordbook

I think the examples read more fluidly in the "A is a (relation) of B" form than "B has (relation) A" and it better matches the patterns in the definition.
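If the "A is a (relation) of B" form is adopted, the short examples could be generated from a single helper rather than written by hand, which keeps them uniform and linked. This is a sketch. The helper is hypothetical, the link syntax follows the existing doc_en.py examples, and the a/an choice uses a naive vowel rule.

```python
def short_example(a, a_ili, relation_label, b, b_ili):
    """Format "A is a <relation> of B" with ILI links to both concepts."""
    # Naive article choice: fine for relation labels such as "hyponym" or "antonym".
    article = "an" if relation_label[0].lower() in "aeiou" else "a"
    return (f"`{a} <ILIURL/{a_ili}>`_ is {article} {relation_label} of "
            f"`{b} <ILIURL/{b_ili}>`_")


print(short_example("dog", 46360, "hyponym", "animal", 35563))
```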

Thoughts, anyone?

Create the Overview section

Make an introduction section at the beginning of the doc

link to synsets

Relations could be defined by synsets. For hyponym, hypernym, antonym, instrument and agent I found clear synsets that defined the relations.
