Comments (4)
Hi,
if you have a wordnet derived from PWN 3.0 with the same offsets, then it can be done as follows:
>>> import wn
>>> ewn = wn.Wordnet('omw-en:1.4')
>>> ewn.synset('omw-en-00981304-s')
Synset('omw-en-00981304-s')
Many people (including omw 1.0) treat all satellite adjectives (pos 's') as adjectives (pos 'a').
wn does not, so if you look up something with pos 'a' and it doesn't work, then it is worth also looking up 's'. So something like the following should get you what you want.
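def offset2synset(wordnet, offset):
    # offset looks like 'wn:00981304a': skip the 'wn:' prefix, split off the pos
    wnid = f'omw-en-{offset[3:-1]}-{offset[-1]}'
    try:
        synset = wordnet.synset(wnid)
    except wn.Error:
        if offset[-1] == 'a':
            # not found as 'a', so retry as a satellite adjective ('s')
            wnid = f'omw-en-{offset[3:-1]}-s'
            try:
                synset = wordnet.synset(wnid)
            except wn.Error:
                synset = None
        else:
            synset = None
    return synset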
>>> print(offset2synset(ewn, 'wn:00981304a'))
Synset('omw-en-00981304-s')
>>> print(offset2synset(ewn, 'wn:02001858v'))
Synset('omw-en-02001858-v')
@BramVanroy thanks for the good questions (here and on the https://github.com/goodmami/penman project, too 👋). I agree that the documentation could be improved in this area, possibly in the NLTK migration guide.
And thanks, @fcbond, for the good description and solution.
The basic problem is that synset offsets (which are specific to each wordnet version) are not an inherent part of the WN-LMF formatted lexicons that are used by Wn, but for some lexicons (mainly the omw- ones), the WordNet 3.0 offsets are conventionally used in the synset identifiers, so you just need to reformat the identifier appropriately, as @fcbond demonstrated.
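The reformatting step can be sketched without touching Wn at all. This is only an illustration, assuming offsets arrive as strings like wn:00981304a (optional "wn:" prefix, digits, one part-of-speech letter) and that the target lexicon follows the omw-en-{offset}-{pos} identifier convention; the name offset_to_synset_id and its prefix parameter are hypothetical and not part of Wn.

```python
def offset_to_synset_id(offset: str, prefix: str = "omw-en") -> str:
    """Turn an offset string such as 'wn:00981304a' into an OMW-style synset ID.

    Hypothetical helper: it only rewrites the string and never checks
    whether the resulting ID exists in any lexicon.
    """
    # Drop the optional 'wn:' scheme prefix.
    body = offset[3:] if offset.startswith("wn:") else offset
    if len(body) < 2:
        raise ValueError(f"unrecognized offset: {offset!r}")
    # The last character is the part of speech; the rest is the numeric offset.
    number, pos = body[:-1], body[-1]
    if not number.isdigit() or pos not in "nvars":
        raise ValueError(f"unrecognized offset: {offset!r}")
    # Offsets are zero-padded to eight digits in the identifier.
    return f"{prefix}-{int(number):08d}-{pos}"


print(offset_to_synset_id("wn:00981304a"))  # omw-en-00981304-a
print(offset_to_synset_id("2001858v"))      # omw-en-02001858-v
```

The resulting string is what would then be passed to a synset lookup on the lexicon; whether that lookup succeeds depends on the lexicon actually using WordNet 3.0 offsets in its identifiers.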
Note that I also have an unmerged nltk branch that tries to implement NLTK's API as a shim on top of Wn, and its of2ss() function is implemented using the same wn.util.synset_id_formatter() function you linked to above:
Lines 329 to 342 in 5092e62
@fcbond said:
Many people (including omw 1.0) treat all satellite adjectives (pos 's') as adjectives (pos 'a').
wn does not
This is not entirely true. Wn does conflate s and a in the wn.ic, wn.morphy, wn.similarity, and wn.taxonomy modules, but it's true that it does not do so on the standard synset-lookup functions.
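Because the standard lookup keeps s and a apart, one way to emulate the conflation is to try both identifiers in turn. The following is a rough sketch under assumptions: candidate_ids and first_found are hypothetical helpers, the IDs are assumed to end in -a or -s as in the omw-en lexicon, and a plain dict stands in for a real lexicon so the example runs without any data installed.

```python
from typing import Callable, Iterator, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def candidate_ids(synset_id: str) -> Iterator[str]:
    """Yield the ID itself, then the satellite-adjective variant if it ends in '-a'."""
    yield synset_id
    if synset_id.endswith("-a"):
        yield synset_id[:-2] + "-s"


def first_found(
    lookup: Callable[[str], T],
    synset_id: str,
    errors: Tuple[Type[BaseException], ...] = (LookupError,),
) -> Optional[T]:
    """Return the first successful lookup among the candidates, else None."""
    for candidate in candidate_ids(synset_id):
        try:
            return lookup(candidate)
        except errors:
            continue  # this candidate is missing, so try the next one
    return None


# Stand-in for a lexicon: dict.__getitem__ raises KeyError (a LookupError) on a miss.
fake_lexicon = {"omw-en-00981304-s": "satellite adjective synset"}
print(first_found(fake_lexicon.__getitem__, "omw-en-00981304-a"))  # satellite adjective synset
print(first_found(fake_lexicon.__getitem__, "omw-en-02001858-v"))  # None
```

With Wn itself, the lookup argument would presumably be the synset method of a Wordnet object and errors would be (wn.Error,), the exception the other snippets in this thread catch.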
First, thanks for the help! I settled for this:
import logging
from typing import Optional

import wn


def offset2omw_synset(wnet: wn.Wordnet, offset: str) -> Optional[wn.Synset]:
    # Strip the "wn:" prefix and left-pad to 8 digits plus the pos letter.
    offset = offset.replace("wn:", "")
    offset = "0" * (9 - len(offset)) + offset
    wnid = f"omw-en-{offset[:-1]}-{offset[-1]}"
    wnid_s = None
    try:
        return wnet.synset(wnid)
    except wn.Error:
        if wnid[-1] == "a":
            # wnid already carries the "omw-en-" prefix, so only swap the pos suffix.
            wnid_s = f"{wnid[:-2]}-s"
            try:
                return wnet.synset(wnid_s)
            except wn.Error:
                pass
    logging.warning(f"Could not find offset {offset} ({wnid}{' or ' + wnid_s if wnid_s else ''}) in {wnet._lexicons}")
    return None
I looked at the NLTK branch, @goodmami, and while I think that would be very useful, I just needed a quick function that I could easily plug into my code (without having to install from GitHub). But I think it'd be a useful API to have, although I can imagine it is a lot of work!
And thank you for your work. It seems a coincidence that you are providing exactly the tools that I need for my work. I am very thankful and motivated that you created these libraries - and that they work so well and are well-documented! I've also peeked at the internals/API and documentation to inspire my own work, so a big thank you!
Thanks for the kind words, @BramVanroy! And I'm glad you were able to find a solution. I'm going to keep the issue open because, as the issue title states, I think this sort of information would be useful in the documentation, so the issue should be closed when that happens.