
OpenHowNet API, developed by THUNLP, provides a convenient way to search information in HowNet, display sememe trees, calculate word similarity via sememes, and more. You can also visit our website to search for and display the sememes of words online.

If you use any data or API provided by OpenHowNet in your research, please cite the following paper:

@article{qi2019openhownet,
    title={OpenHowNet: An Open Sememe-based Lexical Knowledge Base},
    author={Qi, Fanchao and Yang, Chenghao and Liu, Zhiyuan and Dong, Qiang and Sun, Maosong and Dong, Zhendong},
    journal={arXiv preprint arXiv:1901.09957},
    year={2019},
}

Introduction to HowNet

HowNet is the most representative sememe knowledge base. A sememe is defined as the minimum semantic unit in linguistics, and some linguists believe that the meanings of all words in any language can be represented by a limited set of sememes. Mr. Zhendong Dong and his son Qiang Dong put this idea into practice and spent almost 30 years building HowNet, which predefines about 2,000 sememes and uses them to annotate over 200,000 senses of English and Chinese words.

Since its construction, HowNet has been widely used in various NLP tasks. You can refer to this paper list to see the HowNet-related studies.

HowNet Dictionary

The HowNet core data file (i.e., the HowNet dictionary, which can be downloaded here) consists of 237,973 concepts (or senses) represented by Chinese and English words and phrases. Each concept in HowNet is annotated with a sememe-based definition, POS tags, sentiment orientation, example sentences, etc. Here is an example of how a concept is annotated in HowNet:

NO.=000000026417 	# Concept ID
W_C=不惜 	# Chinese word
G_C=verb 	# POS tag of the Chinese word
S_C=PlusFeeling|正面情感 	# Sentiment orientation
E_C=~牺牲业余时间,~付出全部精力,~出卖自己的灵魂 	# Example sentences of the Chinese word
W_E=do not hesitate to 	# English word 
G_E=verb 	# POS tag of the English word
S_E=PlusFeeling|正面情感 	# Sentiment orientation
E_E=               	# Example sentences of the English word
DEF={willing|愿意} 	# Sememe-based definition
RMK=
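Each record is a block of KEY=VALUE lines, one field per line. The minimal sketch below parses a record shaped like the sample above (with the explanatory # comments left out) into a dictionary and pulls the sememe labels out of its DEF field. It assumes the field layout shown here and the English|Chinese label format used throughout this page; parse_record, extract_sememes and SEMEME_RE are hypothetical names, not part of OpenHowNet.

```python
import re

# Hypothetical pattern for sememe labels such as "willing|愿意" or "able|":
# an English name, a bar, then an optional Chinese name.
SEMEME_RE = re.compile(r'([A-Za-z][\w-]*)\|([^\s{}:,="|]*)')

def parse_record(text):
    """Parse one HowNet record made of KEY=VALUE lines into a dict."""
    record = {}
    for line in text.strip().splitlines():
        # Split on the first "=" only, so "=" inside DEF values is kept.
        key, sep, value = line.partition("=")
        if sep:
            record[key.strip()] = value.strip()
    return record

def extract_sememes(definition):
    """Return the sememe labels found in a DEF string, in order of appearance."""
    return ["|".join(match) for match in SEMEME_RE.findall(definition)]

sample = """NO.=000000026417
W_C=不惜
G_C=verb
S_C=PlusFeeling|正面情感
W_E=do not hesitate to
G_E=verb
DEF={willing|愿意}
RMK="""

rec = parse_record(sample)
print(rec["W_E"], extract_sememes(rec["DEF"]))
```

Splitting on the first "=" only is what keeps nested role assignments inside DEF (for example belong="..." in more complex definitions) intact.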

OpenHowNet API

Installation

You can choose either of the following two methods to install OpenHowNet API.

  1. Installation via pip (recommended)
pip install OpenHowNet
  2. Installation via GitHub
git clone https://github.com/thunlp/OpenHowNet/
cd OpenHowNet
python setup.py install
Requirements
  • Python>=3.6
  • anytree>=2.4.3
  • tqdm>=4.31.1
  • requests>=2.22.0

Core Data Type

  • HowNetDict: the HowNet dictionary class, which encapsulates core functions such as HowNet data retrieval, presentation, and similarity calculation.
  • Sense: the class that encapsulates information about a concept in HowNet, mainly its Chinese and English words, POS, and sememe-based definition.
  • Sememe: the class that encapsulates information about a sememe in HowNet, including the Chinese and English words describing it, its frequency in HowNet, and its relations to other sememes.

Basic Usage

The following code snippets illustrate some basic functions of the OpenHowNet API. You can also download this Jupyter Notebook to run the code. For more functions and details, please refer to our documentation.

Initialization

import OpenHowNet
hownet_dict = OpenHowNet.HowNetDict()

An error will occur if you have not downloaded the HowNet data. In that case, run OpenHowNet.download() first.

Get Concepts Represented by a Word

By default, the API searches HowNet for all the concepts (senses) represented by the given word (in English or Chinese) and returns a list of Sense instances. You can also specify the language to reduce search time. If the given word does not exist in HowNet, the API returns an empty list.

>>> # Get all the senses represented by the word "苹果".
>>> result_list = hownet_dict.get_sense("苹果")
>>> print("The number of retrievals: ", len(result_list))
The number of retrievals:  8
 
>>> print("An example of retrievals: ", result_list)
An example of retrievals:  [No.244401|apple|苹果, No.244402|malus pumila|苹果, No.244403|orchard apple tree|苹果, No.244396|apple|苹果, No.244397|apple|苹果, No.244398|IPHONE|苹果, No.244399|apple|苹果, No.244400|iphone|苹果]

You can access detailed information about a sense through its Sense instance.

>>> sense_example = result_list[0]
>>> print("Sense example:", sense_example)
Sense example: No.244401|apple|苹果
>>> print("Sense id: ",sense_example.No)
Sense id:  000000244401
>>> print("English word in the sense: ", sense_example.en_word)
English word in the sense:  apple
>>> print("Chinese word in the sense: ", sense_example.zh_word)
Chinese word in the sense:  苹果
>>> print("HowNet Def of the sense: ", sense_example.Def)
HowNet Def of the sense:  {tree|:{reproduce|生殖:PatientProduct={fruit|水果},agent={~}}}
>>> print("Sememe list of the sense: ", sense_example.get_sememe_list())
Sememe list of the sense:  {fruit|水果, tree|, reproduce|生殖}

You can visualize the structured sememe-based definition of a sense (the "sememe tree"):

>>> sense_example.visualize_sememe_tree()
[sense]No.244401|apple|苹果
└── [None]tree|
    └── [agent]reproduce|生殖
        └── [PatientProduct]fruit|水果
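To show how this printout is laid out, here is a minimal plain-Python sketch that renders a sememe tree in the same [role]name box-drawing style. It assumes the nested-dictionary layout that display='dict' returns (keys 'role', 'name' and 'children', as shown further below); render_tree is a hypothetical helper, not OpenHowNet's own code (the library lists anytree among its requirements and may render trees differently).

```python
def render_tree(node, prefix="", is_last=True, is_root=True):
    """Return the lines of a dict-form sememe tree in '[role]name' style."""
    label = "[{}]{}".format(node["role"], node["name"])
    if is_root:
        lines, child_prefix = [label], ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + label]
        # Keep drawing the vertical bar only while more siblings follow.
        child_prefix = prefix + ("    " if is_last else "│   ")
    children = node.get("children", [])  # leaf nodes may omit the key
    for i, child in enumerate(children):
        lines += render_tree(child, child_prefix, i == len(children) - 1, False)
    return lines

# Hand-written copy of the tree printed above (not loaded from HowNet).
apple_tree = {"role": "sense", "name": "No.244401|apple|苹果", "children": [
    {"role": "None", "name": "tree|", "children": [
        {"role": "agent", "name": "reproduce|生殖", "children": [
            {"role": "PatientProduct", "name": "fruit|水果"}]}]}]}

print("\n".join(render_tree(apple_tree)))
```

The prefix is carried down the recursion so that each level knows whether its ancestors still have siblings below them, which is what decides between "│   " and blank indentation.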

Get All Words and Sememes in HowNet

The package provides APIs to get all the senses, words, and sememes in HowNet.

>>> all_senses = hownet_dict.get_all_senses()
>>> print("The number of all senses: {}".format(len(all_senses)))
The number of all senses: 237974
  
>>> zh_word_list = hownet_dict.get_zh_words()
>>> print("Chinese words in HowNet: ",zh_word_list[:30])
Chinese words in HowNet:  ['', '"', '#', '#号标签', '$', '$.J.', '$A.', '$NZ.', '%', "'", '(', ')', '*', '+', ',', '-', '--', '.', '...', '...为止', '...也同样使然', '...以上', '...以内', '...以来', '...何如', '...内', '...出什么问题', '...发生了什么', '...发生故障', '...家里有几口人']

>>> en_word_list = hownet_dict.get_en_words()
>>> print("English words in HowNet: ",en_word_list[:30])
English words in HowNet:  ['A', 'An', 'Frenchmen', 'Frenchwomen', 'Ottomans', 'a', 'aardwolves', 'abaci', 'abandoned', 'abbreviated', 'abode', 'aboideaux', 'aboiteaux', 'abscissae', 'absorbed', 'acanthi', 'acari', 'accepted', 'acciaccature', 'acclaimed', 'accommodating', 'accompanied', 'accounting', 'accused', 'acetabula', 'acetified', 'aching', 'acicula', 'acini', 'acquired']

>>> all_sememes = hownet_dict.get_all_sememes()
>>> print('There are {} sememes in HowNet'.format(len(all_sememes)))
There are 2540 sememes in HowNet

Get Sememes of a Word

You can retrieve the sememe-based definitions of the senses represented by a given word. By default, the package retrieves all the senses represented by the word and returns a separate sememe list for each.

>>> hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=False, expanded_layer=-1, K=None)
[{'sense': No.244396|apple|苹果,
  'sememes': {PatternValue|样式值, SpeBrand|特定牌子, able|, bring|携带, computer|电脑}},
 {'sense': No.244397|apple|苹果, 
  'sememes': {fruit|水果}},
 {'sense': No.244398|IPHONE|苹果,
  'sememes': {PatternValue|样式值, SpeBrand|特定牌子, able|, bring|携带, communicate|交流, tool|用具}},
 {'sense': No.244399|apple|苹果,
  'sememes': {PatternValue|样式值, SpeBrand|特定牌子, able|, bring|携带, communicate|交流, tool|用具}},
 {'sense': No.244400|iphone|苹果,
  'sememes': {PatternValue|样式值, SpeBrand|特定牌子, able|, bring|携带, communicate|交流, tool|用具}},
 {'sense': No.244401|apple|苹果, 
  'sememes': {fruit|水果, reproduce|生殖, tree|}},
 {'sense': No.244402|malus pumila|苹果,
  'sememes': {fruit|水果, reproduce|生殖, tree|}},
 {'sense': No.244403|orchard apple tree|苹果,
  'sememes': {fruit|水果, reproduce|生殖, tree|}}]

By changing display, the sememes of a sense can be returned in list form ('list'), dictionary form ('dict'), tree-node form ('tree'), or visualization form ('visual').

# Get the sememes in the form of dictionary
>>> hownet_dict.get_sememes_by_word(word='苹果',display='dict')[0]
{'sense': No.244396|apple|苹果, 'sememes': {'role': 'sense', 'name': No.244396|apple|苹果, 'children': [{'role': 'None', 'name': computer|电脑, 'children': [{'role': 'modifier', 'name': PatternValue|样式值, 'children': [{'role': 'CoEvent', 'name': able|, 'children': [{'role': 'scope', 'name': bring|携带, 'children': [{'role': 'patient', 'name': '$'}]}]}]}, {'role': 'patient', 'name': SpeBrand|特定牌子}]}]}}

# Get the sememes in the form of tree node (get the root node of the sememe tree)
>>> hownet_dict.get_sememes_by_word(word='苹果',display='tree')[0]
{'sense': No.244396|apple|苹果, 'sememes': Node('/No.244396|apple|苹果', role='sense')}

# Visualize the sememes (set K to control the number of sememe trees to print)
>>> hownet_dict.get_sememes_by_word(word='苹果',display='visual',K=2)
Find 8 result(s)
Display #0 sememe tree
[sense]No.244396|apple|苹果
└── [None]computer|电脑
    ├── [modifier]PatternValue|样式值
    │   └── [CoEvent]able|
    │       └── [scope]bring|携带
    │           └── [patient]$
    └── [patient]SpeBrand|特定牌子

Display #1 sememe tree
[sense]No.244397|apple|苹果
└── [None]fruit|水果

In addition, when display='list', you can merge all the sememe lists into one with merge=True, and limit how many layers of the sememe trees are expanded with the expanded_layer parameter (-1 means expanding all layers).

>>> hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=True, expanded_layer=-1, K=None)
{PatternValue|样式值, SpeBrand|特定牌子, able|, bring|携带, communicate|交流, computer|电脑, fruit|水果,
 reproduce|生殖, tool|用具, tree|}
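To make merge and expanded_layer concrete, here is a hedged sketch that flattens dict-form sememe trees (the layout returned by display='dict') into one set of sememe names. It assumes, for illustration only, that layer 1 is the top sememe directly under the sense node and that -1 means no limit; OpenHowNet's own layer counting may differ. The "$" placeholder is skipped because the list output above omits it. collect_sememes and merge_sememes are hypothetical helpers.

```python
def collect_sememes(node, expanded_layer=-1, depth=0):
    """Collect sememe names below `node`, down to `expanded_layer` levels."""
    found = set()
    for child in node.get("children", []):
        if expanded_layer != -1 and depth + 1 > expanded_layer:
            break  # children of this node would exceed the layer limit
        if child["name"] != "$":  # "$" is a placeholder, not a sememe
            found.add(child["name"])
        found |= collect_sememes(child, expanded_layer, depth + 1)
    return found

def merge_sememes(trees, expanded_layer=-1):
    """Union of the sememes of several sense trees (the merge=True idea)."""
    merged = set()
    for tree in trees:
        merged |= collect_sememes(tree, expanded_layer)
    return merged

# Hand-copied from the display='dict' output above.
computer_tree = {"role": "sense", "name": "No.244396|apple|苹果", "children": [
    {"role": "None", "name": "computer|电脑", "children": [
        {"role": "modifier", "name": "PatternValue|样式值", "children": [
            {"role": "CoEvent", "name": "able|", "children": [
                {"role": "scope", "name": "bring|携带", "children": [
                    {"role": "patient", "name": "$"}]}]}]},
        {"role": "patient", "name": "SpeBrand|特定牌子"}]}]}
fruit_tree = {"role": "sense", "name": "No.244397|apple|苹果",
              "children": [{"role": "None", "name": "fruit|水果"}]}

print(sorted(collect_sememes(computer_tree, expanded_layer=2)))
print(sorted(merge_sememes([computer_tree, fruit_tree])))
```

With no layer limit, the first tree yields the same five sememes as the list output for sense No.244396 above.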

Get Relationship Between Two Sememes

You can get the relationship between two sememes by inputting the words (English or Chinese) that represent them. Set return_triples=True to get (sememe1, relation, sememe2) triples instead.

>>> relations = hownet_dict.get_sememe_relation('FormValue','圆', return_triples=False)
>>> print(relations)
'hyponym'

>>> triples = hownet_dict.get_sememe_relation('FormValue','圆', return_triples=True)
>>> print(triples)
[(FormValue|形状值, 'hyponym', round|圆)]

Get Related Sememes with a Sememe

You can search for all the sememes that have a given relation with a sememe. As above, the sememe is specified by a word (English or Chinese), but the relation must be given in lowercase English.

>>> triples = hownet_dict.get_related_sememes('FormValue', relation = 'hyponym',return_triples=True)
>>> print(triples)
[(FormValue|形状值, 'hyponym', round|圆), (FormValue|形状值, 'hyponym', unformed|不成形), (AppearanceValue|外观值, 'hyponym', FormValue|形状值), (FormValue|形状值, 'hyponym', angular|), (FormValue|形状值, 'hyponym', square|), (FormValue|形状值, 'hyponym', netlike|), (FormValue|形状值, 'hyponym', formed|成形)]
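The returned triples contain the query sememe as either head or tail: here FormValue appears as the head of most triples and as the tail of (AppearanceValue, 'hyponym', FormValue). A small sketch of separating the two directions, with plain strings standing in for the Sememe objects the real API returns (an assumption made so the example runs without HowNet data); related is a hypothetical helper.

```python
def related(triples, sememe, relation=None):
    """Triples in which `sememe` is head or tail, optionally filtered by relation."""
    return [(h, r, t) for h, r, t in triples
            if sememe in (h, t) and (relation is None or r == relation)]

# A subset of the triples printed above, as plain strings.
triples = [
    ("FormValue|形状值", "hyponym", "round|圆"),
    ("FormValue|形状值", "hyponym", "unformed|不成形"),
    ("AppearanceValue|外观值", "hyponym", "FormValue|形状值"),
    ("FormValue|形状值", "hyponym", "formed|成形"),
]

query = "FormValue|形状值"
hits = related(triples, query, "hyponym")
# (X, 'hyponym', Y) reads "Y is a hyponym of X", so the tails are narrower.
narrower = [t for h, _, t in hits if h == query]
broader = [h for h, _, t in hits if t == query]
print(narrower, broader)
```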

Advanced Features

1: Sememe-based Word Similarity and Similar Words

The implementation is based on the paper:

Jiangming Liu, Jinan Xu, Yujie Zhang. An Approach of Hybrid Hierarchical Structure for Word Similarity Computing by HowNet. In Proceedings of IJCNLP 2013. [pdf]

Extra Initialization

Because similarity calculation requires loading additional files, initialization takes longer than before.

To begin with, you can initialize the hownet_dict object as follows:

>>> hownet_dict_advanced = OpenHowNet.HowNetDict(init_sim=True)
Initializing OpenHowNet succeeded!
Initializing similarity calculation succeeded!

You can also postpone the initialization of similarity calculation until it is needed.

>>> hownet_dict.initialize_similarity_calculation()
Initializing similarity calculation succeeded!

Get senses that have exactly the same sememes

You can get the senses that have the same sememe-based definition as a given sense.

>>> s = hownet_dict_advanced.get_sense('苹果')[0]
>>> hownet_dict_advanced.get_sense_synonyms(s)[:10]
[No.110999|pear|山梨, No.111007|hawthorn|山楂, No.111009|haw|山楂树, No.111010|hawthorn|山楂树, No.111268|Chinese hawthorn|山里红, No.122955|Pistacia vera|开心果树, No.122956|pistachio|开心果树, No.122957|pistachio tree|开心果树, No.135467|almond tree|扁桃, No.154699|fig|无花果]
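Conceptually, finding senses with the same sememe-based definition amounts to indexing senses by their DEF and returning the other members of the same bucket. A minimal sketch over hypothetical (word, DEF) data, assuming an exact string match on DEF; the library may compare definitions differently, and build_def_index and same_definition are illustrative names only.

```python
from collections import defaultdict

# Hypothetical sample data: sense id -> (English word, DEF string).
# The shared DEF is copied from the apple example earlier on this page.
TREE_DEF = "{tree|:{reproduce|生殖:PatientProduct={fruit|水果},agent={~}}}"
senses = {
    "244401": ("apple", TREE_DEF),
    "110999": ("pear", TREE_DEF),
    "111007": ("hawthorn", TREE_DEF),
    "244397": ("apple", "{fruit|水果}"),
}

def build_def_index(senses):
    """Map each DEF string to the ids of all senses that use it."""
    index = defaultdict(list)
    for sense_id, (_, definition) in senses.items():
        index[definition].append(sense_id)
    return index

def same_definition(senses, index, sense_id):
    """Words of the other senses whose DEF is identical to that of sense_id."""
    definition = senses[sense_id][1]
    return [senses[other][0] for other in index[definition] if other != sense_id]

index = build_def_index(senses)
print(same_definition(senses, index, "244401"))
```

Building the index once makes every later lookup a single dictionary access instead of a scan over all senses.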
Get top-K nearest words for a word

The package searches for the senses represented by the given word, obtains the top-K nearest senses for each, and returns the corresponding words. Note that the language of the given word must be specified.

You can also set the POS of words, choose to output the similarity, merge the words belonging to different senses into a single list, etc. Please see the documentation for more information.

If the input word is not in HowNet, the API returns an empty list.

>>> hownet_dict_advanced.get_nearest_words('苹果', language='zh',K=5)
{No.244396|apple|苹果: ['IBM', '东芝', '华为', '戴尔', '索尼'],
 No.244397|apple|苹果: ['丑橘', '乌梅', '五敛子', '凤梨', '刺梨'],
 No.244398|IPHONE|苹果: ['OPPO', '华为', '苹果', '智能手机', '彩笔'],
 No.244399|apple|苹果: ['OPPO', '华为', '苹果', '智能手机', '彩笔'],
 No.244400|iphone|苹果: ['OPPO', '华为', '苹果', '智能手机', '彩笔'],
 No.244401|apple|苹果: ['山梨', '山楂', '山楂树', '山里红', '开心果树'],
 No.244402|malus pumila|苹果: ['山梨', '山楂', '山楂树', '山里红', '开心果树'],
 No.244403|orchard apple tree|苹果: ['山梨', '山楂', '山楂树', '山里红', '开心果树']}
>>> hownet_dict_advanced.get_nearest_words('苹果', language='zh',K=5, merge=True)
['IBM', '东芝', '华为', '戴尔', '索尼']
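Once a sense-to-sense similarity is available, picking the K nearest candidates for each sense is a standard top-K selection. The sketch below assumes precomputed scores in plain dictionaries (made-up numbers, not HowNet output; in OpenHowNet the scores come from the sememe-based measure above) and uses heapq.nlargest; top_k_words and nearest_per_sense are hypothetical helpers.

```python
import heapq

def top_k_words(scores, k):
    """Return the k candidate words with the highest similarity, best first."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [word for word, _ in best]

def nearest_per_sense(sense_scores, k):
    """Apply top_k_words to every sense, mirroring the per-sense dict above."""
    return {sense: top_k_words(scores, k) for sense, scores in sense_scores.items()}

# Hypothetical scores for two senses of 苹果.
sense_scores = {
    "fruit sense": {"山梨": 0.98, "山楂": 0.95, "IBM": 0.10, "凤梨": 0.90},
    "brand sense": {"IBM": 0.97, "山梨": 0.05, "华为": 0.96},
}
print(nearest_per_sense(sense_scores, 2))
```

heapq.nlargest avoids sorting the full candidate list, which matters when a sense is compared against every sense in HowNet.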
Calculate the similarity between two words

If either of the two given words does not exist in HowNet, the method returns -1.

>>> print('The similarity of 苹果 and 梨 is {}.'.format(hownet_dict_advanced.calculate_word_similarity('苹果','梨')))
The similarity of 苹果 and 梨 is 1.0.
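The -1 convention is easy to mirror in your own code. A common way to lift sense-level similarity to word level is to take the maximum over all sense pairs; the sketch below assumes that convention (it is not a statement of how OpenHowNet combines scores) and, so that it runs on its own, swaps in a toy Jaccard overlap of sememe sets in place of the Liu et al. (2013) measure. The sememe sets in word_senses are hypothetical.

```python
from itertools import product

def jaccard(a, b):
    """Toy stand-in for sense similarity: overlap of two sememe sets.
    This is NOT the hybrid hierarchical measure of Liu et al. (2013)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def word_similarity(word_senses, w1, w2, sense_sim=jaccard):
    """Max similarity over all sense pairs; -1 if either word is unknown."""
    if not word_senses.get(w1) or not word_senses.get(w2):
        return -1
    return max(sense_sim(s1, s2)
               for s1, s2 in product(word_senses[w1], word_senses[w2]))

# Hypothetical word -> list of sememe sets (one set per sense).
word_senses = {
    "苹果": [{"fruit|水果"}, {"computer|电脑", "SpeBrand|特定牌子"}],
    "梨": [{"fruit|水果"}],
}
print(word_similarity(word_senses, "苹果", "梨"))
print(word_similarity(word_senses, "苹果", "香蕉"))
```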

2: BabelNet Synset Dictionary

This package integrates query functions for synsets in BabelNet (BabelNet synsets). BabelNet is a multilingual encyclopedic dictionary composed of BabelNet synsets, each of which contains multilingual synonyms with the same meaning. The following work annotates sememes for some BabelNet synsets, and the functions in this part are based on its annotation results.

Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets. Fanchao Qi, Liang Chang, Maosong Sun, Sicong Ouyang and Zhiyuan Liu. AAAI-20. [pdf] [code]

Extra Initialization

To begin with, you should initialize the BabelNet synset dictionary:

>>> hownet_dict.initialize_babelnet_dict()
Initializing BabelNet synset Dict succeeded!
# Or initialize it when creating the HowNetDict instance
>>> hownet_dict_advanced = OpenHowNet.HowNetDict(init_babel=True)
Initializing OpenHowNet succeeded!
Initializing BabelNet synset Dict succeeded!

BabelNet synset information

The following API allows you to query the rich information in a BabelNet synset (Chinese and English synonyms, definitions, image URLs, etc.).

>>> syn_list = hownet_dict_advanced.get_synset('黄色')
>>> print("{} results are retrieved and take the first one as an example".format(len(syn_list)))
3 results are retrieved and take the first one as an example

>>> syn_example = syn_list[0]
>>> print("Synset: {}".format(syn_example))
Synset: bn:00113968a|yellow|

>>> print("English synonyms: {}".format(syn_example.en_synonyms))
English synonyms: ['yellow', 'yellowish', 'xanthous']

>>> print("Chinese synonyms: {}".format(syn_example.zh_synonyms))
Chinese synonyms: ['黄', '黄色', '淡黄色+的', '黄色+的', '微黄色', '微黄色+的', '黄+的', '淡黄色']

>>> print("English glosses: {}".format(syn_example.en_glosses))
English glosses: ['Of the color intermediate between green and orange in the color spectrum; of something resembling the color of an egg yolk', 'Having the colour of a yolk, a lemon or gold.']

>>> print("Chinese glosses: {}".format(syn_example.zh_glosses))
Chinese glosses: ['像丝瓜花或向日葵花的颜色。']

BabelNet synset relations

You can get the BabelNet synsets related to a given synset.

>>> related_synsets = syn_example.get_related_synsets()
>>> print("There are {} synsets that have relation with the {}, they are: ".format(len(related_synsets), syn_example))
There are 6 synsets that have relation with the bn:00113968a|yellow|, they are: 

>>> print(related_synsets)
[bn:00099663a|chromatic|彩色, bn:00029925n|egg_yolk|蛋黄, bn:00092876v|resemble|相似, bn:00020726n|color|颜色, bn:00020748n|visible_spectrum|可见光, bn:00081866n|yellow|黄色]

Get sememe annotations of a BabelNet synset

You can get the sememes of BabelNet synsets by inputting a word contained in those synsets:

>>> print(hownet_dict_advanced.get_sememes_by_word_in_BabelNet('黄色'))
[{'synset': bn:00113968a|yellow|, 'sememes': [yellow|]}, {'synset': bn:00101430a|dirty|淫秽的, 'sememes': [lascivious|, dirty|, despicable|卑劣, BadSocial|坏风气]}, {'synset': bn:00081866n|yellow|黄色, 'sememes': [yellow|]}]

>>> print(hownet_dict_advanced.get_sememes_by_word_in_BabelNet('黄色',merge=True))
[lascivious|, despicable|卑劣, BadSocial|坏风气, dirty|, yellow|]

For more detailed instructions, please refer to the documentation.



openhownet's Issues

OpenHowNet.download() fails to download

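I called the API on a GPU server; the downloaded version is OpenHowNet 2.0.
It raises a ConnectionError. How can I fix this?
Is the /OpenHowNet/resources.zip resource available for download any other way? Thanks!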

Encoding error in hownet_dict.get_all_sememes() with Python 3.6.2 on Windows 10

UnicodeDecodeError Traceback (most recent call last)
in ()
----> 1 hownet_dict.get_all_sememes()

~\Anaconda3\envs\tensorflow\lib\site-packages\OpenHowNet\Standards.py in get_all_sememes(self)
247 package_directory = os.path.dirname(os.path.abspath(__file__))
248 f = get_resource("sememe_all.txt", 'r')
--> 249 buf = f.readlines()[0]
250 self.sememe_all = buf.strip().split()
251 return self.sememe_all

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa7 in position 22: illegal multibyte sequence

It only works correctly after adding an encoding parameter in the get_resource function.
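The fix suggested in this report amounts to opening the resource file with an explicit encoding instead of the platform default (GBK on Chinese-locale Windows, which is the codec named in the traceback). A minimal sketch, assuming the resource files are UTF-8 (the error suggests this, but the report does not state it); read_first_line is a hypothetical helper, not OpenHowNet's get_resource.

```python
import os
import tempfile

def read_first_line(path, encoding="utf-8"):
    """Read the first line of a text resource with an explicit encoding,
    so the result does not depend on the platform's default codec."""
    with open(path, "r", encoding=encoding) as f:
        return f.readline()

# Demo: write a small UTF-8 file containing Chinese text, then read it back.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("FormValue|形状值 round|圆\n")
    path = tmp.name

sememes = read_first_line(path).strip().split()
os.remove(path)
print(sememes)
```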

Code error: .get_sememes_by_word

If you run obj.get_sememes_by_word, you get a TypeError:
TypeError: unsupported operand type(s) for |=: 'set' and 'list'

res = set()
sememe_x = self.get_sememe(x, strict=strict)
for s_x in sememe_x:
    res |= s_x.get_senses()
return list(res)

You need to wrap s_x.get_senses() in set():

res = set()
sememe_x = self.get_sememe(x, strict=strict)
for s_x in sememe_x:
    res |= set(s_x.get_senses())
return list(res)
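The reported error is ordinary Python behaviour: the in-place |= operator on a set only accepts another set, so a list on the right-hand side raises TypeError. A self-contained reproduction, with a hypothetical get_senses stub standing in for the library method that, per the report, returns a list:

```python
def get_senses():
    """Hypothetical stub: like the method in the report, returns a list."""
    return ["No.244397|apple|苹果", "No.244401|apple|苹果"]

res = set()
error = None
try:
    res |= get_senses()          # fails: set |= list
except TypeError as exc:
    error = str(exc)

res |= set(get_senses())         # the fix proposed in the report
res.update(get_senses())         # equivalent: update() accepts any iterable
print(error)
print(sorted(res))
```

set.update() is an alternative to wrapping in set(), since it accepts any iterable and avoids building a temporary set.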

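A minor sememe annotation error

Hello, while cross-checking the Chinese and English sememe names, I found that the English sememe "Brunei", which corresponds to the Chinese sememe "文莱" (Brunei), is also paired with another Chinese sememe, "乌干达" (Uganda). I traced the source in the OpenHowNet/OpenHowNet/HowNet_dict.zip file (starting from line 2032729 of HowNet_original_new.txt):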

NO.=169394
W_C=乌干达先令
G_C=noun [2�unit] [wu1 gan1 da2 xian1 ling4]
S_C=
E_C=
W_E=New Uganda Shilling
G_E=noun [53Shilling�noun�-0�unit�00 ]
S_E=
E_E=
DEF={money|货币:belong="Brunei|乌干达",domain={commerce|商业}{finance|金融}}
RMK=

Brunei should be a proper noun referring specifically to 文莱 (Brunei).

GPU running

Hello, I have a large amount of data that I would like to query through OpenHowNet. Can it be deployed on a GPU? If so, could you briefly explain how? Thanks.

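The three files generated in the pickle folder are wrong

When I used the hownet.pkl file extracted into the pickle folder, I found that the sememe trees returned for some words were empty: they contained only the first sememe, and the children after it were empty. The similarity rankings for some words were also clearly wrong; for example, for "男人" (man), the returned words were ones like "伕", "伕役", and "俤".
Then I ran main.py in the submit_user folder to regenerate hownet.pkl and the other files; the sememe trees were no longer empty, and the similar words for "男人" were normal again.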

get_sense_synonyns typo

s = hownet_dict_advanced.get_sense('苹果')[0]
hownet_dict_advanced.get_sense_synonyns(s)[:10]
[No.110999|pear|山梨, No.111007|hawthorn|山楂, No.111009|haw|山楂树, No.111010|hawthorn|山楂树, No.111268|Chinese hawthorn|山里红, No.122955|Pistacia vera|开心果树, No.122956|pistachio|开心果树, No.122957|pistachio tree|开心果树, No.135467|almond tree|扁桃, No.154699|fig|无花果]

get_sense_synonyns should be get_sense_synonyms

Lots of duplicate data

The data needs cleaning: the output is repeated four times.

hownet_dict.visualize_sememe_trees("爱情", K=10)
Find 4 result(s)
Display #0 sememe tree
[sense]爱情
└── [None]emotion|情感
    └── [CoEvent]BeInLove|恋爱
Display #1 sememe tree
[sense]爱情
└── [None]emotion|情感
    └── [CoEvent]BeInLove|恋爱
Display #2 sememe tree
[sense]爱情
└── [None]emotion|情感
    └── [CoEvent]BeInLove|恋爱
Display #3 sememe tree
[sense]爱情
└── [None]emotion|情感
    └── [CoEvent]BeInLove|恋爱
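Until the data is cleaned upstream, duplicates like these can be filtered on the caller's side. A small order-preserving sketch, assuming each result can be reduced to a hashable key (here simply the rendered tree text); dedupe is a hypothetical helper.

```python
def dedupe(items, key=lambda item: item):
    """Drop repeated items while keeping the first occurrence and the order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique

# The four identical trees printed above, as strings.
tree_text = "[sense]爱情\n└── [None]emotion|情感\n    └── [CoEvent]BeInLove|恋爱"
results = [tree_text] * 4
print(len(dedupe(results)))
```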

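Classifying verbs with OpenHowNet

Dear teachers,

My research topic is how the prepositions "对" and "向", together with the verbs they collocate with, are used in Chinese and in learners' interlanguage. I have already extracted the verbs from the corpora; each corpus yields roughly 3,000 to 4,000 valid verbs. Next, I would like to classify these verbs by meaning and function.

Although there are not many verbs and I could classify them by hand, I want my research to involve less subjective human judgment and to rest on a more scientific method. Therefore, I would like to use the OpenHowNet verb data to classify the verbs semantically.

From what I have run in Python so far, I have only learned how to compare two words. I would like to ask how I can classify all my verbs in OpenHowNet in one go.

My NLP background is basically zero, so please don't laugh at me if this is a very silly question.

Thank you very much!

In addition, the download link for the HowNet core data seems to be broken at the moment.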

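How to query all English sememes?

Hello, the get_all_semems() provided by the API seems to return only the Chinese sememes. What should I do to get all the English sememes?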
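Sememes are displayed throughout this page as English|Chinese labels (for example FormValue|形状值 or able|). Assuming you have the labels as strings in that format (how you obtain them from the API is left open here), the English names can be split off; english_part is a hypothetical helper.

```python
def english_part(label):
    """Return the English half of an 'English|Chinese' sememe label."""
    return label.split("|", 1)[0]

labels = ["FormValue|形状值", "able|", "fruit|水果", "fruit|水果"]
english = sorted({english_part(label) for label in labels})
print(english)
```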

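Documentation of the sememe annotation format for HowNet words

So far, the only relevant explanation I have found is in a 2002 paper, 《基于《知网》的词汇语义相似度计算》 (Computing Lexical Semantic Similarity Based on HowNet), which feels somewhat outdated. The project's official website has no up-to-date documentation either.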

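Question about senses in OpenHowNet

Can OpenHowNet be used to look up the embedding for each sense? I found some learned results from the paper Improved Word Representation Learning with Sememes, but its senses and their sememes do not fully correspond to those in OpenHowNet. If I want to use the senses annotated in OpenHowNet, is there any way to obtain their embedding vectors?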

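Do words in HowNet have no pronunciation?

Could pinyin also be added to the words in HowNet? Generally, once the meaning of a word is fixed, its pronunciation is fixed as well, and knowing the pronunciation would help with other tasks too.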

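HowNet version question

Hello, is the HowNet here the 2012 version? How much does it differ from the 2011 version? A paper I read used the 2011 version, and its word IDs are inconsistent with the ones in this HowNet.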

Sememe parsing

How can sememes be mapped to PartPosition, domain, whole, host, and modifier?

Synonym extraction based on similarity

Hi, I am pretty interested in looking into the synonym extractions based on the sememe tree similarity using HowNet. I am wondering whether you or the original authors of the HowNet have benchmarked this method on some standard similarity evaluation dataset such as the SimLex-999 dataset and compared this method with some other popular methods for synonym extractions such as counter-fitting word embeddings. It would be great to have your thoughts on this topic. Thanks a lot!

Code error: all_senses = hownet_dict.get_all_sense()

The method on hownet_dict is 'get_all_senses', not 'get_all_sense'.
The full list of attributes is as follows:
['_HowNetDict__gen_sememe_list', '_HowNetDict__get_words_list_by_rule', '_HowNetDict__sense_similarity', 'calculate_word_similarity', 'en_map', 'get_all_babel_synsets', 'get_all_sememe_relations', 'get_all_sememes', 'get_all_sense_pos', 'get_all_senses', 'get_all_synset_pos', 'get_all_synset_relations', 'get_en_words', 'get_nearest_words', 'get_related_sememes', 'get_related_synsets', 'get_sememe', 'get_sememe_relation', 'get_sememes_by_word', 'get_sememes_by_word_in_BabelNet', 'get_sense', 'get_sense_synonyms', 'get_senses_by_sememe', 'get_synset', 'get_synset_relation', 'get_zh_words', 'has', 'initialize_babelnet_dict', 'initialize_similarity_calculation', 'sememe_dic', 'sense_dic', 'zh_map']

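Can OpenHowNet only be used for research, not commercially?

Hello, when I looked into HowNet earlier, I found that its licence prohibits commercial use, so I did not continue studying it.

I now see that OpenHowNet is under the MIT Licence. Does that mean OpenHowNet can be used to develop commercial software?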

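About the role of sememes

The result returned by the API call hownet_dict.get_sememes_by_word("苹果",structured=True)[0]["tree"] contains a "role" key. Does it describe the relation between sememes? Is the set of roles closed, and is there an API for directly querying the role between sememes?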
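In the DEF strings shown on this page, a role appears as a name followed by "=" (for example PatientProduct={fruit|水果}, agent={~}, belong="Brunei|乌干达"). Assuming that convention holds, a regular expression can list the role names used in a definition. This is a rough sketch, not a full DEF parser: it only lists role names, ignores the unlabeled top sememe (shown as [None] in the tree printouts), and says nothing about which node each role attaches to in the rendered tree. ROLE_RE and roles_in_def are hypothetical names.

```python
import re

# A role name is a run of ASCII letters immediately followed by "=".
ROLE_RE = re.compile(r"([A-Za-z]+)=")

def roles_in_def(definition):
    """List role names in a DEF string, in order of appearance."""
    return ROLE_RE.findall(definition)

apple_def = "{tree|:{reproduce|生殖:PatientProduct={fruit|水果},agent={~}}}"
shilling_def = '{money|货币:belong="Brunei|乌干达",domain={commerce|商业}{finance|金融}}'
print(roles_in_def(apple_def))
print(roles_in_def(shilling_def))
```

Collecting roles_in_def over every DEF in the dictionary would give the set of role names that actually occur in the data.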
