thunlp / openhownet


Core Data of HowNet and OpenHowNet Python API

Home Page: https://openhownet.thunlp.org/

License: MIT License

Python 75.38% Jupyter Notebook 24.62%

sememe hownet openhownet knowledge-base semantics nlp

openhownet's Introduction


The OpenHowNet API, developed by THUNLP, provides a convenient way to search HowNet, display sememe trees, calculate word similarity via sememes, and more. You can also visit our website to search for and browse the sememes of words online.

If you use any data or API provided by OpenHowNet in your research, please cite the following paper:

@article{qi2019openhownet,
    title={OpenHowNet: An Open Sememe-based Lexical Knowledge Base},
    author={Qi, Fanchao and Yang, Chenghao and Liu, Zhiyuan and Dong, Qiang and Sun, Maosong and Dong, Zhendong},
    journal={arXiv preprint arXiv:1901.09957},
    year={2019},
}

Introduction to HowNet

HowNet is the most typical sememe knowledge base. A sememe is defined as the minimum semantic unit in linguistics, and some linguists believe that the meanings of all words in any language can be represented by a limited set of sememes. Mr Zhendong Dong and his son Qiang Dong put this idea into practice, and spent almost 30 years building HowNet, which predefines about 2,000 sememes and uses them to annotate over 200,000 senses of English and Chinese words.

Since HowNet was constructed, it has been widely utilized in various NLP tasks. You can refer to this paper list to take a look at all the HowNet-related studies.

HowNet Dictionary

The HowNet core data file (the HowNet dictionary, which can be downloaded here) consists of 237,973 concepts (or senses) represented by Chinese and English words and phrases. Each concept in HowNet is annotated with a sememe-based definition, a POS tag, a sentiment orientation, example sentences, etc. Here is an example of how concepts are annotated in HowNet:

NO.=000000026417 	# Concept ID
W_C=不惜 	# Chinese word
G_C=verb 	# POS tag of the Chinese word
S_C=PlusFeeling|正面情感 	# Sentiment orientation
E_C=~牺牲业余时间，~付出全部精力，~出卖自己的灵魂 	# Example sentences of the Chinese word
W_E=do not hesitate to 	# English word 
G_E=verb 	# POS tag of the English word
S_E=PlusFeeling|正面情感 	# Sentiment orientation
E_E=               	# Example sentences of the English word
DEF={willing|愿意} 	# Sememe-based definition
RMK=
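
The record layout above is a series of KEY=VALUE lines, so it can be read with a few lines of plain Python. The following is a minimal sketch, not part of the OpenHowNet API: `parse_hownet_record` is a hypothetical helper, and it assumes that the trailing "# ..." notes are explanatory annotations that may or may not be present in a given record.

```python
import re

# One record in the KEY=VALUE layout shown above (one line keeps its annotation).
SAMPLE = """\
NO.=000000026417
W_C=不惜
G_C=verb
S_C=PlusFeeling|正面情感
W_E=do not hesitate to \t# English word
G_E=verb
DEF={willing|愿意}
RMK=
"""


def parse_hownet_record(text):
    """Parse one record into a dict mapping field name to value (hypothetical helper)."""
    record = {}
    for raw in text.splitlines():
        # Assumption: a '#' surrounded by whitespace starts an explanatory note.
        line = re.split(r"\s+#\s", raw, maxsplit=1)[0].strip()
        if "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first '=' only, because DEF values may contain '='.
        key, value = line.split("=", 1)
        record[key.strip()] = value.strip()
    return record


record = parse_hownet_record(SAMPLE)
```

Splitting on the first `=` only matters for the DEF field, whose value can itself contain `=` between a role name and a sememe.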

OpenHowNet API

Installation

You can choose either of the following two methods to install OpenHowNet API.

Installation via pip (recommended)

pip install OpenHowNet

Installation via GitHub

git clone https://github.com/thunlp/OpenHowNet/
cd OpenHowNet
python setup.py install

Requirements

Python>=3.6
anytree>=2.4.3
tqdm>=4.31.1
requests>=2.22.0

Core Data Type

HowNetDict: The HowNet dictionary class, which encapsulates the core functions such as HowNet core data retrieval, presentation, similarity calculation, etc.
Sense: The class that encapsulates the information of concepts in HowNet, mainly including Chinese and English words, POS, sememe-based definition, etc.
Sememe: The class that encapsulates the information of sememes in HowNet, including Chinese and English words describing a sememe, frequency of a sememe in HowNet, and the relationship between sememes.

Basic Usage

The following code snippets illustrate some basic functions of OpenHowNet API. You can also download this Jupyter Notebook to run the code. For more functions and detailed information, please turn to our documentation.

Initialization

import OpenHowNet
hownet_dict = OpenHowNet.HowNetDict()

An error will occur if you haven't downloaded the HowNet data. In this case you need to run OpenHowNet.download() first.
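
The "initialize, and download first if needed" flow described above can be wrapped in a small helper. This is only a sketch: `load_or_download` is a hypothetical name, and because the exact exception type raised for missing data is not stated here, it catches `Exception` broadly, which is an assumption.

```python
def load_or_download(factory, download):
    """Build the dictionary; if that fails, download the data once and retry.

    factory  -- zero-argument callable that builds the dictionary
    download -- zero-argument callable that fetches the data files
    """
    try:
        return factory()
    except Exception:  # assumption: the precise exception type is not documented here
        download()
        return factory()


# Intended use with the names from this README (requires the package installed):
#   import OpenHowNet
#   hownet_dict = load_or_download(OpenHowNet.HowNetDict, OpenHowNet.download)
```

Passing the two callables in, rather than importing the package inside the helper, keeps the retry logic easy to exercise without the data files.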

Get Concepts Represented by a Word

By default, the API searches HowNet for all the concepts (senses) represented by the given word (English or Chinese) and returns a list of Sense instances. You can also specify the language to reduce search time. If the given word does not exist in HowNet, the API returns an empty list.

>>> # Get all the senses represented by the word "苹果".
>>> result_list = hownet_dict.get_sense("苹果")
>>> print("The number of retrievals: ", len(result_list))
The number of retrievals:  8
 
>>> print("An example of retrievals: ", result_list)
An example of retrievals:  [No.244401|apple|苹果, No.244402|malus pumila|苹果, No.244403|orchard apple tree|苹果, No.244396|apple|苹果, No.244397|apple|苹果, No.244398|IPHONE|苹果, No.244399|apple|苹果, No.244400|iphone|苹果]

You can get the detailed information of a sense from the Sense instance.

>>> sense_example = result_list[0]
>>> print("Sense example:", sense_example)
Sense example: No.244401|apple|苹果
>>> print("Sense id: ",sense_example.No)
Sense id:  000000244401
>>> print("English word in the sense: ", sense_example.en_word)
English word in the sense:  apple
>>> print("Chinese word in the sense: ", sense_example.zh_word)
Chinese word in the sense:  苹果
>>> print("HowNet Def of the sense: ", sense_example.Def)
HowNet Def of the sense:  {tree|树:{reproduce|生殖:PatientProduct={fruit|水果},agent={~}}}
>>> print("Sememe list of the sense: ", sense_example.get_sememe_list())
Sememe list of the sense:  {fruit|水果, tree|树, reproduce|生殖}

You can visualize the structured sememe-based definition of a sense (namely the "sememe tree"):

>>> sense_example.visualize_sememe_tree()
[sense]No.244401|apple|苹果
└── [None]tree|树
    └── [agent]reproduce|生殖
        └── [PatientProduct]fruit|水果

Get All Words and Sememes in HowNet

The package provides APIs to get all the senses, words and sememes in HowNet.

>>> all_senses = hownet_dict.get_all_senses()
>>> print("The number of all senses: {}".format(len(all_senses)))
The number of all senses: 237974
  
>>> zh_word_list = hownet_dict.get_zh_words()
>>> print("Chinese words in HowNet: ",zh_word_list[:30])
Chinese words in HowNet:  ['', '"', '#', '#号标签', '$', '$.J.', '$A.', '$NZ.', '%', "'", '(', ')', '*', '+', ',', '-', '--', '.', '...', '...为止', '...也同样使然', '...以上', '...以内', '...以来', '...何如', '...内', '...出什么问题', '...发生了什么', '...发生故障', '...家里有几口人']

>>> en_word_list = hownet_dict.get_en_words()
>>> print("English words in HowNet: ",en_word_list[:30])
English words in HowNet:  ['A', 'An', 'Frenchmen', 'Frenchwomen', 'Ottomans', 'a', 'aardwolves', 'abaci', 'abandoned', 'abbreviated', 'abode', 'aboideaux', 'aboiteaux', 'abscissae', 'absorbed', 'acanthi', 'acari', 'accepted', 'acciaccature', 'acclaimed', 'accommodating', 'accompanied', 'accounting', 'accused', 'acetabula', 'acetified', 'aching', 'acicula', 'acini', 'acquired']

>>> all_sememes = hownet_dict.get_all_sememes()
>>> print('There are {} sememes in HowNet'.format(len(all_sememes)))
There are 2540 sememes in HowNet

Get Sememes of a Word

You can retrieve sememe-based definitions of the senses represented by the given word. By default, the package will retrieve all the senses represented by the word and return their sememe list separately.

>>> hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=False, expanded_layer=-1, K=None)
[{'sense': No.244396|apple|苹果,
  'sememes': {PatternValue|样式值, SpeBrand|特定牌子, able|能, bring|携带, computer|电脑}},
 {'sense': No.244397|apple|苹果, 
  'sememes': {fruit|水果}},
 {'sense': No.244398|IPHONE|苹果,
  'sememes': {PatternValue|样式值, SpeBrand|特定牌子, able|能, bring|携带, communicate|交流, tool|用具}},
 {'sense': No.244399|apple|苹果,
  'sememes': {PatternValue|样式值, SpeBrand|特定牌子, able|能, bring|携带, communicate|交流, tool|用具}},
 {'sense': No.244400|iphone|苹果,
  'sememes': {PatternValue|样式值, SpeBrand|特定牌子, able|能, bring|携带, communicate|交流, tool|用具}},
 {'sense': No.244401|apple|苹果, 
  'sememes': {fruit|水果, reproduce|生殖, tree|树}},
 {'sense': No.244402|malus pumila|苹果,
  'sememes': {fruit|水果, reproduce|生殖, tree|树}},
 {'sense': No.244403|orchard apple tree|苹果,
  'sememes': {fruit|水果, reproduce|生殖, tree|树}}]

By changing display, you can show the sememes of a sense in list form (list), dictionary form (dict), tree node form (tree), or visualization form (visual).

# Get the sememes in the form of dictionary
>>> hownet_dict.get_sememes_by_word(word='苹果',display='dict')[0]
{'sense': No.244396|apple|苹果, 'sememes': {'role': 'sense', 'name': No.244396|apple|苹果, 'children': [{'role': 'None', 'name': computer|电脑, 'children': [{'role': 'modifier', 'name': PatternValue|样式值, 'children': [{'role': 'CoEvent', 'name': able|能, 'children': [{'role': 'scope', 'name': bring|携带, 'children': [{'role': 'patient', 'name': '$'}]}]}]}, {'role': 'patient', 'name': SpeBrand|特定牌子}]}]}}

# Get the sememes in the form of tree node (get the root node of the sememe tree)
>>> hownet_dict.get_sememes_by_word(word='苹果',display='tree')[0]
{'sense': No.244396|apple|苹果, 'sememes': Node('/No.244396|apple|苹果', role='sense')}

# Visualize the sememes (Set K to control the num of visualized tree to print)
>>> hownet_dict.get_sememes_by_word(word='苹果',display='visual',K=2)
Find 8 result(s)
Display #0 sememe tree
[sense]No.244396|apple|苹果
└── [None]computer|电脑
    ├── [modifier]PatternValue|样式值
    │   └── [CoEvent]able|能
    │       └── [scope]bring|携带
    │           └── [patient]$
    └── [patient]SpeBrand|特定牌子

Display #1 sememe tree
[sense]No.244397|apple|苹果
└── [None]fruit|水果

In addition, when display=='list', you can merge all the sememe lists into one by setting merge=True, and limit how many layers of the sememe trees are expanded with the parameter expanded_layer (-1 means expanding all layers).

>>> hownet_dict.get_sememes_by_word(word = '苹果', display='list', merge=True, expanded_layer=-1, K=None)
{PatternValue|样式值, SpeBrand|特定牌子, able|能, bring|携带, communicate|交流, computer|电脑, fruit|水果,
 reproduce|生殖, tool|用具, tree|树}
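
Conceptually, the merged result is the union of the per-sense sememe sets. The sketch below illustrates that idea on plain strings; `merge_sememes` is a hypothetical helper and the sample data is a shortened stand-in for the per-sense output shown earlier, not the package's internal implementation.

```python
def merge_sememes(per_sense):
    """Union the 'sememes' collections of several per-sense entries into one set."""
    merged = set()
    for entry in per_sense:
        # set() makes the union work whether 'sememes' is a list or a set.
        merged |= set(entry["sememes"])
    return merged


# Shortened stand-in for the per-sense result, using strings instead of Sememe objects.
per_sense = [
    {"sense": "No.244397|apple|苹果", "sememes": ["fruit|水果"]},
    {"sense": "No.244401|apple|苹果", "sememes": ["fruit|水果", "reproduce|生殖", "tree|树"]},
]
merged = merge_sememes(per_sense)
```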

Get Relationship Between Two Sememes

You can get the relationship between two sememes by inputting the words (English or Chinese) that represent the sememes. You can choose to show the triplets of (sememe1, relation, sememe2).

>>> relations = hownet_dict.get_sememe_relation('FormValue','圆', return_triples=False)
>>> print(relations)
'hyponym'

>>> triples = hownet_dict.get_sememe_relation('FormValue','圆', return_triples=True)
>>> print(triples)
[(FormValue|形状值, 'hyponym', round|圆)]

Get Sememes Related to a Sememe

You can search all the sememes that have a certain relation with a sememe. Similarly, a sememe should be represented by a word (English or Chinese), but the relation must be in lowercase English.

>>> triples = hownet_dict.get_related_sememes('FormValue', relation = 'hyponym',return_triples=True)
>>> print(triples)
[(FormValue|形状值, 'hyponym', round|圆), (FormValue|形状值, 'hyponym', unformed|不成形), (AppearanceValue|外观值, 'hyponym', FormValue|形状值), (FormValue|形状值, 'hyponym', angular|角), (FormValue|形状值, 'hyponym', square|方), (FormValue|形状值, 'hyponym', netlike|网), (FormValue|形状值, 'hyponym', formed|成形)]
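
Note that the output above contains triples in both directions: in most of them FormValue is the first element, but in one it is the last. If you need to tell these apart, the triples can be split by position. This is a small sketch on plain string tuples; `split_by_direction` is a hypothetical helper, and real results hold Sememe objects rather than strings.

```python
def split_by_direction(triples, sememe):
    """Separate triples where `sememe` is the first element from those where it is the last."""
    as_head = [t for t in triples if t[0] == sememe]
    as_tail = [t for t in triples if t[2] == sememe]
    return as_head, as_tail


# String stand-ins for a few of the triples shown above.
TRIPLES = [
    ("FormValue|形状值", "hyponym", "round|圆"),
    ("AppearanceValue|外观值", "hyponym", "FormValue|形状值"),
    ("FormValue|形状值", "hyponym", "square|方"),
]
as_head, as_tail = split_by_direction(TRIPLES, "FormValue|形状值")
```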

Advanced Features

1: Sememe-based Word Similarity and Similar Words

The implementation is based on the paper:

Jiangming Liu, Jinan Xu, Yujie Zhang. An Approach of Hybrid Hierarchical Structure for Word Similarity Computing by HowNet. In Proceedings of IJCNLP 2013. [pdf]

Extra Initialization

Similarity calculation requires loading additional files, so initialization takes longer than the basic setup.

To begin with, you can initialize the hownet_dict object as follows:

>>> hownet_dict_advanced = OpenHowNet.HowNetDict(init_sim=True)
Initializing OpenHowNet succeeded!
Initializing similarity calculation succeeded!

You can also postpone the initialization of similarity calculation until use.

>>> hownet_dict.initialize_similarity_calculation()
Initializing similarity calculation succeeded!

Get senses that have exactly the same sememes

You can get the senses that have the same sememe-based definition as a given sense.

>>> s = hownet_dict_advanced.get_sense('苹果')[0]
>>> hownet_dict_advanced.get_sense_synonyms(s)[:10]
[No.110999|pear|山梨, No.111007|hawthorn|山楂, No.111009|haw|山楂树, No.111010|hawthorn|山楂树, No.111268|Chinese hawthorn|山里红, No.122955|Pistacia vera|开心果树, No.122956|pistachio|开心果树, No.122957|pistachio tree|开心果树, No.135467|almond tree|扁桃, No.154699|fig|无花果]

Get top-K nearest words for a word

The package searches for the senses represented by the given word, finds the top-K nearest senses, and returns the corresponding words. Note that the language of the given word must be set.

You can also set the POS of words, choose to output the similarity, merge all words belonging to different senses into a single list, etc. Please see the documentation for more information.

If the input word is not in HowNet, the API returns an empty list.

>>> hownet_dict_advanced.get_nearest_words('苹果', language='zh',K=5)
{No.244396|apple|苹果: ['IBM', '东芝', '华为', '戴尔', '索尼'],
 No.244397|apple|苹果: ['丑橘', '乌梅', '五敛子', '凤梨', '刺梨'],
 No.244398|IPHONE|苹果: ['OPPO', '华为', '苹果', '智能手机', '彩笔'],
 No.244399|apple|苹果: ['OPPO', '华为', '苹果', '智能手机', '彩笔'],
 No.244400|iphone|苹果: ['OPPO', '华为', '苹果', '智能手机', '彩笔'],
 No.244401|apple|苹果: ['山梨', '山楂', '山楂树', '山里红', '开心果树'],
 No.244402|malus pumila|苹果: ['山梨', '山楂', '山楂树', '山里红', '开心果树'],
 No.244403|orchard apple tree|苹果: ['山梨', '山楂', '山楂树', '山里红', '开心果树']}
>>> hownet_dict_advanced.get_nearest_words('苹果', language='zh',K=5, merge=True)
['IBM', '东芝', '华为', '戴尔', '索尼']

Calculate the similarity between two words

If either of the two given words does not exist in HowNet, it will return -1.

>>> print('The similarity of 苹果 and 梨 is {}.'.format(hownet_dict_advanced.calculate_word_similarity('苹果','梨')))
The similarity of 苹果 and 梨 is 1.0.
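
Because -1 is used as the "word not found" marker, it is easy to mix it up with a real score when averaging or sorting. One way to guard against that is to convert it to None right away. This is a sketch only: `similarity_or_none` is a hypothetical wrapper that takes the similarity function as an argument.

```python
def similarity_or_none(similarity_fn, word1, word2):
    """Call a word-similarity function and map the -1 'not found' marker to None."""
    score = similarity_fn(word1, word2)
    return None if score == -1 else score


# Intended use with the names from this README (requires the package and data):
#   similarity_or_none(hownet_dict_advanced.calculate_word_similarity, '苹果', '梨')
```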

2: BabelNet Synset Dictionary

This package integrates query functions for synsets in BabelNet (BabelNet synsets). BabelNet is a multilingual encyclopedic dictionary composed of BabelNet synsets, each of which contains multilingual synonyms that share the same meaning. The following work annotates sememes for some BabelNet synsets, and the functions in this part are based on its annotation results.

Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets. Fanchao Qi, Liang Chang, Maosong Sun, Sicong Ouyang and Zhiyuan Liu. AAAI-20. [pdf] [code]

Extra Initialization

To begin with, you should initialize the BabelNet synset dictionary:

>>> hownet_dict.initialize_babelnet_dict()
Initializing BabelNet synset Dict succeeded!
# Or you can initialize it when creating the HowNetDict instance
>>> hownet_dict_advanced = OpenHowNet.HowNetDict(init_babel=True)
Initializing OpenHowNet succeeded!
Initializing BabelNet synset Dict succeeded!

BabelNet synset information

The following API allows you to query the rich information in a BabelNet synset (Chinese and English synonyms, definitions, picture urls, etc.).

>>> syn_list = hownet_dict_advanced.get_synset('黄色')
>>> print("{} results are retrieved and take the first one as an example".format(len(syn_list)))
3 results are retrieved and take the first one as an example

>>> syn_example = syn_list[0]
>>> print("Synset: {}".format(syn_example))
Synset: bn:00113968a|yellow|黄

>>> print("English synonyms: {}".format(syn_example.en_synonyms))
English synonyms: ['yellow', 'yellowish', 'xanthous']

>>> print("Chinese synonyms: {}".format(syn_example.zh_synonyms))
Chinese synonyms: ['黄', '黄色', '淡黄色+的', '黄色+的', '微黄色', '微黄色+的', '黄+的', '淡黄色']

>>> print("English glosses: {}".format(syn_example.en_glosses))
English glosses: ['Of the color intermediate between green and orange in the color spectrum; of something resembling the color of an egg yolk', 'Having the colour of a yolk, a lemon or gold.']

>>> print("Chinese glosses: {}".format(syn_example.zh_glosses))
Chinese glosses: ['像丝瓜花或向日葵花的颜色。']

BabelNet synset relations

You can get the BabelNet synsets related to a given synset.

>>> related_synsets = syn_example.get_related_synsets()
>>> print("There are {} synsets that have relation with the {}, they are: ".format(len(related_synsets), syn_example))
There are 6 synsets that have relation with the bn:00113968a|yellow|黄, they are: 

>>> print(related_synsets)
[bn:00099663a|chromatic|彩色, bn:00029925n|egg_yolk|蛋黄, bn:00092876v|resemble|相似, bn:00020726n|color|颜色, bn:00020748n|visible_spectrum|可见光, bn:00081866n|yellow|黄色]

Get sememe annotations of a BabelNet synset

You can get the sememes of BabelNet synsets by inputting the word in the BabelNet synsets:

>>> print(hownet_dict_advanced.get_sememes_by_word_in_BabelNet('黄色'))
[{'synset': bn:00113968a|yellow|黄, 'sememes': [yellow|黄]}, {'synset': bn:00101430a|dirty|淫秽的, 'sememes': [lascivious|淫, dirty|龊, despicable|卑劣, BadSocial|坏风气]}, {'synset': bn:00081866n|yellow|黄色, 'sememes': [yellow|黄]}]

>>> print(hownet_dict_advanced.get_sememes_by_word_in_BabelNet('黄色',merge=True))
[lascivious|淫, despicable|卑劣, BadSocial|坏风气, dirty|龊, yellow|黄]

For more detailed instructions, please refer to the documentation.



openhownet's Issues

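OpenHowNet.download() fails

I am calling the API on a GPU server. The installed version is OpenHowNet 2.0.
It raises ConnectionError. How can this be resolved?
Is the resource /OpenHowNet/resources.zip available through any other download channel? Thanks!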

hownet_dict.get_all_sememes() raises an encoding error on Python 3.6.2, Windows 10

UnicodeDecodeError Traceback (most recent call last)
in ()
----> 1 hownet_dict.get_all_sememes()

~\Anaconda3\envs\tensorflow\lib\site-packages\OpenHowNet\Standards.py in get_all_sememes(self)
247 package_directory = os.path.dirname(os.path.abspath(__file__))
248 f = get_resource("sememe_all.txt", 'r')
--> 249 buf = f.readlines()[0]
250 self.sememe_all = buf.strip().split()
251 return self.sememe_all

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa7 in position 22: illegal multibyte sequence

It only works after adding the encoding parameter inside the get_resource function.
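
The traceback shows the file being decoded with the 'gbk' codec, which is what Python's open() falls back to on a Chinese-locale Windows system when no encoding is given. Passing the encoding explicitly avoids the dependence on the platform default. The sketch below demonstrates this on a temporary file; `read_first_line` is a hypothetical helper, and the assumption that the resource file is UTF-8 encoded is not confirmed here.

```python
import os
import tempfile


def read_first_line(path, encoding="utf-8"):
    """Read the first line of a text file with an explicit encoding."""
    # Without encoding=..., open() uses the locale's preferred encoding.
    with open(path, "r", encoding=encoding) as f:
        return f.readline().strip()


# Demonstration on a throwaway UTF-8 file containing non-ASCII sememe names.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sememe_demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("fruit|水果 tree|树\n")
    first_line = read_first_line(demo_path)

sememe_names = first_line.split()
```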

Code error: .get_sememes_by_word

If you run 'obj.get_sememes_by_word', you get a TypeError:
TypeError: unsupported operand type(s) for |=: 'set' and 'list'

The code that raises it:
res = set()
sememe_x = self.get_sememe(x, strict=strict)
for s_x in sememe_x:
    res |= s_x.get_senses()
return list(res)

You need to wrap s_x.get_senses() in set():
res = set()
sememe_x = self.get_sememe(x, strict=strict)
for s_x in sememe_x:
    res |= set(s_x.get_senses())
return list(res)

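A minor sememe annotation error

Hello, while aligning Chinese and English sememes I found that the English sememe "Brunei", which already corresponds to the Chinese sememe "文莱" (Brunei), is also paired with another Chinese sememe, "乌干达" (Uganda). I traced the source in the file OpenHowNet/OpenHowNet/HowNet_dict.zip (starting at line 2032729 of HowNet_original_new.txt):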

NO.=169394
W_C=乌干达先令
G_C=noun [2�unit] [wu1 gan1 da2 xian1 ling4]
S_C=
E_C=
W_E=New Uganda Shilling
G_E=noun [53Shilling�noun�-0�unit�00 ]
S_E=
E_E=
DEF={money|货币:belong="Brunei|乌干达",domain={commerce|商业}{finance|金融}}
RMK=

Brunei should be a proper noun that refers only to Brunei (文莱).

GPU running

Hello, I have a large amount of data that I would like to query through OpenHowNet. Can it be deployed on a GPU? If so, could you briefly explain how? Thanks.

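The three files generated in the pickle folder are wrong

When using the hownet.pkl file extracted into the pickle folder, I found that the sememe trees returned for some words are empty: they contain only the first sememe, and the children below it are empty. The similarity ranking for some words is also clearly wrong. For example, for "男人" (man), the returned words are ones such as "伕", "伕役", and "俤".
I then ran main.py in the submit_user folder to regenerate hownet.pkl and the other files. After that, the sememe trees were no longer empty and the similar words for "男人" were normal.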

sememe_sim_table.pkl

How is this similarity table constructed?

TypeError: get_sememes_by_word() got an unexpected keyword argument 'structured'

Hello,
When running the code in data_util.py of SememeWSD, I get an error at File "data_util.py", line 44, in gen_sem_dict tree = hownet_dict.get_sememes_by_word(word,structured=True,lang='zh',merge= False): TypeError: get_sememes_by_word() got an unexpected keyword argument 'structured'. I am not sure whether this is a HowNet problem.

get_sense_synonyns typo

s = hownet_dict_advanced.get_sense('苹果')[0]
hownet_dict_advanced.get_sense_synonyns(s)[:10]
[No.110999|pear|山梨, No.111007|hawthorn|山楂, No.111009|haw|山楂树, No.111010|hawthorn|山楂树, No.111268|Chinese hawthorn|山里红, No.122955|Pistacia vera|开心果树, No.122956|pistachio|开心果树, No.122957|pistachio tree|开心果树, No.135467|almond tree|扁桃, No.154699|fig|无花果]

get_sense_synonyns should be get_sense_synonyms

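Running OpenHowNet.download() in a Python 3.6 script raises "AssertionError"

When my Python 3.6 script reaches the line OpenHowNet.download(), the progress bar for downloading the data appears, and then an "AssertionError" is raised. Why does this happen?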

Large amount of duplicated data

The data needs cleaning; the output is repeated four times.

hownet_dict.visualize_sememe_trees("爱情", K=10)
Find 4 result(s)
Display #0 sememe tree
[sense]爱情
└── [None]emotion|情感
    └── [CoEvent]BeInLove|恋爱
Display #1 sememe tree
[sense]爱情
└── [None]emotion|情感
    └── [CoEvent]BeInLove|恋爱
Display #2 sememe tree
[sense]爱情
└── [None]emotion|情感
    └── [CoEvent]BeInLove|恋爱
Display #3 sememe tree
[sense]爱情
└── [None]emotion|情感
    └── [CoEvent]BeInLove|恋爱

Cannot download the HowNet core data

https://openhownet.thunlp.org/verified_download
Why does downloading the HowNet core data fail after verification on the OpenHowNet website?

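Using OpenHowNet to classify verbs

Hello everyone,

My research topic is the use of the prepositions "对" and "向" and the verbs they collocate with, in Chinese and in interlanguage. So far I have extracted the verbs from my corpora; each corpus yields roughly 3,000 to 4,000 valid verbs. Next, I would like to classify the verbs by meaning and function.

The number of verbs is not large and I could classify them by hand, but I want to reduce subjective human intervention in my research and ground it in a more scientific method. I would therefore like to use the OpenHowNet verb data to classify the verbs semantically.

From what I have run in Python so far, I have only learned how to compare two words. How can I classify my set of verbs with OpenHowNet in one pass?

I have almost no background in NLP, so please bear with me if this is a very silly question.

Thank you very much!

In addition, the download link for the HowNet core data currently seems to be invalid.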

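How to query all English sememes?

Hello, the get_all_semems() provided by the API seems to return only the Chinese sememes. How should I go about getting all the English sememes?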

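Documentation of the sememe annotation format for HowNet words

So far I have only found a relevant explanation in a 2002 paper, 《基于《知网》的词汇语义相似度计算》 (Word Semantic Similarity Computation Based on HowNet). It feels somewhat outdated. The project website has no up-to-date documentation either.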

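Why is the dictionary empty after initialization?

I tried on both Linux and Windows, and the data does not seem to be loaded. I followed the documentation. Is there anything I should watch out for?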

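Word similarity calculation shows "not annotated" for English input

I tried to compute similarity with the calculate_word_similarity function, but when the input is English words it always reports that the English word is not annotated. How can this be resolved?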

Core data download link is missing; the HowNet official website is unreachable

Why has everything stopped working? What went wrong?

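Question about senses in OpenHowNet

Is it possible to use OpenHowNet to look up the embedding for each sense? I found some learned results from the paper Improved Word Representation Learning with Sememes, but the number of senses and their sememes do not fully match OpenHowNet. If I want to use the senses annotated in OpenHowNet, is there any way to obtain their embedding vectors?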

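Do words in HowNet have no phonetic annotation?

Could pinyin also be added to the words in HowNet? Generally, once the meaning is fixed, the pronunciation of the word is fixed too. Knowing the pronunciation would also help us with other tasks.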

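HowNet version question

Hello, is this the 2012 version of HowNet? How much does it differ from the 2011 version? A paper I read uses the 2011 version, and its word IDs are inconsistent with those in this HowNet.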

Sememe parsing

How can sememes be mapped to PartPosition, domain, whole, host, and modifier?

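Could the latest hypernym-hyponym relations be open-sourced?

Are the hypernym-hyponym relations in HowNet open-sourced, including those for entities, events, and attributes?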

Synonym extraction based on similarity

Hi, I am pretty interested in looking into the synonym extractions based on the sememe tree similarity using HowNet. I am wondering whether you or the original authors of the HowNet have benchmarked this method on some standard similarity evaluation dataset such as the SimLex-999 dataset and compared this method with some other popular methods for synonym extractions such as counter-fitting word embeddings. It would be great to have your thoughts on this topic. Thanks a lot!

Code error: all_senses = hownet_dict.get_all_sense()

The method on hownet_dict is 'get_all_senses', not 'get_all_sense'.
All available attributes are as follows:
['_HowNetDict__gen_sememe_list', '_HowNetDict__get_words_list_by_rule', '_HowNetDict__sense_similarity', 'calculate_word_similarity', 'en_map', 'get_all_babel_synsets', 'get_all_sememe_relations', 'get_all_sememes', 'get_all_sense_pos', 'get_all_senses', 'get_all_synset_pos', 'get_all_synset_relations', 'get_en_words', 'get_nearest_words', 'get_related_sememes', 'get_related_synsets', 'get_sememe', 'get_sememe_relation', 'get_sememes_by_word', 'get_sememes_by_word_in_BabelNet', 'get_sense', 'get_sense_synonyms', 'get_senses_by_sememe', 'get_synset', 'get_synset_relation', 'get_zh_words', 'has', 'initialize_babelnet_dict', 'initialize_similarity_calculation', 'sememe_dic', 'sense_dic', 'zh_map']
