ntunlplab Goto Github PK

Name: NTU NLP Lab

Type: Organization

Bio: Natural Language Processing Laboratory, National Taiwan University

Location: Taiwan

NTU NLP Lab's Projects

VidLife contains personal life events in triple form from The Big Bang Theory, e.g., (Leonard, visit, Penny), and is designed for training and evaluating personal life event extraction systems. Part of the life event annotations is released in this repository and can be downloaded. The complete dataset will be made available online after our paper is accepted.
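As a rough illustration of the triple form mentioned above, the following Python sketch parses a string such as "(Leonard, visit, Penny)" into a small record. The `LifeEvent` type, its field names, and the `parse_triple` helper are assumptions made for this example; the actual file format of the released annotations is not described here and may differ.

```python
from typing import NamedTuple


class LifeEvent(NamedTuple):
    """Hypothetical container for one (subject, predicate, object) life event."""
    subject: str
    predicate: str
    obj: str


def parse_triple(text: str) -> LifeEvent:
    """Parse a string like "(Leonard, visit, Penny)" into a LifeEvent."""
    # Drop surrounding whitespace and parentheses, then split on commas.
    parts = [p.strip() for p in text.strip().strip("()").split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected exactly three fields, got {len(parts)}: {text!r}")
    return LifeEvent(*parts)


event = parse_triple("(Leonard, visit, Penny)")
print(event.subject, event.predicate, event.obj)
```

A named tuple keeps each event immutable and hashable, so events can be stored in sets or used as dictionary keys when building a personal knowledge base.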

lifelog-vislife

People increasingly record their daily lives by filming video weblogs (vlogs), which contain visual and audio data. These large-scale multimodal data can support information recall services that enable users to query their past experiences. To this end, we construct a visual lifelogging dataset for investigating personal life event extraction from vlogs shared on YouTube and for constructing a personal knowledge base (PKB) for individuals. The dataset contains 1,733 videos from three selected YouTubers, ranging from 2016 to 2019. All the crawled videos are about traveling.

mobile01-corpus

Datasets are indispensable for the study of opinion spam and spammer detection. However, preparing such a dataset is more difficult than for other types of spam detection, such as email spam and web spam, because of the subtle nature of opinion spam. This website provides a dataset for a real case, "Samsung probed in Taiwan over ‘fake web reviews’", reported by the BBC on 16 April 2013. It can be used to study the behavior of opinion spammers, their interactions in terms of first posts and replies, and the related detection tasks.

nl2kb

A total of 7,139 Chinese relation patterns that cover 1,087 DBpedia properties are extracted and verified by human annotators. This resource can be used for knowledge base construction and knowledge base retrieval (e.g., question-answering).

ntu-chinese-causal-corpus

A Chinese causal corpus containing 1,314 pairs of arguments based on the Chinese Discourse Treebank (CDTB) by Li et al. (2014).

ntu-english-tense-predictor

A rule-based English tense predictor that works on the output of a dependency parser such as Stanford CoreNLP.

ntu-irony-corpus

The NTU Irony Corpus consists of more than 1,000 microblog messages collected from the Plurk website. All the messages in the corpus are in Traditional Chinese and have been confirmed to be ironic. They are marked with three types of labels: (1) ironic word/phrase, (2) context, and (3) rhetorical element.

ntunlp-imagegallery

Provides images for the NTU AI Center shared platform.

ntusd

Sentiment words are used to compute the sentiment tendency of a sentence, and then of a document. To detect sentiment words in Chinese documents, a Chinese sentiment dictionary is indispensable. However, a small dictionary may suffer from limited coverage. This project develops a method to learn sentiment words and their strengths from multiple resources.
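To make the dictionary-based idea concrete, here is a minimal Python sketch that scores a sentence by averaging the strengths of its sentiment words and then scores a document by averaging its sentence scores. The toy lexicon, the English tokens, the averaging scheme, and the function names are all assumptions for illustration; they are not taken from NTUSD, and the project's actual scoring method may differ.

```python
from statistics import mean

# Hypothetical toy lexicon mapping a word to a strength in [-1, 1].
# These entries are invented for illustration and are not NTUSD data.
LEXICON = {"good": 0.8, "great": 1.0, "bad": -0.7, "terrible": -1.0}


def sentence_score(tokens):
    """Average the strengths of the sentiment words found in one sentence."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    # A sentence with no known sentiment words is treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


def document_score(sentences):
    """Average the sentence scores of a document (a list of token lists)."""
    return mean(sentence_score(s) for s in sentences) if sentences else 0.0


doc = [["the", "food", "was", "great"], ["service", "was", "bad"]]
print(document_score(doc))
```

Words missing from the lexicon contribute nothing to the score, which is why a small dictionary leads to the coverage problem described above.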

prrca

Peer Review and Rebuttal Counter-Arguments Dataset

seen

SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance

self-icl

traditional-chinese-alpaca

A Traditional-Chinese instruction-following model with datasets based on Alpaca.

tw-eh

Learning to Generate Explanation from e-Hospital Services for Medical Suggestion

wsd-gensense

With the aid of recently proposed word embedding algorithms, the study of semantic similarity has advanced rapidly. However, many natural language processing tasks need sense-level representations. To address this issue, some studies propose sense embedding learning algorithms. In this paper, we present a generalization of an existing sense retrofitting model. The generalization incorporates three major components: the semantic relations between senses, the relation strengths, and the semantic strengths. In the experiments, we show that the generalized model outperforms previous approaches on three types of tasks: semantic relatedness, contextual word similarity, and semantic difference.

wsd-msd-1030

A word similarity dataset with a high proportion of multi-sense words, designed to facilitate more reliable evaluation of sense embeddings.
