Giter Club home page Giter Club logo

NTU NLP Lab's Projects

lifelog-vidlife icon lifelog-vidlife

VidLife contains personal life events with triple forms from The Big Bang Theory, eg. (Leonard, visit, Penny), which is designed for training and evaluating personal life event extraction systems. You could download part of the life events annotations, which is released in this repository. The complete dataset will be made available online after our paper is accepted.

lifelog-vislife icon lifelog-vislife

Recently, people tend to record their daily life via filming Video Weblog (VLog), which contains visual and audio data. These large scale multimodal data can be used to support information recall service that enables users to query their past experiences. To this end, we construct a visual lifelogging dataset for investigating the issues of personal life event extraction from vlogs shared on YouTube and constructing a personal knowledge base (PKB) for individuals. There are 1,733 videos from three selected YouTubers ranging from 2016 to 2019. The videos we crawled are all about traveling.

mobile01-corpus icon mobile01-corpus

Introduction Datasets are indispensable for the study of opinion spam/spammer detection. However, to prepare such a kind of dataset is more difficult than that for the other types of spam detection tasks such as email spams and web spams due to the subtle nature of opinion spams. This website provides a dataset for a real case, Samsung probed in Taiwan over ‘fake web reviews’, reported by BBC on 16 April 2013. It can be used to study the behaviors of opinion spammers, their interactions in terms of first posts and replies, and the detection tasks.

nl2kb icon nl2kb

A total of 7,139 Chinese relation patterns that cover 1,087 DBpedia properties are extracted and verified by human annotators. This resource can be used for knowledge base construction and knowledge base retrieval (e.g., question-answering).

ntu-chinese-causal-corpus icon ntu-chinese-causal-corpus

A Chinese causal corpus containing 1,314 pairs of arguments based on the Chinese Discourse Treebank (CDTB) by Li et al. (2014).

ntu-irony-corpus icon ntu-irony-corpus

The NTU Irony Corpus consists of more than 1,000 microblog messages collected from the Plurk website. All the messages in the corpus are in Traditional Chinese and have been confirmed to be ironic. They are marked with three types of labels: (1) ironic word/phrase , (2) context, and (3) rhetoric element.

ntusd icon ntusd

Sentiment words are employed to compute the tendency of a sentence, and then a document. To detect sentiment words in Chinese documents, a Chinese sentiment dictionary is indispensable. However, a small dictionary may suffer from the problem of coverage. A method to learn sentiment words and their strengths from multiple resources is developed in this task.

prrca icon prrca

Peer Review and Rebuttal Counter-Arguments Dataset

seen icon seen

SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance

tw-eh icon tw-eh

Learning to Generate Explanation from e-Hospital Services for Medical Suggestion

wsd-gensense icon wsd-gensense

With the aid of recently proposed word embedding algorithms, the study of semantic similarity has progressed and advanced rapidly. However, many natural language processing tasks need sense level representation. To address this issue, some researches propose sense embedding learning algorithms. In this paper, we present a generalized model from existing sense retrofitting model. The generalization takes three major components: semantic relations between the senses, the relation strengths and the semantic strengths. In the experiment, we show that the generalized model can outperform previous approaches in three types of experiment: semantic relatedness, contextual word similarity and semantic difference.

wsd-msd-1030 icon wsd-msd-1030

A word similarity dataset with high proportion of multi-sense words that is designed to facilitate more reliable evaluations of sense embeddings.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.