ntunlplab
Name: NTU NLP Lab
Type: Organization
Bio: Natural Language Processing Laboratory, National Taiwan University
Location: Taiwan
VidLife contains personal life events in triple form from The Big Bang Theory, e.g., (Leonard, visit, Penny), and is designed for training and evaluating personal life event extraction systems. Part of the life event annotations is released in this repository and can be downloaded. The complete dataset will be made available online after our paper is accepted.
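As an illustration of the triple form described above, here is a minimal Python sketch that parses a string such as "(Leonard, visit, Penny)" into a structured record. The `LifeEvent` type, its field names, and the `parse_triple` helper are hypothetical and chosen only for this example; the actual annotation file format in the repository may differ.

```python
from typing import NamedTuple


class LifeEvent(NamedTuple):
    """Hypothetical container for one (subject, predicate, object) life event."""
    subject: str
    predicate: str
    obj: str


def parse_triple(text: str) -> LifeEvent:
    """Parse a string like "(Leonard, visit, Penny)" into a LifeEvent.

    Assumes a simple comma-separated layout with no commas inside fields.
    """
    parts = [p.strip() for p in text.strip().strip("()").split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {text!r}")
    return LifeEvent(*parts)


event = parse_triple("(Leonard, visit, Penny)")
print(event.subject, event.predicate, event.obj)
```

Keeping each event as a typed record makes it straightforward to group events by subject or predicate when evaluating an extraction system.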
Recently, people have tended to record their daily lives by filming video weblogs (vlogs), which contain visual and audio data. This large-scale multimodal data can support information recall services that enable users to query their past experiences. To this end, we construct a visual lifelogging dataset for investigating personal life event extraction from vlogs shared on YouTube and for constructing a personal knowledge base (PKB) for individuals. The dataset contains 1,733 videos from three selected YouTubers, ranging from 2016 to 2019. All of the crawled videos are about traveling.
Datasets are indispensable for the study of opinion spam and spammer detection. However, preparing such a dataset is more difficult than for other types of spam detection tasks, such as email spam and web spam, because of the subtle nature of opinion spam. This website provides a dataset for a real case, "Samsung probed in Taiwan over 'fake web reviews'", reported by the BBC on 16 April 2013. It can be used to study the behaviors of opinion spammers, their interactions in terms of first posts and replies, and the detection tasks.
A total of 7,139 Chinese relation patterns that cover 1,087 DBpedia properties are extracted and verified by human annotators. This resource can be used for knowledge base construction and knowledge base retrieval (e.g., question-answering).
A Chinese causal corpus containing 1,314 pairs of arguments based on the Chinese Discourse Treebank (CDTB) by Li et al. (2014).
A rule-based English tense predictor based on the output of a dependency parser such as Stanford CoreNLP.
The NTU Irony Corpus consists of more than 1,000 microblog messages collected from the Plurk website. All the messages in the corpus are in Traditional Chinese and have been confirmed to be ironic. They are marked with three types of labels: (1) ironic word/phrase, (2) context, and (3) rhetoric element.
Provides images for the NTU AI Center shared platform.
Sentiment words are used to compute the sentiment tendency of a sentence, and then of a document. To detect sentiment words in Chinese documents, a Chinese sentiment dictionary is indispensable. However, a small dictionary may suffer from limited coverage. This task develops a method to learn sentiment words and their strengths from multiple resources.
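To make the idea of computing a sentence's and then a document's tendency from sentiment words concrete, here is a minimal dictionary-lookup sketch in Python. It is not the method developed in this repository: the toy `LEXICON`, its strength values, the summing and averaging rules, and the function names are all assumptions made for illustration, and the input is assumed to be already word-segmented.

```python
from typing import Dict, List

# Toy sentiment dictionary (hypothetical entries and strengths, not from the repository).
LEXICON: Dict[str, float] = {
    "好": 1.0,     # good
    "喜歡": 2.0,   # like
    "差": -1.0,    # poor
    "討厭": -2.0,  # dislike
}


def sentence_score(tokens: List[str], lexicon: Dict[str, float] = LEXICON) -> float:
    """Sum the strengths of the sentiment words found in one segmented sentence."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def document_score(sentences: List[List[str]], lexicon: Dict[str, float] = LEXICON) -> float:
    """Average the sentence scores; an empty document scores 0.0."""
    if not sentences:
        return 0.0
    return sum(sentence_score(s, lexicon) for s in sentences) / len(sentences)


doc = [["我", "喜歡", "這", "部", "電影"], ["劇情", "差"]]
print(sentence_score(doc[0]))
print(document_score(doc))
```

Words missing from the dictionary contribute nothing to the score, which is exactly the coverage problem a larger learned dictionary is meant to reduce.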
Peer Review and Rebuttal Counter-Arguments Dataset
SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance
A Traditional-Chinese instruction-following model with datasets based on Alpaca.
Learning to Generate Explanation from e-Hospital Services for Medical Suggestion
With the aid of recently proposed word embedding algorithms, the study of semantic similarity has advanced rapidly. However, many natural language processing tasks need sense-level representations. To address this issue, some studies propose sense embedding learning algorithms. In this paper, we present a generalization of an existing sense retrofitting model. The generalization has three major components: semantic relations between the senses, the relation strengths, and the semantic strengths. In the experiments, we show that the generalized model can outperform previous approaches on three types of tasks: semantic relatedness, contextual word similarity, and semantic difference.
A word similarity dataset with a high proportion of multi-sense words, designed to facilitate more reliable evaluations of sense embeddings.