DiscoEval

This repository contains the code for DiscoEval, introduced in the paper "Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations" (EMNLP 2019).

The structure of this repo:

  • train: the training code
  • discoeval: the Discourse Evaluation framework
  • data: the DiscoEval evaluation datasets

The pretrained models with different training signals can be downloaded from https://drive.google.com/file/d/1I0wFkNb2fmoC7kcj-FPxyVkHKyfX-MoX/view?usp=sharing

The training data (generated from Wikipedia) can be downloaded from

https://drive.google.com/open?id=1WPRJylC7PzLtYcg8-PMNX_ZUNNRkO3Bp

Evaluation example code

train/discoeval_example.py

The code is tested under the following environment/versions:

  • Python 3.6.2
  • PyTorch 1.0.0
  • numpy 1.16.0

Some code in this repo is adapted from SentEval.

Experiments

Model        SP    BSO   DC    SSP   PDTB-E  PDTB-I  RST   AVG
baseline     47.3  63.8  61.0  77.8  36.5    39.1    56.7  54.6
SDT          45.8  62.9  60.3  78.0  36.6    39.1    55.7  54.1
SPP          48.4  65.3  60.2  78.4  38.1    39.9    56.4  55.2
NL           46.9  64.0  61.0  78.9  37.6    39.9    56.5  55.0
SPP + NL     48.5  64.7  59.9  78.9  37.8    40.5    56.7  55.3
SDT + NL     46.1  63.0  60.8  78.1  36.7    38.1    56.2  54.1
SPP + SDT    46.5  63.9  60.4  77.6  35.2    38.6    56.3  54.1
ALL          46.1  63.7  60.0  78.6  36.3    37.6    55.3  53.9
Skipthought  47.5  64.6  55.2  77.5  39.3    40.2    59.7  54.8
InferSent    45.8  62.9  56.3  62.2  37.3    38.8    52.3  50.8
DisSent      47.7  64.9  54.8  62.2  42.2    40.7    57.8  52.9
ELMo         47.8  65.6  60.7  79.0  41.3    41.8    57.5  56.2
BERT base    53.1  68.5  58.9  80.3  41.9    42.4    58.8  57.7
BERT large   53.8  69.3  59.6  80.4  44.3    43.6    59.1  58.6
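
The AVG column appears to be the unweighted mean of the seven task scores in each row. This is inferred from the numbers and is not stated in this README. A minimal Python check on two rows copied from the table (the helper name `task_average` is introduced here for illustration):

```python
# Rows copied from the table above, in the order:
# SP, BSO, DC, SSP, PDTB-E, PDTB-I, RST, AVG.
rows = {
    "baseline": [47.3, 63.8, 61.0, 77.8, 36.5, 39.1, 56.7, 54.6],
    "BERT base": [53.1, 68.5, 58.9, 80.3, 41.9, 42.4, 58.8, 57.7],
}


def task_average(scores):
    """Unweighted mean of the task scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


for name, values in rows.items():
    *task_scores, reported_avg = values
    # Print the recomputed mean next to the AVG value reported in the table.
    print(name, task_average(task_scores), reported_avg)
```

The same check can be applied to any other row by adding it to `rows`.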

You may notice some differences between the table above and the camera-ready version that appeared at EMNLP 2019. There are two changes: we removed the hidden states in SSP (previously 2000, by mistake), and we regenerated the SP dataset (previously the sentence order was shuffled; now the sentences are in their original order except for the first sentence).
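
As a rough illustration of the regenerated SP format, the sketch below moves one sentence of a window to the front and leaves the remaining sentences in their original order. The function name `make_sp_example`, the window size, and the use of the moved sentence's original index as the label are assumptions made for illustration; this is not the repository's data-generation code.

```python
import random


def make_sp_example(sentences, rng=random):
    """Move one randomly chosen sentence to the front; keep the rest in order.

    Returns (reordered_sentences, original_index_of_the_first_sentence).
    """
    k = rng.randrange(len(sentences))
    reordered = [sentences[k]] + sentences[:k] + sentences[k + 1:]
    return reordered, k


window = ["s0", "s1", "s2", "s3", "s4"]  # placeholder sentences
example, label = make_sp_example(window, random.Random(0))
print(example, label)
```

Passing a seeded `random.Random` instance keeps the generated examples reproducible.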

Reference

@inproceedings{mchen-discoeval-19,
  author    = {Mingda Chen and Zewei Chu and Kevin Gimpel},
  title     = {Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations},
  booktitle = {Proc. of {EMNLP}},
  year      = {2019}
}

Contributors

mingdachen, zeweichu

discoeval's Issues

Differences between the results produced by the released code and those reported in the paper

Thanks for releasing the code. When I ran the code in '\examples' that tests the BERT model, the results I got were somewhat different from those reported in your paper.
I got the following results:

Model      BSOarxiv  BSOroc  BSOwiki  SParxiv  SProc  SPwiki  DCchat  DCwiki  SSPabs  PDTB-E  PDTB-I  RST
BERT base  64.56     69.16   67.42    44.67    60.55  48.8    55.09   65.5    79.9    41.51   40.38   55.81

The environment I used is as follows: Python 3.7.10, torch 1.8.0+cu111
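
To set these per-dataset numbers against the per-task table in the README, one option is to average over the datasets belonging to each task. Whether the README table aggregates datasets this way is an assumption here, and the helper name `per_task_mean` is hypothetical. A minimal sketch:

```python
# Per-dataset accuracies copied from the issue above (BERT base).
reported = {
    "BSO": {"arxiv": 64.56, "roc": 69.16, "wiki": 67.42},
    "SP": {"arxiv": 44.67, "roc": 60.55, "wiki": 48.8},
    "DC": {"chat": 55.09, "wiki": 65.5},
}


def per_task_mean(scores_by_task):
    """Average each task's accuracy over its datasets (one decimal place)."""
    return {
        task: round(sum(by_dataset.values()) / len(by_dataset), 1)
        for task, by_dataset in scores_by_task.items()
    }


means = per_task_mean(reported)
for task, value in means.items():
    print(task, value)
```

The resulting per-task values can then be compared with the BERT base row of the README table.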

Clarification on NSP: differences between your code and the paper

Hi,

Thank you for releasing your code! I am trying to train an encoder with the sentence position task, and I noticed that your paper includes neighbor sentence prediction (NSP) as one of the training objectives. Could you point me to the NSP implementation in your code?

Also, I know that DiscoEval is designed as an evaluation toolkit for pre-trained encoders, but do you think it is possible to train an encoder from scratch using the DiscoEval framework?
