Sports Reporter

Python code for Learning to Select, Track, and Generate for Data-to-Text (Iso et al., ACL 2019).

Resources

Rotowire-modified dataset

Please refer to the rotowire-modified repo.

Usage

Dependencies

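  • The code was written for Python 3.X and requires DyNet.
  • Install the dependencies listed in requirements.txt (for example, with pip install -r requirements.txt).
  • To run the information extractor, you also need to install Torch; the extractor is invoked with the th command in the preprocessing step.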
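
A quick way to confirm the Python side of the environment is to check that the required packages can be found before starting. The sketch below is not part of the repository: `missing_modules` is a hypothetical helper, `dynet` is the import name of the DyNet Python package, and `nltk` is included only because the preprocessing one-liner imports it. Extend the list with whatever requirements.txt installs in your checkout.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The README asks for Python 3.X.
    if sys.version_info.major < 3:
        sys.exit("Python 3 is required")
    # Assumed list of modules to check; adjust to match requirements.txt.
    missing = missing_modules(["dynet", "nltk"])
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks Python packages; Torch is a separate install and is not covered here.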

Preprocessing

Before starting an experiment, run the provided setup.sh:

./setup.sh

After that, create the annotation file for the training data with the information extractor:

cd ./data2text-1
cat ../rotowire_v2/train.json | python -c 'import sys, json, nltk; print("\n".join(" ".join(nltk.word_tokenize(" ".join(x["summary"]))) for x in json.load(sys.stdin)))' > ../rotowire_v2/train_summary.txt
python data_utils.py -mode prep_gen_data -gen_fi ../rotowire_v2/train_summary.txt -dict_pfx "rotowire-modified-ie" -output_fi train_gold.h5 -input_path "../rotowire_v2" -train
th extractor.lua -gpuid 1 -datafile rotowire-modified-ie.h5 -preddata train_gold.h5 -dict_pfx "rotowire-modified-ie" -just_eval
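
The inline `python -c` command in the first step is dense, so here is the same logic as a small script, as a minimal sketch. It assumes, as the one-liner does, that the input JSON is a list of games whose "summary" field is a list of tokens. The function name `summaries_to_lines` and the script name used in the usage note are hypothetical, and the tokenizer is passed in as a parameter so the core function does not depend on NLTK.

```python
import json
import sys


def summaries_to_lines(games, tokenize):
    """Yield one re-tokenized, space-joined summary line per game.

    Mirrors the one-liner: join each game's "summary" tokens with spaces,
    tokenize the resulting string, and join the tokens with spaces again.
    """
    for game in games:
        yield " ".join(tokenize(" ".join(game["summary"])))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Imported here so the function above stays dependency-free.
    import nltk

    # The first argument is the path to the input JSON, e.g. train.json.
    with open(sys.argv[1], encoding="utf-8") as f:
        games = json.load(f)
    print("\n".join(summaries_to_lines(games, nltk.word_tokenize)))
```

Saved as, say, tokenize_summaries.py, it could be run as `python tokenize_summaries.py ../rotowire_v2/train.json > ../rotowire_v2/train_summary.txt` in place of the `cat ... | python -c ...` line.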

This produces the annotation file train_gold.h5-tuples.txt. Next, make the vocabulary file for training:

cd ..
VOCAB=<path to the vocabulary file>
python make_data.py ./rotowire_v2 ./data2text-1/train_gold.h5-tuples.txt $VOCAB

Train model

python reporter.py train $VOCAB --valid_file ./rotowire_v2/valid.json

Decode

MODEL=<path to the trained model file>
python reporter.py decode $VOCAB $MODEL ./rotowire_v2/test.json
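
If you prefer to drive training and decoding from Python rather than the shell, the two commands above can be wrapped as follows. This is only a sketch: the helper names `train_command`, `decode_command`, and `run` are hypothetical, and the argument order is copied from the commands shown here, not from reporter.py itself. Paths for the vocabulary and model files are left for you to supply, as in the shell version.

```python
import subprocess
import sys


def train_command(vocab, valid_file):
    """Argument list matching: python reporter.py train $VOCAB --valid_file <file>."""
    return [sys.executable, "reporter.py", "train", vocab, "--valid_file", valid_file]


def decode_command(vocab, model, test_file):
    """Argument list matching: python reporter.py decode $VOCAB $MODEL <file>."""
    return [sys.executable, "reporter.py", "decode", vocab, model, test_file]


def run(command):
    """Run one step; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)
```

For example, `run(train_command(vocab, "./rotowire_v2/valid.json"))` followed by `run(decode_command(vocab, model, "./rotowire_v2/test.json"))`, where `vocab` and `model` hold your own paths. Using `check=True` stops the script if training fails, so decoding is not attempted with a missing model.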

Updated Results for RotoWire-modified

without writer info   | RG (P% / #)   | CS (P% / R%)  | CO    | BLEU
Joint+Rec+TVD (B=5)   | 18.09 / 48.54 | 23.24 / 28.92 | 14.47 | 15.34
Conditional (B=5)     | 20.28 / 61.76 | 27.20 / 29.76 | 15.88 | 15.26
Puduppully+, AAAI'19  | 82.55 / 34.05 | 32.30 / 43.74 | 16.67 | 14.82
Puduppully+, ACL'19   | 91.13 / 32.41 | 37.05 / 43.06 | 20.62 | 15.23
Iso+, ACL'19          | 91.98 / 31.66 | 40.44 / 46.63 | 21.56 | 15.74

with writer info      | RG (P% / #)   | CS (P% / R%)  | CO    | BLEU
Puduppully+, AAAI'19  | 82.55 / 34.05 | 32.30 / 43.74 | 16.67 | 14.82
  + stage 1           | 85.54 / 30.26 | 42.33 / 49.38 | 21.26 | 18.01
  + stage 2           | 83.35 / 32.42 | 33.28 / 42.92 | 16.73 | 16.57
  + stage 1 & 2       | 84.09 / 28.16 | 43.63 / 47.75 | 21.96 | 18.57
Iso+, ACL'19          | 91.98 / 31.66 | 40.44 / 46.63 | 21.56 | 15.74
  + writer            | 93.32 / 29.44 | 51.76 / 55.21 | 24.97 | 20.62

License and References

This code is available under the MIT Licence; see LICENCE.

If you write a paper using this code, please cite the following:

@InProceedings{Iso2019Learning,
    author = {Iso, Hayate
              and Uehara, Yui
              and Ishigaki, Tatsuya
              and Noji, Hiroshi
              and Aramaki, Eiji
              and Kobayashi, Ichiro
              and Miyao, Yusuke
              and Okazaki, Naoaki
              and Takamura, Hiroya},
    title = {Learning to Select, Track, and Generate for Data-to-Text},
    booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)},
    year = {2019}
  }

Author

@isomap
