
TacoLM

TemporAl COmmon Sense Language Model

A variation of BERT that is aware of temporal common sense.

Introduction

This is the code repository for our ACL 2020 paper Temporal Common Sense Acquisition with Minimal Supervision. This package is built on huggingface/transformers as of its April 2019 version.

Installation

  • pip install -r requirements.txt
  • pip install --editable .

Out of the box

Here are some things you can do with this package out of the box.

Train the main model

  • Download data/tmp_seq_data from Google Drive (4.6 GB)
  • run sh train_taco_lm.sh

The script is set to default parameters and will export the model to models/. You can configure it differently by editing the script.

The training process will generate one directory to store the loss logs, as well as NUM_EPOCH directories, one for each epoch's model. You will need to add BERT's vocab.txt to the epoch directories for evaluation. See more details in the next section on pre-trained models.

The training data is pre-generated and formatted. More details here.

Experiments

You can download pre-trained models into models/ from Google Drive (0.4 GB each), or follow the training procedure in the previous section.

General Usage

You can do many things with the model by simply treating it as a set of transformer weights that fit exactly into a BERT-base model. Have an ongoing project with BERT? Give it a try!
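For example, a minimal sketch of dropping the weights into a vanilla BERT-base model. The path is a placeholder, and using the modern transformers API is an assumption (the repo itself bundles an April 2019 version, so the exact import may differ):

```python
# Hypothetical sketch, not code from this repo: load a TacoLM epoch
# directory as ordinary BERT-base weights. The path is a placeholder and
# assumes the directory contains pytorch_model.bin, config.json, and
# vocab.txt (see the training section above).
def load_tacolm(model_dir="models/taco_lm_epoch_2"):
    from transformers import BertModel, BertTokenizer
    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertModel.from_pretrained(model_dir)
    return tokenizer, model
```

From there, the returned model can stand in for any bert-base-uncased encoder in an existing pipeline.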

Intrinsic Experiments

The intrinsic evaluation relies on pre-formatted data.

  • run sh eval_intrinsic.sh
  • see eval_results/intrinsic.txt for results

TimeBank Experiment

  • by default this requires the epoch 2 model.
  • run sh eval_timebank.sh to produce evaluation results on 3 different seeds. They are stored under eval_results by default.
  • run python scripts/eval_timebank.py to see result interpretations.

HiEVE Experiment

  • by default this requires the epoch 2 model.
  • run sh eval_hieve.sh to produce eval results under eval_results
  • run python scripts/eval_hieve.py to see interpretations.

MC-TACO Experiment

See MC-TACO.

  • use the augmented data under data/mctaco-tcs
  • use the transformer weights of taco_lm_epoch_2

Citation

See the following paper:

@inproceedings{ZNKR20,
    author = {Ben Zhou and Qiang Ning and Daniel Khashabi and Dan Roth},
    title = {Temporal Common Sense Acquisition with Minimal Supervision},
    booktitle = {ACL},
    year = {2020},
}


tacolm's Issues

Problem about downstream task data.

The project works perfectly in the BERT setting, but unfortunately does not work with many other transformer models.

As I noticed, the downstream task data is provided in a processed format that is suitable for BERT; this limits implementations with other transformer models that use a tokenization method different from BERT's.

The pattern_extraction.py code seems to work only for generating the TacoLM pre-training data and cannot process the downstream task data, while the downstream data in the data directory is provided in a processed format that can only be used by BERT. For example, in the augmented MC-TACO dataset, tokens like [unused7] do not appear in pattern_extraction.py, so I guess the downstream tasks used different extraction code.

This makes it a dead end for reproducing these results with other transformer models. Is there any way to process the downstream task data into a format suitable for transformer models other than BERT? Or any plan to release the downstream task processing code and the original data? It would be such a pity if this code only works for BERT :)

What do the gold label and dim in the intrinsic data represent?

Hello,

In data/intrinsic, the data is formatted as "sentence" "target id" "gold label" "candidates" "dim".

I think "gold label" is the target's correct label, which is used in the evaluation, but I cannot figure out what this number means. For example, "[CLS] i do n ##’ ##t know what kind of money they are paying to you , but i do n ##’ ##t know how you sleep in the [MASK] . [SEP] 29 6 6440 2851 11501 5027 3944 18406 2305 7090 2" has 6 as its gold label, but what does this number represent?

Also, about "dim": I think the number corresponds to DURATION, FREQUENCY, TYPICAL TIME, UPPERBOUND, and HIERARCHY in the paper, but I cannot figure out which number corresponds to which. For example, in the data above the "dim" is 2; does that mean 2 corresponds to TYPICAL TIME?
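A hedged guess at how such a line decomposes, inferred only from the single example above (the field meanings and their order are assumptions, not confirmed by the authors):

```python
def parse_intrinsic_line(line):
    """Hypothetical parser for one intrinsic-data line, assuming the
    layout: sentence tokens ending in [SEP], then target_id, gold_label,
    the candidate vocab ids, and dim as the final field."""
    tokens = line.split()
    # position of the last [SEP]: everything after it is numeric
    sep = len(tokens) - 1 - tokens[::-1].index("[SEP]")
    nums = [int(t) for t in tokens[sep + 1:]]
    return {
        "sentence": tokens[:sep + 1],
        "target_id": nums[0],     # index of the [MASK] token
        "gold_label": nums[1],
        "candidates": nums[2:-1], # vocab ids of the answer options
        "dim": nums[-1],
    }
```

On the quoted example this yields target_id 29 (which indeed points at the [MASK] token), gold label 6, eight candidate ids, and dim 2, which at least matches the question's reading of the line.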

How do I get the candidates in the intrinsic training data?

Hello,

I am trying to run the model on different text data, and found that some features needed by the model are unclear to me.

In data/intrinsic, there are 4 training (evaluation) files for the model. Each line is in the format "sentence with MASK", "target id", "gold label", "candidates", "dim".

My understanding from reading the code is that "target id" is an index that points to the "MASK" in the sentence, and "gold label" is the correct label of the example. However, I could not understand what "candidates" represents.

For the prediction task, it looks like predictions are made in the following way.

  1. use (input_ids, segment_ids, input_mask) generated from the "sentence", together with target_ids, to get the "cls" vector
  2. reshape the "cls" vector with cls.view(-1, 30522); the result is called logits
  3. use logits and the "candidates" to compute scores_in_order
  4. use scores_in_order to predict predicted_relative_label_id, which is the model's prediction

Since I want to get predicted_relative_label_id as the prediction, I need the candidates, but I do not know how to obtain them.

What are the candidates, and how do I get them from the text?
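The scoring steps described in this question can be sketched as follows. This is a guess at the repo's logic, using plain Python lists in place of tensors; the names mirror the question, not necessarily the actual code:

```python
def predict_from_candidates(logits_row, candidates):
    """Hypothetical sketch of steps 3-4 above.

    logits_row: the vocab-sized score vector at the [MASK] position
                (conceptually one row of the reshaped cls.view(-1, 30522))
    candidates: vocab ids of the answer options, in label order
    """
    # step 3: gather each candidate's score from the logits
    scores_in_order = [logits_row[c] for c in candidates]
    # step 4: the prediction is the relative index of the best candidate
    predicted_relative_label_id = scores_in_order.index(max(scores_in_order))
    return scores_in_order, predicted_relative_label_id
```

Under this reading, a gold label of 6 would mean the 7th entry of the candidates list is the correct option; the candidates themselves still have to come from the data file.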

Format problem in the Google Drive Training Data

The paper's work on temporal commonsense is great; however, I have some questions about the formatted training data that you provide on Google Drive. Here is an example:

[CLS] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [SEP] [MASK] [MASK] liquid eye ##liner and leopard - print under [MASK] ##ments , her [MASK] [MASK] steel ##y [MASK] [MASK] thin [MASK] like [MASK] [MASK] [MASK] result of [MASK] 20 ##- [MASK] ##und [MASK] [MASK] [MASK] she [unused500] curls [MASK] [SEP] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [SEP] [unused500] [unused502] [unused45] [SEP] -1 43 44 45 46 47 48 49 50 51 120 121 122 7.670003601399247e-05 0.010977975780067789 0.17749198395637333 0.3423587734906385 0.26762063340149095 0.1613272650199883 0.03558053856215351 0.004304815288057253 0.00026131446521643803 0.0 0.0 0.0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 4218 2014 6381 -1 -1 -1 -1 -1 -1 -1 6843 8163 1010 -1 9128 2003 -1 -1 1010 2014 -1 2608 -1 14607 1010 1996 -1 -1 1996 -1 -1 6873 -1 3347 17327 2015 -1 -1 -1 1012 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

1. I guess you masked the event in the sentence quoted here, but can TacoLM really predict so many masked words correctly? I don't think any human could guess the [MASK] words in this sentence with so little information given.

2. The 'unused' tokens described in your paper are used to construct a 1-to-1 mapping to the new dictionary. But how can I tell what an 'unused' token really means?

3. Why is there always a number attached to the end of the sentence, like the '-1' attached to the '[SEP]' token? In other examples the number could be 79, 43, and so on; what does this number actually mean?

4. After the '-1', several more numbers follow; based on the spacing between them, I don't think they are in the same group as the '-1' mentioned in Q3. What do these numbers mean?

5. What do the float numbers after these mean?

6. I guess the final run of numbers (-1 -1 -1 ...) are attention tokens? But they do not match HuggingFace's attention encoding, which is 1 for attended positions and 0 for unattended ones.

7. How does the training data form the (event, value, dimension) tuple in this case?
