
TacoLM

TemporAl COmmon Sense Language Model

A variation of BERT that is aware of temporal common sense.

Introduction

This is the code repository for our ACL 2020 paper Temporal Common Sense Acquisition with Minimal Supervision. This package is built on huggingface/transformers as of its April 2019 version.

Installation

  • pip install -r requirements.txt
  • pip install --editable .

Out of the box

Here are some things you can do with this package out of the box.

Train the main model

  • Download data/tmp_seq_data from Google Drive (4.6 GB)
  • run sh train_taco_lm.sh

The script is set to default parameters and will export the model to models/. You can configure it differently by editing the script.

The training process will generate one directory to store the loss logs, as well as NUM_EPOCH directories, one for each epoch's model. You will need to add BERT's vocab.txt to the epoch directories for evaluation. See more details in the next section on pre-trained models.

The training data is pre-generated and formatted. More details here.

Experiments

You can download pre-trained models into models/ from Google Drive (0.4 GB each), or follow the training procedure in the previous section.

General Usage

You can do many things with the model by simply treating it as a set of transformer weights that fit exactly into a BERT-base model. Have an ongoing project with BERT? Give it a try!
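For example, a minimal sketch of dropping the weights into a vanilla BERT-base model. The path is a placeholder, and using the modern transformers API is an assumption (the repo itself bundles an April 2019 version, so the exact import may differ):

```python
# Hypothetical sketch, not code from this repo: load a TacoLM epoch
# directory as ordinary BERT-base weights. The path is a placeholder and
# assumes the directory contains pytorch_model.bin, config.json, and
# vocab.txt (see the training section above).
def load_tacolm(model_dir="models/taco_lm_epoch_2"):
    from transformers import BertModel, BertTokenizer
    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertModel.from_pretrained(model_dir)
    return tokenizer, model
```

From there, the returned model can stand in for any bert-base-uncased encoder in an existing pipeline.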

Intrinsic Experiments

The intrinsic evaluation relies on pre-formatted data.

  • run sh eval_intrinsic.sh
  • see eval_results/intrinsic.txt for results

TimeBank Experiment

  • by default this requires the epoch 2 model.
  • run sh eval_timebank.sh to produce evaluation results on 3 different seeds. They are stored under eval_results by default.
  • run python scripts/eval_timebank.py to see result interpretations.

HiEVE Experiment

  • by default this requires the epoch 2 model.
  • run sh eval_hieve.sh to produce eval results under eval_results
  • run python scripts/eval_hieve.py to see interpretations.

MC-TACO Experiment

See MC-TACO.

  • use the augmented data under data/mctaco-tcs
  • use the transformer weights of taco_lm_epoch_2

Citation

See the following paper:

@inproceedings{ZNKR20,
    author = {Ben Zhou and Qiang Ning and Daniel Khashabi and Dan Roth},
    title = {Temporal Common Sense Acquisition with Minimal Supervision},
    booktitle = {ACL},
    year = {2020},
}


tacolm's Issues

Problem about downstream task data.

The project works perfectly in the BERT setting, but unfortunately does not work with many other transformer models.

As I noticed, the downstream task data is provided in a processed format that is suitable for BERT; this limits implementations with other transformer models that use a tokenization method different from BERT's.

The pattern_extraction.py code seems to work only for generating the TacoLM pre-training data and cannot process the downstream task data, while the downstream data in the data directory is provided in a processed format that can only be used by BERT. For example, in the augmented MC-TACO dataset, tokens like [unused7] do not appear in pattern_extraction.py, so I guess the downstream tasks used different extraction code.

This makes it a dead end for reproducing these results with other transformer models. Is there any way to process the downstream task data into a format suitable for transformer models other than BERT? Or any plan to release the downstream task processing code and the original data? It would be such a pity if this code only works for BERT :)

What do the gold label and dim in the intrinsic data represent?

Hello,

In data/intrinsic, the data is formatted as "sentence" "target id" "gold label" "candidates" "dim".

I think "gold label" is the target's correct label, which is used in the evaluation, but I cannot figure out what this number means. For example, "[CLS] i do n ##’ ##t know what kind of money they are paying to you , but i do n ##’ ##t know how you sleep in the [MASK] . [SEP] 29 6 6440 2851 11501 5027 3944 18406 2305 7090 2" has 6 as its gold label, but what does this number represent?

Also, about "dim": I think the number corresponds to DURATION, FREQUENCY, TYPICAL TIME, UPPERBOUND, and HIERARCHY in the paper, but I cannot figure out which number corresponds to which. For example, in the data above the "dim" is 2; does that mean 2 corresponds to TYPICAL TIME?
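A hedged guess at how such a line decomposes, inferred only from the single example above (the field meanings and their order are assumptions, not confirmed by the authors):

```python
def parse_intrinsic_line(line):
    """Hypothetical parser for one intrinsic-data line, assuming the
    layout: sentence tokens ending in [SEP], then target_id, gold_label,
    the candidate vocab ids, and dim as the final field."""
    tokens = line.split()
    # position of the last [SEP]: everything after it is numeric
    sep = len(tokens) - 1 - tokens[::-1].index("[SEP]")
    nums = [int(t) for t in tokens[sep + 1:]]
    return {
        "sentence": tokens[:sep + 1],
        "target_id": nums[0],     # index of the [MASK] token
        "gold_label": nums[1],
        "candidates": nums[2:-1], # vocab ids of the answer options
        "dim": nums[-1],
    }
```

On the quoted example this yields target_id 29 (which indeed points at the [MASK] token), gold label 6, eight candidate ids, and dim 2, which at least matches the question's reading of the line.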

How do I get the candidates in the intrinsic training data?

Hello,

I am trying to run the model on different text data, and found that some features needed by the model are unclear to me.

In data/intrinsic, there are 4 training (evaluation) files for the model. Each line is in the format "sentence with MASK", "target id", "gold label", "candidates", "dim".

My understanding from reading the code is that "target id" is an index that points to the "MASK" in the sentence, and "gold label" is the correct label of the example. However, I could not understand what "candidates" represents.

For the prediction task, it looks like predictions are made in the following way.

  1. use (input_ids, segment_ids, input_mask) generated from the "sentence", together with target_ids, to get the "cls" vector
  2. reshape the "cls" vector with cls.view(-1, 30522); the result is called logits
  3. use logits and the "candidates" to compute scores_in_order
  4. use scores_in_order to predict predicted_relative_label_id, which is the model's prediction

Since I want to get predicted_relative_label_id as the prediction, I need the candidates, but I do not know how to obtain them.

What are the candidates, and how do I get them from the text?
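The scoring steps described in this question can be sketched as follows. This is a guess at the repo's logic, using plain Python lists in place of tensors; the names mirror the question, not necessarily the actual code:

```python
def predict_from_candidates(logits_row, candidates):
    """Hypothetical sketch of steps 3-4 above.

    logits_row: the vocab-sized score vector at the [MASK] position
                (conceptually one row of the reshaped cls.view(-1, 30522))
    candidates: vocab ids of the answer options, in label order
    """
    # step 3: gather each candidate's score from the logits
    scores_in_order = [logits_row[c] for c in candidates]
    # step 4: the prediction is the relative index of the best candidate
    predicted_relative_label_id = scores_in_order.index(max(scores_in_order))
    return scores_in_order, predicted_relative_label_id
```

Under this reading, a gold label of 6 would mean the 7th entry of the candidates list is the correct option; the candidates themselves still have to come from the data file.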

Format problem in the Google Drive Training Data

The paper's work on temporal commonsense is great; however, I have some questions about the formatted training data that you provide on Google Drive. Here is an example:

[CLS] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [SEP] [MASK] [MASK] liquid eye ##liner and leopard - print under [MASK] ##ments , her [MASK] [MASK] steel ##y [MASK] [MASK] thin [MASK] like [MASK] [MASK] [MASK] result of [MASK] 20 ##- [MASK] ##und [MASK] [MASK] [MASK] she [unused500] curls [MASK] [SEP] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [SEP] [unused500] [unused502] [unused45] [SEP] -1 43 44 45 46 47 48 49 50 51 120 121 122 7.670003601399247e-05 0.010977975780067789 0.17749198395637333 0.3423587734906385 0.26762063340149095 0.1613272650199883 0.03558053856215351 0.004304815288057253 0.00026131446521643803 0.0 0.0 0.0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 4218 2014 6381 -1 -1 -1 -1 -1 -1 -1 6843 8163 1010 -1 9128 2003 -1 -1 1010 2014 -1 2608 -1 14607 1010 1996 -1 -1 1996 -1 -1 6873 -1 3347 17327 2015 -1 -1 -1 1012 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

1. I guess you masked the event in the sentence quoted here, but can TacoLM really predict so many masked words correctly? I don't think any human could guess the [MASK] words in this sentence with so little information given.

2. The 'unused' tokens described in your paper are used to construct a 1-to-1 mapping to the new dictionary. But how can I tell what an 'unused' token really means?

3. Why is there always a number attached to the end of the sentence, like the '-1' attached to the '[SEP]' token? In other examples the number could be 79, 43, and so on; what does this number actually mean?

4. After the '-1', several more numbers follow; based on the spacing between them, I don't think they are in the same group as the '-1' mentioned in Q3. What do these numbers mean?

5. What do the float numbers after these mean?

6. I guess the final run of numbers (-1 -1 -1 ...) are attention tokens? But they do not match HuggingFace's attention encoding, which is 1 for attended positions and 0 for unattended ones.

7. How does the training data form the (event, value, dimension) tuple in this case?
