
table-fact-checking's People

Contributors

arielsho, chenjianshu, eisenjulian, siviltaram, wenhuchen

table-fact-checking's Issues

"Concatenation" method

Hi Wenhu, is the "concatenation" method of Table-BERT available in the repo? Thanks a lot!

Not really an issue, but a question

Hey Wenhu, I was wondering if you could shed some light on the batch size you used in training. Was it the default 6? I'm trying to replicate your paper's results, but using your saved model I can't quite get the numbers you report. I know you use 16 as the batch size in evaluation, so I was wondering if that was the same for training. I'm trying to replicate your fact-first, template Table-BERT results.

I actually just emailed you as well :)

About differences between collected_data and tokenized_data

Thank you for sharing with us your interesting dataset.

I'm curious about the differences between collected_data and tokenized_data.
How did you process the collected_data to generate the tokenized_data?

Originally, I tried to split the collected_data into train/val/test splits using the train_id.json/val_id.json/test_id.json files in the data folder.
But the number of examples in each split differs from the train/val/test split reported in your paper, as shown below.

[In my case]
train: 92,585
val: 12,851
test: 12,839

[In your paper]
train: 92,283
val: 12,792
test: 12,779

However, I found that the number of train/val/test examples in the tokenized_data folder matches your paper.
Did you apply any filtering process to the collected_data?

Issue when running on CPU: "TypeError: iteration over a 0-d array" in model.py

I'm running python model.py --do_train --do_val --batch_size 2 with torch.device('cpu')

Here is the error message:

Traceback (most recent call last):
  File "model.py", line 257, in <module>
    precision, recall, accuracy = evaluate(val_dataloader, encoder_stat, encoder_prog)
  File "model.py", line 142, in evaluate
    for i, s, p, t, inp_id, prog_id in zip(index, similarity, pred_lab, true_lab, input_ids, prog_ids):
TypeError: iteration over a 0-d array

My configuration is the following:
Ubuntu 20.04
Python 3.8.1 (installed with pyenv)

Python packages:
boto3==1.17.55
botocore==1.20.55
certifi==2020.12.5
chardet==4.0.0
click==7.1.2
idna==2.10
jmespath==0.10.0
joblib==1.0.1
nltk==3.6.2
numpy==1.20.2
pandas==1.2.4
protobuf==3.15.8
python-dateutil==2.8.1
pytorch-pretrained-bert==0.6.2
pytz==2021.1
regex==2021.4.4
requests==2.25.1
s3transfer==0.4.1
six==1.15.0
tdqm==0.0.1
tensorboardX==2.2
torch==1.8.1+cpu
tqdm==4.60.0
typing-extensions==3.7.4.3
ujson==1.35
Unidecode==1.2.0
urllib3==1.26.4

Problem in Loading model checkpoint

I am trying to run LPA on my custom dataset. My aim is to load the checkpoint and fine-tune it on my dataset, but I am getting this error while loading the checkpoint:

File "model.py", line 205, in
encoder_prog.load_state_dict(torch.load(args.output_dir + "encoder_prog_{}.pt".format(args.id)))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Decoder:
size mismatch for tgt_word_emb.weight: copying a param with shape torch.Size([148791, 128]) from checkpoint, the shape in current model is torch.Size([68717, 128]).

On analyzing the code, the problem looks to be due to a different vocab size. Is there any way to fine-tune the provided checkpoint on my dataset?

Thanks!

Question: Are the sentences prior to rewriting available?

Hi Wenhu, thanks for making all of these open-source! I was wondering if the positive sentences that were rewritten into negative ones are available in the repo as well, or if you would consider releasing them, since I think they would make for an interesting (albeit different) task.

Cannot reproduce Table-BERT results w/ model checkpoint & HF transformers

Hi, I tried to reproduce your results with the Table-BERT checkpoints using HF transformers.

The code runs without errors, but evaluating the checkpoint model gives very low accuracy.

Thanks!

Edit: I had an error in migrating the code from pytorch-pretrained-bert to transformers, and could fix it myself!

Issues with preprocess_data code

While using the preprocess_data code, it raises an error because the number of tags does not equal the number of words. Any suggestions on this?

About pairwise data

Hello, I noticed that there is an "all_positive_negative_pairs.json" file in the "pairwise_data" folder. I'm curious about:

  1. How did you obtain these samples?
  2. What are these samples used for?

Thanks!

Cannot reproduce the results obtained during training

Hi Wenhu, when evaluating the checkpoints trained with the provided code, I can't get the same results as those logged during training. Is my model not being loaded correctly? Do you know how to deal with this problem? Thank you very much!

Question about the bootstrap data

Hi Wenhu, thanks for sharing the data!
I wonder what the ground-truth labels are for the statements in the bootstrap folder. They don't seem to match any statements in the tokenized_data.
