
doduo's People

Contributors

horseno, suhara

doduo's Issues

Training fails when changing batch size

I tried running the training script with a smaller batch size because my machine does not have enough memory for the default batch size of 32. When I use a batch size of 16 instead, I get the error below.

$ python doduo/train_multi.py --batch_size=16
args={"shortcut_name": "bert-base-uncased", "max_length": 128, "batch_size": 16, "epoch": 30, "random_seed": 4649, "num_classes": 78, "multi_gpu": false, "fp16": false, "warmup": 0.0, "lr": 5e-05, "tasks": ["sato0"], "colpair": false, "train_ratios": [], "from_scratch": false, "single_col": false}
model/sato0_mosato_bert_bert-base-uncased-bs16-ml-128__sato0-1.00
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMultiOutputClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForMultiOutputClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMultiOutputClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMultiOutputClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Traceback (most recent call last):
  File "doduo/train_multi.py", line 436, in <module>
    logits, = model(batch["data"].T)  # (row, col) is opposite?
  File "/home/mmior/.local/share/virtualenvs/doduo-ztkaJOAZ/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mmior/apps/doduo/doduo/model.py", line 372, in forward
    outputs = self.bert(
  File "/home/mmior/.local/share/virtualenvs/doduo-ztkaJOAZ/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mmior/apps/doduo/doduo/model.py", line 286, in forward
    embedding_output = self.embeddings(input_ids=input_ids,
  File "/home/mmior/.local/share/virtualenvs/doduo-ztkaJOAZ/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mmior/.local/share/virtualenvs/doduo-ztkaJOAZ/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 207, in forward
    embeddings += position_embeddings
RuntimeError: The size of tensor a (650) must match the size of tensor b (512) at non-singleton dimension 1
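
The shapes in the last line suggest that one input is 650 tokens long, while bert-base-uncased provides only 512 position embeddings. A minimal sketch for locating such inputs before training, assuming each serialized table is available as a list of token ids (the `find_overlong` helper and the sample data are hypothetical and not part of the repository):

```python
MAX_POSITIONS = 512  # position-embedding limit of bert-base-uncased


def find_overlong(sequences, limit=MAX_POSITIONS):
    """Return (index, length) for every token sequence longer than `limit`."""
    return [(i, len(seq)) for i, seq in enumerate(sequences) if len(seq) > limit]


# Hypothetical token-id sequences, one per serialized table.
tables = [[101] * 128, [101] * 650, [101] * 512]
overlong = find_overlong(tables)
print(overlong)
```

This only reports which inputs exceed the limit; whether to shorten them or skip them depends on how the training script aligns tokens with column labels.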

How to load metadata

I want to load the metadata together with the table-specific data and then train Dosolo on it, but I cannot find where metadata is loaded in the data-loading code. Could you explain in detail how to do this?

RuntimeWarning: invalid value encountered in true_divide

/root/doduo/doduo/util.py:15: RuntimeWarning: invalid value encountered in long_scalars
r = agg_conf_mat[1, 1] / agg_conf_mat[:, 1].sum()
/root/doduo/doduo/util.py:19: RuntimeWarning: invalid value encountered in true_divide
class_r = conf_mat[:, 1, 1] / conf_mat[:, :, 1].sum(axis=1)
/root/doduo/doduo/util.py:15: RuntimeWarning: invalid value encountered in long_scalars
r = agg_conf_mat[1, 1] / agg_conf_mat[:, 1].sum()
/root/doduo/doduo/util.py:18: RuntimeWarning: invalid value encountered in true_divide
class_p = conf_mat[:, 1, 1] / conf_mat[:, 1, :].sum(axis=1)
/root/doduo/doduo/util.py:19: RuntimeWarning: invalid value encountered in true_divide
class_r = conf_mat[:, 1, 1] / conf_mat[:, :, 1].sum(axis=1)

What causes this problem, and how can I solve it?
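
NumPy emits this warning when a division produces 0/0, which happens in expressions like the ones above whenever a denominator sums to zero. A minimal sketch of a guarded division, assuming NumPy arrays shaped like `conf_mat` in the warning (the `safe_divide` helper and the sample matrices are hypothetical, not code from `util.py`):

```python
import numpy as np


def safe_divide(numerator, denominator, fill=0.0):
    """Element-wise division that returns `fill` where the denominator is 0."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    out = np.full(np.broadcast(numerator, denominator).shape, fill, dtype=float)
    # Only positions with a non-zero denominator are computed; the rest keep `fill`.
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out


# Hypothetical per-class confusion matrices, shape (num_classes, 2, 2).
conf_mat = np.array([
    [[5, 1], [2, 3]],
    [[9, 0], [2, 0]],  # column 1 sums to zero, so a plain division gives 0/0
])
ratio = safe_divide(conf_mat[:, 1, 1], conf_mat[:, :, 1].sum(axis=1))
print(ratio)
```

Filling with 0.0 is one possible convention; keeping NaN and ignoring it when averaging is another, and the two give different macro-averaged scores.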

Getting invalid version error

When I run "$ python doduo/train_multi.py --tasks turl turl_re-colpair --max_length 32 --batch_size 16", I get the following error. Could you please help?
Traceback (most recent call last):
  File "doduo/train_multi.py", line 14, in <module>
    from transformers import BertTokenizer, BertForSequenceClassification, BertConfig
  File "/data/conceptdrift/anaconda3/envs/doduo/bin/transformers/__init__.py", line 43, in <module>
    from . import dependency_versions_check
  File "/data/conceptdrift/anaconda3/envs/doduo/bin/transformers/dependency_versions_check.py", line 41, in <module>
    require_version_core(deps[pkg])
  File "/data/conceptdrift/anaconda3/envs/doduo/bin/transformers/utils/versions.py", line 101, in require_version_core
    return require_version(requirement, hint)
  File "/data/conceptdrift/anaconda3/envs/doduo/bin/transformers/utils/versions.py", line 92, in require_version
    if want_ver is not None and not ops[op](version.parse(got_ver), version.parse(want_ver)):
  File "/data/conceptdrift/anaconda3/envs/doduo/bin/packaging/version.py", line 52, in parse
    return Version(version)
  File "/data/conceptdrift/anaconda3/envs/doduo/bin/packaging/version.py", line 198, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '0.10.1,<0.11'
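
The message indicates that a requirement string containing two constraints reached a parser that accepts only a single version. A small sketch of that distinction using the `packaging` library (the leading `>=` operator is an assumption, since the error shows only the text after it):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import InvalidVersion, Version

requirement = "0.10.1,<0.11"  # the string quoted in the error message

# A single Version cannot hold a comma-separated list of constraints.
try:
    Version(requirement)
    parsed_as_version = True
except InvalidVersion:
    parsed_as_version = False

# Treated as a set of constraints, the same text parses without error.
# The ">=" prefix is assumed; it does not appear in the error message.
spec = SpecifierSet(">=" + requirement)

print(parsed_as_version)
print(Version("0.10.2") in spec)
print(Version("0.11.0") in spec)
```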

The pre-trained DODUO model

Hi, thank you for open-sourcing this wonderful work. Could you release the model pre-trained on the VizNet dataset so that we can quickly try DODUO and evaluate its performance? Thanks!

Details for fine-tuning part of the model

Hi! Can you provide more details about the fine-tuning part of your model? Does fine-tuning happen before the models are trained, or is it the training process itself? The paper does not clearly describe the fine-tuning process (for example, the number of fine-tuning epochs), and this repository does not seem to document the fine-tuning used by the model either.
I have been trying to reproduce the results of the paper, but I cannot reach the 96.3% micro F1 score it reports.

Out of memory

Hello, your paper says the model was trained on a 16G T100, but when I follow your steps I keep getting out-of-memory errors.

Description of tasks

The Doduo paper defines two tasks: "column type prediction and column relation annotation." However, the training script for Doduo in this repository defines 12 different tasks. How do these tasks map to the two tasks defined in the paper? It seems as though turl-re is the column relation annotation.

Are all the other tasks then different forms of type prediction? If so, why is a slightly different model needed for the different datasets used?
