Comments (14)

wangxinyu0922 avatar wangxinyu0922 commented on August 18, 2024

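It looks like torch and its matching cudatoolkit were installed incorrectly. I'd suggest reinstalling from the official PyTorch site according to your CUDA version and trying again; any version from 1.3.1 up should work.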
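Before reinstalling, it may help to confirm which torch build is actually present in the environment. The following is a minimal sketch, not part of ACE: `describe_torch_install` is a hypothetical helper name, and it only reads attributes that PyTorch exposes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
import importlib.util


def describe_torch_install():
    """Report the installed torch build, or note that torch is missing."""
    # find_spec avoids an ImportError when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,        # e.g. "1.7.1+cu110"
        "built_for_cuda": torch.version.cuda,       # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_install())
```

If `cuda_available` is False while `built_for_cuda` is set, the wheel and the local driver or toolkit probably do not match.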

from ace.

gly99999 avatar gly99999 commented on August 18, 2024

My machine has CUDA 11.4, so I went to the official site and installed torch 1.7.1 with cudatoolkit 11.0.
Install command:
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
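This produced the error:
torch.nn.modules.module.ModuleAttributeError: 'LSTM' object has no attribute '_flat_weights_names'
But the official site has no builds for CUDA 11.0 or later for torch versions lower than this one, so do I also need to change my system's CUDA version?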

Alternatively, could I run on the CPU? Which code would I need to change, and how long would this command take on the CPU?
CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --test

from ace.

wangxinyu0922 avatar wangxinyu0922 commented on August 18, 2024

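I'm using torch 1.7.1 with CUDA 10.1 (cu101) and it seems to work fine. Where does this LSTM error occur?

I don't recommend using the CPU; it would probably take a very long time.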

from ace.

gly99999 avatar gly99999 commented on August 18, 2024
Traceback (most recent call last):
  File "train.py", line 163, in <module>
    predict_posterior=args.predict_posterior,
  File "/home/gly/python_workspace/ACE/flair/trainers/reinforcement_trainer.py", line 1406, in final_test
    self.model = self.model.load(base_path / "best-model.pt", device='cpu')
  File "/home/gly/python_workspace/ACE/flair/nn.py", line 106, in load
    model.to(device)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 612, in to
    return self._apply(convert)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 160, in _apply
    self._flat_weights = [(lambda wn: getattr(self, wn) if hasattr(self, wn) else None)(wn) for wn in self._flat_weights_names]
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 779, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'LSTM' object has no attribute '_flat_weights_names'

My system CUDA is 11.4, which should be backward compatible, right?

from ace.

wangxinyu0922 avatar wangxinyu0922 commented on August 18, 2024

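This is probably an incompatibility between torch 1.3 and torch 1.7 in the LSTM stored inside the saved model. First, check whether training runs normally without --test: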

CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml

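If you really need the pretrained model for prediction, I'd still suggest finding a way to use torch 1.3.1. You can look up some of the solutions available online, such as this one.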
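To illustrate why a model saved under one library version can fail under another, here is a pure-Python sketch of the general mechanism. It is an analogy under stated assumptions, not PyTorch's actual code: `Layer`, `weight_names`, and `restore_from_state` are hypothetical names. The assumption is that loading a fully pickled object restores its saved attributes without re-running `__init__`, so an attribute introduced by a newer class version is simply absent.

```python
class Layer:
    """Stand-in for a layer class as defined in the *newer* library version."""

    def __init__(self):
        self.weights = [0.1, 0.2]
        # Attribute that only the newer version creates in __init__.
        self.weight_names = ["w_ih", "w_hh"]

    def named_weights(self):
        return list(zip(self.weight_names, self.weights))


def restore_from_state(cls, state):
    """Mimic default unpickling: build the object without calling __init__."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# State as an older version would have saved it: no `weight_names` entry.
old_state = {"weights": [0.1, 0.2]}
restored = restore_from_state(Layer, old_state)

try:
    restored.named_weights()
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

The restored object behaves like the LSTM in the traceback: the new code expects an attribute that the old saved state never contained.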

from ace.

gly99999 avatar gly99999 commented on August 18, 2024

This is what I get when I train directly without --test, which is rather strange.

2022-04-06 22:28:25,251 ================================== Start episode 1 ==================================
['/home/gly/.flair/embeddings/lm-jw300-backward-v0.1.pt', '/home/gly/.flair/embeddings/lm-jw300-forward-v0.1.pt', '/home/gly/.flair/embeddings/news-backward-0.4.1.pt', '/home/gly/.flair/embeddings/news-forward-0.4.1.pt', '/home/yongjiang.jy/.cache/torch/transformers/bert-base-cased', '/home/yongjiang.jy/.flair/embeddings/xlm-roberta-large-finetuned-conll03-english', 'Char', 'Word: en', 'Word: glove', 'bert-base-multilingual-cased', 'elmo-original']
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0')
tensor([0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
        0.5000, 0.5000], device='cuda:0', grad_fn=<SigmoidBackward>)
2022-04-06 22:28:25,260 ----------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gly/python_workspace/ACE/flair/trainers/reinforcement_trainer.py", line 686, in train
    loss = self.model.forward_loss(student_input)
  File "/home/gly/python_workspace/ACE/flair/models/sequence_tagger_model.py", line 1844, in forward_loss
    features = self.forward(data_points)
  File "/home/gly/python_workspace/ACE/flair/models/sequence_tagger_model.py", line 820, in forward
    self.embeddings.embed(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 189, in embed
    embedding.embed(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 661, in _add_embeddings_internal
    embeddings = self.embed_sentences(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 652, in embed_sentences
    pack_char_seqs = pack_padded_sequence(input=char_embeds, lengths=char_lengths, batch_first=False, enforce_sorted=False)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/torch/nn/utils/rnn.py", line 244, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
> /home/gly/python_workspace/ACE/flair/trainers/reinforcement_trainer.py(703)train()
-> torch.nn.utils.clip_grad_norm_(self.model.parameters(), 5.0)
(Pdb) c
Traceback (most recent call last):
  File "train.py", line 360, in <module>
    getattr(trainer,'train')(**train_config)
  File "/home/gly/python_workspace/ACE/flair/trainers/reinforcement_trainer.py", line 703, in train
    torch.nn.utils.clip_grad_norm_(self.model.parameters(), 5.0)
UnboundLocalError: local variable 'loss' referenced before assignment

from ace.

wangxinyu0922 avatar wangxinyu0922 commented on August 18, 2024

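This is again caused by differences in the LSTM functions between torch 1.3.1 and 1.7.1. I've updated the code to fix it; you can also directly edit line 652 of your flair/embeddings.py: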

pack_char_seqs = pack_padded_sequence(input=char_embeds, lengths=char_lengths.to('cpu'), batch_first=False, enforce_sorted=False)
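The second error in the log above (UnboundLocalError on `loss`) appears to be a follow-on effect of the first one rather than a separate bug: the forward pass raised before `loss` was assigned, and execution then continued past the handler. A minimal sketch of that pattern, with hypothetical names (`train_step`, `failing_forward`) that are not taken from the ACE code:

```python
def train_step(compute_loss):
    try:
        loss = compute_loss()
    except RuntimeError as exc:
        # The handler reports the error but does not re-raise it.
        print(f"forward pass failed: {exc}")
    # If compute_loss() raised, `loss` was never bound, so this line fails.
    return loss


def failing_forward():
    raise RuntimeError("'lengths' argument should be a 1D CPU int64 tensor")


try:
    train_step(failing_forward)
except UnboundLocalError as exc:
    print(f"follow-on error: {exc}")
```

Fixing the first error (moving `lengths` to the CPU) should therefore make the second one disappear as well.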

from ace.

gly99999 avatar gly99999 commented on August 18, 2024

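Hi, after modifying the code I can train. I trained for a few rounds, stopped training with Ctrl+C, and saw that my model was saved. Then I ran with --test and got the following problem. 😭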

Traceback (most recent call last):
  File "train.py", line 163, in <module>
    predict_posterior=args.predict_posterior,
  File "/home/gly/python_workspace/ACE/flair/trainers/reinforcement_trainer.py", line 1462, in final_test
    self.gpu_friendly_assign_embedding([loader], selection = self.model.selection)
  File "/home/gly/python_workspace/ACE/flair/trainers/distillation_trainer.py", line 1171, in gpu_friendly_assign_embedding
    embedding.embed(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 2952, in _add_embeddings_internal
    self._add_embeddings_to_sentences(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 3041, in _add_embeddings_to_sentences
    subtokenized_sentence = self.tokenizer.tokenize(tokenized_string)

from ace.

wangxinyu0922 avatar wangxinyu0922 commented on August 18, 2024

Please post the complete traceback; I can't tell what is wrong from this one.

from ace.

gly99999 avatar gly99999 commented on August 18, 2024

Is this enough? Thanks for your help.

[2022-04-07 17:00:58,157 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /home/gly/.cache/torch/transformers/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
2022-04-07 17:01:01,282 Testing using best model ...
2022-04-07 17:01:01,286 Setting embedding mask to the best action: tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0')
['/home/gly/.flair/embeddings/lm-jw300-backward-v0.1.pt', '/home/gly/.flair/embeddings/lm-jw300-forward-v0.1.pt', '/home/gly/.flair/embeddings/news-backward-0.4.1.pt', '/home/gly/.flair/embeddings/news-forward-0.4.1.pt', '/home/yongjiang.jy/.cache/torch/transformers/bert-base-cased', '/home/yongjiang.jy/.flair/embeddings/xlm-roberta-large-finetuned-conll03-english', 'Char', 'Word: en', 'Word: glove', 'bert-base-multilingual-cased', 'elmo-original']
2022-04-07 17:01:02,668 /home/gly/.flair/embeddings/lm-jw300-backward-v0.1.pt 43087046
2022-04-07 17:01:12,048 /home/gly/.flair/embeddings/lm-jw300-forward-v0.1.pt 43087046
2022-04-07 17:01:28,571 /home/gly/.flair/embeddings/news-backward-0.4.1.pt 18257500
2022-04-07 17:01:43,615 /home/gly/.flair/embeddings/news-forward-0.4.1.pt 18257500
2022-04-07 17:01:58,789 /home/yongjiang.jy/.cache/torch/transformers/bert-base-cased 108310272
2022-04-07 17:01:58,789 mean
Traceback (most recent call last):
  File "train.py", line 163, in <module>
    predict_posterior=args.predict_posterior,
  File "/home/gly/python_workspace/ACE/flair/trainers/reinforcement_trainer.py", line 1464, in final_test
    self.gpu_friendly_assign_embedding([loader], selection = self.model.selection)
  File "/home/gly/python_workspace/ACE/flair/trainers/distillation_trainer.py", line 1171, in gpu_friendly_assign_embedding
    embedding.embed(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 2952, in _add_embeddings_internal
    self._add_embeddings_to_sentences(sentences)
  File "/home/gly/python_workspace/ACE/flair/embeddings.py", line 3041, in _add_embeddings_to_sentences
    subtokenized_sentence = self.tokenizer.tokenize(tokenized_string)
AttributeError: 'NoneType' object has no attribute 'tokenize'

from ace.

wangxinyu0922 avatar wangxinyu0922 commented on August 18, 2024

I've modified flair/trainers/reinforcement_trainer.py; please try again.

from ace.

gly99999 avatar gly99999 commented on August 18, 2024

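After the change, I found that saving the model directly with Ctrl+C causes the problem below. I reverted the code, and the problem still seems to be there.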

2022-04-07 23:20:14,546 Exiting from training early.
2022-04-07 23:20:14,546 Saving model ...
2022-04-07 23:21:01,679 Done.
['/home/gly/.cache/torch/transformers/bert-base-cased', '/home/gly/.flair/embeddings/lm-jw300-backward-v0.1.pt', '/home/gly/.flair/embeddings/lm-jw300-forward-v0.1.pt', '/home/gly/.flair/embeddings/news-backward-0.4.1.pt', '/home/gly/.flair/embeddings/news-forward-0.4.1.pt', '/home/gly/.flair/embeddings/xlm-roberta-large-finetuned-conll03-english', 'Char', 'Word: en', 'Word: glove', 'bert-base-multilingual-cased', 'elmo-original']
tensor([True, True, True, True, True, True, True, True, True, True, True],
       device='cuda:0')
2022-04-07 23:21:01,806 Final State dictionary: {}
Traceback (most recent call last):
  File "train.py", line 360, in <module>
    getattr(trainer,'train')(**train_config)
  File "/home/gly/python_workspace/ACE/flair/trainers/reinforcement_trainer.py", line 1097, in train
    self.model.selection=self.best_action
AttributeError: 'ReinforcementTrainer' object has no attribute 'best_action'

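Then, when I add --test, I get the problem below: the config file cannot be found. At first I trained without changing embedding_name in the YAML file, where it was originally /home/yongjiang.jy/.cache/torch/transformers/bert-base-cased. The error was the same as the one below, except that it said /home/yongjiang.jy/.cache/torch/transformers/bert-base-cased could not be found. I wondered whether the previously trained model had saved embedding_name as /home/yongjiang.jy/.cache/torch/transformers/bert-base-cased and that was the cause, so I changed embedding_name to /home/gly/.cache/torch/transformers/bert-base-cased, but I still get the error below. I also deleted the .cache directory and tried again, with the same result. Is there some cache I haven't cleared that is causing this?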

[2022-04-07 23:24:59,695 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt from cache at /home/gly/.cache/torch/transformers/96435fa287fbf7e469185f1062386e05a075cadbf6838b74da22bf64b080bc32.99bcd55fc66f4f3360bc49ba472b940b8dcf223ea6a345deb969d607ca900729
2022-04-07 23:25:07,784 Testing using best model ...
2022-04-07 23:25:07,857 Setting embedding mask to the best action: tensor([1., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1.], device='cuda:0')
['/home/gly/.cache/torch/transformers/bert-base-cased', '/home/gly/.flair/embeddings/lm-jw300-backward-v0.1.pt', '/home/gly/.flair/embeddings/lm-jw300-forward-v0.1.pt', '/home/gly/.flair/embeddings/news-backward-0.4.1.pt', '/home/gly/.flair/embeddings/news-forward-0.4.1.pt', '/home/gly/.flair/embeddings/xlm-roberta-large-finetuned-conll03-english', 'Char', 'Word: en', 'Word: glove', 'bert-base-multilingual-cased', 'elmo-original']
Traceback (most recent call last):
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/transformers/configuration_utils.py", line 242, in get_config_dict
    raise EnvironmentError
OSError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 163, in <module>
    predict_posterior=args.predict_posterior,
  File "/home/gly/python_workspace/ACE/flair/trainers/reinforcement_trainer.py", line 1468, in final_test
    embedding.tokenizer = AutoTokenizer.from_pretrained(name, do_lower_case=True)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 206, in from_pretrained
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/transformers/configuration_auto.py", line 203, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/gly/python_workspace/ACE/ace_py37/lib/python3.7/site-packages/transformers/configuration_utils.py", line 251, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for '/home/gly/.cache/torch/transformers/bert-base-cased'. Make sure that:

- '/home/gly/.cache/torch/transformers/bert-base-cased' is a correct model identifier listed on 'https://huggingface.co/models'

- or '/home/gly/.cache/torch/transformers/bert-base-cased' is the correct path to a directory containing a config.json file

from ace.

wangxinyu0922 avatar wangxinyu0922 commented on August 18, 2024

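The first problem is that you exited too early. The model does not save the best action until it has finished training the first episode (not epoch) and obtained the model accuracy. You could copy the state from the pretrained model into your model's save path and see whether it runs.

On the second problem: embedding_name is only there to make sure that loading my pretrained model doesn't fail. If you train your own model, you can delete all the embedding_name entries. To set the path of your model, you should instead modify the model field under each embedding, for example: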
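The early-exit failure can be pictured with a small sketch. This is a hypothetical stand-in, not the ACE `ReinforcementTrainer`: the class and method names are invented for illustration, and the only assumption carried over from the thread is that `best_action` is first set when an episode completes.

```python
class Trainer:
    """Hypothetical stand-in for a trainer that records a best action."""

    def __init__(self):
        self.selection = None

    def finish_episode(self, action):
        # The attribute only comes into existence after a full episode.
        self.best_action = action

    def finalize(self):
        # Guarded lookup: returns False instead of raising AttributeError
        # when training was interrupted before the first episode ended.
        best = getattr(self, "best_action", None)
        if best is None:
            return False
        self.selection = best
        return True


interrupted = Trainer()
print(interrupted.finalize())

completed = Trainer()
completed.finish_episode([1, 0, 1])
print(completed.finalize(), completed.selection)
```

Without the guard, the interrupted case fails in the same way as the `'ReinforcementTrainer' object has no attribute 'best_action'` traceback.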

TransformerWordEmbeddings-1:
    model: /home/gly/.cache/torch/transformers/bert-base-cased 
    layers: -1,-2,-3,-4
    pooling_operation: mean

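If the embedding still can't be loaded in this case, you may need to check that /home/gly/.cache/torch/transformers/bert-base-cased really contains a correctly downloaded model, or just use model: bert-base-cased so that transformers automatically loads the model it has already downloaded.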
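The transformers error message earlier in the thread says a local path must be a directory containing a config.json file. A quick check along those lines can be sketched as follows; `looks_like_local_model_dir` is a hypothetical helper name, and it tests only for config.json, not for the weights or vocabulary files.

```python
import os
import tempfile


def looks_like_local_model_dir(path):
    """Return True if `path` is a directory that contains a config.json file."""
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "config.json"))


# Demonstration on a temporary directory rather than a real model path.
with tempfile.TemporaryDirectory() as tmp:
    before = looks_like_local_model_dir(tmp)
    with open(os.path.join(tmp, "config.json"), "w") as fh:
        fh.write("{}")
    after = looks_like_local_model_dir(tmp)

print(before, after)
```

If the check fails for the path in the YAML file, using the plain identifier bert-base-cased, as suggested above, avoids depending on the cache layout.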

from ace.

gly99999 avatar gly99999 commented on August 18, 2024

It works now, thanks!

from ace.
