Comments (23)
from transformers.
Could it be a corpus issue? BERT was trained on Wikipedia. I trained a mini BERT on kp20k; its accuracy on the test set is currently 80%. Would you like to try using mine as the encoder?
Hi guys,
I would like to keep the issues of this repository focused on the package itself.
I also think it's better to keep the conversation in English so everybody can participate.
Please move this conversation to your repository: https://github.com/memray/seq2seq-keyphrase-pytorch or emails.
Thanks, I am closing this discussion.
Best,
Have you tried a Transformer decoder instead of the RNN decoder?
Not yet, I will try. But I think the RNN decoder shouldn't be this bad.
> Not yet, I will try. But I think the RNN decoder shouldn't be this bad.

Hmm, maybe you should use the mean of the last layer to initialize the decoder, not the last token's representation from the last layer.
I am also very curious about the results of using a Transformer decoder. If you are done, can you tell me? Thank you.
I think the batch size of the RNN with BERT is too small. Please see lines 377-378 of
https://github.com/memray/seq2seq-keyphrase-pytorch/blob/master/pykp/dataloader.py
I don't know what you mean by giving me this link. I set it to 10 precisely because of the memory problem. Actually, when the sentence length is 512, the maximum batch size is only 5; at 6 or more my GPU runs out of memory.
> Not yet, I will try. But I think the RNN decoder shouldn't be this bad.
>
> Hmm, maybe you should use the mean of the last layer to initialize the decoder, not the last token's representation from the last layer.
>
> I am also very curious about the results of using a Transformer decoder. If you are done, can you tell me? Thank you.

You are right. Maybe the mean is better; I will try that as well. Thanks.
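The mean-pooling suggestion above can be sketched in PyTorch. Everything here (the `bridge` layer, the sizes, a GRU decoder) is illustrative, not the repository's actual code: it shows initializing the decoder's hidden state from the masked mean of the encoder's last-layer states instead of the last token's vector.

```python
import torch
import torch.nn as nn

hidden_size = 256
bridge = nn.Linear(hidden_size, hidden_size)  # maps pooled encoder state to decoder h0
decoder = nn.GRU(input_size=hidden_size, hidden_size=hidden_size, batch_first=True)

# Stand-in for BERT's last layer output: (batch, seq_len, hidden).
enc_out = torch.randn(2, 512, hidden_size)
mask = torch.ones(2, 512)  # 1 = real token, 0 = padding

# Masked mean over the sequence dimension (instead of enc_out[:, -1]).
pooled = (enc_out * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True)
h0 = torch.tanh(bridge(pooled)).unsqueeze(0)   # (1, batch, hidden)

tgt_emb = torch.randn(2, 7, hidden_size)       # decoder input embeddings
out, _ = decoder(tgt_emb, h0)                  # out: (batch, tgt_len, hidden)
```

Masking matters here: padded positions must be excluded from the mean, otherwise heavily padded batches dilute the pooled state.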
May I ask a question: are you Chinese? Haha.
Because each example has N targets, and we want to put all of an example's targets in the same batch. 10 is too small, so one example's targets would probably end up split across batches.
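The batching concern can be illustrated with a small sketch: pack (source, target) pairs greedily by whole examples, so one document's N targets never straddle a batch boundary. The function name and data layout are assumptions, not the repository's actual dataloader:

```python
def pack_batches(examples, batch_size):
    """examples: list of (src, [tgt1, ..., tgtN]).
    Returns batches of (src, tgt) pairs, never splitting one
    example's targets across two batches."""
    batches, cur = [], []
    for src, tgts in examples:
        pairs = [(src, t) for t in tgts]
        # If this example's pairs don't fit, close the current batch first.
        if cur and len(cur) + len(pairs) > batch_size:
            batches.append(cur)
            cur = []
        cur.extend(pairs)  # an oversized example still stays in one batch
    if cur:
        batches.append(cur)
    return batches

batches = pack_batches([("doc1", ["a", "b", "c"]), ("doc2", ["d", "e"])],
                       batch_size=4)
# doc2's two targets stay together in the second batch
```

With a batch size as small as 5-10 and documents carrying many keyphrases, this packing would frequently close batches early, which is exactly the tension discussed above.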
I know, but ... it's the same problem ... my memory is limited ... so ...
PS. I am Chinese
I am as well, hahaha.
The accuracy is for the masked LM and next-sentence prediction tasks, not for keyphrase generation; sorry I didn't make that clear. My compute is limited: two P100s, nearly a month now, and training still isn't finished. 80% is the current performance.
What do you mean by the "mini BERT" you mentioned?
I think I roughly understand: you essentially re-pretrained a BERT on kp20k. But doing it that way ... really does seem quite troublesome.
> I think I roughly understand: you essentially re-pretrained a BERT on kp20k. But doing it that way ... really does seem quite troublesome.

Yes. I used Junseong Kim's code: https://github.com/codertimo/BERT-pytorch . The model is much smaller than Google's BERT-Base Uncased; this one is L-8 H-256 A-8. I'll send you the current training checkpoint and the vocab file.
But can my version use your checkpoint directly, or do I have to install your version of the code?
You can send it to my email, [email protected]. Thanks.
> But can my version use your checkpoint directly, or do I have to install your version of the code?

You can build a BERT model from Junseong Kim's code and then load the parameters; you don't necessarily have to install anything.
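Loading a shared checkpoint without installing the original package generally amounts to rebuilding the same architecture and calling `load_state_dict`. This is a hedged sketch: `MiniEncoder` is a stand-in for the L-8 H-256 A-8 mini BERT, not Junseong Kim's actual model class, and the file name is illustrative.

```python
import torch
import torch.nn as nn

class MiniEncoder(nn.Module):
    """Stand-in for a small BERT-like encoder: L-8 H-256 A-8."""
    def __init__(self, vocab_size=30000, hidden=256, layers=8, heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, ids):
        return self.encoder(self.embed(ids))

model = MiniEncoder()
torch.save(model.state_dict(), "mini_bert.ckpt")  # stand-in for the shared file

state = torch.load("mini_bert.ckpt")  # load the checkpoint's parameter dict
model.load_state_dict(state)          # keys must match the rebuilt architecture
```

The only real requirement is that the rebuilt model's parameter names and shapes match the checkpoint's `state_dict` keys, which is why rebuilding from the same code (rather than installing the package) works.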
OK then. Send me the checkpoint and I'll give it a try.
> The accuracy is for the masked LM and next-sentence prediction tasks, not for keyphrase generation; sorry I didn't make that clear. My compute is limited: two P100s, nearly a month now, and training still isn't finished. 80% is the current performance.

Hi, could you send the mini model to me as well? [email protected]. Thanks!
Hi @whqwill, I have some doubts about how BERT is used with the RNN.
In the BERT-with-RNN method, I see you only use the last term's representation (I mean TN's) as the input to the RNN decoder. Why not use the other terms' representations, T1 to TN-1? I think the last term alone carries too little information to represent the whole context.
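Using T1 to TN-1 rather than only TN usually means attending over all encoder states at each decoder step. A minimal dot-product attention sketch; all shapes and names here are illustrative, not the repository's code:

```python
import torch

enc_out = torch.randn(2, 10, 256)   # T1..TN from BERT's last layer
dec_state = torch.randn(2, 256)     # current decoder hidden state

# Score each encoder position against the decoder state.
scores = torch.bmm(enc_out, dec_state.unsqueeze(-1)).squeeze(-1)  # (2, 10)
weights = torch.softmax(scores, dim=-1)                           # attention weights

# Weighted sum over all positions: a context vector covering the whole input.
context = torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1)     # (2, 256)
```

The resulting `context` summarizes all token representations, not just TN, and can be concatenated with the previous target embedding as the decoder's input at each step.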