Comments (3)
Make a standard encoder-decoder architecture with RWKV
I think it's possible with this approach:
- Run RWKV over the context to encode it into hidden states, and store them.
- Currently the RNN decoder only uses the hidden states after the last token. You can mix these with the stored encoder hidden states, learning layer-wise, channel-wise mixing weights, so that the decoder also uses the encoder output.
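The two steps above can be sketched as follows. This is a toy illustration under stated assumptions, not the RWKV-LM implementation. The real RWKV time-mixing is replaced by a hypothetical per-channel moving average (`toy_rnn_step`), and the state is modeled as one vector per layer, `state[layer][channel]`. The names `encode`, `mix_states`, `decode_step` and the per-(layer, channel) `logits` are illustrative only. Only the forward pass is shown. In a real model the logits would be trained by backpropagation.

```python
import math
from typing import List

# Toy state: one vector of channels per layer, i.e. state[layer][channel].
State = List[List[float]]


def toy_rnn_step(state: State, x: List[float], decay: float = 0.9) -> State:
    """Hypothetical stand-in for one RWKV step: a per-channel exponential
    moving average in every layer. NOT the real RWKV time-mixing."""
    new_state = []
    h = x
    for layer in state:
        updated = [decay * s + (1.0 - decay) * xi for s, xi in zip(layer, h)]
        new_state.append(updated)
        h = updated  # each layer's output feeds the next layer
    return new_state


def encode(tokens: List[List[float]], n_layers: int, n_channels: int) -> State:
    """Step 1: run the recurrence over the context and keep the final state."""
    state = [[0.0] * n_channels for _ in range(n_layers)]
    for x in tokens:
        state = toy_rnn_step(state, x)
    return state


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mix_states(dec: State, enc: State, logits: State) -> State:
    """Step 2: layer-wise, channel-wise mixing. w = sigmoid(logit) is one
    learnable weight per (layer, channel); w -> 1 keeps the decoder state,
    w -> 0 uses the encoder state."""
    return [
        [sigmoid(g) * d + (1.0 - sigmoid(g)) * e for d, e, g in zip(dl, el, gl)]
        for dl, el, gl in zip(dec, enc, logits)
    ]


def decode_step(state: State, x: List[float], enc: State, logits: State) -> State:
    """One decoder step: mix the state after the last token with the stored
    encoder state, then advance the recurrence on the new token."""
    return toy_rnn_step(mix_states(state, enc, logits), x)


enc_state = encode([[1.0, 0.0], [0.0, 1.0]], n_layers=2, n_channels=2)
zeros = [[0.0, 0.0], [0.0, 0.0]]  # sigmoid(0) = 0.5: an even mix everywhere
state = [[0.0, 0.0], [0.0, 0.0]]  # decoder starts from an empty state
for x in [[0.5, 0.5], [1.0, 0.0]]:
    state = decode_step(state, x, enc_state, zeros)
print(state)
```

Because the mixing weights are per layer and per channel, training can decide, for example, that some channels in deep layers should mostly carry encoder information while others track the decoder's own history.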
Is there any report on the in-context learning / few-shot learning (FSL) capability of the latest trained model?
RWKV is good at zero-shot and you are welcome to test few-shot :) Let me know if you run into any issues.
from rwkv-lm.
Ahh, thank you for your reply. I don't completely understand what you mean by "mix it with the encoder hidden states (and learn the layerwise channelwise mixing weights)", but I will try looking at the code first.
PS: Please excuse the late reply; I was out of the loop for a while. I will try some few-shot experiments with this!
from rwkv-lm.
Originally, the RWKV RNN only uses [hidden state after last token].
But you can concatenate that with [hidden state after encoder].
Then all tokens in decoder always have access to both [hidden state after encoder] & [hidden state after last token].
Then the model can learn how to use them.
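A minimal sketch of the concatenation idea, assuming a single-layer toy decoder whose recurrence is a per-channel moving average (a hypothetical stand-in, not RWKV's actual update). At every decoder step, the concatenation [hidden state after encoder] + [hidden state after last token] is fed to a learnable linear `readout`. Here the readout is a hypothetical projection; in a real model the concatenated state would feed the next layer instead, and the readout weights would be learned.

```python
from typing import List

Vector = List[float]


def concat_states(enc_state: Vector, dec_state: Vector) -> Vector:
    """[hidden state after encoder] ++ [hidden state after last token]."""
    return enc_state + dec_state


def linear(weights: List[Vector], x: Vector) -> Vector:
    """y = W x, with W given as a list of rows (a learnable projection)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def decode(enc_state: Vector, tokens: List[Vector], readout: List[Vector],
           decay: float = 0.9) -> List[Vector]:
    """Toy decoder: a per-channel moving average stands in for the RWKV
    recurrence. Every step's output sees both the fixed encoder state and
    the decoder's own running state."""
    state = [0.0] * len(enc_state)
    outputs = []
    for x in tokens:
        state = [decay * s + (1.0 - decay) * xi for s, xi in zip(state, x)]
        outputs.append(linear(readout, concat_states(enc_state, state)))
    return outputs


# Hypothetical 2x4 readout: row 0 reads encoder channel 0,
# row 1 reads decoder channel 0.
readout = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
outs = decode([2.0, 3.0], [[1.0, 0.0], [1.0, 0.0]], readout)
print(outs)
```

Unlike a mixing weight, concatenation keeps both states intact and leaves it entirely to the learned weights (here, `readout`) to decide how much of each to use, at the cost of doubling the input width.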
Please join our Discord for more discussions with everyone :)
from rwkv-lm.
Related Issues (20)
- Gratitude and Inquiries HOT 1
- Exception has occurred: IndexError HOT 1
- Training on Cuda version 11.2, 11.3 HOT 1
- 运行报错 HOT 3
- lora训练时出错 HOT 3
- 如何训练rwkv-5-0.1b,显示权重加载错误 HOT 1
- 想问一下能否提供一个CHN+JPNTuned的7B版本 HOT 1
- AttributeError: 'MyDataset' object has no attribute 'global_rank' HOT 3
- 出错 No such file or directory: 'cuda/wkv_op.cpp' HOT 1
- 可以给个requirements? HOT 4
- huggingface无法使用 HOT 1
- Got ImportError when using load() to load wkv_cuda HOT 1
- 如何将rwkv或者retnet用于ocr任务? HOT 1
- v5 train error HOT 2
- size mismatch for blocks.11.ffn.value.weight: copying a param with shape HOT 3
- v5 train error HOT 1
- 请教一下,训练RWKV-4-Pile-3B-20221008-8023,提示错误 HOT 3
- how to pretrain v5 other lang? HOT 3
- 训练RWKV-4,报错 HOT 1
- MoE support HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rwkv-lm.