Comments (3)
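Hi author, how does the model perform after RLHF, and in which areas does it improve?
So far I have only run some simple tests. Judging from the test examples, the model's output after RLHF training seems to follow the user's intent more closely; in most cases, though, there is no change for simple questions. Below are two examples (decoding uses greedy search to make the comparison easier):
Example 1: after RLHF training, the model understands the user's intent more accurately
Before RLHF:
Judging from the reward and other metrics during training, there is still a lot of room for optimization. However, with the current resources, tuning LLaMA 7b directly makes iteration very slow, so I am switching back to a smaller model for experiments, such as opt 1.3b.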
from alpaca-rlhf.
I have optimized the training, and answers from the RLHF-trained model now look somewhat richer; see the readme.
I added online demos of the SFT and RLHF models to the readme, so you can try them and compare directly.
Related Issues (16)
- Fix pad_token_id bug HOT 2
- element 0 of tensors does not require grad and does not have a grad_fn HOT 5
- A question about setting tokens HOT 1
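- The padding side seems to differ between step2 and step3? HOT 1
- Step 3: Actor model and Reward model use different tokenizers
- Training issue
- Increasing max_prompt_len and max_ans_len causes an illegal memory access error during training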
- Steps HOT 1
- GPU memory OOM when training on v100 HOT 2
- Reward model training hangs on v100 HOT 2
- stop at step2 evaluation_reward HOT 4
- v100 step3 oom HOT 12
- how to run it, need more details HOT 2
- Some questions about deepspeed.initialize HOT 8
- On whether tokens after eos in the generated answer should be masked out in Step3 HOT 1