Comments (14)
@geighz Would love to sometime later this week! What's your username in the CarperAI Discord server? I'm mic.
from trlx.
Note that if you have a sufficiently generic implementation of PPO, you have A2C for free: https://github.com/vwxyzjn/a2c_is_a_special_case_of_ppo
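A rough sketch of why this holds, under my own assumptions rather than anything taken from the trlx codebase: on the first gradient step after collecting a batch, the new policy equals the old one, so the PPO probability ratio is 1 and the clipped surrogate has the same gradient as the plain A2C policy loss. The linked repository covers the remaining settings needed for an exact match; the toy below checks only the policy-gradient part, using a hypothetical 3-action softmax policy and finite differences. All function names are made up for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a small list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def a2c_loss(logits, action, advantage):
    # Vanilla policy-gradient (A2C) loss for one sample: -A * log pi(a).
    return -advantage * math.log(softmax(logits)[action])

def ppo_clip_loss(logits, old_logits, action, advantage, clip_eps=0.2):
    # PPO clipped surrogate for one sample: -min(r * A, clip(r) * A).
    ratio = softmax(logits)[action] / softmax(old_logits)[action]
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return -min(ratio * advantage, clipped * advantage)

def numerical_grad(f, x, h=1e-5):
    # Central finite differences; good enough for a sanity check.
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# Hypothetical toy values: the "old" policy is the current policy,
# which is the situation on the first update after a rollout.
logits = [0.3, -1.2, 0.8]
action, advantage = 2, 1.7

grad_a2c = numerical_grad(lambda t: a2c_loss(t, action, advantage), logits)
grad_ppo = numerical_grad(
    lambda t: ppo_clip_loss(t, logits, action, advantage), logits
)

max_diff = max(abs(a - p) for a, p in zip(grad_a2c, grad_ppo))
print("largest gradient difference:", max_diff)
```

If PPO takes more than one gradient step on the same batch, the ratio drifts away from 1 and the two losses stop agreeing, which is why the single-update setting matters here.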
(The discussions for this are currently ongoing in the #trlx channel on the CarperAI Discord.)
Hello, I'd like to be assigned to this issue!
Hey @ML-Chen, I'm wondering how you're doing. Would you like to pair up and tackle this issue?
Hey guys, I am interested in this too. Possible to include me here as well?
@manavgarg Sure! Just add me on Discord.
Might be late but also interested
Is this active?
Working on this now
Same, collabing with @ML-Chen.
(I'm Blitz on Discord)
Yes, we're aware of that. It's just a matter of making the configurations nice so that it's easily accessible to end-users. There are also other RL algorithms that can be added.
The pull request: #183