Topic: rlhf
Something interesting about RLHF
rlhf,Rewarded soups official implementation
User: alexrame
rlhf,✨Argilla: the open-source feedback platform for LLMs
Organization: argilla-io
Home Page: https://docs.argilla.io
rlhf,⚗️ AI Feedback framework for scalable LLM alignment
Organization: argilla-io
Home Page: https://distilabel.argilla.io
rlhf,pykoi: Active learning in one unified interface
User: cambioml
Home Page: https://www.cambioml.com
rlhf,Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)
Organization: cogment
Home Page: https://cogment.ai/cogment_verse
rlhf,Preference Transformer: Modeling Human Preferences using Transformers for RL (ICLR2023 Accepted)
User: csmile-1006
Home Page: https://sites.google.com/view/preference-transformer
rlhf,A Doctor for your data
Organization: docta-ai
rlhf,Implementations of Baseline Methods for Aligning Text2Img Diffusion Models with Human Feedback
User: g-u-n
rlhf,Aligning Large Language Models with Human: A Survey
User: garyyufei
Home Page: https://arxiv.org/abs/2307.12966
rlhf,A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
User: glgh
rlhf,Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
User: hiyouga
rlhf,Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
User: hiyouga
rlhf,Robust recipes to align language models with human and AI preferences
Organization: huggingface
Home Page: https://huggingface.co/HuggingFaceH4
rlhf,A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
User: jackaduma
rlhf,A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
User: jackaduma
rlhf,A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
User: jackaduma
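rlhf,Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial LLMs, with an efficient training framework for financial-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)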
User: jerry1993-tech
Home Page: https://zhuanlan.zhihu.com/p/633736418
rlhf,The open source implementation of ChatGPT, Alpaca, Vicuna and RLHF Pipeline. Implement a ChatGPT from scratch.
User: jianzhnie
Home Page: https://jianzhnie.github.io/machine-learning-wiki/#/deep-rl/papers/RLHF
rlhf,LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
User: joyce94
rlhf,Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
User: l294265421
Home Page: https://88aeeb3aef5040507e.gradio.live/
rlhf,Reproduce alpaca
User: l294265421
Home Page: https://88aeeb3aef5040507e.gradio.live/
rlhf,OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Organization: laion-ai
Home Page: https://open-assistant.io
rlhf,Chain-of-Hindsight, a simpler and more effective alternative to RLHF
User: lhao499
rlhf,Python client library for managing your LLM data in one place
Organization: log10-io
Home Page: https://log10.io
rlhf,MindSpore online courses: Step into LLM
Organization: mindspore-courses
rlhf,Use RLHF directly on ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF
User: miraclemarvel55
rlhf,Train LLaMA and MOSS with RLHF, optionally with LoRA | Training LLaMA or MOSS with RLHF [LoRA]
User: miraclemarvel55
rlhf,Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
User: nlp-uoregon
rlhf,A curated list of reinforcement learning with human feedback resources (continually updated)
Organization: opendilab
rlhf,Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Organization: opening-up-chatgpt
Home Page: https://opening-up-chatgpt.github.io/
rlhf,Theory and Practice about LLMs
User: patrick-tssn
rlhf,BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Organization: pku-alignment
Home Page: https://sites.google.com/view/pku-beavertails
rlhf,Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Organization: pku-alignment
Home Page: https://pku-beaver.github.io
rlhf,The official GitHub page for the survey paper "A Survey of Large Language Models".
Organization: rucaibox
Home Page: https://arxiv.org/abs/2303.18223
rlhf,Curated list of open source and openly accessible large language models
User: sanjibnarzary
Home Page: https://github.com/sanjibnarzary/awesome-llm
rlhf,chatglm_rlhf_finetuning
User: ssbuild
rlhf,An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Organization: tatsu-lab
Home Page: https://tatsu-lab.github.io/alpaca_eval/
rlhf,[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Organization: thudm
rlhf,Code accompanying the paper Pretraining Language Models with Human Preferences
User: tomekkorbak
Home Page: https://arxiv.org/abs/2302.08582
rlhf,ZYN: Zero-Shot Reward Models with Yes-No Questions
User: vicgalle
rlhf,Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face Transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
User: voidful
rlhf,🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our vision goes beyond medical Q&A
User: wangrongsheng
Home Page: https://www.wangrs.co/MedQA-ChatGLM/
rlhf,Implementation of Reinforcement Learning from Human Feedback (RLHF)
User: xrsrke
Home Page: https://xrsrke.github.io/instructGOOSE/
rlhf,Xtreme1 - The Next GEN Platform for Multimodal Training Data. #3D annotation, 3D segmentation, lidar-camera fusion annotation, image annotation and RLHF tools are supported!
Organization: xtreme1-io
Home Page: https://www.basic.ai