Topic: rlhf
Something interesting about rlhf
rlhf,Achieving Efficient Alignment through Learned Correction
User: aligner2024
Home Page: https://aligner2024.github.io/
rlhf,RewardBench: the first evaluation tool for reward models.
Organization: allenai
Home Page: https://huggingface.co/spaces/allenai/reward-bench
rlhf,Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Organization: argilla-io
Home Page: https://argilla-io.github.io/argilla/latest/
rlhf,⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
Organization: argilla-io
Home Page: https://distilabel.argilla.io
rlhf,pykoi: Active learning in one unified interface
User: cambioml
Home Page: https://www.cambioml.com
rlhf,A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Organization: contextualai
Home Page: https://arxiv.org/abs/2402.01306
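As a rough illustration of the DPO objective this library implements (one of its human-aware losses), here is a minimal plain-Python sketch for a single preference pair; the function name and the beta default are illustrative, not from the repo:

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed token log-probs of the chosen (w) and rejected (l)
    responses under the trained policy (pi) and the frozen reference (ref).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # -log sigmoid(margin), written without a deep-learning framework
    return math.log(1.0 + math.exp(-margin))
```

The loss is log 2 when the policy matches the reference, and shrinks as the policy's preference for the chosen response grows beyond the reference's.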
rlhf,Preference Transformer: Modeling Human Preferences using Transformers for RL (ICLR2023 Accepted)
User: csmile-1006
Home Page: https://sites.google.com/view/preference-transformer
rlhf,A Doctor for your data
Organization: docta-ai
rlhf,Chain-of-Hindsight, A Scalable RLHF Method
User: forhaoliu
rlhf,Aligning Large Language Models with Human: A Survey
User: garyyufei
Home Page: https://arxiv.org/abs/2307.12966
rlhf,A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
User: glgh
rlhf,Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
User: hiyouga
rlhf,A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
User: hiyouga
Home Page: https://arxiv.org/abs/2403.13372
rlhf,Robust recipes to align language models with human and AI preferences
Organization: huggingface
Home Page: https://huggingface.co/HuggingFaceH4
rlhf,Official release of InternLM2.5 7B base and chat models. 1M context support
Organization: internlm
Home Page: https://internlm.intern-ai.org.cn/
rlhf,A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
User: jackaduma
rlhf,A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
User: jackaduma
rlhf,Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial LLMs, with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
User: jerry1993-tech
Home Page: https://zhuanlan.zhihu.com/p/633736418
rlhf,The open source implementation of ChatGPT, Alpaca, Vicuna and the RLHF pipeline. Implementing a ChatGPT from scratch.
User: jianzhnie
Home Page: https://jianzhnie.github.io/llmtech
rlhf,LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
User: joyce94
rlhf,Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
User: l294265421
Home Page: https://88aeeb3aef5040507e.gradio.live/
rlhf,OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Organization: laion-ai
Home Page: https://open-assistant.io
rlhf,Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
User: liziniu
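ReMax's core idea is REINFORCE with the reward of the greedy (argmax) response as a baseline, avoiding PPO's learned value model. A hedged single-sample sketch, with illustrative names (see the repo for the real implementation):

```python
def remax_loss(logp_sampled, reward_sampled, reward_greedy):
    """ReMax-style policy-gradient surrogate for one prompt.

    The baseline is the reward of the greedily decoded response, so the
    advantage needs no value network.
    """
    advantage = reward_sampled - reward_greedy
    # advantage is treated as a constant (no gradient flows through it);
    # minimizing this surrogate raises log-prob of above-baseline samples
    return -advantage * logp_sampled
```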
rlhf,Python client library for improving your LLM app accuracy
Organization: log10-io
Home Page: https://log10.io
rlhf,Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.
User: mihirp1998
Home Page: https://vader-vid.github.io/
rlhf,MindSpore online courses: Step into LLM
Organization: mindspore-courses
rlhf,Directly applying RLHF to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF
User: miraclemarvel55
rlhf,A curated list of reinforcement learning with human feedback resources (continually updated)
Organization: opendilab
rlhf,Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Organization: opening-up-chatgpt
Home Page: https://opening-up-chatgpt.github.io/
rlhf,BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Organization: pku-alignment
Home Page: https://sites.google.com/view/pku-beavertails
rlhf,Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Organization: pku-alignment
Home Page: https://pku-beaver.github.io
rlhf,SimPO: Simple Preference Optimization with a Reference-Free Reward
Organization: princeton-nlp
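SimPO replaces DPO's reference model with a length-normalized log-probability as the implicit reward, plus a target margin. A minimal sketch for one preference pair, with illustrative (not the paper's tuned) hyperparameter defaults:

```python
import math

def simpo_loss(logp_w, len_w, logp_l, len_l, beta=2.0, gamma=0.5):
    """SimPO loss for one preference pair (reference-free).

    logp_w/logp_l are summed token log-probs of the chosen/rejected
    responses; len_w/len_l are their token counts; gamma is the target
    reward margin.
    """
    # Length-normalized implicit rewards, scaled by beta, minus the margin
    margin = beta * (logp_w / len_w - logp_l / len_l) - gamma
    return math.log(1.0 + math.exp(-margin))  # -log sigmoid(margin)
```

The loss shrinks as the chosen response becomes more likely per token than the rejected one by more than the gamma margin.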
rlhf,A recipe for online RLHF.
Organization: rlhflow
Home Page: https://rlhflow.github.io/
rlhf,Recipes to train reward model for RLHF.
Organization: rlhflow
Home Page: https://rlhflow.github.io/
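Reward models for RLHF are typically fit with the Bradley-Terry pairwise loss over (chosen, rejected) scalar scores. A minimal sketch of that objective (names are illustrative, not this repo's API):

```python
import math

def bt_reward_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected), for one comparison."""
    return math.log(1.0 + math.exp(-(r_chosen - r_rejected)))
```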
rlhf,The official GitHub page for the survey paper "A Survey of Large Language Models".
Organization: rucaibox
Home Page: https://arxiv.org/abs/2303.18223
rlhf,An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Organization: tatsu-lab
Home Page: https://tatsu-lab.github.io/alpaca_eval/
rlhf,[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Organization: thudm
rlhf,Code accompanying the paper Pretraining Language Models with Human Preferences
User: tomekkorbak
Home Page: https://arxiv.org/abs/2302.08582
rlhf,Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
Organization: transformerlab
Home Page: https://transformerlab.ai/
rlhf,The official implementation of Self-Play Preference Optimization (SPPO)
User: uclaml
Home Page: https://uclaml.github.io/SPPO/
rlhf,Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
User: voidful
rlhf,🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our sights go beyond medical Q&A
User: wangrongsheng
Home Page: https://www.wangrs.co/MedQA-ChatGLM/
rlhf,Implementation of Reinforcement Learning from Human Feedback (RLHF)
User: xrsrke
Home Page: https://xrsrke.github.io/instructGOOSE/
rlhf,Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.
Organization: xtreme1-io
Home Page: https://www.basic.ai