Topic: rlhf
Something interesting about rlhf
rlhf,Achieving Efficient Alignment through Learned Correction
User: aligner2024
Home Page: https://aligner2024.github.io/
rlhf,RewardBench: the first evaluation tool for reward models.
Organization: allenai
Home Page: https://huggingface.co/spaces/allenai/reward-bench
rlhf,Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Organization: argilla-io
Home Page: https://argilla-io.github.io/argilla/latest/
rlhf,⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
Organization: argilla-io
Home Page: https://distilabel.argilla.io
rlhf,pykoi: Active learning in one unified interface
User: cambioml
Home Page: https://www.cambioml.com
rlhf,A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Organization: contextualai
Home Page: https://arxiv.org/abs/2402.01306
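As a rough illustration of the DPO objective this library implements (one of its human-aware losses), here is a minimal plain-Python sketch for a single preference pair; the function name and the beta default are illustrative, not from the repo:

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed token log-probs of the chosen (w) and rejected (l)
    responses under the trained policy (pi) and the frozen reference (ref).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # -log sigmoid(margin), written without a deep-learning framework
    return math.log(1.0 + math.exp(-margin))
```

The loss is log 2 when the policy matches the reference, and shrinks as the policy's preference for the chosen response grows beyond the reference's.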
rlhf,Preference Transformer: Modeling Human Preferences using Transformers for RL (ICLR2023 Accepted)
User: csmile-1006
Home Page: https://sites.google.com/view/preference-transformer
rlhf,A Doctor for your data
Organization: docta-ai
rlhf,Chain-of-Hindsight, A Scalable RLHF Method
User: forhaoliu
rlhf,Aligning Large Language Models with Human: A Survey
User: garyyufei
Home Page: https://arxiv.org/abs/2307.12966
rlhf,A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
User: glgh
rlhf,Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
User: hiyouga
rlhf,A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
User: hiyouga
Home Page: https://arxiv.org/abs/2403.13372
rlhf,Robust recipes to align language models with human and AI preferences
Organization: huggingface
Home Page: https://huggingface.co/HuggingFaceH4
rlhf,Official release of InternLM2.5 7B base and chat models. 1M context support
Organization: internlm
Home Page: https://internlm.intern-ai.org.cn/
rlhf,A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
User: jackaduma
rlhf,A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
User: jackaduma
rlhf,Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial LLMs, with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
User: jerry1993-tech
Home Page: https://zhuanlan.zhihu.com/p/633736418
rlhf,The open source implementation of ChatGPT, Alpaca, Vicuna and the RLHF pipeline. Implementing a ChatGPT from scratch.
User: jianzhnie
Home Page: https://jianzhnie.github.io/llmtech
rlhf,LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
User: joyce94
rlhf,Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
User: l294265421
Home Page: https://88aeeb3aef5040507e.gradio.live/
rlhf,OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Organization: laion-ai
Home Page: https://open-assistant.io
rlhf,Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
User: liziniu
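ReMax's core idea is REINFORCE with the reward of the greedy (argmax) response as a baseline, avoiding PPO's learned value model. A hedged single-sample sketch, with illustrative names (see the repo for the real implementation):

```python
def remax_loss(logp_sampled, reward_sampled, reward_greedy):
    """ReMax-style policy-gradient surrogate for one prompt.

    The baseline is the reward of the greedily decoded response, so the
    advantage needs no value network.
    """
    advantage = reward_sampled - reward_greedy
    # advantage is treated as a constant (no gradient flows through it);
    # minimizing this surrogate raises log-prob of above-baseline samples
    return -advantage * logp_sampled
```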
rlhf,Python client library for improving your LLM app accuracy
Organization: log10-io
Home Page: https://log10.io
rlhf,Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.
User: mihirp1998
Home Page: https://vader-vid.github.io/
rlhf,MindSpore online courses: Step into LLM
Organization: mindspore-courses
rlhf,Directly applying RLHF to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF
User: miraclemarvel55
rlhf,A curated list of reinforcement learning with human feedback resources (continually updated)
Organization: opendilab
rlhf,Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Organization: opening-up-chatgpt
Home Page: https://opening-up-chatgpt.github.io/
rlhf,BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Organization: pku-alignment
Home Page: https://sites.google.com/view/pku-beavertails
rlhf,Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Organization: pku-alignment
Home Page: https://pku-beaver.github.io
rlhf,SimPO: Simple Preference Optimization with a Reference-Free Reward
Organization: princeton-nlp
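SimPO replaces DPO's reference model with a length-normalized log-probability as the implicit reward, plus a target margin. A minimal sketch for one preference pair, with illustrative (not the paper's tuned) hyperparameter defaults:

```python
import math

def simpo_loss(logp_w, len_w, logp_l, len_l, beta=2.0, gamma=0.5):
    """SimPO loss for one preference pair (reference-free).

    logp_w/logp_l are summed token log-probs of the chosen/rejected
    responses; len_w/len_l are their token counts; gamma is the target
    reward margin.
    """
    # Length-normalized implicit rewards, scaled by beta, minus the margin
    margin = beta * (logp_w / len_w - logp_l / len_l) - gamma
    return math.log(1.0 + math.exp(-margin))  # -log sigmoid(margin)
```

The loss shrinks as the chosen response becomes more likely per token than the rejected one by more than the gamma margin.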
rlhf,A recipe for online RLHF.
Organization: rlhflow
Home Page: https://rlhflow.github.io/
rlhf,Recipes to train reward model for RLHF.
Organization: rlhflow
Home Page: https://rlhflow.github.io/
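Reward models for RLHF are typically fit with the Bradley-Terry pairwise loss over (chosen, rejected) scalar scores. A minimal sketch of that objective (names are illustrative, not this repo's API):

```python
import math

def bt_reward_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected), for one comparison."""
    return math.log(1.0 + math.exp(-(r_chosen - r_rejected)))
```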
rlhf,The official GitHub page for the survey paper "A Survey of Large Language Models".
Organization: rucaibox
Home Page: https://arxiv.org/abs/2303.18223
rlhf,An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Organization: tatsu-lab
Home Page: https://tatsu-lab.github.io/alpaca_eval/
rlhf,[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Organization: thudm
rlhf,Code accompanying the paper Pretraining Language Models with Human Preferences
User: tomekkorbak
Home Page: https://arxiv.org/abs/2302.08582
rlhf,Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
Organization: transformerlab
Home Page: https://transformerlab.ai/
rlhf,The official implementation of Self-Play Preference Optimization (SPPO)
User: uclaml
Home Page: https://uclaml.github.io/SPPO/
rlhf,Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
User: voidful
rlhf,🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our sights go beyond medical Q&A
User: wangrongsheng
Home Page: https://www.wangrs.co/MedQA-ChatGLM/
rlhf,Implementation of Reinforcement Learning from Human Feedback (RLHF)
User: xrsrke
Home Page: https://xrsrke.github.io/instructGOOSE/
rlhf,Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.
Organization: xtreme1-io
Home Page: https://www.basic.ai