Pengyu Cheng's Projects
Instruct-tune LLaMA on consumer hardware
Code for ACL 2024 paper - Adversarial Preference Optimization (APO).
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models, including LLMs and VLMs.
A comprehensive list of resources on using large language/multi-modal models for Robotics/RL, including papers, code, and related websites
A curated list of reinforcement learning with human feedback resources (continually updated)
BERT-based intent and slots detector for chatbots.
Code for ACL 2019 oral paper - Learning Compressed Sentence Representations for On-Device Text Processing.
Code for ICML 2020 paper - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information
Code for the AAAI 2020 oral paper - Dynamic Embedding on Textual Networks via a Gaussian Process.
Domain-specific preference (DSP) data and customized RM fine-tuning.
An implementation of ECC classification
My Emacs init file for Python coding in deep learning
My personal repository
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
A collection of papers on LLMs with RL
Ongoing research training transformer models at scale
Code for the paper - Replacing Language Model for Style Transfer
Self-playing Adversarial Language Game Enhances LLM Reasoning
Code for AISTATS 2023 paper - Estimating Total Correlation with Mutual Information Estimators