
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Home Page: https://open-assistant.io

License: Apache License 2.0


open-assistant's Introduction

Open-Assistant

๐Ÿ“ NOTE: OpenAssistant is completed, and the project is now finished. Thank you to everyone who contributed! Check out our blog post for more information. The final published oasst2 dataset can be found on HuggingFace at OpenAssistant/oasst2


What is Open Assistant?

Open Assistant is a project meant to give everyone access to a great chat-based large language model.

We believe that by doing this we can spark a revolution in language innovation. In the same way that Stable Diffusion helped the world make art and images in new ways, we hope Open Assistant can help improve the world by improving language itself.

Useful Links

How To Try It Out

Chatting with the AI

The chat frontend is now live here. Log in and start chatting! Please try to react with a thumbs up or down for the assistant's responses when chatting.

Contributing to Data Collection

The data collection frontend is now live here. Log in and start taking on tasks! We want to collect a high volume of quality data. By submitting, ranking, and labelling model prompts and responses you will be directly helping to improve the capabilities of Open Assistant.

Running the Development Setup Locally (without chat)

You do not need to run the project locally unless you are contributing to the development process. The website link above will take you to the public website where you can use the data collection app and the chat.

If you would like to run the data collection app locally for development, you can set up an entire stack needed to run Open-Assistant, including the website, backend, and associated dependent services, with Docker.

To start the demo, run this in the root directory of the repository (check this FAQ if you have problems):

docker compose --profile ci up --build --attach-dependencies

Note: when running on macOS with an M1 chip you have to use: DB_PLATFORM=linux/x86_64 docker compose ...

Then, navigate to http://localhost:3000 (it may take some time to boot up) and interact with the website.

Note: If an issue occurs with the build, please head to the FAQ and check out the entries about Docker.

Note: When logging in via email, navigate to http://localhost:1080 to get the magic email login link.

Note: If you would like to run this in a standardized development environment (a "devcontainer") using vscode locally or in a web browser using GitHub Codespaces, you can use the provided .devcontainer folder.

Running the Development Setup Locally for Chat

You do not need to run the project locally unless you are contributing to the development process. The website link above will take you to the public website where you can use the data collection app and the chat.

Also note that the local setup is only for development and is not meant to be used as a local chatbot, unless you know what you are doing.

If you do know what you are doing, then see the inference folder for getting the inference system up and running, or have a look at --profile inference in addition to --profile ci in the above command.

The Vision

We are not going to stop at replicating ChatGPT. We want to build the assistant of the future, able to not only write email and cover letters, but do meaningful work, use APIs, dynamically research information, and much more, with the ability to be personalized and extended by anyone. And we want to do this in a way that is open and accessible, which means we must not only build a great assistant, but also make it small and efficient enough to run on consumer hardware.

The Plan

We want to get to an initial MVP as fast as possible, by following the three steps outlined in the InstructGPT paper:
  1. Collect high-quality human-generated Instruction-Fulfillment samples (prompt + response), goal >50k. We are designing a crowdsourced process to collect and review prompts. We do not want to train on flooding/toxic/spam/junk/personal-information data. We will have a leaderboard that shows progress and the most active users, to motivate the community. Swag will be given to the top contributors.
  2. For each of the collected prompts we will sample multiple completions. Completions of one prompt will then be shown randomly to users to rank them from best to worst. Again, this should happen crowdsourced, which means we need to deal with unreliable, potentially malicious users. Multiple votes by independent users have to be collected to measure the overall agreement. The gathered ranking data will be used to train a reward model.
  3. Now follows the RLHF training phase based on the prompts and the reward model.

We can then take the resulting model and continue with completion sampling step 2 for a next iteration.
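Step 2's requirement of measuring agreement between independent votes can be sketched as follows. This is a minimal pure-Python illustration of pairwise agreement between two rankings, not the project's actual scoring code:

```python
# Hypothetical sketch: measuring agreement between two user rankings
# of the same completions (lists of item indices, best to worst).

def pairwise_agreement(ranking_a, ranking_b):
    """Fraction of item pairs ordered the same way by both rankings."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    items = list(pos_a)
    agree, total = 0, 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            x, y = items[i], items[j]
            if (pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]):
                agree += 1
            total += 1
    return agree / total if total else 1.0

print(pairwise_agreement([2, 1, 0, 3], [2, 0, 1, 3]))  # 5 of 6 pairs agree
```

Averaging this score over all pairs of voters for a prompt gives a simple overall-agreement estimate; low agreement can flag unreliable votes.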

Slide Decks

Vision & Roadmap

Important Data Structures

How You Can Help

All open source projects begin with people like you. Open source is the belief that if we collaborate we can together gift our knowledge and technology to the world for the benefit of humanity.

Check out our contributing guide to get started.

open-assistant's People

Contributors

0x22almostevil, abdbarho, alexanderhott, andreaskoepf, andrewm4894, bitplane, closechoice, dependabot[bot], dvruette, fozziethebeat, guillehoardings, jack-michaud, johnflux, jojopirker, k-nearest-neighbor, klotske, kostiak, lucianpetri, martinnormark, ml729, notmd, occupytheweb, olliestanley, othrayte, rjmacarthy, rsandb, sanagno, shahules786, theblackcat102, yk


open-assistant's Issues

Write instructions and examples for integration tests of the website

We want to run integration tests of the entire stack: backend + Next.js frontend. This requires pulling both up, including possibly a temporary Postgres database for each, initializing the databases, and running tests against the system.

Make a plan for how to achieve this, then implement a few examples and write down instructions for people who want to write their own integration tests.

One suggestion is to use pytest with a docker-compose plugin, but other possibilities exist.
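Assuming the stack is brought up with docker compose, a test scaffold might build on helpers like these. The compose invocation, port, and readiness URL are illustrative, not project specifics:

```python
# Hedged sketch: helpers an integration test could build on.
# The compose command and the backend URL are assumptions about the
# local stack, not the project's actual configuration.
import subprocess
import time
import urllib.request

def compose(*args):
    """Run a docker compose subcommand in the repo root."""
    return subprocess.run(["docker", "compose", *args], check=True)

def wait_until(predicate, timeout=120, interval=2):
    """Poll predicate() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

def backend_ready(url="http://localhost:8080/"):
    """True once the (assumed) backend URL answers without a server error."""
    try:
        return urllib.request.urlopen(url).status < 500
    except OSError:
        return False

# A pytest session fixture would call compose("up", "-d", "--build"),
# then wait_until(backend_ready), yield to the tests, and finally
# call compose("down") on teardown.
```

A pytest docker-compose plugin can replace the subprocess calls, but the polling logic stays the same either way.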

Scoring-algorithm for crowd-sourced data collection

Interactions with our human-feedback front-ends (discord bot + website) should be incentivized with a user score that is shown on a leader board (gamification) + a news feed that notifies (on discord and website) about recent user activity (potentially anonymized if desired by the user).

It is not sufficient for us to collect a large number of user-interactions. We want to especially incentivize submitting high-quality instruct-fulfillment data points and human written agent-responses (demonstrations). We plan to estimate quality of user provided data in a similar way to how feedback on language-model outputs is generated - by using human feedback (e.g. grading/ranking input provided by other users).

We want to use a mix of distributed community and centralized administrator moderation. Centralized admin moderation (disabling users and deleting their data) is expensive and not easily scalable.

Challenging user behavior may include:

  • random ranking and grading
  • giving intentionally wrong feedback
  • politically motivated fake news (e.g. flooding the dataset)
  • duplicate data entry
  • DoS attacks
    ...

Backend docker starts multiple instances of backend

When executing docker compose up in scripts/frontend-development, multiple instances of the backend are started as 'workers':

The tiangolo/uvicorn-gunicorn-fastapi image documentation says: "This image has an auto-tuning mechanism included to start a number of worker processes based on the available CPU cores. That way you can just add your code and get high performance automatically, which is useful in simple deployments." ... that's why the old backend Dockerfile was created without this base image...

frontend-development-backend-1  | [2022-12-18 01:10:28 +0000] [1] [INFO] Starting gunicorn 20.1.0
frontend-development-backend-1  | [2022-12-18 01:10:28 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
frontend-development-backend-1  | [2022-12-18 01:10:28 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
frontend-development-backend-1  | [2022-12-18 01:10:28 +0000] [7] [INFO] Booting worker with pid: 7
frontend-development-backend-1  | [2022-12-18 01:10:29 +0000] [8] [INFO] Booting worker with pid: 8
frontend-development-backend-1  | [2022-12-18 01:10:29 +0000] [9] [INFO] Booting worker with pid: 9
frontend-development-backend-1  | [2022-12-18 01:10:29 +0000] [10] [INFO] Booting worker with pid: 10
frontend-development-backend-1  | [2022-12-18 01:10:29 +0000] [11] [INFO] Booting worker with pid: 11
frontend-development-backend-1  | [2022-12-18 01:10:29 +0000] [12] [INFO] Booting worker with pid: 12
frontend-development-backend-1  | [2022-12-18 01:10:29 +0000] [13] [INFO] Booting worker with pid: 13
frontend-development-backend-1  | [2022-12-18 01:10:29 +0000] [14] [INFO] Booting worker with pid: 14
(...)

Better error feedback to the frontends

Currently, the backend responds with plain 400 messages to protocol errors, for example when a frontend responds to a ranking task of 3 items with a list of only 2 items. Ideally, the backend would send a helpful message, suitable for display to the user, along with the 400 response.
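One possible shape for such a response body, sketched here with an illustrative error code that is not the project's actual enum:

```python
# A minimal sketch of the idea: pair the 400 status with a machine-readable
# error code and a human-readable message the frontend can display.
# The error code below is illustrative, not a real project constant.

def ranking_error(expected: int, received: int) -> dict:
    return {
        "error_code": "INVALID_RANKING_LENGTH",
        "message": f"Expected a ranking of {expected} items but received {received}.",
    }

# In a FastAPI handler this could become:
#   raise HTTPException(status_code=400, detail=ranking_error(3, 2))
print(ranking_error(3, 2)["message"])
```

The frontend can then branch on error_code programmatically while showing message to the user verbatim.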

Implement `rank_initial_prompts` for web

The rank_initial_prompts task type needs to display these fields:

  • prompts: An array of prompts

It takes the following interaction type:

  • type: post_ranking
  • ranking: An array of ints representing the preferred ordering of the prompts, indexed by zero. example: [2,1,0,3] for a series of 4 prompts.
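A frontend-side sanity check of this interaction payload could look like this (a sketch; the field names follow the description above):

```python
# Hedged sketch: validating a post_ranking interaction before sending it
# to the backend. The dict shape mirrors the task description above.

def validate_ranking(ranking, num_prompts):
    """A ranking must be a zero-indexed permutation of all prompt indices."""
    if sorted(ranking) != list(range(num_prompts)):
        raise ValueError(f"ranking must be a permutation of 0..{num_prompts - 1}")
    return {"type": "post_ranking", "ranking": ranking}

print(validate_ranking([2, 1, 0, 3], 4))
# {'type': 'post_ranking', 'ranking': [2, 1, 0, 3]}
```

Catching malformed rankings client-side avoids the plain 400 responses discussed elsewhere in this tracker.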

Implement `rate_summary` for web

The rate_summary task type needs to display these fields:

  • summary
  • full_text
  • scale.min: The lowest score someone can use
  • scale.max: The largest score someone can use

It takes the following interaction type:

  • type: post_rating
  • rating: An int between scale.min and scale.max inclusive.

Implement `user_reply` for web

The user_reply task type needs to display these fields:

  • conversation.messages: An array of messages in a conversation.
  • hint: Some kind of hint.

It takes the following interaction type:

  • type: text_reply_to_post
  • text: A response to the conversation so far

Re-work the concept of postID to taskID

Originally, the frontends were supposed to be stateless, but it increasingly looks like a better path is to make the frontends minimally stateful, especially with the way Discord works. This means the entire mapping of postID to taskID that the backend has to do now might be less useful.

The goal of this issue is to conceptually clarify what the protocol would look like if everyone remembers the relevant task IDs (those initially created for a task, i.e. the ones sent in the ACK messages), what the consequences of that would be, and what it would mean in terms of trusting frontends to behave well.

Create new `journal` table and log all human-feedback interactions to it

We want to generate statistics about recent user interactions and calculate scores for a leaderboard (gamification). As a first step we will log all important user interactions into a 'journal' table.

journal table (event log table):

id: uuid # (time ordered UUIDs as PK)
created_date: datetime # (redundant to pk but simplifies debugging)
person_id: uuid # (nullable) user is called person in db
post_id: uuid # (nullable)
event_type: string
event_payload: jsonb #serialized event object
api_client_id: uuid

journal_integration table (track state of integration processes):

worker_id: string (name of integration worker process)
last_processed_id: uuid
last_run: datetime
last_error: string  # debug
failing_journal_id: uuid
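As an illustration, the schema above rendered as DDL against an in-memory SQLite database (the real backend uses Postgres; the UUID and JSONB columns become TEXT stand-ins here):

```python
# Illustrative DDL for the journal tables sketched above, executed against
# an in-memory SQLite database. The real backend uses Postgres with native
# UUID/JSONB types; TEXT is used here only to keep the sketch runnable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE journal (
    id TEXT PRIMARY KEY,            -- time-ordered UUID
    created_date TEXT NOT NULL,     -- redundant to PK but simplifies debugging
    person_id TEXT,                 -- nullable; user is called person in db
    post_id TEXT,                   -- nullable
    event_type TEXT NOT NULL,
    event_payload TEXT,             -- serialized event object (JSONB in Postgres)
    api_client_id TEXT
);
CREATE TABLE journal_integration (
    worker_id TEXT PRIMARY KEY,     -- name of integration worker process
    last_processed_id TEXT,
    last_run TEXT,
    last_error TEXT,                -- debug
    failing_journal_id TEXT
);
""")
conn.execute(
    "INSERT INTO journal VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("uuid-1", "2022-12-18T00:00:00", None, None, "post_created", "{}", "client-1"),
)
print(conn.execute("SELECT event_type FROM journal").fetchone()[0])
```

The journal_integration table lets each downstream worker (e.g. the leaderboard aggregator) record how far through the event log it has processed.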

Setup a consistent path scheme for tasks on the website

We should have a sensible hierarchy that will keep urls consistent.

I'm thinking something like:

  • /create/[task_type] for anything where someone has to write free-form responses (namely where the update type is text_reply_to_post).
  • /evaluate/[task_type] for anything where someone has to rate or rank a response.

Adapt discord bot to latest backend API changes

A new task-based backend API has been implemented, completely replacing the old API. Our Discord bot now needs to be adapted to these changes.

Familiarize yourself with the new API

  1. protocol documentation: High-Level-Protocol-Architecture
  2. minimal sample frontend: text-frontend

Start the backend & try text-frontend

a) Start the Postgres docker container by running docker compose up -d in the backend/scripts folder.
b) Run the backend by starting ./scripts/run-local.sh in the backend directory.
c) Run the text-mode sample application in the text-frontend folder (e.g. python text-frontend in the Open-Chat-GPT folder).

Communication with the backend

  1. get task from backend
  2. generate a task-dependent discord post (including instructions, UI elements like buttons, emojis etc.) & get its discord post-id
  3. accept (ack) the task and send the discord post-id together with the task-id (guid) to the backend (the task id does not have to be stored by the bot after this point).
  4. parse and validate user's interactions with the generated post, e.g. replies, button clicks etc., send valid interactions to the backend. It would be best to find a way to not store any state in the bot, e.g. recognize the kind of the task from the generated discord post (e.g. so that the bot could be restarted and does not need a db itself).
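The four steps above can be sketched as follows, using hypothetical stand-in objects (the real backend client and Discord API of course differ; all names here are illustrative):

```python
# Hedged sketch of the bot<->backend protocol steps above.
# FakeBackend/FakeChannel stand in for the real backend client and the
# Discord API; method names and payload shapes are assumptions.

class FakeBackend:
    def __init__(self):
        self.acks = []
    def request_task(self, task_type):
        return {"id": "task-123", "type": "summarize_story", "story": "..."}
    def ack_task(self, task_id, frontend_post_id):
        self.acks.append((task_id, frontend_post_id))

class FakeChannel:
    def post(self, text):
        return "discord-post-456"   # the discord post-id

def render(task):
    """Turn a task into a discord post (instructions, buttons, emojis...)."""
    return f"[{task['type']}] please respond below"

def handle_one_task(backend, channel):
    task = backend.request_task(task_type="random")          # 1. get task
    post_id = channel.post(render(task))                     # 2. create post
    backend.ack_task(task["id"], frontend_post_id=post_id)   # 3. ack with post-id
    return post_id   # 4. later user interactions reference this post-id

backend = FakeBackend()
print(handle_one_task(backend, FakeChannel()))  # discord-post-456
```

Because the backend keeps the post-id/task-id mapping after the ack, the bot itself can stay stateless and survive restarts without its own database.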

Channel vs. DM:

Users could interact with prompts mainly via direct-messages or as a team in a channel. The channel mode should be implemented first because we believe it is more engaging to collaborate with other users (e.g. we will add a scoring system with leaderboard soon for gamification). In a channel all users see the replies of other users. We hope this will make it easier for the community to moderate and spot bad-actors (moderation functions are a feature that we likely need to implement soon).

Group interaction

We'll probably find the best mode for bot<->group interaction in a channel by trial and error (incorporating users' feedback). First ideas:
a) users explicitly state that they want a new task by sending a command message in the channel
b) the bot pulls a new random task based on a schedule or timeout, e.g. users have a predefined time to respond before the next task is automatically pulled from the backend and shown in the channel. Upon receipt of the first user reply (and after a short minimal timespan), the next task is requested from the backend.

Verb and noun namespace coverage.

It would be good, after the first MVP is built, to have some statistics on the "open" coverage the assistant currently has difficulty with (lack of training data), and even to get the assistant to help with providing this information.

Suggest a verb, "tolly (v.)", to query the assistant about the entropy expectation of words in the training set as compared to a standardised dictionary of all response/dictionary/query/other-db entries, including accepting user feedback for improvements.

For developers this would then become a query such as "what is the tolly of unanswered queries for the largest population of users?"

For data collection, perhaps this is a script to parse and pre-manipulate some input into some kind of query/response statistic, perhaps through an intermediate database table format.

Create API endpoints that return leaderboards

The frontends will want to display leaderboards. The backend needs to be able to compute these, and should expose API endpoints for the frontends to request their data.

Leaderboards can be made for multiple things: Global leaderboards, time-based (weekly,daily,...), task-based (being the assistant, making prompts, ...), and others. The API endpoints should have the flexibility to allow for this and the backend implementation should start with one or two basic ones (e.g. global and daily).
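A global/daily leaderboard computation over journal-style events could be sketched like this (the event shape and windowing parameters are assumptions, not the backend's actual types):

```python
# Hedged sketch: computing a leaderboard from (person_id, event_time) events.
# The endpoint shape (e.g. /leaderboard?window=daily) is an assumption.
from collections import Counter
from datetime import datetime, timedelta

def leaderboard(events, window=None, now=None, top=10):
    """events: (person_id, event_time) tuples; window: timedelta or None for global."""
    now = now or datetime.utcnow()
    counts = Counter(
        person for person, when in events
        if window is None or now - when <= window
    )
    return counts.most_common(top)

events = [
    ("alice", datetime(2022, 12, 18)),
    ("alice", datetime(2022, 12, 18)),
    ("bob", datetime(2022, 12, 1)),
]
print(leaderboard(events, window=timedelta(days=1), now=datetime(2022, 12, 18, 12)))
# [('alice', 2)]
```

Task-based leaderboards would simply filter events by event_type before counting; the API just needs to expose the window and filter as parameters.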

Setup a Privacy Policy

This privacy policy should follow LAION's current Privacy Policy but be changed to make clear what personal information gets stored and under what conditions.

When complete this should be presented in a website any user can visit.

Implement `summarize_story` for web

The summarize_story task type needs to display these fields:

  • story

It takes the following interaction type:

  • type: text_reply_to_post
  • text: The user's written summary

Add `lang` column to `post` table (ISO 639-1 code)

Add a string(2) lang field to the post model class to specify the language of a given post. This is necessary to query posts in a given language (e.g. which the current user is able to speak).

Don't forget to add the alembic migration scripts.
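A corresponding Alembic migration might look roughly like this; the revision ids are placeholders, and in practice the script would be generated with `alembic revision --autogenerate`:

```python
# Sketch of the migration; placeholders, not a generated script.
import sqlalchemy as sa
from alembic import op

revision = "xxxx"
down_revision = "yyyy"

def upgrade():
    # ISO 639-1 codes are two letters, hence String(2)
    op.add_column("post", sa.Column("lang", sa.String(2), nullable=True))

def downgrade():
    op.drop_column("post", "lang")
```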

Implement dark mode on website

Dark Mode Implementation

Reasons

  • Night-owl users like me would like a way to switch the website's appearance to something darker to protect our eyes.
  • With this implementation we also inevitably gain the ability to theme the website more easily.
  • While doing this we should also create a suite of components to be reused.
  • This will also speed up development for others who only implement logic and expect the theming to work properly.

Description

  • dark mode is a theme complete with a color scheme and Context Provider
  • all components in Chakra-UI are tied to the Chakra ThemeProvider inside ChakraProvider
  • to switch between light and dark mode a basic switch should be placed in the Header
  • a light theme and a dark theme have to be created and placed inside the styles folder (naturally)
  • Chakra components must be added to the page layout and checked that backgrounds and foreground change correctly
  • any custom components should be wrapped with Chakra Components and be made reusable
  • the dark mode switch should look like a sun in light mode and a half moon in dark mode
  • the color mode should initially be taken from the system and afterwards kept in LocalStorage
  • check for any SSR issues that might arise from holding the color mode value

Prerequisites

Components that need to be switched to Chakra-UI

Pages

Components

  • UserMenu
  • TBD

Reusable Components that need to be implemented

  • TBD

Tasks

  • Discuss color scheme for Light and dark Themes
  • create Light & Dark Theme to styles folder
  • add basic mode switch on header
  • style the switch to look pretty
  • add Color Mode
  • TBD

Contributing in this Issue

  • This issue is too large to be kept to just one PR, so we need to break it into multiple PRs
  • Anyone working on any of these: link this issue in the PR so we don't do double work
  • Avoid changing too many files in each PR so we don't create merge conflicts for ourselves
  • To color everything, use the variant prop in components; these variants are affected by the selected theme

Implement `rank_assistant_replies` in web

The rank_assistant_replies task type needs to display these fields:

  • conversation.messages: An array of messages in a conversation
  • replies: An array of replies to the conversation

Note: This is functionally equivalent to rank_user_replies right now.

It takes the following interaction type:

  • type: post_ranking
  • ranking: An array of ints representing the preferred ordering of the replies, indexed by zero. example: [2,1,0,3] for a series of 4 replies.

Add new `text_source` table & corresponding reference in `post` table

Posts stored in the post table can originate from different sources, e.g. human demonstration, a dataset, or a language model. We want to track these types and later allow filtering based on them.

  • add new text_source table:
    id: int # (SERIAL/auto-increment, PK)
    type: string # a standard TextSourceType defined in a str-enum in python
    name: string # (nullable) major identifier like dataset name, model name
    details: jsonb # allows storing variable details as a dict in python
  • Add text_source_id (optional/nullable) fk-reference to post.
  • Generate alembic migration script.
  • Define a TextSourceType str-enum with some enum members (see above); the enum could be placed in the file that contains the TextSource class (SQLModel).

Try Supervised Fine-Tuning on pseudo-QA-data

The first step in InstructGPT (https://openai.com/blog/instruction-following/) is supervised fine-tuning on human instruction data. Our website and bot are being created to collect this data. Meanwhile, we can already try out whether and how it's possible to fine-tune LLMs on such data, by substituting pseudo-data for the not-yet-collected data. One idea is to take a QA dataset (like SQuAD or Natural Questions), convert it into instruction-response pairs, and then run the fine-tuning on top of that to get a feel for the training dynamics.
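The QA-to-instruction conversion could be as simple as this (the prompt template is an assumption for illustration, not the project's actual format):

```python
# Hedged sketch: turning QA pairs (e.g. from SQuAD) into instruction-response
# pairs for supervised fine-tuning. The "User:/Assistant:" template is an
# assumption, not an established project convention.

def qa_to_instruction(question: str, answer: str) -> dict:
    return {
        "prompt": f"User: {question}\nAssistant:",
        "completion": f" {answer}",
    }

sample = qa_to_instruction("What is the capital of France?", "Paris")
print(sample["prompt"])
```

Mapping a few thousand such pairs through this function yields a stand-in dataset in the prompt/completion format most fine-tuning scripts expect.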

Write instructions and examples for integration tests of the discord bot

We want to run integration tests of the entire stack: backend + discord bot frontend. This requires pulling both up, including possibly a temporary Postgres database for the backend, initializing the database, and running tests against the system.

Make a plan for how to achieve this, then implement a few examples and write down instructions for people who want to write their own integration tests.

pytest with a docker-compose plugin can be used to easily pull up containers for testing, but this task is especially challenging because of the need to either simulate, fake, or actually provide the interaction of the bot with a discord server.

Setup styling framework for website

When done, the website should have a well understood framework for styling new pages and have a semi-stable design pattern for new visual components that will be needed.

This will likely be Tailwindcss.

Make user-submitted data available for new tasks

We want to be able to run fully on user-submitted data. Thus, when a user is asked to provide a prompt, and does so, we would like to be able to then re-use that prompt for other tasks, for example the "rank prompts" tasks, but also the "act as assistant" task, where a user is asked to answer as if they were the assistant to a given prompt. That prompt should come from the database of user-submitted prompts.

Extensions to this could be that the user-submitted prompts are sampled according to how well they are ranked against other submitted prompts in the ranking task, but that's not necessary for the start.

The same requirement exists for when users submit user-answers or assistant-answers, we would also like to be able to re-use those as data for further tasks (so we can build up conversations over time, one message per task).

Write instructions and examples for unit-testing nextjs & react code

  • research pros and cons of different testing frameworks for nextjs applications
  • decide on a test framework (jest)
  • implement a few example tests
  • write instructions for other people who want to write tests in a README

note: this is mainly for testing the JS/TS code in terms of business logic. there is another issue tracking tests for the UI itself.

Document how to develop on the backend API

Due to the need for an API key, it's not straightforward to see how to develop against the backend API locally.

Write documentation on the ability to provide the X-API-Key Header or the api_key query parameter when used with the environment variable ALLOW_ANY_API_KEY.

An alternative is to disable API key checking in development entirely and just assign a default one.
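The two options might be exercised like this from a client script (standard library only; the host, port, and path are assumptions about the local development backend):

```python
# Hedged sketch of the two authentication options described above.
# The base URL and path are assumptions about the local dev backend.
import urllib.request

def with_api_key(url: str, api_key: str) -> str:
    """Append the api_key query parameter to a URL."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}api_key={api_key}"

# Option 1: X-API-Key header (request object only; not sent here)
req = urllib.request.Request(
    "http://localhost:8080/api/v1/tasks",
    headers={"X-API-Key": "dev-key"},
)

# Option 2: api_key query parameter
url = with_api_key("http://localhost:8080/api/v1/tasks", "dev-key")
print(url)  # http://localhost:8080/api/v1/tasks?api_key=dev-key
```

With ALLOW_ANY_API_KEY set in the backend environment, any value would pass either check.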

text-frontend, python 3.10+: ImportError: cannot import name 'Mapping' from 'collections'

In Python 3.10+ the Mapping class has been moved to the collections.abc module.

If you see the error ImportError: cannot import name 'Mapping' from 'collections' ( [...] /lib/python3.10/collections/__init__.py) try to update libraries that you see in the Traceback of the exception. In my case urllib3 was causing this error. Running pip3 install urllib3 --upgrade and pip install requests --upgrade resolved it.

To prevent this error from happening I will update the requests version in the requirements.txt of text-frontend.

Create a set of fake default users for the website

During development, it's often useful to have debug users that just exist without first needing to register. This would make development easier for newcomers as they could just flip a flag and not worry about the registration workflow.

Implement user_ranking_scoreboard for web

Scoreboard to rank users based on scores

The user_ranking_scoreboard task type needs to display these fields:

  • Rank: rank of the user on the leaderboard
  • Username: the username of the user
  • Score: score awarded to the user from prompt scoring
  • Medal: medal awarded to the user for achieving a certain rank

Introduce Text Labels to the protocol

Text labels are labels that can be assigned to any piece of text the user interacts with.
Examples are:

  • contains toxic language
  • encourages illegal activity
  • good quality
  • bad quality
  • is spam

These should be introduced into the protocol
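One possible protocol representation, sketched with the example labels above (the enum values and payload shape are assumptions, not the project's actual schema):

```python
# Hedged sketch: a str-enum of text labels plus an interaction payload.
# Enum values mirror the examples above; the payload shape is an assumption.
from enum import Enum

class TextLabel(str, Enum):
    TOXIC = "contains_toxic_language"
    ILLEGAL = "encourages_illegal_activity"
    GOOD_QUALITY = "good_quality"
    BAD_QUALITY = "bad_quality"
    SPAM = "is_spam"

def label_interaction(post_id: str, labels: list[TextLabel]) -> dict:
    return {
        "type": "text_labels",
        "post_id": post_id,
        "labels": [label.value for label in labels],
    }

print(label_interaction("post-1", [TextLabel.SPAM]))
```

A str-mixin enum keeps the wire format stable strings while giving frontends a typed vocabulary to offer the user.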

SVG conversion of the project's logo

We need a logo/icon for the Open-Assistant website and the profile image of our discord bot.

Please post icon/logo proposals in the #open-chat-gpt-project-coordination channel on the LAION discord server.

In mock-ups of the project website k_nearest_neighbor used a nice brain-like logo. Unfortunately it seems to be an official Google icon, therefore we cannot use it. Just as inspiration:
(logo image omitted)

Better UI for Task Options Selection

Current logged-in index page looks boring, so I figured I could make it more pretty.

From this: (screenshot omitted)

To this: (screenshots omitted)

What's included:

  • new UserChoice component
  • added LAION img (this needs to be changed)
  • added grayscale transition on hover (disabled on xs size, i.e. phones)
  • reused the same text as in the Hero component

What's next:

  • generate contextual images for these choices
  • add images to public folder and use the pre-added img prop to link them

Train a reward model based on Instructor

Add a scalar last-token reward-head to Instructor and train it on human-feedback pairs (good-bad) of the openai/summarize-from-feedback dataset (see the Learning to summarize from human feedback paper for details about the objective).

  • place your training code in a new model/reward/instructor folder
  • please use wandb for experiment tracking, measure at least loss + accuracy (based on score of good example > bad example)
  • try to avoid modifying the original model, if possible aggregate the existing model (i.e. add the existing model as a member of the new model class)
  • compare with results from #78
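The reward head and pairwise objective can be illustrated in miniature. In practice the head is a linear layer (e.g. torch.nn.Linear) on the base model's last-token hidden state; this pure-Python sketch only shows the math:

```python
# Hedged sketch of a scalar reward head and the pairwise ranking objective.
# In real training code this is a linear layer on the base model's
# last-token hidden state; names and dimensions here are illustrative.
import math

def reward_head(last_hidden, weights, bias):
    """Project the final token's hidden state to a scalar reward."""
    return sum(h * w for h, w in zip(last_hidden, weights)) + bias

def pairwise_loss(reward_good, reward_bad):
    """-log sigmoid(r_good - r_bad), the objective from the
    'Learning to summarize from human feedback' paper."""
    return math.log(1 + math.exp(-(reward_good - reward_bad)))

print(round(pairwise_loss(2.0, 0.0), 4))  # 0.1269
```

Accuracy for the wandb tracking mentioned above is then just the fraction of pairs where reward_head scores the good example above the bad one.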

Background:
We want to implement the RLHF stack for Open-Assistant in parallel to our data collection effort. As a temporary fill-in we use existing RLHF datasets like OpenAI's learning to summarize for model development. Instructor was proposed as a promising base-model candidate for a reward model.

You could use bits of reward-model training code that I wrote a couple of weeks ago, which contains data-loading code for the summarize-from-feedback data, as inspiration. If you like, you can of course use a framework like pytorch_lightning.

Implement Text Labels in the backend

Text labels are described in #40 .

Make sure the backend has an endpoint for frontends to submit such text labels, such that they are stored in the database.

Train a reward model based on RankGen

Add a reward-head (linear projection of the model's embedding to a scalar value) to martiansideofthemoon/rankgen (see the RankGen paper) and train it on human-feedback (good-bad example pairs) of the openai/summarize-from-feedback dataset (see the Learning to summarize from human feedback paper for details about the objective).

  • place your training code in a new model/reward/rankgen folder
  • please use wandb for experiment tracking, measure at least loss + accuracy (based on score of good example > bad example)
  • try to avoid modifying the original model, if possible aggregate the existing model (i.e. add the existing model as a member of the new model class)
  • compare with results from #77

Background:
We want to implement the RLHF stack for Open-Assistant in parallel to our data collection effort. As a temporary fill-in we use existing RLHF datasets like OpenAI's learning to summarize for model development. RankGen was proposed as a promising base-model candidate for a reward model.

You could use bits of reward-model training code that I wrote a couple of weeks ago, which contains data-loading code for the summarize-from-feedback data, as inspiration. If you like, you can of course use a framework like pytorch_lightning.

Scraping Reddit dumps

Reddit could provide a good source for training data, especially since the tree-like structure allows for multiple continuations of a conversation, which is amenable to ranking. Probably not every subreddit will be ideal; most will just result in "general conversations", but there might be some that are essentially in instruction-reply or question-answer form (like r/whatisthisthing).

  • come up with an initial list of promising subreddits that would result in good training data for OpenAssistant
  • write a parser that takes in a reddit dump and extracts conversations as trees

From Christoph:

basically the idea is: we have a graph with 1 root and many branches and leaves

  1. parse the graph from the jsons
  2. get the paths from the root to the leaves that have the most upvotes & make plain text from them
    (we should not take all of them, because then the parts near the root would be highly repetitive)
    https://files.pushshift.io/reddit/comments/
    https://files.pushshift.io/reddit/comments/sample_data.json
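The two steps above could be sketched roughly as follows. The field names (`id`, `parent_id`, `score`) and the `t1_` (comment) / `t3_` (submission) prefixes follow the pushshift comment-dump format; for brevity this sketch extracts only the single highest-scoring root-to-leaf path per tree rather than several:

```python
import json
from collections import defaultdict

def build_trees(lines):
    """Index pushshift comment records by fullname and group children by parent."""
    nodes, children = {}, defaultdict(list)
    for line in lines:
        c = json.loads(line)
        nodes["t1_" + c["id"]] = c
        children[c["parent_id"]].append("t1_" + c["id"])
    return nodes, children

def best_path(node_id, nodes, children):
    """Return the root-to-leaf path maximising the summed score (upvotes)."""
    own_score = nodes.get(node_id, {}).get("score", 0)
    kids = children.get(node_id)
    if not kids:  # leaf: path ends here
        return [node_id], own_score
    # recurse into each child and keep the highest-scoring continuation
    sub = [best_path(k, nodes, children) for k in kids]
    path, score = max(sub, key=lambda p: p[1])
    return [node_id] + path, own_score + score
```

A usage example: for a dump with one comment tree under submission `t3_x` where reply `c` (score 10) beats reply `b` (score 3), `best_path("t3_x", ...)` returns the `t3_x → t1_a → t1_c` path.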

Implement `assistant_reply` for web

The assistant_reply task type needs to display these fields:

  • conversation.messages: An array of messages in the conversation that the assistant should reply to.

It takes the following interaction type:

  • type: text_reply_to_post
  • text: A response to the conversation so far
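As a rough illustration, the task payload and the interaction it accepts might look like the following (the exact message shape, e.g. an `is_assistant` flag, is an assumption, not the confirmed API):

```python
# Hypothetical assistant_reply task payload; field names taken from the task
# description above, inner message shape is an assumption.
assistant_reply_task = {
    "conversation": {
        "messages": [
            {"text": "What is the capital of France?", "is_assistant": False},
        ],
    },
}

# Interaction the web frontend submits back:
interaction = {
    "type": "text_reply_to_post",
    "text": "The capital of France is Paris.",
}
```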

Write instructions and examples for testing the website UI

There are several frameworks that simulate browser interactions and can quickly check whether the UI behaves as expected.

  • research the pros and cons of different frameworks
  • decide on a framework (Cypress)
  • implement a few example tests
  • write instructions for people who want to write their own tests

Set up a process for collecting raw datasets

Many community members spend a lot of time scraping, building, or otherwise assembling datasets that could be useful for training the assistant. We want to collect all of this data in a central place, so that this valuable work does not get lost.
For now, the goal is just to collect the raw data as people create it, no need yet to do any cleaning or processing. This can be done in a later step.

  • Set up an s3 bucket to collect raw datasets (via LAION)
  • Determine who gets what permissions to the s3 bucket
  • Define a process to get new datasets into the s3 bucket (e.g. send them to persons X or Y)
  • Document this in a public place where dataset creators can easily be pointed to

Implement `rank_user_replies` in web

The rank_user_replies task type needs to display these fields:

  • conversation.messages: An array of messages in the conversation
  • replies: An array of replies to the conversation

It takes the following interaction type:

  • type: post_ranking
  • ranking: An array of ints representing the preferred ordering of the replies, zero-indexed. Example: [2, 1, 0, 3] for a series of 4 replies.
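To illustrate the ranking semantics, here is a small sketch (a hypothetical helper, not part of the actual web code) that reorders replies by such a ranking, where `ranking[k]` is the index of the reply placed k-th:

```python
def apply_ranking(replies, ranking):
    """Reorder replies into preference order.

    ranking[k] is the zero-based index of the reply ranked k-th best,
    so [2, 1, 0, 3] puts the third reply first.
    """
    assert sorted(ranking) == list(range(len(replies))), "ranking must be a permutation"
    return [replies[i] for i in ranking]
```

So `apply_ranking(["a", "b", "c", "d"], [2, 1, 0, 3])` yields `["c", "b", "a", "d"]`.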

Set up Docker-based deployments for the website

When done, someone should be able to create a docker image and run the website and associated databases.

This should also include documentation for what environment variables and secrets are required including at a minimum:

  • The Backend URL
  • The Backend API Key
  • Authorization secrets
  • The web-side database

Evaluate Detoxify to filter out unwanted prompts

Evaluate whether unitaryai/detoxify could be used to automatically filter prompts (e.g. computing scores for all posts submitted to the db), or whether it could be used in a security layer that filters the input and output of a live assistant bot in production.

Please write a short report about your findings (or generate an ipynb), including the model sizes, GPU memory requirements, inference performance, and a subjective opinion about the filtering quality (if possible, provide some examples). Check whether their license would allow us to use their model, and check how we could host it (e.g. huggingface?).
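Detoxify's `predict` returns a dict of per-category toxicity scores between 0 and 1, so a security layer could simply threshold those scores. The helper, the 0.5 threshold, and the example scores below are illustrative stdlib stubs for the report, not output of the real model:

```python
def is_allowed(scores: dict, threshold: float = 0.5) -> bool:
    """Accept a prompt only if every toxicity category stays below the threshold."""
    return all(score < threshold for score in scores.values())

# Example score dicts in the shape Detoxify's predict() returns (values made up):
clean = {"toxicity": 0.01, "severe_toxicity": 0.00, "obscene": 0.02}
flagged = {"toxicity": 0.93, "severe_toxicity": 0.12, "obscene": 0.41}
```

Part of the evaluation would be tuning that threshold (possibly per category) against the subjective filtering-quality examples requested above.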

Clicking on checkboxes resets prompt rating

Hi!

I'm testing http://localhost:3000/grading/grade-output# page.

  1. Click on rating button (e.g. 3)
  2. It gets highlighted with blue border
  3. Check the box (e.g. Contains violent content)
  4. Highlighting on button 3 disappears
  (screen recording: CleanShot.2022-12-26.at.22.47.50.mp4)

I think this is a bug; the highlighting should stay. As it is now, it feels like my rating won't be included when I submit.

Set up a more detailed contributor guide for web.

This should include some best practices and how to set up a local dev environment.

Best Practices

This should cover anything not handled by prettier such as:

  • Preferred component patterns, one of
    1. class SomeComponent extends React.Component<SomeComponentProps, {}> {...}
    2. const SomeComponent = (props: SomeComponentProps) => {...};
    3. const SomeComponent: React.FC<SomeComponentProps> = (props) => {...};
  • Preferred ways to use useSWR* hooks.
  • Preference for useRef when feasible.
  • Suggestions on updating state (especially with arrays).

Setup

This should include

  • Pointer to running the backend stack in scripts/frontend-development.
  • Updating the helper script with a web DB
  • Some usable env variables for
    • NEXTAUTH_SECRET
    • DISCORD_CLIENT_ID
    • DISCORD_CLIENT_SECRET
    • EMAIL_SERVER_USER
    • EMAIL_SERVER_PASSWORD
    • EMAIL_SERVER_HOST
    • EMAIL_SERVER_PORT
    • EMAIL_FROM
