Comments (12)
Yes! You should use LLMPool:
https://distilabel.argilla.io/latest/technical-reference/llms/#processllm-and-llmpool
There are some examples there, but let us know if you have any doubts.
from distilabel.
Thank you!! I'm also getting this error when I changed the number of generations from 2 to 3.
Hi @drewskidang! Apparently that issue happens because those keys are not created during the FeedbackDataset creation in Argilla but are then present on the records, so it fails while trying to add the suggestions for them. Could you please send me a script to reproduce it? Thanks in advance 🤗
Thank you ... sorry, but I can't find the notebook. Do you have an example of uploading custom datasets?
Yes, indeed: once the dataset has been generated via Pipeline.generate, only to_argilla is needed to convert the datasets.Dataset into an argilla.FeedbackDataset, and then push_to_argilla to upload it to Argilla.
@alvarobartt I mean I have my own custom dataset that's already made.
Oh fair, did you upload it to the HuggingFace Hub or somewhere? Also, what did you mean by "i'm also having this error when i changed the generations from 2-3"?
I uploaded the datasets to Hugging Face, and I also have private JSONL files that I would like to annotate. I was following this example but changed the code below:
preference_dataset = preference_pipeline.generate(
    instructions_dataset,  # type: ignore
    num_generations=2,  # changing this to 3 triggers the error
    batch_size=8,
    display_progress_bar=True,
)
Would it be possible to get a Fireworks AI integration as well?
> The datasets i uploaded to huggingface i also have private jsonl files that i would like to annotate.
@drewskidang reusing your dataset should be relatively straightforward: create a Hugging Face Dataset object and prepare the data in the format expected by the task in the distilabel Pipeline.
For example, if you want to use the PreferenceTask (for rating generations), create or rename a column as generations containing a list of your LLM responses (the length of the list should match the num_generations argument passed to pipeline.generate()).
If you can share pseudocode or fake dataset examples and what you'd like to achieve, we can guide you through it.
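As a rough illustration of the data preparation described above, here is a minimal sketch that reshapes JSONL records into rows with a generations list. The source keys (prompt, response_a, response_b), the helper name to_preference_rows, and the fake records are hypothetical; the input column name is an assumption about what the task expects, so check it against the task you are using.

```python
import json


def to_preference_rows(jsonl_lines, prompt_key="prompt",
                       response_keys=("response_a", "response_b")):
    """Reshape JSONL records into rows with `input` and `generations` columns.

    `prompt_key` and `response_keys` are hypothetical names for the fields in
    your own file; `input` is an assumed column name for the instruction.
    """
    rows = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the file
        record = json.loads(line)
        rows.append({
            "input": record[prompt_key],
            # One list per row holding every LLM response to be rated.
            "generations": [record[key] for key in response_keys],
        })
    return rows


# Fake records standing in for a private JSONL file.
fake_jsonl = [
    '{"prompt": "What is 2 + 2?", "response_a": "4", "response_b": "Four."}',
    '{"prompt": "Name a primary colour.", "response_a": "Red", "response_b": "Blue"}',
]

rows = to_preference_rows(fake_jsonl)
# This length is the value that num_generations should reflect.
num_generations = len(rows[0]["generations"])
```

The resulting list of dicts can then be wrapped in a Hugging Face Dataset, for instance with datasets.Dataset.from_list(rows) in recent versions of the datasets library, before handing it to the pipeline.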
Sorry, I have a question about whether this setup is right. I'm trying to use two models for the preference dataset:
from distilabel.llm import LLM, LLMPool, ProcessLLM, TogetherInferenceLLM
from distilabel.pipeline import pipeline
from distilabel.tasks import Task, TextGenerationTask, UltraFeedbackTask


def load_yi(task: Task) -> LLM:
    # Imported here because ProcessLLM calls this function in a separate process.
    from distilabel.llm import TogetherInferenceLLM

    return TogetherInferenceLLM(
        model="zero-one-ai/Yi-34B-Chat",
        api_key="",
        task=task,
        num_threads=4,
    )


def load_together(task: Task) -> LLM:
    from distilabel.llm import TogetherInferenceLLM

    return TogetherInferenceLLM(
        model="NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
        api_key="",
        max_new_tokens=1048,
        task=task,
        num_threads=4,
    )


# Two generator models, each running in its own process.
pool = LLMPool(
    llms=[
        ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_yi),
        ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_together),
    ]
)

# Labeller that rates the generations produced by the pool.
preference_labeller = TogetherInferenceLLM(
    model="snorkelai/Snorkel-Mistral-PairRM-DPO",
    api_key="",
    task=UltraFeedbackTask.for_instruction_following(),
    num_threads=8,
    max_new_tokens=512,
)

preference_pipeline = pipeline(
    "preference",
    "instruction-following",
    generator=pool,
    labeller=preference_labeller,
    temperature=0.0,
)
Hi @drewskidang, sorry for not replying earlier! We're about to release distilabel 1.0.0 and the API will change a bit, so we're closing issues related to the old version. Feel free to reopen the issue if you consider it necessary.