Comments (12)

dvsrepo commented on July 28, 2024

Yes! You should use LLMPool:

https://distilabel.argilla.io/latest/technical-reference/llms/#processllm-and-llmpool

There are some examples there, but let us know if you have any doubts.

from distilabel.

drewskidang commented on July 28, 2024

Thank you! I'm also getting this error after changing the number of generations from 2 to 3:
[image]

alvarobartt commented on July 28, 2024

Hi @drewskidang! Apparently that issue happens because those keys are not created when the FeedbackDataset is built in Argilla, yet they are present on the records, so it fails while trying to add the suggestions for them. Could you please send me a script to reproduce it? Thanks in advance 🤗
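
As a rough illustration of that kind of mismatch, here is a minimal sketch in plain Python. The key names are hypothetical placeholders, not the names distilabel or Argilla actually generate; the point is only that a record carries a key the dataset was never configured with.

```python
def unexpected_keys(declared_names, record_suggestions):
    """Return the suggestion keys on a record that were never declared."""
    return sorted(set(record_suggestions) - set(declared_names))


# Hypothetical: the dataset was configured for two generations...
declared = ["generations-1-rating", "generations-2-rating"]

# ...but the record carries suggestions for three.
record = {
    "generations-1-rating": 4,
    "generations-2-rating": 3,
    "generations-3-rating": 5,
}

unexpected = unexpected_keys(declared, record)
print(unexpected)
```

A non-empty result here would correspond to the situation described above, where adding suggestions fails for keys the dataset does not know about.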

drewskidang commented on July 28, 2024

Thank you. Sorry, but I can't find the notebook. Do you have an example of uploading custom datasets?

alvarobartt commented on July 28, 2024

Yes, indeed. Once the dataset has been generated via Pipeline.generate, only to_argilla is needed to convert the datasets.Dataset into an argilla.FeedbackDataset, and then push_to_argilla to upload it to Argilla.

drewskidang commented on July 28, 2024

@alvarobartt I mean I have my own custom dataset that's already made.

alvarobartt commented on July 28, 2024

Oh, fair. Did you upload it to the Hugging Face Hub or somewhere else? Also, what did you mean about the error you got when you changed the generations from 2 to 3?

drewskidang commented on July 28, 2024

I uploaded the datasets to Hugging Face. I also have private JSONL files that I would like to annotate. I was following this example but changed the code below:

preference_dataset = preference_pipeline.generate(
    instructions_dataset,  # type: ignore
    num_generations=2,  # I changed this to 3 and got the error
    batch_size=8,
    display_progress_bar=True,
)

https://github.com/argilla-io/distilabel/blob/main/docs/tutorials/pipeline-notus-instructions-preferences-legal.ipynb

drewskidang commented on July 28, 2024

Would it be possible to get a Fireworks AI integration as well?

dvsrepo commented on July 28, 2024

"I uploaded the datasets to Hugging Face. I also have private JSONL files that I would like to annotate."

@drewskidang reusing your dataset should be relatively straightforward: create a Hugging Face Dataset object and prepare the data in the format expected by the task in the distilabel Pipeline.

For example, if you want to use the PreferenceTask (for rating generations), you should create or rename a column as generations, containing a list of your LLM responses (the length of the list should match the num_generations arg when running pipeline.generate()).
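
A minimal sketch of that preparation step for a JSONL file, using only the standard library. The JSONL field names `prompt` and `responses` and the `input` column name are assumptions for illustration; only the `generations` column and the num_generations match come from the description above.

```python
import json

NUM_GENERATIONS = 2  # should match num_generations passed to pipeline.generate()


def rows_from_jsonl(lines, num_generations):
    """Turn JSONL lines into rows holding an input and a `generations` list."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        generations = list(record["responses"])  # hypothetical field name
        if len(generations) != num_generations:
            raise ValueError(
                f"line {line_no}: expected {num_generations} responses, "
                f"got {len(generations)}"
            )
        # `input` is an assumed column name; `generations` is the one named above.
        rows.append({"input": record["prompt"], "generations": generations})
    return rows


# Fake in-memory data standing in for a private JSONL file.
sample_lines = [
    '{"prompt": "Question one?", "responses": ["Answer A", "Answer B"]}',
    '{"prompt": "Question two?", "responses": ["Answer C", "Answer D"]}',
]

rows = rows_from_jsonl(sample_lines, NUM_GENERATIONS)

# With the `datasets` library installed, the rows could then be wrapped as:
#   from datasets import Dataset
#   dataset = Dataset.from_list(rows)
```

Checking the list length up front means a mismatch with num_generations surfaces while preparing the data rather than later in the pipeline.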

If you can share pseudocode or fake dataset examples and what you'd like to achieve, we can guide you through it.

drewskidang commented on July 28, 2024

Sorry, I have a question about whether this setup is right. I'm trying to use two models for the preference dataset:

from distilabel.llm import LLM, LLMPool, ProcessLLM, TogetherInferenceLLM
from distilabel.pipeline import pipeline
from distilabel.tasks import Task, TextGenerationTask, UltraFeedbackTask


def load_yi(task: Task) -> LLM:
    # Imported inside the loader, as in the original snippet, so the import
    # happens when the loader function runs.
    from distilabel.llm import TogetherInferenceLLM

    # First generator: Yi-34B-Chat.
    return TogetherInferenceLLM(
        model="zero-one-ai/Yi-34B-Chat",
        api_key="",
        task=task,
        num_threads=4,
    )


def load_together(task: Task) -> LLM:
    from distilabel.llm import TogetherInferenceLLM

    # Second generator: Nous-Hermes-2-Mixtral-8x7B-DPO.
    return TogetherInferenceLLM(
        model="NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
        api_key="",
        max_new_tokens=1048,
        task=task,
        num_threads=4,
    )


# Pool of the two generator LLMs, each wrapped in a ProcessLLM.
pool = LLMPool(
    llms=[
        ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_yi),
        ProcessLLM(task=TextGenerationTask(), load_llm_fn=load_together),
    ]
)

# Labeller that rates the generations for instruction following.
preference_labeller = TogetherInferenceLLM(
    model="snorkelai/Snorkel-Mistral-PairRM-DPO",
    api_key="",
    task=UltraFeedbackTask.for_instruction_following(),
    num_threads=8,
    max_new_tokens=512,
)

preference_pipeline = pipeline(
    "preference",
    "instruction-following",
    generator=pool,
    labeller=preference_labeller,
    temperature=0.0,
)

gabrielmbmb commented on July 28, 2024

Hi @drewskidang, sorry for not replying earlier! We're about to release distilabel 1.0.0 and the API will change a bit, so we're closing issues related to the old version. Feel free to reopen the issue if you consider it necessary.
