
Comments (17)

okhat commented on August 24, 2024

max_tokens refers to the maximum number of output tokens, @sreenivasmrpivot.

Setting it to 4000 for Llama only makes sense if your input is empty, which it isn't.

Just set it to 512, or consider restructuring the output to generate one item at a time, as @drawal1 suggests.
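For context, a minimal sketch of capping the completion length when wiring Llama 2 up through DSPy's vLLM client. The client name dspy.HFClientVLLM and its forwarding of max_tokens to the server are assumptions about the DSPy version in use; adjust to however you construct your LM:

import dspy

# Assumption: max_tokens is forwarded to vLLM's generation call. 512 output
# tokens leaves ~3.5k of Llama 2's 4096-token window for the prompt.
llama = dspy.HFClientVLLM(model='meta-llama/Llama-2-13b-hf',
                          port=8000, url='http://localhost',
                          max_tokens=512)
dspy.settings.configure(lm=llama)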


drawal1 commented on August 24, 2024

I switched to using gpt-3.5-turbo-16k to get around this problem, but it's a paid/closed model. Perhaps someone here can suggest an equivalent open-source/free model.


sreenivasmrpivot commented on August 24, 2024

I guess the Giraffe model has a longer context and can get around it. So if I understand correctly, DSPy cannot help with this problem; the only way is to choose a model with a longer context.


okhat commented on August 24, 2024

Using a long-context model is the easiest thing.

But DSPy is a general framework. You can implement at least 5-6 different ways to deal with long context in your own logic. Think chunking with a map/reduce-style pipeline, etc.

If you can provide more details, I can suggest 1-2 approaches.
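For instance, a minimal map/reduce sketch in DSPy (the Summarize and Merge signatures and the character-based chunk size are illustrative choices, not anything built into DSPy):

import dspy

class Summarize(dspy.Signature):
    """Summarize one chunk of a long document."""
    chunk = dspy.InputField()
    summary = dspy.OutputField()

class Merge(dspy.Signature):
    """Merge partial summaries into a single answer."""
    summaries = dspy.InputField()
    answer = dspy.OutputField()

class MapReduceSummarize(dspy.Module):
    def __init__(self, chunk_chars=4000):
        super().__init__()
        self.chunk_chars = chunk_chars
        self.map = dspy.Predict(Summarize)
        self.reduce = dspy.Predict(Merge)

    def forward(self, document):
        # Map: each chunk is summarized independently, so no single
        # call exceeds the model's 4k context window.
        chunks = [document[i:i + self.chunk_chars]
                  for i in range(0, len(document), self.chunk_chars)]
        partials = [self.map(chunk=c).summary for c in chunks]
        # Reduce: one final call merges the partial summaries.
        return self.reduce(summaries='\n'.join(partials))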


sreenivasmrpivot commented on August 24, 2024

@okhat I am trying to implement the Gorilla model, which uses APIs from Hugging Face, TensorFlow Hub, and PyTorch Hub. My goal is to generate synthetic data using a fully open-source model and avoid using GPT-4 for commercial-viability reasons.
So I want to make use of Llama 2, provide in-context self-instruct prompts, and get some output.
However, when I try to do that directly using text prompting, I exceed the 4096 tokens allowed by Llama and end up getting this error:
Exception has occurred: APIError Invalid response object from API: '{"detail":{"object":"error","message":"This model\'s maximum context length is 4096 tokens. However, you requested 6276 tokens (2180 in the messages, 4096 in the completion). Please reduce the length of the messages or completion.","type":"invalid_request_error","param":null,"code":null}}' (HTTP response code was 400)

I am using vLLM, and I believe you work with Rick Battle to some extent; I am trying to get this implemented and contribute to Rick's team.

Any suggestions are much appreciated.


okhat commented on August 24, 2024

Thanks @sreenivasmrpivot. Yes, we collaborate with Rick very frequently!

However, you requested 6276 tokens (2180 in the messages, 4096 in the completion)

This error suggests your input isn't actually that long: the prompt is just 2180 tokens. Do you need 4096 output tokens?

Maybe just set the output to 256 tokens? Or 512?


sreenivasmrpivot commented on August 24, 2024

@okhat I have attached my actual input prompt here. Do you still think I can get around the problem by limiting the output to 256 or 512 tokens? If yes, where can I set the output length in the code?

sample1.txt

The output from the model is expected to have 10 "API-Inst pair" examples, which is pretty long.

If I use Llama 2 13B, which has a max context of 4096 tokens, is there any way to get this expected output using the combination of DSPy and Llama 2 13B?

If it is not possible, I am considering using https://huggingface.co/NousResearch/Yarn-Llama-2-13b-128k instead of Llama 2 13B.


sreenivasmrpivot commented on August 24, 2024

@okhat do you have any suggestions or updates for this ^^^?


drawal1 commented on August 24, 2024

@sreenivasmrpivot you can increase max_tokens as follows:
llm = dspy.OpenAI(model='gpt-3.5-turbo-16k', max_tokens=8000)

Off the top of my head: could you generate one API-Inst pair at a time and pass in the "instruction"s of the previously generated API-Inst pairs, asking the model not to generate an API-Inst pair similar to the ones already generated?
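Roughly, that loop might look like this (the GenerateAPIPair signature and its field names are made up for illustration; api_docs holds the API documentation text being converted into pairs):

import dspy

class GenerateAPIPair(dspy.Signature):
    """Generate one new API-Inst pair, distinct from the instructions listed."""
    api_docs = dspy.InputField()
    previous_instructions = dspy.InputField(
        desc="instructions already generated; do not produce anything similar")
    instruction = dspy.OutputField()
    api_call = dspy.OutputField()

generate = dspy.Predict(GenerateAPIPair)
pairs, seen = [], []
for _ in range(10):
    # Each call stays small: only prior instructions go in, one pair comes out.
    pred = generate(api_docs=api_docs,
                    previous_instructions='\n'.join(seen) or 'none')
    seen.append(pred.instruction)
    pairs.append((pred.instruction, pred.api_call))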


sreenivasmrpivot commented on August 24, 2024

@drawal1 I like the suggestion regarding max_tokens and generating one pair at a time, though I am not sure the generation would avoid repetitions unless I try it.

However, since gpt-3.5-turbo-16k has a 16k context length, it might work. Would the above approach work for Llama 2, which has only a 4k context length?


drawal1 commented on August 24, 2024

4k tokens is roughly 3000 words, so Llama 2's 4k context might work. You won't know until you try.
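One way to know before a full run: count the prompt's tokens with the model's own tokenizer and see what is left of the 4096-token window (a sketch using Hugging Face transformers; the checkpoint name should match whatever Llama 2 13B variant you are serving):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained('meta-llama/Llama-2-13b-hf')
prompt = open('sample1.txt').read()
n_prompt = len(tok.encode(prompt))
# Whatever remains of the 4096-token window is the budget for the completion.
print(n_prompt, 'prompt tokens,', 4096 - n_prompt, 'left for the completion')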


okhat commented on August 24, 2024

Is this resolved?


ahoho commented on August 24, 2024

I'm also having an issue with this: if I compile a Module with a teleprompter and then try to run it forward, it often creates prompts that are too long. Is there a way to avoid this?


okhat commented on August 24, 2024

Hey @ahoho, yes, happy to help. I may need more details, but basically:

you can reduce the parameters of the teleprompter (max_bootstrapped_demos and max_labeled_demos) for a start. They default to 4 and 16, respectively. Maybe set them to 1 and 0 to be extreme.
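Concretely, a sketch with BootstrapFewShot; BootstrapFewShotWithRandomSearch accepts the same two parameters. my_metric, MyModule, and trainset are placeholders for your own metric, program, and data:

import dspy
from dspy.teleprompt import BootstrapFewShot

# Extreme settings: at most one bootstrapped demo and no labeled demos,
# so the compiled prompts stay short.
teleprompter = BootstrapFewShot(metric=my_metric,
                                max_bootstrapped_demos=1,
                                max_labeled_demos=0)
compiled_program = teleprompter.compile(MyModule(), trainset=trainset)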


ahoho commented on August 24, 2024

Yes, I think this is the issue: the demonstrations end up creating a prompt that's too long. I think it's because I'm mirroring a RAG setting for classification, and the context is repeated for each of the bootstrapped demos.


okhat commented on August 24, 2024

@ahoho Oh wow, I just saw this by accident; not sure why I missed it earlier.

Did my suggestion resolve it? Setting max_bootstrapped_demos=1 and max_labeled_demos=0, assuming you're using BootstrapFewShotWithRandomSearch.


ahoho commented on August 24, 2024

Sorry, I also missed your response! Yes, that did resolve the problem.

