
Comments (17)

okhat commented on August 24, 2024

max_tokens refers to the maximum number of output tokens, @sreenivasmrpivot.

Setting it to 4000 for Llama only makes sense if your input is empty, which it isn't.

Just set it to 512, or consider restructuring the output to generate one item at a time, as @drawal1 suggests.
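For context, a minimal sketch of capping the completion length when wiring Llama 2 up through DSPy's vLLM client. The client name dspy.HFClientVLLM and its forwarding of max_tokens to the server are assumptions about the DSPy version in use; adjust to however you construct your LM:

import dspy

# Assumption: max_tokens is forwarded to vLLM's generation call. 512 output
# tokens leaves ~3.5k of Llama 2's 4096-token window for the prompt.
llama = dspy.HFClientVLLM(model='meta-llama/Llama-2-13b-hf',
                          port=8000, url='http://localhost',
                          max_tokens=512)
dspy.settings.configure(lm=llama)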


drawal1 commented on August 24, 2024

I switched to using gpt-3.5-turbo-16k to get around this problem, but it's a paid/closed model. Perhaps someone here can suggest an equivalent open-source/free model.


sreenivasmrpivot commented on August 24, 2024

I guess the Giraffe model has a longer context and can get around it. So if I understand correctly, DSPy cannot help with this problem; the only way is to choose a model with a longer context.


okhat commented on August 24, 2024

Using a long-context model is the easiest thing.

But DSPy is a general framework. You can implement at least 5-6 different ways to deal with long context in your own logic. Think chunking with a map/reduce-style pipeline, etc.

If you can provide more details, I can suggest 1-2 approaches.
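For instance, a minimal map/reduce sketch in DSPy (the Summarize and Merge signatures and the character-based chunk size are illustrative choices, not anything built into DSPy):

import dspy

class Summarize(dspy.Signature):
    """Summarize one chunk of a long document."""
    chunk = dspy.InputField()
    summary = dspy.OutputField()

class Merge(dspy.Signature):
    """Merge partial summaries into a single answer."""
    summaries = dspy.InputField()
    answer = dspy.OutputField()

class MapReduceSummarize(dspy.Module):
    def __init__(self, chunk_chars=4000):
        super().__init__()
        self.chunk_chars = chunk_chars
        self.map = dspy.Predict(Summarize)
        self.reduce = dspy.Predict(Merge)

    def forward(self, document):
        # Map: each chunk is summarized independently, so no single
        # call exceeds the model's 4k context window.
        chunks = [document[i:i + self.chunk_chars]
                  for i in range(0, len(document), self.chunk_chars)]
        partials = [self.map(chunk=c).summary for c in chunks]
        # Reduce: one final call merges the partial summaries.
        return self.reduce(summaries='\n'.join(partials))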


sreenivasmrpivot commented on August 24, 2024

@okhat I am trying to implement the Gorilla model, which uses APIs from Hugging Face, TensorFlow Hub, and PyTorch Hub. My goal is to generate synthetic data using a fully open-source model and avoid using GPT-4 for commercial-viability reasons.
So I want to make use of Llama 2, provide in-context self-instruct prompts, and get some output.
However, when I try to do that directly using text prompting, I exceed the 4096 tokens allowed by Llama and end up getting this error:
Exception has occurred: APIError Invalid response object from API: '{"detail":{"object":"error","message":"This model\'s maximum context length is 4096 tokens. However, you requested 6276 tokens (2180 in the messages, 4096 in the completion). Please reduce the length of the messages or completion.","type":"invalid_request_error","param":null,"code":null}}' (HTTP response code was 400)

I am using vLLM, and I believe you work with Rick Battle to some extent; I am trying to get this implemented and contribute to Rick's team.

Any suggestions are much appreciated.


okhat commented on August 24, 2024

Thanks @sreenivasmrpivot. Yes, we collaborate with Rick very frequently!

However, you requested 6276 tokens (2180 in the messages, 4096 in the completion)

This error suggests your input isn't actually that long: the prompt is just 2180 tokens. Do you need 4096 output tokens?

Maybe just set the output to 256 tokens? Or 512?


sreenivasmrpivot commented on August 24, 2024

@okhat I have attached my actual input prompt here. Do you still think I can get around the problem by limiting the output to 256 or 512 tokens? If yes, where can I set the output length in the code?

sample1.txt

The output from the model is expected to have 10 "API-Inst pair" examples, which is pretty long.

If I use Llama 2 13B, which has a max context of 4096 tokens, is there any way to get this expected output using the combination of DSPy and Llama 2 13B?

If it is not possible, I am considering using https://huggingface.co/NousResearch/Yarn-Llama-2-13b-128k instead of Llama 2 13B.


sreenivasmrpivot commented on August 24, 2024

@okhat do you have any suggestions or updates for this ^^^?


drawal1 commented on August 24, 2024

@sreenivasmrpivot you can increase max_tokens as follows:
llm = dspy.OpenAI(model='gpt-3.5-turbo-16k', max_tokens=8000)

Off the top of my head: could you generate one API-Inst pair at a time and pass in the "instruction"s of the previously generated API-Inst pairs, asking the model not to generate an API-Inst pair similar to the ones already generated?
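Roughly, that loop might look like this (the GenerateAPIPair signature and its field names are made up for illustration; api_docs holds the API documentation text being converted into pairs):

import dspy

class GenerateAPIPair(dspy.Signature):
    """Generate one new API-Inst pair, distinct from the instructions listed."""
    api_docs = dspy.InputField()
    previous_instructions = dspy.InputField(
        desc="instructions already generated; do not produce anything similar")
    instruction = dspy.OutputField()
    api_call = dspy.OutputField()

generate = dspy.Predict(GenerateAPIPair)
pairs, seen = [], []
for _ in range(10):
    # Each call stays small: only prior instructions go in, one pair comes out.
    pred = generate(api_docs=api_docs,
                    previous_instructions='\n'.join(seen) or 'none')
    seen.append(pred.instruction)
    pairs.append((pred.instruction, pred.api_call))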


sreenivasmrpivot commented on August 24, 2024

@drawal1 I like the suggestion regarding max_tokens and generating one pair at a time, though I am not sure the generation would avoid repetitions unless I try it.

However, since gpt-3.5-turbo-16k has a 16k context length, it might work. Would the above approach work for Llama 2, which has only a 4k context length?


drawal1 commented on August 24, 2024

4k tokens is roughly 3000 words, so Llama 2's 4k context might work. You won't know until you try.
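One way to know before a full run: count the prompt's tokens with the model's own tokenizer and see what is left of the 4096-token window (a sketch using Hugging Face transformers; the checkpoint name should match whatever Llama 2 13B variant you are serving):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained('meta-llama/Llama-2-13b-hf')
prompt = open('sample1.txt').read()
n_prompt = len(tok.encode(prompt))
# Whatever remains of the 4096-token window is the budget for the completion.
print(n_prompt, 'prompt tokens,', 4096 - n_prompt, 'left for the completion')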


okhat commented on August 24, 2024

Is this resolved?


ahoho commented on August 24, 2024

I'm also having an issue with this: if I compile a Module with a teleprompter and then try to run it forward, it often creates prompts that are too long. Is there a way to avoid this?


okhat commented on August 24, 2024

Hey @ahoho, yes, happy to help. I may need more details, but basically:

you can reduce the parameters of the teleprompter (max_bootstrapped_demos and max_labeled_demos) for a start. They default to 4 and 16, respectively. Maybe set them to 1 and 0 to be extreme.
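Concretely, a sketch with BootstrapFewShot; BootstrapFewShotWithRandomSearch accepts the same two parameters. my_metric, MyModule, and trainset are placeholders for your own metric, program, and data:

import dspy
from dspy.teleprompt import BootstrapFewShot

# Extreme settings: at most one bootstrapped demo and no labeled demos,
# so the compiled prompts stay short.
teleprompter = BootstrapFewShot(metric=my_metric,
                                max_bootstrapped_demos=1,
                                max_labeled_demos=0)
compiled_program = teleprompter.compile(MyModule(), trainset=trainset)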


ahoho commented on August 24, 2024

Yes, I think this is the issue: the demonstrations end up creating a prompt that's too long. I think it's because I'm mirroring a RAG setting for classification, and the context is repeated for each of the bootstrapped demos.


okhat commented on August 24, 2024

@ahoho Oh wow, I just saw this by accident; not sure why I missed it earlier.

Did my suggestion resolve it? Setting max_bootstrapped_demos=1 and max_labeled_demos=0, assuming you're using BootstrapFewShotWithRandomSearch.


ahoho commented on August 24, 2024

Sorry, I also missed your response! Yes, that did resolve the problem.

