Comments (26)
Hi @taoari, I've started work on this here, but haven't yet added it to the documentation. This hasn't been tested yet; there is a chance it already works, but it might require a bit of debugging.
from continue.
Usage would look like this:
from continuedev.src.continuedev.libs.llm.ht_tgi import HuggingFaceTGI
...
config=ContinueConfig(
    ...
    models=Models(
        default=HuggingFaceTGI(server_url="<SERVER_URL>")
    )
)
I encountered friction installing TGI on my Mac, which is why I haven't fully tested this yet, so it would be super helpful if you wanted to give it a try.
@sestinj I got the following error
ModuleNotFoundError: No module named 'continuedev.src.continuedev.libs.llm.ht_tgi'
Just a typo, it should be hf_tgi
Can check the file here to be sure: https://github.com/continuedev/continue/blob/main/continuedev/src/continuedev/libs/llm/hf_tgi.py
@sestinj No errors this time, but it still does not work. The "Play" button blinks the whole time, and I get no response.
@sestinj I think it crashed on my computer.
I did an uninstall,
lsof -i :65432 | grep "(LISTEN)" | awk '{print $2}' | xargs kill -9
deleted ~/.continue,
and reinstalled.
It still does not work; I always get "Continue Server Starting".
Ok, I might just need to go back and test this myself then. I'll update you when it's ready.
Is Continue completely unable to start up again? In the worst case, I think that uninstalling Continue and restarting VS Code should solve things.
Another way to make sure that no servers are running is just lsof -i :65432
You can check the logs with cmd+shift+p "View Continue Server Logs"
I set up a local instance of TGI and added it in config.py as follows:
from continuedev.src.continuedev.libs.llm.hf_tgi import HuggingFaceTGI
...
config=ContinueConfig(
    ...
    models=Models(
        default=HuggingFaceTGI(server_url="http://localhost:8080")
    )
)
Please note, I am able to successfully obtain responses from the /info, /generate, and /generate_stream endpoints of TGI.
If I type a simple prompt in the Continue box, I get the following error:
Traceback (most recent call last):
File "continuedev/src/continuedev/libs/util/create_async_task.py", line 21, in callback
future.result()
File "asyncio/futures.py", line 203, in result
File "asyncio/tasks.py", line 267, in __step
File "continuedev/src/continuedev/core/autopilot.py", line 543, in create_title
title = await self.continue_sdk.models.medium.complete(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 258, in complete
completion = await self._complete(prompt=prompt, options=options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 334, in _complete
async for chunk in self._stream_complete(prompt=prompt, options=options):
File "/var/folders/nw/hfwjfm7n6h13ybsw6kxqh08w0000gn/T/_MEI1quhS4/continuedev/src/continuedev/libs/llm/hf_tgi.py", line 55, in _stream_complete
json_chunk = json.loads(chunk)
^^^^^^^^^^^^^^^^^
File "json/__init__.py", line 346, in loads
File "json/decoder.py", line 337, in decode
File "json/decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
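For context on this error: TGI's /generate_stream endpoint replies with server-sent events, so each chunk arrives as a line like `data:{...}`, interleaved with blank keep-alive lines, and calling json.loads on the raw chunk fails exactly like this. A minimal parsing sketch (the function name is hypothetical, not Continue's actual code):

```python
import json


def parse_tgi_sse_line(line: bytes):
    """Parse one line of TGI's /generate_stream SSE response.

    Lines look like b'data:{"token": {...}, ...}'. The 'data:' prefix and
    blank keep-alive lines must be handled before json.loads, otherwise it
    fails with "Expecting value: line 1 column 1 (char 0)".
    """
    text = line.decode("utf-8").strip()
    if not text:
        return None  # blank keep-alive / separator line
    if text.startswith("data:"):
        text = text[len("data:"):].strip()
    return json.loads(text)
```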
@abhinavkulkarni I've just released a new version that I think will fix this. It was a very obvious mistake on our end
Thanks @sestinj, I now get a new error:
Traceback (most recent call last):
File "continuedev/src/continuedev/libs/util/create_async_task.py", line 21, in callback
future.result()
File "asyncio/futures.py", line 203, in result
File "asyncio/tasks.py", line 267, in __step
File "continuedev/src/continuedev/core/autopilot.py", line 543, in create_title
title = await self.continue_sdk.models.medium.complete(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 258, in complete
completion = await self._complete(prompt=prompt, options=options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 334, in _complete
async for chunk in self._stream_complete(prompt=prompt, options=options):
File "/var/folders/nw/hfwjfm7n6h13ybsw6kxqh08w0000gn/T/_MEISzqrqf/continuedev/src/continuedev/libs/llm/hf_tgi.py", line 41, in _stream_complete
args = self.collect_args(options)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/folders/nw/hfwjfm7n6h13ybsw6kxqh08w0000gn/T/_MEISzqrqf/continuedev/src/continuedev/libs/llm/hf_tgi.py", line 37, in collect_args
args.pop("functions")
KeyError: 'functions'
If I comment this line out, I get an "Error parsing JSON: Expecting value: line 1 column 1 (char 0)" error.
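On the "functions" KeyError: dict.pop raises when the key is absent unless a default is supplied. A minimal illustration (the args dict here is made up):

```python
# dict.pop(key) raises KeyError when the key is missing, while
# dict.pop(key, None) is a safe no-op in that case.
args = {"max_new_tokens": 1024, "temperature": 0.7}  # illustrative args

args.pop("functions", None)  # safe even though "functions" was never set
```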
Please note, I get a successful response from my local TGI setup:
$ curl http://localhost:8080/generate -X POST -d '{"inputs":"Write a hello world Python program","parameters":{"max_new_tokens":512}}' -H 'Content-Type: application/json' | jq ".generated_text" -rc | cat
def main():
    print("Hello World")

if __name__ == "__main__":
    main()
The "functions" error is an easy one. Let me give the other a deeper look and set TGI up on my own machine (embarrassing, but I haven't gotten to this yet, I was just following the API documentation). I think it might be something about how I'm calling the streaming endpoint.
The request I'm making right now is the equivalent of
curl -X POST -H "Content-Type: application/json" -d '{"inputs": "<prompt_value>", "parameters": {"max_new_tokens": 1024}}' http://localhost:8080/generate_stream
Resuming work in the morning; it has been a slight pain to set up TGI on Mac.
If there's any chance you've seen this error, I'd be curious how you solved it. Otherwise I'm sure I'll get it tomorrow.
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
@abhinavkulkarni Finally got it, and successfully tested on my own TGI setup. Let me know if there are still any problems.
And @taoari, this should solve your error as well.
Thanks @sestinj, local TGI setup works and I can generate responses from it.
However, I am not able to feed it context by selecting code; please see the attached video. You can see responses being generated from TGI in the integrated terminal window. Please note, when I switch to OpenAI (maybe proxy), it does work and is able to answer questions based on the highlighted context.
Also, for Llama 2 models, </s> is a special token that indicates the end of text/sequence and should not be displayed.
You can see in the attached image that it is shown for the title.
@abhinavkulkarni I have a suspicion that the code is in the prompt, but the model is ignoring it. If you try this again and hover over the response, a magnifying glass button will show up. Clicking that shows the full prompts/completions as sent to the LLM. Could you share what that looks like?
We have a stop parameter that can be set for the model, but since CodeLlama/Llama is usually the model people use, I think it would be sensible to have </s> as the default there. I'm also noticing the [PYTHON] tags are probably a bit annoying; I'll make a change so they are converted to triple backticks.
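To illustrate the stop-parameter idea: on the client side, trimming the completion at the first stop token keeps </s> out of the UI. A hedged sketch (the helper name is made up for illustration):

```python
def strip_stop_tokens(completion: str, stop=("</s>",)) -> str:
    """Cut the completion at the first occurrence of any stop token,
    so special end-of-sequence markers like </s> are never displayed."""
    for tok in stop:
        idx = completion.find(tok)
        if idx != -1:
            completion = completion[:idx]
    return completion.rstrip()
```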
Thanks, @sestinj, here's a video screengrab for a simple prompt. This is the full prompt and the response:
This is a log of the prompt/completion pairs sent/received from the LLM during this step
############################################
Prompt:
[INST] Tell me what this code is doing.
[/INST]
############################################
Completion:
This code is using the `requests` library to make a GET request to the URL `https://api.github.com/users/octocat/repos`. The `json()` method is used to parse the response as JSON data, and the `for` loop is used to iterate over the list of repositories returned in the response.
For each repository, the code is printing the repository name and the number of stars it has. The `print()` function is used to display the output.
This code is using the GitHub API to retrieve a list of repositories for the user "octocat" and then printing the name and number of stars for each repository..</s>
############################################
Prompt:
[INST] " This code is using the `requests` library to make a GET request to the URL `https://api.github.com/users/octocat/repos`. The `json()` method is used to parse the response as JSON data, and the `for` loop is used to iterate over the list of repositories returned in the response.
For each repository, the code is printing the repository name and the number of stars it has. The `print()` function is used to display the output.
This code is using the GitHub API to retrieve a list of repositories for the user "octocat" and then printing the name and number of stars for each repository..</s>"
Please write a short title summarizing the message quoted above. Use no more than 10 words:
[/INST]
############################################
Completion:
""The only way to do great work is to love what you do." - Steve Job
Thanks. My suspicion was wrong... but I see the problem! This is actually fixable through the config file, but I'll change the default to be the correct thing and push a new version soon.
There is a template_messages property of all LLM classes that converts chat history into a templated prompt, and the function I have as the default for HuggingFaceTGI is cutting out the chat history. The correct thing would look like this:
from continuedev.src.continuedev.libs.llm.prompts.chat import llama2_template_messages
...
...
default=HuggingFaceTGI(..., template_messages=llama2_template_messages)
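For reference, the Llama 2 chat format wraps each user turn in [INST] ... [/INST] and terminates assistant turns with </s>. A simplified sketch of what a template_messages function along those lines might do (an assumption-based illustration, not the actual continuedev implementation):

```python
def llama2_template_messages(msgs):
    """Render OpenAI-style chat messages into a Llama 2 prompt string.

    Simplified: omits the <<SYS>> system-message handling that a
    production template would also need.
    """
    prompt = ""
    for msg in msgs:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            prompt += f" {msg['content']} </s><s>"
    return prompt
```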
@abhinavkulkarni just released a new version; this is now the default, so highlighted code will now be included.
Thanks, @sestinj, things work perfectly now, except for one small detail. The title generated seems to be random and has nothing to do with the prompt. I am attaching an example screengrab here.
Also attaching all the prompt/completion pairs.
Which model are you using? I can then just test out the exact prompt here until I find something more reliable
The prompt looks OK, other than the end token. I'm adding a stop_tokens option to the LLM class; there's a small chance that fixes it, but likely not.
Also relevant for now might be the "disable_summaries" option in config.py depending on how bad it is: https://continue.dev/docs/reference/config#:~:text=token%20is%20provided.-,disable_summaries,-(boolean)%20%3D%20False
Hey @sestinj,
Also relevant for now might be the "disable_summaries" option in config.py
Thanks, that works.
Which model are you using?
I am using a 4-bit AWQ quantized version of codellama/CodeLlama-7b-Instruct-hf, but you won't be able to run it on CPU (I read in one of your previous replies that you were running these on a Mac). If so, you may want to test it with a 4-bit GGML/GGUF version of this model to see if you too are getting random quotes as titles.
Another problem I have observed is that the last character in the completion tends to be repeated: if it is a period or an exclamation mark, it is repeated. If I feed the same prompt to my local TGI using curl, I don't get this repetition.
Here's the screengrab attached:
Ok, cool. I'll see what I can find. Seems like Continue is just extra excited lol !!
@abhinavkulkarni just wanted to update you on this since I know it's been a while - I've been planning on potentially using LiteLLM to make API calls to different providers, such as HuggingFace TGI, and this would solve the above problem, so I've decided to postpone digging into it myself. I'll let you know as soon as there's an update here!
Also thought you might want to know this since I talked to them and they mentioned that you were a contributor : )
@abhinavkulkarni