Comments (26)
Hi @taoari, I've started work on this here, but haven't yet added it to the documentation. This hasn't been tested yet; there is a chance it already works, but it might require a bit of debugging.
from continue.
Usage would look like this:
from continuedev.src.continuedev.libs.llm.ht_tgi import HuggingFaceTGI
...
config=ContinueConfig(
    ...
    models=Models(
        default=HuggingFaceTGI(server_url="<SERVER_URL>")
    )
)
I encountered friction installing TGI on my Mac, which is why I haven't fully tested this yet, so it would be super helpful if you wanted to give it a try.
@sestinj I got the following error
ModuleNotFoundError: No module named 'continuedev.src.continuedev.libs.llm.ht_tgi'
Just a typo, it should be hf_tgi
Can check the file here to be sure: https://github.com/continuedev/continue/blob/main/continuedev/src/continuedev/libs/llm/hf_tgi.py
@sestinj No errors this time, but it still does not work. The "Play" button blinks the whole time, and I get no response.
@sestinj I think it crashed on my computer.
I did an uninstall,
lsof -i :65432 | grep "(LISTEN)" | awk '{print $2}' | xargs kill -9
deleted ~/.continue,
and reinstalled.
It still does not work; I always get "Continue Server Starting".
Ok, I might just need to go back and test this myself then. I'll update you when it's ready.
Is Continue completely unable to start up again? In the worst case, I think that uninstalling Continue and restarting VS Code should solve things.
Another way to make sure that no servers are running is just lsof -i :65432
You can check the logs with cmd+shift+p "View Continue Server Logs"
I set up a local instance of TGI and added it in config.py as follows:
from continuedev.src.continuedev.libs.llm.hf_tgi import HuggingFaceTGI
...
config=ContinueConfig(
    ...
    models=Models(
        default=HuggingFaceTGI(server_url="http://localhost:8080")
    )
)
Please note, I am able to successfully obtain responses from the /info, /generate, and /generate_stream endpoints of TGI.
If I type a simple prompt in the Continue box, I get the following error:
Traceback (most recent call last):
File "continuedev/src/continuedev/libs/util/create_async_task.py", line 21, in callback
future.result()
File "asyncio/futures.py", line 203, in result
File "asyncio/tasks.py", line 267, in __step
File "continuedev/src/continuedev/core/autopilot.py", line 543, in create_title
title = await self.continue_sdk.models.medium.complete(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 258, in complete
completion = await self._complete(prompt=prompt, options=options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 334, in _complete
async for chunk in self._stream_complete(prompt=prompt, options=options):
File "/var/folders/nw/hfwjfm7n6h13ybsw6kxqh08w0000gn/T/_MEI1quhS4/continuedev/src/continuedev/libs/llm/hf_tgi.py", line 55, in _stream_complete
json_chunk = json.loads(chunk)
^^^^^^^^^^^^^^^^^
File "json/__init__.py", line 346, in loads
File "json/decoder.py", line 337, in decode
File "json/decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
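For context on this error: TGI's /generate_stream endpoint replies with server-sent events, so each chunk arrives as a line like `data:{...}`, interleaved with blank keep-alive lines, and calling json.loads on the raw chunk fails exactly like this. A minimal parsing sketch (the function name is hypothetical, not Continue's actual code):

```python
import json


def parse_tgi_sse_line(line: bytes):
    """Parse one line of TGI's /generate_stream SSE response.

    Lines look like b'data:{"token": {...}, ...}'. The 'data:' prefix and
    blank keep-alive lines must be handled before json.loads, otherwise it
    fails with "Expecting value: line 1 column 1 (char 0)".
    """
    text = line.decode("utf-8").strip()
    if not text:
        return None  # blank keep-alive / separator line
    if text.startswith("data:"):
        text = text[len("data:"):].strip()
    return json.loads(text)
```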
@abhinavkulkarni I've just released a new version that I think will fix this. It was a very obvious mistake on our end
Thanks @sestinj, I now get a new error:
Traceback (most recent call last):
File "continuedev/src/continuedev/libs/util/create_async_task.py", line 21, in callback
future.result()
File "asyncio/futures.py", line 203, in result
File "asyncio/tasks.py", line 267, in __step
File "continuedev/src/continuedev/core/autopilot.py", line 543, in create_title
title = await self.continue_sdk.models.medium.complete(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 258, in complete
completion = await self._complete(prompt=prompt, options=options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "continuedev/src/continuedev/libs/llm/__init__.py", line 334, in _complete
async for chunk in self._stream_complete(prompt=prompt, options=options):
File "/var/folders/nw/hfwjfm7n6h13ybsw6kxqh08w0000gn/T/_MEISzqrqf/continuedev/src/continuedev/libs/llm/hf_tgi.py", line 41, in _stream_complete
args = self.collect_args(options)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/folders/nw/hfwjfm7n6h13ybsw6kxqh08w0000gn/T/_MEISzqrqf/continuedev/src/continuedev/libs/llm/hf_tgi.py", line 37, in collect_args
args.pop("functions")
KeyError: 'functions'
If I comment this line out, I get an "Error parsing JSON: Expecting value: line 1 column 1 (char 0)" error.
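On the "functions" KeyError: dict.pop raises when the key is absent unless a default is supplied. A minimal illustration (the args dict here is made up):

```python
# dict.pop(key) raises KeyError when the key is missing, while
# dict.pop(key, None) is a safe no-op in that case.
args = {"max_new_tokens": 1024, "temperature": 0.7}  # illustrative args

args.pop("functions", None)  # safe even though "functions" was never set
```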
Please note, I get a successful response from my local TGI setup:
$ curl http://localhost:8080/generate -X POST -d '{"inputs":"Write a hello world Python program","parameters":{"max_new_tokens":512}}' -H 'Content-Type: application/json' | jq ".generated_text" -rc | cat
def main():
    print("Hello World")

if __name__ == "__main__":
    main()
The "functions" error is an easy one. Let me give the other a deeper look and set TGI up on my own machine (embarrassing, but I haven't gotten to this yet, I was just following the API documentation). I think it might be something about how I'm calling the streaming endpoint.
The request I'm making right now is the equivalent of
curl -X POST -H "Content-Type: application/json" -d '{"inputs": "<prompt_value>", "parameters": {"max_new_tokens": 1024}}' http://localhost:8080/generate_stream
Resuming work in the morning; it has been a slight pain to set up TGI on Mac.
If there's any chance you've seen this error, I'd be curious how you solved it. Otherwise I'm sure I'll get it tomorrow.
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
@abhinavkulkarni Finally got it, and successfully tested on my own TGI setup. Let me know if there are still any problems.
And @taoari, this should solve your error as well.
Thanks @sestinj, local TGI setup works and I can generate responses from it.
However, I am not able to feed it context by selecting code; please see the attached video. You can see responses being generated from TGI in the integrated terminal window. Please note, when I switch to OpenAI (maybe proxy), it does work and is able to answer questions based on the highlighted context.
Also, for Llama 2 models, </s> is a special token that indicates the end of text/sequence and should not be displayed.
You can see in the attached image that it is shown for the title.
@abhinavkulkarni I have a suspicion that the code is in the prompt, but the model is ignoring it. If you try this again and hover over the response, a magnifying glass button will show up. Clicking that shows the full prompts/completions as sent to the LLM. Could you share what that looks like?
We have a stop parameter that can be set for the model, but since CodeLlama/Llama is usually the model people use, I think it would be sensible to have </s> as the default there. I'm also noticing the [PYTHON] tags are probably a bit annoying; I'll make a change so they are converted to triple backticks.
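To illustrate the stop-parameter idea: on the client side, trimming the completion at the first stop token keeps </s> out of the UI. A hedged sketch (the helper name is made up for illustration):

```python
def strip_stop_tokens(completion: str, stop=("</s>",)) -> str:
    """Cut the completion at the first occurrence of any stop token,
    so special end-of-sequence markers like </s> are never displayed."""
    for tok in stop:
        idx = completion.find(tok)
        if idx != -1:
            completion = completion[:idx]
    return completion.rstrip()
```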
Thanks, @sestinj, here's a video screengrab for a simple prompt. This is the full prompt and the response:
This is a log of the prompt/completion pairs sent/received from the LLM during this step
############################################
Prompt:
[INST] Tell me what this code is doing.
[/INST]
############################################
Completion:
This code is using the `requests` library to make a GET request to the URL `https://api.github.com/users/octocat/repos`. The `json()` method is used to parse the response as JSON data, and the `for` loop is used to iterate over the list of repositories returned in the response.
For each repository, the code is printing the repository name and the number of stars it has. The `print()` function is used to display the output.
This code is using the GitHub API to retrieve a list of repositories for the user "octocat" and then printing the name and number of stars for each repository..</s>
############################################
Prompt:
[INST] " This code is using the `requests` library to make a GET request to the URL `https://api.github.com/users/octocat/repos`. The `json()` method is used to parse the response as JSON data, and the `for` loop is used to iterate over the list of repositories returned in the response.
For each repository, the code is printing the repository name and the number of stars it has. The `print()` function is used to display the output.
This code is using the GitHub API to retrieve a list of repositories for the user "octocat" and then printing the name and number of stars for each repository..</s>"
Please write a short title summarizing the message quoted above. Use no more than 10 words:
[/INST]
############################################
Completion:
""The only way to do great work is to love what you do." - Steve Job
Thanks. My suspicion was wrong... but I see the problem! This is actually fixable through the config file, but I'll change the default to be the correct thing and push a new version soon.
There is a template_messages property of all LLM classes that converts chat history into a templated prompt, and the function I have as the default for HuggingFaceTGI is cutting out the chat history. The correct thing would look like this:
from continuedev.src.continuedev.libs.llm.prompts.chat import llama2_template_messages
...
...
default=HuggingFaceTGI(..., template_messages=llama2_template_messages)
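For reference, the Llama 2 chat format wraps each user turn in [INST] ... [/INST] and terminates assistant turns with </s>. A simplified sketch of what a template_messages function along those lines might do (an assumption-based illustration, not the actual continuedev implementation):

```python
def llama2_template_messages(msgs):
    """Render OpenAI-style chat messages into a Llama 2 prompt string.

    Simplified: omits the <<SYS>> system-message handling that a
    production template would also need.
    """
    prompt = ""
    for msg in msgs:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            prompt += f" {msg['content']} </s><s>"
    return prompt
```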
@abhinavkulkarni just released a new version; this is now the default, so highlighted code will now be included.
Thanks, @sestinj, things work perfectly now, except for one small detail. The title generated seems to be random and has nothing to do with the prompt. I am attaching an example screengrab here.
Also attaching all the prompt/completion pairs.
Which model are you using? I can then just test out the exact prompt here until I find something more reliable
The prompt looks OK, other than the end token. I'm adding a stop_tokens option to the LLM class; there's a small chance that fixes it, but likely not.
Also relevant for now might be the "disable_summaries" option in config.py depending on how bad it is: https://continue.dev/docs/reference/config#:~:text=token%20is%20provided.-,disable_summaries,-(boolean)%20%3D%20False
Hey @sestinj,
Also relevant for now might be the "disable_summaries" option in config.py
Thanks, that works.
Which model are you using?
I am using a 4-bit AWQ quantized version of codellama/CodeLlama-7b-Instruct-hf, but you won't be able to run it on CPU (I read in one of your previous replies that you were running these on a Mac). If so, you may want to test it with a 4-bit GGML/GGUF version of this model to see if you too are getting random quotes as titles.
Another problem I have observed is that the last character in the completion tends to be repeated: if it is a period or an exclamation mark, it is repeated. If I feed the same prompt to my local TGI using curl, I don't get this repetition.
Here's the screengrab attached:
Ok, cool. I'll see what I can find. Seems like Continue is just extra excited lol !!
@abhinavkulkarni just wanted to update you on this since I know it's been a while - I've been planning on potentially using LiteLLM to make API calls to different providers, such as HuggingFace TGI, and this would solve the above problem, so I've decided to postpone digging into it myself. I'll let you know as soon as there's an update here!
Also thought you might want to know this since I talked to them and they mentioned that you were a contributor : )
@abhinavkulkarni