llm-vscode's Introduction

LLM powered development for VSCode

llm-vscode is an extension for all things LLM. It uses llm-ls as its backend.

We also have extensions for other editors, such as Neovim and IntelliJ.

This extension was previously published as huggingface-vscode.

Note

When using the Inference API, you will probably encounter some limitations. Subscribe to the PRO plan (https://huggingface.co/pricing#pro) to avoid getting rate limited in the free tier.

Features

Code completion

This plugin supports "ghost-text" code completion, à la Copilot.

Choose your model

Requests for code generation are made via an HTTP request.

You can use the Hugging Face Inference API or your own HTTP endpoint, provided it adheres to one of the APIs listed in the Backend section below.

The list of officially supported models is located in the config template section.

Always fit within the context window

The prompt sent to the model will always be sized to fit within the context window, with the number of tokens determined using tokenizers.

Code attribution

Hit Cmd+shift+a to check if the generated code is in The Stack. This is a rapid first-pass attribution check using stack.dataportraits.org. We check for sequences of at least 50 characters that match a Bloom filter. This means false positives are possible and long enough surrounding context is necessary (see the paper for details on n-gram striding and sequence length). The dedicated Stack search tool is a full dataset index and can be used for a complete second pass.

Installation

Install it like any other VSCode extension.

By default, this extension uses bigcode/starcoder and the Hugging Face Inference API for inference.
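If you want to pin the model and backend explicitly, a minimal settings.json sketch might look like the following (llm.backend and llm.modelId are the assumed setting keys here; check your settings page under Llm for the exact names in your version):

{
  "llm.backend": "huggingface",
  "llm.modelId": "bigcode/starcoder"
}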

HF API token

You can supply your HF API token (hf.co/settings/token) with this command:

  1. Cmd/Ctrl+Shift+P to open VSCode command palette
  2. Type: Llm: Login

If you previously logged in with huggingface-cli login on your system, the extension will read the token from disk.

Configuration

You can check the full list of configuration settings by opening your settings page (cmd+,) and typing Llm.

Backend

You can configure the backend to which requests will be sent. llm-vscode supports the following backends: huggingface (the Inference API or a compatible endpoint), ollama, openai (any OpenAI-compatible API) and tgi (Text Generation Inference).

Let's say your current code is this:

import numpy as np
import scipy as sp
{YOUR_CURSOR_POSITION}
def hello_world():
    print("Hello world")

The request body will then look like:

const inputs = `{start token}import numpy as np\nimport scipy as sp\n{end token}def hello_world():\n    print("Hello world"){middle token}`
const data = { inputs, ...configuration.requestBody };

const model = configuration.modelId;
// the endpoint depends on the configured backend and base URL, cf. URL construction below
const endpoint = build_url(configuration);

const res = await fetch(endpoint, {
    body: JSON.stringify(data),
    headers, // includes the Content-Type and, when set, the API token
    method: "POST"
});

const json = await res.json() as { generated_text: string };

Note that the example above is a simplified version to explain what is happening under the hood.

URL construction

The endpoint URL that is queried to fetch suggestions is built in the following way:

  • depending on the backend, it will try to append the correct path to the base URL located in the configuration (e.g. {url}/v1/completions for the openai backend)
  • if no URL is set for the huggingface backend, it will automatically use the default URL
    • it will error for other backends as there is no sensible default URL
  • if the URL already ends with the correct path, it will not be appended a second time
  • there is an option to disable this behavior: llm.disableUrlPathCompletion
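Putting this together, a settings.json sketch for a self-hosted openai-style endpoint might look like the following (the llm.backend and llm.url setting names and the localhost URL are assumptions here; check your settings page for the exact keys):

{
  // openai-style backend served locally; /v1/completions is appended automatically
  "llm.backend": "openai",
  "llm.url": "http://localhost:8000",
  // set to true if the base URL must be used exactly as written
  "llm.disableUrlPathCompletion": false
}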

Suggestion behavior

You can tune the way the suggestions behave:

  • llm.enableAutoSuggest lets you choose to enable or disable "suggest-as-you-type" suggestions.
  • llm.documentFilter lets you enable suggestions only for files that match the pattern you provide. The object must be of type DocumentFilter | DocumentFilter[]:
    • to match on all types of buffers: llm.documentFilter: { pattern: "**" }
    • to match on all files in my_project/: llm.documentFilter: { pattern: "/path/to/my_project/**" }
    • to match on all python and rust files: llm.documentFilter: { pattern: "**/*.{py,rs}" }
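Putting these options together, a settings.json sketch that enables suggest-as-you-type only for Python and Rust files could look like this (values are illustrative):

{
  "llm.enableAutoSuggest": true,
  // only offer completions in Python and Rust buffers
  "llm.documentFilter": { "pattern": "**/*.{py,rs}" }
}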

Keybindings

llm-vscode sets two keybindings:

  • you can trigger suggestions with Cmd+shift+l by default, which corresponds to the editor.action.inlineSuggest.trigger command
  • code attribution is set to Cmd+shift+a by default, which corresponds to the llm.attribution command
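If you prefer different shortcuts, you can rebind these commands yourself in keybindings.json; a sketch with arbitrary example keys:

[
  {
    // manually trigger an inline suggestion
    "key": "ctrl+alt+space",
    "command": "editor.action.inlineSuggest.trigger",
    "when": "editorTextFocus"
  },
  {
    // run the code attribution check
    "key": "ctrl+alt+a",
    "command": "llm.attribution"
  }
]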

By default, llm-ls is bundled with the extension. When developing locally or if you built your own binary because your platform is not supported, you can set the llm.lsp.binaryPath setting to the path of the binary.
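For example, a settings.json entry pointing at a locally built binary (the path shown is just an example) would look like:

{
  // use a locally built llm-ls instead of the bundled one
  "llm.lsp.binaryPath": "/path/to/llm-ls/target/debug/llm-ls"
}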

Tokenizer

llm-ls uses tokenizers to make sure the prompt fits the context_window.

To configure it, you have a few options:

  • No tokenization, llm-ls will count the number of characters instead:
{
  "llm.tokenizer": null
}
  • from a local file on your disk:
{
  "llm.tokenizer": {
    "path": "/path/to/my/tokenizer.json"
  }
}
  • from a Hugging Face repository, llm-ls will attempt to download tokenizer.json at the root of the repository:
{
  "llm.tokenizer": {
    "repository": "myusername/myrepo",
    "api_token": null,
  }
}

Note: when api_token is set to null, it will use the token you set with the Llm: Login command. If you want to use a different token, you can set it here.

  • from an HTTP endpoint, llm-ls will attempt to download a file via an HTTP GET request:
{
  "llm.tokenizer": {
    "url": "https://my-endpoint.example.com/mytokenizer.json",
    "to": "/download/path/of/mytokenizer.json"
  }
}

Code Llama

To test the Code Llama 13B model:

  1. Make sure you have the latest version of this extension.
  2. Make sure you have supplied your HF API token
  3. Open VSCode settings (cmd+,) and type: Llm: Config Template
  4. From the dropdown menu, choose hf/codellama/CodeLlama-13b-hf

Read more about Code Llama here.

Phind and WizardCoder

To test Phind/Phind-CodeLlama-34B-v2 and/or WizardLM/WizardCoder-Python-34B-V1.0:

  1. Make sure you have the latest version of this extension.
  2. Make sure you have supplied your HF API token
  3. Open VSCode settings (cmd+,) and type: Llm: Config Template
  4. From the dropdown menu, choose hf/Phind/Phind-CodeLlama-34B-v2 or hf/WizardLM/WizardCoder-Python-34B-V1.0

Read more about Phind-CodeLlama-34B-v2 here and WizardCoder-Python-34B-V1.0 here.

Developing

  1. Clone llm-ls: git clone https://github.com/huggingface/llm-ls
  2. Build llm-ls: cd llm-ls && cargo build (you can also use cargo build --release for a release build)
  3. Clone this repo: git clone https://github.com/huggingface/llm-vscode
  4. Install deps: cd llm-vscode && npm ci
  5. In VSCode, open the Run and Debug sidebar and click Launch Extension
  6. In the new vscode window, set the llm.lsp.binaryPath setting to the path of the llm-ls binary you built in step 2 (e.g. /path/to/llm-ls/target/debug/llm-ls)
  7. Close the window and relaunch the extension with F5 or as in step 5.

Community

  • huggingface-vscode-endpoint-server: custom code generation endpoint for this repository
  • llm-vscode-inference-server: an endpoint server for efficiently serving quantized open-source LLMs for code

llm-vscode's People

Contributors

antoinejeannot, hennerm, mcpatate, mickolka, wangcx18

llm-vscode's Issues

Is it possible to run "Code attribution" against a self-hosted API?

I was just thinking about using the "Code attribution" feature to run a search against a company-internal code database instead of "TheStack". That way I could simply mark some code lines I'm currently writing and look up whether something similar was already written in a different project/repo.

I don't know how complicated it would be to make the endpoint configurable that gets hit when I request a "Code attribution", but I guess just being able to define a different endpoint there should be enough. 🤔

Sending input to the server only with a trigger key

As I understand it, every time I write a new line in VSCode, the Hugging Face extension sends the code input to the server. But I want to send the input only when I press a specific key. So I want to set a trigger key for the model completion, but I couldn't manage to find a configuration for this. Do you have this kind of feature?

Getting results in C# and ASP.net core

Hi,
I'm trying to use this extension for ASP.NET Core or C#.
However, I keep getting results in Python.
Is there a way to configure it for a specific language?

Thank you
Moshe

Publish `HF Code Autocomplete` to Open VSX

Dear extension author,
Please publish this extension to the Open VSX marketplace.

Context

Unfortunately, as Microsoft prohibits usage of the Microsoft marketplace by any other products or redistribution of .vsix files from it, in order to use VS Code extensions in non-Microsoft products, we kindly ask that you take ownership of the VS Code extension namespace in Open VSX and publish this extension on Open VSX.

What is Open VSX? Why does it exist?

Open VSX is a vendor neutral alternative to the MS marketplace used by most other derivatives of VS Code like VSCodium, Gitpod, OpenVSCode, Theia-based IDEs, and so on.

You can read on about Open VSX at the Eclipse Foundation's Open VSX FAQ.

How can you publish to Open VSX?

The docs to publish an extension can be found here. This process is straightforward and shouldn't take too long. Essentially, you need an authentication token and to execute the ovsx publish command to publish your extension. There's also a doc explaining the whole process with an example GitHub Action workflow.

Installing extension does not seem to work

When I try to install the extension in VS Code, I only get the TabNine extension installed. Are there any steps to solve that?
I already tried to install manually from the VSIX file, but only get TabNine.

image

VS Code version is 1.79.2

Version for Visual Studio

Hello, I was wondering if the same extension could be built also for Visual Studio, maybe is in the plans?

Thank you

middle token and end token in the wrong place

Here is my output

INPUT to API: (with parameters {"max_new_tokens":60,"temperature":0.2,"do_sample":true,"top_p":0.95,"stop":["<|endoftext|>"]}) 
<fim_prefix># language: Python
def qsort(l: list[int]) -> list[int]:
    if 1 <= len(l):
        p = l[0]
        l1 = [x for x in l if x < p]
        l2 = [x for x in l if x == p]
        l3 = [x for x in l if x > p]
        return l1+l2+l3
    else:
        return <fim_suffix>

# Example for calling qsort

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(qsort(l))

# Output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
<fim_middle>

I did not change the settings of this extension.
It seems fim_middle and fim_suffix should be exchanged.
Here is the description of it in the settings:

String that is sent to server is in format: {startToken}{code above cursor}{middleToken}{code below cursor if isFillMode=true}{endToken}.

Fill-in-the-Middle (FIM) Mode not working

Hello,

Thank you so much for this awesome open source alternative!

I am having an issue with the FIM option of StarCoder. The "Is Fill Mode" box is ticked, but the Hugging Face Code Output Console only shows/sends to the API the upper half of the code, no matter where I set the cursor. I press enter in the middle of the code but only the upper part is sent.

I am using a custom local endpoint: http://127.0.0.1:4000/

VSCode version:
Version: 1.78.0 (user setup)
Commit: 252e5463d60e63238250799aef7375787f68b4ee
Date: 2023-05-03T20:09:00.748Z
Electron: 22.4.8
Chromium: 108.0.5359.215
Node.js: 16.17.1
V8: 10.8.168.25-electron.0
OS: Windows_NT x64 10.0.19044
Sandboxed: No

Many thanks,

Catastrophic typo in README?

I'm trying to write a C API adapter using libcurl so that I can add completion support to GNU nano, and ran into this.

const inputs = `{start token}import numpy as np\nimport scipy as sp\n{end token}def hello_world():\n    print("Hello world"){middle token}`

Is this meant to be {start token}text{middle token}text{end token}? Also, what are the textual representations of these tokens? I've seen other repos with <fim-prefix>, <fim_suffix>, etc. Is this something that was changed after the project was forked?

Setting up API token gives error

I installed the extension in VSCode on macOS, and when I paste the token I get the error

Command 'Hugging Face Code: Set API token' resulted in an error (Password is required.)

This is my first time using this extension. Can someone guide me on how to solve this issue?

Middle/Suffix swapped?

Version 0.29

It appears that the <fim_suffix> and <fim_middle> tokens are in the wrong place.

Here is the inference request to the server:

2023-05-10T19:43:59.305424Z  INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=a100:8080 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=node-fetch otel.kind=server trace_id=d98de1f9236c4ef40447153215489d34}:generate{req=Json(GenerateRequest { inputs: "<fim_prefix>\n\ndef test():\n    # basic test function\n    x = 1 + 1\n    assert x == 2\n\ndef test1()\n    y = 1 + 3\n    assert y == 4\n\ndef  test2():\n    z = 1 + 5\n    assert z == 6\n\ndef test_3():\n    a = 1 + 7\n    assert a == 8\n\ndef test_4():\n    x = 2 + 2\n    assert x ==  <fim_suffix>\n\ndef test5():\n    pass<fim_middle>", parameters: GenerateParameters { best_of: None, temperature: Some(0.2), repetition_penalty: None, top_k: None, top_p: Some(0.95), typical_p: None, do_sample: true, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>"], truncate: None, watermark: false, details: false, seed: None } }) total_time="51.925919ms" validation_time="812.655µs" queue_time="67.947µs" inference_time="51.045468ms" time_per_token="51.045468ms" seed="Some(2882359693777549143)"}: text_generation_router::server: router/src/server.rs:285: Output: 

Which matches the output from "hugging face code" in vs code:

INPUT to API: (with parameters {"max_new_tokens":60,"temperature":0.2,"do_sample":true,"top_p":0.95,"stop":["<|endoftext|>"]}) 
<fim_prefix>

def test():
    # basic test function
    x = 1 + 1
    assert x == 2

def test1()
    y = 1 + 3
    assert y == 4

def  test2():
    z = 1 + 5
    assert z == 6

def test_3():
    a = 1 + 7
    assert a == 8

def test_4():
    x = 2 + 2
    assert x ==  <fim_suffix>

def test5():
    pass<fim_middle>
OUTPUT from API:

The cursor is at <fim_suffix>

`GLIBC_2.32' not found ubuntu 20.4

Hi, I had the extension running fine for the last couple of months; now suddenly it doesn't work anymore.
On start, VSCode produces these messages:

[Error - 07:11:09] Server initialization failed.
  Message: Cannot call write after a stream was destroyed
  Code: -32099 
[Error - 07:11:09] LLM VS Code client: couldn't create connection to server.
  Message: Cannot call write after a stream was destroyed
  Code: -32099 
[Info  - 07:11:09] Connection to server got closed. Server will restart.
true
/home/es/.vscode/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /home/es/.vscode/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls)
/home/es/.vscode/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /home/es/.vscode/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls)
/home/es/.vscode/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /home/es/.vscode/extensions/huggingface.huggingface-vscode-0.1.0-linux-x64/server/llm-ls)

I'm on Ubuntu 20.04 and I have version 2.31 (lrwxrwxrwx 1 root root 12 apr 7 2022 /lib/x86_64-linux-gnu/libc.so.6 -> libc-2.31.so), which is the version it ships with.

windows-specific path problems?

When I try to use the extension with a custom code-completion server, I get the following error code:
The filename, directory name, or volume label syntax is incorrect. (os error 123)
I am using Windows; might the problem be related to this? How can I solve it?

fim tokens not always embedded in prompt + top_p

Observation

I noticed that the vs code request uses top_p set to 0.95, but the default curl command to the latest inference server defaults that value to None. This seems to have a negative effect on the performance of the starcoder model.

Similarly, I noticed that the fim start/middle/suffix tokens are not always included in the prompt.

Possible Solutions

  • Always use fim tokens
  • Change the default value of top_p to null or at least expose it as a parameter
  • Enable more options in the extensions, specifically,
    • stop tokens -- it's nice to add \n as a stop token so you only generate a single line
    • max_new_tokens -- set to line width / 120-ish

Single line completions feel a bit better, which could be accomplished using a single \n stop token. If we could use multi-tab or a new hot key, then multi-line would feel better. That is, use tab to complete the current line; multiple tabs would subsequently complete further lines of the suggested auto-complete, and finally a special hot key would accept the full multi-line suggestion.

Details

In VS Code if I have a simple python file


    def test():
        x=1+1
        assert x == 

I'm seeing the following request on the server; note: the fim tokens are not included:

2023-05-10T22:34:38.711415Z  INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=a100:8080 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=node-fetch otel.kind=server trace_id=44d3a824707eb2d84d040afb9fa8ac59}:generate{req=Json(GenerateRequest { inputs: "\n    def test():\n        x=1+1\n        assert x == ", parameters: GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, top_k: None, top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>"], truncate: None, watermark: false, details: false, seed: None } }) total_time="33.124231ms" validation_time="438.753µs" queue_time="80.04µs" inference_time="32.605648ms" time_per_token="32.605648ms" seed="Some(9473460394149812587)"}: text_generation_router::server: router/src/server.rs:285: Output: 

The corresponding output on the console is:

INPUT to API: (with parameters {"max_new_tokens":60,"temperature":null,"do_sample":false,"top_p":0.95,"stop":["<|endoftext|>"]}) 

    def test():
        x=1+1
        assert x == 
OUTPUT from API:

The output is a \n.

Now if I make the call to the server using curl with the fim tokens in place (note: I'm adding an extra stop token):

deepops@a100:~$ time curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"<fim_prefix>\n    def test():\n        x=1+1\n        assert x == <fim_middle><fim_suffix>","parameters":{"max_new_tokens":60,"stop":["<|endoftext|>","\n"]}}' -H 'Content-Type: application/json'
{"generated_text":"  def test():\n        x=1+1\n        assert x == 2\n"}

I see the corresponding server-side output:

2023-05-10T22:12:14.591559Z  INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=127.0.0.1:8080 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=curl/7.68.0 otel.kind=server trace_id=9b28be13203a06e56e54cfea2cf21d21}:generate{req=Json(GenerateRequest { inputs: "<fim_prefix>\n    def test():\n        x=1+1\n        assert x == <fim_middle><fim_suffix>", parameters: GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>", "\n"], truncate: None, watermark: false, details: false, seed: None } }) total_time="448.996747ms" validation_time="353.56µs" queue_time="66.814µs" inference_time="448.576563ms" time_per_token="26.386856ms" seed="None"}: text_generation_router::server: router/src/server.rs:285: Output:   def test():
        x=1+1
        assert x == 2

Note: top_p: 0.95 when issued via curl, which is the default setting of the latest version of the text-generation-inference server

deepops@a100:~$ time curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"\n    def test():\n        x=1+1\n        assert x == ","parameters":{"max_new_tokens":60,"stop":["<|endoftext|>"],"top_p":0.95}}' -H 'Content-Type: application/json'
{"generated_text":""}

server-side:

2023-05-10T22:37:01.085903Z  INFO HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=127.0.0.1:8080 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=curl/7.68.0 otel.kind=server trace_id=b72534db620649bd7c296957040cb18c}:generate{req=Json(GenerateRequest { inputs: "\n    def test():\n        x=1+1\n        assert x == ", parameters: GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, top_k: None, top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>"], truncate: None, watermark: false, details: false, seed: None } }) total_time="32.852386ms" validation_time="332.854µs" queue_time="60.754µs" inference_time="32.458948ms" time_per_token="32.458948ms" seed="Some(14293755121722667098)"}: text_generation_router::server: router/src/server.rs:285: Output: 

Getting a "failed to find document error"

When I try to use the extension in a jupyter notebook, I get the "failed to find document error".

I previously had this error on files that weren't saved. Once I saved the files to disk that error went away.

In this case, I have renamed the file and saved it to disk. I changed the log level to info, and here is a snippet of the error (with my token removed) from the log file at ~/.cache/llm_ls/llm-ls.log.

Running this on a MacBook (12.2.1) using VSCode (1.82.2).

{"timestamp":"2023-09-25T18:46:10.486229Z","level":"INFO","message":"get_completions CompletionParams { text_document_position: TextDocumentPositionParams { text_document: TextDocumentIdentifier { uri: Url { scheme: \"vscode-notebook-cell\", cannot_be_a_base: false, username: \"\", password: None, host: None, port: None, path: \"/Users/rajivshah/testnotebook.ipynb\", query: None, fragment: Some(\"W4sZmlsZQ%3D%3D\") } }, position: Position { line: 2, character: 8 } }, request_params: RequestParams { max_new_tokens: 60, temperature: 0.2, do_sample: true, top_p: 0.95, stop_tokens: None }, ide: VSCode, fim: FimParams { enabled: true, prefix: \"<fim_prefix>\", middle: \"<fim_middle>\", suffix: \"<fim_suffix>\" }, api_token: Some(\"hf_Q*****WeNfdwfx\"), model: \"bigcode/starcoder\", tokens_to_clear: [\"<|endoftext|>\"], tokenizer_config: None, context_window: 8192, tls_skip_verify_insecure: false }","target":"llm_ls","line_number":451}
{"timestamp":"2023-09-25T18:46:10.486395Z","level":"ERROR","err_msg":"failed to find document","target":"llm_ls","line_number":177}

Support different languages than python (like TypeScript)

Thanks for creating this extension!
I tried it with Python like the example in the README, and it auto-completes the code as expected.
StarCoder supports 80+ languages, including TypeScript.
Even the playground allows me to complete TypeScript code.
So it would be nice to add support for those languages in the VSCode extension, which only worked for me in Python.
Thanks!

Privacy concerns

This may not be the place, but I would like to know whether it would be possible to have some configuration for allowing data sharing, or at least a note that the model or API being used is what manages these aspects.

On the other hand, I would like to ask whether anyone knows if BigCode somehow stores the data that is sent, because I have not found anything about this; only about use of the model, not about the data sent to the Inference API used by this repository.

Thanks in advance.

Developing Step 2: Install Deps failing

Following the setup steps in the README I get as far as Developing Step 2

I installed yarn, cloned the git repo and can cd into it. N.B. Not an experienced user of node.js / yarn.

I have pasted the terminal readout from running:

yarn install --frozen-lockfile

➤ YN0050: The --frozen-lockfile option is deprecated; use --immutable and/or --immutable-cache instead
➤ YN0000: ┌ Resolution step
➤ YN0061: │ vscode-test@npm:1.6.1 is deprecated: This package has been renamed to @vscode/test-electron, please update to the new name
➤ YN0061: │ vsce@npm:1.93.0 is deprecated: vsce has been renamed to @vscode/vsce. Install using @vscode/vsce instead.
➤ YN0061: │ vscode-extension-telemetry@npm:0.1.7 is deprecated: This package has been renamed to @vscode/extension-telemetry, please update to the new name
➤ YN0061: │ vsce@npm:2.14.0 is deprecated: vsce has been renamed to @vscode/vsce. Install using @vscode/vsce instead.
➤ YN0032: │ keytar@npm:7.9.0: Implicit dependencies on node-gyp are discouraged
➤ YN0032: │ node-addon-api@npm:4.3.0: Implicit dependencies on node-gyp are discouraged
➤ YN0061: │ @npmcli/move-file@npm:2.0.1 is deprecated: This functionality has been moved to @npmcli/fs
➤ YN0032: │ fsevents@npm:2.3.2: Implicit dependencies on node-gyp are discouraged
➤ YN0002: │ eslint-config-airbnb-typescript@npm:12.3.1 doesn't provide eslint (p6e2aa), requested by eslint-config-airbnb-base
➤ YN0002: │ eslint-config-airbnb-typescript@npm:12.3.1 doesn't provide eslint (p32c28), requested by @typescript-eslint/parser
➤ YN0002: │ eslint-config-airbnb-typescript@npm:12.3.1 doesn't provide eslint (p1d9db), requested by eslint-config-airbnb
➤ YN0002: │ eslint-config-airbnb-typescript@npm:12.3.1 doesn't provide eslint-plugin-import (p4f5e3), requested by eslint-config-airbnb-base
➤ YN0002: │ eslint-config-airbnb-typescript@npm:12.3.1 doesn't provide eslint-plugin-import (p98c99), requested by eslint-config-airbnb
➤ YN0002: │ eslint-config-airbnb-typescript@npm:12.3.1 doesn't provide eslint-plugin-jsx-a11y (p220ec), requested by eslint-config-airbnb
➤ YN0002: │ eslint-config-airbnb-typescript@npm:12.3.1 doesn't provide eslint-plugin-react (p92df3), requested by eslint-config-airbnb
➤ YN0002: │ eslint-config-airbnb-typescript@npm:12.3.1 doesn't provide eslint-plugin-react-hooks (pa6140), requested by eslint-config-airbnb
➤ YN0060: │ huggingface-vscode@workspace:. provides @typescript-eslint/parser (p3b6ed) with version 4.22.0, which doesn't satisfy what @typescript-eslint/eslint-plugin requests
➤ YN0060: │ huggingface-vscode@workspace:. provides eslint (pd8c35) with version 8.28.0, which doesn't satisfy what @typescript-eslint/parser requests
➤ YN0000: │ Some peer dependencies are incorrectly met; run yarn explain peer-requirements for details, where is the six-letter p-prefixed code
➤ YN0000: └ Completed in 13s 77ms
➤ YN0000: ┌ Post-resolution validation
➤ YN0028: │ The lockfile would have been modified by this install, which is explicitly forbidden.
➤ YN0000: └ Completed in 0s 209ms
➤ YN0000: Failed with errors in 13s 296ms

Can't set token

Type: Bug

Stuck at activating

Extension version: 0.0.35
VS Code version: Code 1.81.1 (6c3e3dba23e8fadc360aed75ce363ba185c49794, 2023-08-09T22:40:25.698Z)
OS version: Darwin arm64 22.5.0
Modes:

System Info
Item Value
CPUs Apple M1 Pro (8 x 24)
GPU Status 2d_canvas: enabled
canvas_oop_rasterization: disabled_off
direct_rendering_display_compositor: disabled_off_ok
gpu_compositing: enabled
metal: disabled_off
multiple_raster_threads: enabled_on
opengl: enabled_on
rasterization: enabled
raw_draw: disabled_off_ok
video_decode: enabled
video_encode: enabled
vulkan: disabled_off
webgl: enabled
webgl2: enabled
webgpu: enabled
Load (avg) 6, 7, 6
Memory (System) 16.00GB (0.07GB free)
Process Argv --crash-reporter-id 74b4bb98-db5a-4060-9da5-ebbdc3365b5c
Screen Reader no
VM 0%

error trying to connect: invalid certificate: UnknownIssuer

I'm getting error trying to connect: invalid certificate: UnknownIssuer when llm-vscode is configured to use a self-hosted TGI endpoint. Is there a way to disable certificate checking or to have it respect the REQUESTS_CA_BUNDLE environment variable?

HuggingFaceCode::setApiToken not found

I'm getting the following error when trying to set my HF API token:
Screenshot 2023-05-08 at 15 52 27

Version: 1.78.0 (Universal)
Commit: 252e5463d60e63238250799aef7375787f68b4ee
Date: 2023-05-03T20:11:00.813Z (4 days ago)
Electron: 22.4.8
Chromium: 108.0.5359.215
Node.js: 16.17.1
V8: 10.8.168.25-electron.0
OS: Darwin x64 22.4.0
Sandboxed: No

Error to set up API Key GBUS Related

Captura de tela_2023-05-07_21-11-52

System: Nixos 22.04
DE: i3 + xfce4
vscode version: 1.73.1

I've tried to use it, but it gives this error, so it's unusable at this point, at least for me.

How to authenticate?

Hello,
You are saying here "You can supply your HF API token (hf.co/settings/token) with this command:" but you are not actually providing any command.
I've installed the extension and generated the token, but I don't have the slightest idea where to insert it.
How do I get to the window where "Hugging Face Code: Set API token" appears?
Thank you

Client is not running

That's what I get instead of autocompletion, whenever I type:
image

Quite cryptic TBH.

Runtime status:

Uncaught Errors (9)
write EPIPE
Client is not running and can't be stopped. It's current state is: starting
write EPIPE
write EPIPE
Client is not running and can't be stopped. It's current state is: starting
Pending response rejected since connection got disposed
Client is not running and can't be stopped. It's current state is: starting
Client is not running and can't be stopped. It's current state is: startFailed
Client is not running and can't be stopped. It's current state is: startFailed

Using Windows 11 with my local LM Studio server.

What happened to CodeLlama-34B?

When I try to select this model in the extension, I get a 503 error. When I went to the repository, I couldn't find this model there. Where did it go?

Unable to set API token

I've tried running the command below but I get a "not found" error for the command:

Cmd/Ctrl+Shift+P to open VSCode command palette
Type: Hugging Face Code: Set API token

Integrate LSP (Language Server Protocol) to improve suggestions

LSPs are spun up by most modern languages and IDEs.

So anything that can improve suggestions by taking context from the language server, such as the dependencies currently in the project, could help filter the suggestions: for example, suggesting fastapi when it is already a project dependency.

It can also help filter out junk suggestions: for example, even when developers haven't explicitly declared types for a function, the LSP may make that information available so the model can be aware of it.

Request Feature for Huggingchat integration

Hello HuggingFace team!

I came across a deprecated VSCode extension for HuggingChat, but it doesn't seem to be actively maintained. I believe integrating HuggingChat into the official HuggingFace extension would benefit many users like me who rely on these tools for their daily work.

Thank you for considering my feature request! I look forward to hearing your thoughts and feedback.

Getting Input validation error

Keep getting this error message when the llm-vscode extension is running.

Input validation error: `inputs` tokens + `max_new_tokens` must be <= 2048. Given: 3474 `inputs` tokens and 60 `max_new_tokens`

Disabling fill mode does not disable fill mode

I'm on v0.0.30 of the HuggingFace VSCode plugin.
I'm trying to use the plugin with some models that do not support FIM.
However, after I disabled FIM in the settings, the plugin continues to send requests in FIM format to the server.
