
tabby's Issues

Ruby Support

Please describe the feature you want
Support Ruby in tabby

  • backend change
  • extension support


I believe that the LanguagePresets should look like this:

    Language.RUBY: LanguagePreset(
        max_length=128,
        stop_words=[
            "\n\n",
            "\ndef",
            "\n#",
            "\nrequire",
            "\ninclude",
            "\nclass",
            "\nmodule",
        ],
    ),
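
For intuition, here is a minimal sketch (plain Python, independent of Tabby's actual post-processing code, so the helper name is hypothetical) of how such stop words would truncate raw model output:

    RUBY_STOP_WORDS = [
        "\n\n", "\ndef", "\n#", "\nrequire", "\ninclude", "\nclass", "\nmodule",
    ]

    def truncate_at_stop_words(text: str, stop_words: list[str]) -> str:
        # Cut at the earliest occurrence of any stop word.
        cut = min(
            (i for i in (text.find(s) for s in stop_words) if i != -1),
            default=len(text),
        )
        return text[:cut]

    raw = "  n * factorial(n - 1)\nend\n\ndef another_method"
    print(truncate_at_stop_words(raw, RUBY_STOP_WORDS))  # -> "  n * factorial(n - 1)\nend"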

However I don't know how to implement the extension support.

Please reply with a 👍 if you want this feature.

Not getting any completions

I tried tabby in a browser and in VSCode, and I'm not getting any completions; no requests are sent from VSCode. (Screenshots attached.)

Server logs for that time:

2023-04-07 09:59:43,953 DEBG 'server' stdout output:
INFO:     10.0.2.100:0 - "POST /v1/completions HTTP/1.1" 200 OK



2023-04-07 10:00:00,060 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 [2023-04-07 10:00:00] start analytic

2023-04-07 10:00:00,066 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 server is running at "/tmp/@dagu-analytic-af09eed2a725067a7f323f1fc0f93c29.sock"

2023-04-07 10:00:00,067 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 start running: Collect Tabby

2023-04-07 10:00:00,161 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 Collect Tabby failed

2023-04-07 10:00:00,167 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 schedule finished.

2023-04-07 10:00:00,167 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00
Summary ->
+--------------------------------------+----------+---------------------+---------------------+--------+--------+---------------+
| REQUESTID                            | NAME     | STARTED AT          | FINISHED AT         | STATUS | PARAMS | ERROR         |
+--------------------------------------+----------+---------------------+---------------------+--------+--------+---------------+
| 1a9305ea-f32c-4865-995b-ed5c809c6b1f | analytic | 2023-04-07 10:00:00 | 2023-04-07 10:00:00 | failed |        | exit status 1 |
+--------------------------------------+----------+---------------------+---------------------+--------+--------+---------------+
Details ->
+---+---------------+---------------------+---------------------+--------+----------------------------------------------------------+---------------+
| # | STEP          | STARTED AT          | FINISHED AT         | STATUS | COMMAND                                                  | ERROR         |
+---+---------------+---------------------+---------------------+--------+----------------------------------------------------------+---------------+
| 1 | Collect Tabby | 2023-04-07 10:00:00 | 2023-04-07 10:00:00 | failed | ./tabby/tools/analytic/main.sh collect_tabby_server_logs | exit status 1 |
+---+---------------+---------------------+---------------------+--------+----------------------------------------------------------+---------------+

2023-04-07 10:00:00,168 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 Failed to start DAG: exit status 1

2023-04-07 10:00:00,168 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 runner: entry failed analytic: exit status 1

I can see CPU utilisation going up to 550% for 3-6 seconds, then it drops with no results.
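
To isolate whether the server or the editor plugin is at fault, it can help to call the completion endpoint directly. Below is a sketch using Python's requests library; the payload shape is an assumption based on the /v1/completions path in the logs, not a documented schema:

    import requests

    # Hypothetical request body; adjust the fields to your Tabby server version.
    resp = requests.post(
        "http://localhost:5000/v1/completions",
        json={"language": "python", "prompt": "def fib(n):"},
        timeout=30,
    )
    print(resp.status_code)
    print(resp.json())

If this returns completions but the editor shows none, the problem is on the plugin side rather than the server.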

Add plugin for IntelliJ-based IDEs

There are currently plugins for Vim and VS Code; it would be great to have one for PyCharm as well.


Please reply with a 👍 if you want this feature.

Error when I run the Docker Hub container or build from scratch

2023-04-09 00:05:43,013 DEBG 'triton' stderr output:
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: the provided PTX was compiled with an unsupported toolchain. /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/cuda_utils.h:274

[e79652de0dc1:01747] *** Process received signal ***
[e79652de0dc1:01747] Signal: Aborted (6)
[e79652de0dc1:01747] Signal code:  (-6)
[e79652de0dc1:01747] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fbedac13420]
[e79652de0dc1:01747] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fbed949e00b]
[e79652de0dc1:01747] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fbed947d859]
[e79652de0dc1:01747] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7fbed9857911]
[e79652de0dc1:01747] [ 4]
2023-04-09 00:05:43,014 DEBG 'triton' stderr output:
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7fbed986338c]
[e79652de0dc1:01747] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7fbed98633f7]
[e79652de0dc1:01747] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7fbed98636a9]
[e79652de0dc1:01747] [ 7] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x3b949)[0x7fbebc686949]
[e79652de0dc1:01747] [ 8] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x20f65)[0x7fbebc66bf65]
[e79652de0dc1:01747] [ 9] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x2d794)[0x7fbebc678794]
[e79652de0dc1:01747] [10] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInitialize+0x38d)[0x7fbebc678e0d]
[e79652de0dc1:01747] [11] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10689b)[0x7fbed9d4889b]
[e79652de0dc1:01747] [12] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1c4f5d)[0x7fbed9e06f5d]
[e79652de0dc1:01747] [13] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1caccd)[0x7fbed9e0cccd]
[e79652de0dc1:01747] [14] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3083a0)[0x7fbed9f4a3a0]
[e79652de0dc1:01747] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7fbed988fde4]
[e79652de0dc1:01747] [16] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7fbedac07609]
[e79652de0dc1:01747] [17] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fbed957a133]
[e79652de0dc1:01747] *** End of error message ***

Metrics panel in admin

  • Collect tabby-server events.
  • Ingest events into DuckDB.
  • Visualize the results with DuckDB in the admin panel (see the sketch below).
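
A minimal sketch of the ingestion step, assuming a simple events table (the file path, table, and column names are illustrative):

    import duckdb

    con = duckdb.connect("/data/analytics.duckdb")  # hypothetical location
    con.execute(
        "CREATE TABLE IF NOT EXISTS events (ts TIMESTAMP, event TEXT, payload TEXT)"
    )
    con.execute(
        "INSERT INTO events VALUES (now(), ?, ?)",
        ["completion_served", '{"language": "python"}'],
    )

    # The admin panel could then visualize simple aggregates, e.g.:
    print(con.execute("SELECT event, count(*) FROM events GROUP BY event").fetchall())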

Add list of supported languages

Please describe the feature you want
It would be nice if we could have a list of supported languages in the docs.


Please reply with a 👍 if you want this feature.

code snippet in completion request

Meilisearch has now been integrated in #85, enabling us to experiment with including relevant code snippets in the prompt (see the sketch after this list).

Related flags:
* [ ] FLAGS_enable_meilisearch
* [ ] FLAGS_rewrite_prompt_with_search_snippet
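
As a rough illustration of what the prompt rewriting could look like (the index name, document fields, and endpoint are assumptions, not Tabby's actual implementation):

    import meilisearch

    client = meilisearch.Client("http://localhost:7700")  # hypothetical endpoint
    index = client.index("code_snippets")                 # hypothetical index name

    def rewrite_prompt(prompt: str, max_snippets: int = 2) -> str:
        # Prepend the most relevant indexed snippets as commented-out context.
        hits = index.search(prompt, {"limit": max_snippets})["hits"]
        context = "\n".join("# " + hit["body"] for hit in hits)  # assumes a "body" field
        return context + "\n" + prompt if context else prompt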

tabby.toml usage

Is there any direction on how to use the tabby.toml file to add additional projects, specifically if they're in a private repo?

Would love for it to be documented in the README.md.

Vulkan Backend Support for improved device compatibility

Please describe the feature you want
Support for the PyTorch Vulkan backend, so that older NVIDIA GPUs, as well as Intel, AMD, and some phone GPUs, can be supported.
https://pytorch.org/tutorials/prototype/vulkan_workflow.html

Additional context
Personally, I ran into difficulties testing this project because my laptop is too old to support NVIDIA and my cloud accounts aren't authorized to deploy GPU compute. I imagine I am not the only one limited in working on this project by these kinds of constraints.


Please reply with a 👍 if you want this feature.

Vim plugin should not invoke completion requests on cursor-moving key strokes

The Vim plugin currently invokes a completion request every time the cursor moves in INSERT mode. This can cause confusion when attempting to simply move the cursor through a word or code block without any intention of editing it.

Like other completion engines, the Vim plugin should not invoke a completion request under these circumstances.
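
A language-agnostic sketch of the desired behavior (written in Python rather than Vimscript for brevity): only schedule a request when the buffer text actually changed, and debounce it, so that pure cursor movement never triggers a completion:

    import threading

    class CompletionTrigger:
        def __init__(self, request_fn, delay=0.15):
            self.request_fn = request_fn  # sends the completion request
            self.delay = delay            # debounce window in seconds
            self._timer = None
            self._last_text = None

        def on_editor_event(self, buffer_text):
            # Cursor-only movement leaves the buffer unchanged: do nothing.
            if buffer_text == self._last_text:
                return
            self._last_text = buffer_text
            if self._timer is not None:
                self._timer.cancel()  # debounce rapid typing
            self._timer = threading.Timer(self.delay, self.request_fn, args=(buffer_text,))
            self._timer.start()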

Publish the TabbyML VSCode extension on open-vsx.org

The Visual Studio Code Marketplace is not available to OSS builds of VS Code due to licensing issues; as an alternative, open-vsx.org is heavily used. Please register the tabby extension with open-vsx.org as well.

Where can I put the model files so they won't be downloaded from huggingface.co

I don't have an internet connection on the machine where I am about to deploy this project. Where should I put the TabbyML/J-350M files so I can successfully run the Docker image? Currently it gives me this error message:

'HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /TabbyML/J-350M/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f8ac76d3430>, 'Connection to huggingface.co timed out. (connect timeout=10)'))' thrown while requesting HEAD https://huggingface.co/TabbyML/J-350M/resolve/main/tokenizer_config.json
^CTraceback (most recent call last):
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1247, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/app/tabby/tools/download_models.py", line 41, in <module>
    preload(local_files_only=args.prefer_local_files)
  File "/home/app/tabby/tools/download_models.py", line 26, in preload
    AutoTokenizer.from_pretrained(args.repo_id, local_files_only=local_files_only)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 634, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 896, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 573, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 628, in _get_config_dict
    resolved_config_file = cached_file(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/utils/hub.py", line 443, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like TabbyML/J-350M is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
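
One common workaround (generic Hugging Face tooling, not Tabby-specific) is to download the repository on a machine with internet access and copy the resulting cache directory into the host path that the README's docker command mounts at /home/app/.cache/huggingface:

    # Run this on a machine WITH internet access.
    from huggingface_hub import snapshot_download

    # Downloads into the local Hugging Face cache (~/.cache/huggingface by default);
    # copy that cache directory to the offline machine, e.g. into ./data/hf_cache.
    path = snapshot_download(repo_id="TabbyML/J-350M")
    print("Model files are in:", path)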

The script `huggingface_gptneox_convert.py` produces nonsensical results with tensor parallelism

I tried to use the script in this repo to convert a GPT-NeoX model from the Hugging Face format to the FasterTransformer format.

It worked when I converted files for single-GPU inference. However, when I converted a 2-GPU version of the FasterTransformer model files, which should work with tensor parallelism, it generated nonsensical results.

Model: https://huggingface.co/TabbyML/NeoX-70M

Convert command:

# 1-gpu
python huggingface_gptneox_convert.py \
    -i /input/huggingface/model/path -o /output/fastertransformer/model/path -i_g 1 -m_n gptneox
# 2-gpu
python huggingface_gptneox_convert.py \
    -i /input/huggingface/model/path -o /output/fastertransformer/model/path -i_g 2 -m_n gptneox

1-gpu FT model result:

[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][WARNING] Skip NCCL initialization since requested tensor/pipeline parallel sizes are equals to 1.
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][WARNING] Skip NCCL initialization since requested tensor/pipeline parallel sizes are equals to 1.
====================
latency: 0.011725187301635742
--------------------
prompt: 
--------------------
Game start, 
--------------------
output: 
--------------------


The first thing you notice is that the first thing you notice is that

2-gpu FT model result:

My script did not check the rank number before printing, so the result was printed twice.

[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][INFO] NCCL initialized rank=0 world_size=2 tensor_para=NcclParam[rank=0, world_size=2, nccl_comm=0x55b3a743e150] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x55b3a6d71000]
[FT][INFO] NCCL initialized rank=1 world_size=2 tensor_para=NcclParam[rank=1, world_size=2, nccl_comm=0x557fa70e5f20] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x557fa6aca340]
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][INFO] NCCL initialized rank=1 world_size=2 tensor_para=NcclParam[rank=1, world_size=2, nccl_comm=0x557fa70e5f20] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x557fa6aca340]
[FT][INFO] NCCL initialized rank=0 world_size=2 tensor_para=NcclParam[rank=0, world_size=2, nccl_comm=0x55b3a743e150] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x55b3a6d71000]
====================
latency: 0.011738300323486328
--------------------
prompt: 
--------------------
Game start, 
--------------------
output: 
--------------------
,,,,,,,,,,,,,,,,
====================
latency: 0.011530399322509766
--------------------
prompt: 
--------------------
Game start, 
--------------------
output: 
--------------------
,,,,,,,,,,,,,,,,

TypeScript support

Please describe the feature you want
Support TypeScript in tabby

  • backend change
  • extension support

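By analogy with the Ruby request above, a hypothetical LanguagePreset for TypeScript might look like this (the stop words are a guess, not a tested configuration):

    Language.TYPESCRIPT: LanguagePreset(
        max_length=128,
        stop_words=[
            "\n\n",
            "\nfunction",
            "\n//",
            "\nimport",
            "\nexport",
            "\nclass",
            "\ninterface",
        ],
    ),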


Please reply with a 👍 if you want this feature.

CUDA runtime error: operation not supported

Hi,

I tried running the Docker version mentioned in the README, only to be greeted with the following startup error in a loop. nvidia-container-toolkit 1.12.1 is correctly installed.

2023-04-07 19:15:35,308 DEBG 'triton' stderr output:
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: operation not supported /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:160
[c8d62a9a73a2:00460] *** Process received signal ***
[c8d62a9a73a2:00460] Signal: Aborted (6)
[c8d62a9a73a2:00460] Signal code:  (-6)

2023-04-07 19:15:35,308 DEBG 'triton' stderr output:
[c8d62a9a73a2:00460] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f61d6d24420]
[c8d62a9a73a2:00460] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f61d55af00b]
[c8d62a9a73a2:00460] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f61d558e859]
[c8d62a9a73a2:00460] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f61d5968911]
[c8d62a9a73a2:00460] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f61d597438c]
[c8d62a9a73a2:00460] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f61d59743f7]
[c8d62a9a73a2:00460] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f61d59746a9]
[c8d62a9a73a2:00460] [ 7] 

2023-04-07 19:15:35,308 DEBG 'triton' stderr output:
/opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer5checkI9cudaErrorEEvT_PKcS4_i+0x219)[0x7f617b00c0f9]
[c8d62a9a73a2:00460] [ 8] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer9AllocatorILNS_13AllocatorTypeE0EEC1Ei+0x123)[0x7f617b048d73]
[c8d62a9a73a2:00460] [ 9] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN15GptJTritonModelI13__nv_bfloat16E19createModelInstanceEiiP11CUstream_stSt4pairISt6vectorIN17fastertransformer9NcclParamESaIS7_EES9_ESt10shared_ptrINS6_18AbstractCustomCommEE+0xa7)[0x7f617b108967]
[c8d62a9a73a2:00460] [10] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x19b88)[0x7f61d0632b88]
[c8d62a9a73a2:00460] [11] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1a473)[0x7f61d0633473]
[c8d62a9a73a2:00460] [12] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x3c29e)[0x7f61d065529e]
[c8d62a9a73a2:00460] [13] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f61d59a0de4]
[c8d62a9a73a2:00460] [14] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f61d6d18609]
[c8d62a9a73a2:00460] [15] 
2023-04-07 19:15:35,309 DEBG 'triton' stderr output:
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f61d568b133]
[c8d62a9a73a2:00460] *** End of error message ***

Error when using relative paths for Docker data volumes

Following the README.md, with Docker version 20.10.21, build baeda1f, on a Mac M1, the following command:

docker run \
  -it --rm \
  -v ./data:/data \
  -v ./data/hf_cache:/home/app/.cache/huggingface \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

leads to the following error:

docker: Error response from daemon: create ./data: "./data" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.

SOLUTION

docker run \
  -it --rm \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5000:5000 \
  --platform linux/amd64 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

For an explanation, see:
https://docs.docker.com/storage/bind-mounts/
https://stackoverflow.com/questions/46526165/docker-invalid-characters-for-volume-when-using-relative-paths

Install of Tabby VSCode Extension

Please describe the feature you want

Are there instructions on how to install the Tabby VSCode extension? I searched the repository and the Tabby site but couldn't find anything.

Build with compute capability 6.1

It's nice to have access to such a great open-source project. I tried running it on my Tesla P40 but failed. Could you please provide the compilation method for the model backend, so that I can try to compile it myself for compute capability 6.1?
By the way, is there a model with more parameters to try?

Error when I launch docker on Mac M1

2023-04-11 16:42:16,242 DEBG 'admin' stderr output:
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)

2023-04-11 16:42:20,620 DEBG 'admin' stdout output:

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.


2023-04-11 16:42:20,628 DEBG 'admin' stderr output:
Traceback (most recent call last):

2023-04-11 16:42:20,631 DEBG 'admin' stderr output:
  File "/home/app/.pyenv/versions/3.10.10/bin/streamlit", line 8, in <module>
    sys.exit(main())
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 1130, in __call__

2023-04-11 16:42:20,633 DEBG 'admin' stderr output:
    return self.main(*args, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 1055, in main

2023-04-11 16:42:20,637 DEBG 'admin' stderr output:
    rv = self.invoke(ctx)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 1657, in invoke

2023-04-11 16:42:20,641 DEBG 'admin' stderr output:
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 1404, in invoke

2023-04-11 16:42:20,644 DEBG 'admin' stderr output:
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 760, in invoke

2023-04-11 16:42:20,649 DEBG 'admin' stderr output:
    return __callback(*args, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/web/cli.py", line 209, in main_run
    _main_run(target, args, flag_options=kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/web/cli.py", line 245, in _main_run

2023-04-11 16:42:20,656 DEBG 'admin' stderr output:
    bootstrap.run(file, command_line, args, flag_options)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/web/bootstrap.py", line 397, in run
    _install_pages_watcher(main_script_path)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/web/bootstrap.py", line 373, in _install_pages_watcher
    watch_dir(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/watcher/path_watcher.py", line 153, in watch_dir
    return _watch_path(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/watcher/path_watcher.py", line 128, in _watch_path
    watcher_class(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/watcher/event_based_path_watcher.py", line 92, in __init__

2023-04-11 16:42:20,667 DEBG 'admin' stderr output:
    path_watcher.watch_path(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/watcher/event_based_path_watcher.py", line 170, in watch_path
    folder_handler.watch = self._observer.schedule(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/api.py", line 301, in schedule
    emitter.start()
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/utils/__init__.py", line 92, in start
    self.on_thread_start()
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/inotify.py", line 119, in on_thread_start
    self._inotify = InotifyBuffer(path, self.watch.is_recursive)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/inotify_buffer.py", line 37, in __init__
    self._inotify = Inotify(path, recursive)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/inotify_c.py", line 167, in __init__
    Inotify._raise_error()
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/inotify_c.py", line 432, in _raise_error
    raise OSError(err, os.strerror(err))
OSError: [Errno 38] Function not implemented

2023-04-11 16:42:20,997 DEBG fd 6 closed, stopped monitoring <POutputDispatcher at 274928023968 for <Subprocess at 274928023872 with name admin in state RUNNING> (stdout)>
2023-04-11 16:42:20,999 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 274928472448 for <Subprocess at 274928023872 with name admin in state RUNNING> (stderr)>
2023-04-11 16:42:21,001 WARN exited: admin (exit status 1; not expected)
2023-04-11 16:42:21,003 DEBG received SIGCHLD indicating a child quit
2023-04-11 16:42:22,009 INFO spawned: 'admin' with pid 930
2023-04-11 16:42:23,018 INFO success: admin entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

The following command is used to launch Docker, after running chown -R $USER data:

docker run \
  -it --rm \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5001:5001 \
  --platform linux/amd64 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

attention_mask issue

Describe the bug
Got "attention mask and the pad token id were not set" when trying to use tabby with VSCode extension. When executing API requests or using playground everything working correct.
Console log when got error:

2023-04-12 11:44:42,190 DEBG 'server' stderr output:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

2023-04-12 11:44:45,657 DEBG 'server' stdout output:
INFO:     172.17.0.1:0 - "POST /v1/completions HTTP/1.1" 200 OK

I also got the same result when using the CPU.

Information about your GPU

Wed Apr 12 14:29:45 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 531.18       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 L...    On | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8               12W /  N/A|    179MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        22      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

Additional context
Run command:

sudo docker run \
  -it --rm \
  --gpus all \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  --name=tabby \
  tabbyml/tabby

Using WSL2 with the Ubuntu 22.04 distribution and Docker Desktop. GPU: NVIDIA GeForce RTX 3060 Laptop.

The attention mask and the pad token id were not set.

I followed the quick start, then visited http://127.0.0.1:5000/_admin/.

It shows:

triton | down server,vector,dagu | live
Congrats, your server is live!
To get started with Tabby, you can either install the extensions below or use the Editor.

But when I go to the Editor, it doesn't complete my code. I checked the log and got this:

2023-04-07 09:45:20,415 DEBG 'server' stderr output:
As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

2023-04-07 09:45:20,415 DEBG 'server' stderr output:
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

What should I do?
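
For reference, this is a standard Hugging Face transformers warning emitted by generate() and is not necessarily what blocks your completions. In plain transformers code (a sketch of the generic fix, not Tabby's internals) it is silenced by passing attention_mask and pad_token_id explicitly:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("TabbyML/J-350M")
    model = AutoModelForCausalLM.from_pretrained("TabbyML/J-350M")

    inputs = tok("def fib(n):", return_tensors="pt")
    out = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # silences the attention-mask warning
        pad_token_id=tok.eos_token_id,            # silences the pad-token warning
        max_new_tokens=32,
    )
    print(tok.decode(out[0]))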

Improve the Completion Cache

Adding an API-level cache (key: hash(completion request) → value: completion response) on the client side should improve response speed and reduce API requests and model invocations. This should be useful in the case of the user typing backspace.
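
A minimal client-side sketch of this idea (the hashing scheme and eviction policy are illustrative):

    import hashlib
    import json
    from collections import OrderedDict

    class CompletionCache:
        def __init__(self, max_entries=256):
            self._cache = OrderedDict()
            self.max_entries = max_entries

        @staticmethod
        def _key(request: dict) -> str:
            # Stable hash of the completion request.
            return hashlib.sha256(
                json.dumps(request, sort_keys=True).encode()
            ).hexdigest()

        def get(self, request: dict):
            return self._cache.get(self._key(request))

        def put(self, request: dict, response: dict) -> None:
            self._cache[self._key(request)] = response
            if len(self._cache) > self.max_entries:
                self._cache.popitem(last=False)  # evict the oldest entry

On a backspace, the editor re-issues a request identical to one already seen, so get() returns the cached response without another model invocation.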

Related to #130.

Cannot create directory /data/config

On running the Docker script

mkdir -p data/hf_cache && chown -R 1000 data

docker run \
  --gpus all \
  -it --rm \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  -e MODEL_BACKEND=triton \
  --name=tabby \
  tabbyml/tabby

I get the error

mkdir: cannot create directory '/data/config': Permission denied

Similar to issue #58, but I didn't use sudo, and I don't see anything in the script pointing to the absolute path /data/config.

I'm running NixOS with an Nvidia 1060.

using cpu mode

Hey guys,

Is there a specific reason why you removed CPU mode from the project README? If a GPU is not available, is there an alternative way to run the project?

"mkdir: cannot create directory '/data/config': Operation not permitted" when running Tabby Docker container on Mac M1

Using a Mac M1 and the following to run the Docker container:

# Create data dir and grant owner to 1000 (Tabby run as uid 1000 in container)
sudo mkdir -p data/hf_cache && chown -R 1000 data

docker run \
  -it --rm \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5000:5000 \
  --platform linux/amd64 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

I get this error output:

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .

ERROR: This container was built for CPUs supporting at least the AVX instruction set, but
the CPU detected was , which does not report
support for AVX. An Illegal Instrution exception at runtime is likely to result.
See https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX .

mkdir: cannot create directory '/data/config': Operation not permitted

Is this error related to incompatibility with Mac M1, the docker run command, or something entirely different?

422 Unprocessable Entity

Describe the bug
I'm trying to use it and I'm not getting any hints. Launched via Docker. (Screenshots of VSCode and the console were attached.)

Information about your GPU
(nvidia-smi output attached as a screenshot)

Additional context
Manjaro

Failed to start DAG: exit status 1/runner: entry failed analytic: exit status 1

Describe the bug
The tabby server is running, but the client doesn't produce any generated output.

Information about your GPU
(attached as a screenshot)

Additional context
2023-05-09 12:27:00,172 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:27:00 Failed to start DAG: exit status 1

2023-05-09 12:27:00,172 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:27:00 runner: entry failed analytic: exit status 1

2023-05-09 12:30:00,059 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 [2023-05-09 12:30:00] start analytic

2023-05-09 12:30:00,063 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 server is running at "/tmp/@dagu-analytic-af09eed2a725067a7f323f1fc0f93c29.sock"

2023-05-09 12:30:00,063 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 start running: Collect Tabby

2023-05-09 12:30:00,137 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 Collect Tabby failed

2023-05-09 12:30:00,164 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 schedule finished.

2023-05-09 12:30:00,168 DEBG 'dagu_scheduler' stdout output:
(summary table attached as a screenshot)
