
tabby's Issues

Ruby Support

Please describe the feature you want
Support Ruby in tabby

  • backend change
  • extension support


I believe that the LanguagePresets should look like this:

    Language.RUBY: LanguagePreset(
        max_length=128,
        stop_words=[
            "\n\n",
            "\ndef",
            "\n#",
            "\nrequire",
            "\ninclude",
            "\nclass",
            "\nmodule",
        ],
    ),
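
For intuition, here is a minimal sketch (plain Python, independent of Tabby's actual post-processing code, so the helper name is hypothetical) of how such stop words would truncate raw model output:

    RUBY_STOP_WORDS = [
        "\n\n", "\ndef", "\n#", "\nrequire", "\ninclude", "\nclass", "\nmodule",
    ]

    def truncate_at_stop_words(text: str, stop_words: list[str]) -> str:
        # Cut at the earliest occurrence of any stop word.
        cut = min(
            (i for i in (text.find(s) for s in stop_words) if i != -1),
            default=len(text),
        )
        return text[:cut]

    raw = "  n * factorial(n - 1)\nend\n\ndef another_method"
    print(truncate_at_stop_words(raw, RUBY_STOP_WORDS))  # -> "  n * factorial(n - 1)\nend"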

However I don't know how to implement the extension support.

Please reply with a 👍 if you want this feature.

Not getting any completions

I tried tabby in a browser and in VSCode, and I'm not getting any completions; no requests are sent from VSCode. (Screenshots attached.)

Server logs for that time:

2023-04-07 09:59:43,953 DEBG 'server' stdout output:
INFO:     10.0.2.100:0 - "POST /v1/completions HTTP/1.1" 200 OK



2023-04-07 10:00:00,060 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 [2023-04-07 10:00:00] start analytic

2023-04-07 10:00:00,066 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 server is running at "/tmp/@dagu-analytic-af09eed2a725067a7f323f1fc0f93c29.sock"

2023-04-07 10:00:00,067 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 start running: Collect Tabby

2023-04-07 10:00:00,161 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 Collect Tabby failed

2023-04-07 10:00:00,167 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 schedule finished.

2023-04-07 10:00:00,167 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00
Summary ->
+--------------------------------------+----------+---------------------+---------------------+--------+--------+---------------+
| REQUESTID                            | NAME     | STARTED AT          | FINISHED AT         | STATUS | PARAMS | ERROR         |
+--------------------------------------+----------+---------------------+---------------------+--------+--------+---------------+
| 1a9305ea-f32c-4865-995b-ed5c809c6b1f | analytic | 2023-04-07 10:00:00 | 2023-04-07 10:00:00 | failed |        | exit status 1 |
+--------------------------------------+----------+---------------------+---------------------+--------+--------+---------------+
Details ->
+---+---------------+---------------------+---------------------+--------+----------------------------------------------------------+---------------+
| # | STEP          | STARTED AT          | FINISHED AT         | STATUS | COMMAND                                                  | ERROR         |
+---+---------------+---------------------+---------------------+--------+----------------------------------------------------------+---------------+
| 1 | Collect Tabby | 2023-04-07 10:00:00 | 2023-04-07 10:00:00 | failed | ./tabby/tools/analytic/main.sh collect_tabby_server_logs | exit status 1 |
+---+---------------+---------------------+---------------------+--------+----------------------------------------------------------+---------------+

2023-04-07 10:00:00,168 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 Failed to start DAG: exit status 1

2023-04-07 10:00:00,168 DEBG 'dagu_scheduler' stdout output:
2023/04/07 10:00:00 runner: entry failed analytic: exit status 1

I can see CPU utilisation going up to 550% for 3-6 seconds, then it drops with no results.
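
To isolate whether the server or the editor plugin is at fault, it can help to call the completion endpoint directly. Below is a sketch using Python's requests library; the payload shape is an assumption based on the /v1/completions path in the logs, not a documented schema:

    import requests

    # Hypothetical request body; adjust the fields to your Tabby server version.
    resp = requests.post(
        "http://localhost:5000/v1/completions",
        json={"language": "python", "prompt": "def fib(n):"},
        timeout=30,
    )
    print(resp.status_code)
    print(resp.json())

If this returns completions but the editor shows none, the problem is on the plugin side rather than the server.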

Add plugin for IntelliJ-based IDEs

There are currently plugins for Vim and VS Code; it would be great to have one for PyCharm as well.


Please reply with a 👍 if you want this feature.

Error when I run the Docker Hub container or build from scratch

2023-04-09 00:05:43,013 DEBG 'triton' stderr output:
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: the provided PTX was compiled with an unsupported toolchain. /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/cuda_utils.h:274

[e79652de0dc1:01747] *** Process received signal ***
[e79652de0dc1:01747] Signal: Aborted (6)
[e79652de0dc1:01747] Signal code:  (-6)
[e79652de0dc1:01747] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fbedac13420]
[e79652de0dc1:01747] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fbed949e00b]
[e79652de0dc1:01747] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fbed947d859]
[e79652de0dc1:01747] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7fbed9857911]
[e79652de0dc1:01747] [ 4]
2023-04-09 00:05:43,014 DEBG 'triton' stderr output:
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7fbed986338c]
[e79652de0dc1:01747] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7fbed98633f7]
[e79652de0dc1:01747] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7fbed98636a9]
[e79652de0dc1:01747] [ 7] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x3b949)[0x7fbebc686949]
[e79652de0dc1:01747] [ 8] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x20f65)[0x7fbebc66bf65]
[e79652de0dc1:01747] [ 9] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x2d794)[0x7fbebc678794]
[e79652de0dc1:01747] [10] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInitialize+0x38d)[0x7fbebc678e0d]
[e79652de0dc1:01747] [11] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10689b)[0x7fbed9d4889b]
[e79652de0dc1:01747] [12] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1c4f5d)[0x7fbed9e06f5d]
[e79652de0dc1:01747] [13] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1caccd)[0x7fbed9e0cccd]
[e79652de0dc1:01747] [14] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3083a0)[0x7fbed9f4a3a0]
[e79652de0dc1:01747] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7fbed988fde4]
[e79652de0dc1:01747] [16] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7fbedac07609]
[e79652de0dc1:01747] [17] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fbed957a133]
[e79652de0dc1:01747] *** End of error message ***

Metrics panel in admin

  • Collect tabby-server events.
  • Ingest events into DuckDB.
  • Visualize the results with DuckDB in the admin panel (see the sketch below).
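
A minimal sketch of the ingestion step, assuming a simple events table (the file path, table, and column names are illustrative):

    import duckdb

    con = duckdb.connect("/data/analytics.duckdb")  # hypothetical location
    con.execute(
        "CREATE TABLE IF NOT EXISTS events (ts TIMESTAMP, event TEXT, payload TEXT)"
    )
    con.execute(
        "INSERT INTO events VALUES (now(), ?, ?)",
        ["completion_served", '{"language": "python"}'],
    )

    # The admin panel could then visualize simple aggregates, e.g.:
    print(con.execute("SELECT event, count(*) FROM events GROUP BY event").fetchall())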

Add list of supported languages

Please describe the feature you want
It would be nice if we could have a list of supported languages in the docs.


Please reply with a 👍 if you want this feature.

code snippet in completion request

Meilisearch has now been integrated in #85, enabling us to experiment with including relevant code snippets in the prompt (see the sketch after this list).

Related flags:
* [ ] FLAGS_enable_meilisearch
* [ ] FLAGS_rewrite_prompt_with_search_snippet
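
As a rough illustration of what the prompt rewriting could look like (the index name, document fields, and endpoint are assumptions, not Tabby's actual implementation):

    import meilisearch

    client = meilisearch.Client("http://localhost:7700")  # hypothetical endpoint
    index = client.index("code_snippets")                 # hypothetical index name

    def rewrite_prompt(prompt: str, max_snippets: int = 2) -> str:
        # Prepend the most relevant indexed snippets as commented-out context.
        hits = index.search(prompt, {"limit": max_snippets})["hits"]
        context = "\n".join("# " + hit["body"] for hit in hits)  # assumes a "body" field
        return context + "\n" + prompt if context else prompt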

tabby.toml usage

Is there any direction on how to use the tabby.toml file to add additional projects, specifically if they're in a private repo?

Would love for it to be documented in the README.md.

Vulkan Backend Support for improved device compatibility

Please describe the feature you want
Support for the PyTorch Vulkan backend, so that older NVIDIA GPUs, as well as Intel, AMD, and some phone GPUs, can be supported.
https://pytorch.org/tutorials/prototype/vulkan_workflow.html

Additional context
Personally, I ran into difficulties testing this project because my laptop is too old to support NVIDIA and my cloud accounts aren't authorized to deploy GPU compute. I imagine I am not the only one limited in working on this project by these kinds of constraints.


Please reply with a 👍 if you want this feature.

Vim plugin should not invoke completion requests on cursor-moving key strokes

The Vim plugin currently invokes a completion request every time the cursor moves in INSERT mode. This can cause confusion when attempting to simply move the cursor through a word or code block without any intention of editing it.

Like other completion engines, the Vim plugin should not invoke a completion request under these circumstances.
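
A language-agnostic sketch of the desired behavior (written in Python rather than Vimscript for brevity): only schedule a request when the buffer text actually changed, and debounce it, so that pure cursor movement never triggers a completion:

    import threading

    class CompletionTrigger:
        def __init__(self, request_fn, delay=0.15):
            self.request_fn = request_fn  # sends the completion request
            self.delay = delay            # debounce window in seconds
            self._timer = None
            self._last_text = None

        def on_editor_event(self, buffer_text):
            # Cursor-only movement leaves the buffer unchanged: do nothing.
            if buffer_text == self._last_text:
                return
            self._last_text = buffer_text
            if self._timer is not None:
                self._timer.cancel()  # debounce rapid typing
            self._timer = threading.Timer(self.delay, self.request_fn, args=(buffer_text,))
            self._timer.start()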

Publish the TabbyML VSCode extension on open-vsx.org

The Visual Studio Code Marketplace is not available to OSS builds of VS Code due to licensing issues; as an alternative, open-vsx.org is heavily used. Please register the tabby extension with open-vsx.org as well.

Where can I put the model files so they won't be downloaded from huggingface.co

I don't have an internet connection on the machine where I am about to deploy this project. Where should I put the TabbyML/J-350M files so I can successfully run the Docker image? Currently it gives me this error message:

'HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /TabbyML/J-350M/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f8ac76d3430>, 'Connection to huggingface.co timed out. (connect timeout=10)'))' thrown while requesting HEAD https://huggingface.co/TabbyML/J-350M/resolve/main/tokenizer_config.json
^CTraceback (most recent call last):
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1247, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/app/tabby/tools/download_models.py", line 41, in <module>
    preload(local_files_only=args.prefer_local_files)
  File "/home/app/tabby/tools/download_models.py", line 26, in preload
    AutoTokenizer.from_pretrained(args.repo_id, local_files_only=local_files_only)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 634, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 896, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 573, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 628, in _get_config_dict
    resolved_config_file = cached_file(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/transformers/utils/hub.py", line 443, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like TabbyML/J-350M is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
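
One common workaround (generic Hugging Face tooling, not Tabby-specific) is to download the repository on a machine with internet access and copy the resulting cache directory into the host path that the README's docker command mounts at /home/app/.cache/huggingface:

    # Run this on a machine WITH internet access.
    from huggingface_hub import snapshot_download

    # Downloads into the local Hugging Face cache (~/.cache/huggingface by default);
    # copy that cache directory to the offline machine, e.g. into ./data/hf_cache.
    path = snapshot_download(repo_id="TabbyML/J-350M")
    print("Model files are in:", path)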

The script `huggingface_gptneox_convert.py` produces nonsensical results with tensor parallelism

I tried to use the script in this repo to convert a GPT-NeoX model from the Hugging Face format to the FasterTransformer format.

It worked when I converted files for single-GPU inference. However, when I converted a 2-GPU version of the FasterTransformer model files, which should work with tensor parallelism, it generated nonsensical results.

Model: https://huggingface.co/TabbyML/NeoX-70M

Convert command:

# 1-gpu
python huggingface_gptneox_convert.py \
    -i /input/huggingface/model/path -o /output/fastertransformer/model/path -i_g 1 -m_n gptneox
# 2-gpu
python huggingface_gptneox_convert.py \
    -i /input/huggingface/model/path -o /output/fastertransformer/model/path -i_g 2 -m_n gptneox

1-gpu FT model result:

[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][WARNING] Skip NCCL initialization since requested tensor/pipeline parallel sizes are equals to 1.
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][WARNING] Skip NCCL initialization since requested tensor/pipeline parallel sizes are equals to 1.
====================
latency: 0.011725187301635742
--------------------
prompt: 
--------------------
Game start, 
--------------------
output: 
--------------------


The first thing you notice is that the first thing you notice is that

2-gpu FT model result:

My script did not check the rank number before printing, so the result was printed twice.

[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][INFO] NCCL initialized rank=0 world_size=2 tensor_para=NcclParam[rank=0, world_size=2, nccl_comm=0x55b3a743e150] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x55b3a6d71000]
[FT][INFO] NCCL initialized rank=1 world_size=2 tensor_para=NcclParam[rank=1, world_size=2, nccl_comm=0x557fa70e5f20] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x557fa6aca340]
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][INFO] NCCL initialized rank=1 world_size=2 tensor_para=NcclParam[rank=1, world_size=2, nccl_comm=0x557fa70e5f20] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x557fa6aca340]
[FT][INFO] NCCL initialized rank=0 world_size=2 tensor_para=NcclParam[rank=0, world_size=2, nccl_comm=0x55b3a743e150] pipeline_para=NcclParam[rank=0, world_size=1, nccl_comm=0x55b3a6d71000]
====================
latency: 0.011738300323486328
--------------------
prompt: 
--------------------
Game start, 
--------------------
output: 
--------------------
,,,,,,,,,,,,,,,,
====================
latency: 0.011530399322509766
--------------------
prompt: 
--------------------
Game start, 
--------------------
output: 
--------------------
,,,,,,,,,,,,,,,,

TypeScript support

Please describe the feature you want
Support TypeScript in tabby

  • backend change
  • extension support

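By analogy with the Ruby request above, a hypothetical LanguagePreset for TypeScript might look like this (the stop words are a guess, not a tested configuration):

    Language.TYPESCRIPT: LanguagePreset(
        max_length=128,
        stop_words=[
            "\n\n",
            "\nfunction",
            "\n//",
            "\nimport",
            "\nexport",
            "\nclass",
            "\ninterface",
        ],
    ),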


Please reply with a 👍 if you want this feature.

CUDA runtime error: operation not supported

Hi,

I tried running the Docker version mentioned in the README, only to be greeted with the following startup error in a loop. nvidia-container-toolkit 1.12.1 is correctly installed.

2023-04-07 19:15:35,308 DEBG 'triton' stderr output:
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: operation not supported /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:160
[c8d62a9a73a2:00460] *** Process received signal ***
[c8d62a9a73a2:00460] Signal: Aborted (6)
[c8d62a9a73a2:00460] Signal code:  (-6)

2023-04-07 19:15:35,308 DEBG 'triton' stderr output:
[c8d62a9a73a2:00460] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f61d6d24420]
[c8d62a9a73a2:00460] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f61d55af00b]
[c8d62a9a73a2:00460] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f61d558e859]
[c8d62a9a73a2:00460] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f61d5968911]
[c8d62a9a73a2:00460] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f61d597438c]
[c8d62a9a73a2:00460] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f61d59743f7]
[c8d62a9a73a2:00460] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f61d59746a9]
[c8d62a9a73a2:00460] [ 7] 

2023-04-07 19:15:35,308 DEBG 'triton' stderr output:
/opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer5checkI9cudaErrorEEvT_PKcS4_i+0x219)[0x7f617b00c0f9]
[c8d62a9a73a2:00460] [ 8] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN17fastertransformer9AllocatorILNS_13AllocatorTypeE0EEC1Ei+0x123)[0x7f617b048d73]
[c8d62a9a73a2:00460] [ 9] /opt/tritonserver/backends/fastertransformer/libtransformer-shared.so(_ZN15GptJTritonModelI13__nv_bfloat16E19createModelInstanceEiiP11CUstream_stSt4pairISt6vectorIN17fastertransformer9NcclParamESaIS7_EES9_ESt10shared_ptrINS6_18AbstractCustomCommEE+0xa7)[0x7f617b108967]
[c8d62a9a73a2:00460] [10] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x19b88)[0x7f61d0632b88]
[c8d62a9a73a2:00460] [11] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1a473)[0x7f61d0633473]
[c8d62a9a73a2:00460] [12] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x3c29e)[0x7f61d065529e]
[c8d62a9a73a2:00460] [13] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f61d59a0de4]
[c8d62a9a73a2:00460] [14] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f61d6d18609]
[c8d62a9a73a2:00460] [15] 
2023-04-07 19:15:35,309 DEBG 'triton' stderr output:
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f61d568b133]
[c8d62a9a73a2:00460] *** End of error message ***

Error when using relative paths for Docker data volumes

Following the README.md, with Docker version 20.10.21, build baeda1f, on a Mac M1, the following command:

docker run \
  -it --rm \
  -v ./data:/data \
  -v ./data/hf_cache:/home/app/.cache/huggingface \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

leads to the following error:

docker: Error response from daemon: create ./data: "./data" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.

SOLUTION

docker run \
  -it --rm \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5000:5000 \
  --platform linux/amd64 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

For an explanation, see:
https://docs.docker.com/storage/bind-mounts/
https://stackoverflow.com/questions/46526165/docker-invalid-characters-for-volume-when-using-relative-paths

Install of Tabby VSCode Extension

Please describe the feature you want

Are there instructions on how to install the Tabby VSCode extension? I searched the repository and the Tabby site but couldn't find anything.

Build with compute capability 6.1

It's nice to have access to such a great open-source project. I tried running it on my Tesla P40 but failed. Could you please provide the compilation method for the model backend, so that I can try to compile it myself for compute capability 6.1?
By the way, is there a model with more parameters to try?

Error when I launch docker on Mac M1

2023-04-11 16:42:16,242 DEBG 'admin' stderr output:
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)

2023-04-11 16:42:20,620 DEBG 'admin' stdout output:

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.


2023-04-11 16:42:20,628 DEBG 'admin' stderr output:
Traceback (most recent call last):

2023-04-11 16:42:20,631 DEBG 'admin' stderr output:
  File "/home/app/.pyenv/versions/3.10.10/bin/streamlit", line 8, in <module>
    sys.exit(main())
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 1130, in __call__

2023-04-11 16:42:20,633 DEBG 'admin' stderr output:
    return self.main(*args, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 1055, in main

2023-04-11 16:42:20,637 DEBG 'admin' stderr output:
    rv = self.invoke(ctx)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 1657, in invoke

2023-04-11 16:42:20,641 DEBG 'admin' stderr output:
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 1404, in invoke

2023-04-11 16:42:20,644 DEBG 'admin' stderr output:
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/click/core.py", line 760, in invoke

2023-04-11 16:42:20,649 DEBG 'admin' stderr output:
    return __callback(*args, **kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/web/cli.py", line 209, in main_run
    _main_run(target, args, flag_options=kwargs)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/web/cli.py", line 245, in _main_run

2023-04-11 16:42:20,656 DEBG 'admin' stderr output:
    bootstrap.run(file, command_line, args, flag_options)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/web/bootstrap.py", line 397, in run
    _install_pages_watcher(main_script_path)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/web/bootstrap.py", line 373, in _install_pages_watcher
    watch_dir(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/watcher/path_watcher.py", line 153, in watch_dir
    return _watch_path(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/watcher/path_watcher.py", line 128, in _watch_path
    watcher_class(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/watcher/event_based_path_watcher.py", line 92, in __init__

2023-04-11 16:42:20,667 DEBG 'admin' stderr output:
    path_watcher.watch_path(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/streamlit/watcher/event_based_path_watcher.py", line 170, in watch_path
    folder_handler.watch = self._observer.schedule(
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/api.py", line 301, in schedule
    emitter.start()
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/utils/__init__.py", line 92, in start
    self.on_thread_start()
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/inotify.py", line 119, in on_thread_start
    self._inotify = InotifyBuffer(path, self.watch.is_recursive)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/inotify_buffer.py", line 37, in __init__
    self._inotify = Inotify(path, recursive)
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/inotify_c.py", line 167, in __init__
    Inotify._raise_error()
  File "/home/app/.pyenv/versions/3.10.10/lib/python3.10/site-packages/watchdog/observers/inotify_c.py", line 432, in _raise_error
    raise OSError(err, os.strerror(err))
OSError: [Errno 38] Function not implemented

2023-04-11 16:42:20,997 DEBG fd 6 closed, stopped monitoring <POutputDispatcher at 274928023968 for <Subprocess at 274928023872 with name admin in state RUNNING> (stdout)>
2023-04-11 16:42:20,999 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 274928472448 for <Subprocess at 274928023872 with name admin in state RUNNING> (stderr)>
2023-04-11 16:42:21,001 WARN exited: admin (exit status 1; not expected)
2023-04-11 16:42:21,003 DEBG received SIGCHLD indicating a child quit
2023-04-11 16:42:22,009 INFO spawned: 'admin' with pid 930
2023-04-11 16:42:23,018 INFO success: admin entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

The following command is used to launch Docker, after running chown -R $USER data:

docker run \
  -it --rm \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5001:5001 \
  --platform linux/amd64 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

attention_mask issue

Describe the bug
Got "attention mask and the pad token id were not set" when trying to use tabby with VSCode extension. When executing API requests or using playground everything working correct.
Console log when got error:

2023-04-12 11:44:42,190 DEBG 'server' stderr output:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

2023-04-12 11:44:45,657 DEBG 'server' stdout output:
INFO:     172.17.0.1:0 - "POST /v1/completions HTTP/1.1" 200 OK

I also got the same result when using the CPU.

Information about your GPU

Wed Apr 12 14:29:45 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 531.18       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 L...    On | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8               12W /  N/A|    179MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        22      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

Additional context
Run command:

sudo docker run \
  -it --rm \
  --gpus all \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  --name=tabby \
  tabbyml/tabby

Using WSL2 with the Ubuntu 22.04 distribution and Docker Desktop. GPU: NVIDIA GeForce RTX 3060 Laptop.

The attention mask and the pad token id were not set.

I followed the quick start, then visited http://127.0.0.1:5000/_admin/.

It shows:

triton | down server,vector,dagu | live
Congrats, your server is live!
To get started with Tabby, you can either install the extensions below or use the Editor.

But when I go to the Editor, it doesn't complete my code. I checked the log and got this:

2023-04-07 09:45:20,415 DEBG 'server' stderr output:
As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

2023-04-07 09:45:20,415 DEBG 'server' stderr output:
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

What should I do?
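
For reference, this is a standard Hugging Face transformers warning emitted by generate() and is not necessarily what blocks your completions. In plain transformers code (a sketch of the generic fix, not Tabby's internals) it is silenced by passing attention_mask and pad_token_id explicitly:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("TabbyML/J-350M")
    model = AutoModelForCausalLM.from_pretrained("TabbyML/J-350M")

    inputs = tok("def fib(n):", return_tensors="pt")
    out = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # silences the attention-mask warning
        pad_token_id=tok.eos_token_id,            # silences the pad-token warning
        max_new_tokens=32,
    )
    print(tok.decode(out[0]))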

Improve the Completion Cache

Adding an API-level cache (key: hash(completion request) → value: completion response) on the client side should improve response speed and reduce API requests and model invocations. This should be useful in the case of the user typing backspace.
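
A minimal client-side sketch of this idea (the hashing scheme and eviction policy are illustrative):

    import hashlib
    import json
    from collections import OrderedDict

    class CompletionCache:
        def __init__(self, max_entries=256):
            self._cache = OrderedDict()
            self.max_entries = max_entries

        @staticmethod
        def _key(request: dict) -> str:
            # Stable hash of the completion request.
            return hashlib.sha256(
                json.dumps(request, sort_keys=True).encode()
            ).hexdigest()

        def get(self, request: dict):
            return self._cache.get(self._key(request))

        def put(self, request: dict, response: dict) -> None:
            self._cache[self._key(request)] = response
            if len(self._cache) > self.max_entries:
                self._cache.popitem(last=False)  # evict the oldest entry

On a backspace, the editor re-issues a request identical to one already seen, so get() returns the cached response without another model invocation.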

Related to #130.

Cannot create directory /data/config

On running the Docker script

mkdir -p data/hf_cache && chown -R 1000 data

docker run \
  --gpus all \
  -it --rm \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  -e MODEL_BACKEND=triton \
  --name=tabby \
  tabbyml/tabby

I get the error

mkdir: cannot create directory '/data/config': Permission denied

Similar to issue #58, but I didn't use sudo, and I don't see anything in the script pointing to the absolute path /data/config.

I'm running NixOS with an Nvidia 1060.

using cpu mode

Hey guys,

Is there a specific reason why you removed CPU mode from the project README? If a GPU is not available, is there an alternative way to run the project?

"mkdir: cannot create directory '/data/config': Operation not permitted" when running Tabby Docker container on Mac M1

Using a Mac M1 and the following to run the Docker container:

# Create data dir and grant owner to 1000 (Tabby run as uid 1000 in container)
sudo mkdir -p data/hf_cache && chown -R 1000 data

docker run \
  -it --rm \
  -v "/$(pwd)/data:/data" \
  -v "/$(pwd)/data/hf_cache:/home/app/.cache/huggingface" \
  -p 5000:5000 \
  --platform linux/amd64 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

I get this error output:

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .

ERROR: This container was built for CPUs supporting at least the AVX instruction set, but
the CPU detected was , which does not report
support for AVX. An Illegal Instrution exception at runtime is likely to result.
See https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX .

mkdir: cannot create directory '/data/config': Operation not permitted

Is this error related to incompatibility with Mac M1, the docker run command, or something entirely different?

422 Unprocessable Entity

Describe the bug
I'm trying to use it and I'm not getting any hints. Launched via Docker. (Screenshots of VSCode and the console were attached.)

Information about your GPU
(nvidia-smi output attached as a screenshot)

Additional context
Manjaro

Failed to start DAG: exit status 1/runner: entry failed analytic: exit status 1

Describe the bug
The tabby server is running, but the client doesn't produce any generated output.

Information about your GPU
(attached as a screenshot)

Additional context
2023-05-09 12:27:00,172 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:27:00 Failed to start DAG: exit status 1

2023-05-09 12:27:00,172 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:27:00 runner: entry failed analytic: exit status 1

2023-05-09 12:30:00,059 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 [2023-05-09 12:30:00] start analytic

2023-05-09 12:30:00,063 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 server is running at "/tmp/@dagu-analytic-af09eed2a725067a7f323f1fc0f93c29.sock"

2023-05-09 12:30:00,063 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 start running: Collect Tabby

2023-05-09 12:30:00,137 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 Collect Tabby failed

2023-05-09 12:30:00,164 DEBG 'dagu_scheduler' stdout output:
2023/05/09 12:30:00 schedule finished.

2023-05-09 12:30:00,168 DEBG 'dagu_scheduler' stdout output:
(summary table attached as a screenshot)
