Comments (7)
Thank you @Sing-Li for checking again. This issue #2121 (comment) also reports a similar error. We will look into it.
from mlc-llm.
Thank you @Sing-Li for reporting! That is because the mlc-chat-config.json in the prebuilt weight repo was not updated. I just updated the conv_template field https://huggingface.co/mlc-ai/gorilla-openfunctions-v1-q4f16_1-MLC/commit/e83c4a2bbb4735c1ccde096dae0df635dd172310 and I think it should be good now. Would you mind trying again?
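For context, `conv_template` is a top-level field of `mlc-chat-config.json`, so a fix like this amounts to pointing that field at the right conversation template. A sketch of the relevant fragment (the field values here are illustrative, not the exact contents of the commit):

```json
{
  "conv_template": "gorilla",
  "temperature": 0.7,
  "top_p": 0.95
}
```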
Thank you @MasterJH5574! It works fine now. Closing the issue.
Sorry @MasterJH5574, is it possible to update the configs for the other two gorilla function weights as well? 🙏
https://huggingface.co/mlc-ai/gorilla-openfunctions-v2-q4f32_1-MLC
https://huggingface.co/mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC
Hey @Sing-Li, sorry for the late reply. I just updated these two repositories. If I remember correctly, there might still be some output formatting issues with function calling for gorilla v2. Could you give it a try at your convenience and see how it goes?
Thanks @MasterJH5574
Test results:

gorilla-openfunctions-v2-q4f32_1
- chat - seems to work
- serve - I only have 12GB VRAM and `serve` ran out of memory

gorilla-openfunctions-v2-q4f16_1
- chat - crashes with the following dump:
```
[2024-04-16 04:10:14] INFO estimate_memory_usage.py:57: [Memory usage] Function `sampler_take_probs`: 0.00 MB
[2024-04-16 04:10:14] INFO estimate_memory_usage.py:57: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-04-16 04:10:14] INFO pipeline.py:50: Compiling external modules
[2024-04-16 04:10:14] INFO pipeline.py:50: Compilation complete! Exporting to disk
[2024-04-16 04:10:31] INFO model_metadata.py:96: Total memory usage: 4169.98 MB (Parameters: 3707.35 MB. KVCache: 0.00 MB. Temporary buffer: 462.62 MB)
[2024-04-16 04:10:31] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
[2024-04-16 04:10:31] INFO compile.py:198: Generated: /tmp/tmphmrwlwhl/lib.so
[2024-04-16 04:10:31] INFO jit.py:98: Using compiled model lib: /root/.cache/mlc_llm/model_lib/5c413127c1217b4fc4779c7be427b220.so
[2024-04-16 04:10:32] INFO model_metadata.py:96: Total memory usage: 4169.98 MB (Parameters: 3707.35 MB. KVCache: 0.00 MB. Temporary buffer: 462.62 MB)
[2024-04-16 04:10:32] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /set [overrides]    override settings in the generation config. For example,
                      `/set temperature=0.5;max_gen_len=100;stop=end,stop`
                      Note: Separate stop words in the `stop` option with commas (,).
  Multi-line input: Use escape+enter to start a new line.
Traceback (most recent call last):
  File "/usr/local/bin/mlc_llm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 37, in main
    cli.main(sys.argv[2:])
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/chat.py", line 41, in main
    chat(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/chat.py", line 135, in chat
    cm._process_system_prompts()  # pylint: disable=protected-access
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/chat_module.py", line 1228, in _process_system_prompts
    self._process_system_prompts_func()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: TVMError: Unsupported layout: 0
```
Running `serve` also crashes with the same error when a REST completion request comes in:
```
[2024-04-16 04:11:59] INFO auto_device.py:76: Found device: cuda:0
[2024-04-16 04:12:00] INFO auto_device.py:85: Not found device: rocm:0
[2024-04-16 04:12:01] INFO auto_device.py:85: Not found device: metal:0
[2024-04-16 04:12:02] INFO auto_device.py:85: Not found device: vulkan:0
[2024-04-16 04:12:03] INFO auto_device.py:85: Not found device: opencl:0
[2024-04-16 04:12:03] INFO auto_device.py:33: Using device: cuda:0
[2024-04-16 04:12:03] INFO chat_module.py:362: Downloading model from HuggingFace: HF://mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC
[2024-04-16 04:12:03] INFO download.py:131: Weights already downloaded: /root/.cache/mlc_llm/model_weights/mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC
[2024-04-16 04:12:03] INFO jit.py:35: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-04-16 04:12:03] INFO jit.py:117: Using cached model lib: /root/.cache/mlc_llm/model_lib/5c413127c1217b4fc4779c7be427b220.so
[2024-04-16 04:12:05] INFO engine_base.py:241: Estimated KVCacheConfig "max_total_sequence_length": 13445.
[2024-04-16 04:12:05] INFO engine_base.py:246: Estimated total single GPU memory usage: 10839.99 MB (Parameters: 3707.35 MB. KVCache: 6479.40 MB. Temporary buffer: 653.24 MB)
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Exception in thread Thread-1 (_background_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 602, in _background_loop
    self._ffi["run_background_loop"]()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: TVMError: Unsupported layout: 0
```
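As a side note on the 12GB VRAM constraint mentioned above, the serve log makes the memory pressure easy to quantify. A small back-of-the-envelope script (all figures are taken directly from the `engine_base.py` log lines; the 12 GB value is my card's VRAM limit):

```python
# Memory figures reported by engine_base.py for gorilla-openfunctions-v2-q4f16_1
params_mb = 3707.35    # model parameters
kv_cache_mb = 6479.40  # KV cache sized for max_total_sequence_length = 13445
temp_buf_mb = 653.24   # temporary buffers

total_mb = params_mb + kv_cache_mb + temp_buf_mb
print(f"total: {total_mb:.2f} MB")  # matches the logged 10839.99 MB

# Approximate per-token KV-cache cost, useful for sizing on smaller GPUs
per_token_mb = kv_cache_mb / 13445
print(f"KV cache per token: {per_token_mb:.4f} MB")

# Headroom on a 12 GB card (taking 1 GB = 1024 MB)
vram_mb = 12 * 1024
print(f"headroom: {vram_mb - total_mb:.2f} MB")
```

This is why the q4f16_1 weights just barely fit in 12 GB while the larger q4f32_1 weights do not; the log's own suggestion of lowering `context_window_size` or `prefill_chunk_size` shrinks the KV-cache term.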
Hi @Sing-Li @ollmer, we have fixed this issue in the latest pip package. Please update the packages and try again, thank you!
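After updating, the serve endpoint can be retested with a plain OpenAI-style request. A minimal sketch (the model id and prompt are placeholders; the script only builds and prints the request so it can be inspected, and the actual network call is left commented out for when a server is listening on port 8000):

```python
import json


def build_chat_request(model: str, prompt: str) -> tuple[str, dict]:
    """Build an OpenAI-compatible chat completion request for a local
    `mlc_llm serve` instance (assumed to be listening on port 8000)."""
    url = "http://0.0.0.0:8000/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return url, payload


url, payload = build_chat_request(
    "HF://mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC",
    "What is the weather like in Boston?",
)
print(url)
print(json.dumps(payload, indent=2))
# To actually send it (requires a running server and the `requests` package):
# import requests; print(requests.post(url, json=payload, timeout=60).json())
```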