
Comments (7)

MasterJH5574 commented on June 12, 2024

Thank you @Sing-Li for checking again. This issue #2121 (comment) also reports a similar error. We will look into it.


MasterJH5574 commented on June 12, 2024

Thank you @Sing-Li for reporting! That is because the mlc-chat-config.json in the prebuilt weight repo was not updated.

I just updated the conv_template field https://huggingface.co/mlc-ai/gorilla-openfunctions-v1-q4f16_1-MLC/commit/e83c4a2bbb4735c1ccde096dae0df635dd172310 and I think it should be good now. Would you mind trying again?
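
As a quick sanity check after the update, you can inspect the conv_template field of the mlc-chat-config.json that ships with the downloaded weights. A minimal sketch, assuming the default cache path that appears in the logs later in this thread (re-download the weights first if they were cached before the commit above):

import json
from pathlib import Path

# Default location where mlc_llm caches prebuilt weights downloaded from
# HuggingFace (the same path shows up in the download.py log lines below).
config_path = (
    Path.home()
    / ".cache/mlc_llm/model_weights/mlc-ai"
    / "gorilla-openfunctions-v1-q4f16_1-MLC/mlc-chat-config.json"
)

with config_path.open(encoding="utf-8") as f:
    config = json.load(f)

# The linked commit changes this field; if it still shows the old template,
# delete the cached copy and let mlc_llm re-download the weights.
print("conv_template:", config["conv_template"])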


Sing-Li commented on June 12, 2024

Thank you @MasterJH5574. It works fine now. Closing the issue.


Sing-Li commented on June 12, 2024

Sorry, @MasterJH5574, is it possible to update the configs for the other two Gorilla OpenFunctions weights as well? 🙏

https://huggingface.co/mlc-ai/gorilla-openfunctions-v2-q4f32_1-MLC

https://huggingface.co/mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC


MasterJH5574 commented on June 12, 2024

Hey @Sing-Li, sorry for the late reply. I just updated these two repositories. If I remember correctly, there may still be some output formatting issues with function calling for Gorilla v2. Could you give it a try at your convenience and see how it goes?
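
If it helps, one way to smoke-test the refreshed weights from Python is through the ChatModule wrapper that the chat CLI uses. A rough sketch (ChatModule and the HF:// model string appear in the chat_module.py log lines further down this thread; the device string and prompt are only placeholders):

from mlc_llm import ChatModule

# Rough sketch: load the updated prebuilt weights straight from HuggingFace.
# Change the device to whatever your machine has.
cm = ChatModule(
    model="HF://mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC",
    device="cuda",
)

# Ask for something function-call-shaped and inspect the output formatting
# mentioned above.
print(cm.generate("What's the weather like in Boston in Fahrenheit?"))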


Sing-Li commented on June 12, 2024

Thanks @MasterJH5574

Test results:
gorilla-openfunctions-v2-q4f32_1

  • chat - seems to work
  • serve - I only have 12 GB of VRAM and serve ran out of memory

gorilla-openfunctions-v2-q4f16_1

  • chat - crashes with the following dump
[2024-04-16 04:10:14] INFO estimate_memory_usage.py:57: [Memory usage] Function `sampler_take_probs`: 0.00 MB
[2024-04-16 04:10:14] INFO estimate_memory_usage.py:57: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-04-16 04:10:14] INFO pipeline.py:50: Compiling external modules
[2024-04-16 04:10:14] INFO pipeline.py:50: Compilation complete! Exporting to disk
[2024-04-16 04:10:31] INFO model_metadata.py:96: Total memory usage: 4169.98 MB (Parameters: 3707.35 MB. KVCache: 0.00 MB. Temporary buffer: 462.62 MB)
[2024-04-16 04:10:31] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
[2024-04-16 04:10:31] INFO compile.py:198: Generated: /tmp/tmphmrwlwhl/lib.so
[2024-04-16 04:10:31] INFO jit.py:98: Using compiled model lib: /root/.cache/mlc_llm/model_lib/5c413127c1217b4fc4779c7be427b220.so
[2024-04-16 04:10:32] INFO model_metadata.py:96: Total memory usage: 4169.98 MB (Parameters: 3707.35 MB. KVCache: 0.00 MB. Temporary buffer: 462.62 MB)
[2024-04-16 04:10:32] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
You can use the following special commands:
 /help               print the special commands
 /exit               quit the cli
 /stats              print out the latest stats (token/sec)
 /reset              restart a fresh chat
 /set [overrides]    override settings in the generation config. For example,
                     `/set temperature=0.5;max_gen_len=100;stop=end,stop`
                     Note: Separate stop words in the `stop` option with commas (,).
 Multi-line input: Use escape+enter to start a new line.

Traceback (most recent call last):
  File "/usr/local/bin/mlc_llm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 37, in main
    cli.main(sys.argv[2:])
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/chat.py", line 41, in main
    chat(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/chat.py", line 135, in chat
    cm._process_system_prompts()  # pylint: disable=protected-access
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/chat_module.py", line 1228, in _process_system_prompts
    self._process_system_prompts_func()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: TVMError: Unsupported layout: 0

Running serve also crashes with the same error when a REST completion request comes in (a request sketch follows the log below):

[2024-04-16 04:11:59] INFO auto_device.py:76: Found device: cuda:0
[2024-04-16 04:12:00] INFO auto_device.py:85: Not found device: rocm:0
[2024-04-16 04:12:01] INFO auto_device.py:85: Not found device: metal:0
[2024-04-16 04:12:02] INFO auto_device.py:85: Not found device: vulkan:0
[2024-04-16 04:12:03] INFO auto_device.py:85: Not found device: opencl:0
[2024-04-16 04:12:03] INFO auto_device.py:33: Using device: cuda:0
[2024-04-16 04:12:03] INFO chat_module.py:362: Downloading model from HuggingFace: HF://mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC
[2024-04-16 04:12:03] INFO download.py:131: Weights already downloaded: /root/.cache/mlc_llm/model_weights/mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC
[2024-04-16 04:12:03] INFO jit.py:35: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-04-16 04:12:03] INFO jit.py:117: Using cached model lib: /root/.cache/mlc_llm/model_lib/5c413127c1217b4fc4779c7be427b220.so
[2024-04-16 04:12:05] INFO engine_base.py:241: Estimated KVCacheConfig "max_total_sequence_length": 13445.
[2024-04-16 04:12:05] INFO engine_base.py:246: Estimated total single GPU memory usage: 10839.99 MB (Parameters: 3707.35 MB. KVCache: 6479.40 MB. Temporary buffer: 653.24 MB)
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Exception in thread Thread-1 (_background_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 602, in _background_loop
    self._ffi["run_background_loop"]()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: TVMError: Unsupported layout: 0
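
For reference, the request that triggers the crash above is an ordinary OpenAI-style chat completion call against the server on port 8000 from the Uvicorn line. A minimal sketch using the requests library (the /v1/chat/completions route is the OpenAI-compatible endpoint exposed by mlc_llm serve; the payload contents here are only illustrative):

import requests

# Minimal reproduction sketch: POST a chat completion to the local server
# started by `mlc_llm serve` (address taken from the Uvicorn log line above).
payload = {
    "model": "HF://mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC",
    "messages": [
        {"role": "user", "content": "What's the weather like in Boston?"},
    ],
}

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json=payload,
    timeout=60,
)
print(resp.status_code)
print(resp.json())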


MasterJH5574 commented on June 12, 2024

Hi @Sing-Li @ollmer, we have fixed this issue in the latest pip package. Please update the packages and try again, thank you!

