Comments (7)
Thank you @Sing-Li for checking again. This issue #2121 (comment) also reports a similar error. We will look into it.
from mlc-llm.
Thank you @Sing-Li for reporting! That is because the mlc-chat-config.json in the prebuilt weight repo was not updated. I just updated the conv_template field https://huggingface.co/mlc-ai/gorilla-openfunctions-v1-q4f16_1-MLC/commit/e83c4a2bbb4735c1ccde096dae0df635dd172310 and I think it should be good now. Would you mind trying again?
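For context, `conv_template` is a top-level field of `mlc-chat-config.json`, so a fix like this amounts to pointing that field at the right conversation template. A sketch of the relevant fragment (the field values here are illustrative, not the exact contents of the commit):

```json
{
  "conv_template": "gorilla",
  "temperature": 0.7,
  "top_p": 0.95
}
```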
Thank you @MasterJH5574! It works fine now. Closing the issue.
Sorry @MasterJH5574, is it possible to update the configs for the other two gorilla function weights as well? 🙏
https://huggingface.co/mlc-ai/gorilla-openfunctions-v2-q4f32_1-MLC
https://huggingface.co/mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC
Hey @Sing-Li, sorry for the late reply. I just updated these two repositories. If I remember correctly, there might still be some output formatting issues with function calling for gorilla v2. Could you give it a try at your convenience and see how it goes?
Thanks @MasterJH5574
Test results:

gorilla-openfunctions-v2-q4f32_1
- chat - seems to work
- serve - I only have 12GB VRAM and `serve` ran out of memory

gorilla-openfunctions-v2-q4f16_1
- chat - crashes with the following dump:
```
[2024-04-16 04:10:14] INFO estimate_memory_usage.py:57: [Memory usage] Function `sampler_take_probs`: 0.00 MB
[2024-04-16 04:10:14] INFO estimate_memory_usage.py:57: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-04-16 04:10:14] INFO pipeline.py:50: Compiling external modules
[2024-04-16 04:10:14] INFO pipeline.py:50: Compilation complete! Exporting to disk
[2024-04-16 04:10:31] INFO model_metadata.py:96: Total memory usage: 4169.98 MB (Parameters: 3707.35 MB. KVCache: 0.00 MB. Temporary buffer: 462.62 MB)
[2024-04-16 04:10:31] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
[2024-04-16 04:10:31] INFO compile.py:198: Generated: /tmp/tmphmrwlwhl/lib.so
[2024-04-16 04:10:31] INFO jit.py:98: Using compiled model lib: /root/.cache/mlc_llm/model_lib/5c413127c1217b4fc4779c7be427b220.so
[2024-04-16 04:10:32] INFO model_metadata.py:96: Total memory usage: 4169.98 MB (Parameters: 3707.35 MB. KVCache: 0.00 MB. Temporary buffer: 462.62 MB)
[2024-04-16 04:10:32] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /set [overrides]    override settings in the generation config. For example,
                      `/set temperature=0.5;max_gen_len=100;stop=end,stop`
                      Note: Separate stop words in the `stop` option with commas (,).
  Multi-line input: Use escape+enter to start a new line.
Traceback (most recent call last):
  File "/usr/local/bin/mlc_llm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 37, in main
    cli.main(sys.argv[2:])
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/chat.py", line 41, in main
    chat(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/chat.py", line 135, in chat
    cm._process_system_prompts()  # pylint: disable=protected-access
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/chat_module.py", line 1228, in _process_system_prompts
    self._process_system_prompts_func()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: TVMError: Unsupported layout: 0
```
Running `serve` also crashes with the same error when a REST completion request comes in:
```
[2024-04-16 04:11:59] INFO auto_device.py:76: Found device: cuda:0
[2024-04-16 04:12:00] INFO auto_device.py:85: Not found device: rocm:0
[2024-04-16 04:12:01] INFO auto_device.py:85: Not found device: metal:0
[2024-04-16 04:12:02] INFO auto_device.py:85: Not found device: vulkan:0
[2024-04-16 04:12:03] INFO auto_device.py:85: Not found device: opencl:0
[2024-04-16 04:12:03] INFO auto_device.py:33: Using device: cuda:0
[2024-04-16 04:12:03] INFO chat_module.py:362: Downloading model from HuggingFace: HF://mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC
[2024-04-16 04:12:03] INFO download.py:131: Weights already downloaded: /root/.cache/mlc_llm/model_weights/mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC
[2024-04-16 04:12:03] INFO jit.py:35: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-04-16 04:12:03] INFO jit.py:117: Using cached model lib: /root/.cache/mlc_llm/model_lib/5c413127c1217b4fc4779c7be427b220.so
[2024-04-16 04:12:05] INFO engine_base.py:241: Estimated KVCacheConfig "max_total_sequence_length": 13445.
[2024-04-16 04:12:05] INFO engine_base.py:246: Estimated total single GPU memory usage: 10839.99 MB (Parameters: 3707.35 MB. KVCache: 6479.40 MB. Temporary buffer: 653.24 MB)
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Exception in thread Thread-1 (_background_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/serve/engine_base.py", line 602, in _background_loop
    self._ffi["run_background_loop"]()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: TVMError: Unsupported layout: 0
```
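As a side note on the 12GB VRAM constraint mentioned above, the serve log makes the memory pressure easy to quantify. A small back-of-the-envelope script (all figures are taken directly from the `engine_base.py` log lines; the 12 GB value is my card's VRAM limit):

```python
# Memory figures reported by engine_base.py for gorilla-openfunctions-v2-q4f16_1
params_mb = 3707.35    # model parameters
kv_cache_mb = 6479.40  # KV cache sized for max_total_sequence_length = 13445
temp_buf_mb = 653.24   # temporary buffers

total_mb = params_mb + kv_cache_mb + temp_buf_mb
print(f"total: {total_mb:.2f} MB")  # matches the logged 10839.99 MB

# Approximate per-token KV-cache cost, useful for sizing on smaller GPUs
per_token_mb = kv_cache_mb / 13445
print(f"KV cache per token: {per_token_mb:.4f} MB")

# Headroom on a 12 GB card (taking 1 GB = 1024 MB)
vram_mb = 12 * 1024
print(f"headroom: {vram_mb - total_mb:.2f} MB")
```

This is why the q4f16_1 weights just barely fit in 12 GB while the larger q4f32_1 weights do not; the log's own suggestion of lowering `context_window_size` or `prefill_chunk_size` shrinks the KV-cache term.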
Hi @Sing-Li @ollmer, we have fixed this issue in the latest pip package. Please update the packages and try again, thank you!
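After updating, the serve endpoint can be retested with a plain OpenAI-style request. A minimal sketch (the model id and prompt are placeholders; the script only builds and prints the request so it can be inspected, and the actual network call is left commented out for when a server is listening on port 8000):

```python
import json


def build_chat_request(model: str, prompt: str) -> tuple[str, dict]:
    """Build an OpenAI-compatible chat completion request for a local
    `mlc_llm serve` instance (assumed to be listening on port 8000)."""
    url = "http://0.0.0.0:8000/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return url, payload


url, payload = build_chat_request(
    "HF://mlc-ai/gorilla-openfunctions-v2-q4f16_1-MLC",
    "What is the weather like in Boston?",
)
print(url)
print(json.dumps(payload, indent=2))
# To actually send it (requires a running server and the `requests` package):
# import requests; print(requests.post(url, json=payload, timeout=60).json())
```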