
binary-mlc-llm-libs's People

Contributors

acalatrava, charliefruan, david-sharma, davidpissarra, hzfengsy, jinhongyii, junrushao, kartik14, masterjh5574, rickzx, sangelone, sing-li, spectrometerhbh, tqchen, yzh119

binary-mlc-llm-libs's Issues

Llama2 70b is not working

Hi,
I just downloaded the Colab notebook you provide at this link: https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb.
It works properly if I use the 7B model; however, if I change the settings to use the 70B model, I receive the following error:

InternalError Traceback (most recent call last)
in <cell line: 4>()
2 from mlc_chat.callback import StreamToStdout
3
----> 4 cm = ChatModule(
5 model="dist/Llama-2-70b-chat-hf-q4f16_1-MLC",
6 model_lib_path="dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so"

5 frames
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.PackedFuncBase.call()

tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall()

tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall3()

tvm/_ffi/_cython/./base.pxi in tvm._ffi._cy3.core.CHECK_CALL()

/workspace/mlc-llm/cpp/llm_chat.cc in LoadParams()

InternalError: Traceback (most recent call last):
7: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtrtvm::runtime::Object const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /workspace/mlc-llm/cpp/llm_chat.cc:1633
6: mlc::llm::LLMChat::Reload(tvm::runtime::TVMArgValue, tvm::runtime::String, tvm::runtime::String)
at /workspace/mlc-llm/cpp/llm_chat.cc:631
5: LoadParams
at /workspace/mlc-llm/cpp/llm_chat.cc:219
4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>::AssignTypedLambda<void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>(void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int), std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
3: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)
2: tvm::runtime::relax_vm::NDArrayCacheMetadata::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
1: tvm::runtime::LoadBinaryFromFile(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >*)
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/runtime/file_utils.cc", line 121
InternalError: Check failed: (!fs.fail()) is false: Cannot open dist/Llama-2-70b-chat-hf-q4f16_1-MLC/ndarray-cache.json
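The check failure shows that ndarray-cache.json does not exist, which suggests the 70B weights were never downloaded into that directory. A minimal sketch, using the path from the error message above, to verify the weights are on disk before constructing ChatModule:

    import os

    # Path copied from the error message above; adjust to your setup.
    model_dir = "dist/Llama-2-70b-chat-hf-q4f16_1-MLC"
    cache_json = os.path.join(model_dir, "ndarray-cache.json")

    if not os.path.isfile(cache_json):
        # The 70B weights are missing: they need to be downloaded separately
        # (e.g. a git-lfs clone of the corresponding mlc-ai Hugging Face repo)
        # before ChatModule can load them.
        raise FileNotFoundError(
            f"{cache_json} not found -- download the 70B weights into {model_dir} first"
        )

    print("Found", cache_json, "-", len(os.listdir(model_dir)), "files in the model directory")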

Is it working? Just closing on S23.

Hi!
Is it working? I downloaded Llama-2-7b, clicked the Chat button, and it showed some messages, then "Ready to chat".
Then the application closed, probably crashed. That's it.
Samsung S23

WizardCoder-15B-V1.0-q4f16_1 failing to load on WebLLM

Following the available examples in the WebLLM repo such as the next-simple-chat:

I have added the model URL and ID,

{ model_url: "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1/resolve/main/", local_id: "WizardCoder-15B-V1.0-q4f32_1", }

then added the libmap

"WizardCoder-15B-V1.0-q4f32_1": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/WizardCoder-15B-V1.0-q4f16_1-webgpu.wasm",

but I end up getting this error in the browser immediately after loading the model:

Init error, Error: Unknown conv template wizard_coder_or_math
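The failure means the WebLLM runtime in use does not recognize the conversation template name recorded in the model's config. A hedged way to check which template the published weights declare, assuming the repo ships an mlc-chat-config.json with a conv_template field (the usual MLC layout):

    import json
    import urllib.request

    # URL assembled from the model_url quoted above; the config file name
    # follows the usual MLC layout and is an assumption here.
    config_url = (
        "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1"
        "/resolve/main/mlc-chat-config.json"
    )

    with urllib.request.urlopen(config_url) as resp:
        config = json.load(resp)

    # If this prints "wizard_coder_or_math", the WebLLM build in use predates
    # that template and needs to be upgraded -- and the model lib and weights
    # should come from matching releases (note the q4f32_1 / q4f16_1 mix above).
    print("conv_template:", config.get("conv_template"))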

Gemma isn't working

It doesn't reply, and after a few retries it throws:
MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.decode(ChatModule.java:74)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$2.invoke(AppViewModel.kt:669)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$2.invoke(AppViewModel.kt:668)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.requestGenerate$lambda$4(AppViewModel.kt:668)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$lluIrcsPALEW5nCb2tohZYadhTY(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:6)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)

Error message:
InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90

Missing library files for Llama-2-70b-chat-hf-q4f16_1 model

I downloaded the 70b model and encountered an error when running it. The command I used was:

mlc_chat_cli --local-id Llama-2-70b-chat-hf-q4f16_1

I received the following error message:

WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use MLC config: "/home/lxr/software/dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1/mlc-chat-config.json"
Use model weights: "/home/lxr/software/dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1/ndarray-cache.json"
Cannot find library "Llama-2-70b-chat-hf-q4f16_1-vulkan.so" in "dist/prebuilt/lib" or other search paths.

However, in the same directory, I have successfully deployed the 13b model. Could you please provide further guidance on how to resolve this issue?
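The CLI's search paths simply do not contain a Vulkan library for the 70B model. If a suitable library is compiled or obtained separately, a hedged workaround is to point the Python ChatModule at it explicitly instead of relying on --local-id lookup; both paths below are placeholders:

    from mlc_chat import ChatModule

    # Weights directory and a model library you have built or downloaded yourself.
    cm = ChatModule(
        model="dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1",
        model_lib_path="dist/prebuilt/lib/Llama-2-70b-chat-hf-q4f16_1-vulkan.so",
    )
    print(cm.generate("Hello"))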

Error When Implementing Mali GPU Acceleration on OrangePi5 with mlc-llm

Following the tutorial, I set up mlc-llm on my OrangePi5 with Mali GPU acceleration via OpenCL. Everything was smooth until I encountered an error. I've re-downloaded the Mali libraries (versions below) multiple times, but the error persists. Could the libraries be corrupted?

Library versions in use:

  • RedPajama-INCITE-Chat-3B-v1-q4f16_1
  • RedPajama-INCITE-Chat-3B-v1-q4f16_1-mali.so

Any advice on resolving this would be appreciated.

arm_release_ver: g13p0-01eac0, rk_so_ver: 3
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
Traceback (most recent call last):
  File "/home/yusepp/Desktop/test.py", line 6, in <module>
    cm = ChatModule(model=models+"/RedPajama-INCITE-Chat-3B-v1-q4f16_1",
  File "/home/yusepp/mlc-llm/python/mlc_chat/chat_module.py", line 842, in __init__
    self._reload(self.model_lib_path, self.model_path, user_chat_config_json_str)
  File "/home/yusepp/mlc-llm/python/mlc_chat/chat_module.py", line 1056, in _reload
    self._reload_func(lib, model_path, app_config_json)
  File "/home/yusepp/tvm_unity/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/home/yusepp/tvm_unity/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc", line 255, in tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
ValueError: Traceback (most recent call last):
  3: 0x0000ffff63d3ae9b
  2: 0x0000ffff63d3ac23
  1: 0x0000ffff63d392bf
  0: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
        at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:255
  4: 0x0000ffff63d3ae9b
  3: 0x0000ffff63d3ac23
  2: 0x0000ffff63d392bf
  1: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
        at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:253
  0: tvm::runtime::relax_vm::NDArrayCacheMetadata::FileRecord::Load(DLDevice, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tvm::runtime::Optional<tvm::runtime::NDArray>*) const
        at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:193
  File "/home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
ValueError: Error when loading parameters from params_shard_0.bin: [20:19:57] /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (64552960 vs. 133) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
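The check compares each shard's size on disk with the size recorded in ndarray-cache.json, and the 133 bytes found here look like a git-lfs pointer file rather than real weight data, so the shards were most likely fetched without git-lfs or the download was cut off. A hedged verification sketch, assuming the usual ndarray-cache.json layout with a top-level records list whose entries carry dataPath and nbytes:

    import json
    import os

    # Placeholder: directory containing ndarray-cache.json and params_shard_*.bin.
    model_dir = "dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1"

    with open(os.path.join(model_dir, "ndarray-cache.json")) as f:
        cache = json.load(f)

    # Compare every shard's expected size with what is actually on disk.
    for record in cache.get("records", []):
        path = os.path.join(model_dir, record["dataPath"])
        actual = os.path.getsize(path) if os.path.exists(path) else 0
        if actual != record["nbytes"]:
            print(f"{record['dataPath']}: expected {record['nbytes']} bytes, "
                  f"found {actual} -- re-download this shard (e.g. `git lfs pull`)")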

Android app crash after last models updated

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: ValueError: Error when loading parameters from params_shard_66.bin: [22:39:36] /Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (45088768 vs. 21734731) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:633)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:631)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:534)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:631)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)

Error message:
ValueError: Error when loading parameters from params_shard_66.bin: [22:39:36] /Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (45088768 vs. 21734731) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255

Intel MAC shared library files

Which files here should users on Intel Mac machines be using? It looks like the metal .so files are all built for the arm64 architecture. What about x86?
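A quick, hedged way to confirm what a downloaded Metal library was built for is to ask macOS's lipo tool; the snippet below shells out to it from Python, and the library path is a placeholder:

    import subprocess

    # Placeholder path to one of the prebuilt Metal libraries from this repo.
    lib_path = "dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-metal.so"

    # `lipo -info` reports the architectures in a Mach-O binary; the current
    # prebuilts are expected to report "arm64", while an Intel Mac needs "x86_64".
    result = subprocess.run(["lipo", "-info", lib_path], capture_output=True, text=True)
    print(result.stdout.strip() or result.stderr.strip())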

Resource consumption degradation

Hi. I try this app from time to time to check on the progress of mobile LLMs. In one of the previous versions of MLCChat (a6b0a4c from 19.09.2023), my device with 8 GB of RAM managed to run a 7B model, but now none of the 7B models present in the app work, even if I free up the RAM completely. The error is CL_OUT_OF_RESOURCES in opencl_device_api.cc:246. Llama prints the error message in the chat; Mistral successfully loads the model, but after starting generation it crashes the app with the same error.
Snapdragon 860

💸 This repository is over its data quota.

Time to pull the credit card!

(env) louisbeaumont@louis030195com-third-brain:~/Documents/assistants$ git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt_libs
Cloning into 'dist/prebuilt_libs'...
remote: Enumerating objects: 689, done.
remote: Counting objects: 100% (220/220), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 689 (delta 170), reused 197 (delta 155), pack-reused 469
Receiving objects: 100% (689/689), 184.05 MiB | 1.01 MiB/s, done.
Resolving deltas: 100% (495/495), done.
Updating files: 100% (197/197), done.
Downloading mlc-chat.apk (124 MB)
Error downloading object: mlc-chat.apk (b7b937c): Smudge error: Error downloading mlc-chat.apk (b7b937c7be3b7e5f8164f0f1ef58c9e1df15fd0f08721fbf7fe16d058ef09c6e): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to '/Users/louisbeaumont/Documents/assistants/dist/prebuilt_libs/.git/lfs/logs/20240213T142150.445523.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: mlc-chat.apk: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
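Since the quota is consumed by LFS payloads (notably the bundled mlc-chat.apk), one hedged workaround is to clone without smudging LFS content and then pull only the files that are actually needed; the sketch below uses git-lfs's standard GIT_LFS_SKIP_SMUDGE switch, and the include pattern is only an example:

    import os
    import subprocess

    # Clone with LFS objects left as pointer files so no LFS bandwidth is used.
    env = dict(os.environ, GIT_LFS_SKIP_SMUDGE="1")
    subprocess.run(
        ["git", "clone", "https://github.com/mlc-ai/binary-mlc-llm-libs.git",
         "dist/prebuilt_libs"],
        check=True, env=env,
    )

    # Then fetch only the libraries you need, skipping the APK.
    subprocess.run(
        ["git", "-C", "dist/prebuilt_libs", "lfs", "pull",
         "--include", "Llama-2-7b-chat-hf-q4f16_1-*"],
        check=True,
    )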

[Bug] [Stack trace] RedPajama doesn't work

The latest APK (from 5 days ago) crashes while using RedPajama, but Llama 2-based models seem to work (I tried the uncensored one). RedPajama gives this error:

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: ValueError: Check failed: shard_rec.nbytes == raw_data.length() (29583360 vs. 23663914) : Parameters are not loaded properly. Please check your parameter shards and git lfs installation
Stack trace:
  File "/Users/houbohan/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 219

	at org.apache.tvm.Base.checkCall(Base.java:173)
	at org.apache.tvm.Function.invoke(Function.java:130)
	at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
	at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:636)
	at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:634)
	at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:537)
	at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:634)
	at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
	at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:463)
	at java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:637)
	at java.lang.Thread.run(Thread.java:1012)


Error message:
ValueError: Check failed: shard_rec.nbytes == raw_data.length() (29583360 vs. 23663914) : Parameters are not loaded properly. Please check your parameter shards and git lfs installation
Stack trace:
  File "/Users/houbohan/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 219

I had RedPajama working on older versions of the apk.
