
binary-mlc-llm-libs's People

Contributors

acalatrava, charliefruan, david-sharma, davidpissarra, hzfengsy, jinhongyii, junrushao, kartik14, masterjh5574, rickzx, sangelone, sing-li, spectrometerhbh, tqchen, yzh119

binary-mlc-llm-libs's Issues

Llama2 70b is not working

Hi,
I just downloaded the Colab notebook you provide at this link: https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb.
It works properly if I use the 7B model; however, if I change the settings to use the 70B model, I receive the following error:

InternalError Traceback (most recent call last)
in <cell line: 4>()
2 from mlc_chat.callback import StreamToStdout
3
----> 4 cm = ChatModule(
5 model="dist/Llama-2-70b-chat-hf-q4f16_1-MLC",
6 model_lib_path="dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so"

5 frames
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.PackedFuncBase.call()

tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall()

tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall3()

tvm/_ffi/_cython/./base.pxi in tvm._ffi._cy3.core.CHECK_CALL()

/workspace/mlc-llm/cpp/llm_chat.cc in LoadParams()

InternalError: Traceback (most recent call last):
7: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtrtvm::runtime::Object const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /workspace/mlc-llm/cpp/llm_chat.cc:1633
6: mlc::llm::LLMChat::Reload(tvm::runtime::TVMArgValue, tvm::runtime::String, tvm::runtime::String)
at /workspace/mlc-llm/cpp/llm_chat.cc:631
5: LoadParams
at /workspace/mlc-llm/cpp/llm_chat.cc:219
4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>::AssignTypedLambda<void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>(void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int), std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
3: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)
2: tvm::runtime::relax_vm::NDArrayCacheMetadata::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
1: tvm::runtime::LoadBinaryFromFile(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >*)
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/runtime/file_utils.cc", line 121
InternalError: Check failed: (!fs.fail()) is false: Cannot open dist/Llama-2-70b-chat-hf-q4f16_1-MLC/ndarray-cache.json
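The check failure shows that ndarray-cache.json does not exist, which suggests the 70B weights were never downloaded into that directory. A minimal sketch, using the path from the error message above, to verify the weights are on disk before constructing ChatModule:

    import os

    # Path copied from the error message above; adjust to your setup.
    model_dir = "dist/Llama-2-70b-chat-hf-q4f16_1-MLC"
    cache_json = os.path.join(model_dir, "ndarray-cache.json")

    if not os.path.isfile(cache_json):
        # The 70B weights are missing: they need to be downloaded separately
        # (e.g. a git-lfs clone of the corresponding mlc-ai Hugging Face repo)
        # before ChatModule can load them.
        raise FileNotFoundError(
            f"{cache_json} not found -- download the 70B weights into {model_dir} first"
        )

    print("Found", cache_json, "-", len(os.listdir(model_dir)), "files in the model directory")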

Is it working? Just closing on S23.

Hi!
Is it working? I downloaded Llama-2-7b, clicked the Chat button, and it showed some messages, then "Ready to chat".
Then the application closed, probably crashed. That's it.
Samsung S23

WizardCoder-15B-V1.0-q4f16_1 failing to load on WebLLM

Following the available examples in the WebLLM repo such as the next-simple-chat:

I have added the model URL and ID,

{ model_url: "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1/resolve/main/", local_id: "WizardCoder-15B-V1.0-q4f32_1", }

then added the libmap

"WizardCoder-15B-V1.0-q4f32_1": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/WizardCoder-15B-V1.0-q4f16_1-webgpu.wasm",

but I end up getting this error in the browser immediately after loading the model:

Init error, Error: Unknown conv template wizard_coder_or_math
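The failure means the WebLLM runtime in use does not recognize the conversation template name recorded in the model's config. A hedged way to check which template the published weights declare, assuming the repo ships an mlc-chat-config.json with a conv_template field (the usual MLC layout):

    import json
    import urllib.request

    # URL assembled from the model_url quoted above; the config file name
    # follows the usual MLC layout and is an assumption here.
    config_url = (
        "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1"
        "/resolve/main/mlc-chat-config.json"
    )

    with urllib.request.urlopen(config_url) as resp:
        config = json.load(resp)

    # If this prints "wizard_coder_or_math", the WebLLM build in use predates
    # that template and needs to be upgraded -- and the model lib and weights
    # should come from matching releases (note the q4f32_1 / q4f16_1 mix above).
    print("conv_template:", config.get("conv_template"))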

Gemma isn't working

It doesn't reply, and after a few retries it throws:
MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.decode(ChatModule.java:74)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$2.invoke(AppViewModel.kt:669)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$2.invoke(AppViewModel.kt:668)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.requestGenerate$lambda$4(AppViewModel.kt:668)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$lluIrcsPALEW5nCb2tohZYadhTY(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:6)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)

Error message:
InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90

Missing library files for Llama-2-70b-chat-hf-q4f16_1 model

I downloaded the 70b model and encountered an error when running it. The command I used was:

mlc_chat_cli --local-id Llama-2-70b-chat-hf-q4f16_1

I received the following error message:

WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use MLC config: "/home/lxr/software/dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1/mlc-chat-config.json"
Use model weights: "/home/lxr/software/dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1/ndarray-cache.json"
Cannot find library "Llama-2-70b-chat-hf-q4f16_1-vulkan.so" in "dist/prebuilt/lib" or other search paths.

However, in the same directory, I have successfully deployed the 13b model. Could you please provide further guidance on how to resolve this issue?
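The CLI's search paths simply do not contain a Vulkan library for the 70B model. If a suitable library is compiled or obtained separately, a hedged workaround is to point the Python ChatModule at it explicitly instead of relying on --local-id lookup; both paths below are placeholders:

    from mlc_chat import ChatModule

    # Weights directory and a model library you have built or downloaded yourself.
    cm = ChatModule(
        model="dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1",
        model_lib_path="dist/prebuilt/lib/Llama-2-70b-chat-hf-q4f16_1-vulkan.so",
    )
    print(cm.generate("Hello"))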

Error When Implementing Mali GPU Acceleration on OrangePi5 with mlc-llm

Following the tutorial, I set up mlc-llm on my OrangePi5 with Mali GPU acceleration via OpenCL. Everything was smooth until I encountered an error. I've re-downloaded the Mali libraries (versions below) multiple times, but the error persists. Could the libraries be corrupted?

Library versions in use:

  • RedPajama-INCITE-Chat-3B-v1-q4f16_1
  • RedPajama-INCITE-Chat-3B-v1-q4f16_1-mali.so

Any advice on resolving this would be appreciated.

arm_release_ver: g13p0-01eac0, rk_so_ver: 3
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
Traceback (most recent call last):
  File "/home/yusepp/Desktop/test.py", line 6, in <module>
    cm = ChatModule(model=models+"/RedPajama-INCITE-Chat-3B-v1-q4f16_1",
  File "/home/yusepp/mlc-llm/python/mlc_chat/chat_module.py", line 842, in __init__
    self._reload(self.model_lib_path, self.model_path, user_chat_config_json_str)
  File "/home/yusepp/mlc-llm/python/mlc_chat/chat_module.py", line 1056, in _reload
    self._reload_func(lib, model_path, app_config_json)
  File "/home/yusepp/tvm_unity/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/home/yusepp/tvm_unity/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc", line 255, in tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
ValueError: Traceback (most recent call last):
  3: 0x0000ffff63d3ae9b
  2: 0x0000ffff63d3ac23
  1: 0x0000ffff63d392bf
  0: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
        at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:255
  4: 0x0000ffff63d3ae9b
  3: 0x0000ffff63d3ac23
  2: 0x0000ffff63d392bf
  1: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
        at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:253
  0: tvm::runtime::relax_vm::NDArrayCacheMetadata::FileRecord::Load(DLDevice, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tvm::runtime::Optional<tvm::runtime::NDArray>*) const
        at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:193
  File "/home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
ValueError: Error when loading parameters from params_shard_0.bin: [20:19:57] /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (64552960 vs. 133) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
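The check compares each shard's size on disk with the size recorded in ndarray-cache.json, and the 133 bytes found here look like a git-lfs pointer file rather than real weight data, so the shards were most likely fetched without git-lfs or the download was cut off. A hedged verification sketch, assuming the usual ndarray-cache.json layout with a top-level records list whose entries carry dataPath and nbytes:

    import json
    import os

    # Placeholder: directory containing ndarray-cache.json and params_shard_*.bin.
    model_dir = "dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1"

    with open(os.path.join(model_dir, "ndarray-cache.json")) as f:
        cache = json.load(f)

    # Compare every shard's expected size with what is actually on disk.
    for record in cache.get("records", []):
        path = os.path.join(model_dir, record["dataPath"])
        actual = os.path.getsize(path) if os.path.exists(path) else 0
        if actual != record["nbytes"]:
            print(f"{record['dataPath']}: expected {record['nbytes']} bytes, "
                  f"found {actual} -- re-download this shard (e.g. `git lfs pull`)")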

Android app crash after last models updated

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: ValueError: Error when loading parameters from params_shard_66.bin: [22:39:36] /Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (45088768 vs. 21734731) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:633)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:631)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:534)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:631)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)

Error message:
ValueError: Error when loading parameters from params_shard_66.bin: [22:39:36] /Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (45088768 vs. 21734731) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255

Intel MAC shared library files

Which files here should users on Intel Mac machines be using? It looks like the metal .so files are all built for the arm64 architecture. What about x86?
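A quick, hedged way to confirm what a downloaded Metal library was built for is to ask macOS's lipo tool; the snippet below shells out to it from Python, and the library path is a placeholder:

    import subprocess

    # Placeholder path to one of the prebuilt Metal libraries from this repo.
    lib_path = "dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-metal.so"

    # `lipo -info` reports the architectures in a Mach-O binary; the current
    # prebuilts are expected to report "arm64", while an Intel Mac needs "x86_64".
    result = subprocess.run(["lipo", "-info", lib_path], capture_output=True, text=True)
    print(result.stdout.strip() or result.stderr.strip())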

Resource consumption degradation

Hi. I try this app from time to time to check on the progress of mobile LLMs. In one of the previous versions of MLCChat (a6b0a4c from 19.09.2023), my device with 8 GB of RAM managed to run a 7B model, but now none of the 7B models present in the app work, even if I free up the RAM completely. The error is CL_OUT_OF_RESOURCES in opencl_device_api.cc:246. Llama prints the error message in the chat; Mistral successfully loads the model, but after starting generation it crashes the app with the same error.
Snapdragon 860

💸 This repository is over its data quota.

Time to pull the credit card!

(env) louisbeaumont@louis030195com-third-brain:~/Documents/assistants$ git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt_libs
Cloning into 'dist/prebuilt_libs'...
remote: Enumerating objects: 689, done.
remote: Counting objects: 100% (220/220), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 689 (delta 170), reused 197 (delta 155), pack-reused 469
Receiving objects: 100% (689/689), 184.05 MiB | 1.01 MiB/s, done.
Resolving deltas: 100% (495/495), done.
Updating files: 100% (197/197), done.
Downloading mlc-chat.apk (124 MB)
Error downloading object: mlc-chat.apk (b7b937c): Smudge error: Error downloading mlc-chat.apk (b7b937c7be3b7e5f8164f0f1ef58c9e1df15fd0f08721fbf7fe16d058ef09c6e): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to '/Users/louisbeaumont/Documents/assistants/dist/prebuilt_libs/.git/lfs/logs/20240213T142150.445523.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: mlc-chat.apk: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
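Since the quota is consumed by LFS payloads (notably the bundled mlc-chat.apk), one hedged workaround is to clone without smudging LFS content and then pull only the files that are actually needed; the sketch below uses git-lfs's standard GIT_LFS_SKIP_SMUDGE switch, and the include pattern is only an example:

    import os
    import subprocess

    # Clone with LFS objects left as pointer files so no LFS bandwidth is used.
    env = dict(os.environ, GIT_LFS_SKIP_SMUDGE="1")
    subprocess.run(
        ["git", "clone", "https://github.com/mlc-ai/binary-mlc-llm-libs.git",
         "dist/prebuilt_libs"],
        check=True, env=env,
    )

    # Then fetch only the libraries you need, skipping the APK.
    subprocess.run(
        ["git", "-C", "dist/prebuilt_libs", "lfs", "pull",
         "--include", "Llama-2-7b-chat-hf-q4f16_1-*"],
        check=True,
    )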

[Bug] [Stack trace] RedPajama doesn't work

The latest APK (from 5 days ago) crashes while using RedPajama, but Llama 2-based models seem to work (I tried the uncensored one). RedPajama gives this error:

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: ValueError: Check failed: shard_rec.nbytes == raw_data.length() (29583360 vs. 23663914) : Parameters are not loaded properly. Please check your parameter shards and git lfs installation
Stack trace:
  File "/Users/houbohan/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 219

	at org.apache.tvm.Base.checkCall(Base.java:173)
	at org.apache.tvm.Function.invoke(Function.java:130)
	at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
	at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:636)
	at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:634)
	at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:537)
	at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:634)
	at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
	at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:463)
	at java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:637)
	at java.lang.Thread.run(Thread.java:1012)


Error message:
ValueError: Check failed: shard_rec.nbytes == raw_data.length() (29583360 vs. 23663914) : Parameters are not loaded properly. Please check your parameter shards and git lfs installation
Stack trace:
  File "/Users/houbohan/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 219

I had RedPajama working on older versions of the apk.
