It seems like the tuning is per device, although the m1 tuning is applied when using a

How to `tune_relax` for other targets (mlc-llm, closed)

funnbot commented on May 9, 2024

How to `tune_relax` for other targets

Comments (1)

junrushao commented on May 9, 2024

`tune_relax` isn't something we currently use to tune LLMs, because it only supports static-shape workloads. Instead, we use a mixed strategy that also handles dynamic-shape workloads. A tutorial should be coming soon :-)
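
For readers unfamiliar with the static/dynamic distinction mentioned above, here is a minimal sketch in plain Python. It does not use TVM or MLC LLM APIs; `is_static`, the workload names, and the shapes are all hypothetical and chosen only for illustration. It assumes a shape is written as a tuple whose entries are either concrete integers or symbolic names such as `"seq_len"`.

```python
from typing import Sequence, Union

# Assumption for illustration: an int is a fixed size, a str is a symbolic
# dimension whose value is only known at run time (e.g. "seq_len").
Dim = Union[int, str]


def is_static(shape: Sequence[Dim]) -> bool:
    """Return True when every dimension is a concrete integer."""
    return all(isinstance(d, int) for d in shape)


# Hypothetical workloads: a fixed-size image batch and an LLM activation
# whose sequence length varies from request to request.
workloads = {
    "image_batch": (1, 3, 224, 224),
    "llm_hidden_states": (1, "seq_len", 4096),
}

# Map each workload name to whether a static-shape-only tuner could handle it.
static_only = {name: is_static(shape) for name, shape in workloads.items()}
print(static_only)
```

In this toy model, a tuner restricted to static shapes could only handle `image_batch`; the LLM activation carries a symbolic dimension, which is the kind of workload the comment says needs a different strategy.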
