Comments (20)

wormedleaf commented on June 12, 2024

I also encountered the same problem, which can be solved by modifying mlc-llm/3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc at line 1814:
TVM_REGISTER_GLOBAL("vm.builtin.paged_attention_kv_cache_create_reduced")
    .set_body_typed([](ShapeTuple cache_config, int64_t num_layers, int64_t num_qo_heads,
                       int64_t num_kv_heads, int64_t head_dim, int rope_mode, double rotary_scale,
                       double rotary_theta, NDArray init, PackedFunc f_transpose_append,
                       PackedFunc f_attention_prefill, PackedFunc f_attention_decode,
                       PackedFunc f_attention_prefill_sliding_window,
                       PackedFunc f_attention_decode_sliding_window,
                       PackedFunc f_attention_prefill_ragged, PackedFunc f_merge_inplace,
                       PackedFunc f_split_rotary, PackedFunc f_copy_single_page) {
      // The optional f_debug_get_kv is dropped in this reduced variant.
      CHECK_EQ(cache_config.size(), 5);
      int64_t reserved_num_seqs = cache_config[0];
      int64_t total_token_capacity = cache_config[1];
      int64_t prefill_chunk_size = cache_config[2];
      int64_t page_size = cache_config[3];
      bool support_sliding_window = cache_config[4];
      int64_t num_total_pages = (total_token_capacity + page_size - 1) / page_size + 1;
      if (support_sliding_window) {
        // When sliding window is enabled, each sequence may use two more pages at most.
        num_total_pages += reserved_num_seqs * 2;
      }
      ObjectPtr<PagedAttentionKVCacheObj> n = make_object<PagedAttentionKVCacheObj>(
          page_size, num_layers, num_qo_heads, num_kv_heads, head_dim, reserved_num_seqs,
          num_total_pages, prefill_chunk_size, support_sliding_window, RoPEMode(rope_mode),
          rotary_scale, rotary_theta, init->dtype, init->device, std::move(f_transpose_append),
          std::move(f_attention_prefill), std::move(f_attention_decode),
          std::move(f_attention_prefill_sliding_window),
          std::move(f_attention_decode_sliding_window), std::move(f_attention_prefill_ragged),  //
          NullOpt, NullOpt, NullOpt, NullOpt, NullOpt, NullOpt,  //
          std::move(f_merge_inplace), std::move(f_split_rotary), std::move(f_copy_single_page),
          NullOpt);  // NullOpt instead of std::move(f_debug_get_kv)
      return AttentionKVCache(std::move(n));
    });
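After patching, rebuild TVM and re-run ./prepare_libs.sh. As a quick sanity check (a minimal sketch, assuming the patched Python build of tvm is importable; tvm.get_global_func raises an error if the registration is missing):

python -c "import tvm; print(tvm.get_global_func('vm.builtin.paged_attention_kv_cache_create_reduced'))"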

xxxxyu commented on June 12, 2024

@sygi

I'm using mlc-ai-nightly==0.15.dev275, which should contain a prebuilt tvm package. I didn't compile tvm myself either; I only attempted to compile mlc-llm.

python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))" gives: GIT_COMMIT_HASH: ae057a2e74e895a846df958c19ff342505131a65.

./prepare_libs.sh ran successfully for me.

BTW, I've noticed a small difference between our code: you might want to try ObjectPtr<PagedAttentionKVCacheObj> n = make_object<PagedAttentionKVCacheObj>? But I'm not sure if this matters :)

MasterJH5574 commented on June 12, 2024

Hi @Vinaysukhesh98, thank you for reporting. Could you re-compile the model with python -m mlc_llm compile ...? Also see this thread, where the same issue happens. I believe model recompilation can help.
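For reference, a compile invocation generally looks like the following (a sketch; the model directory and output name are illustrative, substitute your own):

python -m mlc_llm compile ./dist/Llama-2-7b-chat-hf-q4f16_1-MLC/mlc-chat-config.json --device android -o ./dist/libs/Llama-2-7b-chat-hf-q4f16_1-android.tar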

Vinaysukhesh98 commented on June 12, 2024

Hah, still the same.

MasterJH5574 commented on June 12, 2024

I see. To help confirm the issue, could you check the file mlc_llm/nn/kv_cache.py on your local side and see whether line 351 is the following? If it is not, your mlc_llm is not up to date; please update mlc_llm to the latest nightly or the latest commit on the main branch.

bb.add_func(_copy_single_page(num_key_value_heads, page_size, head_dim, dtype, target), "kv_cache_copy_single_page"),
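If you'd rather not open the file, a one-liner can check that the function name is present (assuming mlc_llm is importable from your environment; the module lives at mlc_llm/nn/kv_cache.py):

python -c "import inspect; from mlc_llm.nn import kv_cache; print('kv_cache_copy_single_page' in inspect.getsource(kv_cache))"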

Vinaysukhesh98 commented on June 12, 2024

> could you help check the file mlc_llm/nn/kv_cache.py on your local side and see whether the Line 351 is the following?

bb.add_func(_copy_single_page(num_key_value_heads, page_size, head_dim, dtype, target), "kv_cache_copy_single_page"),

Vinaysukhesh98 commented on June 12, 2024

mlc_llm version:
pip show mlc_llm
Name: mlc-llm
Version: 0.1.dev1068+gb7416c02
Summary: MLC LLM: an universal LLM deployment engine via ML compilation.
Home-page: https://llm.mlc.ai/
Author: MLC LLM Contributors
Author-email:
License: Apache 2.0
Location: /home/mbuhyd/Documents/mlc-llm/python
Editable project location: /home/mbuhyd/Documents/mlc-llm/python
Requires: fastapi, openai, prompt_toolkit, requests, safetensors, shortuuid, tiktoken, torch, tqdm, uvicorn
Required-by:

sygi commented on June 12, 2024

Fwiw, I am getting the same error with the following config:

USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
TVM_DEBUG_WITH_ABI_CHANGE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU: 
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM: 
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: ON
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 5400532c4ba37e8a30fcaac488c2ecb05a307e4f
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-03-30 17:34:21 -0400
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 18.1.3
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_FLASHINFER: 
USE_CUBLAS: OFF
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION: 
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /usr/bin/c++
HIDE_PRIVATE_SYMBOLS: OFF

I have just cloned the repo, so everything should be up to date.

MasterJH5574 commented on June 12, 2024

@Vinaysukhesh98 @sygi could you share the commands you are running, together with the full logs? From the information we have so far I am unable to find the cause. Thanks in advance!

sygi commented on June 12, 2024

  1. clone the repo:
    git clone --recursive https://github.com/mlc-ai/mlc-llm/
  2. add tvm, java, android, etc to bashrc
  3. compile tvm (a version check follows this list):
    chmod +x 3rdparty/libbacktrace/configure && \
    mkdir -p cmake-build && \
    cmake -H. -Bcmake-build -DUSE_LLVM=ON && \
    cmake --build cmake-build --target all -- -j 4 && \
    mv cmake-build build
  4. download configs etc for android + llama 7B f16 from here.
  5. start virtualenv, install dependencies (attrs, numpy, typing_extensions, psutil, decorator)
  6. go to android/library, add:
set(JAVA_AWT_LIBRARY NotNeeded)
set(JAVA_JVM_LIBRARY NotNeeded)
set(JAVA_INCLUDE_PATH2 NotNeeded)
set(JAVA_AWT_INCLUDE_PATH NotNeeded)

to CMakeLists.txt
  7. modify app-config.json to only include the llama model and point to my android.tar.
  8. run prepare_libs.sh, make sure the libraries appeared.
  9. compile things in Android Studio, send to device.
  10. download the weights, start conversation.
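
For what it's worth, after step 3 it is worth confirming which tvm build the Python environment actually picked up (the same libinfo call quoted earlier in the thread):

python -c "import tvm; print(tvm.support.libinfo()['GIT_COMMIT_HASH'])"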

xxxxyu commented on June 12, 2024

Hello, same issue encountered when building and testing the Android app as instructed in https://llm.mlc.ai/docs/deploy/android.html. The error message appears as the chat UI initializes.

We use the latest prebuilt models and libs:

  • mlc_llm v0.1.dev0
  • tvm 0.16.dev0

We are attempting to fix this with earlier versions. No prebuilt packages are provided for those, so we have to build from source.

Update:

Still no luck. We've managed to build the Android .tar with tvm v0.15 (after fixing some version-mismatch issues with mlc and llvm...), but it seems incompatible with the latest prebuilt weights and libs, and we got the error message described in #2031. It looks like the deployment workflow has been undergoing significant changes recently, and we are unable to find a workaround on our own for now.

We understand that the Android pipeline might not be the priority, but we would still appreciate it if you could look at this issue and provide a working Android build & deployment pipeline :)

sygi commented on June 12, 2024

I actually looked into #2031 a bit, @xxxxyu, and it looks like the config format has changed recently. Can you confirm that you have the latest version of mlc-chat-config (which has conv_template as a string), as well as a new version of tvm (which parses the object here)?
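For example, the string form looks like this in mlc-chat-config.json (the template name here is illustrative):

"conv_template": "llama-2"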

xxxxyu commented on June 12, 2024

@sygi Hi, the mlc-chat-config is exactly the same as the one you provided, but the mlc/tvm versions are earlier.

We actually tested the following 3 settings (Windows + WSL + Pixel 6 Pro):

  1. latest mlc + tvm 0.16: the bug reported in this issue.
  2. mlc commit #1659 + tvm 0.15 (built from source): the bug reported in #2031.
  3. latest mlc + tvm 0.15: version mismatch.

xxxxyu commented on June 12, 2024

#2076 (comment) works fine for me, thanks!

Update:

It seems this only works for the prebuilt libs. When I compile with customized configurations, the issue still exists. I'm using the latest mlc-llm; both the prebuilt and built-from-source versions failed. So there might be something wrong in the compilation code too; could you confirm?

sygi commented on June 12, 2024

Not sure if this is what @xxxxyu meant, but for me:

  1. the compiler for tvm wasn't able to infer the types until I changed to:
    make_object<PagedAttentionKVCacheObj>
  2. when I ran ./prepare_libs.sh, I got:
In file included from /home/sygi/code/mlc-llm2/cpp/llm_chat.cc:26:
/home/sygi/code/mlc-llm2/cpp/./metadata/model.h:18:7: error: typedef redefinition with different types ('std::unordered_map<std::string, value>' (aka 'unordered_map<basic_string<char>, picojson::value>') vs 'value::object' (aka 'picojson::object_with_ordered_keys'))
using object = std::unordered_map<std::string, value>;
      ^
/home/sygi/code/mlc-llm2/3rdparty/tvm/3rdparty/picojson/picojson.h:326:23: note: previous definition is here
typedef value::object object;

xxxxyu commented on June 12, 2024

Hi @sygi

  1. Regarding the make_object type error, I've fixed it as you did to make it work.
  2. I didn't encounter any other issue at compile time, including the redefinition error you mentioned. You might need to check whether the brackets still match after replacing the code.

The error I mentioned in the update shows up only at runtime, and only when I attempt to compile the model libs on my own (the prebuilt model libs from https://github.com/mlc-ai/binary-mlc-llm-libs are OK). The error message is still "...relax.vm.AttentionKVCache expects 19 arguments, but 18 were provided".

sygi commented on June 12, 2024

Thank you for confirming. To clarify, my error doesn't appear during compilation of tvm (which runs fine), but when running ./prepare_libs.sh from the android folder (presumably while linking against picojson). Did you also do this? Could you confirm that you're at commit 0f67508 of the tvm submodule?
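
For reference, one way to check the submodule commit, run from the mlc-llm checkout:

git -C 3rdparty/tvm rev-parse --short HEAD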

Fwiw, I only want to use the prebuilt libraries; compiling them yourself seems like another ordeal ^^

tqchen commented on June 12, 2024

The latest Android SDK might help address related issues: https://llm.mlc.ai/docs/deploy/android.html

textmony commented on June 12, 2024

Hit the same issue with the latest code at 9998076 for the iOS app with a customized Llama model:

Function vm.builtin.paged_attention_kv_cache_create_reduced(0: runtime.ShapeTuple, 1: int64_t, 2: int64_t, 3: int64_t, 4: int64_t, 5: int, 6: double, 7: double, 8: runtime.NDArray, 9: runtime.PackedFunc, 10: runtime.PackedFunc, 11: runtime.PackedFunc, 12: runtime.PackedFunc, 13: runtime.PackedFunc, 14: runtime.PackedFunc, 15: runtime.PackedFunc, 16: runtime.PackedFunc, 17: runtime.PackedFunc, 18: runtime.PackedFunc) -> relax.vm.AttentionKVCache expects 19 arguments, but 18 were provided.
