Comments (8)
Hello,
For the server, I have tested (on the host, not within Docker) with the command bash workspace/service_docker_up.sh, and it works without a segfault. I tested on CentOS 7, but it seems you are using Ubuntu. Could you please post the full error messages so we can investigate?
For the app, the correct command should be python3 -m lmdeploy.app {server_ip_address}:33337. I think you are launching the app under the lmdeploy/lmdeploy directory, so the current directory has the highest priority during import and the submodule named torch overrides the actual PyTorch. Just change to another directory and use python3 -m to launch the app. Of course, you need to pip install lmdeploy first; otherwise, this module won't be found.
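The shadowing described above is generic Python import behavior, not lmdeploy-specific. A minimal sketch (the `json` lookup is just a stand-in to show how to verify which file a module actually resolves to):

```python
# Sketch of the import-shadowing problem: when Python is launched from a
# directory containing a subpackage named `torch`, `import torch` resolves
# to that local package instead of the installed PyTorch, because the
# current/script directory is searched before site-packages.
import sys
import importlib.util

# The first sys.path entry is the script directory (or '' for the cwd),
# which is why local names win over installed packages.
print(sys.path[0])

# Inspecting a module's resolved origin reveals shadowing; here we use
# the stdlib `json` module as a harmless stand-in for `torch`.
spec = importlib.util.find_spec("json")
print(spec.origin)  # path of the file that would actually be imported
```

If `spec.origin` points inside your working tree rather than site-packages, the import is being shadowed; launching from any other directory fixes it.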
from lmdeploy.
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 22.12 (build 50109463)
Triton Server Version 2.29.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
I0710 06:49:48.339074 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f4216000000' with size 268435456
I0710 06:49:48.339509 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0710 06:49:48.342878 1 model_lifecycle.cc:459] loading: turbomind:1
I0710 06:49:48.342941 1 model_lifecycle.cc:459] loading: postprocessing:1
I0710 06:49:48.342985 1 model_lifecycle.cc:459] loading: preprocessing:1
I0710 06:49:48.514055 1 libfastertransformer.cc:1746] TRITONBACKEND_Initialize: turbomind
I0710 06:49:48.514097 1 libfastertransformer.cc:1753] Triton TRITONBACKEND API version: 1.10
I0710 06:49:48.514106 1 libfastertransformer.cc:1757] 'turbomind' TRITONBACKEND API version: 1.10
I0710 06:49:52.181203 1 libfastertransformer.cc:1784] TRITONBACKEND_ModelInitialize: turbomind (version 1)
I0710 06:49:52.182419 1 libfastertransformer.cc:307] Instance group type: KIND_CPU count: 48
num_nodes=1
tp_pp_size=1
gpu_size=1
world_size=1
model_instance_size=1
I0710 06:49:52.182461 1 libfastertransformer.cc:346] Sequence Batching: disabled
I0710 06:49:52.182471 1 libfastertransformer.cc:357] Dynamic Batching: disabled
[ERROR] Can't load '/workspace/models/model_repository/turbomind/1/weights/config.ini'
terminate called after throwing an instance of 'std::runtime_error'
what(): [FT][ERROR] Assertion fail: /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/llama/LlamaTritonModel.cc:110
[b0c399235231:00001] *** Process received signal ***
[b0c399235231:00001] Signal: Aborted (6)
[b0c399235231:00001] Signal code: (-6)
[b0c399235231:00001] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f4267178420]
[b0c399235231:00001] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f4265a0300b]
[b0c399235231:00001] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f42659e2859]
[b0c399235231:00001] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f4265dbc911]
[b0c399235231:00001] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f4265dc838c]
[b0c399235231:00001] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f4265dc83f7]
[b0c399235231:00001] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f4265dc86a9]
[b0c399235231:00001] [ 7] /opt/tritonserver/backends/turbomind/libtransformer-shared.so(_ZN9turbomind17throwRuntimeErrorEPKciRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x23d)[0x7f41de1888dd]
[b0c399235231:00001] [ 8] /opt/tritonserver/backends/turbomind/libtransformer-shared.so(_ZN16LlamaTritonModelI6__halfEC1EmmiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x63e)[0x7f41de249b1e]
[b0c399235231:00001] [ 9] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x16293)[0x7f4256145293]
[b0c399235231:00001] [10] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x17811)[0x7f4256146811]
[b0c399235231:00001] [11] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x24464)[0x7f4256153464]
[b0c399235231:00001] [12] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(TRITONBACKEND_ModelInitialize+0x341)[0x7f4256153a91]
[b0c399235231:00001] [13] /opt/tritonserver/lib/libtritonserver.so(+0x10689b)[0x7f42662ad89b]
[b0c399235231:00001] [14] /opt/tritonserver/lib/libtritonserver.so(+0x1c4f5d)[0x7f426636bf5d]
[b0c399235231:00001] [15] /opt/tritonserver/lib/libtritonserver.so(+0x1caccd)[0x7f4266371ccd]
[b0c399235231:00001] [16] /opt/tritonserver/lib/libtritonserver.so(+0x3083a0)[0x7f42664af3a0]
[b0c399235231:00001] [17] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f4265df4de4]
[b0c399235231:00001] [18] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f426716c609]
[b0c399235231:00001] [19] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f4265adf133]
[b0c399235231:00001] *** End of error message ***
[b0c399235231:1 :0:81] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid: 81) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x0000000000022941 abort() ???:0
2 0x000000000009e911 __cxa_throw_bad_array_new_length() ???:0
3 0x00000000000aa38c std::rethrow_exception() ???:0
4 0x00000000000aa3f7 std::terminate() ???:0
5 0x00000000000aa6a9 __cxa_throw() ???:0
6 0x00000000000618dd turbomind::throwRuntimeError() /opt/tritonserver/lmdeploy-main/src/turbomind/utils/cuda_utils.h:202
7 0x0000000000122b1e turbomind::myAssert() /opt/tritonserver/lmdeploy-main/src/turbomind/utils/cuda_utils.h:209
8 0x0000000000122b1e LlamaTritonModel<__half>::LlamaTritonModel() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/llama/LlamaTritonModel.cc:110
9 0x0000000000016293 __gnu_cxx::new_allocator<LlamaTritonModel<__half> >::construct<LlamaTritonModel<__half>, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/ext/new_allocator.h:146
10 0x0000000000016293 std::allocator_traits<std::allocator<LlamaTritonModel<__half> > >::construct<LlamaTritonModel<__half>, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/alloc_traits.h:483
11 0x0000000000016293 std::_Sp_counted_ptr_inplace<LlamaTritonModel<__half>, std::allocator<LlamaTritonModel<__half> >, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr_base.h:548
12 0x0000000000016293 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<LlamaTritonModel<__half>, std::allocator<LlamaTritonModel<__half> >, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr_base.h:679
13 0x0000000000016293 std::__shared_ptr<LlamaTritonModel<__half>, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<LlamaTritonModel<__half> >, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr_base.h:1344
14 0x0000000000016293 std::shared_ptr<LlamaTritonModel<__half> >::shared_ptr<std::allocator<LlamaTritonModel<__half> >, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr.h:359
15 0x0000000000016293 std::allocate_shared<LlamaTritonModel<__half>, std::allocator<LlamaTritonModel<__half> >, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr.h:702
16 0x0000000000016293 std::make_shared<LlamaTritonModel<__half>, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr.h:718
17 0x0000000000016293 triton::backend::turbomind_backend::ModelState::ModelFactory() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:267
18 0x0000000000017811 triton::backend::turbomind_backend::ModelState::ModelState() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:371
19 0x0000000000017811 std::__shared_ptr<AbstractTransformerModel, (__gnu_cxx::_Lock_policy)2>::operator=() /usr/include/c++/9/bits/shared_ptr_base.h:1265
20 0x0000000000017811 std::shared_ptr<AbstractTransformerModel>::operator=() /usr/include/c++/9/bits/shared_ptr.h:335
21 0x0000000000017811 triton::backend::turbomind_backend::ModelState::ModelState() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:371
22 0x0000000000024464 triton::backend::turbomind_backend::ModelState::Create() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:182
23 0x0000000000024a91 TRITONBACKEND_ModelInitialize() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:1791
24 0x000000000010689b triton::core::TritonModel::Create() :0
25 0x00000000001c4f5d triton::core::ModelLifeCycle::CreateModel() :0
26 0x00000000001caccd std::_Function_handler<void (), triton::core::ModelLifeCycle::AsyncLoad(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, inference::ModelConfig const&, bool, std::shared_ptr<triton::core::TritonRepoAgentModelList> const&, std::function<void (triton::core::Status)>&&)::{lambda()#1}>::_M_invoke() model_lifecycle.cc:0
27 0x00000000003083a0 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::common::ThreadPool::ThreadPool(unsigned long)::{lambda()#1}> > >::_M_run() thread_pool.cc:0
28 0x00000000000d6de4 std::error_code::default_error_condition() ???:0
29 0x0000000000008609 start_thread() ???:0
30 0x000000000011f133 clone() ???:0
=================================
[b0c399235231:00001] *** Process received signal ***
[b0c399235231:00001] Signal: Segmentation fault (11)
[b0c399235231:00001] Signal code: (-6)
[b0c399235231:00001] Failing at address: 0x1
[b0c399235231:00001] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f4267178420]
[b0c399235231:00001] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x213)[0x7f42659e2941]
[b0c399235231:00001] [ 2] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f4265dbc911]
[b0c399235231:00001] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f4265dc838c]
[b0c399235231:00001] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f4265dc83f7]
[b0c399235231:00001] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f4265dc86a9]
[b0c399235231:00001] [ 6] /opt/tritonserver/backends/turbomind/libtransformer-shared.so(_ZN9turbomind17throwRuntimeErrorEPKciRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x23d)[0x7f41de1888dd]
[b0c399235231:00001] [ 7] /opt/tritonserver/backends/turbomind/libtransformer-shared.so(_ZN16LlamaTritonModelI6__halfEC1EmmiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x63e)[0x7f41de249b1e]
[b0c399235231:00001] [ 8] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x16293)[0x7f4256145293]
[b0c399235231:00001] [ 9] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x17811)[0x7f4256146811]
[b0c399235231:00001] [10] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x24464)[0x7f4256153464]
[b0c399235231:00001] [11] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(TRITONBACKEND_ModelInitialize+0x341)[0x7f4256153a91]
[b0c399235231:00001] [12] /opt/tritonserver/lib/libtritonserver.so(+0x10689b)[0x7f42662ad89b]
[b0c399235231:00001] [13] /opt/tritonserver/lib/libtritonserver.so(+0x1c4f5d)[0x7f426636bf5d]
[b0c399235231:00001] [14] /opt/tritonserver/lib/libtritonserver.so(+0x1caccd)[0x7f4266371ccd]
[b0c399235231:00001] [15] /opt/tritonserver/lib/libtritonserver.so(+0x3083a0)[0x7f42664af3a0]
[b0c399235231:00001] [16] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f4265df4de4]
[b0c399235231:00001] [17] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f426716c609]
[b0c399235231:00001] [18] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f4265adf133]
[b0c399235231:00001] *** End of error message ***
from lmdeploy.
This is the error I got when I try bash workspace/service_docker_up.sh.
from lmdeploy.
The real problem should be [ERROR] Can't load '/workspace/models/model_repository/turbomind/1/weights/config.ini'. Have you correctly created the model weights? You can check with ls workspace/model_repository/turbomind/1/weights/. There should be a config.ini as well as a lot of parameter files, as follows.
config.ini layers.13.feed_forward.w1.0.weight layers.18.attention.wo.0.weight layers.21.ffn_norm.weight layers.26.feed_forward.w1.0.weight layers.30.attention.wo.0.weight layers.5.ffn_norm.weight
layers.0.attention_norm.weight layers.13.feed_forward.w2.0.weight layers.18.attention.w_qkv.0.bias layers.22.attention_norm.weight layers.26.feed_forward.w2.0.weight layers.30.attention.w_qkv.0.bias layers.6.attention_norm.weight
layers.0.attention.wo.0.bias layers.13.feed_forward.w3.0.weight layers.18.attention.w_qkv.0.weight layers.22.attention.wo.0.bias layers.26.feed_forward.w3.0.weight layers.30.attention.w_qkv.0.weight layers.6.attention.wo.0.bias
layers.0.attention.wo.0.weight layers.13.ffn_norm.weight layers.18.feed_forward.w1.0.weight layers.22.attention.wo.0.weight layers.26.ffn_norm.weight layers.30.feed_forward.w1.0.weight layers.6.attention.wo.0.weight
layers.0.attention.w_qkv.0.bias layers.14.attention_norm.weight layers.18.feed_forward.w2.0.weight layers.22.attention.w_qkv.0.bias layers.27.attention_norm.weight layers.30.feed_forward.w2.0.weight layers.6.attention.w_qkv.0.bias
layers.0.attention.w_qkv.0.weight layers.14.attention.wo.0.bias layers.18.feed_forward.w3.0.weight layers.22.attention.w_qkv.0.weight layers.27.attention.wo.0.bias layers.30.feed_forward.w3.0.weight layers.6.attention.w_qkv.0.weight
layers.0.feed_forward.w1.0.weight layers.14.attention.wo.0.weight layers.18.ffn_norm.weight layers.22.feed_forward.w1.0.weight layers.27.attention.wo.0.weight layers.30.ffn_norm.weight layers.6.feed_forward.w1.0.weight
layers.0.feed_forward.w2.0.weight layers.14.attention.w_qkv.0.bias layers.19.attention_norm.weight layers.22.feed_forward.w2.0.weight layers.27.attention.w_qkv.0.bias layers.31.attention_norm.weight layers.6.feed_forward.w2.0.weight
layers.0.feed_forward.w3.0.weight layers.14.attention.w_qkv.0.weight layers.19.attention.wo.0.bias layers.22.feed_forward.w3.0.weight layers.27.attention.w_qkv.0.weight layers.31.attention.wo.0.bias layers.6.feed_forward.w3.0.weight
layers.0.ffn_norm.weight layers.14.feed_forward.w1.0.weight layers.19.attention.wo.0.weight layers.22.ffn_norm.weight layers.27.feed_forward.w1.0.weight layers.31.attention.wo.0.weight layers.6.ffn_norm.weight
layers.10.attention_norm.weight layers.14.feed_forward.w2.0.weight layers.19.attention.w_qkv.0.bias layers.23.attention_norm.weight layers.27.feed_forward.w2.0.weight layers.31.attention.w_qkv.0.bias layers.7.attention_norm.weight
layers.10.attention.wo.0.bias layers.14.feed_forward.w3.0.weight layers.19.attention.w_qkv.0.weight layers.23.attention.wo.0.bias layers.27.feed_forward.w3.0.weight layers.31.attention.w_qkv.0.weight layers.7.attention.wo.0.bias
layers.10.attention.wo.0.weight layers.14.ffn_norm.weight layers.19.feed_forward.w1.0.weight layers.23.attention.wo.0.weight layers.27.ffn_norm.weight layers.31.feed_forward.w1.0.weight layers.7.attention.wo.0.weight
layers.10.attention.w_qkv.0.bias layers.15.attention_norm.weight layers.19.feed_forward.w2.0.weight layers.23.attention.w_qkv.0.bias layers.28.attention_norm.weight layers.31.feed_forward.w2.0.weight layers.7.attention.w_qkv.0.bias
layers.10.attention.w_qkv.0.weight layers.15.attention.wo.0.bias layers.19.feed_forward.w3.0.weight layers.23.attention.w_qkv.0.weight layers.28.attention.wo.0.bias layers.31.feed_forward.w3.0.weight layers.7.attention.w_qkv.0.weight
layers.10.feed_forward.w1.0.weight layers.15.attention.wo.0.weight layers.19.ffn_norm.weight layers.23.feed_forward.w1.0.weight layers.28.attention.wo.0.weight layers.31.ffn_norm.weight layers.7.feed_forward.w1.0.weight
layers.10.feed_forward.w2.0.weight layers.15.attention.w_qkv.0.bias layers.1.attention_norm.weight layers.23.feed_forward.w2.0.weight layers.28.attention.w_qkv.0.bias layers.3.attention_norm.weight layers.7.feed_forward.w2.0.weight
layers.10.feed_forward.w3.0.weight layers.15.attention.w_qkv.0.weight layers.1.attention.wo.0.bias layers.23.feed_forward.w3.0.weight layers.28.attention.w_qkv.0.weight layers.3.attention.wo.0.bias layers.7.feed_forward.w3.0.weight
layers.10.ffn_norm.weight layers.15.feed_forward.w1.0.weight layers.1.attention.wo.0.weight layers.23.ffn_norm.weight layers.28.feed_forward.w1.0.weight layers.3.attention.wo.0.weight layers.7.ffn_norm.weight
layers.11.attention_norm.weight layers.15.feed_forward.w2.0.weight layers.1.attention.w_qkv.0.bias layers.24.attention_norm.weight layers.28.feed_forward.w2.0.weight layers.3.attention.w_qkv.0.bias layers.8.attention_norm.weight
layers.11.attention.wo.0.bias layers.15.feed_forward.w3.0.weight layers.1.attention.w_qkv.0.weight layers.24.attention.wo.0.bias layers.28.feed_forward.w3.0.weight layers.3.attention.w_qkv.0.weight layers.8.attention.wo.0.bias
layers.11.attention.wo.0.weight layers.15.ffn_norm.weight layers.1.feed_forward.w1.0.weight layers.24.attention.wo.0.weight layers.28.ffn_norm.weight layers.3.feed_forward.w1.0.weight layers.8.attention.wo.0.weight
layers.11.attention.w_qkv.0.bias layers.16.attention_norm.weight layers.1.feed_forward.w2.0.weight layers.24.attention.w_qkv.0.bias layers.29.attention_norm.weight layers.3.feed_forward.w2.0.weight layers.8.attention.w_qkv.0.bias
layers.11.attention.w_qkv.0.weight layers.16.attention.wo.0.bias layers.1.feed_forward.w3.0.weight layers.24.attention.w_qkv.0.weight layers.29.attention.wo.0.bias layers.3.feed_forward.w3.0.weight layers.8.attention.w_qkv.0.weight
layers.11.feed_forward.w1.0.weight layers.16.attention.wo.0.weight layers.1.ffn_norm.weight layers.24.feed_forward.w1.0.weight layers.29.attention.wo.0.weight layers.3.ffn_norm.weight layers.8.feed_forward.w1.0.weight
layers.11.feed_forward.w2.0.weight layers.16.attention.w_qkv.0.bias layers.20.attention_norm.weight layers.24.feed_forward.w2.0.weight layers.29.attention.w_qkv.0.bias layers.4.attention_norm.weight layers.8.feed_forward.w2.0.weight
layers.11.feed_forward.w3.0.weight layers.16.attention.w_qkv.0.weight layers.20.attention.wo.0.bias layers.24.feed_forward.w3.0.weight layers.29.attention.w_qkv.0.weight layers.4.attention.wo.0.bias layers.8.feed_forward.w3.0.weight
layers.11.ffn_norm.weight layers.16.feed_forward.w1.0.weight layers.20.attention.wo.0.weight layers.24.ffn_norm.weight layers.29.feed_forward.w1.0.weight layers.4.attention.wo.0.weight layers.8.ffn_norm.weight
layers.12.attention_norm.weight layers.16.feed_forward.w2.0.weight layers.20.attention.w_qkv.0.bias layers.25.attention_norm.weight layers.29.feed_forward.w2.0.weight layers.4.attention.w_qkv.0.bias layers.9.attention_norm.weight
layers.12.attention.wo.0.bias layers.16.feed_forward.w3.0.weight layers.20.attention.w_qkv.0.weight layers.25.attention.wo.0.bias layers.29.feed_forward.w3.0.weight layers.4.attention.w_qkv.0.weight layers.9.attention.wo.0.bias
layers.12.attention.wo.0.weight layers.16.ffn_norm.weight layers.20.feed_forward.w1.0.weight layers.25.attention.wo.0.weight layers.29.ffn_norm.weight layers.4.feed_forward.w1.0.weight layers.9.attention.wo.0.weight
layers.12.attention.w_qkv.0.bias layers.17.attention_norm.weight layers.20.feed_forward.w2.0.weight layers.25.attention.w_qkv.0.bias layers.2.attention_norm.weight layers.4.feed_forward.w2.0.weight layers.9.attention.w_qkv.0.bias
layers.12.attention.w_qkv.0.weight layers.17.attention.wo.0.bias layers.20.feed_forward.w3.0.weight layers.25.attention.w_qkv.0.weight layers.2.attention.wo.0.bias layers.4.feed_forward.w3.0.weight layers.9.attention.w_qkv.0.weight
layers.12.feed_forward.w1.0.weight layers.17.attention.wo.0.weight layers.20.ffn_norm.weight layers.25.feed_forward.w1.0.weight layers.2.attention.wo.0.weight layers.4.ffn_norm.weight layers.9.feed_forward.w1.0.weight
layers.12.feed_forward.w2.0.weight layers.17.attention.w_qkv.0.bias layers.21.attention_norm.weight layers.25.feed_forward.w2.0.weight layers.2.attention.w_qkv.0.bias layers.5.attention_norm.weight layers.9.feed_forward.w2.0.weight
layers.12.feed_forward.w3.0.weight layers.17.attention.w_qkv.0.weight layers.21.attention.wo.0.bias layers.25.feed_forward.w3.0.weight layers.2.attention.w_qkv.0.weight layers.5.attention.wo.0.bias layers.9.feed_forward.w3.0.weight
layers.12.ffn_norm.weight layers.17.feed_forward.w1.0.weight layers.21.attention.wo.0.weight layers.25.ffn_norm.weight layers.2.feed_forward.w1.0.weight layers.5.attention.wo.0.weight layers.9.ffn_norm.weight
layers.13.attention_norm.weight layers.17.feed_forward.w2.0.weight layers.21.attention.w_qkv.0.bias layers.26.attention_norm.weight layers.2.feed_forward.w2.0.weight layers.5.attention.w_qkv.0.bias norm.weight
layers.13.attention.wo.0.bias layers.17.feed_forward.w3.0.weight layers.21.attention.w_qkv.0.weight layers.26.attention.wo.0.bias layers.2.feed_forward.w3.0.weight layers.5.attention.w_qkv.0.weight output.weight
layers.13.attention.wo.0.weight layers.17.ffn_norm.weight layers.21.feed_forward.w1.0.weight layers.26.attention.wo.0.weight layers.2.ffn_norm.weight layers.5.feed_forward.w1.0.weight tok_embeddings.weight
layers.13.attention.w_qkv.0.bias layers.18.attention_norm.weight layers.21.feed_forward.w2.0.weight layers.26.attention.w_qkv.0.bias layers.30.attention_norm.weight layers.5.feed_forward.w2.0.weight
layers.13.attention.w_qkv.0.weight layers.18.attention.wo.0.bias layers.21.feed_forward.w3.0.weight layers.26.attention.w_qkv.0.weight layers.30.attention.wo.0.bias layers.5.feed_forward.w3.0.weight
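The manual ls check above can also be scripted. A minimal sketch (check_weights is a hypothetical helper; the path is taken from this thread, adjust to your layout):

```python
# Sanity-check sketch for a turbomind weights directory: verifies that
# config.ini and at least one layer weight file exist, as described above.
from pathlib import Path

def check_weights(dir_path: str) -> list:
    """Return a list of problems found in the weights directory (empty = OK)."""
    problems = []
    d = Path(dir_path)
    if not d.is_dir():
        return ["missing directory: %s" % d]
    if not (d / "config.ini").is_file():
        problems.append("config.ini not found")
    if not any(d.glob("layers.*.weight")):
        problems.append("no layer weight files found")
    return problems

# Path as used in the thread; adjust for your own workspace.
print(check_weights("workspace/model_repository/turbomind/1/weights"))
```

An empty list means the conversion step produced the expected layout; anything else points at the weight-creation step that failed.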
from lmdeploy.
OK, I ran all the commands in ~/lmdeploy/, and it finally works. You'd better add the working directory to the command.
Anyway, I've met another bug: when I ask the bot a question, it can't stop answering.
from lmdeploy.
You can try disabling (or weakening) sampling by setting the temperature to zero (or a smaller value).
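Why a lower temperature helps can be sketched with generic softmax sampling (not lmdeploy's implementation): dividing logits by a small temperature sharpens the distribution, and temperature → 0 degenerates to greedy argmax decoding.

```python
# Generic sketch of temperature's effect on token sampling (not
# lmdeploy-specific). Lower temperature sharpens the distribution;
# near-zero temperature is treated as greedy (argmax) decoding.
import math
import random

def sample(logits, temperature):
    if temperature <= 1e-6:  # treat ~0 as greedy decoding
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()                      # inverse-CDF sampling
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.5]
print(sample(logits, 0.0))  # temperature 0: always picks the argmax
```

With temperature 0 the most likely token is chosen every step, which makes endings like an EOS token deterministic instead of occasionally being sampled away.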
from lmdeploy.
OK, thank you.
from lmdeploy.
Please use the internlm-chat-7b model instead of internlm-7b.
from lmdeploy.
Related Issues (20)
- [Feature] model name should be settable or follow original full HF link name, not random new name HOT 6
- [Bug] lmdeploy lite auto_awq: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! HOT 1
- [Bug] result of W4A16 quantized Qwen1.5-1.8B-Chat model not correct HOT 1
- Support for SWIFT finetuned models HOT 4
- I always see a "using default GEMM algo" WARNING; does using the default GEMM affect speed or throughput? HOT 1
- [Bug] Error when importing a local model while running internvl-v1.5 quantization from the downloaded code HOT 11
- [Feature] The peft<=0.9.0 requirement is too low and conflicts with many environments that require peft>0.10; can it be changed?
- [Feature] Support for LLaVA-NeXT HOT 1
- [Docs] How are multiple images handled? HOT 5
- [Bug] output diff when temperature set zero HOT 3
- batch inference HOT 1
- [Feature] InternVL-Chat-V1-5-AWQ merge LoRA adapter HOT 4
- [Feature] support for MiniCPM-Llama3-V 2.5 HOT 1
- [Bug] HOT 2
- Can parameters such as max_tiles be controlled during InternVL inference, or can pixel_values be passed directly to the pipeline? HOT 3
- [Bug] not support inference qwen1.5 HOT 6
- Encountered core dump issue when quantifying the model HOT 2
- [Bug] ModuleNotFoundError: No module named '_turbomind' loading llava Mistral 7B HOT 1
- How is the support for RoPE difference between `hf llama` and `meta llama`?
- [Bug] InternVL-1.5 served via the LMDeploy API server cannot recognize images HOT 2