Comments (8)
Hello,
For the server, I have tested (on the host, not within Docker) with the command bash workspace/service_docker_up.sh, and it works without a segfault. I tested on CentOS 7, but it seems you are using Ubuntu. Could you please post the full error messages so we can investigate?
For the app, the correct command should be python3 -m lmdeploy.app {server_ip_address}:33337. I think you are launching the app under the lmdeploy/lmdeploy directory, so the current directory has the highest priority during import and the submodule named torch overrides the actual PyTorch. Just change to another directory and use python3 -m to launch the app. Of course, you need to pip install lmdeploy first; otherwise, this module won't be found.
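The shadowing described above is generic Python import behavior, not lmdeploy-specific. A minimal sketch (the `json` lookup is just a stand-in to show how to verify which file a module actually resolves to):

```python
# Sketch of the import-shadowing problem: when Python is launched from a
# directory containing a subpackage named `torch`, `import torch` resolves
# to that local package instead of the installed PyTorch, because the
# current/script directory is searched before site-packages.
import sys
import importlib.util

# The first sys.path entry is the script directory (or '' for the cwd),
# which is why local names win over installed packages.
print(sys.path[0])

# Inspecting a module's resolved origin reveals shadowing; here we use
# the stdlib `json` module as a harmless stand-in for `torch`.
spec = importlib.util.find_spec("json")
print(spec.origin)  # path of the file that would actually be imported
```

If `spec.origin` points inside your working tree rather than site-packages, the import is being shadowed; launching from any other directory fixes it.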
from lmdeploy.
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 22.12 (build 50109463)
Triton Server Version 2.29.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
I0710 06:49:48.339074 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f4216000000' with size 268435456
I0710 06:49:48.339509 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0710 06:49:48.342878 1 model_lifecycle.cc:459] loading: turbomind:1
I0710 06:49:48.342941 1 model_lifecycle.cc:459] loading: postprocessing:1
I0710 06:49:48.342985 1 model_lifecycle.cc:459] loading: preprocessing:1
I0710 06:49:48.514055 1 libfastertransformer.cc:1746] TRITONBACKEND_Initialize: turbomind
I0710 06:49:48.514097 1 libfastertransformer.cc:1753] Triton TRITONBACKEND API version: 1.10
I0710 06:49:48.514106 1 libfastertransformer.cc:1757] 'turbomind' TRITONBACKEND API version: 1.10
I0710 06:49:52.181203 1 libfastertransformer.cc:1784] TRITONBACKEND_ModelInitialize: turbomind (version 1)
I0710 06:49:52.182419 1 libfastertransformer.cc:307] Instance group type: KIND_CPU count: 48
num_nodes=1
tp_pp_size=1
gpu_size=1
world_size=1
model_instance_size=1
I0710 06:49:52.182461 1 libfastertransformer.cc:346] Sequence Batching: disabled
I0710 06:49:52.182471 1 libfastertransformer.cc:357] Dynamic Batching: disabled
[ERROR] Can't load '/workspace/models/model_repository/turbomind/1/weights/config.ini'
terminate called after throwing an instance of 'std::runtime_error'
what(): [FT][ERROR] Assertion fail: /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/llama/LlamaTritonModel.cc:110
[b0c399235231:00001] *** Process received signal ***
[b0c399235231:00001] Signal: Aborted (6)
[b0c399235231:00001] Signal code: (-6)
[b0c399235231:00001] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f4267178420]
[b0c399235231:00001] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f4265a0300b]
[b0c399235231:00001] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f42659e2859]
[b0c399235231:00001] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f4265dbc911]
[b0c399235231:00001] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f4265dc838c]
[b0c399235231:00001] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f4265dc83f7]
[b0c399235231:00001] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f4265dc86a9]
[b0c399235231:00001] [ 7] /opt/tritonserver/backends/turbomind/libtransformer-shared.so(_ZN9turbomind17throwRuntimeErrorEPKciRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x23d)[0x7f41de1888dd]
[b0c399235231:00001] [ 8] /opt/tritonserver/backends/turbomind/libtransformer-shared.so(_ZN16LlamaTritonModelI6__halfEC1EmmiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x63e)[0x7f41de249b1e]
[b0c399235231:00001] [ 9] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x16293)[0x7f4256145293]
[b0c399235231:00001] [10] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x17811)[0x7f4256146811]
[b0c399235231:00001] [11] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x24464)[0x7f4256153464]
[b0c399235231:00001] [12] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(TRITONBACKEND_ModelInitialize+0x341)[0x7f4256153a91]
[b0c399235231:00001] [13] /opt/tritonserver/lib/libtritonserver.so(+0x10689b)[0x7f42662ad89b]
[b0c399235231:00001] [14] /opt/tritonserver/lib/libtritonserver.so(+0x1c4f5d)[0x7f426636bf5d]
[b0c399235231:00001] [15] /opt/tritonserver/lib/libtritonserver.so(+0x1caccd)[0x7f4266371ccd]
[b0c399235231:00001] [16] /opt/tritonserver/lib/libtritonserver.so(+0x3083a0)[0x7f42664af3a0]
[b0c399235231:00001] [17] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f4265df4de4]
[b0c399235231:00001] [18] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f426716c609]
[b0c399235231:00001] [19] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f4265adf133]
[b0c399235231:00001] *** End of error message ***
[b0c399235231:1 :0:81] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid: 81) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x0000000000022941 abort() ???:0
2 0x000000000009e911 __cxa_throw_bad_array_new_length() ???:0
3 0x00000000000aa38c std::rethrow_exception() ???:0
4 0x00000000000aa3f7 std::terminate() ???:0
5 0x00000000000aa6a9 __cxa_throw() ???:0
6 0x00000000000618dd turbomind::throwRuntimeError() /opt/tritonserver/lmdeploy-main/src/turbomind/utils/cuda_utils.h:202
7 0x0000000000122b1e turbomind::myAssert() /opt/tritonserver/lmdeploy-main/src/turbomind/utils/cuda_utils.h:209
8 0x0000000000122b1e LlamaTritonModel<__half>::LlamaTritonModel() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/llama/LlamaTritonModel.cc:110
9 0x0000000000016293 __gnu_cxx::new_allocator<LlamaTritonModel<__half> >::construct<LlamaTritonModel<__half>, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/ext/new_allocator.h:146
10 0x0000000000016293 std::allocator_traits<std::allocator<LlamaTritonModel<__half> > >::construct<LlamaTritonModel<__half>, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/alloc_traits.h:483
11 0x0000000000016293 std::_Sp_counted_ptr_inplace<LlamaTritonModel<__half>, std::allocator<LlamaTritonModel<__half> >, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr_base.h:548
12 0x0000000000016293 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<LlamaTritonModel<__half>, std::allocator<LlamaTritonModel<__half> >, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr_base.h:679
13 0x0000000000016293 std::__shared_ptr<LlamaTritonModel<__half>, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<LlamaTritonModel<__half> >, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr_base.h:1344
14 0x0000000000016293 std::shared_ptr<LlamaTritonModel<__half> >::shared_ptr<std::allocator<LlamaTritonModel<__half> >, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr.h:359
15 0x0000000000016293 std::allocate_shared<LlamaTritonModel<__half>, std::allocator<LlamaTritonModel<__half> >, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr.h:702
16 0x0000000000016293 std::make_shared<LlamaTritonModel<__half>, int const&, int const&, int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>() /usr/include/c++/9/bits/shared_ptr.h:718
17 0x0000000000016293 triton::backend::turbomind_backend::ModelState::ModelFactory() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:267
18 0x0000000000017811 triton::backend::turbomind_backend::ModelState::ModelState() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:371
19 0x0000000000017811 std::__shared_ptr<AbstractTransformerModel, (__gnu_cxx::_Lock_policy)2>::operator=() /usr/include/c++/9/bits/shared_ptr_base.h:1265
20 0x0000000000017811 std::shared_ptr<AbstractTransformerModel>::operator=() /usr/include/c++/9/bits/shared_ptr.h:335
21 0x0000000000017811 triton::backend::turbomind_backend::ModelState::ModelState() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:371
22 0x0000000000024464 triton::backend::turbomind_backend::ModelState::Create() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:182
23 0x0000000000024a91 TRITONBACKEND_ModelInitialize() /opt/tritonserver/lmdeploy-main/src/turbomind/triton_backend/libfastertransformer.cc:1791
24 0x000000000010689b triton::core::TritonModel::Create() :0
25 0x00000000001c4f5d triton::core::ModelLifeCycle::CreateModel() :0
26 0x00000000001caccd std::_Function_handler<void (), triton::core::ModelLifeCycle::AsyncLoad(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, inference::ModelConfig const&, bool, std::shared_ptr<triton::core::TritonRepoAgentModelList> const&, std::function<void (triton::core::Status)>&&)::{lambda()#1}>::_M_invoke() model_lifecycle.cc:0
27 0x00000000003083a0 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::common::ThreadPool::ThreadPool(unsigned long)::{lambda()#1}> > >::_M_run() thread_pool.cc:0
28 0x00000000000d6de4 std::error_code::default_error_condition() ???:0
29 0x0000000000008609 start_thread() ???:0
30 0x000000000011f133 clone() ???:0
=================================
[b0c399235231:00001] *** Process received signal ***
[b0c399235231:00001] Signal: Segmentation fault (11)
[b0c399235231:00001] Signal code: (-6)
[b0c399235231:00001] Failing at address: 0x1
[b0c399235231:00001] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f4267178420]
[b0c399235231:00001] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x213)[0x7f42659e2941]
[b0c399235231:00001] [ 2] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f4265dbc911]
[b0c399235231:00001] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f4265dc838c]
[b0c399235231:00001] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f4265dc83f7]
[b0c399235231:00001] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f4265dc86a9]
[b0c399235231:00001] [ 6] /opt/tritonserver/backends/turbomind/libtransformer-shared.so(_ZN9turbomind17throwRuntimeErrorEPKciRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x23d)[0x7f41de1888dd]
[b0c399235231:00001] [ 7] /opt/tritonserver/backends/turbomind/libtransformer-shared.so(_ZN16LlamaTritonModelI6__halfEC1EmmiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x63e)[0x7f41de249b1e]
[b0c399235231:00001] [ 8] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x16293)[0x7f4256145293]
[b0c399235231:00001] [ 9] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x17811)[0x7f4256146811]
[b0c399235231:00001] [10] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(+0x24464)[0x7f4256153464]
[b0c399235231:00001] [11] /opt/tritonserver/backends/turbomind/libtriton_turbomind.so(TRITONBACKEND_ModelInitialize+0x341)[0x7f4256153a91]
[b0c399235231:00001] [12] /opt/tritonserver/lib/libtritonserver.so(+0x10689b)[0x7f42662ad89b]
[b0c399235231:00001] [13] /opt/tritonserver/lib/libtritonserver.so(+0x1c4f5d)[0x7f426636bf5d]
[b0c399235231:00001] [14] /opt/tritonserver/lib/libtritonserver.so(+0x1caccd)[0x7f4266371ccd]
[b0c399235231:00001] [15] /opt/tritonserver/lib/libtritonserver.so(+0x3083a0)[0x7f42664af3a0]
[b0c399235231:00001] [16] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f4265df4de4]
[b0c399235231:00001] [17] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f426716c609]
[b0c399235231:00001] [18] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f4265adf133]
[b0c399235231:00001] *** End of error message ***
from lmdeploy.
This is the error I got when I try bash workspace/service_docker_up.sh.
from lmdeploy.
The real problem should be [ERROR] Can't load '/workspace/models/model_repository/turbomind/1/weights/config.ini'. Have you correctly created the model weights? You can check with ls workspace/model_repository/turbomind/1/weights/. There should be a config.ini as well as a lot of parameter files, as follows.
config.ini layers.13.feed_forward.w1.0.weight layers.18.attention.wo.0.weight layers.21.ffn_norm.weight layers.26.feed_forward.w1.0.weight layers.30.attention.wo.0.weight layers.5.ffn_norm.weight
layers.0.attention_norm.weight layers.13.feed_forward.w2.0.weight layers.18.attention.w_qkv.0.bias layers.22.attention_norm.weight layers.26.feed_forward.w2.0.weight layers.30.attention.w_qkv.0.bias layers.6.attention_norm.weight
layers.0.attention.wo.0.bias layers.13.feed_forward.w3.0.weight layers.18.attention.w_qkv.0.weight layers.22.attention.wo.0.bias layers.26.feed_forward.w3.0.weight layers.30.attention.w_qkv.0.weight layers.6.attention.wo.0.bias
layers.0.attention.wo.0.weight layers.13.ffn_norm.weight layers.18.feed_forward.w1.0.weight layers.22.attention.wo.0.weight layers.26.ffn_norm.weight layers.30.feed_forward.w1.0.weight layers.6.attention.wo.0.weight
layers.0.attention.w_qkv.0.bias layers.14.attention_norm.weight layers.18.feed_forward.w2.0.weight layers.22.attention.w_qkv.0.bias layers.27.attention_norm.weight layers.30.feed_forward.w2.0.weight layers.6.attention.w_qkv.0.bias
layers.0.attention.w_qkv.0.weight layers.14.attention.wo.0.bias layers.18.feed_forward.w3.0.weight layers.22.attention.w_qkv.0.weight layers.27.attention.wo.0.bias layers.30.feed_forward.w3.0.weight layers.6.attention.w_qkv.0.weight
layers.0.feed_forward.w1.0.weight layers.14.attention.wo.0.weight layers.18.ffn_norm.weight layers.22.feed_forward.w1.0.weight layers.27.attention.wo.0.weight layers.30.ffn_norm.weight layers.6.feed_forward.w1.0.weight
layers.0.feed_forward.w2.0.weight layers.14.attention.w_qkv.0.bias layers.19.attention_norm.weight layers.22.feed_forward.w2.0.weight layers.27.attention.w_qkv.0.bias layers.31.attention_norm.weight layers.6.feed_forward.w2.0.weight
layers.0.feed_forward.w3.0.weight layers.14.attention.w_qkv.0.weight layers.19.attention.wo.0.bias layers.22.feed_forward.w3.0.weight layers.27.attention.w_qkv.0.weight layers.31.attention.wo.0.bias layers.6.feed_forward.w3.0.weight
layers.0.ffn_norm.weight layers.14.feed_forward.w1.0.weight layers.19.attention.wo.0.weight layers.22.ffn_norm.weight layers.27.feed_forward.w1.0.weight layers.31.attention.wo.0.weight layers.6.ffn_norm.weight
layers.10.attention_norm.weight layers.14.feed_forward.w2.0.weight layers.19.attention.w_qkv.0.bias layers.23.attention_norm.weight layers.27.feed_forward.w2.0.weight layers.31.attention.w_qkv.0.bias layers.7.attention_norm.weight
layers.10.attention.wo.0.bias layers.14.feed_forward.w3.0.weight layers.19.attention.w_qkv.0.weight layers.23.attention.wo.0.bias layers.27.feed_forward.w3.0.weight layers.31.attention.w_qkv.0.weight layers.7.attention.wo.0.bias
layers.10.attention.wo.0.weight layers.14.ffn_norm.weight layers.19.feed_forward.w1.0.weight layers.23.attention.wo.0.weight layers.27.ffn_norm.weight layers.31.feed_forward.w1.0.weight layers.7.attention.wo.0.weight
layers.10.attention.w_qkv.0.bias layers.15.attention_norm.weight layers.19.feed_forward.w2.0.weight layers.23.attention.w_qkv.0.bias layers.28.attention_norm.weight layers.31.feed_forward.w2.0.weight layers.7.attention.w_qkv.0.bias
layers.10.attention.w_qkv.0.weight layers.15.attention.wo.0.bias layers.19.feed_forward.w3.0.weight layers.23.attention.w_qkv.0.weight layers.28.attention.wo.0.bias layers.31.feed_forward.w3.0.weight layers.7.attention.w_qkv.0.weight
layers.10.feed_forward.w1.0.weight layers.15.attention.wo.0.weight layers.19.ffn_norm.weight layers.23.feed_forward.w1.0.weight layers.28.attention.wo.0.weight layers.31.ffn_norm.weight layers.7.feed_forward.w1.0.weight
layers.10.feed_forward.w2.0.weight layers.15.attention.w_qkv.0.bias layers.1.attention_norm.weight layers.23.feed_forward.w2.0.weight layers.28.attention.w_qkv.0.bias layers.3.attention_norm.weight layers.7.feed_forward.w2.0.weight
layers.10.feed_forward.w3.0.weight layers.15.attention.w_qkv.0.weight layers.1.attention.wo.0.bias layers.23.feed_forward.w3.0.weight layers.28.attention.w_qkv.0.weight layers.3.attention.wo.0.bias layers.7.feed_forward.w3.0.weight
layers.10.ffn_norm.weight layers.15.feed_forward.w1.0.weight layers.1.attention.wo.0.weight layers.23.ffn_norm.weight layers.28.feed_forward.w1.0.weight layers.3.attention.wo.0.weight layers.7.ffn_norm.weight
layers.11.attention_norm.weight layers.15.feed_forward.w2.0.weight layers.1.attention.w_qkv.0.bias layers.24.attention_norm.weight layers.28.feed_forward.w2.0.weight layers.3.attention.w_qkv.0.bias layers.8.attention_norm.weight
layers.11.attention.wo.0.bias layers.15.feed_forward.w3.0.weight layers.1.attention.w_qkv.0.weight layers.24.attention.wo.0.bias layers.28.feed_forward.w3.0.weight layers.3.attention.w_qkv.0.weight layers.8.attention.wo.0.bias
layers.11.attention.wo.0.weight layers.15.ffn_norm.weight layers.1.feed_forward.w1.0.weight layers.24.attention.wo.0.weight layers.28.ffn_norm.weight layers.3.feed_forward.w1.0.weight layers.8.attention.wo.0.weight
layers.11.attention.w_qkv.0.bias layers.16.attention_norm.weight layers.1.feed_forward.w2.0.weight layers.24.attention.w_qkv.0.bias layers.29.attention_norm.weight layers.3.feed_forward.w2.0.weight layers.8.attention.w_qkv.0.bias
layers.11.attention.w_qkv.0.weight layers.16.attention.wo.0.bias layers.1.feed_forward.w3.0.weight layers.24.attention.w_qkv.0.weight layers.29.attention.wo.0.bias layers.3.feed_forward.w3.0.weight layers.8.attention.w_qkv.0.weight
layers.11.feed_forward.w1.0.weight layers.16.attention.wo.0.weight layers.1.ffn_norm.weight layers.24.feed_forward.w1.0.weight layers.29.attention.wo.0.weight layers.3.ffn_norm.weight layers.8.feed_forward.w1.0.weight
layers.11.feed_forward.w2.0.weight layers.16.attention.w_qkv.0.bias layers.20.attention_norm.weight layers.24.feed_forward.w2.0.weight layers.29.attention.w_qkv.0.bias layers.4.attention_norm.weight layers.8.feed_forward.w2.0.weight
layers.11.feed_forward.w3.0.weight layers.16.attention.w_qkv.0.weight layers.20.attention.wo.0.bias layers.24.feed_forward.w3.0.weight layers.29.attention.w_qkv.0.weight layers.4.attention.wo.0.bias layers.8.feed_forward.w3.0.weight
layers.11.ffn_norm.weight layers.16.feed_forward.w1.0.weight layers.20.attention.wo.0.weight layers.24.ffn_norm.weight layers.29.feed_forward.w1.0.weight layers.4.attention.wo.0.weight layers.8.ffn_norm.weight
layers.12.attention_norm.weight layers.16.feed_forward.w2.0.weight layers.20.attention.w_qkv.0.bias layers.25.attention_norm.weight layers.29.feed_forward.w2.0.weight layers.4.attention.w_qkv.0.bias layers.9.attention_norm.weight
layers.12.attention.wo.0.bias layers.16.feed_forward.w3.0.weight layers.20.attention.w_qkv.0.weight layers.25.attention.wo.0.bias layers.29.feed_forward.w3.0.weight layers.4.attention.w_qkv.0.weight layers.9.attention.wo.0.bias
layers.12.attention.wo.0.weight layers.16.ffn_norm.weight layers.20.feed_forward.w1.0.weight layers.25.attention.wo.0.weight layers.29.ffn_norm.weight layers.4.feed_forward.w1.0.weight layers.9.attention.wo.0.weight
layers.12.attention.w_qkv.0.bias layers.17.attention_norm.weight layers.20.feed_forward.w2.0.weight layers.25.attention.w_qkv.0.bias layers.2.attention_norm.weight layers.4.feed_forward.w2.0.weight layers.9.attention.w_qkv.0.bias
layers.12.attention.w_qkv.0.weight layers.17.attention.wo.0.bias layers.20.feed_forward.w3.0.weight layers.25.attention.w_qkv.0.weight layers.2.attention.wo.0.bias layers.4.feed_forward.w3.0.weight layers.9.attention.w_qkv.0.weight
layers.12.feed_forward.w1.0.weight layers.17.attention.wo.0.weight layers.20.ffn_norm.weight layers.25.feed_forward.w1.0.weight layers.2.attention.wo.0.weight layers.4.ffn_norm.weight layers.9.feed_forward.w1.0.weight
layers.12.feed_forward.w2.0.weight layers.17.attention.w_qkv.0.bias layers.21.attention_norm.weight layers.25.feed_forward.w2.0.weight layers.2.attention.w_qkv.0.bias layers.5.attention_norm.weight layers.9.feed_forward.w2.0.weight
layers.12.feed_forward.w3.0.weight layers.17.attention.w_qkv.0.weight layers.21.attention.wo.0.bias layers.25.feed_forward.w3.0.weight layers.2.attention.w_qkv.0.weight layers.5.attention.wo.0.bias layers.9.feed_forward.w3.0.weight
layers.12.ffn_norm.weight layers.17.feed_forward.w1.0.weight layers.21.attention.wo.0.weight layers.25.ffn_norm.weight layers.2.feed_forward.w1.0.weight layers.5.attention.wo.0.weight layers.9.ffn_norm.weight
layers.13.attention_norm.weight layers.17.feed_forward.w2.0.weight layers.21.attention.w_qkv.0.bias layers.26.attention_norm.weight layers.2.feed_forward.w2.0.weight layers.5.attention.w_qkv.0.bias norm.weight
layers.13.attention.wo.0.bias layers.17.feed_forward.w3.0.weight layers.21.attention.w_qkv.0.weight layers.26.attention.wo.0.bias layers.2.feed_forward.w3.0.weight layers.5.attention.w_qkv.0.weight output.weight
layers.13.attention.wo.0.weight layers.17.ffn_norm.weight layers.21.feed_forward.w1.0.weight layers.26.attention.wo.0.weight layers.2.ffn_norm.weight layers.5.feed_forward.w1.0.weight tok_embeddings.weight
layers.13.attention.w_qkv.0.bias layers.18.attention_norm.weight layers.21.feed_forward.w2.0.weight layers.26.attention.w_qkv.0.bias layers.30.attention_norm.weight layers.5.feed_forward.w2.0.weight
layers.13.attention.w_qkv.0.weight layers.18.attention.wo.0.bias layers.21.feed_forward.w3.0.weight layers.26.attention.w_qkv.0.weight layers.30.attention.wo.0.bias layers.5.feed_forward.w3.0.weight
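The manual ls check above can also be scripted. A minimal sketch (check_weights is a hypothetical helper; the path is taken from this thread, adjust to your layout):

```python
# Sanity-check sketch for a turbomind weights directory: verifies that
# config.ini and at least one layer weight file exist, as described above.
from pathlib import Path

def check_weights(dir_path: str) -> list:
    """Return a list of problems found in the weights directory (empty = OK)."""
    problems = []
    d = Path(dir_path)
    if not d.is_dir():
        return ["missing directory: %s" % d]
    if not (d / "config.ini").is_file():
        problems.append("config.ini not found")
    if not any(d.glob("layers.*.weight")):
        problems.append("no layer weight files found")
    return problems

# Path as used in the thread; adjust for your own workspace.
print(check_weights("workspace/model_repository/turbomind/1/weights"))
```

An empty list means the conversion step produced the expected layout; anything else points at the weight-creation step that failed.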
from lmdeploy.
OK, I ran all the commands in ~/lmdeploy/, and it finally works. You'd better add the working directory to the command.
Anyway, I've met another bug: when I ask the bot a question, it can't stop answering.
from lmdeploy.
You can try disabling (or weakening) sampling by setting the temperature to zero (or a smaller value).
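Why a lower temperature helps can be sketched with generic softmax sampling (not lmdeploy's implementation): dividing logits by a small temperature sharpens the distribution, and temperature → 0 degenerates to greedy argmax decoding.

```python
# Generic sketch of temperature's effect on token sampling (not
# lmdeploy-specific). Lower temperature sharpens the distribution;
# near-zero temperature is treated as greedy (argmax) decoding.
import math
import random

def sample(logits, temperature):
    if temperature <= 1e-6:  # treat ~0 as greedy decoding
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()                      # inverse-CDF sampling
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.5]
print(sample(logits, 0.0))  # temperature 0: always picks the argmax
```

With temperature 0 the most likely token is chosen every step, which makes endings like an EOS token deterministic instead of occasionally being sampled away.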
from lmdeploy.
OK, thank you.
from lmdeploy.
Please use the internlm-chat-7b model instead of internlm-7b.
from lmdeploy.
Related Issues (20)
- [Feature] model name should be settable or follow original full HF link name, not random new name HOT 6
- [Bug] lmdeploy lite auto_awq: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! HOT 1
- [Bug] result of W4A16 quantized Qwen1.5-1.8B-Chat model not correct HOT 1
- Support for SWIFT finetuned models HOT 4
- I always see a "using default GEMM algo" WARNING; does using the default GEMM affect speed or throughput? HOT 1
- [Bug] Error when importing a local model while running internvl-v1.5 quantization from the downloaded code HOT 11
- [Feature] The peft<=0.9.0 requirement is too low and conflicts with many environments that require peft>0.10; can it be changed?
- [Feature] Support for LLaVA-NeXT HOT 1
- [Docs] How are multiple images handled? HOT 5
- [Bug] output diff when temperature set zero HOT 3
- batch inference HOT 1
- [Feature] InternVL-Chat-V1-5-AWQ merge LoRA adapter HOT 4
- [Feature] support for MiniCPM-Llama3-V 2.5 HOT 1
- [Bug] HOT 2
- Can parameters such as max_tiles be controlled during InternVL inference, or can pixel_values be passed directly to the pipeline? HOT 3
- [Bug] not support inference qwen1.5 HOT 6
- Encountered core dump issue when quantifying the model HOT 2
- [Bug] ModuleNotFoundError: No module named '_turbomind' loading llava Mistral 7B HOT 1
- How is the support for RoPE difference between `hf llama` and `meta llama`?
- [Bug] InternVL-1.5 served via the LMDeploy API server cannot recognize images HOT 2