Comments (1)
I'm not sure if this is the same issue, but I tried to use a 2-GPU model and got this:
terminate called after throwing an instance of 'std::runtime_error'
what(): [FT][ERROR] shared_ft_model->getTensorParaSize() * shared_ft_model->getPipelineParaSize() == world_size Assertion fail: /workspace/build/fastertransformer_backend/src/libfastertransformer.cc:498
Complete logs
[+] Building 2.2s (17/17) FINISHED docker:default
=> [copilot_proxy internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [copilot_proxy internal] load build definition from proxy.Dockerfile 0.1s
=> => transferring dockerfile: 307B 0.0s
=> [triton internal] load .dockerignore 0.1s
=> => transferring context: 2B 0.0s
=> [triton internal] load build definition from triton.Dockerfile 0.1s
=> => transferring dockerfile: 325B 0.0s
=> [copilot_proxy internal] load metadata for docker.io/library/python:3.10-slim-buster 1.7s
=> [triton internal] load metadata for docker.io/moyix/triton_with_ft:22.09 0.8s
=> [triton 1/3] FROM docker.io/moyix/triton_with_ft:22.09@sha256:5a15c1f29c6b018967b49c588eb0ea67acbf897abb7f26e509ec21844574c9b1 0.0s
=> CACHED [triton 2/3] RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116 0.0s
=> CACHED [triton 3/3] RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate 0.0s
=> [triton] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:79dd3771c789003418dd215e18f816ca7e796d4d77a4de792907f7d8aa8a5bee 0.0s
=> => naming to docker.io/library/fauxpilot-triton 0.0s
=> [copilot_proxy 1/5] FROM docker.io/library/python:3.10-slim-buster@sha256:37aa274c2d001f09b14828450d903c55f821c90f225fdfdd80c5180fcca77b3f 0.0s
=> [copilot_proxy internal] load build context 0.3s
=> => transferring context: 1.10kB 0.3s
=> CACHED [copilot_proxy 2/5] WORKDIR /python-docker 0.0s
=> CACHED [copilot_proxy 3/5] COPY copilot_proxy/requirements.txt requirements.txt 0.0s
=> CACHED [copilot_proxy 4/5] RUN pip3 install --no-cache-dir -r requirements.txt 0.0s
=> CACHED [copilot_proxy 5/5] COPY copilot_proxy . 0.0s
=> [copilot_proxy] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:6aaa5d89d067dcc60e23eed04bb393abeb1d1e62ff46fd6031ee15d63a480801 0.0s
=> => naming to docker.io/library/fauxpilot-copilot_proxy 0.0s
[+] Running 2/0
✔ Container fauxpilot-copilot_proxy-1 Created 0.0s
✔ Container fauxpilot-triton-1 Created 0.0s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
fauxpilot-triton-1 |
fauxpilot-triton-1 | =============================
fauxpilot-triton-1 | == Triton Inference Server ==
fauxpilot-triton-1 | =============================
fauxpilot-triton-1 |
fauxpilot-triton-1 | NVIDIA Release 22.06 (build 39726160)
fauxpilot-triton-1 | Triton Server Version 2.23.0
fauxpilot-triton-1 |
fauxpilot-triton-1 | Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
fauxpilot-triton-1 |
fauxpilot-triton-1 | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
fauxpilot-triton-1 |
fauxpilot-triton-1 | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
fauxpilot-triton-1 | By pulling and using the container, you accept the terms and conditions of this license:
fauxpilot-triton-1 | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
fauxpilot-copilot_proxy-1 | INFO: Started server process [1]
fauxpilot-copilot_proxy-1 | INFO: Waiting for application startup.
fauxpilot-copilot_proxy-1 | INFO: Application startup complete.
fauxpilot-copilot_proxy-1 | INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
fauxpilot-triton-1 |
fauxpilot-triton-1 | I1021 20:33:12.659520 88 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x204e00000' with size 268435456
fauxpilot-triton-1 | I1021 20:33:12.659623 88 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
fauxpilot-triton-1 | I1021 20:33:17.888475 88 model_repository_manager.cc:1191] loading: fastertransformer:1
fauxpilot-triton-1 | I1021 20:33:18.058662 88 libfastertransformer.cc:1226] TRITONBACKEND_Initialize: fastertransformer
fauxpilot-triton-1 | I1021 20:33:18.058688 88 libfastertransformer.cc:1236] Triton TRITONBACKEND API version: 1.10
fauxpilot-triton-1 | I1021 20:33:18.058691 88 libfastertransformer.cc:1242] 'fastertransformer' TRITONBACKEND API version: 1.10
fauxpilot-triton-1 | I1021 20:33:18.058712 88 libfastertransformer.cc:1274] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
fauxpilot-triton-1 | W1021 20:33:18.059506 88 libfastertransformer.cc:149] model configuration:
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "fastertransformer",
fauxpilot-triton-1 | "platform": "",
fauxpilot-triton-1 | "backend": "fastertransformer",
fauxpilot-triton-1 | "version_policy": {
fauxpilot-triton-1 | "latest": {
fauxpilot-triton-1 | "num_versions": 1
fauxpilot-triton-1 | }
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "max_batch_size": 1024,
fauxpilot-triton-1 | "input": [
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "input_ids",
fauxpilot-triton-1 | "data_type": "TYPE_UINT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | -1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": false
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "start_id",
fauxpilot-triton-1 | "data_type": "TYPE_UINT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "end_id",
fauxpilot-triton-1 | "data_type": "TYPE_UINT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "input_lengths",
fauxpilot-triton-1 | "data_type": "TYPE_UINT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": false
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "request_output_len",
fauxpilot-triton-1 | "data_type": "TYPE_UINT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | -1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": false
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "runtime_top_k",
fauxpilot-triton-1 | "data_type": "TYPE_UINT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "runtime_top_p",
fauxpilot-triton-1 | "data_type": "TYPE_FP32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "beam_search_diversity_rate",
fauxpilot-triton-1 | "data_type": "TYPE_FP32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "temperature",
fauxpilot-triton-1 | "data_type": "TYPE_FP32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "len_penalty",
fauxpilot-triton-1 | "data_type": "TYPE_FP32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "repetition_penalty",
fauxpilot-triton-1 | "data_type": "TYPE_FP32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "random_seed",
fauxpilot-triton-1 | "data_type": "TYPE_INT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "is_return_log_probs",
fauxpilot-triton-1 | "data_type": "TYPE_BOOL",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "beam_width",
fauxpilot-triton-1 | "data_type": "TYPE_UINT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "reshape": {
fauxpilot-triton-1 | "shape": []
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "bad_words_list",
fauxpilot-triton-1 | "data_type": "TYPE_INT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 2,
fauxpilot-triton-1 | -1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "stop_words_list",
fauxpilot-triton-1 | "data_type": "TYPE_INT32",
fauxpilot-triton-1 | "format": "FORMAT_NONE",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | 2,
fauxpilot-triton-1 | -1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "is_shape_tensor": false,
fauxpilot-triton-1 | "allow_ragged_batch": false,
fauxpilot-triton-1 | "optional": true
fauxpilot-triton-1 | }
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "output": [
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "output_ids",
fauxpilot-triton-1 | "data_type": "TYPE_UINT32",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | -1,
fauxpilot-triton-1 | -1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "label_filename": "",
fauxpilot-triton-1 | "is_shape_tensor": false
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "sequence_length",
fauxpilot-triton-1 | "data_type": "TYPE_UINT32",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | -1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "label_filename": "",
fauxpilot-triton-1 | "is_shape_tensor": false
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "cum_log_probs",
fauxpilot-triton-1 | "data_type": "TYPE_FP32",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | -1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "label_filename": "",
fauxpilot-triton-1 | "is_shape_tensor": false
fauxpilot-triton-1 | },
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "output_log_probs",
fauxpilot-triton-1 | "data_type": "TYPE_FP32",
fauxpilot-triton-1 | "dims": [
fauxpilot-triton-1 | -1,
fauxpilot-triton-1 | -1
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "label_filename": "",
fauxpilot-triton-1 | "is_shape_tensor": false
fauxpilot-triton-1 | }
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "batch_input": [],
fauxpilot-triton-1 | "batch_output": [],
fauxpilot-triton-1 | "optimization": {
fauxpilot-triton-1 | "priority": "PRIORITY_DEFAULT",
fauxpilot-triton-1 | "input_pinned_memory": {
fauxpilot-triton-1 | "enable": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "output_pinned_memory": {
fauxpilot-triton-1 | "enable": true
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "gather_kernel_buffer_threshold": 0,
fauxpilot-triton-1 | "eager_batching": false
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "instance_group": [
fauxpilot-triton-1 | {
fauxpilot-triton-1 | "name": "fastertransformer_0",
fauxpilot-triton-1 | "kind": "KIND_CPU",
fauxpilot-triton-1 | "count": 1,
fauxpilot-triton-1 | "gpus": [],
fauxpilot-triton-1 | "secondary_devices": [],
fauxpilot-triton-1 | "profile": [],
fauxpilot-triton-1 | "passive": false,
fauxpilot-triton-1 | "host_policy": ""
fauxpilot-triton-1 | }
fauxpilot-triton-1 | ],
fauxpilot-triton-1 | "default_model_filename": "codegen-6B-mono",
fauxpilot-triton-1 | "cc_model_filenames": {},
fauxpilot-triton-1 | "metric_tags": {},
fauxpilot-triton-1 | "parameters": {
fauxpilot-triton-1 | "model_name": {
fauxpilot-triton-1 | "string_value": "codegen-6B-mono"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "is_half": {
fauxpilot-triton-1 | "string_value": "1"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "enable_custom_all_reduce": {
fauxpilot-triton-1 | "string_value": "0"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "vocab_size": {
fauxpilot-triton-1 | "string_value": "51200"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "tensor_para_size": {
fauxpilot-triton-1 | "string_value": "2"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "decoder_layers": {
fauxpilot-triton-1 | "string_value": "33"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "size_per_head": {
fauxpilot-triton-1 | "string_value": "256"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "max_seq_len": {
fauxpilot-triton-1 | "string_value": "2048"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "end_id": {
fauxpilot-triton-1 | "string_value": "50256"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "inter_size": {
fauxpilot-triton-1 | "string_value": "16384"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "head_num": {
fauxpilot-triton-1 | "string_value": "16"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "model_type": {
fauxpilot-triton-1 | "string_value": "GPT-J"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "model_checkpoint_path": {
fauxpilot-triton-1 | "string_value": "/model/fastertransformer/1/2-gpu"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "rotary_embedding": {
fauxpilot-triton-1 | "string_value": "64"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "pipeline_para_size": {
fauxpilot-triton-1 | "string_value": "1"
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "start_id": {
fauxpilot-triton-1 | "string_value": "50256"
fauxpilot-triton-1 | }
fauxpilot-triton-1 | },
fauxpilot-triton-1 | "model_warmup": []
fauxpilot-triton-1 | }
fauxpilot-triton-1 | I1021 20:33:18.059575 88 libfastertransformer.cc:1320] TRITONBACKEND_ModelInstanceInitialize: fastertransformer_0 (device 0)
fauxpilot-triton-1 | W1021 20:33:18.059594 88 libfastertransformer.cc:453] Faster transformer model instance is created at GPU '0'
fauxpilot-triton-1 | W1021 20:33:18.059596 88 libfastertransformer.cc:459] Model name codegen-6B-mono
fauxpilot-triton-1 | W1021 20:33:18.059601 88 libfastertransformer.cc:578] Get input name: input_ids, type: TYPE_UINT32, shape: [-1]
fauxpilot-triton-1 | W1021 20:33:18.059603 88 libfastertransformer.cc:578] Get input name: start_id, type: TYPE_UINT32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059605 88 libfastertransformer.cc:578] Get input name: end_id, type: TYPE_UINT32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059606 88 libfastertransformer.cc:578] Get input name: input_lengths, type: TYPE_UINT32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059608 88 libfastertransformer.cc:578] Get input name: request_output_len, type: TYPE_UINT32, shape: [-1]
fauxpilot-triton-1 | W1021 20:33:18.059609 88 libfastertransformer.cc:578] Get input name: runtime_top_k, type: TYPE_UINT32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059611 88 libfastertransformer.cc:578] Get input name: runtime_top_p, type: TYPE_FP32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059612 88 libfastertransformer.cc:578] Get input name: beam_search_diversity_rate, type: TYPE_FP32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059614 88 libfastertransformer.cc:578] Get input name: temperature, type: TYPE_FP32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059615 88 libfastertransformer.cc:578] Get input name: len_penalty, type: TYPE_FP32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059616 88 libfastertransformer.cc:578] Get input name: repetition_penalty, type: TYPE_FP32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059618 88 libfastertransformer.cc:578] Get input name: random_seed, type: TYPE_INT32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059619 88 libfastertransformer.cc:578] Get input name: is_return_log_probs, type: TYPE_BOOL, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059621 88 libfastertransformer.cc:578] Get input name: beam_width, type: TYPE_UINT32, shape: [1]
fauxpilot-triton-1 | W1021 20:33:18.059623 88 libfastertransformer.cc:578] Get input name: bad_words_list, type: TYPE_INT32, shape: [2, -1]
fauxpilot-triton-1 | W1021 20:33:18.059625 88 libfastertransformer.cc:578] Get input name: stop_words_list, type: TYPE_INT32, shape: [2, -1]
fauxpilot-triton-1 | W1021 20:33:18.059628 88 libfastertransformer.cc:620] Get output name: output_ids, type: TYPE_UINT32, shape: [-1, -1]
fauxpilot-triton-1 | W1021 20:33:18.059630 88 libfastertransformer.cc:620] Get output name: sequence_length, type: TYPE_UINT32, shape: [-1]
fauxpilot-triton-1 | W1021 20:33:18.059632 88 libfastertransformer.cc:620] Get output name: cum_log_probs, type: TYPE_FP32, shape: [-1]
fauxpilot-triton-1 | W1021 20:33:18.059634 88 libfastertransformer.cc:620] Get output name: output_log_probs, type: TYPE_FP32, shape: [-1, -1]
fauxpilot-triton-1 | terminate called after throwing an instance of 'std::runtime_error'
fauxpilot-triton-1 | what(): [FT][ERROR] shared_ft_model->getTensorParaSize() * shared_ft_model->getPipelineParaSize() == world_size Assertion fail: /workspace/build/fastertransformer_backend/src/libfastertransformer.cc:498
fauxpilot-triton-1 |
fauxpilot-triton-1 | [7704449ec6f1:00088] *** Process received signal ***
fauxpilot-triton-1 | [7704449ec6f1:00088] Signal: Aborted (6)
fauxpilot-triton-1 | [7704449ec6f1:00088] Signal code: (-6)
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f3aa66b6420]
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f3aa50aa00b]
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f3aa5089859]
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f3aa5463911]
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f3aa546f38c]
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f3aa546f3f7]
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f3aa546f6a9]
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 7] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x2a9a0)[0x7f3a930639a0]
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 8] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1e79f)[0x7f3a9305779f]
fauxpilot-triton-1 | [7704449ec6f1:00088] [ 9] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1fd42)[0x7f3a93058d42]
fauxpilot-triton-1 | [7704449ec6f1:00088] [10] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInstanceInitialize+0x38c)[0x7f3a9305b63c]
fauxpilot-triton-1 | [7704449ec6f1:00088] [11] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10c275)[0x7f3aa5958275]
fauxpilot-triton-1 | [7704449ec6f1:00088] [12] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10d9c3)[0x7f3aa59599c3]
fauxpilot-triton-1 | [7704449ec6f1:00088] [13] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1019de)[0x7f3aa594d9de]
fauxpilot-triton-1 | [7704449ec6f1:00088] [14] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b3b7a)[0x7f3aa59ffb7a]
fauxpilot-triton-1 | [7704449ec6f1:00088] [15] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1c29a1)[0x7f3aa5a0e9a1]
fauxpilot-triton-1 | [7704449ec6f1:00088] [16] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f3aa549bde4]
fauxpilot-triton-1 | [7704449ec6f1:00088] [17] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f3aa66aa609]
fauxpilot-triton-1 | [7704449ec6f1:00088] [18] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f3aa5186133]
fauxpilot-triton-1 | [7704449ec6f1:00088] *** End of error message ***
fauxpilot-triton-1 | --------------------------------------------------------------------------
fauxpilot-triton-1 | Primary job terminated normally, but 1 process returned
fauxpilot-triton-1 | a non-zero exit code. Per user-direction, the job has been aborted.
fauxpilot-triton-1 | --------------------------------------------------------------------------
fauxpilot-triton-1 | --------------------------------------------------------------------------
fauxpilot-triton-1 | mpirun noticed that process rank 0 with PID 0 on node 7704449ec6f1 exited on signal 6 (Aborted).
fauxpilot-triton-1 | --------------------------------------------------------------------------
fauxpilot-triton-1 exited with code 134
Edit: my first GPU is Intel Xeon integrated graphics, which might not be a usable GPU for fauxpilot, since the error
from fauxpilot.
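For anyone hitting the same assertion: it requires `tensor_para_size * pipeline_para_size` to equal `world_size`. Below is a minimal sketch of that check in Python, assuming `world_size` corresponds to the number of NVIDIA GPUs visible inside the Triton container (I have not confirmed this against the backend source). The helper names are hypothetical; the parameter values are copied from the model configuration in the log above, and `nvidia-smi -L` is used only as a rough way to count GPUs.

```python
import shutil
import subprocess


def required_world_size(parameters: dict) -> int:
    """Product of the tensor and pipeline parallel sizes in a 'parameters' block."""
    tensor_para = int(parameters["tensor_para_size"]["string_value"])
    pipeline_para = int(parameters["pipeline_para_size"]["string_value"])
    return tensor_para * pipeline_para


def config_matches(parameters: dict, world_size: int) -> bool:
    """Mirror of the failing assertion: the product must equal world_size."""
    return required_world_size(parameters) == world_size


def count_nvidia_gpus() -> int:
    """Count GPUs listed by `nvidia-smi -L`; returns 0 if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return 0
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


# Values copied from the model configuration printed in the log above.
params = {
    "tensor_para_size": {"string_value": "2"},
    "pipeline_para_size": {"string_value": "1"},
}

if __name__ == "__main__":
    visible = count_nvidia_gpus()  # assumption: stands in for world_size
    needed = required_world_size(params)
    if config_matches(params, visible):
        print(f"OK: model needs {needed} GPU(s), {visible} visible")
    else:
        print(f"Mismatch: model needs {needed} GPU(s), but {visible} visible")
```

Run inside the Triton container, this should show whether the 2-GPU model can actually see two NVIDIA GPUs. An Intel integrated GPU would not be listed by `nvidia-smi`, which would be consistent with the mismatch reported above.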