
aikit's Introduction

Hi there 👋

aikit's People

Contributors

dependabot[bot], sozercan, step-security-bot

aikit's Issues

[REQ] release should trigger model update

What kind of request is this?

Improvement of existing experience

What is your request or suggestion?

release should trigger the update-model action automatically (see the sketch below)
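
A minimal sketch of the wiring, assuming GitHub Actions; the workflow name "release" and the make target are illustrative guesses, not aikit's actual files:

cat > .github/workflows/update-model.yaml <<'EOF'
name: update-model
on:
  workflow_run:
    workflows: ["release"]        # assumed name of the release workflow
    types: [completed]
jobs:
  update:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make update-model    # assumed entry point for the model refresh
EOF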

Are you willing to submit PRs to contribute to this feature request?

  • Yes, I am willing to implement it.

[BUG] grpc service not ready

Expected Behavior

After building an image with buildx and running it in a container, we should be able to communicate with the LLM through a curl prompt.

Actual Behavior

The LLM isn't responding; it returns a 500 error: grpc service not ready.

Steps To Reproduce

  1. Build the image with the aikitfile.yaml (same as here)
  2. Run the image using docker run -d --rm -p 8080:8080 my-model
  3. The image runs successfully
  4. Try to curl a prompt:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "llama-2-7b-chat",
     "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
   }'

This waits a few seconds, then returns {"error":{"code":500,"message":"grpc service not ready","type":""}}
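
A quick way to rule out a transient "still loading" state before concluding the backend crashed; this is a hedged sketch assuming LocalAI's /readyz health endpoint is available on this build:

# poll until the server reports ready, then send the prompt
until curl -sf http://localhost:8080/readyz > /dev/null; do sleep 2; done
echo "server ready"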

Docker Logs

7:19AM DBG no galleries to load
7:19AM INF Starting LocalAI using 4 threads, with models path: /models
7:19AM INF LocalAI version: v2.9.0 (ff88c390bb51d9567572815a63c575eb2e3dd062)
7:19AM INF Preloading models from /models
7:19AM INF Model name: llama-2-7b-chat
7:19AM DBG Model: llama-2-7b-chat (config: {PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:0 Debug:false Roles:map[assistant:Assistant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama-2-7b-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:})
7:19AM DBG Extracting backend assets files to /tmp/localai/backend_data
7:19AM DBG No uploadedFiles file found at /tmp/localai/upload/uploadedFiles.json

┌───────────────────────────────────────────────────┐
│                   Fiber v2.50.0                   │
│               http://127.0.0.1:8080               │
│       (bound on host 0.0.0.0 and port 8080)       │
│                                                   │
│ Handlers ........... 105  Processes ........... 1 │
│ Prefork ....... Disabled  PID ................. 1 │
└───────────────────────────────────────────────────┘

7:20AM DBG Request received:
7:20AM DBG Configuration read: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[assistant:Assistant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama-2-7b-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
7:20AM DBG Parameters: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[assistant:Assistant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama-2-7b-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
7:20AM DBG templated message for chat:
[INST]
You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions
[/INST]

7:20AM DBG Prompt (before templating):
[INST]
You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions
[/INST]

7:20AM DBG Prompt (after templating):
[INST]
You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions
[/INST]

7:20AM INF Loading model 'llama-2-7b-chat.Q4_K_M.gguf' with backend llama
7:20AM DBG llama-cpp is an alias of llama-cpp
7:20AM DBG Loading model in memory from file: /models/llama-2-7b-chat.Q4_K_M.gguf
7:20AM DBG Loading Model llama-2-7b-chat.Q4_K_M.gguf with gRPC (file: /models/llama-2-7b-chat.Q4_K_M.gguf) (backend: llama-cpp): {backendString:llama model:llama-2-7b-chat.Q4_K_M.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0000f2600 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
7:20AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
7:20AM DBG GRPC Service for llama-2-7b-chat.Q4_K_M.gguf will be running at: '127.0.0.1:45815'
7:20AM DBG GRPC Service state dir: /tmp/go-processmanager3527343173
7:20AM DBG GRPC Service Started
7:20AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45815): stderr /tmp/localai/backend_data/backend-assets/grpc/llama-cpp: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
7:20AM ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45815: connect: connection refused"
7:20AM DBG GRPC Service NOT ready
[172.17.0.1]:39984 500 - POST /v1/chat/completions

aikitfile.yaml

#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
debug: true
runtime: cuda
backends:
  - stablediffusion
models:
  - name: llama-2-7b-chat
    source: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
    sha256: "08a5566d61d7cb6b420c3e4387a39e0078e1f2fe5f055f3a03887385304d4bfa"
    promptTemplates:
      - name: "llama-2-7b-chat"
        template: |
          {{if eq .RoleName "assistant"}}{{.Content}}{{else}}
          [INST]
          {{if .SystemPrompt}}{{.SystemPrompt}}{{else if eq .RoleName "system"}}<<SYS>>{{.Content}}<</SYS>>
          {{else if .Content}}{{.Content}}{{end}}
          [/INST]
          {{end}}

config: |
  - name: "llama-2-7b-chat"
    backend: "llama"
    parameters:
      top_k: 80
      temperature: 0.2
      top_p: 0.7
      model: "llama-2-7b-chat.Q4_K_M.gguf"
    context_size: 4096
    roles:
      function: 'Function Result:'
      assistant_function_call: 'Function Call:'
      assistant: 'Assistant:'
      user: 'User:'
      system: 'System:'
    template:
      chat_message: "llama-2-7b-chat"
    system_prompt: "You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions"

Please note:
I'm trying to run llama-2-7b-chat with CUDA, on Manjaro (Arch) with all GPU drivers, the toolkit, etc. installed.
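
The stderr line in the logs above is the actual failure: the llama-cpp backend cannot load libcudart.so.12. Since the aikitfile sets runtime: cuda, the container needs GPU access at run time. A hedged first step, assuming the NVIDIA Container Toolkit is installed on the host (the CUDA image tag below is only for a sanity check):

# confirm the host driver and toolkit work at all
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# then re-run the model container with GPU devices and CUDA libraries exposed
docker run -d --rm --gpus all -p 8080:8080 my-model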

Are you willing to submit PRs to contribute to this bug fix?

  • Yes, I am willing to implement it.

[BUG] no galleries to load

Expected Behavior

The first response should be:
{"created":1701236489,"object":"chat.completion","id":"dd1ff40b-31a7-4418-9e32-42151ab6875a","model":"llama-2-7b-chat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"\nKubernetes is a container orchestration system that automates the deployment, scaling, and management of containerized applications in a microservices architecture."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

Actual Behavior

Response received is:
{"error":{"code":500,"message":"could not load model: rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}

Steps To Reproduce

  1. Pull the image and start the container by running docker run -d --rm -p 9000:8080 ghcr.io/sozercan/llama2:7b (port 9000 because 8080 is already in use)
  2. Send a curl request: curl http://localhost:9000/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "llama-2-7b-chat", "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}] }'
  3. Observe HTTP 500 error response: {"error":{"code":500,"message":"could not load model: rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}

===============

Logs upon container boot-up:

5:37AM DBG no galleries to load
5:37AM INF Starting LocalAI using 4 threads, with models path: /models
5:37AM INF LocalAI version: v2.0.0 (238fec244ae6c9a66bc7fafd76c7e14671110a6f)
5:37AM DBG Model: llama-2-7b-chat (config: {PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}})
5:37AM DBG Extracting backend assets files to /tmp/localai/backend_data

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.50.0                   │ 
 │               http://127.0.0.1:8080               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ............ 74  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ................. 1 │ 
 └───────────────────────────────────────────────────┘ 

Logs upon incoming HTTP request:

5:40AM DBG Request received: 
5:40AM DBG Configuration read: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
5:40AM DBG Parameters: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
5:40AM DBG Prompt (before templating): explain kubernetes in a sentence
5:40AM DBG Template failed loading: failed loading a template for llama-2-7b-chat.Q4_K_M.gguf
5:40AM DBG Prompt (after templating): explain kubernetes in a sentence
5:40AM DBG Loading model llama from llama-2-7b-chat.Q4_K_M.gguf
5:40AM DBG Loading model in memory from file: /models/llama-2-7b-chat.Q4_K_M.gguf
5:40AM DBG Loading Model llama-2-7b-chat.Q4_K_M.gguf with gRPC (file: /models/llama-2-7b-chat.Q4_K_M.gguf) (backend: llama): {backendString:llama model:llama-2-7b-chat.Q4_K_M.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0002a6780 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
5:40AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
5:40AM DBG GRPC Service for llama-2-7b-chat.Q4_K_M.gguf will be running at: '127.0.0.1:45083'
5:40AM DBG GRPC Service state dir: /tmp/go-processmanager2300469877
5:40AM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45083: connect: connection refused"
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 2023/12/15 05:40:48 gRPC Server listening at 127.0.0.1:45083
5:40AM DBG GRPC Service Ready
5:40AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:llama-2-7b-chat.Q4_K_M.gguf ContextSize:4096 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/llama-2-7b-chat.Q4_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr create_gpt_params: loading model /models/llama-2-7b-chat.Q4_K_M.gguf
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr SIGILL: illegal instruction
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr PC=0x86853a m=0 sigcode=2
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr signal arrived during cgo execution
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr instruction bytes: 0xc4 0xe2 0x79 0x13 0xc9 0xc5 0xf2 0x59 0x15 0x3d 0x79 0x23 0x0 0xc4 0x81 0x7a
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 34 [syscall]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.cgocall(0x821ae0, 0xc00014f4d8)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/cgocall.go:157 +0x4b fp=0xc00014f4b0 sp=0xc00014f478 pc=0x4176eb
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0xee9460, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x200, ...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    _cgo_gotypes.go:266 +0x4f fp=0xc00014f4d8 sp=0xc00014f4b0 pc=0x8143af
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/go-llama%2ecpp.New({0xc000178000, 0x23}, {0xc000110240, 0x7, 0x926460?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/work/LocalAI/LocalAI/sources/go-llama/llama.go:39 +0x385 fp=0xc00014f6e8 sp=0xc00014f4d8 pc=0x814da5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr main.(*LLM).Load(0xc000012630, 0xc000148000)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/work/LocalAI/LocalAI/backend/go/llm/llama/llama.go:87 +0xc9c fp=0xc00014f900 sp=0xc00014f6e8 pc=0x81ed1c
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/LocalAI/pkg/grpc.(*server).LoadModel(0xc00002ad90, {0xc000148000?, 0x50a886?}, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/work/LocalAI/LocalAI/pkg/grpc/server.go:50 +0xe6 fp=0xc00014f9b0 sp=0xc00014f900 pc=0x81c566
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0x997880?, 0xc00002ad90}, {0xa7e610, 0xc00010e390}, 0xc000114100, 0x0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/work/LocalAI/LocalAI/pkg/grpc/proto/backend_grpc.pb.go:264 +0x169 fp=0xc00014fa08 sp=0xc00014f9b0 pc=0x809829
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001ee1e0, {0xa7e610, 0xc00010e2d0}, {0xa81b38, 0xc0001e9040}, 0xc00013e000, 0xc0001f4d20, 0xd924b0, 0x0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1343 +0xe03 fp=0xc00014fdf0 sp=0xc00014fa08 pc=0x7f27c3
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).handleStream(0xc0001ee1e0, {0xa81b38, 0xc0001e9040}, 0xc00013e000)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1737 +0xc4c fp=0xc00014ff78 sp=0xc00014fdf0 pc=0x7f772c
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).serveStreams.func1.1()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:986 +0x86 fp=0xc00014ffe0 sp=0xc00014ff78 pc=0x7f06c6
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00014ffe8 sp=0xc00014ffe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 13
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:997 +0x145
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 1 [IO wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0x4c6b50?, 0xc0001dfb28?, 0x78?, 0xfb?, 0x4e6edd?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc0001dfb08 sp=0xc0001dfae8 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.netpollblock(0x478a72?, 0x416e86?, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:564 +0xf7 fp=0xc0001dfb40 sp=0xc0001dfb08 pc=0x4448d7
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.runtime_pollWait(0x148304689eb0, 0x72)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:343 +0x85 fp=0xc0001dfb60 sp=0xc0001dfb40 pc=0x475925
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*pollDesc).wait(0xc0001a6680?, 0x4?, 0x0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001dfb88 sp=0xc0001dfb60 pc=0x4dfb47
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*pollDesc).waitRead(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:89
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*FD).Accept(0xc0001a6680)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc0001dfc30 sp=0xc0001dfb88 pc=0x4e502c
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*netFD).accept(0xc0001a6680)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/net/fd_unix.go:172 +0x29 fp=0xc0001dfce8 sp=0xc0001dfc30 pc=0x640b09
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*TCPListener).accept(0xc0000aa4c0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/net/tcpsock_posix.go:152 +0x1e fp=0xc0001dfd10 sp=0xc0001dfce8 pc=0x657abe
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*TCPListener).Accept(0xc0000aa4c0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/net/tcpsock.go:315 +0x30 fp=0xc0001dfd40 sp=0xc0001dfd10 pc=0x656c70
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).Serve(0xc0001ee1e0, {0xa7dc20?, 0xc0000aa4c0})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:852 +0x462 fp=0xc0001dfe80 sp=0xc0001dfd40 pc=0x7ef322
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/LocalAI/pkg/grpc.StartServer({0x7ffd1517cf51?, 0xc0000241c0?}, {0xa82260?, 0xc000012630})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/work/LocalAI/LocalAI/pkg/grpc/server.go:178 +0x17d fp=0xc0001dff10 sp=0xc0001dfe80 pc=0x81df5d
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr main.main()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/work/LocalAI/LocalAI/backend/go/llm/llama/main.go:20 +0x85 fp=0xc0001dff40 sp=0xc0001dff10 pc=0x8212c5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.main()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:267 +0x2bb fp=0xc0001dffe0 sp=0xc0001dff40 pc=0x44b9fb
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0001dffe8 sp=0xc0001dffe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 2 [force gc (idle)]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008afa8 sp=0xc00008af88 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goparkunlock(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:404
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.forcegchelper()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:322 +0xb3 fp=0xc00008afe0 sp=0xc00008afa8 pc=0x44bcd3
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008afe8 sp=0xc00008afe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by runtime.init.6 in goroutine 1
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:310 +0x1a
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 3 [GC sweep wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008b778 sp=0xc00008b758 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goparkunlock(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:404
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.bgsweep(0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgcsweep.go:280 +0x94 fp=0xc00008b7c8 sp=0xc00008b778 pc=0x437d54
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gcenable.func1()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:200 +0x25 fp=0xc00008b7e0 sp=0xc00008b7c8 pc=0x42cf25
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008b7e8 sp=0xc00008b7e0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by runtime.gcenable in goroutine 1
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:200 +0x66
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 4 [GC scavenge wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0xc0000b4000?, 0xa76dc8?, 0x1?, 0x0?, 0xc0000071e0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008bf70 sp=0xc00008bf50 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goparkunlock(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:404
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.(*scavengerState).park(0xddb960)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00008bfa0 sp=0xc00008bf70 pc=0x435629
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.bgscavenge(0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc00008bfc8 sp=0xc00008bfa0 pc=0x435bbc
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gcenable.func2()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:201 +0x25 fp=0xc00008bfe0 sp=0xc00008bfc8 pc=0x42cec5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008bfe8 sp=0xc00008bfe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by runtime.gcenable in goroutine 1
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:201 +0xa5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 5 [finalizer wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0x9c1d00?, 0x10044cf01?, 0x0?, 0x0?, 0x454005?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008a628 sp=0xc00008a608 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.runfinq()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mfinal.go:193 +0x107 fp=0xc00008a7e0 sp=0xc00008a628 pc=0x42bfa7
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008a7e8 sp=0xc00008a7e0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by runtime.createfing in goroutine 1
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mfinal.go:163 +0x3d
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 11 [select]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0xc000129f00?, 0x2?, 0x0?, 0x0?, 0xc000129ecc?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc000129d78 sp=0xc000129d58 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.selectgo(0xc000129f00, 0xc000129ec8, 0xc000129ee8?, 0x0, 0x95f7a0?, 0x1)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/select.go:327 +0x725 fp=0xc000129e98 sp=0xc000129d78 pc=0x45b8a5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0000c25f0, 0x1)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:418 +0x113 fp=0xc000129f30 sp=0xc000129e98 pc=0x768893
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000116070)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:552 +0x86 fp=0xc000129f90 sp=0xc000129f30 pc=0x768fc6
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func2()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:336 +0xd5 fp=0xc000129fe0 sp=0xc000129f90 pc=0x77f815
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000129fe8 sp=0xc000129fe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 10
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:333 +0x1acc
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 12 [select]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0xc00008df70?, 0x4?, 0x0?, 0xa6?, 0xc00008dec0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008dd28 sp=0xc00008dd08 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.selectgo(0xc00008df70, 0xc00008deb8, 0x0?, 0x0, 0x0?, 0x1)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/select.go:327 +0x725 fp=0xc00008de48 sp=0xc00008dd28 pc=0x45b8a5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc0001e9040)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:1152 +0x225 fp=0xc00008dfc8 sp=0xc00008de48 pc=0x786ac5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func4()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:339 +0x25 fp=0xc00008dfe0 sp=0xc00008dfc8 pc=0x77f705
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 10
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:339 +0x1b0e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 13 [IO wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0xdf3ac0?, 0xb?, 0x0?, 0x0?, 0x6?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc0000a0aa0 sp=0xc0000a0a80 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.netpollblock(0x4c4dd8?, 0x416e86?, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:564 +0xf7 fp=0xc0000a0ad8 sp=0xc0000a0aa0 pc=0x4448d7
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.runtime_pollWait(0x148304689db8, 0x72)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:343 +0x85 fp=0xc0000a0af8 sp=0xc0000a0ad8 pc=0x475925
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*pollDesc).wait(0xc0001a6800?, 0xc000118000?, 0x0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000a0b20 sp=0xc0000a0af8 pc=0x4dfb47
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*pollDesc).waitRead(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:89
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*FD).Read(0xc0001a6800, {0xc000118000, 0x8000, 0x8000})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc0000a0bb8 sp=0xc0000a0b20 pc=0x4e0e3a
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*netFD).Read(0xc0001a6800, {0xc000118000?, 0x1060100000000?, 0x8?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/net/fd_posix.go:55 +0x25 fp=0xc0000a0c00 sp=0xc0000a0bb8 pc=0x63eae5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*conn).Read(0xc00008e310, {0xc000118000?, 0xc0000a0c90?, 0x3?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/net/net.go:179 +0x45 fp=0xc0000a0c48 sp=0xc0000a0c00 pc=0x64f1e5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*TCPConn).Read(0x0?, {0xc000118000?, 0xc0000a0ca0?, 0x469d2d?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    <autogenerated>:1 +0x25 fp=0xc0000a0c78 sp=0xc0000a0c48 pc=0x661985
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr bufio.(*Reader).Read(0xc0000b93e0, {0xc0001c8120, 0x9, 0xc1571798b5839f33?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/bufio/bufio.go:244 +0x197 fp=0xc0000a0cb0 sp=0xc0000a0c78 pc=0x5b9f97
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr io.ReadAtLeast({0xa7b640, 0xc0000b93e0}, {0xc0001c8120, 0x9, 0x9}, 0x9)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/io/io.go:335 +0x90 fp=0xc0000a0cf8 sp=0xc0000a0cb0 pc=0x4befd0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr io.ReadFull(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/io/io.go:354
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr golang.org/x/net/http2.readFrameHeader({0xc0001c8120, 0x9, 0xc000028600?}, {0xa7b640?, 0xc0000b93e0?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237 +0x65 fp=0xc0000a0d48 sp=0xc0000a0cf8 pc=0x755305
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr golang.org/x/net/http2.(*Framer).ReadFrame(0xc0001c80e0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:498 +0x85 fp=0xc0000a0df0 sp=0xc0000a0d48 pc=0x755a45
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc0001e9040, 0x1?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:636 +0x145 fp=0xc0000a0f00 sp=0xc0000a0df0 pc=0x782965
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).serveStreams(0xc0001ee1e0, {0xa81b38?, 0xc0001e9040})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:979 +0x1c2 fp=0xc0000a0f80 sp=0xc0000a0f00 pc=0x7f0462
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).handleRawConn.func1()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:920 +0x45 fp=0xc0000a0fe0 sp=0xc0000a0f80 pc=0x7efcc5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a0fe8 sp=0xc0000a0fe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 10
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr    /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:919 +0x185
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rax    0x0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rbx    0xe4ffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rcx    0x18
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rdx    0x3dc65656
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rdi    0x627541
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rsi    0x7ec52
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rbp    0xe6ffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rsp    0x7ffd1517b710
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r8     0x23
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r9     0x0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r10    0x7ffd151a1080
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r11    0x3dc65656
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r12    0xe8ffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r13    0xeaffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r14    0xe0ffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r15    0x0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rip    0x86853a
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rflags 0x10202
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr cs     0x33
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr fs     0x0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr gs     0x0
[172.17.0.1]:59270 500 - POST /v1/chat/completions
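
The SIGILL happens inside create_gpt_params while loading the model, and the faulting instruction bytes (0xc4 0xe2 ...) are VEX-encoded, which suggests the prebuilt llama backend uses AVX/AVX2 instructions the host CPU may not support. A hedged check on a Linux host:

# list the relevant SIMD flags the CPU actually advertises
grep -o -w -E 'avx|avx2|f16c|fma' /proc/cpuinfo | sort -u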

Are you willing to submit PRs to contribute to this bug fix?

  • Yes, I am willing to implement it.

[REQ] support for finetuning

What kind of request is this?

New feature

What is your request or suggestion?

No response

Are you willing to submit PRs to contribute to this feature request?

  • Yes, I am willing to implement it.

[BUG] exllama.py: missing port in address

Expected Behavior

No response

Actual Behavior

3:17AM ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: address /tmp/localai/backend/python/exllama/exllama.py: missing port in address"

Steps To Reproduce

https://github.com/sozercan/aikit/actions/runs/7523782972/job/20477643249#step:11:76

Are you willing to submit PRs to contribute to this bug fix?

  • Yes, I am willing to implement it.

[REQ] arm support

What kind of request is this?

Improvement of existing experience

What is your request or suggestion?

This occurs when trying to use aikit on an ARM machine; the attached screenshot (not reproduced here) shows the error. See the sketch below.
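
From the user side, ARM support would mean a multi-platform build; a hedged sketch of what that would look like (this can only succeed once aikit publishes arm64-compatible base images and backends):

docker buildx create --use --name aikit-arm-builder
docker buildx build . -t my-model -f aikitfile.yaml --platform linux/arm64 --load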

Are you willing to submit PRs to contribute to this feature request?

  • Yes, I am willing to implement it.

[REQ] migrate from pip to uv

What kind of request is this?

Improvement of existing experience

What is your request or suggestion?

Migrate the Python backend installs from pip to uv (https://github.com/astral-sh/uv); a sketch follows the list below. Affected components:

  • finetuning
  • exllama, exllamav2
  • mamba
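
A hedged sketch of the swap for those backend installs; uv's pip-compatible interface makes it close to a drop-in replacement (the requirements.txt path is illustrative):

pip install uv
# before: pip install -r backend/python/exllama/requirements.txt
uv pip install --system -r backend/python/exllama/requirements.txt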

Are you willing to submit PRs to contribute to this feature request?

  • Yes, I am willing to implement it.

[BUG] Got a connection error when I send a request

Expected Behavior

Returns a valid response and not a timeout error

Actual Behavior

It ends up returning a timeout error. I am unsure if it's a bug or something I am doing wrong, but the example is simple, so maybe I am missing some parameters.

Running on Docker Engine 20.10.14 on macOS.

Steps To Reproduce

#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
models:
  - name: uncased-sentiment
    source: https://github.com/lordofthejars/bert-base-multilingual-uncased-sentiment_onnx/releases/download/1.0.0/model.onnx 

Then I build:

docker buildx create --use --name aikit-builder

docker buildx build . -t my-model -f aikitfile.yaml --load

Then I start it:

docker run -ti --rm -p 8080:8080 my-model

Finally I curl the model:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "uncased-sentiment",
     "messages": [{"role": "user", "content": "It is so good"}]
   }'

And in the container console I got the following error:

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39863: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39839: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33679: connect: connection refused"

Any idea why this is happening? The example is simple, with no complicated lifecycle.
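
One hedged diagnostic step: run the container in the foreground with LocalAI's debug switch and watch which backend gets picked. The model here is a BERT ONNX classifier, which the default llama backend cannot load, and that would match the repeated connection-refused dials above:

docker run -ti --rm -p 8080:8080 -e DEBUG=true my-model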

Are you willing to submit PRs to contribute to this bug fix?

  • Yes, I am willing to implement it.

[REQ] create status.d metadata files for llama backend cuda images

What kind of request is this?

None

What is your request or suggestion?

Currently aikit generates package metadata as a single file (the Debian standard) for CUDA images. Since the images are based on distroless (which uses a status.d folder), this creates a mixed package-metadata environment.

This only affects CUDA images, since they install CUDA libraries from NVIDIA:
https://explore.ggcr.dev/layers/ghcr.io/sozercan/llama2@sha256:8d388f7af641fa199639d78a99f655f3c80d41332f44561dfbbc3391db21b31c/var/lib/dpkg/status

Convert the status file to per-package files in the status.d dir (a sketch follows below):
https://explore.ggcr.dev/layers/ghcr.io/sozercan/llama2@sha256:8d388f7af641fa199639d78a99f655f3c80d41332f44561dfbbc3391db21b31c/var/lib/dpkg/status.d/
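
A minimal sketch of the conversion, assuming the usual blank-line-separated stanzas of a Debian status file (distroless expects one file per package under /var/lib/dpkg/status.d):

mkdir -p /var/lib/dpkg/status.d
awk -v RS='' '{
  n = split($0, lines, "\n")
  for (i = 1; i <= n; i++)
    if (lines[i] ~ /^Package: /) { name = substr(lines[i], 10); break }
  print $0 "\n" > ("/var/lib/dpkg/status.d/" name)
}' /var/lib/dpkg/status
rm /var/lib/dpkg/status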

Are you willing to submit PRs to contribute to this feature request?

  • Yes, I am willing to implement it.
