sozercan / aikit
🏗️ Fine-tune, build, and deploy open-source LLMs easily!
Home Page: https://sozercan.github.io/aikit/
License: MIT License
Improvement of existing experience
release should trigger update-model action automatically
After building an image with buildx and running it in a container, we should be able to communicate with the LLM and send a prompt with curl.
Instead, the LLM isn't responding; it returns error 500: grpc service not ready
docker run -d --rm -p 8080:8080 my-model
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llama-2-7b-chat",
"messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
}'
This waits a few seconds, then returns {"error":{"code":500,"message":"grpc service not ready","type":""}}
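Right after `docker run -d`, the model may still be loading, so the first requests can fail with a 500. A common client-side workaround is to retry with exponential backoff; the schedule itself is a pure function (a sketch, not part of aikit or LocalAI):

```python
def backoff_delays(base=1.0, factor=2.0, retries=5, cap=30.0):
    """Yield an exponential backoff schedule (in seconds) for retrying
    the first request while the backend finishes loading."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor

# For example, sleep for each delay between curl attempts:
print(list(backoff_delays()))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Note that in this particular case retrying never succeeds, because (as the logs below show) the backend process itself fails to start.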
7:19AM DBG no galleries to load
7:19AM INF Starting LocalAI using 4 threads, with models path: /models
7:19AM INF LocalAI version: v2.9.0 (ff88c390bb51d9567572815a63c575eb2e3dd062)
7:19AM INF Preloading models from /models
7:19AM INF Model name: llama-2-7b-chat
7:19AM DBG Model: llama-2-7b-chat (config: {PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:0 Debug:false Roles:map[assistant:Assistant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama-2-7b-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:})
7:19AM DBG Extracting backend assets files to /tmp/localai/backend_data
7:19AM DBG No uploadedFiles file found at /tmp/localai/upload/uploadedFiles.json
┌─────────────────────────────────────────────────────┐
│                   Fiber v2.50.0                     │
│               http://127.0.0.1:8080                 │
│       (bound on host 0.0.0.0 and port 8080)         │
│                                                     │
│ Handlers ........... 105  Processes ........... 1   │
│ Prefork ....... Disabled  PID ................. 1   │
└─────────────────────────────────────────────────────┘
7:20AM DBG Request received:
7:20AM DBG Configuration read: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[assistant:Assistant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama-2-7b-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
7:20AM DBG Parameters: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[assistant:Assistant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama-2-7b-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
7:20AM DBG templated message for chat:
[INST]
You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions
[/INST]
7:20AM DBG Prompt (before templating):
[INST]
You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions
[/INST]
7:20AM DBG Prompt (after templating):
[INST]
You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions
[/INST]
7:20AM INF Loading model 'llama-2-7b-chat.Q4_K_M.gguf' with backend llama
7:20AM DBG llama-cpp is an alias of llama-cpp
7:20AM DBG Loading model in memory from file: /models/llama-2-7b-chat.Q4_K_M.gguf
7:20AM DBG Loading Model llama-2-7b-chat.Q4_K_M.gguf with gRPC (file: /models/llama-2-7b-chat.Q4_K_M.gguf) (backend: llama-cpp): {backendString:llama model:llama-2-7b-chat.Q4_K_M.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0000f2600 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
7:20AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
7:20AM DBG GRPC Service for llama-2-7b-chat.Q4_K_M.gguf will be running at: '127.0.0.1:45815'
7:20AM DBG GRPC Service state dir: /tmp/go-processmanager3527343173
7:20AM DBG GRPC Service Started
7:20AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45815): stderr /tmp/localai/backend_data/backend-assets/grpc/llama-cpp: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
7:20AM ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45815: connect: connection refused"
7:20AM DBG GRPC Service NOT ready
[172.17.0.1]:39984 500 - POST /v1/chat/completions
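The stderr line above pinpoints the failure: the `llama-cpp` gRPC backend was linked against the CUDA runtime (`libcudart.so.12`), but that library isn't visible inside the container, so the backend process dies before the gRPC dial succeeds. Running `ldconfig -p` inside the container would show whether the loader can see it; a small parser for that output (the sample line below is illustrative):

```python
def has_lib(ldconfig_output: str, name: str) -> bool:
    """Check whether a library name appears in `ldconfig -p` output."""
    return any(name in line for line in ldconfig_output.splitlines())

# Illustrative `ldconfig -p` excerpt from a container without CUDA libraries:
sample = "\tlibc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6"
print(has_lib(sample, "libcudart.so.12"))  # False
```

If the check comes back false, the fix is on the image/runtime side (e.g. ensuring the CUDA runtime libraries are present and the container is started with GPU access), not in the request.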
#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
debug: true
runtime: cuda
backends:
  - stablediffusion
models:
  - name: llama-2-7b-chat
    source: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
    sha256: "08a5566d61d7cb6b420c3e4387a39e0078e1f2fe5f055f3a03887385304d4bfa"
    promptTemplates:
      - name: "llama-2-7b-chat"
        template: |
          {{if eq .RoleName \"assistant\"}}{{.Content}}{{else}}
          [INST]
          {{if .SystemPrompt}}{{.SystemPrompt}}{{else if eq .RoleName \"system\"}}<<SYS>>{{.Content}}<</SYS>>
          {{else if .Content}}{{.Content}}{{end}}
          [/INST]
          {{end}}
config: |
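The template above wraps the system prompt and user message in Llama 2's `[INST]`/`[/INST]` markers. A rough Python emulation of the output the template aims for (an illustrative sketch, not LocalAI's Go template engine):

```python
def render_inst(system_prompt: str, user_content: str) -> str:
    """Approximate the llama-2 chat template: system prompt and user
    content wrapped in [INST]...[/INST] markers."""
    parts = ["[INST]"]
    if system_prompt:
        parts.append(system_prompt)
    if user_content:
        parts.append(user_content)
    parts.append("[/INST]")
    return "\n".join(parts)

prompt = render_inst("You are a helpful assistant",
                     "explain kubernetes in a sentence")
print(prompt)
```

This matches the shape of the `templated message for chat` lines in the debug log above, which is why a template failure shows up as the raw, unwrapped prompt instead.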
Please note that:
I'm trying to run llama-2-7b-chat with CUDA.
I'm running on Manjaro (Arch) with all GPU drivers, the CUDA toolkit, etc. installed.
New feature
Support for an interactive chat interface
https://github.com/charmbracelet/bubbletea/blob/master/examples/chat/main.go
The expected response is:
{"created":1701236489,"object":"chat.completion","id":"dd1ff40b-31a7-4418-9e32-42151ab6875a","model":"llama-2-7b-chat","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"\nKubernetes is a container orchestration system that automates the deployment, scaling, and management of containerized applications in a microservices architecture."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
The response received is:
{"error":{"code":500,"message":"could not load model: rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}
docker run -d --rm -p 9000:8080 ghcr.io/sozercan/llama2:7b
(Port 9000 because 8080 is already in use.)
curl http://localhost:9000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llama-2-7b-chat",
"messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
}'
{"error":{"code":500,"message":"could not load model: rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}
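The two 500 bodies seen in these reports mean different things: "grpc service not ready" appears while the backend is still starting, whereas "could not load model ... EOF" means the backend process crashed during load. A client can distinguish them by inspecting the error body; a minimal heuristic sketch (the classification strings are my own, not a LocalAI API):

```python
import json

def classify_error(body: str) -> str:
    """Heuristically classify a LocalAI error body as retryable or fatal."""
    err = json.loads(body).get("error", {})
    msg = err.get("message", "")
    if "not ready" in msg:
        return "retry"   # backend may still be starting up
    if "could not load model" in msg:
        return "fatal"   # backend process crashed while loading
    return "unknown"

print(classify_error(
    '{"error":{"code":500,"message":"grpc service not ready","type":""}}'
))  # retry
```

Under this reading, the first report above is worth retrying only if the underlying CUDA library issue is fixed, while this one points at a crash in the backend itself.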
===============
Logs upon container boot up:
5:37AM DBG no galleries to load
5:37AM INF Starting LocalAI using 4 threads, with models path: /models
5:37AM INF LocalAI version: v2.0.0 (238fec244ae6c9a66bc7fafd76c7e14671110a6f)
5:37AM DBG Model: llama-2-7b-chat (config: {PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}})
5:37AM DBG Extracting backend assets files to /tmp/localai/backend_data
┌─────────────────────────────────────────────────────┐
│                   Fiber v2.50.0                     │
│               http://127.0.0.1:8080                 │
│       (bound on host 0.0.0.0 and port 8080)         │
│                                                     │
│ Handlers ............ 74  Processes ........... 1   │
│ Prefork ....... Disabled  PID ................. 1   │
└─────────────────────────────────────────────────────┘
Logs upon incoming HTTP request:
5:40AM DBG Request received:
5:40AM DBG Configuration read: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
5:40AM DBG Parameters: &{PredictionOptions:{Model:llama-2-7b-chat.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama-2-7b-chat F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
5:40AM DBG Prompt (before templating): explain kubernetes in a sentence
5:40AM DBG Template failed loading: failed loading a template for llama-2-7b-chat.Q4_K_M.gguf
5:40AM DBG Prompt (after templating): explain kubernetes in a sentence
5:40AM DBG Loading model llama from llama-2-7b-chat.Q4_K_M.gguf
5:40AM DBG Loading model in memory from file: /models/llama-2-7b-chat.Q4_K_M.gguf
5:40AM DBG Loading Model llama-2-7b-chat.Q4_K_M.gguf with gRPC (file: /models/llama-2-7b-chat.Q4_K_M.gguf) (backend: llama): {backendString:llama model:llama-2-7b-chat.Q4_K_M.gguf threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0002a6780 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
5:40AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
5:40AM DBG GRPC Service for llama-2-7b-chat.Q4_K_M.gguf will be running at: '127.0.0.1:45083'
5:40AM DBG GRPC Service state dir: /tmp/go-processmanager2300469877
5:40AM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45083: connect: connection refused"
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr 2023/12/15 05:40:48 gRPC Server listening at 127.0.0.1:45083
5:40AM DBG GRPC Service Ready
5:40AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:llama-2-7b-chat.Q4_K_M.gguf ContextSize:4096 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/llama-2-7b-chat.Q4_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr create_gpt_params: loading model /models/llama-2-7b-chat.Q4_K_M.gguf
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr SIGILL: illegal instruction
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr PC=0x86853a m=0 sigcode=2
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr signal arrived during cgo execution
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr instruction bytes: 0xc4 0xe2 0x79 0x13 0xc9 0xc5 0xf2 0x59 0x15 0x3d 0x79 0x23 0x0 0xc4 0x81 0x7a
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 34 [syscall]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.cgocall(0x821ae0, 0xc00014f4d8)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/cgocall.go:157 +0x4b fp=0xc00014f4b0 sp=0xc00014f478 pc=0x4176eb
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0xee9460, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x200, ...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr _cgo_gotypes.go:266 +0x4f fp=0xc00014f4d8 sp=0xc00014f4b0 pc=0x8143af
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/go-llama%2ecpp.New({0xc000178000, 0x23}, {0xc000110240, 0x7, 0x926460?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/work/LocalAI/LocalAI/sources/go-llama/llama.go:39 +0x385 fp=0xc00014f6e8 sp=0xc00014f4d8 pc=0x814da5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr main.(*LLM).Load(0xc000012630, 0xc000148000)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/work/LocalAI/LocalAI/backend/go/llm/llama/llama.go:87 +0xc9c fp=0xc00014f900 sp=0xc00014f6e8 pc=0x81ed1c
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/LocalAI/pkg/grpc.(*server).LoadModel(0xc00002ad90, {0xc000148000?, 0x50a886?}, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/work/LocalAI/LocalAI/pkg/grpc/server.go:50 +0xe6 fp=0xc00014f9b0 sp=0xc00014f900 pc=0x81c566
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0x997880?, 0xc00002ad90}, {0xa7e610, 0xc00010e390}, 0xc000114100, 0x0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/work/LocalAI/LocalAI/pkg/grpc/proto/backend_grpc.pb.go:264 +0x169 fp=0xc00014fa08 sp=0xc00014f9b0 pc=0x809829
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001ee1e0, {0xa7e610, 0xc00010e2d0}, {0xa81b38, 0xc0001e9040}, 0xc00013e000, 0xc0001f4d20, 0xd924b0, 0x0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1343 +0xe03 fp=0xc00014fdf0 sp=0xc00014fa08 pc=0x7f27c3
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).handleStream(0xc0001ee1e0, {0xa81b38, 0xc0001e9040}, 0xc00013e000)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1737 +0xc4c fp=0xc00014ff78 sp=0xc00014fdf0 pc=0x7f772c
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).serveStreams.func1.1()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:986 +0x86 fp=0xc00014ffe0 sp=0xc00014ff78 pc=0x7f06c6
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00014ffe8 sp=0xc00014ffe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 13
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:997 +0x145
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 1 [IO wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0x4c6b50?, 0xc0001dfb28?, 0x78?, 0xfb?, 0x4e6edd?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc0001dfb08 sp=0xc0001dfae8 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.netpollblock(0x478a72?, 0x416e86?, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:564 +0xf7 fp=0xc0001dfb40 sp=0xc0001dfb08 pc=0x4448d7
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.runtime_pollWait(0x148304689eb0, 0x72)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:343 +0x85 fp=0xc0001dfb60 sp=0xc0001dfb40 pc=0x475925
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*pollDesc).wait(0xc0001a6680?, 0x4?, 0x0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001dfb88 sp=0xc0001dfb60 pc=0x4dfb47
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*pollDesc).waitRead(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:89
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*FD).Accept(0xc0001a6680)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc0001dfc30 sp=0xc0001dfb88 pc=0x4e502c
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*netFD).accept(0xc0001a6680)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/fd_unix.go:172 +0x29 fp=0xc0001dfce8 sp=0xc0001dfc30 pc=0x640b09
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*TCPListener).accept(0xc0000aa4c0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/tcpsock_posix.go:152 +0x1e fp=0xc0001dfd10 sp=0xc0001dfce8 pc=0x657abe
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*TCPListener).Accept(0xc0000aa4c0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/tcpsock.go:315 +0x30 fp=0xc0001dfd40 sp=0xc0001dfd10 pc=0x656c70
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).Serve(0xc0001ee1e0, {0xa7dc20?, 0xc0000aa4c0})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:852 +0x462 fp=0xc0001dfe80 sp=0xc0001dfd40 pc=0x7ef322
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr github.com/go-skynet/LocalAI/pkg/grpc.StartServer({0x7ffd1517cf51?, 0xc0000241c0?}, {0xa82260?, 0xc000012630})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/work/LocalAI/LocalAI/pkg/grpc/server.go:178 +0x17d fp=0xc0001dff10 sp=0xc0001dfe80 pc=0x81df5d
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr main.main()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/work/LocalAI/LocalAI/backend/go/llm/llama/main.go:20 +0x85 fp=0xc0001dff40 sp=0xc0001dff10 pc=0x8212c5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.main()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:267 +0x2bb fp=0xc0001dffe0 sp=0xc0001dff40 pc=0x44b9fb
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0001dffe8 sp=0xc0001dffe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 2 [force gc (idle)]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008afa8 sp=0xc00008af88 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goparkunlock(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:404
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.forcegchelper()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:322 +0xb3 fp=0xc00008afe0 sp=0xc00008afa8 pc=0x44bcd3
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008afe8 sp=0xc00008afe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by runtime.init.6 in goroutine 1
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:310 +0x1a
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 3 [GC sweep wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008b778 sp=0xc00008b758 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goparkunlock(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:404
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.bgsweep(0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgcsweep.go:280 +0x94 fp=0xc00008b7c8 sp=0xc00008b778 pc=0x437d54
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gcenable.func1()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:200 +0x25 fp=0xc00008b7e0 sp=0xc00008b7c8 pc=0x42cf25
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008b7e8 sp=0xc00008b7e0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by runtime.gcenable in goroutine 1
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:200 +0x66
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 4 [GC scavenge wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0xc0000b4000?, 0xa76dc8?, 0x1?, 0x0?, 0xc0000071e0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008bf70 sp=0xc00008bf50 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goparkunlock(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:404
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.(*scavengerState).park(0xddb960)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00008bfa0 sp=0xc00008bf70 pc=0x435629
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.bgscavenge(0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc00008bfc8 sp=0xc00008bfa0 pc=0x435bbc
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gcenable.func2()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:201 +0x25 fp=0xc00008bfe0 sp=0xc00008bfc8 pc=0x42cec5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008bfe8 sp=0xc00008bfe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by runtime.gcenable in goroutine 1
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mgc.go:201 +0xa5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 5 [finalizer wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0x9c1d00?, 0x10044cf01?, 0x0?, 0x0?, 0x454005?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008a628 sp=0xc00008a608 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.runfinq()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mfinal.go:193 +0x107 fp=0xc00008a7e0 sp=0xc00008a628 pc=0x42bfa7
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008a7e8 sp=0xc00008a7e0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by runtime.createfing in goroutine 1
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/mfinal.go:163 +0x3d
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 11 [select]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0xc000129f00?, 0x2?, 0x0?, 0x0?, 0xc000129ecc?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc000129d78 sp=0xc000129d58 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.selectgo(0xc000129f00, 0xc000129ec8, 0xc000129ee8?, 0x0, 0x95f7a0?, 0x1)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/select.go:327 +0x725 fp=0xc000129e98 sp=0xc000129d78 pc=0x45b8a5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0000c25f0, 0x1)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:418 +0x113 fp=0xc000129f30 sp=0xc000129e98 pc=0x768893
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000116070)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:552 +0x86 fp=0xc000129f90 sp=0xc000129f30 pc=0x768fc6
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func2()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:336 +0xd5 fp=0xc000129fe0 sp=0xc000129f90 pc=0x77f815
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000129fe8 sp=0xc000129fe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 10
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:333 +0x1acc
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 12 [select]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0xc00008df70?, 0x4?, 0x0?, 0xa6?, 0xc00008dec0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc00008dd28 sp=0xc00008dd08 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.selectgo(0xc00008df70, 0xc00008deb8, 0x0?, 0x0, 0x0?, 0x1)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/select.go:327 +0x725 fp=0xc00008de48 sp=0xc00008dd28 pc=0x45b8a5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc0001e9040)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:1152 +0x225 fp=0xc00008dfc8 sp=0xc00008de48 pc=0x786ac5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func4()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:339 +0x25 fp=0xc00008dfe0 sp=0xc00008dfc8 pc=0x77f705
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 10
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:339 +0x1b0e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr goroutine 13 [IO wait]:
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.gopark(0xdf3ac0?, 0xb?, 0x0?, 0x0?, 0x6?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/proc.go:398 +0xce fp=0xc0000a0aa0 sp=0xc0000a0a80 pc=0x44be4e
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.netpollblock(0x4c4dd8?, 0x416e86?, 0x0?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:564 +0xf7 fp=0xc0000a0ad8 sp=0xc0000a0aa0 pc=0x4448d7
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.runtime_pollWait(0x148304689db8, 0x72)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/netpoll.go:343 +0x85 fp=0xc0000a0af8 sp=0xc0000a0ad8 pc=0x475925
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*pollDesc).wait(0xc0001a6800?, 0xc000118000?, 0x0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000a0b20 sp=0xc0000a0af8 pc=0x4dfb47
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*pollDesc).waitRead(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_poll_runtime.go:89
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr internal/poll.(*FD).Read(0xc0001a6800, {0xc000118000, 0x8000, 0x8000})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc0000a0bb8 sp=0xc0000a0b20 pc=0x4e0e3a
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*netFD).Read(0xc0001a6800, {0xc000118000?, 0x1060100000000?, 0x8?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/fd_posix.go:55 +0x25 fp=0xc0000a0c00 sp=0xc0000a0bb8 pc=0x63eae5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*conn).Read(0xc00008e310, {0xc000118000?, 0xc0000a0c90?, 0x3?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/net/net.go:179 +0x45 fp=0xc0000a0c48 sp=0xc0000a0c00 pc=0x64f1e5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr net.(*TCPConn).Read(0x0?, {0xc000118000?, 0xc0000a0ca0?, 0x469d2d?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr <autogenerated>:1 +0x25 fp=0xc0000a0c78 sp=0xc0000a0c48 pc=0x661985
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr bufio.(*Reader).Read(0xc0000b93e0, {0xc0001c8120, 0x9, 0xc1571798b5839f33?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/bufio/bufio.go:244 +0x197 fp=0xc0000a0cb0 sp=0xc0000a0c78 pc=0x5b9f97
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr io.ReadAtLeast({0xa7b640, 0xc0000b93e0}, {0xc0001c8120, 0x9, 0x9}, 0x9)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/io/io.go:335 +0x90 fp=0xc0000a0cf8 sp=0xc0000a0cb0 pc=0x4befd0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr io.ReadFull(...)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/io/io.go:354
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr golang.org/x/net/http2.readFrameHeader({0xc0001c8120, 0x9, 0xc000028600?}, {0xa7b640?, 0xc0000b93e0?})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:237 +0x65 fp=0xc0000a0d48 sp=0xc0000a0cf8 pc=0x755305
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr golang.org/x/net/http2.(*Framer).ReadFrame(0xc0001c80e0)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:498 +0x85 fp=0xc0000a0df0 sp=0xc0000a0d48 pc=0x755a45
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc0001e9040, 0x1?)
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_server.go:636 +0x145 fp=0xc0000a0f00 sp=0xc0000a0df0 pc=0x782965
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).serveStreams(0xc0001ee1e0, {0xa81b38?, 0xc0001e9040})
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:979 +0x1c2 fp=0xc0000a0f80 sp=0xc0000a0f00 pc=0x7f0462
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr google.golang.org/grpc.(*Server).handleRawConn.func1()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:920 +0x45 fp=0xc0000a0fe0 sp=0xc0000a0f80 pc=0x7efcc5
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr runtime.goexit()
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /opt/hostedtoolcache/go/1.21.4/x64/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000a0fe8 sp=0xc0000a0fe0 pc=0x47aa01
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 10
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:919 +0x185
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rax 0x0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rbx 0xe4ffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rcx 0x18
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rdx 0x3dc65656
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rdi 0x627541
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rsi 0x7ec52
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rbp 0xe6ffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rsp 0x7ffd1517b710
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r8 0x23
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r9 0x0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r10 0x7ffd151a1080
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r11 0x3dc65656
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r12 0xe8ffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r13 0xeaffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r14 0xe0ffc0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr r15 0x0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rip 0x86853a
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr rflags 0x10202
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr cs 0x33
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr fs 0x0
5:40AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:45083): stderr gs 0x0
[172.17.0.1]:59270 500 - POST /v1/chat/completions
New feature
3:17AM ERR Failed starting/connecting to the gRPC service: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: address /tmp/localai/backend/python/exllama/exllama.py: missing port in address"
https://github.com/sozercan/aikit/actions/runs/7523782972/job/20477643249#step:11:76
Improvement of existing experience
update to https://github.com/mudler/LocalAI/releases/tag/v2.0.0
Improvement of existing experience
https://github.com/astral-sh/uv
Returns a valid response instead of a timeout error.
Ends up returning a timeout error. I am unsure whether it's a bug or something I am doing wrong; the example is simple, so maybe I am missing some parameters.
Running on Docker Engine 20.10.14 on macOS.
#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
models:
- name: uncased-sentiment
source: https://github.com/lordofthejars/bert-base-multilingual-uncased-sentiment_onnx/releases/download/1.0.0/model.onnx
Then I build:
docker buildx create --use --name aikit-builder
docker buildx build . -t my-model -f aikitfile.yaml --load
Then I start it:
docker run -ti --rm -p 8080:8080 my-model
Finally I curl the model:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "uncased-sentiment",
"messages": [{"role": "user", "content": "It is so good"}]
}'
And in the container console I got the following errors:
```
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39863: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39839: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33679: connect: connection refused"
```
Any idea why this is happening? The example is simple, with nothing complicated in its lifecycle.
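The 500 "grpc service not ready" and connection-refused errors often just mean the backend is still loading the model when the first request arrives. A small retry helper can distinguish a slow startup from a persistent failure (a diagnostic sketch, not part of aikit; `RETRY_ATTEMPTS` and `RETRY_DELAY` are names I made up):

```shell
# retry_until_ok CMD...: run CMD until it succeeds, up to RETRY_ATTEMPTS
# times (default 30), sleeping RETRY_DELAY seconds (default 2) between tries.
retry_until_ok() {
  attempts=${RETRY_ATTEMPTS:-30}
  delay=${RETRY_DELAY:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0       # command succeeded: server is answering
    i=$((i + 1))
    sleep "$delay"
  done
  return 1                 # still failing after all attempts
}

# Poll the endpoint from the example above; curl -f makes HTTP errors
# (like the 500 here) count as failures:
#   retry_until_ok curl -sf http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d '{"model":"uncased-sentiment","messages":[{"role":"user","content":"It is so good"}]}'
```

If the request never succeeds even after the model has had ample time to load, the problem is likely the backend failing to start rather than a startup race.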
### Are you willing to submit PRs to contribute to this bug fix?
- [ ] Yes, I am willing to implement it.
Other
Pretty new to this and was just wondering how I could serve this model locally:
https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q5_K_M.gguf
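One way, following the aikitfile flow shown elsewhere in this repo's issues: point a model source at that URL and build an image. A sketch; the model name `codellama-7b-instruct` and the image tag are my own choices:

```shell
# Sketch: serve the GGUF above with aikit. The aikitfile mirrors the
# minimal format used in this repo's examples.
cat > aikitfile.yaml <<'EOF'
#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
models:
  - name: codellama-7b-instruct
    source: https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q5_K_M.gguf
EOF

# Build and run (not executed here; requires Docker with buildx):
#   docker buildx create --use --name aikit-builder
#   docker buildx build . -t codellama -f aikitfile.yaml --load
#   docker run -d --rm -p 8080:8080 codellama
```

Once the container is up, the model can be queried via the OpenAI-compatible `/v1/chat/completions` endpoint as in the other examples here.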
Other
Set up an action to auto-patch model images using https://github.com/project-copacetic/copacetic.
Related: #68 for CUDA images.
None
Currently, aikit generates package metadata as a single status file (the Debian standard) for CUDA images. Since the images are based on distroless (which uses the status.d folder), this creates a mixed package-metadata environment.
This only affects CUDA images, since they install CUDA libraries from NVIDIA.
https://explore.ggcr.dev/layers/ghcr.io/sozercan/llama2@sha256:8d388f7af641fa199639d78a99f655f3c80d41332f44561dfbbc3391db21b31c/var/lib/dpkg/status
Proposed fix: convert the status file into per-package files in the status.d directory.
https://explore.ggcr.dev/layers/ghcr.io/sozercan/llama2@sha256:8d388f7af641fa199639d78a99f655f3c80d41332f44561dfbbc3391db21b31c/var/lib/dpkg/status.d/
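The conversion above can be sketched as a small script that splits the single dpkg status file into one file per package, the layout distroless tooling expects. This is an illustration under the assumption that stanzas are separated by blank lines and named after their `Package:` field, not aikit's actual implementation:

```shell
# split_status STATUS_FILE OUT_DIR: split a dpkg "status" file into one
# file per package under a status.d-style directory.
split_status() {
  status="$1"; outdir="$2"
  mkdir -p "$outdir"
  awk -v outdir="$outdir" '
    /^Package: / { name = $2 }           # a stanza begins with the package name
    { buf = buf $0 "\n" }
    /^$/ {                               # a blank line ends a stanza
      if (name != "") { f = outdir "/" name; printf "%s", buf > f; close(f) }
      buf = ""; name = ""
    }
    END {                                # the last stanza may lack a trailing blank line
      if (name != "") { f = outdir "/" name; printf "%s", buf > f; close(f) }
    }
  ' "$status"
}

# Typical invocation against an extracted image rootfs
# (paths are the standard dpkg locations):
#   split_status rootfs/var/lib/dpkg/status rootfs/var/lib/dpkg/status.d
```

Doing this at image-build time would leave scanners with a single consistent metadata location instead of the current mixed environment.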