Comments (23)

zlzzzlll commented on June 11, 2024

@louis-jan Okay, I'll try. Thank you.

from jan.

louis-jan commented on June 11, 2024

My computer configuration is as follows: Windows 11, 16-core/32-thread CPU, 64 GB RAM, RTX 4090 graphics card. Questions: 1. I can't run models of 18 GB and above. 2. With a general model, the speed reaches Token Speed: 100.57/s when answering the first question, but becomes extremely slow when answering the second question. Is there a problem with my settings somewhere?

Hi @zlzzzlll, what version are you on?

zlzzzlll commented on June 11, 2024

My computer configuration is as follows: Windows 11, 16-core/32-thread CPU, 64 GB RAM, RTX 4090 graphics card. Questions: 1. I can't run models of 18 GB and above. 2. With a general model, the speed reaches Token Speed: 100.57/s when answering the first question, but becomes extremely slow on the second question. Is there a problem with my settings somewhere?

Hi @zlzzzlll, what version are you on?

Windows 11 Professional 22H2; Jan is the latest version, 0.46.

zlzzzlll commented on June 11, 2024

@louis-jan
Hi, Windows 11 Professional 22H2; Jan is the latest version, 0.46.

louis-jan commented on June 11, 2024

I see. Could you please try the latest nightly build? This issue seems to be resolved on that build.

zlzzzlll commented on June 11, 2024

@louis-jan Okay, I'll try uninstalling and reinstalling.

zlzzzlll commented on June 11, 2024

@louis-jan The problems remain.

Van-QA commented on June 11, 2024

@louis-jan The problems remain.

Hi @zlzzzlll, can you give us the app.log? Thanks.

zlzzzlll commented on June 11, 2024

@Van-QA It's all errors.
2024-02-20T08:46:58.086Z [NITRO]::CPU informations - 17
2024-02-20T08:46:58.088Z [NITRO]::Debug: Request to kill Nitro
2024-02-20T08:46:58.111Z [NITRO]::Debug: Nitro process is terminated
2024-02-20T08:46:58.619Z [NITRO]::Debug: Spawn nitro at path: C:\Users\LZ\jan\extensions@janhq\inference-nitro-extension\dist\bin\win-cuda-12-0\nitro.exe, and args:
2024-02-20T08:46:59.633Z [NITRO]::Error: Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
2024-02-20T08:46:59.800Z [NITRO]::Error: llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from C:\Users\LZ\jan\models\openchat-3.5-7b\openchat-3.5-1210.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = openchat_openchat-3.5-1210
llama_model_loader: - kv 2: llama.context_length u32 = 8192
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8

2024-02-20T08:46:59.800Z [NITRO]::Error: llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 15
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama

2024-02-20T08:46:59.809Z [NITRO]::Error: llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32002] = ["", "", "", "<0x00>", "<...

2024-02-20T08:46:59.829Z [NITRO]::Error: llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32002] = [0.000000, 0.000000, 0.000000, 0.0000...

2024-02-20T08:46:59.832Z [NITRO]::Error: llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32002] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 32000
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 21: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors

2024-02-20T08:46:59.851Z [NITRO]::Error: llm_load_vocab: special tokens definition check successful ( 261/32002 ).
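As an aside, the `llama_model_loader` lines above look alarming but are llama.cpp's normal metadata dump; they describe the GGUF header of the model file. The same header can be checked without loading the model at all. A minimal sketch, following the GGUF container layout (4-byte `GGUF` magic followed by a little-endian u32 version); the function name is mine:

```python
import struct

def gguf_version(path: str) -> int:
    """Return the GGUF container version from a model file's 8-byte header."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: {magic!r}")
        # Little-endian unsigned 32-bit version immediately after the magic.
        (version,) = struct.unpack("<I", f.read(4))
        return version
```

The log above reports "version GGUF V3 (latest)", which is what this check would return for that file.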

louis-jan commented on June 11, 2024

2024-02-20T08:46:59.851Z [NITRO]::Error: llm_load_vocab: special tokens definition check successful ( 261/32002 ).

I think the log was cleared; no token speed was recorded. Could you please try sending multiple requests, capture the logs, and paste them here again?

zlzzzlll commented on June 11, 2024

@louis-jan I have no programming experience, and the app.log file is actually 20 GB. Can you tell me whether emptying this file will have any effect on Jan? Thank you!

louis-jan commented on June 11, 2024

@louis-jan I have no programming experience, and the app.log file is actually 20 GB. Can you tell me whether emptying this file will have any effect on Jan? Thank you!

Hey, sorry about that. It should be cleared automatically on the latest nightly build. Please feel free to delete this file at any time.
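For anyone else hitting an oversized app.log before updating: the file can also be truncated in place rather than deleted, which keeps the path intact for the app. A minimal sketch; the path is an assumption based on the default Jan data folder seen in the logs above (on Windows, `Clear-Content` in PowerShell does the same):

```shell
# Truncate Jan's app.log to zero bytes without removing the file.
# The path below is an assumed default; adjust to your Jan data folder.
LOG="$HOME/jan/logs/app.log"
: > "$LOG"
```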

zlzzzlll commented on June 11, 2024

@louis-jan
2024-02-20T14:38:42.066Z [NITRO]::Debug: 20240220 14:38:21.702000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:22.205000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:22.720000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:23.226000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:23.729000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:24.236000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:24.751000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:25.262000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:25.771000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:26.272000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:26.784000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:27.295000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:27.804000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:28.314000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:28.825000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:29.339000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:29.849000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:30.359000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:30.872000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:31.380000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:31.887000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:32.397000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:32.905000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:33.413000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:33.920000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:34.427000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:34.935000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:35.443000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:35.951000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:36.458000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:36.966000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:37.473000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:37.981000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:38.489000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:38.989000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:39.500000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:40.011000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:40.526000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:41.038000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:41.551000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:366
20240220 14:38:42.065000 UTC 11828 INFO Waiting for task to be released status:1 - llamaCPP.cc:
2024-02-20T14:38:42.659Z [NITRO]::Debug: Request to kill Nitro
2024-02-20T14:39:04.405Z [NITRO]::CPU informations - 17
2024-02-20T14:39:04.407Z [NITRO]::Debug: Request to kill Nitro
2024-02-20T14:39:04.430Z [NITRO]::Debug: Nitro process is terminated
2024-02-20T14:39:04.942Z [NITRO]::Debug: Spawning Nitro subprocess...
2024-02-20T14:39:04.942Z [NITRO]::Debug: Spawn nitro at path: C:\Users\LZ\jan\extensions@janhq\inference-nitro-extension\dist\bin\win-cuda-12-0\nitro.exe, and args: 1,127.0.0.1,3928
2024-02-20T14:39:05.044Z [NITRO]::Debug: [Nitro ASCII-art startup banner; ANSI escape codes stripped]
2024-02-20T14:39:05.283Z [NITRO]::Debug: Loading model with params {"ctx_len":4096,"prompt_template":"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:","llama_model_path":"C:\Users\LZ\jan\models\openchat-3.5-7b\openchat-3.5-1210.Q4_K_M.gguf","user_prompt":"GPT4 Correct User: ","ai_prompt":"<|end_of_turn|>GPT4 Correct Assistant:","cpu_threads":17,"ngl":100}
2024-02-20T14:39:05.282Z [NITRO]::Debug: Nitro is ready
2024-02-20T14:39:05.317Z [NITRO]::Error: ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes

2024-02-20T14:39:05.450Z [NITRO]::Debug: [tail of the ASCII-art banner; ANSI escape codes stripped]
20240220 14:39:05.048000 UTC 13120 INFO Nitro version: - main.cc:50
20240220 14:39:05.048000 UTC 13120 INFO Server started, listening at: 127.0.0.1:3928 - main.cc:54
20240220 14:39:05.048000 UTC 13120 INFO Please load your model - main.cc:55
20240220 14:39:05.048000 UTC 13120 INFO Number of thread is:32 - main.cc:62
20240220 14:39:05.287000 UTC 5832 INFO Setting up GGML CUBLAS PARAMS - llamaCPP.cc:545
{"timestamp":1708439945,"level":"INFO","function":"loadModelImpl","line":561,"message":"system info","n_threads":17,"total_threads":32,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | "}

2024-02-20T14:39:05.453Z [NITRO]::Error: llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from C:\Users\LZ\jan\models\openchat-3.5-7b\openchat-3.5-1210.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = openchat_openchat-3.5-1210
llama_model_loader: - kv 2: llama.context_length u32 = 8192
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8

2024-02-20T14:39:05.454Z [NITRO]::Error: llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 15
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama

2024-02-20T14:39:05.462Z [NITRO]::Error: llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32002] = ["", "", "", "<0x00>", "<...

2024-02-20T14:39:05.482Z [NITRO]::Error: llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32002] = [0.000000, 0.000000, 0.000000, 0.0000...

2024-02-20T14:39:05.486Z [NITRO]::Error: llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32002] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 32000
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 21: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors

2024-02-20T14:39:05.503Z [NITRO]::Error: llm_load_vocab: special tokens definition check successful ( 261/32002 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32002
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
llm_load_print_meta: general.name = openchat_openchat-3.5-1210
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 32000 '<|end_of_turn|>'
llm_load_print_meta: UNK token = 0 ''
llm_load_print_meta: PAD token = 0 ''
llm_load_print_meta: LF token = 13 '<0x0A>'

2024-02-20T14:39:05.504Z [NITRO]::Error: llm_load_tensors: ggml ctx size = 0.22 MiB

2024-02-20T14:39:05.809Z [NITRO]::Error: llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 70.32 MiB
llm_load_tensors: CUDA0 buffer size = 4095.06 MiB
.
2024-02-20T14:39:05.902Z [NITRO]::Error: .
[... ~90 more identical "[NITRO]::Error: ." model-loading progress lines omitted ...]
2024-02-20T14:39:07.057Z [NITRO]::Error: .

2024-02-20T14:39:07.057Z [NITRO]::Error: llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1

2024-02-20T14:39:07.069Z [NITRO]::Error: llama_kv_cache_init: CUDA0 KV buffer size = 512.00 MiB
llama_new_context_with_model: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB

2024-02-20T14:39:07.073Z [NITRO]::Error: llama_new_context_with_model: CUDA_Host input buffer size = 16.02 MiB

2024-02-20T14:39:07.080Z [NITRO]::Error: llama_new_context_with_model: CUDA0 compute buffer size = 316.80 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 8.80 MiB
llama_new_context_with_model: graph splits (measure): 3

2024-02-20T14:39:07.112Z [NITRO]::Debug: [1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 593][llama_server_context::initialize] Available slots:
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 601][llama_server_context::initialize] -> Slot 0 - max context: 4096

2024-02-20T14:39:07.113Z [NITRO]::Debug: 20240220 14:39:07.117000 UTC 5832 INFO Started background task here! - llamaCPP.cc:572
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 879][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 0]
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1734][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)

2024-02-20T14:39:07.159Z [NITRO]::Debug: [1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 472][llama_client_slot::print_timings]
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 477][llama_client_slot::print_timings] print_timings: prompt eval time = 20.85 ms / 2 tokens ( 10.43 ms per token, 95.91 tokens per second)
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 482][llama_client_slot::print_timings] print_timings: eval time = 26.06 ms / 4 runs ( 6.52 ms per token, 153.47 tokens per second)
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 484][llama_client_slot::print_timings] print_timings: total time = 46.92 ms

2024-02-20T14:39:07.160Z [NITRO]::Debug: [1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1597][llama_server_context::update_slots] slot 0 released (7 tokens in cache)
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1541][llama_server_context::update_slots] all slots are idle and system prompt is empty, clear the KV cache

2024-02-20T14:39:07.165Z [NITRO]::Debug: Load model success with response {}
2024-02-20T14:39:07.168Z [NITRO]::Debug: Validate model state with response 200
2024-02-20T14:39:07.169Z [NITRO]::Debug: Validate model state success with response {"model_data":"{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"C:\\Users\\LZ\\jan\\models\\openchat-3.5-7b\\openchat-3.5-1210.Q4_K_M.gguf","n_ctx":4096,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false}","model_loaded":true}
2024-02-20T14:39:07.286Z [NITRO]::Debug: 20240220 14:39:07.159000 UTC 5832 INFO {"content":" there! I'm","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"C:\Users\LZ\jan\models\openchat-3.5-7b\openchat-3.5-1210.Q4_K_M.gguf","n_ctx":4096,"n_keep":0,"n_predict":2,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"model":"C:\Users\LZ\jan\models\openchat-3.5-7b\openchat-3.5-1210.Q4_K_M.gguf","prompt":"Hello","slot_id":0,"stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":26.063,"predicted_n":4,"predicted_per_second":153.47427387484174,"predicted_per_token_ms":6.51575,"prompt_ms":20.853,"prompt_n":2,"prompt_per_second":95.90946146837385,"prompt_per_token_ms":10.4265},"tokens_cached":6,"tokens_evaluated":2,"tokens_predicted":4,"truncated":false} - llamaCPP.cc:133
20240220 14:39:07.289000 UTC 17116 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 879][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 1]

2024-02-20T14:39:07.287Z [NITRO]::Debug: [1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1734][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)

2024-02-20T14:39:07.806Z [NITRO]::Debug: [1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 472][llama_client_slot::print_timings]
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 477][llama_client_slot::print_timings] print_timings: prompt eval time = 169.62 ms / 653 tokens ( 0.26 ms per token, 3849.76 tokens per second)
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 482][llama_client_slot::print_timings] print_timings: eval time = 350.57 ms / 37 runs ( 9.47 ms per token, 105.54 tokens per second)
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 484][llama_client_slot::print_timings] print_timings: total time = 520.19 ms
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1597][llama_server_context::update_slots] slot 0 released (691 tokens in cache)

2024-02-20T14:39:07.807Z [NITRO]::Debug: 20240220 14:39:07.812000 UTC 17116 INFO reached result stop - llamaCPP.cc:354
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1597][llama_server_context::update_slots] slot 0 released (691 tokens in cache)
20240220 14:39:07.812000 UTC 17116 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 879][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 3]
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1462][llama_server_context::process_tasks] slot unavailable
[1708439947] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1734][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)

2024-02-20T14:39:08.279Z [NITRO]::Debug: [1708439948] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 472][llama_client_slot::print_timings]
[1708439948] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 477][llama_client_slot::print_timings] print_timings: prompt eval time = 148.66 ms / 653 tokens ( 0.23 ms per token, 4392.43 tokens per second)
[1708439948] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 482][llama_client_slot::print_timings] print_timings: eval time = 323.66 ms / 37 runs ( 8.75 ms per token, 114.32 tokens per second)
[1708439948] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 484][llama_client_slot::print_timings] print_timings: total time = 472.32 ms
[1708439948] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1597][llama_server_context::update_slots] slot 0 released (691 tokens in cache)

2024-02-20T14:39:14.751Z [NITRO]::Debug: 20240220 14:39:14.715000 UTC 17116 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
[1708439954] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 879][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 6]
[1708439954] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1734][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)

2024-02-20T14:39:15.268Z [NITRO]::Debug: [1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 472][llama_client_slot::print_timings]
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 477][llama_client_slot::print_timings] print_timings: prompt eval time = 154.78 ms / 702 tokens ( 0.22 ms per token, 4535.41 tokens per second)
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 482][llama_client_slot::print_timings] print_timings: eval time = 402.91 ms / 37 runs ( 10.89 ms per token, 91.83 tokens per second)
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 484][llama_client_slot::print_timings] print_timings: total time = 557.69 ms

2024-02-20T14:39:15.269Z [NITRO]::Debug: [1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1597][llama_server_context::update_slots] slot 0 released (740 tokens in cache)
20240220 14:39:15.273000 UTC 17116 INFO reached result stop - llamaCPP.cc:354
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1597][llama_server_context::update_slots] slot 0 released (740 tokens in cache)
20240220 14:39:15.273000 UTC 17116 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 879][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 8]
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1462][llama_server_context::process_tasks] slot unavailable
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1734][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)

2024-02-20T14:39:15.743Z [NITRO]::Debug: [1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 472][llama_client_slot::print_timings]
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 477][llama_client_slot::print_timings] print_timings: prompt eval time = 152.17 ms / 702 tokens ( 0.22 ms per token, 4613.38 tokens per second)
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 482][llama_client_slot::print_timings] print_timings: eval time = 322.24 ms / 37 runs ( 8.71 ms per token, 114.82 tokens per second)
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 484][llama_client_slot::print_timings] print_timings: total time = 474.40 ms
[1708439955] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1597][llama_server_context::update_slots] slot 0 released (740 tokens in cache)

2024-02-20T14:39:17.450Z [NITRO]::Debug: 20240220 14:39:17.451000 UTC 17116 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
[1708439957] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 879][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 11]
[1708439957] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1734][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)

2024-02-20T14:39:18.168Z [NITRO]::Debug: [1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 472][llama_client_slot::print_timings]
[1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 477][llama_client_slot::print_timings] print_timings: prompt eval time = 157.95 ms / 751 tokens ( 0.21 ms per token, 4754.64 tokens per second)
[1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 482][llama_client_slot::print_timings] print_timings: eval time = 560.67 ms / 37 runs ( 15.15 ms per token, 65.99 tokens per second)
[1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 484][llama_client_slot::print_timings] print_timings: total time = 718.62 ms

2024-02-20T14:39:18.169Z [NITRO]::Debug: [1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1597][llama_server_context::update_slots] slot 0 released (789 tokens in cache)
20240220 14:39:18.173000 UTC 17116 INFO reached result stop - llamaCPP.cc:354
[1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1597][llama_server_context::update_slots] slot 0 released (789 tokens in cache)
20240220 14:39:18.173000 UTC 17116 INFO Connection closed or buffer is null. Reset context - llamaCPP.cc:318
[1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 879][llama_server_context::launch_slot_with_data] slot 0 is processing [task id: 13]
[1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1462][llama_server_context::process_tasks] slot unavailable
[1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 1734][llama_server_context::update_slots] slot 0 : kv cache rm - [0, end)

2024-02-20T14:39:18.643Z [NITRO]::Debug: [1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 472][llama_client_slot::print_timings]
[1708439958] [C:\actions-runner_work\nitro\nitro\controllers\llamaCPP.h: 477][llama_client_slot::print_timings] print_timings: prompt eval time = 154.14 ms / 751 tokens ( 0.21 ms per token, 4872.19 tokens per second)

from jan.

louis-jan avatar louis-jan commented on June 11, 2024

@zlzzzlll, it looks like you are still on the latest release build.

Could you please try installing this build:
https://delta.jan.ai/latest/jan-win-x64-0.4.6-276.exe

Also, based on the log you provided, the token speeds are:

  • 105.54
  • 114.32
  • 91.83
  • 114.82
  • Last attempt: 65.99. Could you check whether the speed returns to normal on subsequent attempts?
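The per-request speeds listed above can also be pulled out of app.log programmatically. A minimal sketch in Python, assuming the `print_timings: eval time` lines keep the format shown in the log excerpt (the `LOG` string here is a shortened sample, not a real file read):

```python
import re

# Shortened sample of the "print_timings" lines Nitro writes to app.log.
# Only the "eval time" lines carry the generation speed; the
# "prompt eval time" lines report prompt-processing throughput instead.
LOG = """\
print_timings: prompt eval time = 148.66 ms / 653 tokens ( 0.23 ms per token, 4392.43 tokens per second)
print_timings: eval time = 350.57 ms / 37 runs ( 9.47 ms per token, 105.54 tokens per second)
print_timings: eval time = 323.66 ms / 37 runs ( 8.75 ms per token, 114.32 tokens per second)
print_timings: eval time = 402.91 ms / 37 runs ( 10.89 ms per token, 91.83 tokens per second)
print_timings: eval time = 322.24 ms / 37 runs ( 8.71 ms per token, 114.82 tokens per second)
print_timings: eval time = 560.67 ms / 37 runs ( 15.15 ms per token, 65.99 tokens per second)
"""

speeds = []
for line in LOG.splitlines():
    if "prompt eval" in line:
        continue  # skip prompt-processing throughput
    m = re.search(r"([\d.]+) tokens per second\)", line)
    if m:
        speeds.append(float(m.group(1)))

print(speeds)  # generation speed per request, in order of appearance
print(f"mean {sum(speeds) / len(speeds):.2f} tok/s, last {speeds[-1]:.2f} tok/s")
```

In practice you would read the text from your Jan app.log instead of a string; a falling tail in this list is the slowdown described in this issue.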

zlzzzlll avatar zlzzzlll commented on June 11, 2024

@louis-jan After I deleted app.log and typed the same word again, the speed improved, but it is still trending downward.

louis-jan avatar louis-jan commented on June 11, 2024

@zlzzzlll, please try installing the nightly build: https://delta.jan.ai/latest/jan-win-x64-0.4.6-276.exe

zlzzzlll avatar zlzzzlll commented on June 11, 2024

@louis-jan To install from the link you posted, do I have to uninstall the previous installation of Jan entirely?

louis-jan avatar louis-jan commented on June 11, 2024

uninstall

You don't have to.

zlzzzlll avatar zlzzzlll commented on June 11, 2024

@louis-jan It's a direct install, right?

louis-jan avatar louis-jan commented on June 11, 2024

@louis-jan It's a direct install, right?
@zlzzzlll, yes, it will overwrite the existing version.

zlzzzlll avatar zlzzzlll commented on June 11, 2024

@louis-jan Hi, this seems to have solved my problem, thank you! Also, will the version you gave me, which I have now installed, prompt me to upgrade automatically in the future? If not, where can I download the latest nightly build? Could you give me a link, so that I can use this method to solve problems in the future? Thanks again!

louis-jan avatar louis-jan commented on June 11, 2024

@louis-jan Hi, this seems to have solved my problem, thank you! Also, will the version you gave me, which I have now installed, prompt me to upgrade automatically in the future? If not, where can I download the latest nightly build? Could you give me a link, so that I can use this method to solve problems in the future? Thanks again!

It will prompt you to upgrade automatically, or you can go to "Jan" > "Check for Updates" to upgrade. You don't have to download new builds manually.

louis-jan avatar louis-jan commented on June 11, 2024

You can also find the latest nightly build on our website or Discord channel.
