To not run out of space in the context's memory pool.
Consistently run into this every time a session reaches 500+ tokens or giving a 500+ token scenario when using a more recent ggjt version of Pygmalion located here. Does not appear to effect standard llama.cpp models. Have not tested other model types that are compatible with koboldcpp. DOES NOT EFFECT older conversion of Pygmalion model to ggml located here. It is able to handle a starting scenario of 1000+ tokens without this issue. Processing Prompt (864 / 1302 tokens)
Have plenty of RAM available when it happens.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 5 2600 Six-Core Processor
CPU family: 23
Model: 8
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 2
BogoMIPS: 7600.11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr s
se sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop
_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2
movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfc
tr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap
clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock
nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_v
msave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 192 KiB (6 instances)
L1i: 384 KiB (6 instances)
L2: 3 MiB (6 instances)
L3: 16 MiB (2 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Vulnerable
Spec store bypass: Vulnerable
Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Srbds: Not affected
Tsx async abort: Not affected
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
[rabid@rabid-ms7b87 koboldcpp]$ python koboldcpp.py ../pygmalion-6b-v3-ggml-ggjt-q4_0.bin --threads 6 --stream
Welcome to KoboldCpp - Version 1.3
Prebuilt OpenBLAS binaries only available for windows. Please manually build/link libopenblas from makefile with LLAMA_OPENBLAS=1
Initializing dynamic library: koboldcpp.dll
Loading model: /home/rabid/Desktop/pygmalion-6b-v3-ggml-ggjt-q4_0.bin
[Parts: 1, Threads: 6]
---
Identified as GPT-J model: (ver 102)
Attempting to Load...
---
gptj_model_load: loading model from '/home/rabid/Desktop/pygmalion-6b-v3-ggml-ggjt-q4_0.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 4505.45 MB
gptj_model_load: memory_size = 896.00 MB, n_mem = 57344
gptj_model_load: ................................... done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001?streaming=1
127.0.0.1 - - [10/Apr/2023 09:42:18] "GET /?streaming=1 HTTP/1.1" 200 -
127.0.0.1 - - [10/Apr/2023 09:42:18] "GET /api/latest/model HTTP/1.1" 200 -
127.0.0.1 - - [10/Apr/2023 09:42:18] "GET /sw.js HTTP/1.1" 404 -
Input: {"n": 1, "max_context_length": 1000, "max_length": 8, "rep_pen": 1.15, "temperature": 0.6, "top_p": 0.9, "top_k": 40, "top_a": 0, "typical": 1, "tfs": 1, "rep_pen_range": 1024, "rep_pen_slope": 0.7, "sampler_order": [0, 1, 2, 3, 4, 5, 6], "prompt": "[The following is a chat message log between you and an extremely intelligent and knowledgeable AI system named KoboldGPT. KoboldGPT is a state-of-the-art Artificial General Intelligence. You may ask any question, or request any task, and KoboldGPT will always be able to respond accurately and truthfully.]\n\nYou: What are german shepherds?\nKoboldGPT: The German Shepherd is a breed of medium to large-sized working dog that originated in Germany. In the English language, the breed's officially recognized name is German Shepherd Dog. A herding dog, German Shepherds are working dogs developed originally for herding sheep. Since that time, however, because of their strength, intelligence, trainability, and obedience, German Shepherds around the world are often the preferred breed for many types of work, including disability assistance, search-and-rescue, police and military roles and acting.\nYou: Which is heavier, a duck or a car?\nKoboldGPT: A car weighs around 1300 kilograms, while a duck only weighs about 2 kilograms. Therefore, a car is heavier than a duck.\nYou: What is the atomic number of Calcium?\nKoboldGPT: A calcium atom has 20 protons, 20 electrons and 20 neutrons. The atomic number or proton number of a chemical element is the number of protons found in the nucleus of every atom of that element. The atomic number uniquely identifies a chemical element. It is identical to the charge number of the nucleus. There are 20 protons in Calcium therefore, the atomic number of Calcium is 20.\nYou: What is 28 + 13?\nKoboldGPT: 28 + 13 is equal to 41.\nYou: Tell me a joke.\nKoboldGPT: Why did the chicken cross the road? To get to the other side!\nYou: How is an earthquake measured?\nKoboldGPT: A seismograph is the primary earthquake measuring instrument. The seismograph produces a digital graphic recording of the ground motion caused by the seismic waves. The digital recording is called seismogram. A network of worldwide seismographs detects and measures the strength and duration of the earthquake's waves. The magnitude of an earthquake and the intensity of shaking is usually reported on the Richter scale.\n\nKoboldGPT: Hello, I am KoboldGPT, your personal AI assistant. What would you like to know?\nYou: what's 3*5?\nKoboldGPT: 3 \u00d7 5 = 15\nYou: capital of russia?\nKoboldGPT: Moscow\nYou: 4*6\nKoboldGPT:", "quiet": true}
Processing Prompt (584 / 589 tokens)ggml_new_tensor_impl: not enough space in the context's memory pool (needed 269340800, available 268435456)