Do you have the full stack trace for the first issue? It may just be verbose output from h2ogpt while it tries to figure out how to access the model, since GGUF files are not true HF models, so it takes extra passes to identify them.
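For context, identifying a GGUF file is cheap because the format starts with a fixed 24-byte header (magic, version, tensor count, KV count). This is only a sketch of that check, not h2ogpt's actual detection code, and `sniff_gguf` is a made-up helper name:

```python
import struct

def sniff_gguf(path):
    """Return (version, n_tensors, n_kv) if `path` looks like a GGUF file, else None."""
    with open(path, "rb") as f:
        header = f.read(24)  # magic (4) + version (4) + tensor_count (8) + kv_count (8)
    if len(header) < 24 or header[:4] != b"GGUF":
        return None
    version, n_tensors, n_kv = struct.unpack("<IQQ", header[4:24])
    return version, n_tensors, n_kv

# Per the log below, the Q5_K_M file reports GGUF V2 with 291 tensors and 20 KV pairs:
# sniff_gguf("llamacpp_path/mistral-7b-instruct-v0.1.Q5_K_M.gguf")
```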
What startup should look like:
(/data/conda/h2ogpt) jon@pseudotensor:~/h2ogpt$ python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.1-GGUF
Using Model llama
load INSTRUCTOR_Transformer
max_seq_length 512
Starting get_model: llama
i.e. it should have auto-detected that model as the llama.cpp type, downloaded it after the above command, then shown the output below. It would be better for me to use CUDA_VISIBLE_DEVICES=0 python instead, to stick to just the first GPU.
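For reference, pinning to the first GPU is the same command as above with the device mask set (shown only to illustrate the mask; it needs the h2ogpt environment to actually run):

```shell
# Restrict CUDA offload to device 0 (the RTX 3090 Ti) instead of splitting across both cards
CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.1-GGUF
```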
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6
Device 1: NVIDIA GeForce RTX 2080, compute capability 7.5
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from llamacpp_path/mistral-7b-instruct-v0.1.Q5_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor 0: token_embd.weight q5_K [ 4096, 32000, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 10: blk.1.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 11: blk.1.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 19: blk.2.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 20: blk.2.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 21: blk.2.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 22: blk.2.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 23: blk.2.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 24: blk.2.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 28: blk.3.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 29: blk.3.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 30: blk.3.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 31: blk.3.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 32: blk.3.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 33: blk.3.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 34: blk.3.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 35: blk.3.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 36: blk.3.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 37: blk.4.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 38: blk.4.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 39: blk.4.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 40: blk.4.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 41: blk.4.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 42: blk.4.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 43: blk.4.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 44: blk.4.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 45: blk.4.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 46: blk.5.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 47: blk.5.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 48: blk.5.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 49: blk.5.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 50: blk.5.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 51: blk.5.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 52: blk.5.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 53: blk.5.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 54: blk.5.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 55: blk.6.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 56: blk.6.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 57: blk.6.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 58: blk.6.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 59: blk.6.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 60: blk.6.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 61: blk.6.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 62: blk.6.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 63: blk.6.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 64: blk.7.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 65: blk.7.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 66: blk.7.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 67: blk.7.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 68: blk.7.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 69: blk.7.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 70: blk.7.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 71: blk.7.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 72: blk.7.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 73: blk.8.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 74: blk.8.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 75: blk.8.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 76: blk.8.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 77: blk.8.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 78: blk.8.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 79: blk.8.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 80: blk.8.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 81: blk.8.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 82: blk.9.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 83: blk.9.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 84: blk.9.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 85: blk.9.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 86: blk.9.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 87: blk.9.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 88: blk.9.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 89: blk.9.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 90: blk.9.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 91: blk.10.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 92: blk.10.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 93: blk.10.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 94: blk.10.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 95: blk.10.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 96: blk.10.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 97: blk.10.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 98: blk.10.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 99: blk.10.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 100: blk.11.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 101: blk.11.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 102: blk.11.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 103: blk.11.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 104: blk.11.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 105: blk.11.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 106: blk.11.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 107: blk.11.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 108: blk.11.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 109: blk.12.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 110: blk.12.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 111: blk.12.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 112: blk.12.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 113: blk.12.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 114: blk.12.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 115: blk.12.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 116: blk.12.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 117: blk.12.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 118: blk.13.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 119: blk.13.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 120: blk.13.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 121: blk.13.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 122: blk.13.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 123: blk.13.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 124: blk.13.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 125: blk.13.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 126: blk.13.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 127: blk.14.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 128: blk.14.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 129: blk.14.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 130: blk.14.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 131: blk.14.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 132: blk.14.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 133: blk.14.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 134: blk.14.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 135: blk.14.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 136: blk.15.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 137: blk.15.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 143: blk.15.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 144: blk.15.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 145: blk.16.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 146: blk.16.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 147: blk.16.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 148: blk.16.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 149: blk.16.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 150: blk.16.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 151: blk.16.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 152: blk.16.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 153: blk.16.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 154: blk.17.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 155: blk.17.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 156: blk.17.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 157: blk.17.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 158: blk.17.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 159: blk.17.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 160: blk.17.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 161: blk.17.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 162: blk.17.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 163: blk.18.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 164: blk.18.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 165: blk.18.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 166: blk.18.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 167: blk.18.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 168: blk.18.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 169: blk.18.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 170: blk.18.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 171: blk.18.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 172: blk.19.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 173: blk.19.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 174: blk.19.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 175: blk.19.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 176: blk.19.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 177: blk.19.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 178: blk.19.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 179: blk.19.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 180: blk.19.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 181: blk.20.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 182: blk.20.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 183: blk.20.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 184: blk.20.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 185: blk.20.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 186: blk.20.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 187: blk.20.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 188: blk.20.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 189: blk.20.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 190: blk.21.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 191: blk.21.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 192: blk.21.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 193: blk.21.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 194: blk.21.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 195: blk.21.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 196: blk.21.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 197: blk.21.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 198: blk.21.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 199: blk.22.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 200: blk.22.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 201: blk.22.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 202: blk.22.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 203: blk.22.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 204: blk.22.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 205: blk.22.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 206: blk.22.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 207: blk.22.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 208: blk.23.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 209: blk.23.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 210: blk.23.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 211: blk.23.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 212: blk.23.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 213: blk.23.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 214: blk.23.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 215: blk.23.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 216: blk.23.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 217: blk.24.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 218: blk.24.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 219: blk.24.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 220: blk.24.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 221: blk.24.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 222: blk.24.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 223: blk.24.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 224: blk.24.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 225: blk.24.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 226: blk.25.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 227: blk.25.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 228: blk.25.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 229: blk.25.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 230: blk.25.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 231: blk.25.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 232: blk.25.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 233: blk.25.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 234: blk.25.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 235: blk.26.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 236: blk.26.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 237: blk.26.attn_v.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 238: blk.26.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 239: blk.26.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 240: blk.26.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 241: blk.26.ffn_down.weight q5_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 242: blk.26.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 243: blk.26.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 244: blk.27.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 245: blk.27.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 246: blk.27.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 247: blk.27.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 248: blk.27.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 249: blk.27.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 250: blk.27.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 251: blk.27.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 252: blk.27.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 253: blk.28.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 254: blk.28.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 255: blk.28.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 256: blk.28.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 257: blk.28.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 258: blk.28.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 259: blk.28.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 260: blk.28.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 261: blk.28.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 262: blk.29.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 263: blk.29.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 264: blk.29.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 265: blk.29.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 266: blk.29.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 267: blk.29.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 268: blk.29.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 269: blk.29.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 270: blk.29.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 271: blk.30.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 272: blk.30.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 273: blk.30.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 274: blk.30.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 275: blk.30.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 276: blk.30.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 277: blk.30.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 278: blk.30.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 279: blk.30.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 280: blk.31.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 281: blk.31.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 282: blk.31.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 283: blk.31.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 284: blk.31.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 285: blk.31.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 286: blk.31.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 287: blk.31.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 288: blk.31.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 289: output_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 290: output.weight q6_K [ 4096, 32000, 1, 1 ]
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 17
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q5_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly Q5_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.78 GiB (5.67 BPW)
llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.1
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 86.05 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: VRAM used: 4807.05 MiB
warning: failed to mlock 90112000-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
..................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 315.32 MiB
llama_new_context_with_model: VRAM scratch buffer: 312.01 MiB
llama_new_context_with_model: total VRAM used: 5119.06 MiB (model: 4807.05 MiB, context: 312.01 MiB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
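As an aside, the mlock warning above is harmless, but if you want the buffer actually locked the locked-memory limit has to be raised first. A quick way to inspect it (the limits.conf lines are an assumption about a typical PAM setup):

```shell
# Show the current max locked memory in KiB; the default is often far below the ~4.8 GiB model size
ulimit -l
# To raise it persistently, add lines like these to /etc/security/limits.conf and re-login:
#   jon soft memlock unlimited
#   jon hard memlock unlimited
```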
Auto-detected LLaMa n_ctx=4096, will unload then reload with this setting.
Already have llamacpp_path/mistral-7b-instruct-v0.1.Q5_K_M.gguf from url https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q5_K_M.gguf?download=true, delete file if invalid
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from llamacpp_path/mistral-7b-instruct-v0.1.Q5_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor 0: token_embd.weight q5_K [ 4096, 32000, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 10: blk.1.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 11: blk.1.attn_k.weight q5_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.attn_v.weight q6_K [ 4096, 1024, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_output.weight q5_K [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.ffn_gate.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.ffn_up.weight q5_K [ 4096, 14336, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.ffn_down.weight q6_K [ 14336, 4096, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensors 19–288: blk.2 … blk.31 weights (elided for brevity; same per-layer pattern as blk.0/blk.1 above: attn_q/attn_k/attn_output/ffn_gate/ffn_up in q5_K, attn_v/ffn_down in q5_K or q6_K depending on the layer, attn_norm/ffn_norm in f32)
llama_model_loader: - tensor 289: output_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 290: output.weight q6_K [ 4096, 32000, 1, 1 ]
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 17
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q5_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
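As a quick sanity check, the three tensor-type counts reported here should sum to the 291 tensors the loader announced at the top of the log:

```python
# Cross-check the per-type tensor counts from the llama_model_loader output
# against the total of 291 tensors reported when the GGUF was opened.
counts = {"f32": 65, "q5_K": 193, "q6_K": 33}
total = sum(counts.values())
print(total)  # 291
```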
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly Q5_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.78 GiB (5.67 BPW)
llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.1
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
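The "5.67 BPW" figure above can be reproduced from the other numbers in this metadata block (7.24 B parameters, 4.78 GiB model size); this is just arithmetic on the logged values, not an h2ogpt API:

```python
# bits-per-weight = (model size in bits) / (parameter count),
# using the values printed by llm_load_print_meta above.
size_bits = 4.78 * 1024**3 * 8   # 4.78 GiB expressed in bits
params = 7.24e9                  # 7.24 B parameters
bpw = size_bits / params
print(round(bpw, 2))  # 5.67
```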
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 86.05 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: VRAM used: 4807.05 MiB
warning: failed to mlock 90112000-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
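The mlock warning above comes from the process hitting its RLIMIT_MEMLOCK ceiling (often 64 KiB by default). A minimal sketch for inspecting that limit from Python's standard library, before deciding whether to raise it via `ulimit -l` or `/etc/security/limits.conf`:

```python
import resource

# Read the soft and hard locked-memory limits for the current process.
# resource.RLIM_INFINITY (-1) means "unlimited"; raising the hard limit
# generally requires root, which is why we only read it here.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("soft:", soft, "hard:", hard)
```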
..................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 579.32 MiB
llama_new_context_with_model: VRAM scratch buffer: 576.01 MiB
llama_new_context_with_model: total VRAM used: 5383.06 MiB (model: 4807.05 MiB, context: 576.01 MiB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Model {'base_model': 'llama', 'base_model0': 'llama', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'mistral', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': '<s>[INST] ', 'PreInput': None, 'PreResponse': '[/INST]', 'terminate_response': ['[INST]', '</s>'], 'chat_sep': ' ', 'chat_turn_sep': '</s> ', 'humanstr': '[INST]', 'botstr': '[/INST]', 'generates_leading_space': False, 'system_prompt': '', 'can_handle_system_prompt': False}, 'visible_models': None, 'h2ogpt_key': None, 'load_8bit': False, 'load_4bit': False, 'low_bit_mode': 1, 'load_half': True, 'use_flash_attention_2': False, 'load_gptq': '', 'load_awq': '', 'load_exllama': False, 'use_safetensors': False, 'revision': None, 'use_gpu_id': True, 'gpu_id': 0, 'compile_model': None, 'use_cache': None, 'llamacpp_dict': {'n_gpu_layers': 100, 'use_mlock': True, 'n_batch': 1024, 'n_gqa': 0, 'model_path_llama': 'https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q5_K_M.gguf?download=true', 'model_name_gptj': '', 'model_name_gpt4all_llama': '', 'model_name_exllama_if_no_config': ''}, 'rope_scaling': {}, 'max_seq_len': 4096, 'exllama_dict': {}, 'gptq_dict': {}, 'attention_sinks': False, 'sink_dict': {}, 'truncation_generation': False, 'hf_model_dict': {}}
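The `prompt_dict` in the Model line above fully describes how a user turn gets wrapped for the `mistral` prompt type. A minimal sketch of that assembly, using the values copied from that dict (the `build_prompt` helper name is hypothetical, not an h2ogpt function):

```python
# Values taken verbatim from the prompt_dict logged above for prompt_type
# 'mistral'; build_prompt is an illustrative helper, not part of h2ogpt.
prompt_dict = {
    "PreInstruct": "<s>[INST] ",
    "PreResponse": "[/INST]",
}

def build_prompt(user_message: str, d: dict) -> str:
    """Wrap a single user turn in Mistral-instruct [INST] ... [/INST] markers."""
    return f"{d['PreInstruct']}{user_message} {d['PreResponse']}"

print(build_prompt("Why is the sky blue?", prompt_dict))
```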
Starting get_model: OpenAssistant/reward-model-deberta-v3-large-v2
device_map: {'': 1}
Begin auto-detect HF cache text generation models
No loading model philschmid/bart-large-cnn-samsum because is_encoder_decoder=True
WARNING:transformers_modules.tiiuae.falcon-40b-instruct.ecb78d97ac356d098e79f0db222c9ce7c5d9ee5f.configuration_falcon:
WARNING: You are currently loading Falcon using legacy code contained in the model repository. Falcon has now been fully ported into the Hugging Face transformers library. For the most up-to-date and high-performance version of the Falcon model code, please update to the latest version of transformers and then load the model without the trust_remote_code=True argument.
No loading model openai/whisper-medium because is_encoder_decoder=True
No loading model Systran/faster-whisper-large-v3 because is_encoder_decoder=True
No loading model Salesforce/blip2-flan-t5-xxl because is_encoder_decoder=True
No loading model microsoft/speecht5_vc because is_encoder_decoder=True
No loading model philschmid/flan-t5-base-samsum because is_encoder_decoder=True
No loading model stabilityai/sdxl-turbo because stabilityai/sdxl-turbo does not appear to have a file named config.json. Checkout 'https://huggingface.co/stabilityai/sdxl-turbo/main' for available files.
No loading model openai/whisper-base.en because is_encoder_decoder=True
No loading model distil-whisper/distil-large-v2 because is_encoder_decoder=True
No loading model unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x because unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x does not appear to have a file named config.json. Checkout 'https://huggingface.co/unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x/main' for available files.
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b/5ee38ad4c3d81636c37388a49cbfe7711ad6909d/configuration_mpt.py:97: UserWarning: alibi is turned on, setting `learned_pos_emb` to `False.`
warnings.warn(f'alibi is turned on, setting `learned_pos_emb` to `False.`')
No loading model microsoft/speecht5_hifigan because 'hifigan'
No loading model h2oai/ggml because h2oai/ggml does not appear to have a file named config.json. Checkout 'https://huggingface.co/h2oai/ggml/main' for available files.
No loading model tloen/alpaca-lora-7b because tloen/alpaca-lora-7b does not appear to have a file named config.json. Checkout 'https://huggingface.co/tloen/alpaca-lora-7b/main' for available files.
No loading model Salesforce/blip2-flan-t5-xl because is_encoder_decoder=True
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-chat/1709f47cf96f0ea071e57d1c7bd0fadccb5dacfe/configuration_mpt.py:97: UserWarning: alibi is turned on, setting `learned_pos_emb` to `False.`
warnings.warn(f'alibi is turned on, setting `learned_pos_emb` to `False.`')
No loading model playgroundai/playground-v2-1024px-aesthetic because playgroundai/playground-v2-1024px-aesthetic does not appear to have a file named config.json. Checkout 'https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic/main' for available files.
No loading model microsoft/speecht5_tts because is_encoder_decoder=True
No loading model google/pix2struct-textcaps-base because is_encoder_decoder=True
/home/jon/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-instruct/56bcbea5361d8381c297ca51c02ee5b6f0415cb4/configuration_mpt.py:97: UserWarning: alibi is turned on, setting `learned_pos_emb` to `False.`
warnings.warn(f'alibi is turned on, setting `learned_pos_emb` to `False.`')
No loading model openai/whisper-large-v3 because is_encoder_decoder=True
End auto-detect HF cache text generation models
Begin auto-detect llama.cpp models
End auto-detect llama.cpp models
Running on local URL: http://0.0.0.0:7861
To create a public link, set `share=True` in `launch()`.
Started Gradio Server and/or GUI: server_name: localhost port: None
Use local URL: http://localhost:7861/
/data/conda/h2ogpt/lib/python3.10/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
/data/conda/h2ogpt/lib/python3.10/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_names" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
OpenAI API URL: http://0.0.0.0:5000
INFO:__name__:OpenAI API URL: http://0.0.0.0:5000
OpenAI API key: EMPTY
INFO:__name__:OpenAI API key: EMPTY
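The two lines above advertise an OpenAI-compatible proxy at `http://0.0.0.0:5000` accepting the placeholder key `EMPTY`. A sketch of the request body and headers a client would send to that endpoint, built with the standard library only (the payload is constructed, not sent, and the model name is an assumption):

```python
import json

# Build a chat-completions request for the local OpenAI-compatible proxy
# logged above. "llama" as the model name is an assumption for illustration.
payload = {
    "model": "llama",
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload)
headers = {
    "Authorization": "Bearer EMPTY",  # the proxy accepts the EMPTY key
    "Content-Type": "application/json",
}
print(body)
```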
Then do:
from h2ogpt.
As the title suggests, I used the ClickOnce file to start the app, not the command line. Here is the trace from my Terminal window on startup:
/Users/xxxx/Downloads/h2ogpt-osx-m1-gpu ; exit;
xxxx% /Users/xxxx/Downloads/h2ogpt-osx-m1-gpu ; exit;
file: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/mac_run_app.py
PYTHONPATH: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ
Path_1: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ
NLTK_DATA: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/nltk_data
PATH: /opt/homebrew/opt/python@3/libexec/bin:/Users/pleng/miniconda3/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/poppler/bin/:/var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/poppler/lib/:/var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/Tesseract-OCR
Path_3: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt/src
Path_3: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt/iterators
Path_3: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt/gradio_utils
Path_3: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt/metrics
Path_3: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt/models
Path_3: /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt/.
gradio/components/dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: None or set allow_custom_value=True.
warnings.warn(
favicon_path1=h2o-logo.svg not found
favicon_path2: h2o-logo.svg not found in /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt/src
favicon_path3: h2o-logo.svg not found in /var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt
Running on local URL: http://0.0.0.0:7860
Here is the trace for loading model error:
Starting get_model: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
Overriding max_seq_len -> 2048
Traceback (most recent call last):
File "gradio/queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "gradio/blocks.py", line 1550, in process_api
result = await self.call_function(
File "gradio/blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
File "anyio/to_thread.py", line 33, in run_sync
File "anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
File "anyio/_backends/_asyncio.py", line 807, in run
File "gradio/utils.py", line 661, in wrapper
response = f(*args, **kwargs)
File "/var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt/src/gradio_runner.py", line 3975, in load_model
model1, tokenizer1, device1 = get_model(reward_type=False,
File "/var/folders/mk/wbc3qtg57m33g06t4nh4r1zc0000gn/T/_MEIYgo7DZ/h2ogpt/src/gen.py", line 1805, in get_model
tokenizer = tokenizer_loader.from_pretrained(tokenizer_base_model, **tokenizer_kwargs)
File "transformers/models/auto/tokenization_auto.py", line 769, in from_pretrained
File "transformers/tokenization_utils_base.py", line 2001, in from_pretrained
OSError: Can't load tokenizer for 'TheBloke/Mistral-7B-Instruct-v0.1-GGUF'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'TheBloke/Mistral-7B-Instruct-v0.1-GGUF' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.
For the 'TheBloke/Mistral-7B-Instruct-v0.1-GGUF' model it shouldn't be reaching get_hf_model.
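Since GGUF repos ship only quantized llama.cpp weights and carry no Hugging Face tokenizer files, the loader needs to branch before AutoTokenizer is ever reached. A minimal sketch of such a guard — the function names and dispatch here are hypothetical illustrations, not h2oGPT's actual code:

```python
def is_gguf_model(base_model: str) -> bool:
    """Heuristic: treat repos/files named *GGUF* or *.gguf as llama.cpp
    artifacts, which have no Hugging Face tokenizer to download."""
    last = base_model.rstrip("/").split("/")[-1].lower()
    return last.endswith(".gguf") or "gguf" in last

def pick_loader(base_model: str) -> str:
    # Hypothetical dispatch: route GGUF models to the llama.cpp path
    # instead of falling through to the HF tokenizer loader.
    return "llama.cpp" if is_gguf_model(base_model) else "hf"
```

With a guard like this, 'TheBloke/Mistral-7B-Instruct-v0.1-GGUF' would be routed to the llama.cpp path rather than raising the tokenizer OSError shown in the traceback.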
For now I recommend you use the manual install for Mac.
@Mathanraj-Sharma Please rebuild the one-click mac installer, I think it's too out of date. Thanks!