
llama_cpp.rb's Introduction

llama_cpp.rb


llama_cpp.rb provides Ruby bindings for llama.cpp.

This gem is still under development and may undergo many changes in the future.

Installation

Install the gem and add to the application's Gemfile by executing:

$ bundle add llama_cpp

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install llama_cpp

There are several installation options:

# use OpenBLAS
$ gem install llama_cpp -- --with-openblas

# use CUDA
$ gem install llama_cpp -- --with-cuda

These options are handled in extconf.rb via the with_config method.
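
For reference, with_config reads --with-... flags passed after the -- separator. A minimal sketch of the conventional mkmf pattern (the gem's actual extconf.rb and extension target name may differ):

require 'mkmf'

if with_config('openblas')
  # --with-openblas was passed: link against OpenBLAS and enable it in ggml.
  abort 'libopenblas is not found.' unless have_library('openblas')
  $CFLAGS << ' -DGGML_USE_OPENBLAS'
end

create_makefile('llama_cpp/llama_cpp') # assumption: the real target name may differ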

Usage

Prepare a quantized model by referring to the usage section of the llama.cpp README. For example, preparing a quantized model based on open_llama_7b looks like this:

$ cd ~/
$ brew install git-lfs
$ git lfs install
$ git clone https://github.com/ggerganov/llama.cpp.git
$ cd llama.cpp
$ python3 -m pip install -r requirements.txt
$ cd models
$ git clone https://huggingface.co/openlm-research/open_llama_7b
$ cd ../
$ python3 convert.py models/open_llama_7b
$ make
$ ./quantize ./models/open_llama_7b/ggml-model-f16.gguf ./models/open_llama_7b/ggml-model-q4_0.bin q4_0

An example of Ruby code that generates sentences with the quantized model is as follows:

require 'llama_cpp'

model_params = LLaMACpp::ModelParams.new
model = LLaMACpp::Model.new(model_path: '/home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin', params: model_params)

context_params = LLaMACpp::ContextParams.new
context_params.seed = 42
context = LLaMACpp::Context.new(model: model, params: context_params)

puts LLaMACpp.generate(context, 'Hello, World.')

Examples

There is a sample program in the examples directory that allows interactive communication, like ChatGPT.

$ git clone https://github.com/yoshoku/llama_cpp.rb.git
$ cd llama_cpp.rb/examples
$ bundle install
$ ruby chat.rb --model /home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin --seed 2023
...
User: Who is the originator of the Ruby programming language?
Bob: The originator of the Ruby programming language is Mr. Yukihiro Matsumoto.
User:


Japanese chat is also possible using the Vicuna model on Hugging Face.

$ wget https://huggingface.co/CRD716/ggml-vicuna-1.1-quantized/resolve/main/ggml-vicuna-7b-1.1-q4_0.bin
$ ruby chat.rb --model ggml-vicuna-7b-1.1-q4_0.bin --file prompt_jp.txt


Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/llama_cpp.rb. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the LlamaCpp project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.


llama_cpp.rb's Issues

Generating embeddings from LLM

I don't think we can call @client.embeddings with llama_cpp.rb.

I'll create a PR if I can get to it, but I thought it was important enough to mention.
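
For reference, some releases of these bindings do expose embeddings when the context is created with embedding output enabled. A hedged sketch, assuming ContextParams#embedding=, Model#tokenize, Context#eval, and Context#embeddings exist under these names in your version (check the gem's documentation):

require 'llama_cpp'

params = LLaMACpp::ContextParams.new
params.embedding = true # assumption: enables embedding output, mirroring llama.cpp

model = LLaMACpp::Model.new(model_path: 'ggml-model-q4_0.bin',
                            params: LLaMACpp::ModelParams.new)
context = LLaMACpp::Context.new(model: model, params: params)

tokens = model.tokenize(text: 'Hello, World.', add_bos: true) # assumption: tokenize lives on Model
context.eval(tokens: tokens, n_past: 0)                       # assumption: eval signature
p context.embeddings # => Array of Floats, one per embedding dimension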

Expose more llama.cpp options

I've been testing https://huggingface.co/TheBloke/wizardLM-7B-GGML, and the example code is:

./main -t 10 -ngl 32 -m wizardLM-7B.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"

A few of these options (and others) would be useful to expose; a hypothetical sketch follows the list:

  • temp
  • repeat_penalty
  • top-k
  • top-p
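
For illustration only, here is what a call could look like if these options were exposed as keyword arguments on generate. None of these keywords are confirmed by this README; they simply mirror the CLI flags above:

# Hypothetical keywords mirroring the llama.cpp CLI flags; not a confirmed API.
puts LLaMACpp.generate(context, '### Instruction: Write a story about llamas\n### Response:',
                       n_predict: 256,      # -n
                       temperature: 0.7,    # --temp
                       repeat_penalty: 1.1, # --repeat_penalty
                       top_k: 40,           # --top-k
                       top_p: 0.9)          # --top-p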

Chat.rb example: Crash when prompt exceeds context?

When n_past plus the size of the upcoming embd array exceeds the context size, there is a bug in the code:

if n_past + embd.size > n_ctx
  ...
  embd.insert(0, last_n_tokens[(n_ctx - (n_left / 2) - embd.size)...-embd.size])

https://github.com/yoshoku/llama_cpp.rb/blob/97224779aff9923f357f0ad141604c1d3fbfff56/examples/chat.rb#L68C21-L68C21

Inserting like this puts the sub-range into position 0 as a single nested element rather than splicing in its elements.
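
A pure-Ruby illustration of the difference:

a = [1, 2, 3]
a.insert(0, [8, 9])   # => [[8, 9], 1, 2, 3]  (sub-array nested as one element)

b = [1, 2, 3]
b.insert(0, *[8, 9])  # => [8, 9, 1, 2, 3]    (elements spliced in)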

I tried using splat as follows:

  embd.insert(0, *last_n_tokens[(n_ctx - (n_left / 2) - embd.size)...-embd.size])

but this makes the GGML code crash:

GGML_ASSERT: ./src/ggml.c:4785: view_src == NULL || data_size + view_offs <= ggml_nbytes(view_src)

How to ensure CUDA is working?

I installed the gem using --cublas, but I don't see any difference when running. Is there a way to know that cuBLAS is being used?
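
One hedged way to check: offload layers to the GPU and watch the model load log. This assumes ModelParams exposes n_gpu_layers, mirroring llama.cpp's llama_model_params.n_gpu_layers (check your gem version):

model_params = LLaMACpp::ModelParams.new
model_params.n_gpu_layers = 32 # assumption: offloads 32 layers when built with CUDA
model = LLaMACpp::Model.new(model_path: 'ggml-model-q4_0.bin', params: model_params)
# With a working CUDA build, the load log should mention CUDA/cuBLAS
# initialization and report layers offloaded to the GPU; a CPU-only build
# prints no such lines and n_gpu_layers has no effect.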

Apple Metal support

I believe adding the Metal source files from the llama.cpp repository to src, plus some small changes to extconf.rb, should be sufficient to build the native extension with Metal support, in case anyone runs Sidekiq queues on Mac devices to generate responses up to 10x faster.

https://github.com/ggerganov/llama.cpp

I couldn't manage to build the gem even without changing the code. I always get "failed to build native extensions" with a ".o files are not found" error.

I will try to create a pull request if I can figure out why I get the ".o files are not found" error.

Just wanted to bring it up in case it's a five-minute thing to add on your end, since you created this gem. I can do my best to help with testing or wherever else you need help.
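
For anyone trying this later: some releases of the gem appear to accept a --with-metal build flag ($ gem install llama_cpp -- --with-metal), but whether a given version supports it is an assumption worth confirming in its extconf.rb.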

Improving the quality of the results

Hi,

I'm using this with langchainrb (https://github.com/andreibondarev/langchainrb).

The purpose is to chain user-added text as "context" and create a response based on the "prompt".

  1. First, I store user-supplied "extra documents" and "text files" in a VectorStore (a locally running Chroma DB) after chopping them with:
     splitter = Baran::CharacterTextSplitter.new(chunk_size: 500,
                                                 chunk_overlap: 50,
                                                 separator: "")
  2. I implemented my own LangchainLlamaCpp class on the langchainrb gem to use this llama_cpp.rb gem as a new LLM, so everything runs locally and offline instead of going through any online GPT APIs.
  3. In that new class, I required llama_cpp.rb and then use:
     params = LLaMACpp::ContextParams.new
     params.n_ctx = DEFAULTS[:n_ctx]
     client = LLaMACpp::Context.new(model_path: "model.bin",
                                    params: params)
     LLaMACpp.generate(@client, prompt, n_threads: 4)
  4. After that, langchainrb chains the query to the VectorStore (Chroma DB) and then to the LLM (llama_cpp.rb).
  5. While I get partially correct answers related to the context I provided in the files, the answer quality is not great compared to this Python project: https://github.com/imartinez/privateGPT. That one is very slow, though, so I avoid it; my Ruby code takes about 15-30 seconds versus 200-300 seconds for the Python code.
  6. I also call sentence-transformers from Ruby with backticks (a system call to Python) to create embeddings with MiniLM-L6-v2, as I couldn't figure out how to work with embeddings and sentence transformers in Ruby. There is no Ruby gem that does what sentence-transformers does in Python.

Long story short, I reimplemented the privateGPT code (https://github.com/imartinez/privateGPT) in Ruby, following its exact structure and copying everything it does. It works, but with lower quality. I'm using the Vicuna 13B model.

I'm not sure which settings to customize to get higher-quality answers. Some of my answers also include special characters like [ } @ ' " ` that make no sense in a readable sentence, so I wonder if I'm failing to escape something. That doesn't happen with the Python code mentioned above.

Do I need to monkey patch the generate method to change some values there (temperature, etc.)?
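
One hedged alternative to monkey patching, assuming the gem's generate helper hard-codes its sampling values: copy that helper out of lib/llama_cpp.rb into your own method and turn the values you want to tune into keyword arguments. The outline below is a skeleton, not the gem's actual implementation:

# Skeleton only: mirror the body of LLaMACpp.generate from your installed
# gem's lib/llama_cpp.rb here, replacing its hard-coded sampling constants
# with these keyword arguments.
def my_generate(context, prompt, temperature: 0.7, top_k: 40, top_p: 0.9, repeat_penalty: 1.1)
  # 1. tokenize the prompt
  # 2. feed the tokens through the context
  # 3. sample each next token using the settings above
  # 4. stop at EOS or a length limit, then join the decoded pieces
end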

getting error: dyld[12182]: missing symbol called

I tried 0.7.1 and could not get it to work; same with 0.8.0, though that was after making some fixes to get 0.8.0 to compile.
The gem version that works so far is llama_cpp (0.6.0).

This is on an M2 Mac Studio. 0.6.0 works like a charm.

Here is the error I'm getting:
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.10 MB
llm_load_tensors: mem required = 12853.11 MB
...................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 256.00 MB
dyld[12182]: missing symbol called
zsh: abort      ruby chat.rb main -m ../../llama.cpp/models/open_llama_7b/ggml-model-f16.gguf

Here is the exception logged by macOS:

Translated Report (Full Report Below)

Process: ruby [12182]
Path: /Users/USER/*/ruby
Identifier: ruby
Version: ???
Code Type: ARM-64 (Native)
Parent Process: zsh [2215]
Responsible: Terminal [1354]
User ID: 501

Date/Time: 2023-10-20 18:44:40.9678 -0700
OS Version: macOS 13.6 (22G120)
Report Version: 12
Anonymous UUID: A7E5B526-EC9A-DB18-A565-A255B30590AB

Sleep/Wake UUID: 57F5599C-F2A5-4D96-A2EE-912B4A679257

Time Awake Since Boot: 480000 seconds
Time Since Wake: 404018 seconds

System Integrity Protection: enabled

Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000

Termination Reason: Namespace DYLD, Code 4 Symbol missing
missing symbol called
(terminated at launch; ignore backtrace)

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 dyld 0x18b891118 __abort_with_payload + 8
1 dyld 0x18b89cd7c abort_with_payload_wrapper_internal + 104
2 dyld 0x18b89cdb0 abort_with_payload + 16
3 dyld 0x18b8288a8 dyld4::halt(char const*) + 328
4 dyld 0x18b85e770 dyld4::APIs::_dyld_missing_symbol_abort() + 44
5 llama_cpp.bundle 0x103ae1c14 ggml_allocr_new_measure + 40
6 llama_cpp.bundle 0x103aee6f4 llama_new_context_with_model + 1072
7 llama_cpp.bundle 0x103ba1fa4 RbLLaMAContext::_llama_context_initialize(int, unsigned long*, unsigned long) + 456
8 libruby.3.2.dylib 0x100fef440 vm_call0_body + 980
9 libruby.3.2.dylib 0x101005e00 rb_call0 + 764
10 libruby.3.2.dylib 0x100eeba74 rb_class_new_instance_pass_kw + 60
11 libruby.3.2.dylib 0x101000684 vm_call_cfunc_with_frame + 232
12 libruby.3.2.dylib 0x100fe5ffc vm_exec_core + 8132
13 libruby.3.2.dylib 0x100ff7c24 rb_vm_exec + 2092
14 libruby.3.2.dylib 0x100e53430 rb_ec_exec_node + 300
15 libruby.3.2.dylib 0x100e5329c ruby_run_node + 96
16 ruby 0x100923f34 main + 104
17 dyld 0x18b823f28 start + 2236

Thread 1:
0 libsystem_kernel.dylib 0x18bb44854 poll + 8
1 libruby.3.2.dylib 0x100fba7f4 timer_pthread_fn + 172
2 libsystem_pthread.dylib 0x18bb7bfa8 _pthread_start + 148
3 libsystem_pthread.dylib 0x18bb76da0 thread_start + 8

Thread 0 crashed with ARM Thread State (64-bit):
x0: 0x0000000000000006 x1: 0x0000000000000004 x2: 0x000000016f4dd240 x3: 0x0000000000000014
x4: 0x000000016f4dce40 x5: 0x0000000000000000 x6: 0x0000000000000000 x7: 0x0000000000000000
x8: 0x0000000000000020 x9: 0x0000000000000009 x10: 0x0000000000000001 x11: 0x000000000000000a
x12: 0x0000000000000000 x13: 0x0000000000000031 x14: 0x0000000000000000 x15: 0x0000000000000000
x16: 0x0000000000000209 x17: 0x000000018b82135c x18: 0x0000000000000000 x19: 0x0000000000000000
x20: 0x000000016f4dce40 x21: 0x0000000000000014 x22: 0x000000016f4dd240 x23: 0x0000000000000004
x24: 0x0000000000000006 x25: 0x0000000100b0d460 x26: 0x0000000148127e00 x27: 0x0000600002dabab0
x28: 0x0000000055550483 fp: 0x000000016f4dce10 lr: 0x000000018b89cd7c
sp: 0x000000016f4dcdd0 pc: 0x000000018b891118 cpsr: 0x00001000
far: 0x0000000103ae1bec esr: 0x56000080 Address size fault

Binary Images:
0x100920000 - 0x100923fff ruby (*) <c45ef933-a35f-350c-a477-73307d97d2a7> /Users/USER/*/ruby
0x100dc0000 - 0x1010e7fff libruby.3.2.dylib (*) <f442946f-f472-3eb2-a5b4-8f7568d6fc01> /Users/USER/*/libruby.3.2.dylib
0x1009e0000 - 0x100a37fff libgmp.10.dylib (*) <efc29ca3-3b2a-3664-976d-b890d76d3d08> /opt/homebrew/*/libgmp.10.dylib
0x100a90000 - 0x100a93fff encdb.bundle (*) <3b70e461-f912-39be-acc8-28ed3b8d8b37> /Users/USER/*/encdb.bundle
0x100ab0000 - 0x100ab3fff transdb.bundle (*) <b3cf6094-5adf-3816-a573-8133784d8189> /Users/USER/*/transdb.bundle
0x100ad0000 - 0x100ad3fff monitor.bundle (*) <1aeaaee6-aa9b-3d43-b50a-677581f5915d> /Users/USER/*/monitor.bundle
0x103aa8000 - 0x103bd3fff llama_cpp.bundle (*) <20027139-d0f5-3535-a3a0-d884c06aefd3> /Users/USER/*/llama_cpp.bundle
0x103898000 - 0x10389ffff readline.bundle (*) <bb8307d5-5172-3a48-afea-463526ee30de> /Users/USER/*/readline.bundle
0x103900000 - 0x10392bfff libreadline.8.2.dylib (*) <d5547b5e-49ef-3c81-8ab5-2bdd014e318f> /opt/homebrew/*/libreadline.8.2.dylib
0x18b81e000 - 0x18b8ac587 dyld (*) <49204446-242b-3d1e-9704-32f8ac99723e> /usr/lib/dyld
0x18bb3b000 - 0x18bb74fe7 libsystem_kernel.dylib (*) <1adb8ddc-762b-3b9f-a290-ca1e5ee7b419> /usr/lib/system/libsystem_kernel.dylib
0x18bb75000 - 0x18bb81fff libsystem_pthread.dylib (*) <1f30fb9a-bdf9-32db-a709-8417666a7e45> /usr/lib/system/libsystem_pthread.dylib

External Modification Summary:
Calls made by other processes targeting this process:
task_for_pid: 0
thread_create: 0
thread_set_state: 0
Calls made by this process:
task_for_pid: 0
thread_create: 0
thread_set_state: 0
Calls made by all processes on this machine:
task_for_pid: 0
thread_create: 0
thread_set_state: 0

VM Region Summary:
ReadOnly portion of Libraries: Total=1.0G resident=0K(0%) swapped_out_or_unallocated=1.0G(100%)
Writable regions: Total=1.8G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=1.8G(100%)

                            VIRTUAL   REGION 

REGION TYPE SIZE COUNT (non-coalesced)
=========== ======= =======
Activity Tracing 256K 1
Kernel Alloc Once 32K 1
MALLOC 535.2M 35
MALLOC guard page 96K 5
MALLOC_MEDIUM (reserved) 840.0M 7 reserved VM address space (unallocated)
MALLOC_NANO (reserved) 384.0M 1 reserved VM address space (unallocated)
STACK GUARD 56.0M 2
Stack 8720K 2
VM_ALLOCATE 39.4M 103
__AUTH 576K 134
__AUTH_CONST 11.1M 274
__DATA 3316K 270
__DATA_CONST 14.0M 284
__DATA_DIRTY 680K 98
__FONT_DATA 2352 1
__LINKEDIT 804.9M 10
__OBJC_RO 66.4M 1
__OBJC_RW 2012K 1
__TEXT 217.0M 296
dyld private memory 272K 2
mapped file 12.6G 2
shared memory 32K 2
=========== ======= =======
TOTAL 15.5G 1532
TOTAL, minus reserved VM space 14.3G 1532



Multimodal prompting?

Hello there! Thanks for providing this Gem!

I'm new to llama.cpp and was wondering how I might use the following llava-cli command with the Ruby bindings:

./llava-cli -m /Users/pulleasy/.cache/lm-studio/models/AI-Engine/BakLLaVA1-MistralLLaVA-7B-GGUF/BakLLaVA1-MistralLLaVA-7B.q5_K_M.gguf --mmproj /Users/pulleasy/.cache/lm-studio/models/AI-Engine/BakLLaVA1-MistralLLaVA-7B-GGUF/BakLLaVA1-clip-mmproj-model-f16.gguf --image ~/Downloads/test2.jpeg --temp 0.1

Any pointers on where to inject the mmproj and image parameters would be highly appreciated 🙏

ASCII-8BIT instead of UTF-8

I notice that the output of token_to_piece is ASCII-8BIT:

irb(main):167:0> context.model.token_to_piece(10).encoding
=> #<Encoding:ASCII-8BIT>

It is certainly easy to run .encode('UTF-8', invalid: :replace, undef: :replace, replace: "") but perhaps this is unsafe. Is there a reason we get ASCII-8BIT out, and what's the best way to handle it? (My understanding is that LLMs are generally trained on UTF-8.)
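
A plausible reason, offered as an assumption about how llama-family tokenizers work rather than anything confirmed here: token pieces are raw bytes, and a single multi-byte UTF-8 character can span two tokens, so an individual piece may not be valid UTF-8 on its own; returning ASCII-8BIT leaves the decoding decision to the caller. A safe pattern is to accumulate bytes and convert once (token_ids below stands for whatever token sequence you decoded):

buf = ''.b # binary (ASCII-8BIT) buffer
token_ids.each { |t| buf << context.model.token_to_piece(t) }

text = buf.force_encoding(Encoding::UTF_8)
text = text.scrub('') unless text.valid_encoding? # drop any trailing partial character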

Error when trying to run the chat.rb example.

When trying to run the chat example I am getting an error.

ruby chat.rb --model /playingwithai/models/llama-2-7b-chat.Q8_0.gguf
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /playingwithai/models/llama-2-7b-chat.Q8_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.context_length u32 = 4096
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 7
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 18: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 4096
llm_load_print_meta: n_embd_v_gqa = 4096
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 6.74 B
llm_load_print_meta: model size = 6.67 GiB (8.50 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: system memory used = 6828.75 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 73.69 MiB
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
chat.rb:73:in `block in main': undefined method `get_one' for LLaMACpp::Batch:Class (NoMethodError)

      context.decode(LLaMACpp::Batch.get_one(tokens: embd[i...(i + n_eval)], n_tokens: n_eval, pos_zero: n_past, seq_id: 0))
                                    ^^^^^^^^
from chat.rb:71:in `step'
from chat.rb:71:in `main'
from /.asdf/installs/ruby/3.2.1/lib/ruby/gems/3.2.0/gems/thor-1.3.0/lib/thor/command.rb:28:in `run'
from /.asdf/installs/ruby/3.2.1/lib/ruby/gems/3.2.0/gems/thor-1.3.0/lib/thor/invocation.rb:127:in `invoke_command'
from /.asdf/installs/ruby/3.2.1/lib/ruby/gems/3.2.0/gems/thor-1.3.0/lib/thor.rb:527:in `dispatch'
from /.asdf/installs/ruby/3.2.1/lib/ruby/gems/3.2.0/gems/thor-1.3.0/lib/thor/base.rb:584:in `start'
from chat.rb:196:in `<main>'

I am happy to dig around, but I'm not really sure where to start. 😓
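
A plausible starting point, offered as an assumption rather than a confirmed diagnosis: chat.rb on the repository's main branch tracks the newest gem API, and the NoMethodError suggests the installed gem predates the Batch.get_one binding the example calls, so upgrading the gem (or checking out the examples revision matching your installed version) may clear it.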

How do I get started?

How do I install models, and how do I use them to make a simple chat or whatever?
