Comments (6)

chiragbharambe commented on July 17, 2024

Thanks, that worked.
Where can I find all the parameters? The one you mentioned isn't listed here: https://github.com/ollama/ollama/blob/main/docs/modelfile.md

Defaults

  • 8 GPU layers for gemma2:27b
  • 22 GPU layers for gemma2:9b

What worked

  • 6 GPU layers for gemma2:27b
  • 19 GPU layers for gemma2:9b

gemma2l (Modelfile):

    FROM gemma2:27b
    PARAMETER num_gpu 6

ollama create gemma2l -f pathtofile/gemma2l

gemma2s (Modelfile):

    FROM gemma2
    PARAMETER num_gpu 19

ollama create gemma2s -f pathtofile/gemma2s
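
For a quick experiment, the same value can also be changed in an interactive session instead of creating a derived model (a minimal sketch; as far as I know, /set parameter only affects the current session):

    # Start an interactive session with the base model
    ollama run gemma2:27b
    # Then, at the >>> prompt, lower the GPU layer count for this session:
    /set parameter num_gpu 6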

Logs

Jun 28 13:40:14 archlinux ollama[211707]: time=2024-06-28T13:40:14.215+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama4270095903/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 33021"
Jun 28 13:40:36 archlinux ollama[211707]: time=2024-06-28T13:40:36.915+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama4270095903/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 18 --parallel 1 --port 36953"
Jun 28 13:40:51 archlinux ollama[211707]: time=2024-06-28T13:40:51.293+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama4270095903/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-b6ee2328408ebc031359e9745973b09963df9269468d37e1ea7912862aadec72 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 8 --parallel 1 --port 33949"
Jun 28 13:44:57 archlinux ollama[1036]: time=2024-06-28T13:44:57.033+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 41111"
Jun 28 13:45:39 archlinux ollama[1036]: time=2024-06-28T13:45:39.994+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 41445"
Jun 28 13:53:46 archlinux ollama[1036]: time=2024-06-28T13:53:46.235+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-b6ee2328408ebc031359e9745973b09963df9269468d37e1ea7912862aadec72 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 8 --parallel 1 --port 45083"
Jun 28 14:08:13 archlinux ollama[1036]: time=2024-06-28T14:08:13.101+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-b26e6713dc749dda35872713fa19a568040f475cc71cb132cff332fe7e216462 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 31 --parallel 1 --port 41101"
Jun 28 14:11:10 archlinux ollama[1036]: time=2024-06-28T14:11:10.351+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 42557"
Jun 28 14:23:09 archlinux ollama[1036]: time=2024-06-28T14:23:09.476+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 43707"
Jun 28 14:26:33 archlinux ollama[1036]: time=2024-06-28T14:26:33.030+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 1024 --batch-size 512 --embedding --log-disable --n-gpu-layers 25 --parallel 1 --port 45287"
Jun 28 14:27:35 archlinux ollama[1036]: time=2024-06-28T14:27:35.763+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 18 --parallel 1 --port 43549"
Jun 28 15:24:47 archlinux ollama[1036]: time=2024-06-28T15:24:47.997+02:00 level=INFO source=server.go:368 msg="starting llama server" cmd="/tmp/ollama2544715102/runners/cuda_v12/ollama_llama_server --model /var/lib/ollama/.ollama/models/blobs/sha256-e84ed7399c82fbf7dbd6cdef3f12d356c3cdb5512e5d8b2a9898080cbcdd72e5 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 22 --parallel 1 --port 35471"

chiragbharambe commented on July 17, 2024

It looks like a VRAM issue. But why can't this model run on my system when Mixtral runs with reasonable performance? Can some parameters be changed to make it run?

Jun 28 13:53:46 archlinux ollama[1036]: ggml_cuda_init: found 1 CUDA devices:
Jun 28 13:53:46 archlinux ollama[1036]:   Device 0: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability 8.6, VMM: yes
Jun 28 13:53:46 archlinux ollama[1036]: llm_load_tensors: ggml ctx size =    0.49 MiB
Jun 28 13:53:54 archlinux ollama[1036]: llm_load_tensors: offloading 8 repeating layers to GPU
Jun 28 13:53:54 archlinux ollama[1036]: llm_load_tensors: offloaded 8/47 layers to GPU
Jun 28 13:53:54 archlinux ollama[1036]: llm_load_tensors:        CPU buffer size = 14898.60 MiB
Jun 28 13:53:54 archlinux ollama[1036]: llm_load_tensors:      CUDA0 buffer size =  2430.56 MiB
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: n_ctx      = 2048
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: n_batch    = 512
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: n_ubatch   = 512
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: flash_attn = 0
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: freq_base  = 10000.0
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: freq_scale = 1
Jun 28 13:53:55 archlinux ollama[1036]: llama_kv_cache_init:  CUDA_Host KV buffer size =   608.00 MiB
Jun 28 13:53:55 archlinux ollama[1036]: llama_kv_cache_init:      CUDA0 KV buffer size =   128.00 MiB
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: KV self size  =  736.00 MiB, K (f16):  368.00 MiB, V (f16):  368.00 MiB
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model:  CUDA_Host  output buffer size =     0.99 MiB
Jun 28 13:53:55 archlinux ollama[1036]: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1431.85 MiB on device 0: cudaMalloc failed: out of memory
Jun 28 13:53:55 archlinux ollama[1036]: ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 1501405184
Jun 28 13:53:55 archlinux ollama[1036]: llama_new_context_with_model: failed to allocate compute buffers
Jun 28 13:53:55 archlinux ollama[1036]: llama_init_from_gpt_params: error: failed to create context with model '/var/lib/ollama/.ollama/models/blobs/sha256-b6ee2328408ebc031359e9745973b09963df9269468d37e1ea7912862aadec72'
Jun 28 13:53:56 archlinux ollama[16132]: ERROR [load_model] unable to load model | model="/var/lib/ollama/.ollama/models/blobs/sha256-b6ee2328408ebc031359e9745973b09963df9269468d37e1ea7912862aadec72" tid="134527161749504" timestamp=1719575636

chiragbharambe commented on July 17, 2024

gemma2s (Modelfile):

    FROM gemma2
    PARAMETER num_ctx 1024

The model does not load even with the smaller num_ctx of 1024; same error.
It says out of memory, but it isn't: nearly the entire 4096 MB is available (13/4096 MB used).
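
For reference, the free-VRAM figure can be checked while the model is loading (a minimal sketch; assumes the NVIDIA driver utilities are installed):

    # Report used and total VRAM on the GPU
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv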

rick-github commented on July 17, 2024

1ed4f52 resolves (for me) the OOM during model load. Without that patch, you can still get the model to load by setting num_gpu lower than the default (search the logs for --n-gpu-layers to see what the default value is for your config).
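
A minimal sketch of that log search, assuming ollama runs as a systemd service (as in the logs above):

    # List the --n-gpu-layers values the server has used, with counts
    journalctl -u ollama --no-pager | grep -oE 'n-gpu-layers [0-9]+' | sort | uniq -c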

rick-github commented on July 17, 2024

The available options are listed in the API doc, under the Request section: https://github.com/ollama/ollama/blob/main/docs/api.md
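For example, options go in the request body (a minimal sketch, assuming the default server address; the values are illustrative only):

    curl http://localhost:11434/api/generate -d '{
      "model": "gemma2:27b",
      "prompt": "Why is the sky blue?",
      "options": { "num_gpu": 6, "num_ctx": 2048 }
    }'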

chiragbharambe commented on July 17, 2024

Thanks.
