Comments (4)
@UmutAlihan we've actually been building out a test farm to better catch these issues before we release, but there are a lot of different permutations to test. Stability is incredibly important to us.
That said, in 0.1.33 we tried to improve our memory calculation to pack models in more efficiently. In some cases we weren't reserving enough space, and layers were being allocated to the GPU when they should have been allocated to the CPU. The problem is that if we're too conservative, performance suffers because more layers get sent to the CPU, and there will be a dozen issues from people complaining about slow performance.
Unfortunately I don't have a 4070 Ti Super to test on. I think what's happening is that the model is close to the size of your VRAM and we're not calculating the memory graph correctly with gemma. I'll double-check with some other people on the team.
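The trade-off described above can be sketched roughly as follows. This is a hypothetical illustration, not Ollama's actual scheduler: the function name, the uniform per-layer size, and the fixed overhead margin are all assumptions made for the example. The point is just that the margin cuts both ways: underestimate it and GPU allocation fails; overestimate it and layers spill to the CPU and inference slows down.

```python
def plan_gpu_layers(num_layers: int, layer_bytes: int,
                    vram_bytes: int, overhead_bytes: int) -> int:
    """Decide how many model layers to offload to the GPU.

    Reserve a safety margin (overhead_bytes) for things like the KV
    cache and compute buffers, then pack as many layers as fit into
    what remains. Too small a margin over-commits the GPU (allocation
    failures); too large a margin pushes layers to the CPU needlessly.
    """
    usable = vram_bytes - overhead_bytes
    if usable <= 0:
        return 0  # nothing fits; run fully on CPU
    return min(num_layers, usable // layer_bytes)

# Example: 33 layers of ~250 MB each on an 8 GiB card with a 1.5 GiB margin.
GiB = 1024 ** 3
print(plan_gpu_layers(33, 250 * 1024 ** 2, 8 * GiB, int(1.5 * GiB)))  # → 26
```

With these made-up numbers, 26 of the 33 layers land on the GPU and the rest go to the CPU; shrinking the margin to near zero would place all 33 on the GPU but risks exactly the kind of out-of-memory failure being reported here.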
from ollama.
yes, after the 0.1.33 release many things have broken
unfortunately I think contributors are trying to move so fast that they are unable to test with enough coverage or write clean, quality code
I was very hopeful for Ollama and its community, but if this FOMO release cycle keeps breaking things I might need to go back to LiteLLM or other alternatives :'/
well, thank you for the detailed response 🫡
I am using 2x 3060s, and yes, llama3 8b is loading into the 24 GB of VRAM with around 80% utilization. So I can assume your root-cause analysis is correct, and I hope more users would prefer stability over performance 🙏
> yes, after the 0.1.33 release many things have broken
> unfortunately I think contributors are trying to move so fast that they are unable to test with enough coverage or write clean, quality code
> I was very hopeful for Ollama and its community, but if this FOMO release cycle keeps breaking things I might need to go back to LiteLLM or other alternatives :'/

100%
Related Issues (20)
- "Mock" model
- Gemma2:27b start to output repetive trash after few generations
- Ollama stderr returns info logs
- Gemma 2 9B cannot run
- gguf success,but run error
- [BUG]: Gemma2 crashes on run.
- Ollama updates don't choose proper proxy
- Groq's "name" option within "messages" parameter of the chat endpoint payload
- allow for num_ctx parameter in the openai API compatibility
- LLM Compiler Models
- Both Gemma2 model fail with cudaMalloc error despite available GPU memory, while other models run successfully.
- Support for Snapdragon X Elite NPU & GPU
- Ollama running very slow on Windows
- allow temperature to be set on command line ( w/out using a modelfile )
- Error: llama runner process has terminated: signal: aborted (core dumped)
- run gemma2 error
- OpenAI Chat Compatibility Incorrect Prompt Eval
- Ollama Run provides numerical choice to run one of models from list
- Referring offline downloaded models in code
- dolphin-phi3 and dolphin-qwen2