Comments (4)
@UmutAlihan we've actually been building out a test farm to better catch these issues before we release, but there are a lot of different permutations to test. Stability is incredibly important to us.
That said, in 0.1.33 we tried to improve our memory calculation to pack models in more efficiently. In some cases we weren't reserving enough space, and layers were being allocated to the GPU when they should have been allocated to the CPU. The problem is that if we're too conservative, performance suffers because more layers get sent to the CPU, and there will be a dozen issues from people complaining about slow performance.
Unfortunately I don't have a 4070 Ti Super to test on. I think what's happening is that the model is close to the size of your VRAM and we're not calculating the memory graph correctly with gemma. I'll double-check with some other people on the team.
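The trade-off described above can be sketched roughly as follows. This is a hypothetical illustration, not Ollama's actual scheduler: the function name, the uniform per-layer size, and the fixed overhead margin are all assumptions made for the example. The point is just that the margin cuts both ways: underestimate it and GPU allocation fails; overestimate it and layers spill to the CPU and inference slows down.

```python
def plan_gpu_layers(num_layers: int, layer_bytes: int,
                    vram_bytes: int, overhead_bytes: int) -> int:
    """Decide how many model layers to offload to the GPU.

    Reserve a safety margin (overhead_bytes) for things like the KV
    cache and compute buffers, then pack as many layers as fit into
    what remains. Too small a margin over-commits the GPU (allocation
    failures); too large a margin pushes layers to the CPU needlessly.
    """
    usable = vram_bytes - overhead_bytes
    if usable <= 0:
        return 0  # nothing fits; run fully on CPU
    return min(num_layers, usable // layer_bytes)

# Example: 33 layers of ~250 MB each on an 8 GiB card with a 1.5 GiB margin.
GiB = 1024 ** 3
print(plan_gpu_layers(33, 250 * 1024 ** 2, 8 * GiB, int(1.5 * GiB)))  # → 26
```

With these made-up numbers, 26 of the 33 layers land on the GPU and the rest go to the CPU; shrinking the margin to near zero would place all 33 on the GPU but risks exactly the kind of out-of-memory failure being reported here.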
from ollama.
yes, after the 0.1.33 release many things have broken
unfortunately I think contributors are trying to move so fast that they are unable to test with enough coverage or write clean, quality code
I was very hopeful for Ollama and its community, but if this FOMO release cycle keeps breaking things I might need to go back to LiteLLM or other alternatives :'/
well, thank you for the detailed response 🫡
I am using 2x 3060s, and yes, llama3 8b is loading into the 24 GB of VRAM with around 80% utilization. So I can assume your root-cause analysis is correct, and I hope more users would prefer stability over performance 🙏
> yes, after the 0.1.33 release many things have broken
> unfortunately I think contributors are trying to move so fast that they are unable to test with enough coverage or write clean, quality code
> I was very hopeful for Ollama and its community, but if this FOMO release cycle keeps breaking things I might need to go back to LiteLLM or other alternatives :'/

100%
Related Issues (20)
- "Mock" model
- Gemma2:27b start to output repetive trash after few generations
- Ollama stderr returns info logs
- Gemma 2 9B cannot run
- gguf success,but run error
- [BUG]: Gemma2 crashes on run.
- Ollama updates don't choose proper proxy
- Groq's "name" option within "messages" parameter of the chat endpoint payload
- allow for num_ctx parameter in the openai API compatibility
- LLM Compiler Models
- Both Gemma2 model fail with cudaMalloc error despite available GPU memory, while other models run successfully.
- Support for Snapdragon X Elite NPU & GPU
- Ollama running very slow on Windows
- allow temperature to be set on command line ( w/out using a modelfile )
- Error: llama runner process has terminated: signal: aborted (core dumped)
- run gemma2 error
- OpenAI Chat Compatibility Incorrect Prompt Eval
- Ollama Run provides numerical choice to run one of models from list
- Referring offline downloaded models in code
- dolphin-phi3 and dolphin-qwen2