
Comments (12)

KoboldAI commented on August 18, 2024

Added use_cache=True to generate calls in tonight's push. Works much better on my end, let me know if reports of sluggishness continue. Thanks again.

from koboldai-client.
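The fix above can be sketched as follows. This is a minimal illustration rather than the actual koboldai-client code, and it substitutes a tiny randomly initialized GPT-2 for the real GPT-Neo checkpoint so the snippet runs without downloading weights:

```python
# Passing use_cache=True to generate() lets the model reuse past key/value
# attention states between decoding steps instead of recomputing attention
# over the full context for every new token.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny stand-in model; the real project loads a 1.3B/2.7B GPT-Neo checkpoint.
config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32,
                    n_layer=2, n_head=2)
model = GPT2LMHeadModel(config).eval()

input_ids = torch.tensor([[1, 2, 3]])  # illustrative token ids
with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=5, use_cache=True,
                         pad_token_id=0)
print(out.shape)  # 3 prompt tokens plus 5 generated tokens
```

With caching disabled, each decoding step re-runs attention over the whole sequence so far, which matches the slowdown reported later in this thread.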

KoboldAI commented on August 18, 2024

Strangely, when I was first implementing transformers, I couldn't actually get it to use my GPU unless I added that device assignment. More recently I tried replacing the transformers pipeline with the torch.load implementation from your colab notebook, but I lost the ability to run the 2.7B models on my 8GB card (CUDA would OOM). I could avoid the OOM by clearing the checkpoint keys before assigning it to the model, but the generator results became just a random assortment of nonsense, so I scrapped the experiment.

finetunej commented on August 18, 2024

The notebook is optimized for initializing both the checkpoint and the model on GPU, because regular Colab has more VRAM than system RAM. For running locally, it should work as long as you don't pass map_location to the torch.load() call (or pass it, but set it to cpu).

from koboldai-client.
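A minimal sketch of that torch.load() pattern, with a toy module standing in for the 2.7B model (the path and module here are illustrative, not koboldai-client code):

```python
# map_location="cpu" materializes the checkpoint tensors in system RAM, so
# only the assembled model (not a second copy of the weights) ends up in VRAM.
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # toy stand-in for the real model

# Save a checkpoint to disk, as the colab notebook's torch.load path assumes.
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(model.state_dict(), path)

state = torch.load(path, map_location="cpu")  # load into system RAM
model.load_state_dict(state)
del state  # free the CPU-side checkpoint copy before moving to GPU

if torch.cuda.is_available():
    model = model.cuda(0).eval()
```

Omitting map_location on a GPU-default checkpoint keeps both the state dict and the model in VRAM at once, which is consistent with the 8GB OOM described above.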

KoboldAI commented on August 18, 2024

I'm not seeing a noticeable difference using either eval() or cuda(0). I have the following lines of code:

    if(vars.hascuda and vars.usegpu):
        model = model.cuda(0).eval()
        generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=0)

I'm still seeing ~18 second generation times, and the GPU is definitely being used, as I can see the dedicated memory usage sitting around 6-7 GB in Task Manager. I actually don't know when this started happening; I definitely remember having 3-5 second generation times at one point.

finetunej commented on August 18, 2024

Maybe that is not the issue then. Is that with the original EleutherAI model?

KoboldAI commented on August 18, 2024

Ran all three just now.
The stock 1.3B and 2.7B models are giving me ~3-5s generations without the .cuda(0).eval() additions.
Neo-Horni takes ~18s regardless of whether .cuda(0).eval() are in use.

finetunej commented on August 18, 2024

Interesting, thanks for testing. Does horni get faster if you change gradient_checkpointing to false in its config file?

KoboldAI commented on August 18, 2024

No, no change with gradient_checkpointing. However, if I swap Horni's config file for the one from stock 2.7B, generation times drop to ~6-7s.

finetunej commented on August 18, 2024

Didn't really see any other differences between the files. Very strange.

finetunej commented on August 18, 2024

Actually, could you please try once more with "use_cache" set to true in horni's config?

from koboldai-client.
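For reference, a sketch of the two config keys discussed here as they appear in a Hugging Face GPT-Neo config.json (other keys omitted; exact contents vary by checkpoint):

```json
{
  "model_type": "gpt_neo",
  "gradient_checkpointing": false,
  "use_cache": true
}
```

The Horni finetune apparently shipped with use_cache disabled, which is why swapping in the stock 2.7B config restored normal generation speed.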

KoboldAI commented on August 18, 2024

Oh yeah, that did it. Lightning quick responses with use_cache set to true.

finetunej commented on August 18, 2024

In that case, adding use_cache=True to the generator() call should do the trick.
