
Comments (12)

KoboldAI commented on August 18, 2024

Added use_cache=True to generate calls in tonight's push. Works much better on my end, let me know if reports of sluggishness continue. Thanks again.

from koboldai-client.
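The fix above can be sketched as follows. This is a minimal illustration rather than the actual koboldai-client code, and it substitutes a tiny randomly initialized GPT-2 for the real GPT-Neo checkpoint so the snippet runs without downloading weights:

```python
# Passing use_cache=True to generate() lets the model reuse past key/value
# attention states between decoding steps instead of recomputing attention
# over the full context for every new token.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny stand-in model; the real project loads a 1.3B/2.7B GPT-Neo checkpoint.
config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32,
                    n_layer=2, n_head=2)
model = GPT2LMHeadModel(config).eval()

input_ids = torch.tensor([[1, 2, 3]])  # illustrative token ids
with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=5, use_cache=True,
                         pad_token_id=0)
print(out.shape)  # 3 prompt tokens plus 5 generated tokens
```

With caching disabled, each decoding step re-runs attention over the whole sequence so far, which matches the slowdown reported later in this thread.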

KoboldAI commented on August 18, 2024

Strangely, when I was first implementing transformers, I couldn't actually get it to use my GPU unless I added that device assignment. More recently I tried replacing the transformers pipeline with the torch.load implementation from your colab notebook, but I lost the ability to run the 2.7B models on my 8GB card (CUDA would OOM). I could avoid the OOM by clearing the checkpoint keys before assigning it to the model, but the generator results became just a random assortment of nonsense, so I scrapped the experiment.

finetunej commented on August 18, 2024

The notebook is optimized for initializing both the checkpoint and the model on GPU, because regular Colab has more VRAM than system RAM. For running locally, it should work as long as you don't pass map_location to the torch.load() call (or pass it, but set it to cpu).

from koboldai-client.
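A minimal sketch of that torch.load() pattern, with a toy module standing in for the 2.7B model (the path and module here are illustrative, not koboldai-client code):

```python
# map_location="cpu" materializes the checkpoint tensors in system RAM, so
# only the assembled model (not a second copy of the weights) ends up in VRAM.
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # toy stand-in for the real model

# Save a checkpoint to disk, as the colab notebook's torch.load path assumes.
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(model.state_dict(), path)

state = torch.load(path, map_location="cpu")  # load into system RAM
model.load_state_dict(state)
del state  # free the CPU-side checkpoint copy before moving to GPU

if torch.cuda.is_available():
    model = model.cuda(0).eval()
```

Omitting map_location on a GPU-default checkpoint keeps both the state dict and the model in VRAM at once, which is consistent with the 8GB OOM described above.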

KoboldAI commented on August 18, 2024

I'm not seeing a noticeable difference using either eval() or cuda(0). I have the following lines of code:

    if(vars.hascuda and vars.usegpu):
        model = model.cuda(0).eval()
        generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=0)

I'm still seeing ~18 second generation times, and the GPU is definitely being used, as I can see the dedicated memory usage sitting around 6-7 GB in Task Manager. I actually don't know when this started happening; I definitely remember having 3-5 second generation times at one point.

finetunej commented on August 18, 2024

Maybe that is not the issue then. Is that with the original EleutherAI model?

KoboldAI commented on August 18, 2024

Ran all three just now.
The stock 1.3B and 2.7B models are giving me ~3-5s generations without the .cuda(0).eval() additions.
Neo-Horni takes ~18s regardless of whether .cuda(0).eval() are in use.

finetunej commented on August 18, 2024

Interesting, thanks for testing. Does horni get faster if you change gradient_checkpointing to false in its config file?

KoboldAI commented on August 18, 2024

No, no change with gradient_checkpointing. However, if I swap Horni's config file for the one from stock 2.7B, generation times drop to ~6-7s.

finetunej commented on August 18, 2024

Didn't really see any other differences between the files. Very strange.

finetunej commented on August 18, 2024

Actually, could you please try once more with "use_cache" set to true in horni's config?

from koboldai-client.
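For reference, a sketch of the two config keys discussed here as they appear in a Hugging Face GPT-Neo config.json (other keys omitted; exact contents vary by checkpoint):

```json
{
  "model_type": "gpt_neo",
  "gradient_checkpointing": false,
  "use_cache": true
}
```

The Horni finetune apparently shipped with use_cache disabled, which is why swapping in the stock 2.7B config restored normal generation speed.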

KoboldAI commented on August 18, 2024

Oh yeah, that did it. Lightning quick responses with use_cache set to true.

finetunej commented on August 18, 2024

In that case, adding use_cache=True to the generator() call should do the trick.
