Comments (9)

mmike87 avatar mmike87 commented on June 2, 2024 4

I watched my GPU usage and it was not touched.

from privategpt.

iker-lluvia avatar iker-lluvia commented on June 2, 2024 3

I can get it to work on Ubuntu 22.04 by installing llama-cpp-python with cuBLAS:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.48

If installation fails because it doesn't find CUDA, it's probably because you need to add the CUDA install path to the PATH environment variable:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
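As a quick sanity check before retrying the install, you can confirm the CUDA bin directory is actually on PATH and that nvcc is present there. A minimal sketch, assuming the default /usr/local/cuda install prefix:

```python
# Minimal sketch: verify the CUDA bin directory is on PATH and that nvcc
# exists there. Assumes the default /usr/local/cuda install prefix.
import os

def cuda_on_path(path_env: str, cuda_bin: str = "/usr/local/cuda/bin") -> bool:
    """True if cuda_bin appears as an entry in a PATH-style string."""
    return cuda_bin in path_env.split(os.pathsep)

print(cuda_on_path(os.environ.get("PATH", "")))      # depends on your shell
print(cuda_on_path("/usr/bin:/usr/local/cuda/bin"))  # True
print(os.path.exists("/usr/local/cuda/bin/nvcc"))    # True on a machine with CUDA installed
```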

Anyway, it only uses less than 1 GB of VRAM on an RTX 2060 with 6 GB, so I don't know if something is still missing.

shondle avatar shondle commented on June 2, 2024 2

If anyone still can't figure this out, I explained in detail how I got it to work here (issue #217)

iker-lluvia avatar iker-lluvia commented on June 2, 2024 1

Aren't you just emulating the CPU? Idk if there's even a working port for GPU support

It shouldn't. The llama.cpp library can perform BLAS acceleration using the CUDA cores of the Nvidia GPU through cuBLAS. I expect llama-cpp-python to do so as well when installed with cuBLAS.
Is there any fast way to verify that the GPU is being used, other than running nvidia-smi or nvtop?
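One lightweight option besides nvidia-smi/nvtop: llama.cpp prints a `system_info` line when it loads a model, and `BLAS = 1` there indicates the BLAS-accelerated (e.g. cuBLAS) build took effect. A minimal sketch that checks such a line (the sample string is illustrative, not captured from a real run):

```python
import re

def blas_enabled(system_info_line: str) -> bool:
    """True if a llama.cpp system_info line reports BLAS = 1."""
    match = re.search(r"BLAS\s*=\s*(\d)", system_info_line)
    return match is not None and match.group(1) == "1"

# Illustrative startup line; the real one appears in the model-load log.
sample = "system_info: n_threads = 8 | AVX = 1 | AVX2 = 1 | BLAS = 1 | SSE3 = 1"
print(blas_enabled(sample))  # True
```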

su77ungr avatar su77ungr commented on June 2, 2024 1

Nvm, my collaborator found a way, see

walking-octopus avatar walking-octopus commented on June 2, 2024

Chances are, it's already partially using the GPU. As it is now, it's a script linking together LLaMa.cpp embeddings, a Chroma vector DB, and GPT4All. GPT4All might be using PyTorch with the GPU, Chroma is probably already heavily CPU-parallelized, and LLaMa.cpp runs only on the CPU.

It's also worth noting that two LLMs are used with different inference implementations, meaning you may have to load the model twice.

pabl-o-ce avatar pabl-o-ce commented on June 2, 2024

Does this mean that this works only with the CPU?

I currently want to try this.

Also, could you add some info to the README about the hardware requirements?

su77ungr avatar su77ungr commented on June 2, 2024

No, LlamaCpp was designed to use only CPU resources. For GPU support you'd have to use the native LLaMA model from Facebook.

su77ungr avatar su77ungr commented on June 2, 2024

Aren't you just emulating the CPU?
Idk if there's even a working port for GPU support
