Comments (9)
Are you referring to CPU or GPU memory? Thanks!
from h2ogpt.
@pseudotensor Both.
i.e. if I run in GPU mode, the buildup is in GPU memory; if I run in CPU-only mode, the buildup is in CPU/system memory.
Hi @ml-l Thanks for finding this. Some clean-up a while back led to the issue. I pushed a fix for the case where loading a new model or unloading a model left memory still consumed.
Does that solve your problem? Sorry for the long delay in fix.
No worries regarding the delay.
I've tried using the latest docker image (tags 0.1.0-324 / latest) and seemingly the issue is still there.
I'm confident the continued GPU use is fixed. I confirmed it was there and that I fixed it.
As for still using CPU, I also saw that was fixed.
If you give me a specific sequence of what you are doing I can take a look.
It could very well be how I've configured things, or maybe where/how I'm checking doesn't align with where you've put in the fix(?)
My sequence of actions running in GPU mode is as follows:
1: removed my current gcr.io/vorvan/h2oai/h2ogpt-runtime:latest docker image to ensure that the latest one is downloaded.
2: ran nvidia-smi -l 1 in a separate shell to monitor GPU usage (idle usage output seen below)
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-PCIE-40GB On | 00000000:17:00.0 Off | 0 |
| N/A 34C P0 43W / 250W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-PCIE-40GB On | 00000000:65:00.0 Off | 0 |
| N/A 34C P0 46W / 250W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
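As an alternative to eyeballing the full table, the per-GPU used-memory figure can be polled programmatically; a minimal sketch, assuming `nvidia-smi` is on PATH (the `--query-gpu`/`--format` flags are standard nvidia-smi options):

```python
import subprocess

def parse_used_mib(csv_output: str) -> list[int]:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`:
    one integer (MiB) per visible GPU, one per line."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def gpu_memory_used() -> list[int]:
    """Query current used GPU memory in MiB for every visible GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_used_mib(out)
```

Calling `gpu_memory_used()` before and after each step below makes the leak deltas easy to log.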
3: ran the following two commands to run h2ogpt in GPU mode and to be able to choose models dynamically
export GRADIO_SERVER_PORT=7860
sudo docker run \
--gpus device=0 \
--runtime=nvidia \
--shm-size=2g \
-p $GRADIO_SERVER_PORT:$GRADIO_SERVER_PORT \
--rm --init \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v /mnt/alpha/.cache:/workspace/.cache \
-v /mnt/alpha/h2ogpt_share/save:/workspace/save \
-v /mnt/alpha/h2ogpt_share/user_path:/workspace/user_path \
-v /mnt/alpha/h2ogpt_share/db_dir_UserData:/workspace/db_dir_UserData \
-v /mnt/alpha/h2ogpt_share/users:/workspace/users \
-v /mnt/alpha/h2ogpt_share/db_nonusers:/workspace/db_nonusers \
-v /mnt/alpha/h2ogpt_share/llamacpp_path:/workspace/llamacpp_path \
-v /mnt/alpha/h2ogpt_share/h2ogpt_auth:/workspace/h2ogpt_auth \
-e USER=someone \
gcr.io/vorvan/h2oai/h2ogpt-runtime:latest /workspace/generate.py \
--use_safetensors=True \
--save_dir='/workspace/save/' \
--use_gpu_id=False \
--user_path=/workspace/user_path \
--langchain_mode="LLM" \
--langchain_modes="['UserData', 'LLM']" \
--score_model=None \
--max_max_new_tokens=2048 \
--max_new_tokens=1024
At this point, idle GPU0 usage is 2641MiB / 40960MiB
(GPU1 remained 4MiB / 40960MiB as intended from command).
4: Opened an incognito/private browser (in my case Firefox, but I don't think this should matter) to the hosted h2oGPT instance.
5: Opened the Models tab to enter the following parameters:
Base Model = HuggingFaceH4/zephyr-7b-beta
LORA = None (default)
Server = None (default)
Prompt Type = zephyr
6: Clicked Load (Download) Model. And now GPU0 usage is 17195MiB / 40960MiB.
7: Closed the browser that's connected to h2ogpt. GPU0 usage remains 17195MiB / 40960MiB.
8: Re-opened a browser in incognito/private mode again to go to the hosted h2ogpt again. GPU0 usage remains 17195MiB / 40960MiB.
9: Repeated step 5 to load in another zephyr model, to see if the fix was preventing multiple copies of the same model being loaded. GPU0 usage is now 31619MiB / 40960MiB.
10: Clicking UnLoad Model brings GPU0 usage to 17195MiB / 40960MiB.
11: Closing the browser session and checking again, GPU0 usage remains 17195MiB / 40960MiB.
Only when I stop the docker container that's running h2oGPT does GPU0 memory usage go back to 4MiB / 40960MiB.
And double-checking the hash of the docker image I'm using, the output of
sudo docker inspect --format='{{index .RepoDigests 0}}' gcr.io/vorvan/h2oai/h2ogpt-runtime:latest
is the following:
gcr.io/vorvan/h2oai/h2ogpt-runtime@sha256:806b31aadbd0ca24f1e0e2822c9f38a6a51b1e0f45c56a290081f35c04997dc4
EDIT: Step 9 should've said to repeat steps 5 and 6 rather than 3, i.e. load in another zephyr model (to check whether the fix worked by preventing extra memory from being allocated when the same model is loaded again) rather than spinning up another Docker container.
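For reference, the figures above are self-consistent: each zephyr load costs roughly the same amount, and UnLoad releases exactly one copy while the first stays resident. A quick check of the arithmetic (all values in MiB, taken from the steps above):

```python
idle = 2641           # after container start (step 3)
first_load = 17195    # after first Load (step 6)
second_load = 31619   # after loading the same model again (step 9)
after_unload = 17195  # after UnLoad Model (step 10)

first_copy = first_load - idle           # memory added by the first load
second_copy = second_load - first_load   # memory added by the duplicate load
freed = second_load - after_unload       # memory released by UnLoad

# UnLoad frees exactly the duplicate copy; the first copy stays resident.
print(first_copy, second_copy, freed)  # -> 14554 14424 14424
```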
Ah yes, if you do the step:
Closed the browser that's connected to h2ogpt. GPU0 usage remains 17195MiB / 40960MiB.
The server loses track of who you are, and the model remains associated with that prior user.
The problem is this: gradio-app/gradio#4016
gradio-app/gradio#7227
I'm unsure how to work around.
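The failure mode can be sketched independently of gradio: if the server never hears that a session ended, a per-session reference to the model is never dropped, so the memory can't be freed. A minimal sketch of the bookkeeping involved (names are hypothetical, not h2oGPT's actual internals):

```python
class ModelRegistry:
    """Share one loaded model across sessions and free it when the last
    session referencing it releases it. The catch in this issue:
    release() never runs if the browser simply closes."""

    def __init__(self):
        self._refcounts = {}  # model name -> number of sessions using it
        self._models = {}     # model name -> loaded model object

    def acquire(self, name, loader):
        if name not in self._models:
            self._models[name] = loader(name)  # load at most once
        self._refcounts[name] = self._refcounts.get(name, 0) + 1
        return self._models[name]

    def release(self, name):
        self._refcounts[name] -= 1
        if self._refcounts[name] == 0:  # last session gone: drop the model
            del self._models[name], self._refcounts[name]

    def loaded(self):
        return sorted(self._models)
```

Without a reliable "session closed" signal from gradio, the `release()` call for a closed tab never happens, so the model stays resident, which matches the numbers reported above.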
Ok, I'm following along. First I have:
- docker pulled:
docker pull gcr.io/vorvan/h2oai/h2ogpt-runtime:0.1.0
- ran:
watch -n 1 nvidia-smi
- Then I ran my version of your run:
export GRADIO_SERVER_PORT=7860
docker run \
--gpus device=0 \
--runtime=nvidia \
--shm-size=2g \
-p $GRADIO_SERVER_PORT:$GRADIO_SERVER_PORT \
--rm --init \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v /home/jon/.cache/huggingface/hub:/workspace/.cache/huggingface/hub \
-v /home/jon/.cache/huggingface/modules:/workspace/.cache/huggingface/modules \
-v /home/jon/h2ogpt/save:/workspace/save \
-v /home/jon/h2ogpt/user_path:/workspace/user_path \
-v /home/jon/h2ogpt/db_dir_UserData:/workspace/db_dir_UserData \
-v /home/jon/h2ogpt/users:/workspace/users \
-v /home/jon/h2ogpt/db_nonusers:/workspace/db_nonusers \
-v /home/jon/h2ogpt/llamacpp_path:/workspace/llamacpp_path \
-v /home/jon/h2ogpt/h2ogpt_auth:/workspace/h2ogpt_auth \
-e USER=jon \
gcr.io/vorvan/h2oai/h2ogpt-runtime:latest /workspace/generate.py \
--use_safetensors=True \
--save_dir='/workspace/save/' \
--use_gpu_id=False \
--user_path=/workspace/user_path \
--langchain_mode="LLM" \
--langchain_modes="['UserData', 'LLM']" \
--score_model=None \
--max_max_new_tokens=2048 \
--max_new_tokens=1024
- opened browser to localhost:7860
- clicked load model on zephyr 7b beta like you
- checked nvidia-smi, saw 18236MB used
- clicked unload and then waited 20s, then saw 3.6GB used (embedding etc.)
I modified gradio to be able to do this now.