Comments (4)
Alright, that is quite a lot to break down. This does not appear to be a KoboldAI bug, since you're running into issues with the dependencies themselves, but let's dig in.
So let's begin at the start with your setup process: the bundled requirements.txt is really only meant to be used with the Colabs and may or may not work well on your system. For most use cases we advise creating a conda environment from the bundled environment files (make sure to use the official Miniconda3 install script, not something like apt-get install python-conda) instead of setting up a venv yourself; for now this will also be the finetuneanon version (unless you really want that CPU mode to work). I don't think your specific issue is related to that, but it's good to know for the future if you ever run into dependency issues.
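For reference, the conda route looks roughly like this (a sketch, assuming the bundled environment files under environments/ and a standard Miniconda3 install; the environment name comes from the yml file, so adjust if yours differs):

```shell
# Sketch of the recommended conda setup (install Miniconda3 via the
# official installer first, not a distro package like python-conda).
cd KoboldAI-Client
conda env create -f environments/finetuneanon.yml
# the environment name is defined inside the yml; adjust if yours differs
conda activate koboldai
python aiserver.py
```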
The second issue is that Half mode does not work on the CPU, which is indeed correct and to be expected (as you mentioned, we documented this). We are phasing out Finetune's branch entirely in the upcoming version, and since we ran into similar issues in the upcoming 0.17 builds it may or may not be fixed by other fixing efforts. In that upcoming version we implemented Half mode for the GPU on the official transformers, so you get a GPU mode with even more efficient loading as well as fully functional CPU support. Finetune's branch has fallen quite far behind upstream, and most of its features have been integrated into KoboldAI on that development branch, so once 0.17 is finished we will no longer recommend anyone use it.
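The precision policy described above can be sketched like this (a hypothetical illustration, not KoboldAI's actual code: in real code you would branch on torch.cuda.is_available() and use torch.float16 / torch.float32):

```python
# Hypothetical sketch of the precision policy described above, not
# KoboldAI's implementation: half precision (float16) is only usable
# on a CUDA GPU, so CPU inference falls back to full precision.
def pick_dtype(cuda_available: bool) -> str:
    return "float16" if cuda_available else "float32"
```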
Then the next part gets a bit more complicated. First of all, the behavior of the models inside the official version of Transformers is entirely to be expected: it is supposed to load wrong, spew gibberish, or not load at all. The reason is that Finetune invented his own format for 6B that the upstream version chose not to use. In the version of KoboldAI you are using we had not yet implemented the official format, so it all ends up loading completely wrong, as it is trying to load a Neo model that is in reality a fake 6B format. So for now you will need either Finetune's fork, or the development version of KoboldAI (currently at https://github.com/henk717/koboldai ) along with models converted to the official 6B format, which to avoid confusion we dubbed the HFJ format.
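One rough way to tell which format a downloaded 6B checkpoint uses is to peek at the model_type field in its config.json; this helper is hypothetical (not part of KoboldAI), and assumes the official GPT-J format reports "gptj" while the fork's fake-6B format presents itself as a Neo model:

```python
import json

# Hypothetical helper, not part of KoboldAI: guess a checkpoint's 6B
# format from its config.json. The official ("HFJ") format reports
# model_type "gptj"; Finetune's fork stores 6B under a Neo-style config.
def detect_6b_format(config_text: str) -> str:
    model_type = json.loads(config_text).get("model_type", "")
    if model_type == "gptj":
        return "hfj"        # official format, needs upstream transformers
    if model_type == "gpt_neo":
        return "neo-style"  # real Neo model, or the fork's fake-6B format
    return "unknown"
```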
So, that explains the error you're getting in CPU mode, and the gibberish you get when you're not using the appropriate version for the models you are using.
That leaves one more issue to tackle: the fact that it's not working for you. I downloaded the same model and loaded up the same branch of KoboldAI you're using (0.16), then loaded the Finetune version of Transformers (in my case the ROCm one, since I have AMD). Generation went smoothly and I did not run into any generation errors.
So the issue is most likely somewhere in your environment. Are you certain it's the model you list in your notes? Because in our downloads we also have gpt-hfj-6b-adventure.7z, which is the newer model format. If it is, then I highly recommend retesting with our tried and tested conda environments, or with the play-cuda.sh Docker launcher if you have Docker configured to use your Nvidia GPU.
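Before trying play-cuda.sh it can help to confirm Docker can actually see the Nvidia GPU; this is the standard NVIDIA Container Toolkit smoke test (the CUDA image tag is an assumption and may need updating for your setup):

```shell
# Quick check that Docker + NVIDIA Container Toolkit can reach the GPU.
# If this prints the nvidia-smi table, play-cuda.sh should have GPU access.
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```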
If you'd like more one-on-one support I recommend joining our Discord at https://discord.gg/XuQWadgU9k ; we can help you get going there, and it's quicker than resolving it over a GitHub issue.
from koboldai-client.
Thanks for the quick reply! It really clarified a lot of things.
I've tried the new version with the gpt-hfj-6b-adventure model and it indeed works. Also, I am really surprised at how fast it works in GPU+CPU mode: when specifying only 4 VRAM blocks it generates the output after about 60 seconds, which I think is at least tolerable. And the RAM usage is also relatively low: only 12 GB or so.
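As a rough mental model of that GPU+CPU split (a simplified sketch, not KoboldAI's actual allocator: it assumes each VRAM block corresponds to one transformer layer kept on the GPU, with the rest staying in system RAM):

```python
# Simplified sketch of a layer split between GPU and CPU, not KoboldAI's
# actual allocator: gpu_blocks layers go to VRAM, the remainder to RAM.
def split_layers(total_layers: int, gpu_blocks: int) -> tuple:
    gpu = max(0, min(gpu_blocks, total_layers))
    return gpu, total_layers - gpu

# GPT-J-6B has 28 transformer layers, so under this assumption
# 4 VRAM blocks leaves 24 layers running on the CPU.
```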
BTW, I used venv just as before. But I also tested the current version with Conda (using environments/finetuneanon.yml) and the gpt-j-6b-adventure-hf model, and got exactly the same error (module 'keras.backend' has no attribute 'is_tensor'). So I guess it's really some sort of model-transformers incompatibility and not a package issue. Maybe it's NVIDIA-only, since there's no issue on AMD. Or maybe it's some weird problem with my OS or host environment (though I have run a lot of other CUDA projects and they worked). I didn't test it in Docker, though. I tried play-cuda.sh, but it got stuck while building the image in the middle of installing the packages. I'm not sure whether it's supposed to be like that and I just needed to wait longer. Maybe I'll re-test it in the future.
the bundled requirements.txt really is only meant to be used with the Colabs and may or may not work well on your system.
But in the docs it says:
If you do not want to use conda install the requirements listed in requirements.txt and make sure that CUDA is properly installed.
If it's really not recommended or not supported, then I think the docs should say so. I don't think there should be any difference, but I agree that the Conda way may be more fool-proof (though it didn't change anything in my case).
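For completeness, the non-conda route quoted from the docs looks roughly like this (a sketch assuming a standard venv workflow and a working system CUDA install; exact commands may differ per platform):

```shell
# The requirements.txt route: an ordinary venv plus pip install.
# CUDA must already be properly installed on the host for GPU mode.
cd KoboldAI-Client
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python aiserver.py
```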
In 1.17 you can now use the regular version of transformers (huggingface.yml) for everything. I also updated the readme to let people know requirements.txt is not recommended.
The suitable models are all in the menu now. Let me know if you're still having issues.
Yeah, I've been using this version for a while now, and everything works correctly (except play-cuda.sh). 6B models load without errors (though I loaded them from a folder, not from the menu). Also, requirements.txt works for me (I didn't try Conda again).
This can probably be closed.