Comments (8)
For cases like this I recommend Docker because of the environment issues. I have Windows as well; here's how I run it.
Use a container like so:
docker run -it -w /transformers --mount type=volume,source=transformers,target=/transformers python:3.11.4 /bin/bash
Clone the repo:
git clone [email protected]:abacaj/mpt-30B-inference.git
Follow the directions in the readme for the rest: https://github.com/abacaj/mpt-30B-inference#setup. I just ran through this process again now and it works; I can get the model to generate correctly on my Ryzen/Windows machine.
Thank you. I've created a conda env, installed the requirements, and manually downloaded two models (q5_1 and q4_1). Any hint on why I get these empty responses? I'd really prefer not to use a container.
Great work by the way!
Likely has to do with the ctransformers library, since that is how the bindings work from Python -> ggml (though I'm not certain of it).
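To rule the bindings in or out, it can help to load the model through ctransformers directly, outside the repo's script. A minimal sketch, assuming a ctransformers build with MPT/ggml support; the model path is illustrative, and the function simply returns an empty string when no model file is present:

```python
# Standalone check of the Python -> ctransformers -> ggml path.
# Assumptions: ctransformers with MPT ggml support is installed;
# MODEL_PATH points at a downloaded quantized model (path is illustrative).
from pathlib import Path

MODEL_PATH = Path("models/mpt-30b-chat.ggmlv0.q4_0.bin")

def generate(prompt: str) -> str:
    """Return the model's completion, or "" when no model file is present."""
    if not MODEL_PATH.exists():
        return ""  # no model downloaded; skip rather than crash
    from ctransformers import AutoModelForCausalLM
    llm = AutoModelForCausalLM.from_pretrained(str(MODEL_PATH), model_type="mpt")
    return llm(prompt, max_new_tokens=64)

if __name__ == "__main__":
    out = generate("What is the capital of France?")
    print(out if out else "<empty response>")
```

If this standalone call also comes back empty, the problem is below the repo's code, in the bindings or the model file itself.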
from mpt-30b-inference.
Issue fixed. Replace the files with the ones from
https://github.com/mzubair31102/llama2.git
I'm having the same problem. Processing goes to 100% for a few seconds but returns empty answers. RAM usage reaches around 24 GB.
I tested in VS Code and in cmd. Same behaviour.
I've tried to debug, but the "generator" variable had no string text inside it.
I'm running the mpt-30b-chat.ggmlv0.q5_1.bin model instead of the default q4_0.
PC: Ryzen 5900X and 32 GB RAM.
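If the script streams tokens through a generator, a small wrapper makes an empty generation explicit instead of silently printing nothing. A sketch (the name mirrors the "generator" variable mentioned above; this is not the repo's own code):

```python
def drain(token_generator):
    """Join streamed tokens into one string, flagging an empty generation."""
    tokens = list(token_generator)
    if not tokens:
        print("warning: the model yielded zero tokens")
    return "".join(tokens)

# Usage with any iterable of string tokens:
text = drain(iter(["Par", "is"]))   # "Paris"
empty = drain(iter([]))             # "" plus a warning on stdout
```

A zero-token result points at the model/bindings rather than at the printing code.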
I have observed that when processing user queries, the CPU usage increases but I do not receive a response.
[user]: What is the capital of France?
[assistant]:
[user]:
python3 inference.py
Fetching 1 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3584.88it/s]
GGML_ASSERT: /home/runner/work/ctransformers/ctransformers/models/ggml/ggml.c:4103: ctx->mem_buffer != NULL
Aborted
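`ctx->mem_buffer != NULL` failing inside ggml usually means a large allocation returned NULL, i.e. the process ran out of memory while setting up the context; the thread above reports roughly 24 GB of RAM usage for the q5_1 30B model. A quick, Linux-only sketch to check headroom before loading (it reads /proc/meminfo, so it won't work on bare Windows; the 24 GB figure is taken from the report above, not measured here):

```python
def available_ram_gb(meminfo_path: str = "/proc/meminfo") -> float:
    """Return the kernel's MemAvailable estimate in GiB (Linux only)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                kib = int(line.split()[1])  # /proc/meminfo reports kB
                return kib / (1024 * 1024)
    return 0.0

if __name__ == "__main__":
    NEEDED_GB = 24  # rough footprint of the q5_1 30B model (from the thread)
    have = available_ram_gb()
    if have < NEEDED_GB:
        print(f"only {have:.1f} GiB available; expect ggml allocation failures")
```

If the machine is already close to the limit, dropping to a smaller quantization (e.g. q4_0) or closing other processes may avoid the abort.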
I'm also facing this issue on Windows.
However, the main problem is that when I run this in a container it produces very slow responses.