Comments (8)
For cases like this I recommend Docker because of the environment issues. I have Windows as well; here's how I run it.
Use a container like so:
docker run -it -w /transformers --mount type=volume,source=transformers,target=/transformers python:3.11.4 /bin/bash
Clone the repo:
git clone [email protected]:abacaj/mpt-30B-inference.git
Follow the directions in the readme for the rest: https://github.com/abacaj/mpt-30B-inference#setup. I just ran through this process again now and it works; I can get the model to generate correctly on my Ryzen/Windows machine.
Thank you. I've created a conda env, installed the requirements, and manually downloaded two models (q5_1 and q4_1). Any hint on why I get these empty responses? I'd really prefer not to use a container.
Great work by the way!
Likely has to do with the ctransformers library, since that is how the bindings work from Python -> ggml (though I'm not certain of it).
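To rule the bindings in or out, it can help to load the model through ctransformers directly, outside the repo's script. A minimal sketch, assuming a ctransformers build with MPT/ggml support; the model path is illustrative, and the function simply returns an empty string when no model file is present:

```python
# Standalone check of the Python -> ctransformers -> ggml path.
# Assumptions: ctransformers with MPT ggml support is installed;
# MODEL_PATH points at a downloaded quantized model (path is illustrative).
from pathlib import Path

MODEL_PATH = Path("models/mpt-30b-chat.ggmlv0.q4_0.bin")

def generate(prompt: str) -> str:
    """Return the model's completion, or "" when no model file is present."""
    if not MODEL_PATH.exists():
        return ""  # no model downloaded; skip rather than crash
    from ctransformers import AutoModelForCausalLM
    llm = AutoModelForCausalLM.from_pretrained(str(MODEL_PATH), model_type="mpt")
    return llm(prompt, max_new_tokens=64)

if __name__ == "__main__":
    out = generate("What is the capital of France?")
    print(out if out else "<empty response>")
```

If this standalone call also comes back empty, the problem is below the repo's code, in the bindings or the model file itself.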
from mpt-30b-inference.
Issue fixed. Replace the files with the ones from
https://github.com/mzubair31102/llama2.git
I'm having the same problem. Processing goes to 100% for a few seconds but returns empty answers. RAM usage reaches around 24 GB.
I tested in VS Code and in cmd. Same behaviour.
I've tried to debug, but the "generator" variable had no string text inside it.
I'm running the mpt-30b-chat.ggmlv0.q5_1.bin model instead of the default q4_0.
PC: Ryzen 5900X and 32 GB RAM.
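If the script streams tokens through a generator, a small wrapper makes an empty generation explicit instead of silently printing nothing. A sketch (the name mirrors the "generator" variable mentioned above; this is not the repo's own code):

```python
def drain(token_generator):
    """Join streamed tokens into one string, flagging an empty generation."""
    tokens = list(token_generator)
    if not tokens:
        print("warning: the model yielded zero tokens")
    return "".join(tokens)

# Usage with any iterable of string tokens:
text = drain(iter(["Par", "is"]))   # "Paris"
empty = drain(iter([]))             # "" plus a warning on stdout
```

A zero-token result points at the model/bindings rather than at the printing code.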
I have observed that when processing user queries, the CPU usage increases but I do not receive a response.
[user]: What is the capital of France?
[assistant]:
[user]:
python3 inference.py
Fetching 1 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3584.88it/s]
GGML_ASSERT: /home/runner/work/ctransformers/ctransformers/models/ggml/ggml.c:4103: ctx->mem_buffer != NULL
Aborted
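`ctx->mem_buffer != NULL` failing inside ggml usually means a large allocation returned NULL, i.e. the process ran out of memory while setting up the context; the thread above reports roughly 24 GB of RAM usage for the q5_1 30B model. A quick, Linux-only sketch to check headroom before loading (it reads /proc/meminfo, so it won't work on bare Windows; the 24 GB figure is taken from the report above, not measured here):

```python
def available_ram_gb(meminfo_path: str = "/proc/meminfo") -> float:
    """Return the kernel's MemAvailable estimate in GiB (Linux only)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                kib = int(line.split()[1])  # /proc/meminfo reports kB
                return kib / (1024 * 1024)
    return 0.0

if __name__ == "__main__":
    NEEDED_GB = 24  # rough footprint of the q5_1 30B model (from the thread)
    have = available_ram_gb()
    if have < NEEDED_GB:
        print(f"only {have:.1f} GiB available; expect ggml allocation failures")
```

If the machine is already close to the limit, dropping to a smaller quantization (e.g. q4_0) or closing other processes may avoid the abort.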
I'm also facing this issue on Windows.
However, the main problem is that when I run this in a container it produces very slow responses.