abacaj / mpt-30b-inference

Run inference on MPT-30B using CPU

License: MIT License
Good afternoon, and first of all thank you for your work; it is very interesting. I am a Java developer in production, so I don't have much expertise in Python. When I run the example from your YouTube video, I get an error at the end.
Also, if it is not too difficult, could you tell me how hard it would be to add REST communication (a POST endpoint) on top of this?
Below is the description of the error:
(mpt30_final) D:\Develop\NeuronNetwork\Mpt30\mpt_30B_inference>python inference.py
Fetching 1 files: 100%|██████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "D:\Develop\NeuronNetwork\Mpt30\mpt_30B_inference\inference.py", line 49, in
llm = AutoModelForCausalLM.from_pretrained(
File "C:\Users\j0sch\miniconda3\envs\mpt30_final\lib\site-packages\ctransformers\hub.py", line 157, in from_pretrained
return LLM(
File "C:\Users\j0sch\miniconda3\envs\mpt30_final\lib\site-packages\ctransformers\llm.py", line 203, in init
if not Path(model_path).is_file():
File "C:\Users\j0sch\miniconda3\envs\mpt30_final\lib\pathlib.py", line 958, in new
self = cls._from_parts(args)
File "C:\Users\j0sch\miniconda3\envs\mpt30_final\lib\pathlib.py", line 592, in _from_parts
drv, root, parts = self._parse_args(args)
File "C:\Users\j0sch\miniconda3\envs\mpt30_final\lib\pathlib.py", line 576, in _parse_args
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
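For context, the TypeError at the end means ctransformers ended up with model_path=None, i.e. the downloaded GGML file was not where the script expected it. A minimal sanity-check sketch, assuming the repo's default download location under models/ (adjust the path to wherever the .bin file actually landed):

from pathlib import Path
from ctransformers import AutoModelForCausalLM

# Assumed location based on the repo's defaults; change if your file is elsewhere.
model_path = Path("models/mpt-30b-chat.ggmlv0.q4_1.bin")
if not model_path.is_file():
    raise FileNotFoundError(f"GGML model file not found at {model_path}")

llm = AutoModelForCausalLM.from_pretrained(str(model_path), model_type="mpt")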
@abacaj It would be really helpful to have the Dockerfile you've been using in your demo.
Why is it so slow on a Mac M1? It takes several minutes to generate one token.
How to implement continuous dialogue, like ChatGPT?
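As a rough sketch of continuous dialogue with this setup: keep the ChatML-formatted conversation in a string and append each turn before generating again. The <|im_start|>/<|im_end|> markers follow the MPT-30B-chat prompt format used in this repo; the repo id below is an assumption.

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/mpt-30B-chat-GGML", model_type="mpt"
)
history = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"

while True:
    user_input = input("You: ")
    # Append the user turn and open an assistant turn, then generate.
    history += f"<|im_start|>user\n{user_input}<|im_end|>\n<|im_start|>assistant\n"
    reply = llm(history, max_new_tokens=512, stop=["<|im_end|>"])
    print("Assistant:", reply.strip())
    # Close the assistant turn so the next round carries the full context.
    history += f"{reply}<|im_end|>\n"

Note the whole history is re-fed on every turn, so long conversations will eventually hit the model's context length.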
Hi,
When trying to run inference, I got the following error message:
Downloading (…)feaf43e4/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.24k/1.24k [00:00<00:00, 8.64MB/s]
Fetching 1 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.38it/s]
Traceback (most recent call last):
File "/home/matthieu/Deployment/mpt-30B-inference/inference.py", line 49, in <module>
llm = AutoModelForCausalLM.from_pretrained(
File "/home/matthieu/anaconda3/envs/mpt_30b_cpu/lib/python3.10/site-packages/ctransformers/hub.py", line 157, in from_pretrained
return LLM(
File "/home/matthieu/anaconda3/envs/mpt_30b_cpu/lib/python3.10/site-packages/ctransformers/llm.py", line 206, in __init__
self._lib = load_library(lib)
File "/home/matthieu/anaconda3/envs/mpt_30b_cpu/lib/python3.10/site-packages/ctransformers/llm.py", line 102, in load_library
lib = CDLL(path)
File "/home/matthieu/anaconda3/envs/mpt_30b_cpu/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /home/matthieu/anaconda3/envs/mpt_30b_cpu/lib/python3.10/site-packages/ctransformers/lib/avx2/libctransformers.so)
I use an AMD Ryzen Threadripper 3960X (24 cores / 48 threads) on Ubuntu 18.04 LTS.
Thanks for any help!
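One thing that may help, as a sketch: ctransformers accepts a lib argument that selects which prebuilt backend to load ("avx2", "avx", or "basic"). If only the avx2 build requires GLIBC 2.29, falling back to the plain AVX build might sidestep the error; that the other builds link against an older glibc is an untested assumption on Ubuntu 18.04.

from ctransformers import AutoModelForCausalLM

# "avx" selects the plain-AVX prebuilt library instead of avx2;
# "basic" is a further fallback if AVX also fails to load.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/mpt-30B-chat-GGML",
    model_type="mpt",
    lib="avx",
)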
Hello! Kudos for your nice work for the open-source LLM community. I'm learning a lot from your findings.
I've tried to run the latest commit on a GCP VM with 32 GB of RAM, and I've hit this error:
(env) (base) sergiomoreno@production:~/mpt-30B-inference$ python inference.py
/home/sergiomoreno/mpt-30B-inference/models/mpt-30b-chat.ggmlv0.q4_1.bin
Fetching 1 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7752.87it/s]
Traceback (most recent call last):
File "/home/sergiomoreno/mpt-30B-inference/inference.py", line 51, in <module>
llm = AutoModelForCausalLM.from_pretrained(
File "/home/sergiomoreno/mpt-30B-inference/env/lib/python3.10/site-packages/ctransformers/hub.py", line 157, in from_pretrained
return LLM(
File "/home/sergiomoreno/mpt-30B-inference/env/lib/python3.10/site-packages/ctransformers/llm.py", line 203, in __init__
if not Path(model_path).is_file():
File "/home/sergiomoreno/miniconda3/lib/python3.10/pathlib.py", line 960, in __new__
self = cls._from_parts(args)
File "/home/sergiomoreno/miniconda3/lib/python3.10/pathlib.py", line 594, in _from_parts
drv, root, parts = self._parse_args(args)
File "/home/sergiomoreno/miniconda3/lib/python3.10/pathlib.py", line 578, in _parse_args
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
After looking at the file structure, I saw that the expected folder didn't exist, and I had to replace the model path with this:
llm = AutoModelForCausalLM.from_pretrained(
    os.path.abspath("models/models--TheBloke--mpt-30B-chat-GGML/snapshots/60df632f84e8b99fa7aeadf314467152be55adf4/mpt-30b-chat.ggmlv0.q4_1.bin"),
    model_type="mpt",
)
Is it something expected or can we improve it somehow?
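A possible improvement, sketched under the assumption that the file was fetched through huggingface_hub: hf_hub_download returns the real cached path (including the snapshot hash), so the hash never needs to be hard-coded.

from huggingface_hub import hf_hub_download
from ctransformers import AutoModelForCausalLM

# Returns a path like .../models--TheBloke--mpt-30B-chat-GGML/snapshots/<hash>/mpt-30b-chat.ggmlv0.q4_1.bin
model_path = hf_hub_download(
    repo_id="TheBloke/mpt-30B-chat-GGML",
    filename="mpt-30b-chat.ggmlv0.q4_1.bin",
)
llm = AutoModelForCausalLM.from_pretrained(model_path, model_type="mpt")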
Thank you for such a great repo. I was wondering if we can use GPU inference for text generation?
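For what it's worth, a hedged sketch: newer ctransformers versions expose a gpu_layers argument for offloading layers to a CUDA GPU, but whether the GGML MPT backend honors it is version-dependent; if unsupported, generation simply stays on the CPU.

from ctransformers import AutoModelForCausalLM

# gpu_layers exists in ctransformers, but GGML MPT support for GPU offload
# is an assumption here and depends on the installed version; unsupported
# model types fall back to CPU.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/mpt-30B-chat-GGML",
    model_type="mpt",
    gpu_layers=50,
)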
I am trying to generate responses from the MPT-30B model and expose it as an API using Flask, but I am having trouble: the response is empty for every input I send. I am using a Standard F32s v2 (32 vCPUs, 64 GiB memory) with remote access to the server.
from flask import Flask, request, Response
from dataclasses import dataclass, asdict
from ctransformers import AutoModelForCausalLM, AutoConfig
import os
import time

app = Flask(__name__)


@dataclass
class GenerationConfig:
    temperature: float
    top_k: int
    top_p: float
    repetition_penalty: float
    max_new_tokens: int
    seed: int
    reset: bool
    stream: bool
    threads: int
    stop: list


def format_prompt(system_prompt: str, user_prompt: str):
    system_prompt = f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
    user_prompt = f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
    assistant_prompt = f"<|im_start|>assistant\n"
    return f"{system_prompt}{user_prompt}{assistant_prompt}"


def generate(
    llm: AutoModelForCausalLM,
    generation_config: GenerationConfig,
    system_prompt: str,
    user_input: str,
):
    model_output = llm(
        format_prompt(system_prompt, user_input.strip()),
        **asdict(generation_config),
    )
    print("Model output:", model_output)
    return model_output


@app.route('/generate', methods=['GET', 'POST'])
def generate_response_endpoint():
    if request.method == 'GET':
        user_input = request.args.get('user_input', '')  # input from query parameter
    elif request.method == 'POST':
        user_input = request.data.decode('utf-8')
    # Load the model and configuration
    print("Loading model...")
    # config = AutoConfig.from_pretrained("mosaicml/mpt-30b-chat", context_length=8192)
    llm = AutoModelForCausalLM.from_pretrained(
        "/home/azureuser/mpt-30B-inference/models/mpt-30b-chat.ggmlv0.q4_1.bin",
        model_type="mpt",
    )
    print("Model loaded")
    system_prompt = "Reply."
    generation_config = GenerationConfig(
        temperature=0.2,
        top_k=0,
        top_p=0.9,
        repetition_penalty=1.0,
        max_new_tokens=512,
        seed=42,
        reset=False,
        stream=False,
        threads=int(os.cpu_count() / 2),  # adjust for your CPU
        stop=["<|im_end|>", "|<"],
    )
    response = generate(llm, generation_config, system_prompt, user_input.strip())
    print(response)
    return Response(response, content_type='text/plain; charset=utf-8')


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=3002)
import requests

while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit']:
        print("Exiting...")
        break
    data = {'user_input': user_input + " Respond to this."}
    response = requests.post('http://127.0.0.1:3002/generate', json=data)
    if response.status_code == 200:
        assistant_response = response.text
        # assistant_response = assistant_response.replace("You:", "").replace("Assistant:", "").strip()
        print("Assistant:", assistant_response)
    else:
        print("Error:", response.status_code)
But the response I am getting is empty, as can be seen below:
You: 3+4
Assistant:
You:
Any idea what may be causing this issue here and what can be done to resolve this?
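Two hedged guesses based on the code above: the server reads the raw request body (request.data) while the client sends json=data, so the model receives a JSON blob like {"user_input": "3+4..."} rather than plain text; and the model is re-loaded inside the handler on every request, which is very slow on its own. A client sketch that matches the server's raw-body read:

import requests

user_input = "3+4 Respond to this."
response = requests.post(
    'http://127.0.0.1:3002/generate',
    data=user_input.encode('utf-8'),  # raw text body, matching request.data on the server
    headers={'Content-Type': 'text/plain; charset=utf-8'},
)
print("Assistant:", response.text)

Moving the from_pretrained call to module level, so it runs once at startup instead of per request, would also be worth trying.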