Comments (3)
Hi there, we have not tested LMQL with Qwen models yet, so this may be an issue with supporting its tokenizer. I will have to investigate a bit further.
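In the meantime, a quick way to check the tokenizer in isolation would be something like the following (hypothetical sketch; it only verifies that the Hugging Face tokenizer loads and round-trips text, not that LMQL's constraint machinery supports it):

```python
from transformers import AutoTokenizer

# Sanity check: does the Qwen tokenizer load and round-trip text at all?
# This does not exercise LMQL's follow-map logic, only the HF tokenizer.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
ids = tok.encode("Hello, world!")
print(ids)
print(tok.decode(ids))
```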
You can fix the assertion by adding `if p2 == "*": break` after this line: lmql/src/lmql/ops/follow_map.py, line 241 (commit ab02526).
However, I could not get the model to do inference on my machine, since it never seems to finish a forward pass. Maybe you can try running with the change above, and report back with further results?
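To make the placement concrete, here is a minimal toy sketch of the guard; the loop and everything around it are hypothetical (only the `p1`/`p2` names and the file/line reference come from the suggestion above):

```python
# Toy illustration only -- not LMQL's actual follow_map.py code. The real
# change is the single guard line, inserted after line 241 as noted above.
def scan_follow_pairs(pairs):
    kept = []
    for p1, p2 in pairs:
        if p2 == "*": break  # suggested fix: stop scanning at a wildcard
        kept.append((p1, p2))
    return kept

# A wildcard entry now ends the scan instead of tripping the assertion.
print(scan_follow_pairs([("a", "b"), ("c", "*"), ("d", "e")]))  # [('a', 'b')]
```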
Thank you for your reply! Adding the code at the location you mentioned still did not fix the error, but placing the same line directly in front of the assertion that throws allows the program to continue running. Now my new error message is as follows:
[Loading Qwen/Qwen-72B-Chat with AutoModelForCausalLM.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)]
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 82/82 [00:11<00:00, 7.04it/s]
[Qwen/Qwen-72B-Chat ready on device cpu]
/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:394: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
warnings.warn(
/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:404: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
warnings.warn(
[Error during generate()] expected scalar type c10::BFloat16 but found double
Traceback (most recent call last):
File "/home/name/user1/lmql/lmql_test.py", line 19, in <module>
print(prompt())
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/api/queries.py", line 148, in lmql_query_wrapper
return module.query(*args, **kwargs)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/lmql_runtime.py", line 204, in __call__
return call_sync(self, *args, **kwargs)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/loop.py", line 37, in call_sync
res = loop.run_until_complete(task)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/lmql_runtime.py", line 230, in __acall__
results = await interpreter.run(self.fct, **query_kwargs)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/tracing/tracer.py", line 240, in wrapper
return await fct(*args, **kwargs)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/interpreter.py", line 1070, in run
async for _ in decoder_fct(prompt, **decoder_args):
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/dclib/decoders.py", line 21, in argmax
h = h.extend(await model.argmax(h, noscore=True))
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/dclib/dclib_cache.py", line 277, in argmax
return await arr.aelement_wise(op_argmax)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/dclib/dclib_array.py", line 318, in aelement_wise
result_items = await asyncio.gather(*[op_with_path(path, seqs, *args, **kwargs) for path, seqs in self.sequences.items()])
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/dclib/dclib_array.py", line 317, in op_with_path
return path, await op(element, *args, **kwargs)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/dclib/dclib_cache.py", line 256, in op_argmax
non_cached_argmax = iter((await self.delegate.argmax(DataArray(non_cached), **kwargs)).items())
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/models/lmtp/lmtp_dcmodel.py", line 307, in argmax
return await self.sample(sequences, temperature=0.0, **kwargs)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/models/lmtp/lmtp_dcmodel.py", line 350, in sample
return await sequences.aelement_wise(op_sample)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/dclib/dclib_array.py", line 318, in aelement_wise
result_items = await asyncio.gather(*[op_with_path(path, seqs, *args, **kwargs) for path, seqs in self.sequences.items()])
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/runtime/dclib/dclib_array.py", line 317, in op_with_path
return path, await op(element, *args, **kwargs)
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/models/lmtp/lmtp_dcmodel.py", line 340, in op_sample
tokens = await asyncio.gather(*[self.stream_and_return_first(s, await self.generate(s, temperature=temperature, **kwargs), mode) for s,mode in zip(seqs, unique_sampling_mode)])
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/models/lmtp/lmtp_dcmodel.py", line 147, in stream_and_return_first
buffer += [await anext(iterator)]
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/models/lmtp/lmtp_multiprocessing.py", line 188, in generate
async for token in self.stream_iterator(self.stream_id):
File "/home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/models/lmtp/lmtp_multiprocessing.py", line 217, in stream_iterator
raise LMTPStreamError(item["error"])
lmql.models.lmtp.errors.LMTPStreamError: failed to generate tokens 'expected scalar type c10::BFloat16 but found double'
Task was destroyed but it is pending!
task: <Task cancelling name='lmtp_inprocess_client_loop' coro=<LMTPDcModel.inprocess_client_loop() running at /home/name/miniconda3/envs/lmql/lib/python3.10/site-packages/lmql/models/lmtp/lmtp_dcmodel.py:76> wait_for=<Future finished result=True>>
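The failure is `expected scalar type c10::BFloat16 but found double`, i.e. a dtype mismatch during the forward pass. Based on the warning at the top of the log, one thing I can still try is pinning one explicit precision at load time. An untested sketch (the `bf16`/`fp16`/`fp32` flags come from Qwen's own warning, and the `[Loading ...]` line shows that LMQL forwards extra keyword arguments to `from_pretrained`):

```python
import lmql

# Untested sketch: pin one explicit precision instead of letting Qwen's
# remote code auto-cast to bf16. The bf16/fp16/fp32 flags are Qwen-specific
# (per the warning in the log); whether this resolves the mismatch is unverified.
m = lmql.model(
    "local:Qwen/Qwen-72B-Chat",
    trust_remote_code=True,
    fp32=True,  # alternatively bf16=True or fp16=True
)
```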