Comments (3)
Hey @o1iv3r, thanks for sharing! I'll try to reproduce this soon and share an update here.
from text-generation-inference.
FYI, I think it might be a problem in the outlines library, which also fails for me with a large number of fields.
Hi!
I think the issue is that the LLM has to worry about generating the JSON structure as well as the field contents of the schema. Grammar-based generation works really well 99% of the time with smaller schemas. I have to admit I've never seen a schema this long, but the use case is absolutely something that should work effectively.
I've been doing some reading around schema-based generation and I came across this article from Lamini here... it looks like they present the pre-filled JSON to the LLM, which saves on compute, and all the LLM has to do is generate the field contents. Schema parsing would never fail this way.
@drbh I'm not totally sure about the current implementation in TGI, but I'm assuming the LLM is also generating the JSON right now. Is there scope to implement something like this going forward? I can see great benefit in this if so :)
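To make the idea concrete, here's a minimal sketch of that approach: the calling code walks the schema and emits the JSON skeleton itself, asking the model only for each leaf value. Note this is a hypothetical illustration, not TGI's actual implementation; `generate_value` and `fill_schema` are made-up names, and `generate_value` stands in for whatever per-field model call a backend would expose.

```python
import json

def generate_value(prompt: str, field: str, field_type: str) -> str:
    """Placeholder for an LLM call that returns a single field's value.

    In a real system this would prompt the model for just this field,
    with a stop sequence (e.g. a closing quote) so it cannot break the
    surrounding JSON structure.
    """
    return f"<{field}:{field_type}>"  # stub output for illustration

def fill_schema(prompt: str, schema: dict) -> dict:
    """Emit the JSON skeleton ourselves; the model only fills the leaves."""
    out = {}
    for field, spec in schema["properties"].items():
        if spec["type"] == "object":
            # Recurse into nested objects; structure stays under our control.
            out[field] = fill_schema(prompt, spec)
        else:
            out[field] = generate_value(prompt, field, spec["type"])
    return out

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}

result = fill_schema("Extract the person's details.", schema)
print(json.dumps(result))
```

Because the braces, quotes, and keys never come from the model, the output parses as valid JSON regardless of how many fields the schema has; only the per-field values can be wrong.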
Thanks.
Related Issues (20)
- Add `response_format` to chat/completions
- Cannot load Gemma models with TGI 2.0.3
- TGI 2.0.3 fails to serve CodeLlama models that 2.0.1 supports
- The PHI-3 gives warnings about Tokens and returns additional tokens.
- Low-Rank Adaptation of Large Language Models
- Cannot load microsoft/Phi-3-medium and microsoft/Phi-3-small with TGI-2.0.4
- Wrong tool choice makes server crash
- [Feature]: Additional metrics to enable better autoscaling / load balancing of TGI servers in Kubernetes
- Expose `model` argument in python clients
- Support OpenAI's stop parameter logic
- memory usage 3x higher than plain code
- Intel XPU Docker image import error on start
- Llama3 Tokenizer Troubles: All added_tokens unrecognized, given id of `None`
- Gemma not starting with tensor parallelism
- Unable to load quantized commandrplus-medusa on H100
- Deberta V3 not supported
- warmup doesn't work as expected
- [Feature Request] Add `v1/completions` alongside existing `v1/chat/completions`
- Support for openbmb/MiniCPM-Llama3-V-2_5
- `stop` param doesn't work at all for `/v1/completions` endpoint