Comments (4)
Thanks for the request. As we discussed in private, mirostat support is on the way. After that, we'll focus on a few more deterministic samplers, and then we can finally move on to CFG and grammars.
I have not experimented with the exllamav2 kernels yet, but I assume migrating from v1 to v2 will be straightforward enough.
from aphrodite-engine.
At the moment, we've added support for:
- mirostat
- exllamav2 (though not the variable bitrates, GPTQ only)
CFG support is not planned and will likely not happen in the foreseeable future, as it slows down generation.
from aphrodite-engine.
Re-opening this issue so we can keep track of CFG support. After discussing internally, we decided to add it (but not as a high-priority addition). We'll likely need to separate CFG requests into their own batches, so the throughput cost doesn't affect regular requests.
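To make the batching idea concrete, here is a minimal sketch of routing CFG requests into their own batch. The `Request` class, its fields, and `split_batches` are hypothetical names made up for illustration; they are not the engine's actual scheduler API, and the real implementation may look quite different.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Request:
    # Hypothetical request record, not the engine's real class.
    prompt: str
    negative_prompt: Optional[str] = None  # set only when guidance is requested
    cfg_scale: float = 1.0                 # 1.0 means "no guidance"


def uses_cfg(req: Request) -> bool:
    # A request needs the second (negative) forward pass only if it
    # supplies a negative prompt and a scale that actually changes the logits.
    return req.negative_prompt is not None and req.cfg_scale != 1.0


def split_batches(requests: List[Request]) -> Tuple[List[Request], List[Request]]:
    # Keep regular requests together so they never pay for the extra pass.
    regular = [r for r in requests if not uses_cfg(r)]
    guided = [r for r in requests if uses_cfg(r)]
    return regular, guided
```

With a split like this, only the `guided` batch would run two forward passes per step, which is where the throughput cost is assumed to come from.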
from aphrodite-engine.
Just saying that as an individual user, having CFG support with 8-bit KV cache is, obviously, really useful. exl2 does this too and it's killer. The net gain in quality feels like cheating.
I wonder if there'd be room to further squeeze the second cache - given that it's useful even when you just pass an empty string (or <s>) as the prompt?
I don't suppose the maths of CFG still makes sense with a shorter negative context? ...maybe just kludge it with
"I pinky promise that we found a sink token and not a null reference down there in the context mr transformer don't worry about it just keep up that forward() okay buddy!"
I feel it might not even be a bad thing if the negative guidance was, y'know, low quality? It's gotta stay on topic but
/shrug
Not like I'd know. So I'm grateful that you guys do. Keep it up and thanks for your work.
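On the question of whether the maths still works with a shorter negative context: in the formulation commonly used for LLM classifier-free guidance, only the next-token distributions from the two forward passes are mixed, so the two contexts do not have to be the same length. Below is a minimal sketch of that mixing step in plain Python. The function names are made up for illustration and this is not the engine's implementation; whether a very short or low-quality negative context still helps output quality is an empirical question that the sketch does not answer.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Numerically stable log-softmax over one vocabulary-sized vector.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cfg_mix(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    # cond:   next-token logits given the full (positive) context
    # uncond: next-token logits given the negative context (may be much shorter)
    # Both are vocabulary-sized, so context length does not appear here.
    if len(cond) != len(uncond):
        raise ValueError("both passes must score the same vocabulary")
    c = log_softmax(cond)
    u = log_softmax(uncond)
    # scale = 1 reproduces the conditional distribution; scale > 1 pushes
    # away from whatever the negative context predicts.
    return [ui + scale * (ci - ui) for ci, ui in zip(c, u)]
```

The negative context still needs its own KV cache, but its size scales with the negative prompt's length rather than the main prompt's, which is why an empty string or a single `<s>` token keeps the second cache small.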
from aphrodite-engine.