Comments (4)
Let me preface this by saying I have no idea what I'm talking about.
BUT… could it be because you're using float32 instead of float16?
https://huggingface.co/databricks/dolly-v2-12b/discussions/18
from mlc-llm.
my build command is
python build.py --model dolly-v2-12b --dtype float32 --target cuda --quantization-mode int3 --quantization-sym --quantization-storage-nbit 32 --max-seq-len 2048
with this mod in the code to fit my card's architecture
Quantization plays an important role in reducing memory if you want to run a larger model on consumer-class GPUs, so please turn it on :-)
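The memory-reduction point above can be made concrete with some back-of-the-envelope arithmetic. This is an illustrative sketch only: the ~12B parameter count is approximate, and real memory use also includes activations, the KV cache, and runtime overhead beyond raw weight storage.

```python
# Rough weight-storage estimate for a ~12B-parameter model (e.g. dolly-v2-12b)
# under the dtypes and quantization mode discussed in this thread.

PARAMS = 12e9  # assumed parameter count, ~12 billion weights


def weight_gib(bits_per_param: float, n_params: float = PARAMS) -> float:
    """Approximate weight storage in GiB for a given bits-per-parameter."""
    return n_params * bits_per_param / 8 / 2**30


fp32 = weight_gib(32)  # float32, as in the build command above
fp16 = weight_gib(16)  # float16, as the first comment suggests
int3 = weight_gib(3)   # int3 quantization (--quantization-mode int3)

print(f"float32: {fp32:.1f} GiB")  # ~44.7 GiB
print(f"float16: {fp16:.1f} GiB")  # ~22.4 GiB
print(f"int3:    {int3:.1f} GiB")  # ~4.2 GiB
```

This is why switching from float32 to float16 halves the weight footprint, and why aggressive quantization is what makes a 12B model fit on a consumer GPU at all.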
Related Issues (20)
- [Model Request] Large World Model
- [Bug] Access violation in TVM package when trying to run model converted and compiled for Windows
- [Question] build tiny-llama, use latest mlc-llm, report vm.builtin.paged_attention_kv_cache_attention_with_fused_qkv error HOT 6
- [Bug] Llama model not running properly in Android app HOT 9
- [Bug] error: package org.apache.tvm does not exist when building app in Android Studio HOT 6
- Issue Compiling gemma-2-it for Vulkan on Linux HOT 6
- [Bug] InternalError: Check failed: type == expected_type (float32x2 vs. float16x2) : Attempted to access buffer K_smem as element type float32x2 using an index of size 2 when the element type is float16 HOT 5
- [Feature Request] file selection and saving in chat
- [Bug] gemma-2b for Android. OpenCL Error Code=-54: CL_INVALID_WORK_GROUP_SIZE HOT 7
- [Bug] Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC can't run HOT 4
- [Model Request] Stable Diffusion and Controlnet for qualcom HOT 2
- Compile android model for Vulkan HOT 1
- [Question] How to set up Rag & Prompt engineering in Android app HOT 6
- [Question] Does MLC support speculative decoding on Android? HOT 1
- Bugs in `core.py` file and others HOT 2
- [Bug] TVMError: The output probabilities are all NaNs, can not sample from it HOT 5
- [Bug] WSL2 Ubuntu RTX 3060 CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_NO_BINARY_FOR_GPU HOT 1
- [Bug] mlc_llm.build error - `ValueError: Invoking function prefill requires 330 inputs but only 4 inputs are provided.` HOT 2
- [Question] How to print log under mlc-llm? HOT 5
- [Bug] Problem compiling llama model HOT 7