Comments (4)
Let me preface this by saying I have no idea what I'm talking about.
BUT… could it be because you're using float32 instead of float16?
https://huggingface.co/databricks/dolly-v2-12b/discussions/18
from mlc-llm.
my build command is
python build.py --model dolly-v2-12b --dtype float32 --target cuda --quantization-mode int3 --quantization-sym --quantization-storage-nbit 32 --max-seq-len 2048
with this mod in the code to fit my card's architecture
Quantization plays an important role in reducing memory if you want to run a larger model on consumer-class GPUs, so please turn it on :-)
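The memory-reduction point above can be made concrete with some back-of-the-envelope arithmetic. This is an illustrative sketch only: the ~12B parameter count is approximate, and real memory use also includes activations, the KV cache, and runtime overhead beyond raw weight storage.

```python
# Rough weight-storage estimate for a ~12B-parameter model (e.g. dolly-v2-12b)
# under the dtypes and quantization mode discussed in this thread.

PARAMS = 12e9  # assumed parameter count, ~12 billion weights


def weight_gib(bits_per_param: float, n_params: float = PARAMS) -> float:
    """Approximate weight storage in GiB for a given bits-per-parameter."""
    return n_params * bits_per_param / 8 / 2**30


fp32 = weight_gib(32)  # float32, as in the build command above
fp16 = weight_gib(16)  # float16, as the first comment suggests
int3 = weight_gib(3)   # int3 quantization (--quantization-mode int3)

print(f"float32: {fp32:.1f} GiB")  # ~44.7 GiB
print(f"float16: {fp16:.1f} GiB")  # ~22.4 GiB
print(f"int3:    {int3:.1f} GiB")  # ~4.2 GiB
```

This is why switching from float32 to float16 halves the weight footprint, and why aggressive quantization is what makes a 12B model fit on a consumer GPU at all.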
Related Issues (20)
- [Model Request] Large World Model
- [Bug] Access violation in TVM package when trying to run model converted and compiled for Windows
- [Question] build tiny-llama, use latest mlc-llm, report vm.builtin.paged_attention_kv_cache_attention_with_fused_qkv error HOT 6
- [Bug] Llama model not running properly in Android app HOT 9
- [Bug] error: package org.apache.tvm does not exist when building app in Android Studio HOT 6
- Issue Compiling gemma-2-it for Vulkan on Linux HOT 6
- [Bug] InternalError: Check failed: type == expected_type (float32x2 vs. float16x2) : Attempted to access buffer K_smem as element type float32x2 using an index of size 2 when the element type is float16 HOT 5
- [Feature Request] file selection and saving in chat
- [Bug] gemma-2b for Android. OpenCL Error Code=-54: CL_INVALID_WORK_GROUP_SIZE HOT 7
- [Bug] Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC can't run HOT 4
- [Model Request] Stable Diffusion and Controlnet for qualcom HOT 2
- Compile android model for Vulkan HOT 1
- [Question] How to set up Rag & Prompt engineering in Android app HOT 6
- [Question] Does MLC support speculative decoding on Android? HOT 1
- Bugs in `core.py` file and others HOT 2
- [Bug] TVMError: The output probabilities are all NaNs, can not sample from it HOT 5
- [Bug] WSL2 Ubuntu RTX 3060 CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_NO_BINARY_FOR_GPU HOT 1
- [Bug] mlc_llm.build error - `ValueError: Invoking function prefill requires 330 inputs but only 4 inputs are provided.` HOT 2
- [Question] How to print log under mlc-llm? HOT 5
- [Bug] Problem compiling llama model HOT 7