Thanks for this great work. I am wondering if you just randomly initialized the adapt

Question about the initialization of Adapter. (llama-adapter issue, open, 7 comments)

TitleZ99 commented on August 14, 2024

Question about the initialization of Adapter.

from llama-adapter.

Comments (7)

gaopengpjlab commented on August 14, 2024

We randomly initialise the adaption prompts. The finetuning code will be released in a few days. Stay tuned.

teknium1 commented on August 14, 2024

Looking forward to it!

TitleZ99 commented on August 14, 2024

> We randomly initialise the adaption prompts. The finetuning code will be released in a few days. Stay tuned.

Thanks again for your immediate reply. I am looking forward to it too!

aojunzz commented on August 14, 2024

@zsmmsz99 @teknium1 We have released the simple FineTuning v1, an easy-to-understand and reproducible version. Please visit https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/alpaca_finetuning_v1.

teknium1 commented on August 14, 2024

> @zsmmsz99 @teknium1 We have released the simple FineTuning v1, an easy-to-understand and reproducible version. Please visit https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/alpaca_finetuning_v1.

Thanks. Could you explain a bit more about the fine-tuning settings? Maybe that's outside your scope here, but what determined your choices for adapter layer, adapter len, and blr? Were these all taken from the original Alpaca, or are there no special args for your adapter vs. alpaca-lora etc.?

```
    --adapter_len 10 \
    --max_seq_len 512 \
    --batch_size 4 \
    --epochs 5 \
    --warmup_epochs 2 \
    --blr 9e-3 \
    --weight_decay 0.02 \
```

ruiyan1995 commented on August 14, 2024

Thanks for this great work. I am confused about "Zero-init Attention". As described in Sec. 3.2 of your paper, the learnable gating factor g is initialized to zero. But I wonder whether you apply any special initialization to other important parameters of the top L frozen Transformer blocks? Looking forward to your reply! @gaopengpjlab

ZrrSkywalker commented on August 14, 2024

Thanks for your interest @ruiyan1995. We only zero-initialize the gating factor. All other newly added learnable parameters, e.g., the adaption prompts or the visual projection layer, are randomly initialized.
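
To make the answer above concrete, here is a minimal pure-Python sketch of the idea, not the repository's actual code. It assumes the gated form described in the thread: the adaption prompts get their own softmax, and their contribution is scaled by a gate `g` that starts at zero. The helper names (`gated_attention`, `plain_attention`, `random_vectors`), the toy dimensions, and the use of prompt keys/values directly (rather than projecting learnable prompt embeddings) are all illustrative assumptions.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def plain_attention(query, token_kv):
    # Ordinary single-query attention over the word tokens only.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k, _ in token_kv])
    return weighted_sum(weights, [v for _, v in token_kv])


def gated_attention(query, token_kv, prompt_kv, gate):
    # Assumed form: prompts and tokens are softmaxed separately, and the
    # prompt branch is multiplied by the gate before being added.
    scale = 1.0 / math.sqrt(len(query))
    prompt_w = softmax([dot(query, k) * scale for k, _ in prompt_kv])
    prompt_out = weighted_sum(prompt_w, [v for _, v in prompt_kv])
    token_out = plain_attention(query, token_kv)
    return [t + gate * p for t, p in zip(token_out, prompt_out)]


def random_vectors(n, dim, rng):
    # Random (Gaussian) initialization, standing in for the adaption prompts.
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim, adapter_len, seq_len = 8, 10, 5  # adapter_len mirrors --adapter_len 10

query = random_vectors(1, dim, rng)[0]
token_kv = list(zip(random_vectors(seq_len, dim, rng),
                    random_vectors(seq_len, dim, rng)))
prompt_kv = list(zip(random_vectors(adapter_len, dim, rng),
                     random_vectors(adapter_len, dim, rng)))

gate = 0.0  # the only zero-initialized parameter
out_at_init = gated_attention(query, token_kv, prompt_kv, gate)
```

With `gate = 0.0`, `out_at_init` matches `plain_attention(query, token_kv)`, so the randomly initialized prompts cannot disturb the frozen model at the start of training; they only take effect as the gate moves away from zero.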
