Comments (7)
We randomly initialise the adaption prompts. The finetuning code will be released in a few days. Stay tuned.
from llama-adapter.
Looking forward to it!
Thanks again for your immediate reply. I am looking forward to it too!
@zsmmsz99 @teknium1 We have released the simple FineTuning v1, an easy-to-understand and reproducible version. Please visit https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/alpaca_finetuning_v1.
Thanks. Could you explain a bit more about the fine-tuning settings? Maybe that's outside your scope here, but what determines your choices for adapter layer, adapter len, and blr? Were these all taken from the original Alpaca, or are there any args specific to your adapter compared with alpaca-lora etc.?
```
--adapter_len 10 \
--max_seq_len 512 \
--batch_size 4 \
--epochs 5 \
--warmup_epochs 2 \
--blr 9e-3 \
--weight_decay 0.02 \
```
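For readers wondering how `--blr` relates to the learning rate the optimizer actually sees, here is a minimal sketch. It assumes the script follows the MAE-style convention of scaling a base learning rate by the effective batch size divided by 256; that convention is not confirmed anywhere in this thread. The names `accum_iter`, `world_size`, and `base_batch`, and the 8-GPU case, are hypothetical and used only for illustration.

```python
def effective_lr(blr, batch_size, accum_iter=1, world_size=1, base_batch=256):
    """Scale a base learning rate by the effective batch size.

    ASSUMPTION: lr = blr * (batch_size * accum_iter * world_size) / base_batch.
    Check the training script before relying on this rule.
    """
    eff_batch_size = batch_size * accum_iter * world_size
    return blr * eff_batch_size / base_batch


# Values taken from the command quoted above: --blr 9e-3, --batch_size 4.
lr_single_gpu = effective_lr(blr=9e-3, batch_size=4)
# Hypothetical 8-GPU run with the same per-GPU batch size.
lr_eight_gpu = effective_lr(blr=9e-3, batch_size=4, world_size=8)

print(lr_single_gpu, lr_eight_gpu)
```

If that convention holds, the same `--blr` gives a larger absolute learning rate as more GPUs are added, so `blr` cannot be compared directly with the plain learning rates used by other fine-tuning recipes.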
Thanks for this great work. I am confused about "Zero-init Attention". As described in Sec 3.2 of your paper, the learnable gating factor g is initialized to zero. But I wonder whether you also initialize any other important parameters of the top L frozen Transformer blocks? Looking forward to your reply! @gaopengpjlab
Thanks for your interest @ruiyan1995. We only zero-initialize the gating factor. All other newly added learnable parameters, e.g., the adaption prompts and the visual projection layer, are randomly initialized.
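To illustrate why zero-initializing only the gate is enough, here is a simplified pure-Python sketch of gated attention weights. It is not the repository's implementation. It assumes the prompt scores and the word-token scores are softmaxed separately and that only the prompt part is multiplied by the gate g. The function names and the example scores are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention_weights(prompt_scores, token_scores, gate):
    """Softmax each group separately, then scale only the prompt weights by the gate."""
    prompt_weights = [gate * w for w in softmax(prompt_scores)]
    token_weights = softmax(token_scores)
    return prompt_weights + token_weights


random.seed(0)
adapter_len = 10  # matches --adapter_len in the command quoted in this thread
# Adaption prompts are randomly initialized, so their scores are arbitrary at step 0.
prompt_scores = [random.gauss(0.0, 1.0) for _ in range(adapter_len)]
token_scores = [0.5, 1.5, -0.3]  # hypothetical scores for three word tokens

gate = 0.0  # the only zero-initialized parameter
weights = gated_attention_weights(prompt_scores, token_scores, gate)
prompt_mass = sum(abs(w) for w in weights[:adapter_len])
token_mass = sum(weights[adapter_len:])
print(prompt_mass, token_mass)
```

With the gate at zero, the random prompts receive no attention weight and the word tokens keep their usual softmax distribution, so the frozen model's behavior is unchanged at the start of training. The prompts only start to contribute as g moves away from zero.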