Comments (5)
Perhaps you could try using the following code in pruning.sh instead of running with slurm.
# Run in bash, it will automatically use resources available in the current environment
composer $TRAIN_SCRIPT \
$config_file \
run_name=${run_name} \
data_local=${data_local} \
eval_loader.dataset.split=${eval_split_name} \
global_train_batch_size=${global_train_batch_size} \
device_train_microbatch_size=${device_train_microbatch_size} \
device_eval_batch_size=${device_eval_batch_size} \
max_seq_len=${max_seq_len} \
max_duration=${max_duration} \
eval_first=false \
scheduler.t_warmup=${t_warmup} \
save_folder=${save_dir} \
loggers.wandb.init_kwargs.dir=${wandb_dir} \
eval_interval=${eval_interval} \
save_interval=${save_interval} \
optimizer.lr=${lr} \
optimizer.lag_lr=${lag_lr} \
model.path=${path} \
model.l0_module.lagrangian_warmup_steps=${lagr_warmup} \
model.l0_module.pruning_modules='[head,intermediate,layer,hidden]' \
model.l0_module.eval_target_model=${eval_target_model} \
model.l0_module.target_model.d_model=${target_d_model} \
model.l0_module.target_model.n_heads=${target_n_heads} \
model.l0_module.target_model.n_layers=${target_n_layers} \
model.l0_module.target_model.intermediate_size=${target_intermediate_size} \
callbacks.data_loading.dynamic=${dynamic} \
callbacks.data_loading.set_names=${set_names} \
callbacks.data_loading.proportion=${proportion} \
callbacks.data_loading.update_type=${update_type} \
callbacks.data_loading.target_loss=${target_loss} \
train_loader.num_workers=0 \
train_loader.prefetch_factor=null \
train_loader.persistent_workers=false \
autoresume=false
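To use it, define the variables the command references near the top of pruning.sh and launch the script directly with bash. The values below are purely illustrative placeholders (they are not the repository's defaults) and only show the shape of a minimal invocation:
# Illustrative placeholders defined earlier in pruning.sh (adjust to your setup)
TRAIN_SCRIPT=llmshearing/train.py        # hypothetical path to the training entry point
config_file=path/to/pruning_config.yaml  # hypothetical pruning YAML
run_name=pruning_test
data_local=/path/to/tokenized_data
save_dir=/path/to/checkpoints
# ... remaining variables (lr, max_duration, target_* sizes, etc.) ...

# Launch without Slurm; composer uses every GPU visible on this machine
bash pruning.sh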
Composer supports multi-node training: you simply need to configure MASTER_ADDR, MASTER_PORT, and WORLD_SIZE properly in the script and run the composer command on each of the nodes. As with other multi-node training frameworks, all the nodes communicate with the head node (MASTER_ADDR).
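For concreteness, a minimal sketch for two nodes with 8 GPUs each might look like the following. The address and port are placeholders, and the per-node rank variable (NODE_RANK, or the launcher's --node_rank flag) is an assumption based on the standard Composer launcher, so check composer --help on your installation for the exact names:
# On node 0 (the head node): export these before pruning.sh runs the composer command
export MASTER_ADDR=10.0.0.1   # IP of the head node (placeholder)
export MASTER_PORT=29500      # any free port, identical on every node
export WORLD_SIZE=16          # total GPUs across all nodes (2 nodes x 8 GPUs)
export NODE_RANK=0            # assumed launcher variable; 0 on the head node
bash pruning.sh

# On node 1: same values, only the node rank changes
export MASTER_ADDR=10.0.0.1
export MASTER_PORT=29500
export WORLD_SIZE=16
export NODE_RANK=1
bash pruning.sh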
Perhaps you could try using the following code in pruning.sh instead of running with slurm. […]
Yes, thanks, you are right, but Composer still doesn't support multiple nodes, does it? I don't know how to start multiple nodes. Could you give me a specific script?
Single-machine multi-GPU is used automatically, but I haven't tried multi-machine multi-GPU either. It involves the FSDP configuration; you can try it.
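If the training YAML exposes a Composer-style fsdp_config block (an assumption here, since the config file itself is not shown in this thread), its entries can be adjusted with the same key=value overrides used in the command above. The key names below follow Composer's standard FSDP options and should be verified against the actual config:
# Hypothetical extra overrides for the composer command in pruning.sh, placed
# before the final "autoresume=false" line; verify the keys against $config_file
    fsdp_config.sharding_strategy=FULL_SHARD \
    fsdp_config.mixed_precision=DEFAULT \
    fsdp_config.activation_checkpointing=false \
    fsdp_config.limit_all_gathers=true \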
thanks
Related Issues (20)
- Could you provide tokenized continue-pretraining dataset for reproduction? HOT 2
- missmatch shape
- Start training but nothing continue HOT 6
- TypeError: buffer is too small for requested array
- Pruning fine-tuned model HOT 2
- save model meet problem HOT 1
- Instruction tuning dataset HOT 2
- Is there a way to run pruning without using Slurm?
- Start training but only output config information HOT 3
- The Project is not implemented for 70B llama? HOT 7
- LlamaRMSNorm() layer differs from original llama HOT 1
- composer model trans to pythia problem
- The dtype of tokenized data should be uint32 HOT 1
- Why the rope params are ignored while converting hf checkpoint to composer checkpoint? HOT 3
- about shearing params config HOT 1
- Can LLM-Shearing be used on ViT models? HOT 1
- Support for Llama-3 / GQA? HOT 1
- Open source the pruning mask. HOT 2