
Comments (2)

flippercy commented on August 15, 2024

@arnocandel @pseudotensor Thank you!

from h2ogpt.

pseudotensor commented on August 15, 2024

#22

For a specific model card, e.g. https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b, you can download the training logs zip linked there. The log file inside shows the full set of parameters used for finetune.py.

Training logs: zip

E.g. for that model card, the log file is called 1013.log, and inside you'll see a block like:

local_rank: 6
global rank: 6
local_rank: 0
global rank: 0
Training model with params:
save_code: True
run_id: 1013
tokenizer_base_model: EleutherAI/gpt-neox-20b
data_path: openassistant_oasst1.json
data_col_dict: None
valid_path: None
data_mix_in_path: 0-hero/OIG-small-chip2
data_mix_in_factor: 0.0
data_mix_in_col_dict: {'user': 'instruction', 'chip2': 'output'}
data_mix_in_prompt_type: instruct
output_dir: gpt-neox-20b.openassistant_oasst1.json.6.0_epochs.5a14ea8b3794c0d60476fc262d0a297f98dd712d.1013
lora_weights: 
batch_size: 64
micro_batch_size: 8
gradient_checkpointing: False
fp16: True
num_epochs: 6.0
learning_rate: 0.0003
val_set_size: 0
val_metrics: []
eval_steps: 32000
eval_epochs: None
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['query_key_value']
llama_type: False
group_by_length: False
resume_from_checkpoint: None
ddp: True
local_files_only: False
resume_download: True
warmup_steps: 100
logging_steps: 1
save_steps: 2000
add_eos_token: False
world_size: 8
local_rank: 0
rank: 0
gpus: 8
device_map: auto
gradient_accumulation_steps: 8
base_model: EleutherAI/gpt-neox-20b
cutoff_len: 512
prompt_type: plain
train_on_inputs: True
Command: finetune.py --base_model=EleutherAI/gpt-neox-20b --data_path=openassistant_oasst1.json --lora_target_modules=["query_key_value"] --run_id=1013 --batch_size=64 --micro_batch_size=8 --num_epochs=6.0 --val_set_size=0 --eval_steps=32000 --save_steps=2000 --data_mix_in_factor=0.0 --data_mix_in_factor=0.0 --prompt_type=plain --save_code=True --cutoff_len=512 --lora_r=16
Hash: 5a14ea8b3794c0d60476fc262d0a297f98dd712d
Distributed: data parallel

The "Command" line shows the actual command used, and the "Hash" line shows the commit of the repo that was used, so the run can be reproduced from those two pieces.
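If you want to pull those values out of a log programmatically, a minimal sketch could look like the following. It assumes the log has the same layout as the block above (a "Training model with params:" line, then "key: value" lines, then "Command:" and "Hash:" lines). The function name parse_training_log and the shortened sample are made up for illustration and are not part of h2ogpt.

```python
def parse_training_log(text):
    """Return (params, command, repo_hash) parsed from log text.

    Hypothetical helper: assumes the layout shown in 1013.log above.
    All values are kept as strings exactly as they were logged.
    """
    params = {}
    command = None
    repo_hash = None
    in_params = False
    for line in text.splitlines():
        line = line.rstrip()
        if line.startswith("Training model with params:"):
            in_params = True
            continue
        if line.startswith("Command: "):
            command = line[len("Command: "):]
            in_params = False  # the parameter block ends at the Command line
            continue
        if line.startswith("Hash: "):
            repo_hash = line[len("Hash: "):]
            continue
        if in_params and ":" in line:
            # Split on the first colon only, so dict-like values stay intact.
            key, _, value = line.partition(":")
            params[key.strip()] = value.strip()
    return params, command, repo_hash


# Shortened sample in the same shape as the log excerpt above.
sample = """local_rank: 0
global rank: 0
Training model with params:
save_code: True
run_id: 1013
data_mix_in_col_dict: {'user': 'instruction', 'chip2': 'output'}
cutoff_len: 512
Command: finetune.py --base_model=EleutherAI/gpt-neox-20b --run_id=1013
Hash: 5a14ea8b3794c0d60476fc262d0a297f98dd712d
Distributed: data parallel
"""

params, command, repo_hash = parse_training_log(sample)
print(params["run_id"], repo_hash)
print(command)
```

The hash can then be used to check out the matching commit of the repo before re-running the command.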

The only thing you have to account for is the system-specific part. E.g. I trained that 20B model with this line:

WORLD_SIZE=8 CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" torchrun --nproc_per_node=8 --master_port=1234 finetune.py --base_model='EleutherAI/gpt-neox-20b' --data_path='openassistant_oasst1.json' --lora_target_modules='["query_key_value"]' --run_id=1013 --batch_size=64 --micro_batch_size=8 --num_epochs=6.0 --val_set_size=0 --eval_steps=32000 --save_steps=2000 --data_mix_in_factor=0.0 --data_mix_in_factor=0.0  --prompt_type='plain' --save_code=True --cutoff_len=512 --lora_r=16 &> 1013.log

That is how I made that 1013.log file. The only thing missing from the 1013.log file is the WORLD_SIZE=8 CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" torchrun --nproc_per_node=8 --master_port=1234 prefix, which is very system specific; in this case we ran on 8 A100 GPUs.
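As a rough sketch of how the system-specific prefix and the logged command fit together, the snippet below rebuilds a launch line for a given GPU count. The helper names torchrun_prefix and launch_line are hypothetical, the port default just mirrors the line above, and the &> redirect assumes a bash shell. It only assembles the string; whether a different GPU count gives equivalent training results is not something this shows.

```python
import shlex


def torchrun_prefix(n_gpus, master_port=1234):
    """Build the system-specific prefix that is not recorded in the log."""
    devices = ",".join(str(i) for i in range(n_gpus))
    return (
        f'WORLD_SIZE={n_gpus} CUDA_VISIBLE_DEVICES="{devices}" '
        f"torchrun --nproc_per_node={n_gpus} --master_port={master_port}"
    )


def launch_line(logged_command, n_gpus, log_file, master_port=1234):
    """Combine the prefix with the "Command" text copied from a log."""
    # The log prints arguments without shell quoting, so re-quote each one;
    # values such as ["query_key_value"] would otherwise be mangled by the shell.
    args = " ".join(shlex.quote(a) for a in logged_command.split())
    return f"{torchrun_prefix(n_gpus, master_port)} {args} &> {log_file}"


logged = 'finetune.py --lora_target_modules=["query_key_value"] --run_id=1013'
print(launch_line(logged, n_gpus=8, log_file="1013.log"))
```

Splitting the logged command on whitespace is only safe because none of the logged argument values in this example contain spaces.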
