Comments (2)
@arnocandel @pseudotensor Thank you!
from h2ogpt.
For a specific model card, e.g. https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b, you can download the Training logs zip provided there. The log file shows the full set of parameters used for finetune.py.
E.g. for that model card, the log file is called 1013.log, and inside you'll see a block like:
local_rank: 6
global rank: 6
local_rank: 0
global rank: 0
Training model with params:
save_code: True
run_id: 1013
tokenizer_base_model: EleutherAI/gpt-neox-20b
data_path: openassistant_oasst1.json
data_col_dict: None
valid_path: None
data_mix_in_path: 0-hero/OIG-small-chip2
data_mix_in_factor: 0.0
data_mix_in_col_dict: {'user': 'instruction', 'chip2': 'output'}
data_mix_in_prompt_type: instruct
output_dir: gpt-neox-20b.openassistant_oasst1.json.6.0_epochs.5a14ea8b3794c0d60476fc262d0a297f98dd712d.1013
lora_weights:
batch_size: 64
micro_batch_size: 8
gradient_checkpointing: False
fp16: True
num_epochs: 6.0
learning_rate: 0.0003
val_set_size: 0
val_metrics: []
eval_steps: 32000
eval_epochs: None
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['query_key_value']
llama_type: False
group_by_length: False
resume_from_checkpoint: None
ddp: True
local_files_only: False
resume_download: True
warmup_steps: 100
logging_steps: 1
save_steps: 2000
add_eos_token: False
world_size: 8
local_rank: 0
rank: 0
gpus: 8
device_map: auto
gradient_accumulation_steps: 8
base_model: EleutherAI/gpt-neox-20b
cutoff_len: 512
prompt_type: plain
train_on_inputs: True
Command: finetune.py --base_model=EleutherAI/gpt-neox-20b --data_path=openassistant_oasst1.json --lora_target_modules=["query_key_value"] --run_id=1013 --batch_size=64 --micro_batch_size=8 --num_epochs=6.0 --val_set_size=0 --eval_steps=32000 --save_steps=2000 --data_mix_in_factor=0.0 --data_mix_in_factor=0.0 --prompt_type=plain --save_code=True --cutoff_len=512 --lora_r=16
Hash: 5a14ea8b3794c0d60476fc262d0a297f98dd712d
Distributed: data parallel
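If you want to pull that parameter block out of a log programmatically, a sed range works. This is a sketch that assumes the block always runs from the "Training model with params:" line through the "Distributed:" line, as in the excerpt above:

```shell
# Extract the training-parameter block from a finetune log.
# Assumes the block is delimited by "Training model with params:"
# and a line starting with "Distributed:", as in 1013.log above.
sed -n '/Training model with params:/,/^Distributed:/p' 1013.log
```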
The "Command" line shows the actual command used, and "Hash" records the exact git commit of the repo, so everything can be reproduced exactly.
The only thing you have to account for is system-specific setup. For example, I trained that 20B model with this line:
WORLD_SIZE=8 CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" torchrun --nproc_per_node=8 --master_port=1234 finetune.py --base_model='EleutherAI/gpt-neox-20b' --data_path='openassistant_oasst1.json' --lora_target_modules='["query_key_value"]' --run_id=1013 --batch_size=64 --micro_batch_size=8 --num_epochs=6.0 --val_set_size=0 --eval_steps=32000 --save_steps=2000 --data_mix_in_factor=0.0 --data_mix_in_factor=0.0 --prompt_type='plain' --save_code=True --cutoff_len=512 --lora_r=16 &> 1013.log
That is, you can see exactly how I produced that 1013.log file. The only thing missing from 1013.log itself is the prefix WORLD_SIZE=8 CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" torchrun --nproc_per_node=8 --master_port=1234, which is specific to the system we ran on (8*A100).
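As a sanity check on the numbers in the log, the batch parameters are self-consistent. The split between gradient accumulation and data parallelism below is my reading of the log (assuming standard Hugging Face Trainer accumulation semantics), not something the log states directly:

```shell
# Batch arithmetic from the 1013.log parameters.
micro_batch_size=8            # batch per forward pass, per GPU
gradient_accumulation_steps=8
world_size=8                  # 8x A100, data parallel

per_device=$((micro_batch_size * gradient_accumulation_steps))
echo "per-device effective batch: $per_device"   # matches batch_size: 64 in the log

# If gradients are averaged across the 8 data-parallel ranks, the
# global batch works out to 512 examples per optimizer step (my
# inference, not stated in the log).
echo "global examples per optimizer step: $((per_device * world_size))"
```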