I ran the Ant-v2 and Walker2d-v2 environments with both DAgger and Behavior Cloning.
TensorBoard files are located in the /data directory.
I plotted two figures comparing the performance of these two methods against the expert policy; they are also located in the /data directory.
Tip: While debugging, you probably want to keep the flag --video_log_freq -1, which disables video logging and speeds up the experiment. However, feel free to remove it to save videos of your awesome policy!
If running on Colab, adjust the #@params in the Args class to match the command-line arguments shown in the commands below.
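As a rough illustration of how the Colab parameters line up with the command-line flags, here is a minimal sketch. The class layout, the `#@param` annotations, and the helper `to_command_line` are assumptions made for illustration; the actual notebook's Args class may be organized differently. The values are taken from the behavior cloning command in this README.

```python
class Args:
    """Hypothetical Colab-style config; field names mirror the CLI flags."""
    expert_policy_file = "cs285/policies/experts/Ant.pkl"  #@param
    expert_data = "cs285/expert_data/expert_data_Ant-v2.pkl"  #@param
    env_name = "Ant-v2"  #@param
    exp_name = "bc_ant"  #@param
    do_dagger = False  #@param {type: "boolean"}
    n_iter = 1  #@param {type: "integer"}
    video_log_freq = -1  #@param {type: "integer"}


def to_command_line(args):
    """Render the Args fields as the equivalent command-line string."""
    parts = ["python", "cs285/scripts/run_hw1.py"]
    for name in ("expert_policy_file", "env_name", "exp_name",
                 "n_iter", "expert_data", "video_log_freq"):
        parts += [f"--{name}", str(getattr(args, name))]
    # --do_dagger is a switch, so it appears only when enabled.
    if args.do_dagger:
        parts.append("--do_dagger")
    return " ".join(parts)


print(to_command_line(Args))
```

Printing the rendered command is a quick way to check that the notebook settings match the run you intended before starting it.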
Command for problem 1:
python cs285/scripts/run_hw1.py \
--expert_policy_file cs285/policies/experts/Ant.pkl \
--env_name Ant-v2 --exp_name bc_ant --n_iter 1 \
--expert_data cs285/expert_data/expert_data_Ant-v2.pkl \
--video_log_freq -1
Make sure to also try another environment.
See the homework PDF for more details on what else you need to run.
To generate videos of the policy, remove the --video_log_freq -1 flag.
Command for section 1 (note the --do_dagger flag and the higher value for n_iter):
python cs285/scripts/run_hw1.py \
--expert_policy_file cs285/policies/experts/Ant.pkl \
--env_name Ant-v2 --exp_name dagger_ant --n_iter 10 \
--do_dagger --expert_data cs285/expert_data/expert_data_Ant-v2.pkl \
--video_log_freq -1
Make sure to also try another environment. See the homework PDF for more details on what else you need to run.
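The behavior cloning and DAgger commands differ only in the experiment name, the iteration count, and the --do_dagger switch, so both can be generated from one helper. The sketch below assumes a hypothetical function `build_command` and a `method_env` naming scheme for exp_name (matching bc_ant and dagger_ant); it only prints the commands. For another environment, pass that environment's own expert policy and expert data paths.

```python
def build_command(env_name, expert_policy_file, expert_data, do_dagger, n_iter):
    """Build the run_hw1.py argument list for one experiment."""
    method = "dagger" if do_dagger else "bc"
    short_env = env_name.split("-")[0].lower()  # "Ant-v2" -> "ant"
    cmd = [
        "python", "cs285/scripts/run_hw1.py",
        "--expert_policy_file", expert_policy_file,
        "--env_name", env_name,
        "--exp_name", f"{method}_{short_env}",
        "--n_iter", str(n_iter),
        "--expert_data", expert_data,
        "--video_log_freq", "-1",  # keep video logging off while debugging
    ]
    if do_dagger:
        cmd.append("--do_dagger")
    return cmd


# Paths for Ant-v2 as given in this README.
ANT = dict(
    env_name="Ant-v2",
    expert_policy_file="cs285/policies/experts/Ant.pkl",
    expert_data="cs285/expert_data/expert_data_Ant-v2.pkl",
)

# Behavior cloning uses a single iteration; DAgger uses more.
for do_dagger, n_iter in [(False, 1), (True, 10)]:
    print(" ".join(build_command(do_dagger=do_dagger, n_iter=n_iter, **ANT)))
```

Each returned list could also be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run directly.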
You can visualize your runs using tensorboard:
tensorboard --logdir data
You will see scalar summaries as well as videos of your trained policies (in the 'images' tab).
You can choose to visualize specific runs with a comma-separated list:
tensorboard --logdir data/run1,data/run2,data/run3...
If running on Colab, use the %tensorboard line magic to do the same thing; see the notebook for more details.
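The comma-separated run list above can be assembled automatically instead of typed by hand. This is a minimal sketch built around a hypothetical helper, `logdir_argument`. It assumes each run is a subdirectory of data whose name contains the experiment name (for example bc_ant or dagger_ant); adjust the keywords if your run directories are named differently.

```python
from pathlib import Path


def logdir_argument(data_dir, keywords=("bc_ant", "dagger_ant")):
    """Return a comma-separated list of run directories under data_dir.

    A directory is included when its name contains any of the keywords.
    """
    runs = sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_dir() and any(k in p.name for k in keywords)
    )
    return ",".join(p.as_posix() for p in runs)


# Print the command only if a data directory exists in the working directory.
if Path("data").is_dir():
    print("tensorboard --logdir " + logdir_argument("data"))
```

Depending on the installed TensorBoard version, a comma-separated list may need to be passed with --logdir_spec rather than --logdir; check `tensorboard --help` if the runs do not show up.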