The course_rl from dc-wangjn

Homework description

I ran the Ant-v2 and Walker2d-v2 environments with both the DAgger and Behavior Cloning methods.

The TensorBoard files are located in the /data directory.

I plotted two figures comparing the performance of these two methods against the expert policy; they are also located in the /data directory.

Run the code

Tip: While debugging, you probably want to keep the flag --video_log_freq -1 which will disable video logging and speed up the experiment. However, feel free to remove it to save videos of your awesome policy!

If running on Colab, adjust the #@params in the Args class according to the command line arguments below.
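As a rough guide to what those arguments are, the flags that appear in the commands in this README can be written as an argparse parser. This is only a sketch: the flag names are taken from the commands, but the types, defaults, and the build_parser helper are assumptions and are not copied from run_hw1.py or the notebook's Args class.

```python
import argparse


def build_parser():
    # Hypothetical helper: flag names come from the README commands,
    # types and defaults are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--expert_policy_file", type=str, required=True)
    parser.add_argument("--expert_data", type=str, required=True)
    parser.add_argument("--env_name", type=str, required=True)
    parser.add_argument("--exp_name", type=str, required=True)
    parser.add_argument("--n_iter", type=int, default=1)
    parser.add_argument("--do_dagger", action="store_true")
    # -1 disables video logging, as described in the tip above.
    parser.add_argument("--video_log_freq", type=int, default=-1)
    return parser


# Parse the DAgger command from this README as a list of tokens.
args = build_parser().parse_args([
    "--expert_policy_file", "cs285/policies/experts/Ant.pkl",
    "--env_name", "Ant-v2", "--exp_name", "dagger_ant", "--n_iter", "10",
    "--do_dagger", "--expert_data", "cs285/expert_data/expert_data_Ant-v2.pkl",
    "--video_log_freq", "-1",
])
print(vars(args))
```

On Colab, the same values would be set as fields of the Args class instead of being passed on the command line.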

Section 1 (Behavior Cloning)

Command for problem 1:

python cs285/scripts/run_hw1.py \
    --expert_policy_file cs285/policies/experts/Ant.pkl \
    --env_name Ant-v2 --exp_name bc_ant --n_iter 1 \
    --expert_data cs285/expert_data/expert_data_Ant-v2.pkl \
    --video_log_freq -1

Make sure to also try another environment. See the homework PDF for more details on what else you need to run. To generate videos of the policy, remove the --video_log_freq -1 flag.
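For intuition, behavior cloning is supervised learning on the expert's (observation, action) pairs. The toy sketch below fits a 1-D linear policy by least squares. The expert_policy and fit_linear_policy functions and the 1-D setup are hypothetical stand-ins for illustration, not the homework's neural-network policy or its expert data.

```python
import random


def expert_policy(obs):
    # Hypothetical expert: a fixed linear controller standing in for a loaded expert.
    return -2.0 * obs + 0.5


def fit_linear_policy(observations, actions):
    # Closed-form least squares for action = w * obs + b.
    n = len(observations)
    mean_o = sum(observations) / n
    mean_a = sum(actions) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in zip(observations, actions))
    var = sum((o - mean_o) ** 2 for o in observations)
    w = cov / var
    b = mean_a - w * mean_o
    return w, b


rng = random.Random(0)
# "Expert data": observations paired with the expert's actions.
expert_obs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
expert_actions = [expert_policy(o) for o in expert_obs]

# Behavior cloning: regress actions on observations, once, with no further interaction.
w, b = fit_linear_policy(expert_obs, expert_actions)
print(w, b)
```

Because the data is collected once and never refreshed, the learned policy is only trained on states the expert visited, which is the limitation DAgger addresses.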

Section 2 (DAgger)

Command for problem 2 (note the --do_dagger flag and the higher value for --n_iter):

python cs285/scripts/run_hw1.py \
    --expert_policy_file cs285/policies/experts/Ant.pkl \
    --env_name Ant-v2 --exp_name dagger_ant --n_iter 10 \
    --do_dagger --expert_data cs285/expert_data/expert_data_Ant-v2.pkl \
    --video_log_freq -1

Make sure to also try another environment. See the homework PDF for more details on what else you need to run.
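The DAgger loop itself can be sketched in a few lines: train on expert data, roll out the learned policy, relabel the visited states with the expert's actions, aggregate, and retrain. The toy 1-D environment, the linear learner, and all function names below are assumptions made for illustration; the homework code uses MuJoCo environments and a neural-network policy instead.

```python
import random


def expert_action(x):
    # Hypothetical expert: pushes the state toward zero.
    return -0.5 * x


def rollout(policy, horizon, rng):
    # Run `policy` in a toy 1-D environment and return the visited states.
    x = rng.uniform(-2.0, 2.0)
    states = []
    for _ in range(horizon):
        states.append(x)
        x = x + policy(x) + rng.gauss(0.0, 0.1)
    return states


def fit_linear(states, actions):
    # Closed-form least squares for action = w * state + b.
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    return w, mean_a - w * mean_s


def dagger(n_iter, horizon=50, seed=0):
    rng = random.Random(seed)
    # Iteration 1: states visited by the expert (plain behavior cloning).
    states = rollout(expert_action, horizon, rng)
    actions = [expert_action(s) for s in states]
    w, b = fit_linear(states, actions)
    for _ in range(n_iter - 1):
        # Roll out the current learner, then relabel its states with expert actions.
        new_states = rollout(lambda x: w * x + b, horizon, rng)
        states += new_states
        actions += [expert_action(s) for s in new_states]
        # Retrain on the aggregated dataset.
        w, b = fit_linear(states, actions)
    return (w, b), len(states)


(w, b), n_samples = dagger(n_iter=10)
print(w, b, n_samples)
```

In this sketch a single iteration reduces to behavior cloning, and each further iteration adds one rollout of learner-visited states labeled by the expert.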

Visualizing the saved TensorBoard event files:

You can visualize your runs using tensorboard:

tensorboard --logdir data

You will see scalar summaries as well as videos of your trained policies (in the 'images' tab).

You can choose to visualize specific runs with a comma-separated list:

tensorboard --logdir data/run1,data/run2,data/run3...

If running on Colab, you will be using the %tensorboard line magic to do the same thing; see the notebook for more details.
