Code accompanying the paper Adversarially Trained Actor Critic for Offline Reinforcement Learning by Ching-An Cheng*, Tengyang Xie*, Nan Jiang, and Alekh Agarwal.

License: MIT License

Shell 0.82% Python 99.18%

atac's Introduction

ATAC: Adversarially Trained Actor Critic

This repository contains the code to reproduce the experimental results of the ATAC algorithm in the paper Adversarially Trained Actor Critic for Offline Reinforcement Learning by Ching-An Cheng*, Tengyang Xie*, Nan Jiang, and Alekh Agarwal (https://arxiv.org/abs/2202.02446).

Please see also https://github.com/microsoft/lightATAC for a lightweight reimplementation of ATAC, which gives a 1.5-2x speedup over the original code here.

Setup

Clone the repository and create a conda environment.

git clone https://github.com/microsoft/ATAC.git
conda create -n atac python=3.8
cd ATAC

Prerequisite: Install Mujoco

(Optional) Install free mujoco210 for mujoco_py and mujoco211 for dm_control.

bash install_mujoco.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin:/usr/lib/nvidia' >> ~/.bashrc
source ~/.bashrc

Install ATAC

conda activate atac
pip install -e ".[mujoco210]"
# Or use the line below if the original paid MuJoCo (mujoco200) is used.
pip install -e ".[mujoco200]"

Run ATAC

python scripts/main.py
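
Offline RL results are usually reported over several seeds, so a small launcher can help. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `run_seeds` are hypothetical, and the `-e`, `--gpu_id`, and `--seed` flags are assumed from a command quoted by a user in the issues, not verified against the script itself. By default it only prints the commands.

```python
import subprocess
import sys
from typing import List


def build_command(env_name: str, seed: int, gpu_id: int = 0) -> List[str]:
    """Build the argument list for one training run.

    The -e, --gpu_id, and --seed flags are assumptions copied from a
    user-reported command; check them against scripts/main.py before use.
    """
    return [
        sys.executable, "scripts/main.py",
        "-e", env_name,
        "--gpu_id", str(gpu_id),
        "--seed", str(seed),
    ]


def run_seeds(env_name: str, seeds: List[int], gpu_id: int = 0,
              dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute one training command per seed, in order."""
    commands = [build_command(env_name, seed, gpu_id) for seed in seeds]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_seeds("hopper-medium-expert-v2", seeds=[0, 1, 2])
```

Pass `dry_run=False` from the repository root to actually start the runs one after another.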

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

atac's Issues

Question about D4RL MuJoCo benchmark

Thanks for sharing the code.
I have one question. You appear to be using D4RL v2 (C.2), and Table 1 states that "the baseline results are from the respective papers". However, some earlier papers used D4RL v0. I believe the dataset quality differs between v0 and v2 (see the TD3+BC paper), so the comparison might be biased.

Why training ends at epoch 50?

Hello, I have tried to reproduce the ATAC results reported in the paper. However, when I run the official code, the experiment automatically ends at epoch 50, and I cannot find the cause. Could you help me?
For example, I ran 'python scripts/main.py -e hopper-medium-expert-v2 --gpu_id 0 --seed 15'. Are there any other hyperparameters that need to be set?
@chinganc

No python scripts/main.py

How do I run ATAC? According to README.md, we should run ATAC using 'python scripts/main.py', but there is no scripts/main.py file.

Difference with Conservative-Q Learning (CQL)

The relative pessimism objective (Eqs. (1) and (2)) proposed in ATAC seems to be exactly the same as the learning objective (3) in [1]. Algorithm 2 in ATAC also looks remarkably similar to Algorithm 1 in [1], apart from some implementation details. Could you explain the major difference between ATAC and CQL?

[1] Kumar et al. Conservative Q-Learning for Offline Reinforcement Learning. NeurIPS 2020.

[Bug in win11] when I run "main.py"

Hi, thank you very much for providing such a good open-source algorithm. I am having problems when running "main.py" on Windows.

Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: FrankaKitchen failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'mujoco'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
pybullet build time: Nov 5 2022 13:03:11
Traceback (most recent call last):
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\record_writer.py", line 58, in open_file
factory = REGISTERED_FACTORIES[prefix]
KeyError: '.\exp_data\OfflineATAC_hopper-medium-replay-v2\beta_16_discount_0.99_norm_constraint_100_policy_lr_5e-07_value_lr_0.0005_use_two_qfs_True_fixed_alpha_None_q_eval_mode_0.5_0.5_n_warmstart_steps_100000_seed_0\events.out.tfevents.1667737712.Pavelzzp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\pythonproject\ATAC-master\scripts\main.py", line 234, in <module>
run(**train_kwargs)
File "D:\pythonproject\ATAC-master\scripts\main.py", line 197, in run
full_score = train_agent(train_func,
File "c:\users\pavel\atac\src\atac\garage_tools\rl_utils.py", line 47, in train_agent
score = wrapped_train_func(**train_kwargs)
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\garage\experiment\experiment.py", line 368, in __call__
ctxt = self._make_context(self._get_options(*args), **kwargs)
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\garage\experiment\experiment.py", line 329, in _make_context
dowel.TensorBoardOutput(log_dir, x_axis=options['x_axis']))
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\dowel\tensor_board_output.py", line 57, in __init__
self._writer = tbX.SummaryWriter(log_dir, flush_secs=flush_secs)
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\writer.py", line 301, in __init__
self._get_file_writer()
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\writer.py", line 349, in _get_file_writer
self.file_writer = FileWriter(logdir=self.logdir,
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\writer.py", line 105, in __init__
self.event_writer = EventFileWriter(
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\event_file_writer.py", line 106, in __init__
self._ev_writer = EventsWriter(os.path.join(
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\event_file_writer.py", line 43, in __init__
self._py_recordio_writer = RecordWriter(self._file_name)
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\record_writer.py", line 179, in __init__
self._writer = open_file(path)
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\record_writer.py", line 61, in open_file
return open(path, 'wb')
FileNotFoundError: [Errno 2] No such file or directory: '.\exp_data\OfflineATAC_hopper-medium-replay-v2\beta_16_discount_0.99_norm_constraint_100_policy_lr_5e-07_value_lr_0.0005_use_two_qfs_True_fixed_alpha_None_q_eval_mode_0.5_0.5_n_warmstart_steps_100000_seed_0\events.out.tfevents.1667737712.Pavelzzp'
Exception ignored in: <function LogOutput.__del__ at 0x0000020BA6A3D9D0>

Traceback (most recent call last):
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\dowel\logger.py", line 176, in __del__
self.close()
File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\dowel\tensor_board_output.py", line 156, in close
self._writer.close()
AttributeError: 'TensorBoardOutput' object has no attribute '_writer'

How can I solve this problem? Thank you very much.
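
One possible explanation, not confirmed anywhere in this thread, is that the log path is too long for Windows: the legacy Win32 limit (MAX_PATH) is 260 characters including the terminating NUL, and the run directory name above encodes many hyperparameters. The sketch below is a diagnostic only. The helper names are hypothetical, and the working directory `D:\pythonproject\ATAC-master` is an assumption based on where `main.py` lives in the traceback.

```python
import ntpath

# Legacy Win32 path limit; it counts the terminating NUL, so the longest
# usable path is 259 characters unless long-path support is enabled.
WINDOWS_MAX_PATH = 260


def absolute_length(base_dir: str, relative_path: str) -> int:
    """Length of the normalized absolute Windows path (works on any OS)."""
    return len(ntpath.normpath(ntpath.join(base_dir, relative_path)))


def exceeds_max_path(base_dir: str, relative_path: str) -> bool:
    """True if the joined path cannot fit within the legacy limit."""
    return absolute_length(base_dir, relative_path) >= WINDOWS_MAX_PATH


if __name__ == "__main__":
    # Assumed working directory; the relative path is copied from the error.
    base = r"D:\pythonproject\ATAC-master"
    log_file = (
        r".\exp_data\OfflineATAC_hopper-medium-replay-v2"
        r"\beta_16_discount_0.99_norm_constraint_100_policy_lr_5e-07"
        r"_value_lr_0.0005_use_two_qfs_True_fixed_alpha_None"
        r"_q_eval_mode_0.5_0.5_n_warmstart_steps_100000_seed_0"
        r"\events.out.tfevents.1667737712.Pavelzzp"
    )
    print(absolute_length(base, log_file), exceeds_max_path(base, log_file))
```

If the check reports that the limit is exceeded, running from a shorter working directory or enabling long-path support in Windows may be worth trying; if not, the cause lies elsewhere.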