atac's Introduction

ATAC: Adversarially Trained Actor Critic

This repository contains the code to reproduce the experimental results of the ATAC algorithm in the paper Adversarially Trained Actor Critic for Offline Reinforcement Learning by Ching-An Cheng*, Tengyang Xie*, Nan Jiang, and Alekh Agarwal (https://arxiv.org/abs/2202.02446).

See also https://github.com/microsoft/lightATAC for a lightweight reimplementation of ATAC, which gives a 1.5-2x speedup over the original code here.

Setup

Clone the repository and create a conda environment.

git clone https://github.com/microsoft/ATAC.git
conda create -n atac python=3.8
cd ATAC

Prerequisite: Install Mujoco

(Optional) Install free mujoco210 for mujoco_py and mujoco211 for dm_control.

bash install_mujoco.sh
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin:/usr/lib/nvidia" >> ~/.bashrc
source ~/.bashrc
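After sourcing ~/.bashrc, it can help to confirm that the two directories from the export line are actually on LD_LIBRARY_PATH. The following is a minimal sketch, not part of the repository; the helper name missing_library_dirs is hypothetical.

```python
import os
from typing import List


def missing_library_dirs(ld_library_path: str, required: List[str]) -> List[str]:
    """Return the required directories that are absent from an LD_LIBRARY_PATH-style string."""
    # Normalise every entry: expand "~", drop trailing slashes, skip empty entries.
    present = {
        os.path.normpath(os.path.expanduser(entry))
        for entry in ld_library_path.split(":")
        if entry
    }
    return [
        d for d in required
        if os.path.normpath(os.path.expanduser(d)) not in present
    ]


if __name__ == "__main__":
    # The two directories appended by the export line above.
    required = ["~/.mujoco/mujoco210/bin", "/usr/lib/nvidia"]
    missing = missing_library_dirs(os.environ.get("LD_LIBRARY_PATH", ""), required)
    print("missing:", missing if missing else "none")
```

If the script reports a missing directory, the export line was probably not picked up by the current shell.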

Install ATAC

conda activate atac
pip install -e .[mujoco210]
# or below, if the original paid mujoco is used.
pip install -e .[mujoco200]

Run ATAC

python scripts/main.py
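To run several seeds on one dataset, a small launcher can build the command line for each run. This is a sketch under assumptions: it uses only the flags that appear in an issue report on this page (-e, --gpu_id, --seed) and assumes scripts/main.py accepts them; build_command and run_seeds are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from typing import Iterable, List


def build_command(env_name: str, seed: int, gpu_id: int = 0) -> List[str]:
    """Build the argument list for one training run of scripts/main.py."""
    return [
        sys.executable, "scripts/main.py",
        "-e", env_name,
        "--gpu_id", str(gpu_id),
        "--seed", str(seed),
    ]


def run_seeds(env_name: str, seeds: Iterable[int], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run them one after another."""
    for seed in seeds:
        cmd = build_command(env_name, seed)
        print(" ".join(cmd))
        if not dry_run:
            # Must be started from the repository root so the relative script path resolves.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run_seeds("hopper-medium-replay-v2", seeds=[0, 1, 2])
```

Passing dry_run=False launches the runs sequentially and stops at the first failure.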

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

atac's Issues

Question about D4RL MuJoCo benchmark

Thanks for sharing the code.
I have one question. It seems that you are using D4RL v2 (C.2), and Table 1 states that "the baseline results are from the respective papers". However, some of those earlier papers used D4RL v0. I believe the dataset quality differs between v0 and v2 (see the TD3+BC paper), so the comparison might be biased.
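The concern above can be checked mechanically once the dataset name behind each reported number is known. The sketch below assumes the usual D4RL naming scheme, in which the name ends in a version suffix such as -v0 or -v2; the helper names and the baseline labels are hypothetical and do not refer to any specific paper.

```python
import re
from typing import Dict, Optional

# D4RL dataset names end in a version suffix, e.g. "hopper-medium-expert-v2".
_VERSION_RE = re.compile(r"-v(\d+)$")


def dataset_version(name: str) -> Optional[int]:
    """Return the integer version suffix of a dataset name, or None if there is none."""
    match = _VERSION_RE.search(name)
    return int(match.group(1)) if match else None


def version_mismatches(ours: str, baselines: Dict[str, str]) -> Dict[str, str]:
    """Return the baselines whose dataset version differs from the one we evaluate on."""
    target = dataset_version(ours)
    return {
        method: dataset
        for method, dataset in baselines.items()
        if dataset_version(dataset) != target
    }


if __name__ == "__main__":
    # Placeholder labels; fill in the dataset each baseline paper actually used.
    reported = {"baseline_A": "hopper-medium-v0", "baseline_B": "hopper-medium-v2"}
    print(version_mismatches("hopper-medium-v2", reported))
```

Any entry returned by version_mismatches marks a number that was obtained on a different dataset version and is therefore not directly comparable.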

Why training ends at epoch 50?

Hello, I have tried to reproduce the ATAC results from the paper. However, when I run the official code, the experiment automatically ends at epoch 50, and I cannot find the cause. Could you help me?
For example, I ran 'python scripts/main.py -e hopper-medium-expert-v2 --gpu_id 0 --seed 15'. Are there any other hyperparameters that need to be set?
@chinganc

No python scripts/main.py

How do I run ATAC? According to README.md, we should run ATAC with 'python scripts/main.py', but there is no scripts/main.py file.

Difference with Conservative-Q Learning (CQL)

The relative pessimism formulation, Eqs. (1) and (2), proposed in ATAC seems to be exactly the same as the learning objective (3) in [1]. Algorithm 2 in ATAC also looks remarkably similar to Algorithm 1 in [1], apart from some implementation caveats. Could you explain the major differences between ATAC and CQL?

[1] Kumar et al. Conservative Q-Learning for Offline Reinforcement Learning. NeurIPS 2020.

[Bug in win11] when I run "main.py"

Hi, thank you very much for providing such a good open-source algorithm. I am running into the following problem when running "main.py" on Windows.

Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: FrankaKitchen failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'mujoco'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
pybullet build time: Nov 5 2022 13:03:11
Traceback (most recent call last):
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\record_writer.py", line 58, in open_file
    factory = REGISTERED_FACTORIES[prefix]
KeyError: '.\exp_data\OfflineATAC_hopper-medium-replay-v2\beta_16_discount_0.99_norm_constraint_100_policy_lr_5e-07_value_lr_0.0005_use_two_qfs_True_fixed_alpha_None_q_eval_mode_0.5_0.5_n_warmstart_steps_100000_seed_0\events.out.tfevents.1667737712.Pavelzzp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\pythonproject\ATAC-master\scripts\main.py", line 234, in <module>
    run(**train_kwargs)
  File "D:\pythonproject\ATAC-master\scripts\main.py", line 197, in run
    full_score = train_agent(train_func,
  File "c:\users\pavel\atac\src\atac\garage_tools\rl_utils.py", line 47, in train_agent
    score = wrapped_train_func(**train_kwargs)
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\garage\experiment\experiment.py", line 368, in __call__
    ctxt = self._make_context(self._get_options(*args), **kwargs)
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\garage\experiment\experiment.py", line 329, in _make_context
    dowel.TensorBoardOutput(log_dir, x_axis=options['x_axis']))
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\dowel\tensor_board_output.py", line 57, in __init__
    self._writer = tbX.SummaryWriter(log_dir, flush_secs=flush_secs)
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\writer.py", line 301, in __init__
    self._get_file_writer()
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\writer.py", line 349, in _get_file_writer
    self.file_writer = FileWriter(logdir=self.logdir,
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\writer.py", line 105, in __init__
    self.event_writer = EventFileWriter(
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\event_file_writer.py", line 106, in __init__
    self._ev_writer = EventsWriter(os.path.join(
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\event_file_writer.py", line 43, in __init__
    self._py_recordio_writer = RecordWriter(self._file_name)
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\record_writer.py", line 179, in __init__
    self._writer = open_file(path)
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\tensorboardX\record_writer.py", line 61, in open_file
    return open(path, 'wb')
FileNotFoundError: [Errno 2] No such file or directory: '.\exp_data\OfflineATAC_hopper-medium-replay-v2\beta_16_discount_0.99_norm_constraint_100_policy_lr_5e-07_value_lr_0.0005_use_two_qfs_True_fixed_alpha_None_q_eval_mode_0.5_0.5_n_warmstart_steps_100000_seed_0\events.out.tfevents.1667737712.Pavelzzp'
Exception ignored in: <function LogOutput.__del__ at 0x0000020BA6A3D9D0>

Traceback (most recent call last):
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\dowel\logger.py", line 176, in __del__
    self.close()
  File "D:\Anacondazzp\envs\pavelzzp\lib\site-packages\dowel\tensor_board_output.py", line 156, in close
    self._writer.close()
AttributeError: 'TensorBoardOutput' object has no attribute '_writer'

How can I solve this problem? Thank you very much.
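One possible cause, which this thread does not confirm, is the traditional Windows MAX_PATH limit of 260 characters: the run directory name in the traceback is roughly 160 characters long on its own, so the absolute path of the event file may be too long for open() to create it. The sketch below only tests that hypothesis for a given working directory; exceeds_max_path is a hypothetical helper, and the working directory in the usage example is a placeholder.

```python
import ntpath

# Traditional Windows limit; it includes the terminating null character.
WINDOWS_MAX_PATH = 260


def exceeds_max_path(cwd: str, relative_path: str, limit: int = WINDOWS_MAX_PATH) -> bool:
    """Return True if the absolute Windows path would not fit within `limit`."""
    # ntpath applies Windows path rules on any platform, so this also runs on Linux.
    full = ntpath.normpath(ntpath.join(cwd, relative_path))
    # One slot is reserved for the trailing null, so the path may use limit - 1 characters.
    return len(full) > limit - 1


if __name__ == "__main__":
    # Placeholder values: substitute the real working directory and log file path.
    cwd = r"C:\work\ATAC"
    log_file = ntpath.join("exp_data", "run_" + "x" * 200, "events.out.tfevents")
    print(exceeds_max_path(cwd, log_file))
```

If the check returns True for the real paths, running from a shorter working directory or shortening the log directory name would be worth trying.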
