facebookresearch / reagent Goto Github PK

View Code? Open in Web Editor NEW

3.5K 150.0 515.0 31.96 MB

A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)

Home Page: https://reagent.ai

License: BSD 3-Clause "New" or "Revised" License

Python 80.93% Scala 3.02% C++ 3.31% CMake 0.12% Jupyter Notebook 12.59% Shell 0.03%

reagent's Introduction

ReAgent is officially archived and no longer maintained. For latest support on production-ready reinforcement learning open-source library, please refer to Pearl - Production-ready Reinforcement Learning AI Agent Library, by the Applied Reinforcement Learning team @ Meta.

Overview

ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook. ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent see the release post here and white paper here.

The platform was once named "Horizon" but we have adopted the name "ReAgent" recently to emphasize its broader scope in decision making and reasoning.

Algorithms Supported

Classic Off-Policy algorithms:

Discrete-Action DQN
Parametric-Action DQN
Double DQN, Dueling DQN, Dueling Double DQN
Distributional RL: C51 and QR-DQN
Twin Delayed DDPG (TD3)
Soft Actor-Critic (SAC)
Critic Regularized Regression (CRR)
Proximal Policy Optimization Algorithms (PPO)

RL for recommender systems:

Counterfactual Evaluation:

Doubly Robust (for bandits)
Doubly Robust (for sequential decisions)
MAGIC

Multi-Arm and Contextual Bandits:

Others:

Installation

ReAgent can be installed via. Docker or manually. Detailed instructions on how to install ReAgent can be found here.

Tutorial

ReAgent is designed for large-scale, distributed recommendation/optimization tasks where we don’t have access to a simulator. In this environment, it is typically better to train offline on batches of data, and release new policies slowly over time. Because the policy updates slowly and in batches, we use off-policy algorithms. To test a new policy without deploying it, we rely on counter-factual policy evaluation (CPE), a set of techniques for estimating a policy based on the actions of another policy.

We also have a set of tools to facilitate applying RL in real-world applications:

Domain Analysis Tool, which analyzes state/action feature importance and identifies whether the problem is a suitable for applying batch RL
Behavior Cloning, which clones from the logging policy to bootstrap the learning policy safely

Detailed instructions on how to use ReAgent can be found here.

License

ReAgent is released under a BSD 3-Clause license. Find out more about it here.

Citing

@article{gauci2018horizon,
  title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
  author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
  journal={arXiv preprint arXiv:1811.00260},
  year={2018}
}

reagent's People

Contributors

Stargazers

Watchers

Forkers

hychyc90 jdc08161063 dakeli bin913 nishadsingh1 keithmgould caozhengquan tony32769 chinpeng hychyc07 robot-ai-machinelearning darbour sak797 onisimchukv mistertea narayanmahto houseroad kittipatv hyzcn ml-lab raphael7788 wycharry hunglethanh9 kanm05 philliply rjammala mdiby codeaudit zhouyonglong anirband albertmcma dylanthomas yngtodd little1tow fuath rhendrickson42 azeem-r00t donrv yueyedeai cfwvip fishexpert jshuadvd uwesterr gavinljj brylie rtbins duke24k o7s8r6 joizhang2012 johnjjung sun2009ban joshrose mikepsinn tkhan3 jiapei100 seguce92 danromuald smilejx g-wang dongfangduoshou123 pvr1 vishwanath1306 jiths euwen yuziwenzju vagus30 techeye220 juraldinio phatlast96 hack121 hubbucket-team arunkumarramanan ferasos lilizheng-cn baek-jinoo deisler134 mbyase swansealeo renyi533 thzll2001 yishuihanhan capitulation davechenxy ilineicry minsu-daniel-kim collector-m che1qian2 sweatyrichard stjordanis vidit-bhatia miketembo jenny-nlc guanjiahui b-xiang boozyguo tu-cao valmsmith39a daniellsm chaoyue729 ailzy

reagent's Issues

How to specify embeddings for categorical variables in state space?

hi,

While estimating DQN model or DDPG, how to specify in config jsons a function approximator, of Q value network or the policy network, embeddings for categorical variables?

Some more documentation on the configuring jsons will be immensely helpful.

Narasimha

Could you please provide some example on using "Data Understanding Tool"?

Thanks for updating the paper.

In the updated paper, you mentioned you have implemented a "data understanding tool" based on world model. This is super useful and important.

Could you provide some example on how to use the tool?

OOM killed

Hi,

I played dqn_workflow with 7.9G training_data. But i got a OOM Killed.
Below is my environment and oom logs.

workflow : dqn_workflow.py
training_data : 8 features, 20,249,257 rows, 7.9G
training_eval_data : 8 features, 2,028,916 rows, 0.8G
RAM : 80G

INFO:ml.rl.evaluation.evaluation_data_page:EvaluationDataPage minibatch size: 2028912
WARNING:ml.rl.evaluation.doubly_robust_estimator:Can't normalize DR-CPE because of small or negative logged_policy_score
Killed

[Tue May  7 22:05:38 2019] python invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[Tue May  7 22:05:38 2019] python cpuset=42ee6ef8b84594988960735ef211ac05221059efc2d524f2afc1e2b49eb46d0c mems_allowed=0-1
[Tue May  7 22:05:38 2019] CPU: 1 PID: 51997 Comm: python Tainted: P           O      4.20.13-1.el7.elrepo.x86_64 #1
[Tue May  7 22:05:38 2019] Hardware name: Dell Inc. PowerEdge C4140/013M88, BIOS 1.6.11 11/21/2018
[Tue May  7 22:05:38 2019] Call Trace:
[Tue May  7 22:05:38 2019]  dump_stack+0x63/0x88
[Tue May  7 22:05:38 2019]  dump_header+0x78/0x2a4
[Tue May  7 22:05:38 2019]  ? mem_cgroup_scan_tasks+0x9c/0xf0
[Tue May  7 22:05:38 2019]  oom_kill_process+0x26b/0x290
[Tue May  7 22:05:38 2019]  out_of_memory+0x140/0x4b0
[Tue May  7 22:05:38 2019]  mem_cgroup_out_of_memory+0x4b/0x80
[Tue May  7 22:05:38 2019]  try_charge+0x6e2/0x750
[Tue May  7 22:05:38 2019]  mem_cgroup_try_charge+0x8c/0x1e0
[Tue May  7 22:05:38 2019]  __add_to_page_cache_locked+0x1a0/0x300
[Tue May  7 22:05:38 2019]  ? scan_shadow_nodes+0x30/0x30
[Tue May  7 22:05:38 2019]  add_to_page_cache_lru+0x4e/0xd0
[Tue May  7 22:05:38 2019]  filemap_fault+0x428/0x7c0
[Tue May  7 22:05:38 2019]  ? xas_find+0x138/0x1a0
[Tue May  7 22:05:38 2019]  ? filemap_map_pages+0x153/0x3c0
[Tue May  7 22:05:38 2019]  __do_fault+0x3e/0xc0
[Tue May  7 22:05:38 2019]  __handle_mm_fault+0xbd6/0xe80
[Tue May  7 22:05:38 2019]  handle_mm_fault+0x102/0x220
[Tue May  7 22:05:38 2019]  __do_page_fault+0x21c/0x4c0
[Tue May  7 22:05:38 2019]  do_page_fault+0x37/0x140
[Tue May  7 22:05:38 2019]  ? page_fault+0x8/0x30
[Tue May  7 22:05:38 2019]  page_fault+0x1e/0x30
...
[Tue May  7 22:05:38 2019] Memory cgroup out of memory: Kill process 51997 (python) score 997 or sacrifice child
[Tue May  7 22:05:38 2019] Killed process 51997 (python) total-vm:102757536kB, anon-rss:83335008kB, file-rss:132692kB, shmem-rss:8192kB
[Tue May  7 22:05:42 2019] oom_reaper: reaped process 51997 (python), now anon-rss:0kB, file-rss:127188kB, shmem-rss:8192kB

green : CPU
yellow : RAM

Running on ppc64le

Hi, I am running on ppc64le. There are many packages for which PackagesNotFoundError has come. I think this is because of I was using ppc64le. I went to 'anaconda.org' to search for the packages they are not present for ppc64le. Can I get any help in this issue.

Error in setup.py

requirements not specified in the install_requires=[] in setup.py file.

AttributeError: 'SACTrainingParameters' object has no attribute 'thrift_spec'

Hi,

I successfully installed via the docker instructions, but when trying to install locally I get the error below when running the test step. I've got a custom tensorflow build that works with Cuda 10.0/CudaNN 7.4 running on the host, so that may be a factor, but seems unlikely. The installation instructions say to set JAVA_HOME to the parent of the python dir. I don't have Java installed there, so that might also be a factor. In my case this is /home/jxstanford/.pyenv

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
TensorFlow installed from (source or binary): source
TensorFlow version: r1.12 and master (7561099fb65c18bd091751b60fc45550fc5d4805)
Python version: Python 3.6.3 :: Anaconda, Inc.
Installed using virtualenv? pip? conda?: pyenv
CUDA/cuDNN version: 10.0 / 7.4
GPU model and memory: GTX 1080ti

jxstanford@ryzen-1080:~/devel/conda-projects/Horizon$ python setup.py test running test running egg_info writing horizon.egg-info/PKG-INFO writing dependency_links to horizon.egg-info/dependency_links.txt writing top-level names to horizon.egg-info/top_level.txt reading manifest file 'horizon.egg-info/SOURCES.txt' writing manifest file 'horizon.egg-info/SOURCES.txt' running build_ext Traceback (most recent call last): File "setup.py", line 27, in <module> dependency_links=[], File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/distutils/core.py", line 148, in setup dist.run_commands() File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/distutils/dist.py", line 955, in run_commands self.run_command(cmd) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/distutils/dist.py", line 974, in run_command cmd_obj.run() File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/site-packages/setuptools/command/test.py", line 215, in run self.run_tests() File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/site-packages/setuptools/command/test.py", line 238, in run_tests **exit_kwarg File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/main.py", line 94, in __init__ self.parseArgs(argv) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/main.py", line 124, in parseArgs self._do_discovery([]) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/main.py", line 229, in _do_discovery self.test = loader.discover(self.start, self.pattern, self.top) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/loader.py", line 341, in discover tests = list(self._find_tests(start_dir, pattern)) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/loader.py", line 398, in _find_tests full_path, pattern, namespace) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/loader.py", line 475, in _find_test_path tests = self.loadTestsFromModule(package, pattern=pattern) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/site-packages/setuptools/command/test.py", line 43, in loadTestsFromModule tests.append(self.loadTestsFromName(submodule)) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/loader.py", line 190, in loadTestsFromName return self.loadTestsFromModule(obj) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/site-packages/setuptools/command/test.py", line 43, in loadTestsFromModule tests.append(self.loadTestsFromName(submodule)) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/loader.py", line 190, in loadTestsFromName return self.loadTestsFromModule(obj) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/site-packages/setuptools/command/test.py", line 43, in loadTestsFromModule tests.append(self.loadTestsFromName(submodule)) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/loader.py", line 190, in loadTestsFromName return self.loadTestsFromModule(obj) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/site-packages/setuptools/command/test.py", line 43, in loadTestsFromModule tests.append(self.loadTestsFromName(submodule)) File "/home/jxstanford/.pyenv/versions/miniconda3-4.3.30/lib/python3.6/unittest/loader.py", line 153, in loadTestsFromName module = __import__(module_name) File "/home/jxstanford/devel/conda-projects/Horizon/ml/rl/test/gym/open_ai_gym_environment.py", line 8, in <module> from ml.rl.test.gym.gym_predictor import ( File "/home/jxstanford/devel/conda-projects/Horizon/ml/rl/test/gym/gym_predictor.py", line 8, in <module> from ml.rl.training.dqn_trainer import DQNTrainer File "/home/jxstanford/devel/conda-projects/Horizon/ml/rl/training/dqn_trainer.py", line 13, in <module> from ml.rl.preprocessing.normalization import ( File "/home/jxstanford/devel/conda-projects/Horizon/ml/rl/preprocessing/normalization.py", line 12, in <module> from ml.rl.thrift.core.ttypes import NormalizationParameters File "/home/jxstanford/devel/conda-projects/Horizon/ml/rl/thrift/core/ttypes.py", line 2070, in <module> class SACModelParameters(object): File "/home/jxstanford/devel/conda-projects/Horizon/ml/rl/thrift/core/ttypes.py", line 2087, in SACModelParameters }), training=SACTrainingParameters(**{ File "/home/jxstanford/devel/conda-projects/Horizon/ml/rl/thrift/core/ttypes.py", line 1941, in __init__ if q_network_optimizer is self.thrift_spec[2][4]: AttributeError: 'SACTrainingParameters' object has no attribute 'thrift_spec'

tableSample explanation

Wondering what tableSample means from the configs? Seems important.

https://github.com/facebookresearch/Horizon/blob/e3abf433f8bbeca3140e1a5422568a778b9a3887/preprocessing/src/main/scala/com/facebook/spark/rl/Preprocessor.scala#L13

What is the interpretation of these numbers?

After running the CPE for CartPole-v0, it gives the following output:

Reward Inverse Propensity Score : normalized 0.500 raw 0.500
Reward Direct Method : normalized 1.000 raw 1.000
Reward Doubly Robust P.E. : normalized 0.999 raw 0.999
Value Weighted Doubly Robust P.E. : normalized 12163.311 raw 1026487.620
Value Sequential Doubly Robust P.E. : normalized 2128.063 raw 179591.880
Value Magic Doubly Robust P.E. : normalized 4535.132 raw 382729.374

I am wondering what are these numbers? I think that they should be ideally close to 200, but they are off a little bit. Thanks.

mvn -f preprocessing/pom.xml clean package Failed

mvn -f preprocessing/pom.xml clean package
......

- two-state-mdp
19/02/20 23:17:09 INFO TimelineTest: Full logs for test case: /tmp/com.facebook.spark.rl.TimelineTest_filter-outliers.log
- filter-outliers *** FAILED ***
org.apache.spark.sql.AnalysisException: Undefined function: 'fb_approx_percentile'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 3 pos 28
at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$50.apply(Analyzer.scala:1216)
at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$50.apply(Analyzer.scala:1216)
at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1215)
at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1213)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
...
Run completed in 20 seconds, 846 milliseconds.
Total number of tests run: 5
Suites: completed 2, aborted 0
Tests: succeeded 4, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***

Add comprehensive documentation (using Sphinx)

Please add comprehensive documentation, including:

project overview
quickstart
one or more tutorials
API reference

By way of example, the scikit-learn docs are superbe. Also PyTorch and Keras have useful documentation.

Getting a error running spark-submit job

Hello,

I am trying to follow the instructions here: https://github.com/facebookresearch/Horizon/blob/master/docs/usage.md

When I run this script:
/usr/local/spark/bin/spark-submit
--class com.facebook.spark.rl.Preprocessor preprocessing/target/rl-preprocessing-1.1.jar
"cat ml/rl/workflow/sample_configs/discrete_action/timeline.json"

I am getting2019-02-27 00:57:03 INFO HiveMetaStore:746 - 0: get_database: global_temp
2019-02-27 00:57:03 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
2019-02-27 00:57:03 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
Exception in thread "main" org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and 'source_table.mdp_id' is not an aggregate function. Wrap '()' in windowing function(s) or wrap 'source_table.mdp_id' in first() (or first_value) if you don't care which value you get.;;
'Sort ['HASH('mdp_id, 'sequence_number) ASC NULLS FIRST], false
+- 'RepartitionByExpression ['HASH('mdp_id, 'sequence_number)], 200
+- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, next_state_features#24, next_action#25, sequence_number#2, sequence_number_ordinal#26, time_diff#27, possible_actions#7, possible_next_actions#28, metrics#8]
+- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8, next_state_features#24, next_action#25, sequence_number_ordinal#26, _we3#30, possible_next_actions#28, next_state_features#24, next_action#25, sequence_number_ordinal#26, (coalesce(_we3#30, sequence_number#2) - sequence_number#2) AS time_diff#27, possible_next_actions#28]
+- 'Window [lead(state_features#4, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_state_features#24, lead(action#5, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_action#25, row_number() windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS sequence_number_ordinal#26, lead(sequence_number#2, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS _we3#30, lead(possible_actions#7, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS possible_next_actions#28], [mdp_id#1], [mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST]
+- 'Filter isnotnull('next_state_features)
+- Aggregate [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
+- SubqueryAlias source_table
+- Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
+- Filter ((ds#0 >= 2019-01-01) && (ds#0 <= 2019-01-01))
+- SubqueryAlias cartpole_discrete
+- Relation[ds#0,mdp_id#1,sequence_number#2,action_probability#3,state_features#4,action#5,reward#6,possible_actions#7,metrics#8] json

I tried the steps, after manually installing Hbase (This step is missing in the documentation. Please let me know, if you want me to add it)

I am using docker on Mac instructions (https://github.com/facebookresearch/Horizon/blob/master/docs/installation.md) to get going. Can anyone please help me on how to move forward?

actions/logged seem mismatched in TensorBoard

Hi guys,

I've been using Horizon to train a discrete_action DQN. My state features are labeled by "0","1","2", and there are 100 possible discrete actions. In the training timeline data, my actions are named 0,1,2,...,99. I trained the data using:

python ml/rl/workflow/dqn_workflow.py -p dqn.json

Since the state features has taken up 0,1,and 2, in dqn.json I listed my actions as "3", "4", ..., "102". I have also attached my simplified dqn.json file here.
dqn.txt

After the training is done, I then visualize the results with TensorBoard. And in the "actions" section I see something like this:

I have several questions about this:

Does actions/logged/4 describes how many times the action "4" shows up in each epoch? If so, why is the number oscillating? I would imagine that each epoch goes through the same training data and therefore the action count should be a constant?
In my training set, action 0 did not happen at all. I would imagine in Horizon's case that means action "3" should have a count of 0 (I assume the action correspondence is 0-"3", 1-"4",...,99-"102"). However, that is not the case as shown below:

My training set also does not have any actions beyond 60, which I interpret as I should see a count of 0 for all actions beyond "63" in TensorBoard, but instead I see count of 0 for actions between "61" and "101", but then oscillations around 8000 in "102".

Is there some other rules for corresponding the actions in training data and in dqn.json? I have been reading the code, but could not find where Horizon corresponds the training data action names to the action names defined in dqn.json. It would be great if you can point me to the part of the code that takes care of this correspondence.

Update: I have taken a closer look at action count in my training timeline data, and I am pretty sure the action correspondence between my action names (0,...,99) and dqn.json action names ("3",...,"102") is as follows:
3-"3",...,99-"99", 1 and 2 - "102", and then I am not sure if 0 corresponds to "100" or "101" since they all have 0 count. This is really weird. Do you guys have any idea how this comes to be?

I also noticed that some actions have slightly lower count in TensorBoard than in the training timeline data (e.g. 34 vs 35, 239 vs 241). Is it because the training data size cannot be perfectly divided by minibatch_size so some samples are not included in training?

Thank you!

Best,
Fengdan

How to adapt the framework to support recurrent policy?

Dear all,
It's nice to have such a powerful framework. I'm wondering if the recurrent policy could be supported, since the LSTM policy is essential for solving some partial MDP tasks.

Thanks
Shanchao Yang

Max being computed for wrong q values

In dqn_trainer_base.py, the method get_max_q_values_with_target() is returning the max of Q values from the current network instead of the max of Q values from the target network.

According to the Q learning Update equation: Loss = R + gamma . [max a' Q(s', a'; target)] - Q(s, a; current)

But according to the implementation in dqn_trainer_base.py, this is happening:
Loss = R + gamma . [max a' Q(s', a'; current)] - Q(s, a; current)

mdp_id unicode issue

Was using str(uuid.uuid4()) for mdp_id and training model fails on call to digest. Solved my making all mdp_id simply numeric codes.

This should not be publically available?

Hi All,

I guess this link shouldn't be available for external people? :) Is it possible to publish this on an external facing website so we can access it as well?

https://github.com/facebookresearch/Horizon/blob/2023696ec2db4fb94855792ef4190f76219f2920/preprocessing/src/main/scala/com/facebook/spark/rl/Timeline.scala#L23

low average reward for cartpole v0 example?

I am following the cartpole_discrete example and got score 9.23.
INFO:main:Achieved an average reward score of 9.23 over 100 evaluations.

Note: I am using the discrete_dqn_cartpole_v0.json, not the 100 eps file for training data. More data didn't help.

Its no way close to 195 expected to "solve. Could anyone comment to resolve this?

PS:CartPole-v0 defines "solving" as getting average reward of 195.0 over 100 consecutive trials.

Documentation of how optimisation of DQN happens with counterfactual policy evaluation

how to specify categorical and continuous variables, and embeddings ?

How do I specify some state space variables as categorical and some as continuous?

For example, in the network approximator (DQN or policy forward) I want to specify embedding layers for categorical variables and have them concatenated with continuous variables as Input.

Thanks,
Narasimha

Is the log max value "2.0" correct?

https://github.com/facebookresearch/Horizon/blob/7f9f8d4fbd1ab0073c4953d247b5d7cd09fe63b7/ml/rl/models/actor.py#L94
Is not 20.0?

Trying out different examples.

Hi, I followed the usage documentation and tried the example for dqn_workflow.py. So there are still other files in the ml/rl/workflow directory right!!! but when I try to run those with the command : python ml/rl/workflow/ddpg_workflow.py -p ml/rl/workflow/sample_configs/discrete_action/dqn_small.json I am getting the following error.
Error:
Traceback (most recent call last):
File "ml/rl/workflow/ddpg_workflow.py", line 189, in
main(params)
File "ml/rl/workflow/ddpg_workflow.py", line 131, in main
params["shared_training"]["minibatch_size"] *= minibatch_size_multiplier(
KeyError: 'shared_training'

So do i need to create the data again for different workflow, for example ddpg_workflow.py, parametric_dqn_workflow.py etc?
And also to try out different examples is there any method there? If I am doing anything wrong please help me.
There are also different models present in the model directory. Can I know how to use them?

I also have a doubt about the On-Policy Training, is one command enough to get the result or that command is used for data creating only? Command: python ml/rl/test/gym/run_gym.py -p ml/rl/test/gym/discrete_dqn_cartpole_v0.json. This command is not for training right!

Mistake in usage.md

Hi, I am currently trying to familiarize myself with Horizon and I noticed that the usage.md doesn't add up completely. In Part 2 after running

/usr/local/spark/bin/spark-submit \
  --class com.facebook.spark.rl.Preprocessor preprocessing/target/rl-preprocessing-1.1.jar \
  "`cat ml/rl/workflow/sample_configs/discrete_action/timeline.json`"

In the next steps it is always referred to cartpole_discrete_timeline but this directory doesn't exist. Do you mean cartpole_discrete_training? I checked out the master branch on commit 6a5cd15

Tensorboard not showing any data

Hi Team,

I have done all the installation in the Docker container. I run the command provided in the usage document and when running on Tensorboard it didn't display any image, histogram, scalar, audio.
Please see below images

Please help me to resolve this issue.
Thanks in Advance.

deprecation warning for nn.functional.tanh

After running ./scripts/setup.sh

/home/miniconda/lib/python3.6/site-packages/torch/nn/functional.py:1367: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.

from:

/home/miniconda/lib/python3.6/site-packages/torch/nn/functional.py:1367
/home/miniconda/lib/python3.6/site-packages/torch/nn/functional.py:1367
  /home/miniconda/lib/python3.6/site-packages/torch/nn/functional.py:1367: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
    warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")

-- Docs: https://docs.pytest.org/en/latest/warnings.html
================================================================= 4 passed, 6 skipped, 67 warnings in 12.88 seconds =================================================================

`next_action` key does not exist in timeline data

While following the doc usage, I found that the data generated by running python ml/rl/test/gym/run_gym.py -p ml/rl/test/gym/discrete_dqn_cartpole_v0_100_eps.json -f cartpole_discrete/training_data.json and the subsequent commands does not generate next_action, thus, generating this error when running python ml/rl/workflow/dqn_workflow.py -p ml/rl/workflow/sample_configs/discrete_action/dqn_example.json

root@66f0303019f9:~/Horizon# python ml/rl/workflow/dqn_workflow.py -p ml/rl/workflow/sample_configs/discrete_action/dqn_example.json
Traceback (most recent call last):
  File "ml/rl/workflow/dqn_workflow.py", line 164, in <module>
    train_network(params)
  File "ml/rl/workflow/dqn_workflow.py", line 121, in train_network
    tdp = preprocess_batch_for_training(preprocessor, batch, action_names)
  File "/home/Horizon/ml/rl/workflow/training_data_reader.py", line 234, in preprocess_batch_for_training
    next_actions = read_actions(action_names, batch["next_action"])

I got it to work with the gz files. It was a bit confusing that one of the suggested ways didn't work.

Pendulum Example, continuous action space example for DDPG

hi,

Can someone please help with continuous action example? What should the output format look like for continuous action space examples? Did anyone run pendulum example successfully? I tried running pendulum example but found errors.

step 1:-
Generate Data
python3 ml/rl/test/gym/run_gym.py -p /ddpg_pendulum_v0_datagen.json -f /training_data.json

Generated Data format is as following (sample):-
{"ds": "2019-01-01",
"mdp_id": "99",
"sequence_number": 179,
"state_features": {"0": 0.7077672662018737, "1": -0.7064456786569835, "2": -1.4642522740018904},
"action": {"3": 1.9947617053985596},
"reward": -0.8337657799161808,
"action_probability": 0.0,
"possible_actions": null,
"metrics": {"reward": -0.8337657799161808}}

step 2
Call spark job to process the data using config example for continuous_actions
{
"timeline": {
"startDs": "2019-01-01",
"endDs": "2019-01-01",
"addTerminalStateRow": false,
"actionDiscrete": false,
"inputTableName": "pendulum",
"outputTableName": "pendulum_training",
"evalTableName": "pendulum_eval",
"numOutputShards": 1
},
"query": {
"tableSample": 5
}
}

Here is the snippet of error ..

2019-04-09 19:27:01 INFO FileScanRDD:54 - Reading File path: file:///oxygen/ec2-user/project/narasimham/Dynamic_pricing/horizon/pendulum_ctn/pendulum/training_data.json, range: 0-4194304, partition values: [empty row]
2019-04-09 19:27:01 INFO FileScanRDD:54 - Reading File path: file:///oxygen/ec2-user/project/narasimham/Dynamic_pricing/horizon/pendulum_ctn/pendulum/training_data.json, range: 4194304-6500846, partition values: [empty row]
2019-04-09 19:27:01 ERROR Executor:91 - Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.SparkException: Failed to execute user defined function(anonfun$2: (array<map<string,double>>) => array<map<bigint,double>>)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:192)
at scala.collection.SeqLike$class.size(SeqLike.scala:106)
at scala.collection.mutable.ArrayOps$ofRef.size(ArrayOps.scala:186)
at scala.collection.mutable.Builder$class.sizeHint(Builder.scala:69)
at scala.collection.mutable.ArrayBuilder.sizeHint(ArrayBuilder.scala:22)
at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:230)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at com.facebook.spark.rl.Preprocessor$$anonfun$2.apply(Preprocessor.scala:67)
at com.facebook.spark.rl.Preprocessor$$anonfun$2.apply(Preprocessor.scala:66)

Thanks,
Narasimha

Issue while running "python ml/rl/test/gym/run_gym.py -p ml/rl/test/gym/discrete_dqn_cartpole_v0.json -f cartpole_discrete/training_data.json"

Hi, I am following the Horizon usage document step by step. When ever I am running the command "python ml/rl/test/gym/run_gym.py -p ml/rl/test/gym/discrete_dqn_cartpole_v0.json -f cartpole_discrete/training_data.json" I am getting the below error. I am running this in ppc64le. Please correct me if i am doing anything wrong.
Platform: ppc64le
Python version: 3.6
CUDA/cudnn:10.0/7.4

For some of the dependencies in the requirements.txt, versions mentioned are not available for ppc64le, so I installed some higher version for them like,
tensorflow : 1.13.1
thrift : 0.11.0
thrift-cpp: 0.11.0
tensorboard: 1.13.0
OpenJDK: 8.0.212

Error:
Traceback (most recent call last):
File "ml/rl/test/gym/run_gym.py", line 18, in
from ml.rl.preprocessing.normalization import get_num_output_features
File "/home/teja/Horizon/ml/rl/preprocessing/normalization.py", line 14, in
from ml.rl.thrift.core.ttypes import NormalizationParameters
File "/home/teja/Horizon/ml/rl/thrift/core/ttypes.py", line 2007, in
class SACModelParameters(object):
File "/home/teja/Horizon/ml/rl/thrift/core/ttypes.py", line 2023, in SACModelParameters
}), training=SACTrainingParameters(**{
File "/home/teja/Horizon/ml/rl/thrift/core/ttypes.py", line 1878, in init
if q_network_optimizer is self.thrift_spec[2][4]:
AttributeError: 'SACTrainingParameters' object has no attribute 'thrift_spec'

More supported models?

Dear authors,

Great work for the excellent. Below are the lists of supported models, which we think some other more methods are also crucial for some applications.
Discrete-Action DQN
Parametric-Action DQN
Double DQN, Dueling DQN, Dueling Double DQN
DDPG (DDPG)
Soft Actor-Critic (SAC)

Do have plans to implement some other deep RL models and include in this framework, such as (Async) Advantage Actor-Critic (A3C / A2C), Continuous A3C, and Distributed Deep Deterministic Policy Gradient (Distributed DDPG, aka D3PG), Parallelized Proximal Policy Optimization (P3O, similar to DPPO), etc.?

We plan to work on other more deep RL models using your framework, and hopefully, it is not a redundant work if you already plan to do so.

Thanks!

Unit tests not utilizing GPU

First time posting an issue; please pardon any improprieties....

The post install tests returned OK (with 1 skipped), but the tests don't appear to be using the GPU, as these lines appear repeatedly throughout the tests:

INFO:ml.rl.preprocessing.preprocessor:CUDA availability: True
INFO:ml.rl.preprocessing.preprocessor:NOT Using GPU: GPU not requested or not available.

Is this normal?

How might I verify that Horizon is properly configured to make use of the GPU?

Pytorch does see the GPU:

import torch
torch.cuda.current_device()
0
torch.cuda.device(0)
<torch.cuda.device object at 0x7f2a806edfd0>
torch.cuda.device_count()
1
torch.cuda.get_device_name(0)
'GeForce GTX 1060 with Max-Q Design'
torch.cuda.is_available()
True

I'm using a Dell G7 with Ubuntu 18.04.1

Thanks.

is the DR estimator suitable for continous action space?

Hi, I want to use DR estimator of Horizon to do off policy evaluation in RL problems in continuous action spaces, but I don't know whether it merely supports discrete action spaces or supports both. Any idea?

Did not find any episode values

https://github.com/facebookresearch/Horizon/blob/0e7fe9d742c408e1b42803b42e104efbc56bae5b/ml/rl/training/evaluator.py#L926

I have hit this condition a few times during runs and was wondering if there were docs to explain what it means.

Thanks!

Issue with "head -n1 cartpole_discrete_training/part*"

Hi, as I was not able to use Horizon on ppc64le, I shifted to ubuntu. I was able to install it well. And later I am following the usage page, all are going fine but when I reached the head -n1 cartpole_discrete_training/part* step its showing head: cannot open 'cartpole_discrete_training/part' for reading: No such file or directory*. Its showing same for head -n1 cartpole_discrete_eval/part* also. Please tell me if I am doing anything wrong. I am going as per the steps in the document.

Platform: Ubuntu 18
Python version: 3.6
CUDA/cudnn:10.0/7.4

sac test failing

I just installed Horizon and ran the tests, however there is an error:

======================================================================
ERROR: test_sac_trainer (ml.rl.test.gridworld.test_gridworld_sac.TestGridworldSAC)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py", line 191, in test_sac_trainer
    self._test_sac_trainer()
  File "/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py", line 167, in _test_sac_trainer
    actor_predictor = self.get_actor_predictor(trainer, environment)
  File "/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py", line 145, in get_actor_predictor
    trainer.actor_network, feature_extractor, output_transformer
  File "/Users/david/Code/Horizon/ml/rl/training/rl_exporter.py", line 59, in export
    output_transformer=self.output_transformer,
  File "/Users/david/Code/Horizon/ml/rl/models/base.py", line 152, in get_predictor_export_meta_and_workspace
    c2_model, input_blobs, output_blobs = self.get_caffe2_model()
  File "/Users/david/Code/Horizon/ml/rl/models/base.py", line 129, in get_caffe2_model
    caffe2.python.onnx.backend.prepare(model_protobuf),
  File "/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/caffe2/python/onnx/backend.py", line 689, in prepare
    super(Caffe2Backend, cls).prepare(model, device, **kwargs)
  File "/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/onnx/backend/base.py", line 74, in prepare
    onnx.checker.check_model(model)
  File "/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/onnx/checker.py", line 86, in check_model
    C.check_model(model.SerializeToString())
onnx.onnx_cpp2py_export.checker.ValidationError: No Op or Function registered for ConstantFill with domain_version of 9

==> Context: Bad node spec: input: "25" output: "26" op_type: "ConstantFill" attribute { name: "dtype" i: 1 type: INT } attribute { name: "input_as_shape" i: 1 type: INT } attribute { name: "value" f: 0 type: FLOAT } doc_string: "/Users/david/Code/Horizon/ml/rl/models/actor.py(137): forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(477): _slow_forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(487): __call__\n/Users/david/Code/Horizon/ml/rl/models/base.py(337): forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(477): _slow_forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(487): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/jit/__init__.py(253): forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(489): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/jit/__init__.py(198): get_trace_graph\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/utils.py(192): _trace_and_get_graph_from_model\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/utils.py(224): _model_to_graph\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/utils.py(281): _export\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/utils.py(104): export\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/__init__.py(27): export\n/Users/david/Code/Horizon/ml/rl/models/base.py(120): export_to_buffer\n/Users/david/Code/Horizon/ml/rl/models/base.py(125): get_caffe2_model\n/Users/david/Code/Horizon/ml/rl/models/base.py(152): get_predictor_export_meta_and_workspace\n/Users/david/Code/Horizon/ml/rl/training/rl_exporter.py(59): export\n/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py(145): get_actor_predictor\n/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py(167): _test_sac_trainer\n/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py(191): test_sac_trainer\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/case.py(605): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/case.py(653): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/runner.py(176): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/main.py(256): runTests\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/main.py(95): __init__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/setuptools/command/test.py(250): run_tests\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/setuptools/command/test.py(228): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/distutils/dist.py(974): run_command\n/Users/david/anaconda/envs/pytorch/lib/python3.6/distutils/dist.py(955): run_commands\n/Users/david/anaconda/envs/pytorch/lib/python3.6/distutils/core.py(148): setup\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/setuptools/__init__.py(143): setup\nsetup.py(27): <module>\n"

======================================================================
ERROR: test_sac_trainer_model_propensity (ml.rl.test.gridworld.test_gridworld_sac.TestGridworldSAC)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py", line 205, in test_sac_trainer_model_propensity
    self._test_sac_trainer(logged_action_uniform_prior=True)
  File "/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py", line 167, in _test_sac_trainer
    actor_predictor = self.get_actor_predictor(trainer, environment)
  File "/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py", line 145, in get_actor_predictor
    trainer.actor_network, feature_extractor, output_transformer
  File "/Users/david/Code/Horizon/ml/rl/training/rl_exporter.py", line 59, in export
    output_transformer=self.output_transformer,
  File "/Users/david/Code/Horizon/ml/rl/models/base.py", line 152, in get_predictor_export_meta_and_workspace
    c2_model, input_blobs, output_blobs = self.get_caffe2_model()
  File "/Users/david/Code/Horizon/ml/rl/models/base.py", line 129, in get_caffe2_model
    caffe2.python.onnx.backend.prepare(model_protobuf),
  File "/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/caffe2/python/onnx/backend.py", line 689, in prepare
    super(Caffe2Backend, cls).prepare(model, device, **kwargs)
  File "/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/onnx/backend/base.py", line 74, in prepare
    onnx.checker.check_model(model)
  File "/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/onnx/checker.py", line 86, in check_model
    C.check_model(model.SerializeToString())
onnx.onnx_cpp2py_export.checker.ValidationError: No Op or Function registered for ConstantFill with domain_version of 9

==> Context: Bad node spec: input: "25" output: "26" op_type: "ConstantFill" attribute { name: "dtype" i: 1 type: INT } attribute { name: "input_as_shape" i: 1 type: INT } attribute { name: "value" f: 0 type: FLOAT } doc_string: "/Users/david/Code/Horizon/ml/rl/models/actor.py(137): forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(477): _slow_forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(487): __call__\n/Users/david/Code/Horizon/ml/rl/models/base.py(337): forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(477): _slow_forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(487): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/jit/__init__.py(253): forward\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py(489): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/jit/__init__.py(198): get_trace_graph\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/utils.py(192): _trace_and_get_graph_from_model\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/utils.py(224): _model_to_graph\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/utils.py(281): _export\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/utils.py(104): export\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/onnx/__init__.py(27): export\n/Users/david/Code/Horizon/ml/rl/models/base.py(120): export_to_buffer\n/Users/david/Code/Horizon/ml/rl/models/base.py(125): get_caffe2_model\n/Users/david/Code/Horizon/ml/rl/models/base.py(152): get_predictor_export_meta_and_workspace\n/Users/david/Code/Horizon/ml/rl/training/rl_exporter.py(59): export\n/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py(145): get_actor_predictor\n/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py(167): _test_sac_trainer\n/Users/david/Code/Horizon/ml/rl/test/gridworld/test_gridworld_sac.py(205): test_sac_trainer_model_propensity\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/case.py(605): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/case.py(653): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(122): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/suite.py(84): __call__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/runner.py(176): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/main.py(256): runTests\n/Users/david/anaconda/envs/pytorch/lib/python3.6/unittest/main.py(95): __init__\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/setuptools/command/test.py(250): run_tests\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/setuptools/command/test.py(228): run\n/Users/david/anaconda/envs/pytorch/lib/python3.6/distutils/dist.py(974): run_command\n/Users/david/anaconda/envs/pytorch/lib/python3.6/distutils/dist.py(955): run_commands\n/Users/david/anaconda/envs/pytorch/lib/python3.6/distutils/core.py(148): setup\n/Users/david/anaconda/envs/pytorch/lib/python3.6/site-packages/setuptools/__init__.py(143): setup\nsetup.py(27): <module>\n"

For the installation, I have a separate conda environment I call pytorch. I had already installed pytorch before installing Horizon. The requirements seem to install a new pytorch from the nightly build. When I look at my conda environment below, I see both pytorch'es in there.

Also I installed spark via pip install - it got 2.4.0 not the 2.3.1 mentioned in the installation.md

Here is my conda environment, I am on Mac OS/X

(pytorch) ~/code/Horizon$ conda list
# packages in environment at /Users/david/anaconda/envs/pytorch:
#
_tflow_select             2.2.0                     eigen  
absl-py                   0.6.1                    py36_0  
apipkg                    1.5                        py_0    conda-forge
appnope                   0.1.0            py36hf537a9a_0  
astor                     0.7.1                    py36_0  
atari-py                  0.1.7                     <pip>
atomicwrites              1.2.1                      py_0    conda-forge
attrs                     18.2.0                     py_0    conda-forge
backcall                  0.1.0                    py36_0  
blas                      1.1                    openblas    conda-forge
bleach                    3.0.2                    py36_0  
box2d-py                  2.3.8                     <pip>
c-ares                    1.15.0               h1de35cc_1  
ca-certificates           2018.03.07                    0  
certifi                   2018.11.29               py36_0  
cffi                      1.11.5           py36h6174b99_1  
chardet                   3.0.4                     <pip>
cloudpickle               0.6.1                    py36_0  
cycler                    0.10.0           py36hfc81398_0  
cytoolz                   0.9.0.1          py36h1de35cc_1  
dask-core                 1.0.0                    py36_0  
dbus                      1.13.2               h760590f_1  
decorator                 4.3.0                    py36_0  
entrypoints               0.2.3                    py36_2  
execnet                   1.5.0                      py_0    conda-forge
expat                     2.2.6                h0a44026_0  
freetype                  2.9.1                hb4e5f40_0  
future                    0.17.1                    <pip>
gast                      0.2.0                    py36_0  
gettext                   0.19.8.1             h15daf44_3  
glib                      2.56.2               hd9629dc_0  
grpcio                    1.16.0          py36h9011c5e_1000    conda-forge
gym                       0.10.9                    <pip>
h5py                      2.9.0            py36h3134771_0  
hdf5                      1.10.4               hfa1e0ec_0  
icu                       58.2                 h4b95b61_1  
idna                      2.8                       <pip>
imageio                   2.4.1                    py36_0  
intel-openmp              2019.1                      144  
ipykernel                 5.1.0            py36h39e3cac_0  
ipython                   7.2.0            py36h39e3cac_0  
ipython_genutils          0.2.0            py36h241746c_0  
ipywidgets                7.4.2                    py36_0  
jedi                      0.13.2                   py36_0  
jinja2                    2.10                     py36_0  
jpeg                      9b                   he5867d9_2  
jsonschema                2.6.0            py36hb385e00_0  
jupyter                   1.0.0                    py36_7  
jupyter_client            5.2.4                    py36_0  
jupyter_console           6.0.0                    py36_0  
jupyter_core              4.4.0                    py36_0  
keras-applications        1.0.6                    py36_0  
keras-preprocessing       1.0.5                    py36_0  
kiwisolver                1.0.1            py36h0a44026_0  
libcxx                    4.0.1                hcfea43d_1  
libcxxabi                 4.0.1                hcfea43d_1  
libedit                   3.1.20170329         hb402a30_2  
libffi                    3.2.1                h475c297_4  
libgfortran               3.0.1                h93005f0_2  
libiconv                  1.15                 hdd342a3_7  
libopenblas               0.3.3                hdc02c5d_3  
libpng                    1.6.35               ha441bb4_0  
libprotobuf               3.6.1                hd9629dc_0  
libsodium                 1.0.16               h3efe00b_0  
libtiff                   4.0.9                hcb84e12_2  
markdown                  3.0.1                    py36_0  
markupsafe                1.1.0            py36h1de35cc_0  
matplotlib                3.0.2            py36h54f8f79_0  
maven                     3.5.0                         0    conda-forge
mistune                   0.8.4            py36h1de35cc_0  
mkl                       2019.1                      144  
mkl_fft                   1.0.10                   py36_0    conda-forge
mkl_random                1.0.2                    py36_0    conda-forge
more-itertools            4.3.0                 py36_1000    conda-forge
nbconvert                 5.3.1                    py36_0  
nbformat                  4.4.0            py36h827af21_0  
ncurses                   6.1                  h0a44026_1  
networkx                  2.2                      py36_1  
ninja                     1.8.2            py36h04f5b5a_1  
notebook                  5.7.4                    py36_0  
numpy                     1.15.1          py36_blas_openblashd3ea46f_1  [blas_openblas]  conda-forge
numpy-base                1.15.4           py36ha711998_0  
olefile                   0.46                     py36_0  
onnx                      1.4.1                     <pip>
openblas                  0.2.20                        8    conda-forge
openjdk                   8.0.144          zulu8.23.0.3_2    conda-forge
openssl                   1.0.2p            h1de35cc_1002    conda-forge
pandas                    0.23.4          py36h1702cab_1000    conda-forge
pandoc                    2.2.3.2                       0  
pandocfilters             1.4.2                    py36_1  
parso                     0.3.1                    py36_0  
pcre                      8.42                 h378b8a2_0  
pexpect                   4.6.0                    py36_0  
pickleshare               0.7.5                    py36_0  
pillow                    5.3.0            py36hb68e598_0  
pip                       18.1                     py36_0  
pluggy                    0.8.1                      py_0    conda-forge
prometheus_client         0.5.0                    py36_0  
prompt_toolkit            2.0.7                    py36_0  
protobuf                  3.6.1           py36h0a44026_1001    conda-forge
ptyprocess                0.6.0                    py36_0  
py                        1.7.0                      py_0    conda-forge
py4j                      0.10.7                    <pip>
pycparser                 2.19                     py36_0  
pyglet                    1.3.2                     <pip>
pygments                  2.3.1                    py36_0  
PyOpenGL                  3.1.0                     <pip>
pyparsing                 2.3.0                    py36_0  
pyqt                      5.9.2            py36h655552a_2  
pyspark                   2.4.0                     <pip>
pytest                    4.1.1                 py36_1000    conda-forge
pytest-forked             1.0.1                      py_0    conda-forge
pytest-xdist              1.24.0                     py_0    conda-forge
python                    3.6.6             h4a56312_1003    conda-forge
python-dateutil           2.7.5                    py36_0  
pytorch                   1.0.0                   py3.6_1    pytorch
pytorch-nightly           1.0.0.dev20190113         py3.6_0    pytorch
pytz                      2018.7                   py36_0  
pywavelets                1.0.1            py36h1d22016_0  
pyzmq                     17.1.2           py36h1de35cc_0  
qt                        5.9.7                h468cd18_1  
qtconsole                 4.4.3                    py36_0  
readline                  7.0                  h1de35cc_5  
requests                  2.21.0                    <pip>
scikit-image              0.14.1           py36h0a44026_0  
scipy                     1.1.0           py36_blas_openblash7943236_201  [blas_openblas]  conda-forge
send2trash                1.5.0                    py36_0  
setuptools                40.6.3                   py36_0  
sip                       4.19.8           py36h0a44026_0  
six                       1.12.0                   py36_0  
sqlite                    3.26.0               ha441bb4_0  
tensorboard               1.9.0                    py36_0    conda-forge
tensorboardx              1.4                        py_0    conda-forge
tensorflow                1.9.0                    py36_0    conda-forge
tensorflow-base           1.12.0          eigen_py36h4f0eeca_0  
termcolor                 1.1.0                    py36_1  
terminado                 0.8.1                    py36_1  
testpath                  0.4.2                    py36_0  
thrift                    0.10.0                   py36_0    conda-forge
thrift-cpp                0.10.0                        1    conda-forge
tk                        8.6.8                ha441bb4_0  
toolz                     0.9.0                    py36_0  
torchvision               0.2.1                      py_2    pytorch
tornado                   5.1.1            py36h1de35cc_0  
traitlets                 4.3.2            py36h65bd3ce_0  
typing                    3.6.6                     <pip>
typing-extensions         3.7.2                     <pip>
urllib3                   1.24.1                    <pip>
wcwidth                   0.1.7            py36h8c6ec74_0  
webencodings              0.5.1                    py36_1  
werkzeug                  0.14.1                   py36_0  
wheel                     0.32.3                   py36_0  
widgetsnbextension        3.4.2                    py36_0  
xz                        5.2.4                h1de35cc_4  
zeromq                    4.2.5                h0a44026_1  
zlib                      1.2.11               h1de35cc_3  
(pytorch) ~/code/Horizon$

Please fix: unit test failure

======================================================================
FAIL: test_trainer_maxq (ml.rl.test.constant_reward.test_constant_reward.TestConstantReward)

Traceback (most recent call last):
File "/home/Horizon/ml/rl/test/constant_reward/test_constant_reward.py", line 93, in test_trainer_maxq
trainer = self._train(maxq_parameters, env)
File "/home/Horizon/ml/rl/test/constant_reward/test_constant_reward.py", line 30, in _train
trainer = create_dqn_trainer_from_params(model_params, env.normalization)
File "/home/Horizon/ml/rl/workflow/transitional.py", line 97, in create_dqn_trainer_from_params
metrics_to_score=metrics_to_score,
File "/home/Horizon/ml/rl/training/dqn_trainer.py", line 44, in init
actions=parameters.actions,
File "/home/Horizon/ml/rl/training/rl_trainer_pytorch.py", line 74, in init
self.loss_reporter = LossReporter(actions)
File "/home/Horizon/ml/rl/training/loss_reporter.py", line 183, in init
BatchStats.add_custom_scalars(action_names)
File "/home/Horizon/ml/rl/training/loss_reporter.py", line 104, in add_custom_scalars
title="model",
File "/home/Horizon/ml/rl/tensorboardX.py", line 87, in add_custom_scalars_multilinechart
), "Title ({}) is already in category ({})".format(title, category)
AssertionError: Title (model) is already in category (propensities)

I follow the Docker installation and the error came at the last step when i ran the unit test.

Syntax Errors

Hi! Thanks SO much for putting this repo together.

Just pulled and in the tests (and in cart-pole example) getting same error:

    possible_next_actions: np.ndarray, reward_timelines, ds,
                         ^
SyntaxError: invalid syntax

I'm using:

Python 2.7.14
OSX

If it helps here was a full error from test:

======================================================================
ERROR: ml.rl.test.test_open_ai_gym (unittest.loader.ModuleImportFailure)
----------------------------------------------------------------------
ImportError: Failed to import test module: ml.rl.test.test_open_ai_gym
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 254, in _find_tests
    module = self._get_module_from_name(name)
  File "/usr/local/Cellar/python/2.7.14/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/loader.py", line 232, in _get_module_from_name
    __import__(name)
  File "ml/rl/test/test_open_ai_gym.py", line 8, in <module>
    from ml.rl.test.gym.open_ai_gym_environment import OpenAIGymEnvironment
  File "ml/rl/test/gym/open_ai_gym_environment.py", line 10, in <module>
    from ml.rl.training.training_data_page import TrainingDataPage
  File "ml/rl/training/training_data_page.py", line 20
    possible_next_actions: np.ndarray, reward_timelines, ds,
                         ^
SyntaxError: invalid syntax

Improve link to article

Please improve the link to the Horizon paper in the repo's readme. For reference, the specific link is https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/

Tests failing: AttributeError: 'TrainingParameters' object has no attribute 'thrift_spec'

Hey there, after a fresh install and running the tests I'm getting plenty of 'TrainingParameters' object has no attribute 'thrift_spec' errors.

I installed thrift via brew with no problems, I also get no errors when importing thrift in python on it's own. I'm not too sure sure how to continue here, any pointers on what to try?

Also 'fblib' is missing and I'm not quite sure how to obtain it, is it internal to FB?

Adding the errors after running the tests:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named 'caffe2.python.caffe2_pybind11_state_gpu'
Ignoring @/caffe2/caffe2/fb/operators:replace_values_op as it is not a valid file.
EEEEEEEE.EE.
======================================================================
ERROR: ml.rl.test.gridworld.test_continuous_action_dqn_trainer (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: ml.rl.test.gridworld.test_continuous_action_dqn_trainer
Traceback (most recent call last):
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/gridworld/test_continuous_action_dqn_trainer.py", line 16, in <module>
    from ml.rl.thrift.core.ttypes import \
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/thrift/core/ttypes.py", line 394, in <module>
    class DiscreteActionModelParameters(object):
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/thrift/core/ttypes.py", line 405, in DiscreteActionModelParameters
    }), training=TrainingParameters(**{
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/thrift/core/ttypes.py", line 148, in __init__
    if layers is self.thrift_spec[4][4]:
AttributeError: 'TrainingParameters' object has no attribute 'thrift_spec'


======================================================================
ERROR: ml.rl.test.gridworld.test_gridworld (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: ml.rl.test.gridworld.test_gridworld
Traceback (most recent call last):
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/gridworld/test_gridworld.py", line 10, in <module>
    from libfb.py.testutil import data_provider
ModuleNotFoundError: No module named 'libfb'


======================================================================
ERROR: ml.rl.test.gridworld.test_gridworld_continuous (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: ml.rl.test.gridworld.test_gridworld_continuous
Traceback (most recent call last):
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/gridworld/test_gridworld_continuous.py", line 17, in <module>
    from libfb.py.testutil import data_provider
ModuleNotFoundError: No module named 'libfb'


======================================================================
ERROR: ml.rl.test.gridworld.test_limited_action_gridworld (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: ml.rl.test.gridworld.test_limited_action_gridworld
Traceback (most recent call last):
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/gridworld/test_limited_action_gridworld.py", line 14, in <module>
    from ml.rl.training.discrete_action_trainer import DiscreteActionTrainer
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/training/discrete_action_trainer.py", line 18, in <module>
    from ml.rl.thrift.core.ttypes import DiscreteActionModelParameters
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/thrift/core/ttypes.py", line 394, in <module>
    class DiscreteActionModelParameters(object):
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/thrift/core/ttypes.py", line 405, in DiscreteActionModelParameters
    }), training=TrainingParameters(**{
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/thrift/core/ttypes.py", line 148, in __init__
    if layers is self.thrift_spec[4][4]:
AttributeError: 'TrainingParameters' object has no attribute 'thrift_spec'


======================================================================
ERROR: ml.rl.test.gym.test_open_ai_gym (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: ml.rl.test.gym.test_open_ai_gym
Traceback (most recent call last):
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/gym/test_open_ai_gym.py", line 9, in <module>
    from libfb import parutil  # type: ignore
ModuleNotFoundError: No module named 'libfb'


======================================================================
ERROR: ml.rl.test.test_ml_trainer (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: ml.rl.test.test_ml_trainer
Traceback (most recent call last):
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/test_ml_trainer.py", line 14, in <module>
    from ml.rl.training.ml_trainer import MLTrainer
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/training/ml_trainer.py", line 22, in <module>
    from ml.rl.thrift.core.ttypes import TrainingParameters
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/thrift/core/ttypes.py", line 394, in <module>
    class DiscreteActionModelParameters(object):
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/thrift/core/ttypes.py", line 405, in DiscreteActionModelParameters
    }), training=TrainingParameters(**{
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/thrift/core/ttypes.py", line 148, in __init__
    if layers is self.thrift_spec[4][4]:
AttributeError: 'TrainingParameters' object has no attribute 'thrift_spec'


======================================================================
ERROR: test_normalize_dense_matrix_enum (ml.rl.test.test_normalization.TestNormalization)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/test_normalization.py", line 201, in test_normalize_dense_matrix_enum
    False
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/preprocessing/preprocessor_net.py", line 374, in prepare_normalization
    reshaped_input_blob, normalization_param
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/preprocessing/preprocessor_net.py", line 120, in preprocess_blob
    self._net.ReplaceValuesOp(
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/site-packages/caffe2/python/core.py", line 2040, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method ReplaceValuesOp is not a registered operator. Did you mean: []

======================================================================
ERROR: test_normalize_feature_map_enum (ml.rl.test.test_normalization.TestNormalization)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/test_normalization.py", line 151, in test_normalize_feature_map_enum
    False
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/preprocessing/preprocessor_net.py", line 374, in prepare_normalization
    reshaped_input_blob, normalization_param
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/preprocessing/preprocessor_net.py", line 120, in preprocess_blob
    self._net.ReplaceValuesOp(
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/site-packages/caffe2/python/core.py", line 2040, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method ReplaceValuesOp is not a registered operator. Did you mean: []

======================================================================
ERROR: test_prepare_normalization_and_normalize (ml.rl.test.test_normalization.TestNormalization)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/test_normalization.py", line 47, in test_prepare_normalization_and_normalize
    False
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/preprocessing/preprocessor_net.py", line 374, in prepare_normalization
    reshaped_input_blob, normalization_param
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/preprocessing/preprocessor_net.py", line 120, in preprocess_blob
    self._net.ReplaceValuesOp(
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/site-packages/caffe2/python/core.py", line 2040, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method ReplaceValuesOp is not a registered operator. Did you mean: []

======================================================================
ERROR: test_preprocessing_network (ml.rl.test.test_normalization.TestNormalization)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/test/test_normalization.py", line 301, in test_preprocessing_network
    feature_name, normalization_parameters[feature_name]
  File "/Users/miquelllobet/Code/BlueWhale/ml/rl/preprocessing/preprocessor_net.py", line 120, in preprocess_blob
    self._net.ReplaceValuesOp(
  File "/Users/miquelllobet/anaconda/envs/gym/lib/python3.6/site-packages/caffe2/python/core.py", line 2040, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method ReplaceValuesOp is not a registered operator. Did you mean: []

----------------------------------------------------------------------
Ran 12 tests in 0.293s

cuda.DockerfileSpark : ONNX installation error and spark download URL

docker build -f cuda.Dockerfile -t horizon:dev .

While try to build cuda.Dockerfile, it throws an error at

Step 16/29 : RUN pip install onnx
....

....
....
Building wheels for collected packages: onnx
Running setup.py bdist_wheel for onnx: started
Running setup.py bdist_wheel for onnx: finished with status 'error'
Complete output from command /home/miniconda/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-install-368dwgq1/onnx/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /tmp/pip-wheel-a73n9mvb --python-tag cp36:
fatal: not a git repository (or any of the parent directories): .git
....

Failed building wheel for onnx
....

Command "/home/miniconda/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-install-368dwgq1/onnx/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-record-e2laxxau/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-368dwgq1/onnx/
The command '/bin/bash -c pip install onnx' returned a non-zero code: 1

And the second issue:

https://github.com/facebookresearch/Horizon/blob/810b89f31204eabe54b354bd7e828d0918ed46c1/docker/cuda.Dockerfile#L66

The URL does not work any more.

Segmentation fault happend when Run OpenAI Gym Examples

python run_rl_gym.py -g CartPole-v0 -l 0.1

args: Namespace(batch_size=128, constraint=False, discount_gamma=0.9, gpu=False, gymenv='CartPole-v0', learn_batch_num_every_iteration=100, learn_every_n_iterations=2, learning_rate=0.1, maxq_learning=True, model_id=u'new', model_type=u'DQN', number_iterations=1000, number_steps_timeout=-1, number_steps_total=1000000, optimizer=u'SGD', path='/home/bzhang/work/Git/reinforcement-learning-models/rlmodels/outputs/', render=False, save_iteration=-1, test=False, upload=False, verbosity=0)
Env gym: CartPole-v0
Env setting: state/action type(shape): Box(4,) Discrete(2)
Model Id: new
Model Type: MODEL_T.DQN
Model Optimizer: SGD
Model NN layers: [64] [u'relu']
test1
test2
*** Aborted at 1508656367 (unix time) try "date -d @1508656367" if you are using GNU date ***
PC: @ 0x7f046e6e2fb4 ZN6caffe210MakeStringIJSsA3_cSsEEESsDpRKT
*** SIGSEGV (@0xffffffffffffffe8) received by PID 7743 (TID 0x7f0487aed740) from PID 18446744073709551592; stack trace: ***
@ 0x7f04876e2330 (unknown)
@ 0x7f046e6e2fb4 ZN6caffe210MakeStringIJSsA3_cSsEEESsDpRKT
@ 0x7f046e6e314a caffe2::enforce_detail::EnforceFailMessage::get_message_and_free()
@ 0x7f046b8f92f6 caffe2::Tensor<>::dim32()
@ 0x7f04661bd18c caffe2::XavierFillOp<>::Fill()
@ 0x7f046619c685 caffe2::FillerOp<>::RunOnDevice()
@ 0x7f046e6e1895 caffe2::Operator<>::Run()
@ 0x7f046601dbb8 caffe2::SimpleNet::RunAsync()
@ 0x7f0466031882 caffe2::Workspace::RunNetOnce()
@ 0x7f046e6dbd2a ZZN8pybind1112cpp_function10initializeIZN6caffe26python16addGlobalMethodsERNS_6moduleEEUlRKNS_5bytesEE22_bIS8_EINS_4nameENS_7siblingENS_5scopeEEEEvOT_PFT0_DpT1_EDpRKT2_ENKUlPNS_6detail15function_recordENS_6handleESR_SR_E1_clESQ_SR_SR_SR.isra.1620
@ 0x7f046e6dbf57 ZZN8pybind1112cpp_function10initializeIZN6caffe26python16addGlobalMethodsERNS_6moduleEEUlRKNS_5bytesEE22_bIS8_EINS_4nameENS_7siblingENS_5scopeEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlPNS_6detail15function_recordENS_6handleESR_SR_E1_4_FUNESQ_SR_SR_SR
@ 0x7f046e6e7b52 pybind11::cpp_function::dispatcher()
@ 0x52714b (unknown)
@ 0x555551 (unknown)
@ 0x525560 (unknown)
@ 0x555551 (unknown)
@ 0x525560 (unknown)
@ 0x5247ea (unknown)
@ 0x568b3a (unknown)
@ 0x4c2604 (unknown)
@ 0x4d1c5c (unknown)
@ 0x55f6db (unknown)
@ 0x5244dd (unknown)
@ 0x555551 (unknown)
@ 0x525560 (unknown)
@ 0x555551 (unknown)
@ 0x525560 (unknown)
@ 0x567d14 (unknown)
@ 0x465bf4 (unknown)
@ 0x46612d (unknown)
@ 0x466d92 (unknown)
@ 0x7f048732af45 (unknown)
Segmentation fault (core dumped)

Incorrect possible_actions for cartpole example

Hi,

I tried running the offline example in usage.md and spotted something that looks funny to me. I ran the spark job to preprocess the training data and looked at the first entry in the resulting json file. The values for possible_actions and possible_next_actions look weird to me - I'd have expected ["0", "1"] but instead it's [1,1] (see below). This is true over the whole file too. Am I misunderstanding something?

Thanks,

Martin.

{
"mdp_id": "129",
"sequence_number": 107,
"propensity": 0.95,
"state_features": {
"0": 0.9470082572631731,
"1": 1.2941002415409237,
"2": 0.11945455504963116,
"3": -0.24271752599511354
},
"action": 1,
"reward": 1.0,
"next_state_features": {
"0": 0.9728902620939915,
"1": 1.4873313444805643,
"2": 0.1146002045297289,
"3": -0.495462494594765
},
"time_diff": 1,
"possible_actions": [
1,
1
],
"possible_next_actions": [
1,
1
],
"next_action": 0,
"metrics": {
"reward": 1.0
}
}

Options to extract Q network weights?

Hi,

I was wondering if there is a way to extract the deep Q network weights from the trained predictor.c2? I was thinking I might be able to get the weights by calling predictor.predict_net first, and then extract weights from the net, but I googled around and could not find a way to get the weights.

Thank you,
Fengdan

[docs] location of `requirements.txt` file isn't specified

Small nitpick that tripped up a colleague for a bit -- the install instructions specify to cd Horizon and then

conda install `cat requirements.txt`

but the req file is located in the docker/ directory.

edp.logged_propensities always have a value 1.

I have a question about cpe calculation.
The value of propensity in log data needed to calculate ips disappears at preprocessing phase.

In more detail, a value of action_probability is needed in calculation of ips. - (1)
But a field action_probability of raw data changed into a field propensity at preprocessing phase. - (2)
So, all value of action_probability have a value 1 because a field action_probability is not in training_data. - (3)

I want to know whether this is intentional or not.

Thank you.

(1) A field logged_propensitities is used in ips calculation.
Horizon/ml/rl/evaluation/doubly_robust_estimator.py

        importance_weight = (
            target_propensity_for_action / edp.logged_propensities
        ).float()

(2) A field action_probability changed into a field name propensity at preprocessing level.
Horizon/preprocessing/src/main/scala/com/facebook/spark/rl/Query.scala

    var query = """
    SELECT
        mdp_id,
        sequence_number,
        action_probability as propensity,
        state_features,
        CASE action
    """

(3) Since, a field action_probability is not in training_data, propensities always have a value 1.
Horizon/ml/rl/workflow/preprocess_handler.py

        if "action_probability" in batch:
            propensities = torch.tensor(
                batch["action_probability"], dtype=torch.float32
            ).reshape(-1, 1)
        else:
            propensities = torch.ones(rewards.shape, dtype=torch.float32)

Multi-node distributed training

Hello,

In description it is stated that Horizon supports distributed training. Are there any example for multi-node distributed training?

Failing to run discrete_dqn_maxq_asteroids_v0

Used commit: 2901f36

When running ml/rl/test/gym/run_gym.py -p=ml/rl/test/gym/discrete_dqn_maxq_asteroids_v0.json (as well as other Atari experiments), I observe the following error:

INFO:main:Running gym with params
INFO:main:{'env': 'Asteroids-v0', 'model_type': 'pytorch_discrete_dqn', 'max_replay_memory_size': 100000, 'rl': {'gamma': 0.99, 'target_update_rate': 0.2, 'maxq_learning': 1, 'epsilon': 0.2, 'temperature': 0.35, 'softmax_policy': 0}, 'rainbow': {'double_q_learning': False, 'dueling_architecture': False}, 'training': {'layers': [-1, 128, 64, -1], 'activations': ['relu', 'relu', 'linear'], 'minibatch_size': 64, 'learning_rate': 0.001, 'optimizer': 'ADAM', 'lr_decay': 0.999, 'cnn_parameters': {'conv_dims': [3, 32, 16], 'conv_height_kernels': [8, 4], 'conv_width_kernels': [8, 4], 'pool_kernels_strides': [2, 2], 'pool_types': ['max', 'max']}}, 'run_details': {'num_episodes': 5001, 'max_steps': 200, 'train_every_ts': 1, 'train_after_ts': 1, 'test_every_ts': 2000, 'test_after_ts': 1, 'num_train_batches': 1, 'avg_over_num_episodes': 100}}
Traceback (most recent call last):
File "/home/pshevche/.vscode/extensions/ms-python.python-2019.4.12954/pythonFiles/ptvsd_launcher.py", line 43, in
main(ptvsdArgs)
File "/home/pshevche/.vscode/extensions/ms-python.python-2019.4.12954/pythonFiles/lib/python/ptvsd/main.py", line 410, in main
run()
File "/home/pshevche/.vscode/extensions/ms-python.python-2019.4.12954/pythonFiles/lib/python/ptvsd/main.py", line 291, in run_file
runpy.run_path(target, run_name='main')
File "/home/pshevche/miniconda3/envs/horizon-dev/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/home/pshevche/miniconda3/envs/horizon-dev/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/home/pshevche/miniconda3/envs/horizon-dev/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/pshevche/Documents/ovgu/master_thesis/develop/horizon/Horizon/ml/rl/test/gym/run_gym.py", line 941, in
main(args[1:])
File "/home/pshevche/Documents/ovgu/master_thesis/develop/horizon/Horizon/ml/rl/test/gym/run_gym.py", line 597, in main
args.path_to_pickled_transitions,
File "/home/pshevche/Documents/ovgu/master_thesis/develop/horizon/Horizon/ml/rl/test/gym/run_gym.py", line 657, in run_gym
minimum_epsilon,
File "/home/pshevche/Documents/ovgu/master_thesis/develop/horizon/Horizon/ml/rl/test/gym/open_ai_gym_environment.py", line 72, in init
for a in range(self.action_dim)]
File "/home/pshevche/Documents/ovgu/master_thesis/develop/horizon/Horizon/ml/rl/test/gym/open_ai_gym_environment.py", line 72, in
for a in range(self.action_dim)]
AttributeError: 'OpenAIGymEnvironment' object has no attribute 'state_dim'

Could you help me fix this issue? Maybe it was already fixed in some other commit? So far I know that the problem originates in open_ai_gym_environment.py, lines 114 - 121. Since the observation space for Atari environments is 3-dimensional, the state_dim attribute is never set, but is used in open_ai_gym_environment.py, line 72 anyways, because Atari environments have discrete actions.

Any help is highly appreciated, thank you in advance!

request for more detailed documentation

documentation is very short. It's not clear how to run this on my dataset.
Moreover, what if my rewards are delayed and accumulative across time for each action, so at time t'>t I get reward events of action at time t.

Looking for simple, less generic examples

Hey Guys,

Thank you again for this repo. The issue I'm having is that its a bit much regarding learning how to use Caffe2 for RL. For comparison, this simple script is pretty helpful to see how to solve cart-pole with PyTorch.

Conceptually, I'm still pretty confused about how to do RL in Caffe2, and its a bit much to get through all of the code in BlueWhale. Maybe a blog post would be better than this repo for the type of example I'm looking for. But I can't find that anywhere. Have you guys seen anything, or do you have anything stashed away that might work? Even as a gist?

For what its worth the part that I'm stuck on is: It seems when you run the model, it goes through all operations (forward pass and then backward pass), but I don't want the backward pass until the episode is complete.

I got started on what I'm asking for above, its here but it only goes up to the forward pass, as I'm not sure how to work with Caffe2 to handle waiting until episode is complete for backward pass.

Thank you again!!

How to configure to run the examples on the CPU?

I followed instructions from here:
https://github.com/facebookresearch/Horizon/blob/master/docs/installation.md
to run Docker image on Mac. However when I am running the example, getting following error:


root@cb58ca621d80:~/Horizon/Horizon# python ml/rl/test/gym/run_gym.py -p ml/rl/test/gym/discrete_dqn_cartpole_v0.json
INFO:__main__:Running gym with params
INFO:__main__:{'env': 'CartPole-v0', 'model_type': 'pytorch_discrete_dqn', 'max_replay_memory_size': 10000, 'use_gpu': False, 'rl': {'gamma': 0.99, 'target_update_rate': 0.1, 'reward_burnin': 1, 'maxq_learning': 1, 'epsilon': 0.05, 'temperature': 0.35, 'softmax_policy': 0}, 'rainbow': {'double_q_learning': False, 'dueling_architecture': False}, 'training': {'layers': [-1, 128, 64, -1], 'activations': ['relu', 'relu', 'linear'], 'minibatch_size': 1024, 'learning_rate': 0.001, 'optimizer': 'ADAM', 'lr_decay': 0.999, 'use_noisy_linear_layers': False}, 'run_details': {'num_episodes': 200, 'max_steps': 200, 'train_every_ts': 1, 'train_after_ts': 1, 'test_every_ts': 2000, 'test_after_ts': 1, 'num_train_batches': 1, 'avg_over_num_episodes': 100, 'offline_train_epochs': 30}}
INFO:ml.rl.training.rl_trainer_pytorch:CUDA availability: False
INFO:ml.rl.training.rl_trainer_pytorch:NOT Using GPU: GPU not requested or not available.
Traceback (most recent call last):
  File "ml/rl/test/gym/run_gym.py", line 850, in <module>
    main(args[1:])
  File "ml/rl/test/gym/run_gym.py", line 564, in main
    args.path_to_pickled_transitions,
  File "ml/rl/test/gym/run_gym.py", line 632, in run_gym
    path_to_pickled_transitions=path_to_pickled_transitions,
  File "ml/rl/test/gym/run_gym.py", line 155, in train
    stop_training_after_solved,
  File "ml/rl/test/gym/run_gym.py", line 428, in train_gym_online_rl
    trainer.train(samples)
  File "/home/Horizon/Horizon/ml/rl/training/dqn_trainer.py", line 150, in train
    loss.backward()
  File "/home/miniconda/lib/python3.6/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/miniconda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: CUDA driver version is insufficient for CUDA runtime version

How can I configure to run the example on the CPU?

what's Parametric-Action DQN??

is there any connection between Parametric-Action DQN and this papaer's ddpg model https://arxiv.org/abs/1511.04143?

What does the actions listed in sample configs means?

Hi,

In parameter file dqn_example.json#L17-L20 to train DQN model, the actions are

"actions": [
    "4",
    "5"
  ],

What does it mean given cartpole-v0 has only two actions {0, 1}?

Thanks