Fractal AI: A Fragile Theory of Intelligence

Please note this project is under active development and may change over time. Consider it as an open beta.

Once you start doubting, just like you’re supposed to doubt, you ask me if the science is true. You say no, we don’t know what’s true, we’re trying to find out and everything is possibly wrong.

–Richard P. Feynman, The Pleasure of Finding Things Out: The Best Short Works of Richard P. Feynman.

Abstract
Quick Start
Running the code
Benchmarks
Additional Resources
Cite us
FAQ
About the Authors
Todo
Bibliography

Abstract

Fractal AI (arXiv#1, arXiv#2) is a theory for efficiently sampling state spaces. It allows one to derive new mathematical tools that could be useful for modeling information using cellular automaton-like structures instead of smooth functions.

In this repository we present Fractal Monte Carlo (FMC), a new planning algorithm derived from the first principles of Fractal AI theory. A FMC agent is capable of solving Atari-2600 games under the OpenAI Gym several orders of magnitude more efficiently than similar planning algorithms, such as Monte Carlo Tree Search (MCTS) [1].

We also present a more advanced Swarm Wave implementation, also derived from Fractal AI principles, that allows one to solve Markov decision processes under a perfect/informative model of the environment. This implementation is far more efficient than FMC, effectively "solving" a substantial number of Atari games.

The code provided under this repository exemplifies how it is now possible to beat some of the current state-of-the-art benchmarks on Atari games while generating a large set of top-performing examples with little computation required, turning Reinforcement Learning (RL) into a supervised problem.

These new algorithms propose a new approach to modeling the decision space, while maintaining control over any aspects of the agent's behavior. The algorithms can be applied to all combinations of discrete or continuous decision and state spaces.

Quick Start

To quickly understand the fundamentals of Fractal AI you can refer to the Introduction to FAI. The document provides a brief explanation of the algorithms here presented and their potential applications on the field of Reinforcement Learning.

To test how the Fractal Monte Carlo Agent performs on any Atari game you can refer to the FMC example notebook. This example allows us to run games using either the RAM content or the pixel render as observations.

To better understand how the Swarm Wave algorithm works in practice you can refer to the Swarm Wave example notebook.

Pleas note the authors are open to discuss the ideas and code here presented under the conceptual framework of Reinforcement Learning and its standard terminology.

Installation

The code provided aims to be both simple and self-explanatory. Requirements and instructions to set up the environment are provided below.

Requirements

Python 3. Python 2 is not supported nor currently expected to be supported.
Python numpy library.
Python OpenAI Gym [Atari] [2].
- OpenAI Gym dependencies.
(Optional) Jupyter Notebook for running the example notebooks provided.

Installing dependencies

As a first step, install the dependencies as explained on the OpenAI gym documentation:

To install the full set of environments, you'll need to have some system packages installed. We'll build out the list here over time; please let us know what you end up installing on your platform. In case you want to run the notebook:

pip3 install jupyter

On OSX:

brew install cmake boost boost-python sdl2 swig wget

On Ubuntu 14.04:

sudo apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig libav-tools

Cloning and Installing the FractalAI Repository

On the terminal, run:

git clone [email protected]:FragileTheory/FractalAI.git
cd FractalAI
sudo pip3 install -e .

Benchmarks

It doesn't matter how beautiful your theory is, it doesn't matter how smart you are.

If it doesn't agree with experiment, it's wrong.

–Richard P. Feynman

We used a standard set of 50 Atari-2600 games, common to all the planning algorithms articles found in the literature, to compare our implementation of the FMC algorithm against:

Standard Human: a professional game tester after 2 hours of trainnig, as reported in [5].
World Human Record: the maximum score achieved by a human player, as reported in [8].
Planning SOtA: the best score achieved by any "State of the Art" planning algorithms (Full Tree, MCTS UCT, IW(1), p-IW(1), R.p-IW(1), 2BSF, BrFS), as reported in [12] [13] [14] [15] [16] [17].
Learning SOtA: the best score achieved by any "State of the Art" learning algorithms (Random, DQN, C51 DQN, NoisyNet-DQN, Dueling, NoisyNet-Dueling, A3C, NoisyNet-A3C, A2C, HyperNEAT, ES FF), as reported in [1], [3], [4], [5], [6], [7].
Hidden score limit: many games do not support scoring above 1M and reset score down to zero after 999,999 is reached. In most cases the limit was totally unknow as no one -human or algorithm- had ever been able to reach this limit before.

	FMC Wins	%
FMC vs Standard Human	49 / 50	98%
FMC vs World Human Record	32 / 50	64%
FMC vs Planning SOtA (1)	50 / 50	100%
FMC vs Learning SOtA	45 / 50	90%
FMC vs Hidden score limit	16 / 50	32%

(1) On average, the Swarm Wave version of FMC used 360 times fewer samples per action than the rest of planning algorithm, typically using 150k samples per action.

Fractal Monte Carlo Agent Performance Table

The following table depicts the Fractal Monte Carlo Agent performance on each tested game.

Game	Human Record	Planning SOtA	Learning SOtA	FMC
Alien	251916	38951	7967	479940
Amidar	155339	3122	4058	5779
Assault	8647	1970	11734	14472
Asterix (*)	335500	319667	406211	999500
Asteroids	10004100	68345	167159	12575000
Atlantis	7352737	198510	2872644.8	10000100
Bank Heist	199978	1171	1496	3139
Battle Zone (*)	863000	330880	53742	999000
Bean Rider (*)	999999	12243	21077	999999
Berzerk	1057940	2096	2500	17610
Bowling	300	69	135.8	180
Boxing	100	100	100	100
Breakout	752	772	748	864
Centipede	1301709	193799	25275.2	1351000
Chopper Command (*)	999900	34097	15600	999900
Crazy Climber	447000	141840	179877	2254100
Demon Attack (*)	999970	34405	130955	999970
Double Dunk	24	24	24	24
Enduro	3617.9	788	3454	5279
Fishing Derby	71	42	59	63
Freeway	34	32	34	33
Frostbyte (*)	552590	6427	4442	999960
Gopher (*)	120000	26297	41138	999980
Gravitar	1673950	6520	2308	14050
Hero	1000000	15280	105929.4	43255
Ice Hockey	36	62	10.6	64
Jamesbond	45550	23070	6963	152950
Kangaroo	1436500	8760	15470	10800
Krull	1006680	15788	35024	426534
Kung fu master	1000000	86290	79676	172600
Montezuma's Revenge	1219200	500	4739.6	5600
Ms. Pacman (*)	290090	30785	5913	999990
Name this Game	25220	15410	12542	53010
Pong	21	21	21	21
Private Eye	103100	2544	40908.2	41760
QBert ()	999975	44876	27543	999975
River Raid	194940	15410	24568	18510
Road Runner (*)	999900	120923	367023	999900
Robotank	74	75	65	94
Seaquest (*)	527160	35009	266434	999999
Space Invaders	621535	3974	7227	17970
Star Gunner (*)	77400	14193	84490	999800
Tennis	24	24	23.1	24
Time Pilot	66500	65213	18501	90000
Tutankham	3493	226	288	342
Up and Down (*)	168830	120200	155049	999999
Venture	31900	1200	1520	1500
Video Pinball (*)	999999	471859	999999	999999
Wizard of Wor (*)	99900	161640	16143	99900
Zaxxon	100000	39687	18057	92100

(*) Games with the "1 Million bug" where max. score is hard-limited.

Detailed Performance Sheet

We provide a more detailed Google Docs spreadsheet where the performance of the Fractal Monte Carlo Agent is logged relative to the current alternatives. In the spreadsheet we also provide the parameters used in each of the runs.

If you find any outdated benchmarks or for some reason you are unable to replicate some of our results, please open an issue and we will update the document accordingly.

Fractal AI performance sheet

Additional Resources

Theoretical Foundations

Fractal AI: A Fragile Theory of Intelligence: This document explains the fundamental principles of the Fractal AI theory in which our Agent is based. We worked all the fundamental principles completely from scratch to build our own solution. We try to be consistent with existing terminology, and this document should contain everything you need to understand the theory. Comments on how to better explain the content are appreciated.

Solving Atari Games Using Fractals And Entropy: A short version of the article written by Spiros Baxevanakis and submitted -under very high uncertaintly- to NIPS2018.

Blog

EntropicAI, Sergio Hernández Cerezo's blog: Here you can find the evolution of the research process for developing this algorithm, documented and explained, as well as experiments which aim to apply the theory to other fields of research.

YouTube

Fractal AI playlist: In the Youtube playlist you can find videos of the accomplishments over the years. Besides the recordings Atari games using the Agent, you can find videos recorded using a custom library that allows one to create different tasks in continuous control environments, as well as visualizations of how the Agent samples the state space.

Related Research

GAS paper [9]: A manuscript describing an application of the Fractal AI theory on general optimization problems. There are certainly better ways to apply the theory such problems, yet it illustrates why code is better than maths to explain the theory. When trying to formalize it, things can get really non-intuitive.

Causal Entropic Forces by Alexander Wissner-Gross [10]: The fundamental concepts behind this paper inspired the present research. We develop our theory aiming to calculate future entropy more quickly and being able to leverage the information contained in the Entropy of any state space, together with any potential function.

Cite us

@misc{1803.05049,
    Author = {Sergio Hernández Cerezo and Guillem Duran Ballester},
    Title = {Fractal AI: A fragile theory of intelligence},
    Year = {2018},
    Eprint = {arXiv:1803.05049},
  }

FAQ

As questions regarding the research and methodology we will address them under the FAQ.

You can refer to the FAQ document.

About the Authors

Authors:

Sergio Hernández Cerezo: Studied mathematics, works as programmer, dreams about physics.
Guillem Duran Ballester: Rogue scientist, loves learning and teaching. Currently looking for work opportunities related to AI.

The authors have developed the theory as personal side projects driven purely by intellectual curiosity. Guillem worked on it while attending college, and Sergio while working as a programmer. The authors are not part of academia, have no corporate affiliation and no formal track record.

All the time and resources involved came from the authors themselves, besides the support from:

HCSoft, which supported our research and believed the ideas since the very beginning.
source{d}, which kindly sponsored the project for 6 months.

We currently do not have the resources to further carry our research. We will gladly accept contributions or sponsorships that allow us to continue working with what is our passion.

Special thanks: We want to thank all the people who has believed in us along the years. Their patience, understanding and support made possible for this project to evolve to this point.

Our families, HCSoft, Guillem's parents: Joan and Francisca, Eulàlia Veny, and Fina.
The people at source{d}, specially Eiso Kant, Waren Long, Vadim Markovtsev, Marcelo Novaes, and Egor Bulychev.
Everyone who believed in our "alien math" since Guillem was in college, specially José M. Amigó, Antoni Elias, Jose Berengueres, Javier Ozón, Samuel Graván, and Marc Garcia.

TODO

We are actively working in improving this project, and we welcome all contributions. Some of the to-dos in our roadmap:

Build a new benchmarking tool for large scale testing with confidence intervals.
Making the repository more friendly to academia.
Improving the Introduction to Fractal AI document.
Improving code clarity and docstrings.
Providing a command line interface (CLI).
Uploading the project to pip and Conda package managers.
Creating a Docker container for ease of use.

Bibliography

[1] Guo, Xiaoxiao and Singh, Satinder and Lee, Honglak and Lewis, Richard L and Wang, Xiaoshi. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning. NIPS2014_5421, 2014.
[2] Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba. OpenAI Gym . arXiv:1606.01540, 2016.
[3] Marc G. Bellemare, Will Dabney Rémi Munos. A Distributional Perspective on Reinforcement Learning. arXiv:1707.06887, 2017.
[4] Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Matteo Hessel, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg. Noisy networks for exploration. arXiv:1706.10295, 2018.
[5] Volodymyr Mnih & others. Human-level control through deep reinforcement learning. doi:10.1038/nature14236, 2015.
[6] Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz. Parameter Space Noise for Exploration. arXiv:1706.01905, 2017.
[7] Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv:1703.03864, 2017.
[8] ATARI VCS/2600 Scoreboard. Atari compendium, 2018.
[9] Sergio Hernández, Guillem Duran, José M. Amigó. General Algorithmic Search. arXiv:1705.08691, 2017.
[10] Alexander Wissner-Gross. Causal entropic forces. Physical Review Letters, 2013.
[11] Shane Legg Machine Super Intelligence. Doctoral Dissertation , 2008.
[12] Foley, Daniel John *Model-Based Reinforcement Learning in Atari 2600 Games. Thesis, 2017.
[13] Justin Fu, Irving Hsu *Model-Based Reinforcement Learning for Playing Atari Games. Stanford report, 2016.
[14] Xiaoxiao Guo et al. *Deep Learning for real-time Atari game play using offline MCTS planning. NIPS-2014, 2014.
[15] Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling *The Arcade Learning Environment: An Evaluation Platform for General Agents. arxiv:1207.4708, 2013.
[16] Alexander Shleyfman, Alexander Tuisov, Carmel Domshlak *Blind Search for Atari-Like Online Planning Revisited. IJCAI-16, 2016.
[17] Nir Lipovetzky, Miquel Ramirez, Hector Geffner *Classical Planning with Simulators: Results on the Atari Video Games. IJCAI-15, 2015.
[18] Wilmer Bandres, Blai Bonet, Hector Geffner *Planning with Pixels in (Almost) Real Time. arXiv:1801.03354, 2018.

downseq / fractalai Goto Github PK

fractalai's Introduction