karpathy / reinforcejs
Reinforcement Learning Agents in Javascript (Dynamic Programming, Temporal Difference, Deep Q-Learning, Stochastic/Deterministic Policy Gradients)
In case it's interesting, or if you have a sample gallery: Breakout with deep Q-learning (since that's the YouTube video example originally shown for such games):
http://4quant.com/javascript-breakout/ repo: https://github.com/4Quant/javascript-breakout
Hi,
First of all, thank you very much for sharing great demos!
In waterworld.js, I don't get this part in forward()
forward: function() {
  // in forward pass the agent simply behaves in the environment
  // create input to brain
  .....
  for(var i=0;i<num_eyes;i++) {
    var e = this.eyes[i];
    input_array[i*5] = 1.0; // ???
    input_array[i*5+1] = 1.0; // ???
    input_array[i*5+2] = 1.0; // ???
    input_array[i*5+3] = e.vx; // velocity information of the sensed target
    input_array[i*5+4] = e.vy;
    if(e.sensed_type !== -1) {
      // sensed_type is 0 for wall, 1 for food and 2 for poison.
      // lets do a 1-of-k encoding into the input array
      input_array[i*5 + e.sensed_type] = e.sensed_proximity/e.max_range; // normalize to [0,1]
    }
  }
I don't understand why the first three inputs are all 1.0. Shouldn't they encode the type of the sensed object or something?
On the demo page, it says:
The agent has 30 eye sensors pointing in all directions and in each direction is observes 5 variables: the range, the type of sensed object (green, red), and the velocity of the sensed object. The agent's proprioception includes two additional sensors for its own speed in both x and y directions. This is a total of 152-dimensional state space.
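Reading the loop above, one plausible interpretation (my reading of the source, not an authoritative answer) is that the first three slots are per-type proximity channels that default to 1.0, meaning "nothing of that type sensed out to max_range", and the 1-of-k branch then overwrites exactly one of them with the normalized proximity. A self-contained sketch of that encoding:

```javascript
// Sketch of the per-eye encoding as I read it: the first three slots are
// wall/food/poison proximity channels. 1.0 is the default ("nothing of
// this type sensed within max_range"); if the eye sensed something, the
// channel for that type is overwritten with the normalized proximity.
function encodeEye(e) {
  var chunk = [1.0, 1.0, 1.0, e.vx, e.vy]; // defaults, as in the source
  if (e.sensed_type !== -1) {              // 0 = wall, 1 = food, 2 = poison
    chunk[e.sensed_type] = e.sensed_proximity / e.max_range; // in [0, 1]
  }
  return chunk;
}
```

Under that reading, a closer object produces a smaller value in its channel, and 1.0 effectively means "at or beyond sensing range".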
I made a program that trains the agent. But if I now want to export the data and later reimport it, how can I do that?
I want something like this:
var data = agent.exportData(); // this might give me an object holding its NN data or whatever
Then I just save it to a txt file or database.
Then later, I can import it like this
agent.SetData(data); // imports the data
This way I don't have to re-train it.
Does anyone know how to do something like this?
Thanks
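For what it's worth, the pretrained-agent demos appear to serialize via toJSON()/fromJSON() on the agent; check your rl.js version for the exact method names. A sketch of the round trip, with a stub standing in for the real agent so it runs end to end:

```javascript
// Sketch of the save/load round trip. toJSON()/fromJSON() are what the
// pretrained-agent demos appear to use; the stub below only stands in
// for a real RL.DQNAgent so the round trip can be demonstrated.
function makeAgent() {
  return {
    weights: [0, 0, 0],                          // stands in for the real net
    toJSON: function() { return { w: this.weights.slice() }; },
    fromJSON: function(j) { this.weights = j.w.slice(); }
  };
}

var trained = makeAgent();
trained.weights = [0.4, -1.2, 3.0];              // pretend training happened

var data = JSON.stringify(trained.toJSON());     // save this string to a file/DB

var fresh = makeAgent();                         // same env/spec as before
fresh.fromJSON(JSON.parse(data));                // restore weights, skip retraining
```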
Hi, I wonder whether the algorithms from the paper below could be added to the library.
Many thanks for your help.
Andrew
Any reason for deciding to go with globals over checking for the environment and delivering window.RL or module.exports.RL?
Is this something you'd accept as a PR, if it was done in a way that fits?
I had neglected to set getNumStates, yet nothing complains. I guess validating means an extra check every time you give an input array to act, and a size comparison in each call to setFrom would likely be over the top... You could validate in Agent.forward before passing to DQNAgent.act, but if you do it there, why not in act itself? Maybe the solution is to check the sizes once, on the first call to act?
Can't find the API Docs. Where are they?
How hard would it be to implement this?
I'm trying ReinforceJS on the 2048 game here: https://github.com/NullVoxPopuli/doctor-who-thirteen-game-ai/blob/master/worker.js#L105
and I've noticed a couple of things:
Additionally,
idk :D
The currently recommended version of Node.js (4.5.0) ships V8 version 4.5.103.37, which has a
Math.tanh function that returns NaN for some inputs. The issue and suggested fix are here:
http://stackoverflow.com/questions/34835641/tanh-returning-nan-for-large-input
is this worth doing a PR?
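A guarded tanh along the lines of the suggested fix (a sketch, not necessarily the exact patch anywhere: the naive (e^(2x)-1)/(e^(2x)+1) form yields Infinity/Infinity = NaN once Math.exp(2*x) overflows):

```javascript
// tanh that never returns NaN: clamp large |x|, where the true value is
// within double precision of +/-1 anyway (tanh(20) ~ 1 - 2e-18), instead
// of letting Math.exp(2*x) overflow to Infinity.
function safeTanh(x) {
  if (x > 20) return 1;
  if (x < -20) return -1;
  var y = Math.exp(2 * x);
  return (y - 1) / (y + 1);
}
```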
Using the default agent parameters but setting spec.update to 'sarsa', the model simply does not converge to the optimal solution.
// agent parameter spec to play with (this gets eval()'d on Agent reset)
var spec = {}
spec.update = 'sarsa'; // 'qlearn' or 'sarsa'
spec.gamma = 0.9; // discount factor, [0, 1)
spec.epsilon = 0.2; // initial epsilon for epsilon-greedy policy, [0, 1)
spec.alpha = 0.1; // value function learning rate
spec.lambda = 0.1; // eligibility trace decay, [0,1). 0 = no eligibility traces
spec.replacing_traces = true; // use replacing or accumulating traces
spec.planN = 0; // number of planning steps per iteration. 0 = no planning
spec.smooth_policy_update = true; // non-standard, updates policy smoothly to follow max_a Q
spec.beta = 0.1; // learning rate for smooth policy update
Is it possible to get this to run with multiple workers?
Is there a paper I can look at that explains how this is done?
Line 460 in 08d2030
On line 460 there is a call to make a new random matrix whose second argument is hidden_size, which is only actually defined inside the for loop above it, meaning the argument d should resolve to undefined when calling the RandMat function, because hidden_size is out of scope.
I only caught this because I am porting your rl.js to C++, and a unit test turned it up. I then realized that JavaScript would most likely have let this slip right under your nose, or anyone's, since these RL learners are so good at learning regardless of coding errors.
I hope you update this one day, as it is a seven-year-old repository. I can only imagine what you have learned about RL in seven years and from working at Tesla. It would be amazing to see some of the newer work, like the recent DeepMind paper on continuous action spaces and their "Director" agent; this library could provide even more generalization. Anyway, thanks for your wonderful code and keep up the great work. I like the way you go about things, and while porting this I could imagine that you quite possibly wrote it in C++ first and were actually porting it to JS, since some comments refer to things as structs.
Thanks, and I hope you see this.
Hello,
In the example library usage, the environment is created with a number of states and also a maximum number of possible actions.
For example, if I have 5 possible states, each defined by 2 values, and 3 possible actions, also defined by 2 values, what should the relevant variables of the environment object be? Moreover, what should the state array given to the act(state) method look like?
I normally have more states and actions, but I couldn't grasp the general idea behind it.
P.S. I know this question is more suitable for Stack Overflow, but I also think it would be beneficial for other newcomers.
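Not an authoritative answer, but as I read the README, getNumStates is the length of the state vector (here 2, since a state is defined by 2 values) and getMaxNumActions is the count of discrete actions (here 3); the values defining each action don't enter the env spec, because act() returns an action index. A sketch under those assumptions:

```javascript
// Hypothetical env for a state described by 2 values and 3 discrete
// actions, following the README's shape (my reading, worth verifying):
var env = {};
env.getNumStates = function() { return 2; };     // length of the array passed to act()
env.getMaxNumActions = function() { return 3; }; // act() returns an index in {0, 1, 2}

// var agent = new RL.DQNAgent(env, spec);
// var a = agent.act([0.3, -1.2]); // state = array of getNumStates() numbers
```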
Hi,
has anyone ever tried to get a proper JSON of the agent after training it? My agent's net object contains nothing, as far as I can see.
@karpathy, for some reason the LaTeX equations are not being rendered properly. Below is a snippet of the GridWorld: Dynamic Programming demo page.
I am on the latest version of Chrome on macOS, on
https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
and I don't see the TeX equations rendered properly.
Judging from https://tex.stackexchange.com/questions/299523/how-can-i-compile-tex-code-appearing-on-websites, the MathJax library could help.
Fantastic library!
I have a ton of questions, most of which likely have answers along the lines of "it depends" :) But, the top questions:
spec.update = 'qlearn'; // 'qlearn' or 'sarsa'
spec.gamma = 0.9; // discount factor, [0, 1)
spec.epsilon = 0.2; // initial epsilon for epsilon-greedy policy, [0, 1)
spec.lambda = 0.8; // eligibility trace decay, [0,1). 0 = no eligibility traces
spec.replacing_traces = false; // use replacing or accumulating traces
spec.planN = 50; // number of planning steps per iteration. 0 = no planning
spec.smooth_policy_update = true; // non-standard, updates policy smoothly to follow max_a Q
spec.beta = 0.1; // learning rate for smooth policy update
spec.alpha = 0.005; // value function learning rate
spec.experience_add_every = 5; // number of time steps before we add another experience to replay memory
spec.experience_size = 10000; // size of experience
spec.learning_steps_per_iteration = 5;
spec.tderror_clamp = 1.0; // for robustness
spec.num_hidden_units = 100; // number of neurons in hidden layer
env.getNumStates = function() {
return 9;
};
env.getMaxNumActions...
Any particular reason to have it be a function? (relates back to my #1 question)
For the waterworld problem, when I click the button 'Load a Pretrained Agent', it prompts:
Failed to load file:///C:/Users/xuxiyang/Desktop/reinforcejs-master/agentzoo/wateragent.json: Cross origin requests are only supported for protocol schemes: http, data, chrome, chrome-extension, https.
jquery-2.1.3.min.js:4
Does anyone know how to resolve this and load the saved data?
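The usual workaround is to serve the checkout over HTTP instead of opening index.html from file:// (any static server works; Python's built-in one is just one option):

```shell
# Browsers block cross-origin fetches from file:// pages, so serve the
# repo over HTTP and load the demo from localhost instead:
cd reinforcejs-master
python3 -m http.server 8000
# then open http://localhost:8000/ and "Load a Pretrained Agent"
# can fetch agentzoo/wateragent.json normally
```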
I am trying to understand the inputs for the example given here
http://cs.stanford.edu/people/karpathy/reinforcejs/index.html
env.getNumStates()
Is this the size of the vector that represents the variables of the current game configuration?
env.getMaxNumActions
For this one, is it the total number of configurations the game can have? Or is it the number of actions the player can currently take in the current game configuration? For example, in a grid maze the player has up to 4 directions to move, so it would be 4.
Inside the "setInterval" function, "s" is not defined. Is it the vector of variables of the current game configuration, which I have to build myself?
And is "reward" something I have to compute too, based on the current "s" vector?
Also, why are getNumStates and getMaxNumActions functions when they seem to return constant values? Are they supposed to support returning dynamic values? Can the vector size differ at any time? And is the max number of actions dynamic too?
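My understanding of the loop (not an authoritative answer): "s" is indeed a state vector you build yourself each tick, and "reward" is a scalar you compute from your own game logic. A runnable sketch with a fake one-dimensional game and a stub agent standing in for RL.DQNAgent; all helper names are hypothetical:

```javascript
// Stub agent standing in for RL.DQNAgent so the control flow runs end to end.
var agent = {
  act: function(s) { return s[0] < 0 ? 1 : 0; }, // pick an action index from the state
  learn: function(r) { this.lastReward = r; }    // consume the scalar reward
};

// Fake one-dimensional "game": move a point toward the origin.
var position = -1;
function getCurrentState() { return [position]; }           // you build `s` yourself
function applyAction(a) { position += (a === 1 ? 1 : -1); } // advance your game
function computeReward() { return position === 0 ? 1 : 0; } // reward is yours to define

// One tick of the setInterval loop from the docs:
var s = getCurrentState();     // the `s` the question asks about
var a = agent.act(s);
applyAction(a);
agent.learn(computeReward());  // reward computed from the new game state
```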