karpathy / reinforcejs
Reinforcement Learning Agents in Javascript (Dynamic Programming, Temporal Difference, Deep Q-Learning, Stochastic/Deterministic Policy Gradients)
In case it's interesting, or if you have a sample gallery: Breakout with deep Q-learning (since that's the YouTube video example originally shown for such games):
http://4quant.com/javascript-breakout/ repo: https://github.com/4Quant/javascript-breakout
Hi,
First of all, thank you very much for sharing great demos!
In waterworld.js, I don't get this part in forward()
forward: function() {
  // in forward pass the agent simply behaves in the environment
  // create input to brain
  .....
  for(var i=0;i<num_eyes;i++) {
    var e = this.eyes[i];
    input_array[i*5] = 1.0; // ???
    input_array[i*5+1] = 1.0; // ???
    input_array[i*5+2] = 1.0; // ???
    input_array[i*5+3] = e.vx; // velocity information of the sensed target
    input_array[i*5+4] = e.vy;
    if(e.sensed_type !== -1) {
      // sensed_type is 0 for wall, 1 for food and 2 for poison.
      // lets do a 1-of-k encoding into the input array
      input_array[i*5 + e.sensed_type] = e.sensed_proximity/e.max_range; // normalize to [0,1]
    }
  }
I don't understand why the first three inputs are all 1.0. Shouldn't they encode the type of the sensed object or something?
On the demo page, it says:
The agent has 30 eye sensors pointing in all directions and in each direction is observes 5 variables: the range, the type of sensed object (green, red), and the velocity of the sensed object. The agent's proprioception includes two additional sensors for its own speed in both x and y directions. This is a total of 152-dimensional state space.
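Reading the loop above, one plausible interpretation (my reading of the source, not an authoritative answer) is that the first three slots are per-type proximity channels that default to 1.0, meaning "nothing of that type sensed out to max_range", and the 1-of-k branch then overwrites exactly one of them with the normalized proximity. A self-contained sketch of that encoding:

```javascript
// Sketch of the per-eye encoding as I read it: the first three slots are
// wall/food/poison proximity channels. 1.0 is the default ("nothing of
// this type sensed within max_range"); if the eye sensed something, the
// channel for that type is overwritten with the normalized proximity.
function encodeEye(e) {
  var chunk = [1.0, 1.0, 1.0, e.vx, e.vy]; // defaults, as in the source
  if (e.sensed_type !== -1) {              // 0 = wall, 1 = food, 2 = poison
    chunk[e.sensed_type] = e.sensed_proximity / e.max_range; // in [0, 1]
  }
  return chunk;
}
```

Under that reading, a closer object produces a smaller value in its channel, and 1.0 effectively means "at or beyond sensing range".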
I made a program that trains the agent. But if I now want to export the data and later reimport it, how can I do that?
I want something like this:
var data = agent.exportData(); // this might give me an object holding its NN data or whatever
Then I just save it to a txt file or database.
Then later, I can import it like this
agent.SetData(data); // imports the data
This way I don't have to re-train it.
Does anyone know how to do something like this?
Thanks
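For what it's worth, the pretrained-agent demos appear to serialize via toJSON()/fromJSON() on the agent; check your rl.js version for the exact method names. A sketch of the round trip, with a stub standing in for the real agent so it runs end to end:

```javascript
// Sketch of the save/load round trip. toJSON()/fromJSON() are what the
// pretrained-agent demos appear to use; the stub below only stands in
// for a real RL.DQNAgent so the round trip can be demonstrated.
function makeAgent() {
  return {
    weights: [0, 0, 0],                          // stands in for the real net
    toJSON: function() { return { w: this.weights.slice() }; },
    fromJSON: function(j) { this.weights = j.w.slice(); }
  };
}

var trained = makeAgent();
trained.weights = [0.4, -1.2, 3.0];              // pretend training happened

var data = JSON.stringify(trained.toJSON());     // save this string to a file/DB

var fresh = makeAgent();                         // same env/spec as before
fresh.fromJSON(JSON.parse(data));                // restore weights, skip retraining
```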
Hi, I wonder whether the algorithms from the paper below could be added to the library.
Many thanks for your help.
Andrew
Any reason for deciding to go with globals over checking for the environment and delivering window.RL or module.exports.RL?
Is this something you'd accept as a PR, if it was done in a way that fits?
I had neglected to set getNumStates, yet nothing complains. I guess validating means an extra check every time you give an input array to act, and a size comparison in each call to setFrom would likely be over the top... You could validate in Agent.forward before passing to DQNAgent.act, but if you do it there, why not in act itself? Maybe the solution is to check the sizes once, on the first call to act?
Can't find the API Docs. Where are they?
How hard would it be to implement this?
I'm trying ReinforceJS on the 2048 game here: https://github.com/NullVoxPopuli/doctor-who-thirteen-game-ai/blob/master/worker.js#L105
and I've noticed a couple of things:
Additionally,
idk :D
The currently recommended version of Node.js (4.5.0) ships V8 version 4.5.103.37, which has a
Math.tanh function that returns NaN for some inputs. The issue and suggested fix are here:
http://stackoverflow.com/questions/34835641/tanh-returning-nan-for-large-input
is this worth doing a PR?
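A guarded tanh along the lines of the suggested fix (a sketch, not necessarily the exact patch anywhere: the naive (e^(2x)-1)/(e^(2x)+1) form yields Infinity/Infinity = NaN once Math.exp(2*x) overflows):

```javascript
// tanh that never returns NaN: clamp large |x|, where the true value is
// within double precision of +/-1 anyway (tanh(20) ~ 1 - 2e-18), instead
// of letting Math.exp(2*x) overflow to Infinity.
function safeTanh(x) {
  if (x > 20) return 1;
  if (x < -20) return -1;
  var y = Math.exp(2 * x);
  return (y - 1) / (y + 1);
}
```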
Using the default agent parameters but setting spec.update to 'sarsa', the model simply does not converge to the optimal solution.
// agent parameter spec to play with (this gets eval()'d on Agent reset)
var spec = {}
spec.update = 'sarsa'; // 'qlearn' or 'sarsa'
spec.gamma = 0.9; // discount factor, [0, 1)
spec.epsilon = 0.2; // initial epsilon for epsilon-greedy policy, [0, 1)
spec.alpha = 0.1; // value function learning rate
spec.lambda = 0.1; // eligibility trace decay, [0,1). 0 = no eligibility traces
spec.replacing_traces = true; // use replacing or accumulating traces
spec.planN = 0; // number of planning steps per iteration. 0 = no planning
spec.smooth_policy_update = true; // non-standard, updates policy smoothly to follow max_a Q
spec.beta = 0.1; // learning rate for smooth policy update
Is it possible to get this to run with multiple workers?
Is there a paper I can look at that explains how this is done?
Line 460 in 08d2030
On line 460 there is a call to make a new random matrix whose second argument is hidden_size, which is only actually defined inside the for loop above it, meaning the argument d should resolve to undefined when calling the RandMat function, because hidden_size is out of scope.
I only caught this because I am porting your rl.js to C++, and a unit test turned it up. I then realized that JavaScript would most likely have let this slip right under your nose, or anyone's, since these RL learners are so good at learning regardless of coding errors.
I hope you update this one day, as it is a seven-year-old repository. I can only imagine what you have learned about RL in seven years and from working at Tesla. It would be amazing to see some of the newer work, like the recent DeepMind paper on continuous action spaces and their "Director" agent; this library could provide even more generalization. Anyway, thanks for your wonderful code and keep up the great work. I like the way you go about things, and while porting this I could imagine that you quite possibly wrote it in C++ first and were actually porting it to JS, since some comments refer to things as structs.
Thanks, and I hope you see this.
Hello,
In the example library usage, the environment is created with a number of states and also a maximum number of possible actions.
For example, if I have 5 possible states, each defined by 2 values, and 3 possible actions, also defined by 2 values, what should the relevant variables of the environment object be? Moreover, what should the state array given to the act(state) method look like?
I normally have more states and actions, but I couldn't grasp the general idea behind it.
P.S. I know this question is more suitable for Stack Overflow, but I also think it would be beneficial for other newcomers.
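Not an authoritative answer, but as I read the README, getNumStates is the length of the state vector (here 2, since a state is defined by 2 values) and getMaxNumActions is the count of discrete actions (here 3); the values defining each action don't enter the env spec, because act() returns an action index. A sketch under those assumptions:

```javascript
// Hypothetical env for a state described by 2 values and 3 discrete
// actions, following the README's shape (my reading, worth verifying):
var env = {};
env.getNumStates = function() { return 2; };     // length of the array passed to act()
env.getMaxNumActions = function() { return 3; }; // act() returns an index in {0, 1, 2}

// var agent = new RL.DQNAgent(env, spec);
// var a = agent.act([0.3, -1.2]); // state = array of getNumStates() numbers
```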
Hi,
has anyone ever tried to get a proper JSON of the agent after training it? My agent's net object contains nothing, as far as I can see.
@karpathy, for some reason the LaTeX equations are not being rendered properly. Below is a snippet of the GridWorld: Dynamic Programming demo page.
I am on the latest version of Chrome on macOS, on
https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
and I don't see the TeX equations rendered properly.
Judging from https://tex.stackexchange.com/questions/299523/how-can-i-compile-tex-code-appearing-on-websites, the MathJax library could help.
Fantastic library!
I have a ton of questions, most of which likely have answers along the lines of "it depends" :) But, the top questions:
spec.update = 'qlearn'; // 'qlearn' or 'sarsa'
spec.gamma = 0.9; // discount factor, [0, 1)
spec.epsilon = 0.2; // initial epsilon for epsilon-greedy policy, [0, 1)
spec.lambda = 0.8; // eligibility trace decay, [0,1). 0 = no eligibility traces
spec.replacing_traces = false; // use replacing or accumulating traces
spec.planN = 50; // number of planning steps per iteration. 0 = no planning
spec.smooth_policy_update = true; // non-standard, updates policy smoothly to follow max_a Q
spec.beta = 0.1; // learning rate for smooth policy update
spec.alpha = 0.005; // value function learning rate
spec.experience_add_every = 5; // number of time steps before we add another experience to replay memory
spec.experience_size = 10000; // size of experience
spec.learning_steps_per_iteration = 5;
spec.tderror_clamp = 1.0; // for robustness
spec.num_hidden_units = 100; // number of neurons in hidden layer
env.getNumStates = function() {
return 9;
};
env.getMaxNumActions...
Any particular reason to have it be a function? (relates back to my #1 question)
For the waterworld problem, when I click the button 'Load a Pretrained Agent', it prompts:
Failed to load file:///C:/Users/xuxiyang/Desktop/reinforcejs-master/agentzoo/wateragent.json: Cross origin requests are only supported for protocol schemes: http, data, chrome, chrome-extension, https.
jquery-2.1.3.min.js:4
Does anyone know how to resolve this and load the saved data?
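The usual workaround is to serve the checkout over HTTP instead of opening index.html from file:// (any static server works; Python's built-in one is just one option):

```shell
# Browsers block cross-origin fetches from file:// pages, so serve the
# repo over HTTP and load the demo from localhost instead:
cd reinforcejs-master
python3 -m http.server 8000
# then open http://localhost:8000/ and "Load a Pretrained Agent"
# can fetch agentzoo/wateragent.json normally
```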
I am trying to understand the inputs for the example given here
http://cs.stanford.edu/people/karpathy/reinforcejs/index.html
env.getNumStates()
Is this the size of the vector that represents the variables of the current game configuration?
env.getMaxNumActions
For this one, is it the total number of configurations the game can have? Or is it the number of actions the player can currently take in the current game configuration? For example, in a grid maze the player has up to 4 directions to move, so it would be 4.
Inside the "setInterval" function, "s" is not defined. Is it the vector of variables of the current game configuration, which I have to build myself?
And is "reward" something I have to compute too, based on the current "s" vector?
Also, why are getNumStates and getMaxNumActions functions when they seem to return constant values? Are they supposed to support returning dynamic values? Can the vector size differ at any time? And is the max number of actions dynamic too?
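My understanding of the loop (not an authoritative answer): "s" is indeed a state vector you build yourself each tick, and "reward" is a scalar you compute from your own game logic. A runnable sketch with a fake one-dimensional game and a stub agent standing in for RL.DQNAgent; all helper names are hypothetical:

```javascript
// Stub agent standing in for RL.DQNAgent so the control flow runs end to end.
var agent = {
  act: function(s) { return s[0] < 0 ? 1 : 0; }, // pick an action index from the state
  learn: function(r) { this.lastReward = r; }    // consume the scalar reward
};

// Fake one-dimensional "game": move a point toward the origin.
var position = -1;
function getCurrentState() { return [position]; }           // you build `s` yourself
function applyAction(a) { position += (a === 1 ? 1 : -1); } // advance your game
function computeReward() { return position === 0 ? 1 : 0; } // reward is yours to define

// One tick of the setInterval loop from the docs:
var s = getCurrentState();     // the `s` the question asks about
var a = agent.act(s);
applyAction(a);
agent.learn(computeReward());  // reward computed from the new game state
```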