deepracer-simapp's People

Contributors

anjrew, dependabot[bot], gitobic, jamesjennens, larsll, mattcamp, richardfan1126, ronzohan

deepracer-simapp's Issues

System fail transition between Evaluation and Training phases

During training, in the phase where SageMaker does policy training and RoboMaker does evaluation runs, the system occasionally fails to transition back to training a new iteration once policy training is done and the final evaluation run has completed.

Last lines of the SageMaker logs:

Policy training> Surrogate loss=-0.018119553104043007, KL divergence=0.0014306496595963836, Entropy=0.3204973340034485, training epoch=8, learning_rate=1e-06
Policy training> Surrogate loss=-0.015547456219792366, KL divergence=0.0013924918603152037, Entropy=0.3194952607154846, training epoch=9, learning_rate=1e-06
Checkpoint> Saving in path=['./checkpoint/411_Step-87087.ckpt']
Uploaded 3 files for checkpoint 411 in 0.95 seconds
saved intermediate frozen graph: Champs-May-12/model/model_411.pb
Best checkpoint number: 398, Last checkpoint number: 409
Copying the frozen checkpoint from ./frozen_models/agent/model_398.pb to /opt/ml/model/agent/model.pb.
Deleting the frozen models in s3 for the iterations: {'408'}

Last lines of the RoboMaker logs:

DEBUG: s: 167.0, wp_p: 144, wp_n: 145, wp_f: 5, rew: 0.0, prog: 0.94, saf: 172.0, eff: 1.66, d1: 23.04, d2: 69.02
DEBUG: s: 168.0, wp_p: 145, wp_n: 146, wp_f: 6, rew: 0.0, prog: 0.94, saf: 173.0, eff: 1.63, d1: 30.23, d2: 72.07
Testing> Name=main_level/agent, Worker=0, Episode=760, Total reward=338.25, Steps=88095, Training iteration=0
## agent: Finished evaluation phase. Success rate = 0.0, Avg Total Reward = 338.25

The reward function in this case outputs one line per step, so the lack of further output makes it clear that no evaluation is ongoing. It seems as if the two systems are waiting on each other to progress.

In the video stream, the car either stands in a corner (against a wall) or drives in circles, depending on the last action that was sent. The environment was not reset.

The issue is seen in roughly 1% of transitions between evaluation and training.
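To put the rough 1% figure in perspective, here is a minimal sketch of the chance of hitting at least one hang over a long run. It assumes that transitions fail independently and that the checkpoint number in the log above (411) approximates the number of transitions so far. Both are assumptions, not facts from the report, and the function name is hypothetical.

```python
def prob_at_least_one_hang(per_transition_rate: float, transitions: int) -> float:
    """Probability of one or more hangs, ASSUMING independent transitions."""
    return 1.0 - (1.0 - per_transition_rate) ** transitions


# 0.01 is the reporter's rough estimate ("maybe 1%").
# 411 is the checkpoint number from the log above, used here as an
# ASSUMED stand-in for the number of transitions in the run.
p = prob_at_least_one_hang(0.01, 411)
print(f"Chance of at least one hang in 411 transitions: {p:.1%}")
```

Under these assumptions, a rare per-transition failure becomes very likely over a run of this length, which is why an unattended training job would need some form of stall detection.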

AWS_Track hangs at startup

I have tried running most of the worlds (particularly Las Vegas, Spain, Canada), but AWS Track does not seem to load correctly. It hangs on the find file function and then terminates.

Rogue circuit track

When will the Rogue Circuit track be uploaded? It is a track for AWS students.

KVS starts with errors even if disabled

[ERROR] [1625995045.682140900, 12.518000000]: [KinesisVideoStreamSetup] Skipping stream id 0 due to failure initializing stream. Error code: 4100 Failed to setup the kinesis video streamer
[ERROR] [1625995045.682206300, 12.518000000]: [InitializeStreamSubscriptions] KinesisVideoStreamerSetup failed with error code : 4096. Exiting

Cloning model takes best, not last checkpoint.

When cloning a model, the "best_checkpoint" is used rather than the "last_checkpoint". This can cause a significant backwards jump in training, especially as the best checkpoint only uses completion as its metric.

See: 48c2065/bundle/sagemaker_rl_agent/lib/python3.5/site-packages/markov/training_worker.py#L319
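As a rough illustration of the size of that backwards jump, the sketch below parses the checkpoint summary line from the SageMaker log quoted in the first issue and reports how many iterations a clone would discard if it starts from the best checkpoint rather than the last one. The function name `parse_checkpoints` is hypothetical and is not part of the simapp code.

```python
import re
from typing import Tuple

# Summary line copied from the SageMaker log in the first issue.
LOG_LINE = "Best checkpoint number: 398, Last checkpoint number: 409"


def parse_checkpoints(line: str) -> Tuple[int, int]:
    """Return (best, last) checkpoint numbers from a summary log line."""
    match = re.search(
        r"Best checkpoint number: (\d+), Last checkpoint number: (\d+)", line
    )
    if match is None:
        raise ValueError("line does not contain a checkpoint summary")
    return int(match.group(1)), int(match.group(2))


best, last = parse_checkpoints(LOG_LINE)
# Iterations of training that are dropped when cloning from "best".
lost = last - best
print(f"Cloning from best ({best}) instead of last ({last}) discards {lost} iterations")
```

For this log line the clone would restart 11 iterations behind the most recent checkpoint.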
