The rl-project from harsh306

The rl-project Video Final Presentation

RL project at WPI for ML (CS 539) class by Prof. Kyumin Lee

Project Idea/Proposal Link

Literature Survey

Our research is inspired by these two papers.

Progressive GANs Link

Author's Abstract: We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024^2. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10.

Teacher Student Curriculum Learning Link

Author's Abstract: We propose Teacher-Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more those tasks on which it makes the fastest progress, i.e. where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks where the Student's performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully hand-crafted curricula in two tasks: addition of decimal numbers with LSTM and navigation in Minecraft.

Curriculum startegies

We leveraged the idea from TSCL method. We list some limitations we noticed in their work. First, they assume all the tasks are in the same domain(game), which ignores the opportunity to learn from any other domains (games). Second, the authors do not consider the model capacity w.r.t to the difficulty of the tasks which means the model will adjust the structure or the number of parameters of the model based on the tasks’ difficulty. Third, all the experiments shown have a limited number of subtasks, we would like to see if the original hypothesis changes if there are thousands of subtasks. Fourth, they have a hyperparameter for the number of steps a student should train on each sub-task, without any empirical or analytical evidence. Finally, these subtasks are pre-defined by experts, however these subtasks can also be generated from a generative model i.e it can automatically generate the optimal environment for the student to learn, this will also enable the student to learn more complex tasks indefinitely. Our contribution is to come-up with environment ideas that can be easily scaled and programtically develop more complex tasks indefinitely. In particular we show how simple Pacman and coinrun game can be used to that very effect.

Pacman

Coinrun Blog

Coinrun have multiple levels of games, and here also teacher can select what level should be provided to the student agent to learn based on TSCL stratergies.

Experiments

In our experiments, Teacher is a sampling method that selects the difficulty level of the given game. And Student is our DQN agent (NN model).

Pacman

Standalone agent learning from 1, 2, and 3 enemy in 5x5 grid in the plots below respectively.

In y-axis there is the mean-avg reward [100-window]. x-axis is the number of episodes.

Validation results

Coinrun

The mean episode reward for TSCL:

The probability of each action:

Action 1
Action 2
Action 3
Action 4
Action 5
Action 6

About the team

Harsh Nilesh Pathak [Bio]

Currently, I am Data Scientist at Expedia and also a Ph.D. student at WPI. I am doing my research on continuation methods for Deep Learning Optimization with Prof. Randy Paffenroth. I have worked on a diverse set of applications of computer vision for example, GANs, Image classification, compression and object detection. Also, in NLP I have done industry projects of one plus year length. These include Text classification, Named Entity Recognition, Text similarity and Learning to rank frameworks.

Yichuan Li

Yichuan Li is a Ph.D. student at Worcester Polytechnic Institue, now he is affiliated with Infolabs under the supervision of Prof. Kyumn Lee. Research interests: Data Mining, Machine Learning, Social Computing.

Thejus Jose

I am a Masters student in the Robotics Engineering program at Worcester Polytechnic Institute. My current research is focused on robot learning through human demonstration under the guidance of Prof. Jane Li. In the past, I have worked on system identification, motion planning and human robot interaction

Paurvi Dixit

Master's Student at Worcester Polytechnic Institue.

Qihuan Aixinjueluo

Student at Worcester Polytechnic Institue.

Extras

How to use Github and usual flow, please read this Link
Github Setup Link
Prefer using Pycharm Community for Free Link

harsh306 / rl-project Goto Github PK

rl-project's Introduction

The rl-project Video Final Presentation

Project Idea/Proposal Link

Literature Survey

Curriculum startegies

Pacman

Coinrun Blog

Experiments

Pacman

Coinrun

About the team

Harsh Nilesh Pathak [Bio]

Yichuan Li

Thejus Jose

Paurvi Dixit

Qihuan Aixinjueluo

Extras

rl-project's People

Contributors

Stargazers

Watchers

Forkers

Recommend Projects

Recommend Topics

Recommend Org