1. cs18s038_PA1 Course Project
- Implementation of the epsilon-greedy algorithm and related plots.
- Implementation of the softmax algorithm and related plots.
- Implementation of the UCB1 algorithm, with plots comparing it against epsilon-greedy and softmax.
- Implementation of the Median algorithm, with plots comparing it against epsilon-greedy, softmax, and UCB1.
- Comparison of the above four algorithms as the number of arms grows.
Note: For a more detailed analysis of each observation and the inferences drawn from it, please see the full report.
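For readers new to these methods, the four bandit algorithms can be sketched as follows. This is a minimal sketch, not the project's code. The assumptions are unit-variance Gaussian arms for the first three rules, Bernoulli arms for the elimination routine, and that the "Median algorithm" refers to the PAC Median Elimination algorithm. All function names and default parameters (`eps`, `temp`, `c`, `delta`) are hypothetical.

```python
import math
import random

# Hypothetical sketch. q = current value estimates, counts = pulls per arm,
# t = current (1-indexed) time step.

def epsilon_greedy(q, counts, t, eps=0.1):
    """Explore uniformly with probability eps, otherwise exploit argmax Q."""
    if random.random() < eps:
        return random.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

def softmax(q, counts, t, temp=0.5):
    """Sample arm a with probability proportional to exp(Q(a) / temp)."""
    m = max(q)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp((v - m) / temp) for v in q]
    return random.choices(range(len(q)), weights=weights)[0]

def ucb1(q, counts, t, c=2.0):
    """Pull every arm once, then maximise Q(a) + sqrt(c * ln t / N(a))."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(range(len(q)),
               key=lambda a: q[a] + math.sqrt(c * math.log(t) / counts[a]))

def run(policy, means, steps=1000, seed=0):
    """Play one run on Gaussian arms (assumed); return the fraction of best-arm pulls."""
    random.seed(seed)
    k = len(means)
    q, counts = [0.0] * k, [0] * k
    best, hits = means.index(max(means)), 0
    for t in range(1, steps + 1):
        a = policy(q, counts, t)
        r = random.gauss(means[a], 1.0)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # incremental sample-average update
        hits += a == best
    return hits / steps

def median_elimination(means, eps=0.2, delta=0.1, seed=0):
    """Median Elimination on Bernoulli arms: with probability at least
    1 - delta, return an arm whose mean is within eps of the best arm's."""
    rng = random.Random(seed)
    arms = list(range(len(means)))
    e, d = eps / 4, delta / 2
    while len(arms) > 1:
        n = math.ceil(4 / e ** 2 * math.log(3 / d))  # (1 / (e/2)^2) * ln(3/d) pulls per arm
        est = {a: sum(rng.random() < means[a] for _ in range(n)) / n for a in arms}
        arms.sort(key=lambda a: est[a], reverse=True)
        arms = arms[: (len(arms) + 1) // 2]  # keep the top half, discarding arms below the median
        e, d = 3 * e / 4, d / 2
    return arms[0]
```

You can compare the rules by calling, for example, `run(ucb1, [0.1, 0.5, 0.9])` against `run(epsilon_greedy, [0.1, 0.5, 0.9])`. You can study growth in the number of arms by passing longer `means` lists.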
2. cs18s038_PA2 Course Project
- Implementation of Sarsa for different goals in the puddle world, with related plots.
- Implementation of Q-Learning for different goals in the puddle world, with related plots.
- Implementation of Sarsa(λ) for different goals in the puddle world, with related plots.
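The three tabular methods differ only in how they form the TD target. The sketch below shows this on a hypothetical 1-D corridor standing in for the puddle world, which is not reproduced here. Sarsa bootstraps from the next action actually taken. Q-Learning bootstraps from the greedy value. Sarsa(λ) spreads each TD error over recently visited pairs through accumulating eligibility traces. All names and hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical stand-in for puddle world: 1-D corridor, start at state 0,
# reward +1 on reaching GOAL (episode ends), 0 otherwise.
N_STATES, GOAL, ACTIONS = 6, 5, (-1, +1)

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def greedy(Q, s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def choose(Q, s, eps, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])

def train(method, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, lam=0.8, seed=0):
    rng, Q = random.Random(seed), defaultdict(float)
    for _ in range(episodes):
        s, done = 0, False
        a = choose(Q, s, eps, rng)
        E = defaultdict(float)  # eligibility traces, used only by Sarsa(lambda)
        while not done:
            s2, r, done = step(s, a)
            if method == "q_learning":
                # Off-policy: bootstrap from the greedy value of s2.
                target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                a2 = choose(Q, s2, eps, rng)
            else:
                # On-policy: bootstrap from the action a2 that will actually be taken.
                a2 = choose(Q, s2, eps, rng)
                delta = r + (0.0 if done else gamma * Q[(s2, a2)]) - Q[(s, a)]
                if method == "sarsa_lambda":
                    E[(s, a)] += 1.0  # accumulating trace
                    for key in list(E):
                        Q[key] += alpha * delta * E[key]
                        E[key] *= gamma * lam
                else:  # plain one-step Sarsa
                    Q[(s, a)] += alpha * delta
            s, a = s2, a2
    return Q
```

Changing the goal, as the experiments above do, corresponds here to changing `GOAL`.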
Implementation of policy gradient on the above environments. The experiments involve:
- Hyperparameter tuning.
- Value function visualisation.
- Trajectories and policies.
- Inferences drawn from the observations.
Note: For a more detailed analysis of each observation and the inferences drawn from it, please see the full report.
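The simplest policy-gradient method is episodic REINFORCE, sketched below. It makes the update θ ← θ + α γ^t G_t ∇ log π(a_t | s_t) concrete. This is a hypothetical sketch, not the project's implementation. It assumes a tabular softmax policy on a small 1-D corridor rather than the course environments, and the names `policy` and `reinforce` are illustrative.

```python
import math
import random

# Hypothetical 1-D corridor (not one of the course environments): start at
# state 0, reward +1 on reaching GOAL, which ends the episode.
N_STATES, GOAL, ACTIONS = 5, 4, (-1, +1)

def policy(theta, s):
    """Tabular softmax policy: pi(a | s) proportional to exp(theta[s][a])."""
    m = max(theta[s])
    w = [math.exp(x - m) for x in theta[s]]
    z = sum(w)
    return [x / z for x in w]

def reinforce(episodes=1000, alpha=0.1, gamma=0.9, max_steps=50, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        # 1. Roll out one episode with the current policy.
        s, traj = 0, []
        for _ in range(max_steps):
            i = rng.choices(range(len(ACTIONS)), weights=policy(theta, s))[0]
            s2 = min(max(s + ACTIONS[i], 0), N_STATES - 1)
            r = 1.0 if s2 == GOAL else 0.0
            traj.append((s, i, r))
            s = s2
            if s == GOAL:
                break
        # 2. Monte Carlo returns G_t, accumulated backwards.
        returns, G = [], 0.0
        for _, _, r in reversed(traj):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        # 3. theta <- theta + alpha * gamma^t * G_t * grad log pi(a_t | s_t).
        for t, ((s, i, _), G) in enumerate(zip(traj, returns)):
            probs = policy(theta, s)
            for j in range(len(ACTIONS)):
                grad = (1.0 if j == i else 0.0) - probs[j]  # d log softmax / d theta[s][j]
                theta[s][j] += alpha * gamma ** t * G * grad
    return theta
```

Here `alpha`, `gamma`, and `episodes` are the kind of hyperparameters the tuning experiments vary. `policy(theta, s)` gives the learned action probabilities in each state.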
3. cs18s038_PA3 Course Project
- Implementation of SMDP Q-Learning.
- Implementation of intra-option Q-Learning.
- Visualisation of Q-values.
- Visualisation of V-values.
- Analysis of observations and inferences.
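The two option-learning rules differ in when they update. SMDP Q-Learning makes one update per completed option, discounting by γ^k for an option that ran k steps. Intra-option Q-Learning updates after every primitive step, for every option whose policy would have chosen the same action. The sketch below shows both rules. It assumes deterministic Markov options represented as `name -> (policy(s), beta(s))`, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed representation: options maps name -> (policy(s) -> action,
# beta(s) -> termination probability); Q maps (state, option) -> value.

def smdp_q_update(Q, s, o, rewards, s_next, options, alpha=0.1, gamma=0.9, done=False):
    """SMDP Q-learning: one update after option o ran for k = len(rewards) steps."""
    k = len(rewards)
    R = sum(gamma ** i * r for i, r in enumerate(rewards))  # discounted reward earned during o
    boot = 0.0 if done else gamma ** k * max(Q[(s_next, o2)] for o2 in options)
    Q[(s, o)] += alpha * (R + boot - Q[(s, o)])

def intra_option_update(Q, s, a, r, s_next, options, alpha=0.1, gamma=0.9, done=False):
    """Intra-option Q-learning: after one primitive step (s, a, r, s_next),
    update every option whose policy would also have chosen a in s."""
    best_next = max(Q[(s_next, o2)] for o2 in options)
    for o, (pi, beta) in options.items():
        if pi(s) != a:
            continue  # option inconsistent with the action taken: no update
        if done:
            U = 0.0
        else:
            b = beta(s_next)  # U(s', o): continue o, or terminate and act greedily
            U = (1 - b) * Q[(s_next, o)] + b * best_next
        Q[(s, o)] += alpha * (r + gamma * U - Q[(s, o)])
```

For example, if an option ran for three steps with rewards `[0, 0, 1]` and then the episode ended, `smdp_q_update` moves Q toward 0.9² × 1 = 0.81.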
The problem is solved using a DQN model. The experiments involve:
- Implementation of DQN and related plots.
- Best hyperparameters.
- Observations and inference.
- Experiments with the replay memory and the target network.
Note: For a more detailed analysis of each observation and the inferences drawn from it, please see the full report.
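The two DQN components varied in the experiments are the replay memory and the target network. The sketch below covers both, along with the TD targets they feed. It is framework-agnostic: the Q-network is abstracted as a callable `target_q(state) -> list of action values`, and the parameters as any deep-copyable object. The class and function names are hypothetical, not the project's code.

```python
import copy
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")

class ReplayMemory:
    """Fixed-capacity FIFO buffer; uniform random minibatches break the
    correlation between consecutive transitions."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # the oldest transition is evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

class TargetParams:
    """Frozen copy of the online network's parameters, refreshed every `period` steps."""

    def __init__(self, online_params, period):
        self.period = period
        self.params = copy.deepcopy(online_params)

    def maybe_sync(self, step, online_params):
        if step % self.period == 0:
            self.params = copy.deepcopy(online_params)

def td_targets(batch, target_q, gamma=0.99):
    """y = r for terminal transitions, else r + gamma * max_a' Q_target(s', a')."""
    return [t.reward if t.done else t.reward + gamma * max(target_q(t.next_state))
            for t in batch]
```

Shrinking `capacity` toward the batch size approximates training without replay. Setting `period=1` approximates training without a separate target network. These are the two ablations the experiments above refer to.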