alexkalergis / robot-writing-via-reinforcement-learning

• Master Thesis • In this project we aim to create a virtual environment of a 2DOF robot learning handwriting via reinforcement learning.

Home Page: https://hdl.handle.net/10889/25769

License: GNU General Public License v3.0

Language: Python (100%)

Topics: artificial-intelligence, deep-q-learning, deep-q-network, machine-learning, qlearning-algorithm, reinforcement-learning, robotics, robot-writing

robot-writing-via-reinforcement-learning's Introduction


Project Overview

The aim of this project is to solve the problem of a robotic hand learning how to write alphabetical characters and other shapes. We created a simulation of a 2-DOF robotic system that represents a human arm operating only along the x-axis and y-axis. The methodology we used is an adaptation of reinforcement learning, specifically the Q-learning algorithm, chosen because it is easy to understand and apply and reaches the desired results with relatively low time and computational complexity. Robot writing is not only a technical exercise; it also has societal value, since it could assist people with disabilities and support early education. Further studies of reinforcement learning in robotic systems could bring even greater benefits.

Q-Learning

Q-learning is a model-free reinforcement learning algorithm; it can also be viewed as a method of asynchronous dynamic programming. It gives agents the capability of learning to act optimally in Markovian domains by experiencing the consequences of their actions, without requiring them to build maps of the domains.

Algorithm 1: Q-learning

Input: episodes, α, γ
Output: Q
Initialize: set Q(s, a) arbitrarily, for each s in S and a in A;
            set Q(terminal state, ·) = 0;
repeat for each episode
  Initialize S;
  repeat for each step of episode
    Choose A from S using the ε-greedy policy;
    Take action A, observe R, S′;
    Q(sₜ, aₜ) ← Q(sₜ, aₜ) + α · (rₜ₊₁ + γ · maxₐ′ Q(sₜ₊₁, a′) − Q(sₜ, aₜ));
    S ← S′;
  until S is a terminal state;
until all episodes are visited;
  • State (𝑆): The state of the agent in the environment.

  • Action (𝐴): A set of actions which the agent can perform.

  • Reward (𝑅): For each action selected by the agent, the environment provides a reward. Usually, a scalar value.

  • Discount factor (𝛾): This has the effect of valuing rewards received earlier higher than those received later (reflecting the value of a "good start"). 𝛾 may also be interpreted as the probability that the process continues at each step.

  • Episodes: A full sequence of interactions, after which the agent can take no new action. An episode ends when the agent has achieved the goal or has failed.

  • Quality Function: Q(sₜ, aₜ) ← Q(sₜ, aₜ) + α · (rₜ₊₁ + γ · maxₐ′ Q(sₜ₊₁, a′) − Q(sₜ, aₜ)) (a minimal code sketch of this update follows this list).

    • maxₐ′ Q(sₜ₊₁, a′): The estimated optimal Q-value obtainable from the next state, over all possible actions.
    • Q(sₜ, aₜ): The current estimate of the value of taking action aₜ in state sₜ.
  • Q-Table: A table maintained by the agent with one entry per state-action pair.

  • Temporal Difference (TD): The difference between the target rₜ₊₁ + γ · maxₐ′ Q(sₜ₊₁, a′) and the current estimate Q(sₜ, aₜ); each update moves Q(sₜ, aₜ) a fraction α toward this target.
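To make the update rule concrete, here is a minimal tabular Q-learning sketch in Python. It is illustrative only: the grid size, hyperparameters, and helper names (choose_action, q_update) are assumptions, not the code from this repository.

import numpy as np

# Minimal tabular Q-learning sketch. The 5x5 grid of states, 4 actions, and
# the hyperparameters below are illustrative assumptions.
n_states, n_actions = 25, 4
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))   # Q(s, a) initialized arbitrarily (here: zeros)

def choose_action(state):
    # epsilon-greedy policy: explore with probability epsilon, otherwise exploit.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state, terminal):
    # One Q-learning step: move Q(s, a) a fraction alpha toward the TD target.
    target = reward if terminal else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])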

Deep Q-Learning

Deep Q-learning (DQN) is an extension of the Q-learning algorithm that combines reinforcement learning with deep neural networks. It was introduced by DeepMind in 2013 and has been widely used for solving complex reinforcement-learning problems. The key idea behind DQN is to use a deep neural network to approximate the Q-values instead of a traditional Q-table. This allows DQN to handle high-dimensional and continuous state spaces, making it suitable for tasks such as playing games directly from raw images.

Algorithm 2: Deep Q-learning

Input: episodes, α, γ
Output: Q
Initialize Q-network, target network, replay memory D, and ε;
repeat for each episode
  Initialize S;
  repeat for each step of episode
    Choose A from S using the ε-greedy policy;
    Take action A, observe R, S′;
    Store the experience (S, A, R, S′) in memory D;
    Sample a random minibatch B of experiences from D;
    Train the Q-network: update its weights based on the experiences in B;
    Update the target network with knowledge from the Q-network;
    S ← S′;
  until S is a terminal state;
until all episodes are visited;
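Below is a compact sketch of the main components of Algorithm 2 (Q-network, replay memory D, minibatch training, target-network sync), written with PyTorch. The architecture, hyperparameters, and function names are assumptions for illustration, not the implementation used in this project.

import random
from collections import deque
import torch
import torch.nn as nn

# Illustrative DQN components; sizes and hyperparameters are assumptions.
class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )
    def forward(self, s):
        return self.net(s)

state_dim, n_actions, gamma = 2, 4, 0.95
q_net = QNetwork(state_dim, n_actions)
target_net = QNetwork(state_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())   # start with identical weights
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
memory = deque(maxlen=10_000)                     # replay memory D of (s, a, r, s', done) tuples

def train_step(batch_size=32):
    # Sample a random minibatch from D and take one gradient step on the Q-network.
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    s, s2 = s.float(), s2.float()
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # targets come from the frozen target network
        target = r.float() + gamma * target_net(s2).max(1).values * (1 - done.float())
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    # Periodically copy the Q-network weights into the target network.
    target_net.load_state_dict(q_net.state_dict())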

Methodology

In our case we want to simulate the physical part of a 2-DOF robotic system. It approximates a human arm, but without motion along the z-axis. The architecture of this system is shown in the image below.

Figure: Robot architecture
  • Action Space: The trickiest and most challenging part of working with value-based RL algorithms was the discretization of the action space. A larger and finer discretization makes the learning process harder and increases time and computational cost. To address this, we built the action space to suit the shape in each case, reduced its limits, and then tuned the discretization step by trial and error.

  • State Space: The same discretization trade-off applies to the state space: a larger and finer grid of states makes learning harder and increases time and computational cost. We likewise restricted the state limits to the region around each target shape and tuned the discretization step by trial and error.

  • Reward Function: The learning objective is to get closer to the desired shape, so we created a reward function that computes the Euclidean distance between the pen position and the desired target point. Because Q-learning proceeds in steps (the inner loop of the algorithm), we broke each shape down into a number of points, thereby discretizing the shape. At every step the reward function computes the distance between the target point and the pen position and returns 1 / (1 + distance) as feedback; if distance > 0.5 the reward is reduced by 10, and if distance < 0.1 the reward is increased by 10 (see the sketch after this list). Other reward functions could instead measure the similarity of the shapes at every step using more advanced algorithms.
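To make the action-space discretization and the reward described above concrete, here is a small Python sketch. The discretization step, joint limits, and function names are illustrative assumptions; only the 1 / (1 + distance) formula and the ±10 adjustments at the 0.5 and 0.1 thresholds come from the description above.

import numpy as np

# Sketch of a discretized action space and the distance-based reward described above.
step = 0.05                                      # discretization step (assumed; tuned by trial and error)
joint_increments = np.array([-step, 0.0, step])  # per-joint increments
actions = [(d1, d2) for d1 in joint_increments for d2 in joint_increments]

def reward(pen_xy, target_xy):
    # Reward = 1 / (1 + distance), with a bonus/penalty at the stated thresholds.
    distance = np.linalg.norm(np.asarray(pen_xy) - np.asarray(target_xy))
    r = 1.0 / (1.0 + distance)
    if distance > 0.5:
        r -= 10.0      # far from the current target point
    elif distance < 0.1:
        r += 10.0      # close enough to the current target point
    return r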

Results

Results.mp4

