REALCompetitionStartingKit

This repository provides the "Starting Kit" to participate in the NeurIPS 2019 Robot open-Ended Autonomous Learning (REAL) competition.

Installation
Basic usage
Sandbox
Competition Task

Install

The installation requires Python 3.5+. The Starting Kit was tested on Ubuntu (16.04 and later), but it can also be run on other operating systems.

Linux

To install the REAL Competition Starting Kit on Linux:

install the gym, pybullet and pyopengl packages:
```
pip install gym pybullet pyopengl
```

download the REALCompetitionStartingKit repo:

git clone https://github.com/GOAL-Robots/REALCompetitionStartingKit.git

install the REALCompetitionStartingKit package:

cd REALCompetitionStartingKit
pip install -e .

Windows - anaconda

To install the REAL Competition Starting Kit on Windows in the Anaconda environment:

install Microsoft Visual Studio C++ 14 (Community edition) from https://visualstudio.microsoft.com/visual-cpp-build-tools/
install Anaconda for Windows from https://www.anaconda.com/distribution/#windows
create a Python virtual environment:
```
conda create -n pyenv numpy pip
```
activate the virtual environment:
```
conda activate pyenv
```
install the gym, pybullet and pyopengl packages:
```
pip install gym pybullet pyopengl
```

download the REALCompetitionStartingKit repo:

git clone https://github.com/GOAL-Robots/REALCompetitionStartingKit.git

install the REALCompetitionStartingKit package:

cd REALCompetitionStartingKit
pip install -e .
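
After either installation route, a quick way to confirm that the packages are importable is a small check like the one below. This is only a sketch: the module names `OpenGL` (the import name of pyopengl) and `realcomp` (inferred from the `realcomp/` directory of this repository) are assumptions, and `check_modules` is a hypothetical helper, not part of the Starting Kit.

```python
import importlib.util

# Module names to look for. "realcomp" is assumed from the repository layout.
REQUIRED_MODULES = ["gym", "pybullet", "OpenGL", "realcomp"]


def check_modules(names=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print("{}: {}".format(name, "ok" if found else "MISSING"))
```

Any module reported as MISSING points to the installation step that needs to be repeated.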

Basic usage

The environment is a standard gym environment and can be called alone as shown here:

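import gym
import realcomp  # assumption: importing the package registers 'REALComp-v0'

env = gym.make('REALComp-v0')

# Any object with a step() method works, e.g. the FakePolicy class below
controller = FakePolicy(env.action_space)

observation = env.reset()
reward, done = 0, False  # values passed to the first controller call
for t in range(10):

    # Call your controller to choose an action
    action = controller.step(observation, reward, done)

    # Perform the action
    observation, reward, done, _ = env.step(action)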

where the controller is any object with a step() method that returns an action vector. An example controller is given by this simple class:

import numpy as np


class FakePolicy:
    """
    A fake controller choosing random actions
    """
    def __init__(self, action_space):
        self.action_space = action_space
        self.action = np.zeros(action_space.shape[0])

    def step(self, observation, reward, done):
        """
        Returns the previous action perturbed by Gaussian noise (a random walk)
        """
        self.action += 0.1*np.pi*np.random.randn(self.action_space.shape[0])
        return self.action

The environment includes a 7-DoF Kuka arm with a 2-DoF gripper, a table with 3 objects on it, and a camera looking at the table from above. The gripper has four touch sensors on the inner side of its links.

Action

The action argument of env.step must be a vector of 9 joint positions in radians. The first 7 joints range between -Pi/2 and +Pi/2. The two gripper joints range between 0 and +Pi/2. They are also coupled so that the second joint will be at most twice the angle of the first one.

| index | joint name |
|-------|------------|
| 0 | lbr_iiwa_joint_1 |
| 1 | lbr_iiwa_joint_2 |
| 2 | lbr_iiwa_joint_3 |
| 3 | lbr_iiwa_joint_4 |
| 4 | lbr_iiwa_joint_5 |
| 5 | lbr_iiwa_joint_6 |
| 6 | lbr_iiwa_joint_7 |
| 7 | base_to_finger0_joint |
| 8 | finger0_to_finger1_joint |
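
The joint ranges and the gripper coupling described above can be sketched as a small helper that forces an arbitrary 9-element vector into the documented limits. `clip_action` is a hypothetical function written for illustration; whether the environment applies the same limits internally is not stated here, so treat this as one reading of the description rather than the environment's own logic.

```python
import numpy as np

ARM_LIMIT = np.pi / 2      # arm joints 0..6 lie in [-Pi/2, +Pi/2]
GRIPPER_LIMIT = np.pi / 2  # gripper joints 7..8 lie in [0, +Pi/2]


def clip_action(action):
    """Return a copy of `action` forced into the documented joint ranges."""
    action = np.asarray(action, dtype=float).copy()
    if action.shape != (9,):
        raise ValueError("expected a vector of 9 joint positions")
    action[:7] = np.clip(action[:7], -ARM_LIMIT, ARM_LIMIT)
    action[7:] = np.clip(action[7:], 0.0, GRIPPER_LIMIT)
    # Coupling: the second gripper joint is at most twice the first one.
    action[8] = min(action[8], 2.0 * action[7])
    return action
```

A controller such as a random walk, whose output can drift outside the valid ranges, could pass its result through a helper like this before calling env.step.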

Observation

The observation object returned by env.step is a dictionary:

observation["joint_positions"] is a vector containing the current angles of the 9 joints
observation["touch_sensors"] is a vector containing the current touch intensity at the four touch sensors (see figure below)
observation["retina"] is a 240x320x3 array with the current top camera image
observation["goal"] is a 240x320x3 array with the target top camera image (all zeros except during the extrinsic phase; see the task description below)

For each sensor, intensity is defined as the maximum force that was exerted on it at the current timestep.
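
To make the layout of the observation dictionary concrete, the sketch below builds a dummy observation with the documented keys and shapes and shows one way to test whether a goal image is present. Both `make_dummy_observation` and `has_goal` are hypothetical helpers, and the array dtypes used by the real environment are not specified here, so the default float arrays are an assumption.

```python
import numpy as np


def make_dummy_observation():
    """Build a placeholder observation with the documented keys and shapes."""
    return {
        "joint_positions": np.zeros(9),     # angles of the 9 joints
        "touch_sensors": np.zeros(4),       # one intensity per touch sensor
        "retina": np.zeros((240, 320, 3)),  # current top camera image
        "goal": np.zeros((240, 320, 3)),    # target image, all zeros if unset
    }


def has_goal(observation):
    """Return True if the goal image contains any non-zero value."""
    return bool(np.any(observation["goal"]))
```

Since the goal is all zeros outside the extrinsic phase, a check like `has_goal` is one way for a controller to tell which phase it is in.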

Reward

The reward value returned by env.step is always set to 0.

Done

The done value returned by env.step is set to True only when a phase is concluded (see the intrinsic and extrinsic phases below).

Sandbox

The environment can also be used as a sandbox. The realcomp_env specs explain the methods for reading the objects, links, contacts, and other simulation details. Using those methods in the final version of your controller is not permitted, but they may be useful while testing.

Competition Task

A complete simulation for the REAL Competition is made of two phases:

Intrinsic phase: No goal is given and the controller can do whatever it needs to explore and learn something from the environment. This phase will last 10 million timesteps.
Extrinsic phase: divided into trials. In each trial a goal is given and the controller must choose actions that modify the environment so that the state corresponding to the goal is reached within 1000 timesteps.
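
The two-phase structure can be sketched as a loop over timesteps. This is not the code in demo.py: `StubEnv`, `IdleController`, `run_steps` and `run_simulation` are hypothetical names, the stub stands in for the real environment so the sketch runs on its own, and resetting the environment at the start of each phase and trial is an assumption. How goals are assigned in each trial is left out, and the step counts are parameters so the loop can be tried with small values.

```python
import numpy as np


class StubEnv:
    """Hypothetical stand-in for the real environment (illustration only)."""

    def reset(self):
        return {"goal": np.zeros((240, 320, 3))}

    def step(self, action):
        # Reward is always 0; this stub never signals the end of a phase.
        return self.reset(), 0, False, {}


class IdleController:
    """Hypothetical controller that always returns a zero action."""

    def step(self, observation, reward, done):
        return np.zeros(9)


def run_steps(env, controller, n_steps):
    """Run the controller for n_steps timesteps and return the step count."""
    observation = env.reset()  # assumption: reset before each phase or trial
    reward, done = 0, False
    for _ in range(n_steps):
        action = controller.step(observation, reward, done)
        observation, reward, done, _ = env.step(action)
    return n_steps


def run_simulation(env, controller, intrinsic_steps, n_trials, trial_steps):
    """Intrinsic phase first, then n_trials extrinsic trials."""
    total = run_steps(env, controller, intrinsic_steps)
    for _ in range(n_trials):
        total += run_steps(env, controller, trial_steps)
    return total
```

With the values given above, `intrinsic_steps` would be 10 million and `trial_steps` would be 1000; the number of trials is not stated here, so `n_trials` has to be chosen by the caller.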

realcomp/task/demo.py runs the entire simulation. Participants should replace the MyController object in realcomp/task/my_controller.py with their own controller object.

Running demo.py also returns an extrinsic score for local evaluation.
