
Robot_Semantics

This is the official implementation of attn-seq2seq-cat described in our paper:

"Understanding Contexts Inside Joint Robot and Human Manipulation Tasks through Vision-Language Model with Ontology Constraints in a Video Streamline"

Update (2021-1-15): Created a wiki page to keep track of updated model scores after fixes. Please refer to the scores there when comparing against the models in our paper.

Update (2021-1-6): A proper pre-trained model has been uploaded.

Update (2021-1-3): Major codebase updates. This repo should work smoothly now.

Update (2020-12-19): We have uploaded and updated annotations for a complete release of our RS-RGBD dataset! Visit the wiki page for more details. Updated evaluation scores and pre-trained models will be released in the future.

Requirements

  • PyTorch (tested on 1.4)
  • TorchVision with PIL
  • numpy
  • OpenCV (tested with 4.1.0)
  • Jupyter Notebook
  • coco-caption (a modified version is used to support Python 3)
  • Owlready2
  • Graphviz
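
For a quick sanity check that the core dependencies are importable (and to confirm the tested versions above), a minimal snippet such as the following can be used; it is only a convenience check and not part of the repository:

```python
# Quick environment check; package names follow the requirements list above.
import torch
import torchvision
import cv2
import numpy as np
import owlready2  # import check only

print("PyTorch:", torch.__version__)   # tested on 1.4
print("OpenCV:", cv2.__version__)      # tested with 4.1.0
```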

Experiments

To repeat the experiments on our Robot Semantics Dataset:

  1. Clone the repository.

  2. Download the Robot Semantics Dataset; check our wiki page for more details. Please extract the dataset and set up the directory path as:

├── root_dir
|   ├── data
|   |   ├── RS-RGBD
|   |   |   ├── human_grasp_pour
|   |   |   ├── human_point_and_intend
|   |   |   ├── wam_grasp_pour
|   |   |   ├── wam_point_and_intend
|   |   |   ├── eval_human_grasp_pour
|   |   |   ├── eval_wam_grasp_pour
|   |   |   ├── eval_wam_grasp_pour_complex
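
To verify that the dataset was extracted into the expected layout, a minimal check along the lines of the following can help (root_dir is your repository root; this snippet is illustrative and not part of the repository):

```python
# Illustrative check that the RS-RGBD folders listed above exist under data/.
import os

root_dir = "."  # path to root_dir
rs_rgbd = os.path.join(root_dir, "data", "RS-RGBD")
expected = [
    "human_grasp_pour", "human_point_and_intend",
    "wam_grasp_pour", "wam_point_and_intend",
    "eval_human_grasp_pour", "eval_wam_grasp_pour",
    "eval_wam_grasp_pour_complex",
]
missing = [d for d in expected if not os.path.isdir(os.path.join(rs_rgbd, d))]
print("All RS-RGBD folders found." if not missing else "Missing: %s" % missing)
```
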
  3. Select a branch to repeat the experiment (please check our paper for detailed experiment settings). Under the folder experiment_RS-RGBD/offline_feat, run generate_clips.py to sample offline dataset videos into clips for training and evaluation.

  4. To extract features from pre-trained CNNs, run extract_features.py under the folder experiment_RS-RGBD/offline_feat (an illustrative sketch of this step appears after this list).

  5. To begin training, run train.py. Modify rs/config.py accordingly to adjust the hyperparameters.

  6. For evaluation, first run evaluate.py to generate predictions from all saved checkpoints, then run cocoeval.py to calculate scores for the predictions. The best-scoring model will be moved to root_dir/results_RS-RGBD/.
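
For reference, the offline feature extraction in step 4 amounts to running sampled clip frames through a pre-trained CNN and saving the pooled features. The sketch below only illustrates that idea; the backbone choice, file layout, and function names are assumptions and do not reproduce the actual extract_features.py:

```python
# Illustrative sketch of offline CNN feature extraction (ResNet-50 backbone and
# per-clip frame paths are assumptions; this is not the repository's script).
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
backbone = models.resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()   # keep the 2048-d pooled features
backbone.eval().to(device)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_clip_features(frame_paths):
    """Return a (num_frames, 2048) feature array for one sampled clip."""
    frames = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in frame_paths])
    with torch.no_grad():
        feats = backbone(frames.to(device))
    return feats.cpu().numpy()

# Example usage with hypothetical frame files of one clip:
# np.save("clip_0001.npy", extract_clip_features(["frame_000.png", "frame_001.png"]))
```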

To repeat the experiments on the IIT-V2C Dataset, follow the instructions in my other repository.

Demo

We offer pre-trained models of our attention vision-language model; refer to the benchmark page and download the one you want. Put the downloaded model inside the path robot_semantics/checkpoints/:

├── root_dir
|   ├── checkpoints
|   |   ├── vocab.pkl
|   |   ├── saved
|   |   |   ├── v2l_trained.pth

A Jupyter notebook is provided to visualize attentions and the knowledge graph given outputs from the vision-language model. The file is under robot_semantics/experiments/demo.
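
As a rough illustration of how the demo consumes these files, the snippet below loads the vocabulary and checkpoint; the model class name is a placeholder, not the repository's actual API:

```python
# Rough sketch of loading the demo artifacts; the model class is a placeholder.
import pickle
import torch

with open("checkpoints/vocab.pkl", "rb") as f:
    vocab = pickle.load(f)  # vocabulary saved at training time

checkpoint = torch.load("checkpoints/saved/v2l_trained.pth", map_location="cpu")
# model = VideoLanguageModel(vocab_size=len(vocab))  # placeholder class name
# model.load_state_dict(checkpoint)
# model.eval()
```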

Some demos of visual attention from our vision-language model:

Additional Note

Please open an issue if you find any potential bugs in the code.

If you find this repository useful, please give me a star and consider citing:

@article{jiang2020understanding,
  title={Understanding Contexts Inside Robot and Human Manipulation Tasks through a Vision-Language Model and Ontology System in a Video Stream},
  author={Jiang, Chen and Dehghan, Masood and Jagersand, Martin},
  journal={arXiv preprint arXiv:2003.01163},
  year={2020}
}
