
llara's Introduction

LLaRA: Large Language and Robotics Assistant


LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [arXiv]

Xiang Li¹, Cristina Mata¹, Jongwoo Park¹, Kumara Kahatapitiya¹, Yoo Sung Jang¹, Jinghuan Shang¹, Kanchana Ranasinghe¹, Ryan Burgert¹, Mu Cai², Yong Jae Lee², and Michael S. Ryoo¹

¹Stony Brook University, ²University of Wisconsin-Madison

Installation

  1. Set Up Python Environment:

    Follow the instructions to install the same Python environment as used by LLaVA. Details are available here.

  2. Replace LLaVA Implementation:

    Navigate to train-llava in this repo and install the llava package there:

    cd train-llava && pip install -e .
    
  3. Install VIMABench:

    Complete the setup for VIMABench.
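
    As a quick post-install sanity check, you can confirm that the package imports cleanly. This is a minimal sketch, assuming VIMABench installs as the vima_bench package:

    # Minimal sanity check; 'vima_bench' as the package name is an assumption.
    import vima_bench
    print(vima_bench.__file__)  # should point into your site-packages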

Demo

  1. Download the Pretrained Model:

    Download the following model to ./checkpoints/

    • llava-1.5-7b-D-inBC + Aux(B) trained on VIMA-80k Hugging Face

    More models are available at Model Zoo

  2. Run the evaluation:

    cd eval
    # evaluate the model with oracle object detector
    python3 eval-llara.py D-inBC-AuxB-VIMA-80k --model-path ../checkpoints/llava-1.5-7b-llara-D-inBC-Aux-B-VIMA-80k --prompt-mode hso
    
    # the results will be saved to ../results/[hso]D-inBC-AuxB-VIMA-80k.json
    
  3. Check the results: Please refer to llara-result.ipynb
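
    If you just want a quick look at the saved results outside the notebook, the minimal sketch below assumes only that the output file is valid JSON; llara-result.ipynb remains the authoritative reference for computing the reported metrics:

    # Peek at the evaluation output; assumes only that it is valid JSON.
    import json

    with open('../results/[hso]D-inBC-AuxB-VIMA-80k.json') as f:
        results = json.load(f)

    # Print the top-level structure to see what was recorded.
    if isinstance(results, dict):
        for key, value in results.items():
            print(key, type(value).__name__)
    else:
        print(type(results).__name__, len(results))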

Quick Start Guide

  1. Prepare the Dataset:

    Visit the datasets directory to prepare your dataset for training.

  2. Finetune a LLaVA Model:

    To start finetuning a LLaVA model, refer to the instructions in train-llava.

  3. Evaluate the Trained Model:

    Follow the steps in eval to assess the performance of your trained model.

  4. Train a Mask R-CNN for Object Detection:

    If you want to train a Mask R-CNN for object detection, check out train-maskrcnn for detailed steps.

Issues

If you encounter any issues or have questions about the project, please submit an issue on our GitHub issues page.

License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.

Support us

If you find this work useful in your research, please consider giving it a star ⭐ and citing our work:

@article{li2024llara,
  title={LLaRA: Supercharging Robot Learning Data for Vision-Language Policy},
  author={Li, Xiang and Mata, Cristina and Park, Jongwoo and Kahatapitiya, Kumara and Jang, Yoo Sung and Shang, Jinghuan and Ranasinghe, Kanchana and Burgert, Ryan and Cai, Mu and Lee, Yong Jae and Ryoo, Michael S.},
  journal={arXiv preprint arXiv:2406.20095},
  year={2024}
}

Thanks!

llara's Issues

Error reading trajectory.pkl

Thanks for releasing the code for this great work.
I downloaded the VIMA data, but when I read trajectory.pkl I get a
ModuleNotFoundError: No module named 'vimasim' error.
How should I solve this? Thanks!
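
A note on this class of error: the pickle file references classes from a module named vimasim that is not installed under that name, so one common workaround is to remap the module at unpickle time. The sketch below redirects vimasim to vima_bench; that mapping is an assumption and may need adjusting for your setup.

    import pickle

    class RenameUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            # Redirect references to the missing 'vimasim' package.
            # Mapping it onto 'vima_bench' is an assumption; adjust as needed.
            if module == 'vimasim' or module.startswith('vimasim.'):
                module = module.replace('vimasim', 'vima_bench', 1)
            return super().find_class(module, name)

    with open('trajectory.pkl', 'rb') as f:
        traj = RenameUnpickler(f).load()
    print(type(traj))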

Problem replicating results

What a nice job! And the code is easy to run.
However, I have some problems replicating the eval results. Here is my process.

  1. I downloaded the checkpoint llava-1.5-7b-D-inBC + Aux(B) trained on VIMA-80k from Hugging Face;

  2. Then I created a new empty directory myresults and ran cd eval && python3 eval-llara.py D-inBC-AuxB-VIMA-80k --model-path ../checkpoints/llava-1.5-7b-llara-D-inBC-Aux-B-VIMA-80k --prompt-mode hso --output-path ../myresults/;

  3. I also copied results/llara-result.ipynb to ./myresults;

  4. In the ./myresults directory, I ran llara-result.ipynb to get the final result, but the results are much worse than expected;

Steps 2 and 3 were done to produce a new JSON result.

Where did my process go wrong? Could anyone point it out for me?

Besides, thanks to the authors for sharing the training logs. I found that the learning rate changes during training according to ./checkpoints/llava-1.5-7b-llara-D-inBC-Aux-B-VIMA-80k/trainer_state.json. Which schedule is used in training? Following your guide, the learning rate should always be 2e-05 except during the warm-up stage.
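
For reference, the learning-rate values logged by the Hugging Face Trainer can be read straight out of trainer_state.json (log_history is the standard Trainer field), which makes it easy to see the shape of the schedule. A minimal sketch:

    # Inspect the learning-rate schedule recorded during training.
    import json

    path = './checkpoints/llava-1.5-7b-llara-D-inBC-Aux-B-VIMA-80k/trainer_state.json'
    with open(path) as f:
        state = json.load(f)

    for entry in state['log_history']:
        if 'learning_rate' in entry:
            print(entry.get('step'), entry['learning_rate'])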
