AutoDroid

About

This repository contains the code for the system described in the paper: Empowering LLM to use Smartphone for Intelligent Task Automation.

The DroidTask dataset can be downloaded from Google Cloud; see the About Dataset section for how it is organized.

AutoDroid is implemented based on the DroidBot framework.

How to install

Make sure you have:

  1. Python
  2. Java
  3. Android SDK
  4. The platform-tools directory of the Android SDK added to PATH
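The prerequisites above can be checked from Python before installing. This is a minimal sketch, not part of AutoDroid: the executable names (`python`, `java`, `adb`) are assumptions and may differ on your system (for example `python3`), and finding `adb` on PATH is used here as a proxy for the platform tools directory being on PATH.

```python
import shutil

# Assumed executable names; adjust for your system (e.g. "python3").
REQUIRED_TOOLS = ["python", "java", "adb"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` is enough.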

Then clone this repo and install with pip:

git clone git@github.com:MobileLLM/AutoDroid.git
cd AutoDroid/
pip install -e .

How to use

  1. Prepare:

    • To test AutoDroid with the apps used in our paper, download the apk.zip file from Google Cloud, unzip it, and connect a device or an emulator to your host machine via adb.
    • To test AutoDroid with other apps, download the .apk file to your host machine and connect a device or an emulator to your host machine via adb.
    • Prepare a GPT API key, then open tools.py and replace os.environ['GPT_URL'] with your own API key.
  2. Start:

    droidbot -a <path/to/.apk> -o <output/of/app> -task <your task> -keep_env -keep_app

    You can also try the scripts in the ./scripts folder; the tasks from DroidTask are listed in the form.
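When running several tasks in a row, it may help to assemble the command above programmatically. The following is a sketch only: the helper name `build_droidbot_command` and the example paths and task text are hypothetical, and it uses just the flags shown in the command above.

```python
import shlex


def build_droidbot_command(apk_path, output_dir, task):
    """Assemble the droidbot invocation shown above as an argument list."""
    return [
        "droidbot",
        "-a", apk_path,
        "-o", output_dir,
        "-task", task,
        "-keep_env",
        "-keep_app",
    ]


if __name__ == "__main__":
    # Hypothetical APK path, output folder, and task, for illustration only.
    cmd = build_droidbot_command(
        "apks/calendar.apk", "output/calendar", "Create a new event"
    )
    print(shlex.join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when the task description contains spaces or quotes.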

About Dataset

Organization of the dataset:

    DroidTask
    ├── applauncher
    │   ├── states
    │   │   ├── Screenshot 1.png
    │   │   ├── Screenshot 2.png
    │   │   ├── ...
    │   │   ├── View hierarchy 1.json
    │   │   ├── View hierarchy 2.json
    │   │   └── ...
    │   ├── task1.yaml
    │   ├── task2.yaml
    │   ├── ...
    │   └── utg.yaml
    ├── calendar
    │   ├── states
    │   │   ├── Screenshot 1.png
    │   │   ├── Screenshot 2.png
    │   │   ├── ...
    │   │   ├── View hierarchy 1.json
    │   │   ├── View hierarchy 2.json
    │   │   └── ...
    │   ├── task1.yaml
    │   ├── task2.yaml
    │   ├── ...
    │   └── utg.yaml
  • DroidTask: The top level of the dataset, containing one folder for each application included in DroidTask, such as applauncher and calendar.

  • Application Folders: Each application folder records all the screenshots and raw view hierarchies parsed by DroidBot:

    • States Folder: This folder holds all the captured states of the application during usage. A state includes both visual representations (screenshots) and structural data (view hierarchies).

      • Screenshots: Images captured from the application's interface, named sequentially (e.g., Screenshot 1.png, Screenshot 2.png, etc.).

      • View Hierarchies: JSON files detailing the structure of the application's UI for each captured state (e.g., View hierarchy 1.json, View hierarchy 2.json, etc.).

    • Task Files: YAML files named task1.yaml, task2.yaml, etc., containing the ground truth data for specific tasks within the application.

    • UTG File: A utg.yaml file that records data from the user's random exploration of the application.

  • Mapping Between Tasks and States: If you want to use the screenshots in your method:

    • Each taski.yaml file (where i is the task number) references states through a state_str identifier.
    • This state_str can be matched with the state_str in a view hierarchy k.json file to associate tasks with their corresponding application states.
    • The name of the view hierarchy k.json file (where k is the state number) is also used to locate the corresponding screenshot, as the screenshot and view hierarchy files share the same naming convention.
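The mapping described above can be sketched as a small index from state_str to files. This is a rough illustration under two assumptions that the text does not confirm: that state_str is a top-level key in each view hierarchy JSON file, and that "View hierarchy k.json" pairs with "Screenshot k.png" as in the tree shown earlier. The function name `index_states` is hypothetical.

```python
import json
from pathlib import Path

VH_PREFIX = "View hierarchy "


def index_states(states_dir):
    """Map each state_str to (view hierarchy path, screenshot path or None)."""
    index = {}
    for vh_path in sorted(Path(states_dir).glob(VH_PREFIX + "*.json")):
        with vh_path.open(encoding="utf-8") as f:
            hierarchy = json.load(f)
        # Assumption: state_str is a top-level key of the JSON object.
        if not isinstance(hierarchy, dict) or "state_str" not in hierarchy:
            continue
        # Assumption: the number k is shared by both file names.
        k = vh_path.stem[len(VH_PREFIX):]
        screenshot = vh_path.with_name(f"Screenshot {k}.png")
        index[hierarchy["state_str"]] = (
            vh_path,
            screenshot if screenshot.exists() else None,
        )
    return index
```

With such an index, each state_str read from a task file can be looked up directly, for example `index_states("DroidTask/calendar/states")[state_str]`, to get the screenshot for that step.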

Known limitations

  • The current implementation is not good at determining task completion.
  • The task automation performance may be unstable due to the randomness of LLMs, the style/quality of app GUI and task descriptions, etc.
  • It requires a connection to a host machine via adb; it is not a standalone on-device solution.

Contributions are welcome!

Note

  • Many thanks to Simple Mobile Tools for their wonderful open source apps.
  • Note that AutoDroid is currently for research purposes only. It may perform unintended actions (e.g. modifying your account/settings). Please use at your own risk.

Enjoy!

Contributors

wenh18, yuanchun-li, kiteflykid
