cooelf / auto-ui
Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (stay tuned and more will be updated)
License: Apache License 2.0
Thanks for the nice work.
Is there any demo code for running inference on a prompt plus a mobile screenshot to get a response from Auto-UI?
I wanted to use your model from the Hugging Face model hub, but I don't see any usage instructions. Will you be adding usage instructions or a model card any time soon?
Hello, I find this an interesting project, but the link provided for the preprocessed data and the trained models is not working. The link you provided is:
https://huggingface.co/cooelf/Auto-UI/tree/main
Can you provide the right link so we can look into the dataset structure and format?
Thanks for the work. I'd like to run inference with this model on custom images and goals, so I tried to write the inference code myself.
However, I found that the obj file unpickles each image as a tensor, so I'd like to know what conversion was used to turn the images into these tensors.
According to utils_data.py, image_ids is obtained via image_ids = torch.tensor(source_image).squeeze();
According to the paper, "Given a screenshot $X_{\text{screen}} \in \mathbb{R}^{h \times w \times 3}$ with height $h$ and width $w$ at step $t \in [1, k]$, we first feed it to a frozen image encoder (e.g., BLIP-2 (Li et al., 2023)) and extract vision features $H_{\text{screen}} \in \mathbb{R}^{1 \times d_s}$ where $d_s$ is the dimension of the vision features."
So I believe the images were pickled after their features had been extracted into tensors. However, there are no details about the feature extraction, including which BLIP-2 model was used.
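The paper does not say exactly how the features were produced. One way to get features of the shape it describes is sketched below, assuming the Hugging Face transformers BLIP-2 classes and the checkpoint Salesforce/blip2-flan-t5-xl; both are assumptions, not the authors' confirmed setup. The sketch loads the full BLIP-2 model once (several GB), keeps only its frozen ViT, and returns the pooled output (the layer-normalized [CLS] token), giving one $1 \times d_s$ vector per screenshot. For BLIP-2's ViT-g, $d_s = 1408$, which matches the --img_dim 1408 flag in the training command quoted in this thread.

```python
import torch
from PIL import Image
from transformers import Blip2Model, BlipImageProcessor

# Assumed checkpoint: the Auto-UI authors have not stated which BLIP-2 variant they used.
MODEL_NAME = "Salesforce/blip2-flan-t5-xl"


def load_blip2_vision(model_name=MODEL_NAME, device="cpu"):
    """Load the BLIP-2 image processor and keep only the frozen vision encoder."""
    processor = BlipImageProcessor.from_pretrained(model_name)
    model = Blip2Model.from_pretrained(model_name)
    vision_model = model.vision_model.to(device).eval()
    return processor, vision_model


def screen_features(vision_model, pixel_values):
    """Return one pooled feature vector per screenshot, shape (batch, hidden_size)."""
    with torch.no_grad():
        outputs = vision_model(pixel_values=pixel_values)
    # pooler_output is the layer-normalized [CLS] token of the ViT.
    return outputs.pooler_output


if __name__ == "__main__":
    processor, vision_model = load_blip2_vision()
    image = Image.open("screenshot.png").convert("RGB")  # hypothetical input file
    pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]
    feature = screen_features(vision_model, pixel_values)
    print(feature.shape)  # (1, 1408) for BLIP-2's ViT-g
```

Whether the authors used this pooled vector, a mean over patch tokens, or Q-Former outputs is not stated, so features from this sketch may not match the released obj files numerically.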
Hello, I am trying to replicate the base-model results using "declare-lab/flan-alpaca-base" from Hugging Face. I followed the training commands in the README; however, the loss barely decreases, and inference fails to produce any meaningful content. Below is a partial excerpt from my trainer_state for reference:
{
"epoch": 0.02,
"learning_rate": 3.135779241141424e-06,
"loss": 17.987,
"step": 500
},
{
"epoch": 0.03,
"learning_rate": 6.271558482282848e-06,
"loss": 17.9571,
"step": 1000
},
……
{
"epoch": 9.99,
"learning_rate": 1.320328101533231e-07,
"loss": 16.2255,
"step": 318500
},
{
"epoch": 10.0,
"eval_gen_len": 1.0,
"eval_loss": 17.40145492553711,
"eval_rouge1": 0.007,
"eval_rouge2": 0.0,
"eval_rougeL": 0.0069,
"eval_rougeLsum": 0.007,
"eval_runtime": 411.3956,
"eval_samples_per_second": 21.349,
"eval_steps_per_second": 0.168,
"step": 318900
}
When I run inference with the trained model, the generated output is meaningless:
'- nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> '
What are the reasons for the above problems? Looking forward to your answer, thank you!
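The logged losses can be sanity-checked against a uniform baseline: a model that assigns equal probability to every token has a per-token cross-entropy of $\ln V$. The sketch assumes the logged value is mean per-token cross-entropy (as in Hugging Face's T5ForConditionalGeneration) and that flan-alpaca-base uses the standard T5 vocabulary size $V = 32{,}128$. Since $\ln 32128 \approx 10.38$, losses of 16 to 18 are worse than uniform guessing, which suggests that something other than slow convergence may be wrong with the setup.

```python
import math

# Assumption: declare-lab/flan-alpaca-base uses the standard T5 vocabulary (32,128 embedding rows).
T5_VOCAB_SIZE = 32128


def uniform_cross_entropy(vocab_size):
    """Per-token cross-entropy (nats) of a model that gives every token equal probability."""
    return math.log(vocab_size)


def worse_than_uniform(loss, vocab_size=T5_VOCAB_SIZE):
    """True if a reported per-token loss exceeds the uniform-guessing baseline."""
    return loss > uniform_cross_entropy(vocab_size)


baseline = uniform_cross_entropy(T5_VOCAB_SIZE)
print(f"uniform baseline: {baseline:.2f} nats")  # uniform baseline: 10.38 nats
# Loss values taken from the trainer_state excerpt above.
for step, loss in [(500, 17.987), (318500, 16.2255)]:
    print(step, loss, worse_than_uniform(loss))
```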
After unzipping the blip.zip file from https://huggingface.co/cooelf/Auto-UI/tree/main, I could not find the single_parsed_episode_t5_blip data used for inference. Where can this data be obtained? I'd like to try running inference.
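To check what the archive actually contains without extracting it, the member names can be listed and filtered with Python's standard zipfile module. This is a minimal sketch: the archive name blip.zip and the search string come from the question, and the helper find_in_zip is hypothetical.

```python
import zipfile


def find_in_zip(zip_path, pattern):
    """Return archive member names containing `pattern` (case-insensitive)."""
    pattern = pattern.lower()
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if pattern in name.lower()]


if __name__ == "__main__":
    # blip.zip is assumed to be the archive downloaded from the Hugging Face repo.
    matches = find_in_zip("blip.zip", "single_parsed_episode")
    print("\n".join(matches) or "no matching entries")
```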
I tried deploying this model on SageMaker following the instructions at https://huggingface.co/docs/sagemaker/inference#deploy-a-model-from-the-hub, and the inference calls fail with the following error:
OSError: /.sagemaker/mms/models/cooelf__Auto-UI does not appear to have a file named config.json. Checkout 'https://huggingface.co//.sagemaker/mms/models/cooelf__Auto-UI/None' for available files.
Any pointers on how to get this running on SageMaker?
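The error says the container looked for config.json at the root of the model directory, which is where transformers' from_pretrained expects it. A quick way to see whether a Hub repo has that layout is to list its files. The sketch below assumes huggingface_hub's list_repo_files (which returns repo-relative paths); the two helper functions are hypothetical.

```python
def has_root_config(repo_files):
    """True if config.json sits at the repository root, where from_pretrained looks for it."""
    return "config.json" in repo_files


def nested_configs(repo_files):
    """Paths of any config.json files stored in subfolders instead of the root."""
    return [path for path in repo_files if path.endswith("/config.json")]


if __name__ == "__main__":
    from huggingface_hub import list_repo_files  # requires network access

    files = list_repo_files("cooelf/Auto-UI")
    print("root config.json present:", has_root_config(files))
    print("nested config.json files:", nested_configs(files))
```

If the root check fails, a Hub-based deployment has no root-level config to load, which would be consistent with the error above.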
Hi, I am following the steps in the README to run the model.
My goal is to run the model on my own inputs; I don't want to train it.
When I run the following command:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
--data_root blip \
--model declare-lab/flan-alpaca-base \
--epoch 10 --lr 1e-4 \
--user_msg seq_future_blip_axis_all0.1_hist8_future4 --img_type blip --img_dim 1408 \
--bs 4 --eval_bs 16 --input_len 512 --output_len 128 --eval_acc 40 \
--transform_axis --warmup_ratio 0.05 \
--all_data 0.1 \
--use_history 8 \
--use_future 4 \
--eval_subset dataset/blip/general_blip \
--output_dir experiments
I get the following error:
args Namespace(all_data=0.1, bs=4, data_ratio=None, data_root='blip', debug_num=None, epoch=10, eval_acc=40, eval_bs=16, eval_name=None, eval_subset='dataset/blip/general_blip', evaluate_dir=None, final_eval=False, img_dim=1408, img_type='blip', input_len=512, local_rank=-1, lr=0.0001, model='declare-lab/flan-alpaca-base', output_dir='experiments', output_len=128, seed=42, transform_axis=True, use_future=4, use_generate=True, use_history=8, use_img_history=False, use_layout=False, user_msg='seq_future_blip_axis_all0.1_hist8_future4', warmup_ratio=0.05)
====Input Arguments====
{
"data_root": "blip",
"output_dir": "experiments",
"model": "declare-lab/flan-alpaca-base",
"data_ratio": null,
"eval_name": null,
"local_rank": -1,
"epoch": 10,
"lr": 0.0001,
"warmup_ratio": 0.05,
"bs": 4,
"debug_num": null,
"input_len": 512,
"output_len": 128,
"img_dim": 1408,
"eval_bs": 16,
"eval_acc": 40,
"all_data": 0.1,
"eval_subset": "dataset/blip/general_blip",
"use_history": 8,
"use_img_history": false,
"use_future": 4,
"use_layout": false,
"transform_axis": true,
"use_generate": true,
"final_eval": false,
"user_msg": "seq_future_blip_axis_all0.1_hist8_future4",
"img_type": "blip",
"evaluate_dir": null,
"seed": 42
}
[20:18:30] [Model]: Loading declare-lab/flan-alpaca-base... main.py:83
[Data]: Reading data... main.py:84
experiments/seq_future_blip_axis_all0.1_hist8_future4_declare-lab-flan-alpaca-base_blip_lr0.0001_bs0_ip512_op128_ep10
model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 990M/990M [00:17<00:00, 56.1MB/s]
Some weights of T5ForMultimodalGeneration were not initialized from the model checkpoint at declare-lab/flan-alpaca-base and are newly initialized: ['mha_layer.out_proj.bias', 'gate_dense.bias', 'mha_layer.in_proj_bias', 'image_dense.weight', 'mha_layer.out_proj.weight', 'mha_layer.in_proj_weight', 'gate_dense.weight', 'image_dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 142/142 [00:00<00:00, 25.5kB/s]
loading general 0
loading google_apps 7580
[2024-01-07 20:20:07,853] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19300 closing signal SIGTERM
[2024-01-07 20:20:07,855] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19301 closing signal SIGTERM
[2024-01-07 20:20:07,855] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19302 closing signal SIGTERM
[2024-01-07 20:20:07,855] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19303 closing signal SIGTERM
[2024-01-07 20:20:07,855] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19304 closing signal SIGTERM
[2024-01-07 20:20:07,855] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19306 closing signal SIGTERM
[2024-01-07 20:20:07,855] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 19307 closing signal SIGTERM
[2024-01-07 20:20:08,928] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 5 (pid: 19305) of binary: /home/skirti/.pyenv/versions/3.8.11/bin/python
Traceback (most recent call last):
File "/home/skirti/.pyenv/versions/3.8.11/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/skirti/.pyenv/versions/3.8.11/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/skirti/.pyenv/versions/3.8.11/lib/python3.8/site-packages/torch/distributed/launch.py", line 196, in <module>
main()
File "/home/skirti/.pyenv/versions/3.8.11/lib/python3.8/site-packages/torch/distributed/launch.py", line 192, in main
launch(args)
File "/home/skirti/.pyenv/versions/3.8.11/lib/python3.8/site-packages/torch/distributed/launch.py", line 177, in launch
run(args)
File "/home/skirti/.pyenv/versions/3.8.11/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/home/skirti/.pyenv/versions/3.8.11/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/skirti/.pyenv/versions/3.8.11/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
main.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-01-07_20:20:07
host : 211b70a3
rank : 5 (local_rank: 5)
exitcode : -9 (pid: 19305)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 19305
============================================================
Any pointers on what is causing this?
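The root-cause block reports exitcode -9 and "Signal 9 (SIGKILL) received by PID 19305": rank 5 did not fail with a Python exception but was killed from outside, and the launcher then sent SIGTERM to the other ranks. On Linux, a SIGKILL with no traceback is commonly sent by the kernel's out-of-memory killer when host RAM runs out, which is plausible here because --nproc_per_node=8 starts eight processes that each appear to load the data. This is a likely explanation rather than a confirmed one; the kernel log (e.g. dmesg) would confirm it. The small helper below (hypothetical, standard library only) decodes such exit codes.

```python
import signal


def describe_exit_code(exitcode):
    """Explain a child exit code as reported by torch.distributed.elastic.

    Negative values follow the subprocess convention: -N means the process
    was terminated by signal N instead of exiting on its own.
    """
    if exitcode < 0:
        sig = signal.Signals(-exitcode)
        return f"killed by signal {-exitcode} ({sig.name})"
    return f"exited with status {exitcode}"


print(describe_exit_code(-9))  # killed by signal 9 (SIGKILL)
```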
Hi Zhuosheng,
Nice work! I'd like to build on this work, and for a fair comparison, could you please provide some information about the train/dev/test split? I need it to locate the original data. Thanks!
Hi, thanks for the good work.
I wonder how the click accuracy and scroll accuracy in Section 5.1 are calculated. I cannot find such code in main.py or action_matching.py.
Thanks~
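One plausible way to compute per-action-type accuracy is to group evaluation steps by their gold action type and average the step-level match results within each group. This is a minimal sketch, not the authors' evaluation code: it assumes each step already has a boolean match flag (for example from the matching logic in action_matching.py), and the action-type labels and the helper accuracy_by_action_type are hypothetical.

```python
from collections import defaultdict


def accuracy_by_action_type(steps):
    """Compute accuracy per gold action type.

    `steps` is an iterable of (gold_action_type, is_match) pairs, where
    is_match says whether the predicted action matched the gold action.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for action_type, is_match in steps:
        totals[action_type] += 1
        correct[action_type] += int(bool(is_match))
    return {t: correct[t] / totals[t] for t in totals}


# Hypothetical evaluation results: two clicks, one scroll, one typing step.
steps = [("click", True), ("click", False), ("scroll", True), ("type", True)]
print(accuracy_by_action_type(steps))  # {'click': 0.5, 'scroll': 1.0, 'type': 1.0}
```

If the gold labels are AitW-style dual-point gestures, the click or scroll label would first have to be derived from the touch and lift points before grouping.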
Dear authors, thank you for this great work!
I wonder whether the task_Impossible action defined in the original AitW paper also applies to this paper?