The 1st Place Solution to the 8th NVIDIA AI City Challenge (CVPR 2024 Workshop) Track 2: CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario.
| Team Name | MRR Score | Rank |
|---|---|---|
| AliOpenTrek (Ours) | 33.4308 | 1 |
| AIO_ISC | 32.8877 | 2 |
| Lighthouse | 32.3006 | 3 |
- Install Package
```bash
conda create -n cityllava python=3.10 -y
conda activate cityllava
cd AICITY2024_Track2_AliOpenTrek_CityLLaVA/
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install flash-attn --no-build-isolation
```
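To confirm the environment is usable before moving on, a quick sanity check like the following can help (a minimal sketch; it assumes the editable install above exposes the `llava` package and that a CUDA-capable GPU is visible):

```python
# Quick environment sanity check (assumes `pip install -e .` provides the
# `llava` package and that flash-attn built successfully).
import torch
import llava          # installed by `pip install -e .`
import flash_attn     # installed by `pip install flash-attn`

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)
```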
First, change into the `data_preprocess` directory and create the `data` directory:

```bash
cd data_preprocess
mkdir ./data
```
Please download the wts-dataset and put the datasets under `./data`. After unzipping the datasets, the directory structure should look like this:
```
.
├── data
│   ├── BDD_PC_5k
│   │   ├── annotations
│   │   │   ├── bbox_annotated
│   │   │   ├── bbox_generated
│   │   │   └── caption
│   │   └── videos
│   ├── WTS
│   │   ├── annotations
│   │   │   ├── bbox_annotated
│   │   │   ├── bbox_generated
│   │   │   └── caption
│   │   └── videos
│   └── test_part
│       ├── view_used_as_main_reference_for_multiview_scenario.csv
│       ├── WTS_DATASET_PUBLIC_TEST
│       └── WTS_DATASET_PUBLIC_TEST_BBOX
└── ... # python and shell scripts
```
Then run the following script to process the test data:
```bash
bash prepare_data_test.sh
```
After this script is executed, all the test data is prepared. You can download the fine-tuned model and run the inference step directly.
Run the following script to process the train data:
```bash
bash prepare_data_train.sh
```
Note that an OpenAI or Qwen API key is required by `prepare_data_train.sh`; you should modify the `API_KEY` in this script before running it.
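For example, a small helper like this could patch the key in place (a sketch only; it assumes `API_KEY` is assigned on its own line inside the script, which you should verify):

```python
# Hypothetical helper to set API_KEY in prepare_data_train.sh; check the
# actual variable name and location in the script before relying on this.
import pathlib
import re

script = pathlib.Path("prepare_data_train.sh")
patched = re.sub(r"(?m)^API_KEY=.*$", 'API_KEY="sk-your-key-here"',
                 script.read_text())
script.write_text(patched)
```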
After execution, the folder structure should look like this:
```
.
├── data
│   ├── BDD_PC_5k
│   │   ├── annotations
│   │   │   ├── bbox_annotated
│   │   │   ├── bbox_generated
│   │   │   └── caption
│   │   ├── bbox_global   # BDD global views
│   │   │   ├── train
│   │   │   └── val
│   │   ├── bbox_local    # BDD local views
│   │   │   ├── train
│   │   │   └── val
│   │   └── videos
│   ├── WTS
│   │   ├── annotations
│   │   │   ├── bbox_annotated
│   │   │   ├── bbox_generated
│   │   │   └── caption
│   │   ├── bbox_global   # WTS global views
│   │   │   ├── train
│   │   │   └── val
│   │   ├── bbox_local    # WTS local views
│   │   │   ├── train
│   │   │   └── val
│   │   └── videos
│   └── test_part
│       ├── view_used_as_main_reference_for_multiview_scenario.csv
│       ├── WTS_DATASET_PUBLIC_TEST
│       │   ├── bbox_global/test/public   # WTS test images
│       │   ├── bbox_local/test/public
│       │   └── external/BDD_PC_5K
│       │       ├── bbox_global/test/public   # BDD test images
│       │       └── bbox_local/test/public
│       └── WTS_DATASET_PUBLIC_TEST_BBOX
├── processed_anno
│   ├── frame_bbox_anno
│   │   ├── bdd_test_all_video_with_bbox_anno_first_frame.json
│   │   ├── bdd_train_all_video_with_bbox_anno_first_frame.json
│   │   ├── bdd_val_all_video_with_bbox_anno_first_frame.json
│   │   ├── wts_test_all_video_with_bbox_anno_first_frame.json
│   │   ├── wts_train_all_video_with_bbox_anno_first_frame.json
│   │   └── wts_val_all_video_with_bbox_anno_first_frame.json
│   ├── llava_format
│   │   ├── wts_bdd_train.json
│   │   └── wts_bdd_val.json
│   ├── best_view_for_test.json
│   └── perspective_test_images.json
└── ... # python and shell scripts
```
The processed annotations can then be found under `./processed_anno`, and the train JSON is:

```
./data/processed_anno/llava_format/wts_bdd_llava_qa_train_stage_filted_checked.json
```
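For orientation, the files under `llava_format` follow the standard LLaVA conversation schema; a quick look at one entry can confirm this (the field names below are the usual LLaVA ones and are an assumption, not copied from the repository's files):

```python
# Peek at one LLaVA-format training sample; the exact keys may differ,
# but the standard LLaVA schema is id / image / conversations.
import json

with open("./processed_anno/llava_format/wts_bdd_train.json") as f:
    samples = json.load(f)

print(json.dumps(samples[0], indent=2))
# Illustrative shape:
# {
#   "id": "...",
#   "image": "...",
#   "conversations": [
#     {"from": "human", "value": "<image>\n<question about the scene>"},
#     {"from": "gpt", "value": "<caption / answer>"}
#   ]
# }
```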
We use block expansion to fine-tune the VLMs; adding 8~16 blocks is suggested to balance performance and efficiency. We add 12 blocks to the original llava-1.6-34b. The llava-1.6-34b-12block model can be created by these steps (a sketch of the underlying idea follows the list):
- Download the llava-1.6-34b model to `./models`, and add the blocks with this script:

  ```bash
  python block_expansion_llava_1_6.py
  ```

- Copy the `*.json` files and `tokenizer.model` from `./models/llava-v1.6-34b` to `./models/llava-v1.6-34b-12block`;
- Modify `num_hidden_layers=72` (new_layer_nums = original_layer_nums + block_layer_nums) in the `config.json` of the llava-1.6-34b-12block model.
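For intuition: block expansion interleaves copies of existing transformer blocks into the stack and zero-initializes each copy's output projections, so every new block starts as an identity mapping and training can adapt it gradually. With 60 original layers plus 12 added blocks, `num_hidden_layers` becomes 72. Below is a minimal sketch of this idea over a LLaMA-style state dict; it is not the repository's `block_expansion_llava_1_6.py`, and the layer-key names are assumptions based on LLaMA-family models:

```python
# Sketch of block expansion: after every `interval` original blocks, insert
# a copy whose residual-stream projections (o_proj, down_proj) are zeroed,
# making the copy an identity mapping at initialization.
import torch

def expand_blocks(state_dict, n_orig=60, n_new=12):
    interval = n_orig // n_new          # one new block per 5 original blocks
    zero_keys = ("self_attn.o_proj.weight", "mlp.down_proj.weight")
    new_sd, out_idx = {}, 0
    for i in range(n_orig):
        prefix = f"model.layers.{i}."
        layer = {k: v for k, v in state_dict.items() if k.startswith(prefix)}
        for k, v in layer.items():      # keep the original block
            new_sd[k.replace(prefix, f"model.layers.{out_idx}.")] = v.clone()
        out_idx += 1
        if (i + 1) % interval == 0:     # append the zero-initialized copy
            for k, v in layer.items():
                w = torch.zeros_like(v) if k.endswith(zero_keys) else v.clone()
                new_sd[k.replace(prefix, f"model.layers.{out_idx}.")] = w
            out_idx += 1
    for k, v in state_dict.items():     # embeddings, final norm, lm_head, ...
        if not k.startswith("model.layers."):
            new_sd[k] = v
    return new_sd
```

In block-expansion fine-tuning (as in LLaMA Pro), the original blocks are typically frozen and only the inserted blocks are trained; check the training script for the exact setup used here.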
We use 8x A100 GPUs for fine-tuning. The training process takes approximately 8 hours with this script:
```bash
bash scripts/finetune_block_bigsmall.sh
```
The fine-tuned model can be downloaded here.
First, check the parameters defined in `./scripts/inference.sh` and ensure that all essential files and the model exist.
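A small pre-flight check along these lines can catch missing paths early (the paths listed are plausible guesses based on the structure above, not taken from `inference.sh`; adjust them to match the script):

```python
# Hypothetical pre-flight check before running scripts/inference.sh;
# edit `required` to match the paths actually set in the script.
import os

required = [
    "./models/llava-v1.6-34b-12block",
    "./data_preprocess/data/test_part/WTS_DATASET_PUBLIC_TEST",
    "./data_preprocess/processed_anno/best_view_for_test.json",
]
for path in required:
    status = "ok" if os.path.exists(path) else "MISSING"
    print(f"[{status}] {path}")
```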
Now you can do inference on WTS_TEST_SET:
```bash
bash scripts/inference.sh
```
We use the wts-dataset for evaluation.
If you find CityLLaVA useful for your research and applications, please cite using this BibTeX:
```bibtex
@misc{duan2024cityllava,
      title={CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario},
      author={Zhizhao Duan and Hao Cheng and Duo Xu and Xi Wu and Xiangxie Zhang and Xi Ye and Zhen Xie},
      year={2024},
      eprint={2405.03194},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://github.com/qingchunlizhi/AICITY2024_Track2_AliOpenTrek_CityLLaVA}
}
```