This is the source code for "Learning Fragment Self-Attention Embeddings for Image-Text Matching", ACM MM 2019.
- Python 3.6
- PyTorch 0.4.1
We use the precomputed image features provided by SCAN. Please download data.zip from SCAN.
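After unpacking data.zip, it can help to confirm that the files for a dataset are in place before training. The following is a minimal sketch, not part of this repository. It assumes the SCAN-style layout in which each split is stored as `<split>_ims.npy` and `<split>_caps.txt` under `<data_path>/<data_name>/`; the helper name `missing_precomp_files` and the default split names are illustrative assumptions, so adjust them to match your copy of the data.

```python
import os


def missing_precomp_files(data_path, data_name, splits=("train", "dev", "test")):
    """Return the expected feature/caption files that are absent on disk.

    Hypothetical helper: the file naming below is an assumption based on the
    SCAN precomputed-feature layout, not something this repository defines.
    """
    root = os.path.join(data_path, data_name)
    missing = []
    for split in splits:
        # ASSUMPTION: one image-feature array and one caption file per split.
        for suffix in ("_ims.npy", "_caps.txt"):
            path = os.path.join(root, split + suffix)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Replace /path/to/data with the directory that holds f30k_precomp.
    for path in missing_precomp_files("/path/to/data", "f30k_precomp"):
        print("missing:", path)
```

An empty result means every expected file was found; it does not validate the contents of the files.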
We use the BERT code from BERT-pytorch. Please follow the instructions here to convert the Google BERT model to a PyTorch save file.
python train.py --data_path /path/to/data --data_name f30k_precomp --bert_path /path/to/uncased_L-12_H-768_A-12/
python train.py --data_path /path/to/data --data_name coco_precomp --bert_path /path/to/uncased_L-12_H-768_A-12/
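The two commands above differ only in `--data_name`. As a convenience, the sketch below builds both argument lists from one place so the paths are set once. It uses only the three flags shown above; the function name `build_train_command` is a hypothetical helper, not part of this repository, and the script only prints the commands rather than running them.

```python
import shlex
import sys


def build_train_command(data_path, data_name, bert_path):
    """Build the argument list for one train.py run (hypothetical helper)."""
    return [
        sys.executable, "train.py",
        "--data_path", data_path,
        "--data_name", data_name,
        "--bert_path", bert_path,
    ]


if __name__ == "__main__":
    # Placeholder paths copied from the commands above; replace with your own.
    data_path = "/path/to/data"
    bert_path = "/path/to/uncased_L-12_H-768_A-12/"
    for data_name in ("f30k_precomp", "coco_precomp"):
        cmd = build_train_command(data_path, data_name, bert_path)
        # Print a shell-safe version of the command for inspection.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

To launch a run from Python instead of printing it, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.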