Code for the VizWiz Image Quality Issues Dataset, including an API, baseline models, and the evaluation method.
- tensorflow v1.14
- keras v2.3.1
./
- demo_recognizability.ipynb: demo of recognizability prediction using Detectron or Resnet152 feature maps, and its evaluation.
- demo_answerability_recognizability.ipynb: demo of joint prediction of answerability and recognizability using Detectron or Resnet152 feature maps, and its evaluation.
./api
- load_annotations.ipynb: shows how to compile the VizWiz-VQA and VizWiz-ImageQualityIssues annotations saved in ./annotations into arrays for further use, such as a Tensorflow Dataset as in ./utils/DatasetAPI_to_tensor.py. Compiled files are saved in ./data.
./annotations
- vqa_annotations/train.json, val.json, test.json (VizWiz-VQA training/val/test sets)
- quality_annotations/train.json, val.json, test.json (VizWiz-ImageQualityIssues training/val/test sets)
./data
- quality.json: arrayed quality annotations compiled from ./annotations/quality_annotations/train.json, val.json, test.json by following ./api/load_annotations.ipynb. This file is for recognizability prediction.
```python
import json

data = json.load(open('./data/quality.json'))
# data is a dictionary with keys ['train', 'val', 'test'] corresponding to the training/val/test sets.
# Take the training set for example: data['train'] is a dictionary with keys ['image', 'flaws', 'recognizable'].
# data['train']['image'], data['train']['flaws'], data['train']['recognizable'] are lists;
# they can be converted to numpy arrays with np.asarray().
# data['train']['image'][0] == the name of the first image
# data['train']['flaws'][0] == the flaws of the first image
# data['train']['recognizable'][0] == the recognizability of the first image
```
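Building on the comments above, a minimal self-contained sketch of the conversion to numpy arrays. The toy values below only stand in for the real file; the actual flaw encoding is whatever ./api/load_annotations.ipynb produces, not the 4-element vectors assumed here:

```python
import numpy as np

# Toy stand-in for json.load(open('./data/quality.json'))['train'];
# field values are illustrative, not taken from the real dataset.
train = {
    'image': ['VizWiz_train_00000000.jpg', 'VizWiz_train_00000001.jpg'],
    'flaws': [[1, 0, 0, 0], [0, 1, 0, 0]],  # hypothetical flaw encoding
    'recognizable': [1, 0],
}

flaws = np.asarray(train['flaws'])            # shape (num_images, num_flaw_types)
recognizable = np.asarray(train['recognizable'])
print(flaws.shape, recognizable.shape)        # (2, 4) (2,)
```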
- vqa_quality_merger.json: arrayed vqa and quality annotations; incorporates ./annotations/vqa_annotations/train.json, val.json, test.json into quality.json. This file is for joint answerability-recognizability prediction.
```python
import json

data = json.load(open('./data/vqa_quality_merger.json'))
# In addition to the keys ['image', 'flaws', 'recognizable'], data['train'] has two more keys: ['answerable', 'question'].
```
./fmap
- detectron/: where image feature maps extracted by Detectron are stored. For how to extract the features, please refer to this notebook.
- resnet152/: where image feature maps extracted by Resnet152 are stored. They can be derived with the snippet of code below:
```python
import keras
import numpy as np
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input

# Build ResNet152 without the classification head and tap the last residual block.
resnet152 = keras.applications.ResNet152(include_top=False, weights='imagenet', input_shape=[448, 448, 3])
base_model = keras.models.Model(inputs=resnet152.input, outputs=resnet152.get_layer('conv5_block3_add').output)

# Load one image, resize it to 448x448, and extract its feature map.
img = image.load_img(IMG_PATH, target_size=(448, 448))
img = image.img_to_array(img)
img = np.expand_dims(img, axis=0)
img = preprocess_input(img)
img_feat = base_model.predict(img)  # shape: (1, 14, 14, 2048)
```
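Once extracted, the feature maps need to be stored under ./fmap/resnet152/ for the demos to find them. A minimal sketch, assuming a one-.npy-file-per-image naming convention (the actual storage format used by the notebooks is an assumption here, not confirmed by the repository):

```python
import os
import numpy as np

def save_feature_map(img_feat, image_name, out_dir='./fmap/resnet152'):
    # Hypothetical convention: one .npy file per image, named after the image file.
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, os.path.splitext(image_name)[0] + '.npy')
    np.save(path, np.squeeze(img_feat, axis=0))  # drop the batch dimension
    return path

# Demo with a dummy feature map of the shape ResNet152 produces for 448x448 input.
path = save_feature_map(np.zeros((1, 14, 14, 2048), dtype=np.float32),
                        'VizWiz_train_00000000.jpg')
print(path)
```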
./model/VqaQualityModel.py
: model adapted from the Up-Down attention VQA model for the joint prediction of answerability and recognizability.
./utils
- DatasetAPI_to_tensor.py: Tensorflow dataset API for loading data while running the model.
- word2vocab_vizwiz: ids of tokenized frequent words in the questions of the VizWiz VQA dataset.
./ckpt_rec and ./ckpt_ans_rec
: checkpoints for the recognizability predictor and for the Up-Down model for answerability and recognizability prediction, respectively.
We use average precision as the evaluation metric.
Avg. precision of unrecognizability:

| Feature maps | Validation set | Test-dev | Test-standard |
|---|---|---|---|
| Detectron | 79.44 | 77.36 | 78.49 |
| Resnet-152 | 80.17 | 77.82 | 78.69 |
The format shown below is (avg. precision of unanswerability / avg. precision of unrecognizability given the question is unanswerable):

| Feature maps | Validation set | Test-dev | Test-standard |
|---|---|---|---|
| Detectron | 71.41 / 83.08 | 72.26 / 85.38 | 70.53 / 86.20 |
| Resnet-152 | 70.97 / 83.12 | 71.26 / 84.90 | 70.39 / 85.13 |
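For reference, average precision can be computed as the mean of the precision values at the rank of each positive example when predictions are sorted by score. A small numpy sketch (the `average_precision` helper is ours, not part of this repository; on inputs without score ties it matches sklearn.metrics.average_precision_score):

```python
import numpy as np

def average_precision(labels, scores):
    """Mean precision over the ranks of the positive examples, scores sorted descending."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    cum_tp = np.cumsum(labels)                 # true positives retrieved at each rank
    ranks = np.arange(1, len(labels) + 1)
    return float(np.mean(cum_tp[labels == 1] / ranks[labels == 1]))

# Toy example: label 1 = unrecognizable, score = predicted probability of unrecognizability.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 0.8333...
```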