MSCOCO-it Dataset

A large scale dataset for Image Captioning in Italian

MSCOCO is a large-scale dataset for training image captioning systems. Its 2014 version contains more than 600,000 image-caption pairs, split into training and validation subsets of 82,783 and 40,504 images respectively, where every image has 5 human-written annotations in English.

MSCOCO-it is derived from the MSCOCO dataset through semi-automatic translation into Italian, and provides a large-scale dataset for image captioning in Italian. It contains more than 600,000 image-caption pairs derived from the original English dataset. In the paper by Vinyals et al. (2014), all the image-caption pairs (training + validation, five captions per image) were used to train the system, except for a development set of about 2,000 images and a test set of about 4,000 images that were held out from the validation subset for evaluation. In the following guide to the MSCOCO-it resource, we refer to them as the MSCOCO2K development set and the MSCOCO4K test set. In MSCOCO-it, two subsets of images were taken, along with their annotations, from the MSCOCO2K development set and the MSCOCO4K test set respectively; for each of these images, all five captions (automatically translated from English to Italian) have been manually validated.

The MSCOCO-it dataset is composed of 6 files:

TRAINING SET

captions_ita_trainingset_train2014.json: contains the annotations for the images in the original full MSCOCO training set annotations file, except for those in the MSCOCO2K development set and the MSCOCO4K test set. All these images and their captions can be used to train a model.
captions_ita_trainingset_val2014.json: contains the annotations for the images in the original full MSCOCO validation set annotations file, except for those in the MSCOCO2K development set and the MSCOCO4K test set. All these images and their captions can be used to train a model.
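Since both training files can be used to train a model, one way to work with them is to merge them into a single object. The following is a minimal sketch, not part of the official release: the helper name `merge_splits` is hypothetical, and it assumes each file follows the MSCOCO-style layout with `images` and `annotations` lists. Entries are de-duplicated by `id` as a precaution.

```python
def merge_splits(*datasets):
    """Merge several MSCOCO-style caption dicts into one.

    Images and annotations are keyed by their "id" field, so an entry that
    appears in more than one input is kept only once (first occurrence wins).
    """
    images, annotations = {}, {}
    for ds in datasets:
        for img in ds.get("images", []):
            images.setdefault(img["id"], img)
        for ann in ds.get("annotations", []):
            annotations.setdefault(ann["id"], ann)
    return {
        "images": list(images.values()),
        "annotations": list(annotations.values()),
    }


# Tiny made-up inputs standing in for the two training files.
part_a = {
    "images": [{"id": 1}],
    "annotations": [{"id": 10, "image_id": 1, "caption": "Un cane corre."}],
}
part_b = {
    "images": [{"id": 2}],
    "annotations": [{"id": 20, "image_id": 2, "caption": "Una bici rossa."}],
}
merged = merge_splits(part_a, part_b)
```

With the real files, each input would come from `json.load` on `captions_ita_trainingset_train2014.json` and `captions_ita_trainingset_val2014.json`.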

DEVELOPMENT SET

captions_ita_devset_unvalidated.json: contains the annotations for all the images from the original MSCOCO2K development set (2000 images held out from the full MSCOCO validation set) whose Italian captions, translated with Bing, have NOT been manually validated.
captions_ita_devset_validated.json: contains all the validated annotations for a subset of the images from the MSCOCO2K original development set (2000 images held out from the full MSCOCO validation set).

TEST SET

captions_ita_testset_unvalidated.json,
captions_ita_testset_validated.json: same file organization as the development set, but covering images from the original MSCOCO4K test set (4000 images held out for testing from the full MSCOCO validation set).

More details about MSCOCO-it can be found in the paper available at this link. Note that this release differs from the paper with regard to the partially validated captions, which are now validated.

IMAGES

Please refer to: http://cocodataset.org/#download

2014 Train images [83K/13GB]
2014 Val images [41K/6GB]
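Because the images are distributed separately from the captions, each `file_name` in an annotation file has to be resolved against the downloaded image folders. The sketch below is one possible way to do it; the helper `find_image` and the directory names are assumptions about a local layout, not something the release prescribes.

```python
import tempfile
from pathlib import Path


def find_image(file_name, image_dirs):
    """Return the first existing path for file_name under image_dirs, else None."""
    for directory in image_dirs:
        candidate = Path(directory) / file_name
        if candidate.is_file():
            return candidate
    return None


# Demonstration on a throwaway directory tree (assumed layout: one folder
# per unpacked archive, here named train2014 and val2014).
with tempfile.TemporaryDirectory() as root:
    val_dir = Path(root) / "val2014"
    val_dir.mkdir()
    (val_dir / "example.jpg").write_bytes(b"")
    found = find_image("example.jpg", [Path(root) / "train2014", val_dir])
    missing = find_image("absent.jpg", [val_dir])
    found_name = found.name
```

Searching both folders avoids having to know in advance which original MSCOCO subset a given image came from.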

How to cite MSCOCO-it

This dataset was introduced in the work "Large scale datasets for Image and Video Captioning in Italian" available at the following link. If you find MSCOCO-it useful for your research, please cite the following paper:

@article{IJCOL:scaiella_et_al:2019,
	author = {Scaiella, Antonio and Croce, Danilo and Basili, Roberto},
	journal = {Italian Journal of Computational Linguistics},
	Editor = {Roberto Basili and Simonetta Montemagni},
	number = 5,
	pages = {49-60},
	title = {Large scale datasets for Image and Video Captioning in Italian},
	publisher = {Accademia University Press},
	url = {http://www.ai-lc.it/IJCoL/v5n2/IJCOL_5_2_3___scaiella_et_al.pdf},
	volume = 2,
	year = 2019
}

Download

To download the MSCOCO-it dataset, please refer to this folder

The resource is developed by the Semantic Analytics Group of the University of Roma Tor Vergata.

Release format

The same format used in the MSCOCO dataset is adopted:

{
"info": info, 
"images": [image], 
"annotations": [annotation],
"licenses": [license],
}

info{
"year": int,
"version": str, 
"description": str, 
"contributor": str,
"url": str,
"date_created": datetime,
}

image{
"id": int,
"width": int,
"height": int,
"file_name": str,
"license": int, 
"flickr_url": str,
"coco_url": str,
"date_captured": datetime,
}

license{
"id": int,
"name": str,
"url": str,
}

annotation{
"id": int, 
"image_id": int,
"caption": str,
}
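
As an illustration of the format above, the following minimal sketch reads one annotation file and groups the Italian captions by image. The function names are hypothetical, and the in-memory example uses made-up values; only the field names (`annotations`, `image_id`, `caption`) come from the format description.

```python
import json
from collections import defaultdict


def load_captions(path):
    """Read one MSCOCO-it annotation file and return the parsed JSON object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def captions_by_image(dataset):
    """Map each image id to the list of its caption strings."""
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)


# Tiny in-memory example with made-up values, shaped like the format above.
example = {
    "images": [{"id": 1, "file_name": "example.jpg"}],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "Un gatto su un divano."},
        {"id": 11, "image_id": 1, "caption": "Un gatto che dorme."},
    ],
}
grouped = captions_by_image(example)
```

For a real file, `captions_by_image(load_captions("captions_ita_devset_validated.json"))` would return the same kind of mapping, assuming the file sits in the working directory.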

Size of MSCOCO-it

The original MSCOCO dataset contains the following elements:

Element	Training Set	Validation set
Images	82 783	40 504
Captions	~414 000	~202 000

The final MSCOCO-it contains the following elements, where u. marks unvalidated and v. marks validated captions:

Subset	#images	#captions	#words
training u.	116 195	581 286	~6 900 000
development v.	308	1 541	17 913
development u.	1 696	8 486	~102 000
test v.	596	2 982	34 657
test u.	3 422	17 120	~202 000
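
Counts of this kind can be recomputed from any of the annotation files. The sketch below is an approximation under stated assumptions: the helper `dataset_stats` is hypothetical, images are counted as distinct `image_id` values among the annotations, and words are counted by whitespace splitting, which may differ from the tokenization used to produce the table.

```python
def dataset_stats(dataset):
    """Count images, captions, and whitespace-separated words in a caption dict."""
    annotations = dataset["annotations"]
    image_ids = {ann["image_id"] for ann in annotations}
    n_words = sum(len(ann["caption"].split()) for ann in annotations)
    return {
        "images": len(image_ids),
        "captions": len(annotations),
        "words": n_words,
    }


# Made-up example: one image with two captions of 5 and 4 words.
example = {
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "Un gatto su un divano."},
        {"id": 11, "image_id": 1, "caption": "Un gatto che dorme."},
    ],
}
stats = dataset_stats(example)
```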

References

T. Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: common objects in context," CoRR, vol. abs/1405.0312, 2014. [Online]. Available: http://arxiv.org/abs/1405.0312

O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 3156-3164. [Online]. Available: https://arxiv.org/abs/1411.4555

Contacts

For any questions or suggestions, you can send an e-mail to [email protected]
