MSCOCO-it Dataset

A large scale dataset for Image Captioning in Italian

MSCOCO is a large-scale dataset for training image captioning systems. Its 2014 version contains more than 600,000 image-caption pairs, split into training and validation subsets of 82,783 and 40,504 images respectively, where every image has 5 human-written annotations in English.

MSCOCO-it is derived from the MSCOCO dataset through semi-automatic translation into Italian, and provides a large-scale dataset for image captioning in Italian. It contains more than 600,000 image-caption pairs derived from the original English dataset.

In the paper by Vinyals et al. (2014), all the image-caption pairs (training and validation subsets, five captions per image) were used to train the system, except for a development set of about 2000 images and a test set of about 4000 images, both held out from the validation subset for evaluation. In the rest of this guide, these are referred to as the MSCOCO2K development set and the MSCOCO4K test set.

MSCOCO-it includes two subsets of images, with their annotations, taken from the MSCOCO2K development set and the MSCOCO4K test set respectively. For the images in these two subsets, all five captions (automatically translated from English to Italian) have been manually validated.

The MSCOCO-it dataset is composed of 6 files:

TRAINING SET

  • captions_ita_trainingset_train2014.json: contains the annotations for the images in the original full MSCOCO training set annotations file, except for those in the MSCOCO2K development set and the MSCOCO4K test set. All these images and their captions can be used to train a model.
  • captions_ita_trainingset_val2014.json: contains the annotations for the images in the original full MSCOCO validation set annotations file, except for those in the MSCOCO2K development set and the MSCOCO4K test set. All these images and their captions can be used to train a model.

DEVELOPMENT SET

  • captions_ita_devset_unvalidated.json: contains the annotations for the images from the original MSCOCO2K development set (2000 images held out from the full MSCOCO validation set) whose Italian captions, translated with Bing, have NOT been manually validated.
  • captions_ita_devset_validated.json: contains all the validated annotations for a subset of the images from the MSCOCO2K original development set (2000 images held out from the full MSCOCO validation set).

TEST SET

  • captions_ita_testset_unvalidated.json and captions_ita_testset_validated.json: these files are organized like the corresponding development set files, but refer to the original MSCOCO4K test set (4000 images held out for testing from the full MSCOCO validation set).
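Since both training files can be used to train a model, it may be convenient to merge them into a single structure. The following is a minimal sketch, assuming the files have been downloaded to a local folder and contain the "images" and "annotations" lists of the MSCOCO format. The function names and the `root` argument are illustrative and are not part of the release.

```python
import json
from pathlib import Path

# File names as listed in the TRAINING SET section.
TRAINING_FILES = [
    "captions_ita_trainingset_train2014.json",
    "captions_ita_trainingset_val2014.json",
]


def merge_caption_files(datasets):
    """Concatenate the 'images' and 'annotations' lists of several
    MSCOCO-style dictionaries, keeping each image id only once."""
    merged = {"images": [], "annotations": []}
    seen_image_ids = set()
    for dataset in datasets:
        for image in dataset.get("images", []):
            if image["id"] not in seen_image_ids:
                seen_image_ids.add(image["id"])
                merged["images"].append(image)
        merged["annotations"].extend(dataset.get("annotations", []))
    return merged


def load_training_set(root):
    """Load and merge both training files from the folder `root`
    (a hypothetical local path chosen by the user)."""
    datasets = []
    for name in TRAINING_FILES:
        with open(Path(root) / name, encoding="utf-8") as handle:
            datasets.append(json.load(handle))
    return merge_caption_files(datasets)
```

For example, `load_training_set("data/mscoco-it")` would return one dictionary covering both files, where `data/mscoco-it` stands for wherever the files were saved.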

More details about MSCOCO-it can be found in the paper available at this link. Note that this release differs from the paper with regard to the partially validated captions, which are now validated.

IMAGES

Please refer to: http://cocodataset.org/#download

  • 2014 Train images [83K/13GB]
  • 2014 Val images [41K/6GB]

How to cite MSCOCO-it

This dataset was introduced in the work "Large scale datasets for Image and Video Captioning in Italian" available at the following link. If you find MSCOCO-it useful for your research, please cite the following paper:

@article{IJCOL:scaiella_et_al:2019,
	author = {Scaiella, Antonio and Croce, Danilo and Basili, Roberto},
	journal = {Italian Journal of Computational Linguistics},
	Editor = {Roberto Basili and Simonetta Montemagni},
	number = 2,
	pages = {49-60},
	title = {Large scale datasets for Image and Video Captioning in Italian},
	publisher = {Accademia University Press},
	url = {http://www.ai-lc.it/IJCoL/v5n2/IJCOL_5_2_3___scaiella_et_al.pdf},
	volume = 5,
	year = 2019
}

Download

To download the MSCOCO-it dataset, please refer to this folder

The resource is developed by the Semantic Analytics Group of the University of Roma Tor Vergata.

Release format

The same format used in the MSCOCO dataset is adopted:

{
"info": info, 
"images": [image], 
"annotations": [annotation],
"licenses": [license],
}

info{
"year": int,
"version": str, 
"description": str, 
"contributor": str,
"url": str,
"date_created": datetime,
}

image{
"id": int,
"width": int,
"height": int,
"file_name": str,
"license": int, 
"flickr_url": str,
"coco_url": str,
"date_captured": datetime,
}

license{
"id": int,
"name": str,
"url": str,
}

annotation{
"id": int, 
"image_id": int,
"caption": str,
}
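Because captions are linked to images only through "image_id", a common first step is to group the captions of each image. The sketch below assumes a dictionary already loaded with `json.load` from one of the release files. The function name and the sample data (file name and captions) are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def captions_by_image(dataset):
    """Map each image file name to the list of its captions,
    joining 'annotations' to 'images' through 'image_id'."""
    file_names = {image["id"]: image["file_name"] for image in dataset["images"]}
    grouped = defaultdict(list)
    for annotation in dataset["annotations"]:
        grouped[annotation["image_id"]].append(annotation["caption"])
    # Keep only annotations whose image is described in the same file.
    return {
        file_names[image_id]: captions
        for image_id, captions in grouped.items()
        if image_id in file_names
    }


# Hypothetical sample in the release format (not taken from the dataset).
sample = {
    "images": [{"id": 1, "file_name": "example.jpg"}],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "Un gatto seduto su un divano."},
        {"id": 11, "image_id": 1, "caption": "Un gatto dorme sul divano."},
    ],
}
grouped_sample = captions_by_image(sample)
```

Keying the result by "file_name" makes it straightforward to pair the captions with the image files downloaded from the MSCOCO site.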

Size of MSCOCO-it

The original MSCOCO dataset contains the following elements:

| Element  | Training set | Validation set |
|----------|--------------|----------------|
| Images   | 82,783       | 40,504         |
| Captions | ~414,000     | ~202,000       |

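The final MSCOCO-it contains the following elements, where "u." marks unvalidated and "v." marks validated captions:

| Subset         | #images | #captions | #words     |
|----------------|---------|-----------|------------|
| training u.    | 116,195 | 581,286   | ~6,900,000 |
| development v. | 308     | 1,541     | 17,913     |
| development u. | 1,696   | 8,486     | ~102,000   |
| test v.        | 596     | 2,982     | 34,657     |
| test u.        | 3,422   | 17,120    | ~202,000   |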
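Counts of this kind can be recomputed from any release file. The following is a minimal sketch, assuming a dictionary loaded with `json.load` and counting words by splitting on whitespace. The tokenization used for the official figures is not stated here, so the word counts produced this way may differ from those reported above. The function name is illustrative.

```python
def dataset_stats(dataset):
    """Count distinct captioned images, captions and
    whitespace-delimited words in an MSCOCO-style dictionary."""
    annotations = dataset["annotations"]
    image_ids = {annotation["image_id"] for annotation in annotations}
    # Assumption: a "word" is any whitespace-separated token.
    n_words = sum(len(annotation["caption"].split()) for annotation in annotations)
    return {
        "images": len(image_ids),
        "captions": len(annotations),
        "words": n_words,
    }
```

Images are counted from the "image_id" values of the annotations, so an image listed without captions would not be included.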

References

T. Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: common objects in context," CoRR, vol. abs/1405.0312, 2014. [Online]. Available: http://arxiv.org/abs/1405.0312

O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 3156-3164. [Online]. Available: https://arxiv.org/abs/1411.4555

Contacts

For any questions or suggestions, you can send an e-mail to [email protected]
