dataset's People

Contributors

andreasveit, gkrasin, jponttuset, marcreichman-pfi, nalldrin, rkrasin, shahabkam, vfgomes


dataset's Issues

Distinct labels with same description in dict.csv

Hi, I noticed that some labels have the same description in the file dict.csv.
Is that expected? Should these cases be treated as distinct entities or is it
better to merge them into a single label?

The list of repeated descriptions is:

/m/018w8, /m/0frqg3, basketball
/m/0449p, /m/0h5wslk, jaguar
/m/07ptj3n, /m/08gqpm, cup
/m/03_mn6, /m/0j67plv, lotus
/m/02g0fy, /m/0f5mx3, fiat 500
/m/019v__, /m/0j6n39d, tvr
/g/121sl9wl, /m/02jcyv, jaguar s-type
/m/01tv9, /m/08p92x, cream
/m/02m57w, /m/0ds4250, groom
/m/080cpdy, /m/0fq0fnf, plank
/m/01l849, /m/025rs2z, gold
/m/04bct1, /m/0dzdr, chest
/m/01txr2, /m/01ty0n, spring
/m/05b1swx, /m/0633h, polar bear
/m/03c_kl, /m/0l_yv, snowshoe
/m/028ygt, /m/02jwq3, punch
/m/01d380, /m/02hhhb, drill
/m/015zzv, /m/03bxt6z, runway
/m/018xm, /m/0dpm1v, ball
/m/0jqjp, /m/0lxkm, iris
/m/0319l, /m/04lmyz, horn
/m/025rw19, /m/03cld36, iron
/m/07bg4p, /m/095_n, heart
/m/017cc, /m/04n0b__, brain
/m/091410, /m/09141t, collar
/m/01z9v6, /m/054fyh, pitcher
/m/01443y, /m/0cphhk, headgear
/m/02tcwp, /m/031vtq, /m/03hqlh, trunk
/m/0879r3, /m/09gys, squid
/m/02g387, /m/033cnk, egg
/m/011_f4, /m/0d8lm, string instrument
/m/01fpbm, /m/04c38s, daisy
/m/01j_h3, /m/0h5wwjv, subaru
/m/02g7g2, /m/0cjs7, asparagus
/m/04mtl, /m/0h5x4j3, lamborghini
/m/09xp_, /m/09xqv, cricket
/m/03xr7y, /m/04gth, lavender
/m/0266skk, /m/0m775, tilapia
/m/01qk4t, /m/0l14v3, conch
/m/020lf, /m/04rmv, mouse
/m/05h2v35, /m/0by3w, jumping
/m/06wrt, /m/0cc6_9k, sailing
/m/03qsdpk, /m/05npqn, theatre
/m/02519, /m/07s6bqg, cable car
/m/031n9j, /m/04tdh, marble
/m/02qsq_1, /m/03bx3wh, corn on the cob
/m/01cjsf, /m/02zt3, kite
/m/013y0j, /m/013y1f, organ
/m/06g1w2, /m/0hwky, pattern
/m/0cyhj_, /m/0jc_p, orange
/m/0gqbt, /m/0j3gthp, shrub
/m/06ff5p, /m/0b209p, rolls-royce corniche
/m/0fsg8, /m/0m150, harrier
/m/01lbxg, /m/039hvj, nut
/m/026y54h, /m/02823g9, /m/0bzfym, alfa romeo giulietta
/m/04f6rz, /m/0fgkh, turquoise
/m/027y004, /m/0cqdf, sponge
/m/01226z, /m/02vx4, football
/m/01m0p1, /m/0jwr9, cardinal
/m/07_l0f, /m/0gzznm, powder
/m/03clckp, /m/083vt, wood
/m/04d01f, /m/0pbc, amber
/m/07pbfj, /m/0ch_cf, fish
/m/0gccln, /m/0gccmf, ford model a
/m/01b7b, /m/027k49j, bishop
/m/0151b0, /m/07jx7, triangle
/m/01brf, /m/04_10ss, bronze
/m/014sg5, /m/07_l6, viola
/m/08g_yr, /m/0cx45, temple
/m/01c43w, /m/01v50j, crane
/m/03r18y, /m/0dj6p, peach
/m/03wfhdl, /m/0y8r, armored car
/m/04ffcj, /m/0k354, lilac
/m/06s7q8, /m/0k2jq, sabre
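A list like the one above can be regenerated mechanically. The sketch below is an assumption-laden illustration, not project tooling: it assumes dict.csv has two columns (label MID, then description) and that descriptions should be compared case-insensitively. It groups MIDs by description and keeps only descriptions shared by more than one MID:

```python
import csv
import os
from collections import defaultdict

def duplicate_descriptions(rows):
    """Group label MIDs by description; return only descriptions
    shared by more than one MID (case-insensitive comparison is
    an assumption, not something dict.csv documents)."""
    by_desc = defaultdict(list)
    for mid, desc in rows:
        by_desc[desc.strip().lower()].append(mid)
    return {d: sorted(mids) for d, mids in by_desc.items() if len(mids) > 1}

if __name__ == "__main__" and os.path.exists("dict.csv"):
    with open("dict.csv", newline="", encoding="utf-8") as f:
        for desc, mids in sorted(duplicate_descriptions(csv.reader(f)).items()):
            print(", ".join(mids) + ", " + desc)
```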

Any timeframe on releasing the new trained model?

Hi,
I saw that you plan on releasing a trained detector and recognition model on the updated dataset.
Can you please tell me whether you still plan to release the model, and the approximate timeframe?

Thanks,
Akshay

Missing images

Hello, we are downloading all of the images in this dataset to a local storage array, and have noticed that for several images the OriginalURL returns a 302 redirect to this placeholder PNG: https://s.yimg.com/pw/images/en-us/photo_unavailable_l.png

This is easier to catch now that the OriginalMD5 column is available (as of v4); however, we'd still like a complete dataset. (There is also a small number of 500 errors, but those might be transient.)

Please advise if there are any Alternate URLs (such as Google cache).
We can provide a list of failed ImageIDs after download finishes.
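Verifying downloads against the OriginalMD5 column can be sketched as below. One caveat is an assumption on my part: the column may hold either a hex digest or a base64-encoded binary MD5, so both forms are checked rather than guessing which encoding the CSV uses:

```python
import base64
import hashlib

def md5_matches(path, expected):
    """True if the file's MD5 equals `expected`, which may be given
    either as a hex digest or as a base64-encoded binary digest."""
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read())
    return expected in (digest.hexdigest(),
                        base64.b64encode(digest.digest()).decode("ascii"))
```

Files whose digest equals that of the photo_unavailable placeholder can then be dropped in a single pass over the download directory.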

Accuracy of Released Image Classification Models

Dear @rkrasin,
Thank you for your fantastic dataset. Could you please tell us the accuracy (e.g., F1 score) of the released models (the ResNet 101 and Inception v3 image classification models) on the validation set?

print image vectors for new image (based on pretrained model)

Thanks for providing the pretrained Inception v3 model.
At the moment classify.py outputs the nearest classes for a given image. Is it possible to provide a function that outputs the high-dimensional feature vector of a (new) image? This could be useful for transforming an image into a vector.

Missing human annotation for validation image 58c970f287de7333

Out of all 41,620 validation images, there is one that looks like it's completely missing human annotations, namely the image with image ID "58c970f287de7333". Can someone please verify that this is, in fact, the case and not a mistake from my side? This image also happens to be missing the field labeled "Thumbnail300KURL" in the CSV file.
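A check like this is easy to reproduce as a set difference between the image IDs in the validation images.csv and those in the human-annotation file. A minimal sketch, assuming ImageID is the first column of both files:

```python
def unannotated_ids(image_rows, label_rows):
    """IDs present in the images file but absent from the labels file."""
    return sorted({r[0] for r in image_rows} - {r[0] for r in label_rows})
```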

downloading the images for bounding boxes only

Hello, I am working on object detection; thank you for this amazing dataset. I want to download only the images required for training the detection (bounding-box) model. Is there a way to download just the ~600,000 images used for detection? train/images.csv lists all of the images.
Thank you
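One workaround is to filter the image list locally before downloading. This is a sketch under two assumptions: that ImageID is the first column of both annotations-human-bbox.csv and train/images.csv, and that each file has a header row. It collects the distinct IDs that have at least one box, then keeps only the matching rows of images.csv:

```python
import csv

def rows_with_boxes(bbox_lines, image_lines):
    """Filter images.csv rows down to images that appear in the
    bounding-box annotation file. Both inputs are iterables of
    raw CSV lines, header first."""
    bbox_reader = csv.reader(bbox_lines)
    next(bbox_reader)                       # skip header
    boxed = {row[0] for row in bbox_reader}
    image_reader = csv.reader(image_lines)
    header = next(image_reader)
    return [header] + [row for row in image_reader if row[0] in boxed]
```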

Python Urlfetch Error: 'GET'

Hi,
Has the dataset become undownloadable recently? Chrome returns "DownloadError('Unspecified error in fetching URL: https://storage.googleapis.com/openimages/2017_07/images_2017_07.tar.gz',), timeout=19" when I try to download, and with a shell script the message is as follows:

--2017-10-16 11:15:09--  https://storage.googleapis.com/openimages/2016_08/model_2016_08.tar.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.27.144
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.27.144|:443... connected.
GnuTLS: Error in the pull function.
Unable to establish SSL connection.

Images missing bounding box annotations?

Hi,

I downloaded the 1.76M images with bounding boxes, and noticed that not all validation/test images have bounding-box annotations. Specifically, there are 5695 validation images and 17277 test images that have no records in annotations-human-bbox.csv. Are they deliberately background images, or is something wrong?

What data set did you use to train the InceptionV3 model?

I notice that the human-annotation labels.csv contains ~1.7M samples, while train/images.csv contains ~9M. I assume the labels file cannot be fully joined with the images. So I wonder: did you use the machine annotations as the training set, or did some other labels join the training set?

Furthermore, how do you handle the imbalanced sample counts between categories — downsampling, upsampling, or just leaving them as they are?

Thanks,
XL

Complete file as torrent

It is quite hard to download the dataset at the moment. It would be nice if we could get a torrent up and running.
I would be happy to seed after I have downloaded.

I have an i5 quad-core setup, which makes it hard to download quickly with multiple threads: I have tried, and I hit the limit of the machine's CPU rather than the bandwidth.
I would be happy to get this dataset downloaded to use it for my master's thesis.

Detailed changelog

Hi, would you be able to provide a more detailed changelog for going from V2 to V3? More specifically, I would like to know whether the annotations that existed in V2 were re-annotated in V3.

Thanks.

V2 dataset images

Are those the same as v1 images?
Or has the whole set of images also changed?

Confused on how to load in Python

I have been trying for a few hours to load the data into Python for training with Keras. Is there a tutorial, or can someone please guide me on how to go about doing this?

Thanks,
-Alex

EDIT:
Should I maybe do this

Is this a free service?

I am looking for datasets for image recognition of certain object classes. Downloading 990 MB seems unnecessary to me. Is there a way to get a list of URLs so I can download images of certain classes only, for free?
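The freely downloadable metadata already supports this: filter the label file for one MID, then join against images.csv. This is a sketch under two assumptions: that the machine-annotation lines have the form `image_id,score:/m/...,...` and that images.csv has ImageID and OriginalURL in its first two columns:

```python
def urls_for_class(label_lines, image_rows, mid):
    """URLs of images whose machine-annotation line mentions `mid`.
    `label_lines` is an iterable of raw CSV lines; `image_rows` is
    an iterable of parsed rows (ImageID, OriginalURL, ...)."""
    wanted = {line.split(",", 1)[0]
              for line in label_lines if mid in line}
    return [row[1] for row in image_rows if row[0] in wanted]
```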

Speeding up a pretrained classifier

It takes about 9 seconds to process one image with this command:
tools/classify.py /path/to/img.jpg
We're using an NVIDIA Tesla K80.
Is there any way to speed up classification?

Importing Graph

@gkrasin, is there a way to easily import the graphdef and weights of the net trained on Open Images? I tried to import the graph from the model.ckpt.meta file as described here, but I run into an SSTableReader KeyError.

Missing labels in dict.csv

A number of labels appear in the annotation files but not in dict.csv.

For example:

grep '/m/03s2xy' ./train/labels.csv | wc -l
55
grep '/m/03s2xy' ../dict.csv | wc -l        
0

As far as I can tell, this is the list of such labels with more than 50 occurrences:

/m/03s2xy
/m/0dzlbx
/m/025ryqs
/m/0g4k1g
/m/03s2yq
/m/0c2r40
/m/02rnlk1
/m/0f8ym
/m/0jncr
/m/01x7t4
/m/04mtl
/m/05qwxx
/m/0mf2
/m/09bjv3
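A list like this can be reproduced with one pass over both files. The sketch below assumes MIDs match `/m/...` or `/g/...` and pulls them out with a regex rather than a full CSV parse, which is good enough for a report like this:

```python
import re
from collections import Counter

MID_RE = re.compile(r"/(?:m|g)/[0-9a-z_]+")

def missing_labels(label_lines, dict_lines, min_count=50):
    """MIDs occurring more than `min_count` times in the annotation
    file that never appear in dict.csv."""
    counts = Counter(m for line in label_lines
                       for m in MID_RE.findall(line))
    known = {m for line in dict_lines for m in MID_RE.findall(line)}
    return sorted(m for m, c in counts.items()
                  if c > min_count and m not in known)
```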

importing the pretrained model

I am trying to import the pretrained model checkpoint.

new_saver = tf.train.import_meta_graph('/root/project_dir/docker_volume/model.ckpt.meta')
new_saver.restore(sess, tf.train.latest_checkpoint('./'))

I get an undefined-op error:
ValueError: No op named SSTableReader in defined operations.

I am using tensorflow 1.1. Is there a specific tensorflow version I need to use?

[Feedback] Providing a method for integrity verification of dataset

Hello, really appreciate this project.

It would be really helpful to have an MD5 sum (or a stronger hash) for each dataset archive so that the downloaded file can be verified. Due to our network infrastructure, downloads are often flaky; even though we downloaded the 990 MB archive, it turned out to be corrupted.

Thanks

I want to delete openimages viewer

Hello, gkrasin. How are you?

Sadly, Google banned my openimages viewer site as spam.
Along with it, my other sites on the same domain have also been banned from Google search results.
To solve this problem, I need to delete my openimages site.
May I delete it?

Include URL to smaller sized versions of images

The URL to the photo in the 'Image URLs and metadata' file is for the original-sized version. To reduce the amount of data that needs to be downloaded, it would be useful if the URL (or just the secret) for a smaller-sized version of the image could also be included.

If I understand correctly, the secret in the original-sized URL is different from the secret for the other sizes (see https://www.flickr.com/services/api/misc.urls.html under 'Note'). I tried changing '_o' to '_z' in the URL, but unfortunately that does not work in most cases.

Class Balances

Hi,

Big fan of google research datasets, have been hoping to use this dataset to train a model.

For my model, I am looking for 2,000 to 3,000 balanced classes with 1,000 or more observations each. To test for class balance, I examined the aggregated training and validation annotations, taking up to 3,000 observations from each class. The original annotations had 83 million labels, which this operation trimmed to 1.7 million rows. I expected the resulting distribution to be a bit flatter at the upper bound but otherwise match the graphs provided in the repository, which show around 2,500 classes with 1,000 occurrences each.

I re-downloaded the annotations and checked and rewrote my scripts several times, but I keep getting this result. I was wondering if anyone else is having this issue, or if I am missing something.
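The trimming step described above can be written as a single pass. A minimal sketch, assuming the annotations are available as (image_id, label) pairs:

```python
from collections import Counter

def cap_per_class(rows, cap=3000):
    """Keep at most `cap` (image_id, label) rows per label,
    preserving input order."""
    seen = Counter()
    kept = []
    for image_id, label in rows:
        if seen[label] < cap:
            seen[label] += 1
            kept.append((image_id, label))
    return kept
```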

Question about Human image-level annotations (Validation Set)

@gkrasin Thank you for publishing such an awesome Open Images dataset. The Google Research blog mentioned that "For the validation set, we had human raters verify these automated labels to find and remove false positives." You also provided a 9 MB file, "Human image-level annotations (validation set)". It would be interesting to know: did the human raters also annotate labels that the previous vision model failed to predict? In other words, the human label file will certainly reduce the number of labels (by removing false positives); will it also increase the number of labels (by adding those the vision model missed)? Thank you for your attention; looking forward to your reply!

images.csv contains decimal image ids, not hex as advertised.

grep 5215831864_46f356962f_o ./images_2016_08/train/images.csv
106528377697029,https://c1.staticflickr.com/5/4129/5215831864_46f356962f_o.jpg,https://www.flickr.com/photos/brokentaco/5215831864,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/brokentaco/,David,28 Nov 2010 Our new house.

That's the demo image noted on https://github.com/openimages/dataset ...
000060e3121c7305,"https://c1.staticflickr.com/5/4129/5215831864_46f356962f_o.jpg",\
"https://www.flickr.com/photos/bro....

Converting 106528377697029 to hex does indeed yield 000060e3121c7305.

( from https://storage.googleapis.com/openimages/2016_08/images_2016_08.tar.gz )

labels.csv contains the hex ID:
./machine_ann_2016_08/train/labels.csv:000060e3121c7305,0.9:/m/06ht1,0.9:/m/05wrt,0.8:/m/01l0mw,0.7:/m/03d2wd,0.7:/m/03nxtz,0.7:/m/023907r,0.7:/m/020g49,0.6:/m/0l7_8,0.6:/m/02rfdq,0.6:/m/038t8_,0.6:/m/03f6tq,0.6:/m/01s105,0.5:/m/01nblt

Not a major issue, but makes it slightly awkward to work with.
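Until the files are reconciled, the workaround is a one-liner: render the decimal ID as 16 hex digits, zero-padded, to get the form labels.csv uses. A minimal sketch:

```python
def to_hex_image_id(n):
    """Decimal ID from images.csv -> 16-character zero-padded hex
    form used by labels.csv."""
    return format(int(n), "016x")
```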

WordNet codes for classes

This dataset is cool. Unfortunately, it is hard to understand the relations between classes. It would be even cooler to have something like the closest WordNet synset for each of the classes.

Can I download the full dataset(18TB)?

I want to download the full dataset (18 TB), but I don't have access permission.
I've already submitted a CVDF access request.
I created a Google Storage Transfer job but got a 'permission denied' message. Can anyone tell me how to download the full dataset?

UnicodeDecodeError on Windows

When trying to run "classify.py" on the Windows CMD I get the following error:

Traceback (most recent call last):
  File "D:\Softwareprojects\tensorflow\dataset-master\tools\classify.py", line 166, in <module>
    tf.app.run()
  File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "D:\Softwareprojects\tensorflow\dataset-master\tools\classify.py", line 133, in main
    img_data = tf.gfile.FastGFile(image_path).read()
  File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 122, in read
    pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
  File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 90, in _prepare_value
    return compat.as_str_any(val)
  File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\compat.py", line 106, in as_str_any
    return as_str(value)
  File "D:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\compat.py", line 84, in as_text
    return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

The problem is triggered by "tf.gfile.FastGFile", which tries to load the image as a UTF-8 string: https://github.com/openimages/dataset/blob/master/tools/classify.py#L132

Passing "rb" as the mode seems to fix the problem:
img_data = tf.gfile.FastGFile(image_path, "rb").read()

Pretrained model details

Where can I get some information about how the pretrained model was trained?
Which images were used for training?
How well does the model perform?
Which code was used for training?

We made an object detection tutorial for Open Images

My company and I recently published an in-depth tutorial on how to use the object-detection annotations and image files with the TensorFlow Object Detection API. We think it would be a great resource for newcomers to see when they look at the readme. As we discovered for ourselves, figuring out how to use this dataset with an existing object-detection framework is non-trivial and can be quite daunting for the less experienced.
Here's a link to the tutorial

can't load the model: oidv2-resnet_v1_101.ckpt.data-00000-of-00001

Hi, many thanks for the new model.

I'm trying to run the classify_oidv2 script, but the command
saver.restore(sess, 'oidv2-resnet_v1_101.ckpt.data-00000-of-00001') returns the error:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

I'm working on a Mac, CPU only.

Has the trained Inception v3 model been released?

We have trained an Inception v3 model based on Open Images annotations alone, and the model is good enough to be used for fine-tuning applications as well as for other things, like DeepDream or artistic style transfer which require a well developed hierarchy of filters.

Has this model been released yet?

MID descriptions in other languages

Hello,

Just wondering: is it possible to provide short descriptions of the MIDs in other languages?

For example, an entry in dict.csv could look like:

"/m/06z6r","swimming" -> "/m/06z6r","swimming","natación","Schwimmen"

As the Freebase API is no longer available and the Google Knowledge Graph API has a strict quota, it would be great if you could provide short descriptions in other languages. I'm currently looking for Spanish and German.

Or, if there is an easier way to get the labels in other languages, please let me know.

Spandana

MySQL won't accept 4-byte UTF-8 characters

mysql> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)

mysql> LOAD DATA LOCAL INFILE "images.csv" INTO TABLE images FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;
ERROR 1300 (HY000): Invalid utf8 character string: '"#بسبوسة الفستق '

Help me! I don't know what I should do
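A likely fix, offered as an assumption based on the error rather than anything confirmed in this thread: MySQL's legacy `utf8` charset stores at most three bytes per character, so four-byte UTF-8 characters (e.g. emoji, which appear in Flickr titles and descriptions) are rejected. Converting the table and connection to `utf8mb4` before loading usually resolves this:

```sql
ALTER TABLE images CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
SET NAMES utf8mb4;
LOAD DATA LOCAL INFILE "images.csv" INTO TABLE images
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n' IGNORE 1 ROWS;
```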
