hlydecker commented on May 25, 2024

I'm looking into existing tools for concatenating/merging COCO annotations, and I'm back to the question of how to handle image ids. One potential option would be to generate an id number for each image in a dataset upon import, and then generate a new set of id numbers when we merge datasets. On import, the annotations would look like this:

    "annotations": [
        {
            "id": "00001",
            "source_id": "00001",
            "image_id": "20160928-140314-0.jpg",
            "category_id": 0,
            "agdata_id": "deepweeds"
        },
        {
            "id": "00002",
            "source_id": "00002",
            "image_id": "20160928-140337-0.jpg",
            "category_id": 0,
            "agdata_id": "deepweeds"
        }
    ]

Then, when we merge datasets, a new id number would be generated and the source_id would be retained:

    "annotations": [
        {
            "id": "00001",
            "source_id": "00001",
            "image_id": "20160928-140314-0.jpg",
            "category_id": 0,
            "agdata_id": "deepweeds"
        },
        {
            "id": "00002",
            "source_id": "00001",
            "image_id": "20140115-35235.jpg",
            "category_id": 3,
            "agdata_id": "cwfid"
        }
    ]

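A minimal Python sketch of the renumber-on-merge idea. It assumes standard COCO top-level `images` and `annotations` lists with integer ids (the COCO format defines ids as integers, unlike the zero-padded strings in the example) and that categories are already harmonised across datasets, so they are left out. `merge_coco` is a hypothetical helper, not an existing weed-ai or COCO-Assistant function.

```python
import copy


def merge_coco(datasets):
    """Merge COCO-style dicts, assigning fresh integer image and annotation ids.

    Hypothetical helper. `datasets` is a list of (agdata_id, coco_dict) pairs.
    Each record's original id is kept in `source_id` (unless it already has
    one), and every annotation's `image_id` is rewritten to the new image id.
    """
    merged = {"images": [], "annotations": []}
    next_image_id = 1
    next_ann_id = 1
    for agdata_id, coco in datasets:
        # Old image id -> new image id, valid only within this dataset.
        image_id_map = {}
        for image in coco.get("images", []):
            new_image = copy.deepcopy(image)
            new_image["source_id"] = image.get("source_id", image["id"])
            new_image["id"] = next_image_id
            new_image.setdefault("agdata_id", agdata_id)
            image_id_map[image["id"]] = next_image_id
            merged["images"].append(new_image)
            next_image_id += 1
        for ann in coco.get("annotations", []):
            new_ann = copy.deepcopy(ann)
            new_ann["source_id"] = ann.get("source_id", ann["id"])
            new_ann["id"] = next_ann_id
            # Follow the old pointer to the image's new id.
            new_ann["image_id"] = image_id_map[ann["image_id"]]
            new_ann.setdefault("agdata_id", agdata_id)
            merged["annotations"].append(new_ann)
            next_ann_id += 1
    return merged


# Hypothetical one-image inputs modelled on the deepweeds/cwfid example above.
deepweeds = {
    "images": [{"id": 1, "file_name": "20160928-140314-0.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 0}],
}
cwfid = {
    "images": [{"id": 1, "file_name": "20140115-35235.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 3}],
}
merged = merge_coco([("deepweeds", deepweeds), ("cwfid", cwfid)])
print(merged["annotations"][1])
# {'id': 2, 'image_id': 2, 'category_id': 3, 'source_id': 1, 'agdata_id': 'cwfid'}
```

Because `source_id` and `agdata_id` are only set when missing, merging an already-merged dataset keeps pointing at the original source rather than at the intermediate merge.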
Another idea would be to generate ids for each image by combining the dataset name with a number (e.g. deepweeds1, cwfid2003).
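A minimal Python sketch of the name-plus-number scheme, assuming a 1-based counter per dataset over its image file names. `make_prefixed_ids` and its `sep` parameter are hypothetical; the default `sep=""` reproduces ids like `deepweeds1`.

```python
def make_prefixed_ids(agdata_id, file_names, sep=""):
    """Map each image file name to an id of the form <dataset><sep><n>.

    Hypothetical helper. `n` is a 1-based counter within one dataset. With
    sep="" ids can collide across datasets whose names end in digits, so a
    separator that never appears in dataset names is safer.
    """
    return {
        name: f"{agdata_id}{sep}{n}"
        for n, name in enumerate(file_names, start=1)
    }


ids = make_prefixed_ids("deepweeds", ["20160928-140314-0.jpg", "20160928-140337-0.jpg"])
print(ids)
# {'20160928-140314-0.jpg': 'deepweeds1', '20160928-140337-0.jpg': 'deepweeds2'}
```

Without a separator the scheme is ambiguous: image 1 of a dataset named `a1` and image 11 of a dataset named `a` both become `a11`. Passing something like `sep=":"` (giving `deepweeds:1`) avoids that. These ids are also strings, whereas the COCO format specifies integer ids, so some COCO tooling may not accept them.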

We could stick with the "unique" ids that camtraps uses, but there is still an extremely small chance of those ids overlapping.

from weed-ai.

hlydecker commented on May 25, 2024

Here is COCO-Assistant's merging function: https://github.com/ashnair1/COCO-Assistant/blob/master/coco_assistant/coco_assistant.py

Edit: another option, this one for merging individual per-image JSON annotations into a single MS COCO JSON file: https://github.com/fcakyon/labelme2coco/blob/master/labelme2coco/labelme2coco.py

jnothman commented on May 25, 2024

Good to know about coco_assistant and pycocotools. They will be useful for testing the compatibility of our format with existing tools.

I'll admit that, as the author of coco_assistant implies, the COCO format is not so friendly to work with: you need to read the whole thing in to decode any part of it, dereferencing those id pointers, etc. It's designed like a normalised relational database rather than a document store (as used in Pascal VOC, for example), which keeps the information about each item localised and allows for easy concatenation of datasets.
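To make the dereferencing concrete, here is a minimal Python sketch that turns a COCO dict into one self-contained document per image, the layout a document store would use. It assumes the standard `images`, `annotations` and `categories` lists and that every `category_id` appears in `categories`; `coco_to_documents` and the category name `weed` are made up for illustration.

```python
from collections import defaultdict


def coco_to_documents(coco):
    """Resolve COCO id references into one self-contained record per image."""
    # Build lookup tables first: annotations only hold ids, so nothing can be
    # resolved until the whole file is in memory.
    category_names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    anns_by_image = defaultdict(list)
    for ann in coco.get("annotations", []):
        record = dict(ann)
        # Replace the category pointer with the value it points to.
        record["category"] = category_names[ann["category_id"]]
        anns_by_image[ann["image_id"]].append(record)

    documents = []
    for image in coco.get("images", []):
        doc = dict(image)
        # Images without annotations get an empty list.
        doc["annotations"] = anns_by_image.get(image["id"], [])
        documents.append(doc)
    return documents


coco = {
    "images": [{"id": 1, "file_name": "20160928-140314-0.jpg"}],
    "categories": [{"id": 0, "name": "weed"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 0}],
}
docs = coco_to_documents(coco)
print(docs[0]["annotations"][0]["category"])  # weed
```

Once in this form, concatenating two datasets is plain list concatenation (`docs_a + docs_b`); the trade-off is that category names are now repeated in every annotation.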

And maybe we should have considered VOC more seriously: apart from merging datasets, we will need to convert whatever format we have into document-oriented storage so that standard faceted search tools can work with it. Similarly, it's easier to turn COCO into a relational database and VOC into a dataframe, because VOC stores information redundantly while COCO stores it by reference.
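Going the other way, VOC's redundancy is what makes the dataframe conversion easy. Below is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`: it flattens one Pascal VOC annotation file into one row per `<object>`, repeating the image-level fields on every row. It assumes the usual VOC elements (`filename`, `size/width`, `size/height`, `object/name`, `object/bndbox`); `voc_to_rows` and the sample XML are illustrative only.

```python
import xml.etree.ElementTree as ET


def voc_to_rows(xml_text):
    """Flatten one Pascal VOC annotation into one dict per object."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    # Image-level fields, copied onto every object row (the redundancy).
    image_fields = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
    }
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        row = dict(image_fields)
        row["name"] = obj.findtext("name")
        for key in ("xmin", "ymin", "xmax", "ymax"):
            row[key] = float(box.findtext(key))
        rows.append(row)
    return rows


sample = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>weed</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>crop</name>
    <bndbox><xmin>300</xmin><ymin>40</ymin><xmax>420</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>"""
rows = voc_to_rows(sample)
for row in rows:
    print(row["filename"], row["name"], row["xmin"], row["ymax"])
```

With pandas installed, `pandas.DataFrame(rows)` gives one row per object; the repeated `filename`, `width` and `height` columns hold exactly the information COCO would store once and reference by id.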

hlydecker commented on May 25, 2024

Would it be too late for us to change gears to VOC? When we originally decided on COCO, I think a large part of its appeal was its widespread popularity and its JSON format. I think I also thought the idea of one annotation for each dataset would be good. Now I'm not so sure...

jnothman commented on May 25, 2024

> I think I also thought the idea of one annotation for each dataset would be good. Now I'm not so sure...

Not sure what you mean by that. Let's talk about it tomorrow. I think VOC may have presented some other limitations (and I think it has become less popular?). You could look into the tooling around it...

hlydecker commented on May 25, 2024

I've been looking into the COCO-Assistant dataset utility functions for some inspiration: https://github.com/ashnair1/COCO-Assistant/blob/master/coco_assistant/coco_assistant.py

One of the key features of COCO-Assistant is that it relies on a specific directory structure being adhered to.
