Add a tool to concatenate datasets about weed-ai HOT 6 CLOSED

weed-ai commented on May 25, 2024

Add a tool to concatenate datasets

from weed-ai.

Comments (6)

hlydecker commented on May 25, 2024

I'm looking into existing tools for concatenating/merging COCO annotations, and I am back to the question of how to handle image ids. One option would be to generate an id number for each image in a dataset upon import, and then generate a new set of id numbers when we merge datasets. On import, the annotations might look like this:

    "annotations": [
        {
            "id": "00001",
            "source_id": "00001",
            "image_id": "20160928-140314-0.jpg",
            "category_id": 0,
            "agdata_id": "deepweeds"
        },
        {
            "id": "00002",
            "source_id": "00002",
            "image_id": "20160928-140337-0.jpg",
            "category_id": 0,
            "agdata_id": "deepweeds"
        }
    ]

Then when we merge datasets, a new id number would be generated and the source_id would be retained:

    "annotations": [
        {
            "id": "00001",
            "source_id": "00001",
            "image_id": "20160928-140314-0.jpg",
            "category_id": 0,
            "agdata_id": "deepweeds"
        },
        {
            "id": "00002",
            "source_id": "00001",
            "image_id": "20140115-35235.jpg",
            "category_id": 3,
            "agdata_id": "cwfid"
        }
    ]

Another idea would be to generate ids for each image by combining the dataset name with a number (e.g. deepweeds1, cwfid2003).
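
To make the dataset-name idea concrete, here is a minimal sketch. It assumes each dataset is a COCO-style dict with "images" and "annotations" lists; the function names and the ":" separator are hypothetical and not taken from any existing tool. The separator is there so that, for example, dataset "ds1" with image 23 cannot collide with dataset "ds12" with image 3.

```python
def namespaced_id(dataset_name: str, local_id) -> str:
    """Combine a dataset name and a local id into one string id.

    The ":" separator is an assumption made for this sketch.
    """
    return f"{dataset_name}:{local_id}"


def namespace_dataset(coco: dict, dataset_name: str) -> dict:
    """Return a copy of a COCO-style dict whose image ids carry the dataset name.

    The input dict is not modified.
    """
    images = [
        {**img, "id": namespaced_id(dataset_name, img["id"])}
        for img in coco.get("images", [])
    ]
    # Annotations must point at the renamed images.
    annotations = [
        {**ann, "image_id": namespaced_id(dataset_name, ann["image_id"])}
        for ann in coco.get("annotations", [])
    ]
    return {**coco, "images": images, "annotations": annotations}
```

Ids built this way are unique as long as dataset names are unique, but they are strings, so tools that expect integer ids may not accept them.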

We could stick with the "unique" ids that camtraps uses, but there is still an extremely small chance of collisions between those ids.
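
For comparison, the renumbering scheme from the start of this comment could be sketched roughly as follows. This is only an illustration under some assumptions: datasets are COCO-style dicts with integer ids (the JSON above uses zero-padded strings and file names instead), merge_datasets is a hypothetical name, and categories are not reconciled between datasets.

```python
from itertools import count


def merge_datasets(datasets: dict) -> dict:
    """Merge COCO-style dicts, given as {dataset_name: coco_dict}.

    Every image and annotation gets a fresh sequential id. The original id
    is kept in "source_id" and the dataset name in "agdata_id".
    Categories are deliberately left alone in this sketch.
    """
    next_image_id = count(1)
    next_annotation_id = count(1)
    merged = {"images": [], "annotations": []}

    for name, coco in datasets.items():
        # Per-dataset map from the old image id to the newly assigned one.
        new_image_id = {}
        for img in coco.get("images", []):
            new_id = next(next_image_id)
            new_image_id[img["id"]] = new_id
            merged["images"].append(
                {**img, "id": new_id, "source_id": img["id"], "agdata_id": name}
            )
        for ann in coco.get("annotations", []):
            merged["annotations"].append(
                {
                    **ann,
                    "id": next(next_annotation_id),
                    "source_id": ann["id"],
                    "image_id": new_image_id[ann["image_id"]],
                    "agdata_id": name,
                }
            )
    return merged
```

Because new ids are assigned from a single counter, collisions are impossible within the merged file, and the (agdata_id, source_id) pair still identifies the original record.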

hlydecker commented on May 25, 2024

Here is COCO-assistant's merging function: https://github.com/ashnair1/COCO-Assistant/blob/master/coco_assistant/coco_assistant.py

Edit: And another option for merging individual image JSON annotations into one MSCOCO JSON: https://github.com/fcakyon/labelme2coco/blob/master/labelme2coco/labelme2coco.py

jnothman commented on May 25, 2024

Good to know about coco_assistant and pycocotools. They will be useful to test the compatibility of our format with existing tools.

I will admit that, as the author of coco_assistant implies, the COCO format is not so friendly to work with: you need to read the whole file in to decode any part of it, by dereferencing those id pointers and so on. It's designed like a normalised relational database rather than a document store (as used in Pascal VOC, for example). A document store keeps information about each item localised, and allows for easy concatenation of datasets.

And maybe we should have considered VOC more seriously: apart from merging datasets, we will need to convert whatever format we have to document-oriented storage for standard faceted search tools to work with it. Similarly, it's easier to turn COCO into a relational database and VOC into a dataframe, because VOC stores information redundantly and COCO stores it through reference.
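
As a rough illustration of what "dereferencing the id pointers" involves, the sketch below turns a COCO-style dict into one self-contained document per image. It assumes the usual "images", "annotations" and "categories" lists; coco_to_documents is a hypothetical name, not a function from pycocotools or coco_assistant.

```python
from collections import defaultdict


def coco_to_documents(coco: dict) -> list:
    """Denormalise a COCO-style dict into one document per image.

    Each document embeds its annotations, and each annotation embeds the
    full category record instead of only pointing to it by id.
    """
    categories = {cat["id"]: cat for cat in coco.get("categories", [])}

    # Group annotations by the image they point to.
    annotations_by_image = defaultdict(list)
    for ann in coco.get("annotations", []):
        embedded = {k: v for k, v in ann.items() if k != "image_id"}
        embedded["category"] = categories.get(ann.get("category_id"))
        annotations_by_image[ann["image_id"]].append(embedded)

    # Images without annotations still get a document, with an empty list.
    return [
        {**img, "annotations": annotations_by_image.get(img["id"], [])}
        for img in coco.get("images", [])
    ]
```

Documents in this shape can be concatenated across datasets by simply joining the lists, at the cost of repeating the category record in every annotation.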

hlydecker commented on May 25, 2024

Would it be too late for us to change gears to VOC? When we originally decided on COCO I think a large part of its appeal was its widespread popularity and the JSON format. I think I also thought the idea of one annotation for each dataset would be good. Now I'm not so sure...

jnothman commented on May 25, 2024

I think I also thought the idea of one annotation for each dataset would be good. Now I'm not so sure...

Not sure what you mean by that. Let's talk about it tomorrow. I think VOC may have presented some other limitations (and it's become less popular I think?). You could look into the tooling around it...

hlydecker commented on May 25, 2024

I've been looking into the COCO-Assistant dataset utility functions for some inspiration: https://github.com/ashnair1/COCO-Assistant/blob/master/coco_assistant/coco_assistant.py

One key feature of COCO-Assistant is that it expects the data to follow a specific directory structure.
