Comments (6)
I'm looking into existing tools for concatenating/merging COCO annotations, and I am back to the question of how to handle image ids. One option would be to generate an id number for each image in a dataset upon import, and then generate a new set of id numbers when we merge datasets. On import, the annotations would look like this:
"annotations": [
{
"id": "00001",
"source_id": "00001",
"image_id": "20160928-140314-0.jpg",
"category_id": 0,
"agdata_id": "deepweeds"
},
{
"id": "00002",
"source_id": "00002",
"image_id": "20160928-140337-0.jpg",
"category_id": 0,
"agdata_id": "deepweeds"
}
]
Then when we merge datasets, a new id number would be generated and the source_id would be retained:
"annotations": [
{
"id": "00001",
"source_id": "00001",
"image_id": "20160928-140314-0.jpg",
"category_id": 0,
"agdata_id": "deepweeds"
},
{
"id": "00002",
"source_id": "00001",
"image_id": "20140115-35235.jpg",
"category_id": 3,
"agdata_id": "cwfid"
}
]
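A minimal sketch of the merge step described above, assuming each dataset is a dict with an "annotations" list shaped like the examples. The helper name merge_annotations and the zero-padded five-digit id format are assumptions for illustration and do not come from any existing tool.

```python
from itertools import count


def merge_annotations(datasets):
    """Concatenate annotation lists, assigning new ids and keeping source_id.

    Hypothetical helper: `datasets` is an iterable of dicts, each holding an
    "annotations" list. The five-digit id format is an assumption.
    """
    new_ids = count(1)
    merged = []
    for dataset in datasets:
        for ann in dataset["annotations"]:
            merged_ann = dict(ann)  # copy so the input dataset is not modified
            # Record the id from the original dataset if it is not stored yet.
            merged_ann.setdefault("source_id", ann["id"])
            # Replace the id with one that is unique across the merged set.
            merged_ann["id"] = f"{next(new_ids):05d}"
            merged.append(merged_ann)
    return {"annotations": merged}


deepweeds = {"annotations": [{
    "id": "00001", "source_id": "00001",
    "image_id": "20160928-140314-0.jpg",
    "category_id": 0, "agdata_id": "deepweeds",
}]}
cwfid = {"annotations": [{
    "id": "00001", "source_id": "00001",
    "image_id": "20140115-35235.jpg",
    "category_id": 3, "agdata_id": "cwfid",
}]}

merged = merge_annotations([deepweeds, cwfid])
```

Here the cwfid annotation ends up with the new id "00002" while its source_id stays "00001", matching the merged example above.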
Another idea would be to generate ids for each image by combining the dataset name with a number (e.g. deepweeds1, cwfid2003).
We could stick with the "unique" ids that camtraps uses, but there is still an extremely small chance of overlap between those ids.
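As a rough sketch of that naming scheme, the function names and the optional separator below are assumptions for illustration only:

```python
def namespaced_id(dataset_name, number, sep=""):
    """Build an id such as "deepweeds1" from a dataset name and a number."""
    return f"{dataset_name}{sep}{number}"


def assign_namespaced_ids(image_names, dataset_name, sep=""):
    """Map each image file name to a dataset-prefixed id, numbering from 1."""
    return {
        name: namespaced_id(dataset_name, i, sep)
        for i, name in enumerate(image_names, start=1)
    }


ids = assign_namespaced_ids(
    ["20160928-140314-0.jpg", "20160928-140337-0.jpg"], "deepweeds"
)
```

If dataset names can themselves end in digits, passing a separator such as "-" avoids ambiguous ids (for instance, "set1" plus 23 and "set12" plus 3 would otherwise both give "set123").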
from weed-ai.
Here is COCO-assistant's merging function: https://github.com/ashnair1/COCO-Assistant/blob/master/coco_assistant/coco_assistant.py
Edit: And another option for merging individual image JSON annotations into one MSCOCO JSON: https://github.com/fcakyon/labelme2coco/blob/master/labelme2coco/labelme2coco.py
from weed-ai.
Good to know about coco_assistant and pycocotools. They will be useful to test the compatibility of our format with existing tools.
I will admit that, as the author of coco_assistant implies, the COCO format is not so friendly to work with, since you need to read the whole thing in to decode any part of it by dereferencing those id pointers. It's designed like a normalised relational database rather than a document store (the approach used in Pascal VOC, for example). A document store keeps information about each item localised and allows for easy concatenation of datasets.
And maybe we should have considered VOC more seriously: apart from merging datasets, we will need to convert whatever format we have to document-oriented storage for standard faceted search tools to work with it. Similarly, it's easier to turn COCO into a relational database and VOC into a dataframe, because VOC stores information redundantly and COCO stores it by reference.
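To make the "dereferencing" point concrete, here is a small sketch that resolves COCO's id references into one self-contained record per image, which is roughly the document-oriented shape a search index would want. It assumes the standard COCO keys (images, categories, annotations); the function name, the file names and the category names are made up for illustration.

```python
def coco_to_documents(coco):
    """Turn a COCO-style dict into one self-contained record per image."""
    # Lookup table: category id -> category name.
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    # One document per image, keyed by image id for the join below.
    documents = {
        img["id"]: {"file_name": img["file_name"], "annotations": []}
        for img in coco["images"]
    }
    # Attach each annotation to its image, replacing ids with actual values.
    for ann in coco["annotations"]:
        documents[ann["image_id"]]["annotations"].append({
            "category": categories[ann["category_id"]],
            "bbox": ann.get("bbox"),
        })
    return list(documents.values())


# Illustrative data only; names and values are hypothetical.
coco = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 0, "name": "crop"}, {"id": 1, "name": "weed"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [5, 5, 20, 30]},
    ],
}

docs = coco_to_documents(coco)
```

Each resulting record repeats the category name instead of pointing to it, so records from different datasets can be concatenated without reconciling ids.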
from weed-ai.
Would it be too late for us to change gears to VOC? When we originally decided on COCO I think a large part of its appeal was its widespread popularity and the JSON format. I think I also thought the idea of one annotation for each dataset would be good. Now I'm not so sure...
from weed-ai.
"I think I also thought the idea of one annotation for each dataset would be good. Now I'm not so sure..."
Not sure what you mean by that. Let's talk about it tomorrow. I think VOC may have presented some other limitations (and it's become less popular I think?). You could look into the tooling around it...
from weed-ai.
I've been looking into the COCO-Assistant dataset utility functions for some inspiration: https://github.com/ashnair1/COCO-Assistant/blob/master/coco_assistant/coco_assistant.py
One key feature of COCO-Assistant is that it expects the data to follow a specific directory structure.
from weed-ai.