GETi Version - 1.0.1
SDK version - 1.2.0
Task: I am trying to create a detection project from a COCO dataset using geti-sdk. The images upload successfully, but the following error is raised while the annotations are being uploaded:
2023-03-07 20:26:35,383 - INFO - Project created successfully.
2023-03-07 20:26:45,543 - INFO - Starting image upload...
Uploading images: 100% 12/12 [00:05<00:00, 3.48it/s]
2023-03-07 20:26:50,824 - INFO - Upload complete. Uploaded 12 new images in 5.3 seconds.
2023-03-07 20:26:50,826 - INFO - Annotations have been converted to boxes
2023-03-07 20:26:50,827 - INFO - Dataset is prepared for detection task.
2023-03-07 20:26:50,828 - INFO - Starting image annotation upload...
Uploading image annotations: 0% 0/12 [00:00<?, ?it/s]
ValueError Traceback (most recent call last)
Cell In[35], line 1
----> 1 project = client.create_single_task_project_from_dataset(
2 project_name="m1",
3 project_type="detection",
4 path_to_images=dataset_path,
5 annotation_reader=annotation_reader,
6 # number_of_images_to_upload=100,
7 # number_of_images_to_annotate=90,
8 enable_auto_train=False,
9 )
File ~/venvs/geti_env/lib/python3.8/site-packages/geti_sdk/geti.py:638, in Geti.create_single_task_project_from_dataset(self, project_name, project_type, path_to_images, annotation_reader, labels, number_of_images_to_upload, number_of_images_to_annotate, enable_auto_train, upload_videos)
631 # Upload annotations
632 annotation_client = AnnotationClient(
633 session=self.session,
634 project=project,
635 workspace_id=self.workspace_id,
636 annotation_reader=annotation_reader,
637 )
--> 638 annotation_client.upload_annotations_for_images(images)
640 if len(videos) > 0:
641 annotation_client.upload_annotations_for_videos(videos)
File ~/venvs/geti_env/lib/python3.8/site-packages/geti_sdk/rest_clients/annotation_clients/annotation_client.py:148, in AnnotationClient.upload_annotations_for_images(self, images, append_annotations)
146 for image in tqdm(images, desc=tqdm_prefix):
147 if not append_annotations:
--> 148 response = self._upload_annotation_for_2d_media_item(
149 media_item=image
150 )
151 else:
152 response = self._append_annotation_for_2d_media_item(
153 media_item=image
154 )
File ~/venvs/geti_env/lib/python3.8/site-packages/geti_sdk/rest_clients/annotation_clients/base_annotation_client.py:169, in BaseAnnotationClient._upload_annotation_for_2d_media_item(self, media_item, annotation_scene)
167 else:
168 if self.annotation_reader is not None:
--> 169 scene_to_upload = self._read_2d_media_annotation_from_source(
170 media_item=media_item
171 )
172 else:
173 raise ValueError(
174 "You attempted to upload an annotation for a media item, but no "
175 "annotation data was passed directly and no annotation reader was "
176 "defined for the AnnotationClient. Therefore, the "
177 "AnnotationClient is unable to upload any annotation data."
178 )
File ~/venvs/geti_env/lib/python3.8/site-packages/geti_sdk/rest_clients/annotation_clients/base_annotation_client.py:306, in BaseAnnotationClient._read_2d_media_annotation_from_source(self, media_item, preserve_shape_for_global_labels)
293 def _read_2d_media_annotation_from_source(
294 self,
295 media_item: Union[Image, VideoFrame],
296 preserve_shape_for_global_labels: bool = False,
297 ) -> AnnotationScene:
298 """
299 Retrieve the annotation for the media_item, and return it in the
300 proper format to be sent to the GETi /annotations endpoint. This method uses the
(...)
304 :return: Dictionary containing the annotation, in GETi format
305 """
--> 306 annotation_list = self.annotation_reader.get_data(
307 filename=media_item.name,
308 label_name_to_id_mapping=self.label_mapping,
309 media_information=media_item.media_information,
310 preserve_shape_for_global_labels=preserve_shape_for_global_labels,
311 )
312 return AnnotationRESTConverter.from_dict(
313 {
314 "media_identifier": media_item.identifier,
(...)
317 }
318 )
File ~/venvs/geti_env/lib/python3.8/site-packages/geti_sdk/annotation_readers/datumaro_annotation_reader/datumaro_annotation_reader.py:174, in DatumAnnotationReader.get_data(self, filename, label_name_to_id_mapping, media_information, preserve_shape_for_global_labels)
148 def get_data(
149 self,
150 filename: str,
(...)
153 preserve_shape_for_global_labels: bool = False,
154 ) -> List[SCAnnotation]:
155 """
156 Return the annotation data for the dataset item corresponding to filename.
157
(...)
172 dataset item.
173 """
--> 174 ds_item = self.dataset.get_item_by_id(filename)
175 image_size = ds_item.image.size
176 annotation_list: List[SCAnnotation] = []
File ~/venvs/geti_env/lib/python3.8/site-packages/geti_sdk/annotation_readers/datumaro_annotation_reader/datumaro_dataset.py:220, in DatumaroDataset.get_item_by_id(self, datum_id)
218 ds_item = self.dataset.get(id=datum_id, subset=subset_name)
219 if ds_item is None:
--> 220 raise ValueError(
221 f"Dataset item with id {datum_id} was not found in the dataset!"
222 )
223 return ds_item
ValueError: Dataset item with id img1364 was not found in the dataset!
Note: When I use the "Create Project from Dataset" utility in the GETi UI, the same dataset works fine.
The dataset has a nested structure. I think the image names (the "file_name" key in the COCO JSON) are not handled correctly for nested paths, which is why reading the annotations for them fails.
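To illustrate the suspected mismatch, here is a minimal sketch using only the standard library. It assumes (I have not confirmed this in the SDK or Datumaro source) that the dataset item ID is derived from "file_name" with the extension removed but the sub-folder kept, while the uploaded image is looked up by its bare name. The two helper functions are hypothetical and only model that assumption.

```python
from pathlib import PurePosixPath


def id_from_file_name(file_name: str) -> str:
    """Assumed dataset ID: the COCO file_name without extension, folders kept."""
    return PurePosixPath(file_name).with_suffix("").as_posix()


def name_from_upload(file_name: str) -> str:
    """Assumed uploaded image name: the bare file name without extension."""
    return PurePosixPath(file_name).stem


coco_file_name = "train/img1364.jpg"  # value taken from the COCO JSON entry
dataset_id = id_from_file_name(coco_file_name)
uploaded_name = name_from_upload(coco_file_name)

# Under these assumptions the lookup key and the stored ID differ,
# which would explain "Dataset item with id img1364 was not found".
print(dataset_id, uploaded_name, dataset_id == uploaded_name)
```

If this assumption holds, a lookup by "img1364" cannot match an item stored as "train/img1364".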
Dataset structure:
├── annotations
│   └── instances_default.json
└── images
    └── default
        ├── images
        │   ├── img0009.jpg
        │   ├── img0013.jpg
        │   ├── img0040.jpg
        │   ├── img0051.jpg
        │   ├── img0080.jpg
        │   ├── img0093.jpg
        │   ├── img0114.jpg
        │   ├── img0127.jpg
        │   ├── img0151.jpg
        │   ├── img0178.jpg
        │   ├── img0194.jpg
        │   ├── img0200.jpg
        │   ├── img0208.jpg
        │   ├── img0229.jpg
        │   └── img0234.jpg
        └── train
            ├── img0896.jpg
            ├── img0927.jpg
            ├── img0940.jpg
            ├── img0959.jpg
            ├── img1317.jpg
            ├── img1338.jpg
            ├── img1364.jpg
            ├── img1368.jpg
            ├── img1409.jpg
            ├── img1424.jpg
            ├── img1995.jpg
            ├── img2030.jpg
            ├── img2100.jpg
            ├── img3648.jpg
            └── img3831.jpg
And this is how the COCO JSON looks for the "datum_id" mentioned in the error. Here you can see that the "file_name" key contains a nested path string:
....
"images": [
    {
        "id": 225,
        "width": 1920,
        "height": 1080,
        "file_name": "train/img1364.jpg",
        "license": 0,
        "flickr_url": "",
        "coco_url": "",
        "date_captured": 0
    },
......
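One possible workaround, sketched below and not tested against geti-sdk, is to strip the folder part from every "file_name" before the annotation file is read. The function name flatten_coco_file_names is hypothetical, and the sketch only touches the JSON: the image files may also need to sit in a single folder for the reader to find them. It refuses to continue if two nested files would end up with the same flat name.

```python
import copy
from pathlib import PurePosixPath


def flatten_coco_file_names(coco: dict) -> dict:
    """Return a copy of a COCO dict whose image file_name values have no folders."""
    flattened = copy.deepcopy(coco)
    seen = {}  # flat name -> original nested file_name
    for image in flattened.get("images", []):
        original = image["file_name"]
        base_name = PurePosixPath(original).name
        if base_name in seen:
            # Two files in different folders share a name; flattening would be ambiguous.
            raise ValueError(
                f"{original!r} and {seen[base_name]!r} would both become {base_name!r}"
            )
        seen[base_name] = original
        image["file_name"] = base_name
    return flattened


# In-memory example built from the entry shown in this report.
coco = {
    "images": [
        {"id": 225, "width": 1920, "height": 1080, "file_name": "train/img1364.jpg"}
    ]
}
flat = flatten_coco_file_names(coco)
print(flat["images"][0]["file_name"])
```

To apply this to annotations/instances_default.json, the file could be loaded with json.load, passed through the function, and written to a new file with json.dump so the original stays untouched.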