
segments-ai's Introduction




Segments.ai is the training data platform for computer vision engineers and labeling teams. Our powerful labeling interfaces, easy-to-use management features, and extensive API integrations help you iterate quickly between data labeling, model training and failure case discovery.

Quickstart

Walk through the Python SDK quickstart.

Documentation

Please refer to the documentation for usage instructions.

Blog

Read our blog posts to learn more about the platform.

Changelog

The most notable changes in v1.0 of the Python SDK compared to v0.73 include:

  • Added Python type hints and better auto-generated docs.
  • Improved error handling: functions now raise proper exceptions.
  • New functions for managing issues and collaborators.

You can upgrade to v1.0 with pip install --upgrade segments-ai. Please be mindful of the following breaking changes:

  • The client functions now return classes instead of dicts, so you should access properties using dot notation (e.g. dataset.description) instead of dict-based indexing (e.g. dataset['description']).
  • Functions now consistently raise exceptions, instead of sometimes silently failing with a print statement. You might want to handle these exceptions with a try-except block (see the sketch after this list).
  • Some legacy fields are no longer returned: dataset.tasks, dataset.task_readme, dataset.data_type.
  • The default value of the id_increment argument in utils.export_dataset() and utils.get_semantic_bitmap() is changed from 1 to 0.
  • Python 3.6 and lower are no longer supported.
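
A minimal sketch of both changes, assuming the exception classes visible in the tracebacks below (NotFoundError and friends) are importable from segments.exceptions, and using a hypothetical dataset name:

from segments import SegmentsClient
from segments.exceptions import NotFoundError

client = SegmentsClient("YOUR_API_KEY")

try:
    dataset = client.get_dataset("jane/flowers")  # hypothetical dataset
    print(dataset.description)  # v1.0: dot notation instead of dict indexing
except NotFoundError:
    # v1.0 raises instead of printing an error and returning None
    print("Dataset does not exist.")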

segments-ai's People

Contributors

arnaudhillen, davyneven, dbbert, jclaessens97, segments-arnaud, segments-tobias


segments-ai's Issues

access labeler for a given sample or annotation

Hi, I'm looking for a way to know the user that submitted a particular annotation or sample. I tried figuring this out from the examples in the API docs, but I couldn't quite find the information I'm looking for. Is there a way to obtain a username or userid for an annotation through the API?
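
A hedged sketch of what this might look like, assuming the Label object returned by get_label exposes a created_by field (worth confirming against the API reference before relying on it):

# Assumption: the Label model carries the submitting user in created_by.
label = client.get_label(sample_uuid, labelset='ground-truth')
print(label.created_by)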

KeyError: 'labelsets' when creating SegmentsDataset

Hi,
currently, if I initialize a dataset with the code provided in the Google Colab, I get an error. Yesterday the same code was working, and a while ago the field was called "task" instead of "labelset"; has it changed to "labelset" permanently?

# Initialize a dataset from the release file
release = client.get_release(dataset_name, 'v0.1')
dataset = SegmentsDataset(release, labelset='ground-truth', filter_by='labeled')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in <module>()
      4 # Initialize a dataset from the release file
      5 release = client.get_release(dataset_name, 'v0.1')
----> 6 dataset = SegmentsDataset(release, labelset='ground-truth', filter_by='labeled')
      7
      8 # Visualize a few samples in the dataset

/usr/local/lib/python3.6/dist-packages/segments/dataset.py in __init__(self, release_file, labelset, filter_by, segments_dir)
     43
     44         # First some checks
---> 45         if not self.labelset in self.release['dataset']['labelsets']:
     46             print('There is no labelset with name "{}".'.format(self.labelset))
     47             return

KeyError: 'labelsets'

PIL.UnidentifiedImageError when downloading dataset with missing image

Because there is currently no transactional protection between metadata and image data, it's possible to get a corrupted dataset where the metadata has an image path and marks a sample as annotated & reviewed, but there is no actual image. When you try to download such a dataset you get a PIL.UnidentifiedImageError.

The attached modified dataset.py has the minimal changes needed to download such a dataset. Of course this solution isn't fully satisfactory, because the client still has to deal with corrupt data (in this case missing images); the proper solution would be to prevent such corruption in the first place. Note that even then, some changes to dataset.py's error handling are still useful, because the code that sets dummy metadata in case of an error didn't work (it was in the wrong branch).
Note that the changes also include the change for issue #68
dataset.zip
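
A minimal sketch of the kind of guard involved (illustrative only; the attached dataset.zip contains the actual changes):

from PIL import Image, UnidentifiedImageError

def safe_load_image(path):
    # Skip samples whose image file is missing or unreadable
    # instead of crashing the whole download.
    try:
        return Image.open(path)
    except (FileNotFoundError, UnidentifiedImageError):
        print('Skipping corrupt or missing image: {}'.format(path))
        return None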

export_dataset error

Hey, I'm getting an error when exporting a dataset.

~/.local/share/virtualenvs/segments.ai-ds-dl-cuTqPwYt/lib/python3.6/site-packages/segments/export.py in export_coco_panoptic(dataset, export_folder)
    301             instance_mask = np.array(sample['segmentation_bitmap'], np.uint32) == instance['id']
    302             panoptic_label[instance_mask] = color
--> 303             y0, x0, y1, x1 = get_bbox(instance_mask)
    304             rle = mask.encode(np.array(instance_mask[:,:,None], dtype=np.uint8, order='F'))[0] # https://github.com/matterport/Mask_RCNN/issues/387#issuecomment-522671380
    305             area = int(mask.area(rle))

TypeError: 'bool' object is not iterable

I checked, and the function get_bbox can return False, which cannot be unpacked.
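A hedged sketch of a guard for line 303, assuming get_bbox returns False for empty masks:

bbox = get_bbox(instance_mask)
if bbox is False:
    continue  # illustrative handling: skip instances with no pixels
y0, x0, y1, x1 = bbox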

Also, with the 'semantic' option I seem to get colored PNGs, but according to https://docs.segments.ai/export they should be grayscale.

no json file for huggingface dataset

Hi,
I am using this tool to generate a custom dataset. However, after creating a release I cannot find the corresponding "id2label.json" file. Is it possible to generate it automatically? Thanks.
Example:
{"0": "unlabeled", "1": "flat-road"}

Error in add_sample

Hello,

I created a new dataset:

name = "Test"
description = "Test."
task_type = "bboxes"

task_attributes = dataset.release['dataset']['task_attributes'] # copied from another dataset

dataset_test = client.add_dataset(name, description, task_type, task_attributes)
print(dataset_test)

I'm trying to add new samples:

samples = dataset.release["dataset"]["samples"]
for sample in samples:
  attributes = sample['attributes']
  print(attributes)
  name = sample['name']
  sample = client.add_sample(dataset_test, name, attributes)

I'm getting this cryptic error from the API (are details missing?):

https://segmentsai-prod.s3.eu-west-2.amazonaws.com/assets/CamilleDP/4a05290d-f0db-4303-aa9b-5d397c43ce0d.jpg
{'ground-truth': {'label_status': 'LABELED', 'attributes': {'format_version': '0.1', 'annotations': [{'id': 1, 'category_id': 2, 'type': 'bbox', 'points': [[6086.1, 2973.29], [6353.32, 3225.53]]}, {'id': 2, 'category_id': 2, 'type': 'bbox', 'points': [[1337.66, 2158.71], [1546.79, 2367.41]]}]}}}
{'image': {'url': 'https://segmentsai-prod.s3.eu-west-2.amazonaws.com/assets/CamilleDP/4a05290d-f0db-4303-aa9b-5d397c43ce0d.jpg'}}
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/segments/client.py in throw_segments_exception(model, *args, **kwargs)
    103             r = f(*args, **kwargs)
--> 104             r.raise_for_status()
    105             if r.content:

3 frames
/usr/local/lib/python3.10/dist-packages/requests/models.py in raise_for_status(self)
   1020         if http_error_msg:
-> 1021             raise HTTPError(http_error_msg, response=self)
   1022 

HTTPError: 404 Client Error: Not Found for url: https://api.segments.ai/datasets/name='Test'%20full_name='sfoucher/Test'%20cloned_from=None%20description='Test.'%20category='other'%20public=False%20owner=Owner(username='sfoucher',%20created_at='2022-04-20T19:15:19Z',%20email=None)%20created_at='2024-04-19T16:24:21.105013Z'%20enable_ratings=False%20enable_skip_labeling=True%20enable_skip_reviewing=False%20enable_save_button=False%20enable_label_status_verified=False%20enable_same_dimensions_track_constraint=False%20enable_interpolation=True%20use_timestamps_for_interpolation=True%20task_type='bboxes'%20label_stats=LabelStats(REVIEWED=None,%20REVIEWING_IN_PROGRESS=None,%20LABELED=None,%20LABELING_IN_PROGRESS=None,%20REJECTED=None,%20PRELABELED=None,%20SKIPPED=None,%20VERIFIED=None,%20UNLABELED=None,%20TOTAL=None)%20labeling_inactivity_timeout_seconds=300%20samples_count=0%20collaborators_count=None%20task_attributes=TaskAttributes(format_version='0.1',%20categories=%5BTaskAttributeCategory(name='nid%20vide',%20id=1,%20color=(0,%20113,%20188),%20has_instances=None,%20attributes=None,%20dimensions=None),%20TaskAttributeCategory(name='nid%20actif',%20id=2,%20color=(216,%2082,%2024),%20has_instances=None,%20attributes=None,%20dimensions=None)%5D,%20image_attributes=None)%20labelsets=None%20role=None%20readme=''%20metadata=%7B%7D%20noncollaborator_can_label=False%20noncollaborator_can_review=False%20insights_urls=None/samples/

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
<ipython-input-41-ee8836897c4a> in <cell line: 2>()
      6   print(attributes)
      7   name = sample['name']
----> 8   sample = client.add_sample(dataset_test, name, attributes)
      9   #print(sample)

/usr/local/lib/python3.10/dist-packages/segments/client.py in add_sample(self, dataset_identifier, name, attributes, metadata, priority, assigned_labeler, assigned_reviewer)
   1046             payload["assigned_reviewer"] = assigned_reviewer
   1047 
-> 1048         r = self._post(
   1049             f"/datasets/{dataset_identifier}/samples/",
   1050             data=payload,

/usr/local/lib/python3.10/dist-packages/segments/client.py in throw_segments_exception(model, *args, **kwargs)
    121             text = e.response.text.lower()
    122             if "not found" in text or "does not exist" in text:
--> 123                 raise NotFoundError(message=text, cause=e)
    124             if "already exists" in text or "already have" in text:
    125                 raise AlreadyExistsError(message=text, cause=e)

NotFoundError: {"detail":"not found."}

upload_asset fails

[attached image: IMG_20200511_104618]
When uploading the above file with upload_asset as described in the README.md, the code exits with the following exception:

---------------------------------------------------------------------------
SysCallError                              Traceback (most recent call last)
/opt/anaconda3/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py in _send_until_done(self, data)
    339             try:
--> 340                 return self.connection.send(data)
    341             except OpenSSL.SSL.WantWriteError:

/opt/anaconda3/lib/python3.7/site-packages/OpenSSL/SSL.py in send(self, buf, flags)
   1736         result = _lib.SSL_write(self._ssl, buf, len(buf))
-> 1737         self._raise_ssl_error(self._ssl, result)
   1738         return result

/opt/anaconda3/lib/python3.7/site-packages/OpenSSL/SSL.py in _raise_ssl_error(self, ssl, result)
   1638                     if errno != 0:
-> 1639                         raise SysCallError(errno, errorcode.get(errno))
   1640                 raise SysCallError(-1, "Unexpected EOF")

SysCallError: (32, 'EPIPE')

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
/opt/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    671                 headers=headers,
--> 672                 chunked=chunked,
    673             )

/opt/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    386         else:
--> 387             conn.request(method, url, **httplib_request_kw)
    388 

/opt/anaconda3/lib/python3.7/http/client.py in request(self, method, url, body, headers, encode_chunked)
   1228         """Send a complete request to the server."""
-> 1229         self._send_request(method, url, body, headers, encode_chunked)
   1230 

/opt/anaconda3/lib/python3.7/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1274             body = _encode(body, 'body')
-> 1275         self.endheaders(body, encode_chunked=encode_chunked)
   1276 

/opt/anaconda3/lib/python3.7/http/client.py in endheaders(self, message_body, encode_chunked)
   1223             raise CannotSendHeader()
-> 1224         self._send_output(message_body, encode_chunked=encode_chunked)
   1225 

/opt/anaconda3/lib/python3.7/http/client.py in _send_output(self, message_body, encode_chunked)
   1054                         + b'\r\n'
-> 1055                 self.send(chunk)
   1056 

/opt/anaconda3/lib/python3.7/http/client.py in send(self, data)
    976         try:
--> 977             self.sock.sendall(data)
    978         except TypeError:

/opt/anaconda3/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py in sendall(self, data)
    351             sent = self._send_until_done(
--> 352                 data[total_sent : total_sent + SSL_WRITE_BLOCKSIZE]
    353             )

/opt/anaconda3/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py in _send_until_done(self, data)
    345             except OpenSSL.SSL.SysCallError as e:
--> 346                 raise SocketError(str(e))
    347 

OSError: (32, 'EPIPE')

During handling of the above exception, another exception occurred:

ProtocolError                             Traceback (most recent call last)
/opt/anaconda3/lib/python3.7/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    448                     retries=self.max_retries,
--> 449                     timeout=timeout
    450                 )

/opt/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    719             retries = retries.increment(
--> 720                 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    721             )

/opt/anaconda3/lib/python3.7/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    399             if read is False or not self._is_method_retryable(method):
--> 400                 raise six.reraise(type(error), error, _stacktrace)
    401             elif read is not None:

/opt/anaconda3/lib/python3.7/site-packages/urllib3/packages/six.py in reraise(tp, value, tb)
    733             if value.__traceback__ is not tb:
--> 734                 raise value.with_traceback(tb)
    735             raise value

/opt/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    671                 headers=headers,
--> 672                 chunked=chunked,
    673             )

/opt/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    386         else:
--> 387             conn.request(method, url, **httplib_request_kw)
    388 

/opt/anaconda3/lib/python3.7/http/client.py in request(self, method, url, body, headers, encode_chunked)
   1228         """Send a complete request to the server."""
-> 1229         self._send_request(method, url, body, headers, encode_chunked)
   1230 

/opt/anaconda3/lib/python3.7/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1274             body = _encode(body, 'body')
-> 1275         self.endheaders(body, encode_chunked=encode_chunked)
   1276 

/opt/anaconda3/lib/python3.7/http/client.py in endheaders(self, message_body, encode_chunked)
   1223             raise CannotSendHeader()
-> 1224         self._send_output(message_body, encode_chunked=encode_chunked)
   1225 

/opt/anaconda3/lib/python3.7/http/client.py in _send_output(self, message_body, encode_chunked)
   1054                         + b'\r\n'
-> 1055                 self.send(chunk)
   1056 

/opt/anaconda3/lib/python3.7/http/client.py in send(self, data)
    976         try:
--> 977             self.sock.sendall(data)
    978         except TypeError:

/opt/anaconda3/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py in sendall(self, data)
    351             sent = self._send_until_done(
--> 352                 data[total_sent : total_sent + SSL_WRITE_BLOCKSIZE]
    353             )

/opt/anaconda3/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py in _send_until_done(self, data)
    345             except OpenSSL.SSL.SysCallError as e:
--> 346                 raise SocketError(str(e))
    347 

ProtocolError: ('Connection aborted.', OSError("(32, 'EPIPE')"))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input-105-2be455ca0a8e> in <module>
----> 1 asset = client.upload_asset(png, filename='sswv_0055.jpg')

/opt/anaconda3/lib/python3.7/site-packages/segments/client.py in upload_asset(self, file, filename)
     59     def upload_asset(self, file, filename):
     60         r = self.post('/assets/', {'filename': filename})
---> 61         response_aws = self._upload_to_aws(file.getvalue(), r.json()['presignedPostFields'])
     62         return r.json()
     63 

/opt/anaconda3/lib/python3.7/site-packages/segments/client.py in _upload_to_aws(file, aws_fields)
    126         r = requests.post(aws_fields['url'],
    127                                  files=files,
--> 128                                  data=aws_fields['fields'])
    129         return r

/opt/anaconda3/lib/python3.7/site-packages/requests/api.py in post(url, data, json, **kwargs)
    117     """
    118 
--> 119     return request('post', url, data=data, json=json, **kwargs)
    120 
    121 

/opt/anaconda3/lib/python3.7/site-packages/requests/api.py in request(method, url, **kwargs)
     59     # cases, and look like a memory leak in others.
     60     with sessions.Session() as session:
---> 61         return session.request(method=method, url=url, **kwargs)
     62 
     63 

/opt/anaconda3/lib/python3.7/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    528         }
    529         send_kwargs.update(settings)
--> 530         resp = self.send(prep, **send_kwargs)
    531 
    532         return resp

/opt/anaconda3/lib/python3.7/site-packages/requests/sessions.py in send(self, request, **kwargs)
    641 
    642         # Send the request
--> 643         r = adapter.send(request, **kwargs)
    644 
    645         # Total elapsed time of the request (approximately)

/opt/anaconda3/lib/python3.7/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    496 
    497         except (ProtocolError, socket.error) as err:
--> 498             raise ConnectionError(err, request=request)
    499 
    500         except MaxRetryError as e:

ConnectionError: ('Connection aborted.', OSError("(32, 'EPIPE')"))

The error is specific to the above image (and some others), but in general I can upload with the code you provide. I can't quite figure out from the given traceback what the problem with the file is, though.

pycocotools is a problematic dependency

Hi. I'm wondering if it is possible to make pycocotools an optional dependency. As far as I can tell, this dependency is only used in export.py. Its use is even a bit suspect: the import is from pycocotools import mask, but the script then uses a variable mask as well, which (excuse the pun) masks the import.

The issue is that pycocotools is a problematic package to install; it gives all kinds of problems and is not maintained at all. Given its limited use here, might it be possible to either make this package optional to install, or even remove the dependency entirely?

I could make a PR as well if you like, let me know.
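
A sketch of the usual optional-dependency pattern (the names here are illustrative, not the library's actual code):

try:
    from pycocotools import mask as coco_mask  # aliasing also avoids the shadowing mentioned above
except ImportError:
    coco_mask = None

def export_coco_panoptic(dataset, export_folder):
    if coco_mask is None:
        raise ImportError('pycocotools is required for COCO export: pip install pycocotools')
    ...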

release2dataset throws an error

Hi

I created a dataset using the web interface; converting the dataset to Hugging Face format using release2dataset throws the following error.

Code:

release = segments_client.get_release("###/###", "v0.1")
hf_dataset = release2dataset(release)


Error:
  File "pyarrow\table.pxi", line 3315, in pyarrow.lib.Table.combine_chunks
  File "pyarrow\error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays

HTTPError/NetworkError when doing `add_sample` for a point cloud sequence even though the sample successfully uploads

When adding a point cloud sequence to a Segments.ai dataset like so:

attributes = {"frames": frames}
_ = segments_client.add_sample(DATASET, current_scans_name, attributes)

I get the following error:

HTTPError                                 Traceback (most recent call last)
File ~/.pyenv/versions/3.10.10/envs/test/lib/python3.10/site-packages/segments/client.py:96, in handle_exceptions.<locals>.throw_segments_exception(model, *args, **kwargs)
     95 r = f(*args, **kwargs)
---> 96 r.raise_for_status()
     97 if r.content:

File ~/.pyenv/versions/3.10.10/envs/test/lib/python3.10/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 500 Server Error: Internal Server Error for url: https://api.segments.ai/datasets/redacted/redacted/samples/

During handling of the above exception, another exception occurred:

NetworkError                              Traceback (most recent call last)
Cell In[2], line 2
      1 attributes = {"frames": frames}
----> 2 _ = segments_client.add_sample(DATASET, current_scans_name, attributes)

File ~/.pyenv/versions/3.10.10/envs/test/lib/python3.10/site-packages/segments/client.py:947, in SegmentsClient.add_sample(self, dataset_identifier, name, attributes, metadata, priority, assigned_labeler, assigned_reviewer, embedding)
    944 if embedding:
    945     payload["embedding"] = embedding
--> 947 r = self._post(
    948     f"/datasets/{dataset_identifier}/samples/",
    949     data=payload,
    950     model=Sample,
    951 )
    952 # logger.info(f"Added {name}")
    954 return cast(Sample, r)

File ~/.pyenv/versions/3.10.10/envs/test/lib/python3.10/site-packages/segments/client.py:133, in handle_exceptions.<locals>.throw_segments_exception(model, *args, **kwargs)
    131     if "free trial ended" in text or "exceeded user limit" in text:
    132         raise SubscriptionError(message=text, cause=e)
--> 133     raise NetworkError(message=text, cause=e)
    134 except requests.exceptions.TooManyRedirects as e:
    135     # Tell the user their URL was bad and try a different one
    136     raise NetworkError(message="Bad url, please try a different one.", cause=e)

NetworkError: <html>
  <head>
    <title>internal server error</title>
  </head>
  <body>
    <h1><p>internal server error</p></h1>
    
  </body>
</html>

The sample is correctly uploaded to the dataset despite this error. frames looks like the following:

[{'pcd': {'url': 'https://redacted.pcd',
   'type': 'pcd'},
  'name': 'redacted'},
 {'pcd': {'url': 'https://redacted.pcd',
   'type': 'pcd'},
  'name': 'redacted'},
 {'pcd': {'url': 'https://redacted.pcd',
   'type': 'pcd'},
  'name': 'redacted'},
 {'pcd': {'url': 'https://redacted.pcd',
   'type': 'pcd'},
  'name': 'redacted'},
 {'pcd': {'url': 'https://redacted.pcd',
   'type': 'pcd'},
  'name': 'redacted'}]

I'm using version 1.0.25 of the Python SDK.

Issue in export API

Hello Bert,

Facing an issue while exporting a dataset in coco-panoptic format through the export API:

"NameError: name 'regionprops' is not defined"

Code:
from segments import SegmentsDataset
from segments.utils import export_dataset

dataset = SegmentsDataset(release, labelset='ground-truth', filter_by=['labeled', 'reviewed'])

# Export to COCO panoptic format
export_dataset(dataset, export_format='coco-panoptic')


It works for other formats like semantic and semantic-color. Did you just enhance the export API to support these? It will help me, so I don't need to convert the instance ids to category ids on my end. Is it also expected to return a json? I only see images in the output of the export.
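
For reference, regionprops normally comes from scikit-image, so a hedged guess is that the NameError stems from a missing import or an uninstalled optional dependency:

# regionprops lives in scikit-image (pip install scikit-image);
# export.py presumably needs something like:
from skimage.measure import regionprops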

Best,
Shailesh

Export dataset error

Some of the images in our dataset have an orientation error; those images carry EXIF tags to fix the orientation.

from PIL import Image, ExifTags
import numpy as np

try:
    # Find the EXIF tag id for 'Orientation'
    for orientation in ExifTags.TAGS.keys():
        if ExifTags.TAGS[orientation] == 'Orientation':
            break

    for i, sample in enumerate(labeled_dataset):
        if np.asarray(sample['image']).shape[:2] != np.asarray(sample['segmentation_bitmap']).shape[:2]:
            print("old ", sample['name'], np.asarray(sample['image']).shape)

            image_path = "/content/fast-labeling-workflow/segments/Sougata_Apple_Train/v0.3/" + sample['name'].split(".")[0] + ".jpg"
            print(image_path)
            image = Image.open(image_path)

            exif = dict(image._getexif().items())

            # Rotate the image according to its EXIF orientation tag
            if exif[orientation] == 3:
                image = image.transpose(Image.ROTATE_180)
            elif exif[orientation] == 6:
                image = image.transpose(Image.ROTATE_270)
            elif exif[orientation] == 8:
                image = image.transpose(Image.ROTATE_90)
            print("New ", np.asarray(image).shape)

except (AttributeError, KeyError, IndexError) as e:
    # cases: image doesn't have _getexif
    print(e)

That code works fine when I upload the image from my machine, but when I download the data using the Python SDK, the image comes back without any EXIF attributes and I get this error:

old  GOPR2236.JPG (3000, 4000, 3)
/content/fast-labeling-workflow/segments/Sougata_Apple_Train/v0.3/GOPR2236.jpg
'NoneType' object has no attribute 'items'

So, is there a way to download the data with the EXIF attributes intact?
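
As an aside, Pillow can apply the EXIF orientation tag in one call, which would replace the manual transpose logic above (it is a no-op if the tag is absent):

from PIL import Image, ImageOps

image = ImageOps.exif_transpose(Image.open(image_path))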

Dataset created with segments.ai does not work with SegFormer

Greetings,

I have been using the web interface to label a segmentation dataset and following your SegFormer tutorial for custom training. I pushed the dataset to Hugging Face, and when pulling it from the Hugging Face repository, the trainer method throws an index-out-of-range error.

Error

File "C:\.conda\envs\segformer\lib\site-packages\datasets\table.py", line 121, in <listcomp>
    self._batches[batch_idx].slice(i - self._offsets[batch_idx], 1)
IndexError: list index out of range
  0%|          | 0/45 [00:00<?, ?it/s]

huggingface dataset integration readme check

I was following this tutorial with my own dataset:

from segments.huggingface import release2dataset

release = client.get_release(dataset_name, release_name)
hf_dataset = release2dataset(release)

and got a file-not-found error because of this line:

with open('data/dataset_card_template.md', 'r') as f:

It was easy enough to create an empty "data/dataset_card_template.md" file, but it would be more convenient to add a file-exists check before reading the file 😄
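
A minimal sketch of the suggested guard (the path comes from the line quoted above; the empty-string fallback is illustrative):

import os

template_path = 'data/dataset_card_template.md'
if os.path.exists(template_path):
    with open(template_path, 'r') as f:
        template = f.read()
else:
    template = ''  # fall back to an empty dataset card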

Awesome stuff btw !

Release files - Enhancement request

Hi Bert,

Today, the release files are mainly for export. However, I think it would help if release files could also drive the creation of samples and the modification of already existing work. For example:

  1. I complete my labeling with 5 classes (Release 1), then decide to add another class on top. I could create my labeled samples dataset from Release 1 (in a different folder) and then start from there, modifying each image for the newly added class on top of the segmentation that already exists.
  2. Or I want to revisit and make finer adjustments as new scenarios come up. So each release is not just a representation of the pixel values but carries with it the class json file: a complete unit that I can use to recreate that release as-is, with its classes and the segmentation built using those classes.

Currently, when I delete a release and go back to the older one, the class file (setting) is still retained as the latest one, and the samples point back to that file. Instead, each release deletion could wipe out the class json/setting file and revert it to the one that corresponds to the active release. Basically, the release file should have a one-to-one match with the setting file (like git).

I understand that despite the setting file being retained, the segmentation stays, so all objects remain mapped; I just need to re-apply the labels. But I believe the feature above would help save time, and make it possible to create multiple folders by uploading a release instead of only samples.

Let me know what you think,
Shailesh

KeyError when downloading a dataset with missing segmentations

Here's the stack trace:

Traceback (most recent call last):
  File "/home/steven.sagaert/python_workspace/FARAD2SORT/FARAD_research/Steven/Active/segments_interface.py", line 94, in <module>
    dataset = fetch_release(dataset_name=dataset_name)
  File "/home/steven.sagaert/python_workspace/FARAD2SORT/FARAD_research/Steven/Active/segments_interface.py", line 28, in fetch_release
    return SegmentsDataset(release, filter_by="REVIEWED", segments_dir=dump_folder)
  File "/home/steven.sagaert/miniconda3/envs/farad/lib/python3.8/site-packages/segments/dataset.py", line 155, in __init__
    self.load_dataset()
  File "/home/steven.sagaert/miniconda3/envs/farad/lib/python3.8/site-packages/segments/dataset.py", line 214, in load_dataset
    list(
  File "/home/steven.sagaert/miniconda3/envs/farad/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/steven.sagaert/miniconda3/envs/farad/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/home/steven.sagaert/miniconda3/envs/farad/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/steven.sagaert/miniconda3/envs/farad/lib/python3.8/site-packages/segments/dataset.py", line 198, in _load_image
    self.__getitem__(i)
  File "/home/steven.sagaert/miniconda3/envs/farad/lib/python3.8/site-packages/segments/dataset.py", line 325, in __getitem__
    segmentation_bitmap = self._load_segmentation_bitmap_from_cache(
  File "/home/steven.sagaert/miniconda3/envs/farad/lib/python3.8/site-packages/segments/dataset.py", line 259, in _load_segmentation_bitmap_from_cache
    segmentation_bitmap_url = label["attributes"]["segmentation_bitmap"]["url"]
KeyError: 'segmentation_bitmap'

The solution is to change line 337 in dataset.py to
except (KeyError, TypeError):

error when loading dataset with image with no segmentation mask

In the pfizer dataset there are images where the url for the segmentation mask is None. This gives an error when loading the dataset and stops further loading.

I fixed this as follows, in dataset.py from line 268 onwards:

if not os.path.exists(segmentation_bitmap_filename):
    if segmentation_bitmap_url is not None:
        return load_label_bitmap_from_url(
            segmentation_bitmap_url, segmentation_bitmap_filename
        )
else:
    return Image.open(segmentation_bitmap_filename)
