cryoet-data-portal's Issues

Annotation dates are not serializable

There are multiple date objects in Annotation objects in the data portal:
'deposition_date': datetime.date(2023, 4, 1), 'release_date': datetime.date(2023, 6, 1), 'last_modified_date': datetime.date(2023, 6, 1). These objects are not JSON serializable. I suggest converting them to ISO 8601 strings before adding them to the data portal.

I ran into this trying to pretty print annotations, but this would affect other workflows that involve serializing annotations.

>>> json.dumps(annotations[0].to_dict(), indent=4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type date is not JSON serializable
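
Until the portal stores ISO 8601 strings, a client-side workaround is to pass a `default` fallback to `json.dumps`. This is only a sketch: the `annotation` dict below is a stand-in for `annotations[0].to_dict()`, and `to_jsonable` is a hypothetical helper name, not part of the client.

```python
import datetime
import json


def to_jsonable(value):
    """Fallback for json.dumps: render dates as ISO 8601 strings."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# Stand-in for annotations[0].to_dict(), using the dates from this issue.
annotation = {
    "deposition_date": datetime.date(2023, 4, 1),
    "release_date": datetime.date(2023, 6, 1),
    "last_modified_date": datetime.date(2023, 6, 1),
}

text = json.dumps(annotation, indent=4, default=to_jsonable)
print(text)
```

`json.dumps` only calls `default` for objects it cannot encode itself, so strings, numbers, and nested dicts are unaffected.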

Generate JSON for neuroglancer

User story:
I want to visualize data from the data portal via an interactive web viewer, so that I am able to view my data in real time.

Create a JSON object that the frontend can use to build an encoded Neuroglancer link for visualizing the tomogram.
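
A rough sketch of what such a link could look like, assuming the viewer accepts its JSON state URL-encoded after `#!` in the fragment and reads OME-Zarr through a `zarr://` source. The viewer base URL, the example Zarr URL, and the function name are assumptions for illustration; a real state would likely carry more fields (position, contrast, annotation layers).

```python
import json
from urllib.parse import quote, unquote

# Assumed viewer deployment; the portal may host its own instance.
NEUROGLANCER_BASE = "https://neuroglancer-demo.appspot.com"


def neuroglancer_link(zarr_url: str, layer_name: str) -> str:
    """Build a viewer link with a single image layer (hypothetical helper)."""
    state = {
        "layers": [
            {"type": "image", "source": f"zarr://{zarr_url}", "name": layer_name}
        ]
    }
    # Compact JSON, then percent-encode it so it is safe inside a URL fragment.
    encoded = quote(json.dumps(state, separators=(",", ":")))
    return f"{NEUROGLANCER_BASE}/#!{encoded}"


# Placeholder URL, not a real portal path.
link = neuroglancer_link("https://example.org/tomogram.zarr", "tomogram")
print(link)
```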

View other tomograms and segmentation annotations in Neuroglancer

User story: (4.6) I want to open a tomogram in Neuroglancer, so that I can visualize the data

Tasks

  • option to view other tomograms (besides canonical) in Neuroglancer
  • open Neuroglancer in a separate tab OR navigate back to original page it was launched from
  • view segmentation annotations

Link to design:

filter on single dataset page - to narrow down runs

Tasks

  • 2 filters: tilt series quality score, tilt range
  • general AND/OR logic:
    • AND logic between different filters
    • OR logic for multiple values within one filter
  • when filter is applied, show as a blue tag that the user can "X" out of to remove
  • only show the filter values that we have available in the database
  • once filter is applied, show number of search results in the runs table

Note: sort is out of scope

Link to design: https://www.figma.com/file/oUtMvkDdADObFmvOJVVy9G/Phase-2-Deprioritized-Designs-%26-Explorations?node-id=29%3A9586&mode=dev

mrc pyramid is not necessary

Those who download tomogram mrc files are end users who need them at a particular voxel_size, so the multi-scale pyramid is not useful. In addition, the pyramid scales all three axes equally, so for tilt series, which are stacks of images, the lower levels skip images in the tilt sequence.

Recommendation: remove the mrc multiscale pyramid. Only generate the pyramid for zarr files.
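
To make the tilt-series problem concrete, here is a tiny illustration. It assumes the pyramid keeps every second sample along each axis; the issue does not say how the downsampling is actually done, and the tilt angles are made up.

```python
# Hypothetical tilt scheme: -60 to +60 degrees in 3 degree steps.
tilt_angles = list(range(-60, 61, 3))

# Downsampling the stack axis by 2 (assumed stride-based) drops every other tilt.
level1_angles = tilt_angles[::2]

print(len(tilt_angles), "tilts at full resolution")
print(len(level1_angles), "tilts at the next pyramid level")
```

For a tomogram the third axis is spatial, so halving it is meaningful; for a tilt series it is the tilt index, so halving it removes projections.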

Single Run page - Annotations table view

User story: (4.5) I want to see essential annotation metadata displayed on the table so that I can find the annotation of interest

Tasks

  • table has the following columns: Annotation, annotation object, object count, precision, recall
  • show pagination - max out at 20
  • show how many annotations total and how many are being displayed (e.g. 20 of 25 Annotations)
  • Show blue tag where ground truth is available
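
The pagination and count label could work roughly like this. It is a sketch only: `page_of` is a hypothetical name, and the real table will be implemented in the frontend rather than in Python.

```python
def page_of(items, page, page_size=20):
    """Return the rows for a 1-based page and the count label shown above the table."""
    start = (page - 1) * page_size
    shown = items[start:start + page_size]
    label = f"{len(shown)} of {len(items)} Annotations"
    return shown, label


annotations = [f"annotation-{i}" for i in range(25)]  # placeholder rows
first_rows, first_label = page_of(annotations, page=1)
second_rows, second_label = page_of(annotations, page=2)
print(first_label)
print(second_label)
```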

Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=1310%3A11321&mode=dev

Create CORS support for cloudfront

To allow Neuroglancer to request files from the CloudFront API gateway, we have to add the Neuroglancer domain to the list of allowed cross-origin domains.
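
For reference, this is the behaviour the CORS setup needs to produce, written as plain Python rather than as CloudFront configuration. The allowed origin is an assumed placeholder for wherever Neuroglancer ends up being hosted, and `cors_headers` is a hypothetical name.

```python
# Assumed viewer origin; replace with the real Neuroglancer domain.
ALLOWED_ORIGINS = {"https://neuroglancer-demo.appspot.com"}


def cors_headers(request_origin, allowed=ALLOWED_ORIGINS):
    """Return the CORS response headers for a request's Origin header."""
    if request_origin in allowed:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Access-Control-Allow-Methods": "GET, HEAD",
            # Caches must keep responses for different origins apart.
            "Vary": "Origin",
        }
    # Unknown origins get no CORS headers, so the browser blocks the read.
    return {}


print(cors_headers("https://neuroglancer-demo.appspot.com"))
print(cors_headers("https://example.org"))
```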

key photo at dataset level

User story: (3.2) I want to see a key photo at the dataset level, so that I can get a glimpse of what I might expect to see in the dataset

Tasks

  • surface key photo (single large photo at the top of single dataset page)
  • surface key photo (small photos on the browse all data page)
    note: key photos are submitted at run level, so we will have to pick one to show at dataset level. Other times, the key photo comes from EMPIAR (in which case we would only have 1 and will not have to select which to show)

Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=757%3A86414&mode=dev

Querying annotations returns duplicate HTTPS URLs

Querying some of the annotations returns more objects than I'd expect, some of which have duplicate metadata URLs.

See the following reproducer.

from cryoet_data_portal import Client, Tomogram
client = Client()
tomo = next(Tomogram.find(client, [Tomogram.name == 'TS_026']))
annos = list(tomo.tomogram_voxel_spacing.annotations)
for a in annos:
    print(a.https_metadata_path)

# https://files.cryoetdataportal.cziscience.com/10000/TS_026/Tomograms/VoxelSpacing13.48/Annotations/sara_goetz-fatty_acid_synthase-1.0.json
# https://files.cryoetdataportal.cziscience.com/10000/TS_026/Tomograms/VoxelSpacing13.48/Annotations/sara_goetz-ribosome-1.0.json
# https://files.cryoetdataportal.cziscience.com/10000/TS_026/Tomograms/VoxelSpacing13.48/Annotations/sara_goetz-ribosome-1.0.json
# https://files.cryoetdataportal.cziscience.com/10000/TS_026/Tomograms/VoxelSpacing13.48/Annotations/sara_goetz-fatty_acid_synthase-1.0.json

I suspect that we're returning multiple S3 versions of each bucket object, but I'm not certain.
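
Until the root cause is found, a client-side workaround is to drop repeats by metadata URL. A minimal sketch, using `SimpleNamespace` stand-ins instead of real Annotation objects so that it runs without the client; `unique_by_path` is a hypothetical helper.

```python
from types import SimpleNamespace


def unique_by_path(annotations):
    """Keep the first annotation seen for each https_metadata_path, in order."""
    seen = set()
    unique = []
    for annotation in annotations:
        path = annotation.https_metadata_path
        if path not in seen:
            seen.add(path)
            unique.append(annotation)
    return unique


# Stand-ins mirroring the duplicated output above (paths shortened).
annos = [
    SimpleNamespace(https_metadata_path="Annotations/sara_goetz-fatty_acid_synthase-1.0.json"),
    SimpleNamespace(https_metadata_path="Annotations/sara_goetz-ribosome-1.0.json"),
    SimpleNamespace(https_metadata_path="Annotations/sara_goetz-ribosome-1.0.json"),
    SimpleNamespace(https_metadata_path="Annotations/sara_goetz-fatty_acid_synthase-1.0.json"),
]
deduped = unique_by_path(annos)
for a in deduped:
    print(a.https_metadata_path)
```

This hides the symptom only; if the duplicates differ in other fields, the first one wins.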

Error in Client.get_by_id

version == 1.0.0

from cryoet_data_portal import Client, Dataset
myclient=Client()
item = Dataset.get_by_id(myclient, 10000)

This gave the following error:

  File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/cryoet_data_portal/_gql_base.py", line 259, in get_by_id
    return client.find_one(cls, [cls.id == id])
  File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/cryoet_data_portal/_client.py", line 58, in find_one
    for result in self.find(*args, **kwargs):
  File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/cryoet_data_portal/_client.py", line 53, in find
    response = self.client.execute(self.build_query(cls, gql_type, query_filters))
  File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/gql/client.py", line 403, in execute
    return self.execute_sync(
  File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/gql/client.py", line 221, in execute_sync
    return session.execute(
  File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/gql/client.py", line 860, in execute
    raise TransportQueryError(
gql.transport.exceptions.TransportQueryError: {'extensions': {'code': 'validation-failed', 'path': '$.selectionSet.datasets.args.where'}, 'message': "expected an object for type 'datasets_bool_exp', but found an enum value"}

Workaround:

    # use find with an explicit filter instead of get_by_id
    items = Dataset.find(myclient, [Dataset.id == 10004])
    for item in items:
        print(item.title)

Filter on browse all data page

User Story: 2.2 I want to filter through the data, so that I can narrow down my selection

  • 13 filters total, organized within 7 groups (see design)
  • general AND/OR logic:
    • AND logic between different filters (e.g., I want to see object name = ribosome AND camera manufacturer = Gatan)
    • OR logic for multiple values within one filter (e.g., I want to see datasets that have Ribosomes OR Membranes annotated)
    • exception to the above rule: multi-select under the "available files" section is AND logic (e.g., i want to see datasets where it includes raw frames AND tilt series)
  • when filter is applied, show as a blue tag that the user can "X" out of to remove
  • only show the filter values that we have available in the database (i.e., we wouldn't list "Mesh" under object shape type if it doesn't exist in our metadata)
  • once filter is applied, show number of search results in the dataset table
  • note: see specific filter menu types in design
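
The AND/OR rules above, including the "available files" exception, can be sketched as a single predicate. The field names and the dataset dict are placeholders, not the real schema, and the production filtering will presumably happen in the query layer rather than in Python.

```python
def matches(dataset, filters, and_within=("available_files",)):
    """Return True if a dataset passes every active filter.

    filters maps a field name to the values selected in the UI.
    Different filters combine with AND; values inside one filter combine
    with OR, except for fields listed in and_within, which require all values.
    """
    for field, selected in filters.items():
        if not selected:
            continue  # filter not applied
        values = set(dataset.get(field, ()))
        if field in and_within:
            if not set(selected) <= values:
                return False
        elif not values & set(selected):
            return False
    return True


dataset = {
    "object_name": ["ribosome"],
    "camera_manufacturer": ["Gatan"],
    "available_files": ["raw frames", "tomograms"],
}
print(matches(dataset, {"object_name": ["ribosome", "membrane"]}))
print(matches(dataset, {"available_files": ["raw frames", "tilt series"]}))
```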

Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=1637-37001&mode=design&t=QGhDNstOreaiFP49-0

Configure download dialog from single run page

User story: (5.5) I want to download specific tomograms and/or annotations so that I have targeted data without unnecessarily using up memory

Tasks

  • download button on single run page to open "Configure Download" dialog
  • dialog contains the following elements: title, dataset name, run name, and 2 download options (see below)
  • directions to download all run data via API, with link to API instructions

Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=2315-169419&mode=design&t=8myNEdr3dZigyAP2-0

Run quality metrics

User story: (4.5) I want to see quality metrics, so that I can decide how reliable the data is

Tasks

  • tilt series quality metrics shown directly in UI (separated from metadata to make it more accessible)

Link to design:

A client with an invalid URL hangs indefinitely on find

I can create a client with an invalid URL successfully, then attempt to find datasets (or presumably other things). I'd expect this to error (ideally on initialization of the client), but instead it hangs indefinitely when calling find.

The following code should reproduce this issue

from cryoet_data_portal import Client, Dataset
client = Client("https://graphql.catdataportal.cziscience.com/v1/graphql")
datasets = list(Dataset.find(client, [Dataset.id == 10000]))
# hangs indefinitely
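
As a stopgap on the user side, the endpoint can be probed with a timeout before the client is created. This sketch uses only the standard library; `endpoint_reachable` is a hypothetical helper and not part of cryoet_data_portal, and it only checks that something answers at the URL, not that it is a valid GraphQL endpoint.

```python
import urllib.error
import urllib.request


def endpoint_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a server responds at url within timeout seconds."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return False
    return True


print(endpoint_reachable("not a url"))
```

Calling this before `Client(url)` turns an indefinite hang into a quick failure, at the cost of one extra request.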

Download a single dataset

User story: (5.1) I want to download a single dataset, so that I can use it to develop/train ML models

Tasks:

P0:

  • download button on single dataset page to open download dataset dialog
  • dialog contains the following elements: title, dataset name, and 2 download tabs for users to choose from
  • tab 1: download via AWS S3
    • step 1 asks user to select their save destination (this is optional for user) - P1
    • step 2 is an AWS S3 snippet that user can copy
    • provide link to AWS instructions
      • note: instructions are WIP by Dannielle
  • tab 2: download via API
    • info box to link user to API documentation

P1:

  • surface dataset ID for user to copy
  • asks user to select their save destination (this is optional for user)
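
The copyable snippet in step 2 of the AWS S3 tab could be generated along these lines. The bucket name is a placeholder, the dataset-ID prefix layout is a guess, and `--no-sign-request` assumes the bucket allows anonymous reads; none of these are confirmed in this ticket.

```python
import shlex


def s3_sync_snippet(bucket: str, dataset_id: int, destination: str) -> str:
    """Build an `aws s3 sync` command for one dataset (hypothetical helper)."""
    return shlex.join([
        "aws", "s3", "sync",
        "--no-sign-request",  # assumes anonymous access to the bucket
        f"s3://{bucket}/{dataset_id}/",
        destination,
    ])


snippet = s3_sync_snippet("example-bucket", 10000, "./10000")
print(snippet)
```

`shlex.join` quotes the destination if the user picks a path with spaces, so the snippet stays safe to paste into a shell.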

Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=2315-169422&mode=design&t=8myNEdr3dZigyAP2-0

Filter on single runs page - to narrow down annotations

Add the following 6 filters to the single run page to narrow down annotations:

  • 1. Annotation Author
    • new filter following existing patterns, should return annotation authors, and not dataset authors
  • 2. Object Name
    • filter exists on Browse All page
  • 3. GO ID
    • new filter, see filter specs in designs
  • 4. Object Shape Type
    • filter exists on Browse All page
  • 5. Method Type
    • new filter following existing patterns. Should only have a max of 3 dropdown options
    • dependent on #443 and #442
  • 6. Annotation Software
    • new filter following existing patterns

Link to design: https://www.figma.com/file/q6Z394Xy6wUmaXQ9YFjZH7/Kevin---2023-Spillover-Work?type=design&node-id=2118-11276&mode=design&t=W43r2ZQSbiy3s1ES-0

browse on 2 separate tabs (dataset, run)

User Story: (2.6) I want the filter, sort, and search results to be shown on 2 separate tabs (dataset, run), so that I can better navigate different levels of data

Link to design:

Sort data (on browse all data page)

User Story: (2.3) I want to sort the data, so that the data is arranged in a meaningful order

  • sort by organism, # of runs, object type, etc. (see designs for final list)

Link to design:

add example for using ilike method to construct query in API documentation

like, ilike, and _in do not work like other operators.

I had to call them as methods on the field; otherwise I get a syntax error. For example,

from cryoet_data_portal import Client, Annotation
myclient = Client()
results = Annotation.find(myclient,[Annotation.object_name.ilike('fatty acid synthase%')])

while

results = Annotation.find(myclient,[Annotation.object_name ilike 'fatty acid synthase%'])

gives syntax error
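
For the documentation, it may help to explain why this is so: Python only allows overloading its built-in operators (such as `==` via `__eq__`), and `ilike` is not one of them, so it has to be a method. The toy class below illustrates the idea. It is not the client's actual implementation, and the `_eq` / `_ilike` dictionary shape is an assumption about what the backend expects.

```python
class Field:
    """Toy stand-in for a queryable model field."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Invoked by `field == value`; returns a filter instead of a bool.
        return {self.name: {"_eq": other}}

    def ilike(self, pattern):
        # No Python operator exists for this, so it is a regular method.
        return {self.name: {"_ilike": pattern}}


object_name = Field("object_name")
eq_filter = object_name == "ribosome"
ilike_filter = object_name.ilike("fatty acid synthase%")
print(eq_filter)
print(ilike_filter)
```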

Open tomogram in Neuroglancer

User story: (4.6) I want to open a tomogram in Neuroglancer, so that I can visualize the data

Tasks

  • default to opening canonical tomogram in Neuroglancer
    • run page
    • dataset page

Link to design:

OME-Zarr tomogram scale is not physically correct

Currently, the OME-Zarr metadata (i.e. the top-level zattrs) describe unit-less spatial dimensions of z/y/x and multi-resolution scales of (1, 2, 4).

While these scales are relatively correct (i.e. they represent the scaling factors from the highest resolution tomogram), they do not capture the physical spacing between the voxels of the tomogram as I would expect.

Instead, I would expect the spatial dimensions to include a supported OME-NGFF unit (e.g. angstrom) and then absorb the voxel spacing/sizes into the multi-resolution scales (e.g. 13.48, 26.96, 53.92).

Not a major issue, but fixing this should mean that tools that read this metadata will automatically set the scale of the data to be physically correct with respect to other data and to visualization aids like scale bars.
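
Roughly, the metadata I have in mind would look like the output of the sketch below, which follows the OME-NGFF 0.4 `multiscales` layout (axes with a `unit`, and a `scale` transformation per level). The function name is hypothetical and the 13.48 spacing is just the example value from above.

```python
import json


def multiscales_metadata(voxel_spacing, factors=(1, 2, 4), unit="angstrom"):
    """Build top-level .zattrs content with physical voxel sizes per level."""
    axes = [{"name": name, "type": "space", "unit": unit} for name in "zyx"]
    datasets = [
        {
            "path": str(level),
            "coordinateTransformations": [
                # Physical spacing = base voxel spacing times the level's factor.
                {"type": "scale", "scale": [voxel_spacing * factor] * 3}
            ],
        }
        for level, factor in enumerate(factors)
    ]
    return {"multiscales": [{"version": "0.4", "axes": axes, "datasets": datasets}]}


zattrs = multiscales_metadata(13.48)
print(json.dumps(zattrs, indent=2))
```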

Single run page header area

User story: (4.1) I want to see high-level description of the run so that I can quickly decide if I would like to investigate further

Tasks

  • count of # of files (Frames, Tilt-series, Tomograms, Annotations)
  • Overview of tilt series, including tilt quality, tilt range, tilt scheme
  • Overview of Tomogram, including resolutions available, tomogram processing, annotated objects

Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=1310%3A11321&mode=dev

Website description

User story: (1.1) I want to see a website description, so that I can understand the purpose of the data portal

Tasks

  • landing page layout
  • landing page image banner
  • #27 (this is added as a separate ticket bc it's a P1)
  • general description (what is the portal)
  • button to direct user to browse all data page
  • Thank you to our data contributors: list of names in alphabetical order

Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?node-id=1929%3A65457&mode=dev

Download data

Epic for tracking all issues/tasks related to downloading data from the CryoET data portal

P0

P1

  • #61 - out of scope
