CryoET Data Portal frontend, docs, tools, and api client.
chanzuckerberg / cryoet-data-portal
License: MIT License
CryoET Data Portal
User story: (1.2) I want to see the description of the kinds of data hosted (e.g., type and count), so that I can quickly get an idea of what I might find
Tasks
The count will update automatically as new datasets are added.
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?node-id=1929%3A65457&mode=dev
There are multiple date objects in Annotation objects in the data portal:
'deposition_date': datetime.date(2023, 4, 1), 'release_date': datetime.date(2023, 6, 1), 'last_modified_date': datetime.date(2023, 6, 1)
These objects are not JSON serializable. I suggest converting them to ISO 8601 strings before adding them to the data portal.
I ran into this while trying to pretty-print annotations, but it would affect any other workflow that involves serializing annotations.
>>> json.dumps(annotations[0].to_dict(), indent=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 201, in encode
chunks = list(chunks)
File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/Users/kharrington/micromamba/envs/nesoi/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type date is not JSON serializable
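Until the dates are stored as strings, one client-side way to pretty-print is to pass a default hook to json.dumps that renders date objects as ISO 8601 strings. This is a minimal sketch: the dictionary below only mirrors the three date fields quoted above, not a full to_dict() output, and iso_dates is a hypothetical helper name.

```python
import datetime
import json
from typing import Any


def iso_dates(obj: Any) -> str:
    """Fallback for json.dumps: render dates (and datetimes) as ISO 8601 strings."""
    # datetime.datetime is a subclass of datetime.date, so this covers both.
    if isinstance(obj, datetime.date):
        return obj.isoformat()
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")


# Mirrors the date fields quoted in this issue, not a complete annotation.
annotation_dates = {
    "deposition_date": datetime.date(2023, 4, 1),
    "release_date": datetime.date(2023, 6, 1),
    "last_modified_date": datetime.date(2023, 6, 1),
}

print(json.dumps(annotation_dates, indent=4, default=iso_dates))
```

In a client session the same hook would be used as json.dumps(annotations[0].to_dict(), indent=4, default=iso_dates). Any other unsupported type still raises TypeError, so unexpected objects are not silently stringified.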
The only possible way to download an annotation is to use TomoVoxelSpacing.download_all.
User story:
I want to visualize data from the data portal via an interactive web viewer, so that I am able to view my data in real time.
Create a JSON document that the frontend can use to build an encoded Neuroglancer link for visualizing the tomogram.
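Neuroglancer reads its viewer state as URL-encoded JSON placed after "#!" in the URL fragment, so the frontend mainly needs a state object with an image layer whose source points at the tomogram. A minimal sketch, assuming the public neuroglancer-demo.appspot.com deployment, a Zarr tomogram, and a single image layer; the base URL, layer name, layout, and the helper names neuroglancer_link and decode_state are all hypothetical choices, not what the portal uses.

```python
import json
from urllib.parse import quote, unquote

# Assumption: public Neuroglancer deployment; substitute the instance the portal uses.
NEUROGLANCER_BASE = "https://neuroglancer-demo.appspot.com/"


def neuroglancer_link(zarr_url: str, layer_name: str = "tomogram") -> str:
    """Encode a minimal viewer state into a Neuroglancer URL fragment."""
    state = {
        "layers": [
            {"type": "image", "source": f"zarr://{zarr_url}", "name": layer_name},
        ],
        "layout": "4panel",
    }
    # Compact separators keep the URL short; quote() percent-encodes the JSON.
    fragment = quote(json.dumps(state, separators=(",", ":")), safe="")
    return f"{NEUROGLANCER_BASE}#!{fragment}"


def decode_state(link: str) -> dict:
    """Inverse of neuroglancer_link, useful for checking generated links."""
    return json.loads(unquote(link.split("#!", 1)[1]))
```

For example, neuroglancer_link("https://example.com/tomogram.zarr") yields a link whose decoded state has the source "zarr://https://example.com/tomogram.zarr". Keeping the state as plain JSON means the backend can emit it and the frontend only has to encode it.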
Create a staging environment to test all merged-in changes.
User story: (4.6) I want to open a tomogram in Neuroglancer, so that I can visualize the data
Tasks
Link to design:
Tasks
Note: sort is out of scope
Link to design: https://www.figma.com/file/oUtMvkDdADObFmvOJVVy9G/Phase-2-Deprioritized-Designs-%26-Explorations?node-id=29%3A9586&mode=dev
User story: (3.5) I want to see all runs associated with this dataset and to filter/sort within the dataset, so that I can easily navigate large amounts of data within the dataset
Tasks
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=757%3A86414&mode=dev
Those who download tomogram MRC files are end users who need them at a particular voxel_size, so the multiscale pyramid is not useful to them. In addition, the pyramid scales all three axes equally, which causes tilt series (stacks of images) to skip images in the tilt sequence.
Recommendation: remove the MRC multiscale pyramid and only generate pyramids for Zarr files.
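To make the tilt-series problem concrete: a pyramid level that downsamples every axis by 2 also halves the tilt axis. This numpy sketch assumes simple strided downsampling and a hypothetical 41-image tilt series; the actual pipeline may average or interpolate instead, which would blend neighboring tilt images rather than drop them, but the tilt axis would still shrink.

```python
import numpy as np

# Hypothetical tilt series: 41 tilt images of 64 x 64 pixels stacked along axis 0.
tilt_series = np.zeros((41, 64, 64), dtype=np.float32)
for i in range(tilt_series.shape[0]):
    tilt_series[i] = i  # tag each image with its tilt index

# Isotropic 2x downsampling (by striding) also halves the tilt axis,
# so every other tilt image is dropped.
isotropic = tilt_series[::2, ::2, ::2]

# Downsampling only y/x keeps every tilt image.
xy_only = tilt_series[:, ::2, ::2]

print(isotropic.shape, xy_only.shape)
print(isotropic[:, 0, 0])  # surviving tilt indices: 0, 2, 4, ...
```

This is why an isotropic pyramid makes sense for tomogram volumes but not for image stacks, where the first axis indexes separate images rather than a spatial dimension.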
User story: (4.5) I want to see essential annotation metadata displayed on the table so that I can find the annotation of interest
Tasks
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=1310%3A11321&mode=dev
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?node-id=2234%3A141854&mode=dev
User story: (3.4) I want to see detailed dataset metadata, so that I can get information on the experiment as a whole
Tasks
To allow Neuroglancer to request files from the CloudFront API gateway, we have to add the Neuroglancer domain to the allowed cross-origin (CORS) origins.
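As a sketch of what "adding the domain" means, the dictionary below follows the CORSConfiguration shape that boto3's S3 put_bucket_cors accepts, and the second function mimics the Access-Control-Allow-Origin decision a server makes from such rules. Whether the portal's CORS is set on the S3 bucket, in a CloudFront response-headers policy, or in the API gateway is an assumption here, as is the Neuroglancer origin; cors_configuration and allow_origin_header are hypothetical helpers.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical origin for the Neuroglancer deployment; replace with the real one.
NEUROGLANCER_ORIGIN = "https://neuroglancer-demo.appspot.com"


def cors_configuration(origins: list[str]) -> dict:
    """Build an S3-style CORSConfiguration allowing read-only cross-origin requests."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": origins,
                "AllowedMethods": ["GET", "HEAD"],  # Neuroglancer only reads data
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


def allow_origin_header(request_origin: str, config: dict) -> Optional[str]:
    """Return the Access-Control-Allow-Origin value for a request, or None if blocked."""
    for rule in config["CORSRules"]:
        allowed = rule["AllowedOrigins"]
        if "*" in allowed:
            return "*"
        if request_origin in allowed:
            return request_origin
    return None
```

Listing the origin explicitly, rather than "*", keeps the change limited to the viewer that needs it; a missing header in the response is what makes the browser block Neuroglancer's requests.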
User story: (3.2) I want to see a key photo at the dataset level, so that I can get a glimpse of what I might expect to see in the dataset
Tasks
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=757%3A86414&mode=dev
We have a variety of metadata changes coming up for Phase 2. Before we make these changes to our JSON and GraphQL schemas, let's get approval on what these fields should be called and how they should be structured.
Draft changes are here: https://docs.google.com/document/d/1zcD9OyY86gaogVZOBZPwcn0fxZ5QSFrlqXeaodJX9BI/edit#heading=h.be2fyivqz1zp
User story: (2.1) I want to browse all data, so that I am aware of what is available
Tasks
Link to design:
User story: (3.1) I want to see a high-level description of the dataset, so that I can decide whether I want to proceed with downloading
Tasks
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=757%3A86414&mode=dev
Querying some of the annotations returns more objects than I'd expect, some of which have duplicate metadata URLs.
See the following reproducer.
from cryoet_data_portal import Client, Tomogram
client = Client()
tomo = next(Tomogram.find(client, [Tomogram.name == 'TS_026']))
annos = list(tomo.tomogram_voxel_spacing.annotations)
for a in annos:
print(a.https_metadata_path)
# https://files.cryoetdataportal.cziscience.com/10000/TS_026/Tomograms/VoxelSpacing13.48/Annotations/sara_goetz-fatty_acid_synthase-1.0.json
# https://files.cryoetdataportal.cziscience.com/10000/TS_026/Tomograms/VoxelSpacing13.48/Annotations/sara_goetz-ribosome-1.0.json
# https://files.cryoetdataportal.cziscience.com/10000/TS_026/Tomograms/VoxelSpacing13.48/Annotations/sara_goetz-ribosome-1.0.json
# https://files.cryoetdataportal.cziscience.com/10000/TS_026/Tomograms/VoxelSpacing13.48/Annotations/sara_goetz-fatty_acid_synthase-1.0.json
I suspect that we're returning multiple S3 versions of each bucket object, but I'm not certain.
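Until the cause is found, a client-side stopgap is to keep only the first annotation per metadata URL. This hides the duplicates rather than explaining them, so it is not a fix. A minimal sketch; unique_by is a hypothetical helper and FakeAnnotation is a stand-in with only the https_metadata_path field used in the reproducer.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def unique_by(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first item for each key, preserving the original order."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in for an Annotation; only the field used here."""
    https_metadata_path: str


annos = [FakeAnnotation(p) for p in ["fas.json", "ribosome.json", "ribosome.json", "fas.json"]]
print([a.https_metadata_path for a in unique_by(annos, key=lambda a: a.https_metadata_path)])
```

With the real client the call would be unique_by(tomo.tomogram_voxel_spacing.annotations, key=lambda a: a.https_metadata_path).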
version == 1.0.0
from cryoet_data_portal import Client, Dataset
myclient = Client()
item = Dataset.get_by_id(myclient, 10000)
gave the following error:
File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/cryoet_data_portal/_gql_base.py", line 259, in get_by_id
return client.find_one(cls, [cls.id == id])
File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/cryoet_data_portal/_client.py", line 58, in find_one
for result in self.find(*args, **kwargs):
File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/cryoet_data_portal/_client.py", line 53, in find
response = self.client.execute(self.build_query(cls, gql_type, query_filters))
File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/gql/client.py", line 403, in execute
return self.execute_sync(
File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/gql/client.py", line 221, in execute_sync
return session.execute(
File "/Users/anchi.cheng/Library/Python/3.9/lib/python/site-packages/gql/client.py", line 860, in execute
raise TransportQueryError(
gql.transport.exceptions.TransportQueryError: {'extensions': {'code': 'validation-failed', 'path': '$.selectionSet.datasets.args.where'}, 'message': "expected an object for type 'datasets_bool_exp', but found an enum value"}
Workaround:
items = Dataset.find(myclient, [Dataset.id == 10004])
for item in items:
print(item.title)
User Story: (2.2) I want to filter through the data, so that I can narrow down my selection
Generate snapshots of tomograms, so that users can get a glimpse of what they might expect to see in the dataset.
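One simple way to produce such a snapshot is to take the central z-slice of the tomogram and contrast-stretch it to 8-bit. The choice of the middle slice and the 1st/99th percentile stretch are assumptions for illustration, not the portal's actual method, and central_slice_snapshot is a hypothetical name. The resulting array could then be saved with an image library such as Pillow.

```python
import numpy as np


def central_slice_snapshot(volume: np.ndarray, low_pct: float = 1.0, high_pct: float = 99.0) -> np.ndarray:
    """Return the middle z-slice of a (z, y, x) volume as a contrast-stretched uint8 image."""
    if volume.ndim != 3:
        raise ValueError("expected a 3D (z, y, x) volume")
    middle = volume[volume.shape[0] // 2].astype(np.float64)
    # Percentile limits make the stretch robust to a few extreme voxels.
    lo, hi = np.percentile(middle, [low_pct, high_pct])
    if hi <= lo:
        # Flat slice: nothing to stretch, return mid-gray.
        return np.full(middle.shape, 128, dtype=np.uint8)
    scaled = np.clip((middle - lo) / (hi - lo), 0.0, 1.0)
    return (scaled * 255).round().astype(np.uint8)
```

Percentile clipping matters for cryo-ET data because a handful of very bright or dark voxels (e.g. gold fiducials) would otherwise compress the contrast of everything else.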
User story: (5.5) I want to download specific tomograms and/or annotations so that I have targeted data without unnecessarily using up memory
Tasks
User story: (4.5) I want to see quality metrics, so that I can decide how reliable the data is
Tasks
Link to design:
User story: (4.2) I want to see all the metadata within a run, so that I can access detailed information
Tasks
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=1310%3A11321&mode=dev
I can successfully create a client with an invalid URL and then attempt to find datasets (or presumably other things). I'd expect this to raise an error (ideally when the client is initialized), but instead it hangs indefinitely when calling find.
The following code should reproduce this issue:
from cryoet_data_portal import Client, Dataset
client = Client("https://graphql.catdataportal.cziscience.com/v1/graphql")
datasets = list(Dataset.find(client, [Dataset.id == 10000]))
# hangs indefinitely
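A fail-fast check at client initialization could open a TCP connection to the endpoint's host with a timeout and raise if that fails. This is a hypothetical sketch of such a check (check_endpoint is not part of cryoet_data_portal), and it does not diagnose why the reported URL hangs. Note that socket.create_connection bounds the connect step by the timeout, but DNS resolution itself is not bounded by it.

```python
import socket
from urllib.parse import urlsplit


def check_endpoint(url: str, timeout: float = 5.0) -> None:
    """Raise ValueError for non-HTTP URLs and ConnectionError if the host is unreachable."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # The connect attempt gives up after `timeout` seconds instead of hanging;
        # name resolution happens before this and is not covered by the timeout.
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            pass
    except OSError as exc:  # includes DNS failures, refusals, and timeouts
        raise ConnectionError(f"cannot reach {url}: {exc}") from exc
```

Running such a check in the constructor would surface a bad URL immediately, at the cost of one extra connection per client.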
User story: (5.1) I want to download a single dataset, so that I can use it to develop/train ML models
Tasks:
P0:
P1:
Add the following 6 filters to the single run page to narrow down annotations:
Link to design: https://www.figma.com/file/q6Z394Xy6wUmaXQ9YFjZH7/Kevin---2023-Spillover-Work?type=design&node-id=2118-11276&mode=design&t=W43r2ZQSbiy3s1ES-0
User Story: (2.6) I want the filter, sort, and search results to be shown on 2 separate tabs (dataset, run), so that I can better navigate different levels of data
Link to design:
User Story: (2.3) I want to sort the data, so that the data is arranged in a meaningful order
Link to design:
like, ilike, and _in do not work like the other operators.
I had to call them as methods of the field, otherwise I got a syntax error. For example,
from cryoet_data_portal import Client, Annotation
myclient = Client()
results = Annotation.find(myclient,[Annotation.object_name.ilike('fatty acid synthase%')])
whereas
results = Annotation.find(myclient,[Annotation.object_name ilike 'fatty acid synthase%'])
gives a syntax error.
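The syntax error is standard Python behavior: a class can customize built-in operators such as == through special methods like __eq__, but it cannot introduce new infix operators such as ilike, and in is a reserved keyword. Query builders therefore expose those comparisons as methods. The Field class below is a hypothetical illustration of that design, not how cryoet_data_portal is implemented, and its tuple filter format is made up.

```python
from typing import Any, List, Tuple


class Field:
    """Hypothetical query field; not the cryoet_data_portal implementation."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: Any) -> Tuple[str, str, Any]:  # type: ignore[override]
        # Python lets a class redefine ==, so `field == x` can build a filter.
        return (self.name, "_eq", value)

    def ilike(self, pattern: str) -> Tuple[str, str, str]:
        # Python has no `ilike` operator, so this has to be a method.
        return (self.name, "_ilike", pattern)

    def _in(self, values: List[Any]) -> Tuple[str, str, List[Any]]:
        # `in` is a keyword, hence the leading underscore in the method name.
        return (self.name, "_in", values)


object_name = Field("object_name")
print(object_name == "ribosome")
print(object_name.ilike("fatty acid synthase%"))
```

Defining __eq__ without __hash__ makes Field unhashable, which is acceptable for an object that only builds filters.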
User story: (2.1) I want to browse all data, so that I am aware of what is available
Tasks
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?mode=dev
Currently, the OME-Zarr metadata (i.e. the top-level .zattrs) describes unitless spatial dimensions z/y/x and multi-resolution scales of (1, 2, 4).
While these scales are correct in relative terms (i.e. they are the scaling factors relative to the highest-resolution tomogram), they do not capture the physical spacing between the voxels of the tomogram, as I would expect.
Instead, I would expect the spatial dimensions to include a supported OME-NGFF unit (e.g. angstrom) and the voxel spacing to be absorbed into the multi-resolution scales (e.g. 13.48, 26.96, 53.92).
This is not a major issue, but fixing it would mean that tools that read this metadata automatically set the physically correct scale for the data, consistent with other data and with visualization aids such as scale bars.
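A sketch of the proposed metadata, following the OME-NGFF 0.4 "multiscales" layout with "angstrom" as the space unit and the voxel spacing folded into each level's scale transform. The dataset paths "0", "1", "2" and the factor of 2 per level (matching the current 1, 2, 4 scales) are assumptions, and ngff_multiscales is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def ngff_multiscales(voxel_size: float, n_levels: int = 3, unit: str = "angstrom") -> Dict[str, Any]:
    """Build OME-NGFF 0.4 'multiscales' attributes with physical (not relative) scales."""
    axes = [{"name": ax, "type": "space", "unit": unit} for ax in ("z", "y", "x")]
    datasets: List[Dict[str, Any]] = []
    for level in range(n_levels):
        spacing = voxel_size * 2**level  # each level halves resolution on every axis
        datasets.append({
            "path": str(level),
            "coordinateTransformations": [{"type": "scale", "scale": [spacing] * 3}],
        })
    return {"multiscales": [{"version": "0.4", "axes": axes, "datasets": datasets}]}


print(json.dumps(ngff_multiscales(13.48), indent=2))
```

For a 13.48 Å tomogram this yields scales of 13.48, 26.96, and 53.92 Å per voxel, so a reader that honors the unit and scale can place the data correctly without knowing the voxel spacing separately.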
User story: (4.1) I want to see high-level description of the run so that I can quickly decide if I would like to investigate further
Tasks
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?type=design&node-id=1310%3A11321&mode=dev
User story: (1.1) I want to see a website description, so that I can understand the purpose of the data portal
Tasks
Link to design: https://www.figma.com/file/WEmbsjtlBUtRy7pzmuCCjj/CryoET-Data-Portal---Phase-2-Designs?node-id=1929%3A65457&mode=dev
Link to logos: https://drive.google.com/drive/folders/1MdDfET-nznjnx5EmO0icHokVq_H2mNo_?usp=drive_link
User story: (4.3) I want to see a key photo at the run level, so that I can get a glimpse of what I might expect to see in the run
Note: right-clicking is not possible due to the size of the image, so we will move forward with clicking to open the photo in a separate tab.