hubmapconsortium / portal-ui

HuBMAP Data Portal front end

Home Page: https://portal.hubmapconsortium.org

License: MIT License

Python 7.83% CSS 0.03% HTML 0.28% Shell 0.87% Dockerfile 0.11% JavaScript 58.93% SCSS 0.19% Jupyter Notebook 0.29% TypeScript 31.46%

gehlenborglab hubmap user-interface hidivelab

portal-ui's Introduction

portal-ui

HuBMAP Data Portal: This is a Flask app, using React on the front end and primarily Elasticsearch on the back end, wrapped in a Docker container for deployment using Docker Compose. The front end depends on AWS S3 and CloudFront for the hosting and delivery of images. It is deployed at portal.hubmapconsortium.org.

The Data Portal depends on many APIs, and directly or indirectly, on many other HuBMAP repos.

graph LR
    gateway
    click gateway href "https://github.com/hubmapconsortium/gateway"

    top[portal-ui] --> commons
    click top href "https://github.com/hubmapconsortium/portal-ui"
    click commons href "https://github.com/hubmapconsortium/commons"
    top --> ccf-ui
    click ccf-ui href "https://github.com/hubmapconsortium/ccf-ui"
    top --> vitessce --> viv
    click vitessce href "https://github.com/vitessce/vitessce"
    click viv href "https://github.com/hms-dbmi/viv"
    top --> portal-visualization --> vitessce-python
    click portal-visualization href "https://github.com/hubmapconsortium/portal-visualization"
    click vitessce-python href "https://github.com/vitessce/vitessce-python"
    top --> valid[ingest-validation-tools]
    click valid href "https://github.com/hubmapconsortium/ingest-validation-tools"
    top --> cells-sdk --> cells-api --> pipe
    click cells-sdk href "https://github.com/hubmapconsortium/cells-api-py-client"
    click cells-api href "https://github.com/hubmapconsortium/cross_modality_query"
    top --> gateway
    gateway --> entity-api --> pipe[ingest-pipeline]
    click entity-api href "https://github.com/hubmapconsortium/entity-api"
    click pipe href "https://github.com/hubmapconsortium/ingest-pipeline"
    gateway --> assets-api --> pipe
    %% assets-api is just a file server: There is no repo.
    gateway --> search-api --> pipe
    click search-api href "https://github.com/hubmapconsortium/search-api"
    gateway --> workspaces-api
    click workspaces-api href "https://github.com/hubmapconsortium/user_workspaces_server"

    pipe --> valid
    pipe --> portal-containers
    click portal-containers href "https://github.com/hubmapconsortium/portal-containers/"

    subgraph APIs
        entity-api
        search-api
        cells-api
        assets-api
        workspaces-api
    end

    subgraph Git Submodules
        valid
    end

    subgraph Python Packages
        commons
        portal-visualization
        vitessce-python
        cells-sdk
    end

    subgraph NPM Packages
        vitessce
        viv
    end

    subgraph cdn.jsdelivr.net
        ccf-ui
    end

    subgraph legend
        owner
        contributor
        not-harvard
    end

    classDef contrib fill:#ddffdd,stroke:#88AA88,color:#000;
    class owner,contributor,top,vitessce,viv,portal-visualization,vitessce-python,cells-sdk,portal-containers,valid,search-api contrib

    classDef owner stroke-width:3px,font-style:italic,color:#000;
    class owner,top,vitessce,viv,portal-visualization,vitessce-python,portal-containers owner

    style legend fill:#f8f8f8,stroke:#888888;

Feedback

Issues with the Portal can be reported via email. More information on how issues are tracked across HuBMAP is available here.

Design

We try to have a design ready before we start coding. Often, issues are filed in pairs, tagged design and enhancement. All designs are in Figma. (Note that if that link redirects to /files/recent, you'll need to be added to the project, preferably with a .edu email, if you want write access.)

Development

Prerequisites

git: We suggest installing Apple Xcode.
python 3.9
- Miniconda:
  - Install Miniconda and create a new conda environment: conda create -n portal python=$(cat .python-version)
- pyenv:
  - brew install pyenv
  - brew install pyenv-virtualenv
  - pyenv install `cat .python-version`
  - pyenv virtualenv `cat .python-version` portal
  - pyenv activate portal
nodejs/npm: We suggest installing nvm and then using it to install the appropriate Node version with nvm install:
- nvm install `cat .nvmrc`
- nvm use `cat .nvmrc`

Optional:

VS Code, with recommended extensions.
- While this is optional, it is worth noting that it is used by the whole development team.
- Using VS Code lets us share default configuration settings and easily run scripts using VS Code tasks.
docker
- Docker is necessary in order to create images for the deploy process
- It is also used to run a local instance of the application when using the test scripts in the ./etc directory

Development

After checking out the project, cd-ing into it, and setting up a Python 3.9 virtual environment,

Get app.conf from Confluence or from another developer and place it at context/instance/app.conf.
Run etc/dev/dev-start.sh to start the webpack dev and flask servers and then visit localhost:5001.
- If using VS Code, you can also use the dev-start task, which will launch these services in separate terminal windows.

The webpack dev server serves all files within the public directory and provides hot module replacement for the React application; it proxies all other requests (those for files outside the public directory) to the Flask server.

Note: Searchkit, our interface to Elasticsearch, has changed significantly in the latest release. Documentation for version 2.0 can be found here.

Changelog files

Every PR should be reviewed, and every PR should include a new CHANGELOG-something.md at the root of the repository. These are concatenated by etc/build/push.sh during deploy.
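The real concatenation happens in etc/build/push.sh; the following is only a hypothetical Python sketch of the idea, assuming the CHANGELOG-*.md files are joined in alphabetical order with a blank line between entries (the ordering and separator are assumptions, not the script's documented behavior). Note that the glob deliberately skips a plain CHANGELOG.md.

```python
from pathlib import Path


def concatenate_changelogs(repo_root: str) -> str:
    """Join every CHANGELOG-*.md at the repository root into one string.

    Hypothetical sketch: alphabetical order and the blank-line separator are
    assumptions, not necessarily what etc/build/push.sh does.
    """
    root = Path(repo_root)
    parts = []
    for path in sorted(root.glob("CHANGELOG-*.md")):
        # Strip trailing whitespace so entries are separated consistently.
        parts.append(path.read_text(encoding="utf-8").rstrip())
    return "\n\n".join(parts) + "\n" if parts else ""
```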

File and directory structure conventions

⚛️ React

Note
Any mentions of .js/.jsx in the following guidelines are interchangeable with .ts/.tsx. New features should ideally be developed in TypeScript.

Components with tests or styles should be placed into their own directory.
Styles should follow the style.* pattern where the extension is js for styled components or css for stylesheets.
- New styled components should use styled from @mui/material/styles.
Supporting test files have specific naming conventions:
- Jest Tests should follow the *.spec.js pattern.
- Stories should follow the *.stories.js pattern.
- Cypress tests should follow the *.cy.js pattern.
- For all test files, the prefix is the name of the component.
Each component directory should have an index.js which exports the component as default.
Components which share a common domain can be placed in a directory within components named after the domain.
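To make the naming rules above concrete, here is a hypothetical checker sketch (not a tool in this repo). It assumes each component directory is named after its component, so test files must be prefixed with the directory name, and it treats .js/.jsx/.ts/.tsx as interchangeable.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# .js/.jsx and .ts/.tsx are interchangeable under the conventions above.
EXT = r"(?:js|jsx|ts|tsx)"
TEST_PATTERNS = {
    "jest": re.compile(rf"(?P<component>.+)\.spec\.{EXT}"),
    "story": re.compile(rf"(?P<component>.+)\.stories\.{EXT}"),
    "cypress": re.compile(rf"(?P<component>.+)\.cy\.{EXT}"),
}
STYLE_PATTERN = re.compile(rf"style\.(?:{EXT}|css)")
INDEX_PATTERN = re.compile(rf"index\.{EXT}")


def classify_supporting_file(path: str) -> Optional[str]:
    """Return 'jest', 'story', 'cypress', 'style' or 'index', else None.

    None means the file is not a recognized supporting file (this includes
    the component module itself). Assumes the directory name is the
    component name, which test-file prefixes must match.
    """
    p = PurePosixPath(path)
    name, component = p.name, p.parent.name
    for kind, pattern in TEST_PATTERNS.items():
        match = pattern.fullmatch(name)
        if match:
            return kind if match.group("component") == component else None
    if STYLE_PATTERN.fullmatch(name):
        return "style"
    if INDEX_PATTERN.fullmatch(name):
        return "index"
    return None
```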

🖼️ Images

Images should be displayed using the srcset attribute of the source element. Prepare four versions of the image: one at its original size and ones at 75%, 50%, and 25% of the original size, preserving its aspect ratio. If available, also provide a 2x resolution version for higher-density screens.
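A small helper can compute the four sizes and the matching srcset value. This is a sketch only: the file-naming scheme (stem-WIDTHw.ext) and the base URL are hypothetical, not the portal's actual naming.

```python
def srcset_sizes(original_width: int, original_height: int) -> list[tuple[int, int]]:
    """Return (width, height) for the 100%, 75%, 50% and 25% versions,
    preserving the aspect ratio."""
    return [
        (round(original_width * scale), round(original_height * scale))
        for scale in (1.0, 0.75, 0.5, 0.25)
    ]


def build_srcset(base_url: str, stem: str, ext: str,
                 original_width: int, original_height: int) -> str:
    """Build a srcset value using width descriptors (e.g. '1600w').

    The '<stem>-<width>w.<ext>' file naming is a hypothetical convention.
    """
    entries = [
        f"{base_url}/{stem}-{width}w.{ext} {width}w"
        for width, _ in srcset_sizes(original_width, original_height)
    ]
    return ", ".join(entries)
```

For example, a 1600x900 original yields 1600x900, 1200x675, 800x450 and 400x225 versions.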

For example, to resize images using macOS Preview, open the 'Tools' menu and select 'Adjust Size'. From there you can change the image's width while making sure 'Scale Proportionally' and 'Resample Image' are checked. Once ready, process each version of the image with an image optimizer such as ImageOptim or Online Image Compressor.

Homepage images should also be provided in .webp format; a batch conversion script is provided to aid this process.

Finally, after processing, add the images to the S3 bucket portal-ui-images-s3-origin, to be delivered by the CloudFront CDN. SVG files larger than 5KB should also be stored in S3 and delivered by the CDN; SVG files smaller than 5KB can be included in the repository in context/app/static/assets/svg/. The CDN responds with a cache-control: max-age=1555200 header (18 days) for all items, but this can be overridden per image by setting the cache-control header on the object in S3.
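The SVG placement rule can be written down as a tiny function. This sketch assumes "5KB" means 5 KiB (5 x 1024 bytes) and that a file of exactly that size may stay in the repository; both readings are assumptions. It also confirms the arithmetic behind the default cache lifetime.

```python
SVG_SIZE_LIMIT_BYTES = 5 * 1024  # assumption: "5KB" read as 5 KiB
REPO_SVG_DIR = "context/app/static/assets/svg/"
CDN_MAX_AGE_SECONDS = 1555200  # default cache-control max-age sent by the CDN
SECONDS_PER_DAY = 24 * 60 * 60


def svg_destination(size_bytes: int) -> str:
    """Return where an SVG of the given size should live."""
    if size_bytes > SVG_SIZE_LIMIT_BYTES:
        # Larger SVGs go to the portal-ui-images-s3-origin bucket behind CloudFront.
        return "s3"
    return REPO_SVG_DIR


# 1555200 seconds is exactly 18 days.
assert CDN_MAX_AGE_SECONDS == 18 * SECONDS_PER_DAY
```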

If an uploaded file replaces an existing one and uses the same file name, a CloudFront cache invalidation should be run, targeting the specific file(s) that have been updated.

Log in to the AWS console and go to CloudFront Distributions.
Select the distribution corresponding to the S3 bucket.
Go to the Invalidations tab and click Create Invalidation.
Enter the file names to invalidate, with their full paths; you can target multiple similar file names by using wildcards.
- e.g. to invalidate all files in / starting with publication-slide, enter /publication-slide*, which selects all the different sizes of that image.
After confirming that you are targeting only the intended files, click Create Invalidation again.
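Before confirming, it can help to check a wildcard pattern against a known list of object paths (for example, copied from an S3 listing). This sketch uses Python's fnmatchcase as an approximation of CloudFront's wildcard matching, not a reimplementation of it.

```python
from fnmatch import fnmatchcase


def preview_invalidation(paths: list[str], pattern: str) -> list[str]:
    """List which object paths a wildcard invalidation pattern would select.

    fnmatchcase only approximates CloudFront's matching; use it as a sanity
    check on the pattern, not as the source of truth.
    """
    return sorted(p for p in paths if fnmatchcase(p, pattern))
```

Usage: preview_invalidation(paths, "/publication-slide*") should list only the resized copies of that one image.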

For the homepage carousel, images should have a 16:9 aspect ratio, a width of at least 1400px, a title, a description, and, if desired, a URL to be used for the 'Get Started' button.
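The carousel requirements can be checked mechanically. This is a hypothetical sketch: the CarouselImage record is illustrative, and the one-pixel tolerance on height is an assumption (a 1400px-wide image cannot be exactly 16:9, since 1400 x 9 / 16 = 787.5).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CarouselImage:
    """Hypothetical record for one homepage carousel slide."""
    width: int
    height: int
    title: str
    description: str
    url: Optional[str] = None  # optional target for the 'Get Started' button


def carousel_problems(image: CarouselImage) -> list[str]:
    """Return unmet carousel requirements (an empty list means the slide is OK)."""
    problems = []
    # Allow one pixel of slack in height, since exact 16:9 is not always integral.
    if abs(image.height - image.width * 9 / 16) > 1:
        problems.append("aspect ratio must be 16:9")
    if image.width < 1400:
        problems.append("width must be at least 1400px")
    if not image.title.strip():
        problems.append("title is required")
    if not image.description.strip():
        problems.append("description is required")
    return problems
```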

Testing

Python unit tests use Pytest, front end tests use Jest, and end-to-end tests use Cypress. Each suite is run separately on GitHub CI.

Load tests are available, but they are not run as part of CI.

Running tests locally without docker

Jest: cd context; npm run test
Cypress: With the application running, cd end-to-end; npm run cypress:open
- If using WSL2, see the WSL2-specific steps in the end to end readme.
- Note that the Cypress tests (particularly for the publication page) are expected to be run with the test environment enabled in app.conf.
Pytest: cd context; pytest app --ignore app/api/vitessce_conf_builder

Linting and pre-commit hooks

CI lints the codebase, and to save time, we also lint in a pre-commit hook. If you want to bypass the hook, set HUSKY_SKIP_HOOKS=1.

You can also lint and auto-correct from the command-line:

cd context
npm run lint
npm run lint:fix
EXCLUDE=node_modules,ingest-validation-tools,etc/dev/organ-utils
autopep8 --in-place --aggressive -r . --exclude $EXCLUDE

Storybook

To start Storybook locally, either run etc/dev/dev-start.sh or just npm run storybook; after it has started, visit localhost:6006.

Build, tag, and deploy

The build, tag, deploy, and QA procedures are detailed here.

Instructions for Production are provided here.

Understanding the build

Webpack

To view visualizations of the production webpack bundle, run npm run build:analyze. The script generates two files, report.html and stats.html, inside the public directory, each showing a different visual representation of the bundle.

Docker

To build and run the docker image locally:

etc/dev/docker.sh 5001 --follow

Our base image is based on this template.

Docker Compose

In the deployments, our container is behind an NGINX reverse proxy; here's a simple demonstration of how that works.

Related projects and dependencies

Search and Metadata

The metadata that we have for each dataset ultimately comes from the data providers, but the fields they supply are determined by the schemas in ingest-validation-tools. That repo is also included as a submodule here, and human-readable field descriptions are pulled from it.

The portal team contributes code to a subdirectory within search-api to clean up the raw Neo4J export and provide us with clean, usable facets. Within that directory, config.yaml configures the Elasticsearch index itself.

Visualization

Data visualization is an integral part of the portal, allowing users to view the results of analysis pipelines, or raw uploaded data, easily and directly in the browser. Details on how such data is processed and prepared for client-side visualization in JavaScript via Vitessce can be found here.

General-purpose tools:

viv: JavaScript library for rendering OME-TIFF and OME-NGFF (Zarr) directly in the browser. Packaged as deck.gl layers.
vitessce: Visual integration tool for exploration of spatial single-cell experiments. Built on top of deck.gl.
vitessce-python: Python wrapper classes which make it easier to build configurations.

Particular to HuBMAP:

portal-visualization: Given HuBMAP Dataset JSON, creates a Vitessce configuration.
portal-containers: Docker containers for visualization preprocessing.
airflow-dev: CWL pipelines wrapping those Docker containers.

portal-ui's Issues

Mockup: Dataset details

Dataset is one of the basic types: It needs a details page.

Nils should give requirements, and we should confirm that the API will have the information necessary.

Deliverable is a mockup which has approval from Nils. File a new issue for implementation when this is complete.

If page requires API, redirect to login page ... or show human-readable error.

In the long run, all pages will be accessible, and the API will return different results depending on your credentials... but in the near term, it will just error if you try to access a page that hits the API, and that is confusing.

Should we have better messaging for this temporary problem? If so, what?

Decide on roadmap for faceted search

This might be divided into several sub issues, but I think just agreeing to a roadmap is a prerequisite.

API design: Can the API team give us a spec in the near term, or should we make assumptions about functionality in the near term and implement them in the API-client, and leave the API itself for later?
Indexing: For each entity type, precisely what fields will be available? Until we get details on this from the API team, I don't think we should make any assumptions about the precise fields which will be available.
UI: We shouldn't be doing anything interesting or new in the UI. Nils can point at an example he likes, or Chuck could, or we can make mockups from scratch, or ....?

Close this issue when precise sub-issues have been filed, and responsibilities are clear.

create default error page template

Create a generic error page template that displays the error code and can be used to handle any API error.

Additionally, we should create specialized error pages for common errors (maybe 400, 401, 403, 404, 500, 502, 503, 504) that show the error code along with a very simple explanation, in particular when the user could address the problem (e.g. by logging in or by navigating to the correct URL).
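One way to share a single template between the generic and specialized pages is to build the template context from the status code, using the standard reason phrase from Python's http.HTTPStatus and an optional hint. This is a sketch under assumptions: the hint wording and the context keys are hypothetical.

```python
from http import HTTPStatus

# Hypothetical short explanations; any other code falls back to the generic page.
SPECIALIZED_HINTS = {
    400: "The request could not be understood. Check the URL and try again.",
    401: "You need to log in to view this page.",
    403: "Your account does not have access to this page.",
    404: "This page does not exist. Check that the URL is correct.",
    500: "Something went wrong on our side.",
    502: "A service we depend on returned an invalid response. Try again later.",
    503: "The portal is temporarily unavailable. Try again later.",
    504: "A service we depend on took too long to respond. Try again later.",
}


def error_page_context(code: int) -> dict:
    """Build template context: code, 'code + reason phrase' title, optional hint."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # not a standard HTTP status code
        phrase = "Error"
    return {"code": code, "title": f"{code} {phrase}", "hint": SPECIALIZED_HINTS.get(code)}
```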

Better vitessce demo (deck.gl)

Show something with Deck.gl. (I worry there might be a problem there...)

agree-not-to-reidentify click-through

On first visiting the site, users should be prompted to confirm that they will not attempt to reidentify donors.

Blocking on:

Ok to do this just on the client side, with a long-lived cookie?

Can come later:

Exact wording.
Design.

Mockup: Dataset list

Dataset is one of the basic types: It needs a list page.

Nils should give requirements, and we should confirm that the API will have the information necessary.

Deliverable is a mockup which has approval from Nils. File a new issue for implementation when this is complete.

Get assurance that API will provide what we need

I've written up a document detailing what we need from the API. @shirey will review this, and either assure us that all of this will be provided by the API, at some point, or suggest that we need to find a different way of satisfying individual items.

Get human readable summary when you click on provenance node

Make link to protocol on details page

Nils has said that the format of this field is not consistent... so probably be flexible now, and file an issue for tighter validation upstream.

Add file details page to routes

Add file details page to routes, and specialize it so it could show file-specific visualization.

Mockup: Donor details

Donor is one of the basic types: It needs a details page.

Nils should give requirements, and we should confirm that the API will have the information necessary.

Deliverable is a mockup which has approval from Nils. File a new issue for implementation when this is complete.

Mockup: Sample list

Sample is one of the basic types: It needs a list page.

Nils should give requirements, and we should confirm that the API will have the information necessary.

Deliverable is a mockup which has approval from Nils. File a new issue for implementation when this is complete.

Mockup: Basic page template

@ngehlenborg : I believe you do not want a pull-down for types in the header? More generally, is the current basic page template ok, or does it need to be changed?

If changes are needed, state requirements here, and a mockup satisfying those requirements will be delivered.

Note: Current layout is Material UI and React: I believe to simplify things, we want the base to be a Django template. Redoing the basic template will make hubmapconsortium/hubmap-data-portal#159 irrelevant.

Mockup: Donor list

Donor is one of the basic types: It needs a list page.

Nils should give requirements, and we should confirm that the API will have the information necessary.

Deliverable is a mockup which has approval from Nils. File a new issue for implementation when this is complete.

Use consts in commons

From @cborromeo :

I have begun collecting some of this information in this code: https://github.com/hubmapconsortium/commons/blob/master/hubmap_commons/hubmap_const.py. There is a series of variables called *_REQUIRED_ATTRIBUTE_LIST at the bottom of the file. These are the fields I've collected so far. I have a fairly consistent set of terms: uuid, DOI, and display DOI required for most items in Neo4j. I can add more fields, but we would need to ensure the UI honors these required fields.

Link "Pipelines" to HuBMAP Dockstore page

URL: https://dockstore.org/organizations/HuBMAP

Show male/female human anatomical diagrams on front page

These diagrams should contain outlines of the organs/tissues from which the HuBMAP consortium has collected data. These diagrams should be interactive; clicking an organ/tissue should do one or more of the following:

Filter the "data by time" plot to include only that tissue/organ
Show a tabular list of experiments which profiled that organ
Show genes which are differentially expressed in that tissue vs. the rest of the human body

... and anything else that seems appropriate.

Specific functionality for the anatomy diagram should probably be tracked with separate GitHub issues.

Dockerized version to run Portal easily

Implement searching by gene, to show in which tissues a gene is expressed

The front page should feature a simple "search by gene" interface, which would show tissues/organs which show expression of that gene above some threshold.
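The core query is simple once expression values are available per tissue. This sketch assumes a hypothetical in-memory mapping of tissue to gene to expression value; the real data source and threshold are not specified here.

```python
def tissues_expressing(expression: dict[str, dict[str, float]],
                       gene: str, threshold: float) -> list[str]:
    """Return tissues whose expression of `gene` exceeds `threshold`, highest first.

    `expression` is a hypothetical tissue -> gene -> value mapping.
    """
    hits = [
        (tissue, genes[gene])
        for tissue, genes in expression.items()
        if gene in genes and genes[gene] > threshold
    ]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [tissue for tissue, _ in hits]
```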

Provenance: Collapse nodes with lots of multiples (Hard?)

Talking with Alex, this is possible with the 4DN tool... not sure about in our wrapper.

Filter to just entities with spatial information

Use case suggested by Katy Börner: List all the entities with spatial location metadata.

Show preview images for imaging studies

We don’t intend this to be a very full-featured interface at first. We should perhaps choose a few randomly selected images for mass spec or microscopy studies, and show previews of these in the data portal. These preview images can be saved by data processing pipelines; maybe we should save 5 or 10 and choose a random subset of the available images for previews.

Provide docker context

This repo should provide a docker context that can be used in docker compose like so:

services:
  flask-data-portal:
    build:
      context: https://github.com/hubmapconsortium/flask-data-portal.git#v0.0.x:context

Note:

context in docker-compose.yml can specify urls.
These urls can specify tags and subtrees.
I prefer pinning to git versions, rather than having the dockerhub intermediate. While this does mean a build will be required, rather than just taking an image, I think the simplicity is worth it.
Blocked on availability of shared docker base image: Our services have a shared set of concerns, so we shouldn't duplicate the work of setting up the containers. Bill wrote:

For our stuff we were standardizing on CentOS because that is what we have experience with, run in prod (will be less likely to have a kernel conflict) and track for security updates.
Also we are standardizing on uWSGI or Nginx in front of uWSGI because of the architecture of our API Gateway. ... I think it will be easy for Zhou to put a standard image together once he's got time.

Mockup: Sample details

Sample is one of the basic types: It needs a details page.

Nils should give requirements, and we should confirm that the API will have the information necessary.

Deliverable is a mockup which has approval from Nils. File a new issue for implementation when this is complete.

Add access control for private raw data (e.g. RNA-seq FASTQ)

We will almost definitely have some sensitive data, e.g. the FASTQ files from RNA-seq runs. We should implement access control ASAP so this isn't something we have to bolt on later.

Users who aren't authenticated or authorized should see all data, but be unable to download things that require additional access.
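The rule "everyone sees everything, but some downloads are gated" separates listing from downloading. A minimal sketch, assuming a hypothetical per-file required group (None for public files) and a set of group names for the current user:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataFile:
    """Hypothetical file record; required_group None means public."""
    name: str
    required_group: Optional[str] = None


def can_download(file: DataFile, user_groups: frozenset[str]) -> bool:
    """Public files are downloadable by anyone; others need the matching group."""
    return file.required_group is None or file.required_group in user_groups


def listing(files: list[DataFile], user_groups: frozenset[str]) -> list[tuple[str, bool]]:
    """Everyone sees every file; the flag says whether its download is enabled."""
    return [(f.name, can_download(f, user_groups)) for f in files]
```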

Point to demo API

From Zhou (Joe) Yuan to Everyone (10:11 AM):
Here are the testing APIs:
- http://entity-api.test.hubmapconsortium.org
- http://uuid-api.test.hubmapconsortium.org/
- http://ingest-api.test.hubmapconsortium.org/
From Zhou (Joe) Yuan to Everyone (10:12 AM):
All the available API endpoints are specified here: https://github.com/hubmapconsortium/gateway/blob/master/api_endpoints.prod.json

Support simple visualization of (expression) data

Implement a simple expression heatmap at the level of tissues/organs. This visualization should gain many other features over time, but we should start with something straightforward that we can implement (or finish implementing) ASAP.

The human anatomical diagram in production at https://demo1.hubmapconsortium.org/ currently re-colors the organs/tissues after page load, with random values from the REST API. Adjust this to:

only re-color the diagram after a user searches for/selects a gene
use real data and not random values
adjust the color of the organ/tissue outlines in addition to the "fill" color
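Re-coloring from real data needs a mapping from an expression value to a color. A minimal sketch using linear interpolation between two RGB endpoints; the default endpoint colors and the clamping behavior are assumptions, and the same hex value could drive both the fill and the outline.

```python
def value_to_hex(value: float, vmin: float, vmax: float,
                 low: tuple[int, int, int] = (255, 255, 255),
                 high: tuple[int, int, int] = (178, 24, 43)) -> str:
    """Linearly interpolate from `low` to `high`; values outside [vmin, vmax] are clamped."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    rgb = (round(lo + (hi - lo) * t) for lo, hi in zip(low, high))
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```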

Support linking between details views

Propose how the API should deliver references to other objects, and implement it in our mockup

With input from collaborators, propose top-of-page summary for details page

With input from @mruffalo , @shirey , and anyone else, make a proposal for what the top of the details page should look like. Deliverable is a google doc mock-up which @ngehlenborg has signed off on.

Deploy on real infrastructure

Docker container exists: folks in Pittsburgh should deploy this along with all the other containers, using whatever configuration management they are using.

Static pages: content

@ngehlenborg will generate some content to include on static pages... Perhaps copy-and-pasting existing content? Or delegate a defined sub-task?

Text can either be pasted in this issue, or in a google doc.

(Framework to support this content is #62.)

Add HTML(5?) validator

Look at all the Entity APIs, and try to use them

Define advanced search and faceting requirements

These features may need clearer requirements before implementation. When requirements are defined and sub-issues filed, this issue can be closed.

Keyword search across all fields
Search within single field
Synonym resolution: Search "heart" and get "pericardium" results
Stemming: Search "pericardial" and get "pericardium" results
With a search result response, we will want to have counts of matching documents for each possible value for each enumerated field.
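To pin down what "counts of matching documents for each possible value" means, here is an in-memory sketch. In practice Elasticsearch would compute these with a terms aggregation; the document shape and field names below are hypothetical. Each document counts at most once per value, and list-valued fields count each distinct element.

```python
from collections import Counter


def facet_counts(documents: list[dict], fields: list[str]) -> dict[str, Counter]:
    """Count matching documents per value for each enumerated field."""
    counts = {field: Counter() for field in fields}
    for doc in documents:
        for field in fields:
            value = doc.get(field)
            if value is None:
                continue
            # A document contributes once per distinct value.
            values = set(value) if isinstance(value, list) else {value}
            counts[field].update(values)
    return counts
```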

Visualize W3C PROV JSON

The backend will (eventually) provide PROV JSON describing the provenance of each node.

See if there are any acceptable existing visualization tools for this data.
- If so, get buy-in from Nils,
- If not, make a mock-up and get buy-in from Nils. Set up new repo for new React JS.
Incorporate the tool in portal-ui, using mock data.
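Whichever tool is chosen, it will likely need PROV JSON turned into nodes and edges. A minimal sketch, assuming standard PROV-JSON layout (top-level "entity", "activity", "agent" maps, and relation maps such as "used" and "wasGeneratedBy" whose records carry "prov:activity"/"prov:entity"), with each record stored under its own id. Other relation types are ignored here.

```python
import json


def prov_to_graph(prov_json: str) -> tuple[list[str], list[tuple[str, str, str]]]:
    """Turn PROV-JSON into (nodes, edges) for a graph view.

    Nodes: ids under 'entity', 'activity' and 'agent'.
    Edges: 'used' (activity -> entity) and 'wasGeneratedBy' (entity -> activity).
    Records missing either endpoint are skipped.
    """
    doc = json.loads(prov_json)
    nodes = sorted(set(doc.get("entity", {})) | set(doc.get("activity", {}))
                   | set(doc.get("agent", {})))
    edges = []
    for record in doc.get("used", {}).values():
        activity, entity = record.get("prov:activity"), record.get("prov:entity")
        if activity and entity:
            edges.append((activity, entity, "used"))
    for record in doc.get("wasGeneratedBy", {}).values():
        entity, activity = record.get("prov:entity"), record.get("prov:activity")
        if entity and activity:
            edges.append((entity, activity, "wasGeneratedBy"))
    return nodes, edges
```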

Mock-up attribution details

Assuming the API will be able to provide us with this information: (based on ENCODE)

Contributor: Michael Snyder, Stanford
Award: U54HG006996 (Michael Snyder, Stanford)
Project: ENCODE
Date submitted: May 2, 2017
Internal release: ....
Public release: May 17, 2017

Implement top of page summary

follow up to #47: implement https://docs.google.com/drawings/d/19eX28YnVaXNu2iPSnWhZZkCBaN8LLr1yoS4CWckAM0Y/edit

Show graph of how much data (cells/experiments/images/etc.) we have, by tissue

A prototype version of these graphs is currently implemented, but we'll want some more functionality for these. At least, we should probably show additional data per unit time, such as the number of cells, experiments, and images.

We should show these as cumulative graphs with time on the x-axis.
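A cumulative graph is a running total over time. This sketch assumes hypothetical input as (date, count) events, such as cells added per dataset on its release date, and collapses same-day events before accumulating.

```python
from datetime import date
from itertools import accumulate


def cumulative_by_date(events: list[tuple[date, int]]) -> list[tuple[date, int]]:
    """Turn (date, count) events into (date, running total), sorted by date."""
    per_day: dict[date, int] = {}
    for day, count in events:
        per_day[day] = per_day.get(day, 0) + count
    days = sorted(per_day)
    totals = accumulate(per_day[day] for day in days)
    return list(zip(days, totals))
```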

Visualize data processing pipeline runs

Adapt the 4DN workflow visualization function to our data portal, and use this to show the results of our data analysis runs. The 4DN interface consists of a large amount of ReactJS code, so this won't be trivial.

Example: https://data.4dnucleome.org/experiment-set-replicates/4DNESH4UTRNL/#graph-section

Additional information:

Each node in the visualization represents either a data file (at different levels of processing) or a workflow run
Clicking each node shows details about that data file or workflow run, allowing for (direct?) downloading of data
Workflow runs should link to the “abstract” workflow (with no concrete input/output files), and to the implementation of the workflow in GitHub/Docker Hub/Dockstore/etc.
Either convert Research Object (RO) output to input of visualization diagram, or adjust visualization diagram to read this directly

Add codegen from OpenAPI?

Vitessce "Hello World"

Load the Vitessce code from a CDN and get it working with sample data (data-uri) coming from the mock API.

Show QA/QC results where/when appropriate

Related to hubmapconsortium/hubmap-data-portal#7. This applies to both raw and processed data. This will require automated reporting of QA/QC metrics from analysis pipelines, also.

Example from 4DN: https://data.4dnucleome.org/experiment-set-replicates/4DNESH4UTRNL/#graph-section

Static pages: Framework

Provide a framework for hosting static text content, with a minimum of formatting... Try markdown, if it's not too difficult. Providing content is #61.

Best practice for expired tokens?

One morning, coming back to a session that should have been logged in, things were stuck: I think the tokens had expired, and I couldn't even click on logout, because that required the tokens, apparently.

In PR #53, I do this:

    try:
        tokens = session['tokens']
    except Exception:
        # TODO: After leaving it logged in for several hours, my tokens had expired,
        # but I was still logged in. Is this the best fix?
        tokens = {}

which fixed my immediate problem, but I'm not sure it's the best approach. Maybe this is what refresh tokens are used for?
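An alternative to silently falling back to empty tokens is to check expiry explicitly before using them. A minimal sketch, assuming the session stores a mapping of resource server to token data with an "expires_at_seconds" Unix timestamp, as Globus token responses typically provide; the 60-second leeway is an arbitrary assumption.

```python
import time
from typing import Mapping, Optional


def tokens_expired(tokens: Mapping[str, Mapping], now: Optional[float] = None,
                   leeway: float = 60.0) -> bool:
    """Return True if there are no tokens or any token is past (or near) expiry.

    Assumes each entry has an 'expires_at_seconds' Unix timestamp; a missing
    timestamp is treated as expired.
    """
    if not tokens:
        return True
    current = time.time() if now is None else now
    return any(t.get("expires_at_seconds", 0) - leeway <= current for t in tokens.values())
```

A view could call this first and, when it returns True, clear the session and redirect to login, or use a refresh token if one was obtained.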

Fresh start on home page

The current homepage has gotten too far ahead of plans for the API. It began as a demonstration mock-up, and it did that well, but at this point the presence of UI elements which are just mockups makes it hard to communicate what work is still to be done. Nils has approved this minimal fresh start:

Welcome to HuBMAP

Show all:

Experiments TODO

Files TODO

Samples TODO

Protocols TODO

Organs TODO

Donors TODO

© Human BioMolecular Atlas Program. Supported by the NIH Common Fund | v0.2.2

Deliverable: Someone not-Chuck gets it up and running, and submits a PR for any road-bumps.

In page / modal login

@ngehlenborg would prefer if, instead of taking you to the Globus page, login remained within HuBMAP. Is this possible?
