HTAN Data Portal

This repo contains the code for the Human Tumor Atlas Network (HTAN) Data Portal.

Framework

This is a Next.js project bootstrapped with create-next-app.

Backend

All data comes from Synapse. A Python script generates a JSON file that contains all the metadata. There is currently no backend: it is a fully static site, so all filtering happens on the frontend.
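
Because there is no backend, every filter runs over data that is already loaded in the browser. The portal's own filtering code is TypeScript and is not shown here; the minimal Python sketch below only illustrates the idea, assuming each metadata row is a flat dictionary. The field names (assay, organ) and the filter_records helper are hypothetical, not the portal's actual data model.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical record type: one metadata row as a flat dict of strings.
Record = Dict[str, str]


def filter_records(records: Iterable[Record], selected: Dict[str, Set[str]]) -> List[Record]:
    """Return records that match every active filter.

    'selected' maps an attribute name to the set of values the user ticked.
    Values within one attribute are OR-ed; different attributes are AND-ed.
    An empty set means that filter is inactive.
    """
    result = []
    for record in records:
        if all(
            not values or record.get(attr) in values
            for attr, values in selected.items()
        ):
            result.append(record)
    return result


if __name__ == "__main__":
    # Toy rows standing in for the contents of the static JSON file.
    rows = [
        {"id": "A1", "assay": "scRNA-seq", "organ": "Breast"},
        {"id": "A2", "assay": "H&E", "organ": "Breast"},
        {"id": "A3", "assay": "scRNA-seq", "organ": "Lung"},
    ]
    hits = filter_records(rows, {"assay": {"scRNA-seq"}, "organ": set()})
    print([r["id"] for r in hits])  # prints ['A1', 'A3']
```

Since the whole dataset is shipped to the client, filtering is a plain in-memory pass with no server round trip; the trade-off is that the JSON file has to be small enough to download and hold in the browser.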

Update Data Files

Update release information

Only certain metadata rows and data files on Synapse are released. We keep track of this information in Google BigQuery. You can get the latest dump with these commands (requires access to the htan-dcc Google Cloud project):

bq extract --destination_format CSV released.entities_v5 gs://htan-release-files/entities_v5.csv
bq extract --destination_format CSV released.metadata_v5 gs://htan-release-files/metadata_v5.csv
gsutil cp gs://htan-release-files/entities_v5.csv entities_v5.csv
gsutil cp gs://htan-release-files/metadata_v5.csv metadata_v5.csv
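
Once downloaded, the entities dump can be used to keep only released items. The minimal sketch below assumes entities_v5.csv has a column holding the Synapse entity ID; the column name entityId and both helper names are hypothetical, so check the real header of the exported file before relying on it. How the portal itself applies the release list is not described here.

```python
import csv
from typing import Dict, Iterable, List, Set


def load_released_ids(lines: Iterable[str], id_column: str = "entityId") -> Set[str]:
    """Collect released entity IDs from the exported entities CSV.

    'lines' can be an open file or any iterable of CSV lines.
    'entityId' is a hypothetical column name; pass the real header name.
    """
    reader = csv.DictReader(lines)
    return {row[id_column] for row in reader if row.get(id_column)}


def keep_released(rows: Iterable[Dict[str, str]], released: Set[str],
                  id_field: str = "entityId") -> List[Dict[str, str]]:
    """Keep only metadata rows whose (hypothetical) ID field is in the released set."""
    return [row for row in rows if row.get(id_field) in released]
```

For example, with open("entities_v5.csv", newline="") as f: released = load_released_ids(f). A set is used so that each membership check is constant time, even for large metadata tables.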

Pull files from Synapse and process them for ingestion

cd data
# Run the script that pulls all the HTAN metadata
# It outputs a JSON in public/syn_data.json and a JSON with links to metadata in data/syn_metadata.json
python get_syn_data.py
cd ..
# Find and replace certain values (this is a temp fix)
yarn findAndReplace
# we store the result of this in gzipped format
gzip -c public/syn_data.json > public/syn_data.json.gz
# Convert the resulting JSON to a more efficient structure for visualization
# Note: we write stdout and stderr to files so they can be shared with others
# for data QC debugging
# TODO: there is an OpenSSL legacy provider hack as a workaround for
# https://stackoverflow.com/questions/69692842/error-message-error0308010cdigital-envelope-routinesunsupported
yarn processSynapseJSON > data/processSynapseJSON.log 2> data/processSynapseJSON.error.log
# we also store the processed data in gzipped format
gzip -c public/processed_syn_data.json > public/processed_syn_data.json.gz
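
The two gzip -c steps above only compress; they leave the original JSON in place and do not change its contents. For reference, here is a minimal Python equivalent using the standard gzip module, plus a check that a .json.gz file decompresses back to valid JSON. The helper names are hypothetical and not part of the repo.

```python
import gzip
import json
from pathlib import Path


def gzip_copy(src: Path, dst: Path) -> None:
    """Like 'gzip -c src > dst': write a compressed copy and keep the original."""
    dst.write_bytes(gzip.compress(src.read_bytes()))


def load_gzipped_json(path: Path):
    """Decompress a .json.gz file and parse it; raises if either step fails."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

For example, after gzip_copy(Path("public/processed_syn_data.json"), Path("public/processed_syn_data.json.gz")), calling load_gzipped_json on the .gz file is a quick sanity check before the file is uploaded.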

Export to bucket

At the moment all data is hosted on S3 for production, because Vercel has a file size limit. To update it:

  1. gzip the file (note that it is already gzipped in the repo).
  2. Remove the ".gz" extension so the name ends in .json, and rename the file to include the current date.
  3. Upload the file to the S3 bucket "htanfiles" (part of the Schultz AWS org).
  4. Set two metadata fields on the uploaded object: Content-Encoding=gzip and Content-Type=application/json.
  5. Once the file is uploaded, change the path in /lib/helpers.ts.

Or, steps 1-4 as commands:

MY_AWS_PROFILE=inodb
aws s3 cp processed_syn_data.json.gz s3://htanfiles/processed_syn_data_$(date "+%Y%m%d_%H%M").json --profile=${MY_AWS_PROFILE} --content-encoding gzip --content-type=application/json --acl public-read
aws s3 cp metadata_gzip s3://htanfiles/metadata --recursive --profile=${MY_AWS_PROFILE} --content-encoding gzip --content-type=text/csv --acl public-read
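
Two details of the upload are easy to get wrong when scripting it: the object is named .json but must really contain gzip bytes (otherwise browsers fail to decode a response labeled Content-Encoding: gzip), and the dated name must match the date "+%Y%m%d_%H%M" pattern. The minimal sketch below covers both; the helper names are hypothetical, and the upload_args keys follow boto3's ExtraArgs naming, assuming you upload with boto3 rather than the AWS CLI. boto3 itself is not imported here.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream (RFC 1952)


def is_gzipped(path: Path) -> bool:
    """True if the file starts with the gzip magic number.

    Content-Encoding=gzip is only correct if the uploaded bytes really are gzip.
    """
    with path.open("rb") as f:
        return f.read(2) == GZIP_MAGIC


def dated_key(now: Optional[datetime] = None) -> str:
    """Mirror processed_syn_data_$(date "+%Y%m%d_%H%M").json, using local time."""
    now = now or datetime.now()
    return f"processed_syn_data_{now.strftime('%Y%m%d_%H%M')}.json"


def upload_args() -> Dict[str, str]:
    """Object settings equivalent to the aws s3 cp flags above (boto3 ExtraArgs keys)."""
    return {
        "ContentEncoding": "gzip",
        "ContentType": "application/json",
        "ACL": "public-read",
    }
```

With boto3 installed, the upload could then look like boto3.Session(profile_name="inodb").client("s3").upload_file("processed_syn_data.json.gz", "htanfiles", dated_key(), ExtraArgs=upload_args()), after confirming is_gzipped(Path("processed_syn_data.json.gz")). Step 5 (updating the path in /lib/helpers.ts) still applies.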

Testing

There are currently no automated tests other than building the project, so be careful when merging to master.

Getting Started

First, run the development server:

npm run dev
# or
yarn dev

Open http://localhost:3000 with your browser to see the result.

You can start editing any page. The page auto-updates as you edit the file.

Debugging processSynapseJSON

Add a debugger; statement somewhere in the code. Then run:

node --openssl-legacy-provider ./node_modules/.bin/ncc build --source-map --no-source-map-register data/processSynapseJSON.ts

Followed by:

node --inspect-brk dist/index.js

Now you can attach a debugger to it, e.g. from VS Code.

Learn More about Next.js

To learn more about Next.js, take a look at the official Next.js documentation and the interactive Learn Next.js tutorial.

Deployment

The app is deployed on Vercel (formerly the ZEIT Now Platform), from the creators of Next.js.

Contributors

inodb, onursumer, alisman, milen-sage, ecerami, adamjtaylor, ethansiegl, adamabeshouse, dependabot[bot], linglp, leexgh, nelliney, clarisse-lau
