materialsproject / mpcontribs

Platform for materials scientists to contribute and disseminate their materials data through Materials Project

Home Page: https://mpcontribs.org

License: MIT License

Python 38.63% HTML 9.65% JavaScript 8.96% Jupyter Notebook 40.68% Dockerfile 0.73% Shell 0.29% SCSS 0.43% Jinja 0.47% Makefile 0.17%
python docker mongodb django flask dissemination aws-cloudformation fargate flasgger swagger

mpcontribs's People

Contributors

acrutt, ardunn, atndiaye, bafflerbach, codacy-badger, dependabot-preview[bot], dependabot[bot], dwinston, fraricci, h0lland, johuck, josuav1, knc6, mkhorton, raulf2012, rkingsbury, shyamd, smithmackensie96, tschaume, waffle-iron

mpcontribs's Issues

Docker Container

make a docker image/container to run mpcontribs website/services on NERSC spin

advanced book-keeping

This issue lists follow-up and more advanced book-keeping steps to tackle once issue #3 has been closed. For instance, a mandatory description field for each contribution, as well as means to retrieve and search for specific contributions, would be helpful.

  • use reduced_formula/alphabetic_formula as composition cid
  • builders: set figure & axes titles automatically from table name & columns
  • builders: re-enable plotly updates without overwriting
  • mpfile.concat: account for comments in appended MPFile
  • client-side transfer of cids between MPFiles
  • mgc delete use abbreviated/shortened cids
  • establish mandatory root-level description field and make searchable (mgc find)
  • mgc log/info à la git log --oneline with short description message
  • implement mgc get to return an auto-generated MPFile with embedded cid
  • mgc delete: also delete plotly graphs from user account
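The `mgc log/info à la git log --oneline` item above could work roughly as follows — a minimal sketch, assuming a hypothetical list-of-dicts contribution format (names are illustrative, not the actual mgc implementation):

```python
# Hypothetical sketch of an `mgc log --oneline`-style listing: one line per
# contribution, with an abbreviated cid followed by its short description,
# similar to `git log --oneline`.

def oneline(contributions, short_len=7):
    """Return one summary line per contribution: short cid + description."""
    lines = []
    for contrib in contributions:
        short_cid = contrib["cid"][:short_len]  # abbreviated cid, git-style
        desc = contrib.get("description", "(no description)")
        lines.append(f"{short_cid} {desc}")
    return lines

contribs = [
    {"cid": "5a1b2c3d4e5f60718293a4b5", "description": "XAS spectra for Fe2O3"},
    {"cid": "5a1b2c3d4e5f60718293a4b6", "description": "band gaps, complex oxides"},
]
for line in oneline(contribs):
    print(line)
```

The same abbreviated cids would then be accepted by `mgc delete`, mirroring git's short-hash convention.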

cloudformation: use template for Atlas MongoDB deployment

The Atlas MongoDB deployment and its peering connection from the AWS VPC is currently set up manually every time a new pipeline and thus a new collection of stacks is required. This can happen, for instance, if development stacks are needed, a new region is explored, or new infrastructure components are being tested.

The solution is to use MongoDB's Atlas Resource Providers in the CloudFormation templates directly. This would also set up peering connections automatically.

This change will likely result in a brand new Atlas MongoDB deployment which will implicitly use the new Private Connection Strings. The environment variable for the API containers will have to be updated accordingly.

MPFileViewer: "On-the-Fly" Analyses

Establish a graphs-only endpoint to enable "on-the-fly" analyses during post-submission processing, i.e. producing non-persistent Plotly graphs without any MP or Plotly interaction.

List of Projects



  • Tess Ferroelectrics
  • NREL HTEM (P. Graf, V. Stevanovic)
  • Gas Phase Core Excitation Database (Z. Hussain)
  • Joey Google Sheet?
  • NSLS-II Databroker
  • martin_lab
  • Dibbs
  • MP Workshop 2017
  • QMCDB
  • PyCDT Defect Paderborn (Wiebeler)
  • BURP BS accuracy benchmark (@dwinston)
  • Walter Lambrecht CASE DMREF
  • Maria Chan, Cynthia Lo (ANL/Washington); some diffusion data
  • Philip Ross, Jinghua Guo (LBNL/ALS)
    XES/XAS spectra, band gaps, complex oxides, Mott insulators, 50-60 materials
  • Ramamurthy Ramprasad, Huan/Arun (Connecticut/MURI)
    Description: basic properties of ca. 500 polymers. Structure predictions for
    new polymers. Part of Materials Explorer. Search by polymer group, not element!
  • Sefaattin Tongay (Arizona State)
    Description: experimental (Photoluminescence, Absorption spectroscopy,
    Scanning Tunneling spectroscopy, Raman spectroscopy, XRD) and theoretical
    (Band structure on 2D materials calculated by PBE/HSE06/QMC, Phonon
    dispersion) input on wide range of 2D and 1D material systems. EFRI 2DARE
    proposal: integrate our results on 2D materials with MP. Openly share our
    findings from 5 teams from M.I.T (Prof. Grossman at Materials), Northwestern
    (Prof. Aydin at ECE), and ASU (Prof. Tongay, Wang, and Jiang). Our findings
    will be very compatible with the data currently presented on MP.
  • Mark Jarrell (LSU); CORRELPACK proposal
  • Andriy Zakutayev (NREL); catalog experimental data on thin film samples
  • Atsushi Togo (Kyoto university, Japan); phonon calculations based on MP crystal structures
  • Steve Kevan (ALS/LBNL); experimental combi XAS spectra
  • Warren Pickett (Davis); semiconductors, user ticket #141
  • W. Butler (MINT/Alabama); contribution of data generated by the group
  • Nicola Marzari (EPFL); AiiDA
  • Yuanjun Zhou, Karin Rabe (Rutgers); input our first-principles results into the database
  • NMGC center (Minnesota); data driving nano-porous app

database migration

All storage pertaining to user data contributions needs to be separated into its own database to avoid any interaction with MP's core datasets. In addition to minor improvements, this transition requires new configuration files and updated adapters for the front-end.

ToDo's:

  • switch tree/table/plots fields in contribution_data to reflect front-end
  • make --reset work with mpcontribs database
  • use normal MongoDB object_id as contribution ID, last 6 hex characters for humans
  • move contribution_data field to materials/compositions collection
  • in materials/composition collection, use mp_cat_id as object_id
  • make MPContributionsBuilder work again
  • separate materials and composition collections
  • refactor python -m mpcontribs into subcommands fake/submit/reset, retire insert option
  • migrate to mpcontribs database in materialsdjango's utils.connector
  • make mgc subcommands info and submit [new file] work
  • adjust adapter and rest interface to use single contribution mode
  • refactor CMA/MCB to log output of submit and build process to user console
  • make sure that mgc delete/viewer subcommands are still functional
  • adjust material_contributions view in materials_django to query correct db and propagate correct data to template
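The "last 6 hex characters for humans" item above can be sketched as follows (plain strings stand in for `bson.ObjectId` to keep the example dependency-free):

```python
# Sketch of deriving a short human-readable ID from a MongoDB ObjectId,
# as proposed in the ToDo list: keep the full ObjectId as the contribution
# ID internally, and expose only its last 6 hex characters to humans.

def human_id(object_id_hex):
    """Return the last 6 hex characters of a 24-char ObjectId hex string."""
    if len(object_id_hex) != 24:
        raise ValueError("expected 24-character ObjectId hex string")
    return object_id_hex[-6:]

print(human_id("642d3e8f1a2b3c4d5e6f7a8b"))  # -> 6f7a8b
```

Since ObjectIds embed a timestamp in their leading bytes, the trailing characters vary the most, which makes the suffix a reasonable short handle (collisions would still need a uniqueness check against the collection).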

Error reporting and validation of MPFile

This issue keeps track of errors which should be caught by the parser to warn the user of potential problems with the MPFile. The error messages should be verbose enough to guide the user through error resolution.

  • root-level section header always and exclusively contains materials identifier
  • the line for the table name can be omitted if no other level-1 sections are included
  • make sure to include a space after the >>> and : separators
  • try not to include additional colons in the values of key-value sections
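The separator checks above could be implemented along these lines — an illustrative sketch, not the actual MPContribs parser:

```python
# Scan an MPFile line by line and report problems with line numbers, e.g. a
# missing space after the `>>>` or `:` separators, or stray colons in values.
# The exact rules and messages here are assumptions for illustration.

def validate_mpfile(text):
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(">>>") and not stripped.startswith(">>> "):
            errors.append(f"line {lineno}: missing space after '>>>' separator")
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            if value and not value.startswith(" "):
                errors.append(f"line {lineno}: missing space after ':' separator")
            if ":" in value:
                errors.append(f"line {lineno}: extra colon in value of key '{key}'")
    return errors

for err in validate_mpfile(">>>mp-1\nkey:value"):
    print(err)
```

Messages like these would satisfy the requirement that errors be verbose enough to guide the user through resolution.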

pip error about jsonschema version

pip complains:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mpcontribs-client 4.2.9 requires jsonschema<4.0, but you have jsonschema 4.4.0 which is incompatible.

Is the pinning actually necessary?

"jsonschema<4.0",

Standalone MPContribs

  • materialsproject/mp-jupyter-docker#1
  • decouple MPContribs website from materials_django (run barebone webtzite as in JupyterHub)
  • rename alpha -> contribs.materialsproject.org
  • use CAS SSO to accept users coming from MP w/o login
    • also for JupyterHub login, set PMP_MAPI_KEY
  • MPContribs button in JupyterHub: dropdown w/ direct links to ingester and portals

references/credits

Along with credits to projects and collaborators (see #1), it is important to prominently display references on the contribution pages. Thus they become special entries in the root-level key-value section(s) of MPFiles ("first-class citizens"). The reference format should minimally support URLs (incl. access date), bibtex strings, and DOIs.

ToDos to support this feature:

  • define key-value formats of references
  • catch unallowed formats for references during import
  • implement reference post-processing during builder stage
  • new django ticket: render references on contribution page
  • demo video (possibly merge with #1)
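Catching unallowed reference formats during import could look like the following sketch, assuming the three formats named above (URL, bibtex string, DOI); the patterns and names are illustrative, not the actual builder code:

```python
# Classify a reference value as DOI, URL, or bibtex, and reject anything
# else during import. The regexes here are simplified assumptions.
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
URL_RE = re.compile(r"^https?://\S+$")

def classify_reference(value):
    """Return 'doi', 'url', or 'bibtex' -- or raise on unallowed formats."""
    value = value.strip()
    if DOI_RE.match(value):
        return "doi"
    if URL_RE.match(value):
        return "url"
    if value.startswith("@") and "{" in value:
        return "bibtex"
    raise ValueError(f"unallowed reference format: {value!r}")

print(classify_reference("10.1038/sdata.2016.18"))  # -> doi
```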

Dilute Solute Diffusion

  • pre_submission: bring up to date and clean up (reorganize data submission?)
  • clean up and revisit template code
  • check contribution cards
  • make screenshots to include in portal

Dataset doesn't seem to download (shows 0%, then progress percentage goes away)

https://contribs.materialsproject.org/projects/carrier_transport

Tried CSV and JSON, with and without tables. Maybe it has to do with the dataset being fairly large? Tried on Google Chrome on Windows 10. A colleague of mine (@AndrewFalkowski) ran into the same issue. Incidentally, we can download the data via https://datadryad.org/stash/dataset/doi:10.5061/dryad.gn001 (using the "References" link in the top-left), so this is not blocking our workflow, but it seemed worth mentioning.

local install for combining mpcontribs and MAPI queries

See this forum post on filtering for materials by electrical conductivity. I would like to post a snippet where someone who has pip-installed pymatgen and mpcontribs, and who has set their API key using the pmg CLI, can use the boltztrap rester to get mp-ids for materials with a given electrical conductivity and feed those into MPRester to get additional core data. I currently cannot do this because the documented use of mpcontribs is via MP's hosted JupyterHub only, and furthermore a local install would seemingly require some work because mpcontribs is currently Python 2 only.
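The local half of the intended workflow can be sketched as follows, using a hypothetical data shape for boltztrap contributions (the actual rester's field names may differ): filter contributions by conductivity locally, then hand the mp-ids to MPRester.

```python
# Filter contributions by electrical conductivity and collect mp-ids to
# feed into pymatgen's MPRester afterwards. The dict keys below are
# illustrative assumptions about the contribution payload.

def mpids_above_conductivity(contributions, threshold):
    """Return mp-ids whose conductivity (S/m) exceeds `threshold`."""
    return [
        c["mp_id"]
        for c in contributions
        if c.get("conductivity", 0.0) > threshold
    ]

fake_contribs = [
    {"mp_id": "mp-149", "conductivity": 1.2e5},
    {"mp_id": "mp-13", "conductivity": 3.4e2},
]
ids = mpids_above_conductivity(fake_contribs, 1e4)
print(ids)  # -> ['mp-149']
# ids could then be passed to an MPRester query for additional core data.
```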

Rudimentary MPFile Viewer

Develop a basic Flask web app for the user to "simulate" the contribution of an MPFile on a local machine. The app should load an MPFile and show the resulting contributions in an interactive way similar to the MP front-end. For reasons of simplicity, the internal parser and builder phases should happen in a purely dynamic manner (in memory) without the need to set up or interact with a database.

Such an app would allow interested contributors (e.g. @ATNDiaye) to try out the framework before its (official) deployment, or at a later stage, before going through the official submission process.

ToDo's:

  • setup basic flask web app
  • make ContributionsMongoAdapter compatible with db-less operation
  • make MPContributionsBuilder compatible with db-less operation (tree/table-only)
  • setup basic template for rendering contributions
  • render tree_data using JSONTree
  • render tables using DataTables
  • enable plots in MPContributionsBuilder for db-less mode
  • render plots via Plotly URL
  • include header to jump to specific contribution
  • connect UI with mgc via subcommand (open/reload browser window)
  • load input file from command line and auto-determine contributor
  • include buttons to choose and view input file
  • fix level counting bug in io.recparse
  • fix missing data_ prefix in plots.xxx.table
  • test the workflow with a few input files
  • make MPContribs installable from GitHub as a package, see #9
  • MPFile.from_file: also check for unicode not only str
  • close issue: write instructions to install and run mgc viewer (tag users)
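The "db-less" builder idea above can be sketched as turning flat dot-separated key paths (as produced by parsing an MPFile's key-value sections) into a nested `tree_data` dict entirely in memory — illustrative only, not the actual MPContributionsBuilder code:

```python
# Build a nested tree from flat dot-separated key paths, without any
# database: {'a.b': 1, 'a.c': 2} -> {'a': {'b': 1, 'c': 2}}.

def build_tree(flat):
    tree = {}
    for path, value in flat.items():
        node = tree
        *parents, leaf = path.split(".")
        for key in parents:
            # descend, creating intermediate dicts as needed
            node = node.setdefault(key, {})
        node[leaf] = value
    return tree

print(build_tree({"data.band_gap": 1.1, "data.formula": "Si", "meta.source": "VASP"}))
```

A structure like this could be rendered directly with JSONTree in the viewer, with no adapter or MongoDB in the loop.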

book-keeping

Upon submission of an MPFile, its contents are linked to a unique contribution ID (cid). Updates, overwrites, and deletions of contributions subsequently require knowledge of the affected cids with respect to the MPFile being submitted. To facilitate the book-keeping, there has to be a mechanism in place that allows users to keep track of the respective cids.

ToDos to support this feature:

  • implement mode for MPFile.from_file() which keeps comments (02f1588)
  • include comments in output of MPFile.get_string() (6a2db77, 17d9e9c)
  • re-generate global general section from single contributions (make_general_section 5108c63)
  • prepend not append general section in make_general_section (349dbbb)
  • insert contribution id into MPFile and test w/ viewer (e7fb477)
  • mgc submit: implement dry run mode to test submission vs scratch-type collections (0519db7)
  • implement test mode for submissions to reuse IDs and reduce Plotly spam (143bb82)
  • make sure all strings are processed and submitted in utf-8 encoding (6d96d77)
  • mgc submit: overwrite/update local MPFile by embedding cid
  • mgc submit: check for embedded cid and trigger correct action (add/update)
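The last two ToDos could work roughly as follows — a sketch under the assumption that the cid is embedded as a comment line (the comment convention and names are illustrative, not the shipped mgc code):

```python
# Embed the cid as a comment line in the MPFile on first submission, and on
# resubmission check for that line to decide between "add" and "update".

CID_PREFIX = "#cid:"  # assumed comment convention for this sketch

def extract_cid(mpfile_text):
    """Return the embedded cid, or None (None -> 'add' instead of 'update')."""
    for line in mpfile_text.splitlines():
        if line.startswith(CID_PREFIX):
            return line[len(CID_PREFIX):].strip()
    return None

def embed_cid(mpfile_text, cid):
    """Prepend a cid comment unless one is already present."""
    if extract_cid(mpfile_text) is not None:
        return mpfile_text
    return f"{CID_PREFIX} {cid}\n{mpfile_text}"

text = embed_cid(">>> mp-1234\nkey: value\n", "abc123")
print(extract_cid(text))  # -> abc123
```

This relies on the comment-preserving `MPFile.from_file()`/`get_string()` modes listed above, so the embedded line survives a round-trip.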

Dependabot couldn't find a Dockerfile for this project

Dependabot couldn't find a Dockerfile for this project.

Dependabot requires a Dockerfile to evaluate your project's current Docker dependencies. It had expected to find one at the path: /mpcontribs-sidecar/chrome/Dockerfile.

If this isn't a Docker project, or if it is a library, you may wish to disable updates for it in the .dependabot/config.yml file in this repo.

View the update logs.

Error reporting in MPFile parser

It would be very helpful to have more meaningful error reporting when an MPFile cannot be parsed correctly. As a start, the number of the line in the MPFile that triggers the error would be helpful.

I get:

File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1052, in __init__
   self._reader = _parser.TextReader(src, **kwds)
 File "pandas/parser.pyx", line 508, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4775)
ValueError: No columns to parse from file

P.S.: I just found the preview button :)
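One way to get line numbers into errors like the `ValueError: No columns to parse from file` above is to track the starting line of each table section while parsing — a sketch of the requested behavior, not the actual parser:

```python
# Parse whitespace-separated table rows, raising errors that point at the
# offending MPFile line instead of an opaque pandas traceback.

def parse_table(lines, start_lineno=1):
    """Parse rows; raise ValueError with an MPFile line number on failure."""
    rows = []
    for offset, line in enumerate(lines):
        lineno = start_lineno + offset
        if not line.strip():
            continue  # skip blank lines inside the table section
        cols = line.split()
        if rows and len(cols) != len(rows[0]):
            raise ValueError(
                f"MPFile line {lineno}: expected {len(rows[0])} columns, got {len(cols)}"
            )
        rows.append(cols)
    if not rows:
        raise ValueError(f"MPFile line {start_lineno}: no columns to parse")
    return rows
```

Wrapping the pandas call in a try/except and re-raising with the section's starting line number would achieve the same effect for the existing parser.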

MPFileViewer: Offline Switch

Allow the MPFile viewer to be run completely in offline mode, i.e. no interactive graphing via Plotly nor loading of javascript libraries from CDNs. (@ATNDiaye)

Dependabot couldn't find a Dockerfile for this project

Dependabot couldn't find a Dockerfile for this project.

Dependabot requires a Dockerfile to evaluate your project's current Docker dependencies. It had expected to find one at the path: /mpcontribs-sidecar/kernel_gateway/Dockerfile.

If this isn't a Docker project, or if it is a library, you may wish to disable updates for it in the .dependabot/config.yml file in this repo.

View the update logs.

Contribution Detail Pages

  • fix toggles for contribution components
  • numbers are not sorted correctly in table [strings -> float]
  • add modals as placeholder for:
    • add link to project's landing page
    • include button to start notebook in JupyterHub
  • bugfix: "notebook/js/outputarea.js"
  • grouped columns don’t quite work in NB but work in landing page
    -> backgrid-patch loading [only work on restarting notebook after output saved]

Unit Tests

Already integrated with Travis CI but only running one simple test for MPFile.get_string() (see test_mpfile.py)

  • figure out why Travis CI currently failing
  • write lots more tests!! :)

Contribution Cards

  • bugfix: et al. tooltip not loading as expected. cafe804
  • link to or show identifier in panel-heading?
  • make cards scrollable to fix grid row height
  • explorer: replace "projects" with "title" query d491f4e
  • check all queries of generic explorer to respond with cards
  • update materials_django to work with new /card/cid endpoint @dwinston

ALS Beamline v2

  • @ATNDiaye is working on larger contribution
  • smooth area in ternary instead of points (see post_submission.py)
  • link composition in table to the corresponding MP query link in MP Explorer

portal: S3 versioned downloads of projects/datasets

Similar to FigShare's ndownloader URLs, the MPContribsML deployment already supports downloading full matbench_* datasets through AWS S3 compressed as .json.gz files using a top-level URL, e.g. https://ml.materialsproject.cloud/matbench_expt_gap.json.gz. This functionality is being used in hackingmaterials/matminer#446 to load datasets for mining.

Powered by the /contributions/download/ endpoint in the MPContribs API, downloads through the corresponding button on the MPContribs landing pages are currently generated dynamically/server-side and respect the filters and sorting (query parameters) applied in the UI. Both csv and json formats are supported, each with or without full structure objects (full vs minimal). Right now, the only supported MIME type is gz, but other types could be valuable to implement in the future (e.g. bzip, jpeg, png, vnd.plotly.v1+json, x-ipynb+json, ods, pdf, tar, zip, excel, xml).

To reduce duplicate and potentially heavy DB queries, the download endpoint could encode the query in the filename and save it to S3 upon first request or after update of the underlying data. This would implicitly maintain versioned snapshots of the datasets as API POST/PUT requests would add a timestamp to the old file and the next GET request would generate a new file. The S3 bucket storing the exported project data would have a sub-folder for each MPContribs deployment (Main, ML, LightSources, ...).
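Encoding the query in the filename could look like the following sketch (hashing is an assumption here; any stable encoding of the sorted query parameters would work), so that repeated identical download requests map to the same cached S3 object:

```python
# Deterministic S3 key for a project export: the same query always yields
# the same filename, regardless of parameter order.
import hashlib
import json

def export_key(deployment, project, query, fmt="json.gz"):
    canonical = json.dumps(query, sort_keys=True)  # order-independent encoding
    digest = hashlib.sha1(canonical.encode()).hexdigest()[:12]
    return f"{deployment}/{project}_{digest}.{fmt}"

k1 = export_key("ML", "matbench_expt_gap", {"data__gap__gte": 1.0, "_sort": "gap"})
k2 = export_key("ML", "matbench_expt_gap", {"_sort": "gap", "data__gap__gte": 1.0})
print(k1 == k2)  # -> True: key order does not change the filename
```

A POST/PUT to the project would then rename the existing object with a timestamp, and the next GET for the timestamp-less key would regenerate it, yielding the versioned snapshots described above.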

A progress bar is needed while the first export file for a project and query is generated on S3. It would use server-side events and a Redis cache as already implemented for the dynamic (re-)generation of Jupyter notebooks which power MPContribs Contribution Details Pages.

If a file for the specific project and query without a timestamp exists, the API's /contributions/download/ endpoint would simply return a 302 Redirect to S3 - thus relegating download traffic to S3. Alternatively, the API could use the boto3 client to retrieve the file from S3 and load it into memory, and then return it as a response to the request. However, this would cause unnecessary implementation, maintenance, and monitoring efforts as well as strain on the API Fargate tasks.
EDIT 06/19/2020: I chose to always go through the API Fargate task and keep the S3 bucket private (next paragraph outdated)

The consequence of a simple redirect is that authentication/authorization can be enforced on generating the file export (saving to S3) on the first request, but not on subsequent download requests from the public S3 bucket. The MPContribs URLs for the portal and the API could technically still use authentication/authorization for retrieval of the data exports, but the URL to the S3 object would need to be public anyway. S3 storage of export files would thus only be enabled for public projects, which could be an additional inducement for contributors to make their data available to the public.

Saving files from the API Fargate task to S3 does not incur extra data traffic or processing costs since the S3 Gateway Endpoint is free (as opposed to a NAT Gateway) and the S3 bucket is in the same AWS region. However, there will be costs related to traffic caused by downloads of the (compressed and predominantly small) S3 objects and its bare storage. The latter can be optimized by setting up lifecycle policies which automatically move objects into other storage levels depending on their monthly access frequency. For instance, old timestamped snapshots would likely move into cheaper Glacier storage since they'll only be needed/downloaded occasionally.

Martin Lab

  • data upload via Google/Dropbox URL or NERSC ssh-fs?
  • ingestion: reduce the data accuracy (bit size) or downsample the image on input? prefer to reduce the bit size if possible.
  • design simple landing page based on Josh’s notebook
  • static images in export_notebook due to size restrictions / responsiveness
  • @jagar2 is working on ipywidgets form to interactively prepare mpfile_init.txt

projects

use MPContribsUsers modules as projects:

  • set project in mpcontribs collections
  • link contribution card title to project landing page
  • replace institutions query in Explorer w/ projects

Installation problems with missing 'docker' dir in git repo

Hi,

I seem to be having some trouble getting set up following the 'INSTALL.md' instructions and was hoping you might be able to point me in the right direction.

I first tried following instructions from the the INSTALL.md file in this git repo, but it looks like a directory needed for the installation (called 'docker') is missing from the repo.

I next tried using notebooks within the MPContribs JupyterHub portal. The notebooks didn't seem to work there right away, so I tried following the instructions in the INSTALL.md from a terminal in the JupyterHub portal. On this attempt, I got further since the 'docker' directory was present, but only as far as 'git checkout -b flaskproxy origin/flaskproxy', at which point I think I was meant to access files from docker/dockerspawner in the remote repository, which were no longer there.

Apologies if I am just missing something trivial!

Delete Contributions

MPContribsRester.delete_contributions() doesn’t clean up derived collections!

CLI mgc: collaborators

A contribution is not assigned to a particular user with a given e-mail address but rather to a project/institution encompassing a list of collaborators. Everyone on the list is allowed to edit/update an existing contribution as well as add/remove collaborators. A sub-command of the mgc command-line program enables full control over the collaborators. The first collaborator in the list is considered the primary contact.

This functionality is a prerequisite for the important support of an incubation period during which only collaborators can access a contribution before publication.

ToDos to support this feature:

  • employ user institution as project:
    1722be7, materialsproject/materials_django@0808a76acef1b5b7d333daca607a6b38518f1050
  • implement list of collaborators for each contribution and restrict permissions:
    4b93f72, materialsproject/materials_django@75bb620e6f733159e83ce48dea53d0d15cee882d
  • mgc collab sub-command to control collaborators and primary contact (POC):
    1375b78
  • implement mgc collab functionality in rest.views.update_collaborators:
    • process collaborators shortcuts into "authors":
      materialsproject/materials_django@e76de6e07bf15525e1536d174e12df6bb7628f9e, materialsproject/materials_django@84a0e4ce38a43f4d2d619d7a8acac79281e516ad
    • update collaborators in contributions collection based on mode
    • build update into materials collection
  • perform extended testing with different accounts (provides example for django issue)
  • new django issue: add list of contributors on contribution page (1st email = POC)
  • demo video for collaborators functionality

Perovskites Diffusion

  • pre_submission: bring up to date and clean up (reorganize data submission?)
  • clean up and revisit template code
  • check contribution cards
  • make screenshots to include in portal
  • use Backgrid.ColumnManager to manage display of columns
    (current landing page implementation can then be discarded)

trouble installing

I tried installing into a fresh virtualenv but ran into the following:

(mgc)λ> python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to mpcontribs.egg-info/requires.txt
writing mpcontribs.egg-info/PKG-INFO
writing top-level names to mpcontribs.egg-info/top_level.txt
writing dependency_links to mpcontribs.egg-info/dependency_links.txt
reading manifest file 'mpcontribs.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'README.asc'
writing manifest file 'mpcontribs.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.10-x86_64/egg
running install_lib
running build_py
error: Error: setup script specifies an absolute path:

    /Users/dwinston/Dropbox/materialsproject/MPContribs/scripts/mgc

setup() arguments must *always* be /-separated paths relative to the
setup.py directory, *never* absolute paths.

MPFileViewer: fold in Data Tables

If it is not a huge deal to implement, it would be great if the data tables could be collapsed just like the elements of the attribute tree.

MGC viewer: monty dependency check

A dependency check throws an error, although I have an updated version of monty (I think).

$ mgc 
Traceback (most recent call last):
  File "/usr/local/bin/mgc", line 4, in <module>
    __import__('pkg_resources').require('mpcontribs==0.0')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3020, in <module>
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 616, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 629, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 807, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: monty==0.6.4
In [1]: import monty
In [2]: monty.__version__
Out[2]: '0.6.5'

client: support regex-based query parameters

The MPContribs API supports dynamic query parameters based on regular expressions for the hierarchical data in /contributions/. For instance, any query parameter matching ^data__((?!__).)*$__gte will be accepted to query a (potentially nested) data subfield via the $gte query operator. The sort and filter functionalities on the MPContribs landing pages for a project rely on this feature of the API.

However, validation in bravado as part of the python client library (mpcontribs-client) fails to recognize query parameters when they match a regex definition in the swagger spec. A solution could be the following:

  1. Use the dynamic columns field in the /projects/ endpoint (_fields=columns) to obtain a list of possible data subfields.
  2. Rewrite the Swagger spec retrieved from the API server dynamically by expanding all regex-based query parameters based on the list of columns in 1.
  3. Reinitialize the client and return it to the user.
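Step 2 can be sketched as follows — the operator list and naming convention are assumptions for illustration, not the exact set supported by the API:

```python
# Expand regex-based query parameters into concrete parameter names using
# the columns list obtained from /projects/?_fields=columns (step 1).

OPERATORS = ("gte", "lte", "gt", "lt", "contains", "in")

def expand_params(columns):
    """['band_gap'] -> ['data__band_gap__gte', 'data__band_gap__lte', ...]"""
    params = []
    for col in columns:
        field = col.replace(".", "__")  # nested subfields use '__' in params
        for op in OPERATORS:
            params.append(f"data__{field}__{op}")
    return params

params = expand_params(["band_gap"])
print(params[:2])  # -> ['data__band_gap__gte', 'data__band_gap__lte']
```

The expanded names would then be patched into the Swagger spec's parameter list before bravado validation, so the client accepts them as ordinary query parameters.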
