materialsproject / mpcontribs
Platform for materials scientists to contribute and disseminate their materials data through Materials Project
Home Page: https://mpcontribs.org
License: MIT License
Increase the size of embedded plotly graphs to improve readability and manageability. @ATNDiaye
make a docker image/container to run mpcontribs website/services on NERSC spin
This issue lists follow-up and more advanced book-keeping steps once issue #3 has been closed. For instance, a mandatory description field for each contribution, as well as means to retrieve and search for specific contributions, would be helpful.
- reduced_formula/alphabetic_formula as composition cid
- builders: set figure & axes titles automatically from table name & columns
- builders: re-enable plotly updates without overwriting
- mpfile.concat: account for comments in appended MPFile
- cids between MPFiles
- mgc delete: use abbreviated/shortened cids
- mgc find (…)
- mgc log/info à la git log --oneline with short description message
- mgc get to return an auto-generated MPFile with embedded cid
- mgc delete: also delete plotly graphs from user account

The Atlas MongoDB deployment and its peering connection from the AWS VPC are currently set up manually every time a new pipeline, and thus a new collection of stacks, is required. This can happen, for instance, if development stacks are needed, a new region is explored, or new infrastructure components are being tested.
The solution is to use MongoDB's Atlas Resource Providers directly in the CloudFormation templates. This would also set up peering connections automatically.
This change will likely result in a brand new Atlas MongoDB deployment, which will implicitly use the new Private Connection Strings. The environment variable for the API containers will have to be updated accordingly.
Establish a graphs-only endpoint to enable "on-the-fly" analyses during post-submission processing, i.e. producing non-persistent Plotly graphs without any MP or Plotly interaction.
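An on-the-fly graphs response could look like the following minimal sketch, which builds a Plotly-compatible figure dictionary from a submitted table entirely in memory, without persisting anything. The function name and table layout are illustrative assumptions, not the actual endpoint.

```python
# Hypothetical sketch of a "graphs-only" response: build a Plotly-compatible
# figure dict from a table in memory, without persisting anything or talking
# to MP/Plotly. Names and structure are illustrative only.

def table_to_figure(name, columns, rows):
    """Convert a simple table (first column = x, rest = y series)
    into a Plotly-style figure dictionary."""
    x = [row[0] for row in rows]
    traces = [
        {
            "x": x,
            "y": [row[i] for row in rows],
            "name": col,
            "mode": "lines+markers",
        }
        for i, col in enumerate(columns)
        if i > 0
    ]
    return {
        "data": traces,
        "layout": {"title": name, "xaxis": {"title": columns[0]}},
    }

fig = table_to_figure(
    "bandgap vs. temperature",
    ["T [K]", "Eg [eV]"],
    [[100, 1.20], [200, 1.17], [300, 1.12]],
)
```

Since the figure is a plain dictionary, it can be serialized straight to JSON and handed to the front-end without any server-side state.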
The node list of the MPFile is ordered alphabetically. It is important, especially during processing, that the order from the file is retained.
All storage pertaining to user data contributions needs to be separated into its own database to avoid any interaction with MP's core datasets. In addition to minor improvements, this transition requires new configuration files and updated adapters for the front-end.
ToDo's:
- tree/table/plots fields in contribution_data to reflect front-end
- --reset work with mpcontribs database
- object_id as contribution ID, last 6 hex characters for humans
- contribution_data field to materials/compositions collection
- materials/composition collection, use mp_cat_id as object_id
- MPContributionsBuilder work again
- materials and composition collections
- python -m mpcontribs into subcommands fake/submit/reset, retire insert option
- mpcontribs database in materials_django's utils.connector
- mgc subcommands info and submit [new file] work
- mgc delete/viewer subcommands are still functional
- material_contributions view in materials_django to query correct db and propagate correct data to template

This issue keeps track of errors which should be caught by the parser to warn the user of potential problems with the MPFile. The error messages should be verbose enough to guide the user through error resolution.
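Such checks might be sketched as follows. The ">>>" section-marker and "key: value" conventions assumed below are simplifications for illustration, not the actual MPContribs parser rules.

```python
# Illustrative sketch only: report line numbers for suspicious MPFile lines,
# assuming ">>>" marks section headers and other lines use "key: value".
# This is not the actual MPContribs parser.

def check_mpfile(text):
    """Return (line_number, message) tuples for potential problems."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        if stripped.startswith(">>>"):
            if len(stripped) == 3:
                errors.append((lineno, "section marker without a name"))
        elif ":" not in stripped:
            errors.append((lineno, "expected 'key: value' separator"))
    return errors

sample = ">>> GENERAL\nauthor: Jane Doe\nno separator here\n>>>"
```

Returning line numbers alongside messages is what makes the reports actionable for users.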
- >>> and : separators

pip complains:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mpcontribs-client 4.2.9 requires jsonschema<4.0, but you have jsonschema 4.4.0 which is incompatible.
Is the pinning actually necessary?
MPContribs/mpcontribs-client/setup.py
Line 32 in b7b2e21
- materials_django (run barebone webtzite as in JupyterHub)
- PMP_MAPI_KEY
Along with credits to projects and collaborators (see #1), it is important to prominently display references on the contribution pages. Thus they become special entries in the root-level key-value section(s) of MPFiles
("first-class citizens"). The reference format should minimally support URLs (incl. access date), bibtex strings, and DOIs.
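As a rough illustration, minimal support for the three reference formats could be sketched like this. The heuristics below are assumptions for the sketch, not the project's actual validation rules.

```python
import re

# Rough heuristic sketch for classifying reference strings into the three
# formats the issue mentions (DOI, bibtex, URL). The real validation rules
# would live in the MPFile parser and are not defined here.

def classify_reference(ref):
    """Return 'doi', 'bibtex', 'url', or 'unknown' for a reference string."""
    ref = ref.strip()
    if re.match(r"^10\.\d{4,9}/\S+$", ref):
        return "doi"  # bare DOI, e.g. 10.xxxx/suffix
    if ref.startswith("@") and "{" in ref:
        return "bibtex"  # e.g. "@article{key, ...}"
    if ref.startswith(("http://", "https://")):
        return "url"
    return "unknown"
```

A real implementation would additionally capture the access date for URLs, as the issue requires.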
ToDos to support this feature:
Implement the MPContribs button in JupyterHub as a dropdown with direct links to the ingester and local/production portals
https://contribs.materialsproject.org/projects/carrier_transport
Tried CSV, JSON, with and without tables. Maybe it has to do with the dataset being fairly large? Tried on Google Chrome on Windows 10. A colleague of mine (@AndrewFalkowski) ran into the same issue. Incidentally, we can download the data via https://datadryad.org/stash/dataset/doi:10.5061/dryad.gn001 (using the "References" link in the top-left), so it's not blocking our workflow, but I figured it was worth mentioning.
See this forum post on filtering for materials by electrical conductivity. I would like to post a snippet where someone who has pip-installed pymatgen and mpcontribs, and who has set their API key using the pmg CLI, can use the boltztrap rester to get mp-ids for materials with a given electrical conductivity and feed those into MPRester to get additional core data. I currently cannot do this because documented use of mpcontribs is via MP's hosted JupyterHub only, and furthermore a local install would seem to require some work because mpcontribs is currently Python 2 only.
Develop a basic Flask web app for the user to "simulate" the contribution of an MPFile
on a local machine. The app should load an MPFile
and show the resulting contributions in an interactive way similar to the MP front-end. For reasons of simplicity, the internal parser and builder phases should happen in a purely dynamic manner (in memory) without the need to set up or interact with a database.
Such an app would allow interested contributors (e.g. @ATNDiaye) to try out the framework before its (official) deployment, or at a later stage, before going through the official submission process.
ToDo's:
- ContributionsMongoAdapter compatible with db-less operation
- MPContributionsBuilder compatible with db-less operation (tree/table-only)
- tree_data using JSONTree
- tables using DataTables
- MPContributionsBuilder for db-less mode
- plots via Plotly URL
- mgc via subcommand (open/reload browser window)
- io.recparse: data_ prefix in plots.xxx.table
- MPContribs installable from GitHub as a package, see #9
- MPFile.from_file: also check for unicode not only str
- mgc viewer (tag users)

A refresh button which reloads the file and generates new plots would be nice.
Upon submission of an MPFile, its contents are linked to a unique contribution ID (cid). Updates, overwrites, and deletions of contributions subsequently require knowledge of the cids affected by the MPFile being submitted. To facilitate the book-keeping, there has to be a mechanism in place that allows users to keep track of the respective cids.
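One possible book-keeping mechanism is sketched below: embed the assigned cid as a comment at the top of the MPFile after submission, and check for it before the next submit to decide between "add" and "update". The "#cid:" comment format is an assumption for this sketch, not the actual MPFile convention.

```python
# Sketch of one possible book-keeping mechanism. The "#cid:" comment format
# is an assumption, not the actual MPFile convention.

CID_PREFIX = "#cid:"

def embed_cid(mpfile_text, cid):
    """Prepend the contribution ID, replacing any previously embedded one."""
    lines = [l for l in mpfile_text.splitlines() if not l.startswith(CID_PREFIX)]
    return "\n".join(["{} {}".format(CID_PREFIX, cid)] + lines)

def extract_cid(mpfile_text):
    """Return the embedded cid, or None if the file was never submitted."""
    for line in mpfile_text.splitlines():
        if line.startswith(CID_PREFIX):
            return line[len(CID_PREFIX):].strip()
    return None

text = embed_cid(">>> GENERAL\nauthor: Jane Doe", "5733704637202d12f448fc59")
```

A submit command could then branch on extract_cid returning None (add) or a value (update).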
ToDos to support this feature:
- MPFile.from_file() which keeps comments (02f1588)
- MPFile.get_string() (6a2db77, 17d9e9c)
- make_general_section (5108c63)
- make_general_section (349dbbb)
- MPFile and test w/ viewer (e7fb477)
- mgc submit: implement dry run mode to test submission vs scratch-type collections (0519db7)
- utf-8 encoding (6d96d77)
- mgc submit: overwrite/update local MPFile by embedding cid
- mgc submit: check for embedded cid and trigger correct action (add/update)

Find a way for customizations of Plotly graphs to also show up in the MPFileViewer. This will be limited to rudimentary customizations of the default graph (see #13).
MPFile import crashes if MPFile has empty sections.
Dependabot couldn't find a Dockerfile for this project.
Dependabot requires a Dockerfile to evaluate your project's current Docker dependencies. It had expected to find one at the path: /mpcontribs-sidecar/chrome/Dockerfile
.
If this isn't a Docker project, or if it is a library, you may wish to disable updates for it in the .dependabot/config.yml
file in this repo.
It would be very helpful to have more meaningful error reporting if an MPFile cannot be parsed correctly. As a start, reporting the number of the line in the MPFile which triggers the error would be helpful.
I get:
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1052, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 508, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4775)
ValueError: No columns to parse from file
P.S.: I just found the preview button :)
Allow the MPFile viewer to run completely in offline mode, i.e. no interactive graphing via Plotly and no loading of JavaScript libraries from CDNs. (@ATNDiaye)
Dependabot couldn't find a Dockerfile for this project.
Dependabot requires a Dockerfile to evaluate your project's current Docker dependencies. It had expected to find one at the path: /mpcontribs-sidecar/kernel_gateway/Dockerfile
.
If this isn't a Docker project, or if it is a library, you may wish to disable updates for it in the .dependabot/config.yml
file in this repo.
"notebook/js/outputarea.js"
Already integrated with Travis CI but only running one simple test for MPFile.get_string()
(see test_mpfile.py)
et al.
- tooltip not loading as expected (cafe804)
- materials_django to work with new /card/cid endpoint @dwinston
- (post_submission.py)

Identical to FigShare's ndownloader URLs, the MPContribsML deployment already supports downloading full matbench_* datasets through AWS S3, compressed as .json.gz files, using a top-level URL, e.g. https://ml.materialsproject.cloud/matbench_expt_gap.json.gz. This functionality is being used in hackingmaterials/matminer#446 to load datasets for mining.
Powered by the /contributions/download/ endpoint in the MPContribs API, downloads through the corresponding button on the MPContribs landing pages are currently generated dynamically on the server side and respect the filters and sorting (query parameters) applied in the UI. Both csv and json formats are supported, each with or without full structure objects (full vs minimal). Right now, the only supported MIME type is gz, but other types could be valuable to implement in the future (e.g. bzip, jpeg, png, vnd.plotly.v1+json, x-ipynb+json, ods, pdf, tar, zip, excel, xml).
To reduce duplicate and potentially heavy DB queries, the download
endpoint could encode the query in the filename and save it to S3 upon first request or after update of the underlying data. This would implicitly maintain versioned snapshots of the datasets as API POST/PUT requests would add a timestamp to the old file and the next GET request would generate a new file. The S3 bucket storing the exported project data would have a sub-folder for each MPContribs deployment (Main, ML, LightSources, ...).
A progress bar is needed while the first export file for a project and query is generated on S3. It would use server-side events and a Redis cache as already implemented for the dynamic (re-)generation of Jupyter notebooks which power MPContribs Contribution Details Pages.
If a file for the specific project and query without a timestamp exists, the /contributions/download/ endpoint would simply return a 302 Redirect to S3, thus relegating download traffic to S3. Alternatively, the API could use the boto3 client to retrieve the file from S3, load it into memory, and then return it as a response to the request. However, this would cause unnecessary implementation, maintenance, and monitoring efforts as well as strain on the API Fargate tasks.
EDIT 06/19/2020: I chose to always go through the API Fargate task and keep the S3 bucket private (next paragraph outdated)
The consequence of a simple redirect is that authentication/authorization can be enforced on generating the file export (saving to S3) on the first request but not on subsequent download requests from the public S3 bucket. The MPContribs URLs for the portal and the API could technically still use authentication/authorization for retrieval of the data exports but the URL to the S3 object would need to be public anyways. S3 storage of export files would thus only be enabled for public projects which could be an additional inducement for contributors to make their data available to the public.
Saving files from the API Fargate task to S3 does not incur extra data traffic or processing costs since the S3 Gateway Endpoint is free (as opposed to a NAT Gateway) and the S3 bucket is in the same AWS region. However, there will be costs related to traffic caused by downloads of the (compressed and predominantly small) S3 objects and to their storage itself. The latter can be optimized by setting up lifecycle policies which automatically move objects into other storage tiers depending on their monthly access frequency. For instance, old timestamped snapshots would likely move into cheaper Glacier storage since they'll only be needed/downloaded occasionally.
mpfile_init.txt
use MPContribsUsers modules as projects:
Hi,
I seem to be having some trouble getting set up following the 'INSTALL.md' instructions and was hoping you might be able to point me in the right direction.
I first tried following instructions from the INSTALL.md file in this git repo, but it looks like a directory needed for the installation (called 'docker') is missing from the repo.
I next tried using notebooks within the MPContribs JupyterHub portal. The notebooks didn't seem to work in there right away, so I tried following the instructions in the INSTALL.md from a terminal in the JupyterHub portal. On this attempt I got further, since the 'docker' directory was present, but only as far as 'git checkout -b flaskproxy origin/flaskproxy', where I think I was meant to be accessing files from docker/dockerspawner in the remote repository which were no longer there.
Apologies if I am just missing something trivial!
Prepare MPContribs
repo as a package suitable for install from the Python Package Index. Include basic documentation with install instructions (official and editable) and release to https://pythonhosted.org/.
changes
MPContribsRester.delete_contributions()
doesn’t clean up derived collections!
A contribution is not assigned to a particular user with a given e-mail address but rather a project/institution encompassing a list of collaborators. Everyone in the list is allowed to edit/update an existing contribution as well as add/remove collaborators. A sub-command to the mgc
command line program enables full control over the collaborators. The first collaborator in the list is considered primary contact.
This functionality is a prerequisite for the important support of an incubation period during which only collaborators can access a contribution before publication.
ToDos to support this feature:
- mgc collab sub-command to control collaborators and primary contact (POC)
- mgc collab functionality in rest.views.update_collaborators
- Backgrid.ColumnManager to manage display of columns

I tried installing into a fresh virtualenv but ran into the following:
(mgc)λ> python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to mpcontribs.egg-info/requires.txt
writing mpcontribs.egg-info/PKG-INFO
writing top-level names to mpcontribs.egg-info/top_level.txt
writing dependency_links to mpcontribs.egg-info/dependency_links.txt
reading manifest file 'mpcontribs.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'README.asc'
writing manifest file 'mpcontribs.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.10-x86_64/egg
running install_lib
running build_py
error: Error: setup script specifies an absolute path:
/Users/dwinston/Dropbox/materialsproject/MPContribs/scripts/mgc
setup() arguments must *always* be /-separated paths relative to the
setup.py directory, *never* absolute paths.
If it is not a huge deal to implement, it would be great if the datatables could be collapsed just like the elements of the attribute tree.
Only one table per composition shows up in the viewer.
A dependency check throws an error, although I have an updated version of monty (I think).
$ mgc
Traceback (most recent call last):
File "/usr/local/bin/mgc", line 4, in <module>
__import__('pkg_resources').require('mpcontribs==0.0')
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3020, in <module>
working_set = WorkingSet._build_master()
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 616, in _build_master
return cls._build_from_requirements(__requires__)
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 629, in _build_from_requirements
dists = ws.resolve(reqs, Environment())
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 807, in resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: monty==0.6.4
In [1]: import monty
In [2]: monty.__version__
Out[2]: '0.6.5'
The MPContribs API supports dynamic query parameters based on regular expressions for the hierarchical data in /contributions/. For instance, any query parameter matching ^data__((?!__).)*$__gte will be accepted to query a (potentially nested) data subfield via the $gte query operator. The sort and filter functionalities on the MPContribs landing pages for a project rely on this feature of the API.
However, validation in bravado as part of the Python client library (mpcontribs-client) fails to recognize query parameters when they match a regex definition in the swagger spec. A solution could be the following:
1. Query the columns field in the /projects/ endpoint (_fields=columns) to obtain a list of possible data subfields.
2. Validate query parameters against the columns obtained in 1.
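The proposed client-side validation could be sketched like this. The regex below is an assumed cleanup of the pattern quoted above (single-level subfields, gte operator only); the actual swagger spec may differ.

```python
import re

# Sketch of the proposed client-side validation: accept dynamic parameters of
# the form data__<subfield>__gte only if <subfield> appears in the project's
# columns. The regex is an assumption; the actual swagger spec may differ.

PARAM_RE = re.compile(r"^data__((?:(?!__).)+)__gte$")

def validate_params(params, columns):
    """Split query parameters into accepted and rejected ones."""
    accepted, rejected = {}, {}
    for name, value in params.items():
        m = PARAM_RE.match(name)
        if m and m.group(1) in columns:
            accepted[name] = value
        else:
            rejected[name] = value
    return accepted, rejected

ok, bad = validate_params(
    {"data__gap__gte": 1.0, "data__bogus__gte": 2.0},
    columns={"gap", "sigma"},
)
```

Fetching columns once per project and caching it keeps the validation cheap compared to a per-request round trip.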