materialsproject / mpcontribs
Platform for materials scientists to contribute and disseminate their materials data through Materials Project
Home Page: https://mpcontribs.org
License: MIT License
Increase the size of embedded plotly graphs to improve readability and manageability. @ATNDiaye
make a docker image/container to run mpcontribs website/services on NERSC spin
This issue lists follow-up and more advanced book-keeping steps once issue #3 has been closed. For instance, a mandatory description field for each contribution, as well as means to retrieve and search for specific contributions, would be helpful.
- reduced_formula/alphabetic_formula as composition cid
- builders: set figure & axes titles automatically from table name & columns
- builders: re-enable plotly updates without overwriting
- mpfile.concat: account for comments in appended MPFile
- cids between MPFiles
- mgc delete: use abbreviated/shortened cids
- mgc find (…)
- mgc log/info à la git log --oneline with short description message
- mgc get to return an auto-generated MPFile with embedded cid
- mgc delete: also delete plotly graphs from user account

The Atlas MongoDB deployment and its peering connection from the AWS VPC are currently set up manually every time a new pipeline, and thus a new collection of stacks, is required. This can happen, for instance, if development stacks are needed, a new region is explored, or new infrastructure components are being tested.
The solution is to use MongoDB's Atlas Resource Providers directly in the CloudFormation templates. This would also set up peering connections automatically.
This change will likely result in a brand new Atlas MongoDB deployment, which will implicitly use the new Private Connection Strings. The environment variable for the API containers will have to be updated accordingly.
Establish a graphs-only endpoint to enable "on-the-fly" analyses during post-submission processing, i.e. producing non-persistent Plotly graphs without any MP or Plotly interaction.
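An on-the-fly graphs response could look like the following minimal sketch, which builds a Plotly-compatible figure dictionary from a submitted table entirely in memory, without persisting anything. The function name and table layout are illustrative assumptions, not the actual endpoint.

```python
# Hypothetical sketch of a "graphs-only" response: build a Plotly-compatible
# figure dict from a table in memory, without persisting anything or talking
# to MP/Plotly. Names and structure are illustrative only.

def table_to_figure(name, columns, rows):
    """Convert a simple table (first column = x, rest = y series)
    into a Plotly-style figure dictionary."""
    x = [row[0] for row in rows]
    traces = [
        {
            "x": x,
            "y": [row[i] for row in rows],
            "name": col,
            "mode": "lines+markers",
        }
        for i, col in enumerate(columns)
        if i > 0
    ]
    return {
        "data": traces,
        "layout": {"title": name, "xaxis": {"title": columns[0]}},
    }

fig = table_to_figure(
    "bandgap vs. temperature",
    ["T [K]", "Eg [eV]"],
    [[100, 1.20], [200, 1.17], [300, 1.12]],
)
```

Since the figure is a plain dictionary, it can be serialized straight to JSON and handed to the front-end without any server-side state.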
The node list of the MPFile is ordered alphabetically. It is important, especially during processing, that the order from the file is retained.
All storage pertaining to user data contributions needs to be separated into its own database to avoid any interaction with MP's core datasets. In addition to minor improvements, this transition requires new configuration files and updated adapters for the front-end.
ToDo's:
- tree/table/plots fields in contribution_data to reflect front-end
- --reset work with mpcontribs database
- object_id as contribution ID, last 6 hex characters for humans
- contribution_data field to materials/compositions collection
- materials/composition collection, use mp_cat_id as object_id
- MPContributionsBuilder work again
- materials and composition collections
- python -m mpcontribs into subcommands fake/submit/reset, retire insert option
- mpcontribs database in materials_django's utils.connector
- mgc subcommands info and submit [new file] work
- mgc delete/viewer subcommands are still functional
- material_contributions view in materials_django to query correct db and propagate correct data to template

This issue keeps track of errors which should be caught by the parser to warn the user of potential problems with the MPFile. The error messages should be verbose enough to guide the user through error resolution.
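Such checks might be sketched as follows. The ">>>" section-marker and "key: value" conventions assumed below are simplifications for illustration, not the actual MPContribs parser rules.

```python
# Illustrative sketch only: report line numbers for suspicious MPFile lines,
# assuming ">>>" marks section headers and other lines use "key: value".
# This is not the actual MPContribs parser.

def check_mpfile(text):
    """Return (line_number, message) tuples for potential problems."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        if stripped.startswith(">>>"):
            if len(stripped) == 3:
                errors.append((lineno, "section marker without a name"))
        elif ":" not in stripped:
            errors.append((lineno, "expected 'key: value' separator"))
    return errors

sample = ">>> GENERAL\nauthor: Jane Doe\nno separator here\n>>>"
```

Returning line numbers alongside messages is what makes the reports actionable for users.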
- >>> and : separators

pip complains:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mpcontribs-client 4.2.9 requires jsonschema<4.0, but you have jsonschema 4.4.0 which is incompatible.
Is the pinning actually necessary?
MPContribs/mpcontribs-client/setup.py
Line 32 in b7b2e21
- materials_django (run barebone webtzite as in JupyterHub)
- PMP_MAPI_KEY
Along with credits to projects and collaborators (see #1), it is important to prominently display references on the contribution pages. Thus they become special entries in the root-level key-value section(s) of MPFiles
("first-class citizens"). The reference format should minimally support URLs (incl. access date), bibtex strings, and DOIs.
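As a rough illustration, minimal support for the three reference formats could be sketched like this. The heuristics below are assumptions for the sketch, not the project's actual validation rules.

```python
import re

# Rough heuristic sketch for classifying reference strings into the three
# formats the issue mentions (DOI, bibtex, URL). The real validation rules
# would live in the MPFile parser and are not defined here.

def classify_reference(ref):
    """Return 'doi', 'bibtex', 'url', or 'unknown' for a reference string."""
    ref = ref.strip()
    if re.match(r"^10\.\d{4,9}/\S+$", ref):
        return "doi"  # bare DOI, e.g. 10.xxxx/suffix
    if ref.startswith("@") and "{" in ref:
        return "bibtex"  # e.g. "@article{key, ...}"
    if ref.startswith(("http://", "https://")):
        return "url"
    return "unknown"
```

A real implementation would additionally capture the access date for URLs, as the issue requires.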
ToDos to support this feature:
Implement the MPContribs button in JupyterHub as a dropdown with direct links to the ingester and local/production portals
https://contribs.materialsproject.org/projects/carrier_transport
Tried CSV, JSON, with and without tables. Maybe it has to do with the dataset being fairly large? Tried on Google Chrome on Windows 10. A colleague of mine (@AndrewFalkowski) ran into the same issue. Incidentally, we can download the data via https://datadryad.org/stash/dataset/doi:10.5061/dryad.gn001 (using the "References" link in the top-left), so it's not blocking our workflow, but I figured it was worth mentioning.
See this forum post on filtering for materials by electrical conductivity. I would like to post a snippet where someone who has pip-installed pymatgen and mpcontribs, and who has set their API key using the pmg CLI, can use the boltztrap rester to get mp-ids for materials with a given electrical conductivity and feed those into MPRester to get additional core data. I currently cannot do this because documented use of mpcontribs is via MP's hosted JupyterHub only, and furthermore a local install would seem to require some work because mpcontribs is currently Python 2 only.
Develop a basic Flask web app for the user to "simulate" the contribution of an MPFile
on a local machine. The app should load an MPFile
and show the resulting contributions in an interactive way similar to the MP front-end. For reasons of simplicity, the internal parser and builder phases should happen in a purely dynamic manner (in memory) without the need to set up or interact with a database.
Such an app would allow interested contributors (e.g. @ATNDiaye) to try out the framework before its (official) deployment, or at a later stage, before going through the official submission process.
ToDo's:
- ContributionsMongoAdapter compatible with db-less operation
- MPContributionsBuilder compatible with db-less operation (tree/table-only)
- tree_data using JSONTree
- tables using DataTables
- MPContributionsBuilder for db-less mode
- plots via Plotly URL
- mgc via subcommand (open/reload browser window)
- io.recparse: data_ prefix in plots.xxx.table
- MPContribs installable from GitHub as a package, see #9
- MPFile.from_file: also check for unicode not only str
- mgc viewer (tag users)

A refresh button which reloads the file and generates new plots would be nice.
Upon submission of an MPFile, its contents are linked to a unique contribution ID (cid). Updates, overwrites, and deletions of contributions subsequently require knowledge of the cids affected by the MPFile being submitted. To facilitate the book-keeping, there has to be a mechanism in place that allows users to keep track of the respective cids.
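One possible book-keeping mechanism is sketched below: embed the assigned cid as a comment at the top of the MPFile after submission, and check for it before the next submit to decide between "add" and "update". The "#cid:" comment format is an assumption for this sketch, not the actual MPFile convention.

```python
# Sketch of one possible book-keeping mechanism. The "#cid:" comment format
# is an assumption, not the actual MPFile convention.

CID_PREFIX = "#cid:"

def embed_cid(mpfile_text, cid):
    """Prepend the contribution ID, replacing any previously embedded one."""
    lines = [l for l in mpfile_text.splitlines() if not l.startswith(CID_PREFIX)]
    return "\n".join(["{} {}".format(CID_PREFIX, cid)] + lines)

def extract_cid(mpfile_text):
    """Return the embedded cid, or None if the file was never submitted."""
    for line in mpfile_text.splitlines():
        if line.startswith(CID_PREFIX):
            return line[len(CID_PREFIX):].strip()
    return None

text = embed_cid(">>> GENERAL\nauthor: Jane Doe", "5733704637202d12f448fc59")
```

A submit command could then branch on extract_cid returning None (add) or a value (update).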
ToDos to support this feature:
- MPFile.from_file() which keeps comments (02f1588)
- MPFile.get_string() (6a2db77, 17d9e9c)
- make_general_section (5108c63)
- make_general_section (349dbbb)
- MPFile and test w/ viewer (e7fb477)
- mgc submit: implement dry run mode to test submission vs scratch-type collections (0519db7)
- utf-8 encoding (6d96d77)
- mgc submit: overwrite/update local MPFile by embedding cid
- mgc submit: check for embedded cid and trigger correct action (add/update)

Find a way for customizations of Plotly graphs to also show up in the MPFileViewer. This will be limited to rudimentary customizations of the default graph (see #13).
MPFile import crashes if MPFile has empty sections.
Dependabot couldn't find a Dockerfile for this project.
Dependabot requires a Dockerfile to evaluate your project's current Docker dependencies. It had expected to find one at the path: /mpcontribs-sidecar/chrome/Dockerfile
.
If this isn't a Docker project, or if it is a library, you may wish to disable updates for it in the .dependabot/config.yml
file in this repo.
It would be very helpful to have more meaningful error reporting if an MPFile cannot be parsed correctly. As a start, reporting the number of the line in the MPFile which triggers the error would be helpful.
I get:
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1052, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 508, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4775)
ValueError: No columns to parse from file
P.S.: I just found the preview button :)
Allow the MPFile viewer to run completely in offline mode, i.e. no interactive graphing via Plotly and no loading of JavaScript libraries from CDNs. (@ATNDiaye)
Dependabot couldn't find a Dockerfile for this project.
Dependabot requires a Dockerfile to evaluate your project's current Docker dependencies. It had expected to find one at the path: /mpcontribs-sidecar/kernel_gateway/Dockerfile
.
If this isn't a Docker project, or if it is a library, you may wish to disable updates for it in the .dependabot/config.yml
file in this repo.
"notebook/js/outputarea.js"
Already integrated with Travis CI but only running one simple test for MPFile.get_string()
(see test_mpfile.py)
et al.
- tooltip not loading as expected (cafe804)
- materials_django to work with new /card/cid endpoint @dwinston
- (post_submission.py)

Identical to FigShare's ndownloader URLs, the MPContribsML deployment already supports downloading full matbench_* datasets through AWS S3, compressed as .json.gz files, using a top-level URL, e.g. https://ml.materialsproject.cloud/matbench_expt_gap.json.gz. This functionality is being used in hackingmaterials/matminer#446 to load datasets for mining.
Powered by the /contributions/download/ endpoint in the MPContribs API, downloads through the corresponding button on the MPContribs landing pages are currently generated dynamically on the server side and respect the filters and sorting (query parameters) applied in the UI. Both csv and json formats are supported, each with or without full structure objects (full vs minimal). Right now, the only supported MIME type is gz, but other types could be valuable to implement in the future (e.g. bzip, jpeg, png, vnd.plotly.v1+json, x-ipynb+json, ods, pdf, tar, zip, excel, xml).
To reduce duplicate and potentially heavy DB queries, the download
endpoint could encode the query in the filename and save it to S3 upon first request or after update of the underlying data. This would implicitly maintain versioned snapshots of the datasets as API POST/PUT requests would add a timestamp to the old file and the next GET request would generate a new file. The S3 bucket storing the exported project data would have a sub-folder for each MPContribs deployment (Main, ML, LightSources, ...).
A progress bar is needed while the first export file for a project and query is generated on S3. It would use server-side events and a Redis cache as already implemented for the dynamic (re-)generation of Jupyter notebooks which power MPContribs Contribution Details Pages.
If a file for the specific project and query without a timestamp exists, the /contributions/download/ endpoint would simply return a 302 Redirect to S3, thus relegating download traffic to S3. Alternatively, the API could use the boto3 client to retrieve the file from S3, load it into memory, and then return it as a response to the request. However, this would cause unnecessary implementation, maintenance, and monitoring efforts as well as strain on the API Fargate tasks.
EDIT 06/19/2020: I chose to always go through the API Fargate task and keep the S3 bucket private (next paragraph outdated)
The consequence of a simple redirect is that authentication/authorization can be enforced on generating the file export (saving to S3) on the first request but not on subsequent download requests from the public S3 bucket. The MPContribs URLs for the portal and the API could technically still use authentication/authorization for retrieval of the data exports but the URL to the S3 object would need to be public anyways. S3 storage of export files would thus only be enabled for public projects which could be an additional inducement for contributors to make their data available to the public.
Saving files from the API Fargate task to S3 does not incur extra data traffic or processing costs since the S3 Gateway Endpoint is free (as opposed to a NAT Gateway) and the S3 bucket is in the same AWS region. However, there will be costs related to traffic caused by downloads of the (compressed and predominantly small) S3 objects and to their storage itself. The latter can be optimized by setting up lifecycle policies which automatically move objects into other storage tiers depending on their monthly access frequency. For instance, old timestamped snapshots would likely move into cheaper Glacier storage since they'll only be needed/downloaded occasionally.
mpfile_init.txt
use MPContribsUsers modules as projects:
Hi,
I seem to be having some trouble getting set up following the 'INSTALL.md' instructions and was hoping you might be able to point me in the right direction.
I first tried following instructions from the INSTALL.md file in this git repo, but it looks like a directory needed for the installation (called 'docker') is missing from the repo.
I next tried using notebooks within the MPContribs JupyterHub portal. The notebooks didn't seem to work in there right away, so I tried following the instructions in the INSTALL.md from a terminal in the JupyterHub portal. On this attempt I got further, since the 'docker' directory was present, but only as far as 'git checkout -b flaskproxy origin/flaskproxy', where I think I was meant to be accessing files from docker/dockerspawner in the remote repository which were no longer there.
Apologies if I am just missing something trivial!
Prepare MPContribs
repo as a package suitable for install from the Python Package Index. Include basic documentation with install instructions (official and editable) and release to https://pythonhosted.org/.
changes
MPContribsRester.delete_contributions()
doesn’t clean up derived collections!
A contribution is not assigned to a particular user with a given e-mail address but rather a project/institution encompassing a list of collaborators. Everyone in the list is allowed to edit/update an existing contribution as well as add/remove collaborators. A sub-command to the mgc
command line program enables full control over the collaborators. The first collaborator in the list is considered primary contact.
This functionality is a prerequisite for the important support of an incubation period during which only collaborators can access a contribution before publication.
ToDos to support this feature:
- mgc collab sub-command to control collaborators and primary contact (POC)
- mgc collab functionality in rest.views.update_collaborators
- Backgrid.ColumnManager to manage display of columns

I tried installing into a fresh virtualenv but ran into the following:
(mgc)λ> python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to mpcontribs.egg-info/requires.txt
writing mpcontribs.egg-info/PKG-INFO
writing top-level names to mpcontribs.egg-info/top_level.txt
writing dependency_links to mpcontribs.egg-info/dependency_links.txt
reading manifest file 'mpcontribs.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'README.asc'
writing manifest file 'mpcontribs.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.10-x86_64/egg
running install_lib
running build_py
error: Error: setup script specifies an absolute path:
/Users/dwinston/Dropbox/materialsproject/MPContribs/scripts/mgc
setup() arguments must *always* be /-separated paths relative to the
setup.py directory, *never* absolute paths.
If it is not a huge deal to implement, it would be great if the datatables could be collapsed just like the elements of the attribute tree.
Only one table per composition shows up in the viewer.
A dependency check throws an error, although I have an updated version of monty (I think).
$ mgc
Traceback (most recent call last):
File "/usr/local/bin/mgc", line 4, in <module>
__import__('pkg_resources').require('mpcontribs==0.0')
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3020, in <module>
working_set = WorkingSet._build_master()
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 616, in _build_master
return cls._build_from_requirements(__requires__)
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 629, in _build_from_requirements
dists = ws.resolve(reqs, Environment())
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 807, in resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: monty==0.6.4
In [1]: import monty
In [2]: monty.__version__
Out[2]: '0.6.5'
The MPContribs API supports dynamic query parameters based on regular expressions for the hierarchical data in /contributions/. For instance, any query parameter matching ^data__((?!__).)*$__gte will be accepted to query a (potentially nested) data subfield via the $gte query operator. The sort and filter functionalities on the MPContribs landing pages for a project rely on this feature of the API.
However, validation in bravado as part of the Python client library (mpcontribs-client) fails to recognize query parameters when they match a regex definition in the swagger spec. A solution could be the following:
1. Query the columns field in the /projects/ endpoint (_fields=columns) to obtain a list of possible data subfields.
2. Validate query parameters against the columns obtained in 1.
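The proposed client-side validation could be sketched like this. The regex below is an assumed cleanup of the pattern quoted above (single-level subfields, gte operator only); the actual swagger spec may differ.

```python
import re

# Sketch of the proposed client-side validation: accept dynamic parameters of
# the form data__<subfield>__gte only if <subfield> appears in the project's
# columns. The regex is an assumption; the actual swagger spec may differ.

PARAM_RE = re.compile(r"^data__((?:(?!__).)+)__gte$")

def validate_params(params, columns):
    """Split query parameters into accepted and rejected ones."""
    accepted, rejected = {}, {}
    for name, value in params.items():
        m = PARAM_RE.match(name)
        if m and m.group(1) in columns:
            accepted[name] = value
        else:
            rejected[name] = value
    return accepted, rejected

ok, bad = validate_params(
    {"data__gap__gte": 1.0, "data__bogus__gte": 2.0},
    columns={"gap", "sigma"},
)
```

Fetching columns once per project and caching it keeps the validation cheap compared to a per-request round trip.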