lab-cosmo / chemiscope
An interactive structure/property explorer for materials and molecules
Home Page: http://chemiscope.org
License: BSD 3-Clause "New" or "Revised" License
When loading different structures in multiple viewers in the same page, it looks like some global state is not updated on the JSmol side, leading to the wrong structure being loaded in some viewers.
We mostly see this when applying saved settings containing multiple viewers, and slowing down loading seems to fix it:
https://github.com/cosmo-epfl/chemiscope/blob/acdae4d83cac62da0308c5b0d87046306a89c8bb/src/index.ts#L339-L347
We should still aim for an actual fix, since the workaround above makes everything very slow, and even a 1s delay is not always enough.
One of the reviewers' concerns on the JOSS paper was that we lack automated tests. We can do manual testing, but that means it can be hard to refactor without introducing bugs in corner cases. At the same time, chemiscope being mostly a GUI, it is not well suited for standard unit testing (checking public classes/functions one by one, single behavior by single behavior).
I think we should add some form of automated testing; end-to-end testing might be the least bad option. The idea is to specify high-level behavior such as "selecting a new property for axis x changes the plot and the axis title on the plot". Such tests might be painful to write, since we have to make them generic enough to not break due to layout changes but specific enough to catch the issues; but I still think they are worth it.
Cypress looks like a nice framework for such end to end testing: https://www.cypress.io/.
Both on this repo and in the website
Trying to run pip install . from a source checkout will fail with a "could not find package.json" error. Installing from a pre-generated sdist (which is what happens with pip install chemiscope) is fine, so this should not impact most users.
The core of the issue is that we use the same version number for the Python package and the npm package, so the Python setup.py reads the version number from package.json, which is symlinked as python/package.json. pip install . tries to isolate the build and copies the python/ directory to a temporary location, where it cannot find the symlinked file.
python setup.py install and the versions uploaded to PyPI work fine, so I don't think we have to worry about this.
This is tracked upstream in pypa/pip#3500
Using NaNs to indicate environments that are not linked to points on the map is cumbersome and inefficient. The indexing already allows for a sparse list; it only needs to be handled on the JSmol side. Also related to #4.
Using symbols to display categorical data interferes with selection highlighting, particularly when using multiple structure panels.
Expected behavior:
Highlighted structures are shown using the same symbol used for the category, which is just made a bit bigger and colored according to the structure panel color key.
Observed behavior:
in 2D, a circle is used regardless of the underlying class
in 3D (even more problematic) the symbol of the active panel is used for all the selected structures (so that a structure with a square symbol might turn into a cross if I select a cross structure as the active one)
For example, when loading the QM9 KPCovR map and then clicking on the Structure XXX button, there are too many properties to fit in the space occupied by the structure viewer, and no scroll bar to get back to the properties at the top.
This should be a simple fix: set the overflow CSS property to scroll instead of hidden.
In the first pseudo-script under the section "Creating an input file", the ase.io import needs to be changed to
import ase.io
and likewise the sklearn import to
import sklearn.decomposition
(or change the usage later in the pseudo-script).
Getting the version as a git hash with git describe --tags --dirty or equivalent when compiling the TypeScript code to JavaScript, and then displaying it somewhere on the page.
Clicking on the "structure" info button when the "environment" info is displayed should hide the latter. Instead, it opens "behind" it.
The idea would be to have two layouts depending on the aspect ratio of the window, resized automatically to fit within it. Also, the property listing for one frame should overlay the JSmol box, so that it always stays within the corresponding 1x1 block. If there are too many properties, there should be a small scrollbar within the overlay.
mockup:
This should be made more visible in the documentation.
Another way to achieve this is to move the DefaultVisualizer code to /app and point to it as an example of how to link widgets together.
Instead of displaying the full dataset in the map all the time, we could give the user the ability to filter which points to show based on the property values (e.g. energy < 345).
The filtered-out points should not be removed, but rather dimmed out/greyed on the map, so that selecting an environment from the structure or the slider still works.
The hard question is how the user would input the filters. A first version can use range filters like
[ MIN ] <= [ $property ▼] <= [ MAX ]
with MIN and MAX number inputs and $property a dropdown select element.
Then a second step would be to combine range filters on different properties with or/and.
Another solution would be to have a selection language, but this might be overkill.
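The semantics of such range filters can be sketched as follows. This is only an illustration written in Python for brevity (the real implementation would live in the TypeScript map code); RangeFilter, visible_mask and the mode argument are hypothetical names, not part of chemiscope.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class RangeFilter:
    """Hypothetical filter: keep points with minimum <= value <= maximum."""

    name: str  # name of the property the filter applies to
    minimum: Optional[float] = None  # None means "no lower bound"
    maximum: Optional[float] = None  # None means "no upper bound"

    def matches(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


def visible_mask(
    properties: Dict[str, Sequence[float]],
    filters: List[RangeFilter],
    mode: str = "and",
) -> List[bool]:
    """Return one flag per point: True = drawn normally, False = dimmed.

    Points are never removed, so indices stay valid for selection
    from the structure viewer or the slider.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    n_points = len(next(iter(properties.values())))
    if not filters:
        return [True] * n_points
    combine = all if mode == "and" else any
    return [
        combine(f.matches(properties[f.name][i]) for f in filters)
        for i in range(n_points)
    ]
```

Returning a mask instead of a filtered list keeps the "dim, do not remove" requirement explicit, and the mode argument covers the second step (combining filters on different properties with or/and) without a full selection language.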
Regardless of the choice of "trajectory" settings, some options such as packed cells, or the style for the environments, are reset when loading a new structure by clicking on the map or on the trajectory sliders.
Previously the standalone script would by default hide the loader when a data file was included in the html.
Now (1) the hiding code doesn't work anymore and (2) I question whether it makes sense to hide it, given that being able to save visualization state is useful. This issue is to track (1) and discuss on (2)
As discussed in #88 (comment) and following comments, it would be very nice to support LaTeX syntax in user-facing data. The core use case for this is rendering units. Another appealing case is to render dataset description with some math inside. The alternative is to use unicode math characters (in particular unicode superscripts) where needed.
One solution to do this that would integrate relatively well with plotly is to use https://www.mathjax.org/ for latex rendering. Another alternative is https://katex.org/, which is usually faster and smaller than mathjax, although I don't know if it works with plotly.
If we want to do this, one thing to consider is that we would have to bundle mathjax/katex, which would increase the size of the bundled JavaScript and might interfere with downstream users who already have a LaTeX renderer installed (e.g. Materials Cloud). This is my only objection to this feature: it might not be worth the slower loading time for marginally more convenient unit and description math input.
Also, we currently render the dataset description & references using markdown syntax, so we have to make sure not to break it when rendering latex. This should be fine, we mostly have to check that it is not broken when implementing this.
Places where we may want to have latex rendering:
Things to look at before starting the implementation:
When in atom mode, it would be good to highlight on the map other points pertaining to the same structure as the currently selected atom.
A good way to do that would be to dim / grey out other points, using the same mechanism as #3.
If the dataset contains both atom and structure properties, we currently default to showing the 'atom' ones, but the user should have a way to switch between the two modes.
The display mode is already centralized in the EnvironmentIndexer, this mainly needs to be a setting somewhere.
Seems that changing the color range manually does not trigger an update of the plotly map
The only metadata being used currently is the dataset name. We should add more, I think at least:
string[]
string[] (journal in which this was published/DOI)
string
Anything else?
The metadata could be hidden by default, and displayed when clicking on the dataset name. Or this could be moved to a separate component, to be displayed on top of the other ones.
Taking the list from #25 review:
Do not use select({-1, -1}, guid) to indicate removal; instead add specific functions for each possible action: addPinned, removePinned, changeActivePinned. Then select always acts on the current active one and only moves the marker/changes the structure.
[...] starterGUID parameter for the PropertyMap constructor, to allow using the map without a structure viewer. This should be easy to do after the above changes.
Extract code dealing with selected markers (things like classList.toggle('chsp-active-structure-marker', false)) to a separate HTMLMarker class doing all of this.
On a related note, src/map/map.ts is getting quite large, so we should try to extract functionalities and potentially move them out of this file.
For the default interface, we could have a way to specify a visualization state in the input JSON file.
On the map side, this means describing which property should be used as color/size/symbols, x/y/z ranges, etc.
On the structure viewer side, this means supercell settings, visualizations settings, etc.
This can be implemented by adding serialization/de-serialization of the settings to JSON, and having an additional section for these in the JSON input file.
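A minimal sketch of what such an additional section could look like, assuming it is stored under a top-level "settings" key. That key, the helper names and every key used in the usage example are hypothetical; the actual layout would have to follow whatever the map and structure viewer settings serialize to.

```python
import json
from typing import Any, Dict


def attach_settings(dataset: Dict[str, Any], settings: Dict[str, Any]) -> str:
    """Serialize a dataset with an extra, hypothetical "settings" section."""
    combined = dict(dataset)  # shallow copy, the input dict is left untouched
    combined["settings"] = settings
    return json.dumps(combined)


def extract_settings(text: str) -> Dict[str, Any]:
    """Read the settings back; an input file without the section yields {}."""
    return json.loads(text).get("settings", {})
```

For example, attach_settings(dataset, {"map": {"color": "energy"}, "structure": {"supercell": [2, 2, 2]}}) would produce an input file from which extract_settings recovers the same dictionary, while older files without the section keep loading with default settings.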
I often find myself using ugly code such as
import sys
from os.path import expanduser
sys.path.append(expanduser('~')+'/lavoro/code/chemiscope/utils')
from chemiscope_input import write_chemiscope_input
to be able to write chemiscopes from my analysis notebooks.
I think it would be nice to have a pip package that installs the "utils" section of the chemiscope package, to facilitate its usage.
I was kind of torn as to whether these utils should rather live in a separate repo, but I actually think it makes sense for them to be associated with chemiscope, as that will simplify ensuring that the utils and the actual viewer stay in sync.
On slow connections, loading one of the examples can take multiple seconds. A small loading indicator would show the users that the code is doing something.
Scrolling in one direction increases zoom level on the map but decreases the one for the structure. This is surprising for the users.
I don't know if plotly or jsmol can be configured to use "reverse" zoom. If not, a possibility would be to intercept the wheel event and invert the sign before allowing handling by the libraries.
This should be easy, since all viewers are built on canvas elements, which can be rendered to an image with canvas.toDataURL.
This should be nicer to use than using screenshots to extract structures views.
When something like energy is used as a property for dot size in the map, the most interesting points are the ones with the lowest energy, but they end up being the smallest.
We could provide an option to use -<property> instead of <property> for the size to support this use case.
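The effect of such an option can be sketched as below. The function name, the invert flag and the size bounds are assumptions made for illustration, and the linear scaling is only one possible mapping, not necessarily the one chemiscope uses.

```python
from typing import List, Sequence


def marker_sizes(
    values: Sequence[float],
    invert: bool = False,
    smallest: float = 2.0,
    largest: float = 20.0,
) -> List[float]:
    """Map property values linearly onto [smallest, largest] marker sizes.

    With invert=True the values are negated first, which is the
    "-<property>" option: the lowest value gets the largest marker.
    """
    if invert:
        values = [-v for v in values]
    low, high = min(values), max(values)
    if high == low:
        # constant property: every point gets the same mid-range size
        return [(smallest + largest) / 2] * len(values)
    scale = (largest - smallest) / (high - low)
    return [smallest + (v - low) * scale for v in values]
```

Negating the values before scaling keeps a single code path for both cases, so the option amounts to one extra boolean setting next to the size property.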
If one clicks on "disable" for the "Environment" setting in the structure visualizer, the setting is reset when loading a new structure.
When you only have one viewer left:
This will probably not happen to a lot of users, and if it happens you can just reload everything, so I would say this is low priority.
We already have the ability to generate a "standalone" visualization as an HTML file; the idea is to make it easier for users to download such a file including their dataset.
Starting from standalone.html, stringify the dataset back to JSON (including visualization state, #6) and add it at the end of the file.
While #96 is a first step toward this, there are still multiple places where the process could be smoother.
I think ideally we should keep the dual workflow with a function that is
mirrored by a command-line utility. People from "my generation" still have
an instinct to go full bash onto postprocessing.
So I think we want to be able to easily combine structures (here having
something that can be read by ASE or an Atoms list seems to cover quite a lot of
ground) and arrays of values (that map easily onto column files) or
dicts.
One thing that often bugs me is that I want to drop info from the ASE file
so there could also be a switch that allows you to drop those fields.
Originally posted by @ceriottm in #96 (comment)
There are three main parts to a chemiscope input file: metadata, properties and structures.
The story to import structures into chemiscope is already pretty good, as long as you work with ASE =). Adding support for alternative file formats should be relatively easy and can be done on a case-by-case basis.
Properties are the harder part right now. We take the properties defined by ASE in Atoms.info and Atoms.arrays, but the user may not want all of them (e.g. the number property), and may want more properties. For now, the only way to add other properties is to manually create the right dictionary and pass it to the function. Removing properties is also possible within Python with del frame.info["whatever"] or del frame.arrays["whatever"], but not with the command line script.
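A small helper along these lines could back both the Python function and a command line switch. drop_properties is a hypothetical name; the sketch only assumes that each frame exposes dict-like info and arrays attributes, as ASE Atoms objects do.

```python
from typing import Any, Iterable


def drop_properties(frames: Iterable[Any], names: Iterable[str]) -> None:
    """Remove the named entries from `info` and `arrays` of every frame, in place.

    Names that are absent from a frame are silently ignored. Callers should
    not pass entries the structure itself relies on (for ASE, the arrays
    holding atomic numbers and positions).
    """
    names = list(names)  # allow generators, we iterate once per frame
    for frame in frames:
        for name in names:
            frame.info.pop(name, None)
            frame.arrays.pop(name, None)
```

Using pop with a default instead of del means the same list of names can be applied to frames that do not all carry the same properties.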
Finally, the script supports basic metadata input, but again it is much easier to do this with the Python function.
One thing we can do is add support for properties stored in CSV/text/npy files. For CSV files the property name would be the CSV header; for the other formats we could just name properties 1, 2, 3, etc. We could easily guess the target (atom/structure) by counting the number of values in the property.
This will obviously not support any property metadata (description/units), but for quick & dirty command line scripting, or to separate analysis/chemiscope generation it could help.
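The CSV case and the target guess can be sketched as follows. Both function names and the {"target", "values"} layout of the result are assumptions made for this example; only the idea of counting values comes from the proposal above.

```python
import csv
import io
from typing import Dict, List


def guess_target(n_values: int, n_structures: int, n_atoms: int) -> str:
    """Guess whether a property is per-structure or per-atom from its length.

    If both counts are equal (one atom per structure) this returns
    "structure"; that tie-break is an arbitrary choice of this sketch.
    """
    if n_values == n_structures:
        return "structure"
    if n_values == n_atoms:
        return "atom"
    raise ValueError(
        f"got {n_values} values, expected {n_structures} (structures) "
        f"or {n_atoms} (atoms)"
    )


def read_csv_properties(text: str, n_structures: int, n_atoms: int) -> Dict[str, dict]:
    """Parse CSV text with a header row into {name: {"target", "values"}}."""
    rows = list(csv.DictReader(io.StringIO(text)))
    properties: Dict[str, dict] = {}
    if not rows:
        return properties
    for name in rows[0].keys():
        values: List[float] = [float(row[name]) for row in rows]
        properties[name] = {
            "target": guess_target(len(values), n_structures, n_atoms),
            "values": values,
        }
    return properties
```

For instance, with two structures containing seven atoms in total, a CSV file with two data rows is read as structure properties, and one with seven rows as atom properties.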
As it says on the tin! We already save the zoom level in 2D with axis min/max values.
Getting these values and applying them should be easy with JSmol (we already have a save/apply orientation setting), I don't know if this is feasible with plotly though.
See: https://plot.ly/javascript/group-by/
We currently use one additional empty trace for each symbol to be displayed; switching to group-by would make the code simpler and might improve performance.
Extract code dealing with selected markers (things like classList.toggle('chsp-active-structure-marker', false)) to a separate HTMLMarker class doing all of this.
The standalone viewer is only able to read uncompressed JSON, and fails with a cryptic error message if one uses a compressed file instead. We should at least improve the error message in this case, or even better add pako to the standalone viewer to decompress the dataset.
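The detection logic is short, because every gzip stream starts with the two bytes 0x1f 0x8b. The sketch below shows it in Python for brevity; in the standalone viewer the same check would be written in TypeScript, with pako playing the role of gzip.decompress. load_dataset is a hypothetical name.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_dataset(raw: bytes) -> dict:
    """Parse a dataset from raw bytes, transparently handling gzip."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as e:
        # a readable message instead of a cryptic parser error
        raise ValueError(
            f"the dataset is neither valid JSON nor gzip-compressed JSON: {e}"
        ) from e
```

Even without bundling a decompression library, the magic-byte check alone is enough to replace the cryptic error with a message telling the user that the file is compressed.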
When trying to download the example input following the install instructions, the download-example-input step fails when /tmp is on a different partition than the current working directory:
$ npm run download-example-input
> [email protected] download-example-input /home/kai/Documents/reviews/2020/JOSS-2117/chemiscope
> ts-node ./utils/download-example-input.ts
Cloning into '/tmp/tmp-173153-8SqGyGFytJyu/chemiscope'...
Error: EXDEV: cross-device link not permitted, rename '/tmp/tmp-173153-8SqGyGFytJyu/chemiscope/CSD-500.json.gz' -> './app/CSD-500.json.gz'
at Object.renameSync (fs.js:756:3)
at Object.<anonymous> (/home/kai/Documents/reviews/2020/JOSS-2117/chemiscope/utils/download-example-input.ts:14:8)
at Module._compile (internal/modules/cjs/loader.js:1200:30)
at Module.m._compile (/home/kai/Documents/reviews/2020/JOSS-2117/chemiscope/node_modules/ts-node/src/index.ts:858:23)
at Module._extensions..js (internal/modules/cjs/loader.js:1220:10)
at Object.require.extensions.<computed> [as .ts] (/home/kai/Documents/reviews/2020/JOSS-2117/chemiscope/node_modules/ts-node/src/index.ts:861:12)
at Module.load (internal/modules/cjs/loader.js:1049:32)
at Function.Module._load (internal/modules/cjs/loader.js:937:14)
at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
at main (/home/kai/Documents/reviews/2020/JOSS-2117/chemiscope/node_modules/ts-node/src/bin.ts:227:14)
Googling the error message led me to a Stack Overflow post suggesting that fs.rename uses a syscall that doesn't work across partitions/mount points/filesystems.
In my case, /home is on a different filesystem than /tmp, and thus the above error happens.
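The usual way around EXDEV is to try the rename first and fall back to copy plus delete when it fails. The sketch below shows the pattern in Python, where os.rename has the same limitation; it is not the project's script. In the TypeScript download script the analogous Node calls would be fs.copyFileSync followed by fs.unlinkSync. move_file is a hypothetical name.

```python
import errno
import os
import shutil


def move_file(source: str, destination: str) -> None:
    """Move a file, falling back to copy + delete across filesystems."""
    try:
        os.rename(source, destination)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise  # unrelated failure (permissions, missing file, ...)
        # rename(2) cannot cross mount points: copy the data, then
        # remove the original
        shutil.copyfile(source, destination)
        os.remove(source)
```

Trying the rename first keeps the fast, atomic path when both locations are on the same filesystem. In Python, shutil.move already implements this fallback; it is spelled out here only to show the pattern.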
Multidimensional properties could be very nice to visualize. We can see two use-cases that could be implemented with the same infrastructure:
The same code could also be used to store (and display in the future) vector/tensorial properties as 3/9-values vectors.
1st milestone:
"properties": {
    "<name>": {
        // array of arrays, of dimension N_structures/N_environments x p, where p is the size of the parameter array below
        "values": [[...], [...]],
        // list of parameters, 1 to start but potentially multiple parameters later for 2+D properties
        "parameters": ["parameter 1"]
    }
},
// parameters above refer to the values in this separate table so that multiple properties can use the same parameters
"parameters": {
    "parameter 1": number[], // p1 elements
    "parameter 2": number[] // p2 elements
}
2nd milestone:
3rd milestone:
We may want to extend this to 2+D properties in the future, the current proposal should be forward compatible with this.
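The shape constraints implied by the first milestone can be checked with a few lines of code. This is a sketch under the assumption that the layout is exactly the one proposed above, with a single parameter per property; check_multidimensional is a hypothetical name.

```python
from typing import Any, Dict


def check_multidimensional(
    properties: Dict[str, Any],
    parameters: Dict[str, list],
    n_entries: int,
) -> None:
    """Validate the proposed layout, raising ValueError on the first problem.

    n_entries is the number of structures or environments.
    """
    for name, prop in properties.items():
        names = prop.get("parameters", [])
        if len(names) != 1:
            raise ValueError(f"'{name}': only one parameter is supported for now")
        if names[0] not in parameters:
            raise ValueError(f"'{name}': unknown parameter '{names[0]}'")
        p = len(parameters[names[0]])  # expected length of each row
        values = prop["values"]
        if len(values) != n_entries:
            raise ValueError(
                f"'{name}': expected {n_entries} rows, got {len(values)}"
            )
        for i, row in enumerate(values):
            if len(row) != p:
                raise ValueError(
                    f"'{name}': row {i} has {len(row)} values, expected {p}"
                )
```

Keeping the parameters in a separate table means the check only has to look up the length once per property, and several properties sharing "parameter 1" are validated against the same array.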
There should be an example (maybe in https://github.com/cosmo-epfl/kernel-tutorials/) on how to go from structure to chemiscope, using SOAP and PCA (maybe KPCovR).
It is possible to set meta.name to <script>alert('got you')</script>, or any other arbitrary HTML, which is unfortunate. We should add HTML sanitization when checking the datasets before trying to display them, to remove potential visual breakage or attacks.
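For plain-text fields such as the dataset name, escaping the HTML special characters is enough. The sketch below uses Python's standard html.escape to show the intended result; sanitize_text is a hypothetical name, and in the viewer itself the equivalent would be done on the TypeScript side (for example by assigning to textContent rather than innerHTML).

```python
import html


def sanitize_text(value: str) -> str:
    """Escape HTML special characters so the text is displayed, not interpreted."""
    return html.escape(value, quote=True)


print(sanitize_text("<script>alert('got you')</script>"))
# prints: &lt;script&gt;alert(&#x27;got you&#x27;)&lt;/script&gt;
```

Fields that are rendered as markdown, such as the dataset description, cannot simply be escaped this way and would need an allow-list based sanitizer applied to the generated HTML instead.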
Enable chemiscope to remember the last few items (or pinned items) so that we can enable comparison between structures within the projection.
1st round: click-enabled list of structure/environment IDs to revisit
nth round: movable click-enabled static PNGs of previously visited structures/environments (see proxy)
It would be nice to be able to hide/grey out points in the map by selecting a given symbol/clicking on it on the legend.
This could be implemented in such a way to be able to also use it for #4.
Adding description and units for all properties in the dataset would be good.
Places where we can display them:
name (unit) on the axis of the map (just name if unit is undefined or empty)
description when selecting properties in the map setting? Not sure if this is possible
Do you see other metadata we would want to attach to the properties?
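The axis title rule above is simple enough to state as code. axis_title is a hypothetical helper, shown in Python only to pin down the behavior for missing or empty units.

```python
from typing import Optional


def axis_title(name: str, units: Optional[str] = None) -> str:
    """Return "name (units)", or just "name" when units is None or empty."""
    if units is None or units.strip() == "":
        return name
    return f"{name} ({units})"
```

Treating whitespace-only units as empty avoids titles such as "energy ( )" when the input file contains a blank string.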