
maml's Introduction

maml


maml (MAterials Machine Learning) is a Python package that aims to provide useful high-level interfaces that make ML for materials science as easy as possible.

The goal of maml is not to duplicate functionality already available in other packages. maml relies on well-established packages such as scikit-learn and tensorflow for implementations of ML algorithms, as well as other materials science packages such as pymatgen and matminer for crystal/molecule manipulation and feature generation.

Official documentation at https://materialsvirtuallab.github.io/maml/

Features

  1. Convert materials (crystals and molecules) into features. In addition to common compositional, site and structural features, we provide the following fine-grained local environment features (a usage sketch follows this list).

a) Bispectrum coefficients
b) Behler-Parrinello symmetry functions
c) Smooth Overlap of Atomic Positions (SOAP)
d) Graph network features (composition, site and structure)

  2. Use ML to learn the relationship between features and targets. Currently, maml supports scikit-learn and Keras models.

  3. Applications:

a) pes for modelling the potential energy surface and constructing surrogate models for property prediction, including:

i) Neural Network Potential (NNP)
ii) Gaussian Approximation Potential (GAP) with SOAP features
iii) Spectral Neighbor Analysis Potential (SNAP)
iv) Moment Tensor Potential (MTP)

b) rfxas for random forest models that predict atomic local environments from X-ray absorption spectroscopy.

c) bowsr for rapid structural relaxation with Bayesian optimization and a surrogate energy model.
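
As a minimal sketch of the featurization workflow (adapted from the issue reports further down this page; the maml.describers import path and the parameter values here are assumptions, not canonical settings):

from pymatgen.core import Structure
from maml.describers import BispectrumCoefficients  # import path assumed

# Per-element cutoff scale 'r' and weight 'w' used by the bispectrum computation
element_profile = {"Ni": {"r": 0.5, "w": 1.0}}

describer = BispectrumCoefficients(
    rcutfac=5.0,              # global cutoff scaling factor (placeholder value)
    twojmax=6,                # band limit of the bispectrum expansion
    element_profile=element_profile,
    quadratic=False,
    pot_fit=True,             # format the output for potential fitting
    include_stress=False,
    n_jobs=1,
)

structures = [Structure.from_file("Ni_conventional.cif")]  # any pymatgen structures
features = describer.transform(structures)                 # DataFrame of features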

Installation

Pip install via PyPI:

pip install maml

To run the potential energy surface (pes) applications, a LAMMPS installation is required. You can install LAMMPS from source or from conda:

conda install -c conda-forge/label/cf202003 lammps

The SNAP potential comes with this LAMMPS installation. The GAP package is needed to run GAP potentials, and the MLIP package to run MTP potentials. To fit an NNP potential, the n2p2 package is needed.

Install all the libraries from the requirements.txt file:

pip install -r requirements.txt

For all the requirements above:

pip install -r requirements-ci.txt
pip install -r requirements-optional.txt
pip install -r requirements-dl.txt
pip install -r requirements.txt

Usage

Many Jupyter notebooks on usage are available in notebooks. We also have a tool and tutorial lecture at nanoHUB. A condensed training example is sketched below.
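
The snippet below condenses the SNAP-style training flow that appears in the issue reports later on this page: featurize structures, pool them with their energies and forces, and fit a weighted linear model. The maml.describers and maml.utils import paths are assumptions, and train_structures, train_energies and train_forces are placeholders for your own DFT data:

import numpy as np
from sklearn.linear_model import LinearRegression
from maml.describers import BispectrumCoefficients  # import path assumed
from maml.utils import pool_from, convert_docs      # import path assumed

element_profile = {"Al": {"r": 0.5, "w": 1.0}}
describer = BispectrumCoefficients(rcutfac=5.0, twojmax=6,
                                   element_profile=element_profile,
                                   quadratic=False, pot_fit=True)

# train_structures: list of pymatgen Structures; train_energies: total energies
# in eV; train_forces: (n_atoms, 3) arrays in eV/Angstrom (from your DFT runs)
features = describer.transform(train_structures)
train_pool = pool_from(train_structures, train_energies, train_forces)
_, df = convert_docs(train_pool)

y = df["y_orig"] / df["n"]                # per-atom targets
weights = np.ones(len(df["dtype"]))
weights[df["dtype"] == "energy"] = 10000  # energy/force weighting as in the nanoHUB tutorial
model = LinearRegression()
model.fit(features, y, sample_weight=weights)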

API documentation

See API docs.

Citing

@misc{maml,
    author = {Chen, Chi and Zuo, Yunxing and Ye, Weike and Ji, Qi and Ong, Shyue Ping},
    title = {{Maml - materials machine learning package}},
    year = {2020},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/materialsvirtuallab/maml}},
}

For the ML-IAP package (maml.pes), please cite:

Zuo, Y.; Chen, C.; Li, X.; Deng, Z.; Chen, Y.; Behler, J.; Csányi, G.; Shapeev, A. V.; Thompson, A. P.;
Wood, M. A.; Ong, S. P. Performance and Cost Assessment of Machine Learning Interatomic Potentials.
J. Phys. Chem. A 2020, 124 (4), 731–745. https://doi.org/10.1021/acs.jpca.9b08723.

For the BOWSR package (maml.bowsr), please cite:

Zuo, Y.; Qin, M.; Chen, C.; Ye, W.; Li, X.; Luo, J.; Ong, S. P. Accelerating Materials Discovery with Bayesian
Optimization and Graph Deep Learning. Materials Today 2021, 51, 126–135.
https://doi.org/10.1016/j.mattod.2021.08.012.

For the AtomSets model (maml.models.AtomSets), please cite:

Chen, C.; Ong, S. P. AtomSets as a hierarchical transfer learning framework for small and large materials
datasets. Npj Comput. Mater. 2021, 7, 173. https://doi.org/10.1038/s41524-021-00639-w

maml's People

Contributors

chc273, code-mraj, comprhys, dependabot-preview[bot], dependabot[bot], janosh, jiqi535, mausam1112, ml-evs, pre-commit-ci[bot], sgbaird, shivamjindal1, shyuep, tinaatucsd, w6ye, yunxingzuo


maml's Issues

SNAP model failing with supercells larger than 12 angstroms

Dear Developers,

I ran into a problem while training a test snap model.
For a supercell smaller than 12 angstroms, everything is fine, but for a larger supercell the correlation between the predicted forces and the forces in the training set is zero.

The structures for the training set were obtained using VASP. I attach two samples in JSON format, obtained with an identical VASP script; the only difference is that one cell was 11.9 angstroms and the other was 12.1 angstroms. I also attach text files with the original and predicted forces to visualize the difference.

What can be the reason for this effect?

My Python script is the following:

element_profile = {Al: {'r': 0.5, 'w': 1.0}}
per_force_describer = BispectrumCoefficients(rcutfac=rcutfac, twojmax=6,
                                             element_profile=element_profile,
                                             quadratic=False,
                                             pot_fit=True,
                                             include_stress=False,
                                             n_jobs=n_threads, verbose=False)
elem_features = per_force_describer.transform(train_structures)

train_pool = pool_from(train_structures, train_energies, train_forces)
_, elem_df = convert_docs(train_pool)

y = elem_df['y_orig'] / elem_df['n']
x = elem_features

weights = np.ones(len(elem_df['dtype']))
weights[elem_df['dtype'] == 'energy'] = en_weight
weights[elem_df['dtype'] == 'force'] = 1

weighted_model = LinearRegression()
weighted_model.fit(x, y, sample_weight=weights)

energy_indices = np.argwhere(np.array(elem_df["dtype"]) == "energy").ravel()
forces_indices = np.argwhere(np.array(elem_df["dtype"]) == "force").ravel()
weighted_predict_y = weighted_model.predict(x)
original_energy = y[energy_indices]
original_forces = y[forces_indices]
weighted_predict_energy = weighted_predict_y[energy_indices]
weighted_predict_forces = weighted_predict_y[forces_indices]

file_fl = open('forces_linear.txt', "w")
file_el = open('energies_linear.txt', "w")
file_fl.write("orig_force, predict_force\n")
for index in forces_indices:
    file_fl.write(str(y[index]) + " " + str(weighted_predict_y[index]) + "\n")
file_el.write("orig_en, predict_en\n")
for index in energy_indices:
    file_el.write(str(y[index]) + " " + str(weighted_predict_y[index]) + "\n")
file_fl.close()
file_el.close()

RMSE = mean_squared_error(original_forces, weighted_predict_forces)
print("Parameters = " + str([en_weight, r1, w1, rcutfac]) + " /// RMSE = " + str(RMSE))
return RMSE

JSONS.zip

An error in the source code

The parameter 'model_fname' of SNAPotential.model.save(model_fname=filename) does not seem to match sklearn's API; an error is raised when it is called. Should it be 'filename' instead?

NaN values in NNP training

Hello!
Recently, I encountered a problem with NaN values in the first column of the weights.XXX.data file during NNP training.

The following are my NNP training parameters:

nnp.train(train_structures=train_structures,
          train_energies=train_energies,
          train_forces=train_forces,
          cutoff_type=1,
          r_etas=[0.5, 2.0],
          a_etas=[0.5, 2.0],
          r_shift=[0.0],
          zetas=[1.0, 4.0],
          r_cut=4.2,
          hidden_layers=[4, 4],
          epochs=5)

Very strangely, running the example (https://github.com/materialsvirtuallab/maml/blob/master/notebooks/pes/nnp/example.ipynb) on my computer works fine!

matminer wrapper has no tests

New versions of pylint failed because there is a bad call in the matminer wrapper that says

super(new_class, self).__init__(**base_kwargs)

when new_class is not even defined. Why are there no unit tests for all of these?

lattice not in supported kwargs ['lmp_exe']

from maml.apps.pes._snap import SNAPotential
from pymatgen.core import Structure, Element
from maml.apps.pes._lammps import EnergyForceStress, ElasticConstant, DefectFormation

snap = SNAPotential.from_config(coeff_file='SNAPotential.snapcoeff', param_file='SNAPotential.snapparam')

Ni_conventional_cell = Structure.from_file('Ni_conventional.cif')
efs_calculator = EnergyForceStress(ff_settings=snap)
energy, forces, stresses = efs_calculator.calculate([Ni_conventional_cell])[0]
print('The predicted energy of Ni conventional cell is {} eV'.format(energy))
print('The predicted forces of Ni conventional cell is \n {} eV/Angstrom'.format(forces))

elastic_calculator = ElasticConstant(ff_settings=snap, lattice='fcc', alat=3.508)
C11, C12, C44, bulkmodulus = elastic_calculator.calculate()
print('The predicted C11, C12, C44, bulkmodulus are {}, {}, {}, {} GPa'.format(C11, C12, C44, bulkmodulus))

defect_calculator = DefectFormation(ff_settings=snap, specie='Ni', lattice='fcc', alat=3.508)
defect_formation_energy = defect_calculator.calculate()
print('The predicted defect formation energy is {} eV'.format(defect_formation_energy))

When I run this code, it has this error.

Traceback (most recent call last):
File "/home/sdb/zzhen/2021/materialsvirtuallab/maml-2021.10.14/mvl_models/pes/Ni/snap/Ni_snap.py", line 13, in
elastic_calculator = ElasticConstant(ff_settings=snap, lattice='fcc', alat=3.508)
File "/home/sdb/zzhen/2021/materialsvirtuallab/maml-2021.10.14/maml/apps/pes/_lammps.py", line 417, in init
super().init(**kwargs)
File "/home/sdb/zzhen/2021/materialsvirtuallab/maml-2021.10.14/maml/apps/pes/_lammps.py", line 87, in init
raise TypeError("%s not in supported kwargs %s" % (str(i), str(self.allowed_kwargs)))
TypeError: lattice not in supported kwargs ['lmp_exe']

Fortran runtime error

Hi, while running the pes/gap example, I encountered the following error:

INFO:maml.utils._lammps:Structure index 0 is rotated.
INFO:maml.utils._lammps:Structure index 1 is rotated.
INFO:maml.utils._lammps:Structure index 2 is rotated.
INFO:maml.utils._lammps:Structure index 3 is rotated.
INFO:maml.utils._lammps:Structure index 4 is rotated.
INFO:maml.utils._lammps:Structure index 5 is rotated.
INFO:maml.utils._lammps:Structure index 6 is rotated.
INFO:maml.utils._lammps:Structure index 7 is rotated.
INFO:maml.utils._lammps:Structure index 8 is rotated.
INFO:maml.utils._lammps:Structure index 9 is rotated.
Fortran runtime error: Incorrect extent in VALUE argument to DATE_AND_TIME intrinsic: is -2, should be >=8

Error termination. Backtrace:
#0 0x5599f40f562a in ???
#1 0x5599f3cf11ea in ???
#2 0x5599f3cf0cae in ???
#3 0x7fd4f1abc0b2 in ???
#4 0x5599f3cf0ced in ???
#5 0xffffffffffffffff in ???
Traceback (most recent call last):
File "/home/xinglong/anaconda3/envs/ml/lib/python3.8/site-packages/maml/apps/pes/_gap.py", line 343, in train
error_line = [i for i, m in enumerate(msg) if m.startswith("ERROR")][0]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "example.py", line 17, in
gap.train(train_structures=train_structures, train_energies=train_energies,
File "/home/xinglong/anaconda3/envs/ml/lib/python3.8/site-packages/maml/apps/pes/_gap.py", line 346, in train
error_msg += msg[-1]
IndexError: list index out of range

From the limited answers online, this seems to suggest that the gfortran used to compile the program (QUIP/gap_fit) is different from the one used during the run. However, the gfortran I have on the machine is the same one used for both compiling and running. The machine architecture is linux_x86_64, the gfortran version is GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, and the gcc version is gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0.

This may be due to the gap_fit program of QUIP; would you be able to advise on possible solutions?

Thank you for your kind attention.

Flatten the structure of maml.

I think we should move towards a flatter organizational structure, similar to sklearn.

Basically, all implementations should be in separate files, but preceded by _. E.g.,

pes
- _snap
- _mtp
- ...
- __init__.py

The __init__.py will then import the relevant things. See how sklearn implements things, e.g., scikit-learn/ensemble. I like this implementation because imports are a lot simpler, and we still retain the good organization of separate files and full flexibility to move things around if needed. A sketch follows.
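
A minimal sketch of the proposed pattern (SNAPotential, NNPotential and the _snap/_nnp modules appear elsewhere on this page; GAPotential and MTPotential are illustrative names):

# maml/apps/pes/__init__.py
# Re-export the public classes from the private implementation modules, so
# users can write `from maml.apps.pes import SNAPotential` instead of
# importing from maml.apps.pes._snap directly.
from ._snap import SNAPotential
from ._nnp import NNPotential
from ._gap import GAPotential  # illustrative name
from ._mtp import MTPotential  # illustrative name

__all__ = ["SNAPotential", "NNPotential", "GAPotential", "MTPotential"]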

Predicting for structures in parallel

Is there a way to generate descriptors for the validation-set data in parallel to predict the energies, forces and stresses? I am currently using EnergyForceStress, but have to loop over each structure individually to make these predictions.

ImportError: cannot import name 'export_saved_model' from 'tensorflow.python.keras.saving.saved_model'

Hi,

I was trying to run one of the notebooks and received this error while importing:

ImportError: cannot import name 'export_saved_model' from 'tensorflow.python.keras.saving.saved_model' (/home/vishank-hp/miniconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/__init__.py)
For installation, I tried both of these methods:

  1. Forking the repository and running python setup.py develop
  2. pip install maml

Could you please let me know if I have to install some dependencies manually, or whether I am missing a step?

Thanks

Unittests are fragile

@chc273 @JiQi535 @w6ye The unittests as they are written are very fragile. Look at the recent runs. When using ISIS, the results fluctuate and the tests fail randomly. The L-BFGS optimization also sometimes goes out of bounds. While I understand that some of these algorithms are numerical in nature, tests have to be written on model systems and data where you can be very certain of the outcome. Otherwise, they are not proper tests. Fix this asap.

mpi for NNP

I wonder if the MPI parallelization of n2p2 carries over and can be used in the maml package.

Describer - get citations

I do not understand the design of this method.
Why does it return a list of str?
That is a silly format.
Either make it a simple string, or return a well-structured object like a pybtex entry.
Alternatively, there is no need for the citation to be returned by a method; it can just be in the documentation.

MTP training problem

Hi,
I'm trying to train MTP models with the previous example data and notebook from the mlearn package (the current maml does not seem to include that notebook), but the training process fails, with the configuration file (.mtp file) containing multiple '-nan' values:

"""
MTP
version = 1.1.0
potential_name = MTP1m
scaling = 1.438492177533894e-04
species_count = 1
potential_tag =
radial_basis_type = RBChebyshev
min_dist = 4.000000000000000e+00
max_dist = 4.800000000000000e+00
radial_basis_size = 8
radial_funcs_count = 2
radial_coeffs
0-0
{-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan}
{-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan}
alpha_moments_count = 18
alpha_index_basic_count = 11
alpha_index_basic = {{0, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}, {0, 2, 0, 0}, {0, 1, 1, 0}, {0, 1, 0, 1}, {0, 0, 2, 0}, {0, 0, 1, 1}, {0, 0, 0, 2}, {1, 0, 0, 0}}
alpha_index_times_count = 14
alpha_index_times = {{0, 0, 1, 11}, {1, 1, 1, 12}, {2, 2, 1, 12}, {3, 3, 1, 12}, {4, 4, 1, 13}, {5, 5, 2, 13}, {6, 6, 2, 13}, {7, 7, 1, 13}, {8, 8, 2, 13}, {9, 9, 1, 13}, {0, 10, 1, 14}, {0, 11, 1, 15}, {0, 12, 1, 16}, {0, 15, 1, 17}}
alpha_scalar_moments = 9
alpha_moment_mapping = {0, 10, 11, 12, 13, 14, 15, 16, 17}
species_coeffs = {-nan}
moment_coeffs = {-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan}
"""

Also, the training output is weird:

"""
WARNING:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! WARNING WARNING WARNING !!!
!!! Read a configuration with (negative) Stress. !!!
!!! This feature will be removed soon! !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

BFGS iterations count set to 500
BFGS convergence tolerance set to 1e-08
Energy weight: 1
Force weight: 0.01
Stress weight: 0
MTPR parallel training started
BFGS iter 0: f=-nan
BFGS iter 1: f=-nan
BFGS iter 2: f=-nan
BFGS iter 3: f=-nan
BFGS iter 4: f=-nan
BFGS iter 5: f=-nan
......
BFGS iter 499: f=-nan
step limit reached
MTPR training ended
Rescaling...
scaling = 0.000119874348127824, condition number = -nan
scaling = 0.000130772016139445, condition number = -nan
scaling = 0.000143849217753389, condition number = -nan
scaling = 0.000158234139528728, condition number = -nan
scaling = 0.000172619061304067, condition number = -nan
Rescaling to 0.000143849217753389... done

	* * * TRAIN ERRORS * * *

Errors report
Energy:
Errors checked for 10 configurations
Maximal absolute difference = nan
Average absolute difference = nan
RMS absolute difference = nan

Energy per atom:
Errors checked for 10 configurations
Maximal absolute difference = nan
Average absolute difference = nan
RMS absolute difference = nan

Forces:
Errors checked for 540 atoms
Maximal absolute difference = -nan
Average absolute difference = -nan
RMS absolute difference = -nan
Max(ForceDiff) / Max(Force) = -nan
RMS(ForceDiff) / RMS(Force) = -nan

Stresses (in eV):
Errors checked for 10 configurations
Maximal absolute difference = -nan
Average absolute difference = -nan
RMS absolute difference = -nan
Max(StresDiff) / Max(Stres) = -nan
RMS(StresDiff) / RMS(Stres) = -nan

Virial stresses (in GPa):
Errors checked for 10 configurations
Maximal absolute difference = -nan
Average absolute difference = -nan
RMS absolute difference = -nan
Max(StresDiff) / Max(Stres) = -nan
RMS(StresDiff) / RMS(Stres) = -nan


"""

It seems the problem is caused by the '-nan' values in the .mtp file.

Thanks a lot and have a nice day!

Converting VASP output for BispectrumCoefficients.

I have VASP OUTCAR/CONTCAR/XDATCAR output files of my required structures, obtained from molecular dynamics. Can you help me convert them to a format that can be input to the transform() function of BispectrumCoefficients?
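
One way to do this, sketched under the assumption that the MD run also produced a vasprun.xml (pymatgen's Vasprun parser is used here rather than OUTCAR/XDATCAR directly):

from pymatgen.io.vasp.outputs import Vasprun

vr = Vasprun("vasprun.xml")  # path to your MD run output

train_structures, train_energies, train_forces = [], [], []
for step in vr.ionic_steps:
    train_structures.append(step["structure"])  # pymatgen Structure
    train_energies.append(step["e_wo_entrp"])   # total energy in eV
    train_forces.append(step["forces"])         # (n_atoms, 3) in eV/Angstrom

# These lists can then be passed to BispectrumCoefficients.transform(train_structures)
# or to pool_from(train_structures, train_energies, train_forces).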

Refactor soap to gap

@YunxingZuo
Please change the model name and all related modules to Gaussian approximation potential (GAP) instead of using its feature name smooth overlap of atomic positions (SOAP).

NNP train issues (nnp-scaling and nnp-train)

Hi!
I recently found that the train() function may have some issues in the NNP model:

I personally changed lines 618 and 621 in _nnp.py (also nnp.py in the previous mlearn package):

  • p_scaling = subprocess.Popen(['nnp-scaling', input_filename]) --> p_scaling = subprocess.Popen(['nnp-scaling', '{}'.format(bin_num)])

  • p_train = subprocess.Popen(['nnp-train', input_filename], --> p_train = subprocess.Popen(['nnp-train'],

And here're some of my questions:

a. Are my changes correct? (I made them according to my own understanding of the n2p2 website.)

b. The role of the bin number in the scaling process seems unclear (even on the original website);

c. Is it possible to capture the error messages and show them? The 'gsl histogram' error message can be seen when executing Python in a terminal, but it is missing in ipynb or in program output;

d. It seems that nnp-scaling and nnp-train support parallel computing, such as using 'mpirun -np <n>', which may decrease the training time a lot.

Thank you very much!
Best regards,

About virial_stress data

Why does the function convert_docs() in /maml/utils/_data_conversion.py not handle virial_stress data?
And how can I train with virial_stress data?
Thank you!

element profile/hyperparameter optimization

Dear Developers,

I am trying to optimize the element profile for a multicomponent system.
I am a complete beginner in Python and am doing this manually with nested 'for' loops.
I am afraid it will take 15 years to finish (200x200x200 searches).
I see that the authors previously did this for several multicomponent systems.

Could you suggest a more efficient and faster way to do it?

####################

# loop
rcut_grid = []
for rc_1 in np.arange(4, 6, 0.01):
    for rc_2 in np.arange(4, 6, 0.01):
        for rc_3 in np.arange(4, 6, 0.01):
            element_profile = {'Ti': {'r': rc_1, 'w': Ti}, 'Si': {'r': rc_2, 'w': Si},
                               'C': {'r': rc_3, 'w': C}}
            describer = BispectrumCoefficients(rcutfac=0.5, twojmax=6,
                                               element_profile=element_profile, quadratic=False,
                                               pot_fit=True, include_stress=False, n_jobs=4)
            tsc_features = describer.transform(tsc_train_structures)
            y = tsc_df['y_orig'] / tsc_df['n']
            x = tsc_features
            simple_model = LinearRegression(n_jobs=4)
            simple_model.fit(x, y, sample_weight=weights)
            energy_indices = np.argwhere(np.array(tsc_df["dtype"]) == "energy").ravel()
            forces_indices = np.argwhere(np.array(tsc_df["dtype"]) == "force").ravel()
            simple_predict_y = simple_model.predict(x)
            original_energy = y[energy_indices]
            original_forces = y[forces_indices]
            simple_predict_energy = simple_predict_y[energy_indices]
            simple_predict_forces = simple_predict_y[forces_indices]
            e_e = mean_absolute_error(original_energy, simple_predict_energy) * 10000
            e_f = mean_absolute_error(original_forces, simple_predict_forces)

            rcut_grid.append((rc_1, rc_2, rc_3, e_e, e_f))
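
A derivative-free optimizer can replace the exhaustive grid here. Below is a sketch using scipy.optimize.differential_evolution over the three cutoffs; fit_and_score is a hypothetical wrapper around the describer-plus-regression code above (it should return the force MAE for a given set of cutoffs):

from scipy.optimize import differential_evolution

def objective(cutoffs):
    rc_1, rc_2, rc_3 = cutoffs
    # fit_and_score is a hypothetical helper: build the element_profile,
    # featurize, fit the linear model and return e_f (the force MAE)
    return fit_and_score(rc_1, rc_2, rc_3)

result = differential_evolution(objective, bounds=[(4.0, 6.0)] * 3,
                                maxiter=50, seed=42)
print(result.x, result.fun)  # best (rc_1, rc_2, rc_3) and its force MAE

This typically needs hundreds of objective evaluations instead of the 200x200x200 grid.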

positive pe/atom, multi-element system: SNAP

Dear Developers,

I am training SNAP for a ternary system containing Ti, Si, and C.

  1. Using a distorted training data set (with or without NVT data) for this system, I see that the computed potential energy per atom is huge (order of 10^6) and negative for Ti and C, but positive for Si!
  2. However, the total energy of the perfect system is consistent with DFT.
  3. The obtained SNAP can reproduce many properties calculated by DFT, such as the lattice parameter, elastic constants, and stacking fault energy.
  4. To get a reasonable pe/atom, I added elemental (bulk) data for each component. Now the pe/atom is of order 10^1, but I am still getting a positive pe for Si.

I guess maml is somehow matching the total energy of my system with non-physical per-atom energies by making two more negative and one positive.
I have tried many different combinations of training data, looking for a pe/atom that is physically meaningful (negative).
Is there any way to constrain the SNAP coefficients during training that will at least ensure a negative pe/atom?
Or how can I resolve this issue for multi-component systems?

I will be waiting to hear from you.

Best regards,
Rana

Memory handling

I am trying to use the bispectrum-coefficients-based SNAP potential for my training with ~7500 structures, but I am ending up with a memory issue:
"Some of your processes may have been killed by the cgroup out-of-memory handler"

I am using parallel descriptor construction with the n_jobs tag.

Any advice on what I might be doing wrong?

Incompatible structure found

Can someone please explain to me what the _sanity_check function (line 99) in _lammps.py is doing?

I am getting "Incompatible structure found" while trying to train some of the structures.

Error when running NNP.train( )

Hi!

I keep getting this error message when trying to run NNP.train(). The input structures are a list of pymatgen structures, with corresponding lists of energies and forces (n_atoms, 3).

Error message:
--> 685 self.train_forces_rmse = errors[0]
686 self.validation_forces_rmse = errors[1]

IndexError: list index out of range

Thanks in advance!

Installing GAP and NNP dependencies

Hi,

I was trying to run the PES example with GAP fitting and would like to know how I can interface the maml code with GAP. I have already installed quippy with GAP, but I do not know how to point maml at the correct file/location to look for GAP capabilities.

Thanks

Query regarding parameters

Hi, I'm fairly new to using the SNAP potential in maml. I'd like to know the meaning of w and r in element_profile. Also, is rcutfac in BispectrumCoefficients() the same as the r_c parameter discussed in the seminal SNAP papers?

Database format

A database is a good idea, but I think we should try to use something widely supported. We could even support a few options. Any recommendations? The obvious ones are HDF5, JSON and MySQL. MongoDB is probably too heavy-duty, though it could be an option since the translation to JSON is easy.

Data availability for Ensemble-Learned Spectra IdEntification (ELSIE) algorithm

Hello,

Recently I developed a property-prediction and spectra-matching deep learning algorithm using the site-averaged K-edge XANES spectrum database from the Materials Project. I found that site-wise data might improve my model, but unfortunately I couldn't download site-wise K-edge spectra with MPRester. So I have had to download site-wise spectra from the legacy website by clicking the download button, and downloading all XANES spectra this way is impractical. L-edge data can be downloaded from the paper website (via the figshare link). Is there any way to download site-wise K-edge XANES spectra?

I want to compare my models with the excellent results of your group.

  1. Random Forest Models for Accurate Identification of Coordination Environments from X-Ray Absorption Near-Edge Structure
  2. Automated generation and ensemble-learned matching of X-ray absorption spectra

But without the database, the comparison might be wrong because my model is not trained on the same database you used.

Thank you!

BOWSR implementation in material discovery problems

Hi all!
Congrats on the great work!
I am considering implementing BOWSR in a material discovery pipeline, but I am unsure about the actual inputs it requires.
From Section 2.2 (Properties Prediction) in the paper, it seems that the underlying idea of the algorithm is to skip the computationally expensive DFT structural relaxation with the elemental substitution "trick", which basically translates to a smart way to get the unrelaxed structure of a compound whose crystallography is unknown. ((1) Is this right?)
I am referring to the lines

"For each crystal in the dataset (e.g., rock salt GeTe), another crystal with the
same prototype but a different composition (e.g., rock salt NaCl) was selected
at random and multi-element substitutions (Na→Ge, Cl→Te) were performed
to arrive at an “unrelaxed” structure."

If this is the case, the algorithm would be able to get a reasonably relaxed structure for any given input formula (and only a formula), but in the example notebooks provided, the algorithm is only used as a structure relaxer, skipping the very relevant structure-guessing step.

(2) Is my question well posed, or did I miss something?
(3) If so, how can I see the algorithm at work assigning an unrelaxed structure to a formula?

Thank you!

import typo

The second line in garnet_formation_energy.ipynb should be:

from pymatgen.core import Structure 

rather than

from pymatgen import Structure

Assigning weights for each group in SNAP.train

Hello,

I'm trying to recreate the results in this paper using the data given in the mlearn repo. The tutorial on nanoHUB assigns 10000 and 1 as the weights for energy and force, respectively. The supplementary material of the paper gives a list of optimized hyperparameters for each group of data (e.g. the energy weight of the elastic group, the force weight of the elastic group, etc.) for each element. On using convert_docs after pooling the structures, energies and forces, the resulting dataframe does not specify the group. How can I assign the optimized weights corresponding to each specific group?

Switch to GitHub Actions

Instead of using CircleCI, we will be moving to GitHub Actions henceforth for testing and linting.

  1. Pls make sure all linting passes, including pylint.
  2. Pls add a lammps executable for Ubuntu so that we can test the PES.

We will disable the CircleCI once all tests pass.

How to create the database

Hi there, big fan of the program,

I am struggling to figure out how to create a .json database from which to train the model. Most of the example Jupyter notebooks start by loading data stored in .json format, e.g. using "loadfn('./data/Mo/AIMD_NVT.json')"

I have tried simply using the to_json method from pymatgen.io.vasp.Vasprun, but when I try using loadfn() it keeps throwing errors like
"__init__() got an unexpected keyword argument 'vasp_version'"

Can you please take me through how to create such a data file from VASP output?

Cheers
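
A plain-dict dump avoids the Vasprun round-trip problem described above. Here is a sketch using monty (the exact schema the notebooks expect is an assumption, inferred from the training inputs used elsewhere on this page):

from monty.serialization import dumpfn, loadfn
from pymatgen.io.vasp.outputs import Vasprun

vr = Vasprun("vasprun.xml")

docs = []
for step in vr.ionic_steps:
    docs.append({
        "structure": step["structure"].as_dict(),  # serializable dict, not a Vasprun
        "outputs": {
            "energy": step["e_wo_entrp"],           # eV
            "forces": step["forces"],               # (n_atoms, 3), eV/Angstrom
        },
    })

dumpfn(docs, "my_training_data.json")
data = loadfn("my_training_data.json")  # round-trips cleanly, unlike Vasprun.to_json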

Port jupyter notebooks from mlearn

I want to ensure that all functionality originally in mlearn is completely ported over, including the Jupyter notebooks. Also, we should deprecate mlearn completely and point people to maml.

@YunxingZuo

Missing 'diagonal' command in SpectralNeighborAnalysis() function

Hi, all!
I was running the SNAP model recently and found a missing command in the SpectralNeighborAnalysis() function in the current version:
When running the SNAP notebook example, training SNAP (the code in the 3rd block) gives an error:
"ValueError: Shape of passed values is (144, 64), indices imply (144, 30)"
which means the bispectrum values calculated by LAMMPS differ from the intended output.

I found that in the previous mlearn package, where SNAP works fine, the compute argument in 'calcs.py' (line 584) is:

compute_args += ' diagonal {} rmin0 {} quadraticflag {}'.format(self.diagonalstyle, self.rmin0, qflag)

Note that the default diagonalstyle was previously 3.

In the current maml, 'diagonalstyle' is removed and the compute argument in '_lammps.py' (line 342) is:

compute_args += " rmin0 0 quadraticflag {}".format(int(self.quadratic))

As I tested, the default 'diagonal' value in the compute command is 0, which gives 64 values (diagonal 1 gives 22 and diagonal 2 gives 7). Changing the argument to " diagonal 3 rmin0 0 quadraticflag {}" seems to solve the problem, or one could simply add back the 'diagonalstyle' parameter.
I'm not sure if I got this one right?

Thanks a lot and have a nice day!

Best regards,

ModuleNotFoundError: No module named 'bowsr'

Hi,

I had some problems running the cgcnn_example.ipynb and megnet_example.ipynb notebooks.

I got similar error messages for both notebooks:

ModuleNotFoundError Traceback (most recent call last)
/var/folders/0c/rcgf90kd6z9c7yc0p_tpffl81l1jxs/T/ipykernel_90244/3049958724.py in
----> 1 from bowsr.model.cgcnn import CGCNN
2 from bowsr.optimizer import BayesianOptimizer
3 from pymatgen.core.periodic_table import get_el_sp
4 model = CGCNN()
5

ModuleNotFoundError: No module named 'bowsr'


ModuleNotFoundError Traceback (most recent call last)
/var/folders/0c/rcgf90kd6z9c7yc0p_tpffl81l1jxs/T/ipykernel_90245/1238206353.py in
2 os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
3 import tensorflow as tf
----> 4 from bowsr.model.megnet import MEGNet
5 from bowsr.optimizer import BayesianOptimizer
6 from pymatgen.core.periodic_table import get_el_sp

ModuleNotFoundError: No module named 'bowsr'

I followed the instructions to install all the libraries. Please let me know if you can help.

LAMMPS pair_style compatibility for NNP (HDNNP)

Hi, all!
It seems LAMMPS has updated its pair_style command for HDNNP. Running maml with NNP gives errors on pair_style:
"""
pair_style hdnnp cutoff keyword value ...
pair_coeff * * elements
"""
which is different from previous:
"""
pair_style nnp keyword value ...
pair_coeff * * elements cutoff
"""
https://docs.lammps.org/pair_hdnnp.html

I'm not sure whether LAMMPS has packages other than ML-HDNNP implementing NNP.
If ML-HDNNP is the LAMMPS package for NNP, I guess _nnp.py may need to be updated as well?
For pair_style and pair_coeff variables:
"""
pair_style = (
'pair_style hdnnp {} dir "./" showew no showewsum 0 '
"maxew 10000000 resetew yes cflength 1.8897261328 cfenergy 0.0367493254"
)
pair_coeff = "pair_coeff * * {}"
""" (from line 37)

For write_param():
"""
ff_settings = [self.pair_style.format(self.param.get("r_cut") + 1e-2), self.pair_coeff.format(" ".join(self.elements))]
""" (from line 704)
I'm not sure if I missed anything in this part?

Thanks a lot and have a nice day!

Best,

Tensorflow import error - ModuleNotFoundError: No module named 'absl'

After pip install maml on Windows through VS Code in a fresh conda env in Python 3.8:

>>> import tensorflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\sterg\AppData\Roaming\Python\Python38\site-packages\tensorflow-2.5.0rc1-py3.8-win-amd64.egg\tensorflow\__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util  
  File "C:\Users\sterg\AppData\Roaming\Python\Python38\site-packages\tensorflow-2.5.0rc1-py3.8-win-amd64.egg\tensorflow\python\__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "C:\Users\sterg\AppData\Roaming\Python\Python38\site-packages\tensorflow-2.5.0rc1-py3.8-win-amd64.egg\tensorflow\python\eager\context.py", line 28, in <module>
    from absl import logging
ModuleNotFoundError: No module named 'absl'

A quick search reveals the suggestion:

pip install absl-py

Then I got a ModuleNotFoundError for gast, then astunparse, and decided to stop there.

In the end, I resolved it by following the instructions from my PR #371. I think I needed to explicitly pip install tensorflow.

BOWSR MWE based on chemical formula

Hi @chc273,

Again, nice work on BOWSR! I've been recommending BOWSR to a few people, but I'm realizing the implementation described in the Materials Today paper is not immediately obvious to me. IIRC, this involves swapping out the "correct" atoms for a similar chemical formula template (e.g. we have the CIF file for Al2O3, we want a crystal structure for V2O3, so we "swap" Al atoms with V atoms) and then running the optimizer.

Assuming my understanding is correct, would you mind sharing an MWE for the use case described in the paper (i.e. creating a relaxed structure using only a chemical formula)?

Sterling
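
For reference, the imports used by the BOWSR example notebooks appear in the ModuleNotFoundError issue earlier on this page; a heavily hedged sketch of the substitution-then-relax flow follows (the BayesianOptimizer method names and signatures are assumptions, and the prototype file is a placeholder):

from pymatgen.core import Structure
from bowsr.model.megnet import MEGNet          # imports as in the example notebooks
from bowsr.optimizer import BayesianOptimizer

# Step 1: the substitution "trick" - start from a known prototype (e.g. rock
# salt NaCl) and swap in the target species (Na->Ge, Cl->Te) to obtain an
# "unrelaxed" template for the target formula.
template = Structure.from_file("NaCl.cif")     # placeholder prototype file
template.replace_species({"Na": "Ge", "Cl": "Te"})

# Step 2: relax the template with BOWSR (method names below are assumed)
model = MEGNet()
optimizer = BayesianOptimizer(model=model, structure=template)
optimizer.set_bounds()
optimizer.optimize(n_init=100, n_iter=100)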

Getting the snapcoeff and snapparam files.

Dear all, how can I get the snapcoeff and snapparam files from within maml to be used with LAMMPS?

Also, if I want to calculate properties using a developed SNAP model for a multi-element system, how should I proceed? Suppose I want to get elastic constants for NbMoTaW with a potential object named "NMTW"; in that case, how do I initialize the ElasticConstant() calculator? I want the constants for bulk NbMoTaW, not the individual elements.

Potential file from NNP to Lammps

Hi!

I wonder if there is functionality in maml to extract an NNP potential file (in the LAMMPS format) after training within the maml framework, in order to run further LAMMPS simulations.

Thanks in advance!
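
For reference, maml's NNP class does appear to expose a write_param() method (it is referenced in the pair_style issue above, where it assembles the ff_settings for LAMMPS). A hedged sketch of how it might be used after training; the training kwargs follow the NNP issue earlier on this page, and the output filenames are an assumption based on n2p2 conventions:

from maml.apps.pes import NNPotential

nnp = NNPotential()
nnp.train(train_structures=train_structures,  # your pymatgen structures
          train_energies=train_energies,
          train_forces=train_forces,
          r_cut=4.2,
          hidden_layers=[4, 4],
          epochs=100)

# write_param() is defined in maml/apps/pes/_nnp.py; it should emit the n2p2
# parameter files (e.g. input.nn, scaling.data, weights.XXX.data) that the
# LAMMPS hdnnp/nnp pair_style reads.
nnp.write_param()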

BispectrumCoefficients error

Hello,

I am getting "TypeError: expected str, bytes or os.PathLike object, not NoneType" when I try to compute the BispectrumCoefficients in the feature_analysis notebook provided on nanoHUB.

(Screenshots of the error are attached: 239, 241, 244.)

How can I fix this error?

general describers can be rewritten

There are two describers in general.py, namely MultiDescriber and FuncGenerator.

The MultiDescriber works similarly to an sklearn Pipeline. I think we can redo this using the sklearn Pipeline to get a more robust version.

The FuncGenerator relies on the eval function to deserialize a function. It requires the function to be defined somewhere in the script or in the general.py module, which is not very robust. It would be nice to rewrite it to use the utilities in maml.utils to deserialize functions.

cannot import name 'NNPotential' from 'maml.apps.pes'

Hello!
I've got a problem at the initial stage while importing potentials:

from maml.apps.pes import NNPotential

ImportError: cannot import name 'NNPotential' from 'maml.apps.pes' (//anaconda3/lib/python3.7/site-packages/maml/apps/pes/__init__.py)

The same thing happens with the other potentials.

Thanks in advance!
