cedergrouphub / smol
Statistical Mechanics on Lattices
Home Page: https://cedergrouphub.github.io/smol/
License: Other
Reduce Ewald terms into pair interactions and return the corresponding pairs in a form similar to the one mentioned in:
#34
Is your feature request related to a problem? Please describe.
There are some examples which require modification to the way spin flips are currently implemented:
It has been suggested by Gerd to include a feature where a user can specify allowed perturbations, since perturbations other than the two examples above may also need to be enforced.
Describe the solution you'd like
Ideally, the user specifies a table of allowed perturbations and the probability with which each perturbation can occur. In example 1, a user could specify: {'Li+': {'sublattice_1': 0.5, 'sublattice_2': 0.5}}
indicating that a Li+ can flip into sublattice 1 or sublattice 2 with equal probability. The code will then enforce that such flips are picked with equal probability.
Note that detailed balance should be obeyed at all times, which could be left either to the user or to the code to enforce.
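A minimal sketch of how such a table could be validated and sampled from, using only the standard library. The names ALLOWED_FLIPS, validate_table and propose_target are hypothetical and not part of smol; the table layout follows the example above.

```python
import random

# Hypothetical table: species -> {target sublattice: proposal probability}.
ALLOWED_FLIPS = {"Li+": {"sublattice_1": 0.5, "sublattice_2": 0.5}}


def validate_table(table, tol=1e-12):
    """Check that the proposal probabilities of every species sum to 1."""
    for species, targets in table.items():
        total = sum(targets.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"probabilities for {species} sum to {total}")


def propose_target(species, table, rng=random):
    """Pick a target sublattice for `species` with the tabulated probabilities."""
    targets = table[species]
    names = list(targets)
    weights = [targets[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


validate_table(ALLOWED_FLIPS)
rng = random.Random(0)  # seeded so the sketch is reproducible
counts = {"sublattice_1": 0, "sublattice_2": 0}
for _ in range(10_000):
    counts[propose_target("Li+", ALLOWED_FLIPS, rng)] += 1
```

If the proposal probabilities are not symmetric between a move and its reverse, the acceptance ratio would need the usual Metropolis-Hastings correction for detailed balance to hold; the sketch only covers the proposal step.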
Describe alternatives you've considered
@tchen0965 has code which can address the first issue of Li+ spin flips between different sublattices. She may want to submit a PR soon.
Is your feature request related to a problem? Please describe.
Not related to a problem, but I find myself creating too many ensembles that I rarely use directly.
Except for obtaining an initial occupancy, they are not really used for anything else.
Describe the solution you'd like
Perhaps having a convenience constructor in the sampler to create directly from a Cluster Expansion like so,
Sampler.from_cluster_expansion(expansion, supercell_matrix, ...all other goodies needed)
Describe alternatives you've considered
Doing so would likely limit how much tweaking a user could do for each of the classes involved (processor, ensemble, kernel, etc) but for those cases one can just fall back to the lengthier more verbose construction of each class individually.
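A rough sketch of what such a convenience constructor could look like. Processor, Ensemble and Sampler here are stand-in dataclasses, not the smol classes, and the temperature argument is an assumption standing in for "all other goodies needed".

```python
from dataclasses import dataclass


@dataclass
class Processor:  # stand-in, not the smol processor
    expansion: object
    supercell_matrix: tuple


@dataclass
class Ensemble:  # stand-in, not a smol ensemble
    processor: Processor
    temperature: float


@dataclass
class Sampler:  # stand-in, not the smol sampler
    ensemble: Ensemble

    @classmethod
    def from_cluster_expansion(cls, expansion, supercell_matrix, temperature):
        """Build the intermediate objects internally and return a sampler."""
        processor = Processor(expansion, supercell_matrix)
        ensemble = Ensemble(processor, temperature)
        return cls(ensemble)


sampler = Sampler.from_cluster_expansion(
    "my_expansion", ((2, 0, 0), (0, 2, 0), (0, 0, 2)), temperature=500.0
)
```

The intermediate objects stay reachable through the sampler (sampler.ensemble.processor), so a user who needs to tweak them can still do so or fall back to building each class by hand.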
File Name | Module Path | Authors |
---|---|---|
ewald.py | smol.extern | William Davidson Richard, Luis Barroso-Luque |
We use code review issues to open up a pinned location to discuss a specific
source file.
Implementation of the EwaldTerm class.
Provides the functionality to fit an Ewald term as an additional feature in a
cluster expansion, as proposed in the references below, to improve convergence of expansions
of ionic materials.
Chapter 4.6 of W. D. Richards' thesis.
W. D. Richards, et al., Energy Environ. Sci., 2016, 9, 3272–3278
Point to specific lines in code
If you are mentioning a set of specific lines in code already implemented, it
may come in handy to add a
permanent link to it.
Format suggested code nicely
def roast(beans, *flavoring_oils):
    for oil in flavoring_oils:
        beans.add_oil(oil)
    return np.trapz(beans.get_heat_vector())
Use other handy features too
Using the many other handy features on offer can help improve discussion.
https://docs.github.com/en
I think a pruning test could be a good idea. It could be something simple: before and after pruning with a small tolerance (1e-6 or even 0), the energy difference should be below some reasonable number like 10 meV. Maybe also add a check that n_bit_orderings is equal to the number of ECI plus the number of external terms.
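A sketch of the shape such a test could take, written against a toy linear model rather than the smol classes. The coefficients and feature values are made up for illustration, and prune and predict are hypothetical helpers.

```python
def predict(features, coefs):
    """Energy of a linear model: dot product of features and coefficients."""
    return sum(f * c for f, c in zip(features, coefs))


def prune(coefs, tol):
    """Return the indices of the coefficients kept after pruning, |c| > tol."""
    return [i for i, c in enumerate(coefs) if abs(c) > tol]


coefs = [-1.20, 0.35, 0.0, 2e-7, -0.08]   # eV, made-up values
features = [1.0, 0.5, -0.25, 0.75, -1.0]  # made-up correlation vector

kept = prune(coefs, tol=1e-6)
pruned_coefs = [coefs[i] for i in kept]
pruned_features = [features[i] for i in kept]

energy_before = predict(features, coefs)
energy_after = predict(pruned_features, pruned_coefs)

# Pruning near-zero terms should change the energy by far less than 10 meV.
assert abs(energy_before - energy_after) < 0.010
# The feature count and the coefficient count must stay consistent.
assert len(pruned_features) == len(pruned_coefs)
```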
Hey, just a small idea for user friendliness:
How about we add a method that visualizes clusters, so that users know what a cluster looks like on a projected structure plot? I think this might be very helpful when a user wants to analyze short-range ordering. They could just type something like some_orbit.visualize() and get an image right away, without having to match the indices of each atom in a cluster and connect them manually.
When creating a ClusterSubspace with ClusterSubspace.from_cutoffs, all the clusters of a given size below the provided cutoff should be obtained.
The newest update, which improved the performance of orbit/cluster generation, has too tight a search window, such that in some cases it may miss clusters that are within the given cutoff. For example, for a system I am currently using, setting the cutoff for clusters of size 4 to 3 misses a quadruplet of diameter 2.97. Setting the cutoff slightly higher, to 3.1, correctly finds it.
To get the correct behavior we need the optimal search window: one where no clusters within the cutoff are missed and few, ideally no, unnecessary candidates are generated and then discarded.
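One way to guard against this in a regression test is to compare against a brute-force reference. The sketch below assumes the cluster diameter is the largest pairwise distance between its sites and works on a finite list of Cartesian points, so periodic images would have to be expanded beforehand. clusters_within_cutoff is a hypothetical helper, not a smol function.

```python
from itertools import combinations
from math import dist, sqrt


def clusters_within_cutoff(points, size, cutoff):
    """Return all index tuples of `size` points whose diameter is <= cutoff."""
    found = []
    for combo in combinations(range(len(points)), size):
        diameter = max(
            dist(points[i], points[j]) for i, j in combinations(combo, 2)
        )
        if diameter <= cutoff:
            found.append(combo)
    return found


# A regular tetrahedron with edge length 2.97 plus one far-away point.
s = 2.97 / (2 * sqrt(2))
points = [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s), (10.0, 10.0, 10.0)]

quadruplets = clusters_within_cutoff(points, size=4, cutoff=3.0)
```

The brute-force search scales combinatorially, so it is only meant as a reference for small test cells, not as a replacement for the optimized generation.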
File Name | Module Path | Authors |
---|---|---|
wrangler.py | smol | Luis Barroso-Luque |
We use code review issues to open up a pinned location to discuss a specific
source file.
Implementation of the StructureWrangler class and functions to obtain fit weights.
Includes functions used to preprocess and check (wrangle) fitting data of
structures and properties.
(1)
Path: CederGroupHub/smol/examples/using-monte-carlo.ipynb
Modification needed for cell 1:
import json
import numpy as np
import matplotlib.pyplot as plt
from pymatgen.io.cif import CifParser
from pymatgen.core.structure import Structure
from pymatgen.transformations.standard_transformations import OrderDisorderedStructureTransformation
from smol.cofe import ClusterExpansion
from smol.cofe.configspace import EwaldTerm
from smol.moca import CExpansionProcessor, CanonicalEnsemble, MuSemiGrandEnsemble, FuSemiGrandEnsemble, BaseSemiGrandEnsemble
(2) In the section "Now Create a Processor"
you can change processor = ClusterExpansionProcessor(ce, sc_matrix)
to
processor = CExpansionProcessor(ce, sc_matrix)
(3) in the section "Look at the ensemble current state properties"
you can change print(f'The current step energy is {censemble.energy} eV')
to
print(f'The current step energy is {censemble.current_energy} eV')
I'd like to include the Ewald energy from pymatgen's Ewald summation algorithm and was thinking I could add it when generating the ClusterExpansion object as follows:
`
fitting_data = [(Structure.from_dict(x['s']), x['toten']) for x in json.load(fin)]
EwaldTerms = []
for sData in fitting_data:
    EwaldTerms.append(EwaldSummation(sData[0]))
ce = ClusterExpansion.from_radii(structure=prim,
                                 data=fitting_data,
                                 external_terms=[EwaldTerms],
                                 ...)
`
However, the input to external_terms throws an error. Any leads or suggestions on how to add it properly?
Thank you in advance!
Is your feature request related to a problem? Please describe.
For MC runs, the iteration at which production sampling starts can be set by the user with something like ensemble.production_start = X. It would be helpful to have a class method that can auto-detect a value for this.
Describe the solution you'd like
A helper method in either the BaseEnsemble or the CanonicalEnsemble class, so that other ensembles simply inherit this. Since there are a few methods to do this, we can start with the one we think is best or allow users to choose how this is done.
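As one possible starting point, here is a sketch of a simple block-averaging heuristic: the trace is treated as equilibrated from the first block whose mean falls within a few standard deviations of the second-half mean. This is only one of the several possible methods mentioned above; detect_production_start and its parameters are hypothetical, and the energy trace is synthetic.

```python
import math
from statistics import mean, pstdev


def detect_production_start(trace, block=50, n_sigma=2.0):
    """Return the start index of the first block that looks equilibrated.

    A block counts as equilibrated when its mean lies within `n_sigma`
    standard deviations of the mean of the second half of the trace.
    """
    reference = trace[len(trace) // 2:]
    ref_mean, ref_std = mean(reference), pstdev(reference)
    for start in range(0, len(trace) - block + 1, block):
        block_mean = mean(trace[start:start + block])
        if abs(block_mean - ref_mean) <= n_sigma * ref_std:
            return start
    # Fall back to discarding the first half if no block qualifies.
    return len(trace) // 2


# Synthetic energy trace: exponential relaxation plus a small oscillation.
trace = [-10.0 + 5.0 * math.exp(-i / 100) + 0.05 * math.sin(i) for i in range(2000)]
production_start = detect_production_start(trace)
```

The heuristic assumes the second half of the run is already equilibrated, which the user would still need to confirm for their own system.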
File Name | Module Path | Authors |
---|---|---|
cluster.py | smol.configspace | Luis Barroso-Luque, William Davidson Richard |
We use code review issues to open up a pinned location to discuss a specific
source file.
Implementation of the Cluster class.
Represents a group of sites of a given lattice. These are the building blocks
for a cluster basis of functions over configurational space.
This functionality will be very helpful for the ground state solver implementation. For a 1D chain lattice, with each site's site domain = ['A','B','C'] and 2 sites in each sublattice,
consider a supercell containing 3 adjacent sites. Then nbits = [[0,1],[0,1],[0,1]], and the corresponding site Boolean indices are:
[[0,1],[2,3],[4,5]]. If we consider only NN interactions, then all possible 'bit cluster' terms (including periodic boundary conditions) should be: [[0,2],[0,3],[1,2],[1,3],[2,4],[2,5],[3,4],[3,5],[0,4],[0,5],[1,4],[1,5]].
If we can make a list like this, it will be a lot easier to write down the ground state energy Boolean function.
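For the 3-site chain above, the list of bit pairs can be generated as in the sketch below. The variable names are illustrative only; the nearest-neighbour pairs are built for a periodic 1D chain and nothing here uses the smol API.

```python
from itertools import product

# Boolean variable indices for each of the 3 sites, as in the example above.
site_bits = [[0, 1], [2, 3], [4, 5]]
n_sites = len(site_bits)

# Nearest-neighbour site pairs on a periodic chain; the set removes duplicates.
site_pairs = sorted({tuple(sorted((i, (i + 1) % n_sites))) for i in range(n_sites)})

# Every combination of one bit from each site of a neighbouring pair.
bit_pairs = [
    [a, b]
    for i, j in site_pairs
    for a, b in product(site_bits[i], site_bits[j])
]
```

This produces the same 12 pairs listed above, in a different order.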
Is your feature request related to a problem? Please describe.
In analysis of MC sampling we will eventually want access to many observables, for example correlation vectors, the occupation of (certain) clusters, etc.
Describe the solution you'd like
In theory all these things could be calculated from the sampled occupancy strings and/or properties. But if they are already being calculated in the run might as well save them. Here are some thoughts:
The portions of the Ewald energy (total, real, reciprocal, and point) should match when computed using the "expanded" matrices created with the EwaldTerm or directly with an EwaldSummation for an ordered structure.
For the case of charged cells all should match except the charged cell correction since the "expanded" Ewald structure has the charge of the disordered structure and not the particular ordered structure.
The computed Ewald interaction for charged cells using the "expanded" Ewald matrices created by the EwaldTerm does not match the corresponding Ewald energies computed directly for a structure using pymatgen.EwaldSummation when the structure is charged.
The real and reciprocal energies do not match, and as a result the total energy (even excluding the charged cell correction) does not match. The point energies do match.
In MC we should decide if it is necessary to add charge corrections when going off charge, because that is currently not being considered.
Everything works well for charge neutral structures.
For the real space energy for example:
EwaldSummation(ordered structure).real_space_energy
EwaldTerm(use_term='real').value_from_occupancy(occu for ordered structure, disordered structure)
File Name | Module Path | Authors |
---|---|---|
orbit.py | smol.configspace | Luis Barroso-Luque, William Davidson Richard |
We use code review issues to open up a pinned location to discuss a specific source file.
Implementation of the Orbit class.
A set of symmetrically equivalent (with respect to the given random structure symmetry) clusters.
The error message is:
raise AttributeError('The provided ClusterExpansion must have only'
AttributeError: The provided ClusterExpansion must have onlyone external term being an EwaldTerm
This arises both with ce.prune() and without pruning. CEProcessor(ce, scs_matrix) works fine even though ce here also contains the EwaldTerm.
Make sure that a code review issue for the source file you are about to create
one does not already exist. Please edit the above fields based on the file you
are opening this issue for.
File Name | Module Path | Authors |
---|---|---|
expansion.py | smol | Luis Barroso-Luque |
We use code review issues to open up a pinned location to discuss a specific
source file.
This module implements the ClusterExpansion class.
A ClusterExpansion holds the necessary attributes to represent a CE and predict
the property for new structures.
File Name | Module Path | Authors |
---|---|---|
canonical.py | smol.moca.ensembles | Luis Barroso-Luque |
We use code review issues to open up a pinned location to discuss a specific
source file.
Implementation of a CanonicalEnsemble class.
Used when running Monte Carlo simulations for fixed number of sites and fixed
concentration of species.
File Name | Module Path | Authors |
---|---|---|
basis.py | smol.configspace | Luis Barroso-Luque |
We use code review issues to open up a pinned location to discuss a specific
source file.
Implementations of SiteBasis classes used to define different basis sets for site function spaces.
The product of single site functions make up a cluster/orbit function used to
obtain correlation vectors. Site function spaces include the basis functions
and measure that defines the inner product for a single site. Most commonly a
uniform measure, but this can be changed to use "concentration" biased bases.
Many of the unit tests are run on a synthetic binary system. It would be much more robust to add a somewhat more complex system, so that things like multiple orbit labelings and different site spaces are also checked in tests.
If you remove some bit_combos with zero ECIs, save with dumpfn and then reload with loadfn, the number of correlation functions won't match before and after the reload.
The reason might be that the changed orbit.bit_combos are never saved in as_dict, and bit_combos are always re-initialized when an orbit object is re-initialized. So for each saved orbit, the previously removed bit_combos (correlation functions) are added back during re-initialization.
I would not really call this a bug, because people can still redo the trimming process whenever they want to use a cluster expansion with the zero-ECI correlation functions trimmed out. But it still causes some inconvenience, and personally I would feel safer and more comfortable if I were able to save and reload the trimmed cluster expansion directly.
File Name | Module Path | Authors |
---|---|---|
processor.py | smol.moca | Luis Barroso-Luque |
We use code review issues to open up a pinned location to discuss a specific
source file.
Implementation of processor classes for a fixed size super cell.
A processor is optimized to compute correlation vectors and local changes in
correlation vectors. This class allows the use of a cluster expansion Hamiltonian
to run Monte Carlo based simulations.
Two classes are implemented:
CEProcessor - for flips in CE Hamiltonians only
EwaldCEProcessor - for flips in CE Hamiltonians with Ewald electrostatics
Is your feature request related to a problem? Please describe.
In #154 Trace classes were introduced to generalize sampling values during MC. These classes only allow np.arrays as attributes. It seems this is a good opportunity to use numpy structured arrays instead, since they essentially provide all the functionality needed and will also benefit from the better memory layout.
Describe the solution you'd like
Use numpy structured arrays instead of Trace/StepTrace classes
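To illustrate, here is a sketch of how a trace could be laid out as a numpy structured array. The field names (occupancy, energy, accepted) and sizes are assumptions made for the example and are not taken from the existing Trace classes.

```python
import numpy as np

n_sites, n_samples = 4, 3

# One record per sampled step; each field plays the role of a Trace attribute.
trace_dtype = np.dtype([
    ("occupancy", np.int32, (n_sites,)),
    ("energy", np.float64),
    ("accepted", np.bool_),
])

trace = np.zeros(n_samples, dtype=trace_dtype)

# A whole record can be written at once ...
trace[0] = ([0, 1, 0, 1], -1.25, True)
# ... or a single field can be written across many records.
trace["energy"][1:] = [-1.5, -1.75]

mean_energy = trace["energy"].mean()
```

Each field is accessed by name as a regular array view, so per-field analysis works the same way it would with separate arrays, while the records stay contiguous in memory.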
Is your feature request related to a problem? Please describe.
Functionality for bit combo hierarchy (or function decorated clusters) has been added in #106. It would be nice to allow to get orbit hierarchy as well.
Describe the solution you'd like
The ClusterSubspace class should have a method similar to bit_combo_hierarchy that gives a hierarchy of only the undecorated Orbit objects. It will also be helpful if a user can quickly obtain all the sub-orbits (or finer-grained sub-orbits and bit_combos) for a given orbit (and bit combo).
Describe alternatives you've considered
I would suggest adding functionality to the Orbit class to determine if a given orbit is a sub-orbit of it. This can also be fine-grained to clean up the bit combo hierarchy code, by enabling orbits to check directly if a given orbit and bit combo is a subset of one of its bit combos.
In the old smol.cofe.config_space.utils.get_site_domains, a site domain is generated as a list of strings. For more general purposes (for example, if we want to treat a molecular fragment as a 'specie', or encode more properties such as magnetization direction into a specie), we may need to return a list of GeneralizedSpecie objects rather than just strings.
Other functions that take a site domain as an input should also be modified accordingly.
def get_site_domains(structure, include_measure=False):
    """Get site domains for sites in a disordered structure.

    Helper method to obtain the single site domains for the sites in a
    structure. The single site domains are represented by the allowed species
    for each site (with an optional measure/concentration for disordered sites)
    Vacancies are included in sites where the site element composition does not
    sum to 1 (i.e. the total occupation is not 1)

    Args:
        structure (Structure):
            Structure to determine site domains from at least some sites should
            be disordered, otherwise there is no point in using this.
        include_measure (bool): (optional)
            To include the site element compositions as the domain measure.

    Returns:
        list: Of allowed species for each site if include_measure is False
        Ordereddict: Of allowed species and their measure for each site if
            include_measure is True
    """
Recent changes to pymatgen Structure break its use in ClusterSubspace to obtain sites in a sphere.
The new implementation returns a PeriodicNeighbor object and no longer a list of tuples.
The new single return value is causing an unpacking error.
At some point, just update the unpacking to match the new pymatgen signature.
smol.cofe.space.domain.Vacancy cannot be copied correctly.
We have a deepcopy method in the cn-sgmc branch, but it has not been merged into the master branch.
A copied Vacancy() should be exactly the same as the original.
>>> from smol.cofe.space.domain import Vacancy
>>> Vacancy().as_dict()
{'@module': 'smol.cofe.space.domain', '@class': 'Vacancy', 'element': 'A', 'oxidation_state': 0}
>>> Vacancy().copy().as_dict()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fengyu_xie/anaconda3/envs/py37/lib/python3.7/site-packages/pymatgen/core/periodic_table.py", line 1475, in __getattr__
    raise AttributeError(a)
AttributeError: copy
>>> from copy import deepcopy
>>> deepcopy(Vacancy()).as_dict()
{'@module': 'pymatgen.core.periodic_table', '@class': 'DummySpecies', 'element': 'A', 'oxidation_state': 0}
Add copy and deepcopy methods to the Vacancy class.
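A sketch of what the added methods could look like. DummySpecies and Vacancy below are simplified stand-ins written for this example, not the pymatgen and smol classes, so the constructor arguments are assumptions.

```python
from copy import copy, deepcopy


class DummySpecies:  # simplified stand-in for the parent class
    def __init__(self, symbol="X", oxidation_state=0):
        self.symbol = symbol
        self.oxidation_state = oxidation_state


class Vacancy(DummySpecies):  # simplified stand-in for the smol class
    def __init__(self, symbol="A", oxidation_state=0):
        super().__init__(symbol, oxidation_state)

    def __eq__(self, other):
        return (
            isinstance(other, Vacancy)
            and self.symbol == other.symbol
            and self.oxidation_state == other.oxidation_state
        )

    def __hash__(self):
        return hash((self.symbol, self.oxidation_state))

    def copy(self):
        """Return a new Vacancy with the same attributes."""
        return Vacancy(self.symbol, self.oxidation_state)

    def __copy__(self):
        return self.copy()

    def __deepcopy__(self, memo):
        new = self.copy()
        memo[id(self)] = new  # register the copy so shared references resolve
        return new


vacancy = Vacancy()
shallow, deep = copy(vacancy), deepcopy(vacancy)
```

Defining __copy__ and __deepcopy__ explicitly keeps the copy module from reconstructing the object through the parent class, which is one way the copy could end up as a DummySpecies.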
File Name | Module Path | Authors |
---|---|---|
sgcanonical.py | smol.moca.ensembles | Luis Barroso-Luque |
We use code review issues to open up a pinned location to discuss a specific
source file.
Implementation of Semi-Grand Canonical Ensemble Classes.
These are used to run Monte Carlo sampling for fixed number of sites but
variable concentration of species.
Two different SGC ensemble classes are implemented:
MuSemiGrandEnsemble - for which relative chemical potentials are fixed
FuSemiGrandEnsemble - for which relative fugacity fractions are fixed
This is just a half chewed idea...
Would it make sense to use ComputedStructureEntry for items added to StructureWranglers? As a first benefit it punts part of the saving procedures to pymatgen. But additionally it could make the code cleaner and allow an easier path to other analyses that can be done with pymatgen entries.
struct1 = ensemble.processor.structure_from_occupancy(init_occu)
occu2 = ensemble.processor.occupancy_from_structure(struct1)
print (np.where(init_occu != occu2)) # should return (array([], dtype=int64),)
Instead, many occupancies are different. init_occu is not being reproduced with occu2:
print (np.where(init_occu != occu2)) # returns (array([ 2, 14, 25, ..., 3447, 3450, 3453]),)
I am using a large cell with 3456 occupancies. There are 1210 occupancies in occu2 which are different from init_occu.
Is it related to the "#noqa" in the occupancy_from_structure() line? My supercell is big (864 supercell) so maybe that has something to do with this bug?
I can provide all my mson, MC data files if helpful. The steps I am taking to get this bug are as follows:
ensemble = CanonicalEnsemble.from_cluster_expansion(expansion, sc_matrix)
init_occu = mc_data[T]['occupancies'][0]
structure1 = ensemble.processor.structure_from_occupancy(init_occu)
occu2 = ensemble.processor.occupancy_from_structure(structure1)
init_occu and occu2 are different.
I would like to analyze occupancies for order parameter calculations during an MC simulation. One example is the Mn-16d occupancy, which tells how spinel-like an MC structure is.
I am using the table-swap method, but I don't think using this algorithm should affect the regeneration of the occupancies from structure, and vice versa. I am mystified because I haven't encountered this bug before...
Orthonormalizing the site basis sets in a clusterspace is not recorded when saving the objects as mson dicts, so when reloaded the basis sets are not orthonormal.
The state of a cluster subspace should be saved completely to faithfully recreate the exact same object from a dictionary.
Since the SiteBasis class is not MSONable, the exact state of the basis sets is not completely saved when saving a ClusterSubspace.
Either make the SiteBasis class MSONable and save its full state, or create boolean flags as attributes to keep track of changes, specifically orthonormalization. As a quick and dirty fix users can simply "remember" if they were using an orthonormal set and then set the basis accordingly (this should NOT be a long term solution and the issue should be fixed in the source code asap):
subspace = ClusterSubspace.from_dict(d)
subspace.change_site_bases('same basis as before', orthonormal=True)
ClusterSubspace with orthonormal=True:
from smol.cofe import ClusterSubspace
subspace = ClusterSubspace.from_cutoffs(prim, {2: 6, 3: 4}, basis='sinusoid', orthonormal=True)
subspace1 = ClusterSubspace.from_dict(subspace.as_dict())
assert subspace.basis_orthonormal # this is good
assert subspace1.basis_orthonormal # this is not good!
Make the SiteBasis class MSONable such that it saves the exact values of all the basis arrays and therefore faithfully saves any changes done to it. (Better solution imo)
Add boolean attributes to record changes in state, i.e. orbit._basis_orthonormal, and save them in the dict, such that when the objects are recreated the basis sets are properly normalized.
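A sketch of the second option (a boolean flag that is written to and read from the dict). SiteBasisSketch is a stand-in class written for this example; its attribute and key names are assumptions and the actual orthonormalization is omitted.

```python
import json


class SiteBasisSketch:  # stand-in, not the smol SiteBasis
    def __init__(self, flavor, orthonormal=False):
        self.flavor = flavor
        self.is_orthonormal = False
        if orthonormal:
            self.orthonormalize()

    def orthonormalize(self):
        # The real class would transform its function array here.
        self.is_orthonormal = True

    def as_dict(self):
        """Serialize the basis, including whether it was orthonormalized."""
        return {"flavor": self.flavor, "orthonormal": self.is_orthonormal}

    @classmethod
    def from_dict(cls, d):
        # Dicts saved before the flag existed default to not orthonormal.
        return cls(d["flavor"], orthonormal=d.get("orthonormal", False))


basis = SiteBasisSketch("sinusoid", orthonormal=True)
restored = SiteBasisSketch.from_dict(json.loads(json.dumps(basis.as_dict())))
```

Saving the full basis arrays instead of a flag would be more faithful, since a flag only records that a change happened and relies on the change being reproducible on reload.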
Is your feature request related to a problem? Please describe.
After fitting and creating a ClusterExpansion object there is currently no saving of the provenance and quality of the fit. Things like the regression model that was used, the hyper-parameters, and model fit metrics are important information that users may want (and may not remember) at some point in the future.
Describe the solution you'd like
It would be nice to save this information in the ClusterExpansion object, such that for most practical cases a user who gets a CE object can easily get the information used in the fit. The main glaring issue here is that this is somewhat heterogeneous information and it is not straightforward to create a fixed schema for it.
A list of things to save:
Describe alternatives you've considered
Currently, a CE has a metadata dictionary as an attribute that is meant to save this information. But the user is fully responsible for saving whatever they deem important, and most often this is basically nothing.
One solution would be to implement a "FitData" dataclass that can be optionally passed to the CE constructor, and if not passed a warning can be raised letting the user know they might regret not keeping this information.
The _encode function should return an integer for the sinusoid function to work, but currently it is written as:
def _encode(self, specie):
    """Encode species to another set (i.e. species names to integers)."""
    return specie
This is not safe with respect to input types. If your input species are strings, the basis functions will break. I have edited the _encode function in my forked version to solve this:
return self.species.index(specie)
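To show why the integer encoding matters, here is a small sketch with a stand-in class. The trigonometric function below is only an illustrative placeholder, not the exact sinusoid basis definition used in smol, and the species labels are made up.

```python
import math


class SinusoidBasisSketch:  # stand-in, not the smol basis class
    def __init__(self, species):
        self.species = list(species)

    def _encode(self, specie):
        """Map a species label to its integer index in the site space."""
        return self.species.index(specie)

    def function(self, specie):
        """Illustrative trigonometric site function; needs a numeric input."""
        sigma = self._encode(specie)
        return -math.cos(2 * math.pi * sigma / len(self.species))


basis = SinusoidBasisSketch(["Li+", "Mn3+", "Vacancy"])
value = basis.function("Mn3+")
```

With the original _encode, which returns the species unchanged, the string "Mn3+" would reach the arithmetic and raise a TypeError.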
There is a pretty big difference in scaling when the Ewald term (EwaldCEProcessor) is used in canonical MC compared to when it's not used (CEProcessor). For example, a 12x12x10 supercell of my system (containing ~6000 sites) appears to take ~2.5 seconds per attempted flip when the Ewald term is included but only ~0.003 seconds per attempted flip without the Ewald term. It seems that the update to the Ewald term is quite expensive per flip once the Ewald matrix gets very large. However, I'm not yet sure which part of updating the Ewald term is the limiting step.
I'm not entirely sure what a solution to this would look like. Potentially, using only the reciprocal space part of the Ewald sum would help. There are also likely a number of instances in the literature of people trying to optimize electrostatic/Ewald sum calculations that we can turn to.
File Name | Module Path | Authors |
---|---|---|
base.py | smol.moca.ensembles | Luis Barroso-Luque |
We use code review issues to open up a pinned location to discuss a specific
source file.
Abstract base class for Monte Carlo Ensembles.
The method ClusterSubspace.bit_combo_hierarchy(self, min_size=2, invert=False) should give a list of lists, where each sublist gives the ids of all correlation functions that are subfunctions (or factors) of the one at the corresponding index of the outer list,
i.e. bit_combo_hierarchy()[l] = [i, j, k], where i, j, k are sub correlation functions of correlation function l.
Currently the method can miss some correlation functions in systems with only one sublattice that has n > 2 species (it will give weak inclusion constraints). @zhongpc and I looked into this and it seems to be because some fractional coordinates are not correctly matched.
For systems with more than one sublattice with n > 2 (such as two different ternary sublattices), the method can add additional incorrect hierarchy entries, since it does not distinguish which bit indices correspond to each sublattice.
I have implemented a new method, ClusterSubspace.function_hierarchy(self, level=1, min_size=2, invert=False), that I think addresses both issues above, but I have not tested it yet.
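To make the expected output format concrete, here is a toy sketch that ignores symmetry entirely. Each correlation function is represented as a set of (site, bit) pairs, which is an assumption made for illustration and not how smol stores them; keeping the site in each pair is what stops bit indices from different sublattices being confused.

```python
def function_hierarchy_sketch(functions, min_size=2):
    """For each function, list the ids of functions that are proper subsets of it."""
    hierarchy = []
    for func in functions:
        subs = [
            idx
            for idx, other in enumerate(functions)
            if len(other) >= min_size and other < func  # proper subset test
        ]
        hierarchy.append(subs)
    return hierarchy


# Toy functions: frozensets of (site index, bit index) pairs.
functions = [
    frozenset({(0, 0), (1, 0)}),          # id 0: pair
    frozenset({(0, 0), (1, 1)}),          # id 1: pair with a different bit
    frozenset({(0, 0), (1, 0), (2, 0)}),  # id 2: triplet containing pair 0
    frozenset({(0, 0), (1, 1), (2, 0)}),  # id 3: triplet containing pair 1
]

hierarchy = function_hierarchy_sketch(functions)
```

In a real cluster subspace the subset test would have to be done up to symmetry and lattice translations, which is where the coordinate matching mentioned above comes in.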
The simulated annealing method in the ensemble classes is not tested. There is a TODO in the code, but going to go ahead and just create an issue for it.
smol/tests/test_moca/test_canonical.py
Line 111 in 9e4625d
It's not super straightforward what the best type of unit test for this is, but something is better than nothing.
When a CompositeProcessor is recreated from the output of as_dict(), the coefs property is not recreated. Not sure whether it's a bug or not, but it may be helpful to make it clearer how to recreate the CompositeProcessor. I believe this should be easily fixable though.
When recreated with CompositeProcessor.from_dict(), the new CompositeProcessor object should have a coefs property, as I believe this is necessary for it to be used in an ensemble.
The CompositeProcessor is instantiated with an empty array.
When created from a dictionary, CompositeProcessor can set its coefficients based on its constituent Processor objects.
This isn't a super necessary change, as it's easy enough to add the coefs once it's apparent that this is the issue, but it's slightly clearer if the coefficients are automatically set based on the coefficients of the CompositeProcessor's individual processors.
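A sketch of the suggested behavior using stand-in classes. ProcessorSketch and CompositeSketch are not the smol classes, and building coefs by concatenating the constituents' coefficients is an assumption about the intended layout.

```python
class ProcessorSketch:  # stand-in for an individual processor
    def __init__(self, coefs):
        self.coefs = list(coefs)

    def as_dict(self):
        return {"coefs": self.coefs}

    @classmethod
    def from_dict(cls, d):
        return cls(d["coefs"])


class CompositeSketch:  # stand-in for a composite of processors
    def __init__(self, processors=()):
        self.processors = list(processors)

    @property
    def coefs(self):
        """Derive the coefficients from the constituent processors."""
        return [c for proc in self.processors for c in proc.coefs]

    def as_dict(self):
        return {"processors": [proc.as_dict() for proc in self.processors]}

    @classmethod
    def from_dict(cls, d):
        return cls(ProcessorSketch.from_dict(pd) for pd in d["processors"])


composite = CompositeSketch([ProcessorSketch([-1.0, 0.2]), ProcessorSketch([0.5])])
restored = CompositeSketch.from_dict(composite.as_dict())
```

Because coefs is derived rather than stored, it cannot go missing or fall out of sync after a round trip through a dictionary.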
As a follow-up to today's discussion, I have a few feature branch suggestions in mind that we probably want to work on in the near future. Feel free to modify or comment on them. I feel some of them are a bit urgent, as we may want to give an update at the next Materials Project meeting or send papers out.
The purpose of this issue is just to plan ahead of time so we can make progress in some directions and prioritize what is more important. We can also come back to amend and merge some branches later.
Implementing cluster hierarchy as a part of ClusterSubspace will greatly help with the implementation of hierarchy constraints.
In the near future, we will make Peichen's L0L2-hierarchy method the default, therefore adding this part to smol is essential.
Is your feature request related to a problem? Please describe.
During an attempted MC flip, a difference in the property vector needs to be computed. We can call this dProp.
However, the user should be able to easily check that dProp is close to the true value, which is the CE-predicted final state minus the CE-predicted initial state.
Describe the solution you'd like
Ideally the solution would be some assertion check, which would look like:
assertTrue(np.allclose(dProp, CE.predict(final) - CE.predict(initial), rtol = rtol, atol = atol))
The sanity check should be within some numerical tolerance such that the energy drift over many MC steps should be small.
Describe alternatives you've considered
The alternative would be for users to carry out this test on their own.
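The check described above can be sketched with a toy model. This is a minimal sketch, not smol's API: `check_delta`, `predict`, and the linear `coefs` model are hypothetical stand-ins for a cluster expansion and its fast local update.

```python
import numpy as np


def check_delta(predict, initial, final, d_prop, rtol=1e-5, atol=1e-8):
    """Return True if d_prop matches predict(final) - predict(initial)."""
    expected = predict(final) - predict(initial)
    return bool(np.allclose(d_prop, expected, rtol=rtol, atol=atol))


# Hypothetical stand-in for a cluster expansion: a linear model in the occupancy.
coefs = np.array([0.5, -1.0, 2.0, 0.25])


def predict(occupancy):
    """Full (slow) evaluation of the property for a given occupancy."""
    return float(np.dot(coefs, occupancy))


initial = np.array([1, 0, 1, 0])
final = initial.copy()
final[1] = 1  # the attempted flip changes site 1 only

# Fast local update: only the flipped site contributes to the difference.
d_prop = coefs[1] * (final[1] - initial[1])

assert check_delta(predict, initial, final, d_prop)
```

Running such a check only every so many steps, or only in a debug mode, would keep its cost low, since the full evaluation is exactly what the local update is meant to avoid.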
Implement support for cooling functions in the simulated annealing method in the ensemble classes.
The option is already there but it has not actually been implemented.
smol/smol/moca/ensembles/canonical.py
Line 103 in 9e4625d
Some testing would be necessary too (#26).
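As an illustration of what a cooling function could look like, here is a minimal sketch of two common schedules. The names `linear_cooling` and `exponential_cooling` and their signatures are assumptions for illustration only and are not taken from the smol code.

```python
def linear_cooling(t_start, t_end, n_steps):
    """Return n_steps temperatures evenly spaced from t_start down to t_end."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    step = (t_start - t_end) / (n_steps - 1)
    return [t_start - i * step for i in range(n_steps)]


def exponential_cooling(t_start, t_end, n_steps):
    """Return n_steps temperatures decaying geometrically from t_start to t_end."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    if t_start <= 0 or t_end <= 0:
        raise ValueError("temperatures must be positive")
    ratio = (t_end / t_start) ** (1.0 / (n_steps - 1))
    return [t_start * ratio ** i for i in range(n_steps)]


# Hypothetical usage: pick a schedule and run some MC steps at each temperature.
for temperature in linear_cooling(1000, 100, 4):
    pass  # run the Monte Carlo steps at this temperature here
```

Passing the schedule in as a callable would let users supply their own function without changes to the ensemble classes.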
max_lp = max(exp_struct.lattice.abc) / 2
for size, diameter in sorted(cutoffs.items()):
    new_orbits = []
    neighbors = exp_struct.get_sites_in_sphere([0.5, 0.5, 0.5],
                                               diameter + max_lp,
                                               include_index=True)
Getting neighbors in a sphere centered at [0.5, 0.5, 0.5] with a radius of diameter + max_lp can potentially miss interactions with the centroid, for example four-anion interactions on a tetrahedron.
Changing max_lp = max(exp_struct.lattice.abc) / 2 to max_lp = max(exp_struct.lattice.abc) would resolve this issue.
This gives exactly the same correlation function as pyabinitio. (More general tests are also welcome.)
File Name | Module Path | Authors |
---|---|---|
clusterspace.py | smol.cofe.configspace | Luis Barroso-Luque, William Davidson Richard |
We use code review issues to open up a pinned location to discuss a specific
source file.
This file contains the implementation of the ClusterSubspace class.
A ClusterSubspace is necessary to define the terms to be included in a cluster
expansion. A cluster subspace is a finite set of clusters, more precisely
orbits that contain symmetrically equivalent clusters, that are used to define
orbit/cluster basis functions which span a subspace of the total function space
over the configurational space of a given lattice system.
Point to specific lines in code
If you are mentioning a set of specific lines in code already implemented, it
may come in handy to add a
permanent link to it.
Format suggested code nicely
def roast(beans, *flavoring_oils):
    for oil in flavoring_oils:
        # add each oil in turn (the original passed the undefined name `oils`)
        beans.add_oil(oil)
    return np.trapz(beans.get_heat_vector())
Use other handy features too
Using the many other handy features offered can help improve discussion.
https://docs.github.com/en
Is your feature request related to a problem? Please describe.
As mentioned by The Boss on Slack, allowing only the reciprocal space portion of the Ewald summation term to be used as a feature, and in subsequent Monte Carlo sampling, could be useful to account only for the long-range electrostatic interactions, and hopefully let the CE pair terms take care of the short-range ones.
Describe the solution you'd like
The solution should be "simple" to implement (maybe a little more involved to test and be comfortable with the results). This involves simply allowing one to use the reciprocal space matrix instead of the total energy matrix for computing the feature. For the Monte Carlo we just need to make sure we carefully account for the local contributions only.
When loading a ClusterSubspace created with a polynomial site basis (i.e. Chebyshev) from an mson serialized dictionary, loading fails when attempting to create the SiteBasis objects in the orbits.
The cluster subspace object should be created without any errors.
An error is thrown when attempting to create the SiteBasis objects in the Orbits. Here is the traceback:
Traceback (most recent call last):
  File "/home/lbluque/Develop/smol/smol/utils.py", line 45, in derived_class_factory
    derived_class = get_subclasses(base_class)[class_name]
KeyError: 'AbstractIterator'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lbluque/opt/miniconda3/envs/matx_dev/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-5e9c1382ac61>", line 1, in <module>
    cs2 = loadfn('test.mson')
  File "/home/lbluque/opt/miniconda3/envs/matx_dev/lib/python3.8/site-packages/monty/serialization.py", line 88, in loadfn
    return json.load(fp, *args, **kwargs)
  File "/home/lbluque/opt/miniconda3/envs/matx_dev/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/home/lbluque/opt/miniconda3/envs/matx_dev/lib/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/home/lbluque/opt/miniconda3/envs/matx_dev/lib/python3.8/site-packages/monty/json.py", line 368, in decode
    return self.process_decoded(d)
  File "/home/lbluque/opt/miniconda3/envs/matx_dev/lib/python3.8/site-packages/monty/json.py", line 340, in process_decoded
    return cls_.from_dict(data)
  File "/home/lbluque/Develop/smol/smol/cofe/space/clusterspace.py", line 866, in from_dict
    orbits = {int(s): [Orbit.from_dict(o) for o in v]
  File "/home/lbluque/Develop/smol/smol/cofe/space/clusterspace.py", line 866, in <dictcomp>
    orbits = {int(s): [Orbit.from_dict(o) for o in v]
  File "/home/lbluque/Develop/smol/smol/cofe/space/clusterspace.py", line 866, in <listcomp>
    orbits = {int(s): [Orbit.from_dict(o) for o in v]
  File "/home/lbluque/Develop/smol/smol/cofe/space/orbit.py", line 345, in from_dict
    site_bases = [basis_factory(*sb_d) for sb_d in site_bases]
  File "/home/lbluque/Develop/smol/smol/cofe/space/orbit.py", line 345, in <listcomp>
    site_bases = [basis_factory(*sb_d) for sb_d in site_bases]
  File "/home/lbluque/Develop/smol/smol/cofe/space/basis.py", line 352, in basis_factory
    basis_funcs = derived_class_factory(iterator_name, BasisIterator, species)
  File "/home/lbluque/Develop/smol/smol/utils.py", line 48, in derived_class_factory
    raise NotImplementedError(f'{class_name} is not implemented.')
NotImplementedError: AbstractIterator is not implemented.
Seems like the issue is that the derived_class_factory in basis_factory is not able to recursively find the derived class ChebyshevIterator from BasisIterator. Probably need to fix that. The issue does not occur with other basis iterator classes that are derived directly from BasisIterator.
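One possible way to make the lookup recursive is sketched below. This is a minimal sketch under assumptions, not the fix that was applied in smol: the function names follow the traceback, but their bodies, the toy class hierarchy, and the intermediate `PolynomialIterator` class are hypothetical.

```python
def get_subclasses(base_class):
    """Map class name -> class for all direct and indirect subclasses."""
    found = {}
    for sub in base_class.__subclasses__():
        found[sub.__name__] = sub
        # Recurse so that classes derived from a subclass are found as well.
        found.update(get_subclasses(sub))
    return found


def derived_class_factory(class_name, base_class, *args, **kwargs):
    """Instantiate the subclass of base_class named class_name."""
    try:
        derived_class = get_subclasses(base_class)[class_name]
    except KeyError:
        raise NotImplementedError(f"{class_name} is not implemented.")
    return derived_class(*args, **kwargs)


# Toy hierarchy (hypothetical): ChebyshevIterator is two levels below the base.
class BasisIterator:
    def __init__(self, species):
        self.species = species


class PolynomialIterator(BasisIterator):
    pass


class ChebyshevIterator(PolynomialIterator):
    pass


iterator = derived_class_factory("ChebyshevIterator", BasisIterator, ("Li+", "Ni2+"))
```

A lookup that only inspects `base_class.__subclasses__()` without recursing would see `PolynomialIterator` here but not `ChebyshevIterator`, which matches the behavior described above.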
Create a ClusterSubspace with basis=chebyshev
Serialize it (save it to file or call as_dict)
Reload the ClusterSubspace
from monty.serialization import loadfn, dumpfn
from smol.cofe import ClusterSubspace
subspace = ClusterSubspace.from_cutoffs(prim, {2: 6}, basis='chebyshev')
# This throws error
dumpfn(subspace, 'test.mson')
subspace1 = loadfn('test.mson')
# So does this
subspace1 = ClusterSubspace.from_dict(subspace.as_dict())
The cached dictionary of orbit mappings for different supercells can be very large in terms of file size when saving a ClusterSubspace or a StructureWrangler with a cluster subspace that has seen many different structures and cached their orbit mappings.
This is not a "bug" per se, but it can become hard to load large files, since Python consumes much more memory than the file itself during loading, which can lead to crashing Python itself.
It may be useful to find a more efficient way to save it, or to add an option to skip loading the cached mappings.