Comments (4)
Hi @arosen93,
Ok, great! Indeed, your script very nicely demonstrates how the two different approaches give the same result (when using a linear metric). In SOAP, average=True indeed already takes the average over all atoms in the structure. Thanks for the docfix, now updated.
Hi @arosen93!
Just to be clear: are you sure that you want to calculate the AverageKernel and not something else? I would expect that if you are working with such big structures and such a large number of them, you would instead want to work with a single averaged global descriptor per structure instead of AverageKernel (as is done in many cases, e.g. in the DScribe article). In particular, if you are only using metric="linear" in AverageKernel, you can get exactly the same result much faster by simply working with a single descriptor per structure that is an average over all atoms in that structure.
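To spell out why the two routes agree for a linear metric, here is a short derivation. It assumes the average kernel is the mean of all pairwise local similarities, followed by the usual normalization; the symbols $x_i$, $y_j$, $N$ and $M$ are notation introduced here for the local descriptors and atom counts of structures $A$ and $B$.

```latex
% Average kernel with a linear (dot-product) local metric
K(A, B) = \frac{1}{N M} \sum_{i=1}^{N} \sum_{j=1}^{M} x_i \cdot y_j
        = \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right) \cdot
          \left( \frac{1}{M} \sum_{j=1}^{M} y_j \right)
        = \bar{x}_A \cdot \bar{x}_B

% Normalized form
\tilde{K}(A, B) = \frac{K(A, B)}{\sqrt{K(A, A)\, K(B, B)}}
                = \frac{\bar{x}_A \cdot \bar{x}_B}{\lVert \bar{x}_A \rVert \, \lVert \bar{x}_B \rVert}
```

The second equality relies only on the dot product being bilinear, so it does not carry over to nonlinear local metrics, where the pairwise comparison cannot be collapsed into a single averaged vector.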
If, however, you really want to construct the similarity by individually comparing the local sites, I'm afraid I don't have very good suggestions for getting around the time complexity, which seems to be fixed at around O(n_structures^2*n_atoms_per_structure^2). The simplest speedup that I can think of is the embarrassingly parallel version that you have already tried, but maybe spread over multiple nodes, as NumPy will probably already try to use all cores on a single machine efficiently.
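As a rough sketch of what "embarrassingly parallel" means here: every pair of structures can be evaluated independently, so the upper triangle of the kernel matrix can be split into chunks and each chunk handed to a separate job or node. The helper names below (`average_linear_kernel`, `pair_chunks`, `compute_chunk`, `assemble`) are hypothetical and not part of DScribe, the features are random stand-ins for local descriptors, and the chunks are evaluated serially only to keep the example self-contained.

```python
from itertools import combinations_with_replacement

import numpy as np


def average_linear_kernel(a, b):
    """Unnormalized average kernel: mean of all pairwise dot products.

    a has shape (n_atoms_a, n_features), b has shape (n_atoms_b, n_features).
    """
    return (a @ b.T).mean()


def pair_chunks(n_structures, n_chunks):
    """Split all index pairs (i <= j) into n_chunks interleaved lists."""
    pairs = list(combinations_with_replacement(range(n_structures), 2))
    return [pairs[k::n_chunks] for k in range(n_chunks)]


def compute_chunk(features, pairs):
    """Evaluate one chunk of pairs; this is the unit of work for one job."""
    return [(i, j, average_linear_kernel(features[i], features[j])) for i, j in pairs]


def assemble(n_structures, results):
    """Fill the symmetric kernel matrix and normalize it to a unit diagonal."""
    K = np.zeros((n_structures, n_structures))
    for chunk in results:
        for i, j, value in chunk:
            K[i, j] = K[j, i] = value
    diag = np.diag(K)
    return K / np.sqrt(np.outer(diag, diag))


# Random stand-ins for per-structure local descriptors (not real SOAP output).
rng = np.random.default_rng(0)
features = [rng.random((n_atoms, 5)) for n_atoms in (3, 6, 4, 2)]

chunks = pair_chunks(len(features), n_chunks=3)
# Each chunk is independent, so this loop could be replaced by separate jobs.
results = [compute_chunk(features, chunk) for chunk in chunks]
K = assemble(len(features), results)
```

Splitting by pairs rather than by rows keeps the chunks roughly equal in size, although the cost per pair still depends on the number of atoms in each structure.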
Hope this helps!
Hi @lauri-codes!
Thank you for taking the time to reply. I will say I'm not 100% certain (only because I am very new to this field). For context, I was trying to reproduce the protocol in a paper where the authors say they used an average kernel to get a global similarity kernel for kernel ridge regression. So, in that sense I think I was trying to do the right thing, even if it wasn't necessarily the best route in practice.
Just to make sure I fully understand, how would one go about calculating the aforementioned single averaged global descriptor per structure? I see in the dscribe paper it says "we also include results using a simple averaged SOAP output for all atoms in the simulation cell." I assume this is equivalent to the average=True flag in the SOAP object? I think my confusion lies in how to go from this set of feature vectors to something useful for regression.
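For reference, one common way to go from averaged feature vectors to a regression model is kernel ridge regression on a normalized linear kernel. The sketch below is only an illustration under stated assumptions: the local descriptors and target values are random placeholders rather than real SOAP output or properties, `normalized_linear_kernel` is a hypothetical helper, and the regularization value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder local descriptors: one (n_atoms, n_features) array per structure.
local_descriptors = [rng.standard_normal((n_atoms, 10)) for n_atoms in (3, 6, 4, 5, 2, 7)]
# Placeholder target property, one value per structure.
targets = rng.standard_normal(len(local_descriptors))

# One global feature vector per structure: the average over its atoms.
X = np.array([d.mean(axis=0) for d in local_descriptors])


def normalized_linear_kernel(A, B):
    """Dot-product kernel between rows of A and B, scaled by the row norms."""
    norm_a = np.sqrt(np.einsum("ij,ij->i", A, A))
    norm_b = np.sqrt(np.einsum("ij,ij->i", B, B))
    return (A @ B.T) / np.outer(norm_a, norm_b)


# Kernel ridge regression: solve (K + lambda * I) alpha = y for the weights.
K_train = normalized_linear_kernel(X, X)
regularization = 1e-8  # arbitrary small value chosen for this illustration
alpha = np.linalg.solve(K_train + regularization * np.eye(len(X)), targets)

# Predict for a new structure from its similarity to the training structures.
new_structure = rng.standard_normal((4, 10))
x_new = new_structure.mean(axis=0, keepdims=True)
prediction = normalized_linear_kernel(x_new, X) @ alpha

# Sanity check on the training set itself.
train_predictions = K_train @ alpha
```

In practice the regularization strength would be chosen by cross-validation, and a precomputed kernel matrix like `K_train` can also be passed to an existing kernel ridge regression implementation instead of solving the linear system by hand.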
Thanks for the suggestion regarding the time-consuming nature of the kernel calculation. That is what I figured, but I wanted to double-check.
EDIT:
Thanks for the reply. I ended up figuring it out by going back to the 2016 PCCP paper. Anyway, you can close this issue since it was mostly due to me misunderstanding.
from dscribe.descriptors import SOAP
from dscribe.kernels import AverageKernel
import numpy as np
from ase.build import molecule

# SOAP settings
species = ["H", "C", "O", "N"]
rcut = 6.0
nmax = 8
lmax = 6
soap = SOAP(
    species=species,
    periodic=False,
    rcut=rcut,
    nmax=nmax,
    lmax=lmax,
)
N_features = soap.get_number_of_features()

# Test molecules
water = molecule("H2O")
methanol = molecule("CH3OH")
h2o2 = molecule("H2O2")
molecules = [water, methanol, h2o2]

# Local SOAP vectors per molecule, plus their average over atoms
soaps = []
avg_soaps = np.zeros((len(molecules), N_features))
for i, mol in enumerate(molecules):
    soap_temp = soap.create(mol)
    soaps.append(soap_temp)
    avg_soaps[i, :] = soap_temp.mean(axis=0)

# Route 1: normalized linear kernel on the averaged descriptors
K1 = avg_soaps.dot(avg_soaps.T)
K1 = K1 / np.sqrt(np.einsum("ii,jj->ij", K1, K1))

# Route 2: AverageKernel on the local descriptors
avg_kernel = AverageKernel(metric="linear")
K2 = avg_kernel.create(soaps)

# The difference should be zero up to floating-point error
print(K2 - K1)
P.S. Super minor docfix. In the tutorials page for SOAP here, the last block of code uses soap_peroxide, which isn't defined in the doc (the h2o2 molecule is created, but no SOAP object is created for that molecule). There's just a disconnect between the example script and the live docs.