Slow generation of kernels about dscribe HOT 4 CLOSED

singroup commented on May 24, 2024

Slow generation of kernels

from dscribe.

Comments (4)

lauri-codes commented on May 24, 2024

Hi @arosen93,

Ok, great! Indeed, your script very nicely demonstrates how the two different approaches give the same result (when using a linear metric). In SOAP, average=True indeed already takes the average over all atoms in the structure. Thanks for the docfix, now updated.

lauri-codes commented on May 24, 2024

Hi @arosen93!

Just to be clear: are you sure that you want to calculate the AverageKernel and not something else? I would expect that if you are working with such big structures and such a large number of them, you would instead want to work with a single averaged global descriptor per structure rather than AverageKernel (as is done in many cases, e.g. in the DScribe article). In particular, if you are only using metric="linear" in AverageKernel, you can get exactly the same result much faster by simply working with a single descriptor per structure that is an average over all atoms in that structure.
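The reason the linear case collapses is that the mean of all atom-to-atom dot products between two structures equals the dot product of the two per-structure mean vectors. A minimal NumPy sketch of that identity is below. It assumes random arrays as stand-ins for per-atom SOAP output, the names `local_a` and `local_b` are illustrative rather than DScribe API, and kernel normalisation is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for per-atom descriptors: (n_atoms, n_features) arrays.
local_a = rng.random((5, 12))  # structure A, 5 atoms
local_b = rng.random((7, 12))  # structure B, 7 atoms

# Route 1: compare every atom in A with every atom in B, then average.
# This needs n_atoms_a * n_atoms_b dot products per structure pair.
pairwise = local_a @ local_b.T  # shape (5, 7)
k_average_of_pairs = pairwise.mean()

# Route 2: average each structure first, then take a single dot product.
k_pair_of_averages = local_a.mean(axis=0) @ local_b.mean(axis=0)

print(np.isclose(k_average_of_pairs, k_pair_of_averages))
```

The shortcut relies on the dot product being linear in both arguments, so it should not be expected to carry over to non-linear metrics.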

If, however, you really want to construct the similarity by individually comparing the local sites, I'm afraid I don't have very good suggestions for getting around the time complexity, which seems to be fixed at around O(n_structures^2 * n_atoms_per_structure^2). The simplest speedup that I can think of is the embarrassingly parallel version that you have already tried, but perhaps spread over multiple nodes, as NumPy will probably already try to use all cores on a single machine efficiently.
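As a rough illustration of why this is embarrassingly parallel: every structure pair is an independent task, and the kernel is symmetric, so only the upper triangle needs computing. The sketch below assumes a linear metric, random stand-in descriptors and hypothetical helper names (`pair_similarity`, `average_kernel`). The thread pool is only a placeholder for whatever ends up distributing the pairs, and on a single machine the gain may be small for the reason given above.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement

import numpy as np


def pair_similarity(local_a, local_b):
    """Average of all atom-atom linear similarities for one structure pair."""
    return (local_a @ local_b.T).mean()


def average_kernel(structures, max_workers=4):
    """Unnormalised global kernel built from per-atom descriptor arrays."""
    n = len(structures)
    # The kernel is symmetric, so only pairs with i <= j are computed.
    pairs = list(combinations_with_replacement(range(n), 2))
    kernel = np.empty((n, n))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each pair is an independent task; map() preserves the input order.
        values = pool.map(
            lambda ij: pair_similarity(structures[ij[0]], structures[ij[1]]),
            pairs,
        )
        for (i, j), value in zip(pairs, values):
            kernel[i, j] = kernel[j, i] = value
    return kernel


rng = np.random.default_rng(0)
# Random stand-ins for the per-atom descriptors of four small structures.
structures = [rng.random((n_atoms, 10)) for n_atoms in (3, 4, 5, 6)]
K = average_kernel(structures)
```

Splitting `pairs` into chunks and handing each chunk to a separate job would follow the same pattern across nodes.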

Hope this helps!

Andrew-S-Rosen commented on May 24, 2024

Hi @lauri-codes!

Thank you for taking the time to reply. I will say I'm not 100% certain (only because I am very new to this field). For context, I was trying to reproduce the protocol in a paper where the authors say they used an average kernel to get a global similarity kernel for kernel ridge regression. So, in that sense I think I was trying to do the right thing, even if it wasn't necessarily the best route in practice.

Just to make sure I fully understand, how would one go about calculating the aforementioned single averaged global descriptor per structure? I see in the dscribe paper it says "we also include results using a simple averaged SOAP output for all atoms in the simulation cell." I assume this is equivalent to the average=True flag in the SOAP object? I think my confusion lies in how to go from this set of feature vectors to something useful for regression.

Thanks for the suggestion regarding the time-consuming nature of the kernel calculation. That is what I figured, but I wanted to double-check.

EDIT:

Thanks for the reply. I ended up figuring it out by going back to the 2016 PCCP paper. Anyway, you can close this issue since it was mostly due to me misunderstanding.

from dscribe.descriptors import SOAP
from dscribe.kernels import AverageKernel
import numpy as np
from ase.build import molecule

species = ["H", "C", "O", "N"]
rcut = 6.0
nmax = 8
lmax = 6
soap = SOAP(
    species=species,
    periodic=False,
    rcut=rcut,
    nmax=nmax,
    lmax=lmax
)

N_features = soap.get_number_of_features()

water = molecule("H2O")
methanol = molecule('CH3OH')
h2o2 = molecule('H2O2')
molecules = [water, methanol, h2o2]
soaps = []
avg_soaps = np.zeros((len(molecules), N_features))
for i, mol in enumerate(molecules):
    soap_temp = soap.create(mol)
    soaps.append(soap_temp)
    avg_soaps[i, :] = soap_temp.mean(axis=0)

K1 = avg_soaps.dot(avg_soaps.T)
K1 = K1/np.sqrt(np.einsum('ii,jj->ij', K1, K1))

avg_kernel = AverageKernel(metric="linear")
K2 = avg_kernel.create(soaps)

print(K2-K1)
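For completeness, here is a hedged sketch of how a kernel like K1 could feed into kernel ridge regression, which was the original goal. Everything in it is an assumption for illustration: the descriptors and targets are random stand-ins, `normalised_linear_kernel` is a hypothetical helper that reproduces the K1 normalisation above, and the regularisation value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: one averaged descriptor and one target value per structure.
n_train, n_test, n_features = 20, 5, 8
X_train = rng.random((n_train, n_features))
X_test = rng.random((n_test, n_features))
y_train = rng.random(n_train)


def normalised_linear_kernel(A, B):
    """K_ij = <a_i, b_j> / (|a_i| |b_j|), the same normalisation as K1."""
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T


regularisation = 1e-3  # hypothetical value; would normally be cross-validated

K_train = normalised_linear_kernel(X_train, X_train)
# Solve (K + lambda * I) alpha = y for the regression weights.
alpha = np.linalg.solve(K_train + regularisation * np.eye(n_train), y_train)

# Predict: kernel between test and training structures times the weights.
K_test = normalised_linear_kernel(X_test, X_train)
y_pred = K_test @ alpha
```

With real data, `X_train` would be something like `avg_soaps` and `y_train` the property being learned.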

Andrew-S-Rosen commented on May 24, 2024

P.S. Super minor docfix. In the tutorials page for SOAP here, the last block of code uses soap_peroxide, which isn't defined in the doc (the h2o2 molecule is created but no SOAP object created for that molecule). There's just a disconnect between the example script and the live docs.
