
Comments (4)

lauri-codes commented on May 24, 2024

Hi @arosen93,

OK, great! Your script nicely demonstrates that the two approaches give the same result (when using a linear metric). In SOAP, average=True indeed already takes the average over all atoms in the structure. Thanks for the doc fix; it is now updated.

from dscribe.

lauri-codes commented on May 24, 2024

Hi @arosen93!

Just to be clear: are you sure that you want to calculate the AverageKernel and not something else? If you are working with such big structures and so many of them, I would expect you to want a single averaged global descriptor per structure instead of AverageKernel (as is done in many cases, e.g. in the DScribe article). In particular, if you are only using metric="linear" in AverageKernel, you can get exactly the same result much faster by working with a single descriptor per structure that is the average over all atoms in that structure.
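The equivalence for the linear metric follows from the linearity of the dot product. A short derivation, with notation that is assumed here and not taken from the thread: structure A has N_A atoms with local descriptors a_i, structure B has N_B atoms with local descriptors b_j, and the average kernel is taken to be the mean of all pairwise local similarities.

```latex
\begin{align}
K(A, B)
  &= \frac{1}{N_A N_B} \sum_{i=1}^{N_A} \sum_{j=1}^{N_B} \mathbf{a}_i \cdot \mathbf{b}_j \\
  &= \left( \frac{1}{N_A} \sum_{i=1}^{N_A} \mathbf{a}_i \right) \cdot
     \left( \frac{1}{N_B} \sum_{j=1}^{N_B} \mathbf{b}_j \right) \\
  &= \bar{\mathbf{a}} \cdot \bar{\mathbf{b}}
\end{align}
```

The second step only works because the dot product is linear in both arguments, so it does not carry over to a nonlinear local metric. If the kernel is normalized as K(A, B) / sqrt(K(A, A) K(B, B)), the same normalization applied to the averaged descriptors gives the same value.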

If, however, you really want to construct the similarity by individually comparing the local sites, I'm afraid I don't have very good suggestions for getting around the time complexity, which seems to be fixed at around O(n_structures^2*n_atoms_per_structure^2). The simplest speedup I can think of is the embarrassingly parallel version that you have already tried, but perhaps spread over multiple nodes, since NumPy will probably already try to use all cores on a single machine efficiently.
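To make the cost and the parallel structure concrete, here is a minimal NumPy sketch, not DScribe's implementation. Each kernel entry compares every local site in one structure with every local site in another, and the entries are independent of each other, so they can be farmed out to workers. The function names, the random stand-in descriptors, and the use of a thread pool are all assumptions for illustration; on a cluster the same pair list could be split across nodes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement

import numpy as np


def average_kernel_entry(a, b):
    """Mean of all pairwise dot products between the rows of a and b.

    Cost is proportional to n_atoms_a * n_atoms_b for a fixed feature length.
    """
    return float(np.mean(a @ b.T))


def average_kernel_matrix(structures, max_workers=4):
    """Unnormalized linear average kernel, one independent job per pair."""
    n = len(structures)
    # Only the upper triangle is needed because the kernel is symmetric.
    pairs = list(combinations_with_replacement(range(n), 2))

    def job(pair):
        i, j = pair
        return average_kernel_entry(structures[i], structures[j])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(job, pairs))

    K = np.zeros((n, n))
    for (i, j), value in zip(pairs, values):
        K[i, j] = K[j, i] = value
    return K


# Random stand-ins for local descriptors: (n_atoms, n_features) per structure.
rng = np.random.default_rng(0)
structures = [rng.random((n_atoms, 16)) for n_atoms in (3, 6, 4)]

K = average_kernel_matrix(structures)

# Cheap alternative for the linear metric: one averaged vector per structure.
means = np.stack([s.mean(axis=0) for s in structures])
K_from_means = means @ means.T
```

For the linear metric the two matrices agree, which is why the averaged descriptor is the cheaper route. The normalization step is omitted here.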

Hope this helps!


Andrew-S-Rosen commented on May 24, 2024

Hi @lauri-codes!

Thank you for taking the time to reply. I will say I'm not 100% certain (only because I am very new to this field). For context, I was trying to reproduce the protocol in a paper where the authors say they used an average kernel to get a global similarity kernel for kernel ridge regression. So, in that sense I think I was trying to do the right thing, even if it wasn't necessarily the best route in practice.

Just to make sure I fully understand, how would one go about calculating the aforementioned single averaged global descriptor per structure? I see in the dscribe paper it says "we also include results using a simple averaged SOAP output for all atoms in the simulation cell." I assume this is equivalent to the average=True flag in the SOAP object? I think my confusion lies in how to go from this set of feature vectors to something useful for regression.
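One possible route from averaged feature vectors to a regression model is kernel ridge regression on a kernel built from those vectors. The sketch below is a NumPy-only illustration under stated assumptions: the feature matrix and target values are random stand-ins (not SOAP output), the kernel is a plain, unnormalized linear kernel, and the regularization strength lam is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for averaged descriptors: one row per structure (hypothetical data).
X = rng.random((20, 8))
# Hypothetical target property, chosen to be linear in the features.
y = X @ rng.random(8)

X_train, X_test = X[:15], X[15:]
y_train, y_test = y[:15], y[15:]


def linear_kernel(A, B):
    """Dot product between every row of A and every row of B."""
    return A @ B.T


# Kernel ridge regression: solve (K + lam * I) alpha = y on the training set.
lam = 1e-6
K_train = linear_kernel(X_train, X_train)
alpha = np.linalg.solve(K_train + lam * np.eye(len(X_train)), y_train)

# Predict with the kernel between test and training structures.
y_pred = linear_kernel(X_test, X_train) @ alpha
```

With real data, each row of X would be the averaged descriptor of one structure, and lam would be tuned, for example by cross-validation.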

Thanks for the suggestion regarding the time-consuming nature of the kernel calculation. That is what I figured, but I wanted to double-check.

EDIT:

Thanks for the reply. I ended up figuring it out by going back to the 2016 PCCP paper. Anyway, you can close this issue since it was mostly due to me misunderstanding.

from ase.build import molecule
from dscribe.descriptors import SOAP
from dscribe.kernels import AverageKernel
import numpy as np

# SOAP settings (keyword names as used in this thread)
species = ["H", "C", "O", "N"]
rcut = 6.0
nmax = 8
lmax = 6
soap = SOAP(
    species=species,
    periodic=False,
    rcut=rcut,
    nmax=nmax,
    lmax=lmax
)

N_features = soap.get_number_of_features()

# Three small test molecules
water = molecule("H2O")
methanol = molecule("CH3OH")
h2o2 = molecule("H2O2")
molecules = [water, methanol, h2o2]

# Per-atom SOAP output for each molecule, plus its average over atoms
soaps = []
avg_soaps = np.zeros((len(molecules), N_features))
for i, mol in enumerate(molecules):
    soap_temp = soap.create(mol)
    soaps.append(soap_temp)
    avg_soaps[i, :] = soap_temp.mean(axis=0)

# Route 1: linear kernel on the averaged descriptors,
# normalized so that K1[i, i] == 1
K1 = avg_soaps.dot(avg_soaps.T)
K1 = K1 / np.sqrt(np.einsum("ii,jj->ij", K1, K1))

# Route 2: AverageKernel on the per-atom descriptors
avg_kernel = AverageKernel(metric="linear")
K2 = avg_kernel.create(soaps)

# The difference should be zero up to floating-point error
print(K2 - K1)


Andrew-S-Rosen commented on May 24, 2024

P.S. Super minor doc fix. On the tutorials page for SOAP, the last block of code uses soap_peroxide, which isn't defined in the doc (the h2o2 molecule is created, but no SOAP object is created for that molecule). There's just a disconnect between the example script and the live docs.

