lab-cosmo / sphericart
Multi-language library for the calculation of spherical harmonics in Cartesian coordinates
Home Page: https://sphericart.readthedocs.io/en/latest/
License: MIT License
Choose a C/C++ format and enforce it in the CI, to avoid large changes as people modify the C/C++ files.
A few points left to do for the Julia implementation.
Perhaps we could rename dsph and ddsph to something more meaningful.
Tagging @felixmusil, who seemed interested. The challenge here would be to reuse as much as possible of the CUDA code @nickjbrowning wrote for libtorch. The whole concept is to keep the codebase as compact as possible.
Line 16 in 0fc2f9f: see "cooedinates" (typo for "coordinates").
While computing spherical harmonics on large tensors, one may encounter a segmentation fault (signal 11). The following example should reproduce the error:
import torch
import sphericart.torch as sphericart
if __name__ == '__main__':
    calculator = sphericart.SphericalHarmonics(l_max=12, normalized=True)
    vectors = 5.0 * torch.rand(100000000, 3)
    sp_harm = calculator.compute(vectors)
Naive memory profiling of this script gives the following output:
/usr/bin/time -v python test_sphericart.py
Command terminated by signal 11
Command being timed: "python test_sphericart.py"
User time (seconds): 3.54
System time (seconds): 1.09
Percent of CPU this job got: 64%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.22
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2578860
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 67800
Voluntary context switches: 8953
Involuntary context switches: 97625
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
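Two plausible causes are consistent with the log above: simple memory exhaustion (the output alone would hold 10^8 x 169 float32 values, i.e. tens of gigabytes, since (12+1)^2 = 169) or 32-bit index overflow, since 10^8 x 169 exceeds INT32_MAX. Either way, a user-side workaround is to process the samples in chunks. The sketch below is library-agnostic: compute_chunked and the toy callable are my own stand-ins, not sphericart API.

```python
# Library-agnostic chunking sketch: apply `compute` to slices of `vectors`
# so that only one chunk's output has to be materialized at a time.
def compute_chunked(compute, vectors, chunk_size):
    results = []
    for start in range(0, len(vectors), chunk_size):
        results.append(compute(vectors[start:start + chunk_size]))
    return results

# Toy stand-in for a per-sample calculation (NOT the sphericart API):
chunks = compute_chunked(lambda v: [x * 2 for x in v], list(range(10)), 4)
print(chunks)  # three chunks of sizes 4, 4, 2
```

With a real calculator, `compute` would be e.g. a lambda calling the library on each slice, and the per-chunk results could be concatenated or reduced on the fly instead of being kept in a list.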
Thank you for providing this nice library.
I am a theoretical physicist, and I am trying to use sphericart in C++.
I wrote the following test code:
#include <iostream>
#include <memory>
#include <vector>
#include "sphericart.hpp"

int main() {
    std::vector<double> Rij = {5.934, 0.0, 4.649};
    int l_max = 1;
    auto Ylmcalculator = std::make_shared<sphericart::SphericalHarmonics<double>>((size_t)l_max);
    auto sph = std::vector<double>(1 * (l_max + 1) * (l_max + 1), 0.0);
    auto dsph = std::vector<double>(1 * 3 * (l_max + 1) * (l_max + 1), 0.0);
    Ylmcalculator->compute_with_gradients(Rij, sph, dsph);

    int count = 0;
    for (int p = 0; p < 3; p++) {
        for (int l = 0; l <= l_max; l++) {
            for (int m = -l; m <= l; m++) {
                std::cout << "p,l,m,dYdR " << p << " " << l << " " << m << " " << dsph.at(count) << std::endl;
                count += 1;
            }
        }
    }
    return 0;
}
The output is
p,l,m,dYdR 0 0 0 0
p,l,m,dYdR 0 1 -1 0
p,l,m,dYdR 0 1 0 0.488603
p,l,m,dYdR 0 1 1 0
p,l,m,dYdR 1 0 0 0.488603
p,l,m,dYdR 1 1 -1 0
p,l,m,dYdR 1 1 0 0
p,l,m,dYdR 1 1 1 0
p,l,m,dYdR 2 0 0 0
p,l,m,dYdR 2 1 -1 0
p,l,m,dYdR 2 1 0 0
p,l,m,dYdR 2 1 1 0
However, I think the spherical harmonic with l=m=0 does not depend on theta and phi, so dY/dx = dY/dy = dY/dz should all be zero.
In the documentation, I saw:
the leading dimension represents the different samples, while the inner-most dimension size is (l_max + 1) * (l_max + 1), and it represents the degree and order of the spherical harmonics (again, organized in lexicographic order). The intermediate dimension corresponds to different spatial derivatives of the spherical harmonics: x, y, and z, respectively.
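If I read that layout description correctly, the flat offset of a gradient entry can be sketched as below. The helper name is mine, not part of the library; the formula simply restates the documented shape (n_samples, 3, (l_max + 1)^2) with lexicographic (l, m) ordering.

```python
def dsph_flat_index(i, p, l, m, l_max):
    """Flat index into a gradient array of shape (n_samples, 3, (l_max + 1)**2),
    where p is the spatial derivative (0 = x, 1 = y, 2 = z)."""
    n_lm = (l_max + 1) ** 2
    lm = l * l + l + m  # lexicographic (l, m) ordering
    return (i * 3 + p) * n_lm + lm

# For l_max = 1 there are 4 (l, m) entries per derivative direction:
print(dsph_flat_index(0, 0, 1, 0, 1))  # 2: d/dx of Y_{1,0}
print(dsph_flat_index(0, 1, 0, 0, 1))  # 4: d/dy of Y_{0,0}
```

Under this layout, a single running counter over nested (p, l, m) loops does enumerate the entries of one sample in order, so the question is whether the values themselves match the expected derivatives.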
Could you tell me what is happening in my code, or how I can fix it?
Hi,
I installed the library and the torch bindings with pip (in a fresh conda environment with torch 2.0.0) and ran python examples/pytorch/example.py --normalized, which gave the error:
Float vs double relative error: 3.25688027e-07
Check derivative difference: 0.0
torch.Size([1000, 121, 3]) torch.Size([1000, 3, 121])
Traceback (most recent call last):
File "/home/musil/git/sphericart/examples/pytorch/example.py", line 108, in <module>
sphericart_example(args.l, args.s, args.normalized)
File "/home/musil/git/sphericart/examples/pytorch/example.py", line 78, in sphericart_example
f"Check fw derivative difference CPU vs CUDA: {torch.norm(dsh_sphericart_cuda.to('cpu')-dsh_sphericart)}"
RuntimeError: The size of tensor a (3) must match the size of tensor b (121) at non-singleton dimension 2
As you can see, the gradients in the CPU and CUDA implementations are returned with swapped axes: [cuda] torch.Size([1000, 121, 3]) vs. [cpu] torch.Size([1000, 3, 121]).
Do you also observe this behavior?
At the moment, the APIs for C/C++/NumPy/torch, JAX, Julia and CUDA are all slightly different. We should discuss up to what point we should aim at making them uniform, and where we should instead give way to the idioms of each language/framework.
We had a report that the code fails to compile with cmake 3.23, which is not ideal. It would be nice to make sure we can compile the code with older cmake, ideally cmake 3.16 like metatensor/rascaline (this is the version in Debian stable).
@tjjarvinen - I think when exploring a switch from ObjectPools to Bumper, sphericart should be our first priority since it's more public than the other repos.
If a user tries to build sphericart with either of these options set to ON, and cmake cannot find the requirements, we currently just print a message ("Could not find a CUDA compiler" or "Could not find OpenMP") and continue building the code as if these options were disabled.
I see how this is the most sensible behavior when building the code from setup.py, where we want to enable these features if possible but still build otherwise. However, it might create issues for C++ users who think they built everything right but then get a slowdown or crash later at runtime.
At the very least, both should be a warning when enabled but not found (currently only OpenMP is); and I would appreciate it if we could turn these into errors without breaking the Python build. Maybe we could have an IMPLICIT_DISABLE_IS_ERROR option, ON by default and set to OFF by setup.py.
We are missing copy and move constructors in the CPU and CUDA SphericalHarmonics classes; we should add them!
Dear all,
great project! I look forward to using this in JAX, especially since the jax.scipy implementation does not work well.
I am trying to pip-install it, but I got the following error:
WARNING: sphericart 0.3.0 does not provide the extra 'jax'
Then I tried installing from the cloned repository, but I got this error:
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cmake', '--build', '/XXXXXX/XXXXX/third_party/sphericart/sphericart-jax/build/cmake-build', '--parallel', '--target', 'install']' returned non-zero exit status 2.
(where XXXXX is my folder path)
I also tried following these instructions, but the cmake compilation failed.
Any clue?
System info:
At CSCS we have moved to using images for providing software stacks. These images are mounted into a read-only directory. This creates an issue with sphericart, since we attempt to hack the cudnn_version.h file if we are using the PyTorch installation provided by the image.
@Luthaf, do you know if we still need this hack, or can it be removed? I've commented it out locally on our cluster and it still builds fine with the latest PyTorch.
I think this is another good use case and it'd be nice to show a boilerplate example of usage.
This would be the first step towards being able to try sphericart in rascaline. It should be relatively easy: since sphericart is small, I would always statically link it, removing the need for a separate sphericart-sys package.
This will require multiple steps:
Hi there, I've just gone through the process of installing sphericart with CUDA for the first time, and I'd suggest some minor improvements to the related docs:
Additionally, the docs on building the C++ code could benefit from:
- mentioning that libtorch is a requirement, and where it should be exported (CMAKE_PREFIX_PATH)
- requiring cmake to be at least version 3.24 (otherwise the detection of which GPU to target fails)
Overall, it'd be a good idea to make it possible to build for GPUs that are not present at build time, but this can't be accomplished by slightly polishing the docs. :)
To register the Julia package with the General Registry, one needs to:
- Project.toml: we need to decide on 0.0.2 or 0.1.0; I don't mind either way, and I think it is ready for 0.1.0.
This will require a new CI job.
For C and C++ code coverage, we will need to build the code with the -coverage gcc/clang flag, then run the code (from Python or directly), and then collect coverage with gcov/lcov.
For Python code coverage, we can use the coverage package.
I don't know how to do this for CUDA; @nickjbrowning, is there a similar flag for nvcc?
The derivatives of the gradient errors with respect to the weights are computed, but it seems that one or more terms are missing or incorrect. A temporary fix is to use backward_second_derivatives=True at class initialization.
Dear all,
Thank you very much for your amazing library.
It works very well, and my collaborators are enthusiastically using it.
We would like to ask whether you are considering implementing complex spherical harmonics as well.
@Luthaf and I think we could do some version checking of pip. This would ensure that builds don't fail in silly ways simply because of an old pip version.
Currently, attempting to build the sphericart-torch wheel with pip requires a large amount of RAM if many CPU cores are present. I think this is due to this line, which invokes cmake without specifying the number of jobs, so it presumably defaults to the total number of cores. On an HPC system that can be 40 or 80 cores, and so the compilation tends to get killed by the host OS.
While this is not catastrophic, it is inconvenient, and a waste of resources in many cases (the compilation is not much faster in parallel mode). I would suggest defaulting to some reasonable number of jobs instead, or disabling parallel builds entirely. Alternatively, the installation docs should at least mention this fact (see #116).
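As a possible user-side workaround (this is an assumption on my part, valid whenever the build backend ends up calling cmake --build --parallel without an explicit job count), the standard CMAKE_BUILD_PARALLEL_LEVEL environment variable can cap the parallelism before invoking pip:

```python
import os

# CMAKE_BUILD_PARALLEL_LEVEL is the standard environment variable that
# `cmake --build --parallel` consults when no explicit job count is passed.
# Cap the build at 4 jobs (or fewer, on small machines) to bound peak RAM.
jobs = min(4, os.cpu_count() or 1)
os.environ["CMAKE_BUILD_PARALLEL_LEVEL"] = str(jobs)
print(os.environ["CMAKE_BUILD_PARALLEL_LEVEL"])
```

The same effect can be had by exporting the variable in the shell before running pip install.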
Hi all,
I did the following and could not install it in my conda environment.
Then I get the following error:
Building wheel for sphericart (pyproject.toml) did not run successfully.
exit code: 1
What am I doing wrong? I have a standard Windows computer.
kr Simon
Hey, I am trying to install sphericart with the pytorch bindings and GPU support on a cluster, using pip install .[torch], but it fails. After some debugging, it turns out that in sphericart/torch/pyproject.toml, torch is required for installation, but pip installs it in an isolated build environment and so pulls the latest, CPU-only torch (2.0.1). I am using an older version of pytorch (1.13.1), so the resulting library is not compatible, hence the failure.
Hi! I've built sphericart with CUDA support (with pip install .[torch] in a clone of this repo). Now, whenever I run tests on a compute node without a GPU, the CPU fallback appears not to be working, and the program crashes with:
CUDA error at /home/langer/software/sphericart/sphericart-torch/sphericart/src/sphericart_cuda.cu:42 - no CUDA-capable device is detected
Apparently, sphericart is trying to go through the GPU code path despite all involved tensors being on the cpu device. This is in contrast with the docs, which state: "Depending on the device the tensor is stored on, [...], the calculations will be performed [...] using the CPU or CUDA implementation."
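For what it's worth, the documented contract amounts to dispatching on the input tensor's device. The toy sketch below (all names are stand-ins, not sphericart internals) only illustrates the behavior the docs promise:

```python
# Toy model of the documented dispatch rule: the backend is selected from the
# device of the input tensor, so tensors on "cpu" should never reach CUDA code.
class FakeTensor:  # stand-in for torch.Tensor, to keep the sketch dependency-free
    def __init__(self, device):
        self.device = device

def select_backend(xyz):
    return "cuda" if xyz.device == "cuda" else "cpu"

print(select_backend(FakeTensor("cpu")))   # cpu
print(select_backend(FakeTensor("cuda")))  # cuda
```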
It'd be nice to fix this, as this makes it rather annoying to debug when a GPU is not readily available.
Cheers!