teddykoker / torchsort

Fast, differentiable sorting and ranking in PyTorch

Home Page: https://pypi.org/project/torchsort/

License: Apache License 2.0

Python 45.06% C++ 24.98% Shell 0.14% Cuda 29.81%

pytorch sort ranking cuda-kernel

torchsort's Introduction

Torchsort

Fast, differentiable sorting and ranking in PyTorch.

Pure PyTorch implementation of Fast Differentiable Sorting and Ranking (Blondel et al.). Much of the code is copied from the original NumPy implementation at google-research/fast-soft-sort, with the isotonic regression solver rewritten as a PyTorch C++ and CUDA extension.

Install

pip install torchsort

To build the CUDA extension you will need the CUDA toolchain installed. If you want to build in an environment without a CUDA runtime (e.g. docker), you will need to export the environment variable TORCH_CUDA_ARCH_LIST="Pascal;Volta;Turing;Ampere" before installing.

Conda Installation

On some systems the package may not compile with `pip install` in conda environments. If this happens, you may need to:

Install g++ with conda install -c conda-forge gxx_linux-64=9.40
Run export CXX=/path/to/miniconda3/envs/env_name/bin/x86_64-conda_cos6-linux-gnu-g++
Run export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/miniconda3/lib
pip install --force-reinstall --no-cache-dir --no-deps torchsort

Thanks to @levnikmyskin, @sachit-menon for pointing this out!

Pre-built Wheels

Pre-built wheels are currently available on Linux for recent Python/PyTorch/CUDA combinations:

# torchsort version, supports >= 0.1.9
export TORCHSORT=0.1.9
# PyTorch version, supports pt21, pt20, and pt113 for versions 2.1, 2.0, and 1.13 respectively
export TORCH=pt21
# CUDA version, supports cpu, cu113, cu117, cu118, and cu121 for CPU-only, CUDA 11.3, CUDA 11.7,
# CUDA 11.8 and CUDA 12.1 respectively
export CUDA=cu121
# Python version, supports cp310 and cp311 for versions 3.10 and 3.11 respectively
export PYTHON=cp310

pip install https://github.com/teddykoker/torchsort/releases/download/v${TORCHSORT}/torchsort-${TORCHSORT}+${TORCH}${CUDA}-${PYTHON}-${PYTHON}-linux_x86_64.whl
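The same URL can be assembled programmatically, which may help when pinning wheels in scripts. This is a minimal sketch: the function name `wheel_url` and the validation sets are illustrative and only mirror the tags listed in the comments above. Not every combination of tags is guaranteed to exist as a release asset.

```python
# Tags copied from the comments in the shell snippet above.
SUPPORTED_TORCH = {"pt21", "pt20", "pt113"}
SUPPORTED_CUDA = {"cpu", "cu113", "cu117", "cu118", "cu121"}
SUPPORTED_PYTHON = {"cp310", "cp311"}


def wheel_url(torchsort="0.1.9", torch="pt21", cuda="cu121", python="cp310"):
    """Build the release URL for a pre-built wheel (hypothetical helper)."""
    if torch not in SUPPORTED_TORCH:
        raise ValueError(f"unsupported PyTorch tag: {torch}")
    if cuda not in SUPPORTED_CUDA:
        raise ValueError(f"unsupported CUDA tag: {cuda}")
    if python not in SUPPORTED_PYTHON:
        raise ValueError(f"unsupported Python tag: {python}")
    base = "https://github.com/teddykoker/torchsort/releases/download"
    filename = (
        f"torchsort-{torchsort}+{torch}{cuda}-{python}-{python}-linux_x86_64.whl"
    )
    return f"{base}/v{torchsort}/{filename}"


print(wheel_url())
```

The printed URL can then be passed to pip install in place of the shell-expanded one.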

Thanks to @siddharthab for the help creating the build action!

Usage

torchsort exposes two functions: soft_rank and soft_sort, each with parameters regularization ("l2" or "kl") and regularization_strength (a scalar value). Each will rank/sort the last dimension of a 2-d tensor, with an accuracy dependent upon the regularization strength:

import torch
import torchsort

x = torch.tensor([[8, 0, 5, 3, 2, 1, 6, 7, 9]])

torchsort.soft_sort(x, regularization_strength=1.0)
# tensor([[0.5556, 1.5556, 2.5556, 3.5556, 4.5556, 5.5556, 6.5556, 7.5556, 8.5556]])
torchsort.soft_sort(x, regularization_strength=0.1)
# tensor([[-0., 1., 2., 3., 5., 6., 7., 8., 9.]])

torchsort.soft_rank(x)
# tensor([[8., 1., 5., 4., 3., 2., 6., 7., 9.]])

Both operations are fully differentiable, on CPU or GPU:

x = torch.tensor([[8., 0., 5., 3., 2., 1., 6., 7., 9.]], requires_grad=True).cuda()
y = torchsort.soft_sort(x)

torch.autograd.grad(y[0, 0], x)
# (tensor([[0.1111, 0.1111, 0.1111, 0.1111, 0.1111, 0.1111, 0.1111, 0.1111, 0.1111]],
#         device='cuda:0'),)

Example

Spearman's Rank Coefficient

Spearman's rank coefficient is a very useful metric for measuring how monotonically related two variables are. We can use Torchsort to create a differentiable Spearman's rank coefficient function so that we can optimize a model directly for this metric:

import torch
import torchsort

def spearmanr(pred, target, **kw):
    pred = torchsort.soft_rank(pred, **kw)
    target = torchsort.soft_rank(target, **kw)
    pred = pred - pred.mean()
    pred = pred / pred.norm()
    target = target - target.mean()
    target = target / target.norm()
    return (pred * target).sum()

pred = torch.tensor([[1., 2., 3., 4., 5.]], requires_grad=True)
target = torch.tensor([[5., 6., 7., 8., 7.]])
spearman = spearmanr(pred, target)
# tensor(0.8321)

torch.autograd.grad(spearman, pred)
# (tensor([[-5.5470e-02,  2.9802e-09,  5.5470e-02,  1.1094e-01, -1.1094e-01]]),)
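
For reference, the function above computes the Pearson correlation of the two soft-rank vectors. Writing r^p and r^t for the soft ranks of pred and target (notation introduced here, not in the original):

```latex
\rho = \frac{\langle \tilde r^{p}, \tilde r^{t} \rangle}
            {\lVert \tilde r^{p} \rVert_2 \, \lVert \tilde r^{t} \rVert_2},
\qquad \tilde r = r - \operatorname{mean}(r)
```

As a worked check, assume the default l2 soft ranks for this example are r^p = (1, 2, 3, 4, 5) and r^t = (1.4, 2.4, 3.4, 4.4, 3.4). These values are derived by hand from the l2 formulation in the paper, not printed by the library. The centered vectors are then (-2, -1, 0, 1, 2) and (-1.6, -0.6, 0.4, 1.4, 0.4), with inner product 6 and squared norms 10 and 5.2:

```latex
\rho = \frac{6}{\sqrt{10}\,\sqrt{5.2}} = \frac{6}{\sqrt{52}} \approx 0.8321
```

This agrees with the value shown in the example output.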

Benchmark

torchsort and fast_soft_sort each operate with a time complexity of O(n log n), each with some additional overhead when compared to the built-in torch.sort. With a batch size of 1 (see left), the Numba JIT'd forward pass of fast_soft_sort performs roughly on par with the torchsort CPU kernel; however, its backward pass still relies on some Python code, which greatly penalizes its performance.

Furthermore, the torchsort kernel supports batches, and yields much better performance than fast_soft_sort as the batch size increases.

The torchsort CUDA kernel performs quite well with sequence lengths under ~2000, and scales to extremely large batch sizes. In the future, the CUDA kernel can likely be further optimized to achieve performance closer to that of the built-in torch.sort.

Reference

@inproceedings{blondel2020fast,
  title={Fast differentiable sorting and ranking},
  author={Blondel, Mathieu and Teboul, Olivier and Berthet, Quentin and Djolonga, Josip},
  booktitle={International Conference on Machine Learning},
  pages={950--959},
  year={2020},
  organization={PMLR}
}

torchsort's Issues

Parallelize batch dimension

Should be trivial, just add another dimension to the tensors, and use #pragma omp parallel for on the outer loop.

Cannot install using poetry

Hello
I am installing the package using poetry and I get the following error. Any tips on how I can do this?

  Command ['/home/ashkan/w/numersub/.venv/bin/python', '-m', 'pip', 'install', '--use-pep517', '--disable-pip-version-check', '--prefix', '/home/ashkan/w/numersub/.venv', '--no-deps', '/home/ashkan/.cache/pypoetry/artifacts/0e/71/4c/36d2482ca69c4e6c8ff5dbed48ecc16ad2bf8ea2c110395f1b50bfffc5/torchsort-0.1.9.tar.gz'] errored with the following return code 1, and output:
  Processing /home/ashkan/.cache/pypoetry/artifacts/0e/71/4c/36d2482ca69c4e6c8ff5dbed48ecc16ad2bf8ea2c110395f1b50bfffc5/torchsort-0.1.9.tar.gz
    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'error'
    error: subprocess-exited-with-error

    × Getting requirements to build wheel did not run successfully.
    │ exit code: 1
    ╰─> [17 lines of output]
        Traceback (most recent call last):
          File "/home/ashkan/w/numersub/.venv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
            main()
          File "/home/ashkan/w/numersub/.venv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
            json_out['return_val'] = hook(**hook_input['kwargs'])
          File "/home/ashkan/w/numersub/.venv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
            return hook(config_settings)
          File "/tmp/pip-build-env-zc4jpjhm/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
            return self._get_build_requires(config_settings, requirements=['wheel'])
          File "/tmp/pip-build-env-zc4jpjhm/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
            self.run_setup()
          File "/tmp/pip-build-env-zc4jpjhm/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 483, in run_setup
            super(_BuildMetaLegacyBackend,
          File "/tmp/pip-build-env-zc4jpjhm/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 335, in run_setup
            exec(code, locals())
          File "<string>", line 8, in <module>
        ModuleNotFoundError: No module named 'torch'
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
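
The traceback suggests that setup.py imports torch (line 8) while pip collects build requirements inside an isolated build environment (the /tmp/pip-build-env-... overlay), where torch is not installed. A quick check like the one below, run with the interpreter of the target virtualenv, shows whether torch is importable there. The helper name `is_importable` is illustrative only.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # torchsort's build appears to need torch at setup time (see traceback above).
    print("torch importable:", is_importable("torch"))
```

If torch is importable in the virtualenv but the build still fails, pip's --no-build-isolation flag, which builds against the packages already installed, is one option to try. Whether and how poetry can pass that flag is not covered by this thread.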

Problem installing when no GPU present (in docker build step for example)

Doesn't install during docker build phase (that does not have GPUs configured).

Get error: /root/miniconda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0

If I install on the same image after running it with GPUs enabled it installs fine.
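
The Install section of the README suggests exporting TORCH_CUDA_ARCH_LIST when building without a CUDA runtime. A minimal sketch of doing the same from a Python build script follows; the helper name `install_env` is hypothetical, and the pip invocation is shown only as a comment.

```python
import os


def install_env(arch_list="Pascal;Volta;Turing;Ampere", base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set.

    An existing value is kept, so a user-provided setting wins.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("TORCH_CUDA_ARCH_LIST", arch_list)
    return env


# Possible usage (not executed here):
#   import subprocess, sys
#   subprocess.run([sys.executable, "-m", "pip", "install", "torchsort"],
#                  env=install_env(), check=True)
print(install_env(base_env={})["TORCH_CUDA_ARCH_LIST"])
```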

Hi, NVIDIA CUDA version >= 11.4 does not seem to install successfully.

I tried to install the package on a Tesla A100 and on a GeForce RTX 3090 with CUDA version 11.4; both attempts failed. Can you provide some help, please? Thank you very much!

Unable to install either via pip or from source in docker

I want to install in a docker container created using NVIDIA Container Toolkit.

However, I am getting the following error message every time despite trying out all the alternative routes that are listed in the README or in other issues:

root:/workspace# pip install torchsort
WARNING: The directory '/root/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting torchsort
  Downloading torchsort-0.1.7.tar.gz (11 kB)
Requirement already satisfied: torch in /opt/conda/lib/python3.8/site-packages (from torchsort) (1.8.1)
Requirement already satisfied: typing_extensions in /opt/conda/lib/python3.8/site-packages (from torch->torchsort) (3.7.4.3)
Requirement already satisfied: numpy in /root/.local/lib/python3.8/site-packages (from torch->torchsort) (1.21.0)
Building wheels for collected packages: torchsort
  Building wheel for torchsort (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-s3448tu7/torchsort/setup.py'"'"'; __file__='"'"'/tmp/pip-install-s3448tu7/torchsort/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-kmuaqe1f
       cwd: /tmp/pip-install-s3448tu7/torchsort/
  Complete output (60 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.8
  creating build/lib.linux-x86_64-3.8/torchsort
  copying torchsort/__init__.py -> build/lib.linux-x86_64-3.8/torchsort
  copying torchsort/ops.py -> build/lib.linux-x86_64-3.8/torchsort
  running egg_info
  writing torchsort.egg-info/PKG-INFO
  writing dependency_links to torchsort.egg-info/dependency_links.txt
  writing requirements to torchsort.egg-info/requires.txt
  writing top-level names to torchsort.egg-info/top_level.txt
  reading manifest file 'torchsort.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'torchsort.egg-info/SOURCES.txt'
  copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.8/torchsort
  copying torchsort/isotonic_cuda.cu -> build/lib.linux-x86_64-3.8/torchsort
  running build_ext
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-install-s3448tu7/torchsort/setup.py", line 52, in <module>
      setup(
    File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
      return distutils.core.setup(**attrs)
    File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/opt/conda/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 290, in run
      self.run_command('build')
    File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/opt/conda/lib/python3.8/distutils/command/build.py", line 135, in run
      self.run_command(cmd_name)
    File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
      _build_ext.run(self)
    File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
      self.build_extensions()
    File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 378, in build_extensions
      self._check_abi()
    File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 734, in _check_abi
      check_compiler_abi_compatibility(compiler)
    File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 282, in check_compiler_abi_compatibility
      if not check_compiler_ok_for_platform(compiler):
    File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 242, in check_compiler_ok_for_platform
      which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
    File "/opt/conda/lib/python3.8/subprocess.py", line 415, in check_output
      return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
    File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['which', 'g++']' returned non-zero exit status 1.
  ----------------------------------------
  ERROR: Failed building wheel for torchsort
  Running setup.py clean for torchsort
Failed to build torchsort
Installing collected packages: torchsort
    Running setup.py install for torchsort ... error
    ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-s3448tu7/torchsort/setup.py'"'"'; __file__='"'"'/tmp/pip-install-s3448tu7/torchsort/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-r_bd2xc3/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.8/torchsort
         cwd: /tmp/pip-install-s3448tu7/torchsort/
    Complete output (62 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/torchsort
    copying torchsort/__init__.py -> build/lib.linux-x86_64-3.8/torchsort
    copying torchsort/ops.py -> build/lib.linux-x86_64-3.8/torchsort
    running egg_info
    writing torchsort.egg-info/PKG-INFO
    writing dependency_links to torchsort.egg-info/dependency_links.txt
    writing requirements to torchsort.egg-info/requires.txt
    writing top-level names to torchsort.egg-info/top_level.txt
    reading manifest file 'torchsort.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'torchsort.egg-info/SOURCES.txt'
    copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.8/torchsort
    copying torchsort/isotonic_cuda.cu -> build/lib.linux-x86_64-3.8/torchsort
    running build_ext
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-s3448tu7/torchsort/setup.py", line 52, in <module>
        setup(
      File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/opt/conda/lib/python3.8/distutils/command/install.py", line 545, in run
        self.run_command('build')
      File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/opt/conda/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 378, in build_extensions
        self._check_abi()
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 734, in _check_abi
        check_compiler_abi_compatibility(compiler)
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 282, in check_compiler_abi_compatibility
        if not check_compiler_ok_for_platform(compiler):
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 242, in check_compiler_ok_for_platform
        which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
      File "/opt/conda/lib/python3.8/subprocess.py", line 415, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['which', 'g++']' returned non-zero exit status 1.
    ----------------------------------------
ERROR: Command errored out with exit status 1: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-s3448tu7/torchsort/setup.py'"'"'; __file__='"'"'/tmp/pip-install-s3448tu7/torchsort/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-r_bd2xc3/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.8/torchsort Check the logs for full command output.

When I try to install from the source by cloning and running python setup.py install, I get the following error that appears to be identical to the issue encountered via pip:

root:/workspace/torchsort# python setup.py install
running install
running bdist_egg
running egg_info
creating torchsort.egg-info
writing torchsort.egg-info/PKG-INFO
writing dependency_links to torchsort.egg-info/dependency_links.txt
writing requirements to torchsort.egg-info/requires.txt
writing top-level names to torchsort.egg-info/top_level.txt
writing manifest file 'torchsort.egg-info/SOURCES.txt'
reading manifest file 'torchsort.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'torchsort.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/torchsort
copying torchsort/__init__.py -> build/lib.linux-x86_64-3.8/torchsort
copying torchsort/ops.py -> build/lib.linux-x86_64-3.8/torchsort
copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.8/torchsort
copying torchsort/isotonic_cuda.cu -> build/lib.linux-x86_64-3.8/torchsort
running build_ext
Traceback (most recent call last):
  File "setup.py", line 52, in <module>
    setup(
  File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 67, in run
    self.do_egg_install()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 109, in do_egg_install
    self.run_command('bdist_egg')
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 167, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 153, in call_command
    self.run_command(cmdname)
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/opt/conda/lib/python3.8/distutils/command/install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 378, in build_extensions
    self._check_abi()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 734, in _check_abi
    check_compiler_abi_compatibility(compiler)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 282, in check_compiler_abi_compatibility
    if not check_compiler_ok_for_platform(compiler):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 242, in check_compiler_ok_for_platform
    which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
  File "/opt/conda/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['which', 'g++']' returned non-zero exit status 1.

Again, note that I've gone through the following steps in the README before running any of the above:

Conda Installation
On some systems the package may not compile with pip install in conda environments. If this happens you may need to:
Install g++ with conda install -c conda-forge gxx_linux-64
Set export variable export CXX=/path/to/miniconda3/envs/env_name/bin/x86_64-conda_cos6-linux-gnu-g++
If still failing, export variable export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/miniconda3/lib
Thanks to @levnikmyskin for pointing this out!
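
Each traceback above ends with `which g++` returning a non-zero exit status, which suggests that no C++ compiler is visible on PATH inside the container. The sketch below performs a similar lookup from Python. It assumes the compiler is taken from the CXX environment variable when set and falls back to g++ otherwise, which matches what the tracebacks in these issues show but is not checked against PyTorch's source. The helper name `find_cxx_compiler` is hypothetical.

```python
import os
import shutil


def find_cxx_compiler(env=None):
    """Return the resolved path of the C++ compiler, or None if not found."""
    env = os.environ if env is None else env
    candidate = env.get("CXX", "g++")  # assumption: CXX overrides the default
    return shutil.which(candidate, path=env.get("PATH"))


if __name__ == "__main__":
    compiler = find_cxx_compiler()
    if compiler is None:
        print("No C++ compiler found; install one or point CXX at it.")
    else:
        print("Using compiler:", compiler)
```

If this reports no compiler even after the conda steps, the CXX path may not match the actual environment name or compiler filename.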

Some bug

@teddykoker
Hello, I am using torchsort.
But I found there is something I can't understand when I ran the following code:

    x = torch.tensor([[1., -2., 2., 3., 0.5, -1.]])
    print(torchsort.soft_rank(x))

I got tensor([[3.8750, 1.0000, 4.8750, 5.8750, 3.3750, 2.0000]]) rather than [4, 1, 5, 6, 3, 2]
Why?
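
A likely explanation, going by the Usage section: soft_rank returns a smoothed approximation of the ranks, and its accuracy depends on regularization_strength (default 1.0). Values that lie close together, here 0.5 and 1, get pulled toward each other. The pure-Python sketch below illustrates the l2 case as described in Blondel et al. (projection onto the permutahedron via isotonic regression). It is not torchsort's code, and the names `isotonic_nonincreasing` and `soft_rank_l2` are hypothetical.

```python
def isotonic_nonincreasing(y):
    """Least-squares fit of a non-increasing sequence (pool adjacent violators)."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([v, 1])
        # Merge while the previous block's mean is below the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    out = []
    for total, count in blocks:
        out.extend([total / count] * count)
    return out


def soft_rank_l2(values, regularization_strength=1.0):
    """Soft ranks (ascending, 1-based) of a 1-d list, l2 regularization."""
    n = len(values)
    theta = [v / regularization_strength for v in values]
    order = sorted(range(n), key=lambda i: theta[i], reverse=True)
    s = [theta[i] for i in order]          # scaled values, descending
    w = [float(n - k) for k in range(n)]   # n, n-1, ..., 1
    dual = isotonic_nonincreasing([a - b for a, b in zip(s, w)])
    ranks = [0.0] * n
    for k, i in enumerate(order):
        ranks[i] = s[k] - dual[k]          # undo the sort
    return ranks


x = [1.0, -2.0, 2.0, 3.0, 0.5, -1.0]
print(soft_rank_l2(x, 1.0))
# -> [3.875, 1.0, 4.875, 5.875, 3.375, 2.0]
print([round(r, 4) for r in soft_rank_l2(x, 0.1)])
# -> [4.0, 1.0, 5.0, 6.0, 3.0, 2.0]
```

Under these assumptions, the sketch reproduces the output reported above at strength 1.0 and gives the exact ranks at strength 0.1, so passing a smaller regularization_strength to torchsort.soft_rank should bring its result closer to [4, 1, 5, 6, 3, 2].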

cuda TypeError: 'NoneType' object is not callable

>>> import torch
>>> import torchsort
>>> x = torch.tensor([[8., 0., 5., 3., 2., 1., 6., 7., 9.]], requires_grad=True).cuda()
>>> y = torchsort.soft_sort(x)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shuiy/anaconda3/envs/pytorch_py3/lib/python3.7/site-packages/torchsort/ops.py", line 48, in soft_sort
    return SoftSort.apply(values, regularization, regularization_strength)
  File "/home/shuiy/anaconda3/envs/pytorch_py3/lib/python3.7/site-packages/torchsort/ops.py", line 132, in forward
    sol = isotonic_l2[s.device.type](w - s)
TypeError: 'NoneType' object is not callable

In a Jupyter notebook, the error is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_8938/1075883647.py in <module>
      1 x = torch.tensor([[8., 0., 5., 3., 2., 1., 6., 7., 9.]], requires_grad=True).cuda()
----> 2 y = torchsort.soft_sort(x)

~/anaconda3/envs/pytorch_py3/lib/python3.7/site-packages/torchsort/ops.py in soft_sort(values, regularization, regularization_strength)
     46     if regularization not in ["l2", "kl"]:
     47         raise ValueError(f"'regularization' should be a 'l2' or 'kl'")
---> 48     return SoftSort.apply(values, regularization, regularization_strength)
     49 
     50 

~/anaconda3/envs/pytorch_py3/lib/python3.7/site-packages/torchsort/ops.py in forward(ctx, tensor, regularization, regularization_strength)
    130         # note reverse order of args
    131         if ctx.regularization == "l2":
--> 132             sol = isotonic_l2[s.device.type](w - s)
    133         else:
    134             sol = isotonic_kl[s.device.type](w, s)

TypeError: 'NoneType' object is not callable

If x is on the CPU, the code runs fine.
python 3.7.10, pytorch 1.9.0 , cudatoolkit=11.1, ubuntu 18.04

How to build the CUDA extension?

PyTorch 1.9 Crash

I am getting the following error after upgrading to torch 1.9

double free or corruption (!prev)
Aborted (core dumped)

Reproduce

import torchsort
import torch

torchsort.soft_rank(torch.randn(10, 10))
torchsort.soft_rank(torch.randn(10, 10).cuda())  # this too

python -c "import torchsort, torch; torchsort.soft_rank(torch.randn(10, 10))"
python -c "import torchsort, torch; torchsort.soft_rank(torch.randn(10, 10).cuda())"

System Details

Driver Version: 460.32.03
CUDA Version: 11.2
python: 3.8.10
Ubuntu 18.04.5 LTS x86_64
Kernel: 4.15.0-144-generic

Can I sort by specific column?

Is there any way to sort a tensor by a given column?

For example, sorting by the first column:

input_tensor = torch.tensor([
        [1, 5], 
        [30, 30], 
        [6, 9], 
        [80, -2]
])

target_tensor = torch.tensor([
        [80, -2],
        [30, 30], 
        [6, 9], 
        [1, 5], 
])

RuntimeError: Tensors of type TensorImpl do not have sizes

I recently upgraded to torch 1.12.1 and now get a runtime error with torchsort (this was working in 1.10.1):

torch==1.12.1+cu113
torchsort==0.1.9

import torch
import torchsort

values = torch.randn( 100, 1)
torchsort.soft_rank( values, regularization='l2', regularization_strength=0.0001)

RuntimeError: Tensors of type TensorImpl do not have sizes

Is there a way to use this to find the index of the biggest number in a torch vector?

When you use the regular torch.sort, the ranks vector that's returned is sorted, so if I want the index of the maximum value I just take the last element of the ranks vector.
The same goes for the index of the 2nd biggest element: I just take ranks_vec[-2].

Unfortunately the regular torch.sort does not support a gradient.
I've been trying to think of a way to achieve this with your torchsort, any chance that you have any clue?

Appreciate it!

pip install fails

pip install torchsort
Collecting torchsort
Using cached torchsort-0.1.8.tar.gz (15 kB)
Requirement already satisfied: torch in ./miniconda3/envs/numerai/lib/python3.8/site-packages (from torchsort) (1.10.1)
Requirement already satisfied: typing_extensions in ./miniconda3/envs/numerai/lib/python3.8/site-packages (from torch->torchsort) (3.10.0.2)
Building wheels for collected packages: torchsort
Building wheel for torchsort (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/gbrecht/miniconda3/envs/numerai/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/setup.py'"'"'; __file__='"'"'/tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-ae5914wo
cwd: /tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/
Complete output (63 lines):
running bdist_wheel
/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py:381: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/torchsort
copying torchsort/ops.py -> build/lib.linux-x86_64-3.8/torchsort
copying torchsort/__init__.py -> build/lib.linux-x86_64-3.8/torchsort
running egg_info
writing torchsort.egg-info/PKG-INFO
writing dependency_links to torchsort.egg-info/dependency_links.txt
writing requirements to torchsort.egg-info/requires.txt
writing top-level names to torchsort.egg-info/top_level.txt
reading manifest file 'torchsort.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'torchsort.egg-info/SOURCES.txt'
copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.8/torchsort
copying torchsort/isotonic_cuda.cu -> build/lib.linux-x86_64-3.8/torchsort
running build_ext
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/setup.py", line 52, in <module>
setup(
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 299, in run
self.run_command('build')
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 390, in build_extensions
self._check_abi()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 761, in _check_abi
check_compiler_abi_compatibility(compiler)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 294, in check_compiler_abi_compatibility
if not check_compiler_ok_for_platform(compiler):
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 254, in check_compiler_ok_for_platform
which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['which', '/home/gbrecht/miniconda3/envs/env_name/bin/x86_64-conda_cos6-linux-gnu-g++']' returned non-zero exit status 1.

ERROR: Failed building wheel for torchsort
Running setup.py clean for torchsort
Failed to build torchsort
Installing collected packages: torchsort
Running setup.py install for torchsort ... error
ERROR: Command errored out with exit status 1:
command: /home/gbrecht/miniconda3/envs/numerai/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/setup.py'"'"'; file='"'"'/tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-ebscxvww/install-record.txt --single-version-externally-managed --compile --install-headers /home/gbrecht/miniconda3/envs/numerai/include/python3.8/torchsort
cwd: /tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/
Complete output (65 lines):
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/torchsort
copying torchsort/ops.py -> build/lib.linux-x86_64-3.8/torchsort
copying torchsort/__init__.py -> build/lib.linux-x86_64-3.8/torchsort
running egg_info
writing torchsort.egg-info/PKG-INFO
writing dependency_links to torchsort.egg-info/dependency_links.txt
writing requirements to torchsort.egg-info/requires.txt
writing top-level names to torchsort.egg-info/top_level.txt
/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py:381: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'torchsort.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'torchsort.egg-info/SOURCES.txt'
copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.8/torchsort
copying torchsort/isotonic_cuda.cu -> build/lib.linux-x86_64-3.8/torchsort
running build_ext
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/setup.py", line 52, in <module>
setup(
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
return orig.install.run(self)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/command/install.py", line 545, in run
self.run_command('build')
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 390, in build_extensions
self._check_abi()
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 761, in _check_abi
check_compiler_abi_compatibility(compiler)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 294, in check_compiler_abi_compatibility
if not check_compiler_ok_for_platform(compiler):
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 254, in check_compiler_ok_for_platform
which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/gbrecht/miniconda3/envs/numerai/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['which', '/home/gbrecht/miniconda3/envs/env_name/bin/x86_64-conda_cos6-linux-gnu-g++']' returned non-zero exit status 1.
----------------------------------------
ERROR: Command errored out with exit status 1: /home/gbrecht/miniconda3/envs/numerai/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/setup.py'"'"'; file='"'"'/tmp/pip-install-13efi_54/torchsort_345be19e602e41578bf71da3cb5a3cee/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-ebscxvww/install-record.txt --single-version-externally-managed --compile --install-headers /home/gbrecht/miniconda3/envs/numerai/include/python3.8/torchsort Check the logs for full command output.
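The traceback above ends with `which` failing on a compiler path under `envs/env_name`, while every other path in the log uses the `numerai` environment. That suggests the `CXX` variable still holds the placeholder from the install instructions. A minimal sketch of a pre-install check follows; `check_cxx` is a hypothetical helper written for this illustration, not part of torchsort or PyTorch.

```python
import os
import shutil


def check_cxx(environ=None):
    """Return (ok, message) describing whether $CXX points at a usable compiler."""
    environ = os.environ if environ is None else environ
    cxx = environ.get("CXX")
    if not cxx:
        return True, "CXX is not set; the build will use the default compiler"
    # Assumption: 'env_name' is the literal placeholder copied from the instructions.
    if "env_name" in cxx:
        return False, f"CXX still contains the placeholder 'env_name': {cxx}"
    resolved = shutil.which(cxx)
    if resolved is None:
        return False, f"CXX does not resolve to an executable: {cxx}"
    return True, f"CXX resolves to {resolved}"


if __name__ == "__main__":
    ok, message = check_cxx()
    print("OK" if ok else "PROBLEM", "-", message)
```

Running this in the same shell as `pip install` shows whether the exported path exists before the build reaches PyTorch's compiler check.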

Build Errors When Upgrading to 1.7

I tried to upgrade torchsort from 0.1.6 to 0.1.7.
I get the following build errors on Ubuntu:

  • Updating torchsort (0.1.6 -> 0.1.7): Failed

  EnvCommandError

  Command ['/home/jkk/w/N9MER/.venv/bin/pip', 'install', '--no-deps', '-U', 'file:///home/jkk/.cache/pypoetry/artifacts/dd/3b/ee/bd011dc524042d73babfc5cb3551973002870652949f15af2cd67740a4/torchsort-0.1.7.tar.gz'] errored with the following return code 1, and output:
  Processing /home/jkk/.cache/pypoetry/artifacts/dd/3b/ee/bd011dc524042d73babfc5cb3551973002870652949f15af2cd67740a4/torchsort-0.1.7.tar.gz
  Building wheels for collected packages: torchsort
    Building wheel for torchsort (setup.py): started
    Building wheel for torchsort (setup.py): finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /home/jkk/w/N9MER/.venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-n22hcjuz/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-n22hcjuz/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-k64ka9oe
         cwd: /tmp/pip-req-build-n22hcjuz/
    Complete output (37 lines):
    running bdist_wheel
    /home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py:370: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
      warnings.warn(msg.format('we could not find ninja.'))
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.9
    creating build/lib.linux-x86_64-3.9/torchsort
    copying torchsort/__init__.py -> build/lib.linux-x86_64-3.9/torchsort
    copying torchsort/ops.py -> build/lib.linux-x86_64-3.9/torchsort
    running egg_info
    writing torchsort.egg-info/PKG-INFO
    writing dependency_links to torchsort.egg-info/dependency_links.txt
    writing requirements to torchsort.egg-info/requires.txt
    writing top-level names to torchsort.egg-info/top_level.txt
    reading manifest file 'torchsort.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'torchsort.egg-info/SOURCES.txt'
    copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.9/torchsort
    copying torchsort/isotonic_cuda.cu -> build/lib.linux-x86_64-3.9/torchsort
    running build_ext
    building 'torchsort.isotonic_cpu' extension
    creating build/temp.linux-x86_64-3.9
    creating build/temp.linux-x86_64-3.9/torchsort
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/TH -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/THC -I/home/jkk/w/N9MER/.venv/include -I/home/jkk/.pyenv/versions/3.9.4/include/python3.9 -c torchsort/isotonic_cpu.cpp -o build/temp.linux-x86_64-3.9/torchsort/isotonic_cpu.o -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cpu -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
    g++ -pthread -shared -L/home/jkk/.pyenv/versions/3.9.4/lib -L/home/jkk/.pyenv/versions/3.9.4/lib build/temp.linux-x86_64-3.9/torchsort/isotonic_cpu.o -L/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.9/torchsort/isotonic_cpu.cpython-39-x86_64-linux-gnu.so
    building 'torchsort.isotonic_cuda' extension
    /usr/bin/nvcc -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/TH -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/THC -I/home/jkk/w/N9MER/.venv/include -I/home/jkk/.pyenv/versions/3.9.4/include/python3.9 -c torchsort/isotonic_cuda.cu -o build/temp.linux-x86_64-3.9/torchsort/isotonic_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 -std=c++14
    /usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
    /usr/include/c++/10/chrono:473:154:   required from here
    /usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
      428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
          |                           ^~~~~~
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
    error: command '/usr/bin/nvcc' failed with exit code 1
    ----------------------------------------
    ERROR: Failed building wheel for torchsort
    Running setup.py clean for torchsort
  Failed to build torchsort
  Installing collected packages: torchsort
    Attempting uninstall: torchsort
      Found existing installation: torchsort 0.1.6
      Uninstalling torchsort-0.1.6:
        Successfully uninstalled torchsort-0.1.6
      Running setup.py install for torchsort: started
      Running setup.py install for torchsort: finished with status 'error'
      ERROR: Command errored out with exit status 1:
       command: /home/jkk/w/N9MER/.venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-n22hcjuz/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-n22hcjuz/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-87f187fd/install-record.txt --single-version-externally-managed --compile --install-headers /home/jkk/w/N9MER/.venv/include/site/python3.9/torchsort
           cwd: /tmp/pip-req-build-n22hcjuz/
      Complete output (37 lines):
      running install
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.9
      creating build/lib.linux-x86_64-3.9/torchsort
      copying torchsort/__init__.py -> build/lib.linux-x86_64-3.9/torchsort
      copying torchsort/ops.py -> build/lib.linux-x86_64-3.9/torchsort
      running egg_info
      writing torchsort.egg-info/PKG-INFO
      writing dependency_links to torchsort.egg-info/dependency_links.txt
      writing requirements to torchsort.egg-info/requires.txt
      writing top-level names to torchsort.egg-info/top_level.txt
      /home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py:370: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      reading manifest file 'torchsort.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      writing manifest file 'torchsort.egg-info/SOURCES.txt'
      copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.9/torchsort
      copying torchsort/isotonic_cuda.cu -> build/lib.linux-x86_64-3.9/torchsort
      running build_ext
      building 'torchsort.isotonic_cpu' extension
      creating build/temp.linux-x86_64-3.9
      creating build/temp.linux-x86_64-3.9/torchsort
      gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/TH -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/THC -I/home/jkk/w/N9MER/.venv/include -I/home/jkk/.pyenv/versions/3.9.4/include/python3.9 -c torchsort/isotonic_cpu.cpp -o build/temp.linux-x86_64-3.9/torchsort/isotonic_cpu.o -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cpu -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      g++ -pthread -shared -L/home/jkk/.pyenv/versions/3.9.4/lib -L/home/jkk/.pyenv/versions/3.9.4/lib build/temp.linux-x86_64-3.9/torchsort/isotonic_cpu.o -L/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.9/torchsort/isotonic_cpu.cpython-39-x86_64-linux-gnu.so
      building 'torchsort.isotonic_cuda' extension
      /usr/bin/nvcc -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/TH -I/home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torch/include/THC -I/home/jkk/w/N9MER/.venv/include -I/home/jkk/.pyenv/versions/3.9.4/include/python3.9 -c torchsort/isotonic_cuda.cu -o build/temp.linux-x86_64-3.9/torchsort/isotonic_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 -std=c++14
      /usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
      /usr/include/c++/10/chrono:473:154:   required from here
      /usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
        428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
            |                           ^~~~~~
      Please submit a full bug report,
      with preprocessed source if appropriate.
      See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
      error: command '/usr/bin/nvcc' failed with exit code 1
      ----------------------------------------
    Rolling back uninstall of torchsort
    Moving to /home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torchsort-0.1.6.dist-info/
     from /home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/~orchsort-0.1.6.dist-info
    Moving to /home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/torchsort/
     from /home/jkk/w/N9MER/.venv/lib/python3.9/site-packages/~-rchsort
  ERROR: Command errored out with exit status 1: /home/jkk/w/N9MER/.venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-n22hcjuz/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-n22hcjuz/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-87f187fd/install-record.txt --single-version-externally-managed --compile --install-headers /home/jkk/w/N9MER/.venv/include/site/python3.9/torchsort Check the logs for full command output.

I have the following nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

Any help is appreciated!

Pip install throws error

Environment:

Windows 11
Python == 3.9.13
Visual C++ == 14.34

Error:

Building wheels for collected packages: torchsort
  Building wheel for torchsort (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      running bdist_wheel
      C:\Users\Sam\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py:387: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-3.9
      creating build\lib.win-amd64-3.9\torchsort
      copying torchsort\ops.py -> build\lib.win-amd64-3.9\torchsort
      copying torchsort\__init__.py -> build\lib.win-amd64-3.9\torchsort
      running egg_info
      writing torchsort.egg-info\PKG-INFO
      writing dependency_links to torchsort.egg-info\dependency_links.txt
      writing requirements to torchsort.egg-info\requires.txt
      writing top-level names to torchsort.egg-info\top_level.txt
      reading manifest file 'torchsort.egg-info\SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      writing manifest file 'torchsort.egg-info\SOURCES.txt'
      copying torchsort\isotonic_cpu.cpp -> build\lib.win-amd64-3.9\torchsort
      copying torchsort\isotonic_cuda.cu -> build\lib.win-amd64-3.9\torchsort
      running build_ext
      C:\Users\Sam\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py:322: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
        warnings.warn(f'Error checking compiler version for {compiler}: {error}')
      building 'torchsort.isotonic_cpu' extension
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for torchsort
  Running setup.py clean for torchsort
Failed to build torchsort
Installing collected packages: torchsort
  Running setup.py install for torchsort ... error
  error: subprocess-exited-with-error

  × Running setup.py install for torchsort did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      running install
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-3.9
      creating build\lib.win-amd64-3.9\torchsort
      copying torchsort\ops.py -> build\lib.win-amd64-3.9\torchsort
      copying torchsort\__init__.py -> build\lib.win-amd64-3.9\torchsort
      running egg_info
      writing torchsort.egg-info\PKG-INFO
      writing dependency_links to torchsort.egg-info\dependency_links.txt
      writing requirements to torchsort.egg-info\requires.txt
      writing top-level names to torchsort.egg-info\top_level.txt
      C:\Users\Sam\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py:387: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      reading manifest file 'torchsort.egg-info\SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      writing manifest file 'torchsort.egg-info\SOURCES.txt'
      copying torchsort\isotonic_cpu.cpp -> build\lib.win-amd64-3.9\torchsort
      copying torchsort\isotonic_cuda.cu -> build\lib.win-amd64-3.9\torchsort
      running build_ext
      C:\Users\Sam\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\cpp_extension.py:322: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
        warnings.warn(f'Error checking compiler version for {compiler}: {error}')
      building 'torchsort.isotonic_cpu' extension
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> torchsort

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Does torchsort support fp16 training?

Hi, thank you for your efforts. I want to use torchsort in an fp16 setting. It seems that torchsort doesn't support fp16 yet, is that right? Looking forward to your reply!

ranking returns only 1 values for all samples

Hello,

I am having the problem described in the title. While debugging, I realized that I do not really understand why the input tensor has to be 2D.
My target (in the pandas world) is 1D, as are the predictions. My current PyTorch implementation uses shape [len(X), 1] for both.

When I run the code:

torchsort.soft_rank(torch.rand(len(pred), 1, device=device, dtype=pred.dtype), regularization='l2', regularization_strength=1)

the resulting tensor is
tensor([[1.], [1.], [1.], ..., [1.], [1.], [1.]], device='cuda:0')

which is logical in the sense that every row contains a single sample, so the rank is 1 each time. I want to rank the column, though. Of course, calling torchsort on tensor.squeeze() violates the 2D requirement, and when I run a toy example on a "true" 2D tensor

torchsort.soft_rank(torch.rand(len(pred), 2, device=device, dtype=pred.dtype), regularization='l2', regularization_strength=1)

I would expect each row of the result to be [1, 2] or [2, 1], which is not what I get. Instead, the result is

tensor([[1.3748, 1.6252], [1.7719, 1.2281], [1.7958, 1.2042], ..., [1.1949, 1.8051], [1.5149, 1.4851], [1.2594, 1.7406]], device='cuda:0')

which is probably because these are gradients and not the real values.

So I guess my real question is: how do I soft_rank a 1D tensor without getting all 1s back? That result makes my loss NaN.
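The shape question above can be illustrated without torchsort. The sketch below uses a plain-Python hard ranking as a stand-in for a ranking op that works on each row independently; `rank_last_dim` is a hypothetical helper for this illustration only, not torchsort's implementation.

```python
def rank_last_dim(rows):
    """Hard-rank each row independently (1 = smallest value in that row)."""
    ranked = []
    for row in rows:
        order = sorted(range(len(row)), key=lambda i: row[i])
        ranks = [0] * len(row)
        for position, index in enumerate(order):
            ranks[index] = position + 1
        ranked.append(ranks)
    return ranked


values = [0.4, 0.1, 0.9]

# Shape (n, 1): every row holds a single element, so every rank is 1.
column = [[v] for v in values]
print(rank_last_dim(column))  # [[1], [1], [1]]

# Shape (1, n): one row holding all n elements, ranked against each other.
row = [values]
print(rank_last_dim(row))  # [[2, 1, 3]]
```

By the same reasoning, a column of predictions would presumably need to be passed as a single row, for example `pred.reshape(1, -1)`, and reshaped back afterwards. The fractional values in the two-column example are also consistent with soft ranks at `regularization_strength=1` rather than gradients: each row sums to 3, as hard ranks [1, 2] would.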

Understanding regularization strength

I use torchsort in my loss function. My issue is that it sometimes returns NaN, depending on the regularization strength.
My batches are between 1k and 5k samples and there are ~1k features.

Is there any documentation on regularization strength? Scrolling through the code, I cannot find anything.

Is there a way to estimate a good regularization strength value depending on your data?

I understand that 1 is the default value and reducing regularization strength brings the result closer to the true ordering. So, is the following a good heuristic?

Stay at 1 as long as your model is learning.
If 1 does not return a gradient the optimizer can work with, try a lower value.
If no value both returns a valid ordering and works for the optimizer ... RIP
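To make the effect of the strength concrete, here is a minimal pure-Python sketch of an l2-regularized soft rank, following the projection-plus-isotonic-regression formulation in Blondel et al. It is an illustration under that assumption, not torchsort's code, and the names `isotonic_decreasing` and `soft_rank_l2` are hypothetical.

```python
def isotonic_decreasing(y):
    """Pool-adjacent-violators: least-squares fit of a non-increasing sequence to y."""
    blocks = []  # each block is [sum, count]
    for value in y:
        blocks.append([value, 1])
        # Merge while the previous block's mean is below the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def soft_rank_l2(values, regularization_strength=1.0):
    """Soft ranks of a 1-D list (ascending: the smallest value gets the lowest rank)."""
    n = len(values)
    theta = [v / regularization_strength for v in values]
    order = sorted(range(n), key=lambda i: theta[i], reverse=True)
    s = [theta[i] for i in order]
    w = [n - k for k in range(n)]  # n, n-1, ..., 1
    dual = isotonic_decreasing([si - wi for si, wi in zip(s, w)])
    ranks = [0.0] * n
    for k, i in enumerate(order):
        ranks[i] = s[k] - dual[k]
    return ranks


values = [0.2, 0.7]
for strength in (10.0, 1.0, 0.1):
    print(strength, [round(r, 4) for r in soft_rank_l2(values, strength)])
# strength 1.0 gives [1.25, 1.75]; strength 0.1 gives [1.0, 2.0]
```

In this sketch the inputs are divided by the strength before ranking, so what matters is the strength relative to the gaps between values: small gaps or a large strength pull all ranks toward the mean rank, while a small strength recovers the hard ranks. A loss that normalizes the ranks could then divide by a value near zero, which may be one source of NaN, though that is a guess and not something the thread confirms.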

Couldn't compile using setup.py

Greetings. I would like to install this library to use it in my current framework, which requires the GPU build. However, I ran into an issue when executing python setup.py install, with the following error messages:

running install
running bdist_egg
running egg_info
writing torchsort.egg-info/PKG-INFO
writing dependency_links to torchsort.egg-info/dependency_links.txt
writing requirements to torchsort.egg-info/requires.txt
writing top-level names to torchsort.egg-info/top_level.txt
reading manifest file 'torchsort.egg-info/SOURCES.txt'
writing manifest file 'torchsort.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'torchsort.isotonic_cpu' extension
Emitting ninja build file /data/repositories/SVAE/torchsort/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.8.2
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fdebug-prefix-map=/build/python3.7-a56wZI/python3.7-3.7.10=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 /data/repositories/SVAE/torchsort/build/temp.linux-x86_64-3.7/torchsort/isotonic_cpu.o -L/data/repositories/SVAE/env/lib/python3.7/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.7/torchsort/isotonic_cpu.cpython-37m-x86_64-linux-gnu.so
x86_64-linux-gnu-g++: error: /data/repositories/SVAE/torchsort/build/temp.linux-x86_64-3.7/torchsort/isotonic_cpu.o: No such file or directory
error: command 'x86_64-linux-gnu-g++' failed with exit status 1

I believe the compilation step failed somehow, but without printing any explicit error message.
These are my installed (possibly related) packages:

Python 3.7.10
torch 1.7.1
torchvision 0.8.2
CUDA version 11.0
Nvidia Cuda Toolkit (NVCC) 11.2
Ninja-Build 1.8.2

I am not sure whether this information is sufficient; please give me some hints.
I would really appreciate any help in solving this problem.
Thanks!
Regards.

This is great, how can I learn to do it myself?

Thanks for sharing this code. Is it a rewritten version of https://github.com/google-research/fast-soft-sort? What is the benefit of this version compared to Google's fast-soft-sort?
What really interests me in this work is the implementation. I find it very hard to add new functionality to PyTorch with C++ and CUDA. I know it is a lot to ask, but it would be much appreciated if you could write a tutorial (or make a video) on how to do this for other functions.
That would be a huge help!

Failing to recognise torchsort cuda even when torch with cuda is successfully installed

Thanks for the awesome package!

system: linux (arch)
python: 3.8

So I've got a weird chicken-and-egg issue.

I successfully installed PyTorch using the suggested conda command

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

and verified that it recognises the GPU:
a = torch.tensor([1,2,3]).cuda(device=0)

Then I run the demo code

x = torch.tensor([[8., 0., 5., 3., 2., 1., 6., 7., 9.]], requires_grad=True).cuda()
y = torchsort.soft_sort(x)

and it gives me the error

ImportError: You are trying to use the torchsort CUDA extension, but it looks like it is not available. Make sure you have the CUDA toolchain installed, and reinstall torchsort with `pip install --force-reinstall --no-cache-dir torchsort` to rebuild the extension.

...so I go to install it (again) as suggested.
This is where the first weirdness happens: torchsort tries to download PyTorch again!

Collecting torch
Downloading torch-1.12.1-cp38-cp38-manylinux1_x86_64.whl (776.3 MB)
...
Attempting uninstall: torch
Found existing installation: torch 1.12.1
Uninstalling torch-1.12.1:
Successfully uninstalled torch-1.12.1
...
Successfully installed torch-1.12.1 torchsort-0.1.9 typing-extensions-4.4.0

So it succeeds, but when I go back into the code the GPU is no longer recognised!

Then I reinstall with the conda snippet and PyTorch itself works again (but torchsort with CUDA doesn't).

Is there some procedure I'm missing? Or some python version weirdness that's breaking things?

cheers!

isotonic_cpu.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor8data_ptrIfEEPT_v

Hello, thank you for the library.
I am trying to use the spearmanr example from the main README.md on V100 and A100 GPUs but am getting the error below.
cuda 11.3
pytorch 1.10.0
python 3.7.11
torchsort 0.1.7

File "/truba/home/fkahraman/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torchsort/ops.py", line 18, in
from .isotonic_cpu import isotonic_kl as isotonic_kl_cpu
ImportError: /truba/home/fkahraman/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torchsort/isotonic_cpu.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor8data_ptrIfEEPT_v
from .isotonic_cpu import isotonic_kl as isotonic_kl_cpu
ImportError: /truba/home/fkahraman/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torchsort/isotonic_cpu.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor8data_ptrIfEEPT_v
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 99273) of binary: /truba/home/fkahraman/miniconda3/envs/openmmlab/bin/python
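
The missing symbol demangles to `at::Tensor::data_ptr<float>() const`. An undefined-symbol import error of this kind is often, though this is not confirmed for this report, a sign that the compiled extension was built against a different PyTorch build than the one now installed. A minimal sketch for collecting the installed versions to include in a report, using only the standard library (`importlib.metadata`, Python 3.8+); the helper names `installed_version` and `version_report` are illustrative and not part of torchsort:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(packages=("torch", "torchsort")):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in version_report().items():
        print(f"{name}: {version or 'not installed'}")
```

If the reported torch version differs from the one present when torchsort was built, rebuilding the extension against the current install is the step the error messages elsewhere in this list suggest.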

Reproducing CIFAR results

Thanks a lot for this implementation.
I was wondering how I can use the repo to reproduce the CIFAR results reported in the paper. As I understand it, the one-hot target encoding corresponds to top-k classification with k=1. But after obtaining the logits and passing them through the softmax (putting the output in [0, 1]), the objective is to make the output follow the target ordering. How can this be achieved?

Using TorchSort for TSP Solver

Hello,

I am using torchsort.soft_rank to get ranked indices from the model output logits, and then calculate a loss function for TSP (Travelling Salesman Problem) as follows. (I had previously tried it with argsort but switched to soft_rank because argsort is not differentiable.)

points = torch.rand(args.funcd, 2).cuda().requires_grad_()

def path_reward(rank):
    #a = points[rank]
    a = points.index_select(0, rank)
    b = torch.cat((a[1:], a[0].unsqueeze(0)))
    return pdist(a,b).sum()

rank = torchsort.soft_rank(logits, regularization_strength=0.001).floor().long() - 1

However, the network weights are still not being updated by the optimizer. I suspect this may be because of index_select, as I have read that it is non-differentiable with respect to the index. Could you recommend an alternative solution for solving TSP via soft_rank?

Happy new year.

Sincerely,
Kamer
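
One thing that may be contributing here, independent of index_select: `.floor().long()` turns the soft ranks into piecewise-constant integers, so the derivative with respect to the logits is zero almost everywhere and no gradient can reach the network. A tiny framework-free sketch of that effect using finite differences (the function names are made up for illustration and stand in for the tensor operations above):

```python
import math


def central_difference(f, x, eps=1e-4):
    """Numerically estimate df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def hard_index(soft_rank_value):
    # Mirrors `.floor().long() - 1`: piecewise constant in its input.
    return math.floor(soft_rank_value) - 1


def soft_value(soft_rank_value):
    # Keeps the real-valued rank, so small input changes still change the output.
    return soft_rank_value - 1.0


grad_hard = central_difference(hard_index, 2.3)
grad_soft = central_difference(soft_value, 2.3)
print("gradient through floor:", grad_hard)
print("gradient through soft rank:", grad_soft)
```

Under this reading, a loss would need to consume the real-valued soft ranks directly rather than integer indices derived from them; how best to do that for a tour length is not settled in this thread.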

Unable to install with CUDA

Hi, I'm excited to use this package but unfortunately am having issues getting it working with CUDA. I am using a conda env and have followed the steps in the README related to that. My torch version is 1.11.0 and my cudatoolkit version is 11.3.1. My Python version is 3.8.13 on a Linux machine if that is relevant.

From a fresh environment:

conda install -c pytorch pytorch torchvision cudatoolkit=11.3
pip install torchsort

then if I try to use torchsort on a CUDA tensor, I get ImportError: You are trying to use the torchsort CUDA extension, but it looks like it is not available. Make sure you have the CUDA toolchain installed, and reinstall torchsort with pip install --force-reinstall --no-cache-dir torchsort to rebuild the extension. (which I have tried a few times now).

Any help in getting this working would be amazing! Thanks so much!

Cannot install with Pytorch 1.8.0

repanda_zwx@repandazwx:/home/rpanda$ conda activate iqa2
(iqa2) repanda_zwx@repandazwx:/home/rpanda$ pip install torchsort
ERROR: Could not find a version that satisfies the requirement torchsort
ERROR: No matching distribution found for torchsort
(iqa2) repanda_zwx@repandazwx:/home/rpanda$ python
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> print(torch.__version__)
1.8.0

Installing with custom g++ versions

Hi,

While installing the package I ran into the problem that the CUDA version used to build my torch installation (CUDA 9.2) is not compatible with recent g++ versions (later than 7). The error looks something like this:

/usr/lib/cuda-9.2/include/crt/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 7 are not supported!

So I just wanted to leave a comment that adapting setup.py as follows (also see https://github.com/davidstutz/torchsort/blob/d97f30233dc7b463899a9072b6a77cb7abade5c7/setup.py#L22) allows setting custom gcc/g++ versions for building:

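import os
from functools import lru_cache
from subprocess import DEVNULL, call

# Point the build at a gcc/g++ version that CUDA 9.2 accepts.
os.environ['CC'] = '/usr/bin/gcc-7'
os.environ['CXX'] = '/usr/bin/g++-7'
os.environ['CCP'] = '/usr/bin/g++-7'


@lru_cache(None)
def cuda_toolkit_available():
    # https://github.com/idiap/fast-transformers/blob/master/setup.py
    try:
        # The child process inherits os.environ, so the compiler variables set
        # above apply without being prefixed to the command.
        call(["nvcc"], stdout=DEVNULL, stderr=DEVNULL)
        return True
    except FileNotFoundError:
        return False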

Thanks for the package!
David

Regularization CUDA Memory Leak

Computing the soft_rank over a CUDA tensor that requires gradients results in a memory leak, when a regularisation other than l2 is chosen. However, under the same conditions soft_sort seems to work correctly.

import subprocess as sp
import torch
import torchsort

# stored on GPU
# there does not seem to be a memory leak when requires_grad=False
pred = torch.randn(256, 3*64*64, requires_grad=True).cuda()

# eventually our program will run out of memory
for i in range(100000):
    # problematic line, works when regularization="l2"
    torchsort.soft_rank(pred, regularization="l1")
    # check the current GPU free memory
    print(i, ':', sp.check_output("nvidia-smi --query-gpu=memory.free --format=csv,noheader".split()).decode().strip())

RuntimeError: CUDA error: an illegal memory access was encountered

I got the error below after training my network for a while (10-20 epochs).
Traceback (most recent call last):
  File "train.py", line 232, in <module>
    main()
  File "train.py", line 179, in main
    train(cfg, train_loader, model, criterion, optimizer, lr_scheduler, epoch, final_output_dir, tb_log_dir, writer_dict)
  File "/home/maxchu/Fin/numerai_dev/function.py", line 62, in train
    loss, loss_indv = criterion(pred, target, auto_pred, auto_target)
  File "/home/maxchu/fin_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/maxchu/Fin/numerai_dev/loss.py", line 39, in forward
    - 0.1 * spearman(pred, target, regularization_strength=1e-2),
  File "/home/maxchu/Fin/numerai_dev/loss.py", line 24, in spearman
    pred = torchsort.soft_rank(
  File "/home/maxchu/fin_venv/lib/python3.8/site-packages/torchsort-0.1.4-py3.8-linux-x86_64.egg/torchsort/ops.py", line 40, in soft_rank
    return SoftRank.apply(values, regularization, regularization_strength)
  File "/home/maxchu/fin_venv/lib/python3.8/site-packages/torchsort-0.1.4-py3.8-linux-x86_64.egg/torchsort/ops.py", line 96, in forward
    ret = (s - dual_sol).gather(1, inv_permutation)
RuntimeError: CUDA error: an illegal memory access was encountered
Some details:

System: Ubuntu 18.06
Python 3.8 using venv
Install method: Manual compile (git clone -> python setup.py install)
PyTorch 1.8.0 with CUDA 10.2 (and the corresponding PyTorch Geometric package)

Please let me know if you need more information.

CUDA benchmarks might be misleading

I wanted to try to improve/modify the torchsort code a little so I tried making a copy of the SoftSort class and the soft_sort function.

Running some benchmarks I got the following results:

Which was worrying. The carbon copy diverges at a similar point to the figure in the readme:

I then re-ran the benchmark with the exact same function twice (not even a copy) and got the same results.

That code can be found here:

import sys
from collections import defaultdict
from timeit import timeit

import matplotlib.pyplot as plt
import torch

import torchsort

try:
    import fast_soft_sort.pytorch_ops as fss
except ImportError:
    print("install fast_soft_sort:")
    print("pip install git+https://github.com/google-research/fast-soft-sort")
    sys.exit()


N = list(range(1, 5_000, 100))
B = [2 ** i for i in range(9)]
B_CUDA = [2 ** i for i in range(13)]
SAMPLES = 100
CONVERT = 1e-6  # convert seconds to micro-seconds


def time(f):
    return timeit(f, number=SAMPLES) / SAMPLES / CONVERT


def backward(f, x):
    y = f(x)
    torch.autograd.grad(y.sum(), x)


def style(name):
    if name == "torch.sort":
        return {"color": "blue"}
    linestyle = "--" if "backward" in name else "-"
    if "fast_soft_sort" in name:
        return {"color": "green", "linestyle": linestyle}
    elif "again" in name:
        return {"color": "red", "linestyle": linestyle}
    else:
        return {"color": "orange", "linestyle": linestyle}


def batch_size(ax):
    data = defaultdict(list)
    for b in B:
        x = torch.randn(b, 100)
        # data["torch.sort"].append(time(lambda: torch.sort(x)))
        data["torchsort"].append(time(lambda: torchsort.soft_sort(x)))
        data["torchsort_again"].append(time(lambda: torchsort.soft_sort(x)))
        # data["fast_soft_sort"].append(time(lambda: fss.soft_sort(x)))
        x = torch.randn(b, 100, requires_grad=True)
        data["torchsort (with backward)"].append(
            time(lambda: backward(torchsort.soft_sort, x))
        )
        data["torchsort_again (with backward)"].append(
            time(lambda: backward(torchsort.soft_sort, x))
        )
        # data["fast_soft_sort (with backward)"].append(
        #     time(lambda: backward(fss.soft_sort, x))
        # )

    for label in data.keys():
        ax.plot(B, data[label], label=label, **style(label))
    ax.set_xlabel("Batch Size")
    ax.set_ylim(0, 5000)
    ax.set_ylabel("Execution Time (μs)")
    ax.legend()


def sequence_length(ax):
    data = defaultdict(list)
    for n in N:
        x = torch.randn(1, n)
        # data["torch.sort"].append(time(lambda: torch.sort(x)))
        data["torchsort"].append(time(lambda: torchsort.soft_sort(x)))
        data["torchsort_again"].append(time(lambda: torchsort.soft_sort(x)))
        # data["fast_soft_sort"].append(time(lambda: fss.soft_sort(x)))
        x = torch.randn(1, n, requires_grad=True)
        data["torchsort (with backward)"].append(
            time(lambda: backward(torchsort.soft_sort, x))
        )
        data["torchsort_again (with backward)"].append(
            time(lambda: backward(torchsort.soft_sort, x))
        )
        # data["fast_soft_sort (with backward)"].append(
        #     time(lambda: backward(fss.soft_sort, x))
        # )

    for label in data.keys():
        ax.plot(N, data[label], label=label, **style(label))
    ax.set_xlabel("Sequence Length")
    ax.set_ylim(0, 1000)
    ax.set_ylabel("Execution Time (μs)")
    ax.legend()


def batch_size_cuda(ax):
    data = defaultdict(list)
    for b in B_CUDA:
        x = torch.randn(b, 100).cuda()
        # data["torch.sort"].append(time(lambda: torch.sort(x)))
        data["torchsort"].append(time(lambda: torchsort.soft_sort(x)))
        data["torchsort_again"].append(time(lambda: torchsort.soft_sort(x)))
        x = torch.randn(b, 100, requires_grad=True).cuda()
        data["torchsort (with backward)"].append(
            time(lambda: backward(torchsort.soft_sort, x))
        )
        data["torchsort_again (with backward)"].append(
            time(lambda: backward(torchsort.soft_sort, x))
        )
    for label in data.keys():
        ax.plot(B_CUDA, data[label], label=label, **style(label))
    ax.set_xlabel("Batch Size")
    ax.set_ylabel("Execution Time (μs)")
    ax.legend()


def sequence_length_cuda(ax):
    data = defaultdict(list)
    for n in N:
        x = torch.randn(1, n).cuda()
        # data["torch.sort"].append(time(lambda: torch.sort(x)))
        data["torchsort"].append(time(lambda: torchsort.soft_sort(x)))
        data["torchsort_again"].append(time(lambda: torchsort.soft_sort(x)))
        x = torch.randn(1, n, requires_grad=True).cuda()
        data["torchsort (with backward)"].append(
            time(lambda: backward(torchsort.soft_sort, x))
        )
        data["torchsort_again (with backward)"].append(
            time(lambda: backward(torchsort.soft_sort, x))
        )
    for label in data.keys():
        ax.plot(N, data[label], label=label, **style(label))
    ax.set_xlabel("Sequence Length")
    ax.set_ylabel("Execution Time (μs)")
    ax.legend()


if __name__ == "__main__":
    # jit/warmup
    x = torch.randn(1, 10, requires_grad=True)
    backward(torchsort.soft_sort, x)
    backward(fss.soft_sort, x)

    fig, (ax1, ax2) = plt.subplots(figsize=(10, 4), ncols=2)
    sequence_length(ax1)
    batch_size(ax2)
    fig.suptitle("Torchsort Benchmark: CPU")
    fig.tight_layout()
    plt.savefig("extra/benchmark3.png")

    if torch.cuda.is_available():
        # warmup
        x = torch.randn(1, 10, requires_grad=True).cuda()
        backward(torchsort.soft_sort, x)

        fig, (ax1, ax2) = plt.subplots(figsize=(10, 4), ncols=2)
        sequence_length_cuda(ax1)
        batch_size_cuda(ax2)
        fig.suptitle("Torchsort Benchmark: CUDA")
        fig.tight_layout()
        plt.savefig("extra/benchmark_cuda3.png")

Any idea what this might depend on?
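
A possible explanation, not confirmed in this thread: CUDA kernels are launched asynchronously, so `timeit` around a CUDA call can end up measuring only the launch, with the queued work charged to whichever measurement happens to run next. A minimal sketch of a timer that takes a synchronisation hook; `time_with_sync` and its `sync` parameter are hypothetical names, and on a GPU one would pass `torch.cuda.synchronize` as the hook:

```python
from timeit import default_timer


def time_with_sync(fn, sync=None, repeats=10):
    """Average wall-clock seconds per call of `fn`.

    `sync` is called once before the clock starts and once before it stops,
    so work queued by earlier code or by `fn` itself is counted correctly.
    """
    if sync is not None:
        sync()  # drain anything queued before this measurement
    start = default_timer()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for the work launched by fn to finish
    return (default_timer() - start) / repeats


if __name__ == "__main__":
    # CPU-only demonstration; no synchronisation hook is needed here.
    elapsed = time_with_sync(lambda: sum(range(1000)))
    print(f"{elapsed * 1e6:.1f} microseconds per call")
```

With a hook like this, two measurements of the same function should no longer depend on their order, which is the symptom described above.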

Failed building wheel when trying to install

On my potato GPU-less desktop, pip install torchsort works great in the Anaconda prompt, but on another machine I get "failed building wheel" followed by a huge error message.
The environment works fine with lots of other libraries and I'm running a project with it; only torchsort has this problem.
Do you have any clue what could cause that?
Thanks!

Error during installation

I'm encountering the following error both when installing with pip and compiling from source.

<stdin>:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
/usr/bin/ld: cannot find -lc10-avx2
collect2: error: ld returned 1 exit status
<stdin>:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
/usr/bin/ld: cannot find -ltorch-avx2
collect2: error: ld returned 1 exit status
<stdin>:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
/usr/bin/ld: cannot find -ltorch_cpu-avx2
collect2: error: ld returned 1 exit status
<stdin>:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
/usr/bin/ld: cannot find -ltorch_python-avx2
collect2: error: ld returned 1 exit status
<stdin>:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
/usr/bin/ld: cannot find -lc10-avx512
collect2: error: ld returned 1 exit status
<stdin>:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
/usr/bin/ld: cannot find -ltorch-avx512
collect2: error: ld returned 1 exit status
<stdin>:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
/usr/bin/ld: cannot find -ltorch_cpu-avx512
collect2: error: ld returned 1 exit status
<stdin>:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
/usr/bin/ld: cannot find -ltorch_python-avx512
collect2: error: ld returned 1 exit status
g++ -pthread -shared -Wa,-mbranches-within-32B-boundaries -Wl,--build-id=sha1 -Wl,--build-id=sha1 /home/user/downloads/torchsort/build/temp.linux-x86_64-3.9/torchsort/isotonic_cpu.o -L/usr/lib/python3.9/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.9/torchsort/isotonic_cpu.cpython-39-x86_64-linux-gnu.so
g++ -pthread -shared -Wa,-mbranches-within-32B-boundaries -Wl,--build-id=sha1 -Wl,--build-id=sha1 /home/user/downloads/torchsort/build/temp.linux-x86_64-3.9/torchsort/isotonic_cpu.o.avx2 -L/usr/lib/python3.9/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.9/torchsort/isotonic_cpu.cpython-39-x86_64-linux-gnu.so.avx2
/usr/bin/ld: cannot find /home/user/downloads/torchsort/build/temp.linux-x86_64-3.9/torchsort/isotonic_cpu.o.avx2: No such file or directory
collect2: error: ld returned 1 exit status
error: command '/usr/bin/g++' failed with exit code 1

Any ideas on how to resolve this?

installation error

Hi,
Thank you for the nice library. I am trying to install torchsort on an AWS machine with PyTorch 1.7.1 and Python 3.7 (CUDA 11.1 and Intel MKL), and encountered the following error:

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting torchsort
  Using cached torchsort-0.1.5.tar.gz (13 kB)
Requirement already satisfied: torch in /home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages (from torchsort) (1.8.1+cu111)
Requirement already satisfied: numpy in /home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages (from torch->torchsort) (1.19.2)
Requirement already satisfied: typing-extensions in /home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages (from torch->torchsort) (3.7.4.3)
Building wheels for collected packages: torchsort
  Building wheel for torchsort (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-55bpk6jp/torchsort_a99e8d8dd95647faafe6d8d72aa77601/setup.py'"'"'; __file__='"'"'/tmp/pip-install-55bpk6jp/torchsort_a99e8d8dd95647faafe6d8d72aa77601/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-sk1_6dbm
       cwd: /tmp/pip-install-55bpk6jp/torchsort_a99e8d8dd95647faafe6d8d72aa77601/
  Complete output (32 lines):
  running bdist_wheel
  /home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/utils/cpp_extension.py:369: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.7
  creating build/lib.linux-x86_64-3.7/torchsort
  copying torchsort/__init__.py -> build/lib.linux-x86_64-3.7/torchsort
  copying torchsort/ops.py -> build/lib.linux-x86_64-3.7/torchsort
  running egg_info
  writing torchsort.egg-info/PKG-INFO
  writing dependency_links to torchsort.egg-info/dependency_links.txt
  writing requirements to torchsort.egg-info/requires.txt
  writing top-level names to torchsort.egg-info/top_level.txt
  reading manifest file 'torchsort.egg-info/SOURCES.txt'
  writing manifest file 'torchsort.egg-info/SOURCES.txt'
  copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.7/torchsort
  running build_ext
  building 'torchsort.isotonic_cpu' extension
  creating build/temp.linux-x86_64-3.7
  creating build/temp.linux-x86_64-3.7/torchsort
  /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include -fPIC -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/TH -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/THC -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/include/python3.7m -c torchsort/isotonic_cpu.cpp -o build/temp.linux-x86_64-3.7/torchsort/isotonic_cpu.o -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cpu -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
  cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
  /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/x86_64-conda-linux-gnu-c++ -pthread -shared -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -L/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -L/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -Wl,-rpath-link,/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -L/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include build/temp.linux-x86_64-3.7/torchsort/isotonic_cpu.o -L/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.7/torchsort/isotonic_cpu.cpython-37m-x86_64-linux-gnu.so
  building 'torchsort.isotonic_cuda' extension
  /usr/local/cuda-11.1/bin/nvcc -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/TH -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.1/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/include/python3.7m -c torchsort/isotonic_cuda.cu -o build/temp.linux-x86_64-3.7/torchsort/isotonic_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -ccbin /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/x86_64-conda-linux-gnu-cc -std=c++14
  x86_64-conda-linux-gnu-cc: error: torchsort/isotonic_cuda.cu: No such file or directory
  x86_64-conda-linux-gnu-cc: warning: '-x c++' after last input file has no effect
  x86_64-conda-linux-gnu-cc: fatal error: no input files
  compilation terminated.
  error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for torchsort
  Running setup.py clean for torchsort
Failed to build torchsort
Installing collected packages: torchsort
    Running setup.py install for torchsort ... error
    ERROR: Command errored out with exit status 1:
     command: /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-55bpk6jp/torchsort_a99e8d8dd95647faafe6d8d72aa77601/setup.py'"'"'; __file__='"'"'/tmp/pip-install-55bpk6jp/torchsort_a99e8d8dd95647faafe6d8d72aa77601/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-5t6z_gdw/install-record.txt --single-version-externally-managed --compile --install-headers /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include/python3.7m/torchsort
         cwd: /tmp/pip-install-55bpk6jp/torchsort_a99e8d8dd95647faafe6d8d72aa77601/
    Complete output (32 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.7
    creating build/lib.linux-x86_64-3.7/torchsort
    copying torchsort/__init__.py -> build/lib.linux-x86_64-3.7/torchsort
    copying torchsort/ops.py -> build/lib.linux-x86_64-3.7/torchsort
    running egg_info
    writing torchsort.egg-info/PKG-INFO
    writing dependency_links to torchsort.egg-info/dependency_links.txt
    writing requirements to torchsort.egg-info/requires.txt
    writing top-level names to torchsort.egg-info/top_level.txt
    /home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/utils/cpp_extension.py:369: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
      warnings.warn(msg.format('we could not find ninja.'))
    reading manifest file 'torchsort.egg-info/SOURCES.txt'
    writing manifest file 'torchsort.egg-info/SOURCES.txt'
    copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.7/torchsort
    running build_ext
    building 'torchsort.isotonic_cpu' extension
    creating build/temp.linux-x86_64-3.7
    creating build/temp.linux-x86_64-3.7/torchsort
    /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include -fPIC -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/TH -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/THC -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/include/python3.7m -c torchsort/isotonic_cpu.cpp -o build/temp.linux-x86_64-3.7/torchsort/isotonic_cpu.o -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cpu -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/x86_64-conda-linux-gnu-c++ -pthread -shared -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -L/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -L/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -Wl,-rpath-link,/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -L/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include build/temp.linux-x86_64-3.7/torchsort/isotonic_cpu.o -L/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.7/torchsort/isotonic_cpu.cpython-37m-x86_64-linux-gnu.so
    building 'torchsort.isotonic_cuda' extension
    /usr/local/cuda-11.1/bin/nvcc -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/TH -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.1/include -I/home/ubuntu/anaconda3/envs/pytorch_latest_p37/include/python3.7m -c torchsort/isotonic_cuda.cu -o build/temp.linux-x86_64-3.7/torchsort/isotonic_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -ccbin /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/x86_64-conda-linux-gnu-cc -std=c++14
    x86_64-conda-linux-gnu-cc: error: torchsort/isotonic_cuda.cu: No such file or directory
    x86_64-conda-linux-gnu-cc: warning: '-x c++' after last input file has no effect
    x86_64-conda-linux-gnu-cc: fatal error: no input files
    compilation terminated.
    error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-55bpk6jp/torchsort_a99e8d8dd95647faafe6d8d72aa77601/setup.py'"'"'; __file__='"'"'/tmp/pip-install-55bpk6jp/torchsort_a99e8d8dd95647faafe6d8d72aa77601/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-5t6z_gdw/install-record.txt --single-version-externally-managed --compile --install-headers /home/ubuntu/anaconda3/envs/pytorch_latest_p37/include/python3.7m/torchsort Check the logs for full command output.

Do you know what went wrong? Let me know if you need any further info. Thanks!

cuda extension install nvcc version

What version of nvcc is required to install the cuda extension? Does it just have to match that used to install pytorch?
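
The thread does not state a required version. As a sketch for comparing the two, the release printed by `nvcc --version` can be parsed and set against the CUDA version PyTorch reports in `torch.version.cuda`. The helpers `parse_nvcc_release` and `same_major` are illustrative names, and the sample text is an example of the kind of line nvcc prints, not output taken from any report here:

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_major(nvcc_release, torch_cuda):
    """Compare the nvcc major version with a string such as torch.version.cuda."""
    torch_major = int(torch_cuda.split(".")[0])
    return nvcc_release[0] == torch_major


# Example line in the format nvcc prints (an assumed sample, for illustration).
sample = "Cuda compilation tools, release 11.2, V11.2.67"
release = parse_nvcc_release(sample)
print(release, same_major(release, "11.0"))
```

Whether a minor-version difference such as 11.2 against 11.0 is acceptable for this extension is an assumption to verify; the sketch only checks the major version.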

Memory issue with Spearman for large matrix

I have a 24K x 24K matrix for which I need to compute the Spearman correlation:
pred -> torch.Size([24000, 24000])
target -> torch.Size([24000, 24000])

However, I get a memory warning and the notebook halts when I use your spearman function example:

s = spearman(pred, target)

Can you please advise on how I can compute Spearman correlation for a matrix of this size using your library? I would appreciate it if you could share a code example. Thanks.

The spearman function is as follows:

import torchsort

def corrcoef(target, pred):
    pred_n = pred - pred.mean()
    target_n = target - target.mean()
    pred_n = pred_n / pred_n.norm()
    target_n = target_n / target_n.norm()
    return (pred_n * target_n).sum()


def spearman(
    target,
    pred,
    regularization="l2",
    regularization_strength=0.1,
):
    pred = torchsort.soft_rank(
        pred,
        regularization=regularization,
        regularization_strength=regularization_strength,
    )  
    return corrcoef(target, pred / pred.shape[-1])
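
A 24000 x 24000 float32 tensor is about 2.3 GB, and ranking it in one call creates several intermediates of that size. One possible workaround, sketched below under assumptions, is to rank a block of rows at a time and accumulate the sums that the correlation needs. This relies on ranks being computed independently per row, and it reproduces the `corrcoef` above (only `pred` is ranked, as in the original). `chunked_spearman` and `hard_rank` are hypothetical names; `hard_rank` is a non-differentiable stand-in so the sketch runs without torchsort, and the result is for evaluation only because gradients are not kept.

```python
import torch

def hard_rank(x):
    """Stand-in ranker: 1-based ascending ranks per row (not differentiable)."""
    return x.argsort(dim=-1).argsort(dim=-1).to(x.dtype) + 1

def chunked_spearman(target, pred, rank_fn=hard_rank, chunk_rows=1024):
    """Correlation between `target` and the row-wise ranks of `pred`,
    accumulated over row chunks so only one chunk of ranks exists at a time."""
    n = 0
    s_p = s_t = s_pp = s_tt = s_pt = 0.0
    with torch.no_grad():  # evaluation only; this sketch keeps no gradients
        for start in range(0, pred.shape[0], chunk_rows):
            stop = start + chunk_rows
            p = rank_fn(pred[start:stop]).double() / pred.shape[-1]
            t = target[start:stop].double()
            n += p.numel()
            s_p += p.sum().item()
            s_t += t.sum().item()
            s_pp += (p * p).sum().item()
            s_tt += (t * t).sum().item()
            s_pt += (p * t).sum().item()
    # Pearson correlation from the accumulated sums.
    cov = s_pt - s_p * s_t / n
    var_p = s_pp - s_p ** 2 / n
    var_t = s_tt - s_t ** 2 / n
    return cov / (var_p * var_t) ** 0.5

torch.manual_seed(0)
demo_pred = torch.randn(10, 7)
demo_target = torch.randn(10, 7)
demo_value = chunked_spearman(demo_target, demo_pred, chunk_rows=3)
print(demo_value)
```

To use soft ranks instead, pass something like `rank_fn=lambda x: torchsort.soft_rank(x, regularization="l2", regularization_strength=0.1)`. Smaller `chunk_rows` values trade speed for lower peak memory.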

In Python 3.9, running 'torchsort' reports an error

Specifically, when I use Python 3.9, it reports this error:

ImportError: /home/gyyang/anaconda3/lib/python3.9/site-packages/torchsort/isotonic_cpu.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs

With Python 3.8 there is no such error.
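
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the compiled extension expects a C++ symbol that the installed PyTorch does not export, which can happen when the extension was built against a different PyTorch build or a different libstdc++ string ABI. The build logs elsewhere on this page show `-D_GLIBCXX_USE_CXX11_ABI=0`, and the missing symbol ends in `Ss`, the Itanium mangling abbreviation for the pre-C++11 `std::string`. The hypothetical helper below applies that naming heuristic to a mangled symbol.

```python
def string_abi_of_symbol(mangled):
    """Guess which libstdc++ std::string ABI a mangled symbol refers to.

    Heuristic only: `__cxx11` marks std::__cxx11::basic_string (new ABI),
    while the `Ss` abbreviation is used for the old std::string.
    """
    if "__cxx11" in mangled:
        return "cxx11"
    if "Ss" in mangled:
        return "pre-cxx11"
    return "unknown"

# Symbol copied from the ImportError above.
SYMBOL = "_ZN3c105ErrorC2ENS_14SourceLocationESs"
symbol_abi = string_abi_of_symbol(SYMBOL)
print(symbol_abi)
```

For comparison, `torch.compiled_with_cxx11_abi()` reports which ABI the installed PyTorch was built with. If the two disagree, rebuilding torchsort against the installed PyTorch (for example with `pip install --force-reinstall --no-cache-dir torchsort`) may be worth trying.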

RuntimeError: CUDA error: invalid device function

Hi there, thank you very much for this library; it has worked great for one of the problems I've encountered in research, at least on CPU. Now I am trying to get it to run on GPU. I installed it from a git clone of this repository followed by python setup.py install. However, I'm running into the stack trace below. Like others, I've tried to use the spearmanr example from the main README.md.

I tried to look for other closed issues involving this to see if anyone else had run into it, but I haven't found any (or maybe I missed them?)

     34 def spearmanr(pred, target, **kw):
---> 35     pred = torchsort.soft_rank(pred, **kw)
     36     target = torchsort.soft_rank(target, **kw)
     37     pred = pred - pred.mean()

~/envs/pytorch1938/lib/python3.8/site-packages/torchsort-0.1.5-py3.8-linux-x86_64.egg/torchsort/ops.py in soft_rank(values, regularization, regularization_strength)
     38     if regularization not in ["l2", "kl"]:
     39         raise ValueError(f"'regularization' should be a 'l2' or 'kl'")
---> 40     return SoftRank.apply(values, regularization, regularization_strength)
     41 
     42 

~/envs/pytorch1938/lib/python3.8/site-packages/torchsort-0.1.5-py3.8-linux-x86_64.egg/torchsort/ops.py in forward(ctx, tensor, regularization, regularization_strength)
     94         if ctx.regularization == "l2":
     95             dual_sol = isotonic_l2[s.device.type](s - w)
---> 96             ret = (s - dual_sol).gather(1, inv_permutation)
     97             factor = torch.tensor(1.0, device=s.device)
     98         else:

RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Here is my system information:

$ uname -a
Linux xxx.xxx.ucl.ac.uk 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Here is my Python information (I am running anaconda - so this is an excerpt from conda info)

         python version : 3.8.10.final.0
       virtual packages : __cuda=11.2=0
                          __linux=4.15.0=0
                          __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64

Many thanks!
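
`invalid device function` commonly indicates that the compiled kernels contain no code the GPU can run for its architecture. That is a general CUDA observation, not a diagnosis confirmed in this thread. The sketch below checks whether the `-gencode` flags of an `nvcc` command cover a given compute capability; `build_targets` and `build_covers_device` are hypothetical helpers, and the sample flags are copied from a build log quoted earlier on this page.

```python
import re

def build_targets(nvcc_command):
    """Extract binary (sm_XX) and PTX (compute_XX) targets from -gencode flags."""
    sass = {int(v) for v in re.findall(r"code=sm_(\d+)", nvcc_command)}
    ptx = {int(v) for v in re.findall(r"code=compute_(\d+)", nvcc_command)}
    return sass, ptx

def build_covers_device(capability, nvcc_command):
    """True if a device with `capability` (major, minor) can run the build."""
    major, minor = capability
    cc = major * 10 + minor
    sass, ptx = build_targets(nvcc_command)
    # A compiled binary runs on the same major architecture with an equal or newer minor.
    binary_ok = any(s // 10 == major and s <= cc for s in sass)
    # Embedded PTX can be JIT-compiled for an equal or newer architecture.
    ptx_ok = any(p <= cc for p in ptx)
    return binary_ok or ptx_ok

FLAGS = "-gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70"
covers_volta = build_covers_device((7, 0), FLAGS)
covers_pascal = build_covers_device((6, 1), FLAGS)
print(covers_volta, covers_pascal)
```

The capability tuple for the current GPU is available from `torch.cuda.get_device_capability()`. If the build does not cover it, setting `TORCH_CUDA_ARCH_LIST` before reinstalling is the usual way to add the missing architecture.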

Isotonic backward pass in C++

Currently the backward pass is in Python (with a for-loop 😠). This should be rewritten in C++.

differentiable top k indices

Hello,

I am looking for a workaround to backpropagate through torch.topk. torch.topk returns values and indices. It can backpropagate through the values but not the indices (see here). I was wondering whether torchsort can somehow achieve this. Thank you so much!
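
Indices are discrete, so they cannot carry gradients themselves. One common relaxation, sketched here as an assumption rather than a torchsort feature, replaces the index set with a soft membership mask computed from ranks. It assumes ranks are 1-based and ascending, which matches the `soft_rank` output shown in another issue on this page, where 9 ranks above 8. `soft_topk_mask` is a hypothetical helper, and `hard_rank` is a non-differentiable stand-in so the sketch runs without torchsort.

```python
import torch

def hard_rank(x):
    """Stand-in ranker: 1-based ascending ranks per row (not differentiable)."""
    return x.argsort(dim=-1).argsort(dim=-1).to(x.dtype) + 1

def soft_topk_mask(scores, k, rank_fn, temperature=0.1):
    """Relaxed top-k membership in (0, 1) for each element of each row.

    `rank_fn` must return 1-based ascending ranks along the last dimension,
    so the k largest scores have ranks above n - k.
    """
    n = scores.shape[-1]
    ranks = rank_fn(scores)
    # Threshold halfway between rank n - k and rank n - k + 1.
    return torch.sigmoid((ranks - (n - k + 0.5)) / temperature)

scores = torch.tensor([[0.1, 0.9, 0.5, 0.3]])
mask = soft_topk_mask(scores, k=2, rank_fn=hard_rank)
print(mask.round())
```

Passing `rank_fn=torchsort.soft_rank` (optionally wrapped to set `regularization_strength`) makes the mask differentiable with respect to `scores`. The mask can then weight values or features in place of hard indexing.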

Apply torchsort to learn real permutation for downstream tasks

Hi, torchsort is quite solid, interesting and inspiring work!

I am trying to apply it to my task. Specifically, I want to use it to learn a permutation for an image x with a fixed shape of [b, c, h, w] (batch size, channel number, height, width). The following toy code shows my basic idea:

...
perm_param = nn.Parameter(torch.rand(h * w))  # learnable parameter for permutation
...
perm = torchsort.soft_rank(perm_param) # generate a learnable permutation via torchsort
...  # some discretization operations
x_p = x.reshape(b, c, h * w)
x_p = x_p[:, :, perm]  # permute the image
x_p = x_p.reshape(b, c, h, w)
loss = my_loss(Net(x_p), target)
...
loss.backward()
...

In my implementation, I want to use torchsort to learn a permutation based on a fixed parameter tensor perm_param for an image with fixed size. However, my basic implementation as shown above cannot learn the permutation, because loss.backward() does not reach perm_param and update it. The cause is the non-differentiable operations, including indexing and calls like .long() used for discretization.

I am quite sure that there exists an optimal permutation in my task. However, finding it by exhaustive search may take O((h * w)!) time. Is there a way to learn the permutation by using torchsort? I am still trying and thinking ...

I am looking forward to your reply. Thank you very much for reading such a long post!
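
One way to keep the gradient path intact, offered as a sketch under assumptions rather than an answer from the maintainers, is to avoid integer indexing altogether. Soft ranks can be turned into a row-stochastic "soft permutation" matrix that is applied with a matrix product. `soft_permutation` and `soft_permute_pixels` are hypothetical helpers. Here `ranks[i]` is read as the output position of source pixel `i`, which differs from the indexing convention in the toy code above, and the matrix has (h * w)^2 entries, so this only suits small images.

```python
import torch

def soft_permutation(ranks, temperature=1.0):
    """Row-stochastic matrix P where P[j, i] is the weight of source element i
    at output position j, given 1-based (soft) ranks of shape (n,)."""
    n = ranks.shape[-1]
    positions = torch.arange(1, n + 1, dtype=ranks.dtype, device=ranks.device)
    logits = -(positions[:, None] - ranks[None, :]).abs() / temperature
    return torch.softmax(logits, dim=-1)

def soft_permute_pixels(x, ranks, temperature=1.0):
    """Apply the soft permutation to the flattened pixels of x: [b, c, h, w]."""
    b, c, h, w = x.shape
    perm = soft_permutation(ranks, temperature)   # (h*w, h*w)
    flat = x.reshape(b, c, h * w)
    return (flat @ perm.T).reshape(b, c, h, w)

# Tiny demonstration with hand-written ranks standing in for soft ranks.
demo_ranks = torch.tensor([2.0, 1.0, 3.0, 4.0], requires_grad=True)
demo_x = torch.tensor([10.0, 20.0, 30.0, 40.0]).reshape(1, 1, 2, 2)
demo_out = soft_permute_pixels(demo_x, demo_ranks, temperature=0.01)
demo_out.sum().backward()
print(demo_out.detach().flatten())
```

In the setting above, `ranks` would come from something like `torchsort.soft_rank(perm_param.view(1, -1)).view(-1)`, so that `loss.backward()` reaches `perm_param`. Lowering `temperature` during training moves the matrix toward a hard permutation.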

kl regularization returns nan for gradient.

Perhaps I am missing something obvious, but I found kl regularization to work a bit better for my data, so I switched from l2 to kl and began to get nan values in the gradient.

Here is the test code I used to verify...
Python 3.8.10
torch==1.13.0
torchsort==0.1.9

Use of l2 works as expected.

import torch
import torchsort

# This works...
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1),regularization='l2', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)

torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[-0.1140, -0.1389, -0.1270,  0.1145, -0.1347, -0.1235, -0.1235, -0.1347,-0.1235]]),)

Use of kl returns nan.

X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1),regularization='kl', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)

torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan]]),)

soft_rank has memory leak?

Hi, I have installed the main branch, and I'm seeing that the torchsort.soft_rank function is causing memory leaks. Looking at nvidia-smi, it does not free up any memory, and I see the following printed out over and over:

[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)

Installation error on conda environment (and how to fix it)

Hi!
First things first, thanks for this package! It looks like on some Linux systems the package does not compile with pip install in conda environments (it does not seem to compile on Colab either).
In my case, the problem was that the system compiler and the compiler used by the conda PyTorch installation are different. I'm opening this issue as an FYI, since it will probably be helpful to others.
If this happens, install the g++ compiler with conda install -c conda-forge gxx_linux-64 and set the CXX environment variable with export CXX=/path/to/miniconda3/envs/env_name/bin/x86_64-conda_cos6-linux-gnu-g++. If this still fails, also extend LD_LIBRARY_PATH with export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/miniconda3/lib.

16bit support?

Curious to know whether it would be difficult to support torch.float16 (or whether it is perhaps already planned)? Currently, when trying to use 16-bit tensors with torchsort, we get the error: "arange_cpu" not implemented for 'Half'
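
Until native half-precision support exists, one workaround is to upcast around the call. This is a sketch under the assumption that a temporary float32 copy is acceptable; it does not give true 16-bit kernels. `call_in_float32` is a hypothetical wrapper, and `hard_rank` is a stand-in so the sketch runs without torchsort.

```python
import torch

def call_in_float32(fn, x, *args, **kwargs):
    """Upcast `x` to float32, call `fn`, and cast the result back to x.dtype."""
    out = fn(x.float(), *args, **kwargs)
    return out.to(x.dtype)

def hard_rank(x):
    """Stand-in ranker: 1-based ascending ranks per row (not differentiable)."""
    return x.argsort(dim=-1).argsort(dim=-1).to(x.dtype) + 1

half_input = torch.tensor([[0.3, 0.1, 0.2]], dtype=torch.float16)
half_ranks = call_in_float32(hard_rank, half_input)
print(half_ranks.dtype, half_ranks.float().tolist())
```

With torchsort installed, the same wrapper would be called as `call_in_float32(torchsort.soft_rank, x, regularization_strength=0.1)`. The casts are differentiable, so gradients still flow back to the half-precision input.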

installation issue with cuda

Hello, I'm sorry to open such a wide issue but this has already taken more than 15 hours of my life.

I have run the following commands:

rm -rf env_borgen
python -m venv env_borgen
source env_borgen/bin/activate
pip install --upgrade pip
pip install --force-reinstall torch --extra-index-url https://download.pytorch.org/whl/cu116  #gpu install
pip install --no-cache-dir --upgrade torchsort 
python check_install.py

where check_install.py is simply:

import torch
from torchsort import soft_rank

aaa = torch.rand(1, 3)
soft_rank(aaa.cpu())
soft_rank(aaa.cuda())
print('everything works!')

but I get

Traceback (most recent call last):
  File "/home/franchesoni/projects/current/borgen2/setup/check_install.py", line 7, in <module>
    soft_rank(aaa.cuda())
  File "/home/franchesoni/projects/current/borgen2/env_borgen/lib/python3.9/site-packages/torchsort/ops.py", line 48, in soft_rank
    return SoftRank.apply(values, regularization, regularization_strength)
  File "/home/franchesoni/projects/current/borgen2/env_borgen/lib/python3.9/site-packages/torchsort/ops.py", line 103, in forward
    dual_sol = isotonic_l2[s.device.type](s - w)
  File "/home/franchesoni/projects/current/borgen2/env_borgen/lib/python3.9/site-packages/torchsort/ops.py", line 31, in _error
    raise ImportError(
ImportError: You are trying to use the torchsort CUDA extension, but it looks like it is not available. Make sure you have the CUDA toolchain installed, and reinstall torchsort with `pip install --force-reinstall --no-cache-dir torchsort` to rebuild the extension.

by the way, /usr/local/cuda/bin/nvcc --version returns:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

and I don't have sudo access.

Could I please get help with this? I don't know how to proceed, and I really want to use the tool on GPU.

pip install failed on Windows

Hi, I hit an installation error on Windows. It installed fine on my Ubuntu system. Could you tell me how I can fix it?

Internal error: assertion failed at: "C:/dvs/p4/build/sw/rel/gpu_drv/r400/r400_00/drivers/compiler/edg/EDG_4.14/src/decl_spec.c", line 9596


    1 catastrophic error detected in the compilation of "C:/Users/Reasat/AppData/Local/Temp/tmpxft_000028b0_00000000-5_isotonic_cuda.cpp4.ii".
    Compilation aborted.
    isotonic_cuda.cu
    nvcc error   : 'cudafe++' died with status 0xC0000409
    error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\bin\\nvcc.exe' failed with exit status 9
    Error in atexit._run_exitfuncs:
    Traceback (most recent call last):
      File "C:\Users\Reasat\AppData\Roaming\Python\Python37\site-packages\colorama\ansitowin32.py", line 59, in closed
        return stream.closed
    ValueError: underlying buffer has been detached
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'D:\Miniconda3\envs\pytorch\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Reasat\\AppData\\Local\\Temp\\pip-install-uw3b5i5w\\torchsort_f8d66d1aaac644a78d37585cc7273f94\\setup.py'"'"'; __file__='"'"'C:\\Users\\Reasat\\AppData\\Local\\Temp\\pip-install-uw3b5i5w\\torchsort_f8d66d1aaac644a78d37585cc7273f94\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Reasat\AppData\Local\Temp\pip-record-nbabfu8b\install-record.txt' --single-version-externally-managed --compile --install-headers 'D:\Miniconda3\envs\pytorch\Include\torchsort' Check the logs for full command output.

Sorting in more than 2 dimensions

I love this library now that I got it to work with my code!

I was wondering though, are there any plans to make it work with arbitrarily shaped tensors? What work would that entail?

My current training scheme has tensors shaped [B, H, W], so I am currently doing torch.stack([soft_sort(item) for item in batch]). This library is fast, but running it sequentially like that is not. Maybe there is a way to parallelize it or extend the function to use more dimensions?
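
If the sort runs along the last dimension, one way to avoid the Python loop is to fold all leading dimensions into a single batch dimension, make one 2-D call, and restore the shape afterwards. This is a sketch under that assumption. `apply_over_last_dim` is a hypothetical helper, and `torch.sort` stands in for `torchsort.soft_sort` so the example runs without torchsort.

```python
import torch

def apply_over_last_dim(fn, x):
    """Flatten leading dims into one batch dim, call a 2-D `fn`, restore the shape."""
    flat = x.reshape(-1, x.shape[-1])
    return fn(flat).reshape(x.shape)

def sort_rows(t):
    """Stand-in for a 2-D-only sorter such as torchsort.soft_sort."""
    return torch.sort(t, dim=-1).values

torch.manual_seed(0)
batch = torch.rand(2, 3, 4)                        # [B, H, W]
batched = apply_over_last_dim(sort_rows, batch)    # one call for all B * H rows
looped = torch.stack([sort_rows(item) for item in batch])
print(torch.equal(batched, looped))
```

With torchsort this becomes `apply_over_last_dim(torchsort.soft_sort, batch)`. To sort all H * W values of each item together, reshape to `[B, H * W]` before the call and back to `[B, H, W]` afterwards.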

Any help with a pip3 install --user issue?

Here is my error code:

Using legacy 'setup.py install' for torchsort, since package 'wheel' is not installed.
Installing collected packages: torchsort
    Running setup.py install for torchsort ... \       error
ERROR: Command errored out with exit status 1:  
     command: /share/software/user/open/python/3.9.0/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-rc2snp_l/torchsort_02ab43cb664b4f778f8057965fe69d12/setup.py'"'"'; __file__='"'"'/tmp/pip-install-rc2snp_l/torchsort_02ab43cb664b4f778f8057965fe69d12/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-0etipwtv/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/users/huangda/.local/include/python3.9/torchsort
         cwd: /tmp/pip-install-rc2snp_l/torchsort_02ab43cb664b4f778f8057965fe69d12/
    Complete output (31 lines):  
    No CUDA runtime is found, using CUDA_HOME='/share/software/user/open/cuda/11.2.0'  
    running install  
    running build  
    running build_py  
    creating build  
    creating build/lib.linux-x86_64-3.9  
    creating build/lib.linux-x86_64-3.9/torchsort  
    copying torchsort/__init__.py -> build/lib.linux-x86_64-3.9/torchsort  
    copying torchsort/ops.py -> build/lib.linux-x86_64-3.9/torchsort  
    running egg_info  
    writing torchsort.egg-info/PKG-INFO  
    writing dependency_links to torchsort.egg-info/dependency_links.txt    
    writing requirements to torchsort.egg-info/requires.txt  
    writing top-level names to torchsort.egg-info/top_level.txt  
    /share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/utils/cpp_extension.py:369: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.  
      warnings.warn(msg.format('we could not find ninja.'))  
    reading manifest file 'torchsort.egg-info/SOURCES.txt'  
    reading manifest template 'MANIFEST.in'  
    writing manifest file 'torchsort.egg-info/SOURCES.txt'  
    copying torchsort/isotonic_cpu.cpp -> build/lib.linux-x86_64-3.9/torchsort  
    copying torchsort/isotonic_cuda.cu -> build/lib.linux-x86_64-3.9/torchsort  
    running build_ext   
    building 'torchsort.isotonic_cpu' extension  
    creating build/temp.linux-x86_64-3.9  
    creating build/temp.linux-x86_64-3.9/torchsort  
    gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/include -I/share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/include/TH -I/share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/include/THC -I/share/software/user/open/python/3.9.0/include/python3.9 -c torchsort/isotonic_cpu.cpp -o build/temp.linux-x86_64-3.9/torchsort/isotonic_cpu.o -fopenmp -ffast-math -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cpu -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
    c++ -pthread -shared -L/share/software/user/open/libffi/3.2.1/lib64 -L/share/software/user/open/libressl/3.2.1/lib -L/share/software/user/open/sqlite/3.18.0/lib -L/share/software/user/open/tcltk/8.6.6/lib -L/share/software/user/open/xz/5.2.3/lib -L/share/software/user/open/zlib/1.2.11/lib build/temp.linux-x86_64-3.9/torchsort/isotonic_cpu.o -L/share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/lib -L/share/software/user/open/python/3.9.0/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.9/torchsort/isotonic_cpu.cpython-39-x86_64-linux-gnu.so
    building 'torchsort.isotonic_cuda' extension
    /share/software/user/open/cuda/11.2.0/bin/nvcc -I/share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/include -I/share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/include/TH -I/share/software/user/open/py-pytorch/1.8.1_py39/lib/python3.9/site-packages/torch/include/THC -I/share/software/user/open/cuda/11.2.0/include -I/share/software/user/open/python/3.9.0/include/python3.9 -c torchsort/isotonic_cuda.cu -o build/temp.linux-x86_64-3.9/torchsort/isotonic_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=isotonic_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -ccbin gcc -std=c++14
    nvcc error   : 'cicc' died due to signal 9 (Kill signal)  
    error: command '/share/software/user/open/cuda/11.2.0/bin/nvcc' failed with exit code 9  
    ----------------------------------------
ERROR: Command errored out with exit status 1: /share/software/user/open/python/3.9.0/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-rc2snp_l/torchsort_02ab43cb664b4f778f8057965fe69d12/setup.py'"'"'; __file__='"'"'/tmp/pip-install-rc2snp_l/torchsort_02ab43cb664b4f778f8057965fe69d12/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-0etipwtv/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/users/huangda/.local/include/python3.9/torchsort Check the logs for full command output.

Has anybody else experienced this before? The full command I am running is:

TORCH_CUDA_ARCH_LIST="Pascal;Volta;Turing;Ampere" pip3 install --user torchsort
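
Signal 9 usually means the compiler process was killed from outside, for example by an out-of-memory killer or a resource limit on a shared login node. That is an assumption about this log, not something the thread confirms. If it applies, building for a single architecture instead of four named families reduces the total compile work and may keep the build within the limit. The sketch below builds such an environment; `single_arch_env` is a hypothetical helper, and the capability `(7, 0)` is only an example value.

```python
import os

def single_arch_env(capability, base_env=None):
    """Copy an environment and restrict TORCH_CUDA_ARCH_LIST to one capability."""
    env = dict(os.environ if base_env is None else base_env)
    major, minor = capability
    env["TORCH_CUDA_ARCH_LIST"] = f"{major}.{minor}"
    return env

demo_env = single_arch_env((7, 0), base_env={})
print(demo_env["TORCH_CUDA_ARCH_LIST"])
```

Called without `base_env`, the helper copies the current environment, and the result can be passed as `env=` to `subprocess.run([sys.executable, "-m", "pip", "install", "--user", "torchsort"], env=...)`. The capability should match the GPU the code will run on, not the node used for the build.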

CUDA Kernel

Should be nearly identical to the CPU Kernel, just using torch::PackedTensorAccessor32 instead of torch::TensorAccessor.

Incorrect results when running on non-default cuda device

When running torchsort.soft_rank or torchsort.soft_sort on a tensor that's not on the default cuda device (usually cuda:0), the results are incorrect.

import torch
import torchsort

x = torch.tensor([[9,8]], device="cuda:1")

print(torchsort.soft_rank(x))
# tensor([[9., 8.]], device='cuda:1')

print(torchsort.soft_sort(x))
# tensor([[-2., -1.]], device='cuda:1')

Based on the GPU memory usage, torchsort tries to do something on the default cuda device cuda:0 instead of whichever device the input tensor is on. As a workaround, you need to either change the default cuda device with torch.cuda.set_device or use the context manager torch.cuda.device.

import torch
import torchsort

x = torch.tensor([[9,8]], device="cuda:1")

with torch.cuda.device(x.device):
    print(torchsort.soft_rank(x))
    # tensor([[2., 1.]], device='cuda:1')

    print(torchsort.soft_sort(x))
    # tensor([[8., 9.]], device='cuda:1')