Comments (7)

refraction-ray commented on June 21, 2024

For the list of functionalities that a backend framework has to support, try
[s for s in dir(tc.backend) if not s.startswith("_")].
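As a small sketch of the same idea, the comprehension can be wrapped in a helper and tried on a stand-in object. `public_names` and `ToyBackend` are hypothetical names used only for illustration; with tensorcircuit installed, one would presumably pass `tc.backend` instead.

```python
def public_names(obj):
    """Return the sorted attribute names of obj that do not start with an underscore."""
    return sorted(s for s in dir(obj) if not s.startswith("_"))


class ToyBackend:
    """Hypothetical stand-in for a backend object; not part of tensorcircuit."""

    def matmul(self, a, b):
        raise NotImplementedError

    def svd(self, a):
        raise NotImplementedError

    def _helper(self):
        raise NotImplementedError


names = public_names(ToyBackend())
print(names)
```

The private `_helper` method and all dunder attributes are filtered out, so only the methods a new backend would be expected to provide remain.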

The critical operations include support for complex-valued matrix multiplication, addition, inversion, eigendecomposition, QR decomposition, and singular value decomposition. In addition, automatic differentiation of complex operations with complex inputs is subtle: gradients and derivatives can differ by a complex conjugation, depending on the convention used.
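To make the conjugation subtlety concrete, here is a minimal sketch using plain finite differences rather than any framework's AD. For a real-valued loss of a complex input z = x + iy, two conventions for "the gradient" are in common use, and they are complex conjugates of each other. The helper `partials` and the example loss are assumptions made for illustration. To my understanding, PyTorch reports the first form and JAX's `grad` the second, but this should be verified for the versions in use.

```python
def partials(f, z, h=1e-6):
    """Central finite differences of a real-valued f with respect to Re(z) and Im(z)."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return dfdx, dfdy


def loss(z):
    # Real-valued loss of a complex input: |z|^2 = x^2 + y^2
    return abs(z) ** 2


z0 = 3 + 4j
dfdx, dfdy = partials(loss, z0)

grad_plus = complex(dfdx, dfdy)    # convention A: df/dx + i df/dy
grad_minus = complex(dfdx, -dfdy)  # convention B: df/dx - i df/dy

print(grad_plus, grad_minus)
```

A backend wrapper has to pick one convention and conjugate the result from frameworks that use the other one, otherwise gradient descent on complex parameters moves in the wrong direction along the imaginary axis.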

from tensorcircuit.

xiazhuo commented on June 21, 2024

Do eig, qr, and svd need to support backpropagation? We noticed that the backward pass of the complex eigendecomposition in PyTorch is ill-conditioned.

refraction-ray commented on June 21, 2024

Yes, they need to support AD. For numerical stability, we can further customize their AD rules; see https://github.com/tencent-quantum-lab/tensorcircuit/blob/master/tensorcircuit/backends/pytorch_ops.py
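As background on why such customization helps: the backward rules of eig and svd contain factors of the form 1/(λ_i − λ_j), which blow up when two eigenvalues or singular values are (nearly) degenerate. One common stabilization trick is to replace 1/x with x/(x² + ε). The sketch below shows only that scalar replacement; the name `safe_inverse` and the value of `epsilon` are illustrative assumptions, and I am not claiming this is exactly what the linked file does.

```python
def safe_inverse(x, epsilon=1e-12):
    """Regularized 1/x: close to 1/x when |x| >> sqrt(epsilon), and exactly 0 at x == 0."""
    return x / (x * x + epsilon)


# Well-separated, nearly degenerate, and exactly degenerate spectral gaps
gaps = [2.0, 1e-3, 0.0]
print([safe_inverse(g) for g in gaps])
```

With a plain `1 / g`, the last entry would raise `ZeroDivisionError` (or produce `inf`/`nan` in tensor libraries) and poison the whole gradient; the regularized form trades a small bias for finite values.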

xiazhuo commented on June 21, 2024

Hello,

I'm currently contributing to the project and attempting to set up my local development environment. I've encountered some issues while running the tests using pytest. Below are the details of my environment and the steps I've taken:

  • OS: Ubuntu 18.04
  • Python Version: 3.10.14
  • GPU Driver Version: 550.67

Steps Taken:

  1. Cloned the latest version of the repository.

  2. Followed the latest GitHub CI configuration for "test (ubuntu-20.04, 3.10)"

  3. Installed dependencies using the following commands:

    python -m pip install --upgrade pip
    pip install --no-cache-dir -r requirements/requirements.txt
    pip install --no-cache-dir -r requirements/requirements-extra.txt
    pip install --no-cache-dir -r requirements/requirements-dev.txt
    pip install --no-cache-dir -r requirements/requirements-types.txt

  4. Set the following environment variables:

    export XLA_PYTHON_CLIENT_PREALLOCATE=false
    export TF_FORCE_GPU_ALLOW_GROWTH=true

  5. Ran the tests using:

    pytest --cov=tensorcircuit --cov-report=xml -svv --benchmark-skip
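For reference, the two variables from step 4 can also be set from Python instead of the shell, for example at the top of a test entry point. This is only a sketch: where to place it (a hypothetical `conftest.py`, say) is an assumption, and the variables need to be set before the frameworks initialize their GPU backends, which is easiest to guarantee by setting them before the imports.

```python
import os

# Same settings as the shell exports in step 4.
# Ask JAX/XLA not to preallocate most of the GPU memory up front.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
# Ask TensorFlow to grow its GPU memory usage on demand.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# Import tensorflow / jax only after this point so the settings are picked up.
print(os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"], os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```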

Issue:

Despite following these steps, I encountered multiple errors during the test execution. Below are the relevant parts of the error logs:

2024-06-15 22:59:56.019344: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-15 22:59:56.019428: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-15 22:59:56.021177: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-15 22:59:57.493948: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
...
================================================ test session starts ================================================
platform linux -- Python 3.10.14, pytest-6.2.4, py-1.11.0, pluggy-0.13.1 -- /home/xiazhuo/.miniconda3/envs/jittorquantum/bin/python
cachedir: .pytest_cache
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /mnt/nas2/home/xiazhuo/tensorcircuit, configfile: pytest.ini
plugins: anyio-4.4.0, xdist-3.5.0, lazy-fixture-0.6.3, cov-5.0.0, benchmark-4.0.0
collecting ... 2024-06-15 23:00:07.331048: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-06-15 23:00:07.331783: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-06-15 23:00:07.332353: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-06-15 23:00:07.332898: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-06-15 23:00:07.333541: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
collected 580 items / 1 skipped / 579 selected                                                                       
.......
.......
.......
---------- coverage: platform linux, python 3.10.14-final-0 ----------
Coverage XML written to file coverage.xml

============================================== short test summary info ==============================================

FAILED tests/test_backends.py::test_device_cpu_gpu[jaxb] - RuntimeError: Unknown backend: 'gpu' requested, but no ...
FAILED tests/test_backends.py::test_qr[torchb] - RuntimeError: clone is not supported by NestedIntSymNode
FAILED tests/test_backends.py::test_optimizers[torchb] - AttributeError: partially initialized module 'torch._dyna...
FAILED tests/test_circuit.py::test_circuit_inverse_2[npb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'p...
FAILED tests/test_circuit.py::test_circuit_inverse_2[tfb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'p...
FAILED tests/test_circuit.py::test_circuit_inverse_2[jaxb] - qiskit.exceptions.MissingOptionalLibraryError: "The '...
FAILED tests/test_circuit.py::test_draw_cond_measure - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylate...
FAILED tests/test_circuit.py::test_circuit_to_json[npb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'pyl...
FAILED tests/test_circuit.py::test_circuit_to_json[tfb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'pyl...
FAILED tests/test_circuit.py::test_circuit_to_json[jaxb] - qiskit.exceptions.MissingOptionalLibraryError: "The 'py...
FAILED tests/test_circuit.py::test_to_openqasm - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylatexenc' ...
FAILED tests/test_circuit.py::test_initial_mapping - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylatexe...
FAILED tests/test_compiler.py::test_qsikit_compiler - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylatex...
FAILED tests/test_compiler.py::test_composed_compiler - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylat...
FAILED tests/test_compiler.py::test_replace_r - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylatexenc' l...
FAILED tests/test_compiler.py::test_default_compiler - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylate...
FAILED tests/test_dmcircuit.py::test_dm_circuit_draw - qiskit.exceptions.MissingOptionalLibraryError: "The 'pylate...
FAILED tests/test_interfaces.py::test_dlpack_transformation[tfb] - jaxlib.xla_extension.XlaRuntimeError: INVALID_A...
FAILED tests/test_shadows.py::test_jit[tfb] - tensorflow.python.framework.errors_impl.ResourceExhaustedError: Grap...

======================== 19 failed, 543 passed, 17 skipped, 2 xfailed in 1295.04s (0:21:35) =========================

Request:

Could you please help me identify what might be causing these issues and how I can resolve them? Any guidance on additional steps or configurations that I might need to set up would be greatly appreciated.

If you need more information about my configuration and the full logs, please let me know.

Thank you for your assistance and for all your hard work on this project!

refraction-ray commented on June 21, 2024

These errors seem to come from different sources:

  • qiskit.exceptions.MissingOptionalLibraryError: "The 'pyl...: this is caused by the missing pylatexenc package; running pip install pylatexenc should resolve most of the errors.

  • RuntimeError: clone is not supported by NestedIntSymNode and AttributeError: partially initialized module 'torch._dyna...: these two seem to be related to an incompatibility with torch>=2.3; you can try downgrading PyTorch.

For the remaining errors, I would like to see the full exception and error output to figure out their source. I guess some of them might be related to breaking changes in the device management APIs of these ML packages, since GPU-related code is not tested on GitHub.
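A quick way to confirm the first diagnosis before rerunning the whole suite is to check whether the optional module can be found at all. The helper name `missing_packages` is hypothetical; `importlib.util.find_spec` is from the standard library.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# "pylatexenc" is the package named in the qiskit errors above.
print(missing_packages(["pylatexenc"]))
```

An empty list means the module is importable in the current environment; otherwise the listed names still need to be installed.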

xiazhuo commented on June 21, 2024

Thank you very much for your patience! Following your previous suggestions, I have installed the pylatexenc package and downgraded torch to version 2.1. However, I am still encountering several errors. Below is the error output:

====================================================== FAILURES =======================================================
______________________________________________ test_device_cpu_gpu[jaxb] ______________________________________________
backend = None

    @pytest.mark.skipif(
        len(tf.config.list_physical_devices()) == 1, reason="no GPU detected"
    )
    @pytest.mark.parametrize("backend", [lf("tfb"), lf("jaxb"), lf("torchb")])
    def test_device_cpu_gpu(backend):
        a = tc.backend.ones([])
>       a1 = tc.backend.device_move(a, "gpu:0")

tests/test_backends.py:330: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tensorcircuit/backends/jax_backend.py:639: in device_move
    dev = self._str2dev(dev)
tensorcircuit/backends/jax_backend.py:654: in _str2dev
    return libjax.devices("gpu")[_id]
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/xla_bridge.py:1077: in devices
    return get_backend(backend).devices()
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/xla_bridge.py:1011: in get_backend
    return _get_backend_uncached(platform)
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/xla_bridge.py:992: in _get_backend_uncached
    platform = canonicalize_platform(platform)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
platform = 'gpu'

    def canonicalize_platform(platform: str) -> str:
      """Replaces platform aliases with their concrete equivalent.
    
      In particular, replaces "gpu" with either "cuda" or "rocm", depending on which
      hardware is actually present. We want to distinguish "cuda" and "rocm" for
      purposes such as MLIR lowering rules, but in many cases we don't want to
      force users to care.
      """
      platforms = _alias_to_platforms.get(platform, None)
      if platforms is None:
        return platform
    
      b = backends()
      for p in platforms:
        if p in b.keys():
          return p
>     raise RuntimeError(f"Unknown backend: '{platform}' requested, but no "
                         f"platforms that are instances of {platform} are present. "
                         "Platforms present are: " + ",".join(b.keys()))
E     RuntimeError: Unknown backend: 'gpu' requested, but no platforms that are instances of gpu are present. Platforms present are: cpu

/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/xla_bridge.py:793: RuntimeError
___________________________________________ test_dlpack_transformation[tfb] ___________________________________________
backend = None

    @pytest.mark.parametrize("backend", [lf("tfb"), lf("jaxb"), lf("torchb")])
    def test_dlpack_transformation(backend):
        blist = ["tensorflow", "jax"]
        if is_torch is True:
            blist.append("pytorch")
        for b in blist:
>           ans = tc.interfaces.general_args_to_backend(
                args=tc.backend.ones([2], dtype="float32"),
                target_backend=b,
                enable_dlpack=True,
            )

tests/test_interfaces.py:363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tensorcircuit/interfaces/tensortrans.py:136: in general_args_to_backend
    return backend.tree_map(target_backend.from_dlpack, caps)
tensorcircuit/backends/abstract_backend.py:841: in tree_map
    return tf.nest.map_structure(f, *pytrees)
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/tensorflow/python/util/nest.py:631: in map_structure
    return nest_util.map_structure(
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/tensorflow/python/util/nest_util.py:1066: in map_structure
    return _tf_core_map_structure(func, *structure, **kwargs)
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/tensorflow/python/util/nest_util.py:1106: in _tf_core_map_structure
    [func(*x) for x in entries],
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/tensorflow/python/util/nest_util.py:1106: in <listcomp>
    [func(*x) for x in entries],
tensorcircuit/backends/jax_backend.py:434: in from_dlpack
    return jax.dlpack.from_dlpack(a)
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/dlpack.py:278: in from_dlpack
    return _legacy_from_dlpack(external_array, device, copy)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dlpack = <capsule object "dltensor" at 0x7f271efb9c80>, device = None, copy = None

    def _legacy_from_dlpack(dlpack, device: xla_client.Device | None = None,
                            copy: bool | None = None):
      preferred_platform = getattr(device, "platform", None)
      if device and preferred_platform == "gpu":
        preferred_platform = "cuda" if "cuda" in device.client.platform_version else "rocm"
    
      cpu_backend = xla_bridge.get_backend("cpu")
      gpu_backend = None
    
      if preferred_platform in {"cuda", "rocm"}:
        try:
          gpu_backend = xla_bridge.get_backend(preferred_platform)
        except RuntimeError:
          raise TypeError(
            f"A {str.upper(preferred_platform)} device was specified, however no "
            f"{str.upper(preferred_platform)} backend was found."
          )
    
      if preferred_platform is None:
        try:
          gpu_backend = xla_bridge.get_backend("cuda")
        except RuntimeError:
          pass
        # Try ROCm if CUDA backend not found
        if gpu_backend is None:
          try:
            gpu_backend = xla_bridge.get_backend("rocm")
          except RuntimeError:
            pass
    
>     _arr = jnp.asarray(xla_client._xla.dlpack_managed_tensor_to_buffer(
          dlpack, cpu_backend, gpu_backend)) # type: ignore
E     jaxlib.xla_extension.XlaRuntimeError: INVALID_ARGUMENT: DLPack tensor is on GPU, but no GPU backend was provided.
/home/xiazhuo/.miniconda3/envs/jittorquantum/lib/python3.10/site-packages/jax/_src/dlpack.py:195: XlaRuntimeError

---------- coverage: platform linux, python 3.10.14-final-0 ----------
Coverage XML written to file coverage.xml

================================================ short test summary info ================================================
FAILED tests/test_backends.py::test_device_cpu_gpu[jaxb] - RuntimeError: Unknown backend: 'gpu' requested, but no plat...
FAILED tests/test_interfaces.py::test_dlpack_transformation[tfb] - jaxlib.xla_extension.XlaRuntimeError: INVALID_ARGUM...
=========================== 2 failed, 560 passed, 17 skipped, 2 xfailed in 1168.15s (0:19:28) ===========================

Additionally, if it is convenient, could you update the contribution guidelines and the requirements files to reflect the new steps and dependencies for setting up the environment?

refraction-ray commented on June 21, 2024

The above errors seem to be caused by a misconfigured jax + GPU setup, i.e., the installed jax somehow doesn't have a properly configured GPU backend.
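One way to check this is to ask JAX directly which platforms it can see. Note that the failing test is skipped based on TensorFlow's device list (see the `skipif` in the traceback), so a GPU visible to TensorFlow combined with a CPU-only jaxlib would produce exactly this failure. The helper name `jax_platforms` is hypothetical; it returns `None` when jax is not installed.

```python
def jax_platforms():
    """Return the sorted platforms JAX can see, or None if jax is not installed."""
    try:
        import jax
    except ImportError:
        return None
    return sorted({d.platform for d in jax.devices()})


print(jax_platforms())
```

A result of `['cpu']` on a machine with a GPU would match the "Platforms present are: cpu" message above and suggests reinstalling jax/jaxlib with GPU support.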

> Additionally, if it is convenient, could you update the contribution guidelines and the requirements files to reflect the new steps and dependencies for setting up the environment?

Will do, thanks for the advice.
