Comments (4)

Palisand commented on June 4, 2024

Here is what I ran in the VM, up to and including the unit tests:

sudo apt update
sudo apt install -y wget git

# miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
bash Miniconda3-py39_4.12.0-Linux-x86_64.sh
source ~/.bashrc

# COLMAP - https://colmap.github.io/install.html

sudo apt install -y \
    cmake \
    build-essential \
    libboost-program-options-dev \
    libboost-filesystem-dev \
    libboost-graph-dev \
    libboost-system-dev \
    libboost-test-dev \
    libeigen3-dev \
    libsuitesparse-dev \
    libfreeimage-dev \
    libmetis-dev \
    libgoogle-glog-dev \
    libgflags-dev \
    libglew-dev \
    qtbase5-dev \
    libqt5opengl5-dev \
    libcgal-dev

sudo apt install -y libatlas-base-dev libsuitesparse-dev
git clone https://ceres-solver.googlesource.com/ceres-solver
cd ceres-solver
git checkout $(git describe --tags) # Checkout the latest release
mkdir build
cd build
cmake .. -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF
make -j
sudo make install

cd ~
git clone https://github.com/colmap/colmap.git
cd colmap
git checkout dev
mkdir build
cd build
cmake ..
make -j
sudo make install
colmap -h

## Install & configure MultiNERF

cd ~
git clone https://github.com/google-research/multinerf.git
cd multinerf
conda create --name multinerf python=3.9
conda activate multinerf
conda install pip
pip install --upgrade pip
pip install -r requirements.txt
pip install tensorflow==2.9.1  # match TPU software version
git clone https://github.com/rmbrualla/pycolmap.git ./internal/pycolmap
./scripts/run_all_unit_tests.sh

jonbarron commented on June 4, 2024

How are you running this on a Google TPU? We train our models on Google TPUs, but through the internal interface, which is different from the publicly available one. I don't think this code has been run through the external interface yet. Have you verified that you can run other models on the TPUs you're using? The issue seems to sit at a lower level than this codebase, maybe a JAX/CUDA/driver issue?

Palisand commented on June 4, 2024

Ah, I see. I am using the publicly available interface, following Google's Cloud TPU documentation. I haven't verified that other models run.

To create the TPU VM, I ran:

gcloud config set project multinerf
gcloud services enable tpu.googleapis.com
gcloud beta services identity create --service tpu.googleapis.com
gcloud alpha compute tpus tpu-vm create tpu-multinerf --zone us-central1-b --accelerator-type v3-8 --version tpu-vm-tf-2.9.1

I then SSHed into the VM:

gcloud alpha compute tpus tpu-vm ssh tpu-multinerf --zone us-central1-b

Then I ran the commands listed in my first comment.

Before using the TPU VM, I tested these commands locally, in a Docker container running Ubuntu 20.04 (just like the VM). The tests succeeded in the container.

Palisand commented on June 4, 2024

I tried again from scratch. This time, I removed jax, jaxlib, and tensorflow from requirements.txt and then I ran:

pip install "jax[tpu]>=0.2.16" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
pip install tensorflow==2.9.1
pip install -r requirements.txt

The jax install command comes from https://github.com/google/jax/#pip-installation-google-cloud-tpu
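
A quick way to check whether the jax install above actually sees the TPU, before running the full test suite, is to ask JAX which backend and devices it picked up. This is only a minimal sketch: the helper name `describe_jax_backend` and the report layout are made up for illustration, and it only relies on `jax.default_backend()` and `jax.devices()`.

```python
import importlib.util


def describe_jax_backend():
    """Report whether JAX is importable and which backend/devices it sees.

    Hypothetical helper for illustration; not part of multinerf.
    """
    report = {"jax_installed": False, "backend": None, "devices": [], "error": None}
    if importlib.util.find_spec("jax") is None:
        return report  # JAX is not installed in this environment
    report["jax_installed"] = True
    try:
        import jax

        report["backend"] = jax.default_backend()
        report["devices"] = [str(d) for d in jax.devices()]
    except Exception as exc:  # backend initialization can fail on driver/runtime problems
        report["error"] = repr(exc)
    return report


report = describe_jax_backend()
print(report)
```

If `backend` does not come back as a TPU backend on the TPU VM, or `error` is set, the problem is likely in the jax/libtpu setup rather than in the multinerf code.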

Some tests still fail, but at least the run no longer aborts. Here is partial test output:
FAIL: test_construct_ray_warps_extents_log (tests.coord_test.CoordTest)
tests.coord_test.CoordTest.test_construct_ray_warps_extents_log
test_construct_ray_warps_extents_log(<CompiledFunction of <function _one_to_one_unop.<locals>.<lambda> at 0x7faae18e0e50>>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/absl/testing/parameterized.py", line 314, in bound_param_test
    return test_method(self, *testcase_params)
  File "/home/palisand/multinerf/tests/coord_test.py", line 194, in test_construct_ray_warps_extents
    np.testing.assert_allclose(
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-05

Mismatched elements: 43 / 100 (43%)
Max absolute difference: 0.00045204
Max relative difference: 7.56597e-05
 x: array([ 2.400275,  1.668342,  2.044059,  6.927439,  1.078879,  0.115673,
        0.290508,  0.686725,  0.240134,  0.186716,  0.534207,  0.409191,
        2.090983,  0.41522 ,  0.722983,  1.309822,  0.97231 ,  0.64675 ,...
 y: array([ 2.400219,  1.668289,  2.044053,  6.927345,  1.078887,  0.115672,
        0.290508,  0.686718,  0.240134,  0.186729,  0.534207,  0.409187,
        2.090986,  0.415209,  0.722943,  1.309913,  0.972313,  0.646793,...

======================================================================
FAIL: test_pos_enc_25_2 (tests.coord_test.CoordTest)
tests.coord_test.CoordTest.test_pos_enc_25_2
test_pos_enc_25_2(25, 2)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/absl/testing/parameterized.py", line 314, in bound_param_test
    return test_method(self, *testcase_params)
  File "/home/palisand/multinerf/tests/coord_test.py", line 127, in test_pos_enc
    self.assertLess(max_err, tol)
AssertionError: 2.3317099 not less than 2

======================================================================
FAIL: test_pos_enc_30_2 (tests.coord_test.CoordTest)
tests.coord_test.CoordTest.test_pos_enc_30_2
test_pos_enc_30_2(30, 2)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/absl/testing/parameterized.py", line 314, in bound_param_test
    return test_method(self, *testcase_params)
  File "/home/palisand/multinerf/tests/coord_test.py", line 127, in test_pos_enc
    self.assertLess(max_err, tol)
AssertionError: 109575406000.0 not less than 2

----------------------------------------------------------------------
Ran 21 tests in 30.823s

FAILED (failures=3)
.FFFF.
======================================================================
FAIL: test_mse_to_psnr_golden (tests.image_test.ImageTest)
tests.image_test.ImageTest.test_mse_to_psnr_golden
A lazy golden test for mse_to_psnr.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/palisand/multinerf/tests/image_test.py", line 127, in test_mse_to_psnr_golden
    np.testing.assert_allclose(psnr, psnr_gt, atol=1E-5, rtol=1E-5)
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-05

Mismatched elements: 28 / 64 (43.8%)
Max absolute difference: 0.00061035
Max relative difference: 0.00023579
 x: array([43.429222, 42.739685, 42.050312, 41.360874, 40.671413, 39.982204,
       39.29292 , 38.603603, 37.91449 , 37.225014, 36.535473, 35.846165,
       35.156944, 34.46737 , 33.777996, 33.088787, 32.399387, 31.709982,...
 y: array([43.429447, 42.74009 , 42.050735, 41.361378, 40.672024, 39.982666,
       39.29331 , 38.603954, 37.914597, 37.22524 , 36.535885, 35.846527,
       35.15717 , 34.46781 , 33.778458, 33.0891  , 32.399746, 31.710388,...

======================================================================
FAIL: test_psnr_mse_round_trip (tests.image_test.ImageTest)
tests.image_test.ImageTest.test_psnr_mse_round_trip
PSNR -> MSE -> PSNR is a no-op.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/palisand/multinerf/tests/image_test.py", line 63, in test_psnr_mse_round_trip
    np.testing.assert_allclose(
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-05

Mismatched elements: 1 / 1 (100%)
Max absolute difference: 0.00024223
Max relative difference: 1.21116638e-05
 x: array(20.000242, dtype=float32)
 y: array(20.)

======================================================================
FAIL: test_srgb_linearize (tests.image_test.ImageTest)
tests.image_test.ImageTest.test_srgb_linearize
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/palisand/multinerf/tests/image_test.py", line 81, in test_srgb_linearize
    np.testing.assert_allclose(
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-05

Mismatched elements: 3524 / 10000 (35.2%)
Max absolute difference: 0.00025249
Max relative difference: 0.00018565
 x: array([-1.      , -0.9996  , -0.9992  , ...,  2.999342,  2.999746,
        3.00015 ], dtype=float32)
 y: array([-1.    , -0.9996, -0.9992, ...,  2.9992,  2.9996,  3.    ],
      dtype=float32)

======================================================================
FAIL: test_srgb_to_linear_golden (tests.image_test.ImageTest)
tests.image_test.ImageTest.test_srgb_to_linear_golden
A lazy golden test for srgb_to_linear.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/palisand/multinerf/tests/image_test.py", line 108, in test_srgb_to_linear_golden
    np.testing.assert_allclose(linear, linear_gt, atol=1E-5, rtol=1E-5)
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/palisand/miniconda3/envs/multinerf/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-05

Mismatched elements: 19 / 64 (29.7%)
Max absolute difference: 6.875396e-05
Max relative difference: 0.00015924
 x: array([0.      , 0.001229, 0.002457, 0.003725, 0.005261, 0.007113,
       0.009299, 0.011834, 0.014733, 0.018009, 0.02167 , 0.025736,
       0.030215, 0.035118, 0.040456, 0.04624 , 0.05248 , 0.059185,...
 y: array([0.      , 0.001229, 0.002457, 0.003725, 0.005261, 0.007113,
       0.0093  , 0.011835, 0.014732, 0.018007, 0.021671, 0.025736,
       0.030215, 0.035118, 0.040456, 0.04624 , 0.052479, 0.059184,...

----------------------------------------------------------------------
Ran 6 tests in 9.887s

FAILED (failures=4)
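
To put the size of these mismatches in perspective, the test_psnr_mse_round_trip failure above can be checked by hand. `np.testing.assert_allclose` accepts a value when |actual - desired| <= atol + rtol * |desired|. The sketch below plugs in the two numbers printed in that failure; the helper name `allclose_budget` is made up for illustration.

```python
import numpy as np


def allclose_budget(desired, rtol=1e-5, atol=1e-5):
    """Largest |actual - desired| that np.testing.assert_allclose accepts."""
    return atol + rtol * np.abs(desired)


# Values copied from the test_psnr_mse_round_trip failure in the log.
actual = np.float32(20.000242)
desired = 20.0

budget = float(allclose_budget(desired))
error = abs(float(actual) - desired)
print(f"allowed: {budget:.3e}, observed: {error:.3e}")

# The same comparison through NumPy itself.
try:
    np.testing.assert_allclose(actual, desired, rtol=1e-5, atol=1e-5)
    passed = True
except AssertionError:
    passed = False
print("passes assert_allclose:", passed)
```

With rtol=1e-5 and atol=1e-5 the allowed error at 20.0 is about 2.1e-4, and the observed error is about 2.4e-4, so this test misses the tolerance by a small margin rather than by orders of magnitude. The test_pos_enc_30_2 failure is a different case: its error is far beyond the bound.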
