
Comments (15)

fwilliams commented on September 15, 2024

The method unfortunately requires a lot of GPU memory since we're fitting an individual network per surface patch. If you don't want to modify the code, you will need to use more GPUs or a GPU with more VRAM.

If you downsample the point cloud, you will probably need to play with the parameters a bit to get a good reconstruction.

If you're willing to modify the code a bit, you could split up fitting the networks into chunks so each chunk fits in memory (i.e. fit N networks at a time). I did not include this in the implementation as it made the code more complex and I had access to multiple GPUs with a lot of VRAM.

If you want to make a version of DGP which splits fitting patches into chunks, I'd be more than happy to merge a PR!

aGIToz commented on September 15, 2024

I actually ran the code on 4 GPUs (NVIDIA Titan, 12 GB each), first on a cluster and then on my local computer, but I got the same runtime CUDA error. I checked your paper and you mention using P40 GPUs (24 GB), so the memory requirement is indeed high.

I would be willing to modify the code, but this could take some time.

fwilliams commented on September 15, 2024

Yes, you are correct. I ran the original code across 4 Tesla P40s with 24 GiB of VRAM each.

I think the best bet is to modify the code to perform reconstruction in batches that fit in memory. The algorithm is very parallelizable so this shouldn't be too difficult. I'd really appreciate this PR since it would make it easier for people to run comparisons.

aGIToz commented on September 15, 2024

Hey Francis,

I spent considerable time on your code; here are my thoughts, correct me if I am wrong.
Normally in deep learning, training is done in batches because the dataset can be too large to fit in VRAM. Here, for surface reconstruction, the size of the dataset is not the issue, but rather the size and number of the networks themselves. For lord_quas.ply with default parameters, the code generates 5017 patches, and for each patch we train an MLP, so we have 2.15 billion parameters to learn in total. That is by itself 16 times more than the parameter count of VGG16, and it amounts to 8.6 GB (at 4 bytes each); internally torch will of course use even more memory when creating gradients for those parameters.
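As a quick sanity check on those figures (the patch count and total parameter count are the ones quoted above):

# Back-of-the-envelope check of the memory numbers quoted above.
n_patches = 5017                       # patches generated for lord_quas.ply
total_params = 2.15e9                  # total learnable parameters across all MLPs
params_per_mlp = total_params / n_patches
print(params_per_mlp)                  # ~428,500 parameters per patch network
print(total_params * 4 / 1e9)          # float32 weights alone: ~8.6 GB,
                                       # before gradients and optimizer state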

So, with the objective of making the code work on a single GPU, what we can do is train a batch of N MLPs locally, then train them for consistency among themselves, write out their reconstruction, and repeat the process for the next batch of MLPs. In the end we have Q partial reconstructions in total (assuming num_patches = Q * N).

So it should turn out something like:

let num_patches = Q * N, where N is the batch size and Q is the number of batches

for each batch out of Q:
    phi = nn.ModuleList([MLP() for i in range(N)])
    for epoch in range(args.local_epochs):
        sum_loss_batch = 0
        for i in range(N):
            # get the patch_uvs, patch_xs for this batch range
            # run optimal transport (OT) to get the loss for each MLP, loss_i
            sum_loss_batch += loss_i  # total loss for the batch
        sum_loss_batch.backward()

    # A similar loop comes here to enforce consistency among the N patches.

    # Then upsample the surface with the N MLPs trained for the N patches
    # and save their reconstruction.

The above is certainly doable.
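To make the idea concrete, here is a minimal runnable toy version of that loop (assuming a CUDA device is available). It is only a sketch of the batching scheme: a tiny generic MLP stands in for the per-patch networks, random tensors stand in for the patch data, and a plain MSE loss stands in for the optimal-transport loss the actual code uses.

import torch
import torch.nn as nn

Q, N = 4, 8                   # Q batches of N patch networks each (toy sizes)
uv = torch.rand(N, 100, 2)    # fake per-patch 2D parameterizations
xs = torch.rand(N, 100, 3)    # fake per-patch 3D targets

for q in range(Q):
    # Build and train one batch of N networks, then release the GPU memory.
    phi = nn.ModuleList(
        nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))
        for _ in range(N)
    ).cuda()
    opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
    for epoch in range(10):
        opt.zero_grad()
        loss = sum(nn.functional.mse_loss(phi[i](uv[i].cuda()), xs[i].cuda())
                   for i in range(N))
        loss.backward()
        opt.step()
    phi = phi.cpu()           # keep the trained batch on the CPU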

But I am not sure whether this final reconstruction, which will be a merge of all the partial reconstructions, will be completely equivalent to what you got.

fwilliams commented on September 15, 2024

Hey there, you are correct about the memory usage scaling with the number of networks.

The first loop you describe is indeed correct. You fit N networks independently to their local neighborhoods.

For the consistency part we fix the correspondences (so no more optimal transport loss) and we do one of two things:

  1. For each point which is overlapped by multiple MLPs, we compute the 3D prediction of that point for each MLP. We then fit these MLPs to the mean of all the 3D predictions.
  2. For data without noise, we simply fit all the MLPs to agree with the input point (this makes the reconstruction interpolate the input points).

In both cases, the networks can be trained independently of each other, since they rely on a single precomputation of the overlapping points. The tricky part with precomputation (1) above is that you need to evaluate all the MLPs but you can't keep them all in VRAM, so you'll need to move each MLP between the CPU and GPU to do the evaluation.
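A minimal sketch of that shuttling, with toy stand-ins for the thread's variables (illustrative names, not the repository's exact code):

import torch
import torch.nn as nn

phi = [nn.Linear(2, 3) for _ in range(100)]          # trained per-patch MLPs, on the CPU
patch_uvs = [torch.rand(50, 2) for _ in range(100)]  # per-patch 2D parameters

with torch.no_grad():
    for i in range(len(phi)):
        net = phi[i].cuda()                 # move one MLP onto the GPU
        pred_i = net(patch_uvs[i].cuda())   # its 3D predictions for patch i
        # ...accumulate pred_i into the per-point mean precomputation here...
        phi[i] = net.cpu()                  # move the MLP back to free VRAM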

Does that make sense? Let me know if there's anything else I can clear up for you and thanks for taking the time to do this!

aGIToz commented on September 15, 2024

OK, so I suppose that lines 438 to 441 correspond to 1. and 2. above?

    if not args.interpolate:
        print("Computing patch means...")
        with torch.no_grad():
            patch_xs = patch_means(pi, patch_uvs, patch_idx, patch_tx, phi, x)

So it should work like this: after a batch of MLPs has been trained on the GPU, I move it to the CPU and repeat for the other batches, so that all the MLPs for num_patches end up stored on the CPU, and then I evaluate patch_xs on the CPU?

fwilliams commented on September 15, 2024

Yes, you're exactly right. Those lines compute the mean predictions for each patch.
So the procedure would be something like the following (in extremely rough, approximate pseudocode):

trained_nets = []
for batch in batches:
    train_batch(batch)
    trained_nets.append(batch.to('cpu'))

# At this point, you have all your trained models stored on the CPU.
# The function patch_means() needs to call each neural net once, so you're
# going to need to modify it to use batches.
patch_xs = patch_means(...)

for batch in batches:
    train_batch_consistency(...)

aGIToz commented on September 15, 2024
> # The function patch_means() needs to call each neural net once, so you're
> # going to need to modify it to use batches.

One trivial way that comes to mind to modify it would be:

def patch_means(patch_pis, patch_uvs, patch_idx, patch_tx, phi, x):

The inputs patch_pis, patch_uvs, ... would be for a particular batch for which the MLPs were trained, so the output patch_xs would also be for that batch. Is this what we are aiming for here?

fwilliams commented on September 15, 2024

I think it's more complex than that, because patch_means works by accumulating the 3D prediction for each point across all patches. The accumulation is done by summing all the predictions into mean_pts and storing the number of predictions in counts; the means are then mean_pts / counts.

One way of doing this is to create all the networks phi on the CPU. Then, in the for loop where you call each phi, you copy phi[i] to the device, run it, and move it back. Since each model only gets called once, this shouldn't be too slow. I would also make this behavior optional with a flag (so add a batched=False argument).
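Putting those pieces together, a hedged sketch of what a batched patch_means might look like (simplified arguments, not the repository's actual function; patch_idx[i] is assumed to hold the indices of the input points covered by patch i):

import torch

def patch_means_batched(patch_uvs, patch_idx, phi, num_points, batched=False):
    # Accumulate each patch's 3D predictions per point, then average.
    mean_pts = torch.zeros(num_points, 3)
    counts = torch.zeros(num_points, 1)
    with torch.no_grad():
        for i in range(len(phi)):
            net, uv = phi[i], patch_uvs[i]
            if batched:
                net, uv = net.cuda(), uv.cuda()  # one MLP on the GPU at a time
            pred = net(uv).cpu()
            mean_pts[patch_idx[i]] += pred       # sum the predictions per point
            counts[patch_idx[i]] += 1.0          # count the predictions per point
            if batched:
                phi[i] = net.cpu()               # return the MLP to the CPU
    return mean_pts / counts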

fwilliams commented on September 15, 2024

Hey @aGIToz how is this going? Any questions I can answer?

aGIToz commented on September 15, 2024

Hey man, I am able to do the reconstruction on a single GPU with a batch size of 250 MLPs. The reconstruction is definitely better compared to the original scans, but it requires more epochs.
The reconstruction is still no match for your reconstructions in the our_reconstructions folder, though. Any idea why?

aGIToz commented on September 15, 2024

How exactly is the upsampling factor decided? For example, the original lord_quas.ply has 57k points, the reconstruction I got has 320k, and your reconstruction has 9260k.

fwilliams commented on September 15, 2024

Hey thanks for coding this up!

First off, I would make sure that your batched version works the same as the original code by verifying that they produce the same output. By fixing the random seed, you should expect the same output in both versions. Maybe try on a small model for starters. Note that you will likely need to play around with the neighborhood parameters to get something that looks good.
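For reference, a standard PyTorch seeding recipe (general practice, not specific to this project):

import random
import numpy as np
import torch

seed = 0
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True   # trade speed for reproducibility
torch.backends.cudnn.benchmark = False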

Unfortunately, I can't remember the exact upsampling factor I used with the original code. You can always run export_point_cloud.py on the .pt file outputted by reconstruct_surface.py to generate point clouds with different upsampling factors.

fwilliams commented on September 15, 2024

As for how the upsampling factor is decided: reconstruct_surface.py uses 8 upsamples per patch by default (use --upsamples-per-patch to change this). You can always re-generate the point cloud after the fact using export_point_cloud.py, as I mentioned in the previous message (again, the --upsamples-per-patch argument controls the number of samples generated in each patch).

fwilliams commented on September 15, 2024

This is fixed now. You can simply run reconstruct_surface.py with --batch-size N to split fitting into batches of N patches. Set N small enough to fit in VRAM.
