
MVSNet_pl

Unofficial implementation of MVSNet: Depth Inference for Unstructured Multi-view Stereo using pytorch-lightning

An improved version of MVSNet: CasMVSNet is available!

References & Credits

Official implementation: MVSNet

A pytorch implementation: MVSNet_pytorch. This code is heavily borrowed from that implementation. Thanks to xy-guo for the effortful contribution! Two main differences w.r.t. his repo:

  1. homo_warping function is rewritten in a more concise and slightly faster way.
  2. Inplace-ABN is used in the model to reduce GPU memory consumption (by about 10%).

Installation

Hardware

  • OS: Ubuntu 16.04 or 18.04
  • NVIDIA GPU with CUDA>=10.0 (tested with 1 RTX 2080Ti)

Software

  • Python>=3.6.1 (installation via anaconda is recommended, use conda create -n mvsnet_pl python=3.6 to create a conda environment and activate it by conda activate mvsnet_pl)
  • Python libraries
    • Install core requirements by pip install -r requirements.txt
    • Install Inplace-ABN by pip install git+https://github.com/mapillary/[email protected]

Data download

Download the preprocessed DTU training data from the original MVSNet repo and unzip it. For a description of how the data is created, please refer to the original paper.

Training

Run

python train.py \
  --root_dir $DTU_DIR \
  --num_epochs 6 --batch_size 1 \
  --n_depths 192 --interval_scale 1.06 \
  --optimizer adam --lr 1e-3 --lr_scheduler cosine

Note that the model consumes huge GPU memory, so the batch size is generally small. For reference, the above command requires 5901MB of GPU memory.

IMPORTANT: the combination of --n_depths and --interval_scale matters: you need to make sure 2.5 x n_depths x interval_scale is roughly equal to 510, since the base depth interval is 2.5mm. The actual depth ranges from 425mm to 935mm, a 510mm-wide span, so the depth planes you set must cover the whole range. Some common combinations are: --n_depths 256 --interval_scale 0.8, --n_depths 192 --interval_scale 1.06 and --n_depths 128 --interval_scale 1.6.
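To make the arithmetic concrete, here is a small sketch (plain Python, not code from this repo) that checks which (n_depths, interval_scale) pairs cover the 425-935mm range, assuming the 2.5mm base interval stated above:

```python
# Sketch: verify that a (n_depths, interval_scale) pair covers the DTU depth range.
# Assumes the minimum depth is 425mm and the base depth interval is 2.5mm.
def depth_coverage(n_depths, interval_scale, depth_min=425.0, base_interval=2.5):
    interval = base_interval * interval_scale
    depth_max = depth_min + (n_depths - 1) * interval
    return depth_min, depth_max

for n, s in [(256, 0.8), (192, 1.06), (128, 1.6)]:
    d_min, d_max = depth_coverage(n, s)
    # each combination reaches roughly 935mm, covering the 510mm span
    print(f"n_depths={n:3d} interval_scale={s:4.2f} -> [{d_min:.0f}, {d_max:.1f}] mm")
```

All three recommended combinations land within a few millimetres of the 935mm far plane, which is why they are interchangeable in practice.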

See opt.py for all configurations.

Example training log

log1 log2 log3

Metrics

The metrics are collected on the DTU val set.

          abs_err   1mm acc   2mm acc   4mm acc
Paper     7.25mm*   N/A       N/A       N/A
This repo 6.374mm   54.43%    74.23%    85.8%

*From P-MVSNet Table 2.

Some observations on training

  1. Larger n_depths theoretically gives better results, but also requires more GPU memory, so the batch_size can basically only be 1 or 2. At the same time, a larger batch_size also helps. To balance n_depths against batch_size, I found that n_depths 128 with batch_size 2 performs better than n_depths 192 with batch_size 1 given a fixed GPU memory of 11GB. Of course, to get even better results you'll want to scale up the batch_size by using more GPUs, and that is easy under pytorch-lightning's framework!
  2. Longer training produces better results. The pretrained model I provide is trained for 16 epochs, and it performs better than a model trained for only 6 epochs as in the paper.
  3. Image color augmentation worsens the result, and normalization seems to have little to no effect. However, BlendedMVS claims otherwise: they obtain better results using augmentation.

Testing

  1. Download pretrained model from release.
  2. Use test.ipynb for a simple depth inference for an image.

The repo is for training purposes only for now. Please refer to the other repositories mentioned at the beginning if you want to evaluate the model.

mvsnet_pl's People

Contributors

kwea123

mvsnet_pl's Issues

The correct way to enable multi-GPU training

Hi, @kwea123

I am conducting some experiments with this MVSNet implementation, thanks to its clear and simple PyTorch Lightning wrapping.
To speed up training, I train the model with 3 GPUs on my server, but an error comes up when the hyperparameter --gpu_num is simply set to 3. PyTorch Lightning raises this message:

You seem to have configured a sampler in your DataLoader. This will be replaced by `DistributedSampler` since `replace_sampler_ddp` is True and you are using distributed training. Either remove the sampler from your DataLoader or set `replace_sampler_ddp=False` if you want to use your custom sampler.

To solve this problem, train.py has been modified by setting these parameters in the PL Trainer:

trainer = Trainer(# ......
                  gpus=hparams.num_gpus,
                  replace_sampler_ddp=False,
                  distributed_backend='ddp' if hparams.num_gpus > 1 else None,
                  # ......
                  )

The model can be trained after this parameter is configured.

Is this the correct way to enable multi-GPU training?
For some reason, I cannot install nvidia-apex on the current server.
Should I use SyncBatchNorm for this model implementation, and how?
Does it hurt performance to train without SyncBN?
If I should use it, which is correct: nn.SyncBatchNorm.convert_sync_batchnorm() or PyTorch Lightning's sync_bn option in the Trainer configuration?

Thanks a lot. 😊
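For reference, converting the model's BatchNorm layers to SyncBatchNorm without apex can be done with plain PyTorch. This is a generic sketch, not code from this repo; note that SyncBatchNorm only synchronizes statistics once torch.distributed is initialized (as under the ddp backend), and otherwise behaves like ordinary BatchNorm:

```python
import torch.nn as nn

# Generic sketch: recursively swap every BatchNorm layer for SyncBatchNorm.
# The toy model below stands in for the real MVSNet network.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(inplace=True),
)
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(type(model[1]).__name__)  # the BatchNorm2d layer is now SyncBatchNorm
```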

What do the float numbers in unpreprocess function mean? (in train.py)

Where do these float numbers come from? I could not find any reference to them.

MVSNet_pl/train.py, lines 32 to 41 in b18b5ee:

class MVSSystem(pl.LightningModule):
    def __init__(self, hparams):
        super(MVSSystem, self).__init__()
        self.hparams = hparams
        # to unnormalize image for visualization
        self.unpreprocess = T.Normalize(mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],
                                        std=[1/0.229, 1/0.224, 1/0.225])
        self.loss = loss_dict[hparams.loss_type](ohem=True, topk=0.6)
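For context, those floats are built from the standard ImageNet normalization statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]). A small numpy sketch (illustration only, not code from this repo) shows that Normalize(mean=-m/s, std=1/s) exactly undoes Normalize(mean=m, std=s):

```python
import numpy as np

# ImageNet normalization statistics (per RGB channel):
#   forward:  y = (x - m) / s
#   inverse:  z = (y - (-m/s)) / (1/s) = y*s + m = x
m = np.array([0.485, 0.456, 0.406])
s = np.array([0.229, 0.224, 0.225])

x = np.random.rand(3)            # a "pixel" in [0, 1]
y = (x - m) / s                  # what T.Normalize(mean=m, std=s) computes
z = (y - (-m / s)) / (1 / s)     # what self.unpreprocess computes
print(np.allclose(x, z))         # True: the original pixel is recovered
```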

Question in homo_warping function

In the homo_warping function, the normalized result after the coordinate transformation may not lie between -1 and 1, because the transformation may make x greater than width - 1 or y greater than height - 1. How do you handle this? I look forward to your reply, thank you.
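For context, out-of-range coordinates are usually harmless here: F.grid_sample, which MVSNet-style warping implementations typically rely on, pads samples outside [-1, 1] with zeros by default, so out-of-view pixels simply contribute nothing to the cost volume. A minimal illustration (not code from this repo):

```python
import torch
import torch.nn.functional as F

# A 1x1x2x2 "feature map" and a grid with one in-range and one out-of-range
# coordinate. grid_sample's default padding_mode='zeros' returns 0 for
# coordinates outside [-1, 1].
feat = torch.arange(4.0).view(1, 1, 2, 2)          # [[0, 1], [2, 3]]
grid = torch.tensor([[[[1.0, 1.0],                 # bottom-right corner -> 3
                       [3.0, 3.0]]]])              # far outside -> padded 0
out = F.grid_sample(feat, grid, align_corners=True)
print(out.view(-1).tolist())  # [3.0, 0.0]
```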

Produced result is a little bit worse

I used the following command, but the performance seems worse than yours. Did I miss anything?

CUDA_VISIBLE_DEVICES=1 python train.py \
  --root_dir data/DTU/mvs_training/dtu \
  --num_epochs 6 --batch_size 1 \
  --n_depths 192 --interval_scale 1.06 \
  --optimizer adam --lr 1e-3 --lr_scheduler cosine


About inverse depth

I saw some papers (like R-MVSNet) say that "when reconstructing larger scenes, inverse depth is a good choice", but I can't understand the equation of "inverse depth". Can you explain this? Thanks a lot.
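For context, "inverse depth" sampling simply means spacing the depth planes uniformly in 1/d instead of in d, so planes cluster near the camera and spread out far away, which suits large scenes. A minimal numpy sketch (illustrative numbers taken from the DTU range above, not R-MVSNet's actual code):

```python
import numpy as np

# Sample n depth planes uniformly in inverse depth 1/d:
#   d_i = 1 / (1/d_min + i/(n-1) * (1/d_max - 1/d_min))
def inverse_depth_planes(d_min, d_max, n):
    inv = np.linspace(1.0 / d_min, 1.0 / d_max, n)
    return 1.0 / inv

planes = inverse_depth_planes(425.0, 935.0, 5)
print(np.round(planes, 1))  # plane spacing grows with depth
```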

I can't reproduce your result

I've tried batch_size 1 with n_depths 192, and batch_size 2 with n_depths 128, in training, but the results didn't match yours. @kwea123

Here are the issues:

  1. My results are far from yours;
  2. It seems that batch_size 1 with n_depths 192 performs better than batch_size 2 with n_depths 128.
