cascade-stereo's Issues

matlab codes

Hello, how should I run the MATLAB code in order to obtain accuracy and completeness?

question about bias

self.inner1 = nn.Conv2d(base_channels * 2, final_chs, 1, bias=True)
self.inner2 = nn.Conv2d(base_channels * 1, final_chs, 1, bias=True)
self.out2 = nn.Conv2d(final_chs, base_channels * 2, 3, padding=1, bias=False)
self.out3 = nn.Conv2d(final_chs, base_channels, 3, padding=1, bias=False)

Some layers have a bias and some do not. These layers are not followed by batch normalization, so is there a particular reason you set the bias to False?
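For illustration, a minimal sketch (not taken from the repository) of the usual convention the question refers to: bias=False is the norm when a convolution is immediately followed by BatchNorm, because the BN shift absorbs the bias; the out2/out3 layers quoted above are not followed by BN, which is what makes bias=False look surprising there.

import torch.nn as nn

# Common pattern: no conv bias when BatchNorm follows, since BN's beta
# parameter plays the same role as a per-channel bias.
conv_with_bn = nn.Sequential(
    nn.Conv2d(16, 32, 3, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
)

# Without a following BN, the convolution usually keeps its own bias.
conv_without_bn = nn.Conv2d(16, 32, 3, padding=1, bias=True)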

camera parameters from DTU dataset

Dear author,
I downloaded the original DTU dataset, but I cannot find the camera parameters there.

Where did you find the original camera parameters?

fusibile issue

I failed to compile the fusibile fusion program on Windows 10. Do you have a solution?

About inference time of cascade-GwcNet

In your paper, cascade-GwcNet is faster than GwcNet; however, my measured time for your code is 500 ms, while GwcNet takes 320 ms.
In your cascade-GwcNet, the cost volume is first constructed at 1/4 resolution (12, H/4, W/4), and then at 1/2 resolution (12, H/4, W/4).

About FLOPs:
my measured FLOPs for cascade-GwcNet are more than four times those of GwcNet. Is that expected?
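For reference, a minimal sketch of one common way to measure MACs/FLOPs in PyTorch, assuming the third-party thop package is installed; the toy model below is only a stand-in, not cascade-GwcNet or GwcNet.

import torch
import torch.nn as nn
from thop import profile

# Stand-in network; replace with the model under test to compare counts.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(16, 1, 3, padding=1),
)
dummy = torch.randn(1, 3, 256, 512)
macs, params = profile(model, inputs=(dummy,))
print(f"MACs: {macs / 1e9:.2f} G, params: {params / 1e6:.2f} M")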

The trained model did not generate points during testing

A CasMVSNet model trained on the DTU dataset on a V100 does not generate any points on the test set, but with the same environment and code, models trained on graphics cards such as a 3060, 3090, or M40 generate points normally. Have you ever encountered this situation? Looking forward to your suggestions.

Training Fails on PFM read and reshape

Hello,
I am trying to train the CasMVSNet with DTU.
The training starts normally, but fails after ~1150 iterations on reshaping depth .pfm files:

[screenshot: training fails with a reshape error when reading the .pfm]

The specific file is: scan112, depth_map0026.pfm.
I tried to remove that scan, but the same error appeared on scan71 depth_map0005.pfm,
and later on scan128 depth_map0016.pfm.

I also tried to hard-code the loader to always load the problematic .pfm file, but then the error did not repeat and training proceeded with only that .pfm.

Any ideas regarding that issue?

Thanks

Can't replicate Tanks and Temples score

Hi, I am trying to replicate the score on the Tanks and Temples benchmark but am getting lower performance. To compute SfM and image undistortion I am using the default COLMAP pipeline, and for post-processing I am using fusibile as explained.
Did you use different hyperparameters, e.g. max_h, max_w, or the gipuma ones?
Thank you.

Ground truth ply for Tanks and Temple intermediate test set

The official T&T dataset provides code for calculating the F-score along with ground-truth plys for the training set. The evaluation code requires ground-truth data to compare with the predicted data. However, I could not find ground-truth data for the intermediate test set anywhere. So I wanted to ask whether you created it yourself using some third-party software like COLMAP?

Tanks and Temples reproduction problem, and one approach that might be right?

Hi, dear authors. Thanks for your great work. When I tried to reproduce the results on the Tanks and Temples dataset, I got many background pixels, especially in "Family", in which the main part is lost. I think the reason there are so many background pixels is that in general_eval.py -> read_cam_file we use self.ndepths=192, but in YaoYao's T&T dataset, for example Family, the num_depth is 700+. So I commented out lines 72~75 in general_eval.py and got rid of many background pixels. I do not know whether that is a mistake, but it really confused me when I tried to reproduce the results on T&T. When you submitted the results to the T&T benchmark, how many self.ndepths did you use, and did you use the num_depth that YaoYao's dataset provides? Hoping for your reply.
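For context, a hypothetical sketch (not the repository's read_cam_file) of the value under discussion: the last parameter line of an MVSNet-style cam.txt is typically "depth_min depth_interval [num_depth depth_max]", and the question is whether that per-scene num_depth should be used instead of a fixed self.ndepths=192.

# Hypothetical reader for the depth line of an MVSNet-style cam.txt.
def read_depth_params(cam_path, default_ndepths=192):
    with open(cam_path) as f:
        lines = [line.strip() for line in f if line.strip()]
    vals = lines[-1].split()
    depth_min = float(vals[0])
    depth_interval = float(vals[1])
    # Use the per-scene depth count when the file provides one.
    ndepths = int(float(vals[2])) if len(vals) > 2 else default_ndepths
    return depth_min, depth_interval, ndepths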

Results on DTU

Why can't I reproduce your results on DTU using your pre-trained model?

Stuck in reproject_with_depth

Hi,

when running CascadeMVSNet I can get the depth.pfm and pro.pfm.
But once I get into the depth fusion, the code gets stuck at line 259 in test.py, so I can't get the final .ply.

coarse to fine: detach vs. no detach

When doing depth regression using coarse to fine, by default you detach the gradient here:

if self.grad_method == "detach":
    cur_depth = depth.detach()
else:
    cur_depth = depth

It is intuitive, since we don't want the finer-level training to affect the coarser levels, and it is necessary in order to cleanly analyze the effect of coarse-to-fine. But I wonder whether the results get better if you do not detach the gradient here. Did you run any experiments, and what is your opinion?
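For illustration, a toy sketch of what the detach branch changes: with detach() the fine-stage loss still trains the fine-stage parameters, but its gradient no longer flows back into the coarse-stage depth estimate (the tensors below are stand-ins, not the actual network outputs).

import torch

coarse_depth = torch.tensor([2.0], requires_grad=True)  # stands in for the coarse prediction
fine_weight = torch.tensor([1.5], requires_grad=True)   # stands in for fine-stage parameters

# grad_method == "detach": the coarse stage receives no gradient from the fine loss.
loss_detach = (fine_weight * coarse_depth.detach()).sum()
loss_detach.backward()
print(coarse_depth.grad, fine_weight.grad)  # None tensor([2.])

coarse_depth.grad = None
fine_weight.grad = None

# No detach: the fine loss also produces a gradient w.r.t. the coarse depth.
loss_joint = (fine_weight * coarse_depth).sum()
loss_joint.backward()
print(coarse_depth.grad, fine_weight.grad)  # tensor([1.5000]) tensor([2.])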

Can you please provide a script to infer disparity maps on unseen data?

To check how well your method/models generalize,
I tried using ./scripts/kitti15_save.sh to infer disparity maps on new, previously unseen data,
but unfortunately save_disp.py is too dependent on the KITTI data structure (e.g. via --testlist),
and I also do not know what the "third column" in ./filenames/*.txt is used for;
otherwise, I could maybe hack a script myself.

E.g. from ./filenames/kitti15_train.txt:
training/image_2/000000_10.png training/image_3/000000_10.png training/disp_occ_0/000000_10.png
left right ???

Having something like this would be great:

./scripts/infer.sh --left $PATH_TO_LEFT_IMAGES_FOLDER \
                   --right $PATH_TO_RIGHT_IMAGES_FOLDER \
                   --checkpoint ./checkpoints/kitti2015.ckpt \
                   --output $PATH_TO_DISPARITY_OUTPUT_FOLDER

(Assuming the corresponding file names within $PATH_TO_LEFT_IMAGES_FOLDER and $PATH_TO_RIGHT_IMAGES_FOLDER are the same.)
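For illustration, a hypothetical helper (not part of this repo) that builds a KITTI-style test list from two folders of rectified stereo pairs with matching file names; the third column in ./filenames/*.txt appears to be the ground-truth disparity (disp_occ_0), which is unknown for unseen data, so the loader would have to tolerate its absence.

import os

def write_test_list(left_dir, right_dir, out_txt):
    # One line per pair: left image path, right image path (no GT disparity).
    names = sorted(os.listdir(left_dir))
    with open(out_txt, "w") as f:
        for name in names:
            f.write(f"{os.path.join(left_dir, name)} {os.path.join(right_dir, name)}\n")

# Example: write_test_list("data/left", "data/right", "filenames/custom_test.txt")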

eval in DTU dataset

I'm confused about the speed of the official DTU evaluation code. Your CasMVSNet generates around 30,000,000 points, which takes a very long time to run through the MATLAB code, so I would really appreciate it if you have another method that can speed this up while keeping the results the same as the MATLAB code.

It seems the warmup scheduler has a bug?

In utils.py, in the warmup scheduler, last_epoch should not be used like that: scheduler.step() will not update it unless you call step(epoch).
More importantly, the meaning of epoch is wrong in that logic.

That's my personal opinion... And I really appreciate your open-source code.
Thanks
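For context, a generic sketch (not the repo's utils.py) of a warmup schedule built on LambdaLR, assuming scheduler.step() is called once per iteration; in that case the scheduler's internal last_epoch counter counts iterations rather than epochs, which is the kind of mismatch the issue points at.

import torch

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_iters = 500
def warmup_lambda(step):
    # Linear warmup from 0.1x to 1.0x of the base LR, then constant.
    return 0.1 + 0.9 * min(step, warmup_iters) / warmup_iters

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_lambda)

for it in range(1000):
    optimizer.step()   # the real parameter update would happen here
    scheduler.step()   # advances the internal counter by one per call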

How to reproduce your result on DTU dataset?

Hello,

I'm trying to reproduce the results presented in the paper on the DTU dataset. More specifically, as shown in your paper, the accuracy is 0.325 and the completeness is 0.385. However, I got an accuracy of 0.357 and a completeness of 0.359 when running your MATLAB evaluation code on the output of your pretrained model.
Can you provide more detail about the procedure?

Many thanks,
Khang Truong

EPE Problem of Pretrained model on Sceneflow datasets

[screenshot of the measured EPE results]

I downloaded the provided pre-trained model for the SceneFlow dataset, but the EPE result has a big gap from the published paper. I have tried PyTorch 1.1 and 1.7, respectively; however, the results are the same as in the figure above. I am looking forward to your reply. Many thanks.

About parameters for fusion on Tanks and Temples

Hi, thanks for your excellent work and open-source code! Can you provide some details about the fusion parameters on the Tanks and Temples dataset, both for the fusion via gipuma and for the normal fusion? Thank you!

Too many background pixels of tanks and temples

Hi. Thanks very much for your great work and for releasing the code. I used your pretrained model to test on the Tanks and Temples dataset; the final fused model has too many background pixels, such as the sky in Lighthouse. But in your paper the model is quite clean. How do you remove such background pixels?

Spatial resolution of output feature maps in each stage

Thank you very much for sharing the source code of this great project.

I read the Cas-MVSNet paper and noticed that the spatial resolution of the feature maps described in the paper differs from what is used in this repo. The paper says the spatial resolutions of the feature maps are {1/16, 1/4, 1} of the input image in each stage, but in this repo {1/4, 1/2, 1} of the input image seem to be used.

Also, I am wondering whether changing the scale_factor of F.interpolate to 4 (instead of 2) in the following line is enough to reproduce the case of using feature maps of {1/16, 1/4, 1} sizes.

intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="nearest") + self.inner1(conv1)

Thank you very much for your help.
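For illustration, a toy shape check with made-up tensors for the question above: upsampling a 1/16-resolution feature map with scale_factor=4 gives a 1/4-resolution map, so the lateral feature it is added to must also come from the matching pyramid level, otherwise the addition fails.

import torch
import torch.nn.functional as F

feat_1_16 = torch.randn(1, 32, 8, 8)      # stand-in for 1/16 of a 128x128 input
lateral_1_4 = torch.randn(1, 32, 32, 32)  # stand-in for a 1/4-resolution feature

up = F.interpolate(feat_1_16, scale_factor=4, mode="nearest")
print(up.shape)            # torch.Size([1, 32, 32, 32])
merged = up + lateral_1_4  # shapes only match at corresponding levels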

Unsatisfactory reconstruction results on the Tanks and Temples dataset

Firstly, thank you for your great work and excellent code.
The pre-trained model performs perfectly on the DTU dataset. However, it cannot reconstruct other datasets, such as Tanks and Temples.
I resized the images in Tanks and Temples from 1920x1080 to 1600x1200.
The parameters I set are: --dataset=general_eval --batch_size=1 --testpath=$TESTPATH --testlist=$TESTLIST --loadckpt $CKPT_FILE --outdir $save_results_dir --interval_scale 1.06 --max_h=2048 --max_w=2048

The results are like this:
[screenshot of the fused reconstruction]

Could you please help me figure out how I can get the results reported in your paper?

training memory

The numbers in the paper are all for testing with batch size = 1, right? That means there are no gradients, and many operations are done in place to save memory.
Do you remember the memory requirement for training with batch size = 2 and the other default settings? I know I can clone the repo and check myself, but I think it's faster to ask you directly... If you remember the number, it would help me a lot! Thank you.
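For reference, a minimal sketch of how the peak training memory can be measured directly, assuming a CUDA device is available; the forward/backward pass is left as a placeholder since it depends on the model and batch in question.

import torch

torch.cuda.reset_peak_memory_stats()
# ... run one forward + backward pass with batch_size=2 here ...
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")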

Larger max_w and max_h, worse performance

Hi,
I am trying to use your pretrained model to test some data. According to your paper, a larger max_h and max_w should lead to better results. The original max_w and max_h are 1152 x 864, but when I set max_w and max_h to 2048 x 2048, the point clouds I get are worse. I'm very confused. Do you know what may cause this problem?
Thank you!

Selection of number of depths in each level

For MVS, why exactly do you choose [48, 32, 8] as the final number of depth hypotheses in each level?
For the experiments concerning the effect of different numbers of depths, I can only find Table 7, which compares combinations like [96, 96], [96, 48, 48], etc. But I cannot find any explanation of how [48, 32, 8] was chosen or why it is better (for example, is [48, 32, 16] better?). Did I miss anything in the paper? Or can you provide a brief explanation here? Thanks.

The training scheme for stereo matching task?

In the paper, I can only find the training scheme for multi-view stereo. Can you provide the training scheme (e.g. batch size, number of epochs, ...) for the stereo matching task? Thank you!

colmap2mvsnet.py not working

Hello and thank you for your work!

I get the following error message when I try to run colmap2mvsnet.py with my own data. Can you help me?

(myenv) root@9264b8635167:/cascade-stereo/CasMVSNet# python colmap2mvsnet.py --dense_folder /data/dense --save_folder /data/outputs/scene
intrinsic
 {1: array([[3.43259350e+03, 0.00000000e+00, 1.00000000e+03],
       [0.00000000e+00, 3.43147203e+03, 8.64500000e+02],
       [0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])}

extrinsic[1]
 [[-7.70919848e-01  6.35880054e-01 -3.65943429e-02  2.37719001e+01]
 [-6.36652015e-01 -7.67603992e-01  7.38804668e-02  1.88145246e+01]
 [ 1.88891514e-02  8.02537804e-02  9.96595470e-01 -5.89685944e-01]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

depth_ranges[1]
 (8.08278138523733, 0.001266442170798268, 192, 8.3246718398598)


Traceback (most recent call last):
  File "colmap2mvsnet.py", line 469, in <module>
    processing_single_scene(args)
  File "colmap2mvsnet.py", line 406, in processing_single_scene
    result = p.map(func, queue)
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
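For context, this struct.error is what Python 3.6's multiprocessing raises when a single pickled message sent to a worker exceeds 2**31 - 1 bytes. A hypothetical workaround sketch, not taken from colmap2mvsnet.py: pass lightweight task descriptors to the pool and load the heavy data inside the worker.

from functools import partial
from multiprocessing import Pool

def process_view(view_id, dense_folder):
    # Placeholder for the real per-view work; it should read its own inputs
    # from dense_folder instead of receiving large arrays through the pipe.
    return view_id

if __name__ == "__main__":
    tasks = list(range(100))
    with Pool(4) as pool:
        results = pool.map(partial(process_view, dense_folder="/data/dense"), tasks)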

Accuracy for Test Data with Colmap

Hello.

I tried this implementation of MVSNet + cascade cost volume. It has great results on the DTU dataset. I tried Tanks and Temples and other private datasets with very bad results. I ran SfM using COLMAP, then used the provided script to turn the COLMAP data into MVSNet data, and then ran the algorithm. Do you have any recommendations, or is there any step I am missing? Does this implementation have problems with big depth differences within a depth map?

trained_model shows no points

Processing camera 0 Found 0.00 million points
Processing camera 1 Found 0.00 million points
Processing camera 2 Found 0.00 million points
Processing camera 3 Found 0.00 million points
Processing camera 4 Found 0.00 million points
Processing camera 5 Found 0.00 million points
The model I trained with your code does not find any points when I test it with test.py, as shown above, but using your pretrained model generates the point cloud normally. What could I be doing wrong? Thank you!

Reference view

How can I tell which image was used as the reference image for the reconstructed point cloud?

Train on private dataset - CascadeStereo

Hi,

I want to train your network on my own dataset, with images of size about 1937x800. But when I do that, I get a tensor-size error. What should I change in order to work with a different image size?
On the KITTI dataset everything is OK.
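For illustration, a hypothetical sketch of one common way to handle arbitrary image sizes with a downsampling stereo network: pad the inputs up to the next multiple of the factor the feature pyramid expects and crop the prediction back afterwards; the factor of 64 below is an assumption, not a value taken from this repository.

import torch.nn.functional as F

def pad_to_multiple(img, multiple=64):
    # img: (B, C, H, W); pads on the top and right, KITTI-style.
    h, w = img.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    padded = F.pad(img, (0, pad_w, pad_h, 0))
    return padded, (pad_h, pad_w)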
