cascade-stereo's Issues

matlab codes

Hello, how should I run the MATLAB code in order to obtain accuracy and completeness?

question about bias

self.inner1 = nn.Conv2d(base_channels * 2, final_chs, 1, bias=True)
self.inner2 = nn.Conv2d(base_channels * 1, final_chs, 1, bias=True)
self.out2 = nn.Conv2d(final_chs, base_channels * 2, 3, padding=1, bias=False)
self.out3 = nn.Conv2d(final_chs, base_channels, 3, padding=1, bias=False)

Some layers have a bias and some do not. These layers are not followed by batch normalization, so is there a particular reason you set the bias to False?
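For illustration, a minimal sketch (not taken from the repository) of the usual convention the question refers to: bias=False is the norm when a convolution is immediately followed by BatchNorm, because the BN shift absorbs the bias; the out2/out3 layers quoted above are not followed by BN, which is what makes bias=False look surprising there.

import torch.nn as nn

# Common pattern: no conv bias when BatchNorm follows, since BN's beta
# parameter plays the same role as a per-channel bias.
conv_with_bn = nn.Sequential(
    nn.Conv2d(16, 32, 3, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
)

# Without a following BN, the convolution usually keeps its own bias.
conv_without_bn = nn.Conv2d(16, 32, 3, padding=1, bias=True)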

camera parameters from DTU dataset

Dear author,
I downloaded the original DTU dataset, but I cannot find the camera parameters there.

Where did you find the original camera parameters?

fusibile issue

I failed to compile the fusibile fusion program on Windows 10. Do you have a solution?

About inference time of cascade-GwcNet

In your paper, cascade-GwcNet is faster than GwcNet; however, my measured time for your code is 500 ms, while GwcNet takes 320 ms.
In your cascade-GwcNet, the cost volume is first constructed at 1/4 resolution (12, H/4, W/4), and then at 1/2 resolution (12, H/4, W/4).

About FLOPs:
my measured FLOPs for cascade-GwcNet are more than four times those of GwcNet. Is that expected?
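For reference, a minimal sketch of one common way to measure MACs/FLOPs in PyTorch, assuming the third-party thop package is installed; the toy model below is only a stand-in, not cascade-GwcNet or GwcNet.

import torch
import torch.nn as nn
from thop import profile

# Stand-in network; replace with the model under test to compare counts.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(16, 1, 3, padding=1),
)
dummy = torch.randn(1, 3, 256, 512)
macs, params = profile(model, inputs=(dummy,))
print(f"MACs: {macs / 1e9:.2f} G, params: {params / 1e6:.2f} M")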

The trained model did not generate points during testing

A CasMVSNet model trained on the DTU dataset on a V100 does not generate any points on the test set, but with the same environment and code, models trained on graphics cards such as a 3060, 3090, or M40 generate points normally. Have you ever encountered this situation? Looking forward to your suggestions.

Training Fails on PFM read and reshape

Hello,
I am trying to train the CasMVSNet with DTU.
The training starts normally, but fails after ~1150 iterations on reshaping depth .pfm files:

[screenshot: training fails with a reshape error when reading the .pfm]

The specific file is: scan112, depth_map0026.pfm.
I tried to remove that scan, but the same error appeared on scan71 depth_map0005.pfm,
and later on scan128 depth_map0016.pfm.

I also tried to hard-code the loader to always load the problematic .pfm file, but then the error did not repeat and training proceeded with only that .pfm.

Any ideas regarding that issue?

Thanks

Can't replicate Tanks and Temples score

Hi, I am trying to replicate the score on the Tanks and Temples benchmark but am getting lower performance. To compute SfM and image undistortion I am using the default COLMAP pipeline, and for post-processing I am using fusibile as explained.
Did you use different hyperparameters, e.g. max_h, max_w, or the gipuma ones?
Thank you.

Ground truth ply for Tanks and Temple intermediate test set

The official T&T dataset provides code for calculating the F-score along with ground-truth plys for the training set. The evaluation code requires ground-truth data to compare with the predicted data. However, I could not find ground-truth data for the intermediate test set anywhere. So I wanted to ask whether you created it yourself using some third-party software like COLMAP?

Tanks and Temples reproduction problem, and one approach that might be right?

Hi, dear authors. Thanks for your great work. When I tried to reproduce the results on the Tanks and Temples dataset, I got many background pixels, especially in "Family", in which the main part is lost. I think the reason there are so many background pixels is that in general_eval.py -> read_cam_file we use self.ndepths=192, but in YaoYao's T&T dataset, for example Family, the num_depth is 700+. So I commented out lines 72~75 in general_eval.py and got rid of many background pixels. I do not know whether that is a mistake, but it really confused me when I tried to reproduce the results on T&T. When you submitted the results to the T&T benchmark, how many self.ndepths did you use, and did you use the num_depth that YaoYao's dataset provides? Hoping for your reply.
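For context, a hypothetical sketch (not the repository's read_cam_file) of the value under discussion: the last parameter line of an MVSNet-style cam.txt is typically "depth_min depth_interval [num_depth depth_max]", and the question is whether that per-scene num_depth should be used instead of a fixed self.ndepths=192.

# Hypothetical reader for the depth line of an MVSNet-style cam.txt.
def read_depth_params(cam_path, default_ndepths=192):
    with open(cam_path) as f:
        lines = [line.strip() for line in f if line.strip()]
    vals = lines[-1].split()
    depth_min = float(vals[0])
    depth_interval = float(vals[1])
    # Use the per-scene depth count when the file provides one.
    ndepths = int(float(vals[2])) if len(vals) > 2 else default_ndepths
    return depth_min, depth_interval, ndepths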

Results on DTU

Why can't I reproduce your results on DTU using your pre-trained model?

Stuck in reproject_with_depth

Hi,

when running CascadeMVSNet I can get the depth.pfm and pro.pfm.
But once I get into the depth fusion, the code gets stuck at line 259 in test.py, so I can't get the final .ply.

coarse to fine: detach vs. no detach

When doing depth regression using coarse to fine, by default you detach the gradient here:

if self.grad_method == "detach":
    cur_depth = depth.detach()
else:
    cur_depth = depth

It is intuitive, since we don't want the finer-level training to affect the coarser levels, and it is necessary in order to cleanly analyze the effect of coarse-to-fine. But I wonder whether the results get better if you do not detach the gradient here. Did you run any experiments, and what is your opinion?
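For illustration, a toy sketch of what the detach branch changes: with detach() the fine-stage loss still trains the fine-stage parameters, but its gradient no longer flows back into the coarse-stage depth estimate (the tensors below are stand-ins, not the actual network outputs).

import torch

coarse_depth = torch.tensor([2.0], requires_grad=True)  # stands in for the coarse prediction
fine_weight = torch.tensor([1.5], requires_grad=True)   # stands in for fine-stage parameters

# grad_method == "detach": the coarse stage receives no gradient from the fine loss.
loss_detach = (fine_weight * coarse_depth.detach()).sum()
loss_detach.backward()
print(coarse_depth.grad, fine_weight.grad)  # None tensor([2.])

coarse_depth.grad = None
fine_weight.grad = None

# No detach: the fine loss also produces a gradient w.r.t. the coarse depth.
loss_joint = (fine_weight * coarse_depth).sum()
loss_joint.backward()
print(coarse_depth.grad, fine_weight.grad)  # tensor([1.5000]) tensor([2.])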

Can you please provide a script to infer disparity maps on unseen data?

To check how well your method/models generalize,
I tried using ./scripts/kitti15_save.sh to infer disparity maps on new, previously unseen data,
but unfortunately save_disp.py is too dependent on the KITTI data structure (e.g. via --testlist),
and I also do not know what the "third column" in ./filenames/*.txt is used for;
otherwise, I could maybe hack a script myself.

E.g. from ./filenames/kitti15_train.txt:
training/image_2/000000_10.png training/image_3/000000_10.png training/disp_occ_0/000000_10.png
left right ???

Having something like this would be great:

./scripts/infer.sh --left $PATH_TO_LEFT_IMAGES_FOLDER \
                   --right $PATH_TO_RIGHT_IMAGES_FOLDER \
                   --checkpoint ./checkpoints/kitti2015.ckpt \
                   --output $PATH_TO_DISPARITY_OUTPUT_FOLDER

(Assuming the corresponding file names within $PATH_TO_LEFT_IMAGES_FOLDER and $PATH_TO_RIGHT_IMAGES_FOLDER are the same.)
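For illustration, a hypothetical helper (not part of this repo) that builds a KITTI-style test list from two folders of rectified stereo pairs with matching file names; the third column in ./filenames/*.txt appears to be the ground-truth disparity (disp_occ_0), which is unknown for unseen data, so the loader would have to tolerate its absence.

import os

def write_test_list(left_dir, right_dir, out_txt):
    # One line per pair: left image path, right image path (no GT disparity).
    names = sorted(os.listdir(left_dir))
    with open(out_txt, "w") as f:
        for name in names:
            f.write(f"{os.path.join(left_dir, name)} {os.path.join(right_dir, name)}\n")

# Example: write_test_list("data/left", "data/right", "filenames/custom_test.txt")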

eval in DTU dataset

I'm confused about the speed of the official DTU evaluation code. Your CasMVSNet generates around 30,000,000 points, which takes a very long time to run through the MATLAB code, so I would really appreciate it if you have another method that can speed this up while keeping the results the same as the MATLAB code.

It seems the warmup scheduler has a bug?

In utils.py, in the warmup scheduler, last_epoch should not be used like that: scheduler.step() will not update it unless you call step(epoch).
More importantly, the meaning of epoch is wrong in that logic.

That's my personal opinion... And I really appreciate your open-source code.
Thanks
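For context, a generic sketch (not the repo's utils.py) of a warmup schedule built on LambdaLR, assuming scheduler.step() is called once per iteration; in that case the scheduler's internal last_epoch counter counts iterations rather than epochs, which is the kind of mismatch the issue points at.

import torch

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_iters = 500
def warmup_lambda(step):
    # Linear warmup from 0.1x to 1.0x of the base LR, then constant.
    return 0.1 + 0.9 * min(step, warmup_iters) / warmup_iters

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_lambda)

for it in range(1000):
    optimizer.step()   # the real parameter update would happen here
    scheduler.step()   # advances the internal counter by one per call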

How to reproduce your result on DTU dataset?

Hello,

I'm trying to reproduce the results presented in the paper on the DTU dataset. More specifically, as shown in your paper, the accuracy is 0.325 and the completeness is 0.385. However, I got an accuracy of 0.357 and a completeness of 0.359 when running your MATLAB evaluation code on the output of your pretrained model.
Can you provide more detail about the procedure?

Many thanks,
Khang Truong

EPE Problem of Pretrained model on Sceneflow datasets

[screenshot of the measured EPE results]

I downloaded the provided pre-trained model for the SceneFlow dataset, but the EPE result has a big gap from the published paper. I have tried PyTorch 1.1 and 1.7, respectively; however, the results are the same as in the figure above. I am looking forward to your reply. Many thanks.

About parameters for fusion on Tanks and Temples

Hi, thanks for your excellent work and open-source code! Can you provide some details about the fusion parameters on the Tanks and Temples dataset, both for the fusion via gipuma and for the normal fusion? Thank you!

Too many background pixels of tanks and temples

Hi. Thanks very much for your great work and for releasing the code. I used your pretrained model to test on the Tanks and Temples dataset; the final fused model has too many background pixels, such as the sky in Lighthouse. But in your paper the model is quite clean. How do you remove such background pixels?

Spatial resolution of output feature maps in each stage

Thank you very much for sharing the source code of this great project.

I read the Cas-MVSNet paper and noticed that the spatial resolution of the feature maps described in the paper differs from what is used in this repo. The paper says the spatial resolutions of the feature maps are {1/16, 1/4, 1} of the input image in each stage, but in this repo {1/4, 1/2, 1} of the input image seem to be used.

Also, I am wondering whether changing the scale_factor of F.interpolate to 4 (instead of 2) in the following line is enough to reproduce the case of using feature maps of {1/16, 1/4, 1} sizes.

intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="nearest") + self.inner1(conv1)

Thank you very much for your help.
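For illustration, a toy shape check with made-up tensors for the question above: upsampling a 1/16-resolution feature map with scale_factor=4 gives a 1/4-resolution map, so the lateral feature it is added to must also come from the matching pyramid level, otherwise the addition fails.

import torch
import torch.nn.functional as F

feat_1_16 = torch.randn(1, 32, 8, 8)      # stand-in for 1/16 of a 128x128 input
lateral_1_4 = torch.randn(1, 32, 32, 32)  # stand-in for a 1/4-resolution feature

up = F.interpolate(feat_1_16, scale_factor=4, mode="nearest")
print(up.shape)            # torch.Size([1, 32, 32, 32])
merged = up + lateral_1_4  # shapes only match at corresponding levels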

Unsatisfactory reconstruction results on the Tanks and Temples dataset

Firstly, thank you for your great work and excellent code.
The pre-trained model performs perfectly on the DTU dataset. However, it cannot reconstruct other datasets, such as Tanks and Temples.
I resized the images in Tanks and Temples from 1920x1080 to 1600x1200.
The parameters I set are: --dataset=general_eval --batch_size=1 --testpath=$TESTPATH --testlist=$TESTLIST --loadckpt $CKPT_FILE --outdir $save_results_dir --interval_scale 1.06 --max_h=2048 --max_w=2048

The results are like this:
[screenshot of the fused reconstruction]

Could you please help me figure out how I can get the results reported in your paper?

training memory

The numbers in the paper are all for testing with batch size = 1, right? That means there are no gradients, and many operations are done in place to save memory.
Do you remember the memory requirement for training with batch size = 2 and the other default settings? I know I can clone the repo and check myself, but I think it's faster to ask you directly... If you remember the number, it would help me a lot! Thank you.
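For reference, a minimal sketch of how the peak training memory can be measured directly, assuming a CUDA device is available; the forward/backward pass is left as a placeholder since it depends on the model and batch in question.

import torch

torch.cuda.reset_peak_memory_stats()
# ... run one forward + backward pass with batch_size=2 here ...
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")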

Larger max_w and max_h, worse performance

Hi,
I am trying to use your pretrained model to test some data. According to your paper, a larger max_h and max_w should lead to better results. The original max_w and max_h are 1152 x 864, but when I set max_w and max_h to 2048 x 2048, the point clouds I get are worse. I'm very confused. Do you know what may cause this problem?
Thank you!

Selection of number of depths in each level

For MVS, why exactly do you choose [48, 32, 8] as the final number of depth hypotheses in each level?
For the experiments concerning the effect of different numbers of depths, I can only find Table 7, which compares combinations like [96, 96], [96, 48, 48], etc. But I cannot find any explanation of how [48, 32, 8] was chosen or why it is better (for example, is [48, 32, 16] better?). Did I miss anything in the paper? Or can you provide a brief explanation here? Thanks.

The training scheme for stereo matching task?

In the paper, I can only find the training scheme for multi-view stereo. Can you provide the training scheme (e.g. batch size, number of epochs, ...) for the stereo matching task? Thank you!

colmap2mvsnet.py not working

Hello and thank you for your work!

I get the following error message when I try to run colmap2mvsnet.py with my own data. Can you help me?

(myenv) root@9264b8635167:/cascade-stereo/CasMVSNet# python colmap2mvsnet.py --dense_folder /data/dense --save_folder /data/outputs/scene
intrinsic
 {1: array([[3.43259350e+03, 0.00000000e+00, 1.00000000e+03],
       [0.00000000e+00, 3.43147203e+03, 8.64500000e+02],
       [0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])}

extrinsic[1]
 [[-7.70919848e-01  6.35880054e-01 -3.65943429e-02  2.37719001e+01]
 [-6.36652015e-01 -7.67603992e-01  7.38804668e-02  1.88145246e+01]
 [ 1.88891514e-02  8.02537804e-02  9.96595470e-01 -5.89685944e-01]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

depth_ranges[1]
 (8.08278138523733, 0.001266442170798268, 192, 8.3246718398598)


Traceback (most recent call last):
  File "colmap2mvsnet.py", line 469, in <module>
    processing_single_scene(args)
  File "colmap2mvsnet.py", line 406, in processing_single_scene
    result = p.map(func, queue)
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/root/miniconda3/envs/myenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
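For context, this struct.error is what Python 3.6's multiprocessing raises when a single pickled message sent to a worker exceeds 2**31 - 1 bytes. A hypothetical workaround sketch, not taken from colmap2mvsnet.py: pass lightweight task descriptors to the pool and load the heavy data inside the worker.

from functools import partial
from multiprocessing import Pool

def process_view(view_id, dense_folder):
    # Placeholder for the real per-view work; it should read its own inputs
    # from dense_folder instead of receiving large arrays through the pipe.
    return view_id

if __name__ == "__main__":
    tasks = list(range(100))
    with Pool(4) as pool:
        results = pool.map(partial(process_view, dense_folder="/data/dense"), tasks)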

Accuracy for Test Data with Colmap

Hello.

I tried this implementation of MVSNet + cascade cost volume. It has great results on the DTU dataset. I tried Tanks and Temples and other private datasets with very bad results. I ran SfM using COLMAP, then used the provided script to turn the COLMAP data into MVSNet data, and then ran the algorithm. Do you have any recommendations, or is there any step I am missing? Does this implementation have problems with big depth differences within a depth map?

trained_model shows no points

Processing camera 0 Found 0.00 million points
Processing camera 1 Found 0.00 million points
Processing camera 2 Found 0.00 million points
Processing camera 3 Found 0.00 million points
Processing camera 4 Found 0.00 million points
Processing camera 5 Found 0.00 million points
The model I trained with your code does not find any points when I test it with test.py, as shown above, but using your pretrained model generates the point cloud normally. What could I be doing wrong? Thank you!

Reference view

How can I tell which image was used as the reference image for the reconstructed point cloud?

Train on private dataset - CascadeStereo

Hi,

I want to train your network on my own dataset, with images of size about 1937x800. But when I do that, I get a tensor-size error. What should I change in order to work with a different image size?
On the KITTI dataset everything is OK.
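For illustration, a hypothetical sketch of one common way to handle arbitrary image sizes with a downsampling stereo network: pad the inputs up to the next multiple of the factor the feature pyramid expects and crop the prediction back afterwards; the factor of 64 below is an assumption, not a value taken from this repository.

import torch.nn.functional as F

def pad_to_multiple(img, multiple=64):
    # img: (B, C, H, W); pads on the top and right, KITTI-style.
    h, w = img.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    padded = F.pad(img, (0, pad_w, pad_h, 0))
    return padded, (pad_h, pad_w)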
