
pwc-net's Introduction

License: CC BY-NC-SA 4.0 | Python 2.7

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

License

Copyright (C) 2018 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

Usage

For Caffe users, please refer to Caffe/README.md.

For PyTorch users, please refer to PyTorch/README.md.

The PyTorch implementation almost matches the Caffe implementation (average EPE on the final pass of the Sintel training set: 2.31 for PyTorch vs. 2.29 for Caffe).
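For reference, the PyTorch demo is invoked on a pair of frames like this (command as quoted in the issue reports below):

python script_pwc.py './data/frame_0010.png' './data/frame_0011.png' './tmp/frame_0010.flo'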

Network Architecture

PWC-Net fuses several classic optical flow estimation techniques, including image pyramids, warping, and cost volumes, in an end-to-end trainable deep neural network to achieve state-of-the-art results.
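For intuition, here is a minimal sketch of the warping step in modern PyTorch (not the repository's custom CUDA layer): features of the second image are bilinearly sampled toward the first image using the upsampled flow.

import torch
import torch.nn.functional as F

def warp(x, flow):
    # x: (N, C, H, W) features of the second image; flow: (N, 2, H, W) in pixels,
    # channel 0 = horizontal (u), channel 1 = vertical (v)
    n, c, h, w = x.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys)).float().to(x.device)  # (2, H, W) pixel coordinates
    vgrid = grid.unsqueeze(0) + flow                   # where to sample in the second image
    # normalize sampling positions to [-1, 1] as required by grid_sample
    vx = 2.0 * vgrid[:, 0] / max(w - 1, 1) - 1.0
    vy = 2.0 * vgrid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(x, torch.stack((vx, vy), dim=3), align_corners=True)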

Paper & Citation

Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume." CVPR 2018 or arXiv:1709.02371

Updated and extended version: "Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation." arXiv:1809.05571

Project page link

Talk at robust vision challenge workshop

Talk at CVPR 2018 conference

If you use PWC-Net, please cite the following paper:

@InProceedings{Sun2018PWC-Net,
  author    = {Deqing Sun and Xiaodong Yang and Ming-Yu Liu and Jan Kautz},
  title     = {{PWC-Net}: {CNNs} for Optical Flow Using Pyramid, Warping, and Cost Volume},
  booktitle = {CVPR},
  year      = {2018},
}

or the arXiv paper

@article{sun2017pwc,
  author={Sun, Deqing and Yang, Xiaodong and Liu, Ming-Yu and Kautz, Jan},
  title={{PWC-Net}: {CNNs} for Optical Flow Using Pyramid, Warping, and Cost Volume},
  journal={arXiv preprint arXiv:1709.02371},
  year={2017}
}

or the updated and extended version

@article{Sun2018:Model:Training:Flow,
  author={Sun, Deqing and Yang, Xiaodong and Liu, Ming-Yu and Kautz, Jan},
  title={Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  note = {to appear}
}

For multi-frame flow, please also cite

@inproceedings{ren2018fusion,
  title={A Fusion Approach for Multi-Frame Optical Flow Estimation},
  author={Ren, Zhile and Gallo, Orazio and Sun, Deqing and Yang, Ming-Hsuan and Sudderth, Erik B and Kautz, Jan},
  booktitle={Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2019}
}

Related Work from NVIDIA

flownet2-pytorch

Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation (ECCV 2018)

Contact

Deqing Sun ([email protected])

pwc-net's People

Contributors

deqings, ilyak93, jrenzhile, kinglittleq, mingyuliutw


pwc-net's Issues

"cudaCheckError() failed: a PTX JIT compilation failed"

Thanks for your work!

I have installed the correlation package. Then I ran test.py and got:
"cudaCheckError() failed: a PTX JIT compilation failed"

Since the correlation package is necessary for PWC-Net, I can't go on with the rest.
Can you give me some ideas for solving this issue? I use a GTX 1080 and CUDA 9.1.
Thanks a lot!
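For anyone hitting the same error: a PTX JIT failure usually means the extension kernels were not compiled for the GPU's compute capability. A hedged sketch of one fix, assuming you edit the nvcc call in make_cuda.sh, is to add a -gencode flag for your card (compute capability 6.1 for a GTX 1080; flags other than -gencode are illustrative):

nvcc -c -o corr_cuda_kernel.cu.o corr_cuda_kernel.cu -x cu -Xcompiler -fPIC -gencode arch=compute_61,code=sm_61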

PWC-Net-small

Hi Deqing,

I tried dropping the DenseNet connections to generate the PWC-Net-small network, with all other settings the same as the default PWC-Net, but it didn't converge.
The image shows an example architecture from predict_flow5 -> predict_flow4.
image

Could you check this image to see if it's correct, or whether there is something important that I'm missing.

Thank you
Jiqiang

meaning of scaling factor

Hi, nice work, and your code works well. Thanks!

Just a simple question about training and testing:

  1. Is there any reason why you multiply the flow by (0.65, 1.25, 2.5, 5.0) rather than (1.0, 2.0, 4.0, 8.0)? (Maybe these parameters come from the FlowNet implementation?)

  2. Can the model predict flow for images larger than the training resolution (e.g., full HD)?

Awaiting your reply. Thanks!

Getting better results than the reported ones

Hi @deqings,

This is very elegant work, Dr. Sun. Implementing your architecture in TensorFlow and making a couple of changes, I was able to get better results than the ones reported at https://arxiv.org/abs/1709.02371. Please believe me when I write that sharing this with you isn't meant to criticize the value of your research in any way. I just want to single out these two low-hanging fruits so that you may also benefit from them, should you be interested in doing so.

The official multistep schedule discussed in your paper is: S_long (1.2M iterations of training with batch size 8) + S_fine (500k iterations of finetuning with batch size 4). Ours is S_long only (1.2M iterations, batch size 8) on a mix of FlyingChairs and FlyingThings3DHalfRes. FlyingThings3DHalfRes is our own version of FlyingThings3D in which every input image pair and ground-truth flow has been downsampled by two in each dimension. We also use a different set of augmentation techniques (details in augment.py).

The motivation for using FlyingThings3DHalfRes is as follows: the average flow magnitude on the MPI-Sintel dataset is only 13.5, while the average flow magnitudes on FlyingChairs and FlyingThings3D are 11.1 and 38, respectively. In our experiments, finetuning on FlyingThings3D would only yield worse results on MPI-Sintel.

We got more stable results by using a half-resolution version of the FlyingThings3D dataset with an average flow magnitude of 19, much closer to FlyingChairs and MPI-Sintel in that respect. We then trained on a mix of the FlyingChairs and FlyingThings3DHalfRes datasets.

Our results are shown below:

Model name                                   FlyingChairs (384x512) AEPE   Sintel clean (436x1024) AEPE   Sintel final (436x1024) AEPE
pwcnet-lg-6-2-multisteps-chairsthingsmix     1.44                          2.60                           3.70
pwcnet-sm-6-2-multisteps-chairsthingsmix     1.71                          2.96                           3.83

The official, reported results are as follows:

pwc-net-results

Thank you again for this very impressive work!

Respectfully, -- Phil

Minor Error in proc_images.py

I just noticed that at line 37,

im2 = imread(images1[0])

should be

im2 = imread(images2[0])

if I'm not misunderstanding your code.

Thanks for sharing and some little doubts

Hi all,

Thanks for sharing your code! I've been trying to understand your work for a couple of weeks and have just read the released code. Could you please answer the following questions? I'm not familiar with Caffe, so I haven't read the Caffe code yet; sorry if some of these details are already clarified in the Caffe version.

  1. Why do you divide the ground truth by 20? It seems that dividing the GT level by level by its scaling factor would make more sense. Dividing by 20 makes the problem more like regression than geometry or principle, doesn't it? (See the sketch after this list.)

  2. What's the difference between PWC-Net and PWC-Net_ROB?
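A minimal sketch of the div_flow = 20 convention asked about in question 1, inherited from the FlowNet implementation (the variable names here are hypothetical):

DIV_FLOW = 20.0                       # global flow scaling constant (FlowNet convention)
target = gt_flow / DIV_FLOW           # training: ground truth is scaled down once, globally
flow = net_output * DIV_FLOW          # inference: predictions are scaled back up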

On the training of pytorch model

What accuracy can be obtained when using the PyTorch code to train the model? Do you have training code in the Python version? Thank you so much~

something about train.prototxt

First, thank you for your great work!

I am not very familiar with the training process of CNN-based optical flow, so will you release a template training prototxt for the Caffe version of the code?

Best

Flow vector rescaling

Thanks for sharing the code!

Why are the flow vectors rescaled this way:

u_ *= H / float(H_)
v_ *= W / float(W_)

instead of

u_ *= W / float(W_)
v_ *= H / float(H_)?
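For reference, a minimal sketch of the rescaling this question argues for, assuming a flow layout of (N, 2, H_, W_) with channel 0 = u (horizontal) and channel 1 = v (vertical):

import torch
import torch.nn.functional as F

def resize_flow(flow, H, W):
    # flow: (N, 2, H_, W_); returns the flow resized to (H, W) with displacements rescaled
    _, _, H_, W_ = flow.shape
    flow = F.interpolate(flow, size=(H, W), mode="bilinear", align_corners=False)
    u = flow[:, 0:1] * (W / float(W_))  # horizontal displacements scale with width
    v = flow[:, 1:2] * (H / float(H_))  # vertical displacements scale with height
    return torch.cat((u, v), dim=1)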

Warping with ground truth flow instead of predicted flow

Hi,

My question is about the operation used to warp the second image towards the first image. In your work, you use the upsampled flow prediction from the upper pyramid level to warp the image at the level below. However, won't this result in incorrect training signals being passed during the training phase?

For instance, consider the following: let's focus on one pixel (say A) in image 1. We are looking for the appropriate pixel in image 2 that should be warped to A's location. Let C be the true match for A (that is, pixel A in image 1 has moved to pixel C in image 2). However, during training (because our flow predictions are not perfect), the upsampled flow prediction takes us to a different point B as the point to be warped.
warping problem
(Let a.b denote the correlation between the feature vectors at A and B.)
Now we compute the cost volume as a.b, and this is used to predict the flow. The training signal that goes back to the flow estimator network and the feature extractor network is incorrect, as the gradient is computed using the wrong input (a.b); it would be the correct signal if the cost volume were computed as a.c.

In essence, this is a problem because you backpropagate the errors using only subgradients (as in the Spatial Transformers paper) rather than the full gradients. Thus, the flow predictor network updates its weights assuming that the input to it is a.b, whereas the correct input to it should have been a.c. This problem is also described in section 3 of the paper Occlusion Aware Unsupervised Learning of Optical Flow under the heading Backward warping with a larger search space.

That paper also proposes a different warping mechanism to choose the pixel as close to the true warping as possible. However, I was thinking that to solve this problem, can't we just use the ground truth flow for warping the image while training? This should not make a difference because the warping layer does not have any trainable weights, so the network will still be trained end-to-end. More importantly, using ground truth for warping will ensure that the error and gradient signals are computed using the true warped point, so the training signal propagated back will be correct. What are your thoughts on this?

Unexpected key "deconv2.weight" in state_dict

I'm trying to run the PyTorch version of PWC-Net. I run it with Python 2.7.15 and PyTorch 0.2.0_4.
When I run the demo code:

python script_pwc.py './data/frame_0010.png' './data/frame_0011.png' './tmp/frame_0010.flo'

It returned this error:

KeyError: 'unexpected key "deconv2.weight" in state_dict'

Do you know how to fix it?
Thank you.

About the EPE computation

As you mentioned, the EPE is 2.29 and 2.31. Which dataset is this performance from: the training set, the test set, or a validation set you customized?

Whether the model can detect subtle movement

Congratulations, you have done a great job.
I have a question: can the model detect subtle movement? For example, in a face recognition system, the movement of a face in front of the camera within one second is very subtle. Can this model detect such subtle movement?
Thank you very much. Looking forward to your reply.

The input channel of convolutions in the optical flow estimator class

Thanks for such great work.

I checked the code and the paper, but there is one point I cannot figure out. I would be very thankful if you could clarify it for me.
How did you define the number of input channels for the optical flow decoder? Why is the input channel count of the next level not equal to the output channel count of the previous layer?
What is the logic behind these numbers?
I really appreciate your support.
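For context, a minimal sketch of why the input channel counts grow (the class name is hypothetical; the widths follow the 128/128/96/64/32 pattern of the released decoder): with DenseNet-style connections, each conv takes the concatenation of the block's input and all previous conv outputs, so in_channels accumulates rather than equaling the previous layer's out_channels.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFlowEstimator(nn.Module):
    def __init__(self, in_ch, widths=(128, 128, 96, 64, 32)):
        super(DenseFlowEstimator, self).__init__()
        self.convs = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.convs.append(nn.Conv2d(ch, w, kernel_size=3, padding=1))
            ch += w  # the next conv sees everything concatenated so far
        self.predict_flow = nn.Conv2d(ch, 2, kernel_size=3, padding=1)

    def forward(self, x):
        for conv in self.convs:
            # concatenate each activated output onto the running input
            x = torch.cat((x, F.leaky_relu(conv(x), 0.1)), dim=1)
        return self.predict_flow(x)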

optical flow feature and optical flow upsampling

This is very elegant work. I have some questions and hope to get help.
1. The highest-resolution features used in the program are at 1/4 scale; why not use the original-scale features?
2. Why does the optical flow upsampling use nn.ConvTranspose2d rather than linear interpolation? Won't the flow be changed by passing through a layer with learnable variables?

Thanks!

Recompute_mean in Test Phase

Hi Author,

Very impressive work!

When checking the code, I found that the mean is still recomputed when testing.

Does this mean you use the statistical characteristics of the test dataset?

Low training speed using Caffe training code

Thanks for sharing the code. It is really impressive work.

I am trying to reproduce the result with the Caffe training code. However, I observed that the training speed is much slower than the stated one (in Table 7 of the paper). I used a single NVIDIA 1080 Ti with CUDA 9.0 on an Ubuntu 16.04 machine. The training speed on FlyingChairs is around 5.5k iterations/hour, which means at least 1200k / 5.5k ≈ 218 hours, i.e. around 9 days, are needed to train on FlyingChairs alone. Is there any tip to speed up the training?

In addition, may I ask whether the training speed stated in Table 7 of the paper refers to training on FlyingChairs only, or on FlyingChairs + FlyingThings + KITTI/Sintel?

What is the fine-tuning strategy on KITTI/Sintel? Did you also use the S_fine schedule (meaning the total training is 1200k + 500k + 500k = 2200k steps)?

Error when trying to build

I am trying to follow, line by line, the commands in the make_cuda.sh file (on Windows 10), but when I run the last one, I get the following errors:

Creating library build\temp.win-amd64-3.6\Release\build\temp.win-amd64-3.6\Release\_corr.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\build\temp.win-amd64-3.6\Release\_corr.cp36-win_amd64.exp
LINK : warning LNK4098: defaultlib 'LIBCMT' conflicts with use of other libs; use /NODEFAULTLIB:library
corr_cuda.obj : error LNK2001: unresolved external symbol __imp_THCState_getCurrentStream
corr_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaTensor_resize4d
corr_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaTensor_data
corr_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaTensor_free
corr_cuda.obj : error LNK2001: unresolved external symbol state
corr_cuda.obj : error LNK2001: unresolved external symbol __imp_THCudaTensor_zero
corr_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaGetLastError
corr1d_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaGetLastError
corr_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaGetErrorString
corr1d_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaGetErrorString
corr_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaConfigureCall
corr1d_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaConfigureCall
corr_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaSetupArgument
corr1d_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaSetupArgument
corr_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaLaunch
corr1d_cuda_kernel.cu.o : error LNK2001: unresolved external symbol cudaLaunch
corr_cuda_kernel.cu.o : error LNK2001: unresolved external symbol __cudaRegisterFatBinary
corr1d_cuda_kernel.cu.o : error LNK2001: unresolved external symbol __cudaRegisterFatBinary
corr_cuda_kernel.cu.o : error LNK2001: unresolved external symbol __cudaUnregisterFatBinary
corr1d_cuda_kernel.cu.o : error LNK2001: unresolved external symbol __cudaUnregisterFatBinary
corr_cuda_kernel.cu.o : error LNK2001: unresolved external symbol __cudaRegisterFunction
corr1d_cuda_kernel.cu.o : error LNK2001: unresolved external symbol __cudaRegisterFunction
build\lib.win-amd64-3.6\correlation_package\_ext\corr\_corr.cp36-win_amd64.pyd : fatal error LNK1120: 14 unresolved externals
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\link.exe' failed with exit status 1120 

Does anyone have an idea of what the problem could be? I tried looking around on the net and in the code, but I can't figure it out.

Thank you !!

Regarding the number of epoch

Dear Author,
This is very impressive work.
I cannot find any reference, either in the paper or in the code, to the number of epochs you used for training. I would appreciate it if you could mention how many epochs you trained for.

Thanks in advance

proc_images output does not match reference

I ran proc_images.py in the Caffe/ subdirectory, but the output flow does not match the provided reference flows.

The forward flow ("frame_0010_forward.png") that was output when I ran proc_images.py:
image

The reference image:
image

I used OpticalFlowToolkit to visualize the flows, but the raw .flo files also differ. I compiled flownet2 from commit 9eed763, and PWC-net at commit 6ebd42a.

Am I running the wrong model? The reference seems a bit cleaner than what I get on my machine.

undefined symbol: __cudaPopCallConfiguration

Hi,

I am trying to get the code running.
I compiled and installed everything and believe I passed this stage successfully.
However, I currently receive the following error:
ImportError: /correlation_package/_ext/corr/_corr.so: undefined symbol: __cudaPopCallConfiguration

Any support would be helpful

question about the top level flow

Hi, this is good work!
I have a question about the model.
In the paper, the flow at the smallest level is initialized to zero.
Is there a special reason for doing this?

Training the context network

Hi,
Your paper has very interesting ideas, and I thoroughly enjoyed reading it. I have a couple of questions about the paper:

Question 1

I had gotten my hands on an earlier version of the paper. In that version, it was mentioned that:

We jointly learn all the parameters, including those of the context network.

However, the same paper said that the loss function used during training is:
image

The w(x) used in the above equation denotes the output of the optical flow estimator network. If this loss function is used, I don't see how the context network would be trained. Should the equation use ŵ(x) instead of w(x), where ŵ(x) refers to the output of the context network? This would ensure that the full network is indeed trained end-to-end.
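For reference, the multi-scale training loss in the paper has roughly this form (reconstructed here; whether w should carry a hat is exactly the ambiguity raised above):

\mathcal{L}(\Theta) = \sum_{l=l_0}^{L} \alpha_l \sum_{\mathbf{x}} \big| \mathbf{w}_\Theta^l(\mathbf{x}) - \mathbf{w}_{GT}^l(\mathbf{x}) \big|_2 \; + \; \gamma \, |\Theta|_2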

Question 2

This line about learning all the parameters together was removed in a recent version of the paper. Is that because you finally trained the context network separately from the rest of the pipeline? If so, what loss function did you use when training just the context network?

PyTorch model has poor performance at higher levels

Hi @deqings,
I have tested the provided pre-trained models on the Sintel and FlyingChairs datasets. The result of flow2 looks good. However, the results at the higher levels are worse. Even at the top level, flow6, the scale of the prediction is not on the same order of magnitude.

size:  [(1L, 2L, 112L, 256L), (1L, 2L, 56L, 128L), (1L, 2L, 28L, 64L), (1L, 2L, 14L, 32L), (1L, 2L, 7L, 16L)]
max: [1.894425392150879, 1.697497010231018, 1.6027228832244873, 1.060249924659729, 0.23016595840454102]
min: [-0.40163561701774597, -0.5419626832008362, -1.1349056959152222, -0.5822294354438782, -0.2260863333940506]

I don't know if this is normal. I have tested with PyTorch 0.4.1/Python 3.5 and PyTorch 0.2.3/Python 2.7. I hope someone can help me.

CuDNN Error

Congratulations! You have done a great job.
I have a problem when I execute the code:
python script_pwc.py './data/frame_0010.png' './data/frame_0011.png' './tmp/frame_0010.flo'
and the problem is shown here:

image
I downloaded PyTorch with: conda install pytorch=0.2 torchvision cuda80 -c pytorch
and correctly installed cuDNN. So why does this happen, and how can I solve it?
Looking forward to your reply.

Difference between Flowwarp Layer and Warp Layer

Hi, I have a small question about the Caffe version of your PWC-Net.
As stated in the README, this version is based on FlowNet2.0. However, you re-implemented the warp layer as 'warp_layer.cu'. I also noticed that FlowNet2.0 has a similarly implemented layer called 'flow_warp_layer.cu' (https://github.com/lmb-freiburg/flownet2/blob/master/src/caffe/layers/flow_warp_layer.cu). After reading the code, I found that both warp images/features using bilinear interpolation, and I wonder what the difference is.
Thank you!

finetune on flyingthings3d

Hi, one question about fine-tuning on FlyingThings3D:
Did you use only the "into_future_left" subset, or all the combinations: "into_future_right", "into_past_left", and "into_past_right"?
Thanks for your answer.

High memory requirements for the Caffe version

Thank you so much for sharing your code.
I have an issue running the Caffe version.
My PC has an NVIDIA 1080 Ti.
When I run it on a pair of 1080p images, it raises this error:

syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory

Does it really need that much memory?
Thank you.

undefined symbol: state

When I ran "bash make_cuda.sh", it didn't raise any mistake, but when I ran "python test/test.py", it reported "correlation_package/_ext/corr/_corr.so: undefined symbol: state", I don't know how to solve it.
I use Ubuntu16.04, cuda8.0 and pytorch 0.2

Slow response before CVPR deadline (Nov. 16th)

Hi everyone,

Thanks for your interest in PWC-Net.

Many apologies for the slow/no response in September; I have been out of the office and traveling internationally. I will try to catch up with replies in the coming week.

Best regards,
Deqing

Getting very large flow output.

Thank you very much for your sharing.

I'm getting the following output when running the command
python script_pwc.py './data/frame_0010.png' './data/frame_0011.png' './tmp/frame_0010.flo'
with line 67 in script_pwc.py changed to net = models.pwc_dc_net_old(pwc_model_fn).
(The net = models.pwc_dc_net(pwc_model_fn) call causes the error: KeyError: 'unexpected key "deconv2.weight" in state_dict'.)

res

This output has extremely large flow (max flow: 1136.9962 flow range: u = -485.907 .. 717.155; v = -136.420 .. 1076.126) and looks very weird.

Could you please give me some advice on what could possibly be going wrong?
(I'm running with CUDA 8.0, Python 2.7, and PyTorch 0.2.0; the reference flow tmp/reference_frame_0010.flo looks good using the same visualization code.)

Thank you very much!

About the upfeature

Great work! I have a question from reading your code.
I note that you employ several deconv layers to obtain upsampled features ("upfeatures") with 2 channels. What motivated you to do this? Have you tried a thicker upfeature (channel count > 2)?
Thanks!

pretrained model

Could you please provide the model weights pretrained on FlyingChairs and finetuned on FlyingThings3D?
Thanks~

some issues and corresponding corrections in Caffe-Project

Two issues and the corresponding corrections:

  • visualization
  1. change from flow_io import flow_read_uv to from flow_io import flow_read
  2. change I_flow = Image.fromarray(viz_flow(uv[0,:,:], uv[1,:,:])) to I_flow = Image.fromarray(viz_flow(uv[0], uv[1]))
  • compile
    error: Unknown Engine error
    see link

One question: where is run_rob_test.py in the 3rd step (testing)?

Training epochs and epoch size

Hi,

When training your code, one has to set hyperparameters such as the number of epochs and the epoch size.
I am also training my own model and comparing the results with yours.
I wonder which values (epochs and epoch size) you used when training your model,
since the results depend on the hyperparameters selected.
Could you let me know the hyperparameters?

Thanks

Can't find corr_cuda_forward

Greetings,
When I try to reproduce this model, I can't find the implementation of "corr_cuda_forward". Where is it defined?
Thanks

Correlation operation in cost volume layer

Hi,

My question is about the exact correlation operation used in the cost volume layer. In the paper, the operation is defined as:
image

where the dot product of the 2 feature vectors is divided by the length N of the feature vectors. However, in the code for the cost volume layer, the dot product is scaled by sumelems, which is defined as kernel_size*kernel_size*bottomchannels in one place and as (kernel_radius*2+1)*(kernel_radius*2+1)*bottomchannels in another. I'm guessing that kernel_size is the same as 2*kernel_radius+1, but I am not sure.

I am implementing the layer in another framework and hence wanted to clarify this: the dot product of the 2 feature vectors is finally divided by (2*d+1)*(2*d+1)*N, where d is the kernel radius and N is the number of channels in the feature map. Is this correct?
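A minimal sketch of the normalization being asked about, assuming kernel_size == 2*kernel_radius + 1 (so with a kernel size of 1, sumelems reduces to N, the channel count). The shift here is a crude stand-in for the padded shifts in the CUDA kernel:

import torch

def correlation_at(f1, f2, dx, dy, kernel_size=1):
    # f1, f2: (N, C, H, W) feature maps; (dx, dy): one displacement from the search range
    n, c, h, w = f1.shape
    f2s = torch.roll(f2, shifts=(dy, dx), dims=(2, 3))  # crude shift (the real kernel pads instead of wrapping)
    sumelems = kernel_size * kernel_size * c            # == (2*kernel_radius+1)^2 * C
    return (f1 * f2s).sum(dim=1) / sumelems             # (N, H, W) cost slice for this displacement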

How can I use corr1d with a single direction for disparity estimation?

The signature is:
int corr1d_cuda_forward(THCudaTensor *input1,
THCudaTensor *input2,
THCudaTensor *rbot1,
THCudaTensor *rbot2,
THCudaTensor *output,
int pad_size,
int kernel_size,
int max_displacement,
int stride1,
int stride2,
int corr_type_multiply
//single_direction=0
)
Best, and thanks!

Question about input to optical flow estimator

Hi, I have a question about the input to the flow estimator. As mentioned in the paper, the input is:

the cost volume, features of the first image, and upsampled optical flow

Then I noticed in the code that the input also includes upsampled features from the previous scale, such as up_feat6 in x = torch.cat((corr5, c15, up_flow6, up_feat6), 1). Could you please give some insight into incorporating this kind of information? Thanks.
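A minimal sketch of how those inputs are assembled, following the line quoted above (warp and correlation stand in for the repository's custom layers, and scale5 is a per-level flow scaling constant; these names are hypothetical):

import torch

def decoder_input_level5(c15, c25, up_flow6, up_feat6, scale5, warp, correlation):
    # warp second-image features toward the first image using the upsampled flow
    warped5 = warp(c25, up_flow6 * scale5)
    # cost volume between first-image features and warped second-image features
    corr5 = correlation(c15, warped5)
    # decoder input: cost volume + image-1 features + upsampled flow and features
    return torch.cat((corr5, c15, up_flow6, up_feat6), dim=1)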

Failed to compile correlation with pytorch 0.4.1 cuda 9.0

os: ubuntu 17.10
cuda: 9.0
pytorch: 0.4.1.post2

I tried to run make_cuda.sh and got the following error:

correlation-pytorch/correlation_package/src/corr_cuda.c:23:27: 
error: dereferencing pointer to incomplete type ‘THTensor {aka struct THTensor}’
     int batchSize = input1->size[0];

some doubts

Hi Deqing,

Thanks for sharing your code; it's really cool. I trained it on Chairs, but I still have some doubts.

  1. The sizes of Caffe/model/pwc_net.caffemodel and pwc_net_chairs_things.caffemodel are not equal: one is 43.6M and one is 69.5M, while the caffemodel I trained using your solver and train.prototxt is about 39M. Yet they can all be used with the same pwc_net_test.prototxt, so I am confused. Is the smaller one the PWC-Net-small from your paper?

  2. The multi-GPU training in the FlowNet2.0 code has problems with the training time. In order to train with multiple GPUs, I added all the layers to my own Caffe tools, and I tested the AEPE of your two pretrained models on Chairs_test (640 image pairs). The results are:
    pwc_net.caffemodel AEE: 2.8355
    pwc_net_chairs_things.caffemodel AEE: 2.3099
    Are these results correct? If they're wrong, would you share your AEE evaluation tools?

  3. For the single-GPU experiments, I trained 3 times using the FlowNet2.0 code with batch size 8, with all settings copied from your code. The loss failed to converge twice, and only once did it converge to 20-40. When I set the batch size to 32, it converged to 25-40. Is that normal?

Thank you for your patience!
Best Regards!

Training time

Hi

Can you comment on how much time it took you to train the model from scratch? And what configuration did you have in terms of GPU power?

Thanks

Asking for the training code, thank you very much!

Firstly, thanks for sharing the code. Your work is brilliant and very helpful to me.
I'm new to PyTorch, and I need the training code to apply the network to my own data.
Would it be convenient for you to provide the training code? Thank you very much!

Why are the conv weights of conv6aa and conv6a zeros? Any explanation?

Dear Deqing,
Thanks for your contribution! Really impressive work!
But when I loaded your model, I found a problem; would you mind giving some explanation of this phenomenon?
This is part of the network:
c11 = self.conv1b(self.conv1aa(self.conv1a(im1)))
c21 = self.conv1b(self.conv1aa(self.conv1a(im2)))
c12 = self.conv2b(self.conv2aa(self.conv2a(c11)))
c22 = self.conv2b(self.conv2aa(self.conv2a(c21)))
c13 = self.conv3b(self.conv3aa(self.conv3a(c12)))
c23 = self.conv3b(self.conv3aa(self.conv3a(c22)))
c14 = self.conv4b(self.conv4aa(self.conv4a(c13)))
c24 = self.conv4b(self.conv4aa(self.conv4a(c23)))
c15 = self.conv5b(self.conv5aa(self.conv5a(c14)))
c25 = self.conv5b(self.conv5aa(self.conv5a(c24)))
c16 = self.conv6b(self.conv6a(self.conv6aa(c15)))
c26 = self.conv6b(self.conv6a(self.conv6aa(c25)))

After loading your model, I printed the values of each layer and found:

('conv6aa.0.weight', tensor(1.00000e-35 *
[[[[-0.9822, -0.9869, -0.9835],
'conv6aa.0.bias', tensor(1.00000e-03 *
[-2.3138, 2.3138, -2.3143, -2.3138, 2.3135, -2.3138, -2.3138,
'conv6a.0.weight', tensor([[[[-1.5347e-36, -1.0942e-35, -1.7988e-35],
'conv6a.0.bias', tensor([-1.9254e-05, -1.9237e-05, -1.5575e-05, -1.6169e-05, -1.5735e-05
