
GMA's Introduction

Learning to Estimate Hidden Motions with Global Motion Aggregation

This repository contains the source code for our paper:

Learning to Estimate Hidden Motions with Global Motion Aggregation
ICCV 2021
Shihao Jiang, Dylan Campbell, Yao Lu, Hongdong Li, Richard Hartley
ANU, Oxford

Environments

You will have to choose a cudatoolkit version that matches your compute environment. The code is tested with PyTorch 1.8.0, but other versions might also work.

conda create --name gma python==3.7
conda activate gma
conda install pytorch=1.8.0 torchvision=0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install matplotlib imageio einops scipy opencv-python

Demo

sh demo.sh

Train

sh train.sh

Evaluate

sh evaluate.sh

License

WTFPL. See LICENSE file.

Acknowledgement

The overall code framework is adapted from RAFT. We thank the authors for their contribution. We also thank Phil Wang for open-sourcing transformer implementations.

GMA's People

Contributors

zacjiang


GMA's Issues

Reproducibility of GMA on Sintel and KITTI test

Thanks for the great code!
I tried to reproduce the Sintel and KITTI test results reported in the paper. However, I got 1.58 on Sintel clean and 2.64 on Sintel final for GMA (ours), and 5.14 on KITTI for GMA (p only). These results are worse than those reported in the paper (1.39 on Sintel clean and 2.47 on Sintel final for GMA (ours), and 4.93 on KITTI for GMA (p only)).
Is it because you pick the best-iteration checkpoint on a validation set, while I use the last-iteration checkpoint? If so, may I know which validation set you chose?

Are there any requirements on the size of input images?

When I input two images of size 768 x 1856 to the model, I got the error below:
einops.EinopsError: Error while processing rearrange-reduction pattern "(y v) d -> y () v d". Input tensor shape: torch.Size([25600, 128]). Additional info: {'y': 232}. Shape mismatch, can't divide axis of length 25600 in chunks of 232
Any idea why this happens?
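For context on the einops error itself: the pattern "(y v) d -> y () v d" requires the first axis length to be exactly divisible by y, and 25600 is not divisible by 232. Since 25600 = 160 x 160 while 232 = 1856 / 8 (one of the feature-map dimensions at 1/8 resolution), the failing tensor looks like a fixed 160 x 160 table inside the model, which suggests the relative positional embedding only supports feature maps up to 160 x 160; that reading is an inference from the error message, not confirmed from the code. A minimal sketch reproducing the einops constraint on hypothetical tensors:

import torch
from einops import rearrange

x = torch.randn(25600, 128)
out = rearrange(x, '(y v) d -> y () v d', y=160)   # works: 25600 = 160 * 160
# rearrange(x, '(y v) d -> y () v d', y=232)       # raises EinopsError: 232 does not divide 25600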

reproduce results

Hi, I have run the first two stages in your train.sh, chairs and things, but I only get 1.35 and 2.83 EPE on Sintel's clean and final passes.
I would like to know how I can get the same results as in your paper, i.e., 1.30 and 2.74. Are they obtained by training multiple times and taking the best run?

Training Set of Sintel Submission

Hi, I'm trying to reproduce your result on the Sintel benchmark. I notice that you use 'C + T + S/K (+ H)' in the experiment table of the paper. To my knowledge, referring to the RAFT paper, C+T+S/K means that in the Sintel stage you train only on C+T+S. I have no idea what the (+H) is, even with the explanation "'S/K (+ H)' refers to methods that are fine-tuned on the Sintel and KITTI datasets, with some also fine-tuned on the HD1K dataset." What does 'with some' mean?
Could you please detail the training schedule you used for the Sintel submission? Is it C+T+S+H?

the transformer head number

Thank you for your concise and efficient work!
I would like to ask about the impact of the number of transformer heads. I found no ablation experiments for this variable in the published paper, and you set it to 1 in your code.
May I ask whether you have conducted experiments on this variable, and whether performance improves if the number of heads is increased?

Cannot find file named 'things_val_test_set.txt'

Hi, the file 'things_val_test_set.txt' referenced in core/datasets.py cannot be found, which leads to a failure when validating on the FlyingThings3D dataset.
Please provide this file and explain its source. Thanks.

Have you encountered the following two warnings?

/home/sunyy/anaconda3/envs/gma/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/home/sunyy/anaconda3/envs/gma/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:1290: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
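For reference, the first warning is about call order: since PyTorch 1.1, optimizer.step() should be called before scheduler.step() in each iteration (the warning can also appear spuriously when mixed-precision gradient scaling skips an optimizer step). The second warning just suggests reading the current learning rate via scheduler.get_last_lr() rather than scheduler.get_lr(). A minimal sketch of the expected ordering, with hypothetical model/optimizer/scheduler/compute_loss names:

for batch in train_loader:
    optimizer.zero_grad()
    loss = compute_loss(model, batch)     # hypothetical loss computation
    loss.backward()
    optimizer.step()                      # update parameters first
    scheduler.step()                      # then advance the learning-rate schedule
    current_lr = scheduler.get_last_lr()[0]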

kitti submission

Hi, this is very nice work!
I would like to ask a question about the KITTI submission. After I obtain the 'kitti_submission' folder, how do I create the right ZIP file?
The tips on the KITTI website are as follows:
[screenshot of the KITTI submission instructions]
For the optical flow task, do we only need to include the 'flow' folder in the submitted archive? So should I just rename the generated 'kitti_submission' folder to 'flow' and compress it into a ZIP file?
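If the server indeed expects a top-level flow/ directory, as the screenshot suggests, one way to package the predictions is the sketch below; the folder names come from this issue, not from the repository's scripts:

import shutil

# Copy the generated predictions into a staging directory named 'flow',
# then zip the staging root so the archive contains flow/ at the top level.
shutil.copytree('kitti_submission', 'staging/flow')
shutil.make_archive('kitti_flow_submission', 'zip', root_dir='staging')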

how to count the FLOPs of the GMA model?

I wonder how to accurately count the floating-point operations (FLOPs) of the GMA model. The common counting tools basically only count convolutions; for the special operations in the model, are there any good ways to count them? In other words, how did you do it?
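One practical option is an operator-level counter that traces the whole graph rather than only convolutions, e.g. fvcore's FlopCountAnalysis, which reports per-operator counts and flags unsupported ops so you can account for them manually. This is a sketch under assumptions (an already-loaded GMA model whose forward accepts two images, and a 436 x 1024 dummy input); it is not the counting method used by the authors:

import torch
from fvcore.nn import FlopCountAnalysis

model.eval()                             # assumed: a constructed GMA model on CPU
image1 = torch.randn(1, 3, 436, 1024)    # Sintel-sized dummy input (assumption)
image2 = torch.randn(1, 3, 436, 1024)

with torch.no_grad():
    flops = FlopCountAnalysis(model, (image1, image2))
    print(flops.total())         # total count for one forward pass
    print(flops.by_operator())   # per-operator breakdown; ops without a handler are skipped

Note that for iterative models like GMA the count depends strongly on the number of refinement iterations used in the forward pass.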

reproduce results of table 1

I want to reproduce the results of Table 1 in the paper, but the EPE in the occluded regions is inconsistent with the paper. I used the weights you provided ('gma-sintel.pth') and set the number of iterations to 32. What might be the cause of this?


training time

Hi, can you tell me the training time of each training phase (chairs, things, sintel, kitti) and the data storage device you used (SSD or HDD)? Thank you very much!

Attention Map Visualization

Thanks for making your source code public.
Could you please share your code for visualizing the attention map in Figure 6 or guide me on how to obtain it?
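For a rough first look, a common way to inspect a 2D self-attention map is to pick a query pixel, take its row of the softmax-normalized attention matrix, reshape that row back onto the 1/8-resolution feature grid, and plot it. A minimal matplotlib sketch, assuming a hypothetical attn tensor of shape (1, heads, H*W, H*W) captured from the attention module:

import matplotlib.pyplot as plt

def show_attention(attn, H, W, qy, qx, head=0):
    """Visualize the attention weights of query pixel (qy, qx) on an H x W feature grid."""
    query_index = qy * W + qx
    attn_map = attn[0, head, query_index].reshape(H, W).detach().cpu().numpy()
    plt.imshow(attn_map, cmap='viridis')
    plt.scatter([qx], [qy], c='red', s=12)   # mark the query pixel
    plt.colorbar()
    plt.show()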

Running Evaluation for KITTI Test

Hey @zacjiang ,

Thank you for sharing your work!
@zacjiang, I was looking to evaluate the pre-trained model on the KITTI test set. I have completed the repository setup and got it running according to the instructions on GitHub. I was able to reproduce the results reported in Table 2 of the paper for the KITTI train split.

But when I run it for the KITTI test split, the execution of evaluate.py fails. This is because evaluate.py expects 4 outputs from the data loader:

GMA/evaluate.py

Lines 348 to 355 in 2f1fd29

def validate_kitti(model, iters=6):
    """ Peform validation using the KITTI-2015 (train) split """
    model.eval()
    val_dataset = datasets.KITTI(split='training')
    out_list, epe_list = [], []
    for val_id in range(len(val_dataset)):
        image1, image2, flow_gt, valid_gt = val_dataset[val_id]
Whereas if the split is 'test', then according to the dataset file here:

GMA/core/datasets.py

Lines 38 to 46 in 2f1fd29

if self.is_test:
    img1 = frame_utils.read_gen(self.image_list[index][0])
    img2 = frame_utils.read_gen(self.image_list[index][1])
    img1 = np.array(img1).astype(np.uint8)[..., :3]
    img2 = np.array(img2).astype(np.uint8)[..., :3]
    img1 = torch.from_numpy(img1).permute(2, 0, 1).float()
    img2 = torch.from_numpy(img2).permute(2, 0, 1).float()
    return img1, img2, self.extra_info[index]
it returns only three values.

In that case, how do I get numbers for the KITTI test split?

Regards,
Nitin Bansal
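For what it's worth, the KITTI test split has no public ground truth, so there are no numbers to compute locally; the usual workflow is to write the predicted flow for each test pair and upload it to the KITTI server. A rough sketch of such a loop, assuming RAFT-style helpers (InputPadder, frame_utils.writeFlowKITTI) and a forward pass that returns (flow_low, flow_up) in test mode, as in the RAFT codebase; the repository's evaluate.py may already provide an equivalent function:

import os
import torch
import datasets
from utils import frame_utils
from utils.utils import InputPadder

@torch.no_grad()
def create_kitti_submission(model, iters=24, output_path='kitti_submission'):
    """Write flow predictions for the KITTI-2015 test split (no ground truth available)."""
    model.eval()
    test_dataset = datasets.KITTI(split='testing', aug_params=None)
    os.makedirs(output_path, exist_ok=True)

    for test_id in range(len(test_dataset)):
        image1, image2, (frame_id, ) = test_dataset[test_id]
        padder = InputPadder(image1.shape, mode='kitti')
        image1, image2 = padder.pad(image1[None].cuda(), image2[None].cuda())

        _, flow_pr = model(image1, image2, iters=iters, test_mode=True)
        flow = padder.unpad(flow_pr[0]).permute(1, 2, 0).cpu().numpy()
        frame_utils.writeFlowKITTI(os.path.join(output_path, frame_id), flow)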

How to handle subtitles with large motion?

Hi,

I would like to know whether GMA can handle the case where subtitles are accompanied by large motion, which is common in movies.
Would the subtitles be preserved well in the interpolation result?

Intuition behind two details in the code

Hello,
Thank you very much for sharing your precious work with us.
I had two questions regarding the code.

  1. In gma.py, line 60, the query tensor is scaled by self.scale = dim_head ** -0.5. Why is that necessary? I would also be thankful if you could explain why you set the value to dim_head ** -0.5.
  2. In the same file, line 113, motion features are added to the attention output tensor. Could you please give some insights on that as well?

Thanks a lot.
Azin
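For context on the first question: scaling by dim_head ** -0.5 is the standard scaled dot-product attention convention (Vaswani et al., 2017); without it, the dot-product logits grow with the head dimension and the softmax saturates, which shrinks gradients. A minimal single-head sketch, with hypothetical q, k, v tensors of shape (B, N, D):

import torch

def scaled_dot_product_attention(q, k, v):
    # Dividing by sqrt(D) keeps the logits' variance roughly independent of D,
    # so the softmax stays in a well-conditioned regime.
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum('bnd,bmd->bnm', q * scale, k)
    attn = logits.softmax(dim=-1)
    return torch.einsum('bnm,bmd->bnd', attn, v)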

Recommended checkpoint for real-world images

Hi, thank you for your great work.

I want to test GMA on real-world images (i.e., not synthetic ones). Could you tell me which of the four checkpoints (chairs, kitti, sintel, things) is expected to generalize best to real-world images?

about query projector and key projector

It says,"we project the context feature map to a query feature map and a key feature map. We then take the dot product of the two feature maps and a softmax to obtain an attention matrix"
but in network.py line 99,I just found "attention = self.att(inp)". this is what puzzled me.
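For reference, the query/key projections the paper describes typically live inside that attention module, so self.att(inp) computes them internally from the context features. A minimal sketch of such a module, using 1x1 convolutions as the projections (hypothetical names and layout, not the repository's exact implementation):

import torch
from torch import nn
from einops import rearrange

class ContextAttention(nn.Module):
    """Project a context feature map to queries/keys and return an attention matrix."""
    def __init__(self, dim, dim_head=128):
        super().__init__()
        self.scale = dim_head ** -0.5
        self.to_q = nn.Conv2d(dim, dim_head, 1, bias=False)
        self.to_k = nn.Conv2d(dim, dim_head, 1, bias=False)

    def forward(self, context):
        # context: (b, dim, h, w) feature map from the context encoder
        q = rearrange(self.to_q(context), 'b c h w -> b (h w) c')
        k = rearrange(self.to_k(context), 'b c h w -> b (h w) c')
        logits = torch.einsum('bnc,bmc->bnm', q * self.scale, k)
        return logits.softmax(dim=-1)   # (b, h*w, h*w) attention matrix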

Does HD1K really help?

I was using HD1K for Sintel fine-tuning, just as GMA does. I'm surprised that it only consists of grayscale images; that means there is a big domain gap between HD1K and the other training sets. I wonder whether the model would train better if I removed HD1K. My training without HD1K is ongoing, and it seems the loss on the training data is much lower and the accuracy is higher. I will update when it finishes.
