
GMA's Introduction

Learning to Estimate Hidden Motions with Global Motion Aggregation

This repository contains the source code for our paper:

Learning to Estimate Hidden Motions with Global Motion Aggregation
ICCV 2021
Shihao Jiang, Dylan Campbell, Yao Lu, Hongdong Li, Richard Hartley
ANU, Oxford

Environments

You will have to choose a cudatoolkit version that matches your compute environment. The code is tested with PyTorch 1.8.0, but other versions might also work.

conda create --name gma python==3.7
conda activate gma
conda install pytorch=1.8.0 torchvision=0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install matplotlib imageio einops scipy opencv-python

Demo

sh demo.sh

Train

sh train.sh

Evaluate

sh evaluate.sh

License

WTFPL. See LICENSE file.

Acknowledgement

The overall code framework is adapted from RAFT. We thank the authors for their contribution. We also thank Phil Wang for open-sourcing transformer implementations.

GMA's People

Contributors

zacjiang


GMA's Issues

Reproducibility of GMA on Sintel and KITTI test

Thanks for the great code!
I tried to reproduce the Sintel and KITTI test results reported in the paper. However, I got 1.58 on Sintel clean and 2.64 on Sintel final for GMA (ours), and 5.14 on KITTI for GMA (p only). These results are worse than those reported in the paper (1.39 on Sintel clean and 2.47 on Sintel final for GMA (ours), and 4.93 on KITTI for GMA (p only)).
Is it because you pick the best-iteration checkpoint on a validation set, while I use the last-iteration checkpoint? If so, may I know which validation set you chose?

Are there any requirements on the size of input images?

When I input two images of size 768 x 1856 to the model, I got the error below:
einops.EinopsError: Error while processing rearrange-reduction pattern "(y v) d -> y () v d". Input tensor shape: torch.Size([25600, 128]). Additional info: {'y': 232}. Shape mismatch, can't divide axis of length 25600 in chunks of 232
Any idea why this happens?
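For context on the einops error itself: the pattern "(y v) d -> y () v d" requires the first axis length to be exactly divisible by y, and 25600 is not divisible by 232. Since 25600 = 160 x 160 while 232 = 1856 / 8 (one of the feature-map dimensions at 1/8 resolution), the failing tensor looks like a fixed 160 x 160 table inside the model, which suggests the relative positional embedding only supports feature maps up to 160 x 160; that reading is an inference from the error message, not confirmed from the code. A minimal sketch reproducing the einops constraint on hypothetical tensors:

import torch
from einops import rearrange

x = torch.randn(25600, 128)
out = rearrange(x, '(y v) d -> y () v d', y=160)   # works: 25600 = 160 * 160
# rearrange(x, '(y v) d -> y () v d', y=232)       # raises EinopsError: 232 does not divide 25600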

reproduce results

Hi, I have run the first two stages in your train.sh, chairs and things, but I only get 1.35 and 2.83 EPE on Sintel's clean and final passes.
I would like to know how I can get the same results as in your paper, i.e., 1.30 and 2.74. Are they obtained by training multiple times and taking the best run?

Training Set of Sintel Submission

Hi, I'm trying to reproduce your result on the Sintel benchmark. I notice that you use 'C + T + S/K (+ H)' in the experiment table of the paper. To my knowledge, referring to the RAFT paper, C+T+S/K means that in the Sintel stage you train only on C+T+S. I have no idea what the (+H) is, even with the explanation "'S/K (+ H)' refers to methods that are fine-tuned on the Sintel and KITTI datasets, with some also fine-tuned on the HD1K dataset." What does 'with some' mean?
Could you please detail the training schedule you used for the Sintel submission? Is it C+T+S+H?

the transformer head number

Thank you for your concise and efficient work!
I would like to ask about the impact of the number of transformer heads. I found no ablation experiments for this variable in the published paper, and you set it to 1 in your code.
May I ask whether you have conducted experiments on this variable, and whether performance improves if the number of heads is increased?

Cannot find file named 'things_val_test_set.txt'

Hi, the file 'things_val_test_set.txt' referenced in core/datasets.py cannot be found, which leads to a failure when validating on the FlyingThings3D dataset.
Please provide this file and explain its source. Thanks.

Have you encountered the following two warnings?

/home/sunyy/anaconda3/envs/gma/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/home/sunyy/anaconda3/envs/gma/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:1290: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
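For reference, the first warning is about call order: since PyTorch 1.1, optimizer.step() should be called before scheduler.step() in each iteration (the warning can also appear spuriously when mixed-precision gradient scaling skips an optimizer step). The second warning just suggests reading the current learning rate via scheduler.get_last_lr() rather than scheduler.get_lr(). A minimal sketch of the expected ordering, with hypothetical model/optimizer/scheduler/compute_loss names:

for batch in train_loader:
    optimizer.zero_grad()
    loss = compute_loss(model, batch)     # hypothetical loss computation
    loss.backward()
    optimizer.step()                      # update parameters first
    scheduler.step()                      # then advance the learning-rate schedule
    current_lr = scheduler.get_last_lr()[0]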

kitti submission

Hi, this is very nice work!
I would like to ask a question about the KITTI submission. After I obtain the 'kitti_submission' folder, how do I create the right ZIP file?
The tips on the KITTI website are as follows:
[screenshot of the KITTI submission instructions]
For the optical flow task, do we only need to include the 'flow' folder in the submitted archive? So should I just rename the generated 'kitti_submission' folder to 'flow' and compress it into a ZIP file?
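If the server indeed expects a top-level flow/ directory, as the screenshot suggests, one way to package the predictions is the sketch below; the folder names come from this issue, not from the repository's scripts:

import shutil

# Copy the generated predictions into a staging directory named 'flow',
# then zip the staging root so the archive contains flow/ at the top level.
shutil.copytree('kitti_submission', 'staging/flow')
shutil.make_archive('kitti_flow_submission', 'zip', root_dir='staging')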

how to count the FLOPs of the GMA model?

I wonder how to accurately count the floating-point operations (FLOPs) of the GMA model. The common counting tools basically only count convolutions; for the special operations in the model, are there any good ways to count them? In other words, how did you do it?
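One practical option is an operator-level counter that traces the whole graph rather than only convolutions, e.g. fvcore's FlopCountAnalysis, which reports per-operator counts and flags unsupported ops so you can account for them manually. This is a sketch under assumptions (an already-loaded GMA model whose forward accepts two images, and a 436 x 1024 dummy input); it is not the counting method used by the authors:

import torch
from fvcore.nn import FlopCountAnalysis

model.eval()                             # assumed: a constructed GMA model on CPU
image1 = torch.randn(1, 3, 436, 1024)    # Sintel-sized dummy input (assumption)
image2 = torch.randn(1, 3, 436, 1024)

with torch.no_grad():
    flops = FlopCountAnalysis(model, (image1, image2))
    print(flops.total())         # total count for one forward pass
    print(flops.by_operator())   # per-operator breakdown; ops without a handler are skipped

Note that for iterative models like GMA the count depends strongly on the number of refinement iterations used in the forward pass.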

reproduce results of table 1

I want to reproduce the results of Table 1 in the paper, but the EPE in the occluded regions is inconsistent with the paper. I used the weights you provided ('gma-sintel.pth') and set the number of iterations to 32. What might be the cause of this?


training time

Hi, can you tell me the training time of each training phase (chairs, things, sintel, kitti) and the data storage device you used (SSD or HDD)? Thank you very much!

Attention Map Visualization

Thanks for making your source code public.
Could you please share your code for visualizing the attention map in Figure 6 or guide me on how to obtain it?
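For a rough first look, a common way to inspect a 2D self-attention map is to pick a query pixel, take its row of the softmax-normalized attention matrix, reshape that row back onto the 1/8-resolution feature grid, and plot it. A minimal matplotlib sketch, assuming a hypothetical attn tensor of shape (1, heads, H*W, H*W) captured from the attention module:

import matplotlib.pyplot as plt

def show_attention(attn, H, W, qy, qx, head=0):
    """Visualize the attention weights of query pixel (qy, qx) on an H x W feature grid."""
    query_index = qy * W + qx
    attn_map = attn[0, head, query_index].reshape(H, W).detach().cpu().numpy()
    plt.imshow(attn_map, cmap='viridis')
    plt.scatter([qx], [qy], c='red', s=12)   # mark the query pixel
    plt.colorbar()
    plt.show()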

Running Evaluation for KITTI Test

Hey @zacjiang ,

Thank you for sharing your work!
@zacjiang, I was looking to evaluate the pre-trained model on the KITTI test set. I have completed the repository setup and got it running according to the instructions on GitHub. I was able to reproduce the results reported in Table 2 of the paper for the KITTI train split.

But when I run it for the KITTI test split, the execution of evaluate.py fails. This is because evaluate.py expects 4 outputs from the data loader:

GMA/evaluate.py

Lines 348 to 355 in 2f1fd29

def validate_kitti(model, iters=6):
    """ Peform validation using the KITTI-2015 (train) split """
    model.eval()
    val_dataset = datasets.KITTI(split='training')
    out_list, epe_list = [], []
    for val_id in range(len(val_dataset)):
        image1, image2, flow_gt, valid_gt = val_dataset[val_id]
Whereas if the split is 'test', then according to the dataset file here:

GMA/core/datasets.py

Lines 38 to 46 in 2f1fd29

if self.is_test:
    img1 = frame_utils.read_gen(self.image_list[index][0])
    img2 = frame_utils.read_gen(self.image_list[index][1])
    img1 = np.array(img1).astype(np.uint8)[..., :3]
    img2 = np.array(img2).astype(np.uint8)[..., :3]
    img1 = torch.from_numpy(img1).permute(2, 0, 1).float()
    img2 = torch.from_numpy(img2).permute(2, 0, 1).float()
    return img1, img2, self.extra_info[index]
it returns only three values.

In that case, how do I get numbers for the KITTI test split?

Regards,
Nitin Bansal
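For what it's worth, the KITTI test split has no public ground truth, so there are no numbers to compute locally; the usual workflow is to write the predicted flow for each test pair and upload it to the KITTI server. A rough sketch of such a loop, assuming RAFT-style helpers (InputPadder, frame_utils.writeFlowKITTI) and a forward pass that returns (flow_low, flow_up) in test mode, as in the RAFT codebase; the repository's evaluate.py may already provide an equivalent function:

import os
import torch
import datasets
from utils import frame_utils
from utils.utils import InputPadder

@torch.no_grad()
def create_kitti_submission(model, iters=24, output_path='kitti_submission'):
    """Write flow predictions for the KITTI-2015 test split (no ground truth available)."""
    model.eval()
    test_dataset = datasets.KITTI(split='testing', aug_params=None)
    os.makedirs(output_path, exist_ok=True)

    for test_id in range(len(test_dataset)):
        image1, image2, (frame_id, ) = test_dataset[test_id]
        padder = InputPadder(image1.shape, mode='kitti')
        image1, image2 = padder.pad(image1[None].cuda(), image2[None].cuda())

        _, flow_pr = model(image1, image2, iters=iters, test_mode=True)
        flow = padder.unpad(flow_pr[0]).permute(1, 2, 0).cpu().numpy()
        frame_utils.writeFlowKITTI(os.path.join(output_path, frame_id), flow)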

How to handle subtitles with large motion?

Hi,

I would like to know whether GMA can handle the case where subtitles are accompanied by large motion, which is common in movies.
Would the subtitles be preserved well in the interpolation result?

Intuition behind two details in the code

Hello,
Thank you very much for sharing your precious work with us.
I had two questions regarding the code.

  1. In gma.py, line 60, the query tensor is scaled by self.scale = dim_head ** -0.5. Why is that necessary? I would also be thankful if you could explain why you set the value to dim_head ** -0.5.
  2. In the same file, line 113, motion features are added to the attention output tensor. Could you please give some insights on that as well?

Thanks a lot.
Azin
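For context on the first question: scaling by dim_head ** -0.5 is the standard scaled dot-product attention convention (Vaswani et al., 2017); without it, the dot-product logits grow with the head dimension and the softmax saturates, which shrinks gradients. A minimal single-head sketch, with hypothetical q, k, v tensors of shape (B, N, D):

import torch

def scaled_dot_product_attention(q, k, v):
    # Dividing by sqrt(D) keeps the logits' variance roughly independent of D,
    # so the softmax stays in a well-conditioned regime.
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum('bnd,bmd->bnm', q * scale, k)
    attn = logits.softmax(dim=-1)
    return torch.einsum('bnm,bmd->bnd', attn, v)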

Recommended checkpoint for real-world images

Hi, thank you for your great work.

I want to test GMA on real-world images (i.e., not synthetic ones). Could you tell me which of the four checkpoints (chairs, kitti, sintel, things) is expected to generalize best to real-world images?

about query projector and key projector

It says,"we project the context feature map to a query feature map and a key feature map. We then take the dot product of the two feature maps and a softmax to obtain an attention matrix"
but in network.py line 99,I just found "attention = self.att(inp)". this is what puzzled me.
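For reference, the query/key projections the paper describes typically live inside that attention module, so self.att(inp) computes them internally from the context features. A minimal sketch of such a module, using 1x1 convolutions as the projections (hypothetical names and layout, not the repository's exact implementation):

import torch
from torch import nn
from einops import rearrange

class ContextAttention(nn.Module):
    """Project a context feature map to queries/keys and return an attention matrix."""
    def __init__(self, dim, dim_head=128):
        super().__init__()
        self.scale = dim_head ** -0.5
        self.to_q = nn.Conv2d(dim, dim_head, 1, bias=False)
        self.to_k = nn.Conv2d(dim, dim_head, 1, bias=False)

    def forward(self, context):
        # context: (b, dim, h, w) feature map from the context encoder
        q = rearrange(self.to_q(context), 'b c h w -> b (h w) c')
        k = rearrange(self.to_k(context), 'b c h w -> b (h w) c')
        logits = torch.einsum('bnc,bmc->bnm', q * self.scale, k)
        return logits.softmax(dim=-1)   # (b, h*w, h*w) attention matrix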

Does HD1K really help?

I was using HD1K for Sintel fine-tuning, just as GMA does. I'm surprised that it only consists of grayscale images; that means there is a big domain gap between HD1K and the other training sets. I wonder whether the model would train better if I removed HD1K. My training without HD1K is ongoing, and it seems the loss on the training data is much lower and the accuracy is higher. I will update when it finishes.
