atze00 / MoViNet-pytorch
MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition.
License: MIT License
How do I run streamed inference with this repo?
Can you provide some guidance?
Thank you for your good implementation! Could you provide more detailed training code or an example?
Originally posted by @zhangyuan1994511 in #2 (comment)
It looks like there is a bug in the current causal padding of the official TF models; I have filed an issue and a pull request to fix it. I am currently waiting for feedback on the PR.
There is a way to obtain the same behavior even without the fix, but I'm not considering it at the moment.
Reference to the issue:
tensorflow/models#10062
I trained with your HMDB51 notebook for 10 epochs, but the validation loss did not decrease. Why did this happen?
Thanks a lot for your code.
I have a problem with training on Kinetics-400.
import torchvision

kinetics400_train = torchvision.datasets.Kinetics400(
    root='...', frames_per_clip=num_frames,
    step_between_clips=clip_steps, frame_rate=5,
    extensions=('mp4',),  # note the comma: a tuple, since ('mp4') alone is just a string
    transform=transform, num_workers=2)

I wrote this Dataset in the same way as HMDB51, and I get an error.
Can you provide the code for the Kinetics400 Dataset?
Looking forward to your reply.
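A possible pitfall worth mentioning while this is open (an assumption, since the error message isn't shown): torchvision's video datasets return (video, audio, label) triples, so the training loop must unpack three values. A minimal sketch of iterating the dataset built above:

import torch

# Hypothetical loader around the dataset defined above.
train_loader = torch.utils.data.DataLoader(
    kinetics400_train, batch_size=8, shuffle=True, num_workers=2)

for video, _audio, label in train_loader:
    # video is (B, T, H, W, C) before any transform; the notebook's
    # transform pipeline is expected to produce (B, C, T, H, W).
    pass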
Hi, thanks for this repo, it's amazing.
I am interested in analyzing the stream buffer but couldn't find it. Can someone point me in the right direction?
Thanks
Could you please provide the weights trained on Charades? (I just want to run inference with the Charades weights on our self-built datasets.)
Thanks a lot!
I used your model code to reproduce the results, but they are far from the results published in the paper. If possible, could you share more complete implementation code? Thank you very much for your work!
This is my email address, [email protected]. Looking forward to your reply.
Hi, thanks for your great work in movinet.
I met a problem when testing HMDB51 videos.
For example, the inference result for "brush hair" seems weird: in some frames the result shows "brush hair", while in other frames it shows "kick ball". Did you encounter this problem before?
In the code, 16 frames are divided into 2 clips of 8 frames each, but during the test phase the first clip's prediction differs from the second's, and the final prediction uses only the second. Is there anything wrong with this?
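In case it helps, one common way to reconcile disagreeing clips (a sketch, not necessarily what the notebook intends) is to average the per-clip probabilities rather than keep only the last clip's prediction:

import torch
import torch.nn.functional as F

def predict_video(model, clips):
    # `clips` is a hypothetical iterable of (1, C, T, H, W) tensors,
    # e.g. the two 8-frame clips of one 16-frame video.
    model.eval()
    with torch.no_grad():
        probs = [F.softmax(model(c), dim=1) for c in clips]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)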
On which dataset were the pre-trained weights trained?
Just FYI, the official implementation has been released at https://github.com/tensorflow/models/tree/master/official/vision/beta/projects/movinet
It is not possible to convert the model to TorchScript using the function torch.jit.script. In particular, the code returns an error because of the usage of '...' in the line:
MoViNet-pytorch/movinets/models.py, line 276 in c2d1edf
Even after changing the type-hint definition to work around this problem, the conversion is still not possible, because the attribute 'activation' is initialized as None and then filled with a Tensor.
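For reference, the usual TorchScript-friendly pattern for the second problem (a generic sketch, not a patch to this repo) is to declare the attribute as Optional, so the compiler accepts the None-then-Tensor lifecycle:

from typing import Optional

import torch
from torch import Tensor, nn

class Buffered(nn.Module):
    # TorchScript requires this annotation when the attribute starts as None.
    activation: Optional[Tensor]

    def __init__(self) -> None:
        super().__init__()
        self.activation = None

    def forward(self, x: Tensor) -> Tensor:
        prev = self.activation
        if prev is not None:
            x = x + prev
        self.activation = x
        return x

scripted = torch.jit.script(Buffered())  # scripts without the Optional error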
Thank you for your good implementation!
Could you provide more detailed code for multi-clip sampling in the dataloader?
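While waiting for an answer, here is a minimal sketch of one common multi-clip scheme (uniformly spaced clips per video; the function and its defaults are hypothetical):

import torch

def sample_clips(video, num_clips=2, clip_len=8):
    # Split a (C, T, H, W) video into `num_clips` uniformly spaced clips.
    t = video.shape[1]
    starts = torch.linspace(0, t - clip_len, num_clips).long()
    return torch.stack([video[:, s:s + clip_len] for s in starts])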
When I use the model without causal convolutions, I get the results that I expected.
However, when I set causal to True, the training/validation loss drops very suddenly and eventually turns into NaN.
This leads to very undesirable results and accuracy, while giving no insight into the loss.
Are there any ways to solve this?
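One thing worth checking (the diagnosis is an assumption): with causal=True the stream buffers carry activations across forward passes, so they should be reset between unrelated clips via the repo's clean_activation_buffers(); the gradient clipping below is an extra guard I'm adding, not something the repo prescribes. A sketch of one training step:

import torch
import torch.nn.functional as F

# `model`, `optimizer`, `data`, and `target` are assumed from the usual loop.
model.clean_activation_buffers()  # reset stream buffers between videos
optimizer.zero_grad()
out = F.log_softmax(model(data), dim=1)
loss = F.nll_loss(out, target)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # guard against NaN
optimizer.step()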
Hi @Atze00,
I successfully transfer-trained this repo's MoViNets, but I want to deploy them elsewhere in TensorRT format. So I first converted the PyTorch model to ONNX using torch.onnx.export.
I wanted to verify whether the model was correctly ported, so I ran it in onnxruntime, and it is throwing an error like the one in the screenshot.
Please suggest any solutions, and kindly help me here. Let me know if you need more screenshots or anything else.
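Without the screenshot it's hard to diagnose, but for the record, exporting the base (non-causal) model is the more likely path to work, since the causal stream buffers mutate module state that ONNX tracing can't capture. A minimal export sketch (the input shape and opset are assumptions):

import torch
import onnxruntime as ort

model.eval()  # `model` is assumed to be a non-causal MoViNet
dummy = torch.randn(1, 3, 8, 172, 172)  # (B, C, T, H, W); A0 resolution assumed
torch.onnx.export(model, dummy, 'movinet_a0.onnx', opset_version=13,
                  input_names=['video'], output_names=['logits'])

sess = ort.InferenceSession('movinet_a0.onnx')
logits = sess.run(None, {'video': dummy.numpy()})[0]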
Hello, thanks for the PyTorch implementation!
I noticed that the pre-trained models have very low accuracy (0.36%) on the validation set. Have the weights changed? I'm just running the Colab tutorial.
In movinet_tutorial.ipynb, the video transform for the test dataset should be transform_test rather than transform, I guess. Namely:

hmdb51_test = torchvision.datasets.HMDB51('video_data/', 'test_train_splits/', num_frames, frame_rate=5,
                                          step_between_clips=clip_steps, fold=1, train=False,
                                          transform=transform_test, num_workers=2)
Can I use this code to train the model on my own dataset? If so, how should the dataset be structured?
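Not an official answer, but one simple layout that works with a custom loader is one folder per class. A hypothetical sketch (the layout and class names are assumptions):

import os
import torch
from torchvision.io import read_video

# Assumed layout, one folder per class:
#   my_videos/brush_hair/clip_0001.mp4
#   my_videos/kick_ball/clip_0001.mp4
class FolderVideoDataset(torch.utils.data.Dataset):
    def __init__(self, root, transform=None):
        self.classes = sorted(os.listdir(root))
        self.samples = [(os.path.join(root, c, f), i)
                        for i, c in enumerate(self.classes)
                        for f in os.listdir(os.path.join(root, c))]
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        video, _, _ = read_video(path, pts_unit='sec')   # (T, H, W, C)
        video = video.permute(3, 0, 1, 2).float() / 255  # -> (C, T, H, W)
        if self.transform is not None:
            video = self.transform(video)
        return video, label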
I trained the 'a0' model with clip=1, Tclip=8, and the accuracy on my custom test dataset is good, but when the model does inference on an online video stream, the recognition is not as good as on the test dataset. In fact, the test dataset comes from the same video stream I use, so I believe their domains are the same.
The online demo reads one frame from the video stream and concatenates it with the previous 7 frames, then feeds that to the model, so the model also does inference with 8 frames.
The code for the online video stream is as follows:
import cv2
import numpy as np
import torch

if isinstance(new_frame, torch.Tensor):
    # Slide the FIFO window: drop the oldest frame, append the new one.
    torch_inputs = torch.cat((tensor_fifo[:, :, 1:, :, :], new_frame), 2)
else:
    # OpenCV delivers BGR HxWxC; convert to RGB CHW and add batch/time dims.
    cvt_img = cv2.cvtColor(new_frame, cv2.COLOR_BGR2RGB)
    cvt_img = np.transpose(cvt_img, (2, 0, 1))
    cvt_img = cvt_img[:, np.newaxis, :, :][np.newaxis, :, :, :, :]
    torch_inputs = np.concatenate((tensor_fifo[:, :, 1:, :, :], cvt_img), 2)
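A side note on this setup (the diagnosis is an assumption): with causal=True the model is designed to consume frames incrementally through its stream buffers, rather than re-processing a sliding 8-frame window from scratch at every step, so the FIFO approach changes the effective temporal context relative to clip-based testing. A sketch of buffer-based streaming:

import torch

# `model` (a causal MoViNet) and `frames`, a hypothetical iterator of
# (1, 3, 1, H, W) frame tensors, are assumed.
model.eval()
model.clean_activation_buffers()  # once per video/stream
with torch.no_grad():
    for frame in frames:
        logits = model(frame)     # the buffers keep the temporal context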
Hi @Atze00, many thanks for implementing the PyTorch version of MoViNets; it has greatly benefited my project.
To take my project to the next level, I need to switch to the streaming versions, since the bigger the input dimension, the better.
So I decided to implement the A3, A4, and A5 streaming versions of MoViNet myself, if need be, and maybe contribute them back to the community.
Could you kindly point me to where to start, and give me some tips if you can?
Thanks!
I have read your code in the '.ipynb' notebook. However, you didn't use 'T.Normalize'. Does leaving out this transform have a bad effect?
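For what it's worth, a sketch of normalizing with the Kinetics-400 statistics that torchvision's pretrained video models use (assuming inputs already scaled to [0, 1]):

import torch

# Kinetics-400 channel statistics from torchvision's video models.
mean = torch.tensor([0.43216, 0.394666, 0.37645]).view(3, 1, 1, 1)
std = torch.tensor([0.22803, 0.22145, 0.216989]).view(3, 1, 1, 1)

def normalize_video(video: torch.Tensor) -> torch.Tensor:
    # Channel-wise normalization of a (C, T, H, W) video in [0, 1].
    return (video - mean) / std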
Hi!
Thanks for the work to bring this paper to PyTorch.
I was wondering how we can train MoViNet with the stream buffer when our videos don't all have the same number of frames.
Authors of the paper claimed, in Section 4, that they used the method in Charades Dataset: "[...] Charades [53], which has variable-length videos with 157 action classes where a video can contain multiple class annotations." However, they don't specify which policy they used when training with variable-length videos. They also report results on the EPIC Kitchens Dataset which is also variable-length.
Do you have any insights into how they may have trained these models? The main issue here is how to build a batch when temporal dimensions are not the same...
Thank you!
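Not from the paper, but one practical approach (a sketch; the zero-padding policy is an assumption) is a collate function that pads every clip in a batch to the longest temporal length:

import torch
import torch.nn.functional as F

def pad_collate(batch):
    # `batch` is a list of ((C, T, H, W) video, label) pairs.
    videos, labels = zip(*batch)
    t_max = max(v.shape[1] for v in videos)
    padded = [F.pad(v, (0, 0, 0, 0, 0, t_max - v.shape[1])) for v in videos]
    return torch.stack(padded), torch.tensor(labels)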
Thanks for your PyTorch implementation of MoViNet.
Could you please provide a model trained on Kinetics-400, as in Table 9 of [1]?
It is quite important for our future work.
Looking forward to your reply.
[1] MoViNets: Mobile Video Networks for Efficient Video Recognition
Nit: per the paper, there should be 40 output channels in block 2 for MoViNet-A0, while this implementation currently has 24.
MoViNet-pytorch/movinets/config.py, line 62 in 2ad697f
Hello, thanks for the PyTorch implementation!
There seems to be no implementation of positional encoding; does that influence the stream model?
What is the accuracy rate of your reproduced A0 network?
Hi, I'm trying to solve a binary classification problem with these MoViNets, and I was wondering how I would go about creating a MoViNet, with some augmentation, with the aim of reducing the TFLOPs required and decreasing the inference time.
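For the binary-classification half of the question, at least, here is a sketch based on this repo's constructor (choosing the smallest variant, A0, to cut FLOPs is my suggestion, not something from the paper):

from movinets import MoViNet
from movinets.config import _C

# A0 is the cheapest variant; two output classes for binary classification.
model = MoViNet(_C.MODEL.MoViNetA0, causal=False, pretrained=False,
                num_classes=2)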
Hey, any chance the config file will be updated for A6 soon? Thanks :)
Hi, @Atze00
I saved the MoViNet-A0 model to a '.pth' file and looked through the model with Netron, but the structure is a little strange. Maybe there is something wrong with '_forward_impl' in the class 'MoViNet(nn.Module)'.
My code is as follows:
model = MoViNet(_C.MODEL.MoViNetA0, causal=True, pretrained=False, num_classes=num_class)
...
torch.save(model, '/path/to/*.pth')
path = '/path/to/*.pth'
model = torch.load(path, map_location='cpu')
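One alternative worth trying (a suggestion, not a confirmed fix): Netron renders TorchScript graphs more reliably than pickled nn.Module files, so tracing and saving may give a cleaner structure. Tracing a non-causal instance is assumed here, since the causal stream buffers mutate state that tracing does not capture:

import torch

model.eval()
dummy = torch.randn(1, 3, 8, 172, 172)  # (B, C, T, H, W); A0 resolution assumed
traced = torch.jit.trace(model, dummy)
traced.save('/path/to/movinet_a0_traced.pt')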
Thanks for the great work. What dataset did you use for the pretrained weights?
Is it possible to provide the weights for the Kinetics dataset?
According to the paper, "For all datasets, we train with 64 frames (except when the inference frames are fewer) at various frame-rates, and run inference with the same frame-rate". So training uses 64 frames, but a video from my dataset may have fewer than 32 frames; maybe 8, 10, 13, 17, etc. Is that OK?
Thanks in advance.