movinet-pytorch's People

Contributors

atze00

movinet-pytorch's Issues

Kinetics400/600

Thanks a lot for your code.
I have a problem training on Kinetics400.

import torchvision

kinetics400_train = torchvision.datasets.Kinetics400(root='...',
                                step_between_clips=clip_steps,
                                frames_per_clip=num_frames, frame_rate=5,
                                extensions=('mp4',),  # trailing comma makes this a tuple
                                transform=transform, num_workers=2)

I wrote this dataset in the same way as the HMDB51 one, and I get an error.
Can you provide the code for the Kinetics400 dataset?
Looking forward to your reply.

About frames number of dataset

According to the paper, "For all datasets, we train with 64 frames (except when the inference frames are fewer) at various frame-rates, and run inference with the same frame-rate". So the training clips have 64 frames, but a video in my dataset may have fewer than 32 frames, for example 8, 10, 13, or 17. Is that OK?

Thanks in advance.
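
One common workaround for videos shorter than the training clip length is to repeat frames until the clip reaches the target length. Neither the paper nor this repository prescribes this, so treat the following as a minimal sketch; `loop_pad_indices` and its parameters are hypothetical names.

```python
from typing import List


def loop_pad_indices(num_frames: int, target: int) -> List[int]:
    """Return `target` frame indices for a video with `num_frames` frames.

    Short videos are looped from the start; long videos are truncated.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    return [i % num_frames for i in range(target)]


print(loop_pad_indices(3, 8))  # [0, 1, 2, 0, 1, 2, 0, 1]
```

The returned indices can be used to gather frames from the decoded video before the usual transforms are applied.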

need to process HMDB51 dataset?

Hi, I am trying to run movinet_tutorial.py, but I get an error: list index out of range. Could you give me any clues about this?

weight

Which dataset were the pretrained weights trained on?

Kinetics 400 models

Thanks for your PyTorch implementation of MoViNet.
Could you please provide a model trained on Kinetics 400, as in Table 9 of [1]?
It is quite important for our future work.
Looking forward to your reply.

[1] MoViNets: Mobile Video Networks for Efficient Video Recognition

F.ToFloatTensorInZeroOne does not exist

Did you define your own transforms? I can't import ToFloatTensorInZeroOne() from torchvision.transforms. I also found that the video returned by torchvision.datasets.HMDB51() is a tensor with dtype=torch.uint8, so should I just divide by 255.0?
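
The issue does not record an answer. As a minimal sketch of such a transform, assuming the dataset yields uint8 clips laid out as (T, H, W, C) and the model expects float clips laid out as (C, T, H, W): the class name mirrors the one in the question, and both layouts are assumptions to check against your torchvision version.

```python
import torch


class ToFloatTensorInZeroOne:
    """Convert a uint8 clip (T, H, W, C) to float32 (C, T, H, W) in [0, 1]."""

    def __call__(self, vid: torch.Tensor) -> torch.Tensor:
        # Move channels first, then scale 0..255 down to 0..1.
        return vid.permute(3, 0, 1, 2).to(torch.float32) / 255.0


# Dummy clip: 8 frames of 32x32 RGB.
clip = torch.randint(0, 256, (8, 32, 32, 3), dtype=torch.uint8)
out = ToFloatTensorInZeroOne()(clip)
print(tuple(out.shape), out.dtype)
```

So dividing by 255.0 covers the value range, but the axis order may need to change as well.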

accuracy drop

I used your model code to reproduce the results, but they are far from those published in the paper. If possible, could you share a more complete implementation? Thank you very much for your work!
My email is [email protected]. Looking forward to your reply.

Using MoViNet in a dataset with variable-length videos

Hi!

Thanks for the work to bring this paper to PyTorch.

I was wondering how we can train MoViNet with the stream buffer when the videos do not all have the same number of frames.

The authors of the paper state in Section 4 that they applied the method to the Charades dataset: "[...] Charades [53], which has variable-length videos with 157 action classes where a video can contain multiple class annotations." However, they do not specify which policy they used when training with variable-length videos. They also report results on the EPIC Kitchens dataset, which is also variable-length.

Do you have any insights into how they may have trained these models? The main issue here is how to build a batch when temporal dimensions are not the same...

Thank you!
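
The paper does not say which policy was used, so the following is only one generic option for building a batch: zero-pad every clip to the longest one along the time axis and keep the true lengths so padded frames can be masked or ignored later. `pad_collate` is a hypothetical helper and assumes clips shaped (C, T, H, W).

```python
from typing import List, Tuple

import torch


def pad_collate(batch: List[Tuple[torch.Tensor, int]]):
    """Zero-pad clips shaped (C, T, H, W) to the longest T in the batch."""
    clips, labels = zip(*batch)
    lengths = torch.tensor([c.shape[1] for c in clips])
    max_t = int(lengths.max())
    first = clips[0]
    padded = torch.zeros(
        len(clips), first.shape[0], max_t, *first.shape[2:], dtype=first.dtype
    )
    for i, c in enumerate(clips):
        padded[i, :, : c.shape[1]] = c  # copy real frames, leave the rest zero
    return padded, torch.tensor(labels), lengths


# Two dummy clips with 5 and 8 frames.
batch = [(torch.ones(3, 5, 4, 4), 0), (torch.ones(3, 8, 4, 4), 1)]
padded, labels, lengths = pad_collate(batch)
print(tuple(padded.shape), lengths.tolist())
```

A function like this can be passed to a DataLoader through its collate_fn argument. Whether zero frames at the end interact well with the stream buffer is not something this sketch addresses.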

a6 implementation

hey any chance the config file will be updated for a6 soon? thanks :)

got wrong results during test

Hi, thanks for your great work on MoViNet.
I ran into a problem when testing HMDB51 videos.
For example, the inference result for "brush hair" seems odd: in some frames the result is "brush hair", while in others it is "kick ball". Have you encountered this problem before?
In the code, 16 frames are divided into 2 clips of 8 frames each, but during the test phase the first clip's prediction differs from the second one's, and the final prediction uses the second. Is there anything wrong with this?
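
In streaming mode the last clip's output may already depend on the earlier clips through the stream buffer, so using it as the final prediction can be intentional. For non-streaming evaluation, a common alternative (not necessarily what this repository does) is to average the per-clip scores before taking the argmax. A small sketch with made-up scores; `average_clip_scores` and `argmax` are hypothetical helpers.

```python
from typing import List, Sequence


def average_clip_scores(clip_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores over clips (one inner sequence per clip)."""
    if not clip_scores:
        raise ValueError("need at least one clip")
    n = len(clip_scores)
    return [sum(col) / n for col in zip(*clip_scores)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up scores for two clips and two classes.
scores = [[1.0, 0.0], [0.25, 0.75]]
mean_scores = average_clip_scores(scores)
print(argmax(scores[-1]), argmax(mean_scores))  # 1 0
```

Here the last clip alone votes for class 1, while the average over both clips votes for class 0.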

Parameters trained on Charades

Could you please provide the parameters trained on Charades? (I just want to run inference with the Charades parameters on our self-built dataset.)
Thanks a lot!

When will the pretrained stream version be available?

It looks like there is a bug in the current causal padding of the official TF models. I have filed an issue and a pull request to fix it, and I am currently waiting for feedback on the PR.
It is possible to obtain the same behavior even without the fix, but I am not considering that at the moment.
Reference to the issue:
tensorflow/models#10062
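
For readers unfamiliar with the term: causal padding, in general, places all temporal padding before the first frame so that an output frame never depends on future frames. The sketch below shows only that general definition for stride 1 and no dilation; it does not reproduce the specific TF bug referenced above, and `temporal_padding` is a hypothetical helper.

```python
from typing import Tuple


def temporal_padding(kernel_size: int, causal: bool) -> Tuple[int, int]:
    """Return (frames padded before, frames padded after) for stride 1."""
    total = kernel_size - 1
    if causal:
        # All padding goes in the past; no future frames are seen.
        return total, 0
    before = total // 2
    return before, total - before


print(temporal_padding(3, causal=False))  # (1, 1)
print(temporal_padding(3, causal=True))   # (2, 0)
```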

Tips for implementing a3, a4, a5 MoViNet streaming versions

Hi @Atze00, many thanks for implementing the PyTorch version of MoViNets; it has benefited my project greatly.

To take my project to the next level, I need to switch to the streaming versions, since a larger input dimension works better.

So I have decided to implement the a3, a4, and a5 streaming models myself if need be, and maybe contribute them back to the community.

Could you kindly point me to where to start, and share some tips if you can?

Thanks!

Error running a movinet-pytorch model converted to ONNX

Hi @Atze00

[screenshot: error_important]

I successfully trained this repo's MoViNets with transfer learning.

But I want to deploy the model elsewhere in TensorRT format.

So I first converted the PyTorch model to ONNX using torch.onnx.export.

I wanted to verify that the model was ported correctly, so I ran it in onnxruntime.

It throws the error shown in the screenshot.

Could you suggest any solutions? Kindly help me here.

Let me know if you need more screenshots or anything else.

Neural network architecture displayed by Netron is wrong

Hi @Atze00,
I saved the MoViNet-A0 model to a 'pth' file and inspected it with Netron, but the structure looks a little strange. Maybe there is something wrong in '_forward_impl' of 'class MoViNet(nn.Module)'.

My code is as follows:

model = MoViNet(_C.MODEL.MoViNetA0, causal=True, pretrained=False, num_classes=num_class)
...
torch.save(model, '/path/to/*.pth')
path = '/path/to/*.pth'
model = torch.load(path, map_location='cpu')

Please let me know if I did something wrong.
[screenshot: 2021-10-28, 2:57:31 PM]

Training

Can I use this code to train the model on my own dataset? If so, how should the dataset be structured?
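
The issue does not record an answer. As an illustration only, one layout that is easy to load is a folder per class, similar to how HMDB51 is organized. The helper below is hypothetical and not part of this repository; it just indexes such a tree into (path, label) pairs, and the class and file names in the demo are made up.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def index_videos(
    root: str, extensions: Tuple[str, ...] = (".mp4", ".avi")
) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Index root/<class_name>/<video> into (path, label) pairs."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            if fname.lower().endswith(extensions):
                samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return samples, class_to_idx


# Demo on a throwaway tree with two classes and empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for cls, fname in [("wave", "a.mp4"), ("clap", "b.mp4")]:
        os.makedirs(os.path.join(root, cls), exist_ok=True)
        open(os.path.join(root, cls, fname), "w").close()
    samples, class_to_idx = index_videos(root)

print(class_to_idx)  # {'clap': 0, 'wave': 1}
```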

pretrain dataset?

Thanks for the great work. Which dataset did you use for the pretrained weights?
Is it possible to provide the weights for the Kinetics dataset?

Modifying for binary classification

Hi, I am trying to solve a binary classification problem with these MoViNets and was wondering how I would go about creating a MoViNet with some modifications, with the aim of reducing the FLOPs required and decreasing the inference time.

transform for test video in movinet_tutorial.ipynb

In movinet_tutorial.ipynb, the video transform for the test dataset should be transform_test rather than transform, I guess. Namely:

hmdb51_test = torchvision.datasets.HMDB51('video_data/', 'test_train_splits/', num_frames, frame_rate=5,
                                          step_between_clips=clip_steps, fold=1, train=False,
                                          transform=transform_test, num_workers=2)

accuracy

What is the accuracy rate of your reproduced A0 network?

How can we access the stream buffer?

Hi, thanks for this repo. It's amazing.
I am interested in analyzing the stream buffer but couldn't find it. Can someone point me in the right direction?

Thanks

Testing with 'evaluate_stream' is OK, but frame-by-frame inference gives very different results?

I trained the 'a0' model with clip=1, Tclip=8, and the accuracy on my custom test dataset is good. But when the model runs inference on an online video stream, recognition is not as good as on the test dataset. In fact, the test dataset comes from the same video stream, so I believe the domain is the same.

The online demo reads one frame from the video stream and concatenates it with the previous 7 frames before feeding the model, so the model also runs inference on 8 frames.

The code for the online video stream is as follows:

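import cv2
import numpy as np
import torch

if isinstance(new_frame, torch.Tensor):
    # Drop the oldest frame and append the new one along the time axis (dim 2).
    torch_inputs = torch.cat((tensor_fifo[:, :, 1:, :, :], new_frame), 2)
else:
    # OpenCV frame: BGR (H, W, C) -> RGB (1, C, 1, H, W).
    cvt_img = cv2.cvtColor(new_frame, cv2.COLOR_BGR2RGB)
    cvt_img = np.transpose(cvt_img, (2, 0, 1))
    cvt_img = cvt_img[:, np.newaxis, :, :][np.newaxis, :, :, :, :]
    # Note: this branch produces a NumPy array, not a torch.Tensor.
    torch_inputs = np.concatenate((tensor_fifo[:, :, 1:, :, :],
                                   cvt_img), 2)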
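
The sliding-window logic described above can also be kept in a small FIFO so that every frame goes through the same path regardless of its source. This is a sketch of the buffering only, with hypothetical names. It assumes the per-frame preprocessing (resize, scaling, normalization) matches what was used at training time, since a mismatch there is a frequent cause of a gap between test and online results.

```python
from collections import deque
from typing import Any, List, Optional


class FrameWindow:
    """Keep the most recent `size` frames and return them once the window is full."""

    def __init__(self, size: int = 8) -> None:
        self.size = size
        self.frames = deque(maxlen=size)  # oldest frame is dropped automatically

    def push(self, frame: Any) -> Optional[List[Any]]:
        self.frames.append(frame)
        if len(self.frames) < self.size:
            return None  # not enough frames yet
        return list(self.frames)


# Integers stand in for preprocessed frames.
window = FrameWindow(size=3)
outputs = [window.push(i) for i in range(5)]
print(outputs[-1])  # [2, 3, 4]
```

Each returned list can then be stacked along the time axis to form the model input.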

Model not compatible with TorchScript conversion (via torch.jit.script)

It is not possible to convert the model to TorchScript using the function torch.jit.script. In particular, the code returns an error because of the use of "..." in the type hint on this line:

def _setup_activation(self, input_shape: Tuple[float, ...]) -> None:

Even after changing the type hint to work around this problem, the conversion fails because the attribute activation is initialized as None and later filled with a Tensor.

Accuracy drop when using causalConv

When I use the model without causalConv, I get the results I expect.
However, when I set causal to True, the training/validation loss drops very suddenly and eventually turns into NaN.

This gives poor results and accuracy, and leaves me with no insight into the loss.
Are there any ways to solve this?
