atze00 / MoViNet-pytorch
MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition.
License: MIT License
How do I run streamed inference with this repo?
Can you provide some guidance?
Thank you for your good implementation! Could you provide more detailed training code or an example?
Originally posted by @zhangyuan1994511 in #2 (comment)
It looks like there is a bug in the current causal padding of the official TF models; I have filed an issue and a pull request to fix it. I am currently waiting for feedback on the PR.
There is a way to obtain the same behavior even without the fix, but I'm not considering it at the moment.
Reference to the issue:
tensorflow/models#10062
I trained with your HMDB51 notebook for 10 epochs, but the validation loss did not decrease. Why did this happen?
Thanks a lot for your code.
I have a problem with training on Kinetics-400.
import torchvision

kinetics400_train = torchvision.datasets.Kinetics400(
    root='...', frames_per_clip=num_frames,
    step_between_clips=clip_steps, frame_rate=5,
    extensions=('mp4',),  # note the comma: a tuple, since ('mp4') alone is just a string
    transform=transform, num_workers=2)

I wrote this Dataset in the same way as HMDB51, and I get an error.
Can you provide the code for the Kinetics400 Dataset?
Looking forward to your reply.
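A possible pitfall worth mentioning while this is open (an assumption, since the error message isn't shown): torchvision's video datasets return (video, audio, label) triples, so the training loop must unpack three values. A minimal sketch of iterating the dataset built above:

import torch

# Hypothetical loader around the dataset defined above.
train_loader = torch.utils.data.DataLoader(
    kinetics400_train, batch_size=8, shuffle=True, num_workers=2)

for video, _audio, label in train_loader:
    # video is (B, T, H, W, C) before any transform; the notebook's
    # transform pipeline is expected to produce (B, C, T, H, W).
    pass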
Hi, thanks for this repo, it's amazing.
I am interested in analyzing the stream buffer but couldn't find it. Can someone point me in the right direction?
Thanks
Could you please provide the weights trained on Charades? (I just want to run inference with the Charades weights on our self-built datasets.)
Thanks a lot!
I used your model code to reproduce the results, but they are far from the results published in the paper. If possible, could you share more complete implementation code? Thank you very much for your work!
This is my email address, [email protected]. Looking forward to your reply.
Hi, thanks for your great work in movinet.
I met a problem when testing HMDB51 videos.
For example, the inference result for "brush hair" seems weird: in some frames the result shows "brush hair", while in other frames it shows "kick ball". Did you encounter this problem before?
In the code, 16 frames are divided into 2 clips of 8 frames each, but during the test phase the first clip's prediction differs from the second's, and the final prediction uses only the second. Is there anything wrong with this?
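In case it helps, one common way to reconcile disagreeing clips (a sketch, not necessarily what the notebook intends) is to average the per-clip probabilities rather than keep only the last clip's prediction:

import torch
import torch.nn.functional as F

def predict_video(model, clips):
    # `clips` is a hypothetical iterable of (1, C, T, H, W) tensors,
    # e.g. the two 8-frame clips of one 16-frame video.
    model.eval()
    with torch.no_grad():
        probs = [F.softmax(model(c), dim=1) for c in clips]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)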
On which dataset were the pre-trained weights trained?
Just FYI, the official implementation has been released at https://github.com/tensorflow/models/tree/master/official/vision/beta/projects/movinet
It is not possible to convert the model to TorchScript using the function torch.jit.script. In particular, the code returns an error because of the usage of '...' in the line:
MoViNet-pytorch/movinets/models.py, line 276 in c2d1edf
Even after changing the type-hint definition to work around this problem, the conversion is still not possible, because the attribute 'activation' is initialized as None and then filled with a Tensor.
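For reference, the usual TorchScript-friendly pattern for the second problem (a generic sketch, not a patch to this repo) is to declare the attribute as Optional, so the compiler accepts the None-then-Tensor lifecycle:

from typing import Optional

import torch
from torch import Tensor, nn

class Buffered(nn.Module):
    # TorchScript requires this annotation when the attribute starts as None.
    activation: Optional[Tensor]

    def __init__(self) -> None:
        super().__init__()
        self.activation = None

    def forward(self, x: Tensor) -> Tensor:
        prev = self.activation
        if prev is not None:
            x = x + prev
        self.activation = x
        return x

scripted = torch.jit.script(Buffered())  # scripts without the Optional error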
Thank you for your good implementation!
Could you provide more detailed code for multi-clip sampling in the dataloader?
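While waiting for an answer, here is a minimal sketch of one common multi-clip scheme (uniformly spaced clips per video; the function and its defaults are hypothetical):

import torch

def sample_clips(video, num_clips=2, clip_len=8):
    # Split a (C, T, H, W) video into `num_clips` uniformly spaced clips.
    t = video.shape[1]
    starts = torch.linspace(0, t - clip_len, num_clips).long()
    return torch.stack([video[:, s:s + clip_len] for s in starts])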
When I use the model without causal convolutions, I get the results that I expected.
However, when I set causal to True, the training/validation loss drops very suddenly and eventually turns into NaN.
This leads to very undesirable results and accuracy, while giving no insight into the loss.
Are there any ways to solve this?
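One thing worth checking (the diagnosis is an assumption): with causal=True the stream buffers carry activations across forward passes, so they should be reset between unrelated clips via the repo's clean_activation_buffers(); the gradient clipping below is an extra guard I'm adding, not something the repo prescribes. A sketch of one training step:

import torch
import torch.nn.functional as F

# `model`, `optimizer`, `data`, and `target` are assumed from the usual loop.
model.clean_activation_buffers()  # reset stream buffers between videos
optimizer.zero_grad()
out = F.log_softmax(model(data), dim=1)
loss = F.nll_loss(out, target)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # guard against NaN
optimizer.step()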
Hi @Atze00,
I successfully transfer-trained this repo's MoViNets, but I want to deploy them elsewhere in TensorRT format. So I first converted the PyTorch model to ONNX using torch.onnx.export.
I wanted to verify whether the model was correctly ported, so I ran it in onnxruntime, and it is throwing an error like the one in the screenshot.
Please suggest any solutions, and kindly help me here. Let me know if you need more screenshots or anything else.
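Without the screenshot it's hard to diagnose, but for the record, exporting the base (non-causal) model is the more likely path to work, since the causal stream buffers mutate module state that ONNX tracing can't capture. A minimal export sketch (the input shape and opset are assumptions):

import torch
import onnxruntime as ort

model.eval()  # `model` is assumed to be a non-causal MoViNet
dummy = torch.randn(1, 3, 8, 172, 172)  # (B, C, T, H, W); A0 resolution assumed
torch.onnx.export(model, dummy, 'movinet_a0.onnx', opset_version=13,
                  input_names=['video'], output_names=['logits'])

sess = ort.InferenceSession('movinet_a0.onnx')
logits = sess.run(None, {'video': dummy.numpy()})[0]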
Hello, thanks for the PyTorch implementation!
I noticed that the pre-trained models have very low accuracy (0.36%) on the validation set. Have the weights changed? I'm just running the Colab tutorial.
In movinet_tutorial.ipynb, the video transform for the test dataset should be transform_test rather than transform, I guess. Namely:

hmdb51_test = torchvision.datasets.HMDB51('video_data/', 'test_train_splits/', num_frames, frame_rate=5,
                                          step_between_clips=clip_steps, fold=1, train=False,
                                          transform=transform_test, num_workers=2)
Can I use this code to train the model on my own dataset? If so, how should the dataset be structured?
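Not an official answer, but one simple layout that works with a custom loader is one folder per class. A hypothetical sketch (the layout and class names are assumptions):

import os
import torch
from torchvision.io import read_video

# Assumed layout, one folder per class:
#   my_videos/brush_hair/clip_0001.mp4
#   my_videos/kick_ball/clip_0001.mp4
class FolderVideoDataset(torch.utils.data.Dataset):
    def __init__(self, root, transform=None):
        self.classes = sorted(os.listdir(root))
        self.samples = [(os.path.join(root, c, f), i)
                        for i, c in enumerate(self.classes)
                        for f in os.listdir(os.path.join(root, c))]
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        video, _, _ = read_video(path, pts_unit='sec')   # (T, H, W, C)
        video = video.permute(3, 0, 1, 2).float() / 255  # -> (C, T, H, W)
        if self.transform is not None:
            video = self.transform(video)
        return video, label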
I trained the 'a0' model with clip=1, Tclip=8, and the accuracy on my custom test dataset is good, but when the model does inference on an online video stream, the recognition is not as good as on the test dataset. In fact, the test dataset comes from the same video stream I use, so I believe their domains are the same.
The online demo reads one frame from the video stream and concatenates it with the previous 7 frames, then feeds that to the model, so the model also does inference with 8 frames.
The code for the online video stream is as follows:
import cv2
import numpy as np
import torch

if isinstance(new_frame, torch.Tensor):
    # Slide the FIFO window: drop the oldest frame, append the new one.
    torch_inputs = torch.cat((tensor_fifo[:, :, 1:, :, :], new_frame), 2)
else:
    # OpenCV delivers BGR HxWxC; convert to RGB CHW and add batch/time dims.
    cvt_img = cv2.cvtColor(new_frame, cv2.COLOR_BGR2RGB)
    cvt_img = np.transpose(cvt_img, (2, 0, 1))
    cvt_img = cvt_img[:, np.newaxis, :, :][np.newaxis, :, :, :, :]
    torch_inputs = np.concatenate((tensor_fifo[:, :, 1:, :, :], cvt_img), 2)
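A side note on this setup (the diagnosis is an assumption): with causal=True the model is designed to consume frames incrementally through its stream buffers, rather than re-processing a sliding 8-frame window from scratch at every step, so the FIFO approach changes the effective temporal context relative to clip-based testing. A sketch of buffer-based streaming:

import torch

# `model` (a causal MoViNet) and `frames`, a hypothetical iterator of
# (1, 3, 1, H, W) frame tensors, are assumed.
model.eval()
model.clean_activation_buffers()  # once per video/stream
with torch.no_grad():
    for frame in frames:
        logits = model(frame)     # the buffers keep the temporal context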
Hi @Atze00, many thanks for implementing the PyTorch version of MoViNets; it has greatly benefited my project.
To take my project to the next level, I need to switch to the streaming versions, since the bigger the input dimension, the better.
So I decided to implement the A3, A4, and A5 streaming versions of MoViNet myself, if need be, and maybe contribute them back to the community.
Could you kindly point me to where to start, and give me some tips if you can?
Thanks!
I have read your code in the '.ipynb' notebook. However, you didn't use 'T.Normalize'. Does leaving out this transform have a bad effect?
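For what it's worth, a sketch of normalizing with the Kinetics-400 statistics that torchvision's pretrained video models use (assuming inputs already scaled to [0, 1]):

import torch

# Kinetics-400 channel statistics from torchvision's video models.
mean = torch.tensor([0.43216, 0.394666, 0.37645]).view(3, 1, 1, 1)
std = torch.tensor([0.22803, 0.22145, 0.216989]).view(3, 1, 1, 1)

def normalize_video(video: torch.Tensor) -> torch.Tensor:
    # Channel-wise normalization of a (C, T, H, W) video in [0, 1].
    return (video - mean) / std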
Hi!
Thanks for the work to bring this paper to PyTorch.
I was wondering how we can train MoViNet with the stream buffer when our videos don't all have the same number of frames.
Authors of the paper claimed, in Section 4, that they used the method in Charades Dataset: "[...] Charades [53], which has variable-length videos with 157 action classes where a video can contain multiple class annotations." However, they don't specify which policy they used when training with variable-length videos. They also report results on the EPIC Kitchens Dataset which is also variable-length.
Do you have any insights into how they may have trained these models? The main issue here is how to build a batch when temporal dimensions are not the same...
Thank you!
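Not from the paper, but one practical approach (a sketch; the zero-padding policy is an assumption) is a collate function that pads every clip in a batch to the longest temporal length:

import torch
import torch.nn.functional as F

def pad_collate(batch):
    # `batch` is a list of ((C, T, H, W) video, label) pairs.
    videos, labels = zip(*batch)
    t_max = max(v.shape[1] for v in videos)
    padded = [F.pad(v, (0, 0, 0, 0, 0, t_max - v.shape[1])) for v in videos]
    return torch.stack(padded), torch.tensor(labels)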
Thanks for your PyTorch implementation of MoViNet.
Could you please provide a model trained on Kinetics-400, as in Table 9 of [1]?
It is quite important for our future work.
Looking forward to your reply.
[1] MoViNets: Mobile Video Networks for Efficient Video Recognition
Nit: per the paper, there should be 40 output channels in block 2 for MoViNet-A0, while this implementation currently has 24.
MoViNet-pytorch/movinets/config.py, line 62 in 2ad697f
Hello, thanks for the PyTorch implementation!
There seems to be no implementation of positional encoding; does that influence the stream model?
What is the accuracy rate of your reproduced A0 network?
Hi, I'm trying to solve a binary classification problem with these MoViNets, and I was wondering how I would go about creating a MoViNet, with some augmentation, with the aim of reducing the TFLOPs required and decreasing the inference time.
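For the binary-classification half of the question, at least, here is a sketch based on this repo's constructor (choosing the smallest variant, A0, to cut FLOPs is my suggestion, not something from the paper):

from movinets import MoViNet
from movinets.config import _C

# A0 is the cheapest variant; two output classes for binary classification.
model = MoViNet(_C.MODEL.MoViNetA0, causal=False, pretrained=False,
                num_classes=2)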
Hey, any chance the config file will be updated for A6 soon? Thanks :)
Hi, @Atze00
I saved the MoViNet-A0 model to a '.pth' file and looked through the model with Netron, but the structure is a little strange. Maybe there is something wrong with '_forward_impl' in the class 'MoViNet(nn.Module)'.
My code is as follows:
model = MoViNet(_C.MODEL.MoViNetA0, causal=True, pretrained=False, num_classes=num_class)
...
torch.save(model, '/path/to/*.pth')
path = '/path/to/*.pth'
model = torch.load(path, map_location='cpu')
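One alternative worth trying (a suggestion, not a confirmed fix): Netron renders TorchScript graphs more reliably than pickled nn.Module files, so tracing and saving may give a cleaner structure. Tracing a non-causal instance is assumed here, since the causal stream buffers mutate state that tracing does not capture:

import torch

model.eval()
dummy = torch.randn(1, 3, 8, 172, 172)  # (B, C, T, H, W); A0 resolution assumed
traced = torch.jit.trace(model, dummy)
traced.save('/path/to/movinet_a0_traced.pt')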
Thanks for the great work. What dataset did you use for the pretrained weights?
Is it possible to provide the weights for the Kinetics dataset?
According to the paper, "For all datasets, we train with 64 frames (except when the inference frames are fewer) at various frame-rates, and run inference with the same frame-rate". So training uses 64 frames, but a video from my dataset may have fewer than 32 frames; maybe 8, 10, 13, 17, etc. Is that OK?
Thanks in advance.