
Comments (7)

mx-mark commented on June 19, 2024

@Enclavet First, the TimeSformer results shown in this repo come from a model pretrained on K600; on K400 it can reach around 77%. Whether you get similar performance depends largely on your hparams. Could you show me the hparams logged before training starts?

from videotransformer-pytorch.

Enclavet commented on June 19, 2024

Attaching hparams:

Namespace(lr=0.005, epoch=15, gpus=-1, nccl_ifname='lan2', batch_size=8, num_workers=4, log_interval=30, save_ckpt_freq=20, num_class=400, num_samples_per_cls=10000, arch='timesformer', attention_type='divided_space_time', pretrain='vit', optim_type='sgd', lr_schedule='cosine', objective='supervised', resume=False, resume_from_checkpoint=None, num_frames=8, frame_interval=40, seed=0, train_data_path='/home/ec2-user/train_list.txt', val_data_path='/home/ec2-user/val_list.txt', test_data_path=None, root_dir='/home/ec2-user/workdir')


mx-mark commented on June 19, 2024

@Enclavet The hparams are almost the same as my experiment settings, except that I set epoch to 30 for the cosine lr schedule and frame interval to 32. You can try the default settings to see the final result. By the way, why did you choose 40 for the frame interval? In my opinion, 32 is enough to cover the entire video at 25fps. So what I am wondering is how you performed the data preparation for K400, and whether you aligned the fps of each video sample.
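To illustrate why the epoch count matters under a cosine lr schedule, here is a minimal sketch. It assumes plain cosine annealing from the base lr down to zero with no warmup, which may not match the scheduler this repo actually uses; `cosine_lr` is a hypothetical helper, not a function from the repo.

```python
import math


def cosine_lr(base_lr: float, epoch: int, total_epochs: int) -> float:
    """Plain cosine annealing from base_lr to 0 (assumed: no warmup, no floor)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


base_lr = 0.005  # lr value taken from the posted hparams

# Compare the schedule for a 15-epoch run and a 30-epoch run.
for total in (15, 30):
    lrs = [cosine_lr(base_lr, e, total) for e in (0, 5, 10, 15)]
    print(total, [round(lr, 5) for lr in lrs])
```

Under this assumption, a 15-epoch run has already decayed to zero lr at epoch 15, while a 30-epoch run is still at half the base lr at the same point, so the two runs are not directly comparable epoch by epoch.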


Enclavet commented on June 19, 2024

@mx-mark My data comes from this repo: https://github.com/cvdfoundation/kinetics-dataset.

Videos appear to be 10 seconds in length at around 25-30fps (not all the same). Are you doing any more data preparation beyond downloading the video + cutting the relevant section?

As mentioned, a frame interval of 32 with 8 frames should cover most videos; I was using 40 as a test. I have also trained with 32 and got similar performance. Actually, the best val acc_top1 was ~75 after 15 epochs, not ~73 as mentioned earlier.
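The coverage argument can be checked with a little arithmetic. This is a rough sketch that assumes a sampled clip spans `num_frames * frame_interval` source frames, which is how the discussion above reads but is not confirmed against the repo's sampler; `clip_span_seconds` is a hypothetical helper.

```python
def clip_span_seconds(num_frames: int, frame_interval: int, fps: float) -> float:
    """Seconds of source video spanned by one clip (assumed span, see lead-in)."""
    return num_frames * frame_interval / fps


# 8 frames per clip, with the two frame intervals and frame rates discussed here.
for interval in (32, 40):
    for fps in (25, 30):
        span = clip_span_seconds(8, interval, fps)
        print(f"interval={interval} fps={fps}: {span:.2f} s")
```

Under this assumption, 8 frames at interval 32 span 10.24 s at 25fps, slightly more than a 10-second clip, but only about 8.53 s at 30fps, so videos with different frame rates end up covering different portions of the action.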

Do you think more epochs will help? I notice that at some point acc does not improve with more epochs and can actually decrease.


mx-mark commented on June 19, 2024

@Enclavet Normally, we resample all videos to the same fps.
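One possible way to do this resampling is with ffmpeg's `fps` filter, driven from Python. This is only a sketch, not the preprocessing script used for this repo: it assumes ffmpeg is installed and on the PATH, that the videos are `.mp4` files, and the helper names and directory layout are hypothetical. Resizing is left out because the thread does not say how it was done.

```python
import subprocess
from pathlib import Path
from typing import List


def build_resample_cmd(src: str, dst: str, fps: int = 25) -> List[str]:
    """Build an ffmpeg command that re-encodes src at a constant frame rate."""
    # -y overwrites an existing output file; the fps filter drops or
    # duplicates frames as needed to reach the target rate.
    return ["ffmpeg", "-y", "-i", src, "-vf", f"fps={fps}", dst]


def resample_dir(src_dir: str, dst_dir: str, fps: int = 25) -> None:
    """Resample every .mp4 under src_dir into dst_dir (hypothetical layout)."""
    out_root = Path(dst_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    for video in sorted(Path(src_dir).glob("*.mp4")):
        cmd = build_resample_cmd(str(video), str(out_root / video.name), fps)
        subprocess.run(cmd, check=True)  # raises if ffmpeg fails
```

A call such as `resample_dir("raw_videos", "videos_25fps")` would then write 25fps copies next to the originals, leaving the source files untouched.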


Enclavet commented on June 19, 2024

I aligned my dataset to 225 dimensions and 25fps and ran training on K400. I was able to achieve >76 top-1 acc.

Running it with K600 now.


Enclavet commented on June 19, 2024

I was never able to achieve >78 top-1 acc without modifying num_frames and frame_interval.

I was able to achieve >78 top-1 acc on K600 with num_frames = 12 and frame_interval = 20.

This is with a dataset from https://github.com/cvdfoundation/kinetics-dataset resampled to 25fps and aligned to 225 dimensions.

Closing this as I am happy with the performance.

