
vedatad's People

Contributors

hxcai


vedatad's Issues

Question about training time

Hi,

Thanks for your wonderful repo. Your paper reports a very large number of training epochs, so I wonder how long training the model takes. And how many GPUs did you use?

GPU type for training

Hi! Thanks for the great work!

May I ask what type of GPU you used for training? Did you also use a GTX 1080 Ti, the same as for inference?

Labels: background as 1, foreground as 0?

Hi, thanks for your work and for sharing the code. I have a question about the labels passed in to calculate the loss. As I understand it, if we have a multi-class detection problem with, say, 5 categories, the foreground labels would be 0, 1, 2, 3, 4 and the background label would be 5. Similarly, if we only have 1 class, the foreground label would be 0 and the background label would be 1.

I was wondering whether this "fg-0 bg-1" convention gets flipped (to "fg-1 bg-0") when the loss is calculated, because in vedacore.ops.sigmoid_focal_loss, specifically in the sigmoid_focal_loss_cuda.cu file, the kernel reads:

__global__ void SigmoidFocalLossForward(const int nthreads,
                                        const scalar_t *logits,
                                        const int64_t *targets,
                                        const int num_classes,
                                        const float gamma, const float alpha,
                                        const int num, scalar_t *losses) {
  CUDA_1D_KERNEL_LOOP(i, nthreads) {
    int n = i / num_classes;
    int d = i % num_classes;  // current class[0~79];
    int t = targets[n];       // target class [0~79];

    // Decide it is positive or negative case.
    scalar_t c1 = (t == d);
    scalar_t c2 = (t >= 0 & t != d);

I guess this line, int d = i % num_classes; // current class[0~79], is where the labels are flipped (so that they become bg-0 fg-1)?

I ask because, when I look at the loss, it only makes sense if the labels are flipped. Take the simplest case, binary cross-entropy loss:

loss = - [y log(p) + (1-y) log(1-p)]

Minimizing the loss is equivalent to maximizing y log(p) + (1-y) log(1-p). So when y=1, we maximize p; when y=0, we maximize 1-p, i.e. minimize p. Therefore, if the input labels follow "bg-1 fg-0", we should convert them to "bg-0 fg-1". Is this correct?

Thanks!
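One way to check this is against the quoted kernel itself: c1 = (t == d) and c2 = (t >= 0 & t != d) turn the single label index t into a per-class binary target, so no explicit flip seems necessary. With num_classes = 1, a foreground label 0 gives c1 = 1 (y = 1) for column 0, while a background label 1 matches no column and gives c2 = 1 (y = 0). The NumPy sketch below re-implements that logic. The focal-loss terms assume the truncated remainder of the kernel follows the standard sigmoid focal loss (alpha-weighted (1-p)^gamma log p for positives, (1-alpha)-weighted p^gamma log(1-p) for negatives), and both function names are hypothetical.

```python
import numpy as np


def binary_targets(labels, num_classes):
    """Per-class binary targets from integer labels, mirroring the kernel's c1/c2.

    labels: (N,) ints; 0..num_classes-1 = foreground class,
            num_classes = background, negative = ignored.
    Returns c1 (positive mask, y = 1) and c2 (negative mask, y = 0), shape (N, C).
    """
    d = np.arange(num_classes)[None, :]   # current class index for each column
    t = np.asarray(labels)[:, None]       # target label for each row
    c1 = (t == d).astype(float)
    c2 = ((t >= 0) & (t != d)).astype(float)
    return c1, c2


def sigmoid_focal_loss_ref(logits, labels, gamma=2.0, alpha=0.25):
    """Element-wise sigmoid focal loss (standard form, assumed for the kernel)."""
    logits = np.asarray(logits, dtype=float)
    c1, c2 = binary_targets(labels, logits.shape[1])
    p = 1.0 / (1.0 + np.exp(-logits))
    tiny = np.finfo(float).tiny            # avoid log(0)
    pos = (1.0 - p) ** gamma * np.log(np.maximum(p, tiny))
    neg = p ** gamma * np.log(np.maximum(1.0 - p, tiny))
    return -c1 * alpha * pos - c2 * (1.0 - alpha) * neg


# One foreground class: label 0 = foreground, label 1 = background.
c1, c2 = binary_targets([0, 1], num_classes=1)
print(c1.ravel(), c2.ravel())  # [1. 0.] [0. 1.]
```

Under this reading, the "bg = num_classes" convention works because the background index never equals any column index d, so every column receives a negative target for background samples.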

End-to-end training?

Hi @Media-Smart, thank you for your excellent work and clean implementation. I want to ask whether you trained the whole network end-to-end. As you describe in the paper, one of the disadvantages of two-stream methods is that they are difficult to train end-to-end. Since this work is not based on two-stream input, I assume you trained the network end-to-end. So did you also optimize the feature extractor network when training your model?

open-mmlab weights are not loading

I tried to run the model on THUMOS14, and it seems that open-mmlab://i3d_r50_256p_32x2x1_100e_kinetics400_rgb fails to load. I attached the error log for reference.

weight file mismatch with the model

Hi, thanks a lot for your work and for sharing the code. I'm having trouble loading the weights file. At first it could not be loaded at all; then I used the method suggested in #10 and it loaded. But now I get "unexpected keys" and "missing keys" errors.
[Screenshot]

The only change I made was setting num_classes = 1 in section "2. model" of the config, because I want to retrain the model on my own dataset. But even after changing it back to num_classes = 20, the problem persists.

Could you help me with it? Thanks!
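"Unexpected keys" and "missing keys" usually mean that the checkpoint's parameter names or shapes do not line up with the model's. Two common causes, offered here as assumptions rather than a diagnosis of this report, are a "module." prefix left by a DataParallel-wrapped model and a classification head whose shape depends on num_classes. The hypothetical helper below (not part of vedatad) sorts checkpoint entries into loadable, missing, unexpected and shape-mismatched keys; it works on any mapping whose values have a .shape, such as NumPy arrays or torch tensors.

```python
import numpy as np


def match_state_dict(ckpt_sd, model_sd, strip_prefix="module."):
    """Hypothetical helper: split checkpoint entries by how they match a model.

    Returns (loadable, missing, unexpected, mismatched):
      loadable   - dict of entries whose name and shape both match
      missing    - model keys that will not receive a value
      unexpected - checkpoint keys the model does not have
      mismatched - keys present in both but with different shapes
    """
    # Drop a wrapper prefix such as "module." (left by DataParallel) if present.
    cleaned = {}
    for k, v in ckpt_sd.items():
        if strip_prefix and k.startswith(strip_prefix):
            k = k[len(strip_prefix):]
        cleaned[k] = v

    loadable, mismatched = {}, []
    for k, v in cleaned.items():
        if k not in model_sd:
            continue
        if tuple(v.shape) == tuple(model_sd[k].shape):
            loadable[k] = v
        else:
            mismatched.append(k)  # e.g. a head sized for 20 classes vs. 1

    unexpected = sorted(k for k in cleaned if k not in model_sd)
    missing = sorted(k for k in model_sd if k not in loadable)
    return loadable, missing, unexpected, sorted(mismatched)


ckpt = {"module.backbone.w": np.zeros((4, 3)),
        "module.head.cls": np.zeros((20, 8)),
        "module.extra": np.zeros(1)}
model = {"backbone.w": np.zeros((4, 3)), "head.cls": np.zeros((1, 8))}
loadable, missing, unexpected, mismatched = match_state_dict(ckpt, model)
print(list(loadable), missing, unexpected, mismatched)
# ['backbone.w'] ['head.cls'] ['extra'] ['head.cls']
```

In PyTorch, the filtered loadable dict could then be passed to model.load_state_dict(loadable, strict=False), which tolerates the missing head weights and leaves them at their initial values for retraining.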

Question about FPS

Thank you for your excellent work! I have a question about the fps.
The frames are extracted at 25 fps, but the video's frame rate is 30 fps, and the duration in txt2json.py is calculated using 30 fps.
Does this affect the results? I look forward to your reply.
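Whether the 25 vs. 30 fps difference matters depends on whether every conversion between seconds and frame indices uses the rate at which the frames were actually extracted. The small sketch below, with hypothetical helper names, shows the arithmetic: converting a time with 30 fps and then indexing frames extracted at 25 fps scales positions by 30/25 = 1.2, so the error grows with the timestamp.

```python
def time_to_frame(t_sec, fps):
    """Nearest frame index for a timestamp in seconds (hypothetical helper)."""
    return round(t_sec * fps)


def frame_to_time(frame_idx, fps):
    """Timestamp in seconds of a frame index (hypothetical helper)."""
    return frame_idx / fps


EXTRACT_FPS = 25  # rate at which frames were extracted
VIDEO_FPS = 30    # original video rate, assumed used in a duration/time conversion

t = 12.0                                  # e.g. an annotated action start, in seconds
right = time_to_frame(t, EXTRACT_FPS)     # index into the extracted frames
wrong = time_to_frame(t, VIDEO_FPS)       # same time converted with the other rate
print(right, wrong, frame_to_time(wrong, EXTRACT_FPS))  # 300 360 14.4
```

If both directions use the same rate, the round trip is exact (300 / 25 = 12.0 s); mixing them here shifts a 12 s boundary to 14.4 s.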

question about .cpp

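dcn/deform_conv.py imports a compiled .cpp extension via from . import deform_conv_ext. How did you compile deform_conv_ext.cpp and the related files? I couldn't find a build command; could you please point me in the right direction?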

Question about pipeline for Inference with `InferEngine`

How does the data need to be prepared to use InferEngine?
Suppose my inference script looked something like this:

def read_video(video):
    '''Read video prepare video_metas'''
    pass

def prepare(cfg, checkpoint):
    engine = build_engine(cfg.infer_engine)
    load_weights(engine.model, checkpoint, map_location='cpu')

    device = torch.cuda.current_device()
    engine = MMDataParallel(
        engine.to(device), device_ids=[torch.cuda.current_device()])

    data_pipeline = Compose(cfg.data_pipeline)

    return engine, data_pipeline

def main():
    args = parse_args()
    cfg = Config.fromfile(args.config)

    engine, data_pipeline = prepare(cfg, args.checkpoint)

    imgs, video_metas = read_video(args.video)

    data = data_pipeline(imgs)
    
    # scatter here

    results = engine.infer(data['imgs'], video_metas)

    print(results)

I will likely need to change the pipeline from the default, but to what?

data_pipeline=[
    dict(typename='LoadMetaInfo'), # probably don't need
    dict(typename='Time2Frame'), # probably don't need
    dict(
        typename='OverlapCropAug',
        num_frames=num_frames,
        overlap_ratio=overlap_ratio,
        transforms=[
            dict(typename='TemporalCrop'),
            dict(typename='LoadFrames', to_float32=True), # probably don't need
            dict(typename='SpatialCenterCrop', crop_size=img_shape),
            dict(typename='Normalize', **img_norm_cfg),
            dict(typename='Pad', size=(num_frames, *img_shape)),
            dict(typename='DefaultFormatBundle'),
            dict(typename='Collect', keys=['imgs'])
        ])
]

I imagine ImageToTensor is needed as a last step before Collect, and that frame loading will need to be done differently.
Any clues or help would be appreciated.
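For reference when adapting the pipeline: OverlapCropAug with num_frames and overlap_ratio presumably splits a long video into fixed-length windows that overlap by a fraction of their length, each later padded to num_frames by the Pad step. The sketch below computes such window boundaries. It is an assumption about the transform's behavior, not its actual code, and overlap_windows is a hypothetical name.

```python
import math


def overlap_windows(total_frames, num_frames, overlap_ratio):
    """Hypothetical sketch: (start, end) frame indices of fixed-length windows
    that overlap by ``overlap_ratio * num_frames`` frames and cover the video.

    The last window may extend past ``total_frames``; those frames would be
    padded (the config's Pad step appears to pad to ``num_frames``).
    """
    stride = max(1, int(num_frames * (1 - overlap_ratio)))
    if total_frames <= num_frames:
        return [(0, num_frames)]
    n_windows = math.ceil((total_frames - num_frames) / stride) + 1
    return [(i * stride, i * stride + num_frames) for i in range(n_windows)]


print(overlap_windows(200, num_frames=100, overlap_ratio=0.5))
# [(0, 100), (50, 150), (100, 200)]
```

Each window would then go through the inner transforms (cropping, normalization, padding) independently, and per-window predictions would need to be shifted by the window start before merging.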

What is the input format for eval_map?

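Hello,
How should the input data for eval_map in evaluation\mean_ap.py be organized?
I pass det_results in the following format. First level: a list of N elements, each element being the prediction list for one video. Second level: C elements, each holding the predictions for one class. Third level: a variable number of elements; with K predictions there are K×3 elements, namely the start point, the end point, and the predicted probability for that class, as in the screenshot below:
[Screenshot of the det_results structure]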

But an error is raised at line 315:
cls_dets = np.vstack(cls_dets)

Error message:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 21 has size 3

How should I organize det_results correctly?
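The error message points at a likely cause: np.vstack promotes 1-D inputs to 2-D, so an empty prediction list (shape (0,)) becomes shape (1, 0) and cannot be stacked with arrays that have 3 columns. Assuming this eval_map follows the mmdetection convention (outer list over videos, inner list over classes, each entry an ndarray of shape (K, 3) holding [start, end, score]), every class entry should be a 2-D array, with shape (0, 3) when there are no predictions. The hypothetical helper below converts flat K×3 lists like the ones described above into that form.

```python
import numpy as np


def to_det_results(raw, num_classes):
    """Hypothetical helper: build det_results as list[videos] -> list[classes] ->
    ndarray (K, 3) with columns [start, end, score].

    raw[v][c] is a flat list [s1, e1, p1, s2, e2, p2, ...] (possibly empty).
    """
    det_results = []
    for video_preds in raw:
        per_class = []
        for c in range(num_classes):
            flat = video_preds[c] if c < len(video_preds) else []
            # reshape(-1, 3) turns [] into shape (0, 3) instead of (0,)
            per_class.append(np.asarray(flat, dtype=np.float32).reshape(-1, 3))
        det_results.append(per_class)
    return det_results


raw = [[[0.0, 1.5, 0.9, 2.0, 3.0, 0.4], []],   # video 0: 2 dets for class 0
       [[], [5.0, 6.0, 0.7]]]                   # video 1: 1 det for class 1
det_results = to_det_results(raw, num_classes=2)
cls0 = np.vstack([video[0] for video in det_results])  # stacks cleanly now
print(cls0.shape)  # (2, 3)
```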

question about test result

[Screenshot of test results]
Hi,

I ran the test code; the results are shown above.
The results I got are much lower than those reported in the paper. What could be the reason?

Test Question

Howdy, I read your paper and admire it very much, but I have a few questions that I hope you can answer. First, I trained the model and evaluated checkpoints from several epochs, but the results are not consistent with the 53.8 reported in your paper. Was that figure obtained with this baseline setup?

Epoch   Result
1200    0.445
1000    0.448
900     0.455
800     0.456
700     0.457
600     0.45
300     0.445
200     0.416
100     0.34

Question about test

Hi,

I ran the test code, and the results are shown below.
[Screenshot of test results]
I wonder whether the repo supports evaluating mAP at IoU thresholds [0.3:0.7], as reported in the paper.
Another question: why does inference take so much time?
Thanks.
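The [0.3:0.7] notation on THUMOS14 typically means mAP computed at temporal IoU thresholds 0.3, 0.4, 0.5, 0.6 and 0.7, often also averaged. If the evaluation code reports a single threshold, one way to obtain these numbers is to run it once per threshold and average. The sketch below shows temporal IoU and that averaging; eval_fn is an assumed callable (for example a hypothetical wrapper that runs the evaluation at one IoU threshold), not an existing vedatad API.

```python
import numpy as np


def temporal_iou(seg, segs):
    """Temporal IoU between one [start, end] segment and an (M, 2) array."""
    segs = np.asarray(segs, dtype=float).reshape(-1, 2)
    inter = np.clip(np.minimum(seg[1], segs[:, 1]) - np.maximum(seg[0], segs[:, 0]),
                    0.0, None)
    union = (seg[1] - seg[0]) + (segs[:, 1] - segs[:, 0]) - inter
    return inter / union


# IoU thresholds 0.3, 0.4, 0.5, 0.6, 0.7 (rounded to avoid float drift).
IOU_THRS = np.round(np.arange(0.3, 0.71, 0.1), 2)


def average_map(eval_fn, thresholds=IOU_THRS):
    """Average mAP over IoU thresholds.

    eval_fn(iou_thr) -> mAP is an assumed callable supplied by the caller.
    """
    per_thr = {float(t): eval_fn(float(t)) for t in thresholds}
    return float(np.mean(list(per_thr.values()))), per_thr


ious = temporal_iou([0.0, 10.0], [[5.0, 15.0], [20.0, 30.0]])
mean_map, per_thr = average_map(lambda thr: 1.0 - thr)  # dummy eval_fn for illustration
print(ious.round(3).tolist(), round(mean_map, 3))  # [0.333, 0.0] 0.5
```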
