explore-and-match's Issues

No zero_shot_clip

In /lib/modeling/model.py there is the following import of zero_shot_clip, but no such module exists anywhere in the project.

from lib.modeling.zero_shot_clip import build_zeroshot_clip
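
As a stopgap, here is a minimal sketch of what such a module might contain, assuming build_zeroshot_clip is simply meant to return a frozen, pretrained CLIP model; the signature and return values are guesses from the import alone.

    # Hypothetical stand-in for the missing lib/modeling/zero_shot_clip.py,
    # assuming it wraps OpenAI's pretrained CLIP for zero-shot use.
    import clip

    def build_zeroshot_clip(arch="ViT-B/32", device="cpu"):
        model, preprocess = clip.load(arch, device=device)  # pretrained CLIP weights
        model.eval()                                        # zero-shot: no fine-tuning
        for p in model.parameters():
            p.requires_grad_(False)                         # freeze everything
        return model, preprocess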

Training equipment

Hello, very meaningful work. How many GPUs are needed for training, and how long does it take altogether?

ap_array is empty

I got this error during evaluation and found that ap_array was sometimes empty. Could you kindly give some advice on how to fix it?

  File "/Explore-and-Match/lib/evaluate/eval.py", line 75, in compute_ap
    iou_thd2ap = dict(zip([str(e) for e in iou_thds], ap_thds))
TypeError: 'numpy.float64' object is not iterable
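
A defensive sketch of how that call could be guarded, assuming ap_thds comes from averaging ap_array over queries (the variable roles are inferred from the traceback, not from the repo):

    import numpy as np

    iou_thds = (0.5, 0.7)
    ap_array = np.empty((0, len(iou_thds)))    # no valid predictions collected
    if ap_array.size:
        ap_thds = ap_array.mean(axis=0)        # per-threshold average precision
    else:
        ap_thds = np.zeros(len(iou_thds))      # fall back instead of a bare scalar
    ap_thds = np.atleast_1d(ap_thds)           # keep zip() iterable either way
    iou_thd2ap = dict(zip([str(e) for e in iou_thds], ap_thds))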

Reproduce results of LVTR-CLIP

Hello! I followed all of the feature extraction and Charades training instructions on your GitHub homepage, but in my environment the reproduced results of LVTR-CLIP at epoch 200 looked like this:

>>>>> Evalutation
[Epoch] 200
[Loss]
        > loss_label 0.0958
        > class_error 0.0000
        > loss_span 0.0345
        > loss_giou 0.5867
        > loss_label_0 0.0965
        > class_error_0 0.0000
        > loss_span_0 0.0351
        > loss_giou_0 0.6012
        > loss_label_1 0.0960
        > class_error_1 0.0000
        > loss_span_1 0.0345
        > loss_giou_1 0.5858
        > loss_label_2 0.0958
        > class_error_2 0.0000
        > loss_span_2 0.0342
        > loss_giou_2 0.5838
        > loss_overall 2.8799
[Metrics_No_NMS]
OrderedDict([   ('[email protected]', 54.11),
                ('[email protected]', 38.61),
                ('[email protected]', 22.44),
                ('[email protected]', 7.09),
                ('[email protected]', 87.81),
                ('[email protected]', 77.6),
                ('[email protected]', 62.84),
                ('[email protected]', 33.93),
                ('VG-full-mAP', 33.87),
                ('VG-full-mIoU@R1', 0.2466),
                ('VG-full-mIoU@R5', 0.5349),
                ('[email protected]', 34.25),
                ('[email protected]', 73.62),
                ('VG-middle-mAP', 40.85),
                ('[email protected]', 15.05),
                ('[email protected]', 55.39),
                ('VG-short-mAP', 29.21)])

To document the problem, I used TensorBoard to collect evaluation metrics for each epoch.
[TensorBoard screenshots]
I followed your code without any modification, except for removing the evaluation record for the "long" length_range. Would you kindly give some advice on how to successfully reproduce your results? Thank you very much!

Frames of some videos in Charades cannot be read

Thanks for your meaningful work. When I tried to extract frames per video, I noticed that some frame folders were empty; this happened for every num_frames setting (16/32/64/126/256). It looks like there were problems reading the Charades video frames with OpenCV. Would you kindly give some advice? Also, I downloaded the original-size Charades (55 GB); which version of Charades did you use? Thank you!
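
For anyone debugging this, a minimal OpenCV sanity check on a single video (the path below is a hypothetical example) shows whether the file opens at all and how many frames actually decode:

    import cv2

    cap = cv2.VideoCapture("Charades_v1/AO8RW.mp4")  # hypothetical example path
    if not cap.isOpened():
        print("cannot open video: check codec support and file integrity")
    count = 0
    while True:
        ok, _frame = cap.read()
        if not ok:
            break                                    # end of stream or decode failure
        count += 1
    cap.release()
    print(f"decoded {count} frames")                 # 0 frames -> an empty output folder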

Regarding Figure 7 in the paper

Hello,
Could you please explain more about Fig. 7 in the paper?
What do the x-axis and y-axis of each proposal plot mean?
Thanks!

What's the definition of run?

I found this snippet in train.py. Where is run defined?

if __name__ == '__main__':
    logger = setup_logger('LVTR', args.log_dir, distributed_rank=0, filename=cur_time()+"_train.txt")
    train_val(logger, run=run)
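
One guess is that run is an experiment-tracking handle that was never committed. A hypothetical workaround, assuming train_val only uses it for optional logging (neither the tracker choice nor the None fallback is confirmed by the repo):

    # Hypothetical: define `run` before the call, e.g. with Weights & Biases,
    # or pass None if train_val guards its logging calls.
    import wandb

    run = wandb.init(project="LVTR")  # or simply: run = None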

Get CLIP Features for Charades

I found the following code in clip_encoder.py. It requires train.json and test.json for Charades, but jiyanggao/TALL only provides charades_sta_train/test.txt. How did you extract the CLIP features for Charades? Could you share the features used in LVTR via Google Drive or Box? And would you kindly provide a detailed description of how your dataset files are organized?

    phases = ['train', 'val', 'test'] if dataset in ['activitynet'] else ['train', 'test']
    for phase in phases:
        # load annotations
        with open(os.path.join(data_dir, phase + '.json')) as j:
            annos = json.load(j)
        time_meters['load_annotations'].update(time.time()-tictoc)
        tictoc = time.time()
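
Absent an official answer, here is a hypothetical converter from TALL's charades_sta_train.txt format ("<video> <start> <end>##<sentence>" per line) into a train.json that a loop like the one above could read; the exact schema clip_encoder.py expects is an assumption:

    import json
    from collections import defaultdict

    annos = defaultdict(lambda: {"timestamps": [], "sentences": []})
    with open("charades_sta_train.txt") as f:
        for line in f:
            meta, sentence = line.strip().split("##")
            vid, start, end = meta.split()
            annos[vid]["timestamps"].append([float(start), float(end)])
            annos[vid]["sentences"].append(sentence)

    with open("train.json", "w") as j:
        json.dump(annos, j)  # assumed schema: {video_id: {timestamps, sentences}}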

'clip' Package in preprocess/clip_encoder.py

Thanks for your good work. When extracting the CLIP features for the datasets, I found that the source code (preprocess/clip_encoder.py) imports a package named 'clip', which is not mentioned in Installation. Where can I install the 'clip' package from? Thanks!
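
The import is almost certainly OpenAI's CLIP, which is installed from its GitHub repository rather than by name from PyPI. A minimal check that the install works:

    # Install first (shell): pip install git+https://github.com/openai/CLIP.git
    import clip
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)  # standard CLIP entry point
    print(clip.available_models())                            # lists e.g. 'RN50', 'ViT-B/32'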

Reproducing with C3D

Hi, thank you for the interesting work.
I want to reproduce the results with C3D features.
Is the configuration the same as for CLIP?
Could you please provide detailed config files for both Charades and ActivityNet with C3D features?

LVTR-CLIP

Hello, I'm very interested in your LVTR-CLIP model. The features extracted by CLIP contain only image information, whereas the features extracted by C3D contain both image information and temporal information. So why does LVTR-CLIP outperform LVTR-C3D? In other words, is the cross-modal encoder capable of modeling temporal relations both between frames and between frames and text?

It seems that some queries were not evaluated?

There seems to be a bug in lines 119-122 of test.py.
If the number of sentences exceeds num_input_sentences, the list "split_src_txt" will have a length greater than 1, while "annotations" always has length 1. Therefore, the for loop (lines 119 to 181) is only executed once, regardless of the size of "split_src_txt".
In other words, it seems that some queries are never evaluated?
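
A minimal illustration of the suspected truncation (toy data, not the repo's real variables): zip stops at the shorter iterable, so every chunk after the first is silently skipped.

    split_src_txt = [["q1", "q2"], ["q3", "q4"], ["q5"]]  # 3 chunks after splitting
    annotations = [{"vid": "abc"}]                        # always a single element

    for txt_chunk, anno in zip(split_src_txt, annotations):
        print(txt_chunk, anno)  # runs once; ["q3", "q4"] and ["q5"] are never visited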
