sangminwoo / explore-and-match
Official pytorch implementation of "Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos"
License: MIT License
In /lib/modeling/model.py there is the following import for zero_shot_clip, but the corresponding module does not exist anywhere in the project.
from lib.modeling.zero_shot_clip import build_zeroshot_clip
Hello, very meaningful work. How many GPUs are needed for training, and how long does it take altogether?
I got this error during evaluation. I found that ap_array was sometimes empty. Could you kindly give some advice on how to fix it?
File "/Explore-and-Match/lib/evaluate/eval.py", line 75, in compute_ap
iou_thd2ap = dict(zip([str(e) for e in iou_thds], ap_thds))
TypeError: 'numpy.float64' object is not iterable
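For what it's worth, the message suggests that when ap_array is empty, the per-threshold mean collapses to a numpy.float64 scalar, which zip() cannot iterate. A minimal, hypothetical guard (the function name and the assumed (num_preds, num_iou_thds) shape are mine, not the repo's code):

import numpy as np

def compute_ap_safe(ap_array, iou_thds):
    # ap_array: per-prediction APs, assumed shape (num_preds, num_iou_thds)
    ap_array = np.asarray(ap_array)
    if ap_array.size == 0:
        # No matched predictions for this split; report 0 AP per threshold
        ap_thds = np.zeros(len(iou_thds))
    else:
        # atleast_1d keeps the result iterable even if numpy collapses
        # the mean to a float64 scalar
        ap_thds = np.atleast_1d(ap_array.mean(axis=0))
    return dict(zip([str(e) for e in iou_thds], ap_thds))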
Hello, author! I followed all of your feature-extraction and Charades training suggestions on your GitHub homepage, but in my environment the reproduced results of LVTR-CLIP at epoch 200 looked like this:
>>>>> Evaluation
[Epoch] 200
[Loss]
> loss_label 0.0958
> class_error 0.0000
> loss_span 0.0345
> loss_giou 0.5867
> loss_label_0 0.0965
> class_error_0 0.0000
> loss_span_0 0.0351
> loss_giou_0 0.6012
> loss_label_1 0.0960
> class_error_1 0.0000
> loss_span_1 0.0345
> loss_giou_1 0.5858
> loss_label_2 0.0958
> class_error_2 0.0000
> loss_span_2 0.0342
> loss_giou_2 0.5838
> loss_overall 2.8799
[Metrics_No_NMS]
OrderedDict([ ('[email protected]', 54.11),
('[email protected]', 38.61),
('[email protected]', 22.44),
('[email protected]', 7.09),
('[email protected]', 87.81),
('[email protected]', 77.6),
('[email protected]', 62.84),
('[email protected]', 33.93),
('VG-full-mAP', 33.87),
('VG-full-mIoU@R1', 0.2466),
('VG-full-mIoU@R5', 0.5349),
('[email protected]', 34.25),
('[email protected]', 73.62),
('VG-middle-mAP', 40.85),
('[email protected]', 15.05),
('[email protected]', 55.39),
('VG-short-mAP', 29.21)])
To track the problem, I used TensorBoard to collect some evaluation information for each epoch.
I followed your code without any modification, except for removing the evaluation record of the "long" length_range. Could you kindly give us some advice on how to successfully reproduce your results? Thank you very much!
Thanks for your meaningful work. When I tried to extract frames per video, I noticed that some of the frame folders were empty; this happened for num_frames of 16/32/64/128/256 alike. It looks like there were some problems reading the Charades video frames with OpenCV. Could you kindly give some advice? Also, I downloaded the original-size Charades (55 GB); which version of Charades did you use? Thank you!
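In case it helps with debugging, here is a minimal sketch of defensive frame extraction with OpenCV, which at least tells you which videos fail to decode instead of silently leaving folders empty. The names (video_path, out_dir, num_frames) are illustrative, not the repo's:

import os
import cv2
import numpy as np

def extract_frames(video_path, out_dir, num_frames=64):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():  # codec/container problems show up here
        print(f'could not open {video_path}')
        return
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        print(f'no decodable frames in {video_path}')
        cap.release()
        return
    os.makedirs(out_dir, exist_ok=True)
    # Sample num_frames indices uniformly over the video
    idxs = np.linspace(0, total - 1, num_frames).astype(int)
    for i, idx in enumerate(idxs):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:  # some videos have broken tails; skip rather than abort
            continue
        cv2.imwrite(os.path.join(out_dir, f'{i:05d}.jpg'), frame)
    cap.release()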
Hello,
Could you please explain Fig. 7 in the paper in more detail?
What do the x-axis and y-axis of each proposal represent?
Thanks!
I found this snippet in train.py. What is the definition of "run"?
if __name__ == '__main__':
logger = setup_logger('LVTR', args.log_dir, distributed_rank=0, filename=cur_time()+"_train.txt")
train_val(logger, run=run)
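For anyone else hitting this: "run" looks like a handle for an experiment tracker (e.g., a neptune or wandb run object) created elsewhere. That is an assumption on my part; as a workaround, an entirely hypothetical no-op stub like the following lets training proceed without a tracker:

class NoOpRun:
    # Swallows both run.log(...) and run['key'].log(...) call styles
    def log(self, *args, **kwargs):
        pass
    def __getitem__(self, key):
        return self

run = NoOpRun()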
I found the following code in clip_encoder.py. It requires train.json and test.json for Charades, but jiyanggao/TALL only provides charades_sta_train/test.txt. How did you extract the CLIP features for Charades? Could you provide the features used in LVTR via Google Drive or Box? And would you kindly provide a detailed description of how your datasets are organized on disk?
phases = ['train', 'val', 'test'] if dataset in ['activitynet'] else ['train', 'test']
for phase in phases:
    # load annotations
    with open(os.path.join(data_dir, phase + '.json')) as j:
        annos = json.load(j)
    time_meters['load_annotations'].update(time.time() - tictoc)
    tictoc = time.time()
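In case it helps others, here is a hedged sketch for converting TALL's charades_sta_{train,test}.txt into the {phase}.json files the loader above expects. The TALL line format is "video_id start end##sentence"; the JSON schema below (video id mapped to timestamps and sentences, mirroring the ActivityNet Captions layout) is my assumption, not necessarily this repo's:

import json
from collections import defaultdict

def sta_txt_to_json(txt_path, json_path):
    annos = defaultdict(lambda: {'timestamps': [], 'sentences': []})
    with open(txt_path) as f:
        for line in f:
            header, sentence = line.strip().split('##')
            vid, start, end = header.split()
            annos[vid]['timestamps'].append([float(start), float(end)])
            annos[vid]['sentences'].append(sentence)
    with open(json_path, 'w') as j:
        json.dump(annos, j)

sta_txt_to_json('charades_sta_train.txt', 'train.json')
sta_txt_to_json('charades_sta_test.txt', 'test.json')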
Thanks for your good work. When extracting the CLIP features for the datasets, I found that the source code (preprocess/clip_encoder.py) imports a package named 'clip', which is not mentioned in Installation. Where can I install the 'clip' package from? Thanks!
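For reference, this is almost certainly OpenAI's CLIP package (https://github.com/openai/CLIP), which is installed from GitHub rather than PyPI: pip install git+https://github.com/openai/CLIP.git . A quick sanity check that the import works (clip_encoder.py may use a different checkpoint than ViT-B/32):

import torch
import clip

# Load the public ViT-B/32 checkpoint and encode one sample caption
model, preprocess = clip.load('ViT-B/32', device='cpu')
tokens = clip.tokenize(['a person is putting a book on a shelf'])
with torch.no_grad():
    text_features = model.encode_text(tokens)
print(text_features.shape)  # torch.Size([1, 512]) for ViT-B/32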
Hi, thank you for the interesting work.
I want to reproduce the results with C3D features.
Is the configuration the same as for CLIP?
Could you provide detailed config files for both Charades and ActivityNet with C3D features, please?
Hello, I'm very interested in your LVTR-CLIP model. The features extracted by CLIP contain only image information, whereas the features extracted by C3D contain both image and temporal information. So why does LVTR-CLIP outperform LVTR-C3D? Or, in other words, does the cross-modal encoder have the capacity to model temporal relations, both frame-to-frame and frame-to-text?
There seems to be a bug in lines 119-122 of test.py.
If the number of sentences > num_input_sentences, the length of the list "split_src_txt" will be larger than 1, while the length of "annotations" is always 1. Therefore, the for loop (lines 119 to 181) is executed only once, regardless of the size of "split_src_txt".
That is to say, it seems that some queries are never evaluated?
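To make the suspected issue concrete, here is a hedged sketch of one possible fix. The names follow the issue text; the run_model helper and the slicing of annotations[0] are my assumptions about the surrounding code, not the repo's actual implementation:

def evaluate_all_chunks(split_src_txt, annotations, num_input_sentences, run_model):
    # Buggy pattern: `for txt, anno in zip(split_src_txt, annotations)` stops
    # after one iteration because len(annotations) == 1. Loop over the text
    # chunks instead and slice the single annotation entry to match.
    results = []
    for i, txt_chunk in enumerate(split_src_txt):
        lo, hi = i * num_input_sentences, (i + 1) * num_input_sentences
        anno_chunk = annotations[0][lo:hi]
        results.extend(run_model(txt_chunk, anno_chunk))
    return results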