tianyu0207 / rtfm
Official code for 'Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning' [ICCV 2021]
Hello @tianyu0207,
In the main.py file there is a command like this: viz = Visualizer(env='shanghai tech 10 crop', use_incoming_socket=False)
I found the Visualizer function in the utils.py file but I still don't understand where the parameter 'shanghai tech 10 crop' comes from?
Also, am I wrong in assuming that the step variable has a value of 5000?
Thanks for your help!
Hello,
It seems that you only provide code to make gt for shanghaitech dataset. Could you share the code to make gt for UCF crime dataset? That would be very helpful.
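A minimal sketch of how a frame-level ground-truth vector could be built for one video. It assumes each anomaly is annotated as a (start, end) frame pair, that -1 marks an unused annotation slot, and that the gt length follows the number of feature snippets times 16. The name `build_frame_gt` and the inclusive end frame are assumptions for illustration, not the authors' script.

```python
def build_frame_gt(n_snippets, intervals, clip_len=16):
    """Return a 0/1 list with one entry per frame covered by the features."""
    n_frames = n_snippets * clip_len
    gt = [0] * n_frames
    for start, end in intervals:
        if start < 0 or end < 0:
            # Assumed convention: -1 marks an unused annotation slot.
            continue
        last = min(end, n_frames - 1)  # clamp to the frames we actually have
        for frame in range(start, last + 1):  # end frame treated as inclusive
            gt[frame] = 1
    return gt


gt = build_frame_gt(n_snippets=3, intervals=[(10, 20), (-1, -1)])
print(len(gt), sum(gt))
```

Concatenating such vectors in the order of the test list would give one long label array, which then has to match the length of the frame-level predictions.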
Hi, first of all, congratulations on the successful acceptance of your paper!
After downloading and checking the features you released, I found that their temporal length differs from that of the features released by previous work.
For example, for a video with 17 frames, the previous work would ignore the last group of frames (fewer than 16 frames) and obtain 1 feature, while your feature data has 2 features. Can you give some details about the temporal operation in the feature extractor?
I also looked at the code you wrote in make_gt.py: the temporal length of the gt is based on the number of features, which makes your gt longer than that of the original dataset.
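The difference described above comes down to how the last incomplete group of frames is handled. A small sketch of the two conventions follows; the function names are hypothetical, and which convention the released features follow is not confirmed in this thread.

```python
import math


def n_clips_drop_last(n_frames, clip_len=16):
    """Ignore a trailing group with fewer than clip_len frames."""
    return n_frames // clip_len


def n_clips_keep_last(n_frames, clip_len=16):
    """Count a trailing partial group as one more clip."""
    return math.ceil(n_frames / clip_len)


print(n_clips_drop_last(17), n_clips_keep_last(17))
print(n_clips_drop_last(1700), n_clips_keep_last(1700))
```

For 17 frames the two conventions give 1 and 2 features, which matches the example in the question.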
Hi @tianyu0207,
If I want to use ShanghaiTech's pre-trained model I just need to put the path to shanghai_best_ckpt.pkl at --pretrained-ckpt argument in option.py file, right?
Thank you so much!
We have used the following repo for extracting 10-crop I3D features.
https://github.com/Tushar-N/pytorch-resnet3d
But during training we see no improvement in the AUC after 400 iterations. I have attached the graph shown in visdom; is there anything that we are doing wrong? And when will you publish the I3D features for the UCF-Crime dataset?
Hi tianyu!
Thank you for your excellent work. I would like to know why, in your method, the split of the training and test sets is not the same as the original one. In addition, when I ran the test function, I found that starting from video 43, gt is always False?
Hello:
Thank you for your excellent work!
Can you specify exactly how you do the tenCrop on the ShanghaiTech dataset?
thanks!
Hello! First, congratulations on the excellent paper.
What are the exact changes needed to train your model on UCF-Crime with your code?
From what I noted in the paper and in the various issues:
assign args.batch_size = 32 (When we concatenate we get a batch size of 64. Moreover when I set to 64 (128) the results do not increase)
weight_decay = 0.0005
and in the dataset.py:
if self.is_normal:
    self.list = self.list[810:]
else:
    self.list = self.list[:810]
Is that all?
Because I do not achieve the same performance even after these changes.
I get a maximum of :
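A hard-coded index such as 810 only works if the list file has exactly that many abnormal entries first. As a hedged alternative, the split can be derived from the paths themselves. The sketch below assumes normal videos can be recognised by a substring in their path; the marker "Normal" and the example paths are hypothetical and would need to match the actual list file.

```python
def split_by_label(paths, is_normal, normal_marker="Normal"):
    """Keep only normal or only abnormal entries, based on the path text."""
    return [p for p in paths if (normal_marker in p) == is_normal]


# Hypothetical list entries, for illustration only.
paths = [
    "Abuse/Abuse001_i3d.npy",
    "Normal/Normal_Videos003_i3d.npy",
    "Arson/Arson002_i3d.npy",
]
print(split_by_label(paths, is_normal=True))
print(split_by_label(paths, is_normal=False))
```

This does not depend on the order of the list, so it also guards against a split index that was written for a different dataset.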
Hi @tianyu0207,
I loaded the i3d RGB feature of the XD-Violence dataset, but it uses 5 crops instead of 10 crops like you said.
So do I need to change 10 to 5 in the two lines below in model.py?
normal_features = features[0:self.batch_size*10]
abnormal_features = features[self.batch_size*10:]
And in your experiments, do you use 5 crops or 10 crops?
Thank you in advance!
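One way to avoid editing the literal 10 in several places is to pass the number of crops as a parameter. This is a sketch under the assumption that the features hold one row per (video, crop) and the scores one row per video, with the normal half of the batch first, as in the two lines quoted above. The function name and the `ncrops` argument are hypothetical.

```python
def split_normal_abnormal(features, scores, batch_size, ncrops):
    """Split a concatenated [normal; abnormal] batch into its two halves."""
    normal_features = features[: batch_size * ncrops]
    abnormal_features = features[batch_size * ncrops:]
    normal_scores = scores[:batch_size]
    abnormal_scores = scores[batch_size:]
    return normal_features, abnormal_features, normal_scores, abnormal_scores


# Toy data: 2 normal + 2 abnormal videos, 5 crops each.
features = list(range(2 * 2 * 5))
scores = list(range(2 * 2))
nf, af, ns, as_ = split_normal_abnormal(features, scores, batch_size=2, ncrops=5)
print(len(nf), len(af), ns, as_)
```

The slicing works the same way on lists, NumPy arrays, and tensors, since it only indexes the first dimension.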
Thanks for your great work, @tianyu0207, I was able to implement your RTFM on my custom data.
Please can you describe how to visualize the results from .pkl model files like in your paper?
Thank you so much!
I'm training on the UCF Crime dataset based on the C3D features. While running main.py, test.py is called first, and the size of the dataloader.dataset argument that is passed is (266, 32, 4096), which results in an error. I think this is because the code iterates over the dataloader and each item doesn't have 4 dimensions to permute.
Traceback (most recent call last):
  File "main.py", line 45, in <module>
    auc = test(test_loader, model, args, viz, device)
  File "/home/yggdrasil/WorkingDirectory/FinalYearProject/RTFM/test_10crop.py", line 13, in test
    input = input.permute(0,2,1,3)
RuntimeError: number of dims don't match in permute
BTW all my features are of dimension 1*32*4096 per video, and the batch size argument passed in option.py is 32. What do I need to do to solve this?
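The permute in the traceback expects four dimensions (batch, time, crops, feature), so features without a crop dimension fail. A possible workaround is to add a singleton crop axis when loading. The sketch below assumes each video is stored as a (T, D) array, which batches to the 1*32*4096 shape reported above; `add_crop_axis` is a hypothetical helper, and any other code that hard-codes 10 crops would still need adjusting.

```python
import numpy as np


def add_crop_axis(features):
    """Turn (T, D) features into (T, 1, D) so a crop dimension exists."""
    features = np.asarray(features)
    if features.ndim != 2:
        raise ValueError(f"expected (T, D) features, got shape {features.shape}")
    return features[:, np.newaxis, :]


single_video = np.zeros((32, 4096), dtype=np.float32)
with_crops = add_crop_axis(single_video)
batch = with_crops[np.newaxis]  # mimic a DataLoader batch of size 1
permuted = np.transpose(batch, (0, 2, 1, 3))  # same axis order as permute(0,2,1,3)
print(with_crops.shape, batch.shape, permuted.shape)
```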
First of all, I apologize for asking you so much, thanks for your help, @tianyu0207!
I think the ground truth of the XD-Violence dataset that I took from "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" may not be suitable, because the AP I got on the XD-Violence dataset is only 0.15.
Can you provide the ground truth of the XD-Violence dataset that you used in your paper?
Thank you very much!
The AP result of the Violence dataset I got was only 0.15.
I changed a few places when implementing RTFM on XD-Violence:
Changed rec_auc to ap = average_precision_score(list(gt), pred, pos_label=1) in test_10crop.py.
def _parse_list(self):
    self.list = list(open(self.rgb_list_file))
    if self.test_mode is False:
        if self.is_normal:
            self.list = self.list[:2046]
        else:
            self.list = self.list[2046:]
normal_features = features[0:self.batch_size*5]
normal_scores = scores[0:self.batch_size]
abnormal_features = features[self.batch_size*5:]
abnormal_scores = scores[self.batch_size:]
I think maybe my XD-Violence's ground truth isn't right?
Where do you think I am going wrong and could you please give me the ground truth of XD-Violence?
Thank you so much @tianyu0207
Can you kindly explain the process you followed for generating the i3d features of the shanghai tech dataset so that we can follow the same for other datasets and videos as well?
I see your features' shape is N * 10 * 2048, but the mix_5c output shape is 1 * 1024 when the input is 16 * 224 * 224.
Thanks for viewing my issue, @tianyu0207
I have 4 questions that I hope you can explain:
The 04_008.avi video in the ShanghaiTech dataset has 1700 frames, so the number of clips should be 1700/16 = 106 or 107, but the 04_008_i3d.npy file you provide has dimension (103,10,2048). Can you explain the reason?
The same problem also appears in the UCF dataset.
Hello, thank you for your excellent work!
1. May I ask why the article says the number of epochs is 50, but in the code it is 15000?
2. When I reproduced the results on the ShanghaiTech dataset, I found that the AUC was not very high. May I ask whether you used a learning rate decay mechanism, or whether there is anything else you forgot to upload with the code?
Hi,
First of all, thank you for making the code available to everyone. I noticed that there is a hard-coded value in dataset.py at line 27. Is this specific to the ShanghaiTech dataset? Does it need to be changed for the UCF-Crime dataset?
Line 27 in ea75ea0
Thanks
I have a doubt in your paper. In the section 3.4 the loss function is written as
Here the loss function uses the output of f, which is a vector of length T ([0,1,1,....T times]), and y, which is a single video-level label. I am having a hard time understanding how this log loss works if one is of length T and the other is a single value.
While extracting the features using an I3D network with resnet50 backbone, which pretrained model did you use and did you finetune it with the dataset before extracting features?
Hello, thank you for your excellent work. Could you please share the code version on extracting the I3D features?
Hi @tianyu0207,
I have extracted a 3D feature on our dataset, the feature file has a shape (k, 2048) where k equals the number of frames of the video divided by 16.
Can you show me how to do 10-crop augmentation per video? From there I will combine the 10 feature files to create an array of shape (10, k, 2048).
Thank you very much!
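Once the ten per-crop feature files exist, combining them is a single stacking step. The sketch below assumes each crop yields an array of shape (k, 2048); `stack_crops` is a hypothetical helper. Note that other posts in this thread report the released features as N * 10 * 2048, so stacking on axis 1 gives (k, 10, 2048) rather than (10, k, 2048).

```python
import numpy as np


def stack_crops(per_crop_features):
    """Stack a list of ncrops arrays of shape (k, D) into (k, ncrops, D)."""
    shapes = {f.shape for f in per_crop_features}
    if len(shapes) != 1:
        raise ValueError(f"crops disagree on shape: {shapes}")
    return np.stack(per_crop_features, axis=1)


# Toy stand-ins for ten per-crop feature files of one video with k = 7 clips.
crops = [np.full((7, 2048), i, dtype=np.float32) for i in range(10)]
stacked = stack_crops(crops)
print(stacked.shape)
```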
The I3D files available in this repo are hosted on Office 365, but every time my download is aborted or the files arrive corrupted. I tried downloading a few individual files at a time, but the problems occurred again.
Is there another source for the I3D files?
Any plan to release code...Timelines
Do you have any plans for uploading I3D features for the UCF-Crime dataset?
Hi @tianyu0207,
As far as I understand, µ represents the number of abnormal snippets in an abnormal video, so how can we determine all the values of µ in the training set from which to choose k appropriately?
Thank you!
thanks for your great work!
can you share the pretrained RTFM model of UCF-crime dataset?
Thanks for your work! I used the features you extracted but encountered an error.
The size of the label array is 1109680, but the total number of snippets in your features is 69634, not 1109680/16=69355. I am confused about this mismatch, which leads to an error during testing.
Will you share the code of extracting I3D feature? Thanks!
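The size of the mismatch reported above can be checked with a few lines of arithmetic. This sketch only uses the two numbers from the post and assumes 16 frames per snippet; the function name is hypothetical.

```python
def check_gt_length(n_gt_frames, n_snippets, clip_len=16):
    """Compare the frame-level label length with what the features imply."""
    expected_frames = n_snippets * clip_len
    return expected_frames, expected_frames - n_gt_frames


expected, extra = check_gt_length(n_gt_frames=1109680, n_snippets=69634)
print(expected, extra, extra // 16)
```

With these numbers the features imply 1114144 frames, which is 4464 frames (279 snippets) more than the label array holds. That would be consistent with a trailing partial clip being kept for many videos, although the thread does not confirm the cause.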
Hi
Could you release some features for ShanghaiTech and UCF-Crime in C3D and I3D without ten-crop? I used to work on this and realized that the mean/std and the data operations (resize/crop) are important for the final feature values. If you can release a few example features, I can check my feature extraction against them. Thanks.
Hello!
In Equation 6, as shown below, you calculate L_{s} by:
However, in train.py, you calculate loss_abn and loss_nor as follows:
loss_abn = torch.abs(self.margin - torch.norm(torch.mean(feat_a, dim=1), p=2, dim=1))
loss_nor = torch.norm(torch.mean(feat_n, dim=1), p=2, dim=1)
loss_um = torch.mean((loss_abn + loss_nor) ** 2)
Why did you use torch.abs and square (loss_abn + loss_nor)? This doesn't match the equation. Is it a trick to help convergence?
Regards!
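To make the three quoted lines easier to follow, here is a plain-Python restatement of what they compute. It is a sketch that mirrors only the code quoted in the question, not Equation 6 itself. It assumes feat_a and feat_n are indexed as (video, snippet, feature), so that the mean over dim=1 averages the selected snippets and the norm over dim=1 then runs over the feature dimension.

```python
import math


def l2_of_mean(snippets):
    """snippets: k feature vectors of one video -> L2 norm of their mean."""
    k = len(snippets)
    dim = len(snippets[0])
    mean = [sum(s[d] for s in snippets) / k for d in range(dim)]
    return math.sqrt(sum(m * m for m in mean))


def loss_um(feat_a, feat_n, margin):
    """feat_a, feat_n: lists of videos, each a list of k snippet vectors."""
    terms = []
    for abn, nor in zip(feat_a, feat_n):
        loss_abn = abs(margin - l2_of_mean(abn))  # torch.abs(margin - norm)
        loss_nor = l2_of_mean(nor)                # norm of the normal video
        terms.append((loss_abn + loss_nor) ** 2)  # squared sum per pair
    return sum(terms) / len(terms)                # torch.mean over the batch


feat_a = [[[3.0, 4.0], [3.0, 4.0]]]
feat_n = [[[0.0, 0.0], [6.0, 8.0]]]
print(loss_um(feat_a, feat_n, margin=100.0))
```

One observable difference from a hinge of the form max(0, margin - norm) is that the absolute value also penalises abnormal magnitudes that exceed the margin.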
Dear all, thanks for your excellent work. There is a problem when I ran the main.py file. It needs a file called shanghai-i3d-test-10crop-figure.list. I would like to know how can I get this file? Thank you.
Hello,
I want to extract C3D features from our dataset. Could you please tell me how you extracted the features?
In particular, I am confused by the 10 crops. For example, if a video has 160 frames, the features should be 10x4096 when using 16 frames per clip, but your result is 10x10x4096. Could you please explain the 10 crops part?
When I use torch.onnx.export with opset_version=11, it causes a segmentation fault (core dumped).
Thank you very much for your work. The code starts directly from the processed features, and this step has a significant impact on performance. Can you tell me how to process the original video data to get the segment features you share? Can you share the code?
Hi guys, thanks for your work. I'm wondering whether you claim SOTA on XD-Violence. From your results, you achieved an AP of 77.81 with I3D RGB, and you report an AP of 75.41 for Wu's "Not only Look, but also Listen". However, Wu's paper states that they achieved 78.64 AP, and 75.41 does not appear anywhere in their paper. How did you get 75.41?
Hello, Yu Tian.
I have 2 questions that I hope you can answer:
Hello @tianyu0207,
I am using your RTFM method to evaluate my dataset, but I got an error like this:
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
Is it because my PyTorch version differs from yours?
Can you tell me what the problem is?
Thank you so much!
I have a question about the input dimension discussed in the paper. It says that the input is a T*2048 feature vector for a given video, and that T is set to 32 in the implementation details.
Does this mean that any given video must be divided into 32 parts (regardless of the number of frames), with a 1*2048 vector computed for each part?
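One common way to get a fixed number of segments from a variable number of per-clip features is uniform temporal pooling: split the clip sequence into T nearly equal chunks and average each chunk. The sketch below illustrates that idea in plain Python; it is an assumption about how such a reduction can be done, and this thread does not confirm that the released code does exactly this. The name `to_fixed_segments` is hypothetical.

```python
def to_fixed_segments(features, n_segments=32):
    """Average variable-length per-clip features into n_segments vectors."""
    n = len(features)
    dim = len(features[0])
    # Integer chunk boundaries spread evenly over the clip sequence.
    bounds = [(i * n) // n_segments for i in range(n_segments + 1)]
    out = []
    for i in range(n_segments):
        lo, hi = bounds[i], bounds[i + 1]
        # With fewer clips than segments a chunk can be empty: reuse one clip.
        chunk = features[lo:hi] if hi > lo else [features[lo]]
        out.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return out


long_video = [[float(i)] for i in range(64)]   # 64 clips, 1-dim toy features
short_video = [[float(i)] for i in range(4)]   # only 4 clips
print(len(to_fixed_segments(long_video)), to_fixed_segments(long_video)[0])
print(to_fixed_segments(short_video, n_segments=8))
```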
When I train on UCF-Crime I get this error, is it due to a damaged file? I had a hard time downloading your files by the way.
Thank you :-)
Traceback (most recent call last):
File "/home/quadro_6000/Téléchargements/RTFM-main/main.py", line 59, in <module>
train(loadern_iter, loadera_iter, model, args.batch_size, optimizer, viz, device)
File "/home/quadro_6000/Téléchargements/RTFM-main/train.py", line 109, in train
ninput, nlabel = next(nloader)
File "/home/quadro_6000/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/quadro_6000/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/quadro_6000/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/quadro_6000/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/quadro_6000/Téléchargements/RTFM-main/dataset.py", line 39, in __getitem__
features = np.load(self.list[index].strip('\n'), allow_pickle=True)
File "/home/quadro_6000/anaconda3/lib/python3.8/site-packages/numpy/lib/npyio.py", line 440, in load
return format.read_array(fid, allow_pickle=allow_pickle,
File "/home/quadro_6000/anaconda3/lib/python3.8/site-packages/numpy/lib/format.py", line 783, in read_array
array.shape = shape
ValueError: cannot reshape array of size 2559968 into shape (176,10,2048)
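The shape (176,10,2048) needs 176 * 10 * 2048 = 3604480 values, while the file holds only 2559968, which points to an incomplete download. A hedged way to find every such file before training is to try loading each path from the list file. The helper name `find_unreadable` is hypothetical.

```python
import numpy as np


def find_unreadable(paths):
    """Return (path, error message) for every feature file np.load cannot read."""
    bad = []
    for path in paths:
        try:
            np.load(path.strip("\n"), allow_pickle=True)
        except Exception as exc:  # truncated, corrupted, or missing files
            bad.append((path, str(exc)))
    return bad


# Example with a path that does not exist, so it is reported as unreadable.
print(find_unreadable(["/nonexistent/definitely_missing_i3d.npy"]))
```

In practice the paths would come from the same list file that dataset.py reads, and the reported files would need to be downloaded again.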
thank you for your excellent work!
I saw that other people ran into the same problem with data preprocessing for I3D feature extraction.
Can you share the data preprocessing code used before I3D?
Can you specify exactly how you do the tenCrop on an image?
thanks!
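In general, a ten-crop takes the four corner crops and the center crop of a frame, plus the horizontally flipped version of each; this is the convention of torchvision's TenCrop transform. The sketch below only computes the crop boxes, under that convention. The 256x340 frame size and 224 crop size are assumed values for illustration, and the thread does not state the exact preprocessing used for the released features.

```python
def five_crop_boxes(height, width, crop):
    """Return (top, left, bottom, right) for 4 corner crops and the center crop."""
    if crop > height or crop > width:
        raise ValueError("crop is larger than the frame")
    top_c = (height - crop) // 2
    left_c = (width - crop) // 2
    return [
        (0, 0, crop, crop),                            # top-left
        (0, width - crop, crop, width),                # top-right
        (height - crop, 0, height, crop),              # bottom-left
        (height - crop, width - crop, height, width),  # bottom-right
        (top_c, left_c, top_c + crop, left_c + crop),  # center
    ]


def ten_crop_views(height, width, crop):
    """Pair each of the five boxes with flip=False, then with flip=True."""
    boxes = five_crop_boxes(height, width, crop)
    return [(box, False) for box in boxes] + [(box, True) for box in boxes]


views = ten_crop_views(height=256, width=340, crop=224)
print(len(views), views[4])
```

Each of the ten views would be passed through the feature extractor separately, which is why a video with k clips ends up with k x 10 feature vectors.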
I tested your model on the ShanghaiTech dataset several times, but the highest AUC is always 96.3x, which is still well below the 97.21 given in the paper.
Could you please share the checkpoint for shanghaitech dataset? thanks a lot
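I have the following questions for you:
1. My reproduced ShanghaiTech result is only about 96%, without modifying any code, and I don't know why.
2. The number of crops has a very large effect on the recognition rate. When I randomly picked one of the 10 crops in the test set for testing, the recognition rate dropped to about 88%.
3. When comparing recognition rates with other papers, RTFM was trained and tested with 10 crops, but did the other papers also use 10 crops? If not, the comparison is unfair.
4. The inference time on an Nvidia 2080Ti is 0.76s. Is this the total time for 10 crops or the time for a single crop?
5. Because RTFM uses a multi-scale technique, for a video with N 16-frame clips the best approach is to feed all N clips into the network together, as the test code does; the fewer the clips, the lower the recognition rate. If the N clips of a video are tested individually, the recognition rate is also very low, dropping to about 80%. So for real-time monitoring, 32 clips would have to be fed in each time to reach an acceptable recognition rate, but this severely degrades real-time performance and may even make the method unusable.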
Hello! First, congratulations on the excellent paper. I'm trying to train the network from scratch on the UCF-Crime dataset, starting with the I3D features available in this repo (I did not extract the I3D features myself). However, with the same hyperparameters as the paper (0.001 initial LR, 0.0005 weight decay, batch size 64, 50 epochs, and the same dataset division), I'm unable to reach an AUC of 0.84, even after 150 epochs. The highest AUC achieved was near 0.63 (curiously, at the end of training).
Please look at the charts below.
I'm plotting the test AUC every 5 epochs, and the training loss every epoch. Notice that at the 5th epoch the test AUC reaches its highest value (near 0.66) and then suddenly drops, while the training loss keeps decreasing throughout training. In my opinion, these charts suggest overfitting (test error increasing while training error decreases), >maybe< due to a high learning rate, but I trained again with LR 0.0001 (one order of magnitude lower than in the previous run) and the behavior was similar.
Maybe the hard-coded dataset division in dataset.py, self.list = self.list[63:] and self.list = self.list[:63], was set for ShanghaiTech and not for UCF?
Can you kindly suggest some modifications?
I am using the file list/make_gt.py to generate the ground-truth file for my UCF Crime dataset. Instead of calculating the number of frames from the feature files as you did, I'm using cv2 to open each video and count its frames. But after some iterations, at a video called Arson/Arson011_x264.mp4, the code stops here.
Line 123 in 950243a
Line 50 in 950243a