tianyu0207 / rtfm

Official code for 'Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning' [ICCV 2021]

Python 100.00%

pytorch deep-learning anomaly-detection video-anomaly-detection

rtfm's Issues

Why is the splitting of train dataset hardcoded?

While dividing the train set into two classes, why is it hardcoded that the first 63 entries are one class and the remaining entries are the other? Aren't we supposed to split it using the labels file provided through option.py?

RTFM/main.py

Lines 19 to 24 in 950243a

    train_nloader = DataLoader(Dataset(args, test_mode=False, is_normal=True),
                               batch_size=args.batch_size, shuffle=True,
                               num_workers=0, pin_memory=False, drop_last=True)
    train_aloader = DataLoader(Dataset(args, test_mode=False, is_normal=False),
                               batch_size=args.batch_size, shuffle=True,
                               num_workers=0, pin_memory=False, drop_last=True)

RTFM/dataset.py

Lines 23 to 34 in 950243a

    def _parse_list(self):
        self.list = list(open(self.rgb_list_file))
        if self.test_mode is False:
            if self.is_normal:
                self.list = self.list[63:]
                print('normal list')
                print(self.list)
            else:
                self.list = self.list[:63]
                print('abnormal list')
                print(self.list)
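
The question above suggests deriving the split from labels instead of a fixed index. A minimal sketch of that idea follows, assuming a hypothetical `is_abnormal` lookup (for example, one backed by a labels file). Neither `split_train_list` nor the toy labels are part of the repository.

```python
from typing import Callable, List, Tuple


def split_train_list(entries: List[str],
                     is_abnormal: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Split feature-file paths into (normal, abnormal) using a label lookup."""
    normal = [e for e in entries if not is_abnormal(e)]
    abnormal = [e for e in entries if is_abnormal(e)]
    return normal, abnormal


# Hypothetical labels: 1 = abnormal video, 0 = normal video.
labels = {"a_i3d.npy": 1, "b_i3d.npy": 0, "c_i3d.npy": 1}
normal, abnormal = split_train_list(list(labels), lambda p: labels[p] == 1)
print(normal)    # ['b_i3d.npy']
print(abnormal)  # ['a_i3d.npy', 'c_i3d.npy']
```

With a split like this, the order of entries in the list file no longer matters, so the same code would work for datasets with a different number of abnormal videos.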

Visualizer function

Hello @tianyu0207,
In the main.py file there is a line like this: viz = Visualizer(env='shanghai tech 10 crop', use_incoming_socket=False)
I found the Visualizer function in the utils.py file, but I still don't understand where the parameter 'shanghai tech 10 crop' comes from.
Also, am I wrong in assuming the step variable has a value of 5000?
Thanks for your help!

code for ucfcrime ground truth

Hello,
It seems that you only provide code to make gt for shanghaitech dataset. Could you share the code to make gt for UCF crime dataset? That would be very helpful.

Preprocessing Ucf-crime results are too different

RuntimeError Question

Thanks for your great sharing!
When I run your code, I run into this problem. It seems that the number of channels of the image does not match. Can you please help me? :)

Detail for temporal number of i3d feature

Hi, first of all, congratulations on the successful acceptance of your paper!
After I downloaded the features you released and checked them, I found that the temporal length of the features differs from that of the features released with previous work.
For example, for a video with 17 frames, the previous work would ignore the last group of frames (fewer than 16 frames) and obtain 1 feature, while your feature data has 2 features. Can you give some details on the temporal operation in the feature extractor?

I also looked at the code you wrote in make_gt.py. The temporal length of the gt is based on the number of features, which makes your gt longer than that of the original dataset.

Use pre-trained model

Hi @tianyu0207,
If I want to use ShanghaiTech's pre-trained model I just need to put the path to shanghai_best_ckpt.pkl at --pretrained-ckpt argument in option.py file, right?
Thank you so much!

Getting low auc while training on UCF-Crime Data using I3D features(10-crop)

We have used the following repo for extracting 10-crop I3D features.
https://github.com/Tushar-N/pytorch-resnet3d

But during training we see no improvement in the AUC after 400 iterations. I have attached the graph shown in visdom. Is there anything that we are doing wrong? And when will you publish the I3D features for the UCF-Crime dataset?

Training and testing in ShanghaiTech Campus dataset

Hi tianyu!
Thank you for your great work. I want to know why, in your method, the split of the training and test sets is not the same as the original. In addition, when I ran the test function, I found that starting from video 43, gt is always False?

Feature Extraction Setup

Hello,
Thank you for your excellent work!

Can you specify exactly how you do the tenCrop on the ShanghaiTech dataset?

thanks!

The exact changes to make to train on UCF-Crime

Hello! First, congratulations on the excellent paper.

What are the exact changes needed to train your model on UCF-Crime with your code?

From what I noted in the paper and in the various issues:

assign args.batch_size = 32 (when we concatenate we get a batch size of 64; moreover, when I set it to 64 (128) the results do not improve)
weight_decay = 0.0005
and in dataset.py:

    if self.is_normal:
        self.list = self.list[810:]
    else:
        self.list = self.list[:810]

Is that all?

Because I do not achieve the same performance even after these changes.
I get a maximum of :

auc: 0.75
pr_auc: 0.18588291392503292

Experiment on the XD-Violence dataset

Hi @tianyu0207,
I loaded the I3D RGB features of the XD-Violence dataset, but they use 5 crops instead of the 10 crops you mentioned.
So do I need to change 10 to 5 in the two lines below in model.py?
normal_features = features[0:self.batch_size*10]
abnormal_features = features[self.batch_size*10:]
And in your experiments, do you use 5 crops or 10 crops?

Thank you in advance!
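
The two slices quoted above hard-code the crop count. A small sketch of making it a parameter follows. It assumes the batch is laid out as all normal items followed by all abnormal items, as the quoted slices imply; `split_normal_abnormal` and `n_crops` are hypothetical names, not part of the repository.

```python
def split_normal_abnormal(features, batch_size, n_crops):
    """Split a stacked batch into its normal and abnormal halves.

    Assumes `features` holds batch_size * n_crops normal items first,
    followed by the abnormal items, as the quoted slices imply.
    """
    boundary = batch_size * n_crops
    return features[:boundary], features[boundary:]


# Toy batch: 2 videos per class, 5 crops each -> 10 normal + 10 abnormal rows.
rows = ["n"] * 10 + ["a"] * 10
normal_rows, abnormal_rows = split_normal_abnormal(rows, batch_size=2, n_crops=5)
print(len(normal_rows), len(abnormal_rows))  # 10 10
```

The function only relies on slicing, so it should behave the same on a tensor whose first axis is laid out this way.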

Visualize the results

Thanks for your great work, @tianyu0207, I was able to implement your RTFM on my custom data.
Please can you describe how to visualize the results from .pkl model files like in your paper?

Thank you so much!

Error in test_10crop.py while training with C3D features

I'm training on the UCF Crime dataset based on the C3D features. While running main.py, initially test.py is called and the size of the dataloader.dataset argument that is passed is (266, 32, 4096) and this is resulting in error. I think this is because the code is iterating on the dataloader and each item doesn't have 4 dimensions to permute.

Traceback (most recent call last):
  File "main.py", line 45, in <module>
    auc = test(test_loader, model, args, viz, device)
  File "/home/yggdrasil/WorkingDirectory/FinalYearProject/RTFM/test_10crop.py", line 13, in test
    input = input.permute(0,2,1,3)
RuntimeError: number of dims don't match in permute

BTW all my features are of dimension 1*32*4096 per video. And the batch size argument passed in option.py is 32. What do I need to do to solve this?
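
The error above is consistent with `permute(0,2,1,3)` receiving a 3-dimensional input that has no crop axis. The sketch below works on shapes only and shows one possible fix: inserting a crop axis of size 1 before the permute. It assumes one video per test batch, and `add_crop_axis` and `permute` are illustrative helpers, not repository code. On a real tensor the insertion would correspond to `input.unsqueeze(2)`, and any downstream code that assumes 10 crops would likely need adjusting as well.

```python
def add_crop_axis(shape):
    """Return a (batch, time, crops, feat) shape, inserting crops=1 if missing."""
    if len(shape) == 4:
        return tuple(shape)
    if len(shape) == 3:
        batch, time, feat = shape
        return (batch, time, 1, feat)
    raise ValueError(f"expected 3 or 4 dims, got {len(shape)}")


def permute(shape, order):
    """Shape that results from permuting the axes in the given order."""
    return tuple(shape[i] for i in order)


c3d_batch = (1, 32, 4096)             # one video, 32 segments, no crop axis
fixed = add_crop_axis(c3d_batch)      # (1, 32, 1, 4096)
print(permute(fixed, (0, 2, 1, 3)))   # (1, 1, 32, 4096)
```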

Ground truth of the XD-Violence dataset

First of all, I apologize for asking you so much, thanks for your help, @tianyu0207!

I think the ground truth of the XD-Violence dataset I used, from Not only Look but also Listen: Learning Multimodal Violence Detection under Weak Supervision, is not suitable, since the AP result I got on the XD-Violence dataset is only 0.15.
Can you provide the ground truth of the XD-Violence dataset that you used in your paper?

Thank you very much!

Ground truth of XD-Violence

The AP result I got on the XD-Violence dataset was only 0.15.
I changed a few places when implementing RTFM on XD-Violence:

Change rec_auc to ap = average_precision_score(list(gt), pred, pos_label=1) in test_10crop.py
The entries from the first index to the 2046th index are normal and the rest are abnormal, so I edited dataset.py like this:

    def _parse_list(self):
        self.list = list(open(self.rgb_list_file))
        if self.test_mode is False:
            if self.is_normal:
                self.list = self.list[:2046]
            else:
                self.list = self.list[2046:]

And I changed model.py since XD-Violence uses 5 crops:

normal_features = features[0:self.batch_size*5]
normal_scores = scores[0:self.batch_size]

abnormal_features = features[self.batch_size*5:]
abnormal_scores = scores[self.batch_size:]

I think maybe my XD-Violence's ground truth isn't right?
Where do you think I am going wrong and could you please give me the ground truth of XD-Violence?

Thank you so much @tianyu0207

Can you show the structure of MTN?

Process followed for generating the i3d features

Can you kindly explain the process you followed for generating the i3d features of the shanghai tech dataset so that we can follow the same for other datasets and videos as well?

how to extract i3d features

I see your features' shape is N * 10 * 2048, but the mix_5c output shape is 1 * 1024 when the input is 16 * 224 * 224.

After obtaining the final temporal feature representation X

Thanks for viewing my issue, @tianyu0207
I have 4 questions that I hope you can explain:

1. After obtaining X, the snippets have been divided into 2 groups, normal and abnormal, right?
2. In the Select Top-k snippets stage, do you select k snippets from both the normal and the abnormal groups, or will each group select k snippets?
3. Assuming k = 3, in case a video has fewer than 3 abnormal or normal snippets, how will RTFM choose?
4. When the input is a normal video, how will the RTFM-enabled Snippet Classifier Learning stage classify it?

Clip number wrong

The 04_008.avi video in the ShanghaiTech dataset has 1700 frames, so the clip number should be 1700/16 = 106 or 107, but the 04_008_i3d.npy you provide has dimension (103,10,2048). Can you explain the reason?
The same problem also appears in the UCF dataset.
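
The arithmetic in this report can be checked with a short sketch. `clip_counts` is a hypothetical helper that returns the clip count under two conventions, dropping or keeping the last partial clip; the 16-frame clip length is taken from the issue text.

```python
import math


def clip_counts(num_frames, clip_len=16):
    """Return (count dropping the last partial clip, count keeping it)."""
    return num_frames // clip_len, math.ceil(num_frames / clip_len)


print(clip_counts(1700))  # (106, 107)
print(103 * 16)           # 1648 frames covered by 103 clips
```

Neither convention yields 103, so the 52 frames that are unaccounted for would have to come from something else in the extraction pipeline, which the issue does not describe.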

About epoch and AUC

Hello, thank you for your excellent work!
1. May I ask why the paper says the number of epochs is 50, but in the code it is 15000?
2. When I reproduced the results on the ShanghaiTech dataset, I found that the AUC was not very high. May I ask whether you used a learning rate decay mechanism, or anything else that was not uploaded with the code?

Is the dataset split at 63 specific to shanghai dataset?

Hi,

First of all, thank you for making the code available to everyone. I noticed that there is a hard-coded value in dataset.py at line 27. Is this specific to the ShanghaiTech dataset? Does it need to be changed for the UCF-Crime dataset?

RTFM/dataset.py

Line 27 in ea75ea0

self.list = self.list[63:]

Thanks

How is the final loss function used to find the [0,1] T dimensional vector?

I have a question about your paper. In Section 3.4 the loss function is written as

Here the loss function uses the output of f, which is a vector of length T with values in [0,1], and y, which is a single video-level label. I am having a hard time understanding how this log loss works if one is of length T and the other is a single value.
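
One common way to reconcile T snippet scores with a single video-level label in multiple-instance setups is to pool the snippet scores into one video score before applying the log loss. The sketch below pools by averaging the k highest scores. That pooling rule and the names `video_score` and `bce` are assumptions for illustration; the issue text does not confirm how the paper or the code does it.

```python
import math


def video_score(snippet_scores, k=3):
    """Mean of the k largest snippet scores (an assumed pooling rule)."""
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)


def bce(score, label, eps=1e-7):
    """Binary cross-entropy between one video score and its 0/1 label."""
    score = min(max(score, eps), 1.0 - eps)  # keep log() finite
    return -(label * math.log(score) + (1 - label) * math.log(1.0 - score))


scores = [0.1, 0.9, 0.8, 0.2, 0.7]   # T = 5 snippet scores in [0, 1]
s = video_score(scores, k=3)         # mean of 0.9, 0.8, 0.7
print(round(s, 3))                   # 0.8
print(bce(s, 1) < bce(s, 0))         # True
```

After pooling, both arguments of the loss are scalars, so the length mismatch described in the question no longer arises.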

Regarding I3D Network

While extracting the features using an I3D network with resnet50 backbone, which pretrained model did you use and did you finetune it with the dataset before extracting features?

I3D code version

Hello, thank you for your excellent work. Could you please share the code version for extracting the I3D features?

How to 10 crops augmentation

Hi @tianyu0207,

I have extracted 3D features on our dataset; each feature file has shape (k, 2048), where k equals the number of frames of the video divided by 16.

Can you show me how to do the 10-crop augmentation per video? From there I will combine the 10 feature files to create an array of shape (10, k, 2048).

Thank you very much!
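
For readers with the same question, the usual ten-crop pattern takes five crops of each frame (four corners and the centre) plus the same five from the horizontally mirrored frame. The crops are applied to the frames before feature extraction, so each crop produces its own (k, 2048) feature array. The sketch below only computes the crop boxes; the 340x256 frame size and the 224 crop size are assumed values, and `ten_crop_boxes` is a hypothetical helper. torchvision's `transforms.TenCrop` implements the same pattern on images.

```python
def ten_crop_boxes(width, height, size):
    """Return 10 (box, flipped) pairs, with box = (left, top, right, bottom).

    Five boxes (four corners and the centre) are used on the frame as-is
    and the same five on its horizontal mirror (flipped=True).
    """
    if size > width or size > height:
        raise ValueError("crop size exceeds frame size")
    cx, cy = (width - size) // 2, (height - size) // 2
    origins = [(0, 0), (width - size, 0), (0, height - size),
               (width - size, height - size), (cx, cy)]
    boxes = [(x, y, x + size, y + size) for x, y in origins]
    return [(b, False) for b in boxes] + [(b, True) for b in boxes]


crops = ten_crop_boxes(width=340, height=256, size=224)
print(len(crops))    # 10
print(crops[4][0])   # (58, 16, 282, 240)
```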

Unable to download I3D train and test files

The I3D files available in this repo are hosted on Office 365, but every time my download is aborted or the files come out corrupted. I tried downloading a few individual files separately, but the problems occurred again.

Is there another source for the I3D files?

Any plan to release code...Timelines

I3D Features for UCF-Crime dataset

Do you have any plans for uploading I3D features for the UCF-Crime dataset?

Regarding to µ

Hi @tianyu0207,
As far as I understand, µ represents the number of abnormal snippets in an abnormal video, so how can we determine all the values of µ in the training set from which to choose k appropriately?
Thank you!

can you share the pretrained model?

Thanks for your great work!
Can you share the pretrained RTFM model for the UCF-Crime dataset?

Mismatch of numbers of labels and extracted features given in link

Thanks for your work! I use the features you extracted but encountered an error.
The size of the label array is 1109680, but the total number of snippets in your features is 69634, not 1109680/16=69355. I am confused about the mismatch, which leads to an error during testing.
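
The numbers in this report can be reproduced with a few lines. `gt_length_from_snippets` is a hypothetical helper that computes how many frame-level labels the features would need if every snippet stood for 16 frames.

```python
def gt_length_from_snippets(total_snippets, clip_len=16):
    """Frame-level label count implied by the number of feature snippets."""
    return total_snippets * clip_len


label_len = 1109680
total_snippets = 69634
print(label_len // 16)                                       # 69355
print(gt_length_from_snippets(total_snippets))               # 1114144
print(gt_length_from_snippets(total_snippets) - label_len)   # 4464
```

The gap is 279 snippets, or 4464 frames. One possible explanation, which the issue does not confirm, is that the feature extractor and the label file treat the trailing frames of each video differently.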

Will you share the code of extracting I3D feature?

Will you share the code of extracting I3D feature? Thanks!

Upload some pretrained features

Hi,
Could you release some features, including ShanghaiTech and UCF-Crime, in C3D and I3D without ten-crop? I used to work on this and realized that the mean/std and the data operations (resize/crop) are important for the final feature values. So if you can release a few example features, I can check the feature extraction operation. Thanks.

Why do you take abs() and square in L_{s}?

Hello!

In Equation 6, as shown below, you calculate L_{s} by:

However, in train.py, you calculate loss_abn and loss_nor by doing:

loss_abn = torch.abs(self.margin - torch.norm(torch.mean(feat_a, dim=1), p=2, dim=1))
loss_nor = torch.norm(torch.mean(feat_n, dim=1), p=2, dim=1)

loss_um = torch.mean((loss_abn + loss_nor) ** 2)

Why did you use torch.abs and square (loss_abn + loss_nor)? This doesn't match the equation. Maybe it is some trick to help convergence?

Regards!
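
To make the effect of the quoted lines concrete, here is a scalar sketch of the same arithmetic for a single abnormal/normal pair. `loss_um_single` is a hypothetical helper, the two norms are passed in directly, and the margin value of 100 is an assumption for illustration.

```python
def loss_um_single(norm_abn, norm_nor, margin=100.0):
    """Scalar version of the quoted computation for one abnormal/normal pair."""
    loss_abn = abs(margin - norm_abn)   # distance of the abnormal norm from the margin
    loss_nor = norm_nor                 # the normal norm itself
    return (loss_abn + loss_nor) ** 2


print(loss_um_single(90.0, 5.0))    # 225.0
print(loss_um_single(110.0, 5.0))   # 225.0
```

Because of the absolute value, an abnormal norm 10 above the margin is penalised exactly as much as one 10 below it, which is the behaviour the question is asking about.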

train request

Dear all, thanks for your excellent work. There is a problem when I run the main.py file. It needs a file called shanghai-i3d-test-10crop-figure.list. I would like to know how I can get this file. Thank you.

About Feature extraction

Hello,

I want to extract C3D features from our dataset. Could you please tell me how you extracted the features?
In particular, I am confused about the 10 crops. For example, if a video has 160 frames, the features should be 10x4096 when using 16 frames per clip. But your result is 10x10x4096. Could you please explain the 10 crops part?

Segmentation fault during ONNX exportation

When I use torch.onnx.export and set opset_version=11,
it causes a Segmentation fault (core dumped).

How to process and train the original video data?

Thank you very much for your work. The code starts directly from the processed features. This step has a significant impact on performance. Can you tell me how to process and train the original video data to get the segment features you share? Can you share the code?

About SOTA on XD-Violence

Hi guys, thanks for your work. I'm wondering whether you claim SOTA on XD-Violence. I see from your results that you achieved an AP of 77.81 with I3D&RGB, and you said Wu's Not only Look, but also Listen has an AP of 75.41. However, Wu's paper states they achieved 78.64 AP, and 75.41 does not appear anywhere in their paper. How did you get 75.41?

How to extract the I3D-10crop feature?

Hello, Yu Tian.
I have 2 questions that I hope you can answer:

1. Can you tell me why the I3D 10-crop features of XD-Violence are not provided in this repo?
2. Can you please show me the I3D 10-crop feature extraction code? I extracted features with only 2 dimensions, not 3 dimensions like yours.
Thank you!

expected a 'cuda' device type for generator but found 'cpu'

Hello @tianyu0207,
I am using your RTFM method to evaluate my dataset, but I got an error like this:
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
Is it because my PyTorch version differs from yours?
Can you tell me what the problem is?
Thank you so much!

How is the input dimension of the I3D feature vector generated?

I have a question regarding the required input dimension discussed in the paper. It is said that the input is a T*2048 feature vector for a given video, and that T is set to 32 in the implementation details.

Does this mean that, for any given video, we need to divide it into 32 parts (regardless of the number of frames) and compute a 1*2048 vector for each part?
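
One way to read the question is as pooling a variable number of snippet features into a fixed number of segments. The sketch below does this by averaging the snippets that fall into each of 32 evenly spaced bins. It is an illustration under that assumption, not a confirmed description of the repository's preprocessing, and `pool_to_segments` is a hypothetical helper. The toy features use 4 dimensions instead of 2048.

```python
def pool_to_segments(snippets, num_segments=32):
    """Average variable-length snippet features into a fixed number of segments."""
    n = len(snippets)
    dim = len(snippets[0])
    # Evenly spaced integer boundaries over the n snippets.
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    pooled = []
    for i in range(num_segments):
        start, end = bounds[i], bounds[i + 1]
        # An empty bin (short video) reuses the snippet at its start index.
        chunk = snippets[start:end] if end > start else [snippets[min(start, n - 1)]]
        pooled.append([sum(row[d] for row in chunk) / len(chunk) for d in range(dim)])
    return pooled


snippets = [[float(i)] * 4 for i in range(100)]   # 100 snippets, 4-dim toy features
segments = pool_to_segments(snippets)
print(len(segments), len(segments[0]))            # 32 4
```

Videos with fewer than 32 snippets still produce 32 segments, because empty bins repeat an existing snippet.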

ValueError: cannot reshape array of size 2559968 into shape (176,10,2048)

When I train on UCF-Crime I get this error, is it due to a damaged file? I had a hard time downloading your files by the way.

Thank you :-)

Traceback (most recent call last):

  File "/home/quadro_6000/Téléchargements/RTFM-main/main.py", line 59, in <module>
    train(loadern_iter, loadera_iter, model, args.batch_size, optimizer, viz, device)

  File "/home/quadro_6000/Téléchargements/RTFM-main/train.py", line 109, in train
    ninput, nlabel = next(nloader)

  File "/home/quadro_6000/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()

  File "/home/quadro_6000/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration

  File "/home/quadro_6000/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]

  File "/home/quadro_6000/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]

  File "/home/quadro_6000/Téléchargements/RTFM-main/dataset.py", line 39, in __getitem__
    features = np.load(self.list[index].strip('\n'), allow_pickle=True)

  File "/home/quadro_6000/anaconda3/lib/python3.8/site-packages/numpy/lib/npyio.py", line 440, in load
    return format.read_array(fid, allow_pickle=allow_pickle,

  File "/home/quadro_6000/anaconda3/lib/python3.8/site-packages/numpy/lib/format.py", line 783, in read_array
    array.shape = shape

ValueError: cannot reshape array of size 2559968 into shape (176,10,2048)
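
The error message itself gives a hint. The shape recorded for the array implies more values than were actually read, which is consistent with a truncated or partially downloaded file, although the issue does not confirm the cause. A quick check of the arithmetic, with `expected_elements` as a hypothetical helper:

```python
import math


def expected_elements(shape):
    """Number of values an array of the given shape must contain."""
    return math.prod(shape)


shape = (176, 10, 2048)   # shape reported in the error message
found = 2559968           # number of values actually read
print(expected_elements(shape))           # 3604480
print(found < expected_elements(shape))   # True
```

If the cause is indeed an incomplete download, fetching the affected file again would be the first thing to try.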

Request for the data preprocessing code used before I3D

Thank you for your excellent work!

I saw that other people have met the same data preprocessing problem with I3D feature extraction.

Can you share the data preprocessing code used before I3D?

Can you specify exactly how you do the tenCrop on an image?

thanks!

Can you share the checkpoint for shanghaitech dataset?

I tested your model on the ShanghaiTech dataset several times, but the highest AUC is always 96.3x, which is still a big gap from the 97.21 given in the paper.
Could you please share the checkpoint for the ShanghaiTech dataset? Thanks a lot.

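Effect of the number of crops on the recognition rate

I have the following questions:
1. My reproduced ShanghaiTech result is only about 96%, without modifying any code, and I don't know why.
2. The number of crops has a very large effect on the recognition rate. When I randomly pick a single crop out of the 10 crops in the test set and test on it, the recognition rate drops to about 88%.
3. When comparing recognition rates with other papers, RTFM is trained and tested with 10 crops, but did the other papers also use 10 crops? If not, the comparison is unfair.
4. The inference time on an Nvidia 2080Ti is 0.76 s. Is this the total time for 10 crops or the time for a single crop?
5. Because RTFM uses a multi-scale technique, for a video with N 16-frame clips the best approach is to feed all N clips into the network together; as the test code shows, the fewer the clips, the lower the recognition rate. If the N clips of a video are tested individually, the recognition rate is also very low, dropping to about 80%. So for real-time monitoring, 32 clips would have to be fed in each time to reach an acceptable recognition rate, but this would severely hurt real-time performance and might even make the method unusable.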

Unable to reach the AUC results shown in the paper

Hello! First, congratulations on the excellent paper. I'm trying to train the network from scratch on the UCF-Crime dataset, starting from the I3D features available in this repo (I did not extract the I3D features myself). However, with the same hyperparameters as the paper (0.001 initial LR, 0.0005 weight decay, batch size 64, 50 epochs and the same dataset division) I'm unable to reach an AUC of 0.84, even after 150 epochs. The highest AUC achieved was near 0.63 (curiously, at the end of training).

Please look at the charts below.

I'm plotting the test AUC every 5 epochs, and the training loss every epoch. Notice that at the 5th epoch the test AUC reaches its highest value (near 0.66), then the measure suddenly drops, while the training loss keeps decreasing throughout training. In my opinion, these charts suggest overfitting (error on the test set increasing and error on the training set decreasing), maybe due to a high learning rate, but I trained again with LR 0.0001 (one order of magnitude lower than in the previous training) and the behavior was similar.

Maybe the hard-coded dataset division in dataset.py, self.list = self.list[63:] and self.list = self.list[:63], was set for ShanghaiTech and not for UCF?

Can you kindly suggest some modifications?

Error while creating a ground truth file

I am using the file list/make_gt.py to generate the ground-truth file for my UCF-Crime dataset. Instead of calculating the number of frames from the features file as you did, I'm using cv2 to open the video and count the frames. But after some iterations, at a video called Arson/Arson011_x264.mp4, the code stops here.

RTFM/list/make_gt.py

Line 123 in 950243a

if count != num_frame:

Only the videos that enter this if branch (2 annotations) stop; the others are fine.

RTFM/list/make_gt.py

Line 50 in 950243a

if len(annots_idx[0][0]) == 2:

How to resolve this?
