lukasruff / deep-svdd-pytorch

A PyTorch implementation of the Deep SVDD anomaly detection method

License: MIT License

Python 100.00%

anomaly-detection deep-learning one-class-learning pytorch python machine-learning deep-anomaly-detection icml-2018


deep-svdd-pytorch's Issues

set multiple classes to be normal

Is it possible to set multiple classes as normal classes?
If so, how can I do that?
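
One way to approach this, sketched under the assumption that the dataset code only needs a binary target (0 for normal, 1 for anomalous) and a list of training indices: replace the single normal class with a set of classes. The helper names `normal_indices` and `to_binary_targets` are hypothetical and are not part of this repository.

```python
from typing import Iterable, List


def normal_indices(labels: Iterable[int], normal_classes: Iterable[int]) -> List[int]:
    """Return the positions of all samples whose label is in the normal set."""
    normal = set(normal_classes)
    return [i for i, y in enumerate(labels) if y in normal]


def to_binary_targets(labels: Iterable[int], normal_classes: Iterable[int]) -> List[int]:
    """Map class labels to 0 (normal) or 1 (anomalous)."""
    normal = set(normal_classes)
    return [0 if y in normal else 1 for y in labels]


# Toy labels; the choice of (0, 1) as normal classes is purely illustrative.
labels = [0, 1, 2, 3, 4, 1, 0]
normal_classes = (0, 1)

train_idx = normal_indices(labels, normal_classes)    # samples to train on
targets = to_binary_targets(labels, normal_classes)   # targets for evaluation

print(train_idx)
print(targets)
```

If the preprocessing relies on per-class constants (such as the pre-computed min and max values discussed elsewhere on this page), those would presumably have to be recomputed over the union of the chosen classes.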

Performance of DeepSVDD in cifar10

Hi,

Thanks for sharing your source code! I cloned this repository and ran the cifar10 experiment with digit 6 as the known (normal) class, using the following command:

python main.py \
    cifar10 \
    cifar10_LeNet \
    ../log/cifar10_test \
    ../data \
    --objective one-class \
    --lr 0.0001 \
    --n_epochs 150 \
    --lr_milestone 50 \
    --batch_size 200 \
    --weight_decay 0.5e-6 \
    --pretrain True \
    --ae_lr 0.0001 \
    --ae_n_epochs 350 \
    --ae_lr_milestone 250 \
    --ae_batch_size 200 \
    --ae_weight_decay 0.5e-6 \
    --normal_class 7

However, the result is:

INFO:root:Training time: 183.387
INFO:root:Finished training.
INFO:root:Starting testing...
INFO:root:Testing time: 1.911
INFO:root:Test set AUC: 60.59%
INFO:root:Finished testing.

Could you please help me with this?

Hi
Thank you for sharing the code.
I tried to run the cifar10 example; however, I got the error shown below.
Is something going wrong when reading the cifar10 dataset?
(btw, I can run mnist example perfectly.)

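When training on my own dataset, the loss during autoencoder training is extremely large, although it is decreasing. What could be the cause?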

Training error - ValueError: 'default' must be a list when 'multiple' is true.

Traceback (most recent call last):
  File "d:\gitdownload\TS\Anomaly_detection\Deep-SVDD-PyTorch\src\main.py", line 56, in <module>
    def main(dataset_name, net_name, xp_path, data_path, load_config, load_model, objective, nu, device, seed,
  File "C:\Users\PC.conda\envs\TS\lib\site-packages\click\decorators.py", line 373, in decorator
    _param_memo(f, cls(param_decls, **attrs))
  File "C:\Users\PC.conda\envs\TS\lib\site-packages\click\core.py", line 2536, in __init__
    super().__init__(param_decls, type=type, multiple=multiple, **attrs)
  File "C:\Users\PC.conda\envs\TS\lib\site-packages\click\core.py", line 2151, in __init__
    raise ValueError(
ValueError: 'default' must be a list when 'multiple' is true.

Could you please help with this?

About loss function

In the paper "Deep One-Class Classification", the loss function is given as

However, in the code the loss is calculated as

The two differ: the loss in this code lacks the term on the network weights. Is this term important, or is it handled somewhere else?
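
One possible explanation, offered as an assumption rather than a confirmed description of this repository: a weight regularizer of the form (lambda/2) * ||W||^2 does not have to appear in the loss, because an optimizer's coupled weight-decay option adds lambda * W to the gradient, which is exactly the gradient of that term. The `--weight_decay` flag in the command quoted earlier on this page is consistent with this reading. The sketch below checks the equivalence for one plain gradient-descent step on a toy scalar model; the functions `data_loss`, `data_grad`, and `penalized_loss` are illustrative names only.

```python
def data_loss(w: float, x: float = 2.0, c: float = 1.0) -> float:
    """Toy squared distance between the 'network output' w * x and a center c."""
    return (w * x - c) ** 2


def data_grad(w: float, x: float = 2.0, c: float = 1.0) -> float:
    """Analytic gradient of data_loss with respect to w."""
    return 2.0 * (w * x - c) * x


def penalized_loss(w: float, lam: float) -> float:
    """Data loss plus an explicit (lam / 2) * w**2 weight penalty."""
    return data_loss(w) + 0.5 * lam * w ** 2


def numeric_grad(f, w: float, h: float = 1e-5) -> float:
    """Central-difference gradient, used so that nothing is assumed about f."""
    return (f(w + h) - f(w - h)) / (2.0 * h)


w, lr, lam = 1.5, 0.01, 0.1

# Step A: differentiate the loss that contains the explicit penalty.
step_explicit = w - lr * numeric_grad(lambda v: penalized_loss(v, lam), w)

# Step B: differentiate only the data loss and let "weight decay" add lam * w.
step_weight_decay = w - lr * (data_grad(w) + lam * w)

print(abs(step_explicit - step_weight_decay) < 1e-8)
```

This equivalence holds for optimizers that add the decay term to the gradient; optimizers with decoupled weight decay behave differently.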

3D data & architectures

Hi,

I would like to extend the framework to work with 3D (medical) images (e.g. size 128x128x128). Since choice of the autoencoder's architecture is important, I was wondering if you would be able to give some brief insight or intuition on how to approach it.

Thank you!

How to map scores to predicted labels to compute F1 score with one-class objective

Hi @lukasruff ,

Could you please explain how I can map the predicted scores to predicted labels in order to compute precision, recall, and F1 score? See line 147 of src/optim/deepSVDD_trainer.py:
_, labels, scores = zip(*idx_label_score)
For the soft-boundary objective, I found in the paper that positive values are considered outliers, while negative values (< 0) are treated as normal (inliers). Does this apply to the one-class objective as well?
Thanks,
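
A minimal sketch of one way to turn scores into labels, assuming that higher scores mean "more anomalous" and that no natural zero threshold exists when the score is a plain distance. Here the threshold is taken as a quantile of the scores of normal training samples; that choice, the quantile level, and all helper names are assumptions for illustration, not something this repository prescribes.

```python
import math
from typing import List, Sequence, Tuple


def nearest_rank_quantile(values: Sequence[float], q: float) -> float:
    """Smallest value such that at least a fraction q of the data is <= it."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered) - 1e-9))  # small tolerance for float error
    return ordered[k - 1]


def predict_labels(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a sample 1 (anomalous) if its score exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 with the anomalous class (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy numbers, made up for illustration only.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
test_scores = [0.2, 1.5, 0.95, 0.4, 0.85]
y_true = [0, 1, 1, 0, 1]

threshold = nearest_rank_quantile(train_scores, 0.9)
y_pred = predict_labels(test_scores, threshold)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(threshold, y_pred, precision, recall, f1)
```

AUC does not depend on a threshold, whereas precision, recall, and F1 do, so the threshold choice should be reported alongside those metrics.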

Center of volume

Don't we need to update the value of c (center of volume) during training?

Wrong get_radius function

Deep-SVDD-PyTorch/src/optim/deepSVDD_trainer.py

Lines 179 to 181 in 1919546

def get_radius(dist: torch.Tensor, nu: float):
    """Optimally solve for radius R via the (1-nu)-quantile of distances."""
    return np.quantile(dist.clone().data.cpu().numpy(), 1-nu)

Hi, I think line 181 is wrong. It could be changed to

np.quantile(np.sqrt(dist.clone().data.cpu().numpy()), 1-nu).

This is because dist is the squared distance (feat - center)**2, and dist is compared to R^2 in

Deep-SVDD-PyTorch/src/optim/deepSVDD_trainer.py

Line 131 in 1919546

scores = dist - self.R ** 2
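
The point can be checked on toy numbers. The sketch below assumes, as the issue states, that `dist` holds squared distances and that scores are computed as `dist - R ** 2`. It uses a nearest-rank quantile (a simplification; `np.quantile` interpolates linearly by default) so that taking the square root and taking the quantile commute exactly. The helper name and the toy values are illustrative.

```python
import math
from typing import Sequence


def nearest_rank_quantile(values: Sequence[float], q: float) -> float:
    """Smallest value such that at least a fraction q of the data is <= it."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered) - 1e-9))  # small tolerance for float error
    return ordered[k - 1]


sq_dist = [0.25, 1.0, 4.0, 9.0, 16.0]  # toy squared distances to the center
nu = 0.2

# Quantile of the squared distances: this value is really R**2, not R.
r_from_squared = nearest_rank_quantile(sq_dist, 1 - nu)

# Quantile of the distances themselves, as proposed in the issue.
r_from_sqrt = nearest_rank_quantile([math.sqrt(d) for d in sq_dist], 1 - nu)

scores_squared = [d - r_from_squared ** 2 for d in sq_dist]
scores_sqrt = [d - r_from_sqrt ** 2 for d in sq_dist]

outliers_squared = sum(1 for s in scores_squared if s > 0)
outliers_sqrt = sum(1 for s in scores_sqrt if s > 0)
print(r_from_squared, r_from_sqrt, outliers_squared, outliers_sqrt)
```

With the square root applied, the fraction of points with a positive score matches nu on this toy data; without it, the radius is too large and no point is flagged.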

How can I get `min_max` value list in mnist.py?

In your code, there is a constant, min_max, in mnist.py:

# Pre-computed min and max values (after applying GCN) from train data per class
min_max = [(-0.8826567065619495, 9.001545489292527),
           (-0.6661464580883915, 20.108062262467364),
           (-0.7820454743183202, 11.665100841080346),
           (-0.7645772083211267, 12.895051191467457),
           (-0.7253923114302238, 12.683235701611533),
           (-0.7698501867861425, 13.103278415430502),
           (-0.778418217980696, 10.457837397569108),
           (-0.7129780970522351, 12.057777597673047),
           (-0.8280402650205075, 10.581538445782988),
           (-0.7369959242164307, 10.697039838804978)]

I've tried to get this number by myself using a global_contrast_normalization function, but I couldn't get it. Here is what I've tried:

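import numpy as np
import torchvision.datasets as dsets
# global_contrast_normalization is assumed to be imported from this repository's preprocessing code

train_set = dsets.MNIST(root='data/', train=True, download=True)
test_set = dsets.MNIST(root='data/', train=False, download=True)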

train_data = train_set.train_data.float()
train_label = train_set.train_labels.numpy()


# 1. Normalize whole data
data = train_data
label = train_label

digit = 0
given_index = np.where(label==digit)[0] 

data = global_contrast_normalization(data, scale='l1')
print(data[given_index].max())

# 2. Normalize label by label
data = train_data
label = train_label

digit = 0
given_index = np.where(label==digit)[0] 

data = global_contrast_normalization(data[given_index], scale='l1')
print(data.max())

But the values differ from the min_max values.

Did I miss something? Could you let me know how I can get those numbers?
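
One possible explanation, which is an assumption and not confirmed anywhere on this page: global contrast normalization is usually applied to each image separately, whereas both attempts above pass a whole stack of images at once, so the mean and scale are computed over the entire stack. The sketch below shows the per-image variant on toy data and how the per-class minimum and maximum would then be collected. The functions `gcn_l1` and `class_min_max` are hypothetical re-implementations in plain Python, and this sketch is not guaranteed to reproduce the constants in mnist.py.

```python
from typing import List, Sequence, Tuple


def gcn_l1(pixels: Sequence[float]) -> List[float]:
    """Subtract the image mean, then divide by the mean absolute deviation."""
    mean = sum(pixels) / len(pixels)
    centered = [p - mean for p in pixels]
    scale = sum(abs(c) for c in centered) / len(centered)
    return [c / scale for c in centered]


def class_min_max(images: Sequence[Sequence[float]], labels: Sequence[int], digit: int) -> Tuple[float, float]:
    """Min and max over all pixels of one class, normalizing each image on its own."""
    lo, hi = float("inf"), float("-inf")
    for image, label in zip(images, labels):
        if label != digit:
            continue
        normalized = gcn_l1(image)
        lo = min(lo, min(normalized))
        hi = max(hi, max(normalized))
    return lo, hi


# Toy flattened "images"; values are made up for illustration.
images = [[0.0, 0.0, 0.0, 4.0], [0.0, 2.0, 2.0, 4.0], [1.0, 1.0, 1.0, 9.0]]
labels = [0, 0, 1]

per_image = class_min_max(images, labels, digit=0)

# For comparison: normalizing all class-0 pixels as one big tensor.
pooled = gcn_l1([p for image, label in zip(images, labels) if label == 0 for p in image])
print(per_image, (min(pooled), max(pooled)))
```

On this toy data the two approaches give different extremes, which is the kind of mismatch described above.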

Seed does not work for autoencoder (not reproducible results)

As you know, when the seed is set to any number other than -1, the results should not change between runs. However, when I set the seed to 10 and use pretraining, the results of both pretraining and training vary widely. It is worth mentioning that with pretraining set to False, the network is trained without any pretrained weights, and repeated training runs give the same result for a given seed value (other than -1, of course).
Does anybody know why the seed (used for reproducibility) does not work for the autoencoder?

why use conv2d rather than ConvTranspose2d in decoder?

I'm new to PyTorch, but I noticed that in the autoencoder you use conv2d in the decoder. Why is that?

Also, if I increase the number of epochs when training the autoencoder, the AUC of the autoencoder becomes larger than that of SVDD. Does this mean the autoencoder is better than SVDD?

L2 Normalization

Hi, I have a question regarding L2 normalization. Since we want anomalies to fall outside the hypersphere, we might perform L2 normalization before calculating the embedding distance. But I didn't find an L2 norm in the code. Did I misunderstand something?

Thank you in advance!

GCN divided by 0

In GCN preprocessing, if a tensor x has identical feature values (e.g., a pixel with RGB = [255, 255, 255]), then after mean removal the L1-norm scale is 0, which results in 0 / 0. How would you avoid this? Thanks.
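
A common guard for this case, sketched here as an assumption rather than as this repository's behavior, is to skip the division when the scale is (near) zero and return the mean-removed values, which are all zero for a constant input. The function name `gcn_l1_safe` and the `eps` threshold are hypothetical.

```python
from typing import List, Sequence


def gcn_l1_safe(values: Sequence[float], eps: float = 1e-8) -> List[float]:
    """L1-scaled global contrast normalization that tolerates constant inputs."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    scale = sum(abs(c) for c in centered) / len(centered)
    if scale < eps:
        # Constant input: every centered value is 0, so avoid computing 0 / 0.
        return [0.0 for _ in centered]
    return [c / scale for c in centered]


print(gcn_l1_safe([255.0, 255.0, 255.0]))
print(gcn_l1_safe([0.0, 0.0, 0.0, 4.0]))
```

An alternative with a similar effect is to add a small constant to the scale before dividing, at the cost of slightly changing the output for all inputs.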

Data

Why is the "/log/" folder empty?

It requires files such as "log.txt", "result.json", and "model.tar".
How can I get these files to fill the "/log/" folder?
There is no related content in "Readme.md", so I'm really confused.
Thanks.

python3: can't open file '/content/drive/My Drive/Deep-SVDD-pytorch/src/ main.py cifar10 cifar10_LeNet ../log/cifar10_test ../data --objective one-class --lr 0.0001 --n_epochs 150 --lr_milestone 50 --batch_size 200 --weight_decay 0.5e-6 --pretrain True --ae_lr 0.0001 --ae_n_epochs 350 --ae_lr_milestone 250 --ae_batch_size 200 --ae_weight_decay 0.5e-6 --normal_class 3;': [Errno 2] No such file or directory

I am working in Google Colab. After installing all the requirements, it shows this error for both mnist and cifar. Kindly help.

about results(Average AUCs in % with StdDevs (over 10 seeds) )

Hi, I am confused by "Average AUCs in % with StdDevs (over 10 seeds)".

Do you mean that the results are averaged over 10 seeds?
If so, what should the range of seeds be (1-10, or 10-20)? The results vary widely with different seed settings.
Can you help me with this question? Thank you.

Two Questions

First, how can I get the min-max values for a new dataset? I cannot find the code.
Second, there is a bug in this code: no matter how I change the normal_class number, it always chooses class 0 as the normal class.

load pretrained model

If you load a pretrained model and weights for the autoencoder and set pretrain = True, deepSVDD.pretrain will build a new autoencoder without those weights.

Pre-computed min-max values

I think there's a problem with precomputing min-max values for separate classes. The problem is that this technique will not be suitable for a new and "unknown" sample.

from datasets/cifar10.py

    # Pre-computed min and max values (after applying GCN) from train data per class

    min_max = [(-28.94083453598571, 13.802961825439636),
               (-6.681770233365245, 9.158067708230273),
               (-34.924463588638204, 14.419298165027628),
               (-10.599172931391799, 11.093187820377565),
               (-11.945022995801637, 10.628045447867583),
               (-9.691969487694928, 8.948326776180823),
               (-9.174940012342555, 13.847014686472365),
               (-6.876682005899029, 12.282371383343161),
               (-15.603507135507172, 15.2464923804279),
               (-6.132882973622672, 8.046098172351265)]

Or am I missing something?

Why do not update batchnorm mean and var during training?

Hello, Thanks for your source codes.

Could I ask why the batchnorm mean and var are not updated during training?
As I understand it, affine=False means the batch normalization parameters are not updated.

Thanks
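
For context, in PyTorch's batch normalization layers `affine=False` only removes the learnable scale and shift; whether running mean and variance are tracked is controlled separately by `track_running_stats`, which defaults to True. The plain-Python sketch below mimics that behavior for a single feature to show that running statistics can still be updated without affine parameters. The class `BatchNormNoAffine` is a simplified illustration, not PyTorch's implementation, and it assumes batches of at least two values.

```python
import math
from typing import List, Sequence


class BatchNormNoAffine:
    """Single-feature batch norm without learnable scale/shift (affine=False)."""

    def __init__(self, momentum: float = 0.1, eps: float = 1e-5) -> None:
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0

    def forward(self, batch: Sequence[float], training: bool = True) -> List[float]:
        if training:
            n = len(batch)
            mean = sum(batch) / n
            var = sum((x - mean) ** 2 for x in batch) / n  # biased, used to normalize
            unbiased_var = var * n / (n - 1)               # unbiased, used for tracking
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * unbiased_var
        else:
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


bn = BatchNormNoAffine()
out = bn.forward([1.0, 2.0, 3.0, 4.0], training=True)
print(bn.running_mean, bn.running_var)
```

There is no scale or shift anywhere in the class, yet the running statistics change after one training-mode forward pass.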

PyTorch transforms.Normalize() usage

According to PyTorch docs, the Normalize transform expects the mean and std for every channel.

CLASS torchvision.transforms.Normalize(mean, std, inplace=False)

But currently, this implementation of Deep SVDD passes the "min" value in place of "mean" and the "max - min" value in place of std. Moreover, it repeats a single value for all channels, even in the case of CIFAR-10.

from datasets/cifar10.py line 35

transforms.Normalize([min_max[normal_class][0]] * 3,
                     [min_max[normal_class][1] - min_max[normal_class][0]] * 3)])

Is this intentional or a real issue?
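
Since `Normalize` computes `(x - mean) / std` per channel, passing min as the mean and max - min as the std amounts to min-max scaling into [0, 1]. The sketch below checks this arithmetic in plain Python, using the first (min, max) pair from the cifar10.py list quoted earlier on this page. Whether this is the authors' intent is an assumption; the `normalize` function is only a stand-in for the transform's formula.

```python
def normalize(x: float, mean: float, std: float) -> float:
    """The per-channel formula applied by a Normalize-style transform."""
    return (x - mean) / std


# First (min, max) pair from the cifar10.py list quoted on this page.
lo, hi = -28.94083453598571, 13.802961825439636

# "[value] * 3" repeats the same scalar for each of the three channels.
means = [lo] * 3
stds = [hi - lo] * 3

at_min = normalize(lo, means[0], stds[0])
at_max = normalize(hi, means[0], stds[0])
at_mid = normalize((lo + hi) / 2.0, means[0], stds[0])
print(at_min, at_max, at_mid)
```

Values that fall outside the pre-computed range, for example from unseen test samples, would map outside [0, 1] under this scaling.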
