
ganf's People

Contributors

jiechenjiechen


ganf's Issues

How to use the baseline training code

Hello,
Thanks for sharing the code! I found the Python code for the baselines in the models folder, but I am not sure how to use it. Could you please give some instructions on how to use it to reproduce the baseline results when you have a moment?

Thank you in advance! My email is [email protected].

Confusion about the code

Hello, thank you for making the code available; it's very nice work. When I run train_water.py, I am confused by line 66 of GANF.py. In the paper, log p(x) in Equation (10) is a sum, but in the code it becomes a mean. I tried changing mean() to sum(), and the best roc_test was 0.7875, consistent with the 79.6±0.9 reported in Table 1. Without the change, the result is better: roc_test reaches 0.79 or 0.80. I can't understand why this change was introduced. Thank you!


I think log_prob = log_prob.sum(dim=1) is more reasonable.

import torch.nn as nn
# GNN, MAF, and RealNVP are defined in the GANF repository's model modules.

class GANF(nn.Module):

    def __init__(self, n_blocks, input_size, hidden_size, n_hidden, dropout=0.1, model="MAF", batch_norm=True):
        super(GANF, self).__init__()

        self.rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True, dropout=dropout)
        self.gcn = GNN(input_size=hidden_size, hidden_size=hidden_size)
        if model == "MAF":
            self.nf = MAF(n_blocks, input_size, hidden_size, n_hidden, cond_label_size=hidden_size, batch_norm=batch_norm, activation='tanh')
        else:
            self.nf = RealNVP(n_blocks, input_size, hidden_size, n_hidden, cond_label_size=hidden_size, batch_norm=batch_norm)

    def forward(self, x, A):

        return self.test(x, A).mean()

    def test(self, x, A):
        # x: N x K x L x D
        full_shape = x.shape

        # reshape: N*K, L, D
        x = x.reshape((x.shape[0]*x.shape[1], x.shape[2], x.shape[3]))
        h, _ = self.rnn(x)

        # reshape: N, K, L, H
        h = h.reshape((full_shape[0], full_shape[1], h.shape[1], h.shape[2]))

        h = self.gcn(h, A)

        # reshape: N*K*L, H
        h = h.reshape((-1, h.shape[3]))
        x = x.reshape((-1, full_shape[3]))

        log_prob = self.nf.log_prob(x, h).reshape([full_shape[0], -1])  # *full_shape[1]*full_shape[2]
        log_prob = log_prob.mean(dim=1)

        return log_prob
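One observation that may explain part of the discrepancy: for a fixed window shape, mean() is just sum() divided by the constant K*L, so the two reductions rank test windows identically; they differ only in the scale of the training loss (and hence the gradients). A small sketch with synthetic numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
log_prob = rng.normal(size=(5, 12))          # 5 windows, K*L = 12 terms each
per_window_sum = log_prob.sum(axis=1)        # Equation (10) as written
per_window_mean = log_prob.mean(axis=1)      # what the code computes

# mean = sum / (K*L): a fixed positive rescaling, so the two scores
# rank windows identically when every window has the same K*L
assert np.allclose(per_window_mean * 12, per_window_sum)
assert (np.argsort(per_window_sum) == np.argsort(per_window_mean)).all()
```

The different gradient scale during training can still lead to a different learned model, which would account for the differing roc_test values.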

The dataset in train_water.py

Hi, in train_water.py GANF is trained on SWaT_Dataset_Attack_v0.csv: when the script runs, SWaT_Dataset_Attack_v0.csv is split into train/val/test dataloaders. I can't understand why the model is trained on SWaT_Dataset_Attack_v0.csv. It seems more reasonable to train on SWaT_Dataset_Normal_v1.csv, which contains no attacks, and to test on SWaT_Dataset_Attack_v0.csv. That training scheme would make the attacked points more likely to fall in regions of low probability density. Thank you very much!
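The split being proposed can be sketched as follows; the toy DataFrames are hypothetical stand-ins for the two SWaT CSVs, and the 80/20 split ratio is an assumption for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy frames standing in for the real CSVs
normal = pd.DataFrame({"FIT101": rng.normal(size=100)})         # SWaT_Dataset_Normal_v1.csv (no attacks)
attack = pd.DataFrame({"FIT101": rng.normal(size=50),
                       "label": [0] * 40 + [1] * 10})           # SWaT_Dataset_Attack_v0.csv

# Fit the density model on normal data only; hold out the attacked trace
n = len(normal)
train = normal.iloc[: int(0.8 * n)]   # 80 rows for training
val = normal.iloc[int(0.8 * n):]      # 20 rows for validation
test = attack                         # all attacked data reserved for evaluation
```

Under this scheme the flow never sees attack points during training, so attacks should land in low-density regions at test time.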

Univariate time series

Hello, I would like to ask if the code has an effect on univariate time series, and I would appreciate it if you could answer me.

DeepSVDD get 84% AUROC on SWaT, better than GANF

I'm following your great work. However, when I run the DeepSVDD implementation provided in the code, I get 84% AUROC on SWaT, better than GANF. It seems that DeepSVDD may be overfitting the dataset. How can I resolve this? My settings are as follows. Could you provide the DeepSVDD settings so that I can continue following your work? Thank you very much.
epochs = 40
input_feature = 51
hidden_size = 64
If possible, could you send your training code for the baseline models to my email [email protected]?
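For reference, the one-class DeepSVDD score being tuned here can be sketched generically as the squared distance of each embedding to a fixed center (this is a textbook formulation, not the repo's exact code; the embeddings and center are synthetic):

```python
import numpy as np

def deepsvdd_score(features, center):
    """Anomaly score: squared Euclidean distance of each embedding to the center."""
    return ((features - center) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 64))   # hidden_size = 64, as in the settings above
center = embeddings.mean(axis=0)        # center typically fixed from an initial pass
scores = deepsvdd_score(embeddings, center)
```

Training minimizes the mean of these distances over normal data, so a too-flexible encoder can collapse embeddings toward the center and distort the reported AUROC.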

Baselines Training Code

Hi,

Great work from the authors, and thank you for making the code available. I was wondering whether the training code for the baselines could also be shared. I am working on a variant problem for which I would like to try one of the baselines, DeepSVDD or DeepSAD. Since the codebase provides a nice framework to build on, I would highly appreciate it if the baseline training code could be made available.
My email id is [email protected] for further communication.

Thanks a lot.

How to run DeepSAD and DeepSVDD

I am trying to run the DeepSAD and DeepSVDD baselines that you provided; however, I don't know what the parameters delta_t and sigma of the test function refer to, or what values to pass in. Could you clarify this? I hope you can reply soon!

Bug in load_water

There is a bug in the load_water(..) function:

import pandas as pd

root = 'data/SWaT_Dataset_Attack_v0.csv'
data = pd.read_csv(root)
data = data.rename(columns={"Normal/Attack": "label"})
data.label[data.label != "Normal"] = 1
data.label[data.label == "Normal"] = 0
ts_format = pd.to_datetime(data["Timestamp"], format="%d/%m/%Y %I:%M:%S %p")
ts_no_format = pd.to_datetime(data["Timestamp"])

In the code block above, ts_format and ts_no_format should be identical. However, since ts_no_format is parsed without the format string, pandas treats the string 2/1/2016 7:00:00 AM as Feb 1st 2016 instead of the TRUE date, Jan 2nd 2016.

The format specified in the format argument matches the format of the string timestamps; this can easily be verified by checking any timestamp with a day greater than 12.

I'm not sure how much this bug affects performance, but it would be nice if the authors could fix it.
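The discrepancy is easy to reproduce on a single timestamp (pandas defaults to month-first parsing when no format is given):

```python
import pandas as pd

ts = pd.Series(["2/1/2016 7:00:00 AM"])
with_fmt = pd.to_datetime(ts, format="%d/%m/%Y %I:%M:%S %p")  # day-first, as in the CSV
no_fmt = pd.to_datetime(ts)                                   # pandas guesses month-first

# with_fmt[0] is 2016-01-02 (Jan 2nd); no_fmt[0] is 2016-02-01 (Feb 1st)
```

Passing dayfirst=True, or always supplying the explicit format string, avoids the ambiguity.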

How does MAF work for this one-dimensional univariate time series?

Thank you very much for sharing the code. In your paper, you use a conditional normalizing flow based on MAF to evaluate the conditional probability distribution of each time-series variable. But MAF masks some dimensions of the hidden variables and then performs autoregression. How does MAF work for a one-dimensional univariate time series?
