ai4risk / antifraud
A repository for financial fraud detection
License: GNU General Public License v3.0
Could you please clarify which dataset is used by default when I run the command "python main.py --method gtan"? And how can I test each of the three datasets separately?
Hello, I am replicating your work on STAN and am very interested in the 3D convolution you proposed. Could you provide the code for it?
I couldn't run gtan due to:
ImportError: cannot import name 'NodeDataLoader' from 'dgl.dataloading'
After searching the dgl directory of my installed version, the closest class I could find was DistNodeDataLoader. I found that, according to this:
https://docs.dgl.ai/en/0.9.x/generated/dgl.dataloading.NodeDataLoader.html
Deprecated since version 0.8: The class is deprecated since v0.8, replaced by DataLoader
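One way to keep the import working across DGL versions is a small fallback lookup. This is a sketch, not part of the repository; `import_first` is a hypothetical helper, and it assumes only the class name changed between DGL versions:

```python
import importlib


def import_first(module_name, *attr_names):
    """Return the first attribute found on the module.

    A small compatibility helper for renamed classes, e.g. DGL's
    NodeDataLoader, which was deprecated in v0.8 in favour of DataLoader.
    """
    mod = importlib.import_module(module_name)
    for name in attr_names:
        if hasattr(mod, name):
            return getattr(mod, name)
    raise ImportError(f"none of {attr_names} found in {module_name!r}")


# With DGL installed, this resolves to whichever class the version provides:
# NodeDataLoader = import_first("dgl.dataloading", "NodeDataLoader", "DataLoader")
```

With this, the same call site works on both old and new DGL without touching the rest of the training code.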
Also, I assumed the csv below was a test one:
line 240, in load_gtan_data
df = pd.read_csv(prefix + "S-FFSDneofull.csv")
I want to process the S-FFSD data and run inference with the trained GTAN model for testing. Can you help me?
The error message I'm encountering during training of rgtan is as follows:
Training fold 1
In epoch:000|batch:0000, train_loss:0.774403, train_ap:0.0769, train_acc:0.5000, train_auc:0.4783
In epoch:000|batch:0010, train_loss:0.647940, train_ap:0.3581, train_acc:0.7879, train_auc:0.6813
In epoch:000|batch:0020, train_loss:0.588786, train_ap:0.4898, train_acc:0.7500, train_auc:0.5926
In epoch:000|batch:0030, train_loss:0.552446, train_ap:0.3437, train_acc:0.8667, train_auc:0.4038
In epoch:000|batch:0040, train_loss:0.543063, train_ap:0.2750, train_acc:0.7308, train_auc:0.4812
In epoch:000|batch:0050, train_loss:0.522105, train_ap:0.7437, train_acc:0.7857, train_auc:0.9015
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [208,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [208,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [208,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [208,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
This error indicates a problem with CUDA index selection during training: the device-side assertions fail because srcIndex is not less than srcSelectDimSize, i.e. an index passed to an index_select/gather exceeds the size of the dimension being indexed. I would appreciate any help, thank you.
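One way to localize such failures is to validate the index ranges on CPU before the GPU lookup. This is a generic sketch, not code from the repository; `check_index_range` is a hypothetical helper:

```python
def check_index_range(indices, table_size):
    """Validate lookup indices on CPU before a GPU gather/index_select.

    An out-of-range index is exactly what triggers the opaque
    `srcIndex < srcSelectDimSize` device-side assert shown above.
    """
    bad = [int(i) for i in indices if not (0 <= int(i) < table_size)]
    if bad:
        raise IndexError(
            f"{len(bad)} indices out of range [0, {table_size}), e.g. {bad[:5]}"
        )


# Example: a table with 100 rows cannot be indexed by 100 or more.
check_index_range([0, 5, 99], 100)  # passes silently
```

In a GTAN/RGTAN-style pipeline, the usual culprits are node ids or categorical feature codes in the test split that exceed the vocabulary size built from the training split.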
Hi! I reproduced the experiment from the source code, but there is a gap between my results and those reported in the paper. I noticed that no random seed is set during training; could you please provide the random seed parameters?
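For reproducibility, all RNG sources used by the pipeline need to be seeded. A minimal sketch (`set_seed` is a hypothetical helper, not part of the repository):

```python
import os
import random

import numpy as np


def set_seed(seed: int = 42) -> None:
    """Seed the common RNG sources used by a typical training run."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # If PyTorch / DGL are in use (not imported here), also call:
    # torch.manual_seed(seed); torch.cuda.manual_seed_all(seed); dgl.seed(seed)
```

Note that even with all seeds fixed, some CUDA kernels are nondeterministic, so small run-to-run differences can remain.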
RGTAN: Enhancing Attribute-driven Fraud Detection with Risk-aware Graph Representation
Could you provide this paper? I could not find it.
Hello,
I still can't fully understand the rgtan model just by reading the code. Could you provide a model structure diagram, or a conceptual description of the model?
Thank you
Hi, thanks for contributing this marvellous work; I very much appreciate the opportunity to reproduce it. However, when running the code, I found that the model shows a serious overfitting problem once training reaches 100 epochs or more. With the default of 30 epochs, the training and testing results are very close. When I raise the epochs to 100 or more, the training result keeps increasing, even to 90%, while the testing result stays near 70%. I believe this is overfitting, and I would appreciate any advice on this problem.
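A standard mitigation for this training/testing gap is early stopping on a validation metric. A minimal sketch (not the repository's implementation; the class name and patience value are assumptions):

```python
class EarlyStopping:
    """Stop training when the validation metric has not improved for
    `patience` consecutive checks (higher metric = better)."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_rounds = 0

    def step(self, metric: float) -> bool:
        """Record one validation result; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


# Typical use inside the epoch loop:
# stopper = EarlyStopping(patience=10)
# if stopper.step(val_auc):
#     break
```

With a check like this, raising the epoch budget to 100+ is harmless: training halts once the validation metric stops improving, instead of continuing to fit the training set.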
Hello,
I'm encountering a CUDA error when I attempt to run the rgtan method with the S-FFSD dataset using the command:
python main.py --method rgtan
This issue only occurs with the S-FFSD dataset; when I use the Yelp dataset, the program runs without any issues. The specific error message indicates a "CUDA error: device-side assert triggered."
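A common first debugging step (a general PyTorch tip, not from the repository docs) is to rerun with synchronous kernel launches, so the failing operation is reported at its actual call site rather than at a later, unrelated line:

```shell
# CUDA errors are reported asynchronously by default; this makes them
# surface at the operation that actually failed.
CUDA_LAUNCH_BLOCKING=1 python main.py --method rgtan
```

Running the same command on CPU (if the config allows it) also usually turns the device-side assert into a readable Python IndexError.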
Hello,
I am currently reviewing your work on the FFSD dataset. In your paper, it was mentioned that the attributes of the card include the card type, cardholder type, card limit, remaining limit, etc. The transaction attributes include the channel ID, currency ID, transaction amount, etc. The merchant attributes contain merchant type, terminal type, merchant location, sector, charge ratio, etc.
However, when observing the dataset in the source code, I only see the following fields: Time, Source, Target, Amount, Location, Type, Labels.
Could you kindly clarify the correspondence between the fields in the source-code dataset and those described in the paper? Additionally, I would appreciate it if you could briefly explain what each field in the source-code dataset represents.
Thank you for your time and I look forward to your response.
Best regards,
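For reference, the released file appears to contain only the seven columns named above. The snippet below illustrates that shape; the row values are invented, and only the column names come from the dataset:

```python
import csv
import io

# Hypothetical row mirroring the seven columns of the released S-FFSD csv.
sample = (
    "Time,Source,Target,Amount,Location,Type,Labels\n"
    "0,S1,T1,120.5,L3,TP2,0\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
print(sorted(rows[0].keys()))
```

So the richer card/merchant attributes described in the paper (card limit, merchant sector, charge ratio, etc.) do not map one-to-one onto the released fields, which is what the question asks the authors to clarify.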
Dear AI4Risk Team,
I am writing to bring to your attention a potential issue regarding data leakage in the RGTAN model implementation. After reviewing the code, I have noticed that there might be an inadvertent use of test labels during the training phase, which could lead to overly optimistic performance metrics. Below are the specific details of my observation:
Details:
Issue Description: In the data_process.py script, the count_risk_neighs method computes the number of high-risk neighbors for each node using the labels tensor, which includes both training and test set labels. This can be seen where the method calculates the risk features based on the labels of the neighboring nodes. The count_risk_neighs method thereby indirectly exposes the model to the test labels during training, leading to potential data leakage.
Potential Impact:
This approach could inadvertently allow the model to "see" the test labels during training, resulting in an artificial boost in performance metrics such as AUC. The evaluation metrics may therefore not accurately reflect the model's true generalization performance.
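The masking this implies can be sketched as follows. This is not the repository's implementation; the function name, the `neighbors` adjacency-list format, and the boolean `train_mask` are assumptions made for illustration:

```python
def count_risk_neighs_train_only(neighbors, labels, train_mask):
    """Count high-risk (label == 1) neighbours per node, restricted to
    neighbours in the training set so that test labels cannot leak into
    the feature. `neighbors[i]` is the list of neighbour ids of node i.
    """
    return [
        sum(1 for j in neighs if train_mask[j] and labels[j] == 1)
        for neighs in neighbors
    ]


# Node 0 has neighbours 1 and 2; only node 1 is in the training set,
# so node 2's (test-set) label is ignored.
counts = count_risk_neighs_train_only(
    neighbors=[[1, 2]],
    labels=[0, 1, 1],
    train_mask=[True, True, False],
)
```

Treating test-set neighbour labels as unknown (e.g. masked to zero) is the usual way to keep such label-derived features leakage-free.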
Thank you for your attention to this matter. I appreciate your hard work on this project and look forward to any clarifications or fixes that may be applied to address this issue.
Best regards,
I could not reproduce the results of the stagn method. I got the following:
test set | auc: 0.5839, F1: 0.5559, AP: 0.2189
I installed the environment as per the requirements. The only thing I changed in the code was setting device: cpu in the stagn_cfg.yaml file.
Is such a drop in model results normal? Why did this issue happen?