ai4risk / antifraud
A repository for financial fraud detection
License: GNU General Public License v3.0
Could you please clarify which dataset is used by default when I run the command "python main.py --method gtan"? And how can I test each of the three datasets separately?
Hello, I am replicating your work on STAN and am very interested in the 3D convolution you proposed. Could you provide the code for it?
I couldn't run gtan due to:
ImportError: cannot import name 'NodeDataLoader' from 'dgl.dataloading'
After searching the dgl directory of my installed version, the closest class I could find was DistNodeDataLoader. I found that, according to this:
https://docs.dgl.ai/en/0.9.x/generated/dgl.dataloading.NodeDataLoader.html
Deprecated since version 0.8: The class is deprecated since v0.8, replaced by DataLoader
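One way to keep the import working across DGL versions is a small fallback lookup. This is a sketch, not part of the repository; `import_first` is a hypothetical helper, and it assumes only the class name changed between DGL versions:

```python
import importlib


def import_first(module_name, *attr_names):
    """Return the first attribute found on the module.

    A small compatibility helper for renamed classes, e.g. DGL's
    NodeDataLoader, which was deprecated in v0.8 in favour of DataLoader.
    """
    mod = importlib.import_module(module_name)
    for name in attr_names:
        if hasattr(mod, name):
            return getattr(mod, name)
    raise ImportError(f"none of {attr_names} found in {module_name!r}")


# With DGL installed, this resolves to whichever class the version provides:
# NodeDataLoader = import_first("dgl.dataloading", "NodeDataLoader", "DataLoader")
```

With this, the same call site works on both old and new DGL without touching the rest of the training code.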
Also, I assumed the csv below was a test one:
line 240, in load_gtan_data
df = pd.read_csv(prefix + "S-FFSDneofull.csv")
I want to process the S-FFSD data and run inference with the trained GTAN model for testing. Can you help me?
The error message I'm encountering during training of rgtan is as follows:
Training fold 1
In epoch:000|batch:0000, train_loss:0.774403, train_ap:0.0769, train_acc:0.5000, train_auc:0.4783
In epoch:000|batch:0010, train_loss:0.647940, train_ap:0.3581, train_acc:0.7879, train_auc:0.6813
In epoch:000|batch:0020, train_loss:0.588786, train_ap:0.4898, train_acc:0.7500, train_auc:0.5926
In epoch:000|batch:0030, train_loss:0.552446, train_ap:0.3437, train_acc:0.8667, train_auc:0.4038
In epoch:000|batch:0040, train_loss:0.543063, train_ap:0.2750, train_acc:0.7308, train_auc:0.4812
In epoch:000|batch:0050, train_loss:0.522105, train_ap:0.7437, train_acc:0.7857, train_auc:0.9015
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [208,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [208,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [208,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [208,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
This error indicates a problem with CUDA index selection during training: the device-side assertions fail because srcIndex is not less than srcSelectDimSize, i.e. an index passed to an index_select/gather exceeds the size of the dimension being indexed. I would appreciate any help, thank you.
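One way to localize such failures is to validate the index ranges on CPU before the GPU lookup. This is a generic sketch, not code from the repository; `check_index_range` is a hypothetical helper:

```python
def check_index_range(indices, table_size):
    """Validate lookup indices on CPU before a GPU gather/index_select.

    An out-of-range index is exactly what triggers the opaque
    `srcIndex < srcSelectDimSize` device-side assert shown above.
    """
    bad = [int(i) for i in indices if not (0 <= int(i) < table_size)]
    if bad:
        raise IndexError(
            f"{len(bad)} indices out of range [0, {table_size}), e.g. {bad[:5]}"
        )


# Example: a table with 100 rows cannot be indexed by 100 or more.
check_index_range([0, 5, 99], 100)  # passes silently
```

In a GTAN/RGTAN-style pipeline, the usual culprits are node ids or categorical feature codes in the test split that exceed the vocabulary size built from the training split.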
Hi! I reproduced the experiment from the source code, but there is a gap between my results and those reported in the paper. I noticed that no random seed is set during training; could you please provide the random seed parameters?
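For reproducibility, all RNG sources used by the pipeline need to be seeded. A minimal sketch (`set_seed` is a hypothetical helper, not part of the repository):

```python
import os
import random

import numpy as np


def set_seed(seed: int = 42) -> None:
    """Seed the common RNG sources used by a typical training run."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # If PyTorch / DGL are in use (not imported here), also call:
    # torch.manual_seed(seed); torch.cuda.manual_seed_all(seed); dgl.seed(seed)
```

Note that even with all seeds fixed, some CUDA kernels are nondeterministic, so small run-to-run differences can remain.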
RGTAN: Enhancing Attribute-driven Fraud Detection with Risk-aware Graph Representation
Could you provide this paper? I could not find it.
Hello,
I still can't fully understand the rgtan model just by reading the code. Could you provide a model structure diagram, or a conceptual description of the model?
Thank you
Hi, thanks for contributing this marvellous work; I very much appreciate the opportunity to reproduce it. However, when running the code, I found that the model shows a serious overfitting problem once training reaches 100 epochs or more. With the default of 30 epochs, the training and testing results are very close. When I raise the epochs to 100 or more, the training result keeps increasing, even to 90%, while the testing result stays near 70%. I believe this is overfitting, and I would appreciate any advice on this problem.
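A standard mitigation for this training/testing gap is early stopping on a validation metric. A minimal sketch (not the repository's implementation; the class name and patience value are assumptions):

```python
class EarlyStopping:
    """Stop training when the validation metric has not improved for
    `patience` consecutive checks (higher metric = better)."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_rounds = 0

    def step(self, metric: float) -> bool:
        """Record one validation result; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


# Typical use inside the epoch loop:
# stopper = EarlyStopping(patience=10)
# if stopper.step(val_auc):
#     break
```

With a check like this, raising the epoch budget to 100+ is harmless: training halts once the validation metric stops improving, instead of continuing to fit the training set.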
Hello,
I'm encountering a CUDA error when I attempt to run the rgtan method with the S-FFSD dataset using the command:
python main.py --method rgtan
This issue only occurs with the S-FFSD dataset; when I use the Yelp dataset, the program runs without any issues. The specific error message indicates a "CUDA error: device-side assert triggered."
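A common first debugging step (a general PyTorch tip, not from the repository docs) is to rerun with synchronous kernel launches, so the failing operation is reported at its actual call site rather than at a later, unrelated line:

```shell
# CUDA errors are reported asynchronously by default; this makes them
# surface at the operation that actually failed.
CUDA_LAUNCH_BLOCKING=1 python main.py --method rgtan
```

Running the same command on CPU (if the config allows it) also usually turns the device-side assert into a readable Python IndexError.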
Hello,
I am currently reviewing your work on the FFSD dataset. In your paper, it was mentioned that the attributes of the card include the card type, cardholder type, card limit, remaining limit, etc. The transaction attributes include the channel ID, currency ID, transaction amount, etc. The merchant attributes contain merchant type, terminal type, merchant location, sector, charge ratio, etc.
However, when observing the dataset in the source code, I only see the following fields: Time, Source, Target, Amount, Location, Type, Labels.
Could you kindly clarify the correspondence between the fields in the source-code dataset and those described in the paper? Additionally, I would appreciate it if you could briefly explain what each field in the source-code dataset represents.
Thank you for your time and I look forward to your response.
Best regards,
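For reference, the released file appears to contain only the seven columns named above. The snippet below illustrates that shape; the row values are invented, and only the column names come from the dataset:

```python
import csv
import io

# Hypothetical row mirroring the seven columns of the released S-FFSD csv.
sample = (
    "Time,Source,Target,Amount,Location,Type,Labels\n"
    "0,S1,T1,120.5,L3,TP2,0\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
print(sorted(rows[0].keys()))
```

So the richer card/merchant attributes described in the paper (card limit, merchant sector, charge ratio, etc.) do not map one-to-one onto the released fields, which is what the question asks the authors to clarify.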
Dear AI4Risk Team,
I am writing to bring to your attention a potential issue regarding data leakage in the RGTAN model implementation. After reviewing the code, I have noticed that there might be an inadvertent use of test labels during the training phase, which could lead to overly optimistic performance metrics. Below are the specific details of my observation:
Details:
Issue Description: In the data_process.py script, the count_risk_neighs method computes the number of high-risk neighbors for each node using the labels tensor, which includes both training and test set labels. This can be seen where the method calculates the risk features based on the labels of the neighboring nodes. The count_risk_neighs method thereby indirectly exposes the model to the test labels during training, leading to potential data leakage.
Potential Impact:
This approach could inadvertently allow the model to "see" the test labels during training, resulting in an artificial boost in performance metrics such as AUC. The evaluation metrics may therefore not accurately reflect the model's true generalization performance.
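The masking this implies can be sketched as follows. This is not the repository's implementation; the function name, the `neighbors` adjacency-list format, and the boolean `train_mask` are assumptions made for illustration:

```python
def count_risk_neighs_train_only(neighbors, labels, train_mask):
    """Count high-risk (label == 1) neighbours per node, restricted to
    neighbours in the training set so that test labels cannot leak into
    the feature. `neighbors[i]` is the list of neighbour ids of node i.
    """
    return [
        sum(1 for j in neighs if train_mask[j] and labels[j] == 1)
        for neighs in neighbors
    ]


# Node 0 has neighbours 1 and 2; only node 1 is in the training set,
# so node 2's (test-set) label is ignored.
counts = count_risk_neighs_train_only(
    neighbors=[[1, 2]],
    labels=[0, 1, 1],
    train_mask=[True, True, False],
)
```

Treating test-set neighbour labels as unknown (e.g. masked to zero) is the usual way to keep such label-derived features leakage-free.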
Thank you for your attention to this matter. I appreciate your hard work on this project and look forward to any clarifications or fixes that may be applied to address this issue.
Best regards,
I could not reproduce the results of the stagn method. I got the following:
test set | auc: 0.5839, F1: 0.5559, AP: 0.2189
I installed the environment as per the requirements. The only thing I changed in the code was setting device: cpu in the stagn_cfg.yaml file.
Is such a drop in model results normal? Why did this issue happen?