MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Introduction

MGTAB is the first standardized graph-based benchmark for stance and bot detection. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. For more details, please refer to the MGTAB paper.

Distribution of labels in annotations.

| Stance label | Count | Percentage (%) | Bot label | Count | Percentage (%) |
|---|---|---|---|---|---|
| neutral | 3,776 | 37.02 | human | 7,451 | 73.06 |
| against | 3,637 | 35.66 | bot | 2,748 | 26.94 |
| support | 2,786 | 27.32 | | | |

MGTAB contains 10,199 expert-annotated users. MGTAB-large extends MGTAB with 400,000 additional unlabelled users.
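
As a quick sanity check on the label table above, the percentages can be recomputed from the counts. The following is a minimal sketch in plain Python; the variable and function names are illustrative and not part of the MGTAB code.

```python
# Counts copied from the label distribution table.
stance_counts = {"neutral": 3776, "against": 3637, "support": 2786}
bot_counts = {"human": 7451, "bot": 2748}


def percentages(counts):
    """Return each label's share of the total, in percent, rounded to 2 places."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Both label sets should cover the same 10,199 annotated users.
print(sum(stance_counts.values()), percentages(stance_counts))
print(sum(bot_counts.values()), percentages(bot_counts))
```

Both totals come to 10,199, which matches the number of annotated users stated above.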

Multiple relations in the MGTAB.

Our proposed dataset has seven types of user relationships.

Number of edges per relation type:

| Dataset | followers | friends | mention | reply | quoted | URL | hashtag |
|---|---|---|---|---|---|---|---|
| MGTAB | 308,120 | 412,575 | 114,516 | 223,466 | 77,631 | 263,800 | 300,000 |
| MGTAB-large | 31,990,488 | 49,668,723 | 7,135,192 | 1,018,834 | 182,296 | 51,281 | 7,950,896 |

Environment

python 3.7
scikit-learn 1.0.2
torch 1.8.1+cu111
torch_cluster-1.5.9
torch_scatter-2.0.6
torch_sparse-0.6.9
torch_spline_conv-1.2.1
torch-geometric 2.0.4
pytorch-lightning 1.5.0
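
To compare an existing Python environment against the versions listed above, a small check along the following lines may help. This is a sketch, not part of the MGTAB code: the distribution names (for example `torch-cluster` for `torch_cluster`) are assumed to be the names used on PyPI, and `find_mismatches` is a hypothetical helper. On Python 3.7 the `importlib_metadata` backport must be installed for the fallback import to work.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
PINNED = {
    "scikit-learn": "1.0.2",
    "torch": "1.8.1+cu111",
    "torch-cluster": "1.5.9",
    "torch-scatter": "2.0.6",
    "torch-sparse": "0.6.9",
    "torch-spline-conv": "1.2.1",
    "torch-geometric": "2.0.4",
    "pytorch-lightning": "1.5.0",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at exactly the listed version.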

Train Model

To start the training process:

Train GNN models

python MGTAB-GNN.py  --task stance --model GCN --relation_select 0 1 --random_seed 0 1 2 3 4
python MGTAB-GNN.py  --task bot --model RGCN --relation_select 0 1 --random_seed 0 1 2 3 4
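
The commands above pass several values to `--relation_select` and `--random_seed`. The sketch below shows one way such flags could be parsed with `argparse`; it is not the actual code of `MGTAB-GNN.py`. In particular, the mapping from index to relation name is an assumption based on the column order of the relation table (0 = followers, ..., 6 = hashtag), and `build_parser` and `selected_relations` are hypothetical helpers.

```python
import argparse

# ASSUMPTION: index order follows the column order of the relation table;
# the real scripts may number the relations differently.
RELATION_NAMES = ["followers", "friends", "mention", "reply", "quoted", "URL", "hashtag"]


def build_parser():
    """Build a parser that mirrors the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Hypothetical MGTAB-style CLI")
    parser.add_argument("--task", choices=["stance", "bot"], default="bot")
    parser.add_argument("--model", default="GCN")
    # nargs="+" collects one or more space-separated integers into a list.
    parser.add_argument("--relation_select", type=int, nargs="+", default=[0, 1])
    parser.add_argument("--random_seed", type=int, nargs="+", default=[0])
    return parser


def selected_relations(indices):
    """Translate relation indices to names, dropping duplicates but keeping order."""
    unique = list(dict.fromkeys(indices))
    return [RELATION_NAMES[i] for i in unique]


args = build_parser().parse_args(
    "--task stance --model GCN --relation_select 0 1 --random_seed 0 1 2 3 4".split()
)
print(args.task, args.model, selected_relations(args.relation_select), args.random_seed)
```

Under this assumed ordering, `--relation_select 0 1` would select the followers and friends relations, and each value of `--random_seed` would correspond to one training run.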

Train Machine Learning models

python MGTAB-ML.py  --task stance --models_list 1 2 3  --random_seed 0 1 2 3 4
python MGTAB-ML.py  --task bot --models_list 4 5 6 7  --random_seed 0 1 2 3 4

Train GNN models in parallel on multiple GPUs

python GNN_sample_large.py --task bot --relation_select 0 1 2 3 4 5 6 --model RGT --GPU_num 4
python GNN_sample_large.py  --task bot --relation_select 0 1 2 3 4 --model SHGN --GPU_num 4
python GNN_sample_large.py  --task stance --relation_select 0 1 --model GCN --GPU_num 4
python GNN_sample_large.py  --task stance --relation_select 0 --model GAT --GPU_num 4

Baseline performance

Stance detection performance on MGTAB

| Method | Type | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| AdaBoost | F | 74.59$_{1.41}$ | 74.60$_{1.35}$ | 74.02$_{1.61}$ | 73.88$_{1.47}$ |
| Random Forest | F | 79.62$_{0.68}$ | 80.04$_{0.43}$ | 78.83$_{0.98}$ | 79.04$_{0.82}$ |
| Decision Tree | F | 66.92$_{0.93}$ | 66.34$_{1.02}$ | 66.23$_{1.06}$ | 66.03$_{0.84}$ |
| SVM | F | 81.23$_{0.66}$ | 81.40$_{0.71}$ | 80.86$_{1.09}$ | 80.71$_{0.78}$ |
| KNN | F | 76.25$_{1.32}$ | 75.54$_{1.41}$ | 75.70$_{1.37}$ | 75.48$_{1.37}$ |
| Logistic Regression | F | 79.51$_{1.00}$ | 79.33$_{0.98}$ | 78.83$_{1.17}$ | 78.98$_{1.11}$ |
| GCN | G | 81.35$_{0.58}$ | 81.08$_{0.30}$ | 80.19$_{0.56}$ | 80.08$_{0.56}$ |
| GraphSAGE | G | 83.33$_{1.22}$ | 82.52$_{1.63}$ | 83.45$_{0.63}$ | 82.72$_{1.34}$ |
| GAT | G | 82.19$_{1.23}$ | 81.72$_{1.19}$ | 81.68$_{1.16}$ | 81.04$_{1.24}$ |
| HGT | G | 83.29$_{0.44}$ | 81.63$_{0.58}$ | 81.51$_{0.76}$ | 81.82$_{0.34}$ |
| S-HGN | G | 85.32$_{0.53}$ | 83.93$_{0.67}$ | 83.65$_{0.65}$ | 84.42$_{0.43}$ |
| BotRGCN | G | 84.71$_{1.43}$ | 83.43$_{1.23}$ | 84.08$_{0.94}$ | 84.30$_{1.44}$ |
| RGT | G | 87.78$_{0.43}$ | 85.22$_{0.89}$ | 84.40$_{0.74}$ | 86.86$_{0.43}$ |
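
Each score in the table above is followed by a small subscripted value. Assuming the main number is the mean over the runs with different `--random_seed` values and the subscript is the standard deviation (an assumption, not confirmed in this README), per-seed results could be summarized as in the sketch below. The accuracies are made up for illustration, and the sample standard deviation is one possible choice.

```python
from statistics import mean, stdev


def summarize(scores):
    """Format per-seed scores (in percent) as 'mean$_{std}$', as in the tables."""
    return f"{mean(scores):.2f}$_{{{stdev(scores):.2f}}}$"


# Made-up accuracies for seeds 0..4, for illustration only (not MGTAB results).
example_accuracy = [81.0, 82.0, 80.5, 81.5, 82.5]
print(summarize(example_accuracy))  # prints 81.50$_{0.79}$
```

Using `statistics.pstdev` instead of `stdev` would give the population standard deviation, which is slightly smaller for five runs.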

Bot detection performance on MGTAB

| Method | Type | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| AdaBoost | F | 90.12$_{0.92}$ | 88.51$_{1.33}$ | 89.10$_{0.92}$ | 87.71$_{1.10}$ |
| Random Forest | F | 89.52$_{0.44}$ | 88.92$_{0.49}$ | 86.72$_{1.15}$ | 86.83$_{0.53}$ |
| Decision Tree | F | 87.13$_{0.51}$ | 83.81$_{0.72}$ | 83.39$_{1.06}$ | 83.70$_{0.74}$ |
| SVM | F | 88.68$_{1.40}$ | 85.73$_{1.84}$ | 85.73$_{1.84}$ | 85.31$_{1.73}$ |
| KNN | F | 85.78$_{0.84}$ | 82.28$_{1.22}$ | 80.49$_{0.64}$ | 81.28$_{0.66}$ |
| Logistic Regression | F | 88.49$_{1.31}$ | 85.69$_{1.69}$ | 84.41$_{1.96}$ | 84.97$_{1.67}$ |
| GCN | G | 85.81$_{1.32}$ | 77.40$_{2.12}$ | 84.37$_{1.73}$ | 78.33$_{1.67}$ |
| GraphSAGE | G | 88.71$_{1.24}$ | 85.33$_{1.83}$ | 86.15$_{2.55}$ | 85.44$_{1.08}$ |
| GAT | G | 86.96$_{1.28}$ | 79.71$_{2.96}$ | 84.88$_{1.52}$ | 82.33$_{2.12}$ |
| HGT | G | 90.28$_{0.29}$ | 85.35$_{0.33}$ | 85.97$_{0.41}$ | 87.52$_{0.37}$ |
| S-HGN | G | 91.42$_{0.43}$ | 87.40$_{0.67}$ | 86.73$_{0.64}$ | 88.72$_{0.58}$ |
| BotRGCN | G | 89.60$_{0.82}$ | 85.21$_{1.81}$ | 87.07$_{1.38}$ | 87.16$_{0.74}$ |
| RGT | G | 92.12$_{0.37}$ | 88.08$_{0.43}$ | 86.64$_{0.25}$ | 90.41$_{0.47}$ |

Licensing

The MGTAB dataset is released under the CC BY-NC-ND 4.0 license. The code of the MGTAB evaluation framework is released under the MIT license.

Datasets download

For SemEval-2016 T6, visit the SemEval2016 repository. For SemEval-2019 T7, visit the SemEval2019 GitHub repository. For TwiBot-20, visit the TwiBot-20 GitHub repository. For TwiBot-22, visit the TwiBot-22 GitHub repository. For other bot detection datasets, please visit the Bot Repository.

MGTAB is available at Google Drive. MGTAB-large (which contains 400,000 unlabeled users) is available at Google Drive. We also offer the standardized Cresci-15 at Google Drive. After downloading these datasets, please unzip them into the "./Dataset" directory.
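
For those who prefer to script the extraction step, the following is a minimal sketch using Python's standard `zipfile` module. It assumes the downloads are ordinary zip archives. The file name `MGTAB.zip` in the usage comment is hypothetical, and `extract_dataset` is an illustrative helper, not part of the MGTAB code.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, target_dir="./Dataset"):
    """Extract a downloaded zip archive into target_dir and list its top-level entries."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create ./Dataset if it is missing
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.iterdir())


# Usage (hypothetical archive name):
# print(extract_dataset("MGTAB.zip"))
```

The function returns the names found directly under the target directory, which gives a quick way to confirm that the files landed where expected.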
