The source code for the SIGIR 2022 paper: "Unsupervised Belief Representation Learning with Information-Theoretic Variational Graph Auto-Encoders"
To run InfoVGAE on Eurovision dataset:
python3 main.py --config_name InfoVGAE_eurovision_3D
To run InfoVGAE on Election dataset:
python3 main.py --config_name InfoVGAE_election_3D
To run InfoVGAE on Voteview 105th Congress dataset:
python3 main.py --config_name InfoVGAE_bill_3D
To run InfoVGAE on TIMME dataset:
python3 main.py --config_name InfoVGAE_timme_3D
To run InfoVGAE on TIMME dataset with follow (friend) links:
python3 main.py --config_name InfoVGAE_timme_follow_3D
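The five commands above differ only in the value passed to --config_name, so they can be run in sequence from a small script. The following is a minimal sketch, not part of the repository: the helper names build_command and run_all are hypothetical, and it assumes the script is started from the repository root, where main.py lives.

```python
import subprocess
import sys

# Config names copied from the commands listed in this README.
CONFIG_NAMES = [
    "InfoVGAE_eurovision_3D",
    "InfoVGAE_election_3D",
    "InfoVGAE_bill_3D",
    "InfoVGAE_timme_3D",
    "InfoVGAE_timme_follow_3D",
]


def build_command(config_name, python=sys.executable):
    """Return the argv list for one training run (hypothetical helper)."""
    return [python, "main.py", "--config_name", config_name]


def run_all(config_names=CONFIG_NAMES, dry_run=True):
    """Run each config in order; with dry_run=True only print the commands."""
    for name in config_names:
        cmd = build_command(name)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if a run exits with a non-zero status.
            subprocess.run(cmd, check=True)
```

Calling run_all() prints the commands without executing them; run_all(dry_run=False) launches the runs one after another.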
The embeddings, labels, figures, and top-k tweets (only applicable for Twitter datasets), etc., will be saved in ./output.
Because of GitHub's file size limits, we uploaded smaller, pre-processed versions of the datasets. They are located in dataset/election, dataset/eurovision, and dataset/bill. Cloning this repo may take some time (297MB). After cloning this repo, please run:
unzip dataset/bill/bmap2.pkl.zip; unzip dataset/bill/data_80_115.pkl.zip
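On systems without the unzip tool, the same step can be approximated with Python's standard zipfile module. This is a sketch under assumptions: extract_archives is a hypothetical helper, it is meant to be run from the repository root, and, like unzip without the -d option, it extracts into the current directory by default. The paths stored inside the archives are not documented here, so check where the .pkl files land.

```python
import zipfile

# Archive paths copied from the unzip command above.
ARCHIVES = [
    "dataset/bill/bmap2.pkl.zip",
    "dataset/bill/data_80_115.pkl.zip",
]


def extract_archives(archives=ARCHIVES, dest="."):
    """Extract each archive into dest and return the extracted member names."""
    extracted = []
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return extracted
```

Calling extract_archives() with no arguments processes both archives listed above.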
Evaluation is triggered automatically after the training process. To evaluate again, modify the evaluator.init_from_dir() call in evaluate.py.
General
--use_cuda: train with GPU
--epochs: number of training iterations
--learning_rate: learning rate for training
--device: which GPU to use; leave empty for CPU
--num_process: number of processes for pandas processing
Data
--data_path: path to the CSV data file
--stopword_path: path to the stopword file used for text parsing
--kthreshold: tweet count threshold for filtering out unpopular tweets
--uthreshold: user count threshold for filtering out unpopular users
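To illustrate what the two count thresholds might do, here is a small sketch of threshold-based filtering on (user, tweet) pairs. It is an assumption about the semantics, not the repository's preprocessing code: the function name filter_by_thresholds, the pair representation, and the use of "greater than or equal to" are all hypothetical, and the actual implementation may count or compare differently.

```python
from collections import Counter


def filter_by_thresholds(pairs, kthreshold, uthreshold):
    """Keep (user, tweet) pairs whose tweet and user both meet their count thresholds.

    Assumed semantics: a tweet is kept if it appears at least kthreshold times,
    and a user is kept if they appear at least uthreshold times.
    """
    tweet_counts = Counter(tweet for _, tweet in pairs)
    user_counts = Counter(user for user, _ in pairs)
    return [
        (user, tweet)
        for user, tweet in pairs
        if tweet_counts[tweet] >= kthreshold and user_counts[user] >= uthreshold
    ]
```

For example, with pairs [("u1", "t1"), ("u2", "t1"), ("u1", "t2"), ("u3", "t3")], setting kthreshold=2 and uthreshold=1 keeps only the two pairs that involve t1, the one tweet that appears twice.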
For InfoVGAE model
--hidden1_dim: the latent space dimension of the first layer
--hidden2_dim: the latent space dimension of the target layer
Result
--output_path: path to save the result
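The options listed above could be declared with argparse roughly as follows. This is a sketch for orientation, not the repository's parser: the flag names come from this README, but the types, the store_true behaviour of --use_cuda, and the empty-string default for --device are assumptions, and no other default values are implied.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (types are assumptions)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    parser.add_argument("--config_name", type=str, help="name of the run configuration")

    # General
    parser.add_argument("--use_cuda", action="store_true", help="train with GPU")
    parser.add_argument("--epochs", type=int, help="number of training iterations")
    parser.add_argument("--learning_rate", type=float, help="learning rate for training")
    parser.add_argument("--device", type=str, default="", help="which GPU to use; empty for CPU")
    parser.add_argument("--num_process", type=int, help="number of processes for pandas processing")

    # Data
    parser.add_argument("--data_path", type=str, help="path to the CSV data file")
    parser.add_argument("--stopword_path", type=str, help="path to the stopword file")
    parser.add_argument("--kthreshold", type=int, help="tweet count threshold")
    parser.add_argument("--uthreshold", type=int, help="user count threshold")

    # For InfoVGAE model
    parser.add_argument("--hidden1_dim", type=int, help="latent dimension of the first layer")
    parser.add_argument("--hidden2_dim", type=int, help="latent dimension of the target layer")

    # Result
    parser.add_argument("--output_path", type=str, help="path to save the result")
    return parser
```

Parsing a list such as ["--config_name", "InfoVGAE_bill_3D", "--epochs", "5"] with build_parser().parse_args(...) yields a namespace whose attributes match the flag names.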