chunyuany / rumordetection

Python 100.00%

rumor deep-learning network-representation-learning social-network

rumordetection's Introduction

This repository contains the source code released with the paper:

Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han, Songlin Hu. Jointly embedding the local and global relations of heterogeneous graph for rumor detection. In 19th IEEE International Conference on Data Mining, IEEE ICDM 2019.

Dependencies:

Gensim==3.7.2

Jieba==0.39

Scikit-learn==0.21.2

Pytorch==1.4.0

Datasets

The main directory contains the directories of Weibo dataset and two Twitter datasets: twitter15 and twitter16. In each directory, there are:

twitter15.train, twitter15.dev, and twitter15.test files: These files provide the training, development, and test samples in a format like: 'source tweet ID \t source tweet content \t label'
twitter15_graph.txt file: This file lists, for each source tweet, the users linked to it and their weights, in a format like: 'source tweet ID \t userID1:weight1 userID2:weight2 ...'
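
The two line formats above can be read with a few lines of Python. This is only a sketch under assumptions: the function names are hypothetical and do not come from the repository, the sample values in the usage note are made up, and the weights are assumed to be numeric.

```python
from typing import Dict, Tuple


def parse_sample_line(line: str) -> Tuple[str, str, str]:
    """Split one line of a .train/.dev/.test file into (tweet ID, content, label)."""
    # Split the ID off the front and the label off the back, so a stray tab
    # inside the tweet content would not shift the fields.
    tweet_id, rest = line.rstrip("\n").split("\t", 1)
    content, label = rest.rsplit("\t", 1)
    return tweet_id, content, label


def parse_graph_line(line: str) -> Tuple[str, Dict[str, float]]:
    """Split one line of a *_graph.txt file into (tweet ID, {userID: weight})."""
    tweet_id, _, rest = line.rstrip("\n").partition("\t")
    weights: Dict[str, float] = {}
    for pair in rest.split():
        # Assumption: the weight is the part after the last ':' and is numeric.
        user_id, _, weight = pair.rpartition(":")
        weights[user_id] = float(weight)
    return tweet_id, weights
```

For example, `parse_graph_line("123\tu1:0.5 u2:1")` returns the tweet ID `"123"` together with a dictionary mapping `"u1"` and `"u2"` to their weights.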

These datasets have been preprocessed for our experiments. The original datasets are available at https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0 (Twitter) and http://alt.qcri.org/~wgao/data/rumdect.zip (Weibo).

If you want to preprocess the datasets yourself, you can use the word2vec embeddings used in our work. The pretrained word2vec is available at https://drive.google.com/drive/folders/1IMOJCyolpYtoflEqQsj3jn5BYnaRhsiY?usp=sharing.

Reproduce the experimental results:

create an empty directory: checkpoint/
run script run.py
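
The two steps above could also be scripted. The helper names below are hypothetical, and the sketch assumes that run.py sits in the current working directory and needs no command-line arguments, which the README does not state.

```python
import os
import subprocess
import sys


def ensure_checkpoint_dir(base_dir: str = ".") -> str:
    """Create checkpoint/ under base_dir if it is missing and return its path."""
    path = os.path.join(base_dir, "checkpoint")
    os.makedirs(path, exist_ok=True)
    return path


def run_training(script: str = "run.py") -> None:
    """Launch the training script with the current Python interpreter."""
    # Assumption: the script takes no arguments.
    subprocess.run([sys.executable, script], check=True)
```

Calling `ensure_checkpoint_dir()` followed by `run_training()` from the repository root mirrors the two manual steps.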

Citation

If you find this code useful in your research, please cite our paper:

@inproceedings{rumor_yuan_2019,
  title={Jointly embedding the local and global relations of heterogeneous graph for rumor detection},
  author={Yuan, Chunyuan and Ma, Qianwen and Zhou, Wei and Han, Jizhong and Hu, Songlin},
  booktitle={The 19th IEEE International Conference on Data Mining},
  year={2019},
  organization={IEEE}
}

rumordetection's Issues

How to get the weight of a user

Are the user weights randomly initialized, or are they computed with a certain algorithm?
Thanks~

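How is the cutoff time for early detection set?

I am a newcomer who has just started with rumor detection, and I am confused about how the cutoff time for early detection is set. The earliest description of this concept I can find is in the 2015 paper "Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts", but many later papers that mention early detection experiments do not describe in detail how those experiments were carried out.
Any advice would be appreciated. TAT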

How to create the twitter15_graph.txt file

Hi !

I am trying to reproduce the creation of the dataset in dataset/twitter15 from the original twitter15 dataset, but I can't understand how to create the file twitter15_graph.txt.

Can you help with this, or explain how to create it?

PS: Nice paper and work. Congrats!

Thanks !!

Hello, could you share your dataset?

The link http://alt.qcri.org/~wgao/data/rumdect.zip no longer works.

The Weibo data cannot be obtained. Could you share the dataset?

Only Text

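Hello, I would like to ask how to run the model on the Weibo dataset using only the text content. Could you share the model code that does not require the graph?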

Edges between users or edges between tweets

Hi Chunyuan, the paper says, "we connect the user nodes if they participate in common microblogs and link the nodes of source tweets by their common users". When I check the Weibo dataset, the file 'weibo_graph.txt' does not contain links between two users or between two source tweets. It only contains 4664 records between tweets and users. The same holds for twitter15_graph.txt and twitter16_graph.txt.
So, can GLAN still work without links between two users or between two source tweets?
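
For reference, the sentence quoted from the paper can be read as deriving both kinds of edges from the tweet-user records alone. The sketch below shows that reading; it is an assumption about the intended construction, not code from this repository, and the function name `derive_edges` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def derive_edges(tweet_users: Dict[str, Iterable[str]]) -> Tuple[Set[Edge], Set[Edge]]:
    """Derive undirected user-user and tweet-tweet edges from tweet-user records."""
    user_tweets = defaultdict(set)
    user_edges: Set[Edge] = set()
    for tweet_id, users in tweet_users.items():
        unique_users = sorted(set(users))
        # Users that take part in the same source tweet are connected.
        user_edges.update(combinations(unique_users, 2))
        for user_id in unique_users:
            user_tweets[user_id].add(tweet_id)
    tweet_edges: Set[Edge] = set()
    for tweets in user_tweets.values():
        # Source tweets that share a user are connected.
        tweet_edges.update(combinations(sorted(tweets), 2))
    return user_edges, tweet_edges
```

Each edge is stored once as a sorted pair, so `("u1", "u2")` and `("u2", "u1")` are not both kept.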

about reproducing the paper

Hello, I encountered a problem when reproducing your paper. You mentioned that the pytorch version used is 1.1.0, but the torch_geometric used in your code does not support torch 1.1.0, so where is the problem?

paper equations vs code

Hi,

I have read your paper. It says that the conv features are first derived from the w2v embeddings, then multi-head attention is applied over them to arrive at the final message embeddings, and then GRE is applied to obtain another representation. In your code, however, the conv features are fed directly into classification.
Please clarify:

    conv_feature = torch.cat(conv_block, dim=1)
    features = self.dropout(conv_feature)

    a1 = self.relu(self.fc1(features))
    d1 = self.dropout(a1)

    output = self.fc2(d1)
    return output

where fc2 is defined as: self.fc2 = nn.Linear(300, config['num_classes'])

I am writing a survey paper and would like to include your results. Please verify the code, or share updated code that matches the paper.

1. I want to know how the r value in target:r in the *graph.txt files of the dataset is obtained. Is it a value normalized by time?
2. In the paper, you said "To build the connection inside between source microblog and retweet, we first use multi-head attention to refine the representation of every retweet", but I could not find the construction of this relationship in the code; only the source tweet is fed directly into the multi-head attention. So can I understand it this way: this is just the semantic relationship within the tweet?
3. In part C of the paper, you mention that user information u is introduced, but I did not find anything related to it in the source code. Can you provide details?