graphdetec / mgtab
A Multi-relational Graph-Based Twitter Account Detection Benchmark
Can you provide the scripts you used to produce the embeddings?
Hi,
Thank you for your effort on this project.
I would like to know how we can retrieve the column names (feature names) of the dataset, as it is provided as a torch tensor with only numerical values. Similarly, the accounts' human/bot labels are provided as a binary vector, without the account names or IDs.
Are the top 10 numerical and Boolean features by information gain the same for stance detection and bot detection?
Hello @GraphDetec,
First of all, thank you for your great work.
I want to ask about the code that preprocesses the raw data from the original authors into the format used by MGTAB.
For example, for Cresci2015, the original authors' website (http://mib.projects.iit.cnr.it/dataset.html) offers only raw data. But the data in your Google Drive (https://drive.google.com/uc?export=download&id=1AzMUNt70we5G2DShS8hk5qH95VR9HfD3) has a different format (some of the files are named cat_properties_tensor.pt, des_tensor.pt, ...).
Can you share the notebook code to preprocess the raw data from the original author into the format used by MGTAB?
It was a pleasure to read your paper. I have some questions about the dataset collection process:
How did you obtain the other accounts from the seed accounts?
Which specific online events were used?
What are the relationships between seed accounts and online events?
Thank you for offering this project for the stance detection community with social links.
However, I have some questions about the datasets. Could you help me resolve them?
As you said in the introduction section,
Stance detection aims at detecting the user’s stance on a topic or claim.
But in the datasets, I cannot find the labels for the topics/claims/events.
I understand that the datasets can be modeled as a node classification task on a heterogeneous graph.
When I load label_stances, I get labels of 0/1/2 (neutral/against/support). But I want to know the topic these labels refer to.
For example, if node 0 is labeled 1 and node 2 is labeled 1, do they have the same stance on the same topic?
Because the tweets are given with 768-d embeddings, it is hard to extract meaningful topics.
How can I get the topics/claims for the label of stance?
Hello! Is there a way to access the raw data of the tweets? I mean the text itself. This would be very helpful if I want to try different embeddings. Thanks!
Stance detection is generally used to detect the stance of a piece of text.
Is the stance label in your annotation assigned to all of a user's historical tweets taken together as a single text?
Why are there more "friends" than "followers"?
Hello,
1) You mentioned in the paper that you calculated the z-score of each feature. However, upon inspecting the dataset, I found that no feature has a value greater than one. To my knowledge, the z-score is calculated as:
z = (x-E(x)) / std(x)
Have you standardized the data using the above z-score, or normalized it by dividing each column's values by the maximum value?
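To make the distinction concrete, here is a minimal sketch of the two transformations and a quick check that tells them apart. It uses made-up follower counts rather than MGTAB data, and the helper names (`z_score`, `max_scale`, `looks_z_scored`) are hypothetical, not taken from the authors' code. A z-scored column has mean 0 and standard deviation 1 and can exceed 1, whereas a column divided by its maximum tops out at exactly 1.

```python
from statistics import mean, pstdev


def z_score(column):
    """Standardize a column: (x - mean) / population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def max_scale(column):
    """Normalize a column by dividing by its largest absolute value."""
    peak = max(abs(x) for x in column)
    if peak == 0:
        return [0.0 for _ in column]
    return [x / peak for x in column]


def looks_z_scored(column, tol=1e-6):
    """Heuristic: a z-scored column has mean ~0 and standard deviation ~1."""
    return abs(mean(column)) < tol and abs(pstdev(column) - 1.0) < tol


# Made-up follower counts, for illustration only (not MGTAB data).
followers = [10, 200, 35, 5000, 120]

z = z_score(followers)
m = max_scale(followers)

print("z-scored:   max =", max(z), "| looks z-scored:", looks_z_scored(z))
print("max-scaled: max =", max(m), "| looks z-scored:", looks_z_scored(m))
```

Running the same check column by column on the released feature tensor would indicate which of the two transformations, if either, matches the data.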
It would help reproducibility if you shared the user_name feature, or at least the user ID.
Several authors who released public datasets have shared the user IDs. I kindly ask you to share the account IDs or usernames with me privately via my email ([email protected]). If you really cannot share them, please provide the preprocessing code for the entire dataset (especially the graph features).
A related concern is which Twitter API endpoints I should use so that I can construct and preprocess data points identically to the dataset (especially the graph part). Sharing the code you used to go from the raw Twitter API data to this dataset would therefore be extremely helpful.
Thank you in advance.