/** Copyright (C) 2013 by SMU Text Mining Group/Singapore Management University/Peking University
Debate Dataset is distributed for research purpose, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
If you use this dataset, please cite the following paper:
Minghui Qiu, Liu Yang and Jing Jiang. Mining User Relations from Online Discussions using Sentiment Analysis and Probabilistic Matrix Factorization. In Proceedings of the 2013 Conference of North American Chapter of Association for Computational Linguistics: Human Language Technologies (NAACL 2013). (http://aclweb.org/anthology//N/N13/N13-1041.pdf) **/
-
These data sets are used in the paper: Minghui Qiu, Liu Yang and Jing Jiang. Mining User Relations from Online Discussions using Sentiment Analysis and Probabilistic Matrix Factorization. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Atlanta, GA, 2013.
-
Descriptions: Folder "sents":
- It contains all sentences for the threads.
- Each file is a thread, and each line is a post.
- Sentence format: each post is in the format "source target url post_id sentence".
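A minimal sketch of reading one thread file in Python. It assumes the five fields are separated by whitespace and that only the last field (the sentence) may contain spaces; the README does not state the delimiter, so check a real file before relying on this. The names SentRecord, parse_sent_line and read_thread are illustrative and are not part of the dataset.

```python
from typing import List, NamedTuple, Optional


class SentRecord(NamedTuple):
    """One line of a file in the "sents" folder (hypothetical helper type)."""
    source: str
    target: str
    url: str
    post_id: str
    sentence: str


def parse_sent_line(line: str) -> Optional[SentRecord]:
    """Split a line into "source target url post_id sentence".

    ASSUMPTION: fields are whitespace-separated and the first four
    contain no whitespace; everything after them is the sentence.
    Returns None for lines with fewer than five fields.
    """
    parts = line.strip().split(None, 4)
    if len(parts) < 5:
        return None
    return SentRecord(*parts)


def read_thread(path: str) -> List[SentRecord]:
    """Read every well-formed line of one thread file."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            record = parse_sent_line(line)
            if record is not None:
                records.append(record)
    return records
```

If the files turn out to be tab-separated, replacing split(None, 4) with split("\t", 4) should be the only change needed.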
-
Folder "labels": It contains user labels for each thread.
-
Acknowledgement:
Please note that the above data sets are from: Amjad Abu-Jbara, Pradeep Dasigi, Mona Diab, and Dragomir R. Radev. 2012. Subgroup detection in ideological discussions. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 399–409.
The original data sets contain:
- 117 Wikipedia discussions collected from www.wikipedia.org (directory: wikipedia)
- 30 Debates collected from www.createdebate.com (directory: createdeate)
- 12 Political discussions collected from www.politicalforum.com (directory: politicalforum)