
mrdl-and-mrdr's Introduction

Datasets for the multi-party dialogue discourse parsing task

We mine controversial topics on Reddit and propose two new datasets: Multi-party Reddit Dialogue Link (MRDL) and Multi-party Reddit Dialogue Relation (MRDR).

Why Reddit

Reddit forums exhibit two text characteristics that distinguish them from existing datasets:
(1) User utterances are usually long and logically complex.
(2) Related utterances are sometimes far apart in the dialogue because Reddit is asynchronous; for example, other users may post in between them.

Data analysis

MRDL consists of 28922 dialogues and 265078 utterances. Its labels indicate whether two utterances are related, and they are drawn from the actual reply structure in the forum. MRDR is a subset of MRDL that contains 15645 dialogues and 185823 utterances, with human-labeled reply relationships as labels. Our datasets provide a means to evaluate multi-party dialogue systems on argumentative multi-party dialogues, which existing datasets lack.
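As a rough illustration of how link labels can be derived from reply structure, here is a minimal Python sketch. The record layout (`id`, `user`, `reply_to`) and the function `link_labels` are hypothetical, since this README does not describe the actual file format; the sketch also assumes a binary label for every utterance pair and that utterance ids follow posting order.

```python
from itertools import combinations

# Hypothetical toy dialogue. Field names are assumptions, not the dataset schema.
# "reply_to" holds the id of the utterance being replied to (None for the first post).
dialogue = [
    {"id": 0, "user": "u1", "reply_to": None},
    {"id": 1, "user": "u2", "reply_to": 0},
    {"id": 2, "user": "u3", "reply_to": 0},
    {"id": 3, "user": "u1", "reply_to": 2},
]


def link_labels(utterances):
    """Return {(earlier_id, later_id): 0 or 1} for every pair of utterances.

    A pair is labeled 1 when the later utterance directly replies to the earlier one.
    """
    replies = {
        (u["reply_to"], u["id"]) for u in utterances if u["reply_to"] is not None
    }
    ids = [u["id"] for u in utterances]  # assumed to be in posting order
    return {(a, b): int((a, b) in replies) for a, b in combinations(ids, 2)}


labels = link_labels(dialogue)
for pair, label in sorted(labels.items()):
    print(pair, label)
```

In this toy dialogue, utterance 3 replies to utterance 2 rather than to the adjacent utterance in its own user's earlier thread, which mirrors the long-distance links described above.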
Our dataset naturally forms a graph with multiple node types, and the connections between these nodes carry important information. We randomly select a comment node, a dialogue node, and a user node, and take each together with its one-hop neighbors in the graph to construct subgraphs, which are visualized as follows.
The large red node at the center of each graph is the initially selected node for the subgraph. Comment nodes are orange, user nodes are blue, and dialogue nodes are grey.
(a) shows the comment subgraph: a comment is published by a user, has multiple reply comments, and belongs to a specific dialogue.

(b) shows the user subgraph, where a user makes comments, participates in multiple dialogues, and interacts with other users. The number of related users may exceed the number of comments, because when a user makes a high-value comment, multiple users may reply to it.
(c) shows the dialogue subgraph, which contains users and comments. The edges between comments represent the reply relationships and the edges between users represent interaction relationships.
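The subgraphs described in (a) to (c) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the record fields (`id`, `user`, `dialogue`, `reply_to`), the id prefixes, and the helpers `build_graph` and `one_hop` are hypothetical and are not taken from the dataset or its tooling.

```python
from collections import defaultdict

# Hypothetical comment records. Ids carry a type prefix (c/u/d) so that
# comment, user, and dialogue nodes can share one adjacency map.
comments = [
    {"id": "c0", "user": "u1", "dialogue": "d0", "reply_to": None},
    {"id": "c1", "user": "u2", "dialogue": "d0", "reply_to": "c0"},
    {"id": "c2", "user": "u3", "dialogue": "d0", "reply_to": "c0"},
]


def build_graph(comments):
    """Build an undirected adjacency map over comment, user, and dialogue nodes."""
    adj = defaultdict(set)
    author = {c["id"]: c["user"] for c in comments}

    def connect(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for c in comments:
        connect(c["id"], c["user"])        # a user publishes a comment
        connect(c["id"], c["dialogue"])    # a comment belongs to a dialogue
        connect(c["user"], c["dialogue"])  # a user participates in a dialogue
        if c["reply_to"] is not None:
            connect(c["id"], c["reply_to"])  # reply edge between comments
            parent_author = author[c["reply_to"]]
            if parent_author != c["user"]:
                connect(c["user"], parent_author)  # interaction edge between users
    return adj


def one_hop(adj, center):
    """Return the center node together with its direct neighbors."""
    return {center} | adj[center]


graph = build_graph(comments)
print(sorted(one_hop(graph, "c0")))  # comment subgraph, as in (a)
print(sorted(one_hop(graph, "u1")))  # user subgraph, as in (b)
print(sorted(one_hop(graph, "d0")))  # dialogue subgraph, as in (c)
```

In this toy example, user `u1` wrote a single comment but is connected to two other users, which is the situation noted in (b) where the related user count exceeds the comment count.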

mrdl-and-mrdr's People

Contributors

ai0research
