
corwa's People

Contributors

jacklxc, mandalbiswadip

Forkers

dankoan

corwa's Issues

Dataset cited paper in acl/pdf_parses.jsonl

I followed your instructions to extract the pdf_parses for ACL papers from the S2ORC dataset and obtained "20200705v1/acl/pdf_parses.jsonl". However, not all cited and citing papers in your dataset can be found in this file: for example, some of the IDs listed in the test set CORWA_test.jsonl, for both cited and citing papers, do not appear in acl/pdf_parses.jsonl. The LED generation step takes the abstract/introduction of both the cited and the citing paper as input, so, if I understand correctly, the abstract and introduction of every citing and cited paper in the test set must be extracted from some pdf_parses file. Should I just scan through the entire "20200705v1/full/pdf_parses/*" to obtain this information? Thank you very much in advance!
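For reference, the full scan I have in mind would look roughly like the sketch below. It assumes each line of a pdf_parses shard is a JSON object whose ID is stored under a `paper_id` key and that shards may be gzip-compressed; `collect_parses` and `open_text` are names I made up for illustration, not part of the CORWA code.

```python
import gzip
import json
from pathlib import Path


def open_text(path):
    """Open a .jsonl or .jsonl.gz file for reading as UTF-8 text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def collect_parses(shard_paths, wanted_ids, id_field="paper_id"):
    """Return {paper_id: record} for every wanted ID found in the shards.

    Assumption: each non-empty line is one JSON object and the paper ID
    lives under `id_field`.
    """
    wanted = {str(i) for i in wanted_ids}
    found = {}
    for shard in shard_paths:
        with open_text(shard) as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                pid = str(record.get(id_field))
                if pid in wanted:
                    found[pid] = record
        if len(found) == len(wanted):
            break  # every ID located; skip the remaining shards
    return found
```

Calling it with `sorted(Path("20200705v1/full/pdf_parses").glob("*"))` and the IDs collected from CORWA_test.jsonl, then taking the set difference between the wanted IDs and the keys of the result, would show which papers are still missing after the full scan.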

Confusion regarding citation span detection

Hello, and first of all, thanks for the nice dataset. One part of your paper that caught my attention is Section 3.1.2, Citation Span Detection, where you define a citation span as "the span of text whose information is directly derived from a specific cited paper". As I understand it, the annotation protocol relevant to this section is as follows:

  • If the cited paper is explained, then the annotators were to label the explanation within the citing paper. The explanation may be only part of a sentence or go across sentence boundaries.
  • If the cited paper is not explained, then the annotators were to label the citation mark for the cited paper.

Here's an example of what I think counts as the first case:

  • data/annotated_train/10011032.txt, Line 37: [BOS] Zhang and Clark (2008) proposed an incremental joint segmentation and POS tagging model, with an effective feature set for Chinese.
  • data/annotated_train/10011032.ann, Line 54: T56 Dominant 4577 4599 Zhang and Clark (2008)

I expected the annotated span to be "incremental joint segmentation and POS tagging model, with an effective feature set for Chinese" instead. Did I misunderstand the annotation protocol? Looking forward to your response.
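To rule out an offset problem on my side, the span in the .ann line can be checked against the .txt file with a small sketch like the one below. It assumes the .ann file uses brat-style text-bound lines (ID, label, start offset, end offset, covered text) with a single contiguous span; `parse_text_bound` and `span_matches` are hypothetical helper names, not functions from the CORWA repository.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(line):
    """Parse one line such as 'T56 Dominant 4577 4599 Zhang and Clark (2008)'.

    Assumption: a single contiguous span; discontinuous spans written
    with ';' between offset pairs are not handled.
    """
    ann_id, label, start, end, text = line.strip().split(None, 4)
    return TextBound(ann_id, label, int(start), int(end), text)


def span_matches(document, ann):
    """True if the document slice [start, end) equals the annotated text."""
    return document[ann.start:ann.end] == ann.text
```

For the line quoted above, the end offset minus the start offset is 22, which equals the length of "Zhang and Clark (2008)", so the annotation really does cover only the citation mark and not the explanation that follows it.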

Example of `related_work.jsonl` file

I am trying to run your model on my own dataset and was wondering if it would be possible for you to share your related_work.jsonl file, to use as a reference for the data structure.

Thanks!
