rst-converter-service's Introduction

rst-converter-service

Docker hub: nlpbox/rst-converter-service

REST API to convert between different Rhetorical Structure Theory file formats. It is built on top of the discoursegraphs library.

Supported Input Formats

original HILDA format

(Contrast[S][N]
  _!Although they did n't like it ,!_
  _!they accepted the offer . <P>!_)

adapted HILDA format

ParseTree('Contrast[S][N]', ["Although they did n't like it ,", 'they accepted the offer .'])
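To make the relation between the two notations concrete, the following sketch pulls the EDU strings out of the original HILDA notation with a regular expression. It is an illustration only, not the parser used by discoursegraphs; the function name `extract_edus` is hypothetical, and treating `<P>` as a paragraph marker that can be stripped is an assumption.

```python
import re

# EDUs in the original HILDA notation are wrapped in "_!" ... "!_".
EDU_PATTERN = re.compile(r"_!(.*?)!_", re.DOTALL)


def extract_edus(hilda_text):
    """Return the EDU strings of an original-format HILDA tree, in order."""
    edus = []
    for raw in EDU_PATTERN.findall(hilda_text):
        # Assumption: a trailing "<P>" only marks a paragraph boundary.
        edus.append(raw.replace("<P>", "").strip())
    return edus


example = """(Contrast[S][N]
  _!Although they did n't like it ,!_
  _!they accepted the offer . <P>!_)"""

print(extract_edus(example))
```

The resulting list holds the same two strings that appear as leaves in the adapted notation above.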

Supported Output Formats

  • dis
  • rs3
  • rstlatex (for embedding RST trees into LaTeX documents)
  • tree.prettyprint (ASCII-style tree)
  • svgtree (SVG image of an nltk Tree)
  • svgtree-base64 (base64 encoded SVG image of an nltk Tree)
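For the svgtree-base64 format, a client has to decode the payload before it can be saved or displayed. A minimal sketch using only the standard library follows; it assumes the service returns a plain base64 string whose decoded content is UTF-8 SVG markup, and `decode_svg` is a hypothetical helper name.

```python
import base64


def decode_svg(b64_payload):
    """Decode a base64 string into SVG markup (assumed to be UTF-8 text)."""
    return base64.b64decode(b64_payload).decode("utf-8")


# Stand-in payload; a real one would come from the service's response body.
sample_svg = '<svg xmlns="http://www.w3.org/2000/svg"></svg>'
payload = base64.b64encode(sample_svg.encode("utf-8")).decode("ascii")

print(decode_svg(payload))
```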

Installation

The simplest way to install the rst-converter-service is using Docker:

git clone https://github.com/nlpbox/rst-converter-service.git
cd rst-converter-service/
docker build -t rst-converter-service .

Usage

To run the web service, type:

docker run -p 5000:5000 -ti rst-converter-service

In another terminal, you can now convert RST files. To convert the file car-repair.rs3 from rs3 format to dis format, type:

curl -XPOST localhost:5000/convert/rs3/dis -F input=@car-repair.rs3
(Root
  (span 1 2)
  (Satellite
    (leaf 1)
    (rel2par background)
    (text
      _!I am having my car repaired in Santa Monica (1522 Lincoln Blvd.) this Thursday 19th._!))
  (Nucleus
    (leaf 2)
    (rel2par span)
    (text
      _!Would anyone be able to bring me to ISI from there by 5 pm please?_!)))
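The same request can be sent from Python without third-party packages. The sketch below builds the multipart/form-data body by hand and posts it with urllib. The form field name `input`, the helper names `build_multipart` and `convert`, and the default base URL are assumptions for illustration; adjust them to match your setup.

```python
import os
import uuid
import urllib.request


def build_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def convert(path, input_format, output_format,
            base_url="http://localhost:5000", field="input"):
    """POST a file to /convert/<input_format>/<output_format>, return the text."""
    with open(path, "rb") as handle:
        body, content_type = build_multipart(
            field, os.path.basename(path), handle.read())
    request = urllib.request.Request(
        f"{base_url}/convert/{input_format}/{output_format}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# With the container running, this would mirror the curl call above:
# print(convert("car-repair.rs3", "rs3", "dis"))
```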

To visualize the RST tree, you can request a pretty-printed tree:

curl -XPOST localhost:5000/convert/rs3/tree.prettyprint -F input=@car-repair.rs3
                 background
        _____________|______________
       S                            N
       |                            |
I am having my               Would anyone be
car repaired in             able to bring me
 Santa Monica (                to ISI from
 1522 Lincoln                 there by 5 pm
  Blvd.) this                    please?
 Thursday 19th.

To see all supported input and output formats, type:

curl localhost:5000/input-formats
["codra", "dis", "dplp", "hilda", "hs2015", "rs3"]

or

curl localhost:5000/output-formats
["dis", "rs3", "rstlatex", "svgtree", "svgtree-base64", "tree.prettyprint"]
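A client can use these two lists to reject an unsupported conversion before uploading anything. The following sketch shows one way to do that; `fetch_formats` and `check_conversion` are hypothetical helper names, and the base URL is the one used in the examples above.

```python
import json
import urllib.request


def fetch_formats(kind, base_url="http://localhost:5000"):
    """Return the JSON list served at /input-formats or /output-formats."""
    with urllib.request.urlopen(f"{base_url}/{kind}-formats") as response:
        return json.loads(response.read().decode("utf-8"))


def check_conversion(input_format, output_format, input_formats, output_formats):
    """Return the /convert path, or raise ValueError for an unknown format."""
    if input_format not in input_formats:
        raise ValueError(f"unsupported input format: {input_format}")
    if output_format not in output_formats:
        raise ValueError(f"unsupported output format: {output_format}")
    return f"/convert/{input_format}/{output_format}"


# The lists printed by the two curl commands above.
INPUT_FORMATS = ["codra", "dis", "dplp", "hilda", "hs2015", "rs3"]
OUTPUT_FORMATS = ["dis", "rs3", "rstlatex", "svgtree", "svgtree-base64",
                  "tree.prettyprint"]

print(check_conversion("rs3", "dis", INPUT_FORMATS, OUTPUT_FORMATS))
```

With a running service, the hard-coded lists could be replaced by `fetch_formats("input")` and `fetch_formats("output")`.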

Citation

If you use the rst-converter-service in your academic work, please cite the following paper:

Neumann, A. 2015. discoursegraphs: A graph-based merging tool and converter for multilayer annotated corpora. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), pp. 309-312.

 @inproceedings{neumann2015discoursegraphs,
   title={discoursegraphs: A graph-based merging tool and converter for multilayer annotated corpora},
   author={Neumann, Arne},
   booktitle={Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)},
   pages={309-312},
   year={2015}
 }

rst-converter-service's People

Contributors

arne-cl

rst-converter-service's Issues

MemoryError: Unable to parse large HILDA output files

The rst-converter-service is able to convert many of the HILDA output files generated with feng-hirst-rst-parser, but it runs out of memory on some of them. This is what I see:

{"error":"<class 'discoursegraphs.readwrite.rst.hilda.HILDARSTTree'> can't handle input file 'A02.txt'. Got: ","traceback":"Traceback (most recent call last):\n File \"app.py\", line 111, in post\n tree = read_function(temp_inputfile.name)\n File \"/usr/lib/python2.7/site-packages/discoursegraphs/readwrite/rst/hilda.py\", line 33, in __init__\n self.hildafile_tree = self.hildastr2hildatree(hilda_str)\n File \"/usr/lib/python2.7/site-packages/discoursegraphs/readwrite/rst/hilda.py\", line 56, in hildastr2hildatree\n return eval(parented_tree_str)\nMemoryError\n"}
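The traceback ends in an eval() call on the tree string. One possible cause, not confirmed in this report, is very deep nesting in the HILDA output, since CPython 2.7 can raise MemoryError when its parser stack overflows on deeply nested expressions. The sketch below shows a possible workaround: it reads the adapted notation with an explicit stack instead of eval(). It is not the discoursegraphs implementation; `parse_adapted_hilda` and the nested (label, children) tuple result are assumptions made for illustration.

```python
import ast
import re

# Tokens of the adapted HILDA notation: "ParseTree(", quoted strings, brackets.
TOKEN = re.compile(
    r"""\s*(?:
        (?P<open>ParseTree\()
      | (?P<string>'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")
      | (?P<punct>[\[\],)])
    )""",
    re.VERBOSE,
)


def parse_adapted_hilda(text):
    """Parse ParseTree('label', [children...]) into (label, children) tuples.

    An explicit stack replaces eval(), so nesting depth is not limited by
    the interpreter's parser stack or by recursion.
    """
    text = text.strip()
    stack, root, pos = [], None, 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None or (not stack and not match.group("open")):
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        if match.group("open"):
            stack.append([None, []])          # new node, label still unknown
        elif match.group("string"):
            value = ast.literal_eval(match.group("string"))
            if stack[-1][0] is None:
                stack[-1][0] = value          # first string is the node label
            else:
                stack[-1][1].append(value)    # later strings are EDU leaves
        elif match.group("punct") == ")":
            label, children = stack.pop()
            node = (label, children)
            if stack:
                stack[-1][1].append(node)
            else:
                root = node
        # "[", "]" and "," only separate items and carry no structure here.
    if stack or root is None:
        raise ValueError("unbalanced ParseTree expression")
    return root


sample = ("ParseTree('Contrast[S][N]', [\"Although they did n't like it ,\", "
          "'they accepted the offer .'])")
print(parse_adapted_hilda(sample))
```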

dplp->rs3 can't handle Desai and Moldovan (2021) example

Input text:

On the surface, the overall unemployment rate is expected to be little changed from September's 5.3%.
But the actual head count of non-farm employment payroll jobs is likely to be muddied by the impact of Hurricane Hugo, strikes, and less-than-perfect seasonal adjustments, economists said.

DPLP output:

0 1 On on IN case 3 O (ROOT (S (PP (IN On) 1
0 2 the the DT det 3 O (NP (DT the) 1
0 3 surface, surface, NN nmod 44 O (NN surface,))) 1
0 4 the the DT det 7 O (NP (NP (DT the) 1
0 5 overall overall JJ amod 7 O (JJ overall) 1
0 6 unemployment unemployment NN compound 7 O (NN unemployment) 1
0 7 rate rate NN nsubj 44 O (NN rate)) 1
0 8 is be VBZ auxpass 9 O (VP (VBZ is) 1
0 9 expected expect VBN dep 7 O (VP (VBN expected) 1
0 10 to to TO mark 13 O (S (VP (TO to) 1
0 11 be be VB aux 13 O (VP (VB be) 1
0 12 little little RB advmod 13 O (VP (VP (ADVP (RB little)) 1
0 13 changed change VBN xcomp 9 O (VBN changed) 1
0 14 from from IN case 15 O (PP (IN from) 1
0 15 September's september' NNS nmod 13 O (NP (NP (NNS September's)) 1
0 16 5.3%. 5.3%. CD nummod 15 NUMBER (CD 5.3%.)))) 1
0 17 But but CC cc 28 O (SBAR (S (CC But) 1
0 18 the the DT det 21 O (NP (NP (DT the) 1
0 19 actual actual JJ amod 21 O (JJ actual) 1
0 20 head head NN compound 21 O (NN head) 1
0 21 count count NN nsubj 28 TITLE (NN count)) 1
0 22 of of IN case 26 O (PP (IN of) 1
0 23 non-farm non-farm JJ amod 26 O (NP (JJ non-farm) 1
0 24 employment employment NN compound 26 O (NN employment) 1
0 25 payroll payroll NN compound 26 O (NN payroll) 1
0 26 jobs job NNS nmod 21 O (NNS jobs)))) 1
0 27 is be VBZ cop 28 O (VP (VBZ is) 1
0 28 likely likely JJ ccomp 13 O (ADJP (JJ likely) 1
0 29 to to TO mark 31 O (S (VP (TO to) 1
0 30 be be VB auxpass 31 O (VP (VB be) 1
0 31 muddied muddy VBN xcomp 28 O (VP (VBN muddied) 1
0 32 by by IN case 34 O (PP (IN by) 1
0 33 the the DT det 34 O (NP (NP (DT the) 1
0 34 impact impact NN nmod 31 O (NN impact)) 1
0 35 of of IN case 38 O (PP (IN of) 1
0 36 Hurricane Hurricane NNP compound 38 CAUSE_OF_DEATH (NP (NP (NNP Hurricane) 1
0 37 Hugo, Hugo, NNP compound 38 O (NNP Hugo,) 1
0 38 strikes, strikes, NN nmod 34 O (NN strikes,)) 1
0 39 and and CC cc 38 O (CC and) 1
0 40 less-than-perfect less-than-perfect JJ amod 43 O (NP (JJ less-than-perfect) 1
0 41 seasonal seasonal JJ amod 43 O (JJ seasonal) 1
0 42 adjustments, adjustments, NN compound 43 O (NN adjustments,) 1
0 43 economists economist NNS conj 38 O (NNS economists))))))))))))))))))))) 1
0 44 said. said. VBP root 0 O (VP (VBP said.)))) 1

ParentedTree('EDU', ['1'])

rst-converter-service error:

Error: 500: INTERNAL SERVER ERROR
{"error":"<class 'discoursegraphs.readwrite.rst.dplp.DPLPRSTTree'> can't handle input file 'input.ext'. Got: The tree position () may not be assigned to.","traceback":"Traceback (most recent call last):\n File \"app.py\", line 113, in post\n tree = read_function(temp_inputfile.name)\n File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/dplp.py\", line 35, in __init__\n self.add_edus()\n File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/dplp.py\", line 91, in add_edus\n self.parsetree[parent_pos] = u\" \".join(edu_tokens)\n File \"/usr/lib/python2.7/site-packages/nltk/tree.py\", line 172, in __setitem__\n raise IndexError('The tree position () may not be '\nIndexError: The tree position () may not be assigned to.\n"}

It seems that DPLP only found one EDU and the dplp->rs3 converter can't handle that.
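The IndexError arises because the empty tree position () addresses the root itself, which has no parent slot to assign into. A converter could treat that position as a special case. The sketch below illustrates the idea with plain nested lists rather than nltk trees; `replace_at` is a hypothetical helper, not part of discoursegraphs or nltk.

```python
def replace_at(tree, position, value):
    """Return the tree with the node at `position` replaced by `value`.

    `position` is a tuple of child indices; the empty tuple means the root.
    """
    if position == ():
        # Single-EDU case: the root has no parent to index into,
        # so the replacement becomes the whole tree.
        return value
    parent = tree
    for index in position[:-1]:
        parent = parent[index]
    parent[position[-1]] = value
    return tree


print(replace_at(["a", ["b", "c"]], (1, 0), "x"))
print(replace_at(["1"], (), "the only EDU"))
```

Because the root case returns a new object instead of mutating in place, callers would need to use the return value in both cases.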

Error importing Werkzeug

Just in case someone else runs into this error when running the API:

Pin Werkzeug in the Dockerfile: Werkzeug==0.16.1

Flask broken keyword argument

Flask renamed the attachment_filename keyword argument of send_file() to download_name. This change breaks the code in src/rstconverter/app.py.
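One way to stay compatible with both old and new Flask releases is to choose the keyword at call time. The sketch below assumes that `download_name` is available from Flask 2.0 onwards and that older releases expect `attachment_filename`; `send_file_name_kwarg` is a hypothetical helper, and this is not how app.py currently handles it.

```python
def send_file_name_kwarg(flask_version, filename):
    """Return the filename keyword argument suited to the given Flask version."""
    major = int(flask_version.split(".")[0])
    key = "download_name" if major >= 2 else "attachment_filename"
    return {key: filename}


print(send_file_name_kwarg("1.1.4", "tree.svg"))  # {'attachment_filename': 'tree.svg'}
print(send_file_name_kwarg("2.0.0", "tree.svg"))  # {'download_name': 'tree.svg'}
```

The version string could come from `importlib.metadata.version("flask")`, and the returned dict could be unpacked into the call, for example `send_file(path, **send_file_name_kwarg(version, "tree.svg"))`.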

Can't handle rs3 files generated by isanlp_rst

hate.rs3.txt

The isanlp_rst parser produces rs3 files that can be read by RSTTool,
but it crashes rst-converter-service / discoursegraphs, as well as rstWeb (at least the version I integrated into rst-workbench).

arne@t470:~/repos/isanlp_rst/examples$ curl -XPOST localhost:5000/convert/rs3/tree.prettyprint -F input=@hate.rs3
{"error":"<class 'discoursegraphs.readwrite.rst.rs3.rs3tree.RSTTree'> can't handle input file 'hate.rs3'. Got: 'satellite'","traceback":"Traceback (most recent call last):\n  File \"app.py\", line 112, in post\n    tree = read_function(temp_inputfile.name)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 57, in __init__\n    self.tree = self.dt()\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 117, in dt\n    return self.root2tree(start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 140, in root2tree\n    return self.dt(start_node=root_nodes[0])\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 240, in group2tree\n    sat_id = children['satellite']\nKeyError: 'satellite'\n"}
arne@t470:~/repos/isanlp_rst/examples$ curl -XPOST localhost:5000/convert/rs3/dis -F input=@hate.rs3
{"error":"<class 'discoursegraphs.readwrite.rst.rs3.rs3tree.RSTTree'> can't handle input file 'hate.rs3'. Got: 'satellite'","traceback":"Traceback (most recent call last):\n  File \"app.py\", line 112, in post\n    tree = read_function(temp_inputfile.name)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 57, in __init__\n    self.tree = self.dt()\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 117, in dt\n    return self.root2tree(start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 140, in root2tree\n    return self.dt(start_node=root_nodes[0])\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 240, in group2tree\n    sat_id = children['satellite']\nKeyError: 'satellite'\n"}

[image: rsttool_hate]

xvfb-run: error: Xvfb failed to start

When starting rst-converter-service as part of a docker-compose setup (e.g. in the rst-workbench), it often crashes like this:

arne@tiny-brick ~/repos/rst/rst-workbench $ docker-compose up
Starting rstworkbench_rst-converter-service_1       ... done
Starting rstworkbench_rstweb-service_1              ... done
Starting rstworkbench_rst-workbench-frontend_1      ... done
Recreating rstworkbench_rst-workbench-mock-parser_1 ... done
Starting rstworkbench_hilda-service_1               ... done
Attaching to rstworkbench_rstweb-service_1, rstworkbench_rst-converter-service_1, rstworkbench_hilda-service_1, rstworkbench_rst-workbench-frontend_1, rstworkbench_rst-workbench-mock-parser_1
rstweb-service_1             | [05/Dec/2019:12:08:45] ENGINE Bus STARTING
rstweb-service_1             | [05/Dec/2019:12:08:45] ENGINE Started monitor thread 'Autoreloader'.
rstweb-service_1             | [05/Dec/2019:12:08:45] ENGINE Serving on http://0.0.0.0:8080
rstweb-service_1             | [05/Dec/2019:12:08:45] ENGINE Bus STARTED
rst-workbench-mock-parser_1  |  * Serving Flask app "app" (lazy loading)
rst-workbench-mock-parser_1  |  * Environment: production
rst-workbench-mock-parser_1  |    WARNING: This is a development server. Do not use it in a production deployment.
rst-workbench-mock-parser_1  |    Use a production WSGI server instead.
rst-workbench-mock-parser_1  |  * Debug mode: off
rst-workbench-mock-parser_1  |  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
rst-converter-service_1      | xvfb-run: error: Xvfb failed to start
rstworkbench_rst-converter-service_1 exited with code 1

We can work around this by running the rst-workbench as docker-compose up --build --force-recreate, but a proper fix would be better.

Could not use the service - Error

Hi
I could not use the service. I got the following error, which I guess is caused by the Python version:

  File "app.py", line 21, in <module>
    import discoursegraphs as dg
  File "/usr/lib/python2.7/site-packages/discoursegraphs/__init__.py", line 23, in <module>
    from discoursegraphs.readwrite import (
  File "/usr/lib/python2.7/site-packages/discoursegraphs/readwrite/__init__.py", line 20, in <module>
    from discoursegraphs.readwrite.freqt import docgraph2freqt, write_freqt
  File "/usr/lib/python2.7/site-packages/discoursegraphs/readwrite/freqt.py", line 14, in <module>
    from discoursegraphs.readwrite.tree import sorted_bfs_successors
  File "/usr/lib/python2.7/site-packages/discoursegraphs/readwrite/tree.py", line 13, in <module>
    from nltk.tree import Tree, ParentedTree
  File "/usr/lib/python2.7/site-packages/nltk/__init__.py", line 128, in <module>
    from nltk.collocations import *
  File "/usr/lib/python2.7/site-packages/nltk/collocations.py", line 35, in <module>
    from nltk.probability import FreqDist
  File "/usr/lib/python2.7/site-packages/nltk/probability.py", line 333
    print("%*s" % (width, samples[i]), end=" ")

rs3->dis/svg conversion can't handle Szeryng example text

feng-hirst-2014-result.rs3.txt

curl -XPOST localhost:9150/convert/rs3/dis -F input=@feng-hirst-2014-result.rs3.txt
{"error":"<class 'discoursegraphs.readwrite.rst.rs3.rs3tree.RSTTree'> can't handle input file 'feng-hirst-2014-result.rs3.txt'. Got: ","traceback":"Traceback (most recent call last):\n  File \"app.py\", line 113, in post\n    tree = read_function(temp_inputfile.name)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 57, in __init__\n    self.tree = self.dt()\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 117, in dt\n    return self.root2tree(start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 140, in root2tree\n    return self.dt(start_node=root_nodes[0])\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 231, in group2tree\n    return self.dt(start_node=child_id)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 245, in group2tree\n    sat_subtree = self.dt(start_node=sat_id)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 174, in group2tree\n    subtree = self.dt(start_node=subtree_id)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 247, in group2tree\n    nuc_subtree = self.dt(start_node=children['nucleus'])\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 245, in group2tree\n    sat_subtree = self.dt(start_node=sat_id)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 174, in group2tree\n    subtree = self.dt(start_node=subtree_id)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 217, in group2tree\n    for child_id in other_child_ids]\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 174, in group2tree\n    subtree = self.dt(start_node=subtree_id)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 245, in group2tree\n    sat_subtree = self.dt(start_node=sat_id)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 178, in group2tree\n    for c in self.child_dict[elem_id]]\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 174, in group2tree\n    subtree = self.dt(start_node=subtree_id)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 134, in dt\n    elem_id, elem, elem_type, start_node=start_node)\n  File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py\", line 266, in group2tree\n    assert len(children['nucleus']) == 1\nAssertionError\n"}
