languagetoolr's Introduction

Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

LanguageToolR

LanguageToolR provides a wrapper for the LanguageTool CLI tool for spelling, grammar and language checking.

โ— We're not part of the LanguageTool team. This is an unofficial interface.

We have only tested with LanguageTool 5.9, but it may well work with other versions.

Installation

  1. Install this package via remotes
if (!require(remotes)) install.packages("remotes")
remotes::install_github("nevrome/LanguageToolR")
  2. Install LanguageTool for your system. You can do this with the following setup function, directly from the package sources for your OS, or manually by following the instructions here: https://github.com/languagetool-org/languagetool
LanguageToolR::lato_quick_setup()

Use case

testtext <- c(
  "LanguageTool offers spell and grammar checking.", 
  "Just paste your text here and click the 'Check Text' button.", 
  "Click the colored phrases for details on potential errors.", 
  "or use this text too see an few of of the problems that LanguageTool can detecd.", 
  "What do you thinks of grammar checkers? Please not that they are not perfect.", 
  "Style issues get a blue marker: It's 5 P.M. in the afternoon.", 
  "The weather was nice on Thursday, 27 June 2017."
)

LanguageToolR::languagetool(testtext)

languagetoolr's People

Contributors

gegznav, jmaspons, nevrome

languagetoolr's Issues

Use unit tests

I think at least a few unit tests should be added to check the basic functionality of the package.

The easiest way to set up testing is to use the functions usethis::use_testthat() and usethis::use_test().
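
As a rough starting point, a first test file might look like the sketch below. It assumes that testthat is installed; the test is skipped when LanguageToolR or Java is missing, and it will fail if LanguageTool itself has not been set up. The only package function used is languagetool(), as shown in the README. The single expectation, that the call finishes without an error, is an assumption about what a minimal smoke test should check.

```r
library(testthat)

test_that("languagetool() runs on a short text without error", {
  # Skip when the package under test is not installed.
  skip_if_not_installed("LanguageToolR")
  # Skip when Java is not on the PATH (LanguageTool needs it).
  skip_if(Sys.which("java") == "", "java not found on PATH")

  text <- "This is an sentence with a error."
  # expect_error(..., NA) asserts that no error is raised.
  expect_error(LanguageToolR::languagetool(text), NA)
})
```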

Not working with 3.4.4

While running devtools::install_github("nevrome/LanguageToolR"), the following error appears:
ERROR: this R is version 3.4.4, package 'LanguageToolR' requires R >= 3.5.0
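
The requirement named in the error message can be checked before installing. A minimal sketch using only base R; the threshold 3.5.0 is taken from the error above, and the messages are illustrative:

```r
# Minimum R version named in the error message above.
required <- "3.5.0"

# getRversion() returns the running R version as a comparable object.
current <- getRversion()
too_old <- current < required

if (too_old) {
  message("R ", as.character(current), " is too old; LanguageToolR requires R >= ", required)
} else {
  message("R ", as.character(current), " satisfies the requirement R >= ", required)
}
```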

Avoid function name clash (create a system and rename functions)

Some functions/objects share the same name, e.g., base::version and LanguageToolR::version(). I think this kind of clash should be avoided. So I suggest prefixing all helper function names with, e.g., lng_ (or a more appropriate prefix), and renaming them to, e.g., lng_get_version(), lng_test_setup(), lng_get_languages() or lng_list_languages().

@nevrome What is your opinion on this?

Errors related to parsing of JSON output of languagetool() on Windows

Errors related to parsing of JSON output of languagetool() on Windows:

LanguageToolR::languagetool(LanguageToolR::test_text)
#> Warning in system(command, intern = TRUE, ignore.stderr = quiet):
#> running command 'java -jar "D:/Dokumentai/LanguageTool-4.6/languagetool-
#> commandline.jar" --encoding utf-8 --language en-GB --json "C:/Users/User/
#> AppData/Local/Temp/Rtmp8WL9BC/file18b8754a4c9a"' had status 1
#> Error in rjson::fromJSON(output_json): CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

Created on 2019-08-09 by the reprex package (v0.3.0)
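
The rjson error appears to be a follow-up of the first problem: the Java call exits with status 1, so the parser does not receive valid input. A minimal sketch of a guard that could sit between the system call and the JSON parser is shown below. The helper name check_command_output() is hypothetical and not part of LanguageToolR. It relies only on the documented behaviour of system(intern = TRUE), which attaches a "status" attribute to its result when the command exits with a non-zero status.

```r
# Hypothetical helper: validate the result of system(..., intern = TRUE)
# before handing it to a JSON parser.
check_command_output <- function(output) {
  status <- attr(output, "status")
  # system(intern = TRUE) sets "status" only for a non-zero exit code.
  if (!is.null(status) && status != 0) {
    stop("LanguageTool exited with status ", status,
         "; check the Java installation and the path to the .jar file.",
         call. = FALSE)
  }
  if (length(output) == 0) {
    stop("LanguageTool returned no output to parse.", call. = FALSE)
  }
  # Collapse the captured lines into one string for the parser.
  paste(output, collapse = "\n")
}

# Simulated result of a failed call, similar to the report above.
failed <- structure(character(0), status = 1L)
result <- tryCatch(check_command_output(failed), error = conditionMessage)
print(result)
```

With such a guard, the user would see the exit status instead of the less informative CHAR() message.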

Let's use glue::glue() to construct strings

The glue package provides an elegant way to construct strings. E.g.:

lang_tool_version <- 4.6
glue::glue("Current version: {lang_tool_version}")

results in:

#> Current version: 4.6

Could glue be used internally in this package to construct strings of commands, file names, etc. as a replacement for, e.g., paste()? @nevrome What do you think?
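
For illustration, a command line like the ones quoted in the issue reports on this page could be assembled with glue roughly as follows. This is a sketch, not the package's actual code: the variable names and file paths are made up, and only the flags (--encoding, --language, --json) are taken from the commands shown in the issues. It assumes the glue package is installed.

```r
library(glue)

# Made-up example values; adjust to the local installation.
jar_path   <- "C:/LanguageTool/languagetool-commandline.jar"
language   <- "en-GB"
input_file <- "C:/Temp/input.txt"

# glue() interpolates the variables named inside the curly braces.
command <- glue('java -jar "{jar_path}" --encoding utf-8 --language {language} --json "{input_file}"')
print(command)

# The base R equivalent with paste0(), for comparison.
command_base <- paste0(
  'java -jar "', jar_path, '" --encoding utf-8 --language ', language,
  ' --json "', input_file, '"'
)
```

Both variants produce the same string; the glue version keeps the template readable as a single piece of text.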

Check if the correct version of JAVA is used

On my PC, the following command results in a JAVA exception/error if 32-bit JAVA is used.

java -jar "D:/Dokumentai/LanguageTool-4.6/languagetool-commandline.jar" --encoding utf-8 --language en-GB --json "inst/test/test_text.txt"
(The output)
Warning: At the moment, your platform (Windows) is not supported by the official XGBoost maven package; ML-based suggestion reordering is disabled.
Expected text language: English (GB)
Working on inst/test/test_text.txt...
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError
        at org.languagetool.MultiThreadedJLanguageTool.analyzeSentences(MultiThreadedJLanguageTool.java:153)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:738)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:716)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:699)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:658)
        at org.languagetool.JLanguageTool.check(JLanguageTool.java:642)
        at org.languagetool.commandline.CommandLineTools.checkText(CommandLineTools.java:106)
        at org.languagetool.commandline.CommandLineTools.checkText(CommandLineTools.java:82)
        at org.languagetool.commandline.Main.runOnFile(Main.java:194)
        at org.languagetool.commandline.Main.main(Main.java:466)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError
        at java.util.concurrent.ForkJoinTask.get(Unknown Source)
        at org.languagetool.MultiThreadedJLanguageTool.analyzeSentences(MultiThreadedJLanguageTool.java:146)
        ... 9 more
Caused by: java.lang.OutOfMemoryError
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at java.util.concurrent.ForkJoinTask.getThrowableException(Unknown Source)
        ... 11 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at opennlp.tools.ml.model.AbstractModelReader.getParameters(AbstractModelReader.java:140)
        at opennlp.tools.ml.maxent.io.GISModelReader.constructModel(GISModelReader.java:78)
        at opennlp.tools.ml.model.GenericModelReader.constructModel(GenericModelReader.java:62)
        at opennlp.tools.ml.model.AbstractModelReader.getModel(AbstractModelReader.java:85)
        at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:32)
        at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:29)
        at opennlp.tools.util.model.BaseModel.finishLoadingArtifacts(BaseModel.java:309)
        at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:239)
        at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:173)
        at opennlp.tools.postag.POSModel.<init>(POSModel.java:82)
        at org.languagetool.chunking.EnglishChunker.<init>(EnglishChunker.java:64)
        at org.languagetool.language.English.getChunker(English.java:136)
        at org.languagetool.JLanguageTool.getRawAnalyzedSentence(JLanguageTool.java:991)
        at org.languagetool.JLanguageTool.getAnalyzedSentence(JLanguageTool.java:966)
        at org.languagetool.MultiThreadedJLanguageTool$AnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:208)
        at org.languagetool.MultiThreadedJLanguageTool$AnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:199)
        at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(Unknown Source)
        at java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(Unknown Source)
        at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
        at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
I also asked about this kind of output (the JAVA out-of-memory issue) on Windows in languagetool-org/languagetool#1813 and got an answer that it can be solved by adding additional lines of code. However, [here](https://forum.languagetool.org/t/resolved-xgboost-warning-and-outofmemoryerror-when-checking-english-with-ngram/4539/2) it is reported that a similar issue was solved only by upgrading to 64-bit JAVA.

Could we include a function that correctly identifies this issue and creates a meaningful warning?

We can start with something like this:

# Check if 64-bit JAVA is used.
# Based on https://stackoverflow.com/a/38154064/4783029
lato_is_java_64bit <- function() {
  # Java may print these settings to stderr, so capture both streams.
  str1 <- system2("java", c("-XshowSettings:properties", "-version"), stdout = TRUE, stderr = TRUE)
  str2 <- gsub(".*sun.arch.data.model = (.*)", "\\1", str1[grepl("sun.arch.data.model = ", str1)])
  # isTRUE() yields FALSE (not logical(0)) when the property is missing.
  isTRUE(str2[1] == "64")
}

This error is related to JAVA: 64-bit version of JAVA should be used to prevent the error.

Maybe checking if a correct version of JAVA is used could be performed and a meaningful warning displayed if needed?
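
One possible shape for such a check is sketched below. To keep the logic testable without a Java installation, the parsing is separated from the system call: the hypothetical helper parse_java_arch() takes the lines printed by java -XshowSettings:properties -version and extracts sun.arch.data.model, and the equally hypothetical warn_if_not_64bit() turns the result into a warning. Neither function exists in LanguageToolR, and the sample lines are illustrative input, not captured from a real run.

```r
# Hypothetical helper: extract the JVM data model (e.g. "32" or "64") from the
# lines printed by `java -XshowSettings:properties -version`.
parse_java_arch <- function(lines) {
  hit <- grep("sun.arch.data.model = ", lines, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub(".*sun\\.arch\\.data\\.model = ([0-9]+).*", "\\1", hit[1])
}

# Hypothetical helper: warn when the JVM is not 64-bit.
# Returns (invisibly) TRUE only for a 64-bit JVM.
warn_if_not_64bit <- function(lines) {
  arch <- parse_java_arch(lines)
  if (is.na(arch)) {
    warning("Could not determine whether Java is 32-bit or 64-bit.", call. = FALSE)
  } else if (arch != "64") {
    warning("A ", arch, "-bit Java was found. LanguageTool may run out of memory; ",
            "a 64-bit Java is recommended.", call. = FALSE)
  }
  invisible(identical(arch, "64"))
}

# Illustrative input only.
sample_lines <- c("    sun.arch.data.model = 32", "    user.language = en")
ok <- withCallingHandlers(
  warn_if_not_64bit(sample_lines),
  warning = function(w) {
    message("Warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

In real use, the lines would come from the Java call itself, for example via system2("java", c("-XshowSettings:properties", "-version"), stdout = TRUE, stderr = TRUE).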

Use Travis CI

Please enable a continuous integration service, such as Travis CI (website: https://travis-ci.org), to run free automatic checks for the package. This way, you would know whether, e.g., a new pull request breaks the functionality of the package.

The quickest way to enable Travis CI is to use the function usethis::use_travis() (link to usethis) in the project of this package. This function creates the necessary setup files and opens the websites where you have to sign in. These actions should be performed by the owner of this GitHub repository.

Errors related to path (~) on Windows

I installed LanguageTool with LanguageToolR::quick_setup(), but it does not work:

data(test_text, package = "LanguageToolR")

LanguageToolR::languagetool(test_text)
#> Error in LanguageToolR::languagetool(test_text): The provided executable is not available or does not work correctly. You can install LanguageTool with the quick_setup() function.

Created on 2019-08-08 by the reprex package (v0.3.0)

If I use path.expand(), the tool seems to work. On my PC:

path.expand("~/LanguageTool-4.4/languagetool-commandline.jar")
#> D:/Dokumentai/LanguageTool-4.4/languagetool-commandline.jar
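
The tilde is expanded by R rather than by Windows, which would explain why the unexpanded path fails when it is passed on to java. A minimal sketch of building the command from an expanded path follows. The helper name build_java_command() is hypothetical, and the file.exists() check is an added assumption about what a robust setup might do.

```r
# Hypothetical helper: build the Java command from a path that may start with "~".
build_java_command <- function(jar) {
  # path.expand() replaces a leading "~" with the user's home directory.
  jar_full <- path.expand(jar)
  if (!file.exists(jar_full)) {
    warning("File not found: ", jar_full, call. = FALSE)
  }
  # Quote the path so that spaces in directory names survive.
  paste0('java -jar "', jar_full, '"')
}

command <- suppressWarnings(
  build_java_command("~/LanguageTool-4.4/languagetool-commandline.jar")
)
print(command)
```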

Unfortunately, one more issue occurs:

data(test_text, package = "LanguageToolR")

path_to_tool <- paste0('java -jar "', path.expand("~/LanguageTool-4.4/languagetool-commandline.jar"),'"')

LanguageToolR::languagetool(test_text, executable = path_to_tool)
#> Warning in system(command = paste(executable, paste(ifelse(recursive,
#> paste("--recursive"), : running command 'java -jar "D:/Dokumentai/
#> LanguageTool-4.4/languagetool-commandline.jar" --encoding utf-8 --language
#> en-GB --json C:\Users\ViG\AppData\Local\Temp\Rtmp6DK2xi\file51105287127d'
#> had status 1
#> Error in rjson::fromJSON(output_json): CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

Created on 2019-08-08 by the reprex package (v0.3.0)

Session info
devtools::session_info()
- Session info -------------------------------------------------------------------------
 setting  value                       
 version  R version 3.6.1 (2019-07-05)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       Europe/Helsinki             
 date     2019-08-08                  

- Packages -----------------------------------------------------------------------------
 package     * version    date       lib source                     
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)             
 backports     1.1.4      2019-04-10 [1] CRAN (R 3.6.0)             
 callr         3.3.1      2019-07-18 [1] CRAN (R 3.6.1)             
 cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.0)             
 crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.0)             
 desc          1.2.0      2019-06-24 [1] Github (r-lib/desc@c860e7b)
 devtools      2.1.0      2019-07-06 [1] CRAN (R 3.6.0)             
 digest        0.6.20     2019-07-04 [1] CRAN (R 3.6.0)             
 fs            1.3.1.9000 2019-06-24 [1] Github (r-lib/fs@00e2de8)  
 glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.1)             
 magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)             
 memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)             
 packrat       0.5.0      2018-11-14 [1] CRAN (R 3.6.0)             
 pkgbuild      1.0.4      2019-08-05 [1] CRAN (R 3.6.1)             
 pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)             
 prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.0)             
 processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.1)             
 ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)             
 R6            2.4.0      2019-02-14 [1] CRAN (R 3.6.0)             
 Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.6.1)             
 remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.0)             
 rlang         0.4.0      2019-06-25 [1] CRAN (R 3.6.0)             
 rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)             
 rstudioapi    0.10       2019-03-19 [1] CRAN (R 3.6.0)             
 sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)             
 testthat      2.2.1      2019-07-25 [1] CRAN (R 3.6.1)             
 usethis       1.5.1      2019-07-04 [1] CRAN (R 3.6.0)             
 withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.0)   

Maybe some recent updates to R packages have broken the functionality of languagetool()?
Can these issues be fixed?
