nevrome / languagetoolr
R Package: Wrapper for the LanguageTool checking engine
License: GNU General Public License v3.0
Some functions and objects share a name, e.g., base::version and LanguageToolR::version(). I think this kind of name clash should be avoided. So I suggest prefixing all helper function names with, e.g., lng_ (or a more appropriate prefix), and renaming them to, e.g., lng_get_version(), lng_test_setup(), and lng_get_languages() or lng_list_languages().
@nevrome What is your opinion on this?
I think at least a few unit tests should be added to check the basic functionality of the package.
The easiest way to set up testing is to use the functions usethis::use_testthat() and usethis::use_test().
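As a starting point, a first test file could look roughly like the sketch below. The file name is hypothetical, and the test only assumes what the other reports on this page show, namely that the package ships a data set called test_text. The test is skipped when LanguageToolR is not installed.

```r
# tests/testthat/test-test_text.R  (hypothetical file name)
library(testthat)

test_that("the bundled test text can be loaded", {
  # Skip instead of failing on machines without the package.
  skip_if_not_installed("LanguageToolR")

  # Load the data set into a private environment and inspect it there.
  env <- new.env()
  data(test_text, package = "LanguageToolR", envir = env)

  expect_true(exists("test_text", envir = env))
  expect_true(length(env$test_text) > 0)
})
```

Tests that call languagetool() itself would additionally need Java and a LanguageTool installation, so they would probably need their own skip conditions.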
Errors related to parsing the JSON output of languagetool() on Windows:
LanguageToolR::languagetool(LanguageToolR::test_text)
#> Warning in system(command, intern = TRUE, ignore.stderr = quiet):
#> running command 'java -jar "D:/Dokumentai/LanguageTool-4.6/languagetool-
#> commandline.jar" --encoding utf-8 --language en-GB --json "C:/Users/User/
#> AppData/Local/Temp/Rtmp8WL9BC/file18b8754a4c9a"' had status 1
#> Error in rjson::fromJSON(output_json): CHAR() can only be applied to a 'CHARSXP', not a 'NULL'
Created on 2019-08-09 by the reprex package (v0.3.0)
While running devtools::install_github("nevrome/LanguageToolR"), the following error is shown:
ERROR: this R is version 3.4.4, package 'LanguageToolR' requires R >= 3.5.0
I installed LanguageTool with LanguageToolR::quick_setup().
But it does not work:
data(test_text, package = "LanguageToolR")
LanguageToolR::languagetool(test_text)
#> Error in LanguageToolR::languagetool(test_text): The provided executable is not available or does not work correctly. You can install LanguageTool with the quick_setup() function.
Created on 2019-08-08 by the reprex package (v0.3.0)
If I use path.expand(), the tool seems to work. On my PC:
path.expand("~/LanguageTool-4.4/languagetool-commandline.jar")
#> D:/Dokumentai/LanguageTool-4.4/languagetool-commandline.jar
Unfortunately, one more issue occurs:
data(test_text, package = "LanguageToolR")
path_to_tool <- paste0('java -jar "', path.expand("~/LanguageTool-4.4/languagetool-commandline.jar"),'"')
LanguageToolR::languagetool(test_text, executable = path_to_tool)
#> Warning in system(command = paste(executable, paste(ifelse(recursive,
#> paste("--recursive"), : running command 'java -jar "D:/Dokumentai/
#> LanguageTool-4.4/languagetool-commandline.jar" --encoding utf-8 --language
#> en-GB --json C:\Users\ViG\AppData\Local\Temp\Rtmp6DK2xi\file51105287127d'
#> had status 1
#> Error in rjson::fromJSON(output_json): CHAR() can only be applied to a 'CHARSXP', not a 'NULL'
Created on 2019-08-08 by the reprex package (v0.3.0)
devtools::session_info()
- Session info -------------------------------------------------------------------------
setting value
version R version 3.6.1 (2019-07-05)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz Europe/Helsinki
date 2019-08-08
- Packages -----------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.1)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
desc 1.2.0 2019-06-24 [1] Github (r-lib/desc@c860e7b)
devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
fs 1.3.1.9000 2019-06-24 [1] Github (r-lib/fs@00e2de8)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
packrat 0.5.0 2018-11-14 [1] CRAN (R 3.6.0)
pkgbuild 1.0.4 2019-08-05 [1] CRAN (R 3.6.1)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.1)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.1)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.1)
usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
Maybe some recent updates of R packages have broken the functionality of languagetool()?
Can these issues be fixed?
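One possible direction, as a rough sketch only: expand the path before the command is built, and check the exit status before the output is handed to the JSON parser. Both helper names below are hypothetical and not part of LanguageToolR. The command-line flags are the ones visible in the warnings above. system(..., intern = TRUE) attaches a "status" attribute to its result when the command exits with a non-zero status, and that is what the second helper inspects.

```r
# Hypothetical helper: build the command with "~" expanded, so that Java
# receives an absolute path. Paths are wrapped in double quotes, as in the
# commands shown in the warnings above.
lt_build_command <- function(jar, input_file, language = "en-GB") {
  jar <- path.expand(jar)
  paste(
    "java -jar", shQuote(jar, type = "cmd"),
    "--encoding utf-8",
    "--language", language,
    "--json", shQuote(input_file, type = "cmd")
  )
}

# Hypothetical helper: stop with a readable message instead of passing
# empty output to the JSON parser.
lt_check_output <- function(output) {
  status <- attr(output, "status")
  if (!is.null(status) && status != 0) {
    stop("LanguageTool exited with status ", status,
         "; run the command in a terminal to see the Java error.",
         call. = FALSE)
  }
  if (length(output) == 0) {
    stop("LanguageTool returned no output.", call. = FALSE)
  }
  paste(output, collapse = "\n")
}

# Usage (not run here, needs Java and LanguageTool):
# cmd    <- lt_build_command("~/LanguageTool-4.4/languagetool-commandline.jar",
#                            "inst/test/test_text.txt")
# output <- suppressWarnings(system(cmd, intern = TRUE))
# json   <- lt_check_output(output)
```

This would not fix the underlying Java failure, but the user would see the exit status instead of the CHAR() error from the JSON parser.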
On my PC, the following command-line call results in a Java exception (OutOfMemoryError) if 32-bit Java is used:
java -jar "D:/Dokumentai/LanguageTool-4.6/languagetool-commandline.jar" --encoding utf-8 --language en-GB --json "inst/test/test_text.txt"
Warning: At the moment, your platform (Windows) is not supported by the official XGBoost maven package; ML-based suggestion reordering is disabled.
Expected text language: English (GB)
Working on inst/test/test_text.txt...
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError
at org.languagetool.MultiThreadedJLanguageTool.analyzeSentences(MultiThreadedJLanguageTool.java:153)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:738)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:716)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:699)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:658)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:642)
at org.languagetool.commandline.CommandLineTools.checkText(CommandLineTools.java:106)
at org.languagetool.commandline.CommandLineTools.checkText(CommandLineTools.java:82)
at org.languagetool.commandline.Main.runOnFile(Main.java:194)
at org.languagetool.commandline.Main.main(Main.java:466)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError
at java.util.concurrent.ForkJoinTask.get(Unknown Source)
at org.languagetool.MultiThreadedJLanguageTool.analyzeSentences(MultiThreadedJLanguageTool.java:146)
... 9 more
Caused by: java.lang.OutOfMemoryError
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at java.util.concurrent.ForkJoinTask.getThrowableException(Unknown Source)
... 11 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at opennlp.tools.ml.model.AbstractModelReader.getParameters(AbstractModelReader.java:140)
at opennlp.tools.ml.maxent.io.GISModelReader.constructModel(GISModelReader.java:78)
at opennlp.tools.ml.model.GenericModelReader.constructModel(GenericModelReader.java:62)
at opennlp.tools.ml.model.AbstractModelReader.getModel(AbstractModelReader.java:85)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:32)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:29)
at opennlp.tools.util.model.BaseModel.finishLoadingArtifacts(BaseModel.java:309)
at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:239)
at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:173)
at opennlp.tools.postag.POSModel.<init>(POSModel.java:82)
at org.languagetool.chunking.EnglishChunker.<init>(EnglishChunker.java:64)
at org.languagetool.language.English.getChunker(English.java:136)
at org.languagetool.JLanguageTool.getRawAnalyzedSentence(JLanguageTool.java:991)
at org.languagetool.JLanguageTool.getAnalyzedSentence(JLanguageTool.java:966)
at org.languagetool.MultiThreadedJLanguageTool$AnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:208)
at org.languagetool.MultiThreadedJLanguageTool$AnalyzeSentenceCallable.call(MultiThreadedJLanguageTool.java:199)
at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(Unknown Source)
at java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(Unknown Source)
at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Could we include a function that correctly identifies this issue and creates a meaningful warning?
We can start with something like this:
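# Check if 64-bit JAVA is used.
# Based on https://stackoverflow.com/a/38154064/4783029
lato_is_java_64bit <- function() {
  # Java prints these settings to stderr, so capture both streams.
  out <- system2("java", c("-XshowSettings:properties", "-version"),
                 stdout = TRUE, stderr = TRUE)
  # Keep only the line that reports the data model (32 or 64).
  line <- grep("sun.arch.data.model = ", out, fixed = TRUE, value = TRUE)
  bits <- trimws(sub(".*sun.arch.data.model = ", "", line))
  # Return FALSE (not logical(0)) when the property is missing.
  isTRUE(bits[1] == "64")
}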
This error is related to Java: a 64-bit version of Java should be used to prevent it.
Maybe the package could check whether a suitable version of Java is used and display a meaningful warning if needed?
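A rough sketch of what such a warning could look like is given below. Both function names are hypothetical. The functions take the lines printed by `java -XshowSettings:properties -version` as input, so the parsing can be tried without Java installed. The sketch assumes the JVM reports the sun.arch.data.model property, which may not hold for every Java distribution.

```r
# Hypothetical helper: extract the data model (32 or 64) from the lines
# printed by `java -XshowSettings:properties -version`.
lato_java_bits <- function(lines) {
  hit <- grep("sun.arch.data.model = ", lines, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_integer_)
  # Non-numeric values (if any) become NA instead of raising a warning.
  suppressWarnings(
    as.integer(trimws(sub(".*sun.arch.data.model = ", "", hit[1])))
  )
}

# Hypothetical helper: warn when Java is not 64-bit or cannot be identified.
# Returns TRUE invisibly only for a 64-bit Java.
lato_warn_if_not_64bit <- function(lines) {
  bits <- lato_java_bits(lines)
  if (is.na(bits)) {
    warning("Could not determine whether Java is 32-bit or 64-bit.",
            call. = FALSE)
  } else if (bits != 64L) {
    warning("A ", bits, "-bit Java was found. LanguageTool may run out of ",
            "heap space; a 64-bit Java is recommended.", call. = FALSE)
  }
  invisible(isTRUE(bits == 64L))
}

# Usage (not run here, needs Java on the PATH):
# lines <- system2("java", c("-XshowSettings:properties", "-version"),
#                  stdout = TRUE, stderr = TRUE)
# lato_warn_if_not_64bit(lines)
```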
Please enable a continuous integration service, such as Travis CI (website: https://travis-ci.org), to run free automatic checks of the package. This way, you would know whether, e.g., a new pull request breaks the functionality of the package or everything is OK.
The quickest way to enable Travis CI is to call usethis::use_travis() in the project of this package. This function creates the necessary setup files and opens websites where you will have to sign in. These actions have to be performed by the owner of this GitHub repository.
The glue package provides an elegant way to construct strings. E.g.:
lang_tool_version <- 4.6
glue::glue("Current version: {lang_tool_version}")
results in:
#> Current version: 4.6
Could glue be used internally in this package to construct strings of commands, file names, etc., as a replacement for, e.g., paste()? @nevrome What do you think?
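To make the comparison concrete, here is a small sketch that builds the same LanguageTool command once with paste0() and once with glue(). The variable names are made up for illustration. The jar path, language, and input file are copied from the command quoted in the 32-bit Java report on this page.

```r
library(glue)

# Illustrative values, copied from the command quoted elsewhere on this page.
jar        <- "D:/Dokumentai/LanguageTool-4.6/languagetool-commandline.jar"
language   <- "en-GB"
input_file <- "inst/test/test_text.txt"

# With paste0(): quotes and spaces have to be tracked by hand.
cmd_paste <- paste0(
  'java -jar "', jar, '" --encoding utf-8 --language ', language,
  ' --json "', input_file, '"'
)

# With glue(): the template reads like the final command.
cmd_glue <- glue(
  'java -jar "{jar}" --encoding utf-8 --language {language} --json "{input_file}"'
)

# Both approaches produce the same string.
identical(as.character(cmd_glue), cmd_paste)
```

The glue version is arguably easier to check against the intended command line, at the cost of one more package dependency.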