Comments (11)
Yep - seems to work fine for me now.
Thanks:-)
from tabulapdf.
Can you give me the output of sessionInfo()?
Sure:
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.1.0 plyr_1.8.3 RJSONIO_1.3-0 tabulizer_0.1.4 RSQLite_1.0.0 DBI_0.3.1
loaded via a namespace (and not attached):
[1] labeling_0.3 colorspace_1.2-6 scales_0.4.0 tools_3.2.3 gtable_0.2.0 Rcpp_0.12.4
[7] grid_3.2.3 knitr_1.12.22 rJava_0.9-8 munsell_0.4.3
Thanks. I will investigate.
Thanks.
FWIW, quick write up on package in general here: https://blog.ouseful.info/2016/05/02/when-documents-become-databases-tabulizer-r-wrapper-for-tabula-pdf-table-extractor/
I note that the area settings passed to the area function aren't the exact same ones as those displayed by the Tabula app.
With the update I just pushed, give this a try for that example:
x1 <- extract_tables(tmp, guess = FALSE, pages = 2, area=list(c(200, 10, 700, 120)), method = "data.frame")
str(x1)
## List of 1
## $ :'data.frame': 27 obs. of 2 variables:
## ..$ LAP : chr [1:27] "1 " "2 " "3 " "4 " ...
## ..$ TIME: chr [1:27] "15:05:38" "3:02.377" "2:43.433" "1:44.375" ...
x2 <- extract_tables(tmp, guess = FALSE, pages = 2, area=list(c(200, 10, 700, 120)), method = "data.frame")
str(x2)
## List of 2
## $ :'data.frame': 27 obs. of 2 variables:
## ..$ LAP : chr [1:27] "1 " "2 " "3 " "4 " ...
## ..$ TIME: chr [1:27] "15:05:38" "3:02.377" "2:43.433" "1:44.375" ...
## $ :'data.frame': 26 obs. of 2 variables:
## ..$ LAP : chr [1:26] "1 P " "2 " "3 " "4 " ...
## ..$ TIME: chr [1:26] "15:06:12" "2:47.116" "2:34.467" "1:47.931" ...
So:
- area bounds thing seems to be fixed;
- scrape works for multiple pages specified;
- omitting the pages argument throws an error:
May 2, 2016 4:14:09 PM org.apache.pdfbox.util.operator.pagedrawer.FillEvenOddRule process
WARNING: java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
Error in area[[p]] : subscript out of bounds
(Thanks for the quick fix, btw:-)
Can you let me know if this is still happening with the current version?
If I remove the pages argument from a test where it works with multiple pages, it still seems to fail with Error in area[[p]] : subscript out of bounds
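For context, the message "Error in area[[p]] : subscript out of bounds" is what R reports when a list is indexed past its end, which would happen if a one-element area list were looked up once per page. Below is a minimal sketch of that failure and of one way to avoid it by recycling the area across pages. The helper recycle_area() and the page count of 8 are assumptions made for illustration; they are not part of tabulizer and may not match the actual fix.

```r
# Hypothetical helper (not part of tabulizer): make sure there is exactly
# one area per page by recycling a single area across all pages.
recycle_area <- function(area, n_pages) {
  stopifnot(is.list(area), length(area) >= 1, n_pages >= 1)
  if (length(area) == 1L) {
    # A single area applies to every page.
    return(rep(area, n_pages))
  }
  if (length(area) != n_pages) {
    stop("area must have length 1 or one element per page")
  }
  area
}

# Same area as in the thread: c(top, left, bottom, right).
area <- list(c(200, 10, 700, 120))

# Indexing the one-element list for a second page raises an error,
# which is captured here as a message string instead of stopping the script.
msg <- tryCatch(area[[2]], error = function(e) conditionMessage(e))

# After recycling, every page index has an area to look up.
n_pages <- 8  # assumed page count for the example file
per_page <- recycle_area(area, n_pages)
length(per_page)
```

With the recycled list, per_page[[p]] is valid for every p from 1 to n_pages, so a per-page loop no longer runs off the end of the list.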
This should be fixed now. For your example file:
> x1 <- extract_tables(tmp, guess = FALSE, pages = NULL, area=list(c(200, 10, 700, 120)), method = "data.frame")
> str(x1)
List of 8
$ :'data.frame': 26 obs. of 2 variables:
..$ LAP : chr [1:26] "1 P " "2 " "3 " "4 " ...
..$ TIME: chr [1:26] "15:05:52" "3:07.062" "2:34.443" "1:47.719" ...
$ :'data.frame': 27 obs. of 2 variables:
..$ LAP : chr [1:27] "1 " "2 " "3 " "4 " ...
..$ TIME: chr [1:27] "15:05:38" "3:02.377" "2:43.433" "1:44.375" ...
$ :'data.frame': 26 obs. of 2 variables:
..$ LAP : chr [1:26] "1 P " "2 " "3 " "4 " ...
..$ TIME: chr [1:26] "15:06:12" "2:47.116" "2:34.467" "1:47.931" ...
$ :'data.frame': 27 obs. of 2 variables:
..$ LAP : chr [1:27] "1 " "2 " "3 " "4 " ...
..$ TIME: chr [1:27] "15:05:40" "3:02.132" "2:42.205" "1:46.001" ...
$ :'data.frame': 26 obs. of 2 variables:
..$ LAP : chr [1:26] "1 " "2 " "3 " "4 " ...
..$ TIME: chr [1:26] "15:05:53" "3:01.022" "2:35.753" "1:45.909" ...
$ :'data.frame': 26 obs. of 2 variables:
..$ LAP : chr [1:26] "1 " "2 " "3 " "4 " ...
..$ TIME: chr [1:26] "15:05:49" "2:59.825" "2:39.662" "1:46.149" ...
$ :'data.frame': 26 obs. of 2 variables:
..$ LAP : chr [1:26] "1 " "2 " "3 " "4 " ...
..$ TIME: chr [1:26] "15:05:50" "2:59.526" "2:38.929" "1:46.122" ...
$ :'data.frame': 26 obs. of 2 variables:
..$ LAP : chr [1:26] "1 " "2 " "3 " "4 " ...
..$ TIME: chr [1:26] "15:05:54" "3:01.144" "2:35.261" "1:46.114" ...
Hi All,
I am facing a similar error with the extract_areas function. I am able to select the table I want to extract from the PDF document, but whenever I click the Done button, I get the following error. I am not facing any issues with other PDF docs.
code -
jh<- extract_areas(loaction1,6)
Error-
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.util.NoSuchElementException
sessionInfo():
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.5 (Santiago)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] miniUI_0.1.1 shiny_1.0.0 dplyr_0.5.0
[4] tabulizer_0.1.22
loaded via a namespace (and not attached):
[1] Rcpp_0.12.9 png_0.1-7 digest_0.6.11
[4] assertthat_0.1 mime_0.5 R6_2.2.0
[7] jsonlite_1.2 xtable_1.8-2 DBI_0.5-1
[10] magrittr_1.5 tabulizerjars_0.1.2 tools_3.3.1
[13] httpuv_1.3.3 rJava_0.9-8 htmltools_0.3.5
[16] tibble_1.2