
text-mining-r's People

Contributors

brunj7


text-mining-r's Issues

Not able to read in papers with Corpus()

When I execute the following command in R from your tutorial:

papers <- Corpus(URISource(pdf), readerControl = list(reader=pdfRead))

I get the following error:

sh: pdfinfo: command not found
sh: pdftotext: command not found
Error in system2("pdftotext", c(control$text, shQuote(x), "-"), stdout = TRUE) :
error in running command

My pdf object looks like this:

pdf
[1] "arees-informatics-2006-reprint.pdf"
[2] "Borer et al 2009 Bull ESA_Effective Data Management.pdf"
[3] "Fegraus-esa_bulletin_eml_ms_07_2005.pdf"
[4] "Harris_2017_Environ._Res._Lett._12_024012.pdf"
[5] "Heidorn_2008_Shedding Light on the Dark Data in the Long Tail of Science.pdf"
[6] "MORTON_et_al-2008-Global_Change_Biology.pdf"
[7] "Ohara et al 2016_Aligning marine species range data to better serve science and conservation.pdf"
[8] "peerj-preprints-549.pdf"

And my pdfRead reader function (created with readPDF) looks like this:

pdfRead
function (elem, language, id)
{
    uri <- processURI(elem$uri)
    meta <- pdf_info(uri)
    content <- pdf_text(uri)
    PlainTextDocument(content, meta$Author, meta$CreationDate,
        meta$Subject, meta$Title, basename(elem$uri), language,
        meta$Creator)
}
<environment: 0x110bd22a0>
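A minimal diagnostic sketch, assuming the "command not found" lines simply mean that the pdfinfo and pdftotext executables (shipped with the xpdf or Poppler command-line utilities) are not on the PATH that R searches. The helper name missing_pdf_tools() is hypothetical; it relies only on base R's Sys.which(), which returns an empty string for any program it cannot locate.

```r
# Hypothetical helper: report which of the command-line PDF tools named in the
# error message cannot be found on the PATH that R searches.
missing_pdf_tools <- function(tools = c("pdfinfo", "pdftotext")) {
  found <- Sys.which(tools)  # "" for every program that is not found
  tools[found == ""]
}

missing <- missing_pdf_tools()
if (length(missing) > 0) {
  message("Not found on PATH: ", paste(missing, collapse = ", "))
}
```

If both names are reported, one option is to install those utilities and restart R so the new PATH is picked up. Another, assuming a tm version that offers it and that the pdftools package is installed, is to build the reader with readPDF(engine = "pdftools"), which reads PDFs through pdftools instead of calling these executables.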