Comments (5)
Dump everything out to temporary .rds files and read them back in on the cluster workers... add a library argument.
from sentimentr.
Initial attempts lead to an error on Windows. parallel seems to be using an old version of R and throws an error about Rcpp being the wrong version. I fixed this by putting a newer version of R on the path, but now there is an error related to sentimentr, which suggests an old version is still being used. Maybe all other R installs need to be removed from the path?
if (!require("pacman")) install.packages("pacman")
pacman::p_load(sentimentr, parallel, textshape, dplyr)

chunk_size <- 1e5
dir.create('data', showWarnings = FALSE)

# Replicate the combined data 100 times, shuffle the rows, and split into chunks
dat <- combine_data() %>%
    {.[rep(seq_len(nrow(.)), 100),]} %>%
    sample_n(nrow(.)) %>%
    split_index({inds <- chunk_size * 1:round(nrow(.)/chunk_size, 0); inds[inds < nrow(.)]})

tic <- Sys.time()

cl <- makeCluster(mc <- getOption("cl.cores", detectCores() - 2))

# Load the required packages on every worker
clusterEvalQ(cl, {
    library(sentimentr)
    library(lexicon)
})

# Score each chunk on a worker and write the result to its own .rds file
parLapply(cl, dat, function(x){
    gc()
    senti_dat <- sentimentr::get_sentences(x)
    senti_dat <- sentimentr::sentiment_by(senti_dat)
    # Draw a single random id; sample(1:100000) would return all 100000 values
    outfile <- sprintf('data/file_%s.rds', sample(1:100000, 1))
    saveRDS(senti_dat, outfile)
}) %>%
    invisible()

stopCluster(cl)

Sys.time() - tic
Results in:
Error in checkForRemoteErrors(val) :
6 nodes produced errors; first error: 'get_sentences' is not an exported object from 'namespace:sentimentr'
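Since the error suggests the workers may be running a different R installation or package library than the main session, one way to check is to ask each worker what it is actually using. The following is only a diagnostic sketch, not a fix; the two-worker cluster size and the `worker_info` name are arbitrary choices for illustration.

```r
library(parallel)

# Small cluster just for the diagnostic (size chosen arbitrarily)
cl <- makeCluster(2)

# Ask each worker which R version, R home, and library paths it uses,
# and which sentimentr version (if any) it can see
worker_info <- clusterEvalQ(cl, {
    list(
        r_version = R.version.string,
        r_home    = R.home(),
        lib_paths = .libPaths(),
        sentimentr_version = tryCatch(
            as.character(utils::packageVersion("sentimentr")),
            error = function(e) NA_character_  # package not found on the worker
        )
    )
})

stopCluster(cl)

# Compare against the main session
print(R.version.string)
str(worker_info)
```

If the workers report a different `r_home` or an older sentimentr than the main session, that would point to a path or library problem rather than a problem with the sentimentr code itself.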
http://appliedpredictivemodeling.com/blog/2018/1/17/parallel-processing
Is either of the following a better way to run parallel code?
https://github.com/r-lib/callr
https://github.com/r-lib/processx
An OS-independent solution is needed. Reinvestigate the available solutions and reach out to the R community for current best practices.
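For reference, a minimal sketch of what the callr route might look like, assuming the callr package is installed. The `sum(x^2)` work and the `chunks` list are stand-ins for the per-chunk sentiment scoring, not the actual sentimentr workflow.

```r
library(callr)

# Stand-in chunks; in the real case these would be chunks of text data
chunks <- list(a = 1:5, b = 6:10)

# Launch one background R process per chunk
procs <- lapply(chunks, function(x) {
    r_bg(function(x) sum(x^2), args = list(x = x))
})

# Wait for each process to finish and collect its result
res <- lapply(procs, function(p) {
    p$wait()
    p$get_result()
})

str(res)
```

Each chunk runs in its own fresh R process, so nothing is shared with the main session unless it is passed in through `args`.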
Here's where I ask the R community: https://twitter.com/tylerrinker/status/1044364197797265408
- https://github.com/HenrikBengtsson/future recommended by Julia Silge
- https://github.com/DavisVaughan/furrr recommended by Garrett Mooney
Some other packages:
- https://cran.r-project.org/package=snow
- https://cran.r-project.org/package=pbdMPI
Futures looks easiest to use, but MPI has a long history of support. A tutorial with some relevant further reading:
https://towardsdatascience.com/getting-started-with-parallel-programming-in-r-d5f801d43745
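To give a feel for the futures approach, here is a minimal sketch assuming the future and future.apply packages are installed. The `sum(x^2)` function is a stand-in for the per-chunk work (in this thread, that would be the `get_sentences()` and `sentiment_by()` calls), and the two-worker setting is arbitrary.

```r
library(future)
library(future.apply)

# Run futures in separate background R sessions; works the same on
# Windows, macOS, and Linux
plan(multisession, workers = 2)

# Stand-in chunks; in the real case these would be chunks of text data
chunks <- split(seq_len(10), rep(1:2, each = 5))

# Parallel drop-in for lapply(); each chunk is processed on a worker
res <- future_lapply(chunks, function(x) sum(x^2))

# Return to sequential processing and shut down the workers
plan(sequential)

str(res)
```

Switching backends only requires changing the `plan()` call, which is part of why this approach looks like the easiest to adopt.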