mhahsler / recommenderlab
recommenderlab - Lab for Developing and Testing Recommender Algorithms - R package
Both Jaccard and cosine similarity work, but when I use Pearson I get the following error:
> recc_model <- Recommender(data = R, method = "UBCF", parameter = list(method = "pearson"))
> recc_predicted <- predict(object = recc_model, newdata = R,n = 6)
Error in if (!is.null(attr(d, "method")) && tolower(attr(d, "method")) %in% :
missing value where TRUE/FALSE needed
Is there an efficient way (similar to read.transactions() in the arules package) to read a large ratings CSV file into a realRatingMatrix object?
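One possible approach, sketched here under the assumption that the CSV holds one (user, item, rating) triple per row, is to read it as a data.frame and coerce it. The file, column names and values below are made up for illustration; for very large files a faster reader may be preferable to read.csv.

```r
library(recommenderlab)

# Hypothetical ratings file: one (user, item, rating) triple per row.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    user   = c("u1", "u1", "u2", "u3"),
    item   = c("i1", "i2", "i2", "i3"),
    rating = c(5, 3, 4, 1)
  ),
  csv_path,
  row.names = FALSE
)

ratings_df <- read.csv(csv_path, stringsAsFactors = FALSE)

# Coerce the long-format data.frame; the first three columns are
# interpreted as user, item and rating.
r <- as(ratings_df, "realRatingMatrix")
r
```

The coercion builds the sparse representation directly from the triples, so no dense user-by-item matrix is created along the way.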
Hello
When building a recommendation model from a binaryRatingMatrix, the predicted ratings are always 1 when weighted = TRUE, instead of the weighted average.
This behavior occurs only in 0.2-6 (it worked fine in 0.2-4).
library(recommenderlab)
data("MovieLense")
MovieLense100 <- MovieLense[rowCounts(MovieLense) >100,]
MovieLense100 = binarize(MovieLense100,minRating=1)
train <- MovieLense100[1:50]
rec <- Recommender(train, method = "UBCF", list(weighted=TRUE))
pre <- predict(rec, MovieLense100[101:102], n = 10)
pre@ratings
I have read through HybridRecommender.R and noticed that the following registry entry, which would register HYBRID as a recommender method, is missing:
recommenderRegistry$set_entry(
method="HYBRID", dataType = "realRatingMatrix", fun= HybridRecommender,
description="Hybrid Recommender")
Besides this issue:
does HybridRecommender only perform prediction (both ratings and topNList), with evaluation available for ratings only?
Is there any plan to extend it to evaluation of topNList (ROC, confusion matrix)?
Thank you.
Hi @mhahsler, a follow-up on the previous bug: when calling predict with type=ratingmatrix the model returns the actual ratings on the positions of known ratings instead of the ratings predicted by the model (the normalization issue seems fixed).
For example if my known ratings for user X are:
NA NA 1 NA 5 NA 5
The predictions are
2.58 2.54 1.00 3.87 5.00 2.59 5.00
Hope to hear from you!
When I use binary data and IBCF to create a recommender, I want to return a top-N list of recommendations for seven users, where N equals 3. But for one user I get only two recommendations, for two users I get one recommendation each, and for four users I get no recommendations. The code looks like this:
`library("recommenderlab")
#getData
logDataRead <- read.csv("logData.csv", header = TRUE, sep = ",", quote = "\"")
#add number of clicks as a column
#use ID
logDataPreID <- data.frame(logDataRead[c("UserID","ItemID")],rate = rep(1,nrow(logDataRead)))
#coercion from data.frame to realRatingMatrix
logDataID <- as(logDataPreID, "realRatingMatrix")
logDataID
#show details
getRatingMatrix(logDataID)
#numbers of item for every user
rowCounts(logDataID)
#numbers of user for every item
colCounts(logDataID)
#return item clicked and rating for every user
as(logDataID, "list")
#change realRatingMatrix to binaryRatingMatrix
logDataID_b <- binarize(logDataID, minRating=1)
as(logDataID_b, "matrix")
as(logDataID_b, "list")
#show methods
recommenderRegistry$get_entries(dataType = "binaryRatingMatrix")
#IBCF item-based
RecoModel_b <- Recommender(logDataID_b, method = "IBCF")
getModel(RecoModel_b)
recomTopN <- predict(RecoModel_b, logDataID_b,n=3)
as(recomTopN, "list")
as(logDataID_b, "list")
as(logDataID, "list")
similarity(logDataID_b, method = "jaccard",which="items")
#UBCF user-based
RecoModel_b_UB <- Recommender(logDataID_b, method = "UBCF")
getModel(RecoModel_b_UB)
recomTopN_UB <- predict(RecoModel_b_UB, logDataID_b,n=3)
as(recomTopN_UB, "list")
as(logDataID_b, "list")
as(logDataID, "list")
#similarity between users
similarity(logDataID_b, method = "jaccard",which="users")
dissimilarity(logDataID_b, method = "jaccard",which="users")`
The data looks like this:
logDataPreID
UserID ItemID rate
1 u1 i1 1
2 u2 i2 1
3 u3 i3 1
4 u4 i4 1
5 u5 i5 1
6 u6 i6 1
7 u7 i7 1
8 u1 i2 1
9 u1 i5 1
10 u5 i2 1
11 u3 i7 1
The top-3 lists produced by the IBCF method look like this:
as(recomTopN, "matrix")
          i1 i2  i3 i4        i5 i6 i7
u1        NA NA  NA NA        NA NA NA
u2 0.3333333 NA  NA NA 0.6666667 NA NA
u3        NA NA  NA NA        NA NA NA
u4        NA NA  NA NA        NA NA NA
u5 0.4166667 NA  NA NA        NA NA NA
u6        NA NA  NA NA        NA NA NA
u7        NA NA 0.5 NA        NA NA NA
So for u2 I get only two recommendations (i5, i1), for two users (u5, u7) I get one recommendation each (i1, i3), and for four users I get no recommendations.
However, when I use the UBCF method, I get a top-3 list of recommendations for every user.
My English isn't good, so I don't know whether I have described my problem clearly.
Can anyone tell me:
1. Is the behavior above normal or abnormal? Is it a real problem?
2. What is the reason for the problem?
3. What are the solutions to the problem?
Hello,
I really admire your work on this package; it is helping me greatly with my bachelor thesis. I'm not that experienced with recommender systems and have put a lot of effort into understanding them over the last month. Still, I have a few questions, and I would really appreciate it if someone could provide me with answers:
As I said, I would really appreciate an answer to my questions, but I understand that there are probably bigger fish to fry :)
Have a great day,
Daniel
The feature to supply a vector of top-N-list lengths is very nice. Not so much for efficiency but certainly for experimental consistency, it would be great to have a similar feature for the "given-x" (or "all-but-x") parameter x, aka given. One could easily guarantee that (i) the test users are the same for all x, and (ii) that all recommendations for x_i are considered also when using x_{i+1} (with x_{i+1} > x_i), by drawing all indices at once for the highest x, and then just subsetting them. In particular, this would give a faster converging estimate of the difference between the parametrizations, I presume.
Plotting could then be extended to feature also the x dimension: in the present setup, colours together with marker types define the method, but line types are the same for all curves. One could add this dimension through different line types within each method (i.e., within each line color) to visually argue that, for example, some methods provide more accurate forecasts than others - even when using less information on the users.
Moreover, I think it would be great if the "known"/"unknown" recommendations could be assigned by the user (or, even better, the distribution from which they are drawn). For example, this can be used to simulate a situation where some items are usually "consumed" very early in history, and thus should enter the "known" sample more often than they would under a uniform distribution. (In particular, I conjecture that the performance of RECOM_POPULAR is overestimated under uniform sampling in such a situation.) One simple way to implement this would be through a realRatingMatrix holding the order in which the elements should appear.
Any thoughts, ideas, or warnings?
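To make the nested-sampling idea concrete, here is a minimal base R sketch for a single test user. It is not code from the package; the item indices and the given values are made up for illustration.

```r
set.seed(42)

# Hypothetical test user: indices of the 12 items this user has rated.
rated_items <- c(3, 7, 8, 15, 21, 22, 30, 41, 45, 50, 62, 70)
given <- c(2, 5, 8)   # the "given-x" values to compare

# Draw the "known" items once, for the largest x ...
known_max <- sample(rated_items, max(given))

# ... and reuse prefixes of that single draw for every smaller x,
# so the known set for x_i is always contained in the one for x_{i+1}.
known <- lapply(given, function(x) known_max[seq_len(x)])
names(known) <- paste0("given_", given)

# Everything not revealed is withheld for evaluation.
unknown <- lapply(known, function(k) setdiff(rated_items, k))
```

Because every smaller known set is a prefix of the largest one, differences between the x values are not confounded with differences between random draws.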
The current implementation of RECOM_RANDOM uses excessive amounts of memory when predicting top-N lists if many items exist. This is because a dense matrix is created for all (new) users and items, and is passed to returnRatings() as a whole. If the resulting object is sparse (e.g., top-N lists), this is inefficient from a memory usage point of view (because most of the non-NA entries are thrown away anyway, but it is unclear ex ante how many).
I have created a variant in my fork (gregreich/recommenderlab@85f3e62) which loops over (new) users, passing each row to returnRatings() individually (to be consistent with the other RECOMs' return values), and stacking the results on top of each other (via a dgCMatrix). This approach costs runtime (a factor of 4 to 5, see below), but seems to have constant memory usage (n is the number of new users here; the number of items is about 250k).
Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1 predict [old, n=20] 3.495 971.4
2 predict [new, n=20] 17.502 750.8
3 predict [old, n=100] 18.185 1839.6
4 predict [new, n=100] 95.057 751.4
Maybe you can have a look and comment on it. Runtime efficiency should definitely be improved; also, maybe the conversion from and to dgCMatrix could be avoided, but I do not know how to concatenate two objects of type realRatingMatrix directly.
A similar problem seems to exist for RECOM_POPULAR, but I haven't attempted that one yet.
I am a bit confused about the normalization of (False/True)-(Positive/Negative) rates output by getConfusionMatrix() for the top-N classification task.
I see that the *-Positive frequencies are normalized to the total number of users in the test-set. For instance, with 100 users and a fixed number N of recommendation per user we have:
What about the *-Negative frequencies? How are TN and FN computed? Sorry if this is obvious, but I cannot figure it out.
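For reference, the textbook confusion-matrix counts for one user's top-N list can be worked through as below. This is a sketch of the standard definitions, not a statement about the package's internals; all values are made up, and it assumes the items revealed to the recommender ("given") are excluded from the candidate pool.

```r
# Hypothetical single test user in a catalogue of 20 items.
n_items  <- 20
given    <- 5                           # items revealed to the recommender
relevant <- c("i3", "i7", "i9")         # withheld items the user actually liked
top_n    <- c("i7", "i2", "i9", "i11")  # N = 4 recommendations

TP <- length(intersect(top_n, relevant))  # recommended and relevant
FP <- length(top_n) - TP                  # recommended but not relevant
FN <- length(relevant) - TP               # relevant but not recommended
TN <- (n_items - given) - TP - FP - FN    # everything else in the candidate pool

c(TP = TP, FP = FP, FN = FN, TN = TN)
```

Averaging such per-user counts over the test users gives fractional values, which is why averaged confusion-matrix entries need not be integers.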
Thanks in advance,
Valerio
Hey there,
it's me again :). Now I'm having a problem with the evaluation scheme. I have some matrices, and the evaluation works fine for the binary matrix (size 158 MB). Now I also wanted to use it for my real rating matrices (size ~196 MB), but the code runs endlessly and nothing happens. I also tried it on a more powerful computer; it ran for two hours with no result (not even an error message). I tried split and cross-validation, and neither would work. For reference: the binary matrix, which is identical to the real rating matrices except for the numeric rating values, took only 10 minutes to calculate.
That's my code
getRatings(rrm)
normalize(rrm)
evaluation_scheme_rrm <- evaluationScheme(rrm,
method = "cross-validation",
train = 0.8,
k = 5,
given = -1,
goodRating = 1)
My specs:
Core i7 1165G7
64 GB RAM
512 GB SSD
Iris Xe Graphics
I really would appreciate help here. You can find the matrix attached.
Best regards,
Daniel
RealRatingMatrix.zip
I get the following error in 0.2-6 which I didn't see with 0.2-5:
Error in neighbors[, x] : incorrect number of dimensions
Calls: getRecos ... predict -> .local -> -> sapply -> lapply -> FUN
Execution halted
Hi, I would like to try other starting values for U and V in funkSVD. No matter what I do, tcrossprod(fsvd$U, fsvd$V) results in very similar predictions for the missing values per row, but that's not what it is supposed to do. I already tried many different parameter combinations but that didn't help. So I was hoping other starting values (e.g. matrix(rnorm(nrow(x)*k), nrow = nrow(x), ncol = k)) might solve the problem. I tried to copy the function and initialize U and V with random numbers, but R always crashes. Would it be possible to make the initialization of U and V a definable variable as well? Or do you have any other solution? Thanks.
I am using R 3.4.1. and the latest development version from github (also tested with the CRAN version).
As the title says, setting progress = FALSE in the evaluate function does not have an effect and evaluate still outputs progress for the folds in crossvalidation.
Hi @mhahsler, I want to make predictions using my trained Recommender on a test set (a realRatingMatrix instance). The issue is that I can only make useful predictions on unknown data. I want to predict known ratings in this test set for manual performance testing, but when setting type="ratingMatrix" for returning the full matrix (instead of only the predictions for the unknown ratings) I get predictions that don't make sense. See below:
ratingmatrix <- as(movielense, "realRatingMatrix")
train <- ratingmatrix[fold, ]
test <- ratingmatrix[-fold, ]
# train SVD model
svd_model <- Recommender(data = train, method = "SVD")
pred_test <- as(predict(object = svd_model, newdata = test, type = "ratingMatrix"), "matrix")
When my test dataset looks for example like this:
NA NA NA 3 NA NA 4 NA NA NA
My predictions become
3.952 3.951 3.948 -0.948 3.948 3.947 0.051 3.948 3.948 3.948
Note how all predictions seem fine except the two known entries (one even becomes negative...). Is there a way to predict these correctly so I can manually evaluate my model on the known ratings? The only alternative I found was setting type="ratings", which excludes the known ratings completely from the predictions.
Thank you!
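One way to score a model on ratings that are actually known, sketched here along the lines of the package's documented evaluation workflow, is to let evaluationScheme() hide part of each test user's ratings and compare the predictions against the hidden part. The data set, the subset size and the given value are arbitrary choices for illustration.

```r
library(recommenderlab)
data(Jester5k)

set.seed(1)
# Split users into train/test; for each test user 15 ratings are revealed
# ("known") and the remaining ones are withheld ("unknown").
e <- evaluationScheme(Jester5k[1:500], method = "split", train = 0.9, given = 15)

rec <- Recommender(getData(e, "train"), method = "SVD")

# Predict ratings for the test users from their "known" ratings only ...
pred <- predict(rec, getData(e, "known"), type = "ratings")

# ... and compare against the withheld "unknown" ratings.
acc <- calcPredictionAccuracy(pred, getData(e, "unknown"))
acc
```

With this split the withheld ratings are never shown to the model, so they can serve as ground truth for a manual check as well.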
Hi,
I want to build a recommendation model with this package. However, when I evaluate the performance of the model, I don't quite understand the function "calcPredictionAccuracy".
Does the "goodRating" argument mean the threshold above which an item would be recommended, i.e. an item is recommended if its rating is higher than this threshold?
I used part of the example code in the documentation as follows:
I found that none of the predicted ratings 'p' for member u6662 is higher than the threshold, so I thought no items would be recommended during evaluation. However, the TP of this member is 28, and I'm confused about it.
Does anyone know how this works?
As the title describes: for the "UBCF" algorithm, in the "REAL_UBCF" function, I think "sum_s_uk <- colSums(s_uk)" should be changed to "sum_s_uk <- colSums(s_uk, na.rm=TRUE)".
I'm not sure whether it's a bug.
Hi, I had problems installing recommenderlab. For almost 2 days it would not install in RStudio, even when downloaded manually. I then tried to download and install every dependency manually. I started with 'irlba' and then tried to install recommenderlab again. It seemed to install, but RStudio immediately came up with a Fatal Error and ended the session! :( I repeated the procedure with the same result. Can you please recommend what to do? Thanks.
https://www.dropbox.com/s/d62ci0vujpdxkiu/Screenshot%202017-02-07%2000.11.41.png?dl=0
In normalize.R, lines 94 - 96:
if(method_id==2) { ## Z-score
data@x <- data@x/rep(sds, colCounts(x))
}
Could the division operator ("/") here be a typo?
This is the normalize function under the Z-score method,
while lines 83 - 85 use multiplication ("*"):
if(method_id==2) { ## Z-Score
data@x <- data@x*rep(sds, rowCounts(x))
}
Thank you.
Minimum example code:
data(Groceries)
dat <- as(Groceries, "binaryRatingMatrix")
rec <- Recommender(dat, method = "AR",
parameter=list(support = 0.0005, conf = 0.5, maxlen = 5))
pred <- predict(rec, newdata=c(1), data = dat, n=30)
as(pred, "list")
If the newdata argument of the predict function is a vector of indices of users in the training data, newdata <- data[newdata,, drop=FALSE] will throw the warning "drop not implemented for ratingMatrix!".
Hello Michael,
Thank you for this amazing package! There seems to be a bug about sparse matrix multiplication within the predict() function. I'm working on the below code, which keeps reporting an error on the predict() function. any ideas about how to deal with it?
library(recommenderlab)
# simulate matrix with 2000 users and 100 movies
m <- matrix(nrow = 2000, ncol = 100)
# simulated ratings (5% of the data)
m[sample.int(100 * 2000, 10000)] <- ceiling(runif(1000, 0, 5))
# convert into a realRatingMatrix
r <- as(m, "realRatingMatrix")
# UBCF recommender
UB.Rec <- Recommender(r, method = "UBCF")
pred <- predict(UB.Rec, r, type = "ratings")
as(pred, "matrix")
sessioninfo
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8
[2] LC_CTYPE=Chinese (Simplified)_China.utf8
[3] LC_MONETARY=Chinese (Simplified)_China.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] recommenderlab_1.0.2 registry_0.5-1 proxy_0.4-27 arules_1.7-5
[5] Matrix_1.5-1
loaded via a namespace (and not attached):
[1] compiler_4.2.1 generics_0.1.3 recosystem_0.5 tools_4.2.1
[5] float_0.3-0 Rcpp_1.0.9 grid_4.2.1 irlba_2.3.5.1
[9] matrixStats_0.62.0 lattice_0.20-45
After predicting with the HybridRecommender, the user IDs are not displayed in the final output. There are only column names from X1 to X10, with item IDs in the columns. Is it possible to display the user IDs in the output? Or does the function only work this way?
Hi Dr. Hahsler,
I noticed that for a 'binaryRatingMatrix', no matter what the type is (i.e. "topNList", "ratings"), BIN_UBCF() always returns a realRatingMatrix. Please see Line #74 in https://github.com/mhahsler/recommenderlab/blob/master/R/RECOM_UBCF.R.
For example, if I set results = evaluate(scheme, c("UBCF","IBCF"), type = "topNList")
I should get a topNList as results. I also noticed IBCF does return a topNList (line #66 - #68 in https://github.com/mhahsler/recommenderlab/blob/master/R/RECOM_IBCF.R)
Can you please tell me if this is an actual issue and can be fixed. I added the following lines before line #74 in RECOM_UBCF.R and the accuracy in case of a binaryRatingMatrix increases.
if(type=='topNList'){
top_N= (getTopNLists(ratings, n=ncol(ratings)))
top_N <- bestN(top_N, n)
return(top_N)
}
## create random ratings (Z-scores)
ratings <- matrix(rnorm(nrow(newdata)*ncol(newdata)),
nrow=nrow(newdata), ncol=ncol(newdata),
dimnames=list(NULL, model$labels))
Why NULL in dimnames, and not user ids?
as(ratings, "data.frame") fails without row names.
I have a ratings database in user_id,item_id,rating format in R. If I use the whole thing, or even anything larger than 1k users by 200 items, as the basis for a matrix, my computer runs out of memory. data.table manages to work on the same dataset lightning quick. Can recommenderlab use such a dataset directly, without having to convert it to a matrix?
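A sketch of one way to avoid the dense matrix, assuming the triples are already in a data.frame (or data.table): build a sparse dgCMatrix from the triples and wrap it in a realRatingMatrix. The column names and values are hypothetical, and note that a rating of exactly 0 would be indistinguishable from a missing entry in this representation.

```r
library(recommenderlab)
library(Matrix)

# Hypothetical long-format ratings table.
ratings <- data.frame(
  user_id = c("u1", "u1", "u2", "u3", "u3"),
  item_id = c("i1", "i3", "i2", "i1", "i2"),
  rating  = c(4, 2, 5, 3, 1),
  stringsAsFactors = FALSE
)

users <- factor(ratings$user_id)
items <- factor(ratings$item_id)

# Only the observed (user, item, rating) triples are stored.
m <- sparseMatrix(
  i = as.integer(users),
  j = as.integer(items),
  x = ratings$rating,
  dims = c(nlevels(users), nlevels(items)),
  dimnames = list(levels(users), levels(items))
)

r <- new("realRatingMatrix", data = m)
r
```

Memory use then grows with the number of ratings rather than with users times items.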
Hello, thank you for such an awesome library.
Can you help me parallelize the 'predict' function to get top-N recommendations?
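A possible sketch, not a feature of the package: since predictions for different users are independent once the model is trained, the new users can be split into chunks and each chunk predicted separately with parallel::mclapply(). The data set, the POPULAR method, the chunk count and mc.cores = 1 are illustrative assumptions; mc.cores can be raised on systems that support forking.

```r
library(recommenderlab)
library(parallel)
data(MovieLense)

rec <- Recommender(MovieLense[1:300], method = "POPULAR")
new_users <- MovieLense[301:340]

# Split the row indices of the new users into 4 roughly equal chunks.
idx <- seq_len(nrow(new_users))
chunks <- split(idx, cut(idx, 4, labels = FALSE))

# Predict each chunk independently and convert to plain lists of item labels.
top_n <- mclapply(
  chunks,
  function(i) as(predict(rec, new_users[i], n = 5), "list"),
  mc.cores = 1L
)

# Concatenate the per-chunk lists into one list with one entry per user.
top_n <- do.call(c, unname(top_n))
```

Whether this pays off depends on the method: for recommenders with cheap prediction, the overhead of copying the model to the workers may outweigh the gain.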
See the reprex below for a minimal example of the issue. I think it's a bug, but honestly I can't make heads or tails of what might be going on here. I poked through the code a little bit but didn't see anything obviously wrong -- I'm probably just missing something though. Is this a feature or a bug?
library(recommenderlab)
#> Loading required package: Matrix
#> Loading required package: arules
#>
#> Attaching package: 'arules'
#> The following objects are masked from 'package:base':
#>
#> abbreviate, write
#> Loading required package: proxy
#>
#> Attaching package: 'proxy'
#> The following object is masked from 'package:Matrix':
#>
#> as.matrix
#> The following objects are masked from 'package:stats':
#>
#> as.dist, dist
#> The following object is masked from 'package:base':
#>
#> as.matrix
#> Loading required package: registry
#> Registered S3 methods overwritten by 'registry':
#> method from
#> print.registry_field proxy
#> print.registry_entry proxy
train <- matrix(
## 5 users, 2 items
sample(c(0, 1), size = 10, replace = TRUE),
nrow = 5,
ncol = 2
)
## 5x2, as expected
dim(train)
#> [1] 5 2
train.brm <- as(train, "binaryRatingMatrix")
## This seems fine
train.brm@data
#> itemMatrix in sparse format with
#> 5 rows (elements/transactions) and
#> 2 columns (items)
## This also seems fine
dim(train.brm@data)
#> [1] 5 2
## Somehow this is broken
train.brm@data@data@Dim
#> [1] 2 5
Created on 2022-01-15 by the reprex package (v2.0.1)
Thank you, mhahsler. I have run into a difficulty.
After transforming the data to a realRatingMatrix, I use the predict function, but the results are as follows:
[[1]]
character(0)
[[2]]
character(0)
[[3]]
character(0)
So I transformed the data to a binaryRatingMatrix, and then it works. I don't know why.
I'm new to R programming and I don't really know much about it, but I have to do a simple project. I need to make something like the example on this site: https://www.r-bloggers.com/recommender-systems-101-a-step-by-step-practical-example-in-r/
(code from that website)
library("recommenderlab")
data <- read.csv("collected_data.csv")
affinity.data <- read.csv("collected_data.csv")
affinity.matrix <- as(affinity.data, "realRatingMatrix")
This is where I get stuck.
After this, I get an object in the Environment tab whose type reads "Formal class realRatingMatrix", but in another video I saw it read "Large class realRatingMatrix".
Rec.model <- Recommender(affinity.matrix[1:5000], method = "UBCF")
By the way, my .csv file has 1000 lines.
After this code, I get an error that says "Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) :
index larger than maximal 4".
I need this program urgently; I hope someone can answer today.
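The error message suggests that the rating matrix has only 4 rows (users), so the index 1:5000 runs past the end. A minimal sketch of a guard, using made-up data with 4 users in place of the CSV file (the column names and values are assumptions):

```r
library(recommenderlab)

# Hypothetical stand-in for collected_data.csv: many lines, but only 4 users.
affinity.data <- data.frame(
  user   = rep(c("a", "b", "c", "d"), each = 3),
  item   = rep(c("x", "y", "z"), times = 4),
  rating = c(5, 3, 4, 2, 5, 1, 4, 4, 3, 1, 2, 5)
)
affinity.matrix <- as(affinity.data, "realRatingMatrix")

# The number of rows is the number of distinct users, not the number of CSV lines.
nrow(affinity.matrix)

# Never ask for more users than the matrix contains.
n_train <- min(5000, nrow(affinity.matrix))
Rec.model <- Recommender(affinity.matrix[seq_len(n_train)], method = "UBCF")
```

If the real file yields far fewer users than expected, it is worth checking that the first three columns really are user, item and rating, in that order.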
Hi again,
the evaluation schemes work great now, but I'm still facing issues with my implicit dataset. I always get the same error when I try to apply the iALS algorithm to my evaluation scheme.
I'm using the RRM with confidence values between 1 and 3. Matrix sparsity is around 99%, due to the sparse implicit feedback. The dimensions of the sample I use are: 651779 * 4694
I googled the error code and found this: https://stackoverflow.com/questions/58302449/what-does-the-cholmod-error-problem-too-large-means-exactly-problem-when-conv
So it probably has something to do with the number of columns. For a small sample with 100 entries it works.
I don't think it is a memory issue because the RAM use doesn't spike when I start evaluating.
algorithms_realRating <- list(
"ALS" = list(name = "ALS_implicit", param = NULL),
"LIBMF" = list(
name = "LIBMF",
param = list(
dim = 20,
costp_l2 = 0.1,
costq_l2 = 0.01,
nthread = 4,
verbose = FALSE
)
),
"SVDF" = list(name = "SVDF", param = NULL),
"SVD" = list(name = "SVD", param = NULL),
"random items" = list(name = "RANDOM")
)
es_td_rrm<- evaluationScheme(
rrm[1:600000],
method = "cross-validation",
train = 0.8,
k = 3,
given = -1,
goodRating = 1
)
results_ratings_rrm <- evaluate(es_td_rrm,
algorithms_realRating,
type = "ratings"
)
Thanks in advance for help. I have to admit I'm not a coding expert and just a bachelor student, which is the reason I have to rely on such great packages for my first hands-on experiences with Recommender Tasks.
Have a great day you all!
Edit: I tested on a different computer
So it probably is a RAM issue after all. You can either close this issue or move it to enhancements, because it is probably possible to run this without the sparse-to-dense matrix conversion.
https://github.com/mhahsler/recommenderlab/files/9931355/RealRatingMatrix.zip
Hi.
Thank you for a great package.
I cannot get the output of "calcPredictionAccuracy" to make sense. Training and evaluation of the recommender models work well and produce fine results.
However, when I use calcPredictionAccuracy I get only zero values in the TP column. That does not align with the results I see when I train the model.
Can you shed some light on what I might be doing wrong, if anything? A reproducible example is provided below. Note how the TP column is consistently 0 even though the recommender itself should be performing well:
TP FP FN TN N precision recall TPR FPR
1 0 2 3 1 6 0 0 0 0.66
2 0 2 3 1 6 0 0 0 0.66
3 0 2 3 1 6 0 0 0 0.66
4 0 2 3 1 6 0 0 0 0.66
5 0 3 2 1 6 0 0 0 0.75
6 0 2 3 1 6 0 0 0 0.66
7 0 3 2 1 6 0 0 0 0.75
8 0 2 3 1 6 0 0 0 0.66
9 0 2 3 1 6 0 0 0 0.66
10 0 2 3 1 6 0 0 0 0.66
# load recommenderlab
library(recommenderlab)
# create a vector with binary entries
vec1 <- c(1,1,1,1,0,1,0,1,1,1)
vec2 <- c(0,0,0,0,1,0,1,0,0,0)
# write to dataframe
dat <- data.frame(
p1 = vec1,
p2 = vec2,
p3 = vec1,
p4 = vec2,
p5 = vec1
)
# convert to matrix
dat <- as.matrix(
dat
)
# create binary rating matrix
dat_train <- as(dat, "binaryRatingMatrix")
# define algorithms to test
algorithms <- list(
"Jaccard K1" = list(name="UBCF", param=list(nn=1, method = "Jaccard", verbose = T)),
"Jaccard K2" = list(name="UBCF", param=list(nn=10, method = "Jaccard", verbose = T)),
"Jaccard K3" = list(name="UBCF", param=list(nn=100, method = "Jaccard", verbose = T)),
"Most Popular" = list(name="POPULAR", param=NULL),
"Random" = list(name="RANDOM")
)
# set up evaluations scheme
es <- recommenderlab::evaluationScheme(
data = dat_train,
method = "cross-validation",
k = 5,
train = 0.7,
given = -1
)
# evaluate
results <- recommenderlab::evaluate(
es,
algorithms,
type = "topNList",
n=seq(1,3,1)
)
# assess results
res <- avg(results)
res <- results[["Jaccard K2"]]
# assess results
plot(results, "prec/rec", annotate=2, legend="topleft")
plot(results, annotate=c(1,3), legend="bottomright")
# we choose a Jaccard UBCF setting (nn = 10)
rec <- Recommender(
dat_train,
method = "UBCF",
parameter = list(
nn=10
)
)
getModel(rec)
# predict
pre <- predict(rec, dat_train, n = 3)
pre
# as list
as(pre, "list")
# see prediction accuracy at user level
acc <- calcPredictionAccuracy(
pre,
data = dat_train,
byUser = T,
given = -1
)
acc
It would be great if the RECOMs were stored when evaluated via evaluationScheme, for example to record the outcome of an AR mining process, etc. My understanding is that evaluate(..., keepModel = TRUE) would allow this, but it gives me an error:
AR run fold/sample [model time/prediction time]
...
[30.545sec/38.538sec] Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘getModel’ for signature ‘"numeric"’
RANDOM run fold/sample [model time/prediction time]
1 [0.912sec/3.366sec] Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘getModel’ for signature ‘"numeric"’
Warning message:
In .local(x, method, ...) :
Recommender 'AR1' has failed and has been removed from the results!
Recommender 'RANDOM' has failed and has been removed from the results!
Hi @mhahsler, I'm curious to know whether it is possible to evaluate a HybridRecommender. I have searched the internet for an answer but failed to find one.
There is a way to evaluate different methods (e.g. AR, IBCF, POPULAR, RANDOM & UBCF), and the evaluate function provided in recommenderlab makes it easy to compare performance between methods.
Is there a way to evaluate a HybridRecommender together with other methods?
# Below is the sample dataset
m <- matrix(sample(c(0,1), 50, replace=TRUE), nrow=5, ncol=10,
dimnames=list(users=paste("u", 1:5, sep=''),
items=paste("i", 1:10, sep='')))
# Convert matrix into binaryRatingMatrix
b <- as(m, "binaryRatingMatrix")
# Compute HybridRecommender
recom <- recommenderlab::HybridRecommender(
Recommender(b, method = "AR"),
Recommender(b, method = "IBCF"),
Recommender(b, method = "POPULAR"),
Recommender(b, method = "UBCF"),
weights = c(.25, .25, .25, .25))
# Set Evaluation Scheme
scheme <- evaluationScheme(b,
method="split",
train=0.9,
given=1)
# Set list of algorithms to evaluate
algorithms <- list(
"Association rules" = list(name = "AR"),
"Item-based CF" = list(name = "IBCF"),
"Popular items" = list(name = "POPULAR"),
"Random items" = list(name = "RANDOM"),
"User-based CF" = list(name = "UBCF"),
"Hybrid" = list(name = "recom")
)
# run algorithms, predict next n movies
results <- evaluate(scheme, algorithms, n=c(1, 3, 5, 10, 15, 20))
# Draw ROC curve
plot(results, annotate = 1:4, legend="bottomright")
Hope to hear from you!
hi there, is there a way to extract the resulting matrices (ie latent features) from the ALS implementation?
I think it would be useful to have summary() methods for more classes, for example for evaluationScheme and Recommender. The problem is that saving the parametrisation of runs either has to be done manually, or, if one wants to query the objects directly to retrieve the parameters, there is quite some overhead because they store all the data with them (making load() extremely slow).
For example, for RECOM_ARs it could contain the output of summary() on the corresponding rules object. This could then be stored in the evaluationResults object by default, to avoid having to store the full recommender using evaluate(..., keepModel=TRUE). For evaluationScheme, it could contain the parameters used to set up the scheme (essentially everything but the data).
Any objections?
The MovieLens 100k dataset comes with some side information about the users. This package has the ratings and item side information (title, year, genres) already - would be nice if it also included the user's side info.
Hi there,
I teach a class that involves a few lessons using recommenderlab, and my students and I are encountering an error when using predict with user-based CF: Error in neighbors[, x] : incorrect number of dimensions.
I thought it might be due to sparseness of the data, so I encouraged my students to try out different options for nearest neighbors. However, the error continued.
I re-ran code that executed UBCF without issue as of November 2019, and it now fails. Is this error a result of a new update?
Here is my data: BDS-W12-fullrecommender-DataSet.txt, and here is my code that produces the error:
library(recommenderlab)
library(dplyr)  # provides %>% and select()
dat <- read.csv("BDS-W12-fullrecommender-DataSet.csv", header=TRUE)
dat <- as.data.frame(dat)
dat2 <- dat %>%
dplyr::select( userID,placeID,rating)
dat3<-as(dat2, "realRatingMatrix")
e2 <- evaluationScheme(dat3, method="split", train=.7, given=2) # small 'given' because some users have few ratings
r2 <- HybridRecommender( Recommender(getData(e2, "train"), "IBCF"), Recommender(getData(e2, "train"), "UBCF"))
r2
# prediction, "e2, known" signifies the 2 ratings in the test dataset.
p2 <- predict(r2, getData(e2, "known"), type="ratings")
Hello Prof. Hahsler,
I wish to trial the ALS algorithm on my dataset, which is composed of implicit-feedback data for product purchases. I see that the recommenderlab package appears to support an appropriate implementation of Koren et al. ("Collaborative Filtering for Implicit Feedback Datasets") within the recommenderRegistry. However, I'm encountering errors and I'm unsure whether my approach is incorrect or unsupported by the software.
To illustrate these errors, please refer to my example using the MovieLens data, which follows. I'm a little confused by these errors because:
the binaryRatingMatrix would appear to be supported in ALS_implicit as per the documentation, and
calcPredictionAccuracy does not appear to return a correct result for the 'topNList, binaryRatingMatrix' signature despite being referenced within the documentation.
Any assistance you could provide to clarify my queries would be greatly appreciated.
Kind regards,
Michael
library(recommenderlab)
library(arules)
# import MovieLens data and transform to binaryRatingMatrix
data(MovieLense)
data.bin = binarize(MovieLense, minRating=1)
# create recommender object using ALS_implicit
r = Recommender(data.bin[1:500], method = "ALS_implicit",
parameter = list(lambda=0.1, n_factors=10,
n_iterations=10, seed = NULL, verbose = TRUE))
recom = predict(r, data.bin[501:502], n=7)
recom_topNList = predict(r, newdata = data.bin[501:502,], type = "topNList", n = 7)
as(recom_topNList, "list")
# create evaluation scheme
scheme <- evaluationScheme(data.bin[1:500], method="split", train=0.9, given=-5)
# list available methods for implicit data
recommenderRegistry$get_entries(dataType = "binaryRatingMatrix")
# form list of algorithms supporting implicit data
algorithms = list("random items" = list(name="RANDOM", param=NULL),
"popular items" = list(name="POPULAR", param=NULL),
"user-based CF" = list(name="UBCF", param=list(nn=50)),
"item-based CF" = list(name="IBCF", param=list(k=50)),
"ALS Implicit" = list(name="ALS_implicit", param=list(lambda=0.1, n_factors=10,
n_iterations=10, seed = NULL, verbose = TRUE)),
"Association Rules" = list(name="AR", param=NULL))
# output results of evaluation
results = evaluate(scheme, algorithms)
# ^^^ the above line produces the following error
# Error in matrix2[only_new_users, , drop = FALSE] :
# invalid or not-yet-implemented 'Matrix' subsetting
# calculate metrics for algorithms
accuracy_table <- function(scheme, algorithm, parameter){
r <- Recommender(getData(scheme, "train"), algorithm, parameter = parameter)
p <- predict(r, getData(scheme, "known"), type="ratings")
acc_list <- calcPredictionAccuracy(p, getData(scheme, "unknown"))
total_list <- c(algorithm =algorithm, acc_list)
total_list <- total_list[sapply(total_list, function(x) !is.null(x))]
return(data.frame(as.list(total_list)))
}
# calculate accuracy metrics
table_random <- accuracy_table(scheme, algorithm = "RANDOM", parameter = NULL)
table_ubcf <- accuracy_table(scheme, algorithm = "UBCF", parameter = list(nn=50))
table_ibcf <- accuracy_table(scheme, algorithm = "IBCF", parameter = list(k=50))
table_pop <- accuracy_table(scheme, algorithm = "POPULAR", parameter = NULL)
table_ALS_implicit <- accuracy_table(scheme, algorithm = "ALS_implicit",
parameter = list(lambda=0.1, n_factors=10,
n_iterations=10, seed = NULL, verbose = TRUE))
# ^^^ calcPredictionAccuracy does not appear to return a correct result for the
# 'topNList,binaryRatingMatrix' signature
# report metrics
rbind(table_random, table_pop, table_ubcf, table_ibcf, table_ALS_implicit)
# plot ROC and precision/recall graphs
plot(results, annotate=c(1,3), legend="topright")
plot(results, "prec/rec", annotate=3, legend="topleft")
I was trying to do cross validation with IBCF model.
Is there a specific reason why we should pass the "known" data to predict(), as below? Can't we use the "unknown" data instead? I was getting NA.
predict(eval_ib,newdata=getData(eval_sets,"known"),n=5,type="ratings")
While calculating model accuracy by user, I was getting NA for one user. Is there a specific reason for that?
calcPredictionAccuracy(x=eval_pred,data=getData(eval_sets,"unknown"),byUser=TRUE)
How can I do grid search for the parameters like 'nn' in UBCF and 'k' in IBCF?
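One possible way to approach this, offered as a sketch rather than a built-in recommenderlab feature: loop over candidate values, fit one recommender per value on the training split, and compare the prediction error on the held-out ratings. The grid values, the MovieLense subset, the seed and the split settings below are arbitrary assumptions chosen for illustration.

```r
library(recommenderlab)

data(MovieLense)
set.seed(1)  # assumption: fixed seed so the split is reproducible

# Assumption: keep only users with many ratings to make the run fast
ml <- MovieLense[rowCounts(MovieLense) > 100, ]
scheme <- evaluationScheme(ml, method = "split", train = 0.9, given = 10)

# Hypothetical grid of neighborhood sizes for UBCF
nn_grid <- c(10, 25, 50)

rmse <- sapply(nn_grid, function(nn) {
  rec  <- Recommender(getData(scheme, "train"), method = "UBCF",
                      parameter = list(nn = nn))
  pred <- predict(rec, getData(scheme, "known"), type = "ratings")
  calcPredictionAccuracy(pred, getData(scheme, "unknown"))["RMSE"]
})

# One row per candidate value; pick the nn with the lowest RMSE
grid_result <- data.frame(nn = nn_grid, RMSE = unname(rmse))
grid_result
```

The same loop should work for 'k' in IBCF by swapping the method name and the parameter list.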
Hi @mhahsler, I'm currently running 64-bit R (version 3.4.4) on a Windows machine with recommenderlab version 0.2-2. I ran into an issue when predicting "topNList" recommendations using "HybridRecommender" on a "binaryRatingMatrix". Below is my code:
# Below is the sample dataset
m <- matrix(sample(c(0,1), 50, replace=TRUE), nrow=5, ncol=10,
dimnames=list(users=paste("u", 1:5, sep=''),
items=paste("i", 1:10, sep='')))
# Convert matrix into binaryRatingMatrix
b <- as(m, "binaryRatingMatrix")
# Compute HybridRecommender
system.time(
recom <- recommenderlab::HybridRecommender(
Recommender(b, method = "AR"),
Recommender(b, method = "IBCF"),
Recommender(b, method = "POPULAR"),
Recommender(b, method = "UBCF"),
weights = c(.25, .25, .25, .25))
)
# Compute predicted recommendation items "topNList" (return error)
getList(predict(recom, 1:5, data = b, type = "topNList", n = 5, ))
Error in match.arg(type) : 'arg' should be one of “topNList”
In addition: Warning message:
In data[newdata, , drop = FALSE] : drop not implemented for ratingMatrix!
# Load sample dataset
data(Jester5k)
# check dataset class
class(Jester5k)
[1] "realRatingMatrix"
attr(,"package")
[1] "recommenderlab"
# Compute HybridRecommender
system.time(
recom2 <- HybridRecommender(
Recommender(Jester5k, method = "POPULAR"),
Recommender(Jester5k, method = "IBCF"),
Recommender(Jester5k, method = "SVDF"),
Recommender(Jester5k, method = "UBCF"),
weights = c(.25, .25, .25, .25))
)
# Predict recommendation (works well with no prediction issue)
getList(predict(recom2, 1:5, data = Jester5k, type = "topNList", n = 5))
[[1]]
[1] "j84" "j85" "j83" "j82" "j81"
[[2]]
[1] "j89" "j93" "j76" "j81" "j88"
[[3]]
character(0)
[[4]]
character(0)
[[5]]
[1] "j80" "j81" "j100" "j72" "j89"
Warning message:
In data[newdata, , drop = FALSE] : drop not implemented for ratingMatrix!
Hope to hear from you!
Dear Sir, I am having trouble finding out all the methods we can use in Recommender(), along with their function, advantages, and disadvantages.
Could you let me know how to find them? Thank you!
Dear Sir,
I am using a "binaryRatingMatrix" as input, and using
seg_pre_ubcf<-Recommender(method = "UBCF", param=list(method="Cosine")) %>%
recommenderlab::predict(seg_pred_matrix, n = 20)
to pipe the result.
Then I can use as(seg_pre_ubcf, "list") to check the result (it produces the recommendations for each cust_id).
In the result I have a list of item_ids for each customer_id, but I am not sure whether the first item_id is the best recommendation from the UBCF model for that customer. With n = 20, does the first item_id rank first, or do all 20 item_ids have the same probability/similarity score for that customer?
Could I use an existing function to reformat it?
I did not find one, so I created my own loop:
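seg2df <- function(top_list){
  tmp <- as(top_list, "list")
  n <- length(tmp)
  # start from an empty tibble so no placeholder NA row ends up in the output
  out <- tibble(rownum = integer(), item_id = character())
  for (i in seq_len(n)){
    list_item <- tibble(rownum = i,
                        item_id = tmp[[i]])
    out <- rbind(out, list_item)
  }
  # rank items within each user; group by rownum, the only user key in 'out'
  out <- out %>%
    group_by(rownum) %>%
    mutate(rank = row_number()) %>%
    ungroup()
  return(out)
}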
Any suggestions or input? Thank you!
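A loop-free variant of the same reshaping can be sketched in base R. The list `rec_list` below is a hypothetical stand-in for the output of as(top_list, "list"), and the `rank` column is only meaningful under the assumption that each user's items are already listed best-first.

```r
# Hypothetical stand-in for as(top_list, "list"): one character vector per user
rec_list <- list(c1 = c("i3", "i1"), c2 = character(0), c3 = c("i2"))

n_items <- lengths(rec_list)  # number of recommendations per user

df <- data.frame(
  customer = rep(names(rec_list), n_items),       # repeat each user id per item
  item_id  = unlist(rec_list, use.names = FALSE), # flatten in list order
  rank     = sequence(n_items),                   # 1..n within each user
  stringsAsFactors = FALSE
)
df
```

Users without any recommendation (here `c2`) simply contribute no rows.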
Hello Michael
Thank you for a great package! I am using recommenderlab with my students since it nicely offers to experiment with many of the concepts published in the literature.
In the students' challenge we are struggling with the fact, that for the MovieLense data set no negative Item-Item similarities result when applying an IBCF recommender with "cosine" on user-mean centered ratings.
Most likely this is due to proxy::dist(), but we would be more than happy if you could shed some light on what we might be doing wrong, if anything.
We have included a reproducible example below.
Minimal example
# Movie-to-Movie cosine similarity on user-centered MovieLense ratings
# Negative similarities seem to be mapped to positive similarities
library(recommenderlab)
library(tidyverse)
# minimal example
x <- rbind(c(1,-1),c(-1,1))
rating <- as(matrix(x, ncol = 2), "realRatingMatrix")
ibcf <- Recommender(rating, method = 'IBCF',
parameter = list(method = 'cosine', k = dim(MovieLense)[2],
normalize = NULL, na_as_zero = FALSE))
as(ibcf@model$sim, "matrix")[1,2]
# --> 1 is not correct
proxy::dist(x = x, y = x, method = "cosine")[1,2]
# --> 0 is not correct
lsa::cosine(x[1,], x[2,])
# --> -1 is correct
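The reference value can also be checked without lsa. The helper `cosine_sim` below is a hypothetical name for a direct base R implementation of the cosine formula; restricting it to co-rated positions is an assumption meant to resemble na_as_zero = FALSE, not a copy of recommenderlab's internals.

```r
# Cosine similarity computed directly from its definition.
# Positions where either vector is NA are dropped (co-rated items only).
cosine_sim <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  a <- a[ok]
  b <- b[ok]
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

x <- rbind(c(1, -1), c(-1, 1))
sim_12 <- cosine_sim(x[1, ], x[2, ])
sim_12  # approximately -1: the two rows point in opposite directions

# The sign survives when missing entries are present as well
sim_na <- cosine_sim(c(1, NA, -2), c(-1, 3, 2))
sim_na  # approximately -1
```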
Movie Lense example
A) All movies considered
--> no negative similarities
# MovieLense example
data("MovieLense")
# @k: no neighborhood constraint
ibcf <- Recommender(MovieLense, method = 'IBCF',
parameter = list(method = 'cosine', k = dim(MovieLense)[2],
normalize = "center", na_as_zero = FALSE))
# similarity distribution
summary(as.vector(as(ibcf@model$sim, "matrix")))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000 0.0000 0.2668 0.4067 0.8992 1.0000
# --> no negative similarities
B) Two movies selected
--> similarities match up to negative sign
# Example: "Money Train (1995)" vs "One Flew Over the Cuckoo's Nest (1975)"
# extract similarity
similarity1 <- as(ibcf@model$sim, "matrix")[536, 354]
# normalize ratings
center_MovieLense <- normalize(MovieLense, method = "center")
centered_ratings_twomovies <- as(center_MovieLense@data, "matrix")[, c(536, 354)]
# mimic na_as_zero = FALSE
centered_ratings_twomovies[centered_ratings_twomovies[,1] == 0 | centered_ratings_twomovies[,2] == 0] <- NA
centered_ratings_twomovies[is.na(centered_ratings_twomovies)] <- 0
similarity2 <- as.numeric(lsa::cosine(centered_ratings_twomovies[,1], centered_ratings_twomovies[,2]))
paste0("cosine similarity recommenderlab: ", similarity1)
# --> "cosine similarity recommenderlab: 0.660194486849794"
paste0("cosine similarity lsa: ", similarity2)
# --> "cosine similarity lsa: -0.660194486849794"
# --> similarities match up to negative sign
# check
# which(MovieLense@data@Dimnames[[2]] == "Money Train (1995)")
# which(MovieLense@data@Dimnames[[2]] == "One Flew Over the Cuckoo's Nest (1975)")
# which(ibcf@model$sim@Dimnames[[1]] == "Money Train (1995)")
# which(ibcf@model$sim@Dimnames[[1]] == "One Flew Over the Cuckoo's Nest (1975)")
Session Info
sessionInfo()
# R version 4.0.2 (2020-06-22)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 18363)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252 LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
# [5] LC_TIME=German_Switzerland.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.1.0
# [8] ggplot2_3.3.2 tidyverse_1.3.0 recommenderlab_0.2-7 registry_0.5-1 proxy_0.4-24 arules_1.6-6 Matrix_1.2-18
#
# loaded via a namespace (and not attached):
# [1] Rcpp_1.0.5 lubridate_1.7.9 lattice_0.20-41 assertthat_0.2.1 digest_0.6.25 utf8_1.1.4 lsa_0.73.2 R6_2.4.1
# [9] cellranger_1.1.0 backports_1.1.10 reprex_0.3.0 evaluate_0.14 httr_1.4.2 pillar_1.5.1 rlang_0.4.10 readxl_1.3.1
# [17] rstudioapi_0.13 recosystem_0.4.3 irlba_2.3.3 blob_1.2.1 rmarkdown_2.4 labeling_0.3 munsell_0.5.0 broom_0.7.5
# [25] compiler_4.0.2 modelr_0.1.8 xfun_0.18 pkgconfig_2.0.3 htmltools_0.5.0 tidyselect_1.1.0 fansi_0.4.1 crayon_1.3.4
# [33] dbplyr_1.4.4 withr_2.3.0 SnowballC_0.7.0 grid_4.0.2 jsonlite_1.7.1 gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0
# [41] magrittr_2.0.1 scales_1.1.1 cli_2.3.1 stringi_1.5.3 farver_2.0.3 fs_1.5.0 xml2_1.3.2 ellipsis_0.3.1
# [49] generics_0.1.0 vctrs_0.3.4 tools_4.0.2 glue_1.4.2 hms_0.5.3 yaml_2.2.1 colorspace_1.4-1 rvest_0.3.6
# [57] knitr_1.30 haven_2.3.1
Best, Daniel
The issues are with these lines:
sum_s_uk <- colSums(s_uk, na.rm=TRUE)
## calculate the weighted sum
r_a_norms <- sapply(1:nrow(newdata), FUN=function(i) {
## neighbors ratings of active user i
r_neighbors <- as(model$data[neighbors[,i]], "dgCMatrix")
drop(as(crossprod(r_neighbors, s_uk[,i]), "matrix"))
})
ratings <- t(r_a_norms)/sum_s_uk
If the training data contain NAs, r_a_norms does not count them, but sum_s_uk still contains the 'proximity' to all neighbors, so the weighted sum is normalized by a larger number than it should be. sum_s_uk should therefore be a matrix, not a vector.
sum_s_uk <- colSums(s_uk, na.rm=TRUE) # <-- This line could be replaced with these:
sum_s_uk.perPerson <- sapply(1:nrow(newdata), FUN=function(i) {
  d_neighbors <- s_uk[,i]
  r_neighbors <- as(model$data[neighbors[,i]], "matrix")
  ## indicator: 1 where a neighbor rated the item, NA otherwise
  r_neighbors[!is.na(r_neighbors)] <- 1
  dr_neighbours <- r_neighbors * d_neighbors
  colSums(dr_neighbours, na.rm = TRUE)
})
## Quick and dirty handling of items that no neighbor has rated (the sum is 0):
## replace the 0 with any finite non-zero value to avoid dividing by 0. The
## value does not matter because the corresponding r_a_norms entry is 0 anyway.
sum_s_uk.perPerson[sum_s_uk.perPerson == 0] <- 1
And this line
ratings <- t(r_a_norms)/sum_s_uk
Could be replaced with this:
ratings <- t(r_a_norms/sum_s_uk.perPerson);
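The effect of the two normalizations can be seen on a toy example in base R. All numbers below are made up for illustration: three neighbors with similarities `sim` to one active user, and two items with some missing ratings.

```r
# Hypothetical similarities of three neighbors to the active user
sim <- c(0.9, 0.6, 0.3)

# Neighbor ratings (rows = neighbors, columns = items); NA = not rated
ratings <- rbind(c(4, NA),
                 c(2,  5),
                 c(NA, 1))

# Weighted sum per item, skipping missing ratings
weighted_sum <- colSums(ratings * sim, na.rm = TRUE)

# Normalizing by the similarity of ALL neighbors, rated or not
naive <- weighted_sum / sum(sim)

# Normalizing per item by the similarity of the neighbors who rated it
sim_per_item <- colSums((!is.na(ratings)) * sim)
corrected <- weighted_sum / sim_per_item

rbind(naive = naive, corrected = corrected)
```

With per-item normalization each prediction stays within the range of the ratings that were actually observed for that item, whereas the first variant pulls the prediction towards 0.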
When newdata are the same as training data (i.e., we want to fill in missing ratings in the original training user-item matrix), UBCF (maybe IBCF, I didn't check it) counts the active user as a part of their neighborhood. The code then generates new estimates for the ratings that we already have (i.e., for the ratings of the active user that were in the training matrix) - I'd expect the original ratings to be copied into the output matrix (instead of being generated based on the neighborhood).
The binaryRatingMatrix I'm working with contains several rows with just 1 rating per user. I developed a custom UBCF that takes users' metadata to still be able to predict, even with 0 given ratings. When testing the method with given all-but-one ratings, I noticed a bug within .splitKnownUnknown() (called by evaluationScheme) that assigns nothing to the unknown split.
With given all-but-one there should be at least 1 rating per row.
Here is a reproducible example:
set.seed(2100)
#parameters of constructed binaryRatingMatrix
nr_users <- 400
nr_items <- 500
sparsity <- 0.004
#create random binaryRatingMatrix
data <- new("binaryRatingMatrix",
data=as(t(as(Matrix(rbinom(nr_users*nr_items , 1 , sparsity),
nrow=nr_users, ncol=nr_items, sparse = TRUE),
"ngCMatrix")), "itemMatrix"))
#select rows with min. 1 rating/user
min_1_item <- data[rowCounts(data) >= 1]
#inspect nr of ratings/user
table(rowCounts(min_1_item))
1 2 3 4 5 6 7 8
120 88 74 47 12 3 1 1
#Evaluation scheme with given-protocol all-but-one
es <- evaluationScheme(min_1_item, method='split', train=0.8, given=-1)
train_set <- getData(es, type='train')
known <- getData(es, type='known')
unknown <- getData(es, type='unknown')
unknown #should have at least as many ratings as rows
70 x 500 rating matrix of class ‘binaryRatingMatrix’ with 49 ratings.
#too few ratings
####
The problem lies in the sample() function, which returns "integer(0)" when there is nothing to sample from. For a user with 1 rating, there are 0 given/known entries and 1 outcome/unknown entries.
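The underlying base R behaviour can be reproduced in isolation. The sketch assumes that the unknown items are selected by negating the index of the known items; the item label "i7" and the variable names are hypothetical.

```r
# A user with a single rated item; all-but-one leaves 0 items as "known"
items <- c("i7")
given <- 0

known_index <- sample(seq_along(items), given)
known_index  # integer(0): nothing was sampled

# Negating an empty index is still an empty index, so this selects
# nothing instead of "everything except the known items"
unknown_naive <- items[-known_index]

# One way to express the complement that also works for an empty index
unknown_safe <- items[setdiff(seq_along(items), known_index)]

unknown_naive  # empty
unknown_safe   # contains the held-out item
```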
Below you find my workaround within splitKnownUnknown(). Feel free to use it for the fix:
setMethod(".splitKnownUnknown_fixed", signature(data="binaryRatingMatrix"),
function(data, given) {
## given might be of length one or length(data)
if(length(given)==1) given <- rep(given, nrow(data))
nitems <- rowCounts(data)
allBut <- given < 0
if(any(allBut)) {
given[allBut] <- nitems[allBut] + given[allBut]
}
if(any(given>nitems)) stop("Not enough ratings for user" ,
paste(which(given>nitems), collapse = ", "))
l <- getList(data, decode=FALSE) #item labels of ratings
known_index <- lapply(1:length(l),
FUN = function(i) {
if(given[i] == 0) 0
else sample(1:length(l[[i]]), given[i])
}
) #FIXED: sample() returns integer(0) for row with 0 known/1 unknown
#added due to integer(0) problem
unknown_index <- lapply(1:length(l),
FUN = function(i) {
if(given[i] == 0) 1
else -known_index[[i]]
}
)
#define KNOWN ratings
known <- encode(
lapply(1:length(l), FUN = function(x)
l[[x]][known_index[[x]]]),
itemLabels = itemLabels(data@data)
)
rownames(known) <- rownames(data)
#define UNKNOWN ratings
unknown <- encode(
lapply(1:length(l), FUN = function(x)
l[[x]][unknown_index[[x]]]),
itemLabels = itemLabels(data@data))
rownames(unknown) <- rownames(data)
known <- new("binaryRatingMatrix", data = known)
unknown <- new("binaryRatingMatrix", data = unknown)
list(
known = known,
unknown = unknown
)
})
I'm using recommenderlab on a binaryRatingMatrix to evaluate an item-based CF recommender. I noticed that the minimum topN is ten, whichever n I set, and I wonder whether this is an error.
recc_model.ibcf <- Recommender(data = data_train, method = "IBCF",parameter = list(method = "Jaccard",k=4)) #item based with 4 items
model_details <- getModel(recc_model.ibcf)
n_recommended <- 2L
recc_predicted <- predict(object = recc_model.ibcf, newdata = data_test, n = n_recommended)
recc_user_1 <- recc_predicted@items[[1]]
ard_user_1 <- recc_predicted@itemLabels[recc_user_1]
ard_user_1
db4RS.zip
The attached zip file contains an .RData file in which train_data and test_data are saved.