mhahsler / recommenderlab

recommenderlab - Lab for Developing and Testing Recommender Algorithms - R package

collaborative-filtering recommender-system

recommenderlab's Issues

not able to use pearson similarity

Both Jaccard and cosine similarity work, but when I use Pearson I get the following error:

> recc_model <- Recommender(data = R, method = "UBCF", parameter = list(method = "pearson"))
> recc_predicted <- predict(object = recc_model, newdata = R,n = 6)
Error in if (!is.null(attr(d, "method")) && tolower(attr(d, "method")) %in%  : 
  missing value where TRUE/FALSE needed

Efficient way to read a large rating CSV file into a realRatingMatrix object

Is there an efficient way (similar to read.transactions() in the arules package) to read a large rating CSV file into a realRatingMatrix object?
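
A minimal sketch of one possible approach, assuming the CSV holds one rating per row in user, item, rating order. The file contents and column names below are made up for illustration. It relies on the data.frame coercion to realRatingMatrix that also appears elsewhere in these issues. For very large files, a faster reader such as data.table::fread() could probably replace read.csv(); that is a suggestion, not something the package requires.

```r
library(recommenderlab)

# Write a tiny example file so the sketch is self-contained (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("user,item,rating",
             "u1,i1,5",
             "u1,i2,3",
             "u2,i2,4",
             "u3,i3,1"), csv_path)

# Read the triplets; fixing the column classes avoids type guessing on big files.
ratings_df <- read.csv(csv_path,
                       colClasses = c("character", "character", "numeric"))

# Coerce the (user, item, rating) data.frame to a sparse realRatingMatrix.
rrm <- as(ratings_df, "realRatingMatrix")
rrm
```

The data never passes through a dense matrix here, so memory use should scale with the number of ratings rather than with users times items.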

UBCF returns 1 as rating for predicted values with the weighted flag on binaryRatingMatrix

Hello

When building a recommendation model using a binaryRatingMatrix, the predicted ratings are always 1 when "weighted" is set, instead of being the weighted average.

This behavior is only in 0.2-6 (it worked fine in 0.2-4)

library(recommenderlab)
data("MovieLense")
MovieLense100 <- MovieLense[rowCounts(MovieLense) >100,]
MovieLense100 = binarize(MovieLense100,minRating=1)


train <- MovieLense100[1:50]
rec <- Recommender(train, method = "UBCF", list(weighted=TRUE))

pre <- predict(rec, MovieLense100[101:102], n = 10)
pre@ratings

Is HybridRecommender.R missing some commands?

I have read through HybridRecommender.R, and it appears to be missing the call below, which is needed to register HYBRID as a recommender entry.

recommenderRegistry$set_entry(
  method="HYBRID", dataType = "realRatingMatrix", fun= HybridRecommender,
  description="Hybrid Recommender")

Besides this issue,
does the HybridRecommender only perform prediction (both ratings and topNList), and can only the ratings be evaluated?
Is there any chance or plan to extend it to evaluation on topNList (ROC, confusion matrix)?

Thank you.

Predict ratingmatrix

Hi @mhahsler, a follow-up on the previous bug: when calling predict with type="ratingMatrix", the model returns the actual ratings at the positions of known ratings instead of the ratings predicted by the model (the normalization issue seems fixed).

For example if my known ratings for user X are:

NA NA 1 NA 5 NA 5

The predictions are

2.58 2.54 1.00 3.87 5.00 2.59 5.00

Hope to hear from you!

When using the IBCF method for a top-N list, I only get n results (n less than N) for some users

When I use binary data and IBCF to create a recommender, I want to return a top-N list of recommendations for seven users, where N equals 3. But for one user I get just two recommendations, for two users I get one recommendation, and for four users I get no recommendations. The code looks like this:
`library("recommenderlab")
#getData
logDataRead <- read.csv("logData.csv", header = TRUE, sep = ",", quote = "\"")
#add number of clicks as a column
#use ID
logDataPreID <- data.frame(logDataRead[c("UserID","ItemID")],rate = rep(1,nrow(logDataRead)))
#coercion from data.frame to realRatingMatrix
logDataID <- as(logDataPreID, "realRatingMatrix")
logDataID
#show details
getRatingMatrix(logDataID)

#numbers of item for every user
rowCounts(logDataID)
#numbers of user for every item
colCounts(logDataID)

#return item clicked and rating for every user
as(logDataID, "list")

#change realRatingMatrix to binaryRatingMatrix
logDataID_b <- binarize(logDataID, minRating=1)
as(logDataID_b, "matrix")
as(logDataID_b, "list")
#show methods
recommenderRegistry$get_entries(dataType = "binaryRatingMatrix")

#IBCF item-based
RecoModel_b <- Recommender(logDataID_b, method = "IBCF")
getModel(RecoModel_b)

recomTopN <- predict(RecoModel_b, logDataID_b,n=3)
as(recomTopN, "list")
as(logDataID_b, "list")
as(logDataID, "list")

similarity(logDataID_b, method = "jaccard",which="items")

#UBCF user-based
RecoModel_b_UB <- Recommender(logDataID_b, method = "UBCF")
getModel(RecoModel_b_UB)

recomTopN_UB <- predict(RecoModel_b_UB, logDataID_b,n=3)
as(recomTopN_UB, "list")
as(logDataID_b, "list")
as(logDataID, "list")
#similarity between users
similarity(logDataID_b, method = "jaccard",which="users")

dissimilarity(logDataID_b, method = "jaccard",which="users")`

The data looks like this:
logDataPreID
   UserID ItemID rate
1      u1     i1    1
2      u2     i2    1
3      u3     i3    1
4      u4     i4    1
5      u5     i5    1
6      u6     i6    1
7      u7     i7    1
8      u1     i2    1
9      u1     i5    1
10     u5     i2    1
11     u3     i7    1

The top-3 list produced by the IBCF method looks like this:
as(recomTopN, "matrix")
          i1 i2  i3 i4        i5 i6 i7
u1        NA NA  NA NA        NA NA NA
u2 0.3333333 NA  NA NA 0.6666667 NA NA
u3        NA NA  NA NA        NA NA NA
u4        NA NA  NA NA        NA NA NA
u5 0.4166667 NA  NA NA        NA NA NA
u6        NA NA  NA NA        NA NA NA
u7        NA NA 0.5 NA        NA NA NA

So for u2 I get just two recommendations (i5, i1), for two users (u5, u7) I get one recommendation each (i1, i3), and for four users I get no recommendations.
However, when I use the UBCF method, I get a top-3 list of recommendations for every user.
My English isn't good, so I don't know whether I have described my problem clearly.
Can anyone tell me:
1. Whether the behavior above is normal or abnormal. Is it a real problem?
2. The reason for the problem.
3. The solution to the problem.
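
One way to explore the question is a base-R sketch that recomputes item-to-item Jaccard similarities from the 11 rows listed above. It assumes, purely for illustration, that an item's score for a user is the mean similarity to that user's known items and that items with zero similarity are never recommended. This is not taken from the package source, but under that assumption the numbers match the reported matrix, which suggests the short lists come from the very sparse data rather than from a malfunction.

```r
# Binary user-item matrix built from the 11 (UserID, ItemID) rows in the issue.
users <- c("u1", "u2", "u3", "u4", "u5", "u6", "u7", "u1", "u1", "u5", "u3")
items <- c("i1", "i2", "i3", "i4", "i5", "i6", "i7", "i2", "i5", "i2", "i7")
m <- matrix(0, nrow = 7, ncol = 7,
            dimnames = list(paste0("u", 1:7), paste0("i", 1:7)))
m[cbind(users, items)] <- 1

# Item-item Jaccard similarity: |A and B| / |A or B| over the users of each item.
inter  <- crossprod(m)                       # co-occurrence counts
counts <- colSums(m)
sim    <- inter / (outer(counts, counts, "+") - inter)
diag(sim) <- 0                               # an item is not its own neighbor

# Assumed scoring rule: mean similarity between a candidate and the known items.
scores <- (m %*% sim) / rowSums(m)
scores[m == 1 | scores == 0] <- NA           # drop known and unrelated items

round(scores, 7)
rowSums(!is.na(scores))                      # candidates available per user
```

Users whose items are never co-consumed with anything else (u1, u3, u4, u6 here) end up with no candidate at all, which is consistent with the empty rows in the output above.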

Implementation of eALS

Hello,

I really admire your work on this package, and it helps me greatly with my bachelor thesis. I'm not that experienced with recommender systems and have put a lot of effort into understanding them over the last month. Still, I have a few questions, and I would really appreciate it if someone could provide me with answers:

1. I'm mainly working with implicit data, which means that I mainly want to use ALS_implicit_real_Rating_Matrix. Can this matrix include ratings such as: (1) if the customer viewed the product, (2) if the customer added the product to the cart, (3) if the customer purchased the product?
2. I'm wondering whether I can implement the logic behind eALS. eALS weights non-existent ratings by item popularity, which reduces the sparsity and therefore increases the performance. Is it possible to edit the realRatingMatrix like that, and can the underlying algorithm handle this data input? Or does eALS need to be implemented in the recommenderlab package, or do I have to build it myself or switch to Python?
3. There isn't an option to tune parameters, right? So I have to do this through recosystem and then pass the result in via the parameter options, right?
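
For the first question, a minimal sketch of how such an encoding could be built, assuming an event log with one row per interaction. The event names, the weights 1/2/3, and the choice to keep only the strongest event per user-item pair are all illustrative assumptions. Whether the implicit ALS method interprets these values the way intended is not confirmed here.

```r
library(recommenderlab)

# Hypothetical event log: one row per interaction.
events <- data.frame(
  user  = c("u1", "u1", "u1", "u2", "u2", "u3"),
  item  = c("p1", "p1", "p2", "p1", "p3", "p2"),
  event = c("view", "purchase", "view", "cart", "view", "view"),
  stringsAsFactors = FALSE
)

# Assumed mapping from event type to an ordinal confidence value.
weights <- c(view = 1, cart = 2, purchase = 3)
events$rating <- unname(weights[events$event])

# Keep the strongest signal per user-item pair, then coerce the triplets.
strongest <- aggregate(rating ~ user + item, data = events, FUN = max)
rrm <- as(strongest, "realRatingMatrix")

as(rrm, "matrix")
```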

As I said, I would really appreciate an answer to my questions, but I understand that there are probably bigger fish to fry :)

Have a great day,

Daniel

Extensions for sampling "known"/"unknown" recommendations in test set

The feature to supply a vector of top-N-list lengths is very nice. Not so much for efficiency but certainly for experimental consistency, it would be great to have a similar feature for the "given-x" (or "all-but-x") parameter x aka given. One could easily guarantee that (i) the test users are the same for all x, and (ii) that all recommendations for x_i are considered also when using x_i+1 (with x_i+1 > x_i), by drawing all indices at once for the highest x, and then just subset them. In particular, this would give a faster converging estimate of the difference between the parametrizations, I presume.
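
The nested sampling idea could be sketched in base R roughly as follows, for a single test user. The item indices and the given values are hypothetical. Drawing once for the largest x and reusing prefixes guarantees that the known set for a smaller x is contained in the known set for every larger x.

```r
set.seed(42)

# Hypothetical indices of the items one test user has rated.
rated_items  <- c(3, 8, 15, 21, 34, 55, 89)
given_values <- c(1, 3, 5)

# Draw once for the largest `given`, then take prefixes for the smaller values.
draw <- sample(rated_items, max(given_values))
known_sets <- lapply(given_values, function(x) draw[seq_len(x)])
names(known_sets) <- paste0("given_", given_values)

# The remaining rated items form the "unknown" part for each value of x.
unknown_sets <- lapply(known_sets, function(k) setdiff(rated_items, k))

known_sets
```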

Plotting could then be extended to feature also the x dimension: in the present setup, colours together with marker types define the method, but line types are the same for all curves. One could add this dimension through different line types within each method (i.e., within each line color) to visually argue that, for example, some methods provide more accurate forecasts than others - even when using less information on the users.

Moreover, I think it would be great if the "known"/"unknown" recommendations could be assigned by the user (or, even better, the distribution from which it is drawn). For example, this can be used to simulate a situation where some items are usually "consumed" very early in history, and thus should enter the "known" sample more often than they would appear following a uniform distribution. (In particular, I conjecture that the performance of RECOM_POPULAR is overestimated under uniform sampling in such a situation.) One idea to implement this would be quite simple through a realRatingMatrix with the order of the elements to appear.
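
As a rough illustration of drawing the "known" items from a non-uniform distribution, base R's sample() accepts a prob argument. The item names and the "consumed early" weights below are invented for the example.

```r
set.seed(1)

# Hypothetical rated items of one user and weights for "consumed early".
rated_items  <- c("A", "B", "C", "D")
early_weight <- c(A = 8, B = 4, C = 2, D = 1)

# Items with a larger weight are more likely to land in the "known" sample.
known   <- sample(rated_items, size = 2, prob = early_weight[rated_items])
unknown <- setdiff(rated_items, known)

list(known = known, unknown = unknown)
```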

Any thoughts, ideas, or warnings?

Memory efficiency of RECOM_RANDOM (and, possibly, RECOM_POPULAR)

The current implementation of RECOM_RANDOM uses excessive amounts of memory when predicting top-N lists if many items exist. This is because a dense matrix is created for all (new) users and items and is passed to returnRatings() as a whole. If the resulting object is sparse (as with, e.g., top-N lists), this is inefficient from a memory usage point of view (because most of the non-NA entries are thrown away anyway, but it is unclear ex ante how many).

I have created a variant in my fork (gregreich/recommenderlab@85f3e62) which loops over (new) users, passing each row to returnRatings() individually (to be consistent with the other RECOMs's return values), and stacking the results on top of each other (via a dgCMatrix). This approach costs runtime (a factor of 4 to 5, see below), but seems to have constant memory usage (n is number of new users here; number of items is about 250k).
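
The general pattern (not the code in the fork) could look like the sketch below, which uses only the Matrix package. Scores are generated one user at a time, only the top entries are kept as triplets, and a single sparse matrix is assembled at the end, so no dense users-by-items matrix is ever allocated. The sizes and variable names are assumptions for illustration.

```r
library(Matrix)

set.seed(123)
n_users <- 5
n_items <- 1000
top_n   <- 10

# Process one user at a time so only one dense row exists at any moment.
rows <- vector("list", n_users)
for (u in seq_len(n_users)) {
  scores <- rnorm(n_items)                                # random scores
  keep   <- order(scores, decreasing = TRUE)[seq_len(top_n)]
  rows[[u]] <- data.frame(i = u, j = keep, x = scores[keep])
}

# Assemble all kept entries into one sparse matrix in a single step.
trip <- do.call(rbind, rows)
top  <- sparseMatrix(i = trip$i, j = trip$j, x = trip$x,
                     dims = c(n_users, n_items))
```

Collecting triplets and building the matrix once avoids repeated rbind() calls on sparse matrices, which may account for part of the runtime overhead reported below.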

         Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1 predict [old, n=20]             3.495                                971.4
2 predict [new, n=20]            17.502                                750.8
3 predict [old, n=100]           18.185                               1839.6
4 predict [new, n=100]           95.057                                751.4

Maybe you can have a look and comment on it. Runtime efficiency should definitely be improved; also, the conversion from and to dgCMatrix could perhaps be avoided, but I do not know how to concatenate two objects of type realRatingMatrix directly.

A similar problem seems to exist for RECOM_POPULAR, but I haven't attempted that one yet.

Confusion about Confusion Matrix

I am a bit confused about the normalization of (False/True)-(Positive/Negative) rates output by getConfusionMatrix() for the top N classification task.

I see that the *-Positive frequencies are normalized to the total number of users in the test-set. For instance, with 100 users and a fixed number N of recommendation per user we have:

TP = (# correct recommendations) / 100
FP = (# wrong recommendations) / 100
TP + FP = N

What about the *-Negative frequencies? How are TN and FN computed? Sorry if this is obvious, but I cannot figure it out.
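
A sketch of the usual per-user definitions, averaged over users, may help frame the question. It assumes that FN counts relevant held-out items that were not recommended and that TN counts the remaining candidate items (all items minus those given to the recommender). Whether getConfusionMatrix() computes exactly this is an assumption, and all numbers below are invented.

```r
n_items <- 20   # total number of items (hypothetical)
given   <- 5    # items per user handed to the recommender (hypothetical)
N       <- 3    # length of each top-N list

# Hypothetical recommended items and truly relevant held-out items per user.
recommended <- list(u1 = c(1, 2, 3), u2 = c(4, 5, 6))
relevant    <- list(u1 = c(2, 3, 9, 10), u2 = c(7))

per_user <- t(mapply(function(rec, rel) {
  tp <- length(intersect(rec, rel))      # recommended and relevant
  fp <- length(rec) - tp                 # recommended but not relevant
  fn <- length(rel) - tp                 # relevant but not recommended
  tn <- (n_items - given) - tp - fp - fn # everything else among candidates
  c(TP = tp, FP = fp, FN = fn, TN = tn)
}, recommended, relevant))

per_user
colMeans(per_user)   # averages over users; TP + FP equals N
```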

Thanks in advance,

Valerio

Evaluation Scheme doesn't work for Large Real Rating Matrix

Hey there,

It's me again :). Now I'm having a problem with the evaluation scheme. I have several matrices, and the evaluation works fine for the binary matrix (size 158 MB). I also wanted to use it for my real rating matrices (size ~196 MB), but the code runs endlessly and nothing happens. I also tried it on a more powerful computer, where it ran for two hours without producing anything (not even an error message). I tried both split and cross-validation, and neither worked. For reference: the binary matrix, which is identical to the real rating matrices except for the numeric rating values, took only 10 minutes to evaluate.

That's my code


getRatings(rrm)

normalize(rrm)

evaluation_scheme_rrm <- evaluationScheme(rrm,
                                          method = "cross-validation",
                                          train = 0.8,
                                          k = 5,
                                          given = -1,
                                          goodRating = 1)

My specs:

Core i7 1165G7
64 GB RAM
512 GB SSD
Iris Xe Graphics

I really would appreciate help here. You can find the matrix attached.

Best regards,

Daniel
RealRatingMatrix.zip

Regression from version 0.2-5 to 0.2-6 in Recommender.predict behavior

I get the following error in 0.2-6 which I didn't see with 0.2-5:

Error in neighbors[, x] : incorrect number of dimensions
Calls: getRecos ... predict -> .local -> -> sapply -> lapply -> FUN
Execution halted

funkSVD starting values for U and V

Hi, I would like to try other starting values for U and V in funkSVD. No matter what I do, tcrossprod(fsvd$U, fsvd$V) yields very similar predictions for the missing values in each row, which is not what it is supposed to do. I have already tried many different parameter combinations, but that didn't help. So I was hoping that other starting values (e.g. matrix(rnorm(nrow(x)*k), nrow = nrow(x), ncol = k)) might solve the problem. I tried to copy the function and initialize U and V with random numbers, but R always crashes. Would it be possible to make the initialization of U and V a configurable argument as well? Or do you have any other solution? Thanks.
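
For experimenting outside the package, here is a small base-R sketch of a regularized SGD matrix factorization that takes the starting values for U and V as arguments. It is a simplified stand-in, not recommenderlab's funkSVD: it updates all k features at once, and the function name, defaults, and toy data are assumptions made for illustration.

```r
set.seed(10)

# Toy rating matrix with missing entries (hypothetical data).
x <- matrix(c( 5,  4, NA,  1,
               4, NA,  1,  1,
               1,  1,  5, NA,
              NA,  1,  4,  5), nrow = 4, byrow = TRUE)
k <- 2

# SGD over the observed entries; U and V are caller-supplied starting values.
sgd_factorize <- function(x, U, V, gamma = 0.01, lambda = 0.02, epochs = 500) {
  obs <- which(!is.na(x), arr.ind = TRUE)
  for (epoch in seq_len(epochs)) {
    for (r in seq_len(nrow(obs))) {
      i <- obs[r, 1]
      j <- obs[r, 2]
      err   <- x[i, j] - sum(U[i, ] * V[j, ])
      u_old <- U[i, ]
      U[i, ] <- U[i, ] + gamma * (err * V[j, ] - lambda * U[i, ])
      V[j, ] <- V[j, ] + gamma * (err * u_old  - lambda * V[j, ])
    }
  }
  list(U = U, V = V)
}

# RMSE on the observed entries only.
rmse <- function(x, U, V) sqrt(mean((x - tcrossprod(U, V))^2, na.rm = TRUE))

# Random starting values, as proposed in the issue.
U0 <- matrix(rnorm(nrow(x) * k, sd = 0.1), nrow = nrow(x), ncol = k)
V0 <- matrix(rnorm(ncol(x) * k, sd = 0.1), nrow = ncol(x), ncol = k)

fit <- sgd_factorize(x, U0, V0)
c(before = rmse(x, U0, V0), after = rmse(x, fit$U, fit$V))
```

Passing different U0 and V0 matrices makes it easy to compare how sensitive the predictions in tcrossprod(fit$U, fit$V) are to the initialization.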

Still showing progress despite progress = FALSE

I am using R 3.4.1 and the latest development version from GitHub (also tested with the CRAN version).

As the title says, setting progress = FALSE in the evaluate function has no effect, and evaluate still prints progress for the folds in cross-validation.

Predicting known ratings when manually testing performance/evaluating model

Hi @mhahsler, I want to make predictions using my trained Recommender on a test set (a realRatingMatrix instance). The issue is that I can only make useful predictions on unknown data. I want to predict known ratings in this test set for manual performance testing, but when setting type="ratingMatrix" for returning the full matrix (instead of only the predictions for the unknown ratings) I get predictions that don't make sense. See below:

ratingmatrix <- as(movielense, "realRatingMatrix")
train <- ratingmatrix[fold, ]
test <- ratingmatrix[-fold, ]

# train SVD model
svd_model <- Recommender(data = train, method = "SVD")
pred_test <- as(predict(object = svd_model, newdata = test, type = "ratingMatrix"), "matrix")

When my test dataset looks for example like this:

NA  NA  NA   3  NA  NA  4  NA  NA  NA

My predictions become

3.952  3.951  3.948  -0.948  3.948  3.947  0.051  3.948  3.948  3.948

Note how all predictions seem fine except the two known entries (one even becomes negative...). Is there a way to predict these correctly so I can manually evaluate my model on the known ratings? The only alternative I found was setting type="ratings", which excludes the known ratings completely from the predictions.

Thank you!

How does the argument "goodRating" work in the function "calcPredictionAccuracy"?

Hi,

I want to build a recommendation model with this package. However, when I evaluate the performance of the model, I don't quite understand the function "calcPredictionAccuracy".

Does the "goodRating" argument mean the threshold above which an item would be recommended, i.e. the item is recommended if its rating is higher than this threshold?

I used part of the example code in the document as follows:

https://pastebin.com/6WyXt6PC

I found that none of the predicted ratings 'p' for member u6662 is higher than the threshold, so I assumed that no items would be recommended during evaluation. Yet the TP of this member is 28, which confuses me.

Does anyone know how this works?
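
One reading that would explain the observation, offered as an assumption and not confirmed against the package source, is that the threshold is applied to the actual held-out ratings to decide which items count as relevant, while the top-N list is simply the N highest predictions regardless of their value. A base-R sketch of that interpretation with invented numbers:

```r
good_rating <- 4   # threshold, assumed to apply to the true ratings

# Hypothetical predictions (all below the threshold) and true held-out ratings.
predicted <- c(i1 = 3.1, i2 = 2.9, i3 = 2.5, i4 = 2.0)
actual    <- c(i1 = 5,   i2 = 4,   i3 = 2,   i4 = 5)

# The top-N list is the N best predictions, whatever their absolute value.
top_n <- names(sort(predicted, decreasing = TRUE))[1:2]

# Relevance is decided by the true ratings, not by the predictions.
relevant <- names(actual)[actual >= good_rating]

tp <- length(intersect(top_n, relevant))
tp
```

Under this reading, a user can have many true positives even though no predicted rating exceeds the threshold.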

For the "UBCF" algorithm, in the "REAL_UBCF" function, I think "sum_s_uk <- colSums(s_uk)" should change to "sum_s_uk <- colSums(s_uk, na.rm=TRUE)"

As the title describes, I think that for the "UBCF" algorithm, in the "REAL_UBCF" function, "sum_s_uk <- colSums(s_uk)" should change to "sum_s_uk <- colSums(s_uk, na.rm=TRUE)".
I'm not sure whether it's a bug.
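
The difference between the two calls is easy to see in base R. The similarity values below are made up, and s_uk is only a stand-in for the variable named in the issue.

```r
# Hypothetical neighbor similarities with one missing value.
s_uk <- matrix(c(0.8, NA,
                 0.5, 0.2), nrow = 2)   # filled column by column

colSums(s_uk)                 # the first column sum becomes NA
colSums(s_uk, na.rm = TRUE)   # missing similarities are skipped
```

A single NA similarity makes the whole column sum NA, which then propagates into any weighted average that divides by it.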

Fatal error after loading the package

Hi, I had problems installing recommenderlab. For almost two days it would not install in RStudio, even when downloaded manually. I then tried to download and install every dependency manually. I started with 'irlba' and then tried to install recommenderlab again. It seemed to install, but RStudio immediately came up with a fatal error and ended the session! :( I repeated the steps with the same result. Can you please recommend what to do? Thanks.
https://www.dropbox.com/s/d62ci0vujpdxkiu/Screenshot%202017-02-07%2000.11.41.png?dl=0

Is this line a typo, for the symbol "/"?

Inside normalize.R,

in lines 94 - 96,

if(method_id==2) { ## Z-score
        data@x <- data@x/rep(sds, colCounts(x))
}

Is the division operator ("/") here a typo?
This is in the normalize function under the Z-score method,

while lines 83 - 85 use multiplication ("*"):

if(method_id==2) { ## Z-Score
        data@x <- data@x*rep(sds, rowCounts(x))
}

Thank you.

"drop not implemented for ratingMatrix!" warning in Predict

Minimum example code:

 data(Groceries)
dat <- as(Groceries, "binaryRatingMatrix")
rec <- Recommender(dat, method = "AR", 
  parameter=list(support = 0.0005, conf = 0.5, maxlen = 5))
pred <- predict(rec, newdata=c(1), data = dat, n=30)
as(pred, "list")

If the newdata argument of the predict function is a vector of indices of users in the training data, newdata <- data[newdata,, drop=FALSE] throws the warning "drop not implemented for ratingMatrix!".

Recommenderlab with Predict error

Hello Michael,
Thank you for this amazing package! There seems to be a bug related to sparse matrix multiplication within the predict() function. I'm working on the code below, which keeps reporting an error in the predict() function. Any ideas about how to deal with it?

library(recommenderlab)

# simulate matrix with 2000 users and 100 movies
m <- matrix(nrow = 2000, ncol = 100)

# simulated ratings (5% of the data)
m[sample.int(100 * 2000, 10000)] <- ceiling(runif(10000, 0, 5))

# convert into a realRatingMatrix
r <- as(m, "realRatingMatrix")

# UBCF recommender
UB.Rec <- Recommender(r, method = "UBCF")

pred <- predict(UB.Rec, r, type = "ratings")

as(pred, "matrix")

sessioninfo

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8 
[2] LC_CTYPE=Chinese (Simplified)_China.utf8   
[3] LC_MONETARY=Chinese (Simplified)_China.utf8
[4] LC_NUMERIC=C                               
[5] LC_TIME=Chinese (Simplified)_China.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] recommenderlab_1.0.2 registry_0.5-1       proxy_0.4-27         arules_1.7-5        
[5] Matrix_1.5-1        

loaded via a namespace (and not attached):
 [1] compiler_4.2.1     generics_0.1.3     recosystem_0.5     tools_4.2.1       
 [5] float_0.3-0        Rcpp_1.0.9         grid_4.2.1         irlba_2.3.5.1     
 [9] matrixStats_0.62.0 lattice_0.20-45

HybridRecommender Output

After predicting with the HybridRecommender, user IDs are not displayed in the final output. There are only column names from X1 to X10, with item IDs in the columns. Is it possible to display user IDs in the output, or does the function only work this way?

BIN_UBCF() of RECOM_UBCF does not return topNList when type == "topNList"

Hi Dr. Hahsler,

I noticed that for a 'binaryRatingMatrix', no matter what the type is (i.e. "topNList", "ratings"), BIN_UBCF() always returns a realRatingMatrix. Please see Line #74 in https://github.com/mhahsler/recommenderlab/blob/master/R/RECOM_UBCF.R.

For example, if I set results = evaluate(scheme, c("UBCF","IBCF"), type = "topNList"),
I should get a topNList as the result. I also noticed that IBCF does return a topNList (lines #66 - #68 in https://github.com/mhahsler/recommenderlab/blob/master/R/RECOM_IBCF.R).

Can you please tell me whether this is an actual issue and whether it can be fixed? I added the following lines before line #74 in RECOM_UBCF.R, and the accuracy for a binaryRatingMatrix increases.
if (type == "topNList") {
  top_N <- getTopNLists(ratings, n = ncol(ratings))
  top_N <- bestN(top_N, n)
  return(top_N)
}

RECOM_RANDOM why not assign rownames?

## create random ratings (Z-scores)
ratings <- matrix(rnorm(nrow(newdata)*ncol(newdata)),
  nrow=nrow(newdata), ncol=ncol(newdata),
  dimnames=list(NULL, model$labels))

Why NULL in dimnames, and not user ids?
as(ratings, "data.frame") fails without row names.
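
A base-R sketch of what the question proposes: passing user ids as row names when the matrix is created. The vectors user_ids and labels are stand-ins for rownames(newdata) and model$labels, and the long-format conversion uses as.table() instead of the package's own coercion, so this only illustrates the effect of having row names.

```r
set.seed(1)

labels   <- c("i1", "i2", "i3")   # stands in for model$labels
user_ids <- c("u1", "u2")         # stands in for rownames(newdata)

# Random ratings (Z-scores) with both row and column names set.
ratings <- matrix(rnorm(length(user_ids) * length(labels)),
                  nrow = length(user_ids), ncol = length(labels),
                  dimnames = list(user_ids, labels))

# With row names present, a long (user, item, rating) table is straightforward.
long <- as.data.frame(as.table(ratings))
names(long) <- c("user", "item", "rating")
long
```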

Matrix way too big. Can I use a key-value database instead?

I have a ratings database in user_id, item_id, rating format in R. If I use the whole thing, or even anything larger than 1k users by 200 items, as the basis for a matrix, my computer runs out of memory. data.table manages to work on the same dataset lightning quick. Can recommenderlab use such a dataset directly without having to convert it to a matrix?
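
A sketch of keeping such triplets sparse, using only the Matrix package and randomly generated data. The sizes are chosen to mirror the 1k by 200 case mentioned above. As far as I know, recommenderlab can coerce such a dgCMatrix with as(sparse, "realRatingMatrix"), but that step is left out so the example depends on Matrix alone.

```r
library(Matrix)

set.seed(7)
n_users <- 1000
n_items <- 200

# Hypothetical (user_id, item_id, rating) triplets.
triplets <- data.frame(
  user_id = sample.int(n_users, 2000, replace = TRUE),
  item_id = sample.int(n_items, 2000, replace = TRUE),
  rating  = sample(1:5, 2000, replace = TRUE)
)
# Keep a single rating per user-item pair.
triplets <- triplets[!duplicated(triplets[c("user_id", "item_id")]), ]

# Build the sparse matrix directly from the triplets.
sparse <- sparseMatrix(i = triplets$user_id, j = triplets$item_id,
                       x = as.numeric(triplets$rating),
                       dims = c(n_users, n_items))

# The dense version is created here only to compare memory footprints.
dense <- as.matrix(sparse)
sizes <- c(sparse = as.numeric(object.size(sparse)),
           dense  = as.numeric(object.size(dense)))
sizes
```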

Parallelize

Hello, thank you for such an awesome library.
Can you help me parallelize the 'predict' function to get top-N recommendations?
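
One possible approach, sketched under the assumption that predictions for different users are independent, is to split the new users into chunks and predict each chunk in a separate process with the base parallel package. The POPULAR method, the chunk count, and the row ranges are arbitrary choices for the example. Forking is not available on Windows, so the sketch falls back to one core there.

```r
library(recommenderlab)
library(parallel)

data("MovieLense")
rec       <- Recommender(MovieLense[1:300], method = "POPULAR")
new_users <- MovieLense[301:310]

# Split the user indices into two chunks (alternating assignment).
chunks <- split(seq_len(nrow(new_users)),
                rep(1:2, length.out = nrow(new_users)))

# Forking is unavailable on Windows, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Predict each chunk independently and convert the result to a plain list.
chunk_recs <- mclapply(chunks, function(idx) {
  as(predict(rec, new_users[idx, ], n = 5), "list")
}, mc.cores = cores)

# Flatten into one list with one entry per user (grouped by chunk).
recs <- do.call(c, unname(chunk_recs))
length(recs)
```

Because of the alternating split, the entries in recs are grouped by chunk, not in the original user order. Contiguous chunks would preserve the order.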

`@Dim` method for `binaryRatingMatrix` class appears to be inverted

See reprex below for a minimal example of the issue. I think it's a bug, but honestly I can't make heads or tails of what might be going on here. I poked through the code a little bit but didn't see anything obviously wrong -- I'm probably just missing something though. Is this a feature or a bug?

library(recommenderlab)
#> Loading required package: Matrix
#> Loading required package: arules
#> 
#> Attaching package: 'arules'
#> The following objects are masked from 'package:base':
#> 
#>     abbreviate, write
#> Loading required package: proxy
#> 
#> Attaching package: 'proxy'
#> The following object is masked from 'package:Matrix':
#> 
#>     as.matrix
#> The following objects are masked from 'package:stats':
#> 
#>     as.dist, dist
#> The following object is masked from 'package:base':
#> 
#>     as.matrix
#> Loading required package: registry
#> Registered S3 methods overwritten by 'registry':
#>   method               from 
#>   print.registry_field proxy
#>   print.registry_entry proxy

train <- matrix(
  ## 5 users, 2 items
  sample(c(0, 1), size = 10, replace = TRUE),
  nrow = 5,
  ncol = 2
)

## 5x2, as expected
dim(train)
#> [1] 5 2

train.brm <- as(train, "binaryRatingMatrix")

## This seems fine
train.brm@data
#> itemMatrix in sparse format with
#>  5 rows (elements/transactions) and
#>  2 columns (items)

## This also seems fine
dim(train.brm@data)
#> [1] 5 2

## Somehow this is broken
train.brm@data@data@Dim
#> [1] 2 5

Created on 2022-01-15 by the reprex package (v2.0.1)

predict function does not work with realRatingMatrix input

Thank you, mhahsler. I have run into a difficulty:
after transforming the data to a realRatingMatrix, I use the predict function, but the results are as follows:
[[1]]
character(0)

[[2]]
character(0)

[[3]]
character(0)

So I transformed the data to a binaryRatingMatrix, and then it works. I don't know why.

Some problems about "realRatingMatrix"

I'm new to R programming and don't really know much about it, but I have to do a simple project. I need to build something like what is shown on this site: https://www.r-bloggers.com/recommender-systems-101-a-step-by-step-practical-example-in-r/

(code from that website)

library("recommenderlab")
data <- read.csv("collected_data.csv")
affinity.data<-read.csv("collected_data.csv")
affinity.matrix<- as(affinity.data,"realRatingMatrix")

This is where I get stuck.
After this, an object appears in the Environment tab with the type "Formal class realRatingMatrix", but in another video I saw it shown as "Large class realRatingMatrix".

Rec.model<-Recommender(affinity.matrix[1:5000], method = "UBCF")

By the way, my .csv file has 1000 lines.
After this code, I get an error that says "Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) :
index larger than maximal 4".

I need this program urgently; I hope someone can answer today.

Implicit ALS Bug

Hi again,

The evaluation schemes work great now, but I'm still facing issues with my implicit dataset. I always get the same error when I try to apply the iALS algorithm to my evaluation scheme.
I'm using the RRM with confidence values between 1 and 3. Matrix sparsity is around 99%, due to the sparse implicit feedback. The dimensions of the sample I use are 651779 x 4694.

I googled the error code and found this: https://stackoverflow.com/questions/58302449/what-does-the-cholmod-error-problem-too-large-means-exactly-problem-when-conv

So it probably has something to do with the number of columns. For a small sample with 100 entries it works.

I don't think it is a memory issue, because the RAM use doesn't spike when I start evaluating.


algorithms_realRating <- list(
  "ALS" = list(name = "ALS_implicit", param = NULL),
  "LIBMF" = list(
    name = "LIBMF",
    param = list(
      dim = 20,
      costp_l2 = 0.1,
      costq_l2 = 0.01,
      nthread = 4,
      verbose = FALSE
    )
  ),
  "SVDF" = list(name = "SVDF", param = NULL),
  "SVD" = list(name = "SVD", param = NULL),
  "random items" = list(name = "RANDOM")
)

es_td_rrm<- evaluationScheme(
  rrm[1:600000],
  method = "cross-validation",
  train = 0.8,
  k = 3,
  given  = -1,
  goodRating = 1
)

results_ratings_rrm <- evaluate(es_td_rrm, 
                                    algorithms_realRating, 
                                    type  = "ratings"
                                    )

Thanks in advance for your help. I have to admit I'm not a coding expert, just a bachelor student, which is why I have to rely on such great packages for my first hands-on experience with recommender tasks.

Have a great day, you all!

Edit: I tested on a different computer.

So it probably is a RAM issue after all. You can either close this issue or move it to enhancements, because it is probably possible to run this without the sparse-to-dense matrix transition.

https://github.com/mhahsler/recommenderlab/files/9931355/RealRatingMatrix.zip

Is "calcPredictionAccuracy" working correctly?

Hi.

Thank you for a great package.

I cannot get the output of "calcPredictionAccuracy" to make sense. Training and evaluation of the recommender models work well and produce fine results.

However, when I use calcPredictionAccuracy I get only zero values in the TP column. That does not align with the results I see when I train the model.

Can you shed some light on what I might be doing wrong, if anything? A reproducible example is provided below. See how the TP column is consistently 0 even though the recommender itself should be performing well:

   TP FP FN TN N precision recall TPR  FPR
1   0  2  3  1 6         0      0   0 0.66
2   0  2  3  1 6         0      0   0 0.66
3   0  2  3  1 6         0      0   0 0.66
4   0  2  3  1 6         0      0   0 0.66
5   0  3  2  1 6         0      0   0 0.75
6   0  2  3  1 6         0      0   0 0.66
7   0  3  2  1 6         0      0   0 0.75
8   0  2  3  1 6         0      0   0 0.66
9   0  2  3  1 6         0      0   0 0.66
10  0  2  3  1 6         0      0   0 0.66
# load recommenderlab
library(recommenderlab)

# create a vector with binary entries
vec1 <- c(1,1,1,1,0,1,0,1,1,1)
vec2 <- c(0,0,0,0,1,0,1,0,0,0)

# write to dataframe
dat <- data.frame(
  p1 = vec1,
  p2 = vec2,
  p3 = vec1,
  p4 = vec2,
  p5 = vec1
) 

# convert to matrix
dat <- as.matrix(
  dat
)

# create binary rating matrix
dat_train <- as(dat, "binaryRatingMatrix")

# define algorithms to test
algorithms <- list(
  "Jaccard K1" = list(name="UBCF", param=list(nn=1, method = "Jaccard", verbose = T)),
  "Jaccard K2" = list(name="UBCF", param=list(nn=10, method = "Jaccard", verbose = T)),
  "Jaccard K3" = list(name="UBCF", param=list(nn=100, method = "Jaccard", verbose = T)),
  "Most Popular" = list(name="POPULAR", param=NULL),
  "Random" = list(name="RANDOM")
)

# set up evaluation scheme
es <- recommenderlab::evaluationScheme(
  data = dat_train,
  method = "cross-validation",
  k = 5,
  train = 0.7,
  given = -1
)

# evaluate
results <- recommenderlab::evaluate(
  es,
  algorithms,
  type = "topNList",
  n=seq(1,3,1)
)

# assess results
res <- avg(results)
res <- results[["Jaccard K2"]]

# assess results
plot(results, "prec/rec", annotate=2, legend="topleft")
plot(results, annotate=c(1,3), legend="bottomright")

# we choose Jaccard K1 settings
rec <- Recommender(
  dat_train, 
  method = "UBCF",
  parameter = list(
    nn=10
  )
)
getModel(rec)

# predict
pre <- predict(rec, dat_train, n = 3)
pre

# as list
as(pre, "list")

# see prediction accuracy on user level
acc <- calcPredictionAccuracy(
  pre,
  data = dat_train,
  byUser = T,
  given = -1
)

acc

Problem with `keepModel` option in `evaluationScheme.evaluate()` method

It would be great if the RECOMs are stored when evaluated via evaluationScheme, for example to record the outcome of an AR mining process, etc. My understanding is that evaluate(..., keepModel = TRUE) would allow this; but it gives me an error:

AR run fold/sample [model time/prediction time]
...
[30.545sec/38.538sec] Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘getModel’ for signature ‘"numeric"’
RANDOM run fold/sample [model time/prediction time]
	 1  [0.912sec/3.366sec] Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘getModel’ for signature ‘"numeric"’
Warning message:
In .local(x, method, ...) : 
  Recommender 'AR1' has failed and has been removed from the results!
  Recommender 'RANDOM' has failed and has been removed from the results!

How to use rerecommend function in recommenderlab?

Possible to evaluate a hybridRecommender?

Hi @mhahsler, I'm curious whether it is possible to evaluate a hybridRecommender. I tried searching the internet for an answer but could not find one.

There is a way to evaluate different methods (e.g. AR, IBCF, POPULAR, RANDOM and UBCF), and the evaluate function provided in recommenderlab makes it easy to compare performance between methods.

Is there a way to evaluate a hybridRecommender together with other methods?

# Below is the sample dataset
m <- matrix(sample(c(0,1), 50, replace=TRUE), nrow=5, ncol=10,
            dimnames=list(users=paste("u", 1:5, sep=''),
                           items=paste("i", 1:10, sep='')))

# Convert matrix into binaryRatingMatrix
b <- as(m, "binaryRatingMatrix")

# Compute HybridRecommender
recom <- recommenderlab::HybridRecommender(
         Recommender(b, method = "AR"),
         Recommender(b, method = "IBCF"),
         Recommender(b, method = "POPULAR"),
         Recommender(b, method = "UBCF"),
         weights = c(.25, .25, .25, .25))

# Set Evaluation Scheme
scheme <- evaluationScheme(b, 
                           method="split",
                           train=0.9,
                           given=1)

# Set list of algorithms to evaluate
algorithms <- list(
  "Association rules" = list(name = "AR"),
  "Item-based CF" = list(name = "IBCF"),
  "Popular items" = list(name = "POPULAR"),
  "Random items" = list(name = "RANDOM"),
  "User-based CF" = list(name = "UBCF"),
  "Hybrid" = list(name = "recom")
  )

# run algorithms, predict next n movies
results <- evaluate(scheme, algorithms, n=c(1, 3, 5, 10, 15, 20))

# Draw ROC curve
plot(results, annotate = 1:4, legend="bottomright")

Hope to hear from you!

extract latent features from ALS

Hi there, is there a way to extract the resulting matrices (i.e. latent features) from the ALS implementation?

`summary()` methods for more classes in recommenderlab

I think it would be useful to have summary() methods for more classes, for example for evaluationScheme and Recommender. The problem is that the parametrisation of runs either has to be saved manually or, if one wants to query the objects directly to retrieve the parameters, there is quite some overhead because they store all the data with them (making load() extremely slow).

For example, for RECOM_ARs it could contain the output of summary() on the corresponding rules object. This could then be stored in the evaluationResults object by default, to avoid having to store the full recommender using evaluate(..., keepModel=TRUE). For evaluationScheme, it could contain the parameters used to set up the scheme (essentially everything but the data).

Any objections?

MovieLens metadata

The MovieLens 100k dataset comes with some side information about the users. This package already has the ratings and the item side information (title, year, genres); it would be nice if it also included the user side information.

User-based Collaborative Filtering fails in nearest neighbor assignment

Hi there,

I teach a class that involves a few lessons using RecommenderLab, and my students and I are encountering an error when using predict with user-based CF: Error in neighbors[, x] : incorrect number of dimensions. I thought it might be due to sparseness of the data, so I encouraged my students to try out different options for nearest neighbors. However, the error continued.

I re-ran code that as of November 2019 executed UBCF without issue and it now fails. Is this error a result of a new update?

Here is my data: BDS-W12-fullrecommender-DataSet.txt, and here is the code that produces the error:

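library(recommenderlab)
library(dplyr)   # provides %>% and select()

# read.csv() already returns a data.frame
dat <- read.csv("BDS-W12-fullrecommender-DataSet.csv", header = TRUE)

# keep only the user, item and rating columns
dat2 <- dat %>%
  dplyr::select(userID, placeID, rating)

dat3 <- as(dat2, "realRatingMatrix")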

e2 <- evaluationScheme(dat3, method="split", train=.7, given=2) # small 'given' because some users have few ratings

r2 <- HybridRecommender( Recommender(getData(e2, "train"), "IBCF"),  Recommender(getData(e2, "train"), "UBCF"))
r2

# predict ratings; getData(e2, "known") holds the 2 given ratings of each test user
p2 <- predict(r2, getData(e2, "known"), type="ratings")

ALS for implicit data

Hello Prof. Hahsler,

I wish to trial the ALS algorithm on my dataset, which consists of implicit-feedback data for product purchases. I see that the recommenderlab package appears to include, within the recommenderRegistry, an appropriate implementation following Koren et al. ("Collaborative Filtering for Implicit Feedback Datasets"). However, I'm encountering errors, and I'm unsure whether my approach is incorrect or unsupported by the software.

To illustrate these errors, please refer to the example below, which uses the MovieLens data. I'm a little confused by these errors because:

the binaryRatingMatrix would appear to be supported in ALS_implicit as per the documentation, and
calcPredictionAccuracy does not appear to return a correct result for the 'topNList, binaryRatingMatrix' signature, despite being referenced in the documentation.

Any assistance you could provide to clarify my queries would be greatly appreciated.

Kind regards,
Michael

library(recommenderlab)
library(arules)

# import MovieLens data and transform to binaryRatingMatrix
data(MovieLense)
data.bin = binarize(MovieLense, minRating=1)

# create recommender object using ALS_implicit
r = Recommender(data.bin[1:500], method = "ALS_implicit", 
                parameter = list(lambda=0.1, n_factors=10, 
                                 n_iterations=10, seed = NULL, verbose = TRUE))
recom = predict(r, data.bin[501:502], n=7)
recom_topNList = predict(r, newdata = data.bin[501:502,], type = "topNList", n = 7)
as(recom_topNList, "list")

# create evaluation scheme
scheme <- evaluationScheme(data.bin[1:500], method="split", train=0.9, given=-5)

# list available methods for implicit data
recommenderRegistry$get_entries(dataType = "binaryRatingMatrix")

# form list of algorithms supporting implicit data
algorithms = list("random items" = list(name="RANDOM", param=NULL),
                  "popular items" = list(name="POPULAR", param=NULL),
                  "user-based CF" = list(name="UBCF", param=list(nn=50)),
                  "item-based CF" = list(name="IBCF", param=list(k=50)),
                  "ALS Implicit" = list(name="ALS_implicit", param=list(lambda=0.1, n_factors=10, 
                                                                        n_iterations=10, seed = NULL, verbose = TRUE)),
                  "Association Rules" = list(name="AR", param=NULL))

# output results of evaluation 
results = evaluate(scheme, algorithms)

# ^^^ the above line produces the following error
# Error in matrix2[only_new_users, , drop = FALSE] : 
# invalid or not-yet-implemented 'Matrix' subsetting

# calculate metrics for algorithms

accuracy_table <- function(scheme, algorithm, parameter){
  r <- Recommender(getData(scheme, "train"), algorithm, parameter = parameter)
  p <- predict(r, getData(scheme, "known"), type="ratings")                      
  acc_list <- calcPredictionAccuracy(p, getData(scheme, "unknown"))
  total_list <- c(algorithm =algorithm, acc_list)
  total_list <- total_list[sapply(total_list, function(x) !is.null(x))]
  return(data.frame(as.list(total_list)))
}

# calculate accuracy metrics
table_random <- accuracy_table(scheme, algorithm = "RANDOM", parameter = NULL)
table_ubcf <- accuracy_table(scheme, algorithm = "UBCF", parameter = list(nn=50))
table_ibcf <- accuracy_table(scheme, algorithm = "IBCF", parameter = list(k=50))
table_pop <- accuracy_table(scheme, algorithm = "POPULAR", parameter = NULL)
table_ALS_implicit <- accuracy_table(scheme, algorithm = "ALS_implicit", 
                              parameter = list(lambda=0.1, n_factors=10, 
                                               n_iterations=10, seed = NULL, verbose = TRUE))
# ^^^ calcPredictionAccuracy does not appear to return a correct result for the
# 'topNList,binaryRatingMatrix' signature

# report metrics
rbind(table_random, table_pop, table_ubcf, table_ibcf, table_ALS_implicit)

# plot ROC and precision/recall graphs
plot(results,  annotate=c(1,3), legend="topright")
plot(results, "prec/rec", annotate=3, legend="topleft")

Grid search or Parameter tuning in recommendations

I was trying to do cross-validation with an IBCF model.

Is there a specific reason why we should use the "known" data in predict(), as below? Can't we use the "unknown" data? When I tried, I was getting NA.
predict(eval_ib,newdata=getData(eval_sets,"known"),n=5,type="ratings")
While calculating model accuracy by user, I was getting NA for one user. Is there a specific reason for that?
calcPredictionAccuracy(x=eval_pred,data=getData(eval_sets,"unknown"),byUser=TRUE)
How can I do a grid search over parameters like 'nn' in UBCF and 'k' in IBCF?

Unable to predict "topNList" using "HybridRecommender" on "binaryRatingMatrix"

Hi @mhahsler, I'm currently running 64-bit R (version 3.4.4) on a Windows machine with recommenderlab version 0.2-2. I ran into an issue when predicting a "topNList" recommendation using "HybridRecommender" on a "binaryRatingMatrix". Below is my code:

Predicting with "HybridRecommender" on a "binaryRatingMatrix" (returns an error):

# Below is the sample dataset
m <- matrix(sample(c(0,1), 50, replace=TRUE), nrow=5, ncol=10,
            dimnames=list(users=paste("u", 1:5, sep=''),
                           items=paste("i", 1:10, sep='')))

# Convert matrix into binaryRatingMatrix
b <- as(m, "binaryRatingMatrix")

# Compute HybridRecommender
system.time(
     recom <- recommenderlab::HybridRecommender(
         Recommender(b, method = "AR"),
         Recommender(b, method = "IBCF"),
         Recommender(b, method = "POPULAR"),
         Recommender(b, method = "UBCF"),
         weights = c(.25, .25, .25, .25))
)

# Compute predicted recommendation items "topNList" (returns an error)
getList(predict(recom, 1:5, data = b, type = "topNList", n = 5))

Error in match.arg(type) : 'arg' should be one of “topNList”
In addition: Warning message:
In data[newdata, , drop = FALSE] : drop not implemented for ratingMatrix!

Predicting with "HybridRecommender" on a "realRatingMatrix" works without any prediction issue:

# Load sample dataset
data(Jester5k)

# check dataset class
class(Jester5k)
[1] "realRatingMatrix"
attr(,"package")
[1] "recommenderlab"

# Compute HybridRecommender
system.time(
  recom2 <- HybridRecommender(
      Recommender(Jester5k, method = "POPULAR"),
      Recommender(Jester5k, method = "IBCF"),
      Recommender(Jester5k, method = "SVDF"),
      Recommender(Jester5k, method = "UBCF"),
      weights = c(.25, .25, .25, .25))
)

# Predict recommendation (works well with no prediction issue)
getList(predict(recom2, 1:5, data = Jester5k, type = "topNList", n = 5))

[[1]]
[1] "j84" "j85" "j83" "j82" "j81"

[[2]]
[1] "j89" "j93" "j76" "j81" "j88"

[[3]]
character(0)

[[4]]
character(0)

[[5]]
[1] "j80"  "j81"  "j100" "j72"  "j89" 

Warning message:
In data[newdata, , drop = FALSE] : drop not implemented for ratingMatrix!

Hope to hear from you!

Where can I find the list of methods available for Recommender(, method=)

Dear Sir, I am having trouble finding all the methods that can be used in the Recommender() function, together with their advantages and disadvantages.
Could you let me know where to find this? Thank you!

error in predict function for IBCF

Can I get the rank or score from predict() and reformat the result into a data frame?

Dear Sir,
I am using a "binaryRatingMatrix" as input, and I am using
seg_pre_ubcf<-Recommender(method = "UBCF", param=list(method="Cosine")) %>%
recommenderlab::predict(seg_pred_matrix, n = 20)

to pipe the result.
I can then use as(seg_pre_ubcf, "list") to check the result (it produces the recommendations for each cust_id).
In the result, I have a list of item_id values for each customer_id, but I am not sure whether the first item_id is the best recommendation from the UBCF model for that customer. With n=20, is the first item_id ranked first, or do all 20 item_id values have the same probability/similarity score for that customer?

Is there an existing function to reformat it?
I did not find one, so I created my own loop:

library(dplyr)   # provides %>%, tibble(), bind_rows(), group_by(), mutate()

seg2df <- function(top_list) {
  tmp <- as(top_list, "list")
  # build one small tibble per customer and stack them in a single call
  out <- bind_rows(lapply(seq_along(tmp), function(i) {
    tibble(rownum = i, item_id = tmp[[i]])
  }))
  # rank = position of the item within each customer's list
  out %>%
    group_by(rownum) %>%
    mutate(rank = row_number()) %>%
    ungroup()
}

Any suggestions or input? Thank you!
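For reference, the reshaping asked about here can also be done in base R without a loop. This is only a sketch: the list `tmp` is a hand-made stand-in for the output of as(top_list, "list"), the column names are hypothetical, and the rank column is meaningful only under the assumption that each list is ordered best-first.

```r
# Stand-in for as(top_list, "list"): one character vector of item ids per customer.
tmp <- list(
  c1 = c("i3", "i7", "i1"),
  c2 = character(0),          # a customer without any recommendation
  c3 = c("i2", "i9")
)

topn_to_df <- function(x) {
  n <- lengths(x)             # number of recommended items per customer
  data.frame(
    customer_id = rep(names(x), times = n),
    item_id     = as.character(unlist(x, use.names = FALSE)),
    rank        = sequence(n),   # position within each customer's list
    stringsAsFactors = FALSE
  )
}

out <- topn_to_df(tmp)
print(out)
```

Customers with an empty list simply contribute no rows, so they need no special handling.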

No negative cosine similarities for IBCF with user mean-centered ratings

Hello Michael
Thank you for a great package! I am using recommenderlab with my students since it nicely offers to experiment with many of the concepts published in the literature.
In the students' challenge, we are struggling with the fact that for the MovieLense data set no negative item-item similarities result when applying an IBCF recommender with "cosine" on user-mean-centered ratings.
Most likely this is due to proxy::dist(), but we would be more than happy if you could shed some light on what we might be doing wrong, if anything.
We have included a reproducible example below.

Minimal example

# Movie-to-Movie cosine similarity on user-centered MovieLense ratings
# Negative similarities seem to be mapped to positive similarities


library(recommenderlab)
library(tidyverse)

# minimal example
x <- rbind(c(1,-1),c(-1,1))
rating <- as(matrix(x, ncol = 2), "realRatingMatrix")

ibcf <- Recommender(rating, method = 'IBCF', 
                    parameter = list(method = 'cosine', k = ncol(rating), 
                                     normalize = NULL, na_as_zero = FALSE))

as(ibcf@model$sim, "matrix")[1,2]
# --> 1 is not correct

proxy::dist(x = x, y = x, method = "cosine")[1,2]
# --> 0 is not correct

lsa::cosine(x[1,], x[2,])
# --> -1 is correct
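The three numbers above can be reproduced in base R. The sketch below computes the cosine directly and then applies a similarity-to-distance conversion of the form 1 - |s| followed by 1 - d. That conversion is an assumption about what happens internally, not something confirmed in this thread; it is shown only because it yields exactly the reported pattern (-1, 0, 1).

```r
# Plain cosine similarity between two numeric vectors (no package needed).
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

x <- rbind(c(1, -1), c(-1, 1))
s <- cosine_sim(x[1, ], x[2, ])   # the two rows point in opposite directions

# ASSUMPTION: similarity -> distance via 1 - |s|, then distance -> similarity via 1 - d.
d <- 1 - abs(s)
s_roundtrip <- 1 - d

print(c(direct = s, distance = d, roundtrip = s_roundtrip))
```

If a conversion like this is involved, the sign is lost as soon as the absolute value is taken, which would also explain why the MovieLense similarities match up to the sign.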

Movie Lense example

A) All movies considered

--> no negative similarities

# MovieLense example 
data("MovieLense")

# @k: no neighborhood constraint 
ibcf <- Recommender(MovieLense, method = 'IBCF', 
                    parameter = list(method = 'cosine', k = dim(MovieLense)[2], 
                                     normalize = "center", na_as_zero = FALSE))

# similarity distribution
summary(as.vector(as(ibcf@model$sim, "matrix")))

#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 0.0000  0.0000  0.2668  0.4067  0.8992  1.0000 
# --> no negative similarities

B) Two movies selected

--> similarities match up to negative sign

# Example: "Money Train (1995)" vs "One Flew Over the Cuckoo's Nest (1975)"

# extract similarity
similarity1 <- as(ibcf@model$sim, "matrix")[536, 354]

# normalize ratings
center_MovieLense <- normalize(MovieLense, method = "center")
centered_ratings_twomovies <- as(center_MovieLense@data, "matrix")[, c(536, 354)]

# mimic na_as_zero = FALSE
centered_ratings_twomovies[centered_ratings_twomovies[,1] == 0 | centered_ratings_twomovies[,2] == 0] <- NA 
centered_ratings_twomovies[is.na(centered_ratings_twomovies)] <- 0

similarity2 <- as.numeric(lsa::cosine(centered_ratings_twomovies[,1], centered_ratings_twomovies[,2]))


paste0("cosine similarity recommenderlab: ", similarity1)
# --> "cosine similarity recommenderlab: 0.660194486849794"

paste0("cosine similarity lsa: ", similarity2)
# --> "cosine similarity lsa: -0.660194486849794"

# --> similarities match up to negative sign

# check
# which(MovieLense@data@Dimnames[[2]] == "Money Train (1995)")
# which(MovieLense@data@Dimnames[[2]] == "One Flew Over the Cuckoo's Nest (1975)")
# which(ibcf@model$sim@Dimnames[[1]] == "Money Train (1995)")
# which(ibcf@model$sim@Dimnames[[1]] == "One Flew Over the Cuckoo's Nest (1975)")

Session Info

sessionInfo()

# R version 4.0.2 (2020-06-22)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 18363)
# 
# Matrix products: default
# 
# locale:
#   [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252    LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C                       
# [5] LC_TIME=German_Switzerland.1252    
# 
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] forcats_0.5.0        stringr_1.4.0        dplyr_1.0.2          purrr_0.3.4          readr_1.4.0          tidyr_1.1.2          tibble_3.1.0        
# [8] ggplot2_3.3.2        tidyverse_1.3.0      recommenderlab_0.2-7 registry_0.5-1       proxy_0.4-24         arules_1.6-6         Matrix_1.2-18       
# 
# loaded via a namespace (and not attached):
#   [1] Rcpp_1.0.5       lubridate_1.7.9  lattice_0.20-41  assertthat_0.2.1 digest_0.6.25    utf8_1.1.4       lsa_0.73.2       R6_2.4.1        
# [9] cellranger_1.1.0 backports_1.1.10 reprex_0.3.0     evaluate_0.14    httr_1.4.2       pillar_1.5.1     rlang_0.4.10     readxl_1.3.1    
# [17] rstudioapi_0.13  recosystem_0.4.3 irlba_2.3.3      blob_1.2.1       rmarkdown_2.4    labeling_0.3     munsell_0.5.0    broom_0.7.5     
# [25] compiler_4.0.2   modelr_0.1.8     xfun_0.18        pkgconfig_2.0.3  htmltools_0.5.0  tidyselect_1.1.0 fansi_0.4.1      crayon_1.3.4    
# [33] dbplyr_1.4.4     withr_2.3.0      SnowballC_0.7.0  grid_4.0.2       jsonlite_1.7.1   gtable_0.3.0     lifecycle_0.2.0  DBI_1.1.0       
# [41] magrittr_2.0.1   scales_1.1.1     cli_2.3.1        stringi_1.5.3    farver_2.0.3     fs_1.5.0         xml2_1.3.2       ellipsis_0.3.1  
# [49] generics_0.1.0   vctrs_0.3.4      tools_4.0.2      glue_1.4.2       hms_0.5.3        yaml_2.2.1       colorspace_1.4-1 rvest_0.3.6     
# [57] knitr_1.30       haven_2.3.1

Best, Daniel

UBCF not handling properly training data with NAs

The issues are with these lines:

sum_s_uk <- colSums(s_uk, na.rm=TRUE)
## calculate the weighted sum
r_a_norms <- sapply(1:nrow(newdata), FUN=function(i) {
  ## neighbors ratings of active user i
  r_neighbors <- as(model$data[neighbors[,i]], "dgCMatrix")
  drop(as(crossprod(r_neighbors, s_uk[,i]), "matrix"))
})
ratings <- t(r_a_norms)/sum_s_uk

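If the training data contain NAs, r_a_norms does not count the missing ratings, but sum_s_uk still sums the 'proximity' to all neighbors, including those who did not rate the item. The weighted sum is therefore normalized by a larger number than it should be. To fix this, sum_s_uk should be a matrix (one value per item and active user), not a vector.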

sum_s_uk <- colSums(s_uk, na.rm=TRUE)  # <-- this line could be replaced with the following:

sum_s_uk.perPerson <- sapply(1:nrow(newdata), FUN=function(i) {
  d_neighbors <- s_uk[,i]
  r_neighbors <- as(model$data[neighbors[,i]], "matrix")
  r_neighbors[!is.na(r_neighbors)] <- 1         # 1 where a neighbor rated the item
  dr_neighbours <- r_neighbors * d_neighbors    # keep proximity only for actual raters
  colSums(dr_neighbours, na.rm = TRUE)
})
# Quick and dirty handling of items that no neighbor rated (all NA), which yield 0:
# replace 0 by any finite non-zero value (here 1) to avoid dividing by 0. The value
# does not matter because the corresponding r_a_norms entry is 0 anyway.
sum_s_uk.perPerson[sum_s_uk.perPerson == 0] <- 1

And this line:
ratings <- t(r_a_norms)/sum_s_uk

could be replaced with this:
ratings <- t(r_a_norms/sum_s_uk.perPerson)
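To make the effect of the two normalizations concrete, here is a small base R illustration for a single active user. The ratings and similarity values are made up, and the variable names are hypothetical; this is not the package code, only a sketch of the arithmetic described above.

```r
# Toy data: 3 neighbours (rows) x 2 items (columns); NA = not rated.
r_neighbors <- rbind(
  n1 = c(i1 = 4,  i2 = NA),
  n2 = c(i1 = 2,  i2 = 5),
  n3 = c(i1 = NA, i2 = NA)
)
# Similarity of the active user to each neighbour (made-up values).
s <- c(n1 = 0.5, n2 = 0.3, n3 = 0.2)

rated  <- 1 * !is.na(r_neighbors)          # 1 where a neighbour rated the item
r_zero <- r_neighbors
r_zero[is.na(r_zero)] <- 0                 # missing ratings contribute 0 to the sum

weighted_sum <- drop(crossprod(r_zero, s)) # sum over neighbours of s_k * r_ki

denom_all   <- sum(s)                      # normalise by every neighbour
denom_rated <- drop(crossprod(rated, s))   # normalise by the raters of each item only
denom_rated[denom_rated == 0] <- NA        # nobody rated the item: no prediction

pred_all   <- weighted_sum / denom_all
pred_rated <- weighted_sum / denom_rated

print(rbind(pred_all, pred_rated))
```

With the per-item denominator, the prediction for i1 lies between the two observed ratings (2 and 4), and i2 equals the only observed rating (5). Dividing by the similarity of all neighbours pulls both predictions towards 0.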

Item-based collaborative filtering with cross-validation

Hello

I was trying to use evaluationScheme() to perform 3-fold cross-validation, but I was getting an error.

The object 'rating' looks like "671 x 5782 rating matrix of class ‘realRatingMatrix’ with 30001 ratings.". How should I choose the value of the 'given' parameter?

UBCF counts the active user as a part of neighborhood

When newdata are the same as training data (i.e., we want to fill in missing ratings in the original training user-item matrix), UBCF (maybe IBCF, I didn't check it) counts the active user as a part of their neighborhood. The code then generates new estimates for the ratings that we already have (i.e., for the ratings of the active user that were in the training matrix) - I'd expect the original ratings to be copied into the output matrix (instead of being generated based on the neighborhood).
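The self-neighbour effect is easy to see on a toy similarity matrix. The base R sketch below uses made-up similarities and a hypothetical helper top_nn(); it only illustrates one possible way to keep a user out of their own neighbourhood (masking the diagonal) and is not the package's implementation.

```r
# Hypothetical user-user similarity matrix; rows and columns are in the same order.
sim <- matrix(c(1.0, 0.8, 0.1,
                0.8, 1.0, 0.4,
                0.1, 0.4, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("u", 1:3), paste0("u", 1:3)))

nn <- 1   # neighbourhood size

# Names of the nn most similar users; sort() silently drops NA entries.
top_nn <- function(s, nn) names(sort(s, decreasing = TRUE))[seq_len(nn)]

with_self <- apply(sim, 1, top_nn, nn = nn)

sim_noself <- sim
diag(sim_noself) <- NA    # a user can no longer be selected as their own neighbour
without_self <- apply(sim_noself, 1, top_nn, nn = nn)

print(rbind(with_self, without_self))
```

Without the mask, every user's nearest neighbour is the user themselves, because self-similarity is always the maximum.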

Known-unknown split returns false splits with only 1 rating per user

The binaryRatingMatrix I'm working with contains several rows with just 1 rating per user. I developed a custom UBCF that takes users' metadata to still be able to predict, even with 0 given ratings. When testing the method with given all-but-one ratings, I noticed a bug within .splitKnownUnknown() (called by evaluationScheme) that assigns nothing to the unknown split.
With the all-but-one protocol, the unknown split should contain at least 1 rating per row.

Here is a reproducible example:

library(recommenderlab)
library(Matrix)   # for Matrix()

set.seed(2100)
#parameters of constructed binaryRatingMatrix
nr_users <- 400 
nr_items <- 500 
sparsity <- 0.004 

#create random binaryRatingMatrix
data <- new("binaryRatingMatrix",
  data=as(t(as(Matrix(rbinom(nr_users*nr_items , 1 , sparsity), 
          nrow=nr_users, ncol=nr_items, sparse = TRUE),
         "ngCMatrix")), "itemMatrix"))
#select rows with min. 1 rating/user
min_1_item <- data[rowCounts(data) >= 1] 

#inspect nr of ratings/user
table(rowCounts(min_1_item))
#   1   2   3   4   5   6   7   8 
# 120  88  74  47  12   3   1   1 

#Evaluation scheme with given-protocol all-but-one
es <- evaluationScheme(min_1_item, method='split', train=0.8, given=-1)
train_set <- getData(es, type='train')
known <- getData(es, type='known')
unknown <- getData(es, type='unknown')
unknown #should have at least as many ratings as rows

# 70 x 500 rating matrix of class ‘binaryRatingMatrix’ with 49 ratings.
# --> too few ratings

The problem lies in the sample() function, which returns "integer(0)" when there is nothing to sample from. For a user with 1 rating, there are 0 given/known entries and 1 outcome/unknown entry.
Below is my workaround within splitKnownUnknown(). Feel free to use it for the fix:

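# a generic must exist before a method can be registered for it
setGeneric(".splitKnownUnknown_fixed",
  function(data, given) standardGeneric(".splitKnownUnknown_fixed"))

setMethod(".splitKnownUnknown_fixed", signature(data="binaryRatingMatrix"), 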
  function(data, given) {

    ## given might be of length one or length(data)
    if(length(given)==1) given <- rep(given, nrow(data))
    nitems <- rowCounts(data)

    allBut <- given < 0
    if(any(allBut)) {
      given[allBut] <- nitems[allBut] + given[allBut]
    }

    if(any(given>nitems)) stop("Not enough ratings for user" ,
      paste(which(given>nitems), collapse = ", "))

    l <- getList(data, decode=FALSE) #item labels of ratings

    known_index <- lapply(1:length(l), 
      FUN = function(i) { 
          if(given[i] == 0) 0 
          else sample(1:length(l[[i]]), given[i]) 
        }
      ) #FIXED: sample() returns integer(0) for row with 0 known/1 unknown 

    #added due to integer(0) problem
    unknown_index <- lapply(1:length(l), 
    FUN = function(i) {   
        if(given[i] == 0) 1   
        else -known_index[[i]]      
      }     
    )   

    #define KNOWN ratings    
    known <- encode(
      lapply(1:length(l), FUN = function(x)
        l[[x]][known_index[[x]]]),  
      itemLabels = itemLabels(data@data)
      )
    rownames(known) <- rownames(data)

    #define UNKNOWN ratings
    unknown <- encode(
      lapply(1:length(l), FUN = function(x)
        l[[x]][unknown_index[[x]]]), 
      itemLabels = itemLabels(data@data))
    rownames(unknown) <- rownames(data)

    known <- new("binaryRatingMatrix", data = known)
    unknown <- new("binaryRatingMatrix", data = unknown)

    list(
      known = known,
      unknown = unknown
    )
  })
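The empty-index behaviour described above can be reproduced in base R without the package. As far as I can tell, the rating is lost at the negative-indexing step: negating integer(0) still gives integer(0), which selects nothing instead of everything. The sketch below shows this for a user with a single rated item and an alternative based on setdiff(); the variable names are hypothetical.

```r
items <- 42L          # a user with exactly one rated item (item index 42)
given <- 0            # all-but-one leaves 0 known ratings for this user

known_index <- sample(seq_along(items), given)   # nothing to draw: integer(0)
unknown_neg <- items[-known_index]               # -integer(0) is still integer(0)

# setdiff() on positions keeps every position that was not drawn as known.
unknown_index <- setdiff(seq_along(items), known_index)
unknown_fix   <- items[unknown_index]

print(length(unknown_neg))   # 0: the single rating is lost
print(unknown_fix)           # 42: the rating ends up in the unknown part
```

Unlike hard-coding index 1 for the given == 0 case, the setdiff() variant also keeps all ratings in the unknown part when a user with several ratings has 0 known entries.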

lower bound for topN in IBCF for binaryRatingMatrix

I'm using recommenderlab on a binaryRatingMatrix to evaluate an item-based CF recommender. I noticed that the minimum topN is ten, whichever n I set, and I wonder whether this is an error.

recc_model.ibcf <- Recommender(data = data_train, method = "IBCF",parameter = list(method = "Jaccard",k=4)) #item based with 4 items
model_details <- getModel(recc_model.ibcf)

recc_model.ubcf <- Recommender(data = data_train, method = "UBCF",parameter = list(method = "Jaccard",nn=30))

model_details <- getModel(recc_model.ubcf)

n_recommended <- 2L
recc_predicted <- predict(object = recc_model.ibcf, newdata = data_test, n = n_recommended)

Recommendations as ‘topNList’ with n = 10 for 4205 users.

I would have expected n = 2 instead of 10.

recc_user_1 <- recc_predicted@items[[1]]
ard_user_1 <- recc_predicted@itemLabels[recc_user_1]
ard_user_1

db4RS.zip
The attached zip file contains an RData file in which train_data and test_data are saved.
