Comments (7)
Update: I have implemented the same procedure for RECOM_POPULAR.
Moreover, I have made some further improvements since, which halved the runtime penalty: it is now between 2 and 3 times slower than the original version, but memory use is still constant. See gregreich/recommenderlab@e208fb5
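To illustrate the per-user loop with constant memory, here is a minimal sketch in Python (not the R code from the commit). It assumes a hypothetical representation in which item popularity is a precomputed dict and each new user's known items are a set. Only one user's top-N candidates are kept at a time, so memory does not grow with the number of new users:

```python
import heapq
from typing import Dict, Iterable, Iterator, List, Set


def popular_top_n(
    popularity: Dict[str, float],     # hypothetical: item -> popularity score
    known_items: Iterable[Set[str]],  # hypothetical: known items per new user
    n: int = 10,
) -> Iterator[List[str]]:
    """Yield one top-N list per user without building a users x items matrix."""
    for known in known_items:
        # Candidates are the items this user has not rated yet (lazy generator).
        candidates = (item for item in popularity if item not in known)
        # nlargest only keeps n items in its heap at any time.
        yield heapq.nlargest(n, candidates, key=popularity.__getitem__)
```

Known items are excluded before ranking here, so this sketch needs no separate removeKnownRatings() step.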
from recommenderlab.
This is an issue with all recommenders that create a full or close-to-full rating matrix before returnRatings() translates it into a topN list. For the random recommender, we could implement more efficient sampling code that does not create the full matrix (I was too lazy), but in general I think it would be better to create a wrapper in setMethod("predict", signature(object = "Recommender"), ...) in predict.R that applies a blocking strategy for all recommenders where this could be an issue. We could define a maximum block size (rows x items) and then call predict for the actual recommender on each block.
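A rough sketch of such a blocking wrapper, written in Python rather than R and not tied to recommenderlab's API: `predict_block` is a hypothetical stand-in for the recommender-specific predict, and `max_cells` is an assumed name for the maximum block size in rows x items.

```python
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")
Result = TypeVar("Result")


def predict_in_blocks(
    predict_block: Callable[[Sequence[Row]], List[Result]],  # hypothetical per-block predict
    newdata: Sequence[Row],
    n_items: int,
    max_cells: int = 10_000_000,  # hypothetical budget: rows x items per block
) -> List[Result]:
    """Call predict_block on row blocks of at most max_cells (rows x items) cells."""
    if n_items <= 0 or max_cells <= 0:
        raise ValueError("n_items and max_cells must be positive")
    # Largest number of rows whose dense block fits the budget,
    # but always at least one row so the loop makes progress.
    block_rows = max(1, max_cells // n_items)
    results: List[Result] = []
    for start in range(0, len(newdata), block_rows):
        block = newdata[start:start + block_rows]
        # Blocks are predicted independently and appended in row order.
        results.extend(predict_block(block))
    return results
```

With `max_cells` at or below the number of items, every block holds a single row, which amounts to a loop over users; a `max_cells` of at least rows x items gives one single block, i.e., the current behaviour.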
from recommenderlab.
The main motivation behind my approach of looping over users while still using returnRatings() was to be as minimally intrusive as possible and to ensure consistency of the output without having to think (or test) too much; so in that sense, I have to admit, I was lazy, too.
The blocking would definitely be a nice feature, but until somebody has time to implement it, the loop over users is probably already quite close, since it is a special case of blocking (one user per block). (We also have to keep in mind that the maximum speedup compared to the other special case, i.e., a single block, is bounded quite tightly, if my measurements are reliable.)
from recommenderlab.
Hi Michael, thank you for all the updates!
I also have two more comments:
- The solution using c() on a user level is indeed not optimal, for two reasons. First, the "user" in this context may well be recommenderlab itself (e.g., in an evaluationScheme), so it would still require intrusive changes; as you suggested, a wrapper could be a solution. Second, this procedure can create substantial overhead for those RECOMs that benefit from seeing all data at once, such as AR because of its call to is.subset(); I have seen this in realistic cases. So a solution that can be "appended" to the specific steps of each RECOM would be desirable, I think.
- removeKnownRatings() has a similar problem (the code already contains a FIXME). For example, when it is called from RECOM_AR, where everything is nice and sparse (provided the problem itself is), it can use enormous amounts of memory just to clean out the known items. I have applied my loop quick fix here, too, and the runtime overhead appears moderate (less than a factor of 2, at least for sufficiently large cases; I haven't analysed smaller ones yet), while the memory requirement is now more or less constant instead of linear in the number of new users.
Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1 predict [old, n=100 ] 90.865 1077.3
2 predict [new, n=100 ] 152.973 638.3
3 predict [old, n=1000] 882.417 7595.8
4 predict [new, n=1000] 1318.017 856.6
In case you are interested, you can find this version in my memory-efficiency branch; of course, it is not fully sparse but still dense in terms of items.
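For comparison, removing known items without densifying can be sketched as follows. This is a Python illustration, not the R code in the branch, and it assumes (hypothetically) that predictions are stored sparsely as one dict of item scores per user and known items as one set per user:

```python
from typing import Dict, List, Set


def remove_known_ratings(
    predictions: List[Dict[str, float]],  # hypothetical: sparse predictions, one dict per user
    known: List[Set[str]],                # hypothetical: items each user has already rated
) -> List[Dict[str, float]]:
    """Drop already-rated items from sparse per-user predictions."""
    if len(predictions) != len(known):
        raise ValueError("predictions and known must have one entry per user")
    # Only the stored predictions are visited, so work and memory scale
    # with the number of predictions, not with users x items.
    return [
        {item: score for item, score in user_pred.items() if item not in user_known}
        for user_pred, user_known in zip(predictions, known)
    ]
```

Because only stored entries are touched, the result stays as sparse as the input, e.g., when the predictions come from RECOM_AR.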
from recommenderlab.
removeKnownRatings is now sparse.
I am still thinking about the best way to create topN lists more memory-efficiently for RANDOM and POPULAR. I want to keep the code simple, since this is one of the main features of this package. To make this really efficient, we probably need to implement these parts in C/C++.
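One possible direction for RANDOM, sketched in Python under the assumption that the random ratings are drawn i.i.d. from a continuous distribution: in that case the top N of them form a uniformly random ordered sample of the user's unrated items, so one can sample directly per user instead of filling a rating matrix. The representation used here (a sequence of item labels, one set of known items per user) is hypothetical:

```python
import random
from typing import Iterable, Iterator, List, Optional, Sequence, Set


def random_top_n(
    items: Sequence[str],             # hypothetical: all item labels
    known_items: Iterable[Set[str]],  # hypothetical: known items per new user
    n: int = 10,
    rng: Optional[random.Random] = None,
) -> Iterator[List[str]]:
    """Yield a random top-N list per user without a users x items matrix."""
    if rng is None:
        rng = random.Random()
    for known in known_items:
        # Only this user's unrated items are held in memory.
        candidates = [item for item in items if item not in known]
        # Uniform sampling without replacement; never ask for more than exist.
        yield rng.sample(candidates, min(n, len(candidates)))
```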
from recommenderlab.
Hi, thanks for the new version of removeKnownRatings(); it works very nicely.
Meanwhile, I have also adapted HybridRecommender so that it does not overuse memory when creating top-N lists. Here, the problem is twofold: first, the predictions from the individual RECOMs were dense (because predict() was called with type="ratings", regardless of the final return type of the hybrid), and second, again a complete, dense rating matrix is created to work on; in this case, I think that does not make sense, because there is a loop over the rows of newdata anyway. I resolved both issues in a quick and dirty manner (see here: cb2361d), but this time it turns out to be even faster than the original version and, at the same time, to use less memory:
Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1 predict [old, n=100] 241.613 3345.9
2 predict [new, n=100] 202.106 1053.5
What I take from this is that there is only so much one can do when the desired output of a prediction is dense by construction, but that working with top-N lists wherever possible, also in the intermediate steps, would be highly desirable.
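A minimal sketch of what sparse intermediate results could look like for a hybrid, in Python and for a single user: each component contributes only the items it actually scored, and the weighted sum is formed over their union. Treating an item missing from a component as contributing nothing is an assumption made here, not necessarily how HybridRecommender aggregates:

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Sequence


def hybrid_top_n(
    component_scores: Sequence[Dict[str, float]],  # hypothetical: sparse scores per component, one user
    weights: Sequence[float],
    n: int = 10,
) -> List[str]:
    """Weighted combination of sparse per-component scores for one user."""
    if len(component_scores) != len(weights):
        raise ValueError("need exactly one weight per component")
    combined: Dict[str, float] = defaultdict(float)
    for scores, weight in zip(component_scores, weights):
        for item, score in scores.items():
            # Items absent from a component contribute nothing (assumption).
            combined[item] += weight * score
    # Only the union of the components' stored items is ever materialised.
    return heapq.nlargest(n, combined, key=combined.__getitem__)
```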
from recommenderlab.
Related Issues (20)
- UBCF returns 1 as rating for predicted values with the weighted flag on binaryRatingMatrix
- Regression from version 0.2-5 to 0.2-6 in Recommender.predict behavior
- User-based Collaborative Filtering fails in nearest neighbor assignment
- MovieLens metadata
- Confusion about Confusion Matrix
- Is "calcPredictionAccuracy" working correctly?
- No negative cosine similarities for IBCF with user mean-centered ratings
- Extensions for sampling "known"/"unknown" recommendations in test set
- Problem with `keepModel` option in `evaluationScheme.evaluate()` method
- `summary()` methods for more classes in recommenderlab
- `@Dim` method for `binaryRatingMatrix` class appears to be inverted
- Implementation of eALS
- Recommenderlab with Predict error
- Evaluation Scheme doesn't work for Large Real Rating Matrix
- Implicit ALS Bug
- rmse() function missing
- interestMeasure: parameter method is now deprecated in AR recommender
- BIN_AR method throws an error "Unknown interest measure to sort by."
- Use more stable Rd cross reference