The current implementation of RECOM_RANDOM will use excessive amounts of memory for pr

Memory efficiency of RECOM_RANDOM (and, possibly, RECOM_POPULAR) (recommenderlab, open issue)

mhahsler commented on May 24, 2024

Memory efficiency of RECOM_RANDOM (and, possibly, RECOM_POPULAR)

from recommenderlab.

Comments (7)

gregreich commented on May 24, 2024

Update: I have implemented the same procedure for RECOM_POPULAR.

Moreover, I made some improvements, and the runtime penalty has halved since. It is now 2 to 3 times slower than the original version, but still uses constant memory. See gregreich/recommenderlab@e208fb5


mhahsler commented on May 24, 2024

This is an issue with all recommenders that create a full or close to full rating matrix before returnRatings() translates them into a topN list. For the random recommender, we could implement more efficient sampling code that does not create the full matrix (I was too lazy), but in general I think it would be better to create a wrapper in

setMethod("predict", signature(object = "Recommender"), ...) in predict.R

that applies a blocking strategy for all recommenders where this could be an issue. We could define a maximum block size (rows x items) and then call predict for the actual recommender on each block.
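The blocking idea described above can be sketched in base R. This is only an illustration, not recommenderlab code: `score_block()` is a hypothetical stand-in for whatever dense scoring a recommender performs, and `predict_blocked()` and `max_cells` are names made up for this example. The point is that at most `max_cells` scores exist in memory at any time, while the result matches the single-block computation.

```r
# Hypothetical stand-in for a recommender's scoring step (NOT recommenderlab):
# returns a dense length(user_ids) x n_items matrix of deterministic toy scores.
score_block <- function(user_ids, n_items) {
  outer(user_ids, seq_len(n_items), function(u, i) (u * 7 + i * 13) %% 101)
}

# Blocked top-N prediction: only one block of scores is dense at a time.
predict_blocked <- function(user_ids, n_items, n = 3, max_cells = 1e6,
                            score_fun = score_block) {
  # number of users per block so that rows x items stays within max_cells
  block_rows <- max(1, floor(max_cells / n_items))
  blocks <- split(user_ids, ceiling(seq_along(user_ids) / block_rows))

  top <- lapply(blocks, function(ids) {
    scores <- score_fun(ids, n_items)  # dense, but only for this block
    # keep the indices of the n best items per user, then drop the scores
    lapply(seq_len(nrow(scores)), function(r) {
      head(order(scores[r, ], decreasing = TRUE), n)
    })
  })

  # flatten the per-block lists into one list with one entry per user
  res <- unlist(top, recursive = FALSE, use.names = FALSE)
  names(res) <- user_ids
  res
}

blocked <- predict_blocked(1:10, n_items = 50, n = 3, max_cells = 100)  # 2 users per block
single  <- predict_blocked(1:10, n_items = 50, n = 3, max_cells = 1e6)  # one single block
identical(blocked, single)
```

Looping over single users, as in the quick fix discussed in this thread, corresponds to the special case where `max_cells` equals the number of items.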


gregreich commented on May 24, 2024

The main motivation behind my approach to loop over users and still use returnRatings() was to be as little intrusive as possible, and to ensure consistency of the output arguments without having to think (or to test) too much; so in that sense, I have to admit, I was lazy, too.

The blocking would definitely be a nice feature; but until somebody has time to do it, using the loop over users is probably quite close already, as it constitutes a special case of the blocker. (And we also have to keep in mind that maximum speedup compared to the other special case, i.e., one single block, is bounded quite tightly - if my measurements are reliable.)


mhahsler commented on May 24, 2024

Hmmm. After thinking about it a little more, I think that we need a more general solution. This basically impacts any recommender that creates an (almost) complete rating matrix and that means most of the algorithms. For now, I think it is better if the user calls predict on a smaller number of new users in this case. I have added c() to combine topNLists to make this easier.
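A minimal sketch of that user-side workaround, assuming a recommenderlab version that provides the c() method for topNList objects mentioned above. The data set, the POPULAR method, and the block size of 25 are arbitrary choices for illustration; combining two lists at a time with `Reduce()` is a cautious assumption about how c() can be called.

```r
library(recommenderlab)  # assumes a version with c() defined for topNList

data(Jester5k)
rec <- Recommender(Jester5k[1:1000], method = "POPULAR")
new_users <- Jester5k[1001:1100]

block_size <- 25  # hypothetical value; tune to the available memory
idx <- seq_len(nrow(new_users))
blocks <- split(idx, ceiling(idx / block_size))

# predict a top-5 list for each block of new users separately
parts <- lapply(blocks, function(i) predict(rec, new_users[i], n = 5))

# combine the per-block topNLists pairwise into one topNList
top_all <- Reduce(function(a, b) c(a, b), parts)
length(as(top_all, "list"))  # one entry per new user
```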

…

-m

On Fri, Aug 27, 2021 at 10:19 AM gregreich wrote:

> The main motivation behind my approach to loop over users and still use returnRatings() was to be as little intrusive as possible, and to ensure consistency of the output arguments without having to think (or to test) too much; so in that sense, I have to admit, I was lazy, too.
>
> The blocking would definitely be a nice feature; but until somebody has time to do it, using the loop over users is probably quite close already, as it constitutes a special case of the blocker. (And we also have to keep in mind that maximum speedup compared to the other special case, i.e., one single block, is bounded quite tightly - if my measurements are reliable.)


gregreich commented on May 24, 2024

Hi Michael, thank you for all the updates!

I also have two more comments:

1. The solution using c() on a user level is indeed not optimal, for two reasons. First, the "user" in this context can well be recommenderlab itself (e.g., in an evaluationScheme), so it would still require intrusive changes; as you suggested, a wrapper could be a solution. Second, this procedure is a potential source of substantial overhead for those RECOMs that benefit from knowing all data at once, like AR due to the call to is.subset(); I have seen that in realistic cases. So I think a solution that can be "appended" to the specific steps of the RECOM would be desirable.
2. removeKnownRatings() has a similar problem (the code already contains a FIXME). For example, when it is called from RECOM_AR, where everything is nice and sparse (provided the problem itself is), it can use enormous amounts of memory just for cleaning out the known items. I have applied my quick loop fix here, too. The runtime overhead appears to be moderate (a factor below 2, at least for sufficiently large cases; I haven't analysed smaller ones yet), while the memory requirement is now more or less constant instead of linear in the number of new users.

          Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1 predict [old, n=100 ]           90.865                               1077.3
2 predict [new, n=100 ]          152.973                                638.3
3 predict [old, n=1000]          882.417                               7595.8
4 predict [new, n=1000]         1318.017                                856.6

In case you are interested, you can find this version in my memory efficiency branch; of course, it is not fully sparse but still dense in terms of items.


mhahsler commented on May 24, 2024

removeKnownRatings is now sparse.

I am still thinking about the best way to make creating topN lists more memory-efficient for RANDOM and POPULAR. I want to keep the code simple since this is one of the main features of this package. To make this really efficient, we probably need to implement these parts in C/C++.
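For the random case, one possible direction can be sketched in plain R. This is not the package's implementation: `random_topn()` and its `known` argument (a list with one integer vector of already-rated item indices per user) are hypothetical names for this example. Items are sampled per user from the not-yet-rated candidates, so the working memory is one vector of at most `n_items` indices rather than a users x items matrix.

```r
# Sample a random top-N list per user without building a users x items matrix.
# `known`: hypothetical input, one integer vector of known item indices per user.
random_topn <- function(known, n_items, n = 10) {
  lapply(known, function(k) {
    candidates <- setdiff(seq_len(n_items), k)  # items this user has not rated
    m <- min(n, length(candidates))             # fewer than n may be available
    # index via sample.int() to avoid sample()'s special case for length-1 input
    candidates[sample.int(length(candidates), m)]
  })
}

set.seed(1)
known <- list(u1 = c(1L, 2L, 3L), u2 = integer(0), u3 = 1:9)
top <- random_topn(known, n_items = 10, n = 4)
lengths(top)
```

A POPULAR-style variant could follow the same pattern by walking a single globally sorted item vector and skipping each user's known items, again without a dense matrix.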


gregreich commented on May 24, 2024

Hi, thanks for the new version of removeKnownRatings(); it works very nicely.

Meanwhile, I have also adapted HybridRecommender so that it does not overuse memory when creating top-N lists. Here, the problem is twofold. First, the predictions from the individual RECOMs were dense (because predict() was called with type="ratings", regardless of what the final return type of the hybrid is). Second, a complete, dense rating matrix is again created to work on; in this case, I think it does not make sense because there is a loop over the rows of newdata anyway. I resolved both issues in a quick and dirty manner (see here: cb2361d), but this time the new version turns out to be even quicker than the original, and at the same time less memory consuming:

         Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1 predict [old, n=100]          241.613                               3345.9
2 predict [new, n=100]          202.106                               1053.5

I think that there is only so much one can do if the return type of a prediction is dense by construction of the desired output. What I take from this is that working with top-N lists wherever possible, including in the intermediate steps, would be highly desirable.
