Comments (7)

gregreich commented on May 24, 2024

Update: I have implemented the same procedure for RECOM_POPULAR.

Moreover, I made some improvements, and the runtime penalty has since halved. It is now 2 to 3 times slower than the original version, but still uses constant memory. See gregreich/recommenderlab@e208fb5

from recommenderlab.

mhahsler commented on May 24, 2024

This is an issue with all recommenders that create a full or nearly full rating matrix before returnRatings() translates it into a topN list. For the random recommender, we could implement more efficient sampling code that does not create the full matrix (I was too lazy), but in general I think it would be better to create a wrapper in

setMethod("predict", signature(object = "Recommender"), ...) in predict.R

that applies a blocking strategy for all recommenders where this could be an issue. We could define a maximum block size (rows x items) and then call predict for the actual recommender on each block.
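The blocking idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not code from recommenderlab: `predict_blocked()`, `top_n_dense()`, the `max_cells` parameter, and the item scores are all hypothetical, and a plain matrix with NA for unknown ratings stands in for the package's rating matrix classes.

```r
# Hypothetical helper (not part of recommenderlab): call a per-block predict
# function on row blocks so that no more than max_cells (rows x items)
# scores are held in memory at once.
predict_blocked <- function(predict_block, newdata, max_cells = 1e6) {
  n_users <- nrow(newdata)
  if (n_users == 0L) return(list())
  # Rows per block; at least one row, so a single user is the smallest block.
  block_rows <- max(1, min(n_users, floor(max_cells / ncol(newdata))))
  starts <- seq(1, n_users, by = block_rows)
  blocks <- lapply(starts, function(s) {
    rows <- s:min(s + block_rows - 1, n_users)
    predict_block(newdata[rows, , drop = FALSE])
  })
  # Each block returns a list with one element per user; concatenate them.
  do.call(c, blocks)
}

# Toy stand-in for a recommender that builds a dense score matrix first.
# The item scores are made up for illustration.
top_n_dense <- function(block, n = 2) {
  item_scores <- c(3.1, 4.5, 2.0, 4.9, 3.8)
  scores <- matrix(item_scores, nrow = nrow(block), ncol = ncol(block),
                   byrow = TRUE)
  scores[!is.na(block)] <- NA  # drop items the user has already rated
  lapply(seq_len(nrow(scores)), function(i) {
    head(order(scores[i, ], decreasing = TRUE, na.last = NA), n)
  })
}

newdata <- matrix(NA_real_, nrow = 4, ncol = 5)
newdata[1, 4] <- 5
newdata[2, 2] <- 4
newdata[3, c(2, 4)] <- 3

full    <- top_n_dense(newdata)                                # one block
blocked <- predict_blocked(top_n_dense, newdata, max_cells = 5) # one user per block
```

With `max_cells` equal to the number of items, every block holds a single user, which corresponds to the loop over users; a very large `max_cells` reproduces the single-block behaviour.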

gregreich commented on May 24, 2024

The main motivation behind my approach of looping over users while still using returnRatings() was to be as unintrusive as possible, and to ensure consistency of the output arguments without having to think (or test) too much; so in that sense, I have to admit, I was lazy, too.

The blocking would definitely be a nice feature; but until somebody has time to do it, using the loop over users is probably quite close already, as it constitutes a special case of the blocker. (And we also have to keep in mind that maximum speedup compared to the other special case, i.e., one single block, is bounded quite tightly - if my measurements are reliable.)

mhahsler commented on May 24, 2024

gregreich commented on May 24, 2024

Hi Michael, thank you for all the updates!

I also have two more comments:

  • The solution using c() on a user level is indeed not optimal for two reasons: First, the "user" in this context can well be recommenderlab itself (e.g., in an evaluationScheme), so it would still require intrusive changes; as you suggested, a wrapper could be a solution. However, and secondly, this procedure potentially creates a source of substantial overhead for those RECOMs that can benefit from knowing all data at once, like AR due to the call to is.subset(); I've seen that in realistic cases. So having a solution that can be "appended" to the specific steps of the RECOM would be desirable, I think.
  • removeKnownRatings() has a similar problem (the code contains a FIXME already). For example, when it is called from RECOM_AR, where everything is nice and sparse (provided the problem itself is), it will potentially use enormous amounts of memory just for cleaning out the known items. I have applied my loop quick fix here, too, and it appears that the runtime overhead is moderate (a factor below 2, at least for sufficiently large cases - I haven't analysed smaller ones yet), but the memory requirement is now more or less constant as opposed to linear in the number of new users.
          Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1 predict [old, n=100 ]           90.865                               1077.3
2 predict [new, n=100 ]          152.973                                638.3
3 predict [old, n=1000]          882.417                               7595.8
4 predict [new, n=1000]         1318.017                                856.6

In case you are interested, you can find this version in my memory efficiency branch; of course, it is not fully sparse but still dense in terms of items.

mhahsler commented on May 24, 2024

removeKnownRatings is now sparse.

I am still thinking about the best way to make creating topN lists more memory-efficient for RANDOM and POPULAR. I want to keep the code simple, since simplicity is one of the main features of this package. To make this really efficient, we probably need to implement these parts in C/C++.
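For RANDOM, one way to avoid the users x items matrix is to sample the top-N items for each user directly from the items that user has not rated. The sketch below is an assumption-laden illustration in base R, not the package's implementation: `random_top_n()` and its arguments are hypothetical, and known ratings are represented as a plain list of item indices per user. It is still dense in terms of items for one user at a time, but never across all users.

```r
# Hypothetical helper (not part of recommenderlab): draw a random top-N list
# per user without building a dense users x items score matrix.
# known_items: list with one integer vector of already rated item ids per user.
random_top_n <- function(known_items, n_items, n = 10) {
  lapply(known_items, function(known) {
    # Candidate set for this user only: O(n_items) memory at a time.
    candidates <- setdiff(seq_len(n_items), known)
    if (length(candidates) == 0L) return(integer(0))
    # Sample without replacement; the draw order serves as the ranking.
    candidates[sample.int(length(candidates), min(n, length(candidates)))]
  })
}

set.seed(42)
known <- list(c(1L, 3L), integer(0), 1:6)
top <- random_top_n(known, n_items = 6, n = 3)
```

Drawing items uniformly without replacement gives the same kind of ranking as assigning independent random scores to all unknown items and keeping the N highest, but it does not return the scores themselves.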

gregreich commented on May 24, 2024

Hi, thanks for the new version of removeKnownRatings(), it works very nicely.

Meanwhile, I have also adapted HybridRecommender so that it does not overuse memory when creating top-N lists. Here, the problem is twofold: First, the predictions from the individual RECOMs were dense (because predict() was called with type="ratings", regardless of the final return type of the hybrid), and second, a complete, dense rating matrix is again created to work on; in this case, I think it does not make sense because there is a loop over the rows of newdata anyway. I resolved both issues in a quick and dirty manner (see here: cb2361d), but this time, it turns out to be even quicker than the original version and, at the same time, less memory-consuming:

         Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1 predict [old, n=100]          241.613                               3345.9
2 predict [new, n=100]          202.106                               1053.5

I think that there is only so much one can do if the return type for any prediction is dense by construction of the desired output. But working with top-N lists wherever possible also in the intermediate steps would be highly desirable, I take from this.
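To illustrate what working with top-N lists in the intermediate steps could look like for a hybrid, here is a minimal base R sketch for a single user. It is not the code in cb2361d: `combine_sparse_scores()` is a hypothetical helper, each recommender is assumed to deliver only a short named vector of scores for its candidate items, and an item missing from one recommender is assumed to be ignored for that recommender (a weighted mean over the available scores).

```r
# Hypothetical helper (not part of recommenderlab): merge per-recommender
# candidate scores for ONE user without expanding them to all items.
# score_list: list of named numeric vectors (item id -> score), unique names.
combine_sparse_scores <- function(score_list, weights, n = 10) {
  stopifnot(length(score_list) == length(weights))
  items <- unique(unlist(lapply(score_list, names)))
  num <- numeric(length(items))
  names(num) <- items
  den <- num
  for (k in seq_along(score_list)) {
    s <- score_list[[k]]
    num[names(s)] <- num[names(s)] + weights[k] * s
    den[names(s)] <- den[names(s)] + weights[k]
  }
  # Weighted mean over the recommenders that actually scored each item.
  combined <- num / den
  head(sort(combined, decreasing = TRUE), n)
}

a <- c(i1 = 5, i2 = 3)  # candidates from a first recommender (made-up scores)
b <- c(i2 = 5, i3 = 4)  # candidates from a second recommender (made-up scores)
top <- combine_sparse_scores(list(a, b), weights = c(0.75, 0.25), n = 2)
```

Memory use here grows with the number of candidate items per user rather than with the full item catalogue, at the cost of ignoring items that no recommender put on its candidate list.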
