Lupa

BigData Clustering Text recommendation

What is recommendation?

A recommendation is a prediction of the rating or preference that a user would give to a specific item. It is produced by a recommendation system, which is a sub-category of information filtering systems. Recommendations are based on previously recorded data, and their main objectives are to boost the conversion rate, foster cross-selling by proposing complementary products, and encourage customer loyalty. They can be applied to a wide range of products, such as videos, news, books and songs, and are used by many leading companies such as Amazon, Netflix and Pandora.

Types of automatic recommendation

Two generic types of recommendation systems can be distinguished in terms of their information sources:

  • Collaborative filtering - takes into account a user's past behavior, such as items purchased or viewed, as well as similar decisions made by other users. It predicts a user's preferences as a weighted sum of the other users' preferences, where each weight corresponds to the correlation between the two users over the set of items that both have assessed.

This method has two main disadvantages. Firstly, it can be used only if information about the item already exists (at least a couple of users have assessed the product), which means that it cannot recommend a new product. Secondly, it does not take an item's characteristics into account, which can lead to the recommendation of a product from a completely different category.
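The weighted-sum prediction described for collaborative filtering can be sketched as follows. This is a minimal illustration, not Lupa's code: the `ratings` data, the user and item names, the use of Pearson correlation as the weight, and the mean-centering of ratings are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); illustrative data only.
ratings = {
    "ana":  {"a": 5, "b": 3, "c": 4},
    "ben":  {"a": 4, "b": 2, "c": 3, "d": 5},
    "carl": {"a": 1, "b": 5, "c": 2, "d": 1},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(u, v):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    xs = [ratings[u][i] for i in common]
    ys = [ratings[v][i] for i in common]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def predict(user, item):
    """Predict a rating as the user's mean plus the correlation-weighted
    deviations of other users from their own means (an assumed variant)."""
    num = 0.0
    den = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        w = pearson(user, other)
        num += w * (ratings[other][item] - mean(ratings[other].values()))
        den += abs(w)
    if den == 0:
        return None  # nobody comparable has rated this item (cold start)
    return mean(ratings[user].values()) + num / den


print(predict("ana", "d"))    # item rated by ben and carl
print(predict("ana", "new"))  # None: an item nobody has rated yet
```

The `None` result for an unrated item shows the first disadvantage in practice: without existing assessments there is nothing to weight.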

  • Content-based filtering - uses the characteristics of an item to recommend other objects with similar features, based on the user's preferences for specific values of product characteristics. It can also employ importance ratings and trade-offs between features to construct recommendations. For instance, a movie recommendation may take into account factors such as genre, actors or director. In the case of music, personalized online radio stations are created on the basis of fundamental music features such as types of instruments or rhythm.

Unlike collaborative filtering, however, content-based methods can recommend completely new products, as long as these items have the relevant product characteristics.

In the absence of any user preferences, or in a completely anonymous system where no user information can be recorded, a common solution for content-based filtering is to assume that the user prefers the item that he/she is viewing or selecting at the moment, which creates a data source for future recommendations.
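The anonymous case above, where the item currently being viewed stands in for the user's preferences, can be sketched with a simple item-to-item comparison. This is only an illustration under stated assumptions: the `items` texts and ids are hypothetical, and bag-of-words vectors with cosine similarity are one possible choice of item features and similarity measure, not necessarily the one Lupa uses.

```python
from collections import Counter
from math import sqrt

# Hypothetical news items (id -> text); illustrative data only.
items = {
    "n1": "central bank raises interest rates",
    "n2": "bank cuts interest rates again",
    "n3": "local team wins football final",
}


def vectorize(text):
    """Bag-of-words term counts used as the item's content features."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


vectors = {item_id: vectorize(text) for item_id, text in items.items()}


def similar_items(current_id, top_n=2):
    """Rank the other items by similarity to the item being viewed."""
    current = vectors[current_id]
    scored = [
        (cosine(current, vec), other)
        for other, vec in vectors.items()
        if other != current_id
    ]
    # Highest similarity first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, score) for score, other in scored[:top_n]]


print(similar_items("n1"))
```

Because the ranking depends only on item features, a newly added item can be recommended as soon as its text is vectorized, with no ratings required.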

Regardless of the chosen type, an effective recommendation system should be able to use at least one of the following five information sources:

  1. A person’s expressed preferences or choices among alternative products
  2. Preferences or choices of other consumers
  3. Product characteristics and preferences for them
  4. Individual characteristics that may predict preferences
  5. Expert evaluations

In the case of Lupa, the chosen recommendation method is Content-based filtering, using item-to-item

<iframe src="//www.slideshare.net/slideshow/embed_code/44224872" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>

Performance Test

This test consisted of generating recommendations for 20786 news items, with and without clustering, in two different environments: local and global (Amazon). The objective was to compare the runtimes of the two environments and find the fastest setup.

Results:

| | Mode Local without clustering | Mode Local with clustering | Mode AWS 4large with clustering |
|---|---|---|---|
| Cost Time | 3 days 2h 30 min | 1h 10 min | 25 min |
| Insert New Item | 2 min | 9 seconds | 2 seconds |

Occupied space: 62367 keys -> 72M
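To make the "Cost Time" figures easier to compare, the short calculation below turns them into speedup factors relative to the local run without clustering. The runtimes are taken directly from the results; the dictionary keys are just labels chosen for this example.

```python
from datetime import timedelta

# Full-run times copied from the results table.
runtimes = {
    "local, no clustering": timedelta(days=3, hours=2, minutes=30),
    "local, clustering": timedelta(hours=1, minutes=10),
    "AWS, clustering": timedelta(minutes=25),
}

baseline = runtimes["local, no clustering"]

# Dividing one timedelta by another yields a plain float ratio.
speedups = {name: baseline / t for name, t in runtimes.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

With these numbers, clustering alone makes the local run roughly 64 times faster, and the AWS run with clustering is roughly 179 times faster than the baseline.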

Contributors

monika-krzyzanowska, jaumejane90