This project forked from alonsoir/awesome-recommendation-engine

The purpose of this small project is to put together the know-how that I gained from the Big Data Expert course at formacionhadoop.com. The idea is to use Spark, Kafka, MongoDB, machine learning algorithms, WebSockets and Play.

License: Apache License 2.0

Languages: Scala 61.09%, XSLT 32.20%, CSS 6.71%

awesome-recommendation-engine's Introduction

The purpose of this project is to learn how to code a recommendation engine using Kafka, Spark, MongoDB or Cassandra, WebSockets and Play.

The idea for the project comes from the big data course that I took at formacionhadoop.com.

These are the components:

A Kafka producer will periodically query Amazon for products based on my own ratings and publish them to a Kafka topic.

A Spark Streaming process will read from that topic.

Apply machine learning algorithms (ALS, content-based filtering, collaborative filtering) to the datasets read by the Spark Streaming process.

Save the results in a MongoDB or Cassandra instance.

Use the Play framework to create a WebSocket interface between the MongoDB instance and the visual interface.
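To make the collaborative filtering step more concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is not the ALS algorithm the project plans to use and it is not taken from the project's code: it is a simpler item-based variant that scores unseen items by their cosine similarity to the items a user has already rated. The names `Rating`, `ItemBasedCF` and `recommend` are hypothetical, and the rating schema (user, item, score) is an assumption, since the README does not show one.

```scala
// Hypothetical rating record; the project does not show its rating schema.
final case class Rating(user: String, item: String, score: Double)

object ItemBasedCF {

  // Cosine similarity between two items, each described as user -> score.
  // The dot product runs over users who rated both items.
  private def cosine(a: Map[String, Double], b: Map[String, Double]): Double = {
    val dot   = a.keySet.intersect(b.keySet).toSeq.map(u => a(u) * b(u)).sum
    val normA = math.sqrt(a.values.map(x => x * x).sum)
    val normB = math.sqrt(b.values.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Returns up to topN (item, score) pairs for items the user has not rated,
  // best score first. Items with no similarity to the user's items are dropped.
  def recommend(ratings: Seq[Rating], user: String, topN: Int): Seq[(String, Double)] = {
    // item -> (user -> score)
    val byItem: Map[String, Map[String, Double]] =
      ratings.groupBy(_.item).map { case (item, rs) =>
        item -> rs.map(r => r.user -> r.score).toMap
      }

    // Items already rated by this user, with the user's score.
    val seen: Map[String, Double] =
      ratings.filter(_.user == user).map(r => r.item -> r.score).toMap

    val candidates = (byItem.keySet -- seen.keySet).toSeq

    candidates
      .map { item =>
        // Score = sum over rated items of (similarity * the user's rating).
        val score = seen.toSeq.map { case (ratedItem, s) =>
          cosine(byItem(item), byItem(ratedItem)) * s
        }.sum
        item -> score
      }
      .filter(_._2 > 0.0)
      .sortWith((x, y) => x._2 > y._2 || (x._2 == y._2 && x._1 < y._1))
      .take(topN)
  }
}
```

A call such as `ItemBasedCF.recommend(ratings, "alice", 5)` returns the unseen items ranked by score. ALS, by contrast, learns latent user and item factors, which usually scales better to sparse rating data; this sketch only illustrates the underlying idea of recommending from other users' ratings.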

I am going to reuse some ideas from previous work:

https://github.com/alonsoir/hello-kafka-twitter-scala

https://github.com/alonsoir/recomendation-spark-engine

Currently, the project can push data to a Kafka topic, and the Spark Streaming process can read the data from the topic and save it into a MongoDB instance.

Current output:

MacBook-Pro-Retina-de-Alonso:bin aironman$ ./amazon-kafka-connector 127.0.0.1:9092 amazonRatingsTopic
Initializing Streaming Spark Context and kafka connector...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/16 18:48:49 INFO SparkContext: Running Spark version 1.6.1
16/05/16 18:48:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
16/05/16 18:48:51 INFO VerifiableProperties: Verifying properties
16/05/16 18:48:51 INFO VerifiableProperties: Property group.id is overridden to 
16/05/16 18:48:51 INFO VerifiableProperties: Property zookeeper.connect is overridden to 
Initialized Streaming Spark Context and kafka connector...
Initializing mongodb connector...
Initialized mongodb connector...
Creating temporary table in mongo instance...
16/05/16 18:48:52 INFO SparkContext: Starting job: show at AmazonKafkaConnectorWithMongo.scala:137
16/05/16 18:48:53 INFO DAGScheduler: Got job 0 (show at AmazonKafkaConnectorWithMongo.scala:137) with 1 output partitions
...
16/05/16 18:48:53 INFO DAGScheduler: Job 0 finished: show at AmazonKafkaConnectorWithMongo.scala:137, took 0,250144 s
+--------------------+--------------------+
|                  id|       amazonProduct|
+--------------------+--------------------+
|Mon May 16 18:41:...|[  null  , "{\"it...|
|Mon May 16 18:42:...|[  null  , "{\"it...|
|Mon May 16 18:45:...|[  null  , "{\"it...|
+--------------------+--------------------+

tested a mongodb connection with stratio library...
finished withSQLContext...
16/05/16 18:48:53 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:57536 in memory (size: 2.5 KB, free: 2.4 GB)
...
16/05/16 18:48:57 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
16/05/16 18:48:57 INFO JobGenerator: Stopping JobGenerator immediately
16/05/16 18:48:57 INFO RecurringTimer: Stopped timer for JobGenerator after time 1463417336000
16/05/16 18:48:57 INFO JobGenerator: Stopped JobGenerator
16/05/16 18:48:57 INFO JobScheduler: Stopped JobScheduler
Finished!


MacBook-Pro-Retina-de-Alonso:bin aironman$ ./amazon-producer-example 0981531679
log4j:WARN No appenders could be found for logger (kafka.utils.VerifiableProperties).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Trying to parse product with id 0981531679
amazonProduct is AmazonProduct(0981531679,Scala Puzzlers,http://www.amazon.com/Scala-Puzzlers-Andrew-Phillips/dp/0981531679,http://ecx.images-amazon.com/images/I/41UHeor2AfL._SX218_BO1,204,203,200_QL40_.jpg,)
amazon product sent to kafka cluster...AmazonProduct(0981531679,Scala Puzzlers,http://www.amazon.com/Scala-Puzzlers-Andrew-Phillips/dp/0981531679,http://ecx.images-amazon.com/images/I/41UHeor2AfL._SX218_BO1,204,203,200_QL40_.jpg,)

Last login: Mon May 16 18:31:09 on ttys003
MacBook-Pro-Retina-de-Alonso:my-recommendation-spark-engine aironman$ mongo
MongoDB shell version: 3.2.6
connecting to: test
> use alonsodb;
switched to db alonsodb
> db.amazonRatings.find()
{ "_id" : ObjectId("5739f84a8d6ab41037bbf32d"), "id" : ISODate("2016-05-16T16:41:46.183Z"), "amazonProduct" : [ null, "{\"itemId\":\"0981531679\",\"title\":\"Scala Puzzlers\",\"url\":\"http://www.amazon.com/Scala-Puzzlers-Andrew-Phillips/dp/0981531679\",\"img\":\"http://ecx.images-amazon.com/images/I/41UHeor2AfL._SX218_BO1,204,203,200_QL40_.jpg\",\"description\":\"\"}" ] }
{ "_id" : ObjectId("5739f8628d6ab41037bbf32e"), "id" : ISODate("2016-05-16T16:42:10.025Z"), "amazonProduct" : [ null, "{\"itemId\":\"0981531679\",\"title\":\"Scala Puzzlers\",\"url\":\"http://www.amazon.com/Scala-Puzzlers-Andrew-Phillips/dp/0981531679\",\"img\":\"http://ecx.images-amazon.com/images/I/41UHeor2AfL._SX218_BO1,204,203,200_QL40_.jpg\",\"description\":\"\"}" ] }
{ "_id" : ObjectId("5739f9308d6ab41037bbf32f"), "id" : ISODate("2016-05-16T16:45:36.021Z"), "amazonProduct" : [ null, "{\"itemId\":\"0981531679\",\"title\":\"Scala Puzzlers\",\"url\":\"http://www.amazon.com/Scala-Puzzlers-Andrew-Phillips/dp/0981531679\",\"img\":\"http://ecx.images-amazon.com/images/I/41UHeor2AfL._SX218_BO1,204,203,200_QL40_.jpg\",\"description\":\"\"}" ] }
> 
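The producer output and the MongoDB documents above suggest that each product is a five-field record serialized to a JSON string before it reaches Kafka. The following is a minimal sketch of such a record in plain Scala. The field names are taken from the JSON stored in MongoDB; the class layout and the `AmazonProductJson` helper are assumptions for illustration, and the project's real class and serializer may differ.

```scala
// Hypothetical reconstruction: field names come from the stored JSON above,
// the actual class in the project may be defined differently.
final case class AmazonProduct(
    itemId: String,
    title: String,
    url: String,
    img: String,
    description: String
)

object AmazonProductJson {

  // Escapes only backslashes and double quotes. Control characters are
  // out of scope for this sketch; a JSON library would handle them.
  private def escape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"")

  // Builds a compact JSON object with the same key order as the stored documents.
  def toJson(p: AmazonProduct): String = {
    val fields = Seq(
      "itemId"      -> p.itemId,
      "title"       -> p.title,
      "url"         -> p.url,
      "img"         -> p.img,
      "description" -> p.description
    )
    fields
      .map { case (k, v) => "\"" + k + "\":\"" + escape(v) + "\"" }
      .mkString("{", ",", "}")
  }
}
```

In a real project a JSON library would normally replace the hand-written `toJson`; it is written by hand here only to keep the sketch free of dependencies.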


Things to do:

Save the results of the ALS algorithm to the MongoDB instance.
Have fun in the process!
