Facebook-Friend-Recommendation-using-Graph-Mining

This repository contains the Facebook Friend Recommendation using Graph Mining case study.

About the Dataset:

  • The dataset is a directed graph of follow relationships.
  • It contains approximately 1.86M nodes and 9.43M edges.
  • The data was obtained from Kaggle: https://www.kaggle.com/c/FacebookRecruiting
  • Only the existing edges (the 9.43M connected pairs) are provided. Each of the n users could potentially connect to n-1 others, so the total number of possible directed edges is n(n-1), which for n ≈ 1.86M is on the order of 10^12.
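As a quick sanity check on the 10^12 estimate, the arithmetic can be reproduced with the approximate counts quoted above. This is only a back-of-the-envelope sketch; the exact node and edge counts in the Kaggle files may differ slightly from these rounded values.

```python
# Rough check of the "10^12 possible edges" estimate, using the rounded
# counts quoted above (about 1.86M nodes and 9.43M edges).
n_nodes = 1.86e6
n_edges = 9.43e6

# In a directed graph without self-loops, each node can point to n - 1 others.
possible_edges = n_nodes * (n_nodes - 1)

# Fraction of all possible directed edges that actually exist.
density = n_edges / possible_edges

print(f"possible directed edges: {possible_edges:.2e}")
print(f"fraction actually present: {density:.2e}")
```

This gives roughly 3.5 × 10^12 possible edges, of which only about 0.0003% are present, which is why the negative (missing) class has to be sampled rather than enumerated.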

Performance metrics:

  • Both precision and recall are important, so the F1 score is a good choice.
  • Confusion matrix
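To make the two metrics concrete, here is a minimal pure-Python sketch of a binary confusion matrix and the F1 score derived from it. The function names are illustrative (the case study most likely used a library such as scikit-learn); the matrix layout `[[TN, FP], [FN, TP]]` is assumed to match scikit-learn's convention of rows = true label, columns = predicted label.

```python
from typing import Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> list[list[int]]:
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1 (rows = true, cols = predicted)."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are undefined or zero."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example: 2 true positives, 1 false negative, 1 false positive, 1 true negative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
cm = confusion_matrix(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

Here precision and recall are both 2/3, so F1 is also 2/3; the F1 score only stays high when neither false positives nor false negatives dominate.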

Training Dataset Preparation:

  • Label y = 1 if an edge is present between two nodes.
  • Label y = 0 if no edge is present.
  • Negative ("bad") links are generated by sampling node pairs that are not connected in the graph and whose shortest-path distance is greater than 2.
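The bad-link generation step can be sketched as rejection sampling: draw random ordered pairs of distinct nodes and keep a pair only if the destination is not reachable from the source within 2 directed hops (which also rules out pairs joined by an edge). This is an illustrative sketch, not the case study's code; the adjacency-dict representation, the function names, the `max_tries` cap and the choice of directed shortest paths (with unreachable pairs treated as "greater than 2") are all assumptions.

```python
import random
from collections import deque


def within_k_hops(adj: dict, src, dst, k: int = 2) -> bool:
    """True if dst is reachable from src along directed edges in at most k hops (BFS)."""
    frontier = deque([(src, 0)])
    seen = {src}
    while frontier:
        node, depth = frontier.popleft()
        if node == dst:
            return True
        if depth == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def sample_bad_links(adj: dict, nodes: list, n_samples: int,
                     seed: int = 0, max_tries: int = 100_000) -> list:
    """Sample up to n_samples (u, v) pairs with no edge and shortest path u -> v > 2."""
    rng = random.Random(seed)
    bad = set()
    tries = 0
    while len(bad) < n_samples and tries < max_tries:
        tries += 1
        u, v = rng.sample(nodes, 2)  # two distinct nodes
        # An existing edge means distance 1, so this check also excludes real links.
        if within_k_hops(adj, u, v, k=2):
            continue
        bad.add((u, v))
    return sorted(bad)


# Toy chain 1 -> 2 -> 3 -> 4 -> 5.
adj = {1: [2], 2: [3], 3: [4], 4: [5], 5: []}
nodes = [1, 2, 3, 4, 5]
bad_links = sample_bad_links(adj, nodes, n_samples=5)
```

On the real graph the 2-hop check is cheap because BFS stops expanding after depth 2, but high-degree nodes can still make individual checks expensive.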

Featurization:

Featurization is the most important part of this case study. Below is the list of extracted features:

  1. Similarity measures
    • Jaccard similarity
    • Cosine similarity
  2. Ranking Measure (PageRank)
  3. Graph Features
    • Shortest Path
    • Same community check
    • Adamic/Adar Index
    • Is following back
    • Katz Centrality
    • HITS score
    • Number of followers
    • Number of followees
  4. Weight Features
    • weight of incoming edges
    • weight of outgoing edges
    • weight of incoming edges + weight of outgoing edges
    • weight of incoming edges * weight of outgoing edges
    • 2*weight of incoming edges + weight of outgoing edges
    • weight of incoming edges + 2*weight of outgoing edges
  5. SVD features from the adjacency matrix (n_components = 6)

All the features are calculated for both followers and followees.
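Several of the listed features only need each node's follower and followee sets. The sketch below shows one way to compute Jaccard and cosine similarity, the Adamic/Adar index, "is following back" and the weight features in plain Python. All function and key names are hypothetical. The weight formula 1/sqrt(1 + |X|) and the Adamic/Adar neighbourhood (common followees, scored by their follower counts) are assumptions for illustration, not confirmed details of the case study.

```python
import math


def build_adjacency(edges):
    """Build followee/follower sets from directed edges (u, v) meaning u follows v."""
    followees, followers = {}, {}
    for src, dst in edges:
        followees.setdefault(src, set()).add(dst)
        followers.setdefault(dst, set()).add(src)
    return followees, followers


def jaccard(x: set, y: set) -> float:
    """|X ∩ Y| / |X ∪ Y|; 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def cosine(x: set, y: set) -> float:
    """Set cosine similarity: |X ∩ Y| / sqrt(|X| * |Y|); 0.0 if either set is empty."""
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))


def adamic_adar(followees: dict, followers: dict, a, b) -> float:
    """Sum of 1 / log(#followers of z) over nodes z followed by both a and b."""
    score = 0.0
    for z in followees.get(a, set()) & followees.get(b, set()):
        deg = len(followers.get(z, set()))
        if deg > 1:  # guard: log(1) == 0
            score += 1.0 / math.log(deg)
    return score


def follows_back(followees: dict, a, b) -> int:
    """1 if b already follows a (edge b -> a exists), else 0."""
    return int(a in followees.get(b, set()))


def weight(neighbours: set) -> float:
    # Assumed weighting: nodes with many neighbours contribute less.
    return 1.0 / math.sqrt(1 + len(neighbours))


def weight_features(followees: dict, followers: dict, node) -> dict:
    """The six weight features from the list above, for a single node."""
    w_in = weight(followers.get(node, set()))
    w_out = weight(followees.get(node, set()))
    return {
        "weight_in": w_in,
        "weight_out": w_out,
        "in_plus_out": w_in + w_out,
        "in_times_out": w_in * w_out,
        "2in_plus_out": 2 * w_in + w_out,
        "in_plus_2out": w_in + 2 * w_out,
    }


# Toy graph and a candidate pair (1, 4), scored on both followee and follower sets.
edges = [(1, 2), (1, 3), (2, 3), (4, 3), (3, 1), (4, 2)]
followees, followers = build_adjacency(edges)
pair_features = {
    "jaccard_followees": jaccard(followees.get(1, set()), followees.get(4, set())),
    "jaccard_followers": jaccard(followers.get(1, set()), followers.get(4, set())),
    "cosine_followees": cosine(followees.get(1, set()), followees.get(4, set())),
    "adamic_adar": adamic_adar(followees, followers, 1, 4),
    "follows_back": follows_back(followees, 1, 4),
}
node3_weights = weight_features(followees, followers, 3)
```

Using sets keeps every feature a simple intersection or union; on the full graph the same dictionaries would be built once from the edge list and reused for every candidate pair.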
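The ranking measure can be illustrated with a hand-rolled PageRank power iteration over the same followee sets. This is a sketch of the standard algorithm, not the case study's implementation (which most likely called a graph library). The damping factor 0.85, the uniform redistribution of rank from nodes with no outgoing edges, and the stopping tolerance are assumed, commonly used defaults.

```python
def pagerank(followees: dict, nodes: list, alpha: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict:
    """Power-iteration PageRank; every edge target is assumed to appear in `nodes`."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not followees.get(v))
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        # Each node passes a damped share of its rank to every node it follows.
        for v in nodes:
            out = followees.get(v)
            if out:
                share = alpha * rank[v] / len(out)
                for w in out:
                    new[w] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol * n
        rank = new
        if converged:
            break
    return rank


# Toy examples: a 3-cycle (all nodes equal) and a star where 1, 2, 3 follow 0.
cycle_rank = pagerank({0: [1], 1: [2], 2: [0]}, [0, 1, 2])
star_rank = pagerank({1: [0], 2: [0], 3: [0]}, [0, 1, 2, 3])
```

In the star, node 0 receives the highest score because every other node follows it, which is the intuition behind using PageRank of the destination node as a link-prediction feature.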

Models:

We used two models: Random Forest and XGBoost. For both, Follows_back was found to be the most important feature. The results are summarized below:

| Model | n_estimators | max_depth | Train F1 score | Test F1 score |
| --- | --- | --- | --- | --- |
| Random Forest | 72 | 14 | 0.964 | 0.926 |
| XGBoost | 76 | 14 | 0.996 | 0.927 |

Observations:

  1. Understanding the graph and engineering features were the most important parts of this case study.
  2. For Random Forest, Follow_back was the most important feature found, followed by weight, inter_follower and shortest_path.
  3. For XGBoost, page_rank, followed by shortest_path, was the most important feature.
  4. The best result was obtained with XGBoost.
  5. XGBoost took the most time.
  6. For XGBoost, follows_back was the most important feature, followed by cosine_follower and weight_f1.

Note:

You can get all the weight and Excel files used in the above case study here: https://drive.google.com/open?id=1AuduB2ttQuSUf-b057x0PnNyXpF07HwL
