Giter Club home page Giter Club logo

social-network-mining-based-on-academic-literatures's Introduction

Introduction

This is our final project for Social network mining(DATA130007) in Fudan university.

Task Review

Task 1

Design clustering algorithms or community mining algorithms to cluster all the papers in the data set. Use visualize tools to show all fields (ie, communities, identify corresponding community research topics), and highlight the most influential scholars in each field.

Task 2

Realization of demonstrating the ego-network to any input scholar (refer to the function example on the ArnetMiner website).

Task 3

Use the data provided by DBLP and ArnetMiner to analyze and model more social relationships among scholars, such as predicting the cooperation or citation relationship between two scholars, and predicting which conference will a scholar publish papers on in the future.

Division of work

In brief, he applies several methods in network representation learning and uses and expands a hierarchical representation learning algorithm(HARP) for Networks. For homogeneous networks, he mainly uses deepwalk, node2vec and LINE. For heterogeneous networks, he uses metapath2vec++. For task 1, he uses both homogeneous NRL and heterogeneous NRL methods to extract features for clustering and uses KMeans and Birch to complete the clustering task. For task 3, he uses heterogeneous NRL methods to generate the vector representation of scholar cooperation network, scholar citation network and scholar-conference network.

She revises the preprocessing part of word2vec model in task 1. (Stop words deleted, all words are transformed into lower case.) To get the main subjects of the papers, a LDA model is built by her to get the key words of each group of the papers in task 1. After a comparision of different scoring indexes, she uses the PageRank index to calculate the influence of each paper and scholar, then visualize the citation and cooperative relationship in task 1. Another visualization work, the ego-network in task 2, is also implemented and improved by her. To make a contrast of different methods in task 3, she built a baseline model for link prediction.

In this project, his work is mainly divided into two part: For task1, he uses the word vector file to train the sentence vector of each paper, and he is responsible for generating small data sets for testing. For task3, he uses the MDP model to train the transition matrix and predicts the relationship between scholars and conferences. Based on this, he further predict the relationship between scholars and scholars.

File Structure

Data-Preprocessing: include all code we use to select the data we want to use from Aminer.

HARP: include codes we use to realize our modified HARP model.

metapath2vec: include codes we use to generate vector depending on the metapath.

Task 1-3: include other codes we use to complete the task.

Task 1:

DrawCoauthorNetwork.py Given all the papers and the transformation from the paper id to a number (indicating which field the paper belongs to), calculate the influence of each paper and author, plot the cooperation network among scholars. Size of the nodes represent the influence of the scholar. Color indicates the major field the scholar belongs to.

DrawReferenceNetwork.py Given all the papers and the transformation from the paper id to a number (indicating which field the paper belongs to), calculate the influence of each paper, plot the citation network among papers in NLP field. Size of the nodes represent the influence of the paper.

HasVenue.py Given the original whole dataset, select those whose 'venue' infomation is not empty, then write this item into "[venue].txt".

LDA.ipynb & LDA.py Get the 6 main subjects of all the papers.

ProcessBar.py Visualize a process bar to get the project progress.

getTopAuthor.py Get the top 8 scholars, for experimental effect evaluation.

wordvector.pyAn advanced version of social_network_domain.ipynb

Naive_Clustering_Algorithm.py: Using KMeans and Birch to cluster. To get the result, run: python Naive_Clustering_Algorithm.py --model line/deepwalk/node2vec/metapath2vec --mixture True/False --KMeans True/False --Birch True/False --classes 6

result4cluster.py: compute the accuracy of cluster model.

Task 2

ego-network.py Draw ego-network. Open the ternimal, cd into the directory of the file, run python ego-network.py [Author Name] to get the ego-network of the specific author.

getco-author.py Generate and save a dictionary of which the key is the name of a scholar, the value is a list containing all his coauthers. Source for task2 .

Task 3

LinkPredictionBaseline.py A baseline model for link prediction of co-author network.

calculatePrecision.py Calculate the precision of link prediction of the baseline model.

evaluation4task3.py: calculate AUC of our model. Run: python evaluation4task3.py --task cp/re/conf/all --AUC True/False --p True/False

MDP model for task3.ipynb: Build the transfer matrix and do prediction on scholar-reference and scholar-scholar.

result : include all kinds of results we achieve.

You could find further information about the code with a README under the corresponding folder and a complete introduction of our project in report.

Acknowledgements

Many thanks to the following excellent open source projects:

HARP

metapath2vec

social-network-mining-based-on-academic-literatures's People

Contributors

wudan0399 avatar wz-fudan avatar yikai-wang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.