Meta-Learning based Recommender System to Recommend Developers for Crowdsourcing Software Development

Project accompanying the paper submitted to the Empirical Software Engineering journal


This is the project for our paper, which proposes a meta-learning based recommender system that recommends reliable developers for crowdsourcing software development (CSD).
The instructions below explain in detail how to use the source code in this project.

Instruction for building the recommender system from source code and executing experiments

  • Prepare system environment
  • Start to run the data Crawler
  • Construct Input Data
  • Train Meta Models
  • Run Baselines and Policy Model for experiments

Prepare system environment

Minimum configuration of machines

  • RAM: 256 GB
  • CPU: 12 logical cores
  • Disk: 1 TB or more
  • An NVIDIA Titan Xp GPU is recommended to speed up computation
  • Make sure the network bandwidth is at least 1000 Mb/s if the database runs on a different machine from the one running the code

Install python environment

The whole system is developed in Python, so we recommend installing Anaconda and creating a Python 3.6 virtual environment: https://www.anaconda.com/

Install MySQL Database

Install the MySQL database on a Linux machine, and configure the MySQL IP and port according to the instructions at https://www.mysql.com/.

Install JDK 8 and the related Java runtime

The crawler program is implemented in Java. Please refer to the topcoder project at https://github.com/lifeloner/topcoder for the newest version of the data crawler, and be prepared to import the related jar libraries.

Required python packages

  • machine learning: scikit-learn, lightgbm, xgboost, tensorflow, keras, imbalanced-learn, networkx
  • data preprocessing: pymysql, numpy, pandas
  • models: Models required for the tool
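Before running any script, it can help to confirm that the packages listed above are importable in the active environment. The following is a minimal sketch, not part of the project; the list holds import names, which differ from the install names in two cases (scikit-learn is imported as sklearn, imbalanced-learn as imblearn).

```python
import importlib.util

# Import names for the packages listed above. Some differ from the install
# names: scikit-learn -> sklearn, imbalanced-learn -> imblearn.
REQUIRED_MODULES = [
    "sklearn", "lightgbm", "xgboost", "tensorflow", "keras",
    "imblearn", "networkx", "pymysql", "numpy", "pandas",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only locates the modules and does not import them, so it stays fast even when tensorflow is installed.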

Project Check

  • The DIG is implemented in the CompetitionGraph package.
  • The machine learning algorithms and the policy model are implemented in the ML_Models package.
  • The challenge and developer feature encoding and some data preprocessing modules are in the DataPre package.
  • The Utility package contains custom tag definitions, user functions, and testing scripts.
  • Make sure the hierarchy of the data folder on your local disk matches the one in this project.
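A quick way to verify the layout described above is to check that the expected folders exist under the project root. This is a hedged sketch: the folder names come from the list above, and it assumes all of them (including data) sit directly under the root of your checkout, which may not match the actual repository.

```python
from pathlib import Path

# Top-level folders named in the Project Check list. Their exact location
# in the repository is an assumption made for this sketch.
EXPECTED_DIRS = ["CompetitionGraph", "ML_Models", "DataPre", "Utility", "data"]


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected folder names that are absent under project_root."""
    root = Path(project_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    absent = missing_dirs(".")
    print("Missing folders:", absent if absent else "none")
```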

Start to run the data Crawler

We maintain a database in our laboratory, but because of its size and continuous updates, it is impractical to host it here. Instead, we provide the data collection tools, so that anyone can collect as much data as they need. If you need our data, contact me via the anonymous email mail@{[email protected]}.

  • Install the MySQL database on a Linux machine, and configure the MySQL IP and port according to the instructions at https://www.mysql.com/.
  • Refer to the topcoder project at https://github.com/lifeloner/topcoder for the newest version of the data crawler, implemented in Java.
  • After downloading the Java crawler Maven project, use IntelliJ IDEA (https://www.jetbrains.com/idea/) to build and deploy the crawler jar package on your machine
  • Configure the IP and port of the crawler to match the configuration of your MySQL database
  • Start the crawler in the background with the following command: nohup java -jar crawler.jar &
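If you prefer to start the crawler from Python rather than from the shell, the last step can be sketched as below. This is an illustration, not part of the project: the helper names and the log file name crawler.log are hypothetical, and start_new_session only works on POSIX systems.

```python
import subprocess


def build_crawler_command(jar_path="crawler.jar", java_bin="java"):
    """Build the argument list equivalent to `java -jar crawler.jar`."""
    return [java_bin, "-jar", jar_path]


def start_crawler(jar_path="crawler.jar", log_path="crawler.log"):
    """Start the crawler detached from the terminal, similar to `nohup ... &`.

    Output is appended to log_path (a hypothetical file name). Returns the
    process ID of the started crawler.
    """
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            build_crawler_command(jar_path),
            stdout=log_file,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # POSIX only: detach from the terminal session
        )
    return process.pid

# Example (not executed here): pid = start_crawler("crawler.jar")
```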

Construct Input Data

Configure data/dbSetup.xml, setting the IP and port to those of the machine running the MySQL database. Then copy data/viewdef.sql and run it in your MySQL client to create the views used for initial data cleaning.
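Before running the preprocessing scripts, you may want to confirm that the configured database endpoint is reachable. The sketch below uses only the standard library. The layout of dbSetup.xml is not documented here, so the <ip> and <port> element names are assumptions; adjust them to the real file.

```python
import socket
import xml.etree.ElementTree as ET


def read_db_endpoint(xml_text):
    """Extract (ip, port) from XML text.

    The <ip> and <port> element names are assumed for illustration and may
    differ from the actual structure of dbSetup.xml.
    """
    root = ET.fromstring(xml_text)
    ip = root.findtext(".//ip")
    port = root.findtext(".//port")
    if ip is None or port is None:
        raise ValueError("expected <ip> and <port> elements")
    return ip.strip(), int(port)


def is_reachable(ip, port, timeout=3.0):
    """Return True if a TCP connection to (ip, port) can be opened."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (not executed here):
#   with open("data/dbSetup.xml") as f:
#       ip, port = read_db_endpoint(f.read())
#   print(is_reachable(ip, port))
```

A successful TCP connection only shows that something is listening on that port; it does not validate the MySQL credentials.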

First, encode the developer and challenge features

  • Run TaskContent.py of DataPre package to generate challenge feature encoding vectors and build clustering model
  • Run UserHistory.py of DataPre package to generate developer history data
  • Run DIG.py of CompetitionGraph package to generate developer rank score data

Run TaskUserInstances.py of DataPre package to generate input data

  • Adjust maxProcessNum of the DataInstances class to suit the CPU and RAM of your computer
  • For training, set the global variable testInst=False. The global variable mode selects the input type: 0 for registration training data, 1 for submission training data, and 2 for winning training data. Run the script once for each of the three values.
  • Generate the test input data by setting mode=2 and testInst=True
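The settings above amount to four runs of TaskUserInstances.py: three training runs and one test run. The sketch below only enumerates them as a checklist; it does not call the script, and the helper that suggests a value for maxProcessNum is a rough heuristic of ours (leave one core free), not a recommendation from the project.

```python
import os

# Meaning of the global `mode` values, as described above.
MODE_NAMES = {0: "registration", 1: "submission", 2: "winning"}


def planned_runs():
    """List the (mode, testInst) settings needed to generate all input data."""
    runs = [{"mode": mode, "testInst": False} for mode in sorted(MODE_NAMES)]
    runs.append({"mode": 2, "testInst": True})  # test input data
    return runs


def suggest_max_process_num(reserve=1):
    """Heuristic: use all logical cores except `reserve`, but at least one."""
    cores = os.cpu_count() or 1
    return max(1, cores - reserve)


if __name__ == "__main__":
    for run in planned_runs():
        kind = "test" if run["testInst"] else "training"
        print(f"mode={run['mode']} ({MODE_NAMES[run['mode']]}), {kind} input")
    print("suggested maxProcessNum:", suggest_max_process_num())
```

Memory use grows with the number of worker processes, so lower the value if the machine starts swapping.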

After all the scripts above have finished, run TopcoderDataset.py to check that the generated training and test input data are complete

Train Meta Models

Run XGBoostModel.py of ML_Models package

  • Pass “keepd” as the key of tasktypes and run the script three times, with mode=0, 1, and 2
  • Pass “clustered” as the key of tasktypes and run the script three times, with mode=0, 1, and 2
  • Once this is done, the meta models implemented with the XGBoost algorithm can extract the registration, submission, and winning meta-features of all datasets

Run DNNModel.py of ML_Models package in the same way as XGBoostModel.py

Run EnsembleModel.py of ML_Models package in the same way as XGBoostModel.py
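Taken together, the three scripts, two tasktypes keys, and three modes give 18 training runs. The sketch below enumerates them as a checklist so that none is skipped. How each script receives its key and mode (for example by editing global variables) is not specified here, so the code only lists the combinations and does not launch anything.

```python
from itertools import product

SCRIPTS = ["XGBoostModel.py", "DNNModel.py", "EnsembleModel.py"]
TASKTYPE_KEYS = ["keepd", "clustered"]
MODES = [0, 1, 2]  # 0: registration, 1: submission, 2: winning


def training_runs():
    """Return every (script, tasktypes key, mode) combination to run."""
    return list(product(SCRIPTS, TASKTYPE_KEYS, MODES))


if __name__ == "__main__":
    for script, key, mode in training_runs():
        print(f"{script}: tasktypes key={key!r}, mode={mode}")
```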

Run MetaModelTest.py of ML_Models package to generate the performance results of all the winning meta models

  • Readers can build winning predictor based on the performance results

Run Baselines and Policy Model for experiments

Run BaselineModel.py of ML_Models package to build the baseline models described in the paper

  • After building the baseline models, run MetaModelTest.py of ML_Models package again, this time passing the class names of the baseline models in BaselineModel.py as the model name, to generate their performance results

  • Readers can also refer to MetaLearning.py of ML_Models package, which implements some new learning processes that may not reach a global optimum

..........................................................

Please cite our work if you use this project elsewhere.

@INPROCEEDINGS{metalearning-recommender,
author={Zhenyu Zhang and Hailong Sun and Hongyu Zhang},
title={Developer Recommendation for Topcoder through a Meta-learning based Policy Model},
year={2019},
url={https://github.com/zhangzhenyu13/CSDMetalearningRS}
}