
EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery

Source code for the ASE'23 paper EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery.

Folders

  • Dstill contains dataset.py, which defines the data format used in the distillation step; tiny_bert_config.json, the configuration file for the student model; and bertdistill.py, the distillation script.
  • LinkGenerator contains the parser_lang folder, used to parse abstract syntax trees, and the preprocessing scripts for the raw data.
  • data stores the processed datasets (see Datasets below for where to download them).
  • models contains the training and testing scripts.
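Before running anything, it can help to confirm that a checkout matches the layout above. The following is a minimal sketch, assuming it is run from the repository root and that the three numbered preprocessing scripts sit directly in LinkGenerator. EXPECTED_DIRS, EXPECTED_FILES and missing_paths are hypothetical names, not part of the repository, and the data folder may only exist after you download or generate the datasets.

```python
from pathlib import Path

# Top-level folders listed in this README ("Dstill" is spelled as in the repository).
EXPECTED_DIRS = ("Dstill", "LinkGenerator", "data", "models")

# Files the README names inside each folder (assumed to sit directly in that folder).
EXPECTED_FILES = {
    "Dstill": ("dataset.py", "tiny_bert_config.json", "bertdistill.py"),
    "models": ("train.py", "test.py"),
    "LinkGenerator": ("0_subdata.py", "1_splitword.py", "2_sub_merge.py"),
}


def missing_paths(repo_root):
    """Return the expected folders and files that are missing under repo_root."""
    root = Path(repo_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    for folder, files in EXPECTED_FILES.items():
        for name in files:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


if __name__ == "__main__":
    problems = missing_paths(".")
    print("layout OK" if not problems else f"missing: {problems}")
```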

Environment

  • python 3.9.7
  • pytorch 1.11.0
  • pandas 1.3.4
  • numpy 1.21.6
  • transformers 4.21.0
  • cudatoolkit 11.3.1
  • torchaudio 0.11.0
  • torchvision 0.12.0
  • GPU with CUDA 11.3
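A quick way to compare an existing environment with the versions listed above is the standard-library importlib.metadata module. This is a sketch rather than a project script: the pip distribution names (torch for PyTorch) are assumed, torchaudio and torchvision are checked against 0.11.0 and 0.12.0 (the releases that accompany PyTorch 1.11.0), and cudatoolkit is left out because it is a conda package.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the Environment list, keyed by (assumed) pip distribution name.
EXPECTED = {
    "torch": "1.11.0",
    "pandas": "1.3.4",
    "numpy": "1.21.6",
    "transformers": "4.21.0",
    "torchaudio": "0.11.0",
    "torchvision": "0.12.0",
}


def installed_version(dist):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each mismatched or missing package to (expected, installed)."""
    result = {}
    for dist, want in expected.items():
        have = lookup(dist)
        # CUDA builds report e.g. "1.11.0+cu113"; compare only the part before "+".
        if have is None or have.split("+")[0] != want:
            result[dist] = (want, have)
    return result


if __name__ == "__main__":
    import sys
    print("python", sys.version.split()[0], "(README lists 3.9.7)")
    for dist, (want, have) in version_mismatches(EXPECTED).items():
        print(f"{dist}: expected {want}, found {have}")
```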

Datasets

We constructed six large-scale project datasets for evaluating issue-commit link recovery. The final datasets described in the paper can be downloaded from Google Drive or Aliyun Drive (阿里云盘). To generate the datasets used for EALink in our experiments yourself, follow the data preprocessing steps below.

How to run

1. Data preprocessing

Follow the steps in the LinkGenerator folder to generate the dataset used for EALink, or download the processed dataset directly from Google Drive or Aliyun Drive (阿里云盘).

Get issue-code links for the auxiliary task

In the LinkGenerator folder, 0_subdata.py generates issue-code links. You can run the following command:

python 0_subdata.py

Get issue-commit links after word segmentation

python 1_splitword.py

Merge the datasets

python 2_sub_merge.py
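Because the preprocessing scripts are numbered, they presumably need to run in sequence, each building on the previous step's output. A small driver like the one below (hypothetical, not in the repository) runs them with the current Python interpreter from inside LinkGenerator and stops at the first non-zero exit, so later steps never run on incomplete intermediate data.

```python
import subprocess
import sys

# Preprocessing scripts in the order this README runs them (assumed to live in LinkGenerator/).
STEPS = ("0_subdata.py", "1_splitword.py", "2_sub_merge.py")


def run_preprocessing(workdir="LinkGenerator", steps=STEPS, runner=subprocess.run):
    """Run each step in order and return the scripts that completed."""
    completed = []
    for script in steps:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit,
        # which aborts the loop before the next step starts.
        runner([sys.executable, script], cwd=workdir, check=True)
        completed.append(script)
    return completed
```

Call run_preprocessing() from the repository root. The runner parameter exists only so the loop can be exercised without launching the real scripts.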

2. Distill the pre-trained model

cd Dstill
python bertdistill.py
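bertdistill.py distills the pre-trained model into the smaller student described by tiny_bert_config.json. This README does not say which loss it uses, so the sketch below shows only the common soft-target knowledge-distillation objective as an illustration, not as EALink's actual loss: the teacher's and student's logits are softened with a temperature T, compared with KL divergence, and scaled by T². The config name suggests a TinyBERT-like student, and TinyBERT-style distillation usually also matches hidden states and attention maps, which this sketch omits. The function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Soft-target gradients shrink roughly as 1/T^2, so T^2 keeps their scale comparable.
    return kl * temperature ** 2


print(distillation_loss([2.0, 0.5], [2.0, 0.5]))  # identical logits give zero loss
print(distillation_loss([0.0, 1.0], [1.0, 0.0]))  # disagreeing logits give a positive loss
```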

3. Train and test

In the models folder, train.py trains the model and test.py evaluates a trained model.

Train

cd models
python train.py \
   --tra_batch_size 16 \
   --val_batch_size 16 \
   --end_epoch 400 \
   --output_model <model_save_path> 

Test

python test.py \
   --tes_batch_size 16 \
   --model_path <model_path> 


Contributors

hui-li, cyzhang00
