codeqa's Introduction

CodeQA

CodeQA is a free-form question answering dataset for source code comprehension: given a code snippet and a question, a model is required to generate a textual answer.

To obtain natural and faithful questions and answers, we implement syntactic rules and semantic analysis to transform code comments into question-answer pairs.
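
As a toy illustration of this idea (not the authors' actual pipeline, which uses much richer rules), a declarative comment such as "Returns the maximum value in the list." can be recast into the pair Q: "What does it return?" / A: "the maximum value in the list" with a simple pattern rule. The function below is a hypothetical sketch:

import re

def comment_to_qa(comment):
    # Toy rule: turn a "Returns X" comment into a what-question.
    # Purely illustrative; CodeQA's real construction combines
    # syntactic rules with semantic analysis over many comment types.
    m = re.match(r"[Rr]eturns?\s+(.+?)\.?\s*$", comment.strip())
    if m:
        return ("What does it return?", m.group(1))
    return None

print(comment_to_qa("Returns the maximum value in the list."))
# -> ('What does it return?', 'the maximum value in the list')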

We hope this new dataset can serve as a useful research benchmark for source code comprehension.

You can find more details, analyses, and baseline results in our Findings of EMNLP 2021 paper "CodeQA: A Question Answering Dataset for Source Code Comprehension".

Data

The dataset (ver. 1.0) can be downloaded from Google Drive.

It contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs.

A few examples of the CodeQA data format are shown in data_sample. Each example contains a question, an answer, and a code snippet. For the code snippet, we provide both an original version (.code.original) and a processed version (.code). Details of the processing are available in the code summarization datasets listed in the Acknowledgements below.
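
For quick inspection, the parallel files of a split can be loaded line by line; a minimal sketch is below. The exact file names (e.g. train.question / train.answer / train.code) are an assumption based on the layout described above:

from pathlib import Path

def load_split(data_dir, split):
    # Read the parallel question/answer/code files of one split.
    # File naming is an assumption; adapt it to the actual layout in data_sample.
    d = Path(data_dir)
    questions = (d / f"{split}.question").read_text(encoding="utf-8").splitlines()
    answers = (d / f"{split}.answer").read_text(encoding="utf-8").splitlines()
    code = (d / f"{split}.code").read_text(encoding="utf-8").splitlines()
    assert len(questions) == len(answers) == len(code)
    return list(zip(questions, answers, code))

# e.g. samples = load_split("data/java", "train"); print(samples[0])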

Evaluation

We follow the same evaluation method for the automatic metrics (BLEU, ROUGE-L, METEOR) as Ahmad et al. (2020).

Source code can be found here.
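
For a quick sanity check outside the official scripts, the three metrics can be approximated with off-the-shelf packages (nltk and rouge-score). This is only a sketch and may differ slightly from Ahmad et al. (2020)'s implementation in tokenization and smoothing:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score  # needs nltk.download("wordnet")
from rouge_score import rouge_scorer

def score_pair(reference, hypothesis):
    # Sentence-level BLEU-4, ROUGE-L F1, and METEOR for one answer pair,
    # using whitespace tokenization as a stand-in for the official scripts.
    ref_toks, hyp_toks = reference.split(), hypothesis.split()
    bleu = sentence_bleu([ref_toks], hyp_toks,
                         smoothing_function=SmoothingFunction().method4)
    rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(
        reference, hypothesis)["rougeL"].fmeasure
    meteor = meteor_score([ref_toks], hyp_toks)
    return bleu, rouge_l, meteor

print(score_pair("the maximum value in the list",
                 "the maximum value of the list"))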

Experiments on CodeBERT

cd codeBERT

Dependencies

  • pip install torch
  • pip install transformers

Data

You can download data from Google Drive. Unzip it and move it to ./data.

Train

We fine-tune the model on 3 × 1080Ti GPUs.

Please run one of the following scripts (for the Java or Python dataset, respectively):

bash java_script.sh [gpu-id] [model-name]

bash python_script.sh [gpu-id] [model-name]
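
For example, to fine-tune on the Java dataset with GPUs 0, 1, and 2 (the gpu-id format and model name below are placeholders, not values from the repo):

bash java_script.sh 0,1,2 my-codebert-java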

Inference

After training, the best checkpoints are stored in a folder named after your model, for example, ./output/[model-name]/checkpoint-best-bleu/pytorch_model.bin. You can run the following scripts to get results on the test set:

bash java_script_test.sh [gpu-id] [model-name]

bash python_script_test.sh [gpu-id] [model-name]
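
To load a fine-tuned checkpoint outside the provided scripts, a minimal sketch is shown below. The Seq2Seq wrapper and its constructor arguments are assumptions based on the usual CodeBERT fine-tuning setup; see codeBERT/model.py for the actual class:

import torch
from transformers import RobertaConfig, RobertaModel, RobertaTokenizer

# Base encoder: the public CodeBERT checkpoint.
tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")
config = RobertaConfig.from_pretrained("microsoft/codebert-base")
encoder = RobertaModel.from_pretrained("microsoft/codebert-base", config=config)

# The repo wraps the encoder in a Seq2Seq model (codeBERT/model.py);
# its exact constructor signature is an assumption here:
# model = Seq2Seq(encoder=encoder, config=config, ...)

# Restore the fine-tuned weights from the best-BLEU checkpoint.
state_dict = torch.load("./output/my-codebert-java/checkpoint-best-bleu/pytorch_model.bin",
                        map_location="cpu")
# model.load_state_dict(state_dict)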

Pretrained Models

Java and Python pre-trained models (20 epochs) are available here.

Acknowledgements

Our CodeQA dataset is based on two code summarization datasets, code-docstring-corpus and TL-CodeSum.

We are thankful to the authors for making their datasets and code available.

Citation

If you use our dataset, please cite us!

@inproceedings{liu2021codeqa,
  title={CodeQA: A Question Answering Dataset for Source Code Comprehension},
  author={Liu, Chenxiao and Wan, Xiaojun},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2021},
  pages={2618--2632},
  year={2021}
}

codeqa's Issues

Cannot reproduce CodeBERT baseline results on test set

Hi, thank you for your remarkable work and the released dataset. I am currently trying to reproduce the CodeBERT baseline results with your recently released code. It works fine on the validation set, where I am able to get roughly the same results as reported. However, when it comes to the test set, the results are far below the reported values.
Specifically, the best results on the validation set are:
{dev set: bleu = 33.34 | rouge_l = 31.16 | meteor = 11.17 | EM = 7.50 | Precision = 39.62 | Recall = 30.01 | F1 = 32.06}
which are close to the reported values.
The corresponding test results are:
{test set: bleu = 26.19 | rouge_l = 4.71 | meteor = 1.23 | EM = 0.40 | Precision = 6.74 | Recall = 4.69 | F1 = 4.86}
which are far below the reported values.
Could you please rerun the code or check the test set to verify it is the correct version? Thank you very much.

Source code original formatting

Thanks for this great project.

In the dataset you've released, I notice that both code and code.orig in the train, dev, and test folders are the same, and both contain just the linearized strings of the programs.

Do you have the programs in their original format, with all the tabs and whitespace?

Thanks.
