
avatar's Issues

Two questions about the AVATAR paper

Hello,

Thank you very much for the great work!

In order to evaluate some of the Code-ML models, I have two questions regarding the AVATAR paper (https://arxiv.org/pdf/2108.11590.pdf):

(1) In Section 2: "To train models, we chose a maximum of k (we set k to 3 based on validation performances) solutions in each language to form a maximum of k^2 training examples and consider all the accepted solutions as reference translations for validation and testing." I am wondering how exactly these k^2 pairs are selected during training. Assume there are 5 candidates in each language (Java/Python) for each problem; that gives 25 pairs to choose from, and when k is set to 3 we ought to pick 9 of them. I don't quite understand how exactly that choice is made, since there are many ways to select 9 pairs out of 25. Could you point to where in the code this is done? Also, for validation and testing, does this mean that as long as the translated code matches any one of the Python reference targets (of which there could be as many as 5), it is counted as correct?
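For concreteness, my current guess (which may well be wrong) is something like the sketch below, with placeholder solution names:

```python
from itertools import product

# Illustrative guess, not the actual AVATAR code: take up to k accepted
# solutions per language for one problem and pair them exhaustively.
k = 3
java_solutions = ["java_sol_1", "java_sol_2", "java_sol_3", "java_sol_4", "java_sol_5"]
python_solutions = ["py_sol_1", "py_sol_2", "py_sol_3", "py_sol_4", "py_sol_5"]

train_pairs = list(product(java_solutions[:k], python_solutions[:k]))
print(len(train_pairs))  # 9 pairs, out of the 25 possible ones
```

If that is roughly right, my question reduces to how the 3 solutions per language are chosen from the 5 candidates.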

(2) The caption of Table 2 reads: CA stands for Computational Accuracy. Does CA actually stand for Compilation Accuracy?

Thanks!

Wei

Preprocessing

Hi, can you please let me know how you preprocessed your data to run on the Transcoder-DOBF model? A few brief steps would really be helpful.

Thank you in advance!

download error of the pretrained ckpts you shared

Hi! I was trying to download the pretrained CodeT5-base and PLBART models by running download.sh, but I got the error: awk: fatal: cannot open file './cookie' for reading (No such file or directory).
I also tried to open the download link directly in my web browser, but it shows "404 file not found". Are you using the same pretrained model as on Hugging Face (i.e., Salesforce/codet5-base)?
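In the meantime, I am loading the public checkpoint directly from Hugging Face, assuming (perhaps incorrectly) that it is the same base model:

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# Assumes the checkpoint shared here matches the public Hugging Face release.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
```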

Also, will you release the finetuned ckpts of those models?

Thank you!

Typo Bug: Undefined 'tag' in DFG_java and DFG_csharp

Hi, I am working on a project that makes use of your DFG_java code in the linked file, and I found a 'referenced before defined' error in this code. There should be another line defining tag=False after flag=False.

I also found that other versions of DFG_java in this repository have the tag=False line defined, so I think this is just a typo.
(Also, I didn't make use of DFG_csharp, but DFG_csharp in the file evaluation/CodeBLEU/parser/DFG.py seems to be missing the tag initialization as well.)
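A minimal, self-contained illustration of what I mean (the real if_statement branch of DFG_java is abbreviated here):

```python
# Abbreviated stand-in for the if_statement branch of DFG_java / DFG_csharp in
# evaluation/CodeBLEU/parser/DFG.py. Without the tag initialization, reading
# 'tag' later raises an UnboundLocalError whenever no 'else' child is seen.
def if_branch_has_else(child_types):
    flag = False  # kept only to mirror the original structure
    tag = False   # the line that appears to be missing in DFG_java / DFG_csharp
    for child_type in child_types:
        if "else" in child_type:
            tag = True
    return tag

print(if_branch_has_else(["condition", "block"]))          # False
print(if_branch_has_else(["condition", "block", "else"]))  # True
```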

Thanks for the good work!

Best

Unable to reproduce CodeT5 results

Edit: able to reproduce results using different bs / lr


Hi, I was trying to finetune CodeT5 for java-python translation but cannot reproduce the results. I got BLEU 63.77 / EM 1.88, which is much lower than your reported results: BLEU 67.0 / EM 2.8. The hyper-params are:

TRAIN_BATCH_SIZE=2
GRAD_ACCUM_STEP=16
LR=5e-5
NUM_TRAIN_EPOCHS=20
tokenizer_path=codet5/bpe;
source_length=510
target_length=510
--do_eval_bleu

All other hyper-params are as default in https://github.com/wasiahmad/AVATAR/blob/main/codet5/run.sh

Could you give me some suggestions on how to reproduce your results? Thank you!

Buggy test set?

I am working with AVATAR and tried to extract the test set with its test cases. I was able to extract 252 instances with test cases from Codeforces and AtCoder. I am facing some issues with test cases where the expected_output or test_input has ... at the end. I believe that when downloading and preparing the test set, some inputs/outputs get truncated and ... is added at the end of the test input/output. Moreover, there are test cases where the code expects 2 inputs but there is only one input in the test case, so the program hangs waiting for the second input. I ran into these issues after doing the following:

  1. Downloading the dataset by executing bash download.sh and prepare.sh in data
  2. Downloading test cases by executing bash download.sh and bash prepare.sh in test_cases
  3. The created atcoder_id2tests_filtered.jsonl and codeforces_id2tests_filtered.jsonl have avatar IDs, but their inputs and outputs fields are empty ({"avatar_id": "codeforces_313_B", "inputs": [], "outputs": []}).
  4. I matched the keys available in filtered jsonl files to non-filtered ones and extracted all unit tests for each example. For instance, this is the one for codeforces_313_B: {"avatar_id": "codeforces_313_B", "inputs": ["313_B/samples/10_input.txt", "313_B/samples/31_input.txt", "313_B/samples/25_input.txt", "313_B/samples/2_input.txt", "313_B/samples/28_input.txt", "313_B/samples/37_input.txt", "313_B/samples/23_input.txt", "313_B/samples/9_input.txt", "313_B/samples/16_input.txt", "313_B/samples/4_input.txt", "313_B/samples/11_input.txt", "313_B/samples/24_input.txt", "313_B/samples/30_input.txt", "313_B/samples/3_input.txt", "313_B/samples/29_input.txt", "313_B/samples/22_input.txt", "313_B/samples/8_input.txt", "313_B/samples/36_input.txt", "313_B/samples/17_input.txt", "313_B/samples/5_input.txt", "313_B/samples/33_input.txt", "313_B/samples/27_input.txt", "313_B/samples/12_input.txt", "313_B/samples/14_input.txt", "313_B/samples/35_input.txt", "313_B/samples/21_input.txt", "313_B/samples/19_input.txt", "313_B/samples/6_input.txt", "313_B/samples/26_input.txt", "313_B/samples/32_input.txt", "313_B/samples/13_input.txt", "313_B/samples/1_input.txt", "313_B/samples/15_input.txt", "313_B/samples/20_input.txt", "313_B/samples/34_input.txt", "313_B/samples/18_input.txt", "313_B/samples/7_input.txt"], "outputs": ["313_B/samples/10_output.txt", "313_B/samples/31_output.txt", "313_B/samples/25_output.txt", "313_B/samples/2_output.txt", "313_B/samples/28_output.txt", "313_B/samples/37_output.txt", "313_B/samples/23_output.txt", "313_B/samples/9_output.txt", "313_B/samples/16_output.txt", "313_B/samples/4_output.txt", "313_B/samples/11_output.txt", "313_B/samples/24_output.txt", "313_B/samples/30_output.txt", "313_B/samples/3_output.txt", "313_B/samples/29_output.txt", "313_B/samples/22_output.txt", "313_B/samples/8_output.txt", "313_B/samples/36_output.txt", "313_B/samples/17_output.txt", "313_B/samples/5_output.txt", "313_B/samples/33_output.txt", "313_B/samples/27_output.txt", "313_B/samples/12_output.txt", "313_B/samples/14_output.txt", "313_B/samples/35_output.txt", "313_B/samples/21_output.txt", "313_B/samples/19_output.txt", "313_B/samples/6_output.txt", "313_B/samples/26_output.txt", "313_B/samples/32_output.txt", "313_B/samples/13_output.txt", "313_B/samples/1_output.txt", "313_B/samples/15_output.txt", "313_B/samples/20_output.txt", "313_B/samples/34_output.txt", "313_B/samples/18_output.txt", "313_B/samples/7_output.txt"]}
  5. Assuming these inputs and outputs are correct, I compiled and executed codeforces_313_B.java with the input provided in 313_B/samples/10_input.txt, but since it is one line, the program hangs and waits for another input. However, no more input is available in 10_input.txt.
  6. I believe ... can legitimately be part of the input, but for some test cases the program parses the input and tries to convert everything to int, which throws an exception when converting ... to an integer.
  7. I believe filtering should take care of this issue; however, my filtered .jsonl files have no inputs/outputs. If the authors have these jsonl files, it would be great if they could share them, because I could not reproduce them (a sketch of my matching from step 4 is included below).
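For reference, the matching in step 4 was essentially the following sketch (the file paths are just where the scripts placed things for me and may differ):

```python
import json

# Illustrative paths; adjust to wherever download.sh / prepare.sh put the files.
FILTERED = "test_cases/atcoder_id2tests_filtered.jsonl"
UNFILTERED = "test_cases/atcoder_id2tests.jsonl"

# Index the unfiltered records by avatar_id.
id2tests = {}
with open(UNFILTERED) as f:
    for line in f:
        rec = json.loads(line)
        id2tests[rec["avatar_id"]] = rec

# For every id kept in the filtered file, copy inputs/outputs from the unfiltered one.
with open(FILTERED) as f, open("id2tests_merged.jsonl", "w") as out:
    for line in f:
        rec = json.loads(line)
        full = id2tests.get(rec["avatar_id"])
        if full is not None:
            out.write(json.dumps({
                "avatar_id": rec["avatar_id"],
                "inputs": full.get("inputs", []),
                "outputs": full.get("outputs", []),
            }) + "\n")
```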

Thanks.

Absent new_lines and indentation in python data

Hi!

I downloaded the data from AVATAR/data/data.zip and also using the script AVATAR/data/download.sh, and it seems that a lot of the Python functions in the dataset are missing newlines and indentation. For example, CodeForces/421/A/solution1.py:

n, a, b = map(int, input().split())athur = map(int, input().split())alex = map(int, input().split()) total = [1] * n for i in alex:    total[i-1] = 2 print(*total)

or CodeForces/981/A/solution1.py:

s=input()c=len(s)for i in range(len(s)-1,0,-1):    k=s[0:i+1]    if(k!=k[::-1]):        print(c)        exit()    c-=1if(c==1):    print("0")

According to my simple heuristic calculation, about 50% of python functions look like this.
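For reference, the heuristic was roughly the sketch below (the directory layout is just how the archive unpacked for me and may differ):

```python
import glob

# Count solutions whose whole body sits on a single physical line, which
# usually means the newlines (and hence the indentation) were stripped.
paths = glob.glob("data/**/solution*.py", recursive=True)  # illustrative layout
broken = 0
for path in paths:
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read().strip()
    if text and "\n" not in text:
        broken += 1

print(f"{broken}/{len(paths)} Python solutions have no newlines "
      f"({100 * broken / max(len(paths), 1):.1f}%)")
```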

Is there a way to fix this? Thanks in advance for your help!

Could you please help provide the trained model parameters?

Hi @wasiahmad, I wonder if it is possible for you to share the trained models? I urgently need to evaluate these models on datasets such as AVATAR and G4G, including the specific code they produce after translation, but my GPU resources are limited. Thank you in advance!

How did you select the samples in g4g-functions?

Hi! Sorry to bother you again. I found that there are 5132 GeeksforGeeks samples in the whole AVATAR dataset, but only 3411 samples in g4g_functions. How did you select these 3411 samples? Did you filter out the problems in TransCoder-Eval?


Thank you!

evaluating TransCoder on AVATAR test set and bug in compile.py?

Hi @wasiahmad, I am using https://github.com/wasiahmad/AVATAR/blob/main/transcoder/run.sh to evaluate TransCoder on the AVATAR test data (test.java-python.java, its Python counterpart, and test.jsonl), 1699 samples. The scores do not match exactly with the paper. Could you please tell me if I am missing anything?

Also, I think the success and error variables are swapped in the format call in the line below, due to which the current output for Python to Java is Success - 1699, Errors - 0 instead of Success - 0, Errors - 1699.
In that case, the CA is 0 for Python to Java. Please correct me if I am wrong.

print('Success - {}, Errors - {} [Total - {}]'.format(error, success, num_errors))
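If the variables really are swapped, I assume the intended call simply reorders the arguments (keeping the original total variable):

```python
# Presumed fix: pass successes first, then errors, matching the format string.
print('Success - {}, Errors - {} [Total - {}]'.format(success, error, num_errors))
```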

Thanks and Regards,
Kunal Pagarey

eval_bleu with pretrained gpt model

Hi @wasiahmad,
I'm trying to evaluate a GPT-2 model with your code, so I run run.py with microsoft/CodeGPT-small-py as the pretrain_dir parameter and do_infer. In the eval_bleu script, outputs equals model(inputs)[1]; these are the hidden states of the pretrained GPT (the cached past key values), a tuple of 12 elements (n_layers), each consisting of 2 tensors of shape [1, 12, 48, 64]. When execution reaches the line past_hidden = [x[:, i:i + 1].expand(-1, beam_size, -1, -1, -1) for x in outputs], an error occurs: TypeError: tuple indices must be integers or slices, not tuple. The indexing also implies that each element of outputs should be a 5-dimensional tensor.
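For reference, one workaround I am considering (not sure it is the intended fix) is to stack each layer's (key, value) pair back into a single 5-D tensor before that indexing; this drops into the existing script where model, inputs, and beam_size are already defined:

```python
import torch

# Newer transformers versions return past_key_values as a tuple of
# (key, value) pairs per layer instead of one tensor per layer. Stacking each
# pair restores the [2, batch, heads, seq_len, head_dim] shape that the
# x[:, i:i + 1] indexing below expects.
outputs = model(inputs)[1]
if isinstance(outputs[0], tuple):
    outputs = [torch.stack(layer, dim=0) for layer in outputs]

past_hidden = [x[:, i:i + 1].expand(-1, beam_size, -1, -1, -1) for x in outputs]
```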
What corrections should be made in this case?
