
avatar's Issues

Two questions about the AVATAR paper

Hello,

Thank you very much for the great work!

In order to evaluate some of the Code-ML models, I have two questions regarding the AVATAR paper (https://arxiv.org/pdf/2108.11590.pdf):

(1) In Section 2: "To train models, we chose a maximum of k (we set k to 3 based on validation performances) solutions in each language to form a maximum of k^2 training examples and consider all the accepted solutions as reference translations for validation and testing." I am wondering how exactly these k^2 pairs are selected during training. Assume there are 5 candidates in each language (Java/Python) for each problem; that gives 25 pairs to choose from, and when k is set to 3 we ought to pick 9 of them. I don't quite understand how exactly that choice is made, since there are many ways to select 9 pairs out of 25. Could you point to where in the code this is done? Also, for validation and testing, does this mean that as long as the translated code matches any one of the Python reference targets (of which there could be as many as 5), it is counted as correct?
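For concreteness, my current guess (which may well be wrong) is something like the sketch below, with placeholder solution names:

```python
from itertools import product

# Illustrative guess, not the actual AVATAR code: take up to k accepted
# solutions per language for one problem and pair them exhaustively.
k = 3
java_solutions = ["java_sol_1", "java_sol_2", "java_sol_3", "java_sol_4", "java_sol_5"]
python_solutions = ["py_sol_1", "py_sol_2", "py_sol_3", "py_sol_4", "py_sol_5"]

train_pairs = list(product(java_solutions[:k], python_solutions[:k]))
print(len(train_pairs))  # 9 pairs, out of the 25 possible ones
```

If that is roughly right, my question reduces to how the 3 solutions per language are chosen from the 5 candidates.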

(2) The caption of Table 2 reads: CA stands for Computational Accuracy. Does CA actually stand for Compilation Accuracy?

Thanks!

Wei

Preprocessing

Hi, can you please let me know how you preprocessed your data to run on the Transcoder-DOBF model? A few brief steps would really be helpful.

Thank you in advance!

download error of the pretrained ckpts you shared

Hi! I was trying to download the pretrained CodeT5-base and PLBART models by running download.sh, but I got the error: awk: fatal: cannot open file './cookie' for reading (No such file or directory).
I also tried to open the download link directly in my web browser, but it shows "404 file not found". Are you using the same pretrained model as on Hugging Face (i.e., Salesforce/codet5-base)?
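In the meantime, I am loading the public checkpoint directly from Hugging Face, assuming (perhaps incorrectly) that it is the same base model:

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# Assumes the checkpoint shared here matches the public Hugging Face release.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
```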

Also, will you release the finetuned ckpts of those models?

Thank you!

Typo Bug: Undefined 'tag' in DFG_java and DFG_csharp

Hi, I am working on a project that makes use of your DFG_java code in the linked file, and I found a 'referenced before defined' error in this code. There should be another line defining tag=False after flag=False.

I also found that other versions of DFG_java in this repository have the tag=False line defined, so I think this is just a typo.
(Also, I didn't make use of DFG_csharp, but DFG_csharp in the file evaluation/CodeBLEU/parser/DFG.py seems to be missing the tag initialization as well.)
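A minimal, self-contained illustration of what I mean (the real if_statement branch of DFG_java is abbreviated here):

```python
# Abbreviated stand-in for the if_statement branch of DFG_java / DFG_csharp in
# evaluation/CodeBLEU/parser/DFG.py. Without the tag initialization, reading
# 'tag' later raises an UnboundLocalError whenever no 'else' child is seen.
def if_branch_has_else(child_types):
    flag = False  # kept only to mirror the original structure
    tag = False   # the line that appears to be missing in DFG_java / DFG_csharp
    for child_type in child_types:
        if "else" in child_type:
            tag = True
    return tag

print(if_branch_has_else(["condition", "block"]))          # False
print(if_branch_has_else(["condition", "block", "else"]))  # True
```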

Thanks for the good work!

Best

Unable to reproduce CodeT5 results

Edit: able to reproduce results using different bs / lr


Hi, I was trying to finetune CodeT5 for java-python translation but cannot reproduce the results. I got BLEU 63.77 / EM 1.88, which is much lower than your reported results: BLEU 67.0 / EM 2.8. The hyper-params are:

TRAIN_BATCH_SIZE=2
GRAD_ACCUM_STEP=16
LR=5e-5
NUM_TRAIN_EPOCHS=20
tokenizer_path=codet5/bpe;
source_length=510
target_length=510
--do_eval_bleu

All other hyper-params are as default in https://github.com/wasiahmad/AVATAR/blob/main/codet5/run.sh

Could you give me some suggestions on how to reproduce your results? Thank you!

Buggy test set?

I am working with AVATAR and tried to extract the test set with its test cases. I was able to extract 252 instances with test cases from Codeforces and AtCoder. I am facing some issues with test cases where the expected_output or test_input has ... at the end. I believe that when downloading and preparing the test set, some inputs/outputs get truncated and ... is added at the end of the test input/output. Moreover, there are test cases where the code expects 2 inputs but there is only one input in the test case, so the program hangs waiting for the second input. I ran into these issues after doing the following:

  1. Downloading the dataset by executing bash download.sh and prepare.sh in data
  2. Downloading test cases by executing bash download.sh and bash prepare.sh in test_cases
  3. The created atcoder_id2tests_filtered.jsonl and codeforces_id2tests_filtered.jsonl have avatar IDs, but their inputs and outputs fields are empty ({"avatar_id": "codeforces_313_B", "inputs": [], "outputs": []}).
  4. I matched the keys available in filtered jsonl files to non-filtered ones and extracted all unit tests for each example. For instance, this is the one for codeforces_313_B: {"avatar_id": "codeforces_313_B", "inputs": ["313_B/samples/10_input.txt", "313_B/samples/31_input.txt", "313_B/samples/25_input.txt", "313_B/samples/2_input.txt", "313_B/samples/28_input.txt", "313_B/samples/37_input.txt", "313_B/samples/23_input.txt", "313_B/samples/9_input.txt", "313_B/samples/16_input.txt", "313_B/samples/4_input.txt", "313_B/samples/11_input.txt", "313_B/samples/24_input.txt", "313_B/samples/30_input.txt", "313_B/samples/3_input.txt", "313_B/samples/29_input.txt", "313_B/samples/22_input.txt", "313_B/samples/8_input.txt", "313_B/samples/36_input.txt", "313_B/samples/17_input.txt", "313_B/samples/5_input.txt", "313_B/samples/33_input.txt", "313_B/samples/27_input.txt", "313_B/samples/12_input.txt", "313_B/samples/14_input.txt", "313_B/samples/35_input.txt", "313_B/samples/21_input.txt", "313_B/samples/19_input.txt", "313_B/samples/6_input.txt", "313_B/samples/26_input.txt", "313_B/samples/32_input.txt", "313_B/samples/13_input.txt", "313_B/samples/1_input.txt", "313_B/samples/15_input.txt", "313_B/samples/20_input.txt", "313_B/samples/34_input.txt", "313_B/samples/18_input.txt", "313_B/samples/7_input.txt"], "outputs": ["313_B/samples/10_output.txt", "313_B/samples/31_output.txt", "313_B/samples/25_output.txt", "313_B/samples/2_output.txt", "313_B/samples/28_output.txt", "313_B/samples/37_output.txt", "313_B/samples/23_output.txt", "313_B/samples/9_output.txt", "313_B/samples/16_output.txt", "313_B/samples/4_output.txt", "313_B/samples/11_output.txt", "313_B/samples/24_output.txt", "313_B/samples/30_output.txt", "313_B/samples/3_output.txt", "313_B/samples/29_output.txt", "313_B/samples/22_output.txt", "313_B/samples/8_output.txt", "313_B/samples/36_output.txt", "313_B/samples/17_output.txt", "313_B/samples/5_output.txt", "313_B/samples/33_output.txt", "313_B/samples/27_output.txt", "313_B/samples/12_output.txt", "313_B/samples/14_output.txt", "313_B/samples/35_output.txt", "313_B/samples/21_output.txt", "313_B/samples/19_output.txt", "313_B/samples/6_output.txt", "313_B/samples/26_output.txt", "313_B/samples/32_output.txt", "313_B/samples/13_output.txt", "313_B/samples/1_output.txt", "313_B/samples/15_output.txt", "313_B/samples/20_output.txt", "313_B/samples/34_output.txt", "313_B/samples/18_output.txt", "313_B/samples/7_output.txt"]}
  5. Assuming these inputs and outputs are correct, I compiled and executed codeforces_313_B.java with the input provided in 313_B/samples/10_input.txt, but since it is one line, the program hangs and waits for another input. However, no more input is available in 10_input.txt.
  6. I believe ... can legitimately be part of the input, but for some test cases the program parses the input and tries to convert everything to int, which throws an exception when converting ... to an integer.
  7. I believe filtering should take care of this issue; however, my filtered .jsonl files have no inputs/outputs. If the authors have these jsonl files, it would be great if they could share them, because I could not reproduce them (a sketch of my matching from step 4 is included below).
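For reference, the matching in step 4 was essentially the following sketch (the file paths are just where the scripts placed things for me and may differ):

```python
import json

# Illustrative paths; adjust to wherever download.sh / prepare.sh put the files.
FILTERED = "test_cases/atcoder_id2tests_filtered.jsonl"
UNFILTERED = "test_cases/atcoder_id2tests.jsonl"

# Index the unfiltered records by avatar_id.
id2tests = {}
with open(UNFILTERED) as f:
    for line in f:
        rec = json.loads(line)
        id2tests[rec["avatar_id"]] = rec

# For every id kept in the filtered file, copy inputs/outputs from the unfiltered one.
with open(FILTERED) as f, open("id2tests_merged.jsonl", "w") as out:
    for line in f:
        rec = json.loads(line)
        full = id2tests.get(rec["avatar_id"])
        if full is not None:
            out.write(json.dumps({
                "avatar_id": rec["avatar_id"],
                "inputs": full.get("inputs", []),
                "outputs": full.get("outputs", []),
            }) + "\n")
```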

Thanks.

Absent new_lines and indentation in python data

Hi!

I downloaded the data from AVATAR/data/data.zip and also using the script AVATAR/data/download.sh, and it seems that a lot of the Python functions in the dataset are missing newlines and indentation. For example, CodeForces/421/A/solution1.py:

n, a, b = map(int, input().split())athur = map(int, input().split())alex = map(int, input().split()) total = [1] * n for i in alex:    total[i-1] = 2 print(*total)

or CodeForces/981/A/solution1.py:

s=input()c=len(s)for i in range(len(s)-1,0,-1):    k=s[0:i+1]    if(k!=k[::-1]):        print(c)        exit()    c-=1if(c==1):    print("0")

According to my simple heuristic calculation, about 50% of python functions look like this.
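For reference, the heuristic was roughly the sketch below (the directory layout is just how the archive unpacked for me and may differ):

```python
import glob

# Count solutions whose whole body sits on a single physical line, which
# usually means the newlines (and hence the indentation) were stripped.
paths = glob.glob("data/**/solution*.py", recursive=True)  # illustrative layout
broken = 0
for path in paths:
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read().strip()
    if text and "\n" not in text:
        broken += 1

print(f"{broken}/{len(paths)} Python solutions have no newlines "
      f"({100 * broken / max(len(paths), 1):.1f}%)")
```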

Is there a way to fix this? Thanks in advance for your help!

Could you please help provide the trained model parameters?

Hi @wasiahmad, I wonder if it is possible for you to share the trained models? I urgently need to evaluate these models on datasets such as AVATAR and G4G, including the specific code they produce after translation, but my GPU resources are limited. Thank you in advance!

How did you select the samples in g4g-functions?

Hi! Sorry to bother you again. I found that there are 5132 GeeksforGeeks samples in the whole AVATAR dataset, but only 3411 samples in g4g_functions. How did you select these 3411 samples? Did you filter out the problems in TransCoder-Eval?


Thank you!

evaluating TransCoder on AVATAR test set and bug in compile.py?

Hi @wasiahmad, I am using https://github.com/wasiahmad/AVATAR/blob/main/transcoder/run.sh to evaluate TransCoder on the AVATAR test data (test.java-python.java, its Python counterpart, and test.jsonl), 1699 samples. The scores do not match exactly with the paper. Could you please tell me if I am missing anything?

Also, I think the success and error variables are swapped in the format call in the line below, due to which the current output for Python to Java is Success - 1699, Errors - 0 instead of Success - 0, Errors - 1699.
In that case, the CA is 0 for Python to Java. Please correct me if I am wrong.

print('Success - {}, Errors - {} [Total - {}]'.format(error, success, num_errors))
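If the variables really are swapped, I assume the intended call simply reorders the arguments (keeping the original total variable):

```python
# Presumed fix: pass successes first, then errors, matching the format string.
print('Success - {}, Errors - {} [Total - {}]'.format(success, error, num_errors))
```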

Thanks and Regards,
Kunal Pagarey

eval_bleu with pretrained gpt model

Hi @wasiahmad,
I'm trying to evaluate a GPT-2 model with your code, so I run run.py with microsoft/CodeGPT-small-py as the pretrain_dir parameter and do_infer. In the eval_bleu script, outputs equals model(inputs)[1]; these are the hidden states of the pretrained GPT (the cached past key values), a tuple of 12 elements (n_layers), each consisting of 2 tensors of shape [1, 12, 48, 64]. When execution reaches the line past_hidden = [x[:, i:i + 1].expand(-1, beam_size, -1, -1, -1) for x in outputs], an error occurs: TypeError: tuple indices must be integers or slices, not tuple. The indexing also implies that each element of outputs should be a 5-dimensional tensor.
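For reference, one workaround I am considering (not sure it is the intended fix) is to stack each layer's (key, value) pair back into a single 5-D tensor before that indexing; this drops into the existing script where model, inputs, and beam_size are already defined:

```python
import torch

# Newer transformers versions return past_key_values as a tuple of
# (key, value) pairs per layer instead of one tensor per layer. Stacking each
# pair restores the [2, batch, heads, seq_len, head_dim] shape that the
# x[:, i:i + 1] indexing below expects.
outputs = model(inputs)[1]
if isinstance(outputs[0], tuple):
    outputs = [torch.stack(layer, dim=0) for layer in outputs]

past_hidden = [x[:, i:i + 1].expand(-1, beam_size, -1, -1, -1) for x in outputs]
```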
What corrections should be made in this case?
