pkuzqh / recoder Goto Github PK

View Code? Open in Web Editor NEW

49.0 49.0 10.0 6.23 MB

License: MIT License

Python 100.00%

recoder's Introduction

pkuzqh.github.io

This is my homepage, http://pkuzqh.github.io.

recoder's People

Contributors

Stargazers

Watchers

Forkers

zysszy code-rep shubhampachori12110095 happygirlzt nashid elhusseiniali 820237651 percywei cdaprod zys-szy

recoder's Issues

Issue with getNodeId

Hi,

I found that when you identify the root node in this line, you have ignored IfStatement and ForStatement. I guess your purpose is to traverse its children. But, how about other statements which also have children? For example, WhileStatement or BlockStatement also has children.

Particularly, I observed several cases in which the root node is BlockStatement. In such cases, your implementation returns the BlockStatement as the root node. As a result, the sub root in this line cannot be found as BlockStatement do not belong to linenode, making Recorder produce nothing.

This problem can be addressed by adding the condition of root.name != 'BlockStatement' to this line but I am not sure if it is in your intention or not. Thanks

Loading Data issue

I am trying to train the model without the docker. I have downloaded the data0.pkl and data1.pkl. They are mentioned as raw data. Do they need pre-processing? The paths in the Dataset.py code are hardcoded and the filenames do not match data0.pkl and data1.pkl. I have changed them accordingly. But do they need the call to the preprocess function? As passing them to the preprocess function is throwing error. I have thus loaded the data directly into the self.data variable. But now the dataloader is empty and thus training is not happening at all. Please help soon with this data issue.

Training Model issue

Hello! Thanks for open-sourcing the work!
I am trying to train the model without the docker. However, I encounter the error shown in the figure.

The error is caused by line 151 of run.py. Specifically, the variable test_set is defined on line 150 of run.py, and the argument "test" is passed to SumDataset to initialize test_set. However, the third branch of the init function in the Dataset.py does not do anything, which causes self.data to be empty.
Furthermore, could you provide the files of the test set and validation set (i.e., valdata.pkl, valnl.pkl, testdata.pkl, testnl.pkl, testcopy.pkl).
Thank you for taking a look.

你好，在代码阅读中我有两个问题想咨询您

第一个问题：训练模型时所用的data.pkl文件里面是向量数据，这些数据是怎么从代码文本转换得到的。data.pkl文件中的数据的含义是什么？得到这个数据文件的过程可以告知吗？
第二个问题：您是如何从代码文本中生成AST的并运用其特征的呢？
感谢回答，麻烦了

RuntimeError: CUDA out of memory

Hi there, thank you very much for open-sourcing the work!
I wonder what devices you used for the work. Since I tried to run the training in a machine with 8 Tesla V100-SXM2-16GB, but cannot make it. Besides, I found the code would only utilize 2 GPUs, although I did not specify. I modified the device setting inside run.py, but still cannot change the fact that only 2 GPUs are used.
Please kindly suggest. Thank you in advance!

关于训练数据的问题

您好，关于训练数据我有一些疑问。我看到您在云盘提供了两个文件：data0.pkl 和 data1.pkl，请问哪一个是训练时用到的训练数据呢？感谢！

pkl formats

Hi,
Can you please share how you created the rule.pkl, code_voc.pkl, char_voc.pkl and nl_voc.pkl? These are read as input. So for a new dataset, I would need to create these files. Please help asap.

Data processing

Hello authors, may I ask for the workflow to process the data from raw data (java methods pairs) into the mode input (data.pkl)? Thanks so much for your time.

How to generate json file in result folder

Hi there,

I'm trying to run recoder on other projects, but I found that the run.py requires to load data at
classcontent = json.load(open("../result/%s.json" % idss, 'r') )
Is there a way to generate such json file for other projects?

Usage example of Recorder on any other benchmark

Hi @pkuzqh,

Any chance you could provide a quick step-by-step on how to run Recoder on any other bug than the ones used in the empirical study (i.e., the ones from Defects4J v2.0, QuixBugs, and IntroClassJava)? For instance, any bug from the Bears benchmark or from the Bugs.jar benchmark.

Thanks in advance.

Closure_21, Math_98, Math_65 are two-hunk bugs

Hi, author, first thanks for your wonderful work! I can learn a lot from Recoder.

But for the patch validation phase, I find you only run the failing test cases but do not run all test cases. It could make some biases on the experiment results.
For example, according to http://program-repair.org/defects4j-dissection/#!/, Closure_21, Math_98 and Math_65 are two-hunk bugs, but in your patch file, they are fixed with only one code change.

And why the buggy line of Math_96 is "return real;" ?

Besides, should the patch of Math_30 be correct? (replace int with long, but the developer patch replace int with double)

Also, why the patch of closure_126 is correct ?

Instruction to train Recoder on other languages

Hi all,

To my understanding Recoder needs to extract the host language grammar rules.
So it would be great to have instructions on how to train Recoder on other languages.
Just looking at the code I can see running solvetree.py will produce a rule.pkl, is there anything else required?

Thanks.

AST format

Hi,
The data provided has the AST in the form of a string sequence. I am preparing a new dataset. I have the AST. Can you please share how did you convert the AST to the string sequence? Please help as soon as possible.

Post processing failure on some cases

Hi Dr. Zhu, I'm using your Recoder tool to repair some new bugs and it turns out to be really so powerful! But I observed that on some cases it may fail in post processing phase and I wonder you can help confirm that. In your testone.py script which is the core component of Recoder,

Recoder/testone.py

Line 441 in 83de143

if 'throw' in lines[0] and mode == 1:

tries to judge whether there's a throw statement in lines[0]. This lines is directly from aftercode which is part of the context. But for some special cases where the buggy method is too short (e.g. only one line, like int sum(int a, int b) { return a + b;}), the aftercode is actually empty and so is lines. Therefore lines[0] will trigger an Index Error: list out of range error. I think fixing this issue may help Recoder to fix more bugs. Thank you for taking a look.

On the fault localization results and implementation

Dear Sirs,

Sorry for interrupting you here. I am very interested in the great tool Recoder and appreciate its open-source implementation for public usage. It would be sincerely appreciated if the implementation and results of fault localization could be further provided.

Thank you in advance for your time and help!

Availability of code used for evaluating on Quixbugs, hardware requirements

Dear Authors,

I am researching attention mechanisms in Automatic Program Repair approaches, and Recoder is a strong candidate for inclusion due to its novel architecture and high performance. The primary dataset I'm considering is Quixbugs. Since Recoder offers high performance on that dataset, would it be possible for you to also share the code used to evaluate Recoder on Quixbugs?

Furthermore, I would like to ask what hardware resources were used in the training of Recoder.

Thanks in Advance and Best Regards

RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

when I run the python file, there is a error
I think this error about the sentence "loss, _ = model(dBatch[0], dBatch[1], dBatch[2], dBatch[3], dBatch[4], dBatch[6], dBatch[7], dBatch[8], dBatch[9], tmpf, tmpc, tmpindex, tmpchar, tmpindex2, rulead, antimask2, dBatch[5])"
Can you help me with this questions：

Traceback (most recent call last):
File "run.py", line 1208, in
train()
File "run.py", line 189, in train
loss, _ = model(dBatch[0], dBatch[1], dBatch[2], dBatch[3], dBatch[4], dBatch[6], dBatch[7], dBatch[8], dBatch[9], tmpf, tmpc, tmpindex, tmpchar, tmpindex2, rulead, antimask2, dBatch[5])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/opt/conda/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/root/Repair/Model.py", line 128, in forward
charEm = self.conv(charEm.permute(0, 3, 1, 2))
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
return self.conv2d_forward(input, self.weight)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

Which pkl is used for tokenization?

Hello！I found your work to be exceptionally insightful and engaging.
I noticed that there are three pkls in your project, namely char_ voc.pkl, code_ voc.pkl and nl_ voc.pkl, so which file is used for tokenization of code readers?

How to acquire Recoder's training dataset?

Dear Authors,
Thank you very much for the great work published in this Github repository. According to my understanding of the paper, there are 103,585 data points used for training Recoder. How can we acquire and investigate these training instances? I failed in retrieving them from Docker, and the provided link to the raw data is on a Pickle format, which is not human-readable. It is much appreciated if you can provide us with an easier way to reach the training dataset.