pkuzqh / recoder
License: MIT License
Hi,
The data provided has the AST in the form of a string sequence. I am preparing a new dataset and I have the AST. Can you please share how you converted the AST to the string sequence? Please help as soon as possible.
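For what it's worth, one common way to flatten an AST into a token sequence is a depth-first pre-order walk with an explicit backtracking marker. A minimal sketch (the dict node shape and the "^" marker are my assumptions, not necessarily the authors' exact format):

```python
# Hedged guess at AST-to-sequence conversion: emit a node's name on entry
# and a closing marker "^" when backtracking out of each child subtree.
def ast_to_tokens(node):
    tokens = [node["name"]]
    for child in node.get("children", []):
        tokens.extend(ast_to_tokens(child))
        tokens.append("^")  # marks the end of the child's subtree
    return tokens

ast = {"name": "ReturnStatement",
       "children": [{"name": "InfixExpression",
                     "children": [{"name": "a"}, {"name": "b"}]}]}
# ast_to_tokens(ast) → ['ReturnStatement', 'InfixExpression', 'a', '^', 'b', '^', '^']
```

The "^" markers make the tree recoverable from the flat sequence.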
Hello! I found your work to be exceptionally insightful and engaging.
I noticed that there are three pkl files in your project, namely char_voc.pkl, code_voc.pkl and nl_voc.pkl. Which file is used to tokenize the code?
When I run the Python file, there is an error.
I think the error is related to this line: "loss, _ = model(dBatch[0], dBatch[1], dBatch[2], dBatch[3], dBatch[4], dBatch[6], dBatch[7], dBatch[8], dBatch[9], tmpf, tmpc, tmpindex, tmpchar, tmpindex2, rulead, antimask2, dBatch[5])"
Can you help me with this question?
Traceback (most recent call last):
File "run.py", line 1208, in <module>
train()
File "run.py", line 189, in train
loss, _ = model(dBatch[0], dBatch[1], dBatch[2], dBatch[3], dBatch[4], dBatch[6], dBatch[7], dBatch[8], dBatch[9], tmpf, tmpc, tmpindex, tmpchar, tmpindex2, rulead, antimask2, dBatch[5])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/opt/conda/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/root/Repair/Model.py", line 128, in forward
charEm = self.conv(charEm.permute(0, 3, 1, 2))
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
return self.conv2d_forward(input, self.weight)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
Hello authors, may I ask for the workflow to process the data from the raw data (Java method pairs) into the model input (data.pkl)? Thanks so much for your time.
Hi Dr. Zhu, I'm using your Recoder tool to repair some new bugs and it turns out to be really powerful! But I observed that in some cases it may fail in the post-processing phase, and I wonder if you can help confirm it. In your testone.py script, which is the core component of Recoder, line 441 (in 83de143) accesses lines[0]. This lines comes directly from aftercode, which is part of the context. But in some special cases where the buggy method is too short (e.g., only one line, like int sum(int a, int b) { return a + b; }), aftercode is actually empty, and so is lines. Therefore lines[0] will trigger an IndexError: list index out of range.
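A minimal guard along those lines (the function name and the splitlines call are illustrative, not the actual testone.py code):

```python
def first_context_line(aftercode):
    # aftercode may be empty when the buggy method is a single line,
    # so guard before indexing instead of taking lines[0] unconditionally.
    lines = aftercode.splitlines()
    if not lines:
        return ""  # fall back to an empty context line
    return lines[0]
```

With this guard, an empty aftercode yields an empty context line instead of raising.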
error. I think fixing this issue may help Recoder to fix more bugs. Thank you for taking a look.Hi there,
I'm trying to run Recoder on other projects, but I found that run.py requires loading data at
classcontent = json.load(open("../result/%s.json" % idss, 'r') )
Is there a way to generate such json file for other projects?
Dear Authors,
Thank you very much for the great work published in this GitHub repository. According to my understanding of the paper, there are 103,585 data points used for training Recoder. How can we acquire and investigate these training instances? I failed to retrieve them from Docker, and the provided link to the raw data is in Pickle format, which is not human-readable. It would be much appreciated if you could provide us with an easier way to reach the training dataset.
Hi all,
To my understanding, Recoder needs to extract the host language's grammar rules.
So it would be great to have instructions on how to train Recoder on other languages.
Just looking at the code, I can see that running solvetree.py will produce a rule.pkl. Is there anything else required?
Thanks.
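For reference, a toy sketch of what a grammar-rule extraction pass over ASTs might look like. The node layout, the rule string format, and the pickling step are my assumptions about what solvetree.py does, not its actual implementation:

```python
import pickle

# Walk each AST and record "parent -> ordered children" productions,
# assigning each distinct rule a stable integer id.
def collect_rules(node, rules):
    if node.get("children"):
        rule = node["name"] + " -> " + " ".join(c["name"] for c in node["children"])
        rules.setdefault(rule, len(rules))
        for child in node["children"]:
            collect_rules(child, rules)

rules = {}
ast = {"name": "MethodDeclaration",
       "children": [{"name": "Modifier", "children": []},
                    {"name": "Block", "children": []}]}
collect_rules(ast, rules)

# Serialize the rule vocabulary the way a rule.pkl might be produced.
with open("rule.pkl", "wb") as f:
    pickle.dump(rules, f)
```

Running this over every AST in a new language's corpus would accumulate that language's production rules.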
Dear Authors,
I am researching attention mechanisms in Automatic Program Repair approaches, and Recoder is a strong candidate for inclusion due to its novel architecture and high performance. The primary dataset I'm considering is QuixBugs. Since Recoder offers high performance on that dataset, would it be possible for you to also share the code used to evaluate Recoder on QuixBugs?
Furthermore, I would like to ask what hardware resources were used in the training of Recoder.
Thanks in Advance and Best Regards
Hi @pkuzqh,
Any chance you could provide a quick step-by-step on how to run Recoder on any other bug than the ones used in the empirical study (i.e., the ones from Defects4J v2.0, QuixBugs, and IntroClassJava)? For instance, any bug from the Bears benchmark or from the Bugs.jar benchmark.
Thanks in advance.
Hi,
Can you please share how you created rule.pkl, code_voc.pkl, char_voc.pkl and nl_voc.pkl? These are read as input, so for a new dataset I would need to create these files. Please help as soon as possible.
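While waiting for the authors, a hedged sketch of how a *_voc.pkl-style vocabulary is typically built from a token corpus. The special tokens, frequency cutoff, and file name are my assumptions, not the authors' exact recipe:

```python
from collections import Counter
import pickle

# Build a token -> id mapping from tokenized sequences, reserving
# ids for padding and unknown tokens (a common, assumed convention).
def build_vocab(token_seqs, min_count=1):
    counts = Counter(tok for seq in token_seqs for tok in seq)
    voc = {"<pad>": 0, "<unk>": 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            voc.setdefault(tok, len(voc))
    return voc

voc = build_vocab([["return", "a", "+", "b"], ["return", "a"]])
with open("code_voc.pkl", "wb") as f:
    pickle.dump(voc, f)
```

The same pattern would apply to character-level (char_voc.pkl) and natural-language (nl_voc.pkl) vocabularies, just with different tokenization.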
Dear Sirs,
Sorry for interrupting you here. I am very interested in the great tool Recoder and appreciate its open-source implementation for public usage. It would be sincerely appreciated if the implementation and results of fault localization could be further provided.
Thank you in advance for your time and help!
Hi there, thank you very much for open-sourcing the work!
I wonder what devices you used for the work. I tried to run the training on a machine with 8 Tesla V100-SXM2-16GB GPUs, but could not make it work. Besides, I found the code would only utilize 2 GPUs, although I did not specify that. I modified the device setting inside run.py, but still could not change the fact that only 2 GPUs are used.
Please kindly suggest. Thank you in advance!
I am trying to train the model without Docker. I have downloaded data0.pkl and data1.pkl, which are described as raw data. Do they need pre-processing? The paths in the Dataset.py code are hardcoded and the filenames do not match data0.pkl and data1.pkl, so I changed them accordingly. But do they need the call to the preprocess function? Passing them to the preprocess function throws an error, so I loaded the data directly into the self.data variable. But now the dataloader is empty and training does not happen at all. Please help soon with this data issue.
Hello! Thanks for open-sourcing the work!
I am trying to train the model without the docker. However, I encounter the error shown in the figure.
The error is caused by line 151 of run.py. Specifically, the variable test_set is defined on line 150 of run.py, and the argument "test" is passed to SumDataset to initialize test_set. However, the third branch of the __init__ function in Dataset.py does not do anything, which leaves self.data empty.
Furthermore, could you provide the files of the test set and validation set (i.e., valdata.pkl, valnl.pkl, testdata.pkl, testnl.pkl, testcopy.pkl)?
Thank you for taking a look.
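In case it helps while waiting for a fix, a hedged sketch of what the missing branch might do. The pickle names follow the ones listed above, and the real layout of SumDataset is an assumption:

```python
import os
import pickle

# Sketch of a split-loading helper: each dataset split reads its own
# pickle, and the "test" branch (currently a no-op in Dataset.py)
# would load testdata.pkl the same way the other branches do.
def load_split(dataName):
    if dataName == "train":
        path = "data.pkl"
    elif dataName == "val":
        path = "valdata.pkl"
    else:  # the third branch
        path = "testdata.pkl"
    if not os.path.exists(path):
        return []  # keep self.data well-defined even without the file
    with open(path, "rb") as f:
        return pickle.load(f)
```

Assigning the result to self.data in the third branch would avoid the empty-dataset error at line 151.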
Hi, author, first thanks for your wonderful work! I can learn a lot from Recoder.
But for the patch validation phase, I find that you only run the failing test cases rather than all test cases. This could bias the experimental results.
For example, according to http://program-repair.org/defects4j-dissection/#!/, Closure_21, Math_98 and Math_65 are two-hunk bugs, but in your patch file they are fixed with only one code change.
And why is the buggy line of Math_96 "return real;"?
Besides, should the patch of Math_30 be considered correct? (It replaces int with long, but the developer patch replaces int with double.)
Also, why is the patch of Closure_126 considered correct?
Hello, I have some questions about the training data. I see that you provided two files on the cloud drive: data0.pkl and data1.pkl. Which one is the training data used during training? Thanks!
First question: the data.pkl file used to train the model contains vector data. How were these vectors obtained from the code text? What do the data in data.pkl mean? Could you share the process for producing this data file?
Second question: how did you generate the AST from the code text and make use of its features?
Thanks for your answers.
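On the first question, a hedged sketch of how token sequences typically become the integer vectors stored in a file like data.pkl: each token is looked up in a vocabulary, unknown tokens map to an <unk> id, and sequences are padded to a fixed length. The vocabulary, special-token ids, and max length here are assumptions:

```python
# Convert a token sequence into a fixed-length id vector using a
# vocabulary dict; tokens not in the vocabulary fall back to <unk>.
def encode(tokens, voc, max_len=8):
    ids = [voc.get(t, voc["<unk>"]) for t in tokens][:max_len]
    ids += [voc["<pad>"]] * (max_len - len(ids))
    return ids

voc = {"<pad>": 0, "<unk>": 1, "return": 2, "a": 3}
# encode(["return", "a", "+", "b"], voc) → [2, 3, 1, 1, 0, 0, 0, 0]
```

Pickling a list of such vectors would produce a data.pkl-style file; the exact fields Recoder stores would have to come from the authors.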
Hi,
I found that when you identify the root node in this line, you have ignored IfStatement and ForStatement. I guess your purpose is to traverse their children. But what about other statements that also have children? For example, WhileStatement or BlockStatement also have children.
In particular, I observed several cases in which the root node is a BlockStatement. In such cases, your implementation returns the BlockStatement as the root node. As a result, the sub-root in this line cannot be found, since BlockStatement does not belong to linenode, making Recoder produce nothing.
This problem can be addressed by adding the condition root.name != 'BlockStatement' to this line, but I am not sure whether that matches your intention. Thanks.
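To illustrate the suggested fix, a toy version of the root-selection logic with BlockStatement added to the skip set. The node shape and descent strategy are assumptions, not the repository's actual code:

```python
# Descend past wrapper statements so traversal continues into their
# children; the suggested fix adds BlockStatement to the skip set
# alongside IfStatement and ForStatement.
def find_root(node):
    skip = {"IfStatement", "ForStatement", "BlockStatement"}
    while node.get("children") and node["name"] in skip:
        node = node["children"][0]
    return node
```

With BlockStatement in the skip set, a one-block method no longer yields a root that linenode cannot match.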