Fuzz-TK

[Fuzz-TK]A tool that employs fuzzy set relus using features based on semantics, spectra, and mutation to achieve effective spectrum-based fault localization and semantic-based program repair.

Requirements:

Java 1.7
Python 3.6
Defects4J 1.2.0
SVN >= 1.8
Git >= 1.9
Perl >= 5.0.10
PyTorch-1.5.1

Datatest:

If you wish to use publicly available datasets, you can access data_for_transfer.zip.Unzip it and put the extracted datasets in the corresponding path,Put dataset_fl.pkl into ./fault_localization/,Put dataset_pr.pkl into ./program_repair/,Put src_code.pkl into ./fault_localization/binary_classification/d4j_data/ .

Our Works

Fuzz-TK is an automated software debugging method that aims to relieve developers of the significant burden of fixing software failures.

Learing Transfer Knowledge

For generated SLforge or realistic models, place them in the courpus_seed folder. At the command line, type: ModelProcessing. The results will be displayed in the directed_graph.txt file.

Fault Localization Task

Run python train.py <fix_template> to train the corresponding binary classifier, and run python data_preprocess_for_d4j.py <fix_template> to preprocess each suspicious statement in the Defects4J dataset. Then execute python predict_for_d4j.py to obtain the final 11-dimensional semantic features.

Program Repair Task

Run python pipeline.py and python train_github.py to train the multi-classifier, then execute python train_d4j.py to fine-tune the parameters of the model. Place model_save_d4j.m and w2v_32.m into the dnn_model directory, then run python parse_version.py. Finally, run python get_sus_file.py.

Specifically, if you want to use our model quickly, some steps need to be done.

Firstly, navigate to the path ./fault_localization/ranking_task/run_model/ and execute the command python run_group.py.

Next, proceed to the directory ./program_repair/automatic_fix/shell_71_versions/ and run the python generate_shell.py command to create a shell file that includes 71 sequential commands for the 71 bugs in Defects4J.

Finally, execute ./run_repair.sh to acquire the repair results. All repair logs are located in ./program_repair/dnn_model/dnn_tbar_log/, and all generated plausible patches can be found in ./program_repair/automatic_fix/OUTPUT/.

Introduction to each code file

data.m: Preserving transferred knowledge

fault_localization.m: Obtaining the fault location information

program_repair.m: The fault is repaired by migrating knowledge combined with Tbar

datasetfl:

datasetfl consists of a large number of historical bug fixes from popular open-source Java projects on GitHub. Specifically, if a commit message contains keywords such as "error", "bug", "fix", "issue", "incorrect", "fault", "defect", "type", and others, the commit is considered to be related to a bug. There are a total of 1,010,628 bug-related commits identified. The dataset then identifies bug types from bug fix commits. For all 15 repair templates defined in TBar, if the code element \textit{d} with a bug is fixed after applying repair template \textit{gu}, \textit{d} is considered to have a bug of the corresponding type of \textit{gu}. Once the repair template is determined, the code elements before the commit are labeled with their respective error types. After collecting bug fix commits with error type labels, the dataset assigns each commit to one or more groups based on their error types. To collect negative samples, a novel approach is proposed by gathering all statements containing necessary syntax components and their corresponding methods as negative sample candidates to construct a balanced dataset. Finally, the 11 constructed datasets include 392,567 positive samples and an equal number of negative samples (a total of 785,134) for \textit{datasetfl}.

datasetpr:

datasetpr is a dataset that contains 408,091 samples with 11 different categories. The dataset is used for program fixing, where the goal is to select the correct repair template to fix a given erroneous statement. This task can be seen as a multi-class classification problem. Each sample in the dataset represents a version of the code (prior to the submission) and is classified into one of the 11 predefined error types. Since error fixes can be labeled with one or multiple error types, the dataset assigns them to deeper error types in the Abstract Syntax Tree (AST) hierarchy. This allows for the use of smaller change regions in the repair templates, which is preferred by actual developers.

The following tables show the data sizes for datasetfl and datasetpr:

Statistics of Datasetfl and Datasetpr

Fix Templates	Datasetfl	Datasetpr
Insert Null Pointer Checker	11,320	7,600
Insert Missed Statement	94,240	54,445
Mutate Conditional Expression	62,334	30,530
Mutate Data Type	6,586	5,178
Mutate Literal Expression	71,072	55,044
Mutate Method Invocation Expr.	397,508	166,943
Mutate Operators	3,346	2,326
Mutate Return Statement	25,022	20,680
Mutate Variable	59,836	35,156
Move Statement	14,626	7,812
Remove Buggy Statement	39,244	22,377
Total	785,134	408,091

fuzz-tk Goto Github PK

Fuzz-TK

[Fuzz-TK]A tool that employs fuzzy set relus using features based on semantics, spectra, and mutation to achieve effective spectrum-based fault localization and semantic-based program repair.

Datatest:

Our Works

Learing Transfer Knowledge

Fault Localization Task

Program Repair Task

Specifically, if you want to use our model quickly, some steps need to be done.

Introduction to each code file

datasetfl:

datasetpr:

The following tables show the data sizes for datasetfl and datasetpr:

Statistics of Datasetfl and Datasetpr

hhhiiii's Projects

fuzz-tk

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent