skyhover / deckard
Code clone detection; clone-related bug detection; semantic clone analysis
License: Other
This is a release package of Deckard -- a tree-based, scalable, and accurate code clone detection tool. It is also capable of reporting clone-related bugs.

**********************************************************************
*
* LICENSE
*
**********************************************************************
Copyright (c) 2007-2018, University of California / Singapore Management University
  Lingxiao Jiang <[email protected]> <[email protected]>
  Ghassan Misherghi <[email protected]>
  Zhendong Su <[email protected]>
  Stephane Glondu <[email protected]>
All rights reserved.

Three-clause BSD licence

**********************************************************************
*
* Version 2.0 + support for Solidity syntax
* May 25th, 2018
*
* What's new?
*
**********************************************************************
- Faster clone detection by parallelized execution of various components in DECKARD. Use "MAX_PROCS=<number>" in the "config" file to set the maximum number of processes that may be used for executions of DECKARD.
- Compatibility fixes for Mac OS
- Support for Java 7 syntax, contributed by Prof. Gail Kaiser and her students in the Programming Systems Lab at Columbia University.
- Support for Solidity syntax (with small tweaks)

**********************************************************************
*
* Installation
*
**********************************************************************
In bash shell or cygwin, go into the folder:
    /path/to/src/main/
and run the build script:
    ./build.sh
For convenience, you can add "/path/to/src/main" to $PATH.

NOTE: Deckard's built-in parser previously could not handle Java 1.5 or later features. It has been upgraded for Java 7 syntax. It should now be able to generate vectors for Java files that use Java 6 and 7 features.

NOTE: The compiled executables may not be "executable" (showing "Permission Denied") on Windows Vista/7 due to false alarms of UAC rules (based on file path/hash of a .exe).
A simple (but may not be desirable) workaround is to run cygwin shell with elevated privileges before invoking the above scripts.
Also, Deckard's performance may be tens of times slower when executed in cygwin than on Linux due to slow I/O operations.

To uninstall, go into the folder:
    /path/to/src/main/
and simply run:
    ./clean.sh

*********************************************************************
*
* Usage
*
*********************************************************************
1. For clone detection (suppose the source code of your application is in /path/to/app/src):
  - Specify the location of your source code, say /path/to/app/src.
  - Create a "config" file in /path/to/app/, following the sample "config" in samples/ or the template "config-sample" in scripts/clonedetect/. Make sure all paths are valid and the programming language is specified correctly.
  - (Optional) Create three other directories in /path/to/app/ for storing outputs (see what's in samples/). These directories may be automatically created if specified in 'config'.
  - Batch mode run of clone detection (no bug detection by default):
        "/path/to/scripts/clonedetect/deckard.sh"
    An optional parameter to the script is 'clean', 'clean_all', or 'overwrite'
  - Instead of running 'deckard.sh', you may also run the scripts called in 'deckard.sh' step-by-step by yourself:
    -- Vector generation: from where "config" is, run
        "/path/to/scripts/clonedetect/vdbgen"
       An optional parameter to the script is 'clean', 'clean_all', or 'overwrite'
    -- Vector clustering (i.e., clone detection): from where "config" is, run
        "/path/to/scripts/clonedetect/vertical-param-batch"
       An optional parameter to the script is 'clean', 'clean_all', or 'overwrite'

2. Vector generation for parts of a file:
  - Identify the source file name, say /path/to/src/filename.java, and the range [s, e] of line numbers you'd like to have a vector generated for
  - Run
        "src/main/jvecgen [options] /path/to/src/filename.java --start-line-number s --end-line-number e"
    Run "jvecgen -h" for more options.
    Note that different vecgen (cvecgen, jvecgen, phpvecgen) should be used for files in different languages.
    This vecgen command will generate a vector representing the code between Line 's' and 'e' in the source file, and store the vector in "filename.java.vec" by default.

3. Detection of clone-related bugs:
  - Invoke 'bugfiltering' on a clone report file with a specified language, e.g.,
        /path/to/scripts/bugdetect/bugfiltering cluster_result c > bug_result
  - Optionally transform 'bug_result' to a html file for easier inspection of the reported potentially buggy clones in a web browser:
        /path/to/src/main/out2html bug_result > bug_result.html
  - See 'deckard.sh' for how to run it in a batch mode (not enabled by default).

*********************************************************************
*
* What are in the package
*
*********************************************************************
1. Organization
   The whole package is organized according to the several components in Deckard:
  - Parse tree generation
    -- src/include/ : a generic interface for trees
    -- src/ptgen/ : ANTLR parser generator
    -- src/ptgen/gcc : a grammar for C (GNU C extensions) and its parse tree generator
    -- src/ptgen/java : a grammar for Java (<=1.4) and its parse tree generator
    -- src/ptgen/php5 : a grammar for php5 and its parse tree generator
  - Vector generation
    -- src/vgen/treeTra/ : a generic tree traversal framework based on the generic tree interface in src/include, and vector generation based on tree traversal, mostly C++.
    -- src/vgen/vgrouping/ : code for vector grouping (mix of C,C++,python,bash)
  - Vector clustering
    -- src/lsh/ : the LSH package and an interface for Deckard to use (src/lsh/source/enumBuckets.cpp).
  - Main entrances
    -- src/main/ptree.cc : an implementation of the tree interface
    -- src/main/main.cc : entrance for vector generation
    -- src/main/parseTreeMain.cc : entrance for parse tree dumping, can be useful for inspecting detected clones, bugs, and their related parse trees
    -- src/main/bugmain.cc : entrance for bug filtering
    -- src/main/out2html.C : entrance for adding html tags into clone/bug reports
  - Scripts gluing things together
    -- scripts/clonedetect/ : bash and python scripts
       --- deckard.sh : batch-mode clone detection
       --- vdbgen : batch-mode vector generation
       --- vertical-param-batch : batch-mode vector clustering
    -- scripts/bugdetect/ : bash and python scripts
    -- various auxiliary scripts for simple statistics
  - Others
    -- README
    -- LICENSE

2. Details about the clone/bug detection algorithms can be found in these two papers:
  - DECKARD: Scalable and Accurate Tree-based Detection of Code Clones, by Lingxiao JIANG, Ghassan MISHERGHI, Zhendong SU, and Stephane GLONDU. In the proceedings of 29th International Conference on Software Engineering (ICSE '07), Minneapolis, Minnesota, USA, 2007.
  - Context-Based Detection of Clone-Related Bugs, by Lingxiao JIANG, Zhendong SU, and Edwin CHIU. In the proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE'07), Dubrovnik, Croatia, 2007.

**********************************************************************
*
* How to programmatically use the vectors and the clone reports?
*
**********************************************************************
1. How to get the subtree representing each clone?
   Each clone in the reports has a TBID and a TEID, in addition to the file name and line numbers.
   The TBID and TEID uniquely identify the IDs of the first token and the last token in the clone from the original file (possibly containing parsing errors). To maintain consistent counting of the IDs, you should leave the work to "yyparse()" and Deckard's TokenCounter for how the IDs are calculated (see TraGenMain::run() for implementation details).
   The following are the main steps for getting the subtree for a clone (please refer to "src/vgen/treeTra/token-tree-map.h" for more implementation details):
  - Given a line from the clone report file, parse it to get file name, line numbers, TBID, and TEID, etc. Cf. the function:
        bool parse(char * line, regex_t patterns[], int dim=ENUM_CLONE_THE_END)
  - Call the following function (which calls "yyparse()" and a token counter) to get a whole parse tree for the source file and the token IDs for every node:
        ParseTree* TokenTreeMap::parseFile(const char * filename)
  - Call the following function to get the smallest tree that contains all tokens between TBID and TEID:
        Tree* tokenRange2Tree2(std::pair<long, long> tokenrange, ParseTree* pt)
  - Then do whatever you'd like with the returned tree. Note that vectors are NOT generated for this tree yet. If vectors are needed, do the following:
    -- Create a new object of type TraGenMain and call "TraGenMain::run(0, 0)" (cf. src/main/main.cc)
    -- Retrieve the vector for the tree:
         TreeVector* tv = TreeAccessor::get_node_vector(Tree* tree_node_pointer)
    -- If you also want some merged vectors from the child nodes of this tree, that would require calls to TraGenMain::run() with different parameters or adjusting the internals of TraGenMain::run(), depending on how you want the vectors to be presented to you. Feel free to improve the vector generation, both the core and its interface/APIs.

2. How to get the vector for a line or a sequence of lines from a file?
  - Option 1: See above: Use "vector generation for parts of a file" with your scripts.
  - Option 2: Given the parse tree for a file (produced by TokenTreeMap::parseFile() and yyparse()) and the starting and ending line numbers, do the following:
    -- (If not done before,) call Deckard's vector generator on the parse tree through TraGenMain::run, same as above. Please refer to src/main/main.cc, TraGenMain::run(int startln, int endln), and VecGenerator::traverse(Tree* root, Tree* init).
    -- Call the following function (cf. src/include/ptree.h, src/main/ptree.cc) to return the smallest tree enclosing all elements from these lines:
         Tree* ParseTree::line2Tree(int startln, int endln)
    -- Then retrieve the vector (the actual vector generation is done beforehand):
         TreeVector* tv = TreeAccessor::get_node_vector(tree_node_pointer)

Enjoy and Feedback :=)

@Deckard : Am I a clone?
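
The README's step "Vector generation for parts of a file" says that a different vecgen must be used per language. A minimal sketch of that selection follows, assuming the extension-to-tool mapping used in the sample config (*.java, *.php, *.c/*.h). The helper names `vecgen_for` and `vecgen_command` and the `VECGEN_DIR` variable are hypothetical and not part of Deckard; the sketch only prints the command line rather than running it.

```bash
# Pick the vector generator for a source file by its extension, following the
# cvecgen/jvecgen/phpvecgen split described in the README.
vecgen_for() {
    case "$1" in
        *.java)    echo jvecgen ;;
        *.php)     echo phpvecgen ;;
        *.c | *.h) echo cvecgen ;;
        *)         echo "unsupported file type: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the vecgen command for lines start..end of a file.
# VECGEN_DIR is a hypothetical variable naming the directory that holds the
# built executables; it defaults to src/main as in the README.
vecgen_command() {
    file=$1 start=$2 end=$3
    tool=$(vecgen_for "$file") || return 1
    echo "${VECGEN_DIR:-src/main}/$tool $file --start-line-number $start --end-line-number $end"
}

vecgen_command /path/to/src/filename.java 10 42
```

The printed command is not shell-quoted, so paths containing spaces would need extra care before being executed.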
After I put my directory into the config in the sample directory, I can run the clone detection but I get the following output:
= Vector clustering w/ MIN_TOKENS=30, STRIDE=2, SIMILARITY=0.95 ...
grouping: vectors/vdb_30_2 with distance=5,477226...Done grouping 30 2 5,477226. See groups in vectors/vdb_30_2_g[0-9]_5,477226_30
paramsetting: 30 2 0.95 ...Error: paramsetting failure: no vector group found: 30 2 0.95
Error: problem in vec clustering step. Stop and check logs in times/
So I'm not sure I can trust what is output in clusters/post_cluster...
What is wrong?
Thanks,
Stefan
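
One detail in this log is the decimal comma in distance=5,477226, which suggests the scripts ran under a locale that writes numbers with a comma. Nothing in this thread confirms that this is the cause of the failure. The sketch below only shows that forcing the C locale makes awk print a period; the function name `format_sqrt` is hypothetical, and the commented-out Deckard invocation is an untested assumption.

```bash
# Print sqrt(x) with six decimals, always using "." as the decimal separator,
# by running awk under the C locale regardless of the user's settings.
format_sqrt() {
    LC_ALL=C awk -v x="$1" 'BEGIN { printf "%.6f\n", sqrt(x) }'
}

format_sqrt 30

# Untested assumption: the same override applied to a whole Deckard run.
# LC_ALL=C /path/to/scripts/clonedetect/deckard.sh
```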
Hi, I'm getting:
a - token-counter.o
a - sq-tree.o
a - node-vec-gen.o
a - vector-output.o
a - vector-merger.o
a - tree-accessor.o
a - token-tree-map.o
a - clone-context-php.o
rm -f vectorsort dispatchvectors computeranges *~ *.o
gcc -O3 -O3 vectorsort.c -lm -o vectorsort
gcc -O3 -O3 dispatchvectors.c -lm -o dispatchvectors
gcc -O3 -O3 computeranges.c -lm -o computeranges
rm -f *.o cvecgen jvecgen cbugfilters jbugfilters out2html phpvecgen phpbugfilters out2xml cParseTreeMain jParseTreeMain phpParseTreeMain
g++ -o ptreeC.o -O3 -I../include -I../vgen/treeTra -c -DCLANG ptree.cc
make: *** No rule to make target '../ptgen/gcc/gccptgen.a', needed by 'cvecgen'. Stop.
Error: main make failed. Exit.
./build.sh 7.49s user 0.35s system 85% cpu 9.207 total
by just executing the build.sh in src/main
Hi, I got this error when running build.sh:
rm -f *.pyc
make -C simple clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/simple'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc c_ptgen
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/simple'
make -C gcc clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc gccptgen.a
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc'
make -C java clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/java'
rm -f *.o lex.yy.cc pt_j.tab* pt_j.y head.cc javaptgen.a
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/java'
make -C php5 clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/php5'
rm -f *.o lex.yy.cc pt_zend_language_parser.tab* pt_zend_language_parser.y head.cc phpptgen.a
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/php5'
make -C sol clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/sol'
rm -f *.o lex.yy.cc pt_solidity.* head.cc solidityptgen.a
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/sol'
make -C gcc
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc'
./mainc.py c.y
Traceback (most recent call last):
  File "/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc/./mainc.py", line 43, in <module>
    import YaccParser,YaccLexer
  File "/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc/../YaccParser.py", line 8
    False = 0
    ^^^^^
SyntaxError: cannot assign to False
make[1]: *** [Makefile:62: pt_c.y] Error 1
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc'
make: *** [Makefile:35: TARGET] Error 2
Error: ptgen make failed. Exit.
Error: ptgen make failed. Deckard build fails.
It seems that YaccParser.py assigns to False, which Python does not accept.
Do I have the wrong environment, or did something else go wrong?
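
"SyntaxError: cannot assign to False" is a Python 3 error; Python 2 allowed that assignment, and another report on this page mentions editing mainc.py to use python2. A small sketch for checking which major version the `python` command resolves to is given below. The helper name `python_major` is hypothetical, and which interpreter ./mainc.py actually runs under depends on its shebang line, which is not shown here.

```bash
# Extract the major version from the text printed by `python --version`,
# for example "Python 3.10.12" gives 3.
python_major() {
    version=${1#Python }      # drop the leading "Python "
    echo "${version%%.*}"     # keep everything before the first dot
}

# Inspect whichever interpreter `python` resolves to, if there is one.
# Python 2 prints its version on stderr, hence the 2>&1.
if command -v python >/dev/null 2>&1; then
    echo "python is major version $(python_major "$(python --version 2>&1)")"
else
    echo "no interpreter named python on PATH"
fi
```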
.....deleting intermediate vector files....Done
I followed the steps in README.md.
After installing Deckard, I wanted to test the clone detection.
I created a "config" file in /home/xx/projects/Deckard with the same content as the "config" in /sample.
The configuration file is as follows:
FILE_PATTERN='*.java' # used in the 'find' command below
#where are the source files?
SRC_DIR="src"
PDG_DIR="ddgs" # used by Deckard2 for 'find $SRC_DIR -ipath "*/$PDG_DIR/$FILE_PATTERN"'
AST_DIR="asts" # each pdg should have an ast with the same name in a different folder
#where are node definition files? used by Deckard2
TYPE_FILE='/home/ly/projects/Deckard/testdata/deckard3/AstNodeTypeNamesIDs.txt'
RELEVANT_NODEFILE='/home/ly/projects/Deckard/testdata/deckard3/AstRelevantNodes.txt'
LEAF_NODEFILE='/home/ly/projects/Deckard/testdata/deckard3/AstLeafNodes.txt'
PARENT_NODEFILE='/home/ly/projects/Deckard/testdata/deckard3/AstParentNodes.txt'
#####The above are for Deckard2 only #####
#where is Deckard?
DECKARD_DIR="/home/ly/projects/Deckard"
#clone parameters; refer to paper.
MIN_TOKENS='30 50' # can be a sequence of integers
STRIDE='2 0' # can be a sequence of integers
SIMILARITY='1.0 0.95' # can be a sequence of values <= 1
#DISTANCE='0 0.70711 1.58114 2.236'
###########################################################
#Where to store result files?
#where to output generated vectors?
VECTOR_DIR="vectors"
#where to output detected clone clusters?
CLUSTER_DIR="clusters"
#where to output timing/debugging info?
TIME_DIR="times"
##########################################################
#where are several programs we need?
#where is the vector generator?
VGEN_EXEC="$DECKARD_DIR/src"
case $FILE_PATTERN in
  *.dot )
    VGEN_EXEC="$VGEN_EXEC/dot2d/dotvgen" ;; # for Deckard2 dot only
  *.java )
    VGEN_EXEC="$VGEN_EXEC/main/jvecgen" ;;
  *.php )
    VGEN_EXEC="$VGEN_EXEC/main/phpvecgen" ;;
  *.c | *.h )
    VGEN_EXEC="$VGEN_EXEC/main/cvecgen" ;;
esac
MAX_PROCS=8
GROUPING_S='30' # should be a single value
#GROUPING_D
#GROUPING_C
export DECKARD_DIR
export FILE_PATTERN
export SRC_DIR
export PDG_DIR
export AST_DIR
export TYPE_FILE
export RELEVANT_NODEFILE
export LEAF_NODEFILE
export PARENT_NODEFILE
export VECTOR_DIR
export TIME_DIR
export CLUSTER_DIR
export VGEN_EXEC
export GROUPING_EXEC
export CLUSTER_EXEC
export POSTPRO_EXEC
export SRC2HTM_EXEC
export SRC2HTM_OPTS
export MIN_TOKENS
export STRIDE
#export DISTANCE
export SIMILARITY
export GROUPING_S
export GROUPING_D
export GROUPING_C
export MAX_PROCS
But when I follow the next step and run it, I get an error:
ly@ubuntu:~/projects/Deckard$ sh /home/ly/projects/Deckard/scripts/clonedetect/deckard.sh
DECKARD--A Tree-Based Code Clone Detection Toolkit.
/home/ly/projects/Deckard/scripts/clonedetect/deckard.sh: 4: /home/ly/projects/Deckard/scripts/clonedetect/deckard.sh: [[: not found
==== Configuration checking.../home/ly/projects/Deckard/scripts/clonedetect/deckard.sh: 81: /home/ly/projects/Deckard/scripts/clonedetect/configure: [[: not found
Error: no config file in current directory
I don't know how to fix it.
Can someone give me some advice? Thanks.
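
The message "[[: not found" is what a POSIX shell such as dash prints when a script uses the bash-only `[[ ... ]]` test, and on Ubuntu `sh` is dash by default. The command above starts the script with `sh`, so a likely remedy is to start it with `bash` instead. That is an assumption about this report, not something confirmed in the thread. The probe function `supports_double_bracket` is hypothetical and only illustrates the difference between the two shells.

```bash
# Return 0 if the given shell understands the bash-style [[ ... ]] test.
supports_double_bracket() {
    "$1" -c '[[ -n x ]]' >/dev/null 2>&1
}

for shell in sh bash; do
    if supports_double_bracket "$shell"; then
        echo "$shell: [[ ]] is supported"
    else
        echo "$shell: [[ ]] is not supported"
    fi
done

# Assumed fix for the report above: start the script with bash, not sh.
# bash /home/ly/projects/Deckard/scripts/clonedetect/deckard.sh
```

On systems where `sh` is itself bash, both lines report support, which is why the same command can work on one machine and fail on another.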
Hi.
I want to build Deckard but got the error "Error: ptgen make failed. Exit. Error: ptgen make failed. Deckard build fails."
I have tried the solutions from other issues, such as installing the newest versions of the packages and editing the file /src/ptgen/gcc/mainc.py to use python2.
I also changed my OS to Ubuntu 12.
But I still get the errors below.
Can anyone help me? Thanks a lot!
syu@ubuntu:~/workspaces/Deckard/src/main$ sudo ./build.sh
rm -f *.pyc
make -C simple clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/simple'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc c_ptgen
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/simple'
make -C gcc clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/gcc'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc gccptgen.a
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/gcc'
make -C java clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/java'
rm -f *.o lex.yy.cc pt_j.tab* pt_j.y head.cc javaptgen.a
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/java'
make -C php5 clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/php5'
rm -f *.o lex.yy.cc pt_zend_language_parser.tab* pt_zend_language_parser.y head.cc phpptgen.a
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/php5'
make -C sol clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/sol'
rm -f *.o lex.yy.cc pt_solidity.* head.cc solidityptgen.a
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/sol'
make -C gcc
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/gcc'
./mainc.py c.y
bison -d pt_c.y -o pt_c.tab.cc
pt_c.y: conflicts: 11 shift/reduce
flex -olex.yy.cc c.l
g++ -O3 -I../../include -c -o lex.yy.o lex.yy.cc
g++ -O3 -I../../include -c -o pt_c.tab.o pt_c.tab.cc
pt_c.tab.cc: In function ‘int yyparse()’:
pt_c.tab.cc:13685:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
pt_c.tab.cc:13827:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
g++ -O3 -I../../include -c -o head.o head.cc
ar -csrv gccptgen.a lex.yy.o pt_c.tab.o head.o
a - lex.yy.o
a - pt_c.tab.o
a - head.o
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/gcc'
make -C java
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/java'
./mainj.py j.y
bison -d pt_j.y -o pt_j.tab.cc
pt_j.y: conflicts: 24 shift/reduce, 259 reduce/reduce
flex -olex.yy.cc j.l
g++ -O3 -I../../include -c -o lex.yy.o lex.yy.cc
g++ -O3 -I../../include -c -o pt_j.tab.o pt_j.tab.cc
pt_j.tab.cc: In function ‘int yyparse()’:
pt_j.tab.cc:17408:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
pt_j.tab.cc:17550:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
g++ -O3 -I../../include -c -o head.o head.cc
ar -csrv javaptgen.a lex.yy.o pt_j.tab.o head.o
a - lex.yy.o
a - pt_j.tab.o
a - head.o
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/java'
make -C php5
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/php5'
./mainphp.py zend_language_parser.y
sed -i -e "s/'\"'/'\\\\\"'/" head.cc
bison -d pt_zend_language_parser.y -o pt_zend_language_parser.tab.cc
flex -i -olex.yy.cc zend_language_scanner.l
g++ -O3 -I../../include -c -o lex.yy.o lex.yy.cc
zend_language_scanner.l: In function ‘int yylex(YYSTYPE*)’:
zend_language_scanner.l:906:67: warning: format ‘%s’ expects argument of type ‘char*’, but argument 3 has type ‘int’ [-Wformat]
zend_language_scanner.l:906:67: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat]
lex.yy.cc:4873:57: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘int yy_get_next_buffer()’:
lex.yy.cc:4894:61: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:4962:51: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:4975:3: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:4975:3: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:5005:68: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘void yyunput(int, char*)’:
lex.yy.cc:5102:54: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘yy_buffer_state* yy_create_buffer(FILE*, int)’:
lex.yy.cc:5261:65: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:5270:65: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘void yyensure_buffer_stack()’:
lex.yy.cc:5427:71: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:5447:71: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘yy_buffer_state* yy_scan_buffer(char*, yy_size_t)’:
lex.yy.cc:5473:63: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘yy_buffer_state* yy_scan_bytes(const char*, int)’:
lex.yy.cc:5522:62: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:5531:51: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘void yy_push_state(int)’:
lex.yy.cc:5557:68: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘void yy_pop_state()’:
lex.yy.cc:5568:53: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
g++ -O3 -I../../include -c -o pt_zend_language_parser.tab.o pt_zend_language_parser.tab.cc
pt_zend_language_parser.tab.cc: In function ‘int yyparse()’:
pt_zend_language_parser.tab.cc:11522:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
pt_zend_language_parser.tab.cc:11664:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
g++ -O3 -I../../include -c -o head.o head.cc
ar -csrv phpptgen.a lex.yy.o pt_zend_language_parser.tab.o head.o
a - lex.yy.o
a - pt_zend_language_parser.tab.o
a - head.o
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/php5'
make -C sol
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/sol'
./mainsol.py solidity.y
bison -d pt_solidity.y -o pt_solidity.tab.cc -v -g
pt_solidity.y:255.1-11: invalid directive: `%precedence'
pt_solidity.y:254.8-10: %type redeclaration for UFIXED
pt_solidity.y:231.62-67: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for FIXED
pt_solidity.y:231.56-60: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for BYTE
pt_solidity.y:231.51-54: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for BYTES
pt_solidity.y:231.45-49: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for UINT
pt_solidity.y:231.40-43: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for INT
pt_solidity.y:231.36-38: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for VAR
pt_solidity.y:231.32-34: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for STRING
pt_solidity.y:231.25-30: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for BOOL
pt_solidity.y:231.20-23: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for ADDRESS
pt_solidity.y:231.12-18: previous declaration
pt_solidity.y:270.1-11: invalid directive: `%precedence'
pt_solidity.y:269.8-10: %type redeclaration for DELETE
pt_solidity.y:233.39-44: previous declaration
pt_solidity.y:269.8-10: %type redeclaration for AFTER
pt_solidity.y:233.33-37: previous declaration
pt_solidity.y:273.1-11: invalid directive: `%precedence'
make[1]: *** [pt_solidity.tab.cc] Error 1
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/sol'
make: *** [TARGET] Error 2
Error: ptgen make failed. Exit.
Error: ptgen make failed. Deckard build fails.
Only PHP 5 is supported at the moment.
Line 52 of scripts/bugfiltering:
filterpath = os.environ.get("DECKARD_DIR")
Bash crashes, stating that it cannot find the Deckard path.
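
`os.environ.get("DECKARD_DIR")` returns None when the variable is absent from the process environment, and a shell variable that is set but not exported is not passed on to child processes such as a Python script. Whether that explains this report is an assumption. The sketch below shows the difference; the function name `deckard_dir_exported` and the path are hypothetical.

```bash
# Return 0 if a child process would see a non-empty DECKARD_DIR.
# A child only inherits variables that have been exported.
deckard_dir_exported() {
    sh -c '[ -n "${DECKARD_DIR:-}" ]'
}

DECKARD_DIR="/path/to/Deckard"   # hypothetical location of the checkout
export DECKARD_DIR               # without this line the child sees nothing

if deckard_dir_exported; then
    echo "DECKARD_DIR is visible to child processes"
else
    echo "DECKARD_DIR is not exported"
fi
```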
Hi
I am trying to compile Deckard on a Linux system, but it stops because it tries to find "dot2d".
Is this some kind of third-party library I should add? If so, where should it be placed?
Here is a part of the log:
Everything cool above here:
In braces I translated the error from German to English.
Cheers and Thanks
I noticed that the Deckard 2 config parameters for TYPE_FILE, RELEVANT_NODEFILE, LEAF_NODEFILE and PARENT_NODEFILE of the sample config point to the nonexistent directory Deckard/testdata.
I assume that they are pretty important, as the detection outputs a lot of garbage if they are not changed.
What is supposed to be in these files? I assume this is about the node types for the ASTs, but I can't figure out how to specify them.
I'm using Java and want to run Deckard on BigCloneEval. The clones should have method level granularity.
It is especially important that I can configure Deckard to prune irrelevant NODE types early, as I want to run a performance analysis and comparison, and it doesn't feel fair to run Deckard on a lot more ASTs than necessary.
I've got a problem when running clone detection on my C code. The feedback is:
"Error: problem in vec generator step. Stop and check logs in times/"
Could you tell me what might be the problem? Thanks a lot.
Hi,
I am trying to detect clones from a slice, how can I use Deckard to detect clones from a slice?
Thanks!
I have run Deckard on the code of about 30 Java projects. The resulting cluster_vdb_50_0_allg_0.95_30 is not empty, but the corresponding post_cluster_vdb_50_0_allg_0.95_30 file is empty. Why does this happen? Is it because there are too many suspicious clones in the cluster file, so that post-processing excludes all of them and leaves the post_cluster file empty?
I cannot build on either Mac OS (Mojave 10.14.5) or Linux (Ubuntu 18.04.2 LTS).
I run $ sh build.sh
in src/main/
rm -f *.pyc
make -C simple clean
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc c_ptgen
make -C gcc clean
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc gccptgen.a
make -C java clean
rm -f *.o lex.yy.cc pt_j.tab* pt_j.y head.cc javaptgen.a
make -C php5 clean
rm -f *.o lex.yy.cc pt_zend_language_parser.tab* pt_zend_language_parser.y head.cc phpptgen.a
make -C sol clean
rm -f *.o lex.yy.cc pt_solidity.* head.cc solidityptgen.a
make -C gcc
./mainc.py c.y
Traceback (most recent call last):
File "./mainc.py", line 43, in <module>
import YaccParser,YaccLexer
File "../YaccParser.py", line 77
except antlr.RecognitionException, ex:
^
SyntaxError: invalid syntax
make[1]: *** [pt_c.y] Error 1
make: *** [TARGET] Error 2
Error: ptgen make failed. Exit.
Error: ptgen make failed. Deckard build fails.
rm -f *.pyc
make -C simple clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/simple'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc c_ptgen
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/simple'
make -C gcc clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/gcc'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc gccptgen.a
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/gcc'
make -C java clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/java'
rm -f *.o lex.yy.cc pt_j.tab* pt_j.y head.cc javaptgen.a
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/java'
make -C php5 clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/php5'
rm -f *.o lex.yy.cc pt_zend_language_parser.tab* pt_zend_language_parser.y head.cc phpptgen.a
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/php5'
make -C sol clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/sol'
rm -f *.o lex.yy.cc pt_solidity.* head.cc solidityptgen.a
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/sol'
make -C gcc
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/gcc'
./mainc.py c.y
bison -d pt_c.y -o pt_c.tab.cc
make[1]: bison: Command not found
Makefile:59: recipe for target 'pt_c.tab.cc' failed
make[1]: *** [pt_c.tab.cc] Error 127
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/gcc'
Makefile:35: recipe for target 'TARGET' failed
make: *** [TARGET] Error 2
Error: ptgen make failed. Exit.
Error: ptgen make failed. Deckard build fails.
Please help me.
The build fails. It seems to be related to the Solidity parser.
./mainsol.py solidity.y
bison -d pt_solidity.y -o pt_solidity.tab.cc -v -g
pt_solidity.y:213.9-15: syntax error, unexpected identifier, expecting string
make[1]: *** [pt_solidity.tab.cc] Error 1
make: *** [TARGET] Error 2
I receive this error message when running on sample code in /Deckard/samples/src
DECKARD--A Tree-Based Code Clone Detection Toolkit.
==== Configuration checking...Done.
==== Start clone detection ====
Vector generation.../home/shijing/ra/codeReuse/Deckard/src/main/jvecgen *.java
vgen: 30 2 ...Done. Log: times/vgen_30_2
...deleting intermediate vector files...Done
vgen: 30 0 ...Done. Log: times/vgen_30_0
...deleting intermediate vector files...Done
vgen: 50 2 ...Done. Log: times/vgen_50_2
...deleting intermediate vector files...Done
vgen: 50 0 ...Done. Log: times/vgen_50_0
...deleting intermediate vector files...Done
Error: problem in vec generator step. Stop and check logs in times/
Has anyone encountered a similar situation?
Hi and thanks for the tool!
I set up a config file to test my C project, following the sample in scripts/clonedetect, but I get this error after running ./deckard.sh:
==== Configuration checking...Error: missing file ~/Deckard-rel2.0solidity/scripts/clonedetect/src/main/cvecgen. Check your config
Any suggestions?
Thanks in advance
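
The missing path ends in scripts/clonedetect/src/main/cvecgen, which looks like the sample config's "$DECKARD_DIR/src" plus "/main/cvecgen" with DECKARD_DIR pointing at scripts/clonedetect rather than at the top of the checkout. That reading is an assumption based on the sample config quoted earlier on this page. The check below is a sketch with a hypothetical helper `check_vgen`; it also uses $HOME because "~" is not expanded inside double quotes.

```bash
# Return 0 if <deckard_dir>/src/main/<tool> exists and is executable,
# which is where the sample config expects the vector generators.
check_vgen() {
    deckard_dir=$1 tool=$2
    [ -x "$deckard_dir/src/main/$tool" ]
}

# Assumed checkout location; point this at the repository root.
DECKARD_DIR="$HOME/Deckard-rel2.0solidity"

if check_vgen "$DECKARD_DIR" cvecgen; then
    echo "found $DECKARD_DIR/src/main/cvecgen"
else
    echo "missing $DECKARD_DIR/src/main/cvecgen: check DECKARD_DIR and whether build.sh has been run"
fi
```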
Hi,
Thank you for your great tool.
I am currently using Deckard for my research. However, when I run it multiple times with the same set of hyperparameters on the same dataset, I get different results. This affects the reproducibility of my research. Is there any way to set a seed?
Kind regards.
When I run the bugfiltering command, the results showed "Command line options for filters IDs not implemented" and "Cannot open file : src/AbstractAsyncTableRendering.java". The command I used is "scripts/bugdetect/bugfiltering samples/clusters/post_cluster_vdb_50_0_allg_0.95_30 java > bug_result". Do you have any idea about how to solve the problem? Thanks.
Hi,
I ran Deckard to detect clones on a dataset of 47k source files. However, after a day of execution I encountered an error. Below you can find the content of the different log files.
Clustering 'vectors/vdb_50_4_g9_2.50998_30_100000' 6.513064 ...
/home/local/SAIL/amir/tasks/RQ2/RQ2.2/Deckard/src/lsh/bin/enumBuckets -R 6.513064 -M 7600000000 -b 2 -A -f vectors/vdb_50_4_g9_2.50998_30_100000 -c -p vectors/vdb_50_4_g9_2.50998_30_100000.param > clusters/cluster_vdb_50_4_g9_2.50998_30_100000
Warning: output all clones. Takes more time...
Warning: will compute parameters
Error: the structure supports at most 2097151 points (3238525 were specified).
real 2m58.162s
user 2m50.464s
sys 0m7.492s
cluster: Possible errors occurred with LSH. Check log: times/cluster_vdb_50_4_g9_2.50998_30_100000
paramsetting: 50 4 0.79 ...Looking for optimal parameters by Clustering 'vectors/vdb_50_4_g9_2.50998_30_100000' 6.513064 ...
/home/local/SAIL/amir/tasks/RQ2/RQ2.2/Deckard/src/lsh/bin/enumBuckets -R 6.513064 -M 7600000000 -b 2 -A -f vectors/vdb_50_4_g9_2.50998_30_100000 -c -p vectors/vdb_50_4_g9_2.50998_30_100000.param > clusters/cluster_vdb_50_4_g9_2.50998_30_100000
cluster: Possible errors occurred with LSH. Check log: times/cluster_vdb_50_4_g9_2.50998_30_100000
Error: paramsetting failure...exit.
grouping: vectors/vdb_50_4 with distance=2.50998...Total 7602630 vectors read in; 11282415 vectors dispatched into 57 ranges (actual groups may be many fewer).
real 410m12.610s
user 6m43.592s
sys 26m6.544s
Done grouping 50 4 2.50998. See groups in vectors/vdb_50_4_g[0-9]_2.50998_30
Note that I have sufficient memory for the run, so I added two more conditions to the memory limit setting in both the vecquery and vertical-param-batch files. I raised the limit because my vector file is larger than 2G and enough memory is available. The conditions now look like this:
# dumb (not flexible) memory limit setting
mem=`wc "$vdb" | awk '{printf("%.0f", $3/1024/1024+0.5)}'`
if [ $mem -lt 2 ]; then
    mem=10000000
elif [ $mem -lt 5 ]; then
    mem=20000000
elif [ $mem -lt 10 ]; then
    mem=30000000
elif [ $mem -lt 20 ]; then
    mem=60000000
elif [ $mem -lt 50 ]; then
    mem=150000000
elif [ $mem -lt 100 ]; then
    mem=300000000
elif [ $mem -lt 200 ]; then
    mem=600000000
elif [ $mem -lt 500 ]; then
    mem=900000000
elif [ $mem -lt 1024 ]; then
    mem=1900000000
elif [ $mem -lt 2048 ]; then
    mem=3800000000
elif [ $mem -lt 4096 ]; then # this condition is added by me
    mem=7600000000
elif [ $mem -lt 8192 ]; then # this condition is added by me
    mem=15200000000
else
    echo "Error: Size of $vdb > 8G. I don't want to do it before you think of any optimization." | tee -a "$TIME_DIR/cluster_${vfile}"
    exit 1;
fi
The Deckard parameters are set to the following values:
I have attached the log files. Please help me mitigate this problem; I need your tool for my experiments.
deckard log.zip
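One observation on the log above: the LSH error is about the number of points (at most 2097151, which equals 2^21 - 1, versus 3238525 specified), not about memory, so raising the -M value by itself may not make it go away. Below is a rough shell sketch for spotting oversized group files before clustering. It assumes each vector occupies one line that does not start with '#'. That layout and the function names are assumptions for illustration and should be checked against the actual files under vectors/.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check (not part of Deckard): compare the number of
# vectors in a group file with the point limit reported by the LSH step.
# Assumption: one vector per non-empty line that does not start with '#'.

LSH_MAX_POINTS=2097151   # value copied from the error message (2^21 - 1)

count_vectors() {
    # Count the lines that are neither comments nor blank.
    grep -cv -e '^#' -e '^[[:space:]]*$' "$1"
}

fits_lsh_limit() {
    # Succeed if the file holds no more vectors than the limit allows.
    local n
    n=$(count_vectors "$1")
    [ "$n" -le "$LSH_MAX_POINTS" ]
}

# Example (file name taken from the log above):
# fits_lsh_limit vectors/vdb_50_4_g9_2.50998_30_100000 || echo "too many points"
```

If a group file fails the check, splitting the input or choosing parameters that yield smaller groups may be more promising than adding further memory tiers.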
Hi, on a current Mac OS system the build fails because malloc.h is not where the build expects it to be. I worked around it with a symbolic link, but that is only a workaround; it would be better to fix this in the build script.
I recently tried Deckard to find bugs in cloned code, but it does not work well for the following Java file. I set
MIN_TOKENS='15' STRIDE='2' SIMILARITY='0.8'.
According to the FSE '07 paper, it should be able to find the bug.
The buggy lines are:
cmp = lhsType.compareTo(lhsType);
if (cmp != 0)
return cmp;
They look similar to the code several lines earlier:
cmp = lhsName.compareTo(rhsName);
if (cmp != 0)
return cmp;
Can anyone help me?
public class VersionInsensitiveBugComparator implements WarningComparator {
private ClassNameRewriter classNameRewriter = IdentityClassNameRewriter.instance();
private boolean exactBugPatternMatch = true;
private boolean comparePriorities = false;
public VersionInsensitiveBugComparator() {
}
public void setClassNameRewriter(ClassNameRewriter classNameRewriter) {
this.classNameRewriter = classNameRewriter;
}
public void setComparePriorities(boolean b) {
comparePriorities = b;
}
/**
* Wrapper for BugAnnotation iterators, which filters out
* annotations we don't care about.
*/
private class FilteringAnnotationIterator implements Iterator<BugAnnotation> {
private Iterator<BugAnnotation> iter;
private BugAnnotation next;
public FilteringAnnotationIterator(Iterator<BugAnnotation> iter) {
this.iter = iter;
this.next = null;
}
public boolean hasNext() {
findNext();
return next != null;
}
public BugAnnotation next() {
findNext();
if (next == null)
throw new NoSuchElementException();
BugAnnotation result = next;
next = null;
return result;
}
public void remove() {
throw new UnsupportedOperationException();
}
private void findNext() {
while (next == null) {
if (!iter.hasNext())
break;
BugAnnotation candidate = iter.next();
if (!isBoring(candidate)) {
next = candidate;
break;
}
}
}
}
private boolean isBoring(BugAnnotation annotation) {
return !annotation.isSignificant();
}
private static int compareNullElements(Object a, Object b) {
if (a != null)
return 1;
else if (b != null)
return -1;
else
return 0;
}
private static String getCode(String pattern) {
int sep = pattern.indexOf('_');
if (sep < 0)
return "";
return pattern.substring(0, sep);
}
public int compare(BugInstance lhs, BugInstance rhs) {
// Attributes of BugInstance.
// Compare abbreviation
// Compare class and method annotations (ignoring line numbers).
// Compare field annotations.
int cmp;
BugPattern lhsPattern = lhs.getBugPattern();
BugPattern rhsPattern = rhs.getBugPattern();
if (lhsPattern == null || rhsPattern == null) {
// One of the patterns is missing.
// However, we can still accurately match by abbrev (usually) by comparing
// the part of the type before the first '_' character.
// This is almost always equivalent to the abbrev.
String lhsCode = getCode(lhs.getType());
String rhsCode = getCode(rhs.getType());
if ((cmp = lhsCode.compareTo(rhsCode)) != 0) {
return cmp;
}
} else {
// Compare by abbrev instead of type. The specific bug type can change
// (e.g., "definitely null" to "null on simple path"). Also, we often
// change bug pattern types from one version of FindBugs to the next.
//
// Source line and field name are still matched precisely, so this shouldn't
// cause loss of precision.
if ((cmp = lhsPattern.getAbbrev().compareTo(rhsPattern.getAbbrev())) != 0)
return cmp;
if (isExactBugPatternMatch() && (cmp = lhsPattern.getType().compareTo(rhsPattern.getType())) != 0)
return cmp;
}
if (comparePriorities) {
cmp = lhs.getPriority() - rhs.getPriority();
if (cmp != 0) return cmp;
}
Iterator<BugAnnotation> lhsIter = new FilteringAnnotationIterator(lhs.annotationIterator());
Iterator<BugAnnotation> rhsIter = new FilteringAnnotationIterator(rhs.annotationIterator());
while (lhsIter.hasNext() && rhsIter.hasNext()) {
BugAnnotation lhsAnnotation = lhsIter.next();
BugAnnotation rhsAnnotation = rhsIter.next();
// Different annotation types obviously cannot be equal,
// so just compare by class name.
if (lhsAnnotation.getClass() != rhsAnnotation.getClass())
return lhsAnnotation.getClass().getName().compareTo(rhsAnnotation.getClass().getName());
if (lhsAnnotation.getClass() == ClassAnnotation.class) {
// ClassAnnotations should have their class names rewritten to
// handle moved and renamed classes.
String lhsClassName = classNameRewriter.rewriteClassName(
((ClassAnnotation)lhsAnnotation).getClassName());
String rhsClassName = classNameRewriter.rewriteClassName(
((ClassAnnotation)rhsAnnotation).getClassName());
cmp = lhsClassName.compareTo(rhsClassName);
if (cmp != 0)
return cmp;
} else if(lhsAnnotation.getClass() == MethodAnnotation.class ) {
// Rewrite class names in MethodAnnotations
MethodAnnotation lhsMethod = ClassNameRewriterUtil.convertMethodAnnotation(
classNameRewriter, (MethodAnnotation) lhsAnnotation);
MethodAnnotation rhsMethod = ClassNameRewriterUtil.convertMethodAnnotation(
classNameRewriter, (MethodAnnotation) rhsAnnotation);
cmp = lhsMethod.compareTo(rhsMethod);
if (cmp != 0)
return cmp;
} else if(lhsAnnotation.getClass() == FieldAnnotation.class) {
// Rewrite class names in FieldAnnotations
FieldAnnotation lhsField = ClassNameRewriterUtil.convertFieldAnnotation(
classNameRewriter, (FieldAnnotation) lhsAnnotation);
FieldAnnotation rhsField = ClassNameRewriterUtil.convertFieldAnnotation(
classNameRewriter, (FieldAnnotation) rhsAnnotation);
cmp = lhsField.compareTo(rhsField);
if (cmp != 0)
return cmp;
} else if(lhsAnnotation.getClass() == StringAnnotation.class) {
// Rewrite class names in FieldAnnotations
String lhsString = ((StringAnnotation)lhsAnnotation).getValue();
String rhsString = ((StringAnnotation)rhsAnnotation).getValue();
cmp = lhsString.compareTo(rhsString);
if (cmp != 0)
return cmp;
} else if(lhsAnnotation.getClass() == LocalVariableAnnotation.class) {
// Rewrite class names in FieldAnnotations
String lhsName = ((LocalVariableAnnotation)lhsAnnotation).getName();
String rhsName = ((LocalVariableAnnotation)rhsAnnotation).getName();
if (lhsName.equals("?") && rhsName.equals("?"))
continue;
cmp = lhsName.compareTo(rhsName);
if (cmp != 0)
return cmp;
} else if(lhsAnnotation.getClass() == TypeAnnotation.class) {
// Rewrite class names in FieldAnnotations
String lhsType = ((TypeAnnotation)lhsAnnotation).getTypeDescriptor();
String rhsType = ((TypeAnnotation)rhsAnnotation).getTypeDescriptor();
lhsType = ClassNameRewriterUtil.rewriteSignature(classNameRewriter, lhsType);
rhsType = ClassNameRewriterUtil.rewriteSignature(classNameRewriter, rhsType);
cmp = lhsType.compareTo(lhsType);
if (cmp != 0)
return cmp;
} else if(lhsAnnotation.getClass() == IntAnnotation.class) {
// Rewrite class names in FieldAnnotations
int lhsValue = ((IntAnnotation)lhsAnnotation).getValue();
int rhsValue = ((IntAnnotation)rhsAnnotation).getValue();
cmp = lhsValue - rhsValue;
if (cmp != 0)
return cmp;
} else if (isBoring(lhsAnnotation)) {
throw new IllegalStateException("Impossible");
} else
throw new IllegalStateException("Unknown annotation type: " + lhsAnnotation.getClass().getName());
}
if (rhsIter.hasNext())
return -1;
else if (lhsIter.hasNext())
return 1;
else
return 0;
}
/**
* @param exactBugPatternMatch The exactBugPatternMatch to set.
*/
public void setExactBugPatternMatch(boolean exactBugPatternMatch) {
this.exactBugPatternMatch = exactBugPatternMatch;
}
/**
* @return Returns the exactBugPatternMatch.
*/
public boolean isExactBugPatternMatch() {
return exactBugPatternMatch;
}
}
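Independent of Deckard's clone-based detection, the specific defect in the file above (lhsType compared with itself, so cmp is always 0 in that branch) can be spotted with a plain text search. The sketch below is only a sanity check, not a replacement for the bug filter. find_self_compare is a made-up name, and the pattern relies on backreferences in grep -E, which GNU grep supports as an extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "line-number:text" for every line in which
# an identifier is compared with itself, e.g. x.compareTo(x).
find_self_compare() {
    grep -nE '(^|[^A-Za-z0-9_.])([A-Za-z_][A-Za-z0-9_]*)\.compareTo\(\2\)' "$@"
}

# Example (file name taken from the class above):
# find_self_compare VersionInsensitiveBugComparator.java
```

The search only catches the literal x.compareTo(x) shape. It says nothing about whether Deckard's clustering with the settings above places the two fragments in the same clone group.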
The error starts at the line
"make[1]: execvp: ./mainc.py: Permission denied"
and ends at
"make: *** No rule to make target `../ptgen/gcc/gccptgen.a', needed by `cvecgen'. Stop."
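The first message says that make could not execute ./mainc.py, which usually means the file lacks the executable bit (for example after unpacking an archive that did not preserve permissions). Below is a minimal shell sketch, assuming the commands are run from the top of the Deckard checkout. make_scripts_executable is a made-up helper name, and the src directory in the usage lines is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the executable bit to every regular file with
# the given name below the given directory.
make_scripts_executable() {
    local dir="$1" name="$2"
    find "$dir" -type f -name "$name" -exec chmod +x {} +
}

# Example: fix mainc.py wherever it lives under src/, then rebuild.
# make_scripts_executable src mainc.py
# (cd src/main && ./build.sh)
```

The second message, about gccptgen.a, is probably a consequence of the first one: the library cannot be built if the generator script never runs.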
These items need to be installed before 'build.sh' is executed...
(sudo apt-get install )
Bison
Flex
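Below is a small shell sketch for checking these prerequisites before running build.sh. It only tests whether the bison and flex commands are on PATH. missing_tools is a made-up name, and the apt-get line in the comment assumes a Debian or Ubuntu system where the packages are called bison and flex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of each command that is not on PATH.
missing_tools() {
    local t
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
    done
}

# Example:
#   missing=$(missing_tools bison flex)
#   [ -z "$missing" ] || echo "install first: $missing"
#   # on Debian/Ubuntu: sudo apt-get install bison flex
```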
Most of the Yacc parser (and maybe other portions) was written in Python 2. Since Python 2 reached end of life in 2020, we should update the codebase to use Python 3.
When I run the command "cvecgen -i ../../src/dircolors.c -o tmp.vec --start-line-number 508 --end-line-number 508", the output is "cvecgen: tree-accessor.C:81: static TreeVector* TreeAccessor::get_node_vector(Tree*): Assertion `attr_itr!=t->attributes.end()' failed."
Please refer to the attachment for the file dircolors.c.