Comments (4)
The "ERROR: more than one doc found" message is a problem. It must be happening when SourcererCC looks up the tokens of a document in the forward index, using the document id as the query. We should never get more than one doc back, because each document's id should be unique. My guess is that either we assigned the same id to more than one document in the parsing stage, or we indexed one document twice.
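One way to test the duplicate-id guess is to scan the parsed input before indexing. The sketch below is not part of SourcererCC. It assumes each input line starts with the document id followed by the separator `@#@`, and the helper name `find_duplicate_ids` and the sample lines are made up for illustration. Adjust the separator if your files differ.

```python
from collections import Counter

# Assumption: the id part of each line ends at this separator.
SEPARATOR = "@#@"


def find_duplicate_ids(lines):
    """Return a dict of ids that appear on more than one line, with their counts."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if SEPARATOR not in line:
            continue  # empty or malformed line; skipped here, not counted
        doc_id = line.split(SEPARATOR, 1)[0]
        counts[doc_id] += 1
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Hypothetical sample: the first and third lines share the id "1,10".
sample = [
    "1,10@#@int@@::@@2,x@@::@@1",
    "1,11@#@for@@::@@1",
    "1,10@#@int@@::@@2,x@@::@@1",
]
duplicates = find_duplicate_ids(sample)
print(duplicates)
```

An empty result would suggest the ids are unique in the input, which points toward the "indexed twice" explanation instead.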
About the ArrayIndexOutOfBoundsException, let's keep track of these characters. We should remove them during the tokenizing stage.
from sourcerercc.
I ran into a similar problem. My error message is:
EXCEPTION CAUGHT, invalid line: Bud% @� @ @ @ E%DSDB
@ @ @
index size of GTPM: 66012
Directory: dataset
indexing file : .DS_Store
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at noindex.CloneHelper.deserialise(Unknown Source)
at indexbased.SearchManager.doIndex(Unknown Source)
at indexbased.SearchManager.main(Unknown Source)
I followed every step in the README. What does this error mean? Is my data wrong?
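Please delete the .DS_Store file from the dataset directory. It is a macOS Finder metadata file, not tokenized input, and your log shows the indexer failing right after "indexing file : .DS_Store".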
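To catch stray files like this before indexing, a pre-check along these lines may help. It is only a sketch and not part of SourcererCC: the helper names, the `blocks.file` file name, and the `@#@` separator are assumptions for illustration. It lists the regular, non-hidden files in a dataset directory and reports lines that lack the separator.

```python
import os
import tempfile

# Assumption: every valid input line contains this separator.
SEPARATOR = "@#@"


def list_input_files(directory):
    """Return the names of regular, non-hidden files in directory, sorted."""
    names = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.startswith(".") or not os.path.isfile(path):
            continue  # skips .DS_Store and other dotfiles, plus subdirectories
        names.append(name)
    return names


def find_malformed_lines(path):
    """Return 1-based numbers of non-empty lines that lack the separator."""
    bad = []
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip() and SEPARATOR not in line:
                bad.append(number)
    return bad


# Demo on a throwaway directory with one hidden file and one input file.
with tempfile.TemporaryDirectory() as dataset:
    with open(os.path.join(dataset, ".DS_Store"), "wb") as f:
        f.write(b"\x00\x00\x00\x01Bud1")
    with open(os.path.join(dataset, "blocks.file"), "w", encoding="utf-8") as f:
        f.write("1,10@#@int@@::@@2\nnot a valid line\n")
    files = list_input_files(dataset)
    bad_lines = find_malformed_lines(os.path.join(dataset, "blocks.file"))

print(files, bad_lines)
```

Running it against the real dataset directory before the indexing step should show whether anything other than the expected input files is present.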
I'm hoping this solved the problem. If not, please open another issue.