felipelouza / egsa
Generalized enhanced suffix array construction in external memory [CPM'13, AMB 2017]
Home Page: https://doi.org/10.1186/s13015-017-0117-9
License: GNU General Public License v3.0
Hi @felipelouza ,
I would like to try out this package, so I ran:
git clone https://github.com/felipelouza/egsa.git
cd egsa
make
./egsa dataset/input-100.txt 2
And, I get this error:
SIGMA = 255
DIR = dataset/
INPUT = input-100.txt
K = 2
MEMLIMIT = 2048.00 MB
CHECK = 0
COMPUTE_BWT = 0
WORKSPACE = 13.n bytes
malloc_count ### free(0x7fcd04fffff0) has no sentinel !!! memory corruption?
egsa(5619,0x7fffa48d5380) malloc: *** error for object 0x7fcd04fffff0: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
What am I doing wrong?
I am using Mac OS X version 10.13.6.
Best,
With an input file containing testatestb\0blablabla,
I get a segfault (tried with K = 0, 1, and 2):
DIR = ./
INPUT = lalouli.txt
K = 0
MEMLIMIT = 2048.00 MB
CHECK = 0
COMPUTE_BWT = 0
WORKSPACE = 13.n bytes
### PREPROCESSING ###
K = 0
PARTITIONS = 1
TOTAL = 20 bytes 0.00 MB
CLOCK = 0.000658 TIME = 0.000000
0.000658 0.000000
### PHASE 1 ###
CLOCK = 0.000184 TIME = 0.000000
0.000184 0.000000
### PHASE 2 ###
INDUCING:
alfa TOTAL INDUCED %:
ALL) 19 0 0.00
Segmentation fault (core dumped)
The output file is created, but it is empty.
Hi!
This is an excellent library, thanks for posting!
I am wondering whether there is a way to get the actual text of the maximum-length LCP. When I run the code I can see the maximum length, but not the substring itself.
I am afraid I do not have much experience with advanced C/C++, so any help is greatly appreciated! Thanks!
Hi @felipelouza ,
I have finally run the library on a Linux machine :)
I am not sure I am interpreting the normal output of this library correctly, because I get a larger LCS size with k=50 than with k=5. What does "size" mean in the output?
k=5
ubuntu@ip-172-31-32-99:~/egsa/egsa$ ./egsa dataset/input-100.txt 5
SIGMA = 255
DIR = dataset/
INPUT = input-100.txt
K = 5
MEMLIMIT = 2048.00 MB
CHECK = 0
COMPUTE_BWT = 0
WORKSPACE = 13.n bytes
### PREPROCESSING ###
K = 5
PARTITIONS = 1
TOTAL = 286 bytes 0.00 MB
CLOCK = 0.000272 TIME = 0.000000
0.000272 0.000000
### PHASE 1 ###
CLOCK = 0.000125 TIME = 0.000000
0.000125 0.000000
### PHASE 2 ###
INDUCING:
alfa TOTAL INDUCED %:
ALL) 285 98 34.39
CLOCK = 0.004332 TIME = 0.000000
0.004332 0.000000
### TOTAL ###
CLOCK = 0.004495 TIME = 0.000000
0.004495 0.000000
milisecond per byte = 0.000000000
0.000000000
size = 285
malloc_count ### exiting, total: 1,158,870,124, peak: 1,158,641,041, current: 1,033
k=50
ubuntu@ip-172-31-32-99:~/egsa/egsa$ ./egsa dataset/input-100.txt 50
SIGMA = 255
DIR = dataset/
INPUT = input-100.txt
K = 50
MEMLIMIT = 2048.00 MB
CHECK = 0
COMPUTE_BWT = 0
WORKSPACE = 13.n bytes
### PREPROCESSING ###
K = 50
PARTITIONS = 1
TOTAL = 2848 bytes 0.00 MB
CLOCK = 0.000360 TIME = 0.000000
0.000360 0.000000
### PHASE 1 ###
CLOCK = 0.000612 TIME = 0.000000
0.000612 0.000000
### PHASE 2 ###
INDUCING:
alfa TOTAL INDUCED %:
ALL) 2847 1403 49.28
CLOCK = 0.005419 TIME = 0.000000
0.005419 0.000000
### TOTAL ###
CLOCK = 0.006064 TIME = 0.000000
0.006064 0.000000
milisecond per byte = 0.000000000
0.000000000
size = 2847
malloc_count ### exiting, total: 1,159,007,790, peak: 1,158,692,569, current: 1,033
My problem is finding the k-LCS of n strings (n>=k and 2<=k<=n). So the LCS value for k=5 should be >= the value for k=50.
I'd like to play with this library, but I'm not familiar with C. An example of how to export the results to a text format would be very helpful, so that it can be easily interfaced with anything.
Hi, what is the format of the output binary file, and how can it be read in C/C++?
Also, I am curious how you handle the read index for the generalized suffix array. When there are many reads, the read index may cost a lot of memory.