pysais's People

Contributors

alexeyg, hynekcer, whym

pysais's Issues

Segmentation fault in lcp for the terminal symbol alone

I'm testing the package on random data:

>>> sequence = '$'
>>> sa = pysais.sais(sequence)
>>> lcp, lcp_lm, lcp_mr = pysais.lcp(sequence, sa)
Segmentation fault (core dumped)

(I have also occasionally seen segmentation faults on normal data when I omitted the terminal symbol, but I understand that it is strictly required.)
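For cross-checking small inputs such as the single-character case, a naive pure-Python reference can be useful. The sketch below does not call pysais. The function names are hypothetical, and the LCP convention (lcp[0] = 0, lcp[i] compares the suffixes at ranks i-1 and i) is an assumption that may differ from what pysais.lcp returns.

```python
def naive_suffix_array(text):
    """Return suffix start positions in lexicographic order.

    Quadratic-ish reference implementation, only meant for small test inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def naive_lcp(text, sa):
    """Return lcp where lcp[i] is the common-prefix length of the suffixes
    at sa[i-1] and sa[i], and lcp[0] is 0 (assumed convention)."""
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b = sa[i - 1], sa[i]
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        lcp[i] = k
    return lcp


if __name__ == "__main__":
    for sample in ("$", "banana$"):
        sa = naive_suffix_array(sample)
        print(sample, sa, naive_lcp(sample, sa))
```

The single-character input '$' yields one suffix and one LCP entry here, which is the result a non-crashing implementation would be expected to approximate under this convention.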

pysais.lcp crashes python silently

I'm using python 3.6.6

Calling pysais.lcp(sequence, sa) crashes silently: it seems to take down Python entirely. Whether I run a script that calls lcp or use interactive mode, Python closes altogether. The following code triggered it on my end:

import pysais
sequence = "aaabbbcccdddaaacccbbbddd"

sa = pysais.sais(sequence)
o = pysais.lcp(sequence, sa)
print(o)
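When a C extension takes down the interpreter, running the suspect code in a child process keeps the parent alive and exposes the exit status. This is a minimal sketch using only the standard library; run_isolated is a hypothetical helper name, and the snippet passed to it in the demo is a stand-in rather than a pysais call.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a Python snippet in a separate interpreter.

    Returns (returncode, stdout, stderr). On POSIX a negative return code
    means the child was killed by a signal (for example -11 for SIGSEGV).
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Stand-in snippet; replace with the failing script to observe its exit status.
    print(run_isolated("print(1 + 1)"))
```

A nonzero return code with empty output would be consistent with the silent crash described above.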

bisect - the first character can not be found

I'm trying to understand why the output of LCP is so complicated, and I found a bug in bisect:

>>> sequence = 'abc'
>>> sa = pysais.sais(sequence)
>>> lcp, lcp_lm, lcp_mr = pysais.lcp(sequence, sa)
>>> pysais.bisect(sequence, 'c', sa, lcp_lm, lcp_mr)
(2, True)   # OK
>>> pysais.bisect(sequence, 'b', sa, lcp_lm, lcp_mr)
(1, True)   # OK
>>> pysais.bisect(sequence, 'a', sa, lcp_lm, lcp_mr)
(1, True)   # BUG
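For comparison, here is a plain binary search over the suffix array that does not use the LCP tables. It is a sketch, not the pysais implementation: reference_bisect is a hypothetical name, and the (index, found) return shape is inferred from the output shown above.

```python
def reference_bisect(text, pattern, sa):
    """Return (index, found): index is the leftmost rank in sa whose suffix
    is >= pattern, found tells whether that suffix starts with pattern."""
    lo, hi = 0, len(sa)
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        # Compare only the first m characters of the suffix with the pattern.
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sa) and text.startswith(pattern, sa[lo])
    return lo, found


if __name__ == "__main__":
    text = "abc"
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    for ch in "cba":
        print(ch, reference_bisect(text, ch, sa))
```

Under these assumptions, searching 'abc' for 'a' gives rank 0 rather than 1.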

Add a license

It would be nice if you could release this under a clear license. That would make it easier for people to use it as part of a larger piece of software, and it could encourage contributions, too.

Would you like to choose the MIT license (as Yuta Mori did for sais.c and sais.h)?

lcp_int returns wrong result

lcp_int returned an array with no useful data (constant zeros).
I had a look into the code and found a suspicious line:

PySAIS/pysais.c

Line 290 in bc27f42

T = pyvector_to_Carrayptrs(SA_np);

This looks obviously wrong. Correcting it to
T = pyvector_to_Carrayptrs(T_np);

gives me better results.

Now I am wondering: am I the first one to even use this function? Might there be other glitches that no one has found yet due to a lack of testing? Is this project still alive? It would be a pity if not, because I think it is really cool!

Best regards and thanks, Andreas

returned array has wrong length

Python: python 3.5
pysais: master

from sklearn.datasets import fetch_20newsgroups
import pysais
s = '$'.join(fetch_20newsgroups().data)
sa = pysais.sais(s)
print(len(s))
print(len(sa))

expected output:

22065807
22065807

actual output:

22065807
22065930
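One unverified hypothesis for a mismatch like this is that the text contains non-ASCII characters and the extension works on an encoded byte buffer, so lengths are counted in bytes rather than code points. The issue does not confirm this. The sketch below only shows how to compare the two counts for a string; length_report is a hypothetical helper and UTF-8 is an assumed encoding.

```python
def length_report(text):
    """Compare the code-point length of a string with its UTF-8 byte length."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "non_ascii": sum(1 for ch in text if ord(ch) > 127),
    }


if __name__ == "__main__":
    # Small stand-in string; the real check would be run on the joined corpus.
    print(length_report("na\u00efve$"))
```

If the byte count for the joined corpus matched the observed suffix array length, that would support the hypothesis; if not, the cause lies elsewhere.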

Extending to a faster algorithm

This issue is to let you know that I've reused your strategy for wrapping Yuta Mori's C code and applied it to an enhanced version found here: https://github.com/kurpicz/sais-lite-lcp . This enhancement computes the SA and LCP arrays simultaneously, and I have yet to find a segfault. According to Wikipedia, it was also the most efficient algorithm as of 2012.

My fork is here: https://github.com/fcharras/pyfischer

I've only ported the functions that meet my use cases, but the other functions could be ported too.
