suffix-trees's Introduction

suffix_trees

Python implementation of Suffix Trees and Generalized Suffix Trees. Also provides methods for typical applications of STrees and GSTrees.

Installation

pip install suffix-trees

Usage

from suffix_trees import STree

# Suffix-Tree example.
st = STree.STree("abcdefghab")
print(st.find("abc")) # 0
print(st.find_all("ab")) # {0, 8}

# Generalized Suffix-Tree example.
a = ["xxxabcxxx", "adsaabc", "ytysabcrew", "qqqabcqw", "aaabc"]
st = STree.STree(a)
print(st.lcs()) # "abc"

suffix-trees's People

Contributors

piperchester, ptrus, zhylkaaa

suffix-trees's Issues

Not Generalized Suffix Trees for multiple strings.

Hi!
I tried using the code for multiple strings, but it fails to give the correct answer. I believe the current implementation is only meant for a single string, unless I made a mistake in the function call.
Pasting my code below for your reference:

text = ["how are you doing?", "how are you?", "good morning", "good morning madam"]
st = STree.STree(text)
st.find("how are you?")

It would be really great if you could add the implementation for multiple strings as well.

Return All Common Substrings In Generalised Suffix Tree?

Hi,

I would like to say, fantastic job first and foremost. I would like to ask if you would be able to implement a function that returned all common substrings in a generalised suffix trie? I'm currently using this package for the detection of patterns in music and this would be very useful. Any help would be really appreciated.

Cheers!
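
For reference while this is open, here is a minimal brute-force sketch of what "all common substrings" could mean. It does not use the suffix tree at all, and the function name `common_substrings` and its `min_length` parameter are hypothetical, not part of this package. It is only practical for short inputs, but it can serve as a baseline to check a tree-based implementation against.

```python
def all_substrings(text, min_length=1):
    """Return the set of every substring of `text` with at least `min_length` characters."""
    return {
        text[start:end]
        for start in range(len(text))
        for end in range(start + min_length, len(text) + 1)
    }


def common_substrings(strings, min_length=1):
    """Return the substrings that occur in every input string.

    Hypothetical helper, not part of suffix_trees. Brute force: it
    materializes every substring of every input, so memory and time
    grow quadratically with the length of each string.
    """
    if not strings:
        return set()
    common = all_substrings(strings[0], min_length)
    for text in strings[1:]:
        common &= all_substrings(text, min_length)
    return common


print(sorted(common_substrings(["xabcx", "yabcy"], min_length=2)))
```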

Efficiency problem

Hi @ptrus ,

Very good python package. Congratulations!

I have the following problem:

I need to use Generalized Suffix Trees (k-lcs problem) with 2000 inputs and very long strings. It works without problems with 5 or 10 rows, but with 2000 it is impossible. What do you recommend?

Regards,

setup.py depends on pypandoc before having a chance to install it.

So we just can't install suffix-trees:

Collecting suffix-trees (from deploy-client==0.0.1)
  Downloading suffix-trees-0.2.4.2.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-793aivvt/suffix-trees/setup.py", line 3, in <module>
        import pypandoc
    ModuleNotFoundError: No module named 'pypandoc'
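
One common way around this kind of failure is to make the import optional inside setup.py. The sketch below assumes the README is converted with `pypandoc.convert_file` and that the file is called README.md; both are assumptions, since the actual setup.py is not shown here, and `read_long_description` is a hypothetical helper name.

```python
def read_long_description(path="README.md"):
    """Return the long description, converted to reST when pypandoc is usable.

    Hypothetical helper for setup.py. If pypandoc (or the pandoc binary
    it wraps) is missing, fall back to the raw file contents so that
    installation does not fail.
    """
    try:
        import pypandoc  # optional dependency, only needed for the conversion
        return pypandoc.convert_file(path, "rst")
    except (ImportError, OSError, RuntimeError):
        with open(path, encoding="utf-8") as handle:
            return handle.read()
```

With this in place, `setup(long_description=read_long_description(), ...)` would no longer require pypandoc at install time.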

STree matches on non-existent matches in edge cases

I came across some spurious matches on my data. I've whittled down the text to reproduce, so here's a minimal example:

from suffix_trees.STree import STree
text = "name language w en url http w namelanguage en url http"
stree = STree(text)

print("STree finds 'law' at index:", stree.find_all('law'))
print("'law' in text -->", 'law' in text)

which outputs

STree finds 'law' at index: [5]
'law' in text --> False
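
A cross-check like the one above can be generalized into a small reference function built on `str.find`, which makes it easy to compare the tree's answers against plain string search for many patterns. This is only a sketch; `naive_find_all` is a hypothetical name and not part of the package.

```python
def naive_find_all(text, pattern):
    """Return the sorted start indexes of every (possibly overlapping) match.

    Hypothetical reference implementation based on str.find, intended
    only for validating suffix tree results on small inputs.
    """
    if not pattern:
        return []
    indexes = []
    start = text.find(pattern)
    while start != -1:
        indexes.append(start)
        # Advance by one character so overlapping matches are also reported.
        start = text.find(pattern, start + 1)
    return indexes


text = "name language w en url http w namelanguage en url http"
print(naive_find_all(text, "law"))       # [] because 'law' never occurs
print(naive_find_all("abcdefghab", "ab"))  # [0, 8]
```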

find_all problem

Hello,
I was looking for a package able to use a suffix tree to search DNA sequences.
I downloaded the latest version of suffix_trees and the find_all function doesn't seem to work. I tried test3.py and got an error. find_all returns no hits even when the substring is in the string. I also tried the code below. Is there something wrong with my code? :-|

By the way, thank you very much for your package ;-)
Loïc
PS: please, let me know if you need more information

Example code run with python3.5

from suffix_trees import STree

# Suffix-Tree example.

st = STree.STree("abcdefghab")
print(st.find("abc")) # 0
print(st.find_all("ab")) # [0, 8] ---> [] :-(

Any plan to support frequent pattern mining?

Hi,

Thank you for sharing this great work! I was wondering if you have any plan to support frequent pattern mining which is a quite common use case in substring pattern mining.

Many thanks!
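
To make the request concrete, here is a rough sketch of one simple form of frequent pattern mining: counting every substring of a fixed length and keeping those that occur at least a given number of times. It uses `collections.Counter` rather than the suffix tree, and `frequent_substrings` with its parameters is a hypothetical name, not an existing function of this package.

```python
from collections import Counter


def frequent_substrings(text, length, min_count=2):
    """Return {substring: count} for substrings of `length` seen at least `min_count` times.

    Hypothetical helper. Overlapping occurrences are counted, because
    every start position in `text` contributes one window.
    """
    windows = (text[i:i + length] for i in range(len(text) - length + 1))
    counts = Counter(windows)
    return {sub: count for sub, count in counts.items() if count >= min_count}


print(frequent_substrings("abcdefghab", 2))  # {'ab': 2}
```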

Python 3 errors

I get this under Python 3:

ryan@DevPC-LX:~/stuff/suffix-trees$ ipython3
Python 3.4.4 |Anaconda 2.1.0 (64-bit)| (default, Jan 11 2016, 13:54:01) 
Type "copyright", "credits" or "license" for more information.

IPython 2.2.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from suffix_trees import STree

In [2]: STree.STree(['abc', 'def'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-3209533ade10> in <module>()
----> 1 STree.STree(['abc', 'def'])

/media/ryan/stuff/suffix-trees/suffix_trees/STree.py in __init__(self, input)
     11 
     12         if not input == '':
---> 13            self.build(input)
     14 
     15     def _check_input(self, input):

/media/ryan/stuff/suffix-trees/suffix_trees/STree.py in build(self, x)
     40             self._build(x)
     41         if type == 'gst':
---> 42             self._build_generalized(x)
     43 
     44     def _build(self, x):

/media/ryan/stuff/suffix-trees/suffix_trees/STree.py in _build_generalized(self, xs)
    131         terminal_gen = self._terminalSymbolsGenerator()
    132 
--> 133         _xs = ''.join([x + next(terminal_gen) for x in xs])
    134         self.word = _xs
    135         self._generalized_word_starts(xs)

/media/ryan/stuff/suffix-trees/suffix_trees/STree.py in <listcomp>(.0)
    131         terminal_gen = self._terminalSymbolsGenerator()
    132 
--> 133         _xs = ''.join([x + next(terminal_gen) for x in xs])
    134         self.word = _xs
    135         self._generalized_word_starts(xs)

/media/ryan/stuff/suffix-trees/suffix_trees/STree.py in _terminalSymbolsGenerator(self)
    244         """
    245         py2 = sys.version[0] < '3'
--> 246         UPPAs = list(range(0xE000,0xF8FF+1) + range(0xF0000,0xFFFFD+1) + range(0x100000, 0x10FFFD))
    247         for i in UPPAs:
    248             if py2:

TypeError: unsupported operand type(s) for +: 'range' and 'range'

In [3]: 
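
The traceback points at `range(...) + range(...)`, which works in Python 2 (where `range` returns a list) but not in Python 3. A possible fix, sketched below, is to chain the ranges with `itertools.chain`. The function here is a standalone stand-in for `_terminalSymbolsGenerator`, written for Python 3 only; the range bounds are copied from the traceback, and the rest of the method body is an assumption because it is not shown.

```python
import itertools


def terminal_symbols():
    """Yield characters from the Unicode Private Use Areas, one per call to next().

    Standalone sketch, Python 3 only. itertools.chain concatenates the
    ranges lazily, which avoids the unsupported `range + range`.
    """
    code_points = itertools.chain(
        range(0xE000, 0xF8FF + 1),
        range(0xF0000, 0xFFFFD + 1),
        range(0x100000, 0x10FFFD),  # upper bound kept exactly as in the traceback
    )
    for code_point in code_points:
        yield chr(code_point)


gen = terminal_symbols()
print(hex(ord(next(gen))))  # 0xe000
```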

Error in the usage documentation

The usage description says:
from STree import STree

I haven't been able to get it to work with that. I think that it should say:

from suffix_trees import STree

Support for substring frequencies

Thanks @ptrus for this great library!

I was wondering if you are planning to add frequency counts in the future; this feature would be very useful in order to determine how often a substring appears within a given string. This issue is related to #11.

No PyPI release

The PyPI page for this project doesn't contain any actual source releases, so commands like pip install suffix-trees won't work. I think you need to run python setup.py sdist upload.
