Giter Club home page Giter Club logo

suffix-tree's Introduction

Generalized Suffix Trees

Quick annoucement

This repository is deprecated and not maintained. For an up to date implementation of Generalized Suffix Trees, please have a look at https://github.com/Rerito/suffix-tree-v2

Description

Here is a C++ implementation for Generalized Suffix Trees based on Ukkonen's algorithm.

A Suffix Tree is a special data structure that holds the suffixes of an input string S1 and allow to perform the following queries in linear time:

  • Test if a string S2 is a substring of S1
  • Find palindrome substrings
  • ... (@TODO complete this list)

However, some problems needs an extension of this data structure to maintain a tree of the suffixes of each string in a set of strings. For example, finding the largest common substring of a set of strings is equivalent to finding the deepest internal node in a Generalized Suffix Tree that has children leaves of every input string.

How does it work

The basic algorithm remains valid to build a Generalized Suffix Tree. Some little tweaks are required nonetheless:

  • Input strings are stored in a hashmap. Each input string has thus a unique integer key.

  • Transitions from a node to another are stored in a hashmap. Since there is at most one transition starting with a given character, a transision's key is its first character.

  • A transition can be viewed as a directed edge, labeled by a substring. Such a substring is originally represented using two pointers (left, righ). Here, since we can have multiple reference strings, we add a third parameter: the ID of the reference string (same as in the string set hashmap). Thus, the substring cao of the string cacao at index 1 is represented by the triple:

    (1, 2, 4)

We thus add some fetching steps (O(1) complexity if the hashmap performs well).

More at this SO Question...

Improvements

The @TODO list for this little project is not cleared yet:

  • Handle leaves (reference the strings in the set that has a suffix on a leaf)
  • Allow a remove_string procedure to remove the suffixes, nodes and transitions brought by the given string. This will be online in the same way as the insertion routine.
  • Program some tests (using Google Test API)

suffix-tree's People

Contributors

rerito avatar trig-ger avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.