pysuffix's People

Contributors

cbazin

pysuffix's Issues

Longer string (12500 characters) not returning proper suffix array, but shorter string (12000 characters) is correct.

What steps will reproduce the problem?

1. Use the provided source code and the provided file with the same versions of 
Python and pysuffix that I list below. I don't know whether this happens with 
other versions, but it is better to be safe than sorry.
2. Compare the output of BWT(text) with bwt(text). BWT(text)==bwt(text) returns 
True if they are equivalent.
3. (This is more of a suggestion) Vary the length of the string. The first 
12500 characters of ww2.txt do not work (the last 2 characters are switched), 
and anything longer seems either to do the same or to go wrong in other ways, 
but 12000 characters does work. Assuming ww2.txt is in the same directory as 
the Python file, use text=readFile('ww2.txt')[:length] (readFile is in my 
source code) to get the first part of the document with the given length.


What is the expected output? What do you see instead?

The expected output is the Burrows-Wheeler transform of the input text. My code 
works for shorter strings, but pysuffix seems to get the order wrong on longer 
strings (12500 characters and up). In my example the last 2 characters are 
switched, and it seems to be the last part of the string that is wrong. I am 
using the attached ww2.txt as a test. It is 632545 characters long, so I have 
attached it rather than pasting it here. Also attached are ww2_bwt.txt, which 
is what the output should be (at 12500 characters), and ww2_BWT.txt, which is 
what the pysuffix version gives me (at 12500 characters). I believe the 
difference comes from an error in the returned suffix array. My best guess at 
the cause is a difference in sorting between unicode and ascii, but I am 
probably very wrong. I would like to see this fixed, but I don't know where to 
start on my end.

You may ask 'you already have a function to do this, why do you need 
pysuffix?' Well... my function is very slow, and it takes an insane amount of 
memory on longer strings: it stores every cyclic rotation of the text, so its 
memory usage and processing time grow much faster than the length of the 
string. On the other hand, there is a module (pysuffix) that does the hard work 
for you and it is impressively fast (bravo by the way!) :)
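
The unicode-versus-ascii guess can be checked in isolation. In the BWT function in the source code further down, the suffix array is built from the decoded text, while the output characters are read from the original byte string, so the two only line up if decoding preserves the length. The Python 3 sketch below uses a hypothetical helper, index_drift, to show how decoding with 'replace' can shorten a string. Whether ww2.txt actually contains such bytes in its first 12500 characters is an assumption that would need checking.

```python
def index_drift(raw, encoding="utf-8"):
    """Return how many positions are lost when the bytes ``raw`` are decoded.

    Hypothetical helper for illustration; it is not part of pysuffix.
    """
    decoded = raw.decode(encoding, "replace")
    return len(raw) - len(decoded)


print(index_drift(b"plain ascii"))   # 0: every byte maps to one character
print(index_drift(b"caf\xc3\xa9!"))  # 1: the two bytes \xc3\xa9 become one character
```

A non-zero result for the first 12500 bytes of ww2.txt would mean that indices into the decoded text no longer match indices into the byte string.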


What version of the product are you using? On what operating system?

I am using pysuffix v2.1 with Python 2.7.6 on Windows 7.


Please provide any additional information below.

Please email me ([email protected]) with any further questions, or to let me 
know if this gets resolved, whether it turns out to be an error on my part or 
is fixed in a new version of pysuffix :)

This is my source code:

import sys
# I have the pysuffix files here, and the import works after I add this
# location to sys.path. If pysuffix is already importable on your system,
# this line is not needed.
sys.path.append(r'C:\Users\****\Desktop\python\pysuffix')
from tools_karkkainen_sanders import *

def BWT(text):  # new Burrows-Wheeler transform using pysuffix
    text += '\0'  # append the sentinel byte
    def f(x): return text[x-1]  # character just before suffix x
    return ''.join(map(f, simple_kark_sort(unicode(text, 'utf-8', 'replace'))[:len(text)]))

def bwt(text):  # old and slow, but reliable, Burrows-Wheeler transform
    text += '\0'  # append the sentinel byte
    def perm(x): return text[x:] + text[:x]  # cyclic rotation starting at x
    return ''.join([row[-1:] for row in sorted(map(perm, range(len(text))))])

def readFile(filename):  # this is the function I am using to read files
    f = open(filename, 'rb')  # binary mode: returns raw bytes
    text = f.read()
    f.close()
    return text
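
One way to narrow this down is to check the suffix array step separately from the rest. The sketch below is Python 3, is not the reporter's code, and does not use pysuffix; naive_suffix_array, bwt_from_suffix_array and bwt_from_rotations are hypothetical names. It assumes the sentinel '\0' does not occur in the text, so that sorting suffixes and sorting rotations give the same order.

```python
def naive_suffix_array(text):
    """Return the start positions of all suffixes of ``text`` in sorted order.

    Slow (it compares whole suffixes), but simple enough to trust as a
    reference on short inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sentinel="\0"):
    """Burrows-Wheeler transform read off a suffix array."""
    text += sentinel
    # text[i - 1] is the character before suffix i; for i == 0 the index
    # wraps around to the sentinel at the end.
    return "".join(text[i - 1] for i in naive_suffix_array(text))


def bwt_from_rotations(text, sentinel="\0"):
    """Burrows-Wheeler transform from the sorted cyclic rotations."""
    text += sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)


sample = "banana"
print(bwt_from_suffix_array(sample) == bwt_from_rotations(sample))  # True
```

Comparing naive_suffix_array against the array returned by the library on the same decoded string, for a length just above the point where the outputs diverge, would show whether the suffix array itself differs or only the lookup built on top of it.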





Thanks :)

Original issue reported on code.google.com by [email protected] on 21 Apr 2015 at 10:27

Attachments:

example doesn't work

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 4 Oct 2011 at 7:01

There are no Python docstrings anywhere in this project.

What steps will reproduce the problem?
1. Open a command prompt.
2. Navigate to the root of the project.
3. Run this command: find . -name "*.py" | xargs grep \"\"\"
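
The grep above only shows whether triple quotes appear somewhere. A more targeted check is sketched below in Python 3 using the standard ast module; missing_docstrings is a hypothetical helper name. It parses source text and lists the module, classes and functions that have no docstring. It assumes the source parses under the interpreter running the check, which may not hold for Python 2 only code.

```python
import ast


def missing_docstrings(source, filename="<string>"):
    """Return the names of definitions in ``source`` that lack a docstring.

    The module itself is reported as "<module>".
    """
    tree = ast.parse(source, filename=filename)
    missing = []
    if ast.get_docstring(tree) is None:
        missing.append("<module>")
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing


example = 'def f():\n    pass\n\ndef g():\n    """Documented."""\n    return 1\n'
print(sorted(missing_docstrings(example)))  # ['<module>', 'f']
```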

What is the expected output? What do you see instead?

I expect to see at least a one-line docstring on every module, class, method, 
and function. Complex code blocks whose purpose is not immediately obvious 
should have more extensive docs. Any function or method that takes a parameter 
or returns a value should document that interface, including expected types, in 
its docstring. Any function or method that requires context or bibliographical 
references should include them.
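
As an illustration of the level of detail meant here, the sketch below documents a small function in this style. longest_common_prefix is a hypothetical example, not a function from pysuffix, and the Args/Returns layout is one common convention among several.

```python
def longest_common_prefix(a, b):
    """Return the length of the longest common prefix of two strings.

    Args:
        a (str): First string.
        b (str): Second string.

    Returns:
        int: Number of leading characters that ``a`` and ``b`` share.
    """
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


print(longest_common_prefix("banana", "bandana"))  # 3
```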

What version of the product are you using? On what operating system?

Version 2.1, Ubuntu Linux, but this is an issue of source code hygiene; it is 
not system specific.

Please provide any additional information below.

An open source project will be adopted only if it has good docs.

Original issue reported on code.google.com by [email protected] on 3 Apr 2012 at 6:05

example doesn't work

$ python suffix_array.test.py
[2, 0, 3, 1, 0, 0, 0]
[2, 0, 1, 0]
Traceback (most recent call last):
  File "suffix_array.test.py", line 14, in <module>
    1/0
ZeroDivisionError: integer division or modulo by zero


Original issue reported on code.google.com by [email protected] on 4 Oct 2011 at 7:02

Please provide setuptools-compatible setup.py, and upload each release to an egg repository.

What steps will reproduce the problem?
1. Open a command prompt.
2. If you're in a virtualenv, run this: easy_install pysuffix
3. If not, run this: sudo easy_install pysuffix

What is the expected output? What do you see instead?

I expect pysuffix to install itself automatically from a known egg server. It 
does not. Instead, I get this:

Searching for pysuffix
Reading http://pypi.python.org/simple/pysuffix/
Couldn't find index page for 'pysuffix' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
No local packages or download links found for pysuffix
error: Could not find suitable distribution for Requirement.parse('pysuffix')


What version of the product are you using? On what operating system?

Version 2.1, Ubuntu Linux. This is a packaging issue which transcends operating 
systems.


Please provide any additional information below.

Please provide for the packaging and distribution of this library. An open 
source release will only be adopted if it is easy to install from known 
repositories using standard tools.
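
For reference, a setuptools-compatible setup.py can be very small. The fragment below is a sketch and is not taken from the pysuffix repository. The description text is a placeholder, and the module list is an assumption: tools_karkkainen_sanders is the only module named in these issues, so the real list would need to be filled in by the maintainer.

```python
# setup.py: a minimal sketch, not the project's actual packaging file.
from setuptools import setup

setup(
    name="pysuffix",
    version="2.1",  # the version named in this report
    description="Suffix array construction in Python",  # placeholder wording
    # Assumption: the library ships as top-level modules. Only this module
    # name appears in the issues; add the others as needed.
    py_modules=["tools_karkkainen_sanders"],
)
```

With such a file in place, running python setup.py sdist builds a source distribution that can then be uploaded to a package index.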

Original issue reported on code.google.com by [email protected] on 3 Apr 2012 at 6:10
