
geco3's People

Contributors

miltondts, pratas


geco3's Issues

Conda Installation failed

I followed the installation instructions and ran conda install -y -c bioconda geco3, but it reports:

Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - geco3

Current channels:

  - https://conda.anaconda.org/bioconda/win-64
  - https://conda.anaconda.org/bioconda/noarch
  - https://repo.anaconda.com/pkgs/main/win-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/win-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/win-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/pro/win-64
  - https://repo.anaconda.com/pkgs/pro/noarch
  - https://repo.anaconda.com/pkgs/msys2/win-64
  - https://repo.anaconda.com/pkgs/msys2/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to  https://anaconda.org and use the search bar at the top of the page 

How can I solve it?
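
One detail in the output above is worth checking: every channel URL ends in win-64 or noarch, so conda only searched Windows and platform-independent builds. As far as I know, Bioconda publishes builds for Linux and macOS only, which would explain the failure, but that is an assumption about the channel and not something this report confirms. A minimal shell sketch that extracts the platform subdirectories from such a listing (the function name channel_subdirs is made up for illustration):

```shell
# Read conda's "Current channels" listing on stdin and print the unique
# platform subdirectories (the last path component of each channel URL).
# Only list items of the form "  - https://..." are considered, so URLs
# that appear in running text are ignored.
channel_subdirs() {
    grep -E '^[[:space:]]*- https?://' \
        | sed -E 's#[[:space:]]*$##; s#.*/##' \
        | sort -u
}
```

Fed the channel list from this report, it would print noarch and win-64, with no linux-64 or osx-64 entry.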

Data corruption depending on input file name

I noticed that GeCo3 sometimes corrupts data: after compression and decompression, the decompressed sequence differs from the original one.

I uploaded a small test dataset: http://kirill.med.u-tokai.ac.jp/data/temp/GeCo3-repro-1-test-data.seq.gz (1.4 MB, gzipped to 0.4 MB).

Steps to reproduce (on Linux):

cd /tmp
mkdir GeCo3-repro-1
cd GeCo3-repro-1
git clone https://github.com/cobilab/geco3.git
cd geco3/src
make
cd ../..
wget http://kirill.med.u-tokai.ac.jp/data/temp/GeCo3-repro-1-test-data.seq.gz
gzip -dc GeCo3-repro-1-test-data.seq.gz >GeCo3-repro-1-test-data.seq
geco3/src/GeCo3 -l 1 GeCo3-repro-1-test-data.seq
geco3/src/GeDe3 GeCo3-repro-1-test-data.seq.co
cmp GeCo3-repro-1-test-data.seq GeCo3-repro-1-test-data.seq.de

Console output of the last few steps:

lr: 0.03, hs: 40
xs: 35
Total bytes: 347972 (339.8 KB), 1.924 bpb, 1.924 bps w/ no header, Normalized Dissimilarity Rate: 0.962092
Spent 2.37155 sec.
xs: 35
Spent 2.05411 sec.
GeCo3-repro-1-test-data.seq GeCo3-repro-1-test-data.seq.de differ: byte 31, line 1

The decompressed file "GeCo3-repro-1-test-data.seq.de" has the correct size, but a completely different sequence. This problem occurs only with some data, and only at some compression levels.
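
To tell "wrong size" apart from "right size, wrong content" when checking a round trip, the comparison can be split into two steps. This is only a sketch; the function name compare_roundtrip and its three status strings are invented here and are not part of GeCo3:

```shell
# compare_roundtrip ORIGINAL DECODED
# Prints one of: identical, size-mismatch, same-size-different-content.
# Returns 0 only when the two files are byte-identical.
compare_roundtrip() {
    original=$1
    decoded=$2
    # The arithmetic expansion strips the padding some wc versions print.
    size_a=$(( $(wc -c < "$original") ))
    size_b=$(( $(wc -c < "$decoded") ))
    if [ "$size_a" -ne "$size_b" ]; then
        echo "size-mismatch"
        return 1
    fi
    if cmp -s "$original" "$decoded"; then
        echo "identical"
        return 0
    fi
    echo "same-size-different-content"
    return 1
}
```

For the run above, compare_roundtrip GeCo3-repro-1-test-data.seq GeCo3-repro-1-test-data.seq.de would be expected to report same-size-different-content, given the observation that the size is correct.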

A little experimentation revealed that, curiously, this issue is sensitive to the input file name.

The following Perl script tries GeCo3 with levels from 1 to 7, verifying the output of each run. Then it renames the file to "1.seq" and again tests GeCo3 levels 1 to 7:

print "\nInput is named \"GeCo3-repro-1-test-data.seq\"\n";
for (my $l = 1; $l <= 7; $l++)
{
    system("geco3/src/GeCo3 -l $l GeCo3-repro-1-test-data.seq >/dev/null 2>&1");
    system('geco3/src/GeDe3 GeCo3-repro-1-test-data.seq.co >/dev/null 2>&1');
    my $r = `cmp GeCo3-repro-1-test-data.seq GeCo3-repro-1-test-data.seq.de`;
    print "$l: ", (($r eq '') ? "OK\n" : $r);
    unlink 'GeCo3-repro-1-test-data.seq.co';
    unlink 'GeCo3-repro-1-test-data.seq.de';
}
rename 'GeCo3-repro-1-test-data.seq', '1.seq';
print "\nNow input is renamed to \"1.seq\"\n";
for (my $l = 1; $l <= 7; $l++)
{
    system("geco3/src/GeCo3 -l $l 1.seq >/dev/null 2>&1");
    system('geco3/src/GeDe3 1.seq.co >/dev/null 2>&1');
    my $r = `cmp 1.seq 1.seq.de`;
    print "$l: ", (($r eq '') ? "OK\n" : $r);
    unlink '1.seq.co';
    unlink '1.seq.de';
}

Steps to download and run the above script:

cd /tmp
mkdir GeCo3-repro-1-part-2
cd GeCo3-repro-1-part-2
git clone https://github.com/cobilab/geco3.git
cd geco3/src
make
cd ../..
wget http://kirill.med.u-tokai.ac.jp/data/temp/GeCo3-repro-1-test-data.seq.gz
gzip -dc GeCo3-repro-1-test-data.seq.gz >GeCo3-repro-1-test-data.seq
wget http://kirill.med.u-tokai.ac.jp/data/temp/GeCo3-repro-1-test.pl
/usr/bin/env perl GeCo3-repro-1-test.pl

The output of this step:

Input is named "GeCo3-repro-1-test-data.seq"
1: GeCo3-repro-1-test-data.seq GeCo3-repro-1-test-data.seq.de differ: byte 31, line 1
2: OK
3: OK
4: GeCo3-repro-1-test-data.seq GeCo3-repro-1-test-data.seq.de differ: byte 40, line 1
5: GeCo3-repro-1-test-data.seq GeCo3-repro-1-test-data.seq.de differ: byte 40, line 1
6: GeCo3-repro-1-test-data.seq GeCo3-repro-1-test-data.seq.de differ: byte 40, line 1
7: OK

Now input is renamed to "1.seq"
1: OK
2: 1.seq 1.seq.de differ: byte 29, line 1
3: 1.seq 1.seq.de differ: byte 17, line 1
4: OK
5: OK
6: OK
7: 1.seq 1.seq.de differ: byte 59, line 1
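
The same experiment can be generalized to any number of file names. The following is a shell sketch, not the script used above: the function name name_sensitivity and the GECO3/GEDE3 variables are made up here, and it assumes (as in the steps above) that GeCo3 writes NAME.co and GeDe3 writes NAME.de next to the input.

```shell
# Binary paths taken from the reproduction steps; override if needed.
GECO3=${GECO3:-geco3/src/GeCo3}
GEDE3=${GEDE3:-geco3/src/GeDe3}

# name_sensitivity DATA NAME...
# Copies DATA to each NAME (NAME must differ from DATA), then for levels
# 1 to 7 compresses, decompresses and compares the result with the copy.
# Prints one line per run: "<name> <level> OK" or "<name> <level> DIFF".
name_sensitivity() {
    data=$1
    shift
    for name in "$@"; do
        cp -- "$data" "$name" || return 1
        for level in 1 2 3 4 5 6 7; do
            "$GECO3" -l "$level" "$name" >/dev/null 2>&1
            "$GEDE3" "$name.co" >/dev/null 2>&1
            if cmp -s "$name" "$name.de"; then
                echo "$name $level OK"
            else
                echo "$name $level DIFF"
            fi
            # Remove outputs so a stale file cannot mask a failed run.
            rm -f "$name.co" "$name.de"
        done
        rm -f "$name"
    done
}
```

Run from the directory created in the steps above, for example as name_sensitivity GeCo3-repro-1-test-data.seq 1.seq a.seq long-name.seq, it prints one row per name and level, which makes it easier to see whether the pattern of failures follows the name or its length.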

If the input file name is leaking into other data structures, that may indicate larger memory safety problems. It may be a good idea to review the code for memory safety issues, perhaps with a static analysis tool such as Coverity Scan.

Test machine: Ubuntu 18.04.1 LTS, dual Xeon E5-2643v3.

Let me know if you need any additional information or testing.
