aaf's Introduction

AAF (Alignment and Assembly Free)

This is a package for constructing phylogenies without alignment or assembly. For usage instructions, see aafUserManual.doc.

If you need to cite AAF: Fan H, Ives A, Surget-Groba Y, Cannon C (2015). An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data, BMC Genomics 16:522

Installation

Prerequisites

AAF runs on UNIX-like systems (Linux, OS X, ...) with Python 2.7 or later (including Python 3.x) and the g++/gcc compilers. Biopython (http://biopython.org/wiki/Main_Page) is required for the nonparametric bootstrap, and R (http://cran.r-project.org/) with the R package 'ape' is required for the parametric bootstrap.

Install

  1. Get the source code

     wget https://github.com/fanhuan/AAF/AAF20160831.zip
    
  2. Compile kmer_count(x) and kmer_merge as follows. "path_to_AAF" stands for the path to the AAF folder created by decompressing the downloaded archive.

     a. path_to_AAF/AAF$ cd phylokmer
    
     b. path_to_AAF/AAF/phylokmer$ make
    
     c. Add kmer_count(x) and kmer_merge to your PATH or working directory
    
  3. Compile fitch_kmerX, consense and treedist

     a. path_to_AAF/AAF$ cd phylip_src
    
     b. path_to_AAF/AAF/phylip_src$ make all
    
     c. Add fitch_kmerX and consense to your PATH or working directory  
    

Bootstrap

Most of the feedback I have received about AAF concerns the bootstrap. The two-step nonparametric bootstrap is very computationally intensive. If your coverage is high (>8X), we assume that the incomplete coverage problem is minor. To reduce the computational load, you can carry out only the second step of the bootstrap (nonparametric_bootstrap_s2only.py): sample rows of the kmer table with replacement, drawing 1/k of the number of rows in the table. To reduce the computation further, there is a version that samples from the shared kmer table (nonparametric_bootstrap_s2only_skt.py). Singletons from each sample (i.e. kmers that appear in only one sample) are calculated from the difference between the total diversity file and the shared kmer table. Those singletons are then added back during the calculation of pairwise distances, following a Poisson distribution with a mean of 1/k of each sample's singleton count.
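The resampling described above can be sketched in a few lines of Python 3. This is only an illustration under stated assumptions, not the code in nonparametric_bootstrap_s2only_skt.py: the function names, the in-memory list-of-rows table with one 0/1 flag per sample, and the chunked Knuth sampler used for the Poisson draw are all hypothetical choices made for the example.

```python
import math
import random


def resample_rows(table, k, rng):
    """Draw len(table) // k rows from the kmer table, with replacement."""
    n_draws = len(table) // k
    return [rng.choice(table) for _ in range(n_draws)]


def poisson(mean, rng, chunk=500.0):
    """Poisson draw by Knuth's method, summed over chunks of the mean
    so that exp(-mean) does not underflow for large means."""
    total = 0
    remaining = float(mean)
    while remaining > 0:
        part = min(remaining, chunk)
        remaining -= part
        limit = math.exp(-part)
        product = rng.random()
        while product > limit:
            total += 1
            product *= rng.random()
    return total


def bootstrap_totals(shared_table, singletons, k, rng):
    """Per-sample kmer totals for one bootstrap replicate.

    shared_table: list of rows, one 0/1 flag per sample (shared kmers only).
    singletons:   number of singleton kmers in each sample.
    """
    rows = resample_rows(shared_table, k, rng)
    totals = [sum(row[i] for row in rows) for i in range(len(singletons))]
    # Add singletons back with a Poisson draw of mean (singleton count) / k.
    return [t + poisson(s / k, rng) for t, s in zip(totals, singletons)]
```

Passing a seeded `random.Random` instance as `rng` makes a replicate reproducible; a real kmer table would normally be streamed from the gzipped file rather than held in memory.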

BetaVersion/nonparametric_bootstrap_s2only.py

BetaVersion/nonparametric_bootstrap_s2only_skt.py

Each of these scripts performs only ONE bootstrap. They are designed this way because some users run on high-throughput facilities. If you use a high-performance facility, increase the RAM and threads so each bootstrap takes less time. You can wrap the script in a shell script. Be sure not to overwrite the bootstrap tree generated each time.

Example:

python singletonCalculator.py phylokmer.dat.gz kmer_diversity.wc 25 -t 10  
[25 is k, compulsory; -t is the number of threads to use, optional. Default = 1.]  
[This produces a file containing the number of singletons in each sample, in this case phylokmer_singleton.wc]
for ((i=1;i<=100;i++)) #bootstrap 100 times
do
	python nonparametric_bootstrap_s2only_skt.py -i phylokmer.dat.gz --fs phylokmer_singleton.wc -t 10
	cat phylokmer_bootstrap.tre >> phylokmer_bootstrap_trees
done
consense #use phylokmer_bootstrap_trees as infile

Note that the syntax for looping from 1 to 100 differs between shells. The form above works in bash on Ubuntu 16.04. If you have problems with it, check out this post.
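If the shell loop syntax is a problem, the same wrapper can be written in Python 3, which behaves the same regardless of the shell. The sketch below is a hypothetical helper, not part of AAF: `run_bootstraps` and its parameters are names invented for this example, and it assumes each bootstrap run rewrites the same tree file (phylokmer_bootstrap.tre in the example above).

```python
import subprocess


def run_bootstraps(n_reps, run_once, tree_file, combined_file):
    """Run one bootstrap at a time and collect every tree in combined_file.

    run_once is any callable that performs a single bootstrap and
    (re)writes tree_file. combined_file is started fresh on each call.
    """
    with open(combined_file, "w") as out:
        for _ in range(n_reps):
            run_once()
            with open(tree_file) as tree:
                text = tree.read()
            # Keep one tree per line so no tree runs into the next.
            out.write(text if text.endswith("\n") else text + "\n")


def aaf_bootstrap_once():
    """Same command as in the shell example; raises if the script fails."""
    subprocess.run(
        ["python", "nonparametric_bootstrap_s2only_skt.py",
         "-i", "phylokmer.dat.gz", "--fs", "phylokmer_singleton.wc",
         "-t", "10"],
        check=True,
    )


# Usage (100 bootstraps, then run consense on the combined file):
# run_bootstraps(100, aaf_bootstrap_once,
#                "phylokmer_bootstrap.tre", "phylokmer_bootstrap_trees")
```

Copying each tree into the combined file right after the run is what keeps the next replicate from overwriting it.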

FAQ

  1. Dear User: If I have paired-end files (sample.1.fq, sample.2.fq) for each sample, should I merge them as input for AAF or keep them separate in the ./data/ folder?

    Huan: If you have multiple files for one sample, please put them in the same folder. AAF treats everything in one folder as one sample and takes the name of the folder as the sample name. Unfortunately, AAF cannot handle a mixture of folders and files in the data directory. Therefore, if one sample has multiple input files, the other samples need to be in folders as well, even if some of them have only one sequence file. Alternatively, you can merge the input files of each sample into one file so that the data directory contains only files and no subdirectories are needed. Either way should work; just do not mix files and folders. I hope I'm not making this sound more complicated than it needs to be.

  2. Dear User: Should I use the BetaVersion?

    Huan: Like any beta version, it might not work on your machine and, most importantly, it might not be consistent with the user manual. But let's be reckless and give it a try! Please email me or report an issue if it does not work. Thanks for your help!
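The "no mix of files and folders" rule from the first answer can be checked before starting a run. The following Python sketch is a hypothetical pre-flight check, not part of AAF; the function name is invented, and ignoring hidden entries (names starting with a dot) is an assumption of this example rather than documented AAF behaviour.

```python
import os


def data_dir_layout(data_dir):
    """Classify a data directory as 'folders', 'files', 'mixed' or 'empty'."""
    # Assumption: hidden entries such as .DS_Store are not samples.
    entries = [e for e in os.listdir(data_dir) if not e.startswith(".")]
    dirs = [e for e in entries if os.path.isdir(os.path.join(data_dir, e))]
    files = [e for e in entries if e not in dirs]
    if dirs and files:
        return "mixed"  # the layout the FAQ answer warns against
    if dirs:
        return "folders"
    if files:
        return "files"
    return "empty"
```

A result of "mixed" means that either the loose files should be moved into per-sample folders or the per-sample folders should be merged into single files.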

aaf's People

Contributors

fanhuan

aaf's Issues

UnicodeDecodeError when running aaf_phylokmer.py

I have a directory of subdirectories that are labeled for each sample and contain the two paired-end reads from Illumina sequencing that I am trying to run with aaf_phylokmer.py.

Set up like this:

/data/
     /sample1/
            /sample1_r1.fastq
            /sample1_r2.fastq
     /sample2/
            /sample2_r1.fastq
            /sample2_r2.fastq
     /sample3/
            /sample3_r1.fastq
            /sample3_r2.fastq
     /sample4/
            /sample4_r1.fastq
            /sample4_r2.fastq

When I go to run this command: python3 aaf_phylokmer.py -k 19 -d data/ -o kmer_phlyo -W

I get this error message:

SAMPLE LIST:
sample1
sample2
sample3
sample4

Traceback (most recent call last):
  File "/Users/rimo/AAF/AAF20190129/aaf_phylokmer.py", line 162, in <module>
    firstChar = handle.read(1)
                ^^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte

Does anyone have any idea why? Or how to fix it?
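The traceback shows that a file opened as UTF-8 text contains a byte (0xb0) that is not valid UTF-8. The thread does not say which file triggered it; compressed or hidden binary files in the data directory are one possible cause, but that is a guess. A small Python 3 sketch like the one below (a hypothetical helper, not part of AAF) lists the files whose first bytes cannot be decoded, which may help narrow it down.

```python
import codecs
import os


def find_undecodable_files(data_dir, n_bytes=4096):
    """Return paths under data_dir whose first n_bytes are not valid UTF-8."""
    bad = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                chunk = handle.read(n_bytes)
            # An incremental decoder tolerates a multi-byte character
            # that happens to be cut off at the end of the chunk.
            decoder = codecs.getincrementaldecoder("utf-8")()
            try:
                decoder.decode(chunk, final=False)
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)
```

Calling `find_undecodable_files("data/")` on the layout above would report any file that a text-mode `handle.read(1)` could fail on, including hidden files that do not show up in a normal directory listing.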

AAF past version 2017 no download

Hello FanHuan,
For comparison purposes I would like to use version [AAF20171001.zip] in my research. When I click the link under "past versions", it redirects me to a page of source code, whereas the 2016* versions show a download button.
I would appreciate it if you could provide access to AAF20171001.zip.

Thank you very much in advance!
Aqualung

Math domain error in aaf_tip.py

Hello,

I get a math domain error when running aaf_tip.py on my data.

Traceback (most recent call last):
  File "AAF20190129/aaf_tip.py", line 115, in <module>
    tip[key] = 0.5/float(kl) * math.log((Pr*Pe + Pta)/(Pr2 * Pe2))
ValueError: math domain error

data_gs_c_24.txt

The attached file is the one used as tip_info_test.txt.

The data structure seems fine, since the script runs through all the equations up to Eq*10 (the one in the error message).

Do you have any idea what could be causing the error?

Thank you
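In Python, `math.log` raises "math domain error" when its argument is zero or negative, so the ratio inside the logarithm on line 115 must have been non-positive for this data. The sketch below wraps the expression from the traceback in a hypothetical function (the name `tip_term` and the extra check are not in aaf_tip.py) that reports the offending value instead of failing inside `math.log`.

```python
import math


def tip_term(kl, Pr, Pe, Pta, Pr2, Pe2):
    """Evaluate the expression from the traceback with an explicit check."""
    ratio = (Pr * Pe + Pta) / (Pr2 * Pe2)
    if ratio <= 0:
        # math.log is only defined for positive arguments.
        raise ValueError(
            "log argument is %r (Pr=%r, Pe=%r, Pta=%r, Pr2=%r, Pe2=%r)"
            % (ratio, Pr, Pe, Pta, Pr2, Pe2)
        )
    return 0.5 / float(kl) * math.log(ratio)
```

Printing the intermediate values this way shows which input makes the numerator or denominator non-positive; why that happens for a given data set is not something this sketch can answer.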

AAF is not working

Hello Fanhuan,

I have used AAF several times and it worked well. Recently I could not run it because of the error copied below. Could you please let me know how I can fix it? Thanks. Aziz

chunkLength = 10956549
Tue Jul 6 10:06:04 2021 start running jobs
Tue Jul 6 10:06:04 2021 running 24 jobs
Traceback (most recent call last):
  File "/scratch/brown/aebrahi/halstead/aebrahi/all_walnut_tree/AAF/aaf_distance.py", line 163, in <module>
    shared = job.get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
    raise self._value
IndexError: list index out of range

Feedback BetaVersion distance matrix

Hello FanHuan,
Both the BetaVersion and the latest version (2019) write the number of sequences twice in the header of the distance matrix file. This makes the file unreadable for tools such as SplitsTree. The result from your test data is:

10 10
sp1 0 0.08774439038460255 0.17518807382881404 0.08730292323305468 0.08765974685380827 0.08899149001935049 0.08712292543596371 0.04616029487773169 0.12377583510162214 0.16504186033449375
sp10 [...]

I changed it to the following and it works perfectly:

10
sp1 0 0.08774439038460255 0.17518807382881404 0.08730292323305468 0.08765974685380827 0.08899149001935049 0.08712292543596371 0.04616029487773169 0.12377583510162214 0.16504186033449375
sp10

Best regards,
Aqualung
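The manual change described in this report can be scripted. The Python sketch below is a hypothetical post-processing helper, not part of AAF: it rewrites a first line that holds the same count twice (such as "10 10") to a single count and leaves everything else untouched.

```python
def fix_header(text):
    """Collapse a doubled sequence count on the first line of a matrix file."""
    lines = text.splitlines()
    if not lines:
        return text
    fields = lines[0].split()
    # Only touch headers of the form "N N"; anything else is kept as is.
    if len(fields) == 2 and fields[0] == fields[1]:
        lines[0] = fields[0]
    return "\n".join(lines) + "\n"
```

Reading the distance file, passing its contents through `fix_header`, and writing the result to a new file keeps the original output available for comparison.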

AAF nonparametric bootstrap issue

Hi, I am having trouble running the nonparametric bootstrap on my data. Could you help me? The error message is below.

rgufal@cln0:~/Documentos/ANDRE/AAF$ python2 BetaVersion/nonparametric_bootstrap.py -k 25 -t 4 -G 16 -d data --S1 100 --S2 20
SPECIES LIST:
Pangu
Pgaud
Pgram
Psp1H
Psp2H
Pstri
Pten1
Pten2
Purvi
Raber
Ralb1
Ralb2
Rbald
Rbar1
Rbar2
Rbert
Rbifl
Rbrac
Rbras
Rbro1
Rbro2
Rbro3
Rbuch
Rcali
Rcap1
Rcap2
Rceph
Rceps
Rchal
Rchap
Rchin
Rcili
Rcilo
Rcolo
Rcoma
Rcomp
Rcrin
Rcube
Rdebi
Rdecu
Rdist
Rdiva
Rdive
Relat
Relli
Remac
Rexal
Rexim
Rfasc
Rfern
Rfili
Rfusc
Rgale
Rglaz
Rglo1
Rglo2
Rglo3
Rgray
Rharp
Rharv
Rimer
Rinex
Rinte
Rknie
Rlati
Rlava
Rlept
Rmacr
Rmari
Rmarl
Rmega
Rmegp
Rmicp
Rmicr
Rmili
Rmixt
Rnite
Rnive
Rodor
Rpall
Rperp
Rplum
Rpoly
Rprui
Rpube
Rpunc
Rpusi
Rrace
Rradi
Rrari
Rreco
Rried
Rripa
Rrobu
Rrubr
Rschi
Rscir
Rsesl
Rsier
RspI1
Rsten
Rsulc
Rtene
Rtenu
Rtrac
Rvulc
Rwrig
1 out of 100 times of bootstrap over reads.

Thu Feb 2 00:16:27 2017
running batch 1/27
kmer_count -l 25 -n 1 -G 4 -o data/Pangu.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Pangu/Pangu.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Pgaud.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Pgaud/Pgaud.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Pgram.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Pgram/Pgram.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Psp1H.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Psp1H/Psp1H.fa'
Total: 236590 kmers
Total: 273945 kmers
Total: 405446 kmers
Total: 436814 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Psp2H.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Psp2H/Psp2H.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Pstri.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Pstri/Pstri.fa'
Thu Feb 2 00:16:28 2017
running batch 2/27
kmer_count -l 25 -n 1 -G 4 -o data/Pten1.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Pten1/Pten1.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Pten2.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Pten2/Pten2.fa'
Total: 263520 kmers
Total: 281579 kmers
Total: 570942 kmers
Total: 617703 kmers

Thu Feb 2 00:16:30 2017
kmer_count -l 25 -n 1 -G 4 -o data/Purvi.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Purvi/Purvi.fa'
running batch 3/27
kmer_count -l 25 -n 1 -G 4 -o data/Raber.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Raber/Raber.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Ralb1.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Ralb1/Ralb1.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Ralb2.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Ralb2/Ralb2.fa'
Total: 578501 kmers
Total: 622151 kmers
Total: 897710 kmers
Total: 1129223 kmers

Thu Feb 2 00:16:32 2017
running batch 4/27
kmer_count -l 25 -n 1 -G 4 -o data/Rbald.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbald/Rbald.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rbar1.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbar1/Rbar1.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rbar2.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbar2/Rbar2.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rbert.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbert/Rbert.fa'
Total: 853690 kmers
Total: 903695 kmers
Total: 1105437 kmers
Total: 1112646 kmers

Thu Feb 2 00:16:36 2017
running batch 5/27
kmer_count -l 25 -n 1 -G 4 -o data/Rbifl.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbifl/Rbifl.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rbrac.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbrac/Rbrac.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rbras.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbras/Rbras.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rbro1.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbro1/Rbro1.fa'
Total: 1854 kmers
Total: 690664 kmers
Total: 944905 kmers
Total: 1091638 kmers

Thu Feb 2 00:16:38 2017
kmer_count -l 25 -n 1 -G 4 -o data/Rbro2.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbro2/Rbro2.fa'
running batch 6/27
kmer_count -l 25 -n 1 -G 4 -o data/Rbro3.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbro3/Rbro3.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rbuch.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rbuch/Rbuch.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rcali.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcali/Rcali.fa'
Total: 900477 kmers
Total: 1131040 kmers
Total: 1277587 kmers
Total: 1299140 kmers

Thu Feb 2 00:16:41 2017
running batch 7/27
kmer_count -l 25 -n 1 -G 4 -o data/Rcap1.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcap1/Rcap1.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rcap2.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcap2/Rcap2.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rceph.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rceph/Rceph.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rceps.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rceps/Rceps.fa'
Total: 882956 kmers
Total: 1016400 kmers
Total: 1824643 kmers
Total: 1824965 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Rchal.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rchal/Rchal.fa'
Thu Feb 2 00:16:46 2017
running batch 8/27
kmer_count -l 25 -n 1 -G 4 -o data/Rchap.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rchap/Rchap.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rchin.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rchin/Rchin.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rcili.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcili/Rcili.fa'
Total: 518702 kmers
Total: 680853 kmers
Total: 853249 kmers
Total: 1020510 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Rcilo.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcilo/Rcilo.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rcolo.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcolo/Rcolo.fa'
Thu Feb 2 00:16:48 2017
running batch 9/27
kmer_count -l 25 -n 1 -G 4 -o data/Rcoma.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcoma/Rcoma.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rcomp.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcomp/Rcomp.fa'
Total: 522384 kmers
Total: 791138 kmers
Total: 921074 kmers
Total: 1860267 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Rcrin.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcrin/Rcrin.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rcube.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rcube/Rcube.fa'
Thu Feb 2 00:16:52 2017
running batch 10/27
kmer_count -l 25 -n 1 -G 4 -o data/Rdebi.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rdebi/Rdebi.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rdecu.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rdecu/Rdecu.fa'
Total: 779288 kmers
Total: 1087005 kmers
Total: 1342066 kmers
Total: 1506813 kmers

Thu Feb 2 00:16:56 2017
running batch 11/27
kmer_count -l 25 -n 1 -G 4 -o data/Rdist.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rdist/Rdist.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rdiva.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rdiva/Rdiva.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rdive.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rdive/Rdive.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Relat.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Relat/Relat.fa'
Total: 18214 kmers
Total: 672527 kmers
Total: 956949 kmers
Total: 1399766 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Relli.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Relli/Relli.fa'
Thu Feb 2 00:16:59 2017
running batch 12/27
kmer_count -l 25 -n 1 -G 4 -o data/Remac.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Remac/Remac.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rexal.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rexal/Rexal.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rexim.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rexim/Rexim.fa'
Total: 653168 kmers
Total: 583478 kmers
Total: 853401 kmers
Total: 964250 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Rfasc.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rfasc/Rfasc.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rfern.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rfern/Rfern.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rfili.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rfili/Rfili.fa'
Thu Feb 2 00:17:01 2017
running batch 13/27
kmer_count -l 25 -n 1 -G 4 -o data/Rfusc.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rfusc/Rfusc.fa'
Total: 486312 kmers
Total: 645445 kmers
Total: 756005 kmers
Total: 946328 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Rgale.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rgale/Rgale.fa'
Thu Feb 2 00:17:03 2017
running batch 14/27
kmer_count -l 25 -n 1 -G 4 -o data/Rglaz.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rglaz/Rglaz.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rglo1.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rglo1/Rglo1.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rglo2.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rglo2/Rglo2.fa'
Total: 340459 kmers
Total: 790820 kmers
Total: 843002 kmers
Total: 1631524 kmers

Thu Feb 2 00:17:07 2017
running batch 15/27
kmer_count -l 25 -n 1 -G 4 -o data/Rglo3.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rglo3/Rglo3.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rgray.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rgray/Rgray.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rharp.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rharp/Rharp.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rharv.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rharv/Rharv.fa'
Total: 715601 kmers
Total: 914590 kmers
Total: 1038797 kmers
Total: 1184725 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Rimer.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rimer/Rimer.fa'
Thu Feb 2 00:17:10 2017
running batch 16/27
kmer_count -l 25 -n 1 -G 4 -o data/Rinex.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rinex/Rinex.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rinte.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rinte/Rinte.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rknie.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rknie/Rknie.fa'
Total: 702146 kmers
Total: 1030774 kmers
Total: 1347309 kmers
Total: 1341808 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Rlati.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rlati/Rlati.fa'
Thu Feb 2 00:17:13 2017
running batch 17/27
kmer_count -l 25 -n 1 -G 4 -o data/Rlava.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rlava/Rlava.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rlept.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rlept/Rlept.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rmacr.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rmacr/Rmacr.fa'
Total: 525570 kmers
Total: 576172 kmers
Total: 1345604 kmers
Total: 1596797 kmers

Thu Feb 2 00:17:17 2017
kmer_count -l 25 -n 1 -G 4 -o data/Rmari.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rmari/Rmari.fa'
running batch 18/27
kmer_count -l 25 -n 1 -G 4 -o data/Rmarl.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rmarl/Rmarl.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rmega.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rmega/Rmega.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rmegp.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rmegp/Rmegp.fa'
Total: 646760 kmers
Total: 589472 kmers
Total: 905129 kmers
Total: 1478663 kmers

Thu Feb 2 00:17:20 2017
running batch 19/27
kmer_count -l 25 -n 1 -G 4 -o data/Rmicp.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rmicp/Rmicp.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rmicr.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rmicr/Rmicr.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rmili.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rmili/Rmili.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rmixt.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rmixt/Rmixt.fa'
Total: 369315 kmers
Total: 971278 kmers
Total: 1700172 kmers
Total: 1870676 kmers

Thu Feb 2 00:17:25 2017
running batch 20/27
kmer_count -l 25 -n 1 -G 4 -o data/Rnite.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rnite/Rnite.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rnive.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rnive/Rnive.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rodor.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rodor/Rodor.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rpall.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rpall/Rpall.fa'
Total: 712578 kmers
Total: 866813 kmers
Total: 903405 kmers
Total: 1044035 kmers

Thu Feb 2 00:17:27 2017
kmer_count -l 25 -n 1 -G 4 -o data/Rperp.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rperp/Rperp.fa'
running batch 21/27
kmer_count -l 25 -n 1 -G 4 -o data/Rplum.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rplum/Rplum.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rpoly.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rpoly/Rpoly.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rprui.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rprui/Rprui.fa'
Total: 8314 kmers
Total: 413046 kmers
Total: 1192931 kmers
Total: 1944980 kmers

Thu Feb 2 00:17:31 2017
running batch 22/27
kmer_count -l 25 -n 1 -G 4 -o data/Rpube.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rpube/Rpube.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rpunc.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rpunc/Rpunc.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rpusi.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rpusi/Rpusi.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rrace.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rrace/Rrace.fa'
Total: 1423 kmers
Total: 407698 kmers
Total: 643732 kmers
Total: 1482929 kmers

Thu Feb 2 00:17:34 2017
running batch 23/27
kmer_count -l 25 -n 1 -G 4 -o data/Rradi.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rradi/Rradi.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rrari.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rrari/Rrari.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rreco.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rreco/Rreco.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rried.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rried/Rried.fa'
Total: 47773 kmers
Total: 679416 kmers
Total: 769224 kmers
Total: 864651 kmers

Thu Feb 2 00:17:36 2017
running batch 24/27
kmer_count -l 25 -n 1 -G 4 -o data/Rripa.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rripa/Rripa.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rrobu.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rrobu/Rrobu.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rrubr.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rrubr/Rrubr.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rschi.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rschi/Rschi.fa'
Total: 19780 kmers
Total: 54908 kmers
Total: 824393 kmers
Total: 903644 kmers

Thu Feb 2 00:17:38 2017
running batch 25/27
kmer_count -l 25 -n 1 -G 4 -o data/Rscir.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rscir/Rscir.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rsesl.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rsesl/Rsesl.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rsier.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rsier/Rsier.fa'
kmer_count -l 25 -n 1 -G 4 -o data/RspI1.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/RspI1/RspI1.fa'
Total: 696521 kmers
Total: 697564 kmers
Total: 862815 kmers
Total: 812405 kmers

Thu Feb 2 00:17:41 2017
kmer_count -l 25 -n 1 -G 4 -o data/Rsten.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rsten/Rsten.fa'
running batch 26/27
kmer_count -l 25 -n 1 -G 4 -o data/Rsulc.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rsulc/Rsulc.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rtene.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rtene/Rtene.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rtenu.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rtenu/Rtenu.fa'
Total: 163323 kmers
Total: 238938 kmers
Total: 1473427 kmers
Total: 1624038 kmers

kmer_count -l 25 -n 1 -G 4 -o data/Rtrac.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rtrac/Rtrac.fa'
kmer_count -l 25 -n 1 -G 4 -o data/Rvulc.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rvulc/Rvulc.fa'
Thu Feb 2 00:17:44 2017
running last batch
kmer_count -l 25 -n 1 -G 4 -o data/Rwrig.pkdat.gz -i '/home/rgufal/Documentos/ANDRE/AAF/boot_Folder/Rwrig/Rwrig.fa'
Total: 19111 kmers
Total: 39221 kmers
Total: 1132628 kmers

Thu Feb 2 00:17:47 2017
kmer_merge -k s -c -d '0' -a 'O,M,F' 'data/Pangu.pkdat.gz' 'data/Pgaud.pkdat.gz' 'data/Pgram.pkdat.gz' 'data/Psp1H.pkdat.gz' 'data/Psp2H.pkdat.gz' 'data/Pstri.pkdat.gz' 'data/Pten1.pkdat.gz' 'data/Pten2.pkdat.gz' 'data/Purvi.pkdat.gz' 'data/Raber.pkdat.gz' 'data/Ralb1.pkdat.gz' 'data/Ralb2.pkdat.gz' 'data/Rbald.pkdat.gz' 'data/Rbar1.pkdat.gz' 'data/Rbar2.pkdat.gz' 'data/Rbert.pkdat.gz' 'data/Rbifl.pkdat.gz' 'data/Rbrac.pkdat.gz' 'data/Rbras.pkdat.gz' 'data/Rbro1.pkdat.gz' 'data/Rbro2.pkdat.gz' 'data/Rbro3.pkdat.gz' 'data/Rbuch.pkdat.gz' 'data/Rcali.pkdat.gz' 'data/Rcap1.pkdat.gz' 'data/Rcap2.pkdat.gz' 'data/Rceph.pkdat.gz' 'data/Rceps.pkdat.gz' 'data/Rchal.pkdat.gz' 'data/Rchap.pkdat.gz' 'data/Rchin.pkdat.gz' 'data/Rcili.pkdat.gz' 'data/Rcilo.pkdat.gz' 'data/Rcolo.pkdat.gz' 'data/Rcoma.pkdat.gz' 'data/Rcomp.pkdat.gz' 'data/Rcrin.pkdat.gz' 'data/Rcube.pkdat.gz' 'data/Rdebi.pkdat.gz' 'data/Rdecu.pkdat.gz' 'data/Rdist.pkdat.gz' 'data/Rdiva.pkdat.gz' 'data/Rdive.pkdat.gz' 'data/Relat.pkdat.gz' 'data/Relli.pkdat.gz' 'data/Remac.pkdat.gz' 'data/Rexal.pkdat.gz' 'data/Rexim.pkdat.gz' 'data/Rfasc.pkdat.gz' 'data/Rfern.pkdat.gz' 'data/Rfili.pkdat.gz' 'data/Rfusc.pkdat.gz' 'data/Rgale.pkdat.gz' 'data/Rglaz.pkdat.gz' 'data/Rglo1.pkdat.gz' 'data/Rglo2.pkdat.gz' 'data/Rglo3.pkdat.gz' 'data/Rgray.pkdat.gz' 'data/Rharp.pkdat.gz' 'data/Rharv.pkdat.gz' 'data/Rimer.pkdat.gz' 'data/Rinex.pkdat.gz' 'data/Rinte.pkdat.gz' 'data/Rknie.pkdat.gz' 'data/Rlati.pkdat.gz' 'data/Rlava.pkdat.gz' 'data/Rlept.pkdat.gz' 'data/Rmacr.pkdat.gz' 'data/Rmari.pkdat.gz' 'data/Rmarl.pkdat.gz' 'data/Rmega.pkdat.gz' 'data/Rmegp.pkdat.gz' 'data/Rmicp.pkdat.gz' 'data/Rmicr.pkdat.gz' 'data/Rmili.pkdat.gz' 'data/Rmixt.pkdat.gz' 'data/Rnite.pkdat.gz' 'data/Rnive.pkdat.gz' 'data/Rodor.pkdat.gz' 'data/Rpall.pkdat.gz' 'data/Rperp.pkdat.gz' 'data/Rplum.pkdat.gz' 'data/Rpoly.pkdat.gz' 'data/Rprui.pkdat.gz' 'data/Rpube.pkdat.gz' 'data/Rpunc.pkdat.gz' 'data/Rpusi.pkdat.gz' 'data/Rrace.pkdat.gz' 'data/Rradi.pkdat.gz' 
'data/Rrari.pkdat.gz' 'data/Rreco.pkdat.gz' 'data/Rried.pkdat.gz' 'data/Rripa.pkdat.gz' 'data/Rrobu.pkdat.gz' 'data/Rrubr.pkdat.gz' 'data/Rschi.pkdat.gz' 'data/Rscir.pkdat.gz' 'data/Rsesl.pkdat.gz' 'data/Rsier.pkdat.gz' 'data/RspI1.pkdat.gz' 'data/Rsten.pkdat.gz' 'data/Rsulc.pkdat.gz' 'data/Rtene.pkdat.gz' 'data/Rtenu.pkdat.gz' 'data/Rtrac.pkdat.gz' 'data/Rvulc.pkdat.gz' 'data/Rwrig.pkdat.gz' | cut -f 2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,142,144,146,148,150,152,154,156,158,160,162,164,166,168,170,172,174,176,178,180,182,184,186,188,190,192,194,196,198,200,202,204,206,208,210,212,214 | gzip > data/phylokmer.dat.gz
Thu Feb 2 00:25:49 2017

Thu Feb 2 00:25:49 2017 start calulation distances
.
Traceback (most recent call last):
  File "BetaVersion/nonparametric_bootstrap.py", line 376, in <module>
    aaf_distance(outFile,nThreads,memory,samples,options.kLen)
  File "BetaVersion/nonparametric_bootstrap.py", line 92, in aaf_distance
    total,shared = job.get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
IOError: bad message length

Thanks,
André

Non parametric bootstrap

Hello, I ran AAF to make a tree very easily, but the bootstrap step fails when reading the input files.
My command:
python nonparametric_bootstrap.py -k 21 -t 10 -d data --S1 2 --S2 2
(my data are fastq.gz files in the folder data)
And I got this error:
sp1
sp2
...
last_sp
1 out of 3 times of bootstrap over reads.
Traceback (most recent call last):
  File "nonparametric_bootstrap.py", line 316, in <module>
    for seq_record in SeqIO.parse(handle, seqFormat):
  File "/home/un/miniconda2/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 655, in parse
    for r in i:
  File "/home/un/miniconda2/lib/python2.7/site-packages/Bio/SeqIO/QualityIO.py", line 1240, in FastqIlluminaIterator
    for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
  File "/home/un/miniconda2/lib/python2.7/site-packages/Bio/SeqIO/QualityIO.py", line 904, in FastqGeneralIterator
    "Records in Fastq files should start with '@' character")
ValueError: Records in Fastq files should start with '@' character

I suspect it could be due to the .gz compression of my data, but that was fine for the distance and tree-building steps...
If I skip the first bootstrap step (--S1 0) and/or the second step (--S2 0), I get a broken pipe error:
...
gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
Segmentation fault (core dumped)
Segmentation fault (core dumped)

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
Segmentation fault (core dumped)

I think it's not a big issue, but it prevents the bootstrap from completing...
Thanks in advance, Damien
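If the .gz compression is indeed the cause, as the reporter suspects (the thread does not confirm it), one workaround is to open the reads through a helper that decompresses transparently. The Python 3 sketch below is hypothetical and not part of nonparametric_bootstrap.py; `open_text` is a name invented here, and it detects gzip files by their two-byte magic number rather than by file extension.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file as text."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "r")
```

The handle returned by `open_text` yields decompressed text, so a FASTQ parser reading from it sees '@' as the first character of each record whether or not the file on disk is compressed.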
