
mifish's People

Contributors

billzt


mifish's Issues

Issue with crabs db_download with mitofish

I am trying to download the MitoFish database using the crabs conda installation (as far as I know, Docker does not work well on the NeSI infrastructure). I get the following error:
crabs db_download --source mitofish --output mitofish.fasta --keep_original yes

downloading sequences from the MitoFish database
Traceback (most recent call last):
  File "/nesi/nobackup/uoo03004/alana_crabs/crabs/crabs_env/bin/crabs", line 1372, in <module>
    main()
  File "/nesi/nobackup/uoo03004/alana_crabs/crabs/crabs_env/bin/crabs", line 1369, in main
    args.func(args)
  File "/nesi/nobackup/uoo03004/alana_crabs/crabs/crabs_env/bin/crabs", line 96, in db_download
    dl_file = mitofish_download(url)
  File "/nesi/nobackup/uoo03004/alana_crabs/crabs/crabs_env/lib/python3.6/site-packages/function/module_db_download.py", line 139, in mitofish_download
    os.remove('complete_partial_mitogenomes.zip')
FileNotFoundError: [Errno 2] No such file or directory: 'complete_partial_mitogenomes.zip'
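
The traceback shows the cleanup step failing because the archive is not in the working directory, which suggests the download itself never produced the file. The following is a minimal sketch of a more defensive pattern. It assumes nothing about how crabs performs the download; `require_download` and `remove_if_present` are hypothetical helpers and are not part of crabs.

```python
import os


def require_download(path):
    """Raise a descriptive error if the expected download is missing or empty."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise RuntimeError(f"download failed: {path!r} is missing or empty")
    return path


def remove_if_present(path):
    """Delete path if it exists; return True if a file was actually removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Nothing to clean up, so do not crash during cleanup.
        return False
```

Calling `require_download` right after the download step would report the real problem (no archive) instead of a later cleanup error.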

Species_num error

Hi all,

I've been trying to work with MiFish with a custom amplicon reference database built using makeblastdb, and a results file in .fasta format.

I've been using the command:

mifish seq/ database/crabtest.fasta

All the dependencies are found, but I get this error:

Detect your data as
#########
	zip warning: name not matched: ./MiFishResult/Sample-*/01_filter_fastq_and_merge/*.html

zip error: Nothing to do! (./MiFishResult/QC.zip)
Traceback (most recent call last):
  File "/home/labaccount/miniconda3/envs/MiFish/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/home/labaccount/projects/mifish_test/MiFish/mifish/cmd/mifish.py", line 82, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/home/labaccount/projects/mifish_test/MiFish/mifish/core/pipeline.py", line 395, in runMiFish
    if simple_result == False and 'species_num' in stat_data and stat_data['species_num'] > 3:
UnboundLocalError: local variable 'stat_data' referenced before assignment

I'm not sure how to interpret this. Could it be that there are too few matches in the amplicon database?
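
Note that the output above lists no groups or samples after "Detect your data as", so the error may come from an empty sample list rather than from the database. An `UnboundLocalError` of this kind typically arises when a name is assigned only inside a loop or branch that never ran. Here is a minimal sketch of that pattern and one way to guard against it; `needs_full_report` and its arguments are hypothetical and do not reproduce the actual code in pipeline.py.

```python
def needs_full_report(sample_stats, simple_result=False):
    """Decide whether the detailed report should be produced."""
    # Bind the name before the loop so it exists even when there are no samples.
    stat_data = {}
    for stats in sample_stats:
        stat_data = stats  # stand-in for whatever the pipeline computes per sample
    return (not simple_result) and stat_data.get("species_num", 0) > 3


print(needs_full_report([]))                      # no samples: no crash
print(needs_full_report([{"species_num": 5}]))    # enough species
```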

Help with multiple input data files (groups)

The MiFish pipeline works when I use a single input directory. However, I am trying to analyze the data in three subgroups. I have placed each subgroup in its own folder and passed each folder with the "-d" argument.

However, I am now getting an error (see below) with the pipeline as it seems it cannot find the database.

What am I doing wrong?

$ mifish -o output/WTRBA-YEARS -t 124 -d /home/cbfgws6/MiFish/WTRBA_YEAR/21 -d /home/cbfgws6/MiFish/WTRBA_YEAR/22 -d /home/cbfgws6/MiFish/WTRBA_YEAR/23 /home/cbfgws6/MiFish/mifishdb-Oct2023/mitofish.db.fa
usage: mifish [-h] [-d OTHER_DATA_DIR] [-m MIN_READ_LEN] [-M MAX_READ_LEN] [-f PRIMER_FWD] [-r PRIMER_REV] [-u UNOISE_MIN] [-i BLAST_MIN_IDENTITY] [-s] [-k]
              [-o OUTPUT_DIR] [-t THREADS]
              seq_dir db
mifish: error: the following arguments are required: db
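
The usage line lists two required positional arguments, seq_dir and db, while the command above supplies only one positional value (the database path). argparse assigns that value to seq_dir and then reports db as missing. The sketch below mirrors the usage line with a small parser. It assumes that -d is a repeatable option (`action="append"`) and uses made-up defaults; it is not the actual mifish argument parser.

```python
import argparse


def build_parser():
    """A reduced stand-in for the parser implied by the usage message."""
    parser = argparse.ArgumentParser(prog="mifish")
    parser.add_argument("-d", dest="other_data_dir", action="append", default=[])
    parser.add_argument("-o", dest="output_dir", default=".")
    parser.add_argument("-t", dest="threads", type=int, default=2)
    parser.add_argument("seq_dir")  # first group
    parser.add_argument("db")       # database path
    return parser


# First group as the positional seq_dir, the other two groups via -d.
args = build_parser().parse_args(
    ["-o", "output/WTRBA-YEARS", "-t", "124",
     "-d", "WTRBA_YEAR/22", "-d", "WTRBA_YEAR/23",
     "WTRBA_YEAR/21", "mifishdb-Oct2023/mitofish.db.fa"]
)
print(args.seq_dir, args.other_data_dir, args.db)
```

Under these assumptions, the first group goes in the seq_dir position and only the remaining groups are passed with -d.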

core dumped with usearch -otutab

Hi there. While testing the pipeline with 2 groups of real data, usearch aborted with a core dump.

mifish seq/AA1 ../../MitoFish_db/MitoFish -d seq/AB2 -s -o MiFish_re_Result
#########
Sample AA1_3 Step 0: Decompress
Sample AA1_3 Step 1: filter the quality of FASTQ and merge Pair-End Reads
Sample AA1_3 Step 2: filter read length and remove primers
Sample AA1_3 Step 3: De-noise and generate haploid
sh: line 1: 165256 Aborted                 (core dumped) usearch -otutab MiFish_re_Result/MiFishResult/Sample-AA1_3/02_process_fasta/AA1_3.processed.fa -zotus MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.zotus.fasta -threads 2 -otutabout MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.zotus.size.txt > MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.otutab.log 2>&1
Traceback (most recent call last):
  File "/jdfsbjcas1/workdir/Env/miniconda/envs/MiFish_re/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/jdfsbjcas1/workdir/Tools/test_install/MiFish/mifish/cmd/mifish.py", line 71, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/jdfsbjcas1/workdir/Tools/test_install/MiFish/mifish/core/pipeline.py", line 223, in runMiFish
    sizeFasIntegrator.run(zotusCountFile=f'{workdir_sample}/03_haploid/{sample_name}.zotus.size.txt', \
  File "/jdfsbjcas1/workdir/Tools/test_install/MiFish/mifish/core/sizeFasIntegrator.py", line 5, in run
    with open(zotusCountFile) as handle:
FileNotFoundError: [Errno 2] No such file or directory: 'MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.zotus.size.txt'

At the end of AA1_3.otutab.log, we found:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Could the memory limit of the 32-bit version of usearch have caused the issue?
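
std::bad_alloc is the C++ exception thrown when a memory allocation fails, which is consistent with a 32-bit address-space limit, although the log alone does not prove it. Independently of the cause, the Python traceback appears only because the run continued and tried to read an output file that the aborted step never wrote. The following sketch checks the exit status of an external step before moving on; `run_step` is a hypothetical helper, not the way mifish launches usearch.

```python
import subprocess


def run_step(cmd, log_path):
    """Run one external command, log its output, and fail fast on a non-zero exit."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    if result.returncode != 0:
        # A process killed by a signal (such as an abort) also lands here.
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}; see {log_path}"
        )
```

With a check like this, the failure would be reported at the usearch step, pointing directly at the log file that contains the bad_alloc message.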

How to Format DB for use with MitoFish

I downloaded the entire database from the site: http://mitofish.aori.u-tokyo.ac.jp/species/detail/download/?filename=download%2F/complete_partial_mitogenomes.zip

Then I used this command

$ makeblastdb -in mito-all.fa -dbtype nucl

Building a new DB, current time: 09/28/2023 16:06:29
New DB name:   /home/cbfgws6/MiFish/mifishdb/mito-all.fa
New DB title:  mito-all.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 825365 sequences in 24.0024 seconds.

I then attempted to run the pipeline and got this error:

$ mifish -d /home/cbfgws6/MiFish/WTRBA_21-22-23/ seq /home/cbfgws6/MiFish/mifishdb/ -t 124 -o WTRBA_ALL
Error: /home/cbfgws6/MiFish/mifishdb/ does not seem to be a valid database for NCBI BLAST+

What am I doing wrong?
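
The makeblastdb output reports "New DB name: /home/cbfgws6/MiFish/mifishdb/mito-all.fa", so the database is identified by that full prefix, while the command passes only the directory. Below is a rough heuristic for checking whether a given path is a nucleotide BLAST database prefix. It assumes the usual index file extensions (.nin for a single volume, .nal for the alias file of a multi-volume database); `looks_like_nucl_blast_db` is a hypothetical helper and is not how mifish or BLAST+ validates the path.

```python
from pathlib import Path


def looks_like_nucl_blast_db(prefix):
    """Heuristic: True if nucleotide BLAST index files exist for this prefix."""
    prefix = str(prefix)
    candidates = [
        prefix + ".nin",     # single-volume index
        prefix + ".nal",     # alias file of a multi-volume database
        prefix + ".00.nin",  # first volume of a multi-volume database
    ]
    return any(Path(candidate).is_file() for candidate in candidates)
```

Under this check, the directory /home/cbfgws6/MiFish/mifishdb/ would fail, whereas the prefix ending in mito-all.fa would pass if makeblastdb wrote its index files there.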

mifish crashes when a group has been skipped

I am using three groups; this sample <Sample 21_Pt4_LO_S_1_> in the first group has only 27 reads and is skipped during initial processing.

Sample 21_Pt4_LO_S_1_ Step 0: Decompress
Sample 21_Pt4_LO_S_1_ Step 1: filter the quality of FASTQ and merge Pair-End Reads
Sample 21_Pt4_LO_S_1_ Step 2: filter read length and remove primers
Sample 21_Pt4_LO_S_1_ has not passed read length filter. Only has 27 reads. Skip

Later, the pipeline crashes:

Traceback (most recent call last):
  File "/home/cbfgws6/miniconda3/envs/MiFish/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/home/cbfgws6/MiFish/mifish/cmd/mifish.py", line 76, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/home/cbfgws6/MiFish/mifish/core/pipeline.py", line 397, in runMiFish
    json.dump(stat.eco_diversity(workdir, group_to_sample), fp=out_handle, indent=4)
  File "/home/cbfgws6/MiFish/mifish/core/stat.py", line 79, in eco_diversity
    with open(f'{workdir_sample}/04_blast/{sample_name}.json') as handle:
FileNotFoundError: [Errno 2] No such file or directory: 'output/WTRBA-YEARS/MiFishResult/Sample-21_Pt4_LO_S_1_/04_blast/21_Pt4_LO_S_1_.json'

It is true that there is no JSON file, as it was skipped. The pipeline should figure this out, or at least not crash, and move on to the next sample.

The immediate workaround would be to remove this sample (or any samples that are "SKIPPED") from the analysis/pipeline.
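
One way the pipeline could tolerate skipped samples is to filter the sample list down to those that actually produced a BLAST JSON file before computing diversity statistics. The sketch below follows the directory layout visible in the traceback (Sample-NAME/04_blast/NAME.json). The shape of `group_to_sample` (a dict mapping group names to lists of sample names) is an assumption, and `samples_with_results` is a hypothetical helper, not part of mifish.

```python
from pathlib import Path


def samples_with_results(workdir, group_to_sample):
    """Split samples into those with a 04_blast JSON file and those without."""
    kept, skipped = {}, []
    for group, samples in group_to_sample.items():
        kept[group] = []
        for name in samples:
            json_path = Path(workdir) / f"Sample-{name}" / "04_blast" / f"{name}.json"
            if json_path.is_file():
                kept[group].append(name)
            else:
                skipped.append(name)  # e.g. samples that failed the read length filter
    return kept, skipped
```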

ete3.parser.newick.NewickError in Step 5

Hi, thank you for the great tool.
I installed MiFish in a new conda environment as recommended. While testing with mifish seq mifishdbv3.83.fa -d seq2, I got an error during Step 5: Phylogenetic Analysis.

Detect your data as
#########
Group1: 1 samples
Sample DRR126155: read type = pe
#########
Group2: 1 samples
Sample DRR126155B: read type = pe
#########
Sample DRR126155 Step 0: Decompress
Sample DRR126155 Step 1: filter the quality of FASTQ and merge Pair-End Reads
Sample DRR126155 Step 2: filter read length and remove primers
Sample DRR126155 Step 3: De-noise and generate haploid
Sample DRR126155 Step 4: BLAST and calculate LOD Score
Sample DRR126155 Step 5: Phylogenetic Analysis
Traceback (most recent call last):
  File "/jdfsbjcas1/workdir/Env/miniconda/envs/MiFish/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/jdfsbjcas1/workdir/Tools/MiFish/mifish/cmd/mifish.py", line 71, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/jdfsbjcas1/workdir/Tools/MiFish/mifish/core/pipeline.py", line 349, in runMiFish
    drawTree.svg(species_result=species_result, tree_file=f'{workdir_sample}/05_MSA/{sample_name}.nwk', \
  File "/jdfsbjcas1/workdir/Tools/MiFish/mifish/core/drawTree.py", line 29, in svg
    tree_handle = Tree(tree_file)
  File "/jdfsbjcas1/workdir/Env/miniconda/envs/MiFish/lib/python3.9/site-packages/ete3/coretype/tree.py", line 212, in __init__
    read_newick(newick, root_node = self, format=format,
  File "/jdfsbjcas1/workdir/Env/miniconda/envs/MiFish/lib/python3.9/site-packages/ete3/parser/newick.py", line 264, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
ete3.parser.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.

When -s is added, the pipeline finishes without errors. In my conda env, ete3==3.1.2, as recommended. How can I troubleshoot this?
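
The ete3 message covers two different situations: the .nwk file does not exist, or its content is not valid Newick. A quick way to tell them apart is to inspect the file directly. The sketch below performs only basic checks (existence, non-empty content, terminating semicolon) and does not validate the full Newick grammar; `newick_file_problem` is a hypothetical helper, not part of mifish or ete3.

```python
from pathlib import Path


def newick_file_problem(tree_file):
    """Return a short description of an obvious problem, or None if none is found."""
    path = Path(tree_file)
    if not path.is_file():
        return "tree file does not exist"
    text = path.read_text().strip()
    if not text:
        return "tree file is empty"
    if not text.endswith(";"):
        return "tree text does not end with ';'"
    return None
```

Running it on the 05_MSA/DRR126155.nwk file from the failing run would show whether the tree-building step produced any output at all.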

For your reference, my conda env was as follows:

# packages in environment at /jdfsbjcas1/workdir/Env/miniconda/envs/MiFish:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
alsa-lib                  1.2.8                h166bdaf_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
arrow-cpp                 10.0.1           ha770c72_6_cpu    conda-forge
asttokens                 2.2.1              pyhd8ed1ab_0    conda-forge
attr                      2.5.1                h166bdaf_1    conda-forge
attrs                     22.2.0             pyh71513ae_0    conda-forge
aws-c-auth                0.6.21               hd93a3ba_3    conda-forge
aws-c-cal                 0.5.20               hff2c3d7_3    conda-forge
aws-c-common              0.8.5                h166bdaf_0    conda-forge
aws-c-compression         0.2.16               hf5f93bc_0    conda-forge
aws-c-event-stream        0.2.18               h57874a7_0    conda-forge
aws-c-http                0.7.0                h96ef541_0    conda-forge
aws-c-io                  0.13.12              h57ca295_1    conda-forge
aws-c-mqtt                0.7.13              h0b5698f_12    conda-forge
aws-c-s3                  0.2.3                h82cbbf9_0    conda-forge
aws-c-sdkutils            0.1.7                hf5f93bc_0    conda-forge
aws-checksums             0.1.14               h6027aba_0    conda-forge
aws-crt-cpp               0.18.16             hf80f573_10    conda-forge
aws-sdk-cpp               1.10.57              ha834a50_1    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                pyhd8ed1ab_3    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
biopython                 1.79             py39hb9d737c_3    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
cachecontrol              0.12.11            pyhd8ed1ab_1    conda-forge
cairo                     1.16.0            ha61ee94_1014    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_3    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.0.7            py39h4b4f3f3_0    conda-forge
cryptography              39.0.0           py39h079d5ae_0    conda-forge
cutadapt                  4.1              py39hbf8eff0_1    bioconda
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
cython                    0.29.33          py39h227be39_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
dnaio                     0.10.0           py39hbf8eff0_0    bioconda
ete3                      3.1.2              pyh9f0ad1d_0    conda-forge
exceptiongroup            1.1.0              pyhd8ed1ab_0    conda-forge
executing                 1.2.0              pyhd8ed1ab_0    conda-forge
expat                     2.5.0                h27087fc_0    conda-forge
fftw                      3.3.10          nompi_hf0379b8_106    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.38.0           py39hb9d737c_1    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glib                      2.74.1               h6239696_1    conda-forge
glib-tools                2.74.1               h6239696_1    conda-forge
glog                      0.6.0                h6f12383_0    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
gst-plugins-base          1.21.3               h4243ec0_1    conda-forge
gstreamer                 1.21.3               h25f0c4b_1    conda-forge
gstreamer-orc             0.4.33               h166bdaf_0    conda-forge
harfbuzz                  6.0.0                h8e241bc_0    conda-forge
hdmedians                 0.14.2           py39h2ae25f5_3    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
ipython                   8.9.0              pyh41d4057_0    conda-forge
isa-l                     2.30.0               ha770c72_4    conda-forge
jack                      1.9.21               h583fa2b_2    conda-forge
jedi                      0.18.2             pyhd8ed1ab_0    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4            py39hf939315_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.14                 hfd0df8a_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h05df665_6    conda-forge
libarrow                  10.0.1           hf9c26a6_6_cpu    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcap                    2.66                 ha37c62d_0    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libclang                  15.0.7          default_had23c3d_0    conda-forge
libclang13                15.0.7          default_h3e3d535_0    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcups                   2.3.3                h36d4200_3    conda-forge
libcurl                   7.87.0               hdc1c0ab_0    conda-forge
libdb                     6.2.32               h9c3ff4c_0    conda-forge
libdeflate                1.17                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h28343ad_4    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflac                   1.4.2                h27087fc_0    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgcrypt                 1.10.1               h166bdaf_0    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libglib                   2.74.1               h606061b_1    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libgoogle-cloud           2.5.0                h21dfe5b_1    conda-forge
libgpg-error              1.46                 h620e276_0    conda-forge
libgrpc                   1.51.1               h30feacc_0    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.4                h166bdaf_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libllvm15                 15.0.7               hadd5161_0    conda-forge
libnghttp2                1.51.0               hff17c54_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libogg                    1.3.4                h7f98852_1    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libpq                     15.1                 hb675445_3    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsndfile                1.2.0                hb75c966_0    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libsystemd0               252                  h2a991cd_0    conda-forge
libthrift                 0.16.0               he500d00_2    conda-forge
libtiff                   4.5.0                h6adf6a1_2    conda-forge
libtool                   2.4.7                h27087fc_0    conda-forge
libudev1                  252                  h166bdaf_0    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.2.4                h166bdaf_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
libxml2                   2.10.3               h7463322_0    conda-forge
libxslt                   1.1.37               h873f0b0_0    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
lockfile                  0.12.2                     py_1    conda-forge
lxml                      4.9.2            py39h14694de_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
matplotlib-base           3.6.3            py39he190548_0    conda-forge
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
mifish                    1.0                       dev_0    <develop>
mpg123                    1.31.2               hcb278e6_0    conda-forge
msgpack-python            1.0.4            py39hf939315_1    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              8.0.32               ha901b37_0    conda-forge
mysql-libs                8.0.32               hd7da12d_0    conda-forge
natsort                   8.2.0              pyhd8ed1ab_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
nspr                      4.35                 h27087fc_0    conda-forge
nss                       3.82                 he02c5a1_0    conda-forge
numpy                     1.23.1           py39hba7629e_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.0.7                h0b41bf4_2    conda-forge
orc                       1.8.2                hfdbbad2_0    conda-forge
packaging                 23.0               pyhd8ed1ab_0    conda-forge
pandas                    1.5.3            py39h2ad29b5_0    conda-forge
parquet-cpp               1.5.1                         1    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
pbzip2                    1.1.13                        0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pexpect                   4.8.0              pyh1a96a4e_2    conda-forge
pickleshare               0.7.5           py39hde42818_1002    conda-forge
pigz                      2.6                  h27826a3_0    conda-forge
pillow                    9.4.0            py39ha08a7e4_0    conda-forge
pip                       23.0               pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pluggy                    1.0.0            py39hf3d152e_4    conda-forge
ply                       3.11                       py_1    conda-forge
pooch                     1.6.0              pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.36             pyha770c72_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pulseaudio                16.1                 ha8d29e2_1    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pyarrow                   10.0.1          py39hf0ef2fd_6_cpu    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pygments                  2.14.0             pyhd8ed1ab_0    conda-forge
pyopenssl                 23.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyqt                      5.15.7           py39h5c7b992_3    conda-forge
pyqt5-sip                 12.11.0          py39h227be39_3    conda-forge
pysocks                   1.7.1            py39hf3d152e_5    conda-forge
pytest                    7.2.1              pyhd8ed1ab_0    conda-forge
python                    3.9.15          hba424b6_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-duckdb             0.6.1            py39hb98b84a_1    conda-forge
python-isal               1.1.0            py39hb9d737c_1    conda-forge
python_abi                3.9                      3_cp39    conda-forge
pytz                      2022.7.1           pyhd8ed1ab_0    conda-forge
qt-main                   5.15.6               h602db52_6    conda-forge
re2                       2022.06.01           h27087fc_1    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.2             pyhd8ed1ab_0    conda-forge
s2n                       1.3.31               h3358134_0    conda-forge
scikit-bio                0.5.6            py39h16ac069_4    conda-forge
scikit-learn              1.2.1            py39h86b2a18_0    conda-forge
scipy                     1.10.0           py39h7360e5f_0    conda-forge
setuptools                66.1.1             pyhd8ed1ab_0    conda-forge
sip                       6.7.6            py39h227be39_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                hbd366e4_2    conda-forge
sqlite                    3.40.0               h4ff8645_0    conda-forge
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
traitlets                 5.9.0              pyhd8ed1ab_0    conda-forge
tzdata                    2022g                h191b570_0    conda-forge
unicodedata2              15.0.0           py39hb9d737c_0    conda-forge
urllib3                   1.26.14            pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.6              pyhd8ed1ab_0    conda-forge
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
xcb-util                  0.4.0                h516909a_0    conda-forge
xcb-util-image            0.4.0                h166bdaf_0    conda-forge
xcb-util-keysyms          0.4.0                h516909a_0    conda-forge
xcb-util-renderutil       0.3.9                h166bdaf_0    conda-forge
xcb-util-wm               0.4.1                h516909a_0    conda-forge
xlsxwriter                3.0.3              pyhd8ed1ab_0    conda-forge
xopen                     1.7.0            py39hf3d152e_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstandard                 0.19.0           py39h29414ee_1    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge

error parsing the blastxml in mifish/core/pipeline.py

Hi there. I was getting no hits back, and I noticed that the percent identities and mismatch counts reported in the "haploids with low identities" tab of the output taxonomy spreadsheet didn't make any sense.

Looking at pipeline.py, lines 256-261:

# /core/pipeline.py
for alignment in blast_record.alignments:
    hsp = alignment.hsps[0]
    aln_len = alignment.length
    identity = hsp.identities/aln_len
    if identity >= blast_identity/100:
        good_alns.append(alignment)

For me, aln_len holds the length of the hit record in the database, not the HSP alignment length. This makes the computed identity much smaller than it should be. I fixed it by assigning hsp.align_length to aln_len (see below).

# /core/pipeline.py
for alignment in blast_record.alignments:
    hsp = alignment.hsps[0]
    aln_len = hsp.align_length  # was: alignment.length
    identity = hsp.identities/aln_len
    if identity >= blast_identity/100:
        good_alns.append(alignment)

Now I get correct reporting on the identity because it is dividing by the HSP length and not the hit record length.

This also needs to be fixed on lines 266 and 289, by moving the assignment below the hsp assignment, which occurs on lines 269 and 292, respectively.
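
To illustrate the size of the effect, here is a small self-contained sketch that compares the two denominators. It uses simple stand-in objects instead of real Biopython records, and the numbers (a 170 bp HSP against a 16,500 bp hit sequence, a 97% threshold) are made up for illustration.

```python
from types import SimpleNamespace

# Stand-ins for a parsed BLAST alignment and its first HSP; values are illustrative.
hsp = SimpleNamespace(identities=168, align_length=170)
alignment = SimpleNamespace(length=16500, hsps=[hsp])  # length of the whole hit sequence

identity_by_hit_length = hsp.identities / alignment.length   # original calculation
identity_by_hsp_length = hsp.identities / hsp.align_length   # proposed calculation

blast_identity = 97  # percent threshold, illustrative value
print(identity_by_hit_length >= blast_identity / 100)
print(identity_by_hsp_length >= blast_identity / 100)
```

With these numbers, the first comparison prints False and the second prints True: the same HSP is rejected when divided by the hit length and accepted when divided by the HSP alignment length.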
