
Comments (4)

Cecilia-Sensalari commented on May 28, 2024

Hi,

There is a warning message at the beginning of that output that might help out:

INFO Checking if sequence data files exist and if sequence IDs are compatible with wgd pipeline...
WARNING Poa trivialis: sequence IDs in FASTA file [cds_final.fasta] could raise an error due to:
WARNING - ID length longer than 50 characters, it is advised to shorten them
WARNING - ID name contains one or more characters that are not allowed: =
INFO Completed

We noticed in the past that long sequence names or sequence names with unusual characters (like =) would become problematic. Therefore we suggest shortening the sequence names (while making sure that they remain unique per sequence) and removing any spaces and unusual characters (in this case the =).
Afterwards, delete the already generated BLAST table and MCL file and rerun the paralogs-ks command from scratch.
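
For anyone landing here with the same warning, the renaming step could look roughly like the sketch below. It is only an illustration, not part of ksrates: the function names are hypothetical, the 50-character limit is taken from the warning text, and the set of allowed characters (letters, digits, `_`, `.`, `-`) is an assumption, since the warning only names `=` as disallowed. Only the first whitespace-delimited token of each header is kept, and a numeric suffix is appended when shortening produces a duplicate.

```python
import re

MAX_ID_LEN = 50  # threshold quoted in the ksrates warning


def sanitize_id(raw_id, seen, max_len=MAX_ID_LEN):
    """Return a shortened, unique ID; the allowed character set is an assumption."""
    # Replace anything outside the assumed-safe set, then truncate.
    clean = re.sub(r"[^A-Za-z0-9_.-]", "_", raw_id)[:max_len]
    candidate, n = clean, 1
    # If truncation or replacement caused a clash, append a numeric suffix
    # while staying within the length limit.
    while candidate in seen:
        suffix = f"_{n}"
        candidate = clean[: max_len - len(suffix)] + suffix
        n += 1
    seen.add(candidate)
    return candidate


def sanitize_fasta(lines):
    """Yield FASTA lines with header IDs rewritten; sequence lines pass through."""
    seen = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            raw_id = parts[0] if parts else "seq"
            yield ">" + sanitize_id(raw_id, seen)
        else:
            yield line.rstrip("\n")
```

To apply it to a file, something like `open("cds_renamed.fasta", "w").write("\n".join(sanitize_fasta(open("cds_final.fasta"))) + "\n")` would do, where `cds_renamed.fasta` is a made-up output name; the config file then has to point at the renamed FASTA.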

I hope this can be of help, let me know how it goes!

Cecilia


caiobrunharo commented on May 28, 2024

Hi Cecilia

Thank you for your quick reply. I was able to edit the FASTA files to remove the equal signs and shorten the sequence IDs while keeping them unique. I am still getting the same error. Any other suggestions would be appreciated. Thank you!

(WGD) [cpb5881@p-sc-2001 wgd]$ ksrates paralogs-ks config_filename.txt --n-threads 16

INFO    - - - - - - - - - - - - - - - - - - - -
INFO    Paralog wgd analysis for species [poatr]
INFO    Thu Jun  2 10:43:48 2022
INFO    - - - - - - - - - - - - - - - - - - - -
INFO    Checking if sequence data files exist and if sequence IDs are compatible with wgd pipeline...
INFO    Completed
INFO    Creating directory [paralog_distributions/]
INFO    Running wgd paralog Ks pipeline...
INFO    ---
INFO    Checking external software...
INFO    makeblastdb: 2.5.0+
INFO    blastp: 2.5.0+
INFO    mcl 14-137
INFO    muscle 5.1.linux64 []
INFO    AAML in paml version 4.9j, February 2020
INFO    Usage for FastTree version 2.1.11 Double precision (No SSE3):
INFO    Creating output directory /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr
INFO    Translating CDS file cds_final_edited_edited_edited_edited.fasta...
INFO    ---
INFO    Running all versus all Blastp
INFO    Writing protein Blastdb sequences to /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/...
INFO    Writing protein query sequences to /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/...
INFO    Performing all versus all Blastp (this might take a while)...
INFO    Making Blastdb
INFO    makeblastdb -in /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.db.fasta -dbtype prot
INFO    makeblastdb output:
Building a new DB, current time: 06/02/2022 10:43:55
New DB name:   /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.db.fasta
New DB title:  /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.db.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 33248 sequences in 1.24499 seconds.
INFO    Running Blastp
INFO    blastp -db /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.db.fasta -query /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.query.fasta -evalue 1e-10 -outfmt 6 -num_threads 16 -out /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast.tsv
INFO    All versus all Blastp done
INFO    Removing tmp directory
INFO    ---
INFO    Running gene family construction (MCL clustering with inflation factor = 2.0)
INFO    Started MCL clustering (mcl)
INFO    ---
INFO    Running whole paranome Ks analysis...
WARNING Filtered out the 3 largest gene families because their size is > 200
WARNING If you want to analyse these large families anyhow, please raise the `max_gene_family_size` parameter
INFO    Started analysis of 5019 gene families in parallel using 16 threads
INFO    Performing analysis on gene family GF_000004 (size 165)
ERROR   Unexpected internal error during analysis of gene family GF_000004:
Traceback (most recent call last):
  File "/storage/home/cpb5881/.local/lib/python3.8/site-packages/wgd_ksrates/ks_distribution.py", line 278, in analyse_family_try_except
    analysis_function(family_id, family, nucleotide, tmp, codeml, preserve,
  File "/storage/home/cpb5881/.local/lib/python3.8/site-packages/wgd_ksrates/ks_distribution.py", line 371, in analyse_family
    msa_path, stats, successful = prepare_aln(msa_path_protein, nucleotide)
  File "/storage/home/cpb5881/.local/lib/python3.8/site-packages/wgd_ksrates/alignment.py", line 43, in prepare_aln
    with open(msa_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.ks_tmp/GF_000004.fasta.msa'
ERROR   Skipping gene family
INFO    Performing analysis on gene family GF_000005 (size 138)

... [same error for the other 54 gene families]


FileNotFoundError: [Errno 2] No such file or directory: '/storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.ks_tmp/GF_000054.fasta.msa'
ERROR   Skipping gene family
ERROR   Too many gene family analyses failed, terminating threads...
ERROR   Too many gene family analyses failed, terminating threads...
ERROR   Too many gene family analyses failed, terminating threads...
ERROR   Too many gene family analyses failed, terminating threads...
ERROR   Too many gene family analyses failed, terminating threads...
ERROR   Too many gene family analyses failed, terminating threads...
ERROR   --
ERROR   The analyses of more than 1% of gene families [51/5019] have failed due to unexpected internal errors
ERROR   Please check the nature of the error(s), remove the tmp directory [/storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.ks_tmp] and rerun the Ks analysis
ERROR   See the tracebacks above for the following gene family IDs:
ERROR   GF_000004
ERROR   GF_000005
ERROR   GF_000006
ERROR   GF_000007
ERROR   GF_000008
ERROR   GF_000009
ERROR   GF_000010
ERROR   GF_000011
ERROR   GF_000012
ERROR   GF_000013
ERROR   GF_000014
ERROR   GF_000015
ERROR   GF_000016
ERROR   GF_000017
ERROR   GF_000018
ERROR   GF_000019
ERROR   GF_000020
ERROR   GF_000021
ERROR   GF_000022
ERROR   GF_000023
ERROR   GF_000024
ERROR   GF_000025
ERROR   GF_000026
ERROR   GF_000027
ERROR   GF_000028
ERROR   GF_000029
ERROR   GF_000030
ERROR   GF_000031
ERROR   GF_000032
ERROR   GF_000033
ERROR   GF_000034
ERROR   GF_000035
ERROR   GF_000036
ERROR   GF_000037
ERROR   GF_000038
ERROR   GF_000039
ERROR   GF_000040
ERROR   GF_000041
ERROR   GF_000042
ERROR   GF_000043
ERROR   GF_000044
ERROR   GF_000045
ERROR   GF_000046
ERROR   GF_000047
ERROR   GF_000048
ERROR   GF_000049
ERROR   GF_000050
ERROR   GF_000051
ERROR   GF_000052
ERROR   GF_000053
ERROR   GF_000054
ERROR   Exiting


caiobrunharo commented on May 28, 2024

I think I figured out what the issue was (or at least what solved it). I downgraded muscle from 5.1 to 3.8, and this step now completes and a Ks file is generated.
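
In case it helps others check their setup before a long run, here is a small sketch that reads the major version out of a muscle version banner. The helper name is hypothetical. The two banner formats it handles are assumptions: `muscle 5.1.linux64 []` is taken from the log above, and `MUSCLE v3.8.31 by Robert C. Edgar` is what the 3.8 series is assumed to print.

```python
import re


def muscle_major_version(banner):
    """Extract the major version from a muscle version banner, or None if absent."""
    # Accepts both "muscle 5.1.linux64 []" and "MUSCLE v3.8.31 by ..." (assumed formats).
    match = re.search(r"muscle\s+v?(\d+)\.", banner, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None
```

The banner itself could be captured with `subprocess.run(["muscle", "-version"], capture_output=True, text=True).stdout`, assuming the installed binary accepts `-version`. A result other than 3 would suggest the install differs from the version that worked here.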


Cecilia-Sensalari commented on May 28, 2024

Hi,

Great that you could find the cause! Sorry about this versioning issue; we indeed developed and tested ksrates with muscle 3.8.31.

Cecilia
