
trycycler's Introduction

Trycycler

Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes. That is, if you have multiple long-read assemblies for the same isolate, Trycycler can combine them into a single assembly that is better than any of your inputs.

For installation instructions, usage, deeper explanations and more, head over to the Trycycler wiki!

For our paper describing Trycycler, follow this link: Wick RR, Judd LM, Cerdeira LT, Hawkey J, Méric G, Vezina B, Wyres KL, Holt KE. Trycycler: consensus long-read assemblies for bacterial genomes. Genome Biology. 2021. doi:10.1186/s13059-021-02483-z.

License GPL v3 DOI

trycycler's People

Contributors

rrwick


trycycler's Issues

Permission Denied [13]: minimap after pip install

Hello,

I have installed trycycler using pip, as it would not install via conda. I have also installed minimap2 and miniasm, and all requirements are in place. However, when I try to run the subsample step, I keep getting "Permission Denied [error 13]: minimap2", even with read/write permissions updated.

Is there a separate step in the installation I missed, or a part of the command line that needs to point to where minimap2 is installed? Any guidance would be appreciated.
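
Error 13 (EACCES) when launching a program usually concerns execute permission on the file found on the PATH, or a directory with the same name, rather than read/write permission. A minimal sketch of a check, using only the Python standard library; the helper name `diagnose_tool` is hypothetical and not part of Trycycler:

```python
import os
import shutil


def diagnose_tool(name, path=None):
    """Return a short description of how `name` resolves on the PATH."""
    # shutil.which only returns files the current user is allowed to execute.
    found = shutil.which(name, path=path)
    if found is not None:
        return f"{name}: executable found at {found}"
    # Otherwise look for a same-named entry that exists but cannot be executed.
    search = path if path is not None else os.environ.get("PATH", "")
    for directory in search.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isdir(candidate):
            return f"{name}: {candidate} is a directory, not a program"
        if os.path.isfile(candidate) and not os.access(candidate, os.X_OK):
            return f"{name}: {candidate} exists but lacks execute permission"
    return f"{name}: not found on PATH"


print(diagnose_tool("minimap2"))
```

If the report says the file lacks execute permission, adding the execute bit to that file would be the thing to try first.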

Maximum insertion/deletion sizes

Hi,

All my contigs have problems with the maximum insertion size.

Maximum insertion/deletion sizes:
A_contig_1: 0 417 481 304 300 310 846 691 598
B_utg000001l: 417 0 481 298 298 298 1647 298 598
C_Utg574: 481 481 0 481 481 481 1647 481 598
D_contig_2: 304 298 481 0 9 2 1647 20 598
E_utg000001l: 300 298 481 9 0 9 1647 20 598
G_contig_2: 310 298 481 2 9 0 1647 20 598
H_utg000001l: 846 1647 1647 1647 1647 1647 0 483 598
J_contig_2: 691 298 481 20 20 20 483 0 598
K_utg000001c: 598 598 598 598 598 598 598 598 0

I therefore just used --max_indel_size 1650 for trycycler reconcile.
Is it okay to do so? What is the problem with my sequences?

Thanks in advance.
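
One way to read a matrix like this is to ask which subsets of sequences agree closely with each other. The following is a small sketch, not part of Trycycler; the function names are hypothetical, and the matrix is copied from the output quoted above:

```python
MATRIX = """\
A_contig_1: 0 417 481 304 300 310 846 691 598
B_utg000001l: 417 0 481 298 298 298 1647 298 598
C_Utg574: 481 481 0 481 481 481 1647 481 598
D_contig_2: 304 298 481 0 9 2 1647 20 598
E_utg000001l: 300 298 481 9 0 9 1647 20 598
G_contig_2: 310 298 481 2 9 0 1647 20 598
H_utg000001l: 846 1647 1647 1647 1647 1647 0 483 598
J_contig_2: 691 298 481 20 20 20 483 0 598
K_utg000001c: 598 598 598 598 598 598 598 598 0
"""


def parse_matrix(text):
    """Parse 'name: v1 v2 ...' rows into a list of names and a list of int rows."""
    names, rows = [], []
    for line in text.strip().splitlines():
        name, values = line.split(":")
        names.append(name.strip())
        rows.append([int(v) for v in values.split()])
    return names, rows


def max_indel_within(names, rows, subset):
    """Largest pairwise indel among the sequences named in `subset`."""
    idx = [names.index(n) for n in subset]
    return max(rows[i][j] for i in idx for j in idx)


names, rows = parse_matrix(MATRIX)
print(max_indel_within(names, rows, names))
print(max_indel_within(names, rows, ["D_contig_2", "E_utg000001l", "G_contig_2", "J_contig_2"]))
```

For this matrix the first call reports 1647 and the second reports 20, which suggests that a few sequences account for the large values. Whether to exclude them or raise the threshold remains a judgment call.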

Assemblers

Hi, in your wiki, under the section on assembling the subsets generated by Trycycler, you mention Flye, Raven, Canu, etc. Is there any reason not to use Unicycler for my long-read Nanopore data? Thanks.

0.5.0 not in conda?

When can we expect the latest 0.5.0 release to be available via conda?

thanx :-)

Chloroplast genome assembly

Hi,

I am wondering if Trycycler can be used for another kind of circular genome assembly (chloroplast) with nanopore reads?

Chloroplasts generally exist with two haplotypes in the same cell. (Same size around 150 kb with some structural variations. For example, one haplotype has an inversion around 30 kb).

Do you think it is possible to assemble two haplotypes at the same time using Trycycler?

Best Regards,
Mehmet

gfa file to visualize the final consensus fasta by Bandage

Hi Ryan,
I am using Trycycler to assemble 90 bacterial genomes from fish guts. It is an amazing and very helpful tool. I was able to obtain circular genomes with > 98 % completeness and very good coverage in all cases after following the pipeline and polishing the assemblies with medaka. However, I'm wondering if there is an easy and proper way to obtain a GFA-like file to visualize the genomes in Bandage. I would like to plot the circular chromosomes and plasmids, as well as the linear ones, in a Bandage-like figure. I've been looking for an available script in the Trycycler pipeline to do this, but so far I haven't found any.

Hope you can help me.

Thank you in advance.

Arturo.
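
For illustration, a GFA 1 file can be written by hand from a FASTA file: each contig becomes an `S` (segment) line, and a circular contig gets an `L` (link) line joining its end back to its own start. The sketch below is not a Trycycler script. The function name `fasta_to_gfa` is hypothetical, and which contigs are circular is an input the user must supply:

```python
def fasta_to_gfa(fasta_text, circular_names):
    """Convert FASTA text to GFA 1 text; names in `circular_names` get a self-link."""
    records, name, chunks = [], None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first word of the header as the segment name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))

    out = ["H\tVN:Z:1.0"]
    for seq_name, seq in records:
        out.append(f"S\t{seq_name}\t{seq}")
    for seq_name, _ in records:
        if seq_name in circular_names:
            # A link from the end of a segment to its own start marks it as circular.
            out.append(f"L\t{seq_name}\t+\t{seq_name}\t+\t0M")
    return "\n".join(out) + "\n"


example = ">chromosome\nACGTACGT\n>linear_plasmid\nGGCC\n"
print(fasta_to_gfa(example, {"chromosome"}), end="")
```

Saving the returned text to a `.gfa` file should give something a graph viewer can load, with self-linked segments drawn as loops.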

How to automatically deal with circularization problems in the **trycycler reconcile** step

Hi,
After the assembly and cluster steps, during the reconcile process I got the following error:

Error: failed to circularise sequence K_utg000001l because its end could not be found in other
sequences. You can either trim some sequence off the end of K_utg000001l or exclude the sequence
altogether and try again.

Then the whole workflow got stuck. How can I continue the process with some automatic operations?

Thanks
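
A wrapper script would first need to know which contig the error refers to. Below is a minimal sketch that extracts the name from the message text. It assumes the wording quoted above, which may differ between Trycycler versions; the function name is hypothetical. Whether excluding a contig automatically is appropriate is a separate question that this sketch does not answer.

```python
import re

# Wording taken from the error message quoted in this report (an assumption
# about its stability across versions).
PATTERN = re.compile(r"failed to circularise sequence (\S+)")


def failing_contig(message):
    """Return the contig name mentioned in a circularisation error, or None."""
    # The message is wrapped over several lines, so collapse whitespace first.
    flat = " ".join(message.split())
    match = PATTERN.search(flat)
    return match.group(1) if match else None


message = """Error: failed to circularise sequence K_utg000001l because its end could not be found in other
sequences. You can either trim some sequence off the end of K_utg000001l or exclude the sequence
altogether and try again."""
print(failing_contig(message))
```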

Assembly qscore plots

Hi,
I came across the assembly qscore vs read depth plot, and the accuracy worst 100 bp plot.
I would like to make similar plots for my data. Would you be so kind to share the scripts?

Also, speaking of assembly qscores: The MISAG/MIMAG genome standards require an assembly to have a qscore of ≥50. This seems very high, as the Flye assembly from the plot at 400x coverage only gets up to a qscore of ~46. What's your take on this requirement? Also, it's not clear what they mean, i.e. is it the mean qscore over the whole assembly?
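
To put the numbers in context, a Phred-style quality score relates to the error rate by Q = -10 log10(error rate), so Q50 corresponds to one error per 100,000 bases. The sketch below assumes the qscore describes the whole assembly, which is one possible reading of the standard. The 5 Mbp genome length is a hypothetical example value:

```python
import math


def qscore_from_error_rate(error_rate):
    """Phred-style quality score for a per-base error rate."""
    return -10.0 * math.log10(error_rate)


def expected_errors(genome_length, qscore):
    """Expected number of erroneous bases in an assembly of the given length."""
    return genome_length * 10 ** (-qscore / 10)


genome_length = 5_000_000  # hypothetical bacterial genome size
for q in (46, 50):
    print(f"Q{q}: about {expected_errors(genome_length, q):.0f} errors in {genome_length:,} bp")
```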

Zero coverage at clustering

Thanks @rrwick for developing this tool, and for making the wiki so clear and easy to follow!

I'm not sure this is really a trycycler issue but wanted your thoughts. I've done long read subsampling and assembly as suggested in the wiki - 12 read sets, assembled with flye, miniasm+minipolish, raven, redbean.

When I come to the clustering step it runs fine but I notice there are 2-3 contigs in the redbean assemblies with coverage of zero. I've given trycycler the complete read set, which I previously subsampled to generate the redbean assemblies. Apart from using the wrong read sets I can't work out how I could get zero coverage.

Do you think this might be an error during the redbean assembly, or during the clustering? Or something else?

version.py of 0.5.2 is still 0.5.1

I think perhaps you forgot the version bump before releasing 0.5.2. Took me a couple of minutes to figure out why version 0.5.1 was still on my path after installing 0.5.2. :-)

Using mafft instead of muscle for MSA step?

I am using Trycycler v0.5.3 and have been running into trouble on the msa step on a few of my assemblies. Using the default msa settings and regardless of whether I use muscle v3 or v5, I will consistently get an error that muscle couldn't finish on one or more segments. Since the temporary files are automatically deleted, I can't troubleshoot this any further to determine where muscle is having problems. I've tried removing all contigs in the cluster that required addition or trimming for circularization and also those with more than a few 100 bp of indels and still couldn't get it to finish. I see from prior Issues that others have had this same or similar problem on this step, but it's not clear if a solution was available.

So I tried taking another approach and used mafft (v7.475) to directly align the 2_all_seqs.fasta file without any partitioning. This only took 3 hours to align the seven 6.9 Mbp contigs in the file using 12 threads. The consensus step seems to have run just fine on the mafft-produced 3_msa.fasta file:

chunks: 7,601 (3,801 same, 3,800 different)
combining small chunks: 6,549 (3,275 same, 3,274 different)

...

Consensus length: 6,924,934 bp

Different chunks needing assessment:     5
Different chunks not needing assessment: 3,269

...

Chunks where sequence is...
  the same as in the initial consensus: 2
  different to the initial consensus:   3

So with the caveat that you probably have not extensively (or maybe ever) tested trycycler using mafft without sequence partitioning and muscle for the multiple sequence alignment step, can you think of any reason why this approach wouldn't be acceptable when trycycler msa fails for unclear reasons?

Thanks.
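
One sanity check that can be run on an externally produced alignment is that every row has the same length and that removing the gap characters from each row gives back the corresponding input sequence. The sketch below is not part of Trycycler and its function names are hypothetical. It compares case-insensitively because MAFFT writes lowercase output unless `--preservecase` is given; whether later steps care about case is not checked here.

```python
def read_fasta(text):
    """Return a dict of name -> sequence from FASTA text."""
    parts, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif line and name is not None:
            parts[name].append(line)
    return {n: "".join(p) for n, p in parts.items()}


def check_msa(unaligned, aligned):
    """Return a list of problems; an empty list means the MSA looks consistent."""
    problems = []
    if len({len(row) for row in aligned.values()}) > 1:
        problems.append("aligned rows differ in length")
    for name, seq in unaligned.items():
        row = aligned.get(name)
        if row is None:
            problems.append(f"{name} is missing from the MSA")
        elif row.replace("-", "").upper() != seq.upper():
            problems.append(f"{name} differs from its input after removing gaps")
    return problems


inputs = read_fasta(">a\nACGT\n>b\nAGT\n")
msa = read_fasta(">a\nacgt\n>b\na-gt\n")
print(check_msa(inputs, msa))
```

In practice the two dictionaries would come from reading 2_all_seqs.fasta and the MAFFT-produced 3_msa.fasta.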

Problem with mamba/conda install

I got the following error when I tried to install Trycycler with mamba (another Anaconda package manager):

`Looking for: ['trycycler']

bioconda/linux-64 [====================] (00m:00s) No change
bioconda/noarch [====================] (00m:00s) No change
anaconda/pkgs/main/linux [====================] (00m:00s) No change
anaconda/pkgs/r/linux-64 [====================] (00m:00s) No change
anaconda/pkgs/main/noarc [====================] (00m:00s) No change
anaconda/pkgs/r/noarch [====================] (00m:00s) No change
anaconda/pkgs/msys2/linu [====================] (00m:00s) No change
anaconda/pkgs/msys2/noar [====================] (00m:00s) No change
Encountered problems while solving.
Problem: nothing provides icu 54.* needed by r-base-3.4.1-0`

And I got the following error when I tried to install Trycycler with conda:
`Found conflicts! Looking for incompatible packages. failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions`

Small-scale errors after polishing

Dear Ryan,

Thanks for developing this great tool.
My problem is not coming from Trycycler but probably from the step 7 "Polishing after Trycycler" so apologies for bothering you with this.
I have followed your walk-through for step 7 "Polishing after Trycycler" exactly, until no changes are reported. However, in the handful of genomes assembled so far with Trycycler, I have noticed that the polished bacterial genomes still have a lot of small-scale errors. This results in disrupted CDS, i.e. premature stop codons.
Do you have any suggestions on how to improve this? For example, should a more stringent QC of the Illumina reads be performed with fastp? Is there any parameter that could be tweaked? Any tips here would be really appreciated!
Many thanks in advance

no output after racon initial polishing round

Hello,
I'm using trycycler to assemble a bacterial genome with PacBio data. To generate the assembly, I modified the miniasm_and_minipolish.sh script to address the read type (minimap2 -x ava-pb -t "$2" "$1" "$1" > "$overlaps" and minipolish --threads "$2" "$1" "$unpolished_assembly" --pacbio). Unfortunately, I get no output from the Racon initial polishing round. The Flye and Raven assemblies appear just fine.
I hope you can help.
Best,
Fety

bash miniasm_and_minipolish.sh read_subsets/sample_02.fastq "$threads" > assembly_02.gfa && any2fasta assembly_02.gfa > assemblies/assembly_02.fasta
[M::mm_idx_gen::8.696*1.56] collected minimizers
[M::mm_idx_gen::9.465*2.42] sorted minimizers
[M::main::9.466*2.42] loaded/built the index for 22400 target sequence(s)
[M::mm_mapopt_update::10.049*2.34] mid_occ = 76
[M::mm_idx_stat] kmer size: 19; skip: 5; is_hpc: 1; #seq: 22400
[M::mm_idx_stat::10.372*2.30] distinct minimizers: 24073649 (78.86% are singletons); average occurrences: 1.968; average spacing: 4.304; total length: 203856207
[M::worker_pipeline::31.269*9.23] mapped 22400 sequences
[M::main] Version: 2.23-r1111
[M::main] CMD: minimap2 -x ava-pb -t 16 read_subsets/sample_02.fastq read_subsets/sample_02.fastq
[M::main] Real time: 31.380 sec; CPU: 288.693 sec; Peak RSS: 2.097 GB
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::2.686*1.00] read 2182721 hits; stored 3220936 hits and 22143 sequences (201659622 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::3.564*1.00] 21921 query sequences remain after sub
[M::ma_hit_cut::3.638*1.00] 3149337 hits remain after cut
[M::ma_hit_flt::3.706*1.00] 3050186 hits remain after filtering; crude coverage after filtering: 83.75
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::3.997*1.00] 21841 query sequences remain after sub
[M::ma_hit_cut::4.068*1.00] 3045274 hits remain after cut
[M::ma_hit_contained::4.147*1.00] 1038 sequences and 13666 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 12995 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 10508 arcs
[M::asg_arc_del_multi] removed 36 multi-arcs
[M::asg_arc_del_asymm] removed 177 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 56 tips
[M::asg_pop_bubble] popped 38 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 10 asymmetric arcs
[M::asg_arc_del_short] removed 32 short overlaps
[M::asg_cut_tip] cut 5 tips
[M::asg_pop_bubble] popped 7 bubbles and trimmed 0 tips
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 1 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -f read_subsets/sample_02.fastq /tmp/tmp.x7U2Kr1O49.paf
[M::main] Real time: 4.875 sec; CPU: 4.876 sec

Checking requirements
    Minipolish requires Minimap2 and Racon to run, so it checks for these tools now.

Minimap2 found: /home/fiestaj/anaconda3/envs/trycycler/bin/minimap2 (v2.23-r1111)
Racon found:    /usr/local/bin/racon (v1.4.12)


Loading graph
    Loading the miniasm GFA graph into memory.

/tmp/tmp.wWrFV6CYBO.gfa
  4 segments (1,946,342 bp)
  8 links


Initial polishing round
    The first round of polishing is done on a per-segment basis and only uses reads
which are definitely associated with the segment (because the GFA indicated that they
were used to make the segment).

Running Racon on utg000001c:
  reads:      /tmp/tmp5vs645_s/utg000001c_reads.fastq (853 reads)
  input:      /tmp/tmp5vs645_s/utg000001c.fasta (1,923,351 bp)
  alignments: /tmp/tmp5vs645_s/utg000001c.paf (885 alignments)
  output:     /tmp/tmp5vs645_s/utg000001c_polished.fasta (0 bp)

Removing empty segment: utg000001c

Running Racon on utg000002c:
  reads:      /tmp/tmp5vs645_s/utg000002c_reads.fastq (3 reads)
  input:      /tmp/tmp5vs645_s/utg000002c.fasta (7,167 bp)
  alignments: /tmp/tmp5vs645_s/utg000002c.paf (18 alignments)
  output:     /tmp/tmp5vs645_s/utg000002c_polished.fasta (0 bp)

Removing empty segment: utg000002c

Running Racon on utg000003c:
  reads:      /tmp/tmp5vs645_s/utg000003c_reads.fastq (2 reads)
  input:      /tmp/tmp5vs645_s/utg000003c.fasta (12,955 bp)
  alignments: /tmp/tmp5vs645_s/utg000003c.paf (19 alignments)
  output:     /tmp/tmp5vs645_s/utg000003c_polished.fasta (0 bp)

Removing empty segment: utg000003c

Running Racon on utg000004c:
  reads:      /tmp/tmp5vs645_s/utg000004c_reads.fastq (2 reads)
  input:      /tmp/tmp5vs645_s/utg000004c.fasta (2,869 bp)
  alignments: /tmp/tmp5vs645_s/utg000004c.paf (33 alignments)
  output:     /tmp/tmp5vs645_s/utg000004c_polished.fasta (0 bp)

Removing empty segment: utg000004c



Full polishing rounds
    The assembly graph is now polished using all of the reads. Multiple rounds of
polishing are done, and circular contigs are rotated between rounds.


Running Racon on round_1:
  reads:      read_subsets/sample_02.fastq (22,400 reads)
  input:      /tmp/tmp5vs645_s/round_1.fasta (0 bp)
  alignments: /tmp/tmp5vs645_s/round_1.paf (0 alignments)
  output:     /tmp/tmp5vs645_s/round_1_polished.fasta (0 bp)


Running Racon on round_2:
  reads:      read_subsets/sample_02.fastq (22,400 reads)
  input:      /tmp/tmp5vs645_s/round_2.fasta (0 bp)
  alignments: /tmp/tmp5vs645_s/round_2.paf (0 alignments)
  output:     /tmp/tmp5vs645_s/round_2_polished.fasta (0 bp)


Assign read depths
    The reads are aligned to the contigs one final time to calculate read depth
values.

Aligning reads:
  reads:      read_subsets/sample_02.fastq (22,400 reads)
  contigs:    /tmp/tmp5vs645_s/depths.fasta (0 bp)
  alignments: /tmp/tmp5vs645_s/depths.paf (0 alignments)
  mean depth: 0.000x

This is any2fasta 0.4.2
Opening 'assembly_02.gfa'
ERROR: The input appears to be empty

python error during merging MSA

Hi Ryan,

First of all, kudos for developing Trycycler. I love how it allows me to assemble genomes without making arbitrary calls when comparing the output of different long read assembly pipelines.

I wanted to report a python error I encountered while running trycycler msa. The below error gets printed to screen and no 3_msa.fasta is written.

Merging MSA (2020-09-28 17:51:51)
    Each of the MSA pieces are now merged together and saved to file.

MSA length: 4,751,608 bp
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/trycycler", line 33, in <module>
    sys.exit(load_entry_point('Trycycler==0.3.0', 'console_scripts', 'trycycler')())
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/trycycler/__main__.py", line 40, in main
    msa(args)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/trycycler/msa.py", line 35, in msa
    merge_pieces(temp_dir, args.cluster_dir, seqs)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/trycycler/msa.py", line 168, in merge_pieces
    assert seqs[n] == msa_minus_dashes
AssertionError

I have previously used trycycler to successfully assemble two other isolates on this system. The error only occurs with this specific isolate.

I uploaded the trycycler reconciled contigs here in case you would like to have a look:
https://koenvdl.stackstorage.com/s/OUkaDUwU2xb7M7Q4

Cheers

Short and long reads come from different samples

Hi,

I have already asked this question in the Unicycler repository, sorry for that.

What would be the best approach for assembling the genome when the short and long reads come from different samples (same strain but different DNA isolations)?

Thanks in advance :)

muscle command in msa.py missing -threads argument

The lines in question are:

muscle_command = ['muscle', '-in', input_filename, '-out', output_filename]

muscle_command = ['muscle', '-align', input_filename, '-output', output_filename]

Since the multiprocessing module is already controlling threads here, and muscle defaults to using all cores (up to 20), if you specify trycycler msa --threads 16 it will try to use num_cores*16 threads instead of just 16, yes?
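
The concern can be made concrete with a little arithmetic and a possible command construction. The sketch below is an illustration, not the code in msa.py. The two command forms are the ones quoted above; the `-threads` option for MUSCLE v5 is assumed from the issue title, and the function names and the `major_version` parameter are hypothetical.

```python
def muscle_command(major_version, input_filename, output_filename, threads=1):
    """Build a MUSCLE command list, pinning the thread count for v5 (assumed option)."""
    if major_version >= 5:
        return ['muscle', '-align', input_filename, '-output', output_filename,
                '-threads', str(threads)]
    # The older command form quoted above; no thread option is added here.
    return ['muscle', '-in', input_filename, '-out', output_filename]


def total_threads(worker_processes, threads_per_muscle):
    """Threads in use when each worker process runs its own MUSCLE instance."""
    return worker_processes * threads_per_muscle


# 16 workers, each running a MUSCLE that grabs up to 20 cores by default:
print(total_threads(16, 20))
# 16 workers, each MUSCLE pinned to a single thread:
print(total_threads(16, 1))
print(muscle_command(5, 'piece.fasta', 'piece_msa.fasta'))
```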

FileNotFoundError during `trycycler cluster`

For some reason I get FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpz1qeqdno/A_assemblies/canu_0_pos.fasta' when running trycycler cluster during the distance matrix part. It seems like the temp directory is being made but not the A_assemblies directory inside that.

I think I am using the latest version of Trycycler (that is to say, I python3 setup.py install'd in a directory called Trycycler-0.4.2 but the version.py in that is still 0.4.1)

I was able to Trycycle a different set of assemblies so it might be something on my end. I can't share the sequences unfortunately but I can try to see if I can get a reproducible example going.

Building distance matrix (2021-01-20 19:08:34)
    Mash is used to build a distance matrix of all contigs in the assemblies.

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/trycycler", line 11, in <module>
    load_entry_point('Trycycler==0.4.1', 'console_scripts', 'trycycler')()
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/__main__.py", line 40, in main
    cluster(args)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/cluster.py", line 41, in cluster
    matrix = distance_matrix(seqs, seq_names, args.distance)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/cluster.py", line 232, in distance_matrix
    mash_matrix = get_mash_dist_matrix(seq_names, seqs, distance, indent=False)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/mash.py", line 28, in get_mash_dist_matrix
    pos_sketches, neg_sketches = make_mash_sketches(seq_names, seqs, temp_dir)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/mash.py", line 63, in make_mash_sketches
    write_seq_to_fasta(seq_pos, seq_name, fasta_pos)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/misc.py", line 155, in write_seq_to_fasta
    with open(filename, 'wt') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp6f7zakqj/A_assemblies/canu_0_pos.fasta'
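
The path in the traceback hints at one possible cause, which is only a guess: if a sequence name contains a slash (here it may be `A_assemblies/canu_0`), using it directly as a file name points into a subdirectory that was never created. A minimal sketch of sanitising names before building temporary file paths follows. The helper names are hypothetical and this is not Trycycler's actual code.

```python
import pathlib
import tempfile


def safe_filename(seq_name):
    """Replace path separators so a sequence name cannot point into a subdirectory."""
    return seq_name.replace("/", "_").replace("\\", "_")


def write_pos_fasta(seq, seq_name, directory):
    """Write one sequence to '<safe name>_pos.fasta' inside `directory`."""
    path = pathlib.Path(directory) / (safe_filename(seq_name) + "_pos.fasta")
    with open(path, "wt") as f:
        f.write(f">{seq_name}\n{seq}\n")
    return path


with tempfile.TemporaryDirectory() as temp_dir:
    written = write_pos_fasta("ACGT", "A_assemblies/canu_0", temp_dir)
    print(written.name)
```

If this guess is right, renaming the contigs in the input FASTA files so that their headers contain no slashes may also avoid the error.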

Version mismatch

There is a mismatch between version numbers in the releases (latest 0.3.2) and the program (latest version.py has 0.3.0).
I think version.py should be updated, so that pip reports the correct version during installation.

trycycler reconcile could continue evaluating contigs after one fails

Hi there,

first of all, trycycler seems a great tool, thanks for this! Disclaimer: I am using it the first time.

What I found extremely annoying is that trycycler reconcile stops after the first contig that doesn't meet the requirements.

For example, I run trycycler reconcile and it stops, complaining:

#Error: failed to circularise sequence A_tig00000003 for multiple reasons. You
#must either repair this sequence or exclude it and then try running trycycler
#reconcile again.

That is alright, I remove it and try again; however, I get the same problem with the second contig. I get skeptical, but try a third round after removing the second contig. The third contig also cannot be circularized, again an error. I realize that this genome part might be linear (a quick literature search confirms that this might be true), re-add all contigs, add the appropriate option (--linear), run it a fourth time and it passes. However:

#Error: some pairwise identities are below the minimum allowed value of 98.0%.
#Please remove offending sequences or lower the --min_identity threshold and try
#again.

Alright, so I remove those bad contigs, and restart it 5th time, but

#Error: some pairwise indels are greater than the maximum allowed value of 1000.
#Please remove offending sequences or raise the --max_indel_size threshold and
#try again.

Again, I remove those contigs and restart 6th time.

Essentially, I am just wondering whether it wouldn't be more effective to have trycycler reconcile continue after it encounters the first "error" and stop only after reporting all such errors in one run. I could have seen immediately that none of the contigs are circular and used --linear instead of running it 3 times. I could have immediately removed contigs with bad pairwise identities and pairwise indels.

Maybe circularisation is required to calculate identities and indels, so it has to stop when circularisation fails, but at least it could report all contigs that fail to circularise? And maybe pairwise identities and pairwise indels could be another block that fails in one go? That would have left me with 3 instead of 6 runs, much better imho.

Best,
Daniel
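
The requested behaviour amounts to collecting every failure within a block of checks before stopping, instead of exiting at the first one. Here is a generic sketch of that pattern. The check functions are hypothetical stand-ins and do not reproduce Trycycler's circularisation, identity or indel logic.

```python
def run_checks(contigs, checks):
    """Apply every check to every contig and return all failures as (name, message) pairs."""
    failures = []
    for name, seq in contigs.items():
        for check in checks:
            message = check(name, seq)
            if message is not None:
                failures.append((name, message))
    return failures


def min_length_check(min_length):
    """Hypothetical stand-in check: flag contigs shorter than `min_length`."""
    def check(name, seq):
        if len(seq) < min_length:
            return f"shorter than {min_length} bp"
        return None
    return check


def no_n_check(name, seq):
    """Hypothetical stand-in check: flag contigs containing ambiguous bases."""
    return "contains N bases" if "N" in seq.upper() else None


contigs = {"A_tig1": "ACGTACGT", "B_tig1": "ACN", "C_tig1": "AC"}
for name, message in run_checks(contigs, [min_length_check(3), no_n_check]):
    print(f"{name}: {message}")
```

With this structure the caller can print the full list and exit once, so a user sees every offending contig in a single run.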

Challenges recreating demo datasets

I am hoping to get more insight into how you created your assemblies for the demo read sets. I am trying to replicate the contig clusters you generated for these datasets using assemblies generated from Flye, Miniasm+Minipolish, and Raven but am running into issues with the plasmid assembly in each case. My general process is shown below:

Step 1. Process reads with Filtlong
filtlong --min_length 1000 --keep_percent 90 reads.fastq.gz | gzip > filtered_reads.fastq.gz

Step 2. Create read subsets
trycycler subsample --reads filtered_reads.fastq.gz --out_dir subsets --count 12

Step 3. Create sub-assemblies using Flye, Miniasm+Minipolish, and Raven
# run Flye for read subsets sample_01.fastq to sample_04.fastq
flye --nano-raw subsets/sample_01.fastq --out-dir flye/sample_01

# run Miniasm+Minipolish for read subsets sample_05 to sample_08
minimap2 -x ava-ont subsets/sample_05.fastq subsets/sample_05.fastq > miniasm/sample_05/overlaps.paf
miniasm -f subsets/sample_05.fastq miniasm/sample_05/overlaps.paf > miniasm/sample_05/assembly.gfa
minipolish subsets/sample_05.fastq miniasm/sample_05/assembly.gfa

# run Raven for read subsets sample_09 to sample_12
raven subsets/sample_09.fastq > assemblies/sample_09.fasta

Step 4. Cluster contigs
trycycler cluster --assemblies assemblies/*.fasta --reads filtered_reads.fastq --out_dir clusters
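
The assignment of read subsets to assemblers described in Step 3 can be written out explicitly, which makes it easier to spot pipeline mistakes such as extra assemblies from one tool. The sketch below only builds the command strings quoted above and does not run them; the function names are hypothetical.

```python
def assembler_for(sample_number):
    """Map subset numbers 1-12 to the assembler used for them in Step 3."""
    if 1 <= sample_number <= 4:
        return "flye"
    if 5 <= sample_number <= 8:
        return "miniasm"
    if 9 <= sample_number <= 12:
        return "raven"
    raise ValueError(f"unexpected subset number: {sample_number}")


def commands_for(sample_number):
    """Return the shell commands (as strings) for one subset, copied from Step 3."""
    s = f"sample_{sample_number:02d}"
    reads = f"subsets/{s}.fastq"
    tool = assembler_for(sample_number)
    if tool == "flye":
        return [f"flye --nano-raw {reads} --out-dir flye/{s}"]
    if tool == "miniasm":
        return [
            f"minimap2 -x ava-ont {reads} {reads} > miniasm/{s}/overlaps.paf",
            f"miniasm -f {reads} miniasm/{s}/overlaps.paf > miniasm/{s}/assembly.gfa",
            f"minipolish {reads} miniasm/{s}/assembly.gfa",
        ]
    return [f"raven {reads} > assemblies/{s}.fasta"]


for n in range(1, 13):
    for command in commands_for(n):
        print(command)
```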

Below are the trees from the great, good, and mediocre demo datasets:
Note that there are additional Raven assemblies included in the figures due to an error in my pipeline.

Great Dataset
great_dataset_tree

Good Dataset
good_dataset_tree

Mediocre Dataset
mediocre_dataset_tree

In each case, I am having issues resolving plasmids (though the great dataset is passable). This contrasts with your examples for the same datasets. Do you have any ideas why my results might be different? For what it is worth, I did try running this pipeline without Filtlong and it did not solve the issue. Any help is appreciated!

Docker request

Hello,
May I ask you to create a Docker image for your brilliant pipeline?

Thank you.

MUSCLE version compatibility issue?

Hi there!

First of all, thank you for all the work you put into making Trycycler and all your other tools as easy to use as possible.

I just wanted to let you know that I ran into an issue trying to use Trycycler with the latest version of MUSCLE. The default version on my lab's cluster is version 5.1, and it seems to mess up the trycycler msa step. I kept getting the following error:

Error: MUSCLE failed to complete on 4988 of the 4988 pieces. Please remove the most divergent sequences from this cluster and then try again.

I solved it by loading MUSCLE v.3.8.31 instead.

Best,
Flora

AssertionError during Reconcile of plasmid sequence

Dear Trycycler Team

I have been assembling a big dataset of bacterial genomes with Trycycler and I have just encountered the following problem with one of my plasmid clusters. The sequences are circular and are processed normally until the known starting sequences are searched for. No matter which sequences I include or remove, the error persists. This error did not occur with any other plasmid cluster before, including others from the same dataset.

`Finding starting sequence (2022-07-08 10:39:36)
In this step, Trycycler finds a sequence to use as a starting point for each of the contigs. This can be a standard starting point
(e.g. the dnaA gene) or if one is not found, then a randomly-chosen unique sequence will be used. If necessary, the sequences will be
flipped (converted to their reverse complement sequence) to ensure that the starting sequence is on the positive strand.

Looking for known starting sequences in each contig...

Found starting sequence 0676_AWB10_RS27370 (replication protein RepA4)
ATGAGAGAGCTTCTGTGCGCGGTCGGAGTGGTCCCGACGAGGGTTTACCC...

A_contig_7: Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/trycycler", line 8, in <module>
    sys.exit(main())
  File "/home/thor/.local/lib/python3.9/site-packages/trycycler/__main__.py", line 48, in main
    reconcile(args)
  File "/home/thor/.local/lib/python3.9/site-packages/trycycler/reconcile.py", line 37, in reconcile
    seqs, starting_seq = get_starting_seq(seqs, args.threads)
  File "/home/thor/.local/lib/python3.9/site-packages/trycycler/starting_seq.py", line 68, in get_starting_seq
    seqs = flip_seqs_as_necessary(seqs, starting_seq)
  File "/home/thor/.local/lib/python3.9/site-packages/trycycler/starting_seq.py", line 43, in flip_seqs_as_necessary
    assert len(alignments) == 1
AssertionError
`

Thank you in advance.
Regards Nick Vereecke

Running trycycler cluster under python3.7

Hi, I’m @AmyNjaaye and I am currently testing trycycler cluster. I encounter the following error at the Building distance matrix step:

Traceback (most recent call last):
  File "/usr/bin/trycycler", line 11, in <module>
    load_entry_point('Trycycler==0.5.3', 'console_scripts', 'trycycler')()
  File "/usr/lib/python3/dist-packages/trycycler/__main__.py", line 41, in main
    cluster(args)
  File "/usr/lib/python3/dist-packages/trycycler/cluster.py", line 41, in cluster
    matrix = distance_matrix(seqs, seq_names, args.distance)
  File "/usr/lib/python3/dist-packages/trycycler/cluster.py", line 234, in distance_matrix
    mash_matrix = get_mash_dist_matrix(seq_names, seqs, distance, indent=False)
  File "/usr/lib/python3/dist-packages/trycycler/mash.py", line 28, in get_mash_dist_matrix
    pos_sketches, neg_sketches = make_mash_sketches(seq_names, seqs, temp_dir)
  File "/usr/lib/python3/dist-packages/trycycler/mash.py", line 69, in make_mash_sketches
    str(fasta_pos)], stderr=dev_null)
  File "/usr/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['mash', 'sketch', '-n', '-o', '/tmp/tmpgmibvfqh/A_contig_1_pos.msh', '/tmp/tmpgmibvfqh/A_contig_1_pos.fasta']' died with <Signals.SIGSEGV: 11>.

It seems to be related to the Python version.
I used the great dataset available on the wiki for testing.
Any idea about that error?
Thank you for your response.

Aminata
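
When `subprocess` reports that a command died with a signal, the crash happened inside the external program (here `mash`) and Python is only relaying it, so the Python version may not be the cause. Below is a small sketch of how such a failure can be interpreted; the helper name `describe_failure` is hypothetical:

```python
import signal
import subprocess


def describe_failure(error):
    """Explain a CalledProcessError: a negative return code means death by signal."""
    cmd = error.cmd
    program = cmd.split()[0] if isinstance(cmd, str) else cmd[0]
    if error.returncode < 0:
        name = signal.Signals(-error.returncode).name
        return f"{program} was killed by {name}"
    return f"{program} exited with status {error.returncode}"


# Return code -11 corresponds to the SIGSEGV shown in the traceback above.
error = subprocess.CalledProcessError(-11, ['mash', 'sketch', '-n'])
print(describe_failure(error))
```

Running the `mash sketch` command from the traceback directly in a shell, on any FASTA file, would show whether the mash binary crashes on its own.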

Assertion Error during MSA

Hi,

First just wanted to say thanks for creating such a valuable tool.

I am currently experiencing an issue when running the MSA step of the workflow. I am getting this:

MSA length: 1,756,900 bp
Traceback (most recent call last):
  File "/home/djones2/anaconda3/envs/trycycler/bin/trycycler", line 8, in <module>
    sys.exit(main())
  File "/home/djones2/anaconda3/envs/trycycler/lib/python3.8/site-packages/trycycler/__main__.py", line 46, in main
    msa(args)
  File "/home/djones2/anaconda3/envs/trycycler/lib/python3.8/site-packages/trycycler/msa.py", line 36, in msa
    merge_pieces(temp_dir, args.cluster_dir, seqs)
  File "/home/djones2/anaconda3/envs/trycycler/lib/python3.8/site-packages/trycycler/msa.py", line 187, in merge_pieces
    assert seqs[n] == msa_minus_dashes
AssertionError

Is there a fix for this? Has anyone else run into this issue? Thanks in advance!

glibC incompatibility causing a failure in installation from conda

Hi,

I failed to install Trycycler v0.5.3 on my ubuntu laptop due to the glibC incompatibility:

conda install -c conda-forge -c bioconda trycycler
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: | 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                                    

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.27=0
  - feature:|@/linux-64::__glibc==2.27=0

Your installed version is: 2.27

I circumvented this problem using pip:

pip3 install git+https://github.com/rrwick/Trycycler.git

Is there a way to fix the conda installation?

Thanks


My system configuration:

  • Ubuntu 18.04.6 LTS 64 bit (kernel: Linux version 5.4.0-91-generic)
  • gcc version 7.5.0
  • conda 4.10.3

ValueError when creating the Newick file

Hello Ryan, thank you for creating this awesome tool!

I installed Trycycler version 0.5.0 through conda on the HPC cluster we are working on here. I am testing the workflow now, and I encountered this error when I ran trycycler cluster:
(I have shortened the paths to make it a bit easier to read)

Traceback (most recent call last):
  File "src/miniconda/envs/trycycler/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "src/miniconda/envs/trycycler/lib/python3.9/site-packages/trycycler/__main__.py", line 41, in main
    cluster(args)
  File "src/miniconda/envs/trycycler/lib/python3.9/site-packages/trycycler/cluster.py", line 43, in cluster
    build_tree(seq_names, seqs, depths, matrix, args.out_dir, cluster_numbers)
  File "src/miniconda/envs/trycycler/lib/python3.9/site-packages/trycycler/cluster.py", line 262, in build_tree
    tree_script, newick = create_tree_script(temp_dir, phylip)
  File "src/miniconda/envs/trycycler/lib/python3.9/site-packages/trycycler/cluster.py", line 281, in create_tree_script
    return str(tree_script), pathlib.Path(newick).relative_to(pathlib.Path.cwd())
  File "src/miniconda/envs/trycycler/lib/python3.9/pathlib.py", line 939, in relative_to
    raise ValueError("{!r} is not in the subpath of {!r}"
ValueError: 'contigs.newick' is not in the subpath of 'assemblies_subsets' OR one path is relative and the other is absolute.

The Newick file is the only file missing from the output folder. I'm not sure how to interpret this error!
This was the command I used:

trycycler cluster --assemblies ${subset}/*fasta --reads $longread --out_dir ${out_loc}/${subset_id} --threads 16

Thanks in advance!
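
As background on the error message itself: `pathlib`'s `relative_to` raises `ValueError` whenever the path is not located beneath the base directory, or when one path is relative and the other absolute. The sketch below reproduces both situations with hypothetical directory names. It is not a diagnosis of this particular run, since the real paths were shortened in the post.

```python
import posixpath
from pathlib import PurePosixPath


def try_relative_to(path, base):
    """Return path relative to base, or None if pathlib raises ValueError."""
    try:
        return str(PurePosixPath(path).relative_to(PurePosixPath(base)))
    except ValueError:
        return None


# Hypothetical working directory, chosen only to reproduce the behaviour.
cwd = "/data/project/assemblies_subsets"

# Case 1: a relative path compared against an absolute base.
print(try_relative_to("contigs.newick", cwd))
# Case 2: an absolute path that lies outside the base directory.
print(try_relative_to("/data/project/out/contigs.newick", cwd))
# Case 3: an absolute path beneath the base directory (this one succeeds).
print(try_relative_to(cwd + "/out/contigs.newick", cwd))

# posixpath.relpath works lexically and may walk upwards with "..",
# so it also handles a target outside the base directory.
print(posixpath.relpath("/data/project/out/contigs.newick", cwd))
```

If the output directory lies outside the directory the command was launched from, the second case applies. Running the command from a directory that contains the output directory would be one thing to try.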

failed circularisation

What if the contig is not circular, e.g. a linear chromosome? How can I proceed without circularizing it?

ValueError: not enough values to unpack (expected 4, got 3)

Hello,

Thank you for the great new tool!
I have a problem with one cluster, so I decided to run
trycycler dotplot --cluster_dir trycycler/cluster_003
but I got this error:

Traceback (most recent call last):
  File "/home/miniconda3/envs/trycycler/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "/home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/__main__.py", line 45, in main
    dotplot(args)
  File "/home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/dotplot.py", line 43, in dotplot
    image = create_dotplots(seq_names, seqs, args)
  File "/home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/dotplot.py", line 90, in create_dotplots
    draw_labels(image, seq_names, start_positions, end_positions, text_gap, outline_width,
  File "/home/miniconda3/envs/trycycler/lib/python3.9/site-packages/trycycler/dotplot.py", line 173, in draw_labels
    font, text_width, text_height, font_size = \
ValueError: not enough values to unpack (expected 4, got 3)

How can I fix it?
Trycycler version: 0.5.1

Valery

Minimap2/miniasm produce unusual number of contigs

Dear Ryan,

Thanks for developing this amazing tool. This is the second time I am using it, and I am facing some issues this time. Please take a look:
I sequenced E. coli with Nanopore and used Flye and minimap2/miniasm to generate assemblies from the read subsets. However, Flye produces 4 or 5 contigs, while minimap2/miniasm produces around 35 contigs (unitigs). This is odd, because the last time I assembled another E. coli read set, Flye and minimap2/miniasm produced similar numbers of contigs. The two E. coli strains were sequenced at different times and to different depths.
Questions:

  • Am I getting so many unitigs this time because there are a lot of high-error-rate subsequences?
  • Should I use Raven or Canu instead of minimap2/miniasm?
  • If I still want to use minimap2/miniasm, can you suggest how to work around the issue?


Trycycler should automatically save log to file

Although the log that Trycycler prints to the screen can be manually redirected to a file, it would be much better if Trycycler saved it to a file automatically. Those files could be useful for debugging.
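
Until something like this exists, a small wrapper can do the redirection. The sketch below runs any command, echoes its output to the screen and writes the same lines to a log file. The function name `run_and_log` is hypothetical, and merging stderr into stdout is a design choice made here so that the log captures everything regardless of which stream the tool writes to.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run a command, echoing its output to the screen and to log_path.

    Returns the command's exit code.
    """
    with open(log_path, "w") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # capture both streams in one log
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)
            log.write(line)
        return process.wait()


# Example with a harmless command; substitute your own command list.
# exit_code = run_and_log([sys.executable, "--version"], "run.log")
```

Because the two streams are merged, the log preserves the order in which lines were produced, at the cost of no longer distinguishing normal output from error output.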

Enhancement: Parallelize dotplot.py

This is an enhancement request: Could you parallelize the construction of dotplots (and add a --threads argument)?

We really appreciate the work that you have done on trycycler (and unicycler, and filtlong, and ...).

Thanks.

should I worry about the persistent indels?

After many rounds (at least 3 each) of Medaka, Polypolish and POLCA polishing, I still have 14 indel errors according to the POLCA report (shown below):

"Substitution Errors: 0
Insertion/Deletion Errors: 14
Assembly Size: 5523242
Consensus Quality: 99.9997
Consensus QV: 55.96"

Should I look for other polishing tools to fix these indels, given that I don't want to introduce errors into the assembly?
Many thanks.
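
To put the reported numbers in perspective, the quality figures in the report are consistent with a plain Phred-style calculation on 14 errors in 5,523,242 bp. The sketch below assumes that this is how the values relate. It is an arithmetic check, not a description of POLCA's internals, and the function names are hypothetical.

```python
import math


def phred_qv(errors, assembly_size):
    """Phred-scaled quality: -10 * log10(per-base error rate)."""
    return -10 * math.log10(errors / assembly_size)


def consensus_quality_percent(errors, assembly_size):
    """Percentage of bases that are not flagged as errors."""
    return 100 * (1 - errors / assembly_size)


errors = 14
assembly_size = 5_523_242

print(round(phred_qv(errors, assembly_size), 2))
print(round(consensus_quality_percent(errors, assembly_size), 4))
print(round(assembly_size / errors))  # average bases per flagged error
```

On these figures, 14 indels correspond to roughly one flagged position per 400 kb.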

Short-read question

Hi!
I have what is probably a stupid question: could I use the clustering and consensus pipeline on short-read assemblies? I am usually worried about algorithm selection bias, and this could be an option.

Thank you!

KeyError thrown in 'Merging MSA' step of trycycler msa

Hi there,

I am running Trycycler 0.5.3, and near the end of the msa step in the pipeline, I am getting a KeyError thrown with certain clusters. In the environment I have set up, the msa makes use of MUSCLE v5.1. The traceback directs me to line 175 in msa.py:

line 175, in merge_pieces
aligned_seq_parts[n].append(parts[n].upper())
KeyError: 'A_contig_1'

This has only been an issue with clusters including larger (> 2 Mb) contigs. Other clusters with smaller tigs, produced from the same 'trycycler cluster' step and assemblers, have worked just fine.

Any input would be appreciated - thank you for your time!

msa Error

Hello,

I'm failing to complete the msa step in Trycycler. I've copied the message below. I'm wondering if the issue might be the large number of repeats (~20; ~70 kb total length) in my engineered genome. To provide some wiggle room for repeat issues, I allowed the max_indel size to be 2000 (indel sizes are listed at the bottom). Thanks for any advice!

Error message:*********************************************************************
Merging MSA (2022-11-08 18:52:31)
Each of the MSA pieces are now merged together and saved to file.

Traceback (most recent call last):
File "/homes/rwilton/software/miniconda3/envs/nano-trycycler/bin/trycycler", line 10, in
sys.exit(main())
File "/homes/rwilton/software/miniconda3/envs/nano-trycycler/lib/python3.10/site-packages/trycycler/main.py", line 51, in main
msa(args)
File "/homes/rwilton/software/miniconda3/envs/nano-trycycler/lib/python3.10/site-packages/trycycler/msa.py", line 36, in msa
merge_pieces(temp_dir, args.cluster_dir, seqs)
File "/homes/rwilton/software/miniconda3/envs/nano-trycycler/lib/python3.10/site-packages/trycycler/msa.py", line 175, in merge_pieces
aligned_seq_parts[n].append(parts[n].upper())
KeyError: 'A_contig_1'

Indels from reconcile step**********************************************************
A_contig_1 vs B_contig_1... 99.00% identity, max indel = 43
A_contig_1 vs C_contig_1... 99.93% identity, max indel = 27
A_contig_1 vs D_contig_1... 99.78% identity, max indel = 37
A_contig_1 vs E_Utg648... 99.88% identity, max indel = 604
A_contig_1 vs G_Utg580... 99.72% identity, max indel = 301
A_contig_1 vs H_Utg592... 99.93% identity, max indel = 781
A_contig_1 vs L_utg000001l... 99.18% identity, max indel = 39
B_contig_1 vs C_contig_1... 99.07% identity, max indel = 1452
B_contig_1 vs D_contig_1... 99.21% identity, max indel = 1669
B_contig_1 vs E_Utg648... 98.89% identity, max indel = 1533
B_contig_1 vs G_Utg580... 99.17% identity, max indel = 1521
B_contig_1 vs H_Utg592... 98.97% identity, max indel = 1704
B_contig_1 vs L_utg000001l... 99.77% identity, max indel = 94
C_contig_1 vs D_contig_1... 99.86% identity, max indel = 37
C_contig_1 vs E_Utg648... 99.81% identity, max indel = 1401
C_contig_1 vs G_Utg580... 99.79% identity, max indel = 301
C_contig_1 vs H_Utg592... 99.88% identity, max indel = 781
C_contig_1 vs L_utg000001l... 99.25% identity, max indel = 40
D_contig_1 vs E_Utg648... 99.67% identity, max indel = 1156
D_contig_1 vs G_Utg580... 99.93% identity, max indel = 301
D_contig_1 vs H_Utg592... 99.75% identity, max indel = 1499
D_contig_1 vs L_utg000001l... 99.39% identity, max indel = 38
E_Utg648 vs G_Utg580... 99.64% identity, max indel = 301
E_Utg648 vs H_Utg592... 99.84% identity, max indel = 781
E_Utg648 vs L_utg000001l... 99.09% identity, max indel = 274
G_Utg580 vs H_Utg592... 99.73% identity, max indel = 1499
G_Utg580 vs L_utg000001l... 99.38% identity, max indel = 301
H_Utg592 vs L_utg000001l... 99.18% identity, max indel = 599

Command 'raven' not found

Hello,

I installed Trycycler through conda, but when I ran Raven on one subset of reads, the "raven" command was not found. I then tried installing Raven through conda (conda install -c bioconda raven-assembler), but it still didn't work. Could you kindly help me solve the problem? Thanks very much!

Rscript path should not be hardcoded

Hello

trycycler/cluster.py has the Rscript path hardcoded on line 273.
See:

rpm_maker:Trycycler/Trycycler-0.5.0 > grep -Rn Rscript
trycycler/cluster.py:264:        subprocess.check_output(['Rscript', tree_script])
trycycler/cluster.py:273:        f.write('#!/usr/bin/Rscript\n') 

This should be more flexible.

In our case, R is provided via an environment module.

I suggest using #!/usr/bin/env Rscript instead.

Regards
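
The suggestion above could look roughly like the following sketch, which writes the script with an `env`-based shebang and resolves the interpreter from `PATH` before running it. This is an illustration under stated assumptions, not a patch against Trycycler: the helper names `find_executable` and `write_r_script` are hypothetical.

```python
import shutil


def find_executable(name):
    """Return the full path of an executable on PATH, or raise a clear error."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} was not found on PATH")
    return path


def write_r_script(script_path, r_code):
    """Write an R script whose shebang defers to whichever Rscript is on PATH."""
    with open(script_path, "w") as f:
        f.write("#!/usr/bin/env Rscript\n")
        f.write(r_code)


# Usage sketch (needs R to be installed, so it is left commented out):
# import subprocess
# write_r_script("tree.R", 'cat("hello from R\\n")\n')
# subprocess.check_output([find_executable("Rscript"), "tree.R"])
```

With an environment-module setup, `shutil.which` picks up whichever `Rscript` the loaded module has put on `PATH`, and the explicit check gives a readable error when no module is loaded.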

segmentation fault error

Hi,

I built Trycycler into a pipeline. With some datasets it works flawlessly, but with others it reproducibly produces the following error:

Building distance matrix (2022-11-14 12:22:29)
Mash is used to build a distance matrix of all contigs in the assemblies.

A_contig_1: 0.000 0.002
B_Utg38: 0.002 0.000

Clustering (2022-11-14 12:22:29)
The contigs are now split into clusters using a complete-linkage hierarchical approach.

trycycler/cluster_001/1_contigs:
trycycler/cluster_001/1_contigs/A_contig_1.fasta: 127,851 bp, 39.9x
trycycler/cluster_001/1_contigs/B_Utg38.fasta: 126,371 bp, 41.8x

Building FastME tree (2022-11-14 12:22:29)
R (ape and phangorn) are used to build a FastME tree of the relationships between the contigs.

saving distance matrix: trycycler/contigs.phylip
saving tree: trycycler/contigs.newick

*** caught segfault ***
address 0x38, cause 'memory not mapped'

Traceback:
1: fastme.bal(distances)
An irrecoverable exception occurred. R is aborting now ...
Traceback (most recent call last):
File "/home/leinehome/mh-hannover.local/steinbrl/.conda/envs/Trycycler/bin/trycycler", line 10, in
sys.exit(main())
File "/home/leinehome/mh-hannover.local/steinbrl/.conda/envs/Trycycler/lib/python3.10/site-packages/trycycler/main.py", line 41, in main
cluster(args)
File "/home/leinehome/mh-hannover.local/steinbrl/.conda/envs/Trycycler/lib/python3.10/site-packages/trycycler/cluster.py", line 43, in cluster
build_tree(seq_names, seqs, depths, matrix, args.out_dir, cluster_numbers)
File "/home/leinehome/mh-hannover.local/steinbrl/.conda/envs/Trycycler/lib/python3.10/site-packages/trycycler/cluster.py", line 264, in build_tree
subprocess.check_output(['Rscript', tree_script])
File "/home/leinehome/mh-hannover.local/steinbrl/.conda/envs/Trycycler/lib/python3.10/subprocess.py", line 420, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/leinehome/mh-hannover.local/steinbrl/.conda/envs/Trycycler/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['Rscript', '/tmp/tmpcahstqdr/tree.R']' died with <Signals.SIGSEGV: 11>.

Do you have any idea what is triggering this? The script ran on an HPC node with 64 cores, 128 GB RAM and CentOS, controlled by SLURM.

Best wishes,

Lars

Is Trycycler appropriate for Metagenomes?

Hi Ryan,

More of a question than an issue.
I have assembled a set of human faecal metagenomes using metaFlye with different min. overlap settings.
Does it make sense for me to run these through Trycycler to attempt to get the best possible assembly instead of manually assessing which is the best (which I feel is always slightly subjective)?
Would you expect metagenome complexity to cause an issue? Or perhaps the assemblies might all be a bit too similar since they were generated by the same algorithm with one different parameter?

Thanks,
Calum

EDIT: Actually I just saw the metagenomes section in the FAQs. I should have read in a bit more detail before asking.
Apologies 🙈

question: 0.250 values from mash when building distance matrix

Thanks for an interesting tool, Ryan!

This is my first attempt running Trycycler on a sample/genome which (for some yet-unknown reason) ends up too fragmented even with decent (~50x) PacBio coverage. (A few other nearly-identical samples get assembled very well even at ~30x.)

I've used mash a few times before for small-group and pairwise whole-genome comparisons, so I am surprised to see a particular output.
At the stage of building a distance matrix with mash, I am seeing a peculiar pattern of repeated 0.250 values (wrapped for somewhat better readability):

A_sample_098: 0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.116  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.000  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250  0.244  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.250  
0.250  0.250

This goes on and on, pages and pages of scrollback buffer :) , with occasional different values.

The actual question is: is this a normal/expected behavior, or a bug in my local environment?
Resulting dendrograms look fine, with variable branch lengths and realistic-looking clustering.

How to set --genome_size option for two bacteria sequenced (PacBio) together?

I had two bacteria sequenced together (PacBio). Their genome sizes are 4.6m and 4.9m, respectively. When generating assemblies, should I set --genome_size to 9.5m (the sum of their genome sizes), run with each size separately (4.6m and 4.9m), or just leave it to miniasm to work out the size?
And are there any other special options I should use in the rest of the Trycycler process?

MUSCLE failed to complete pieces; Error running MUSCLE on genome cluster during msa step

Dear Trycycler Team

I have been assembling multiple bacterial genomes with the Trycycler pipeline, but encountered the following error with one of the genome clusters I obtained. All other datasets went fine, except for this one, which gives an error during the msa part of the pipeline. While all sequences within the genome cluster ran properly through the reconcile step, the msa step does not complete because one piece of sequence failed during the MUSCLE alignment. I have tried removing the most "divergent" sequences, but this did not solve the issue. I also tried adjusting the piece sizes, without any success. Any thoughts on how to solve this issue?

`Starting Trycycler MSA (2022-07-08 10:45:00)
Trycycler MSA is a tool for conducting global multiple sequence alignment of contig sequences.

Input sequences:
C_utg000001c: 4,078,071 bp
G_utg000003c: 4,074,445 bp
J_Utg2938: 4,069,380 bp

Checking required software:
MUSCLE: v3.8.1551

Partitioning sequences (2022-07-08 10:45:00)
The sequences are now partitioned into smaller chunks to make the multiple sequence alignment more tractable.

pieces: 4010

median piece size: 1,000 bp
max piece size: 44,000 bp

Running Muscle (2022-07-08 10:45:14)
Trycycler now runs Muscle on each of the pieces to turn them into multiple sequence alignments.

pieces: 4010

Error: MUSCLE failed to complete on 1 of the 4010 pieces. Please remove the most divergent sequences from this cluster and then try again.`

Thank you in advance.
Best regards Nick Vereecke

Too Many Clusters

How many clusters should I be generating? Is ~160 clusters too many? No matter how much QC I do my tree looks a lot like the final example and I can't seem to find a single cluster that fits the length of my genome (for reference it's 1.1m). None of my clusters seem to be 1.1m bp, they tend to be 100k-300k bp
