tseemann / berokka
Trim, circularise and orient long read bacterial genome assemblies
License: GNU General Public License v3.0
Don't forget the badges!
https://www.ncbi.nlm.nih.gov/nucleotide/MG551957?report=genbank
LOCUS MG551957 4034 bp DNA linear SYN 21-NOV-2017
DEFINITION Synthetic construct PacBio unrolled DNA internal control sequence.
>tig00008674 len=10963 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
For use in https://github.com/rrwick/Rebaler by @rrwick
We need @Slugger70 power here.
Enhancement suggestion to make table numbers right justified.
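One way the right-justification could look is sketched below in Python. This is only an illustration, not berokka's code (berokka is a Perl script); `format_table` is a hypothetical helper, and the rows are the summary values quoted further down this page.

```python
def format_table(header, rows):
    """Left-justify the first column and right-justify all remaining columns."""
    table = [list(header)] + [[str(cell) for cell in row] for row in rows]
    # Column width = longest cell in that column, header included.
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    lines = []
    for r in table:
        cells = [r[0].ljust(widths[0])]
        cells += [cell.rjust(w) for cell, w in zip(r[1:], widths[1:])]
        lines.append("  ".join(cells))
    return lines


header = ["#sequence", "old_len", "new_len", "trimmed"]
rows = [("tig00000001", 4736005, 4736005, 0),
        ("tig00002852", 3967, 0, 3968)]
for line in format_table(header, rows):
    print(line)
```

Padding with spaces makes the table easier to read in a terminal but harder to parse than tab-separated output, so it may be better offered as an option than as the default.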
Removing temporary files: 1.fa 1.head.fa 1.bls
Removing temporary files: 2.fa 2.head.fa 2.bls
Removing temporary files: 3.fa 3.head.fa 3.bls
Removing temporary files: 4.fa 4.head.fa 4.bls
Can't call method "start" on an undefined value at /home/tseemann/git/berokka/bin/berokka line 134, line 123.
I have an animal mitochondrial genome assembled by Unicycler (so no overlap). I'd like to orient and rotate the genome to agree with its closest relative in Genbank. I have a FASTA file of the related genome. Is this task possible with Berokka?
Just reporting the conda installer for berokka has a Perl issue.
conda create -n berokka_env berokka
conda activate berokka_env
berokka
Can't locate Bio/SeqIO.pm in @INC (you may need to install the Bio::SeqIO module)
Usually the beginning hits the end, which we handle:
Running: blastn -query 4.head.fa -subject 4.fa -out 4.bls -evalue 1E-6 -dust no
blastn: 1..13766/20000 aligns to 43298..57075/57075
tig00000003 keep 1..43297/57075 (remove 13779 bp)
On these smaller ones, the end hits the beginning, which we DO NOT HANDLE.
*** [7] tig00000006 ***
Using first 900 bp to BLAST
Writing tig00000006 ( 900 bp ) to 7.fa
Writing tig00000006 ( 900 bp ) to 7.head.fa
Running: blastn -query 7.head.fa -subject 7.fa -out 7.bls -evalue 1E-6 -dust no
blastn: 781..900/900 aligns to 1..120/900
tig00000006 - COULD NOT TRIM
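The two cases in the log above can be told apart from the BLAST coordinates alone. The Python sketch below is an illustration under stated assumptions, not berokka's actual logic (berokka is written in Perl). `keep_range` and its parameters are hypothetical names, and the second branch assumes the BLAST query is the whole contig, as it is for the 900 bp contig.

```python
def keep_range(qstart, qend, sstart, send, length):
    """Return the 1-based inclusive (start, end) to keep, or None if no trim applies.

    Arguments mirror the log line 'qstart..qend aligns to sstart..send/length'.
    """
    # Case 1: the beginning of the contig hits the end of the contig.
    if qstart == 1 and send == length and sstart > qend:
        return (1, sstart - 1)
    # Case 2 (assumption): the same overlap reported the other way round,
    # i.e. the end of the query hits the beginning of the contig.
    if sstart == 1 and qend == length and qstart > send:
        return (1, qstart - 1)
    return None


# Coordinates taken from the two log excerpts above.
print(keep_range(1, 13766, 43298, 57075, 57075))
print(keep_range(781, 900, 1, 120, 900))
```

With the first excerpt this yields the same `1..43297` range that the log reports. For the 900 bp contig it would keep `1..780`, which is only a suggestion for how the unhandled case might be treated.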
1.ctl.bls 2.ctl.bls 3.ctl.bls 4.ctl.bls
My conda install produces the following error:
berokka Can't locate Bio/SeqIO.pm in @INC (you may need to install the Bio::SeqIO module) (@INC contains: /home/kvandelannoo/miniconda3/envs/berokka_env/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/kvandelannoo/miniconda3/envs/berokka_env/lib/site_perl/5.26.2 /home/kvandelannoo/miniconda3/envs/berokka_env/lib/5.26.2/x86_64-linux-thread-multi /home/kvandelannoo/miniconda3/envs/berokka_env/lib/5.26.2 .) at /home/kvandelannoo/miniconda3/envs/berokka_env/bin/berokka line 4. BEGIN failed--compilation aborted at /home/kvandelannoo/miniconda3/envs/berokka_env/bin/berokka line 4.
I installed berokka using:
conda install -c conda-forge -c bioconda -c defaults berokka
I tried the following things without success:
1/ updating conda
2/ creating a separate conda env
My install looks OK to me:
which berokka
~/miniconda3/envs/berokka_env/bin/berokka
which perl
~/miniconda3/envs/berokka_env/bin/perl
echo $PATH | tr ":" "\n" | nl
1 /home/kvandelannoo/miniconda3/envs/berokka_env/bin
2 /home/kvandelannoo/miniconda3/condabin
3 /usr/local/showq/0.15/bin
4 /usr/local/slurm/latest/bin
5 /usr/lib64/qt-3.3/bin
6 /usr/local/bin
7 /usr/bin
8 /usr/local/sbin
9 /usr/sbin
10 /opt/ibutils/bin
11 /opt/puppetlabs/bin
12 /opt/dell/srvadmin/bin
13 /home/kvandelannoo/.local/bin
14 /home/kvandelannoo/bin
perl -e "print qq(@INC)"
/home/kvandelannoo/miniconda3/envs/berokka_env/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/kvandelannoo/miniconda3/envs/berokka_env/lib/site_perl/5.26.2 /home/kvandelannoo/miniconda3/envs/berokka_env/lib/5.26.2/x86_64-linux-thread-multi /home/kvandelannoo/miniconda3/envs/berokka_env/lib/5.26.2
Any help with this would be much appreciated.
KV
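A quick way to confirm whether the module file is missing from every `@INC` directory is to look for `Bio/SeqIO.pm` under each one. The Python sketch below does this; `find_module` is a hypothetical helper, and it checks only that the file exists, not that Perl can load it.

```python
import os


def find_module(inc_dirs, module="Bio::SeqIO"):
    """Return the full paths where the Perl module file exists under inc_dirs."""
    # Bio::SeqIO maps to the relative path Bio/SeqIO.pm.
    rel = os.path.join(*module.split("::")) + ".pm"
    hits = []
    for d in inc_dirs:
        candidate = os.path.join(d, rel)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    import sys
    # Usage: pass the @INC directories as arguments, e.g. the
    # whitespace-separated output of: perl -e "print qq(@INC)"
    found = find_module(sys.argv[1:])
    print("\n".join(found) if found else "Bio/SeqIO.pm not found in the given directories")
```

An empty result would suggest that BioPerl was not installed into the environment, rather than that `PATH` or `@INC` is wrong.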
Make sure our control and dnaA data is present.
Add to brewsci/bio
Possibly dangerous!
Currently uses 1 .. N, which is clean and will always work, but names might be better.
S.Enteriditis__2017-15455
#sequence old_len new_len trimmed
tig00000001 4736005 4736005 0
tig00002852 3967 0 3968
just some SNPs
maybe an indel?
tig00000001 dna 4736005
Score = 55308 bits (29950), Expect = 0.0
Identities = 29995/30013 (99%), Gaps = 17/30013 (0%)
Strand=Plus/Plus
Query 1 CGCTGTCGGCAAGAATATAGCGGCTTGATGCCAAAG-CGCCT-GGTCATTTCGACAAAAA 58
|||||||||||||||||||||||||||||||||||| ||||| |||||||||||||||||
Sbjct 4704147 CGCTGTCGGCAAGAATATAGCGGCTTGATGCCAAAGGCGCCTGGGTCATTTCGACAAAAA 4704206
<snip>
Query 29988 ACGGTTTTTCAGT 30000
|||||||||||||
Sbjct 4734143 ACGGTTTTTCAGT 4734155
Hello,
Thank you for this tool. I tried to run it with a bacterial genome and it returned the same input sequence despite having clear overlaps at the beginning and end. It did not output any error. Do you know what could be the problem?
Thanks
Good for travis too
*** [5] tig00003742 ***
Using first 3959 bp to BLAST
Writing tig00003742 ( 3959 bp ) to 5.fa
Writing tig00003742 ( 3959 bp ) to 5.head.fa
Running: blastn -query 5.head.fa -subject 5.fa -out 5.bls -evalue 1E-6 -dust no
blastn: 2..3959/3959 aligns to 2..3959/3959 at 98.5 %id
tig00003742 keep 1..0/3959 (remove 3960 bp)
------------- EXCEPTION -------------
MSG: trunc start,end -- there was no end for 1
STACK Bio::PrimarySeqI::trunc /home/linuxbrew/.linuxbrew/Cellar/perl/5.26.1_1/lib/perl5/site_perl/5.26.1/Bio/PrimarySeqI.pm:447
STACK main::check_overhang /home/tseemann/git/berokka/bin/berokka:149
STACK toplevel /home/tseemann/git/berokka/bin/berokka:74
-------------------------------------
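The exception follows from asking for the empty range `1..0`. The Python sketch below shows one possible guard: ignore a hit whose query and subject ranges coincide (the sequence matching itself in place), and refuse to truncate to an empty or out-of-bounds range. The function names are hypothetical and this is not a patch for berokka itself.

```python
def is_self_hit(qstart, qend, sstart, send):
    """True if the hit is the sequence aligned to its own position."""
    return (qstart, qend) == (sstart, send)


def safe_trunc(seq, start, end):
    """Return seq[start..end] (1-based, inclusive), or None if the range is invalid."""
    if start < 1 or end < start or end > len(seq):
        return None
    return seq[start - 1:end]


# Coordinates from the log above: 2..3959 aligns to 2..3959, keep 1..0
print(is_self_hit(2, 3959, 2, 3959))
print(safe_trunc("A" * 3959, 1, 0))
```

Under this guard the contig would be left untrimmed instead of the run aborting.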
Ensure we are good to go!
$ ! berokka --doesnotexist
Unknown option: doesnotexist
SYNOPSIS
Filter, trim, circularise & orient long read assemblies
USAGE
berokka [options] canu.contigs.fasta [another.fasta ...]
OPTIONS
--help This help.
--debug Debug info (default '0').
--version Print version and exit.
--check Check dependencies and exit.
--test Run a small test and exit.
--force Force overwite of existing (default '0').
--outdir [X] Output folder (default '').
--readlen [N] Approximate read length (default '30000').
--keepfiles Keep intermediate files (default '0').
--noanno Don't annotate FASTA with circular=true (default '0').
AUTHOR
Torsten Seemann | https://github.com/tseemann/berokka
The command "! berokka --doesnotexist" exited with 1.
Also check stderr/stdout
Hello,
I used conda to install berokka, but ran into this problem:
Error: NCBI C++ Exception:
T0 "/opt/conda/conda-bld/blast_1537407096784/work/c++/src/objtools/readers/fasta.cpp", line 2178: Error: ncbi::objects::CFastaReader::PostWarning() - CFastaReader: Near line 7, there's a line that doesn't look like plausible data, but it's not marked as defline or comment. (m_Pos = 7
Thanks!!!!
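The BLAST message points at a line in the input FASTA that is neither a header nor sequence. The Python sketch below lists such lines so that the one near line 7 can be inspected. It is a deliberately simplified check, not the validation BLAST performs; `suspicious_lines` is a hypothetical helper, and accepting only letters, `*` and `-` as sequence is an assumption.

```python
import re

# Assumption: sequence lines contain only letters, '*' or '-'.
VALID_SEQ = re.compile(r"^[A-Za-z*\-]*$")


def suspicious_lines(lines):
    """Return (line_number, text) for lines that are not deflines, comments, blank or sequence."""
    bad = []
    for n, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.startswith(">") or text.startswith(";"):
            continue  # defline or comment
        if not VALID_SEQ.match(text):
            bad.append((n, text))
    return bad


if __name__ == "__main__":
    import sys
    # Usage: python check_fasta.py contigs.fasta
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for n, text in suspicious_lines(fh):
                print(f"line {n}: {text!r}")
```

Common culprits are stray spaces, digits, or Windows line endings mixed into the sequence.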
Get file from circlator?