biscot's People

Contributors

bistace

biscot's Issues

KeyError: 100004

Hello,

I would like to run BiSCoT to merge overlapping contigs and I get this error message:

[2020-08-04 14:30:54]   [biscot.py - 175]       [ INFO] Parsing key file
[2020-08-04 14:30:54]   [biscot.py - 178]       [ INFO] Parsing reference cmap file
[2020-08-04 14:30:55]   [biscot.py - 181]       [ INFO] Parsing contigs cmap file(s)
[2020-08-04 14:30:56]   [biscot.py - 184]       [ INFO] Parsing xmap(s)
Traceback (most recent call last):
  File "/home/copettid/miniconda3/envs/py36/bin/biscot", line 8, in <module>
    sys.exit(run())
  File "/home/copettid/miniconda3/envs/py36/lib/python3.6/site-packages/biscot/biscot.py", line 192, in run
    args.only_confirmed_positions,
  File "/home/copettid/miniconda3/envs/py36/lib/python3.6/site-packages/biscot/Alignment.py", line 166, in parse_xmap
    reference_maps_dict[aln.reference_id].add_alignment(aln)
KeyError: 100004

The command was:

biscot --cmap-ref CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_OF1p3c_noCOs_16326_fa_BNGcontigs_NGScontigs_r.cmap \
--cmap-1 EXP_REFINEFINAL1_bppAdjust_cmap_OF1p3c_noCOs_16326_fa_NGScontigs_HYBRID_SCAFFOLD_q.cmap \
--xmap-1 EXP_REFINEFINAL1_bppAdjust_cmap_OF1p3c_noCOs_16326_fa_NGScontigs_HYBRID_SCAFFOLD.xmap \
--key OF1p3c_noCOs_16326_CTTAAG_0kb_0labels_key.txt \
--contigs OF1p3c_noCOs_16326.fa \
--output default

I have a Flye assembly of ONT data for a heterozygous plant genome. The optical map is as large as the diploid genome, the coverage of molecules at the sites is unimodal (so there are no collapsed homozygous regions on the map) and I put aside a few contigs that have "complete overlaps" (Figure 1 case 3 in your paper), I just have cases 1 and 2. The assembly is almost diploid (3.7 Gb in size, haploid genome ~2.5 Gb, optical map 4.2 Gb).

I wonder if the input files I am using are correct, as there are minor discrepancies with the new file names produced by Bionano Access. If I replace the xmap file with *BNGcontigs_HYBRID_SCAFFOLD.xmap or the cmap-ref with CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_OF1p3c_noCOs_16326_fa_BNGcontigs_NGScontigs_r.cmap I get the same error.
I also tried replacing the HS input contigs with OF1p3c_noCOs_16326.fa.cut.fasta and the key file with OF1p3c_noCOs_16326_CTTAAG_0kb_0labels_key.txt.cut.txt.

Can you point me to the issue?
Thanks,
Dario
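A quick way to narrow down this kind of KeyError is to check whether every reference ID used in the xmap is present in the file passed as --cmap-ref. The sketch below is only an illustration based on the traceback above, not BiSCoT's own code. It assumes that RefContigID is the third column of the xmap and CMapId the first column of the cmap; the helper names and the example records are made up.

```python
def column_ids(lines, column):
    """Return the integer IDs found in one whitespace-separated column, skipping '#' lines."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(int(line.split()[column]))
    return ids


def missing_reference_ids(xmap_lines, ref_cmap_lines):
    """List reference IDs used by alignments but absent from the reference cmap."""
    # Assumption: RefContigID is the 3rd xmap column, CMapId the 1st cmap column.
    xmap_refs = column_ids(xmap_lines, 2)
    cmap_ids = column_ids(ref_cmap_lines, 0)
    return sorted(xmap_refs - cmap_ids)


# Tiny made-up records (not taken from the files in this issue).
example_xmap = [
    "# XMAP File Version:\t0.2",
    "1\t12\t100001\t10.0\t500.0\t20.0\t510.0\t+\t15.2",
    "2\t13\t100004\t10.0\t400.0\t30.0\t420.0\t-\t11.8",
]
example_ref_cmap = [
    "# CMAP File Version:\t0.1",
    "100001\t1000.0\t2\t1\t1\t20.0",
    "100001\t1000.0\t2\t2\t1\t510.0",
]
print(missing_reference_ids(example_xmap, example_ref_cmap))
```

With real data, open file handles can be passed in place of the lists. A non-empty result would suggest that the xmap and the reference cmap do not come from the same alignment.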

ERROR: Converting agp file to fasta

Hi Author

I'm trying to correct a single-enzyme Bionano scaffold with BiSCoT.
However, an error occurs at the step "Converting agp file to fasta" and the run stops.

Can you please tell me how I can avoid the error?

The following is the original error.

[2021-12-02 19:40:50] [biscot.py - 175] [ INFO] Parsing key file
[2021-12-02 19:40:51] [biscot.py - 178] [ INFO] Parsing reference cmap file
[2021-12-02 19:40:51] [biscot.py - 181] [ INFO] Parsing contigs cmap file(s)
[2021-12-02 19:40:52] [biscot.py - 184] [ INFO] Parsing xmap(s)
[2021-12-02 19:40:53] [biscot.py - 197] [ INFO] Trying to integrate contained maps
[2021-12-02 19:40:53] [ Map.py - 220] [ INFO] Sorting alignments
[2021-12-02 19:40:53] [ Map.py - 235] [ INFO] Looking for contained maps
[2021-12-02 19:40:53] [Alignment - 1087] [ INFO] Containment solver round 1
[2021-12-02 19:40:55] [ Map.py - 220] [ INFO] Sorting alignments
[2021-12-02 19:40:55] [ Map.py - 235] [ INFO] Looking for contained maps
[2021-12-02 19:40:55] [Alignment - 1087] [ INFO] Containment solver round 2
[2021-12-02 19:40:55] [ Map.py - 220] [ INFO] Sorting alignments
[2021-12-02 19:40:56] [ Map.py - 235] [ INFO] Looking for contained maps
[2021-12-02 19:40:56] [Alignment - 1087] [ INFO] Containment solver round 3
[2021-12-02 19:40:56] [ Map.py - 220] [ INFO] Sorting alignments
[2021-12-02 19:40:56] [ Map.py - 235] [ INFO] Looking for contained maps
[2021-12-02 19:40:56] [Alignment - 1087] [ INFO] Containment solver round 4
[2021-12-02 19:40:58] [ Map.py - 220] [ INFO] Sorting alignments
[2021-12-02 19:40:58] [ Map.py - 235] [ INFO] Looking for contained maps
[2021-12-02 19:40:58] [Alignment - 1087] [ INFO] Containment solver round 5
[2021-12-02 19:40:58] [ Map.py - 220] [ INFO] Sorting alignments
[2021-12-02 19:40:58] [ Map.py - 235] [ INFO] Looking for contained maps
[2021-12-02 19:40:58] [Alignment - 1087] [ INFO] Containment solver round 6
[2021-12-02 19:41:01] [ Misc.py - 093] [ INFO] Loading contigs fasta file
[2021-12-02 19:41:27] [Alignment - 1034] [ INFO] Writing unplaced contigs in agp file
[2021-12-02 19:41:28] [ Misc.py - 110] [ INFO] Converting agp file to fasta
Traceback (most recent call last):
  File "/home/sanno/anaconda3/bin/biscot", line 8, in <module>
    sys.exit(run())
  File "/home/sanno/anaconda3/lib/python3.8/site-packages/biscot/biscot.py", line 215, in run
    Misc.agp_to_fasta(
  File "/home/sanno/anaconda3/lib/python3.8/site-packages/biscot/Misc.py", line 128, in agp_to_fasta
    scaffolds_sequence_dict[scaffold_name] += contigs_sequence_dict[
KeyError: 'scf7180000028800'

In addition, I ran BiSCoT with the following output files from Bionano:

"CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_XXX_BNGcontigs_NGScontigs_r.cmap"
"CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_XXX_BNGcontigs_NGScontigs_q.cmap"
"CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_XXX_BNGcontigs_NGScontigs.xmap"
"XXX_CTTAAG_0kb_0labels_key.txt".

Regards,
Author
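The traceback above shows a contig name being looked up in a dictionary of contig sequences and not being found. One way to investigate is to list the component names in the AGP file that have no matching FASTA record. The sketch below assumes the standard AGP layout (component type in column 5, component ID in column 6, gap lines typed N or U) and treats the first word of each FASTA header as the sequence name. The function names are hypothetical, the example records are made up, and BiSCoT itself may key its dictionaries differently.

```python
def fasta_names(lines):
    """Return the first word of every FASTA header line."""
    names = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def agp_component_ids(lines):
    """Return the component IDs (column 6) of all non-gap AGP lines."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Gap lines have component type N or U and carry a length in column 6.
        if len(fields) >= 6 and fields[4] not in ("N", "U"):
            ids.add(fields[5])
    return ids


def components_missing_from_fasta(agp_lines, fasta_lines):
    """List AGP components for which no FASTA record exists."""
    return sorted(agp_component_ids(agp_lines) - fasta_names(fasta_lines))


# Made-up example: the second component has no FASTA record.
example_agp = [
    "Super-Scaffold_1\t1\t100\t1\tW\tscf_A\t1\t100\t+",
    "Super-Scaffold_1\t101\t113\t2\tN\t13\tscaffold\tyes\tmap",
    "Super-Scaffold_1\t114\t213\t3\tW\tscf_B\t1\t100\t-",
]
example_fasta = [">scf_A some description", "ACGT"]
print(components_missing_from_fasta(example_agp, example_fasta))
```

If the missing names turn out to be present in the FASTA under a slightly different spelling (for example with a suffix, or only after a space in the header), that would point to a naming mismatch between the key file and the contigs file.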

KeyError: (2063, 1)

Dear biscot authors,
I am trying to apply BiSCoT to our data produced by Bionano hybrid scaffolding. Unfortunately, my efforts ended with KeyError: (2063, 1). It looks like the script runs but does not produce or find some parts necessary for later steps; I am just guessing. A screenshot of the terminal is attached; I hope you will be able to read it.

[Attached screenshot: biscot_issue]

Have you ever encountered this error? Do you know how to overcome this issue?
Thank you for any help or recommendations.

Best regards,
Zuzana

Ending coordinate smaller than starting coordinate in AGP file

Dear Benjamin,
we ran the BiSCoT analysis to resolve the contig overlaps left in Bionano scaffolds, after scaffolding only the primary contigs.
When running the suggested command cat biscot_output/scaffolds.agp | awk '$3 < $2' we got 3 entries:

Super-Scaffold_49 854137 854136 15 W VvNeb_Pc000059F_arrow 12110 12109 -
Super-Scaffold_49 854137 854136 16 W VvNeb_Pc000059F_arrow 12110 12109 -
Super-Scaffold_52 8513041 6645281 12 W VvNeb_Pc000061F_arrow 6665602 4797842 +

What do you think about this? Can we trust the scaffolds, or should we try to solve this issue?
Thanks,
Simone
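For reference, the same check as the awk one-liner can be written in Python, which makes it easier to also report the line number of each offending entry. This is a minimal sketch that only flags AGP lines whose object end (column 3) is smaller than the object start (column 2). The function name is hypothetical, and the first example line is made up to show a valid record; the other three are the entries quoted above.

```python
def inverted_agp_entries(lines):
    """Return (line_number, line) pairs where column 3 (end) < column 2 (start)."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if int(fields[2]) < int(fields[1]):
            bad.append((number, line))
    return bad


example_agp = [
    # Hypothetical valid record, added for contrast.
    "Super-Scaffold_49 1 854136 14 W example_contig 1 854136 +",
    # The three entries reported in this issue.
    "Super-Scaffold_49 854137 854136 15 W VvNeb_Pc000059F_arrow 12110 12109 -",
    "Super-Scaffold_49 854137 854136 16 W VvNeb_Pc000059F_arrow 12110 12109 -",
    "Super-Scaffold_52 8513041 6645281 12 W VvNeb_Pc000061F_arrow 6665602 4797842 +",
]
for number, line in inverted_agp_entries(example_agp):
    print(number, line)
```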

Output sequences beginning or terminating with Ns

Dear biscot developers,

Looking at the output scaffolds.fasta, I noticed that some sequences begin and/or end with Ns (some sequences are composed only of Ns). I was wondering whether you have already encountered this and why it could happen.

I suppose that the sequences composed of only 13 Ns are the negative gaps resolved by BiSCoT (am I right?) and that I can get rid of them. Nevertheless, I cannot figure out why there are sequences composed only of more than 13 Ns, or sequences beginning or ending with Ns.

I take this opportunity to wish you a happy new year,

Thanks in advance,
Luca
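To quantify the situation described above, the sequences in scaffolds.fasta can be scanned for leading and trailing runs of N. The sketch below is a plain FASTA reader with hypothetical helper names and made-up example records; it only reports what is in the file and makes no claim about why BiSCoT produced it.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_flanks(sequence):
    """Return (leading Ns, trailing Ns, True if the whole sequence is N)."""
    leading = len(sequence) - len(sequence.lstrip("Nn"))
    trailing = len(sequence) - len(sequence.rstrip("Nn"))
    return leading, trailing, len(sequence) > 0 and leading == len(sequence)


def flagged_sequences(lines):
    """Map each sequence that starts or ends with N to its n_flanks() result."""
    report = {}
    for name, sequence in read_fasta(lines):
        flanks = n_flanks(sequence)
        if flanks[0] or flanks[1]:
            report[name] = flanks
    return report


# Made-up records covering the three cases mentioned in the post.
example_fasta = [
    ">scaffold_a", "NNNACGT", "ACNN",
    ">scaffold_b", "NNNNNNNNNNNNN",
    ">scaffold_c", "ACGT",
]
print(flagged_sequences(example_fasta))
```

Sequences for which the third value is true consist only of Ns; their length (the first value) shows whether they are exactly 13 bases long or longer.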

run ERROR

Hi,
When I run biscot, I get the following error:

[2020-07-17 14:44:44] [biscot.py - 173] [ INFO] Command used : ./soft/anaconda3_python3.7/bin/biscot --cmap-ref ../hybrid_scaffolds/EXP_REFINEFINAL1_bppAdjust_cmap_JG_fasta_NGScontigs_HYBRID_SCAFFOLD_r.cmap --cmap-1 ../hybrid_scaffolds/EXP_REFINEFINAL1_bppAdjust_cmap_JG_fasta_NGScontigs_HYBRID_SCAFFOLD_q.cmap --xmap-1 ../hybrid_scaffolds/EXP_REFINEFINAL1_bppAdjust_cmap_JG_fasta_NGScontigs_HYBRID_SCAFFOLD.xmap --key ../hybrid_scaffolds/JG_BSSSI_0kb_0labels_key.txt --contigs ../hybrid_scaffolds/JG.fasta --output biscot
[2020-07-17 14:44:44] [biscot.py - 175] [ INFO] Parsing key file
[2020-07-17 14:44:44] [biscot.py - 178] [ INFO] Parsing reference cmap file
[2020-07-17 14:44:45] [biscot.py - 181] [ INFO] Parsing contigs cmap file(s)
[2020-07-17 14:44:46] [biscot.py - 184] [ INFO] Parsing xmap(s)
Traceback (most recent call last):
  File "./soft/anaconda3_python3.7/bin/biscot", line 11, in <module>
    load_entry_point('biscot==2.3', 'console_scripts', 'biscot')()
  File "./soft/anaconda3_python3.7/lib/python3.7/site-packages/biscot/biscot.py", line 195, in run
    extended_key_dict = Key.extend_key_dict(key_dict, reference_maps_dict)
  File "./soft/anaconda3_python3.7/lib/python3.7/site-packages/biscot/Key.py", line 66, in extend_key_dict
    key_dict[(alignment.map_id, 1)]
KeyError: (937, 1)

The cmap files were generated using hybridScaffold.pl in Solve3.3_10252018. How can I resolve this error?
Thank you!

Which input files should be used

Dear BISCoT developers,
I am trying to run your tool, but I get errors with all the files I have tried so far.
I performed a hybrid scaffolding of PacBio contigs with Bionano maps obtained with the Irys platform and the Nb.BssSI enzyme.
In the hybrid_scaffolds output folder, I have the following files:

  • CACGAG_EXP_REFINEFINAL1_bppAdjust_cmap_Neb71_PB_Primary_plus_Haplotigs_fasta_BNGcontigs_NGScontigs(_q.cmap/_r.cmap/.xmap)
  • EXP_REFINEFINAL1_bppAdjust_cmap_Neb71_PB_Primary_plus_Haplotigs_fasta_BNGcontigs_HYBRID_SCAFFOLD(_q.cmap/_r.cmap/.xmap)
  • EXP_REFINEFINAL1_bppAdjust_cmap_Neb71_PB_Primary_plus_Haplotigs_fasta_NGScontigs_HYBRID_SCAFFOLD(_q.cmap/_r.cmap/.xmap)

So, none of them looks exactly like the ones you show in your example.
Moreover, I don't know if I should provide the original contigs used for scaffolding (and their corresponding key file) or the trimmed contigs after scaffolding (with the .cut.txt key file).

Thanks,
Simone

unusually large file after processing with BISCOT

Hello,

Thanks for creating this great program!
I've been testing this on a plant genome with high heterozygosity and was wondering why the output of biscot is so big. I ran it as follows:

biscot \
   --cmap-ref  CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_fasta_BNGcontigs_NGScontigs_r.cmap \
   --cmap-1 CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_fasta_BNGcontigs_NGScontigs_q.cmap \
   --xmap-1 CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_fasta_BNGcontigs_NGScontigs.xmap \
   --key fasta_CTTAAG_0kb_0labels_key.txt \
   --contigs contigs.fasta \
   --output biscot

The output was a 22Gb file (the input contigs were only 3.5Gb). The stats also showed it was 10x the original.

Before

                                   Assumed genome size (Mbp)    2100.00
                                         Number of scaffolds       2181
                                     Total size of scaffolds 3798422271
  Total scaffold length as percentage of assumed genome size     180.9%
                                            Longest scaffold  415457058
                                           Shortest scaffold          5

After

                                   Assumed genome size (Mbp)    2100.00
                                         Number of scaffolds       2593
                                     Total size of scaffolds 23174606080
  Total scaffold length as percentage of assumed genome size    1103.6%
                                            Longest scaffold  349655800
                                           Shortest scaffold          0

I suspect that I mixed up some of the input files for the program. Could you please let me know what the issue is? Thanks!

PS: my hybrid_scaffolds folder contains the following files:

cur_results.txt
hybridScaffold_DLE1_config.xml
hybrid_scaffold_informatics_report.txt
ngs_pre_cut_annotations.bed
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_NGScontigs_HYBRID_SCAFFOLD.xmap
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_NGScontigs_HYBRID_SCAFFOLD_r.cmap
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_NGScontigs_HYBRID_SCAFFOLD_q.cmap
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_NGScontigs_HYBRID_SCAFFOLD.gap
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_NGScontigs_HYBRID_SCAFFOLD.agp
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_HYBRID_SCAFFOLD.cmap
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_BNGcontigs_HYBRID_SCAFFOLD.xmap
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_BNGcontigs_HYBRID_SCAFFOLD_r.cmap
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_BNGcontigs_HYBRID_SCAFFOLD_q.cmap
conflicts_cut_status.txt
conflicts_cut_status_CTTAAG.txt
bn_pre_cut_projected_ngs_coord_annotations.bed
auto_cut_NGS_coord_translation.txt
auto_cut_BN_coord_translation.txt
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_NGScontigs_HYBRID_SCAFFOLD.fasta
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_NGScontigs_HYBRID_SCAFFOLD_NCBI.fasta
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_NGScontigs_HYBRID_SCAFFOLD_trimHeadTailGap.coord
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_NGScontigs_HYBRID_SCAFFOLD_NOT_SCAFFOLDED.fasta
contigs.fasta.cut.fasta
contigs_CTTAAG_0kb_0labels_key.txt.cut.txt
contigs_CTTAAG_0kb_0labels_key.txt
ncbi_manifest.txt
CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_fasta_BNGcontigs_NGScontigs.xmap
CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_fasta_BNGcontigs_NGScontigs_r.cmap
CTTAAG_EXP_REFINEFINAL1_bppAdjust_cmap_fasta_BNGcontigs_NGScontigs_q.cmap
conflicts.txt
EXP_REFINEFINAL1_bppAdjust_cmap_fasta_HYBRID_SCAFFOLD_log.txt

output documentation

Hello!
I'm using BiSCoT to correct a hybrid scaffolding from Bionano and ONT.
I have obtained 1453 scaffolds (62 Hybrid scaffold FASTA + 1348 not scaffolded NGS FASTA) with Bionano HS.
Using BiSCoT I have obtained 3058 super-scaffolds, 92 subsets ...
In the AGP file created by BiSCoT, which feature corresponds to the highest level of scaffolding (subset or Super-Scaffold)?
Which FASTA file should I use to compute BUSCO, %N and other statistics?
Thanks!
Julie

Question regarding AGP (scaffold.agp)

Hi,

Thanks for developing the biscot tool.

I was wondering if you could provide some insight into the AGP file produced by biscot.

I have a finished biscot run, and when I grep'd the 13bp gap blocks, there were more in scaffolds.agp than in the original hybridscaffold.agp.

Comparing the AGP files, biscot did merge some of the overlaps (duplicated regions) that were flagged by 13bp gaps during hybridscaffold, which is great. However, I notice that the new 13bp gaps in scaffolds.agp replace sized gaps from the hybridscaffold AGP.

I am wondering what the reasoning for this is, and whether there are any other reasons for introducing a new 13bp gap in scaffolds.agp. In short, what explains the increase in 13bp gaps in the AGP?

Again great work and thanks for your time.

Will
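The comparison described above (counting 13bp gap blocks in two AGP files) can also be done with a small script instead of grep, which avoids accidentally matching "13" in other columns. The sketch assumes the standard AGP layout, where gap lines have component type N or U in column 5 and the gap length in column 6. The function name and the example records are made up.

```python
from collections import Counter


def gap_length_counts(lines):
    """Count AGP gap lines (component type N or U) by gap length."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 6 and fields[4] in ("N", "U"):
            counts[int(fields[5])] += 1
    return counts


# Made-up records standing in for hybridscaffold.agp and scaffolds.agp.
example_before = [
    "Super-Scaffold_1\t1\t100\t1\tW\tctg_1\t1\t100\t+",
    "Super-Scaffold_1\t101\t113\t2\tN\t13\tscaffold\tyes\tmap",
    "Super-Scaffold_1\t114\t213\t3\tW\tctg_2\t1\t100\t+",
    "Super-Scaffold_1\t214\t5213\t4\tN\t5000\tscaffold\tyes\tmap",
    "Super-Scaffold_1\t5214\t5313\t5\tW\tctg_3\t1\t100\t+",
]
example_after = [
    "Super-Scaffold_1\t1\t190\t1\tW\tctg_1_ctg_2\t1\t190\t+",
    "Super-Scaffold_1\t191\t203\t2\tN\t13\tscaffold\tyes\tmap",
    "Super-Scaffold_1\t204\t303\t3\tW\tctg_3\t1\t100\t+",
]
before = gap_length_counts(example_before)
after = gap_length_counts(example_after)
print("13bp gaps before:", before[13], "after:", after[13])
print("sized gaps before:", sum(n for size, n in before.items() if size != 13))
```

Comparing the full length distributions, rather than only the 13bp count, shows directly whether sized gaps disappeared at the same time as 13bp gaps appeared.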