gaftools's Introduction

gaftools

This is a suite of scripts developed for working with GAF files and their corresponding rGFA files.

Detailed documentation is available here.

gaftools's People

Contributors

asylvz, samarendra-pani, fawaz-dabbaghieh, tobiasmarschall

gaftools's Issues

Cannot find gfa_sort function in sort.py

Hi,

I installed gaftools following the installation instructions and tried to run gaftools index, but got the following error:

$ gaftools index t.gaf t.gfa
Traceback (most recent call last):
  File "/share/home/miniconda3/envs/podman/bin/gaftools", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/share/home/software/gaftools/gaftools/__main__.py", line 88, in main
    module.main(args)
  File "/share/home/software/gaftools/gaftools/cli/index.py", line 206, in main
    run(**vars(args))
  File "/share/home/software/gaftools/gaftools/cli/index.py", line 24, in run
    from gaftools.cli.sort import gfa_sort
ImportError: cannot import name 'gfa_sort' from 'gaftools.cli.sort' (/share/home/software/gaftools/gaftools/cli/sort.py)

How can I solve this?

Thanks

Improve order_gfa documentation

Please add a complete description of the output files and their content. In particular, please explain the meaning and interpretation of the B0 and N0 tags in relation to the following excerpt:

"...bubbles in the graphs are identified using gaftools (commit ID: 919bbec).
The function order_gfa tags the nodes of the graph to identify whether the nodes are bubble nodes (nodes inside a bubble) or scaffold nodes (nodes outside a bubble)."

(https://www.biorxiv.org/content/10.1101/2024.04.18.590093v1.full.pdf)

Gaftools stat does not work with a GAF file as input

When I run gaftools stat to get statistics on my GAF file, it gives me the error below. I get the impression that the problem is in the code and that there is nothing I can do about it:

Traceback (most recent call last):
  File "/usr/local/bioinfo/miniconda3-23.10.0-1/envs/gaftools-0.1/bin/gaftools", line 8, in <module>
    sys.exit(main())
  File "/usr/local/bioinfo/gaftools-0.1/gaftools/__main__.py", line 88, in main
    module.main(args)
  File "/usr/local/bioinfo/gaftools-0.1/gaftools/cli/stat.py", line 159, in main
    run_stat(**vars(args))
  File "/usr/local/bioinfo/gaftools-0.1/gaftools/cli/stat.py", line 51, in run_stat
    for alignment_count, mapping in enumerate(parse_gaf(gaf_path), 1):
  File "/usr/local/bioinfo/gaftools-0.1/gaftools/gaf.py", line 66, in parse_gaf
    query_start = int(fields[2])
ValueError: invalid literal for int() with base 10: '*'
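
The traceback shows int() being called on a `*` in the query start column. A minimal sketch of a tolerant field parser follows. `parse_int_field` is a hypothetical helper, not part of gaftools, and mapping `*` to `None` is an assumption about the desired behaviour; the example record is made up for illustration.

```python
from typing import Optional


def parse_int_field(value: str) -> Optional[int]:
    """Return the field as an int, or None when it holds the '*' placeholder."""
    return None if value == "*" else int(value)


# Hypothetical GAF-like record whose coordinate columns are '*'.
line = "read1\t100\t*\t*\t*\t*\t*\t*\t*\t*\t*\t255"
fields = line.rstrip("\n").split("\t")

# fields[2] is the query start column that triggered the ValueError above.
query_start = parse_int_field(fields[2])
```

Whether such records should be skipped or reported in the statistics is a design decision for the maintainers; the sketch only shows how the crash itself could be avoided.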

Bug in order_gfa after merging PR14

I get the following error in order_gfa:

$ gaftools order_gfa --chromosome_order chr1 test.gfa
Reading test.gfa
Reading test.gfa
The graph has:
The graph has 90015 Nodes and 129941 Edges
Connected components: 1
Found one connected component per expected chromosome.
Processing chr1
 Input graph: 90015 nodes
 Finding Biconnected Components of the component
Traceback (most recent call last):
  File "/home/tobi/miniforge3/envs/gaftools-dev/bin/gaftools", line 8, in <module>
    sys.exit(main())
  File "/home/tobi/scm/gaftools/gaftools/__main__.py", line 88, in main
    module.main(args)
  File "/home/tobi/scm/gaftools/gaftools/cli/order_gfa.py", line 217, in main
    run_order_gfa(**vars(args))
  File "/home/tobi/scm/gaftools/gaftools/cli/order_gfa.py", line 56, in run_order_gfa
    scaffold_nodes, inside_nodes, node_order, bo, bubble_count = decompose_and_order(graph, component_nodes, bo)
  File "/home/tobi/scm/gaftools/gaftools/cli/order_gfa.py", line 136, in decompose_and_order
    scaffold_graph.add_edge(('s',node1), ('s',node2))
TypeError: add_edge() missing 2 required positional arguments: 'node2' and 'node2_dir'

The error does not happen in commit b286340. Also finding biconnected components seems to be quite a bit slower now after the merge.

Possible to use GFA 1.0 as input to gaftools index?

I would like to use this toolkit within an analysis workflow I am developing. The nature of the analysis requires the graph be in GFA 1.0 format for reasons I won't go into. However, supplying a GFA 1.0 as input to gaftools index causes a crash, and looking into the code shows that rGFA is, indeed, baked in, as it were.

I am wondering if there are technical reasons rGFA is used and GFA 1.0 is not supported. This constraint prevents applying gaftools to any genomics analyses that require haplotype information, which is a pretty big limitation. Any thoughts?
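
To illustrate the difference in practice: rGFA segment lines carry the SN, SO and SR tags, which a plain GFA 1.0 file usually lacks. The sketch below checks a single S line for them. `missing_rgfa_tags` is a hypothetical helper written for this example, not a gaftools function, and the sample lines are invented.

```python
RGFA_TAGS = {"SN", "SO", "SR"}


def missing_rgfa_tags(segment_line: str) -> set:
    """Return the rGFA tags that are absent from one GFA 'S' line."""
    fields = segment_line.rstrip("\n").split("\t")
    # Optional tags start after the record type, segment name and sequence.
    present = {field.split(":", 1)[0] for field in fields[3:]}
    return RGFA_TAGS - present


rgfa_line = "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0"
gfa1_line = "S\ts1\tACGT\tLN:i:4"

print(missing_rgfa_tags(rgfa_line))  # empty set
print(missing_rgfa_tags(gfa1_line))  # all three tags are missing
```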

order_gfa with minigraph-cactus

I'm getting the error below when running order_gfa on a minigraph-cactus graph:
gaftools order_gfa /gpfs/project/projects/medbioinf/users/asoylev/hgsvc3/ref_graph/hprc-v1.1-mc-chm13.gfa

Reading /gpfs/project/projects/medbioinf/users/asoylev/hgsvc3/ref_graph/hprc-v1.1-mc-chm13.gfa
The graph has:
The graph has 92879580 Nodes and 128165765 Edges
Connected components: 25
Traceback (most recent call last):
  File "/home/liy34nag/bin/gaftools", line 33, in <module>
    sys.exit(load_entry_point('gaftools', 'console_scripts', 'gaftools')())
  File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/__main__.py", line 88, in main
    module.main(args)
  File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/cli/order_gfa.py", line 218, in main
    run_order_gfa(**vars(args))
  File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/cli/order_gfa.py", line 39, in run_order_gfa
    components = name_comps(graph, components)
  File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/cli/order_gfa.py", line 195, in name_comps
    counts = Counter([graph[n].tags["SN"][1] for n in comp])
  File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/cli/order_gfa.py", line 195, in <listcomp>
    counts = Counter([graph[n].tags["SN"][1] for n in comp])
KeyError: 'SN'
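
The KeyError indicates that at least one node in this graph has no SN tag. Below is a minimal sketch of a count that tolerates missing tags. It assumes, based only on the `tags["SN"][1]` access in the traceback, that each node's tags are a dict mapping tag name to a (type, value) pair; `most_common_sn` is a hypothetical helper and not the gaftools implementation.

```python
from collections import Counter


def most_common_sn(tags_per_node):
    """Return the most frequent SN value, or None if no node has an SN tag."""
    counts = Counter(
        tags["SN"][1] for tags in tags_per_node if "SN" in tags
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented component: two chr1 nodes, one chr2 node, one node without tags.
component = [
    {"SN": ("Z", "chr1")},
    {"SN": ("Z", "chr1")},
    {"SN": ("Z", "chr2")},
    {},
]
name = most_common_sn(component)
```

A component without any SN tag still needs a name from somewhere else, so this only turns the crash into a case the caller can handle explicitly.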

SNP calling

Hi,

Thanks for your efforts here. Is SNP calling in scope for this package in the future, or do you know of any alternatives to vg call for calling SNPs in pangenomes, especially from GAF files?

Thanks

Issue running gaftools stat on gzip compressed file

Hi,

when running gaftools stat (commit 2cacdd4) with a gzip compressed file, I get the following error:

Traceback (most recent call last):
  File "/home/jana/miniconda3/bin/gaftools", line 33, in <module>
    sys.exit(load_entry_point('gaftools', 'console_scripts', 'gaftools')())
  File "/home/jana/Downloads/test-gaftools/gaftools/gaftools/__main__.py", line 88, in main
    module.main(args)
  File "/home/jana/Downloads/test-gaftools/gaftools/gaftools/cli/stat.py", line 118, in main
    run_stat(**vars(args))
  File "/home/jana/Downloads/test-gaftools/gaftools/gaftools/cli/stat.py", line 50, in run_stat
    for alignment_count, mapping in enumerate(parse_gaf(gaf_path), 1):
  File "/home/jana/Downloads/test-gaftools/gaftools/gaftools/gaf.py", line 40, in parse_gaf
    gaf_file = gzip.open(filename,"r")
NameError: name 'gzip' is not defined

Maybe this is simply because of a missing import gzip?
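
For reference, a minimal sketch of opening either a plain or a gzip-compressed file with the import in place. `open_text` is a hypothetical helper, not the gaftools code; detecting compression from the two gzip magic bytes and returning text mode are choices made for this example (the line in the traceback opens the file with mode "r", which is binary for gzip.open).

```python
import gzip


def open_text(filename):
    """Open a plain or gzip-compressed file in text mode."""
    # gzip files start with the magic bytes 0x1f 0x8b.
    with open(filename, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(filename, "rt")
    return open(filename, "r")
```

Checking the magic bytes instead of the file extension means a compressed file without a .gz suffix is still handled.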

Feature Request: Orient a GFA

I have a request related to the orientation of the sequences in GFA graphs. In the GFA graphs I am working with, the strand represented in each segment sequence is essentially random. One way this shows up in the graph is that links leaving the same node end have opposite "to orientation" values. For example, consider nodes utig4-1529, utig4-1530 and utig4-1527 in this fork structure:

Screenshot 2023-08-14 at 3 50 36 PM

The edge from 1529 to 1530 has a + "to orientation", while the edge to 1527 has a - "to orientation".

Screenshot 2023-08-14 at 4 05 08 PM

I believe that this could be done, in the acyclic case, by traversing the graph from one end to the other and taking reverse complements when relative misorientations are detected.
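
The traversal idea can be sketched as follows. This is an illustration under stated assumptions, not an existing gaftools feature: links are given as (from_node, from_orient, to_node, to_orient) tuples, a node is marked for flipping whenever a link's two orientations disagree relative to the node it was reached from, and the "+" from-orientations in the example are assumed, since the issue only states the "to orientation" values. `find_flips` and `reverse_complement` are hypothetical names.

```python
from collections import deque

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_flips(links, start):
    """Breadth-first search that decides which nodes to reverse-complement.

    Returns a dict node -> bool (True means flip). Raises ValueError when
    the nodes reachable from start cannot be oriented consistently.
    """
    neighbours = {}
    for a, a_orient, b, b_orient in links:
        differs = a_orient != b_orient
        neighbours.setdefault(a, []).append((b, differs))
        neighbours.setdefault(b, []).append((a, differs))

    flip = {start: False}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other, differs in neighbours.get(node, []):
            wanted = flip[node] ^ differs
            if other not in flip:
                flip[other] = wanted
                queue.append(other)
            elif flip[other] != wanted:
                raise ValueError(f"no consistent orientation at {other}")
    return flip


links = [
    ("utig4-1529", "+", "utig4-1530", "+"),
    ("utig4-1529", "+", "utig4-1527", "-"),
]
flips = find_flips(links, "utig4-1529")
```

With these inputs only utig4-1527 is marked for flipping. Its sequence would then be passed through `reverse_complement`, and the orientation signs of all links touching it would be toggled.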

[Feature Request] Extract chromosome path from graph with the GFA class

Arda asked if the GFA class can have a function that retrieves a path that represents a chromosome in the pangenome graph.

The idea is to extract the nodes whose SN tag matches the required chromosome and order them according to their SO tags. This ordered list of nodes should form a linear path, i.e. there is an edge connecting each node in the list to the node that follows it.
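
A minimal sketch of that idea, using plain dictionaries instead of the GFA class, whose interface is not shown here. `chromosome_path`, the node layout (node id mapped to its SN and SO values) and the edge set are assumptions made for illustration only.

```python
def chromosome_path(nodes, edges, chromosome):
    """Return node ids for one chromosome, ordered by SO.

    nodes: dict node_id -> {"SN": str, "SO": int}
    edges: set of (node_id, node_id) pairs, direction ignored
    Raises ValueError if two consecutive nodes are not connected.
    """
    selected = [n for n, tags in nodes.items() if tags.get("SN") == chromosome]
    selected.sort(key=lambda n: nodes[n]["SO"])
    for left, right in zip(selected, selected[1:]):
        if (left, right) not in edges and (right, left) not in edges:
            raise ValueError(f"no edge between {left} and {right}")
    return selected


nodes = {
    "s1": {"SN": "chr1", "SO": 0},
    "s2": {"SN": "chr1", "SO": 100},
    "s3": {"SN": "chr2", "SO": 0},
    "s4": {"SN": "chr1", "SO": 50},
}
edges = {("s1", "s4"), ("s4", "s2")}
path = chromosome_path(nodes, edges, "chr1")
```

Raising an error when consecutive nodes are not connected makes a broken path visible instead of returning a list that only looks linear.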

Integrating GFA class

TODO on the GFA class side:
1- Now that there are add_edge and add_node functions, I should use them when populating the graph in read_graph instead of doing it manually
2- Make sure that any function in the GFA class or Node class that accesses internal information of those classes is safe and does not delete anything before checking that everything is correct
3- Better error raising and more checks

TODO in terms of integrating the GFA class:
1- Implement the bi-connected components algorithm to get rid of networkx, so that gaftools can rely on one GFA class for all graph-related operations
2- Integrate the GFA class step by step into each subcommand separately until all the tests are green and happy
