This is a suite of scripts developed for working with GAF files and their corresponding rGFA files.
Detailed documentation is available here.
General purpose utility related to GAF files
License: MIT License
Hi,
I installed gaftools according to the process and tried to index, but found the following error:
$ gaftools index t.gaf t.gfa
Traceback (most recent call last):
File "/share/home/miniconda3/envs/podman/bin/gaftools", line 8, in <module>
sys.exit(main())
^^^^^^
File "/share/home/software/gaftools/gaftools/__main__.py", line 88, in main
module.main(args)
File "/share/home/software/gaftools/gaftools/cli/index.py", line 206, in main
run(**vars(args))
File "/share/home/software/gaftools/gaftools/cli/index.py", line 24, in run
from gaftools.cli.sort import gfa_sort
ImportError: cannot import name 'gfa_sort' from 'gaftools.cli.sort' (/share/home/software/gaftools/gaftools/cli/sort.py)
How to solve?
Thanks
Please add a complete description of the output files and their content. In particular, please explain the meaning and interpretation of the B0 and N0 tags with respect to the following excerpt:
"...bubbles in the graphs are identified using gaftools (commit ID: 919bbec).
The function order_gfa tags the nodes of the graph to identify whether the nodes are bubble nodes (nodes inside a bubble) or scaffold nodes (nodes outside a bubble)."
(https://www.biorxiv.org/content/10.1101/2024.04.18.590093v1.full.pdf)
When I run gaftools stat to get statistics on my GAF file, I get the error below. My impression is that the problem is in the code and that there is nothing I can do about it:
Traceback (most recent call last):
File "/usr/local/bioinfo/miniconda3-23.10.0-1/envs/gaftools-0.1/bin/gaftools", line 8, in
sys.exit(main())
File "/usr/local/bioinfo/gaftools-0.1/gaftools/main.py", line 88, in main
module.main(args)
File "/usr/local/bioinfo/gaftools-0.1/gaftools/cli/stat.py", line 159, in main
run_stat(**vars(args))
File "/usr/local/bioinfo/gaftools-0.1/gaftools/cli/stat.py", line 51, in run_stat
for alignment_count, mapping in enumerate(parse_gaf(gaf_path), 1):
File "/usr/local/bioinfo/gaftools-0.1/gaftools/gaf.py", line 66, in parse_gaf
query_start = int(fields[2])
ValueError: invalid literal for int() with base 10: '*'
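The traceback shows int() being applied to a column that holds the `*` placeholder. One possible way to tolerate it is sketched below. The helper name `parse_optional_int`, the choice to map `*` to `None`, and the example record are assumptions for illustration, not the actual gaftools code.

```python
from typing import Optional


def parse_optional_int(field: str) -> Optional[int]:
    """Return the integer value of a GAF column, or None for the '*' placeholder."""
    if field == "*":
        return None
    return int(field)


# Hypothetical record in which the numeric columns hold '*'.
line = "read1\t1000\t*\t*\t*\t*\t*\t*\t*\t*\t*\t255"
fields = line.rstrip("\n").split("\t")
query_start = parse_optional_int(fields[2])  # column 3: query start
query_end = parse_optional_int(fields[3])    # column 4: query end
```

Code that consumes these values would then need to skip, or count separately, any record whose coordinates are `None`.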
I get the following error in order_gfa:
$ gaftools order_gfa --chromosome_order chr1 test.gfa
Reading test.gfa
Reading test.gfa
The graph has:
The graph has 90015 Nodes and 129941 Edges
Connected components: 1
Found one connected component per expected chromosome.
Processing chr1
Input graph: 90015 nodes
Finding Biconnected Components of the component
Traceback (most recent call last):
File "/home/tobi/miniforge3/envs/gaftools-dev/bin/gaftools", line 8, in <module>
sys.exit(main())
File "/home/tobi/scm/gaftools/gaftools/__main__.py", line 88, in main
module.main(args)
File "/home/tobi/scm/gaftools/gaftools/cli/order_gfa.py", line 217, in main
run_order_gfa(**vars(args))
File "/home/tobi/scm/gaftools/gaftools/cli/order_gfa.py", line 56, in run_order_gfa
scaffold_nodes, inside_nodes, node_order, bo, bubble_count = decompose_and_order(graph, component_nodes, bo)
File "/home/tobi/scm/gaftools/gaftools/cli/order_gfa.py", line 136, in decompose_and_order
scaffold_graph.add_edge(('s',node1), ('s',node2))
TypeError: add_edge() missing 2 required positional arguments: 'node2' and 'node2_dir'
The error does not happen in commit b286340. Also finding biconnected components seems to be quite a bit slower now after the merge.
I would like to use this toolkit within an analysis workflow I am developing. The nature of the analysis requires the graph to be in GFA 1.0 format for reasons I won't go into. However, supplying a GFA 1.0 file as input to gaftools index causes a crash, and looking into the code shows that rGFA is, indeed, baked in, as it were.
I am wondering if there are technical reasons rGFA is used and GFA 1.0 is not supported. This constraint prevents applying gaftools to any genomics analyses that require haplotype information, which is a pretty big limitation. Any thoughts?
I'm getting the error below when running order_gfa on a Minigraph-Cactus graph:
gaftools order_gfa /gpfs/project/projects/medbioinf/users/asoylev/hgsvc3/ref_graph/hprc-v1.1-mc-chm13.gfa
Reading /gpfs/project/projects/medbioinf/users/asoylev/hgsvc3/ref_graph/hprc-v1.1-mc-chm13.gfa
The graph has:
The graph has 92879580 Nodes and 128165765 Edges
Connected components: 25
Traceback (most recent call last):
File "/home/liy34nag/bin/gaftools", line 33, in <module>
sys.exit(load_entry_point('gaftools', 'console_scripts', 'gaftools')())
File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/__main__.py", line 88, in main
module.main(args)
File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/cli/order_gfa.py", line 218, in main
run_order_gfa(**vars(args))
File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/cli/order_gfa.py", line 39, in run_order_gfa
components = name_comps(graph, components)
File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/cli/order_gfa.py", line 195, in name_comps
counts = Counter([graph[n].tags["SN"][1] for n in comp])
File "/gpfs/project/projects/medbioinf/users/asoylev/gaftools/gaftools/cli/order_gfa.py", line 195, in <listcomp>
counts = Counter([graph[n].tags["SN"][1] for n in comp])
KeyError: 'SN'
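The KeyError suggests that some nodes in this graph carry no SN tag. A minimal sketch of a more defensive count follows. It uses plain dictionaries rather than the gaftools graph classes, and the function name `count_sn_tags` and the (type, value) tag layout are assumptions inferred from the `tags["SN"][1]` expression in the traceback.

```python
from collections import Counter


def count_sn_tags(node_tags, component):
    """Count SN values in a component, skipping nodes without an SN tag."""
    counts = Counter()
    missing = 0
    for node_id in component:
        tag = node_tags[node_id].get("SN")
        if tag is None:
            missing += 1
            continue
        counts[tag[1]] += 1  # tag is assumed to be a (type, value) pair
    return counts, missing


# Hypothetical tags: s3 stands for a node from a GFA without rGFA tags.
node_tags = {
    "s1": {"SN": ("Z", "chr1")},
    "s2": {"SN": ("Z", "chr1")},
    "s3": {},
}
counts, missing = count_sn_tags(node_tags, ["s1", "s2", "s3"])
if missing:
    print(f"{missing} node(s) have no SN tag; the input may not be rGFA")
```

Reporting the number of untagged nodes gives the user a readable message instead of a bare KeyError.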
Hi,
Thanks for your efforts here. Is calling SNPs in scope for this package in the future, or do you know of any SNP-calling alternatives to vg call for pangenomes, especially ones that call from GAF files?
Thanks
Hi,
when running gaftools stat (commit 2cacdd4) with a gzip-compressed file, I get the following error:
Traceback (most recent call last):
File "/home/jana/miniconda3/bin/gaftools", line 33, in <module>
sys.exit(load_entry_point('gaftools', 'console_scripts', 'gaftools')())
File "/home/jana/Downloads/test-gaftools/gaftools/gaftools/__main__.py", line 88, in main
module.main(args)
File "/home/jana/Downloads/test-gaftools/gaftools/gaftools/cli/stat.py", line 118, in main
run_stat(**vars(args))
File "/home/jana/Downloads/test-gaftools/gaftools/gaftools/cli/stat.py", line 50, in run_stat
for alignment_count, mapping in enumerate(parse_gaf(gaf_path), 1):
File "/home/jana/Downloads/test-gaftools/gaftools/gaftools/gaf.py", line 40, in parse_gaf
gaf_file = gzip.open(filename,"r")
NameError: name 'gzip' is not defined
Maybe this is simply because of a missing import gzip?
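For reference, a small sketch of opening a GAF file that may or may not be gzip-compressed. The function name `open_gaf` and the magic-byte check are assumptions for illustration; the traceback only shows that gaftools calls gzip.open inside parse_gaf.

```python
import gzip


def open_gaf(filename):
    """Open a GAF file as text, transparently handling gzip compression."""
    # gzip streams start with the two magic bytes 0x1f 0x8b.
    with open(filename, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(filename, "rt")
    return open(filename, "r")
```

Checking the magic bytes rather than the file extension also covers compressed files that do not end in .gz.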
It seems the strand column in the output GAF is misplaced in the 4th column when it should be in the 5th column.
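To make the expected layout concrete, here is a short sketch of the 12 mandatory GAF columns with a check that the strand sits in the 5th one. The function name `strand_column_ok` and the example records are hypothetical.

```python
GAF_COLUMNS = [
    "query_name", "query_length", "query_start", "query_end", "strand",
    "path", "path_length", "path_start", "path_end",
    "residue_matches", "alignment_block_length", "mapping_quality",
]


def strand_column_ok(line):
    """Return True if the record has 12+ columns and the 5th holds '+' or '-'."""
    fields = line.rstrip("\n").split("\t")
    strand_index = GAF_COLUMNS.index("strand")  # 0-based index 4, i.e. column 5
    return len(fields) >= len(GAF_COLUMNS) and fields[strand_index] in {"+", "-"}


# Hypothetical records: the second has the strand shifted into column 4.
good = "read1\t100\t0\t100\t+\t>s1>s2\t200\t10\t110\t95\t100\t60"
bad = "read1\t100\t0\t+\t100\t>s1>s2\t200\t10\t110\t95\t100\t60"
```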
I have a request related to the orientation of sequences in GFA graphs. In the GFA graphs I am working with, the strand represented in each GFA segment sequence is essentially random. One way this manifests in the graph is that links leaving the same node end arrive with opposite "to orientation" values. For example, consider nodes utig4-1529, utig4-1530 and utig4-1527 in this fork structure:
The edge from 1529 to 1530 has a + "to orientation", while the edge to 1527 has a - "to orientation".
I believe that a consistent orientation could be achieved, in the acyclic case, by traversing the graph from one end to the other and taking reverse complements when relative misorientations are detected.
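The traversal idea can be sketched roughly as follows. This is an illustration under assumptions, not a gaftools feature: links are given as (from, from orientation, to, to orientation) tuples, the function names are hypothetical, and the "+" from orientations in the example are assumed, since only the "to orientation" values are stated above.

```python
from collections import deque


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A/C/G/T, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def assign_flips(links, start):
    """Decide per node whether its sequence should be reverse-complemented.

    A link whose two orientations differ implies that exactly one of its
    endpoints must be flipped; a breadth-first traversal propagates this.
    """
    adjacency = {}
    for a, orient_a, b, orient_b in links:
        differs = orient_a != orient_b
        adjacency.setdefault(a, []).append((b, differs))
        adjacency.setdefault(b, []).append((a, differs))
    flip = {start: False}
    conflicts = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour, differs in adjacency.get(node, []):
            expected = flip[node] != differs
            if neighbour not in flip:
                flip[neighbour] = expected
                queue.append(neighbour)
            elif flip[neighbour] != expected:
                # No consistent orientation exists along this link.
                conflicts.append((node, neighbour))
    return flip, conflicts


links = [
    ("utig4-1529", "+", "utig4-1530", "+"),
    ("utig4-1529", "+", "utig4-1527", "-"),
]
flip, conflicts = assign_flips(links, "utig4-1529")
```

Here `flip` marks utig4-1527 for reverse complementing and leaves the other two nodes unchanged. A complete implementation would also have to rewrite the orientations on every link touching a flipped node, and decide what to do when `conflicts` is non-empty.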
Arda asked if the GFA class can have a function that retrieves a path that represents a chromosome in the pangenome graph.
The idea is then to extract the nodes whose SN tag matches the required chromosome and order them according to their SO tags. This ordered list of nodes should form a linear path, i.e. there is an edge connecting each node in the list with the following node.
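A rough sketch of that idea, using plain dictionaries instead of the GFA class. The function name `chromosome_path`, the data layout, and the treatment of edges as undirected pairs are all assumptions for illustration.

```python
def chromosome_path(nodes, edges, chromosome):
    """Return the nodes of one chromosome ordered by SO, checking connectivity.

    nodes: dict mapping node id -> {"SN": str, "SO": int}
    edges: set of (node_a, node_b) pairs, treated as undirected
    """
    selected = [n for n, tags in nodes.items() if tags.get("SN") == chromosome]
    selected.sort(key=lambda n: nodes[n]["SO"])
    # Every consecutive pair must be joined by an edge for the list to be a path.
    for left, right in zip(selected, selected[1:]):
        if (left, right) not in edges and (right, left) not in edges:
            raise ValueError(f"no edge between {left} and {right}")
    return selected


# Hypothetical toy graph.
nodes = {
    "s1": {"SN": "chr1", "SO": 0},
    "s2": {"SN": "chr1", "SO": 100},
    "s3": {"SN": "chr2", "SO": 0},
    "s4": {"SN": "chr1", "SO": 50},
}
edges = {("s1", "s4"), ("s4", "s2")}
path = chromosome_path(nodes, edges, "chr1")
```

Raising an error on a missing edge makes a broken reference path visible early instead of returning a list that is not actually a walk in the graph.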
TODO in the GFA class side:
1- Now that there are add_edge and add_node functions, I should use them when populating the graph in read_graph instead of doing it manually
2- Make sure that any function in the GFA class or Node class that accesses internal information of those classes is safe and does not delete things before making sure everything is correct
3- Better error raising and more checks
TODO in terms of integrating the GFA class:
1- Implement bi-connected components in the GFA class to get rid of networkx, so that gaftools can rely on one GFA class for any graph-related operations
2- Integrate the GFA class step by step into each subcommand separately until all the tests are green and happy