Comments (10)

tobiasmarschall commented on July 20, 2024

Could you also add a test case for this?

from gaftools.

tobiasmarschall commented on July 20, 2024

Yes, works for me now. Thanks for the quick fix.

fawaz-dabbaghieh commented on July 20, 2024

Should be easy to do; it's a small modification to the internal GFA function write_graph.
But just to make sure I understood correctly: you want the output to follow the BO and NO tags that we started assigning. For example, I take the "first" scaffold node and output it, then output all the nodes in the bubble next to it, then the scaffold node after the bubble, and so on...

Should I mix the S and L lines, or do you want the output to have all the S lines first and then all the L lines?
At the moment, I loop through the nodes I want to output, write the S line of each node, and then the L lines associated with that node. Should I keep it that way?

tobiasmarschall commented on July 20, 2024

Exactly. I was thinking of outputting the S lines ordered by their BO tag (primary sort key) and NO tag (secondary sort key). In my view, the L lines should be interleaved, i.e., after the S line (or lines) corresponding to a given BO tag, you can list a block of L lines corresponding to all edges within the corresponding biconnected component. This way, the information stays together. So if write_graph supported this (for graphs that do have BO and NO tags), that would be a nice feature.
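A minimal sketch of the interleaved layout described above, for illustration only. It assumes a hypothetical in-memory model (a dict of node id to integer `(BO, NO)` and a list of edge tuples) rather than gaftools' real GFA classes, and `write_interleaved` is a hypothetical name. It uses one simple rule for placing L lines: each edge is written right after the BO group of whichever endpoint comes later, so an L line never precedes the S lines it refers to.

```python
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

# Hypothetical edge record: (from_id, from_orient, to_id, to_orient, overlap)
Edge = Tuple[str, str, str, str, str]


def write_interleaved(nodes: Dict[str, Tuple[int, int]],
                      edges: List[Edge], out: TextIO) -> None:
    """Write S lines sorted by (BO, NO), each BO group followed by its L lines."""
    # Tuple keys compare BO first and fall back to NO on ties.
    order = sorted(nodes, key=lambda n: nodes[n])

    # An edge is written once both endpoints are out, i.e. after the BO group
    # of whichever endpoint comes later in the order.
    edges_by_group: Dict[int, List[Edge]] = defaultdict(list)
    for edge in edges:
        later_bo = max(nodes[edge[0]][0], nodes[edge[2]][0])
        edges_by_group[later_bo].append(edge)

    i = 0
    while i < len(order):
        bo = nodes[order[i]][0]
        # Emit every S line that belongs to this BO group ("*" = no sequence).
        while i < len(order) and nodes[order[i]][0] == bo:
            node = order[i]
            out.write(f"S\t{node}\t*\tBO:i:{bo}\tNO:i:{nodes[node][1]}\n")
            i += 1
        # Then the block of L lines that this group completes.
        for u, u_orient, v, v_orient, overlap in edges_by_group.get(bo, []):
            out.write(f"L\t{u}\t{u_orient}\t{v}\t{v_orient}\t{overlap}\n")
```

With scaffold nodes `s1` (BO 0) and `s2` (BO 2) around a bubble `b1`, `b2` (BO 1), this writes the S line of `s1`, the two bubble S lines, the two L lines into the bubble, the S line of `s2`, and then the two L lines out of the bubble. The trade-off of this rule is that edges leaving a bubble land after the next scaffold node rather than inside the bubble's own block.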

fawaz-dabbaghieh commented on July 20, 2024

Yeah, I don't think it would be too hard to do. Famous last words!

fawaz-dabbaghieh commented on July 20, 2024

I just tried ordering by the BO tags, which should already give the desired effect. I looked at the small graph in Bandage, and the GFA output follows the graph from one end to the other. The only caveat is that the nodes inside a bubble won't be ordered topologically, but all the nodes of a bubble will come one after another in the GFA output. The L lines are interleaved with the S lines.

In theory, I could also output an index that stores bubble IDs and their offsets in the output GFA file, so you could load only one or more bubbles instead of reading a huge GFA file.

But I'll keep this idea for another day :)
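The deferred indexing idea could take roughly the following shape. This is a sketch of one possible design, not anything gaftools implements: `write_with_index` and `read_group` are hypothetical names, the index is a plain in-memory dict of BO value to byte range, and persisting it next to the GFA (e.g. as a sidecar file) is left out.

```python
from typing import Dict, Iterable, List, Tuple


def write_with_index(groups: Iterable[Tuple[int, List[str]]],
                     gfa_path: str) -> Dict[int, Tuple[int, int]]:
    """Write each BO group's lines and record its byte range in the file.

    groups: (bo, lines) pairs in output order; each line is a complete GFA
    record ending in a newline.
    """
    index: Dict[int, Tuple[int, int]] = {}
    with open(gfa_path, "wb") as f:  # binary mode so tell() is a byte offset
        for bo, lines in groups:
            start = f.tell()
            for line in lines:
                f.write(line.encode("ascii"))
            index[bo] = (start, f.tell())
    return index


def read_group(gfa_path: str, index: Dict[int, Tuple[int, int]], bo: int) -> List[str]:
    """Load one bubble's lines by seeking to its byte range."""
    start, end = index[bo]
    with open(gfa_path, "rb") as f:
        f.seek(start)
        return f.read(end - start).decode("ascii").splitlines()
```

Because each bubble's S and L lines are contiguous in the ordered output, a single seek and read is enough to load one bubble without parsing the rest of the file.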

tobiasmarschall commented on July 20, 2024

Thanks. Yes, let's keep indexing separate. I think both Samarendra and I have some thoughts on indexing as well, and it would be good to discuss them first.

tobiasmarschall commented on July 20, 2024

Doesn't work for me:

$ time gaftools order_gfa --outdir out_d750351 minigraph-extended_all-Oct24.gfa
Reading minigraph-extended_all-Oct24.gfa
The graph has:
The graph has 1250314 Nodes and 1802130 Edges
Connected components: 25
Found one connected component per expected chromosome.
Processing chr1
 Input component: 90015 nodes
 Finding Biconnected Components of the component chr1
 It took 4.152787557002739 seconds to find the Biconnected Components
  Bubbles: 17942
  Scaffold graph: 35887 nodes
Traceback (most recent call last):
  File "/home/tobi/miniforge3/envs/gaftools-dev/bin/gaftools", line 8, in <module>
    sys.exit(main())
  File "/home/tobi/scm/gaftools/gaftools/__main__.py", line 88, in main
    module.main(args)
  File "/home/tobi/scm/gaftools/gaftools/cli/order_gfa.py", line 215, in main
    run_order_gfa(**vars(args))
  File "/home/tobi/scm/gaftools/gaftools/cli/order_gfa.py", line 75, in run_order_gfa
    graph.write_gfa(set_of_nodes=component_nodes, output_file=f_gfa, append=False, order_bo=True)
  File "/home/tobi/scm/gaftools/gaftools/GFA.py", line 390, in write_gfa
    set_of_nodes = sorted(list(self.nodes.keys()), key=lambda x: int(self.nodes[x].tags['BO'][1]))
  File "/home/tobi/scm/gaftools/gaftools/GFA.py", line 390, in <lambda>
    set_of_nodes = sorted(list(self.nodes.keys()), key=lambda x: int(self.nodes[x].tags['BO'][1]))
KeyError: 'BO'

real    0m30,692s
user    0m28,821s
sys     0m1,848s

Maybe we should start using feature branches and only merge once there are tests and they are ok.
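One possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: the key function on line 390 sorts `self.nodes.keys()` (every node in the graph) instead of the `set_of_nodes` passed to `write_gfa`, and since the log shows components being processed one at a time (starting with chr1), nodes of components not yet processed may not have a BO tag. A minimal defensive ordering is sketched below; `bo_no_key` and `order_subset` are hypothetical helpers, not gaftools functions, and the `(type, value)` tag layout is inferred from the `tags['BO'][1]` lookup in the traceback.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed tag layout: tag name -> (type, value), e.g. {"BO": ("i", "3")}.
Tags = Dict[str, Tuple[str, str]]


def bo_no_key(tags: Tags) -> Tuple[bool, int, int]:
    """Sort key: tagged nodes by (BO, NO); nodes without BO go last."""
    bo = tags.get("BO")
    no = tags.get("NO")
    return (
        bo is None,  # False sorts before True, so tagged nodes come first
        int(bo[1]) if bo is not None else 0,  # numeric, so "10" sorts after "2"
        int(no[1]) if no is not None else 0,
    )


def order_subset(node_tags: Dict[str, Tags], set_of_nodes: Iterable[str]) -> List[str]:
    # Sort only the nodes requested for this output file, not the whole graph.
    return sorted(set_of_nodes, key=lambda n: bo_no_key(node_tags[n]))
```

Pushing untagged nodes to the end keeps the output usable, but it can also hide a tagging bug; raising an error that names the offending node is a reasonable stricter alternative.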

fawaz-dabbaghieh commented on July 20, 2024

Yes, this is what I was doing before: developing in a separate branch. But these edits seemed harmless enough, and they passed all the automated tests, so I pushed them to main. I'll fix this later today and get back to you.

fawaz-dabbaghieh commented on July 20, 2024

I fixed the bug you were getting, at least I hope so. I tested it just now on the minigraph-extended graph and it worked (see the time output below). I still need to add proper test cases for the sorting, which I also changed a bit; it should sort better now. Basically, I sort all the nodes into buckets by BO, then sort each bucket of nodes that share the same BO by NO, and output that order.
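The bucketing step described above can be sketched as follows. This is an illustration assuming integer BO/NO values already parsed into a dict; `bucket_order` is a hypothetical name, not the function in gaftools.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bucket_order(node_bo_no: Dict[str, Tuple[int, int]]) -> List[str]:
    """Group nodes into BO buckets, then order each bucket by NO."""
    buckets: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for node, (bo, no) in node_bo_no.items():
        buckets[bo].append((no, node))

    order: List[str] = []
    for bo in sorted(buckets):  # buckets in increasing BO
        # Within a bucket, sort by NO (the node id only breaks exact ties).
        order.extend(node for _, node in sorted(buckets[bo]))
    return order
```

Apart from how exact ties are broken, this gives the same order as a single `sorted` call with a `(BO, NO)` tuple key; the advantage of explicit buckets is that each BO group is at hand as a unit, which makes it straightforward to emit that group's L lines right after its S lines.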

I hope it's stable for now, so I don't need to push to master again and can develop things on the side. But I wanted to fix this bug to make it usable again.

	Command being timed: "gaftools order_gfa minigraph-extended_all_no_seq.gfa --outdir out/"
	User time (seconds): 122.75
	System time (seconds): 2.03
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:04.87
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3640720
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 32
	Minor (reclaiming a frame) page faults: 985690
	Voluntary context switches: 72
	Involuntary context switches: 1222
	Swaps: 0
	File system inputs: 326360
	File system outputs: 491440
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
