
gfatk's People

Contributors

euphrasiologist, lavafroth

gfatk's Issues

Allow any name for GFA segments, lines, paths etc.

Currently, the names of GFA elements must be integers (they are parsed to usize). A gfatk rename function is provided, which converts any GFA to usize-parsable names. However, this is annoying; gfatk should accept any arbitrary name, e.g. for a segment. Names can be parsed as Vec<u8>s.
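One way this could look is a small name table that accepts arbitrary byte-string names and hands out dense usize ids for internal use. This is only a sketch: `NameTable`, `intern` and `name` are hypothetical names for illustration and are not part of gfatk.

```rust
use std::collections::HashMap;

/// Hypothetical interner: maps arbitrary byte-string names to dense usize ids.
#[derive(Default)]
pub struct NameTable {
    ids: HashMap<Vec<u8>, usize>,
    names: Vec<Vec<u8>>,
}

impl NameTable {
    /// Return the id for `name`, assigning the next free id on first sight.
    pub fn intern(&mut self, name: &[u8]) -> usize {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len();
        self.names.push(name.to_vec());
        self.ids.insert(name.to_vec(), id);
        id
    }

    /// Look up the original name for an id, if that id has been assigned.
    pub fn name(&self, id: usize) -> Option<&[u8]> {
        self.names.get(id).map(|n| n.as_slice())
    }
}
```

With a table like this, graph code can keep working on integers while the original names are restored when writing output.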

Extract multiple mito contigs

The time has come.

  • gfatk extract-chloro should only extract one subgraph as we expect a consistent structure.
  • gfatk extract-mito should allow multiple contigs to be extracted. These contigs are usually single, circular segments, so we can restrict our extraction criteria to these for now, unless there is good evidence otherwise.
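The "single, circular segment" criterion could be sketched as below, assuming links are available as (from, to) pairs of segment names taken from the GFA L-lines. The function name `circular_singletons` is hypothetical, and orientation is ignored for simplicity.

```rust
use std::collections::HashSet;

/// Return the segments that form a circular, single-segment contig:
/// the segment links to itself and to no other segment.
/// `links` holds (from, to) segment names taken from GFA L-lines.
pub fn circular_singletons<'a>(
    segments: &[&'a str],
    links: &[(&'a str, &'a str)],
) -> Vec<&'a str> {
    let mut self_linked: HashSet<&str> = HashSet::new();
    let mut linked_to_other: HashSet<&str> = HashSet::new();
    for &(from, to) in links {
        if from == to {
            self_linked.insert(from);
        } else {
            linked_to_other.insert(from);
            linked_to_other.insert(to);
        }
    }
    // Keep input order; require a self-link and no link to another segment.
    segments
        .iter()
        .copied()
        .filter(|s| self_linked.contains(s) && !linked_to_other.contains(s))
        .collect()
}
```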

`gfatk linear -ei` bug

If both the -e and -i flags are set and one of the subgraphs overflows, should we fall back on normal linearisation?

Format GFA to be compatible with gfatk

It might be nice to rename the nodes of an input GFA to be 1-indexed, so that it works with the rest of the toolkit. MBG does this by default. I never got around to implementing arbitrary index names, but oh well.

Restrict linear based on node number

As the linear algorithm is brute force, the number of possible paths grows very quickly as the number of nodes increases. At the moment, gfatk hangs if there are too many nodes in the graph (it tries to do the calculation). Should we terminate the calculation when the number of nodes is > 60? We should probably run some tests first.
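A guard of this kind might be sketched as follows. The names `MAX_NODES`, `orderings` and `check_node_limit` are hypothetical, and 60 is simply the figure floated above, not a tested threshold. The factorial is only a worst-case illustration (every node ordering treated as a candidate); the real number of paths depends on the graph.

```rust
/// Hypothetical cap on graph size for brute-force linearisation.
/// 60 is the untested figure suggested in the issue text.
pub const MAX_NODES: usize = 60;

/// Number of orderings of `n` nodes (n!), or None if it overflows u64.
/// Shown only to illustrate how fast an exhaustive search can grow.
pub fn orderings(n: u64) -> Option<u64> {
    (1..=n).try_fold(1u64, |acc, k| acc.checked_mul(k))
}

/// Refuse to start the search when the graph has too many nodes.
pub fn check_node_limit(node_count: usize, max_nodes: usize) -> Result<(), String> {
    if node_count > max_nodes {
        Err(format!(
            "graph has {} nodes; limit for linearisation is {}",
            node_count, max_nodes
        ))
    } else {
        Ok(())
    }
}
```

Returning an error before the search starts lets the caller report the problem or fall back to another strategy instead of hanging.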

Obtain path sequences as specified in P-lines

Hi, thanks for providing this great toolkit!

This toolkit is the only one I could find that can provide the FASTA sequence for a path. The only thing is that I have to specify the path manually. Would it be possible to supply the PathName of a path (or, by default, obtain the sequences for all paths)? My current workaround for one path is:

pathname="G1S1"  # name of the path
# Obtain the ordered segment list for the path from its P-line
pathsegments=$(awk -v pathname="${pathname}" 'BEGIN{FS = OFS = "\t"} $1=="P" && $2==pathname {print $3}' test.gfa)
# Use gfatk to obtain the sequence, then change the FASTA header back to the path name
gfatk path test.gfa "${pathsegments}" | awk -v pathname="${pathname}" '/^>/{$1=">"pathname} {print}'

However, ideally it would be possible to run something like gfatk path test.gfa --all > all_paths.fa when P-lines are specified in the GFA. (I have not found any tool that can do this, but yours seems closest.) What are your thoughts on this?
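To make the request concrete, here is a minimal sketch of what extracting every P-line path could involve. It is not how gfatk implements `path`. It assumes GFA 1 with tab-separated fields, sequences stored on the S-lines, and segments that are simply concatenated (overlaps are ignored). The names `paths_to_fasta` and `revcomp` are hypothetical.

```rust
use std::collections::HashMap;

/// Reverse-complement a DNA sequence. Output is upper case;
/// bytes other than A/C/G/T become N.
fn revcomp(seq: &str) -> String {
    seq.bytes()
        .rev()
        .map(|b| match b.to_ascii_uppercase() {
            b'A' => 'T',
            b'C' => 'G',
            b'G' => 'C',
            b'T' => 'A',
            _ => 'N',
        })
        .collect()
}

/// Build FASTA text with one record per P-line of a GFA 1 document.
/// Assumption: segments are concatenated as-is; overlaps are ignored.
pub fn paths_to_fasta(gfa: &str) -> Result<String, String> {
    // First pass: segment name -> sequence, from S-lines.
    let mut segments: HashMap<&str, &str> = HashMap::new();
    for line in gfa.lines() {
        let f: Vec<&str> = line.split('\t').collect();
        if f.len() >= 3 && f[0] == "S" {
            segments.insert(f[1], f[2]);
        }
    }
    // Second pass: walk each P-line's comma-separated, oriented segment list.
    let mut fasta = String::new();
    for line in gfa.lines() {
        let f: Vec<&str> = line.split('\t').collect();
        if f.len() < 3 || f[0] != "P" {
            continue;
        }
        fasta.push_str(&format!(">{}\n", f[1]));
        for step in f[2].split(',') {
            // Each step is a segment name followed by '+' or '-'.
            let (name, forward) = if let Some(n) = step.strip_suffix('+') {
                (n, true)
            } else if let Some(n) = step.strip_suffix('-') {
                (n, false)
            } else {
                return Err(format!("step without orientation: {}", step));
            };
            let seq = segments
                .get(name)
                .ok_or_else(|| format!("unknown segment: {}", name))?;
            if forward {
                fasta.push_str(seq);
            } else {
                fasta.push_str(&revcomp(seq));
            }
        }
        fasta.push('\n');
    }
    Ok(fasta)
}
```

Using the path name as the FASTA header mirrors the awk workaround above, so the output records stay traceable to their P-lines.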

gfatk linear usage questions

Hi,
Thanks for your user-friendly software!
I am looking for a way to extract the longest path in a GFA file.
But when I run
gfatk linear xxx.gfa
I get this error:
Error: Edge coverage not found.

Can I achieve my goal without coverage information?

Best wishes!
Lan

Add `gfatk extract-chloro` command

Add a convenience subcommand to extract the putative chloroplast assembly.

This should be straightforward, as the chloroplast assembly should have a rather consistent size and GC content.
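A size-and-GC filter could be sketched as below. The `Bounds` struct and both function names are hypothetical, and the thresholds are deliberately left to the caller because the issue does not specify any.

```rust
/// Fraction of G/C bases in a sequence (case-insensitive); 0.0 if empty.
pub fn gc_fraction(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq
        .iter()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    gc as f64 / seq.len() as f64
}

/// Hypothetical caller-supplied bounds on total length and GC fraction.
pub struct Bounds {
    pub min_len: usize,
    pub max_len: usize,
    pub min_gc: f64,
    pub max_gc: f64,
}

/// True when both the length and the GC fraction fall inside the bounds.
pub fn within_bounds(seq: &[u8], b: &Bounds) -> bool {
    let gc = gc_fraction(seq);
    (b.min_len..=b.max_len).contains(&seq.len()) && gc >= b.min_gc && gc <= b.max_gc
}
```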

Hashmap for NodeIndex <-> segment ID

Hi,

I've been hacking some stuff together for a different project, since this has a nice implementation of GFA -> petgraph, but it slowed down far too much for large graphs (10+ million nodes).

I didn't need as much fancy handling, so I just implemented a hashmap from NodeIndex to segment ID (which can later be inverted). Loading the graph went from probably taking hours to 10 seconds or so, which is a big improvement (I was inspired by your hint!).

/// This should 100% have been a map-like structure...

Thanks,
Alex
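For readers following along, the idea described above could be sketched as below. This is not Alex's code. `NodeIx` is a plain usize standing in for petgraph's `NodeIndex` so the example has no external dependencies, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Stand-ins for petgraph's NodeIndex and for a numeric segment ID.
pub type NodeIx = usize;
pub type SegmentId = u64;

/// Build the node index -> segment ID map in one pass over the nodes.
pub fn index_to_segment(nodes: &[(NodeIx, SegmentId)]) -> HashMap<NodeIx, SegmentId> {
    nodes.iter().copied().collect()
}

/// Invert the map so a segment ID can be resolved to its node index.
/// Assumes segment IDs are unique; duplicates would overwrite each other.
pub fn invert(map: &HashMap<NodeIx, SegmentId>) -> HashMap<SegmentId, NodeIx> {
    map.iter().map(|(&ix, &seg)| (seg, ix)).collect()
}
```

Each lookup is then an average constant-time hash probe rather than a scan over all nodes, which is consistent with the speed-up reported for graphs with millions of nodes.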
