The biograph.jl from nguyetdang

biograph.jl's Issues

Error with get_gfa() and get_fasta() functions

I've been trying to get a method working based on test_longest_path.jl.

I'm running from a singularity container, produced from a Dockerfile with the following:

FROM ubuntu:21.04

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y dialog apt-utils julia git wget \
        && rm -rf /var/lib/apt/lists/*

RUN julia -e 'import Pkg; Pkg.add("BioGraph"); Pkg.add("Cbc"); Pkg.add("LightGraphs"); Pkg.add("Suppressor"); Pkg.precompile()'

I then run this method through Julia using one of the graphs in /test/data/

import Pkg; Pkg.add("BioGraph"); Pkg.add("Cbc"); Pkg.add("LightGraphs"); Pkg.add("Suppressor"); using BioGraph; using BioGraph: Weight, NodeLabel, EdgeLabel; using LightGraphs; using Suppressor; using Cbc; gfa_result = read_from_gfa("gfa_sample_1.gfa"); res = find_graph_component(gfa_result); longest = find_longest_path(res.graph[1], Cbc.Optimizer, is_weighted = false); final_gfa = get_gfa(longest, "out.txt");

I get this output:

Making graph
Add constraint
Solving
inter: 1, objective: 6.0, num new cycle: 0
ERROR: LoadError: MethodError: no method matching get_gfa(::BioGraph.LongestPath, ::String)
Closest candidates are:
  get_gfa(::BioGraph.LongestPath; outfile) at /home/ubuntu/.julia/packages/BioGraph/N9DQS/src/longest_path.jl:167
Stacktrace:
 [1] top-level scope at /data/biograph_builds/BioGraph.jl/test/data/method.jl:1
 [2] include(::Function, ::Module, ::String) at ./Base.jl:380
 [3] include(::Module, ::String) at ./Base.jl:368
 [4] exec_options(::Base.JLOptions) at ./client.jl:296
 [5] _start() at ./client.jl:506
in expression starting at /data/biograph_builds/BioGraph.jl/test/data/method.jl:1

I receive a similar error if I use get_fasta instead.

It's as if the get_gfa and get_fasta functions aren't accessible.

To linearise the GFA file, should I be using a method like this one? I assume that passing the GFA through functions like find_longest_path yields a linearised graph. I'm attempting to produce a method for reading in a graph produced by PGGB and writing out a linearised graph compatible with Panache.
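
The "Closest candidates" line in the error above prints the signature as get_gfa(::BioGraph.LongestPath; outfile). In Julia, parameters listed after the semicolon are keyword arguments, so a file name passed positionally does not match any method. A minimal sketch of the same situation follows; write_path is a hypothetical stand-in, not a BioGraph function.

```julia
# Hypothetical stand-in for a function whose output path is keyword-only,
# mirroring the signature printed in the error: get_gfa(::LongestPath; outfile).
write_path(path::Vector{String}; outfile::String = "path.txt") = (join(path, ","), outfile)

segments = ["s1+", "s2-"]

# Passing the file name positionally raises a MethodError,
# because no method accepts a second positional argument.
positional_fails = try
    write_path(segments, "out.txt")
    false
catch err
    err isa MethodError
end

# Passing it by keyword matches the method.
result = write_path(segments, outfile = "out.txt")
```

By the same reasoning, the call in the report would presumably become get_gfa(longest, outfile = "out.txt"). This is inferred from the error message alone and has not been checked against the BioGraph source.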

Linearise a GFA by a chosen path (reference)

I'm interested in using BioGraph to linearise a GFA file produced by PGGB (https://github.com/pangenome/pggb) for visualisation in Panache (https://github.com/SouthGreenPlatform/panache). I'm currently attempting to install through Julia (I'm not familiar with Julia), and I want to select a linearised representation based on a chosen reference. I was wondering if that is possible. From the documentation it only looks like the linear representation is based on the longest path in the graph. Thanks.

Update FASTA output

Write the FASTA output taking into account the CIGAR string in the L-line.

Input

A linear path

Output

FASTA file

A linear sequence is generated from the sequence strings in the S-lines and the links between them in the L-lines. The mechanism is the same as in the previous FASTA output, except that the CIGAR string now has to be considered for overlaps.

For example, an L-line linking s1 + to s2 + with overlap 5M means that sequence 1 (forward) and sequence 2 (forward) are joined by a 5-nucleotide overlapping fragment. When concatenating sequence 1 and sequence 2, the overlapping fragment must appear only once.
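
The overlap rule above can be sketched as follows. This is an illustration, not the package's implementation: the helper names overlap_length and join_with_overlap are hypothetical, and only CIGAR strings of the simple form "<n>M" (or "*") are handled.

```julia
# Parse a CIGAR string made of a single match operation, e.g. "5M" -> 5.
# Assumption: only the "<n>M" form is supported; "*" is treated as no overlap.
function overlap_length(cigar::AbstractString)
    cigar == "*" && return 0
    m = match(r"^(\d+)M$", cigar)
    m === nothing && error("unsupported CIGAR string: $cigar")
    return parse(Int, m.captures[1])
end

# Append `next` to `current`, skipping the first `n` characters of `next`
# so that the shared fragment appears only once.
function join_with_overlap(current::AbstractString, next::AbstractString, cigar::AbstractString)
    n = overlap_length(cigar)
    n <= length(next) || error("overlap is longer than the following segment")
    return current * next[n+1:end]
end

s1 = "ACGTTGCAT"   # ends with TGCAT
s2 = "TGCATGGA"    # starts with TGCAT
merged = join_with_overlap(s1, s2, "5M")
```

The sketch trusts the CIGAR length and does not verify that the two segment ends actually match.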

Read GFA file

Input:

GFA file

Output:

Node Label Array, Sequence Array and Sequence Length Array. Elements with the same index in the three arrays come from the same S-line in the GFA file.
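
A minimal sketch of collecting S-lines into three parallel arrays is shown below. The function name read_segments is hypothetical; tags, L-lines and P-lines are ignored, and the sequence length is taken from the sequence string itself rather than from an LN tag.

```julia
# Collect the S-lines of a GFA text into three parallel arrays.
# Illustrative only: every other record type is skipped.
function read_segments(io::IO)
    labels = String[]
    sequences = String[]
    lengths = Int[]
    for line in eachline(io)
        fields = split(line, '\t')
        if length(fields) >= 3 && fields[1] == "S"
            push!(labels, fields[2])
            push!(sequences, fields[3])
            push!(lengths, length(fields[3]))
        end
    end
    return labels, sequences, lengths
end

gfa_text = "H\tVN:Z:1.0\nS\ts1\tACGT\nS\ts2\tGG\nL\ts1\t+\ts2\t+\t0M\n"
labels, sequences, lengths = read_segments(IOBuffer(gfa_text))
```

Index i in each array then refers to the i-th S-line of the input.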

Find Terminus

Input

Simple Graph - output of #2

Output

Dictionary of source nodes and corresponding direction
Dictionary of sink nodes and corresponding direction
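
One possible reading of this step, sketched under stated assumptions: links are treated as directed from the first segment to the second, a source is a segment that never appears as a link target, and a sink is a segment that never appears as a link origin. The function name find_termini and the tuple layout are hypothetical, and the package may define termini differently.

```julia
# Each link is (from, from_orientation, to, to_orientation), as in a GFA L-line.
function find_termini(links)
    froms = Set(l[1] for l in links)
    tos = Set(l[3] for l in links)
    sources = Dict{String,Char}()
    sinks = Dict{String,Char}()
    for (from, from_orient, to, to_orient) in links
        # A segment with no incoming link is recorded as a source.
        from in tos || (sources[from] = from_orient)
        # A segment with no outgoing link is recorded as a sink.
        to in froms || (sinks[to] = to_orient)
    end
    return sources, sinks
end

links = [("s1", '+', "s2", '+'), ("s2", '+', "s3", '-'), ("s2", '+', "s4", '+')]
sources, sinks = find_termini(links)
```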

Get weight value

Input

Genome Graph - output of #1
CSV containing 2 columns: name of segment and weight value

Output

Genome Graph containing weight value as indicated in CSV file
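
A minimal sketch of reading such a two-column file, assuming a comma separator, no header row and no quoted fields. The function name read_weights and the default weight of 1.0 for segments missing from the file are assumptions made for illustration.

```julia
# Read "segment,weight" rows into a Dict.
function read_weights(io::IO)
    weights = Dict{String,Float64}()
    for line in eachline(io)
        isempty(strip(line)) && continue
        name, value = split(line, ',')
        weights[String(strip(name))] = parse(Float64, strip(value))
    end
    return weights
end

csv_text = "s1,2.5\ns2,1\n"
weights = read_weights(IOBuffer(csv_text))

# Look up weights in segment order; unlisted segments get the assumed default 1.0.
labels = ["s1", "s2", "s3"]
weight_array = [get(weights, label, 1.0) for label in labels]
```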

Find Longest Linear Path

Input

Simple Graph - output of #2
Weight Array (optional)
Remove internal cycle (optional): TRUE/FALSE

Output

Longest linear path in P-line format
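
For reference, a GFA1 P-line consists of the record type P, a path name, a comma-separated list of oriented segment names, and an overlaps field. The sketch below formats an ordered list of steps that way; the function name p_line and the path name are placeholders, and the overlaps field is written as "*".

```julia
# Format an ordered list of (segment, orientation) steps as a GFA1 P-line.
function p_line(name::AbstractString, steps)
    segment_names = join((string(seg, orient) for (seg, orient) in steps), ",")
    return join(["P", name, segment_names, "*"], '\t')
end

steps = [("s1", '+'), ("s2", '+'), ("s3", '-')]
line = p_line("longest_path", steps)
```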

Output FASTA

Input

A linear path
Original genome graph (all output from read gfa)
Path to output (default: current working directory)

Output

A FASTA file at the output path (the current working directory by default)

FASTA structure: the file contains two types of line, a header line starting with > and a sequence line

>linear_path
ATGCATGCATGC

Direction of a DNA strand

If the strand is +, take the DNA as in the sequence of the S-line
If the strand is -, take the reverse complement DNA of the sequence in the S-line

See the last example at http://gfa-spec.github.io/GFA-spec/GFA1.html
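
The strand rule above can be sketched as follows. The function names are hypothetical, and only the bases A, C, G, T and N are handled.

```julia
# Complement table for the bases handled by this sketch.
complement_table = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C', 'N' => 'N')

# Reverse the sequence, then complement each base.
function reverse_complement(seq::AbstractString)
    return join(complement_table[base] for base in reverse(uppercase(seq)))
end

# '+' keeps the S-line sequence as is, '-' takes its reverse complement.
function oriented_sequence(seq::AbstractString, strand::Char)
    strand == '+' && return String(seq)
    strand == '-' && return reverse_complement(seq)
    error("unknown strand: $strand")
end

forward = oriented_sequence("AACG", '+')
backward = oriented_sequence("AACG", '-')
```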

Find Genome Graph Components

Input

Genome Graph - output of #1

Output

Simple graph
Cycle
Lone nodes
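
As a rough illustration of the first part of this step, the sketch below groups segments into connected components while ignoring link orientation, then reports single-segment components as lone nodes. The function name is hypothetical, and telling simple graphs from cycles is not attempted, since the issue does not spell out that rule.

```julia
# Group segments into connected components, ignoring link orientation.
function connected_components(nodes::Vector{String}, links::Vector{Tuple{String,String}})
    neighbours = Dict(n => String[] for n in nodes)
    for (a, b) in links
        push!(neighbours[a], b)
        push!(neighbours[b], a)
    end
    seen = Set{String}()
    components = Vector{Vector{String}}()
    for start in nodes
        start in seen && continue
        push!(seen, start)
        stack = [start]
        component = String[]
        # Depth-first traversal from `start`.
        while !isempty(stack)
            node = pop!(stack)
            push!(component, node)
            for next in neighbours[node]
                if !(next in seen)
                    push!(seen, next)
                    push!(stack, next)
                end
            end
        end
        push!(components, sort(component))
    end
    return components
end

nodes = ["s1", "s2", "s3", "s4"]
links = [("s1", "s2"), ("s2", "s3")]
components = connected_components(nodes, links)

# Components made of a single segment.
lone_nodes = [c[1] for c in components if length(c) == 1]
```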

Find the longest path containing given paths

Input

Genome Graph - output #1
Array containing list of paths in the GFA

Output

Longest path containing the paths given in the input

Notes:

The read GFA function should be modified to parse P-lines as well.
The paths in the input array might belong to different graph components (simple graph or cycle). Each path has to be assigned to its graph component before the longest path is computed.

Find Longest Linear Path with Known Terminus

Input

Simple Graph - output of #2
Start node and its direction
End node and its direction
Weight value array (optional): default = EMPTY ARRAY
Remove internal cycles (optional): default = TRUE

Output

Longest linear path (same as P-line format) having those start and end nodes

If a weight value array is provided, raise a warning when its size differs from the number of nodes in the simple graph.
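
The size check can be sketched like this. The function name check_weights is hypothetical, and two behaviours are assumptions: an empty array is treated as "unweighted" (every node counts as 1.0), and a size mismatch only warns and returns the array unchanged.

```julia
# Return the weights to use for a graph with `n_nodes` nodes.
function check_weights(weights::Vector{Float64}, n_nodes::Int)
    # Assumed default: an empty array means every node has weight 1.0.
    isempty(weights) && return ones(n_nodes)
    if length(weights) != n_nodes
        @warn "weight array size does not match the number of nodes" n_weights=length(weights) n_nodes=n_nodes
    end
    return weights
end

default_weights = check_weights(Float64[], 3)
mismatched = check_weights([1.0, 2.0], 3)   # logs a warning
```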
