biograph.jl's People

Contributors

francoissabot, nguyetdang, tuando95

biograph.jl's Issues

Error with get_gfa() and get_fasta() functions

Hi

I've been trying to get a method working based on test_longest_path.jl.

I'm running from a Singularity container built from a Dockerfile with the following:

FROM ubuntu:21.04

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y dialog apt-utils julia git wget \
        && rm -rf /var/lib/apt/lists/*

RUN julia -e 'import Pkg; Pkg.add("BioGraph"); Pkg.add("Cbc"); Pkg.add("LightGraphs"); Pkg.add("Suppressor"); Pkg.precompile()'

I then run this method through Julia using one of the graphs in /test/data/

import Pkg; Pkg.add("BioGraph"); Pkg.add("Cbc"); Pkg.add("LightGraphs"); Pkg.add("Suppressor"); using BioGraph; using BioGraph: Weight, NodeLabel, EdgeLabel; using LightGraphs; using Suppressor; using Cbc; gfa_result = read_from_gfa("gfa_sample_1.gfa"); res = find_graph_component(gfa_result); longest = find_longest_path(res.graph[1], Cbc.Optimizer, is_weighted = false); final_gfa = get_gfa(longest, "out.txt");

I get this output:

Making graph
Add constraint
Solving
inter: 1, objective: 6.0, num new cycle: 0
ERROR: LoadError: MethodError: no method matching get_gfa(::BioGraph.LongestPath, ::String)
Closest candidates are:
  get_gfa(::BioGraph.LongestPath; outfile) at /home/ubuntu/.julia/packages/BioGraph/N9DQS/src/longest_path.jl:167
Stacktrace:
 [1] top-level scope at /data/biograph_builds/BioGraph.jl/test/data/method.jl:1
 [2] include(::Function, ::Module, ::String) at ./Base.jl:380
 [3] include(::Module, ::String) at ./Base.jl:368
 [4] exec_options(::Base.JLOptions) at ./client.jl:296
 [5] _start() at ./client.jl:506
in expression starting at /data/biograph_builds/BioGraph.jl/test/data/method.jl:1

I receive a similar error if I use get_fasta instead.

It's like the get_gfa and get_fasta functions aren't accessible.

To linearise the GFA file, should I be using a method like the one above? I assume that by passing the GFA through functions like find_longest_path, the resulting graph will be linearised. I'm attempting to produce a method for reading in a graph produced by PGGB and writing out a linearised graph compatible with Panache.

Linearise a GFA by a chosen path (reference)

Hi

I'm interested in using BioGraph to linearise a GFA file produced by PGGB (https://github.com/pangenome/pggb) for visualisation in Panache (https://github.com/SouthGreenPlatform/panache). I'm currently attempting to install it through Julia (I'm not familiar with Julia), and I want to select a linearised representation based on a chosen reference. Is that possible? From the documentation, it looks like the linear representation can only be based on the longest path in the graph. Thanks.

Update FASTA output

Write the FASTA output taking the CIGAR string in the L-line into account.

Input

  • A linear path

Output

  • FASTA file

A linear sequence is generated from the sequence strings in the S-lines and the links between them in the L-lines. The mechanism is the same as in the previous FASTA output, except that the CIGAR string must now be taken into account for overlaps.

For example, an L-line with s1 + s2 + 5M means that sequence 1 (forward) and sequence 2 (forward) are joined by a 5-nucleotide overlap. When concatenating sequence 1 and sequence 2, the overlapping fragment must appear only once.
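The overlap rule can be sketched as follows. This is an illustration in Python, not BioGraph code: the function name merge_with_overlap is hypothetical, both sequences are assumed to be oriented already, and the CIGAR is assumed to be a single match run such as 5M.

```python
import re


def merge_with_overlap(left: str, right: str, cigar: str) -> str:
    """Concatenate two oriented segment sequences, emitting the overlap once.

    Assumption: the CIGAR is a single match run such as "5M" ("*" or an empty
    string is treated as no overlap). Anything else is rejected.
    """
    if cigar in ("*", ""):
        return left + right
    match = re.fullmatch(r"(\d+)M", cigar)
    if match is None:
        raise ValueError(f"unsupported CIGAR for this sketch: {cigar!r}")
    overlap = int(match.group(1))
    if overlap > min(len(left), len(right)):
        raise ValueError("overlap is longer than one of the segments")
    # Skip the first `overlap` bases of the right segment: they duplicate
    # the last `overlap` bases of the left segment.
    return left + right[overlap:]


s1 = "ACGTTGCAT"   # ends with TGCAT
s2 = "TGCATGGA"    # starts with TGCAT
merged = merge_with_overlap(s1, s2, "5M")
print(merged)
```

The sketch trusts the CIGAR length and does not check that the overlapping bases actually agree; a stricter version could compare left[-overlap:] with right[:overlap] first.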

Read GFA file

Input:

  • GFA file

Output:

  • Genome Graph
  • Node Label Array
  • Sequence Array
  • Sequence Length Array
  • Edge Label Array

Elements with the same index in the Node Label Array, Sequence Array and Sequence Length Array come from the same S-line in the GFA file.
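A minimal sketch of the parallel-array layout, written in Python for illustration only. The function name read_gfa_lines is hypothetical and is not BioGraph's read_from_gfa. It assumes tab-separated GFA1 lines with the sequence given inline (not "*"), and it leaves out building the graph object.

```python
def read_gfa_lines(lines):
    """Collect S-lines and L-lines of a GFA1 file into parallel arrays."""
    node_labels, sequences, seq_lengths, edge_labels = [], [], [], []
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # Index i of all three arrays refers to the same S-line.
            node_labels.append(fields[1])
            sequences.append(fields[2])
            seq_lengths.append(len(fields[2]))
        elif fields[0] == "L" and len(fields) >= 6:
            # (from, from_orient, to, to_orient, overlap CIGAR)
            edge_labels.append(tuple(fields[1:6]))
    return node_labels, sequences, seq_lengths, edge_labels


example = [
    "H\tVN:Z:1.0",
    "S\ts1\tACGT",
    "S\ts2\tGTTA",
    "L\ts1\t+\ts2\t-\t2M",
]
labels, seqs, lengths, edges = read_gfa_lines(example)
print(labels, seqs, lengths, edges)
```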

Find Terminus

Input

  • Simple Graph - output of #2

Output

  • Dictionary of source nodes and corresponding direction
  • Dictionary of sink nodes and corresponding direction
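One way to picture the two dictionaries, as a Python sketch under simplifying assumptions: each link is read as a plain directed edge (from, from_orient, to, to_orient), a source is a node with no incoming link, and a sink is a node with no outgoing link. The name find_termini and the "+" default for isolated nodes are hypothetical choices made here.

```python
def find_termini(nodes, edges):
    """Return (sources, sinks) as dicts mapping node label -> direction."""
    out_dir, in_dir = {}, {}
    for src, src_orient, dst, dst_orient in edges:
        out_dir.setdefault(src, src_orient)  # direction used when leaving src
        in_dir.setdefault(dst, dst_orient)   # direction used when entering dst
    # An isolated node is both a source and a sink; "+" is an arbitrary default.
    sources = {n: out_dir.get(n, "+") for n in nodes if n not in in_dir}
    sinks = {n: in_dir.get(n, "+") for n in nodes if n not in out_dir}
    return sources, sinks


nodes = ["s1", "s2", "s3"]
edges = [("s1", "+", "s2", "+"), ("s2", "+", "s3", "-")]
sources, sinks = find_termini(nodes, edges)
print(sources, sinks)
```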

Get weight value

Input

  • Genome Graph - output of #1
  • CSV containing 2 columns: name of segment and weight value

Output

  • Genome Graph containing weight value as indicated in CSV file
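The CSV step could look like the following Python sketch. The name load_weights, the default weight of 1.0 for segments missing from the CSV, and the handling of a header row are all assumptions of this example, not behaviour stated in the issue.

```python
import csv
import io


def load_weights(csv_text, node_labels, default=1.0):
    """Return weights aligned with `node_labels` from a two-column CSV."""
    by_name = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # blank or malformed line
        name, value = row[0].strip(), row[1].strip()
        try:
            by_name[name] = float(value)
        except ValueError:
            continue  # non-numeric value, e.g. a header row "segment,weight"
    # Keep the same index order as the node label array.
    return [by_name.get(label, default) for label in node_labels]


csv_text = "segment,weight\ns1,2.5\ns3,4\n"
weights = load_weights(csv_text, ["s1", "s2", "s3"])
print(weights)
```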

Find Longest Linear Path

Input

  • Simple Graph - output of #2
  • Weight Array (optional)
  • Remove internal cycles (optional): TRUE/FALSE

Output

  • Longest linear path in P-line format
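To illustrate the input and output shapes, here is a Python sketch that finds a longest path by dynamic programming over a topological order. This is a swapped-in technique that only works on acyclic graphs; it is not the solver-based approach that BioGraph runs with an optimizer such as Cbc. The names longest_path and to_p_line are hypothetical, and every segment is assumed to be traversed in forward orientation.

```python
from graphlib import TopologicalSorter


def longest_path(nodes, edges, weights=None):
    """Longest path in a DAG; `edges` are (src, dst) pairs.

    With no weights every node counts as 1, so the result maximises the
    number of nodes visited. Raises graphlib.CycleError on cyclic input.
    """
    weight = dict(zip(nodes, weights)) if weights else {n: 1 for n in nodes}
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)
    best = {}      # node -> best total weight of a path ending at node
    previous = {}  # node -> predecessor on that best path (or None)
    for node in TopologicalSorter(preds).static_order():
        choice = max(preds[node], key=lambda p: best[p], default=None)
        best[node] = weight[node] + (best[choice] if choice is not None else 0)
        previous[node] = choice
    node = max(best, key=best.get)
    path = []
    while node is not None:  # walk back from the best end node
        path.append(node)
        node = previous[node]
    return path[::-1]


def to_p_line(name, path):
    """Format a path as a GFA1 P-line (forward orientation, no overlaps)."""
    return "\t".join(["P", name, ",".join(f"{n}+" for n in path), "*"])


nodes = ["s1", "s2", "s3", "s4"]
edges = [("s1", "s2"), ("s1", "s3"), ("s2", "s4"), ("s3", "s4")]
path = longest_path(nodes, edges, weights=[1, 1, 5, 1])
p_line = to_p_line("longest", path)
print(p_line)
```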

Output FASTA

Input

  • A linear path
  • Original genome graph (all output from read gfa)
  • Path to output (default: current working directory)

Output

  • A FASTA file at the given output path

FASTA structure: the FASTA file contains two types of line, a header line starting with > and a sequence line:

>linear_path
ATGCATGCATGC

Direction of a DNA strand

  • If the strand is +, take the DNA as in the sequence of the S-line
  • If the strand is -, take the reverse complement DNA of the sequence in the S-line

See the last example at http://gfa-spec.github.io/GFA-spec/GFA1.html
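The strand rule can be sketched in Python as below. The names reverse_complement and path_to_fasta are hypothetical, the path is assumed to be a list of (segment, orientation) pairs, and consecutive segments are assumed not to overlap, so the oriented sequences are simply concatenated.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def path_to_fasta(path, sequences, header="linear_path"):
    """Build a two-line FASTA record from [(segment, orientation), ...]."""
    parts = []
    for name, orient in path:
        seq = sequences[name]
        # "+" keeps the S-line sequence, "-" takes its reverse complement.
        parts.append(seq if orient == "+" else reverse_complement(seq))
    body = "".join(parts)
    return f">{header}\n{body}\n"


sequences = {"s1": "ATGC", "s2": "AAGT"}
record = path_to_fasta([("s1", "+"), ("s2", "-")], sequences)
print(record, end="")
```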

Find the longest path containing given paths

Input

  • Genome Graph - output of #1
  • Array containing list of paths in the GFA

Output

  • The longest path(s) containing the input paths

Notes:

  • We should modify the read GFA function to also read P-lines.
  • The paths in the input array might belong to different graph components (simple graph or cycle). We have to assign each path to its graph component before running the longest-path search.
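The classification step mentioned above could be sketched in Python with a union-find over the links. The names component_ids and group_paths_by_component are hypothetical, paths are assumed to be given as a mapping from path name to a list of segment names, and orientations are ignored because they do not affect connectivity.

```python
def component_ids(nodes, edges):
    """Label each node with a connected-component id using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for src, dst in edges:
        parent[find(src)] = find(dst)
    return {n: find(n) for n in nodes}


def group_paths_by_component(paths, nodes, edges):
    """Map component id -> names of the paths whose segments lie in it."""
    comp = component_ids(nodes, edges)
    groups = {}
    for name, segments in paths.items():
        ids = {comp[s] for s in segments}
        if len(ids) != 1:
            raise ValueError(f"path {name!r} does not lie in one component")
        groups.setdefault(ids.pop(), []).append(name)
    return groups


nodes = ["s1", "s2", "s3", "s4", "s5"]
edges = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
paths = {"p1": ["s1", "s2"], "p2": ["s4", "s5"], "p3": ["s2", "s3"]}
groups = group_paths_by_component(paths, nodes, edges)
print(sorted(groups.values()))
```

The longest-path search would then run once per component, using only the paths grouped under it.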

Find Longest Linear Path with Known Terminus

Input

  • Simple Graph - output of #2
  • Start node and its direction
  • End node and its direction
  • Weight value array (optional): default = EMPTY ARRAY
  • Remove internal cycles (optional): default = TRUE

Output

  • Longest linear path (same as P-line format) having those start and end nodes

If a weight value array is provided, raise a warning when its size differs from the number of nodes in the simple graph.
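The size check could look like this Python sketch. The name resolve_weights is hypothetical, and falling back to unit weights after the warning is an assumption of this example; the issue only asks for the warning itself.

```python
import warnings


def resolve_weights(weights, num_nodes):
    """Return usable node weights for a graph with `num_nodes` nodes."""
    if not weights:
        # Empty array (the default) means the search is unweighted.
        return [1.0] * num_nodes
    if len(weights) != num_nodes:
        warnings.warn(
            f"weight array has {len(weights)} entries but the graph has "
            f"{num_nodes} nodes; ignoring weights"
        )
        return [1.0] * num_nodes  # assumed fallback, see lead-in
    return list(weights)


print(resolve_weights([], 2))
print(resolve_weights([0.5, 2.0], 2))
```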
