pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs

Home Page: https://doi.org/10.1093/bioinformatics/btac308

License: MIT License

C++ 96.50% CMake 1.76% Shell 0.94% Python 0.15% C 0.29% R 0.06% Scheme 0.19% Dockerfile 0.06% Nix 0.06%

odgi's Issues

Special case for bin_width=1 sequence output

At --bin_width=1 the JSON produced looks ridiculous. For supporting Schematize, this is going to be a common use case. Every time we load a graph with odgi we'll ask for bin width = 1, 10, 100, 1000, 10000, 100000 at minimum.

Currently, the output would look like {id=1, seq='C'}{id=2, seq=G}{id=3, seq='T'}. That's 43 characters for "CGT". In this special case, index in the string +1 is exactly the same as bin id, so they don't need to be listed.

New output should be:
{pangenome_sequence="CGTACGTACGTACGTACTACTCAGCTAGCTAGCTACGTCGAGTCTTACTCTAGATC"} {path_name="ATH17...
No bin declarations should be included. We'll also make a special case for bin_width=1 in schematize to look for the unique key "pangenome_sequence" which should only occur once in the file. Besides this and the lack of bin declarations, the rest of the file can be the same including the bin_ids in the path traversals and links.
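
On the consumer side, recovering a bin's base from the single string could look like the sketch below. Only the "pangenome_sequence" key and the "index + 1 equals bin id" rule come from the proposal above; the helper name bin_sequence is hypothetical.

```python
def bin_sequence(pangenome_sequence: str, bin_id: int) -> str:
    """Return the single base held by a 1-based bin id at bin_width=1."""
    if not 1 <= bin_id <= len(pangenome_sequence):
        raise IndexError(f"bin id {bin_id} out of range")
    # Bin ids are 1-based, Python string indices are 0-based.
    return pangenome_sequence[bin_id - 1]


# Hypothetical header object as a parser might hold it in memory.
header = {"pangenome_sequence": "CGT"}
print([bin_sequence(header["pangenome_sequence"], i) for i in (1, 2, 3)])
```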

specification of path ordering in odgi viz

Hi guys,
I find it useful if one could force the placement of the same chromosomes of different genomes below each other in 'odgi viz' to better compare them (I assume I can get this behavior by labeling paths Chr1_sample1, Chr1_sample2... with -D_, but typically the sample name comes first). One simple solution would be a file as parameter that simply includes the desired ordering of path names. Would that be possible to implement?
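
For what it's worth, the ordering file itself would be easy to generate. A minimal sketch, assuming path names of the form sample.chromosome with a single delimiter (that naming scheme is an assumption, not something odgi enforces); chromosome_first_order is a hypothetical helper:

```python
def chromosome_first_order(path_names, delimiter="."):
    """Sort path names so the same chromosome of different samples is adjacent."""
    def key(name):
        sample, _, chromosome = name.partition(delimiter)
        # Chromosome first, then sample, so Chr1 of every sample is grouped.
        return (chromosome, sample)
    return sorted(path_names, key=key)


names = ["sample2.Chr1", "sample1.Chr2", "sample1.Chr1", "sample2.Chr2"]
for name in chromosome_first_order(names):
    print(name)  # one path name per line, as the ordering file would list them
```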

Visualization of vg file from Progressive Cactus Pangenome Pipeline

Hello,

I created a pangenome for the barn swallow with the Cactus Pangenome Pipeline (https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md). I aligned the chromosome-level assembly for the species with 5 assemblies of 5 other barn swallow individuals, assembled with Hifiasm from PacBio HiFi reads (contig-level assemblies). In the Cactus pipeline, the chromosome-level assembly was chosen as the reference.

The final file from the Pangenome pipeline is a .vg file, that I have converted to .gfa with vgtools:

vg view --threads 32 Hirundo.vg > Hirundo.gfa

Then I converted it to odgi format and sorted it:

odgi build -g Hirundo.gfa -o - -p | odgi sort --threads=32 -i - -o Hirundo_sorted.og -p bSnSnS

Finally, I visualized it with odgi viz:

odgi viz -i Hirundo_sorted.og -o Hirundo_sorted.og.png -x 1920 -y 1080 -t 32

However, I can't understand the final image (below). Do you have any suggestions on how to sort my pangenome and visualize the different chromosomes of the reference and their corresponding aligned contigs from the other 5 individuals (for example, by adding the chr names)?

Many thanks,

Simona

/odgi/src/odgi.cpp:565: uint64_t odgi::graph_t::edge_delta_to_id(uint64_t, uint64_t) const: Assertion `delta != 0' failed

This is coming when accessing a larger Odgi file through SpOdgi.

Sorry for the sparse details. I have no idea where this is coming from.

AsciiDoc and Manpages for odgi

This will help to document what is actually possible with odgi.
I will create one document for odgi itself, listing all subcommands, its general purpose and related tools maybe. There will be a manpage for each subcommand, too.

Installation problem with GOMP_parallel

Hi Erik,

I was trying to get odgi installed but ran into problems with GOMP_parallel.

Exactly at the following step -
[ 75%] Linking CXX executable /global/scratch2/Software/odgi/build/bsort-prefix/src/bsort/bin/bsort
/global/scratch/miniconda3/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: /global/scratch2/Software/odgi/build/bsort-prefix/src/bsort/lib/libbsort.a(bsort.cpp.o): in function `bsort::radixify(unsigned char*, long, long, long, long, long, long, long, long, long)': bsort.cpp:(.text+0xc2b): undefined reference to `GOMP_parallel'

It's the same problem with seqwish too. I have conda installations of gcc, boost (xgboost) and openmp. Can you please let me know what I am missing here?

Cheers,
Rohit

viz should represent relative orientation

This is a pretty simple one. I need to come up with a visual motif that indicates which relative orientation a path has in a given region.

I think the only thing that is reliably available at all path rendering widths will be a lighter/darker color for the given path steps.

Playing around with 12 Arabidopsis thaliana genomes.

These are the 12 Arabidopsis thaliana genomes from @ChriKub that I ran through odgi sort with options bnSn, no_K and R in odgi viz.

The data can be found on pg2: /home/ubuntu/sh/ath_christian_kubica/chrk_12samples_odgi_1D_sort_bnSn_no_K_R. @tpook92

What confuses me is that one gets the impression that there is no real distinction between the sequences of the 5 chromosomes: all chromosomes' sequences are below each other. I would have expected that they would consecutively build up to the right as in #71 (comment).

@ekg @ChriKub @josiahseaman @6br

Show nucleotide position in each bin for each path

In order to really make odgi and dependent visualizations be able to inter-operate with the rest of the Bioinformatics world, we need nucleotide positions for any feature, which will be different for each path. Here's what I propose:

Find code in odgi bin that is generating mean_pos
Find the total length of each path
Add center_nucleotide to Bin[3]
Output center_nucleotide in bin output json
Introduce JSON format version number so we can detect data incompatibility in the future

Can we use odgi to analyze graphs made from assemblers

I was wondering if we could also use odgi to analyze graphs made by assemblers. So far it seems that it can only handle files made by seqwish as odgi build throws the following error if I give it something else:

terminate called after throwing an instance of 'std::invalid_argument'
  what():  stol
Aborted (core dumped)

Please make a release

Hi @ekg ,

following up the discussion at pangenome/spodgi#3, could you please make a release or pre-release of odgi?
Thanks!

Best,
Simon

please add `odgi version` or `odgi --version`

Thanks!

Can the graph_t handle paths with multiple occurrences of the same node?

As far as I can tell, the implementation of a step is (node_id, path_id), correct? Won't this create a fundamental ambiguity if a path traverses a node multiple times (e.g. in a CNV)? I think it might be worth also spending some of the bits on the path_id portion to encode that the step is the n-th occurrence of that node on that path.
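
To make the suggestion concrete, here is one way a 64-bit step could carry an occurrence rank. The bit widths below (32 for the node, 20 for the path, 12 for the rank) are invented for illustration and are not odgi's actual layout.

```python
NODE_BITS, PATH_BITS, RANK_BITS = 32, 20, 12  # hypothetical split of 64 bits


def pack_step(node_id: int, path_id: int, rank: int) -> int:
    """Pack (node, path, n-th occurrence of the node on the path) into one integer."""
    for value, bits in ((node_id, NODE_BITS), (path_id, PATH_BITS), (rank, RANK_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (node_id << (PATH_BITS + RANK_BITS)) | (path_id << RANK_BITS) | rank


def unpack_step(step: int):
    """Inverse of pack_step."""
    rank = step & ((1 << RANK_BITS) - 1)
    path_id = (step >> RANK_BITS) & ((1 << PATH_BITS) - 1)
    node_id = step >> (PATH_BITS + RANK_BITS)
    return node_id, path_id, rank


# Two visits of path 3 to node 7 (e.g. a CNV) now yield distinct steps.
first, second = pack_step(7, 3, 0), pack_step(7, 3, 1)
print(first != second, unpack_step(second))
```

The obvious cost is that every bit given to the rank is taken from the node or path id space.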

Links don't get reset for each new path

The links don't seem to get reset for each new path. The number of links per path successively increases, in our example of yeast genomes:
indiv1.chrI 956
...
indiv2.chrI 6990
...
indiv3.chrI 13278

If you need more info, let me know

odgi build - make -o and --out mandatory arguments

odgi build runs through with just specifying the input graph. But it does not store it anywhere nor does it print to std::cout. I would suggest to make the -o / --out a mandatory argument @ekg ?

viz testing B1phi1 data

Input was run1.B1phi1.i1.seqwish.gfa.tar.gz.
Today's master of odgi.
Today's master of component_segmentation.
Today's master of Schematize.

Then I executed

#!/bin/bash

ODGI=~/software/odgi/git/master/bin/odgi
GFA=run1.B1phi1.i1.seqwish.gfa
OG=${GFA%.gfa}.og
SOG=${GFA%.gfa}.sorted.og


## Build the sparse matrix form of the gfa graph
echo "### odgi build"
BLDPREF=${0%.sh}_01_build
/usr/bin/time -v -o ${BLDPREF}.time \
ionice -c2 -n7 \
$ODGI build \
--progress \
--gfa=$GFA \
--out=$OG \
> ${BLDPREF}.log 2>&1

## Sort paths by 1D sorting
echo "### odgi sort"
SRTPREF=${0%.sh}_02_sort
/usr/bin/time -v -o ${SRTPREF}.time \
ionice -c2 -n7 \
$ODGI sort \
--pipeline="bSnSnS" \
--sgd-use-paths \
--paths-max \
--progress \
--idx=$OG \
--out=$SOG \
> ${SRTPREF}.log 2>&1

echo "### odgi bin"
for w in 100 1000 10000 100000; do
	BIN=${GFA%.gfa}.w${w}.json
	BINPREF=${0%.sh}_04_bin_w${w}
	/usr/bin/time -v -o ${BINPREF}.time \
	ionice -c2 -n7 \
	$ODGI bin \
	--json \
	--idx=$SOG \
	--bin-width=${w} \
	1> $BIN \
	2> ${BINPREF}.log &
done

on our VM pantograph2. If you want access to fiddle around there directly, just tell me @ekg
Next step: graph segmentation into components:
python component_segmentation/matrixcomponent/segmentation.py -j ~/sh/11_odgi/run1.B1phi1.i1.seqwish.w100.json -o ~/sh/11_odgi/
and plugged the resulting file into our Schematize React application.
Screenshot with first bin:

Screenshot without first bin:

If you need anything else, I'd be happy to help.

Output "Zoom Stack" list of bin widths

Currently we use a script which invokes odgi bin multiple times with different bin widths. I believe this should be a single invocation that accepts a list of widths instead. This particularly becomes relevant in the case where we're outputting sequence (at bin_width=1) and we want to reuse that same sequence file at every bin_width level. Sequence and bin declarations should be excluded in every bin_width json file because a single FASTA file covers sequence needs for all sizes.

argument accepts list instead of single number
internal invocation loop for each bin_width
add "fasta_file" key to json header

Currently the script looks like this:

echo "### odgi bin"
for w in 1 10 100 1000; do
	BIN=${GFA%.gfa}.w${w}.json
	BINPREF=${0%.sh}_04_bin_w${w}
	/usr/bin/time -v -o ${BINPREF}.time \
	ionice -c2 -n7 \
	$ODGI bin \
	--json \
	--idx=$SOG \
	--bin-width=${w} \
	--fasta $FASTA \
	1> $BIN \
	2> ${BINPREF}.log &
done

It would change to listing the prefix in $BIN and then automatically appending w${w}.json to the end of each file created. Since this is one invocation, it would make one .time file. This may also speed up by eliminating the need to reload the --idx each time?
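
Until a list argument exists, the per-width invocations can at least be generated in one place. The sketch below only builds the argument lists and does not run anything. It uses the --json, --idx and --bin-width flags from the script above; the helper name and the prefix.w<width>.json naming are assumptions.

```python
def bin_commands(odgi, sorted_graph, prefix, widths):
    """Return (command, output_file) pairs, one per requested bin width."""
    jobs = []
    for w in widths:
        cmd = [odgi, "bin", "--json", f"--idx={sorted_graph}", f"--bin-width={w}"]
        # odgi bin writes to stdout in the script above, so keep the target separately.
        jobs.append((cmd, f"{prefix}.w{w}.json"))
    return jobs


for cmd, out in bin_commands("odgi", "graph.sorted.og", "graph", [1, 10, 100, 1000]):
    print(" ".join(cmd), ">", out)
```

This still reloads the graph once per width, which is exactly what a single invocation would avoid.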

Question: Does this proposal disrupt any features related to the stdout pipe? Where is that used and is there a multi-file solution that still allows usage of the stdout pipe?

Follow on to #88.

path guided linear 1D SGD sort - edge cases

The following graphs break the sorting algorithm because each path walks through exactly one node. We may have to address this at some point.
DRB5-3127.gfa.txt
V-352962.gfa.txt

Bring ODGI to Bioconda

We are trying to bring ODGI to Bioconda. The latest problem indicated in bioconda/bioconda-recipes#18743 was that ODGI did not build because of its sonLib dependency.
@ekg mentioned that it is not needed. Therefore, I will remove this dependency.

odgi bin: ranges should indicate strandedness

odgi bin: Implement reverse complement ranges as [10,1] (forward orientation would be [1,10]). For each node we may have to come up with a new range the moment we switch orientation.
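
A small sketch of that encoding, where the order of the two numbers carries the strand. Both function names are hypothetical.

```python
def oriented_range(start, end, is_reverse):
    """Encode a nucleotide range so that its order carries the strand."""
    lo, hi = min(start, end), max(start, end)
    return [hi, lo] if is_reverse else [lo, hi]


def is_reverse_range(rng):
    """A range written high-to-low is read as reverse complement."""
    return rng[0] > rng[1]


print(oriented_range(1, 10, is_reverse=False))
print(oriented_range(1, 10, is_reverse=True))
```

One caveat of this scheme: a range covering a single nucleotide, such as [5,5], cannot express its orientation.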

Non-contiguous node IDs for purging junk data

In our SARS-CoV-2 use case, we have found a small number of individuals with >10kbp private insertions. We'd like to generate a list of these and exclude them from later steps. Currently, this involves excluding them from seqwish and regenerating everything: seqwish -> odgi sort -> odgi bin -> component_segmentation. It's taking around 5 hours roundtrip.

Would it be possible to output from odgi a sorted GFA file? From here we could use regex / programs to delete content but leave the same sort order. This would mean node ids would not be contiguous, but bin ids would be, because we'd re-run odgi bin and downstream. This is likely related to #30 where it sounds like non-contiguous node ids are tolerated, but slow.

Build fails with clang

I tried to build this on a Mac with clang, but it won't complete. The build fails while compiling bsort with a complaint about the -fopenmp flag.

[ 25%] Performing build step for 'bsort'
[ 25%] Building CXX object CMakeFiles/bsort.dir/src/bsort.cpp.o
clang: error: unsupported option '-fopenmp'
make[5]: *** [CMakeFiles/bsort.dir/src/bsort.cpp.o] Error 1
make[4]: *** [CMakeFiles/bsort.dir/all] Error 2
make[3]: *** [all] Error 2
make[2]: *** [bsort-prefix/src/bsort-stamp/bsort-build] Error 2
make[1]: *** [CMakeFiles/bsort.dir/all] Error 2
make: *** [all] Error 2

path guided linear 1D SGD sorting

The linear 1D SGD sorting in odgi sort precalculates all pairwise node distances, so-called terms. This means memory usage is quadratic in the number of nodes, which won't scale to large graphs with billions of nodes.
Also, randomly picking nodes from a normal distribution for the current iteration might not be the ideal way. @ekg proposed to draw the second node from a Zipfian distribution. Or we can try out any other distribution.

We need to replace the const PathHandleGraph& graph input at https://github.com/vgteam/odgi/blob/39f467afa173d64e03eae85d27d818150eeb8ce1/src/algorithms/linear_sgd.cpp#L6 with the path index implemented in https://github.com/vgteam/odgi/blob/master/src/algorithms/xp.cpp.
This means functions like index.for_each_handle will have to be rewritten in xp.
We need to calculate each term on the fly. Here the first node will be picked via normal distribution and the second node via Zipfian distribution. We should also try to hit the same path again maybe? Here a lot of tweaks and variations are possible.
How can we sort graphs with no paths? For each disconnected graph, we need to find a random way to generate paths.
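
As a rough illustration of computing one term on the fly, the sketch below draws two steps of a single path and returns their path distance. It is not the odgi implementation. For simplicity the first step is drawn uniformly (a swapped-in choice, not the normal draw mentioned above), the second sits a Zipf-distributed hop away, and step_positions stands in for whatever the path index would provide.

```python
import random


def zipf_hop(max_hop, rng, exponent=1.0):
    """Draw a hop length in 1..max_hop with probability proportional to 1/k**exponent."""
    hops = range(1, max_hop + 1)
    weights = [1.0 / k ** exponent for k in hops]
    return rng.choices(hops, weights=weights, k=1)[0]


def sample_term(step_positions, rng):
    """Pick two steps of one path and return (i, j, nucleotide distance)."""
    n = len(step_positions)
    if n < 2:
        raise ValueError("need at least two steps to form a term")
    i = rng.randrange(n)
    # Limit the hop so that at least one direction stays inside the path.
    hop = zipf_hop(max(i, n - 1 - i), rng)
    j = i + hop if i + hop < n else i - hop
    return i, j, abs(step_positions[j] - step_positions[i])


rng = random.Random(42)
positions = [0, 3, 8, 12, 20]  # hypothetical start offsets of the steps of one path
print(sample_term(positions, rng))
```

Short hops dominate under the Zipfian draw, so most updates act on nearby steps while the occasional long hop keeps the global layout in check. Note that this sketch prefers the forward direction whenever it fits.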

odgi build error with GFA graph

I'm getting an error with one of my graphs. I'm trying to build from GFA and am getting the following error:

odgi build -t 32 -P -g barley_pangenome_graph_2H.gfa -o barley_pangenome_graph_2H.gfa.og
[odgi::gfa_to_handle] building nodes: 73.86% @ 1.99e+06/s elapsed: 00:00:00:53 remain: 00:00:00:19odgi: /smoothxg/deps/odgi/src/odgi.cpp:507: virtual handlegraph::handle_t odgi::graph_t::create_handle(const string&, const nid_t&): Assertion `!has_node(id)' failed.
srun: error: node-9: task 0: Aborted (core dumped)

The original graph has consensus paths in it, so I removed them. I then converted from VG to GFA.

Is there a way I can process the graph, either from the VG or the GFA, to clean it up and remove the node ID that appears to be causing this error? I don't think it's a memory issue, as I checked for killed processes on the compute node and there were none. I also have 126GB RAM on the node, and the GFA file is only 25GB in size.

odgi bin should output detailed nucleotide positions for each path

Currently, for each bin, for each path, we only give the range of nucleotides for the specific bin.
Which could be 20-30000. But positions 45-123 could be in another bin. To have a more detailed positioning, odgi bin should output these ranges for each bin in a JSON array instead of giving a summarized range.
@JervenBolleman @josiahseaman
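
For illustration, collapsing the positions a path covers within one bin into such an array of ranges could look like this. The function name and the inclusive [start, end] convention are assumptions.

```python
def positions_to_ranges(positions):
    """Collapse nucleotide positions into sorted, inclusive [start, end] ranges."""
    ranges = []
    for pos in sorted(set(positions)):
        if ranges and pos == ranges[-1][1] + 1:
            ranges[-1][1] = pos  # extend the current run
        else:
            ranges.append([pos, pos])  # start a new run
    return ranges


print(positions_to_ranges([20, 21, 22, 44, 124, 125]))
```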

dynamic path positions

I want to make odgi into MutablePathPositionHandleGraph. This will help with a ton of things, including graph sorting.

It should be possible to maintain path positions through editing operations. But it will be tricky and will require tests.

ODGI Bin: most links are useless

Overview

We want links to show us the location of non-linear segments in the Matrix.
A link spanning a gap in the Matrix is not useful.
>95% of links span gaps in the Matrix with no intervening bins.

Discussion

It'd be best not to generate "links" when there are no intervening populated nodes in that path. So if the path resumes 100 nodes later, and the individual does not have any coverage of the intervening 100 nodes, then there's no reason to mention a link. In MSA, it's understood that you keep reading past the gaps, the gaps aren't really "there" in the sequence. I've found that >90% of links I rendered are not necessary based on this rule. It will likely make the output files significantly smaller.

If you list, not node_id but node.sort_order, you just check if the numbers are consecutive or not. Rearrangements would be a non-consecutive jump in the sort_order. Whereas dummy link gaps would just be non-contiguous, but still sorted consecutive integers.
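
A sketch of that rule for a single path, working on the bin ids in the order the path visits them. A jump is dropped when it lands on the next bin that this path populates, meaning it only skips over bins the path never touches. The function name is hypothetical.

```python
from bisect import bisect_right


def necessary_links(bin_ids):
    """Return the (from, to) jumps of one path that are not mere gaps."""
    populated = sorted(set(bin_ids))
    links = []
    for prev, nxt in zip(bin_ids, bin_ids[1:]):
        if nxt == prev:
            continue  # still inside the same bin
        i = bisect_right(populated, prev)
        next_populated = populated[i] if i < len(populated) else None
        if nxt != next_populated:
            links.append((prev, nxt))  # rearrangement, keep the link
    return links


print(necessary_links([1, 2, 3, 104, 105]))  # gap only
print(necessary_links([1, 2, 50, 3, 4]))     # jumps out and back
```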

Example of two unnecessary links

we need binary tests

odgi has grown so much, we should consider adding binary tests. At least one test case for one subcommand.

Out of source build drops artifacts in source tree

I checked out ODGI and did an out of source tree build, as is recommended for cmake projects:

mkdir build
cd build
cmake ..
make -j8

I ended up with the odgi binary in bin under the root of the project, not in build/bin where it should have ended up.

viz should display path names

This is tricky. We want to be able to see the path names, but this requires directly writing into the PNG image. An alternative is to pull in cairo and write vector graphics over the image. That's also tricky, and introduces more (very complicated and oldschool) dependencies. I'll look into alternatives.

odgi build error

Hi,
When I build the latest version of odgi, v0.4.1, the following error occurs:

In file included from /home/cuixb/tools/biosoft/odgi/deps/libhandlegraph/src/handle.cpp:1:0:
/home/cuixb/tools/biosoft/odgi/deps/libhandlegraph/src/include/handlegraph/handle_graph.hpp: In member function ‘bool handlegraph::HandleGraph::for_each_edge(const Iteratee&, bool) const’:
/home/cuixb/tools/biosoft/odgi/deps/libhandlegraph/src/include/handlegraph/handle_graph.hpp:243:65: error: expected primary-expression before ‘)’ token
     return for_each_handle((std::function<bool(const handle_t&)>)[&](const handle_t& handle) -> bool {
                                                                 ^
/home/cuixb/tools/biosoft/odgi/deps/libhandlegraph/src/include/handlegraph/handle_graph.hpp:243:68: error: expected primary-expression before ‘]’ token
     return for_each_handle((std::function<bool(const handle_t&)>)[&](const handle_t& handle) -> bool {
                                                                    ^
/home/cuixb/tools/biosoft/odgi/deps/libhandlegraph/src/include/handlegraph/handle_graph.hpp:243:70: error: expected primary-expression before ‘const’
     return for_each_handle((std::function<bool(const handle_t&)>)[&](const handle_t& handle) -> bool {
                                                                      ^
/home/cuixb/tools/biosoft/odgi/deps/libhandlegraph/src/include/handlegraph/handle_graph.hpp:243:97: error: expected unqualified-id before ‘bool’
     return for_each_handle((std::function<bool(const handle_t&)>)[&](const handle_t& handle) -> bool {
gmake[5]: *** [CMakeFiles/handlegraph_objs.dir/src/handle.cpp.o] Error 1
gmake[4]: *** [CMakeFiles/handlegraph_objs.dir/all] Error 2
gmake[3]: *** [all] Error 2
gmake[2]: *** [handlegraph-prefix/src/handlegraph-stamp/handlegraph-build] Error 2
gmake[1]: *** [CMakeFiles/handlegraph.dir/all] Error 2

And my compiler version:

-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_C: -fopenmp (found version "3.1")
-- Found OpenMP_CXX: -fopenmp (found version "3.1")
-- Found OpenMP: TRUE (found version "3.1")
-- Found PythonInterp: /home/cuixb/tools/biosoft/conda3/envs/vg/bin/python3.8 (found version "3.8.5")
-- Found PythonLibs: /home/cuixb/tools/biosoft/conda3/envs/vg/lib/libpython3.8.so

So, how can I solve these errors? Thank you!

Request for new 0.3 release and bump for Bioconda recipe

There are command line changes to odgi since the 0.2 release tag, the basis for the most recent Bioconda recipe.

Might it be possible to cut a new 0.3 release tag and bump the Bioconda recipe?

    container "heuermh/odgi-dev:latest"
    """
    odgi build -g $graph -o - \
      | odgi prune -i - -b 3 -o - \
      | odgi view -i - -g >${sample}.odgi-prune.b3.gfa
    """

    /*
    container "quay.io/biocontainers/odgi:0.2--py37h8b12597_0"
    """
    odgi build -g $graph -o - \
      | odgi prune -k 16 -i - -o - \
      | odgi view -i - -g >${sample}.odgi-prune.b3.gfa
    """
    */

Thank you in advance!

odgi build integrates sequence that is not present in any path

When I took a look at the odgi bin output of the t.gfa, I realized that for some bins in the middle, the coverage is not always 1.0. This seems to happen, because in the GFA, some sequences, that are not traversed by the path are still added to the graph as nodes and therefore later to the pangenome.
I would expect, this is not a desired behaviour @ekg ?

Header-only libraries still need to be installed

https://github.com/vgteam/dg/blob/43e5527253f6a16f1fb0e7bb21124eabec24cd57/CMakeLists.txt#L62

You need to do something like this https://stackoverflow.com/a/21223763 and have an INSTALL_COMMAND to actually put the headers in place I think.

Compiling with Clang 12.0

Hi,

I am trying to compile odgi with Apple Clang 12.0 and I get this error:

cgroza@mallow odgi % cmake -H. -Bbuild && cmake --build build -- -j 3
-- pybind11 v2.5.dev1
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/cgroza/git/odgi/build
[  3%] Built target sdsl-lite
[  6%] Built target mondriaan
[  9%] Built target sgd2
[ 12%] Built target structures
[ 15%] Built target xoshiro
[ 18%] Built target lodepng
[ 21%] Built target bbhash
[ 23%] Built target random_distributions
[ 26%] Built target picosha256
[ 28%] Built target mmmulti
[ 31%] Built target intervaltree
[ 33%] Built target cgranges
[ 36%] Built target tayweeargs
[ 38%] Built target sparsepp
[ 41%] Built target ska
[ 44%] Built target httplib
[ 47%] Built target libbf
[ 49%] Built target ips4o
[ 52%] Built target hopscotch_map
[ 54%] Built target handlegraph
[ 57%] Built target dirtyzipf
[ 61%] Built target gfakluge
[ 63%] Built target dynamic
[ 66%] Built target atomicqueue
[ 68%] Built target atomicbitvector
[ 68%] Building CXX object CMakeFiles/odgi_objs.dir/src/subcommand/sort_main.cpp.o
[ 69%] Building CXX object CMakeFiles/odgi_objs.dir/src/subcommand/view_main.cpp.o
[ 69%] Building CXX object CMakeFiles/odgi_objs.dir/src/subcommand/kmers_main.cpp.o
clang: warning: argument unused during compilation: '-L/opt/local/lib/libomp' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-L/usr/local/lib' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-L/opt/local/lib/libomp' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-L/usr/local/lib' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-L/opt/local/lib/libomp' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-L/usr/local/lib' [-Wunused-command-line-argument]
In file included from /Users/cgroza/git/odgi/src/subcommand/sort_main.cpp:13:
In file included from /Users/cgroza/git/odgi/src/algorithms/linear_sgd.hpp:16:
In file included from /Users/cgroza/git/odgi/build/libbf-prefix/include/bf/all.hpp:7:
/Users/cgroza/git/odgi/build/libbf-prefix/include/bf/bloom_filter/counting.hpp:29:3: warning: explicitly defaulted move constructor is implicitly deleted
      [-Wdefaulted-function-deleted]
  counting_bloom_filter(counting_bloom_filter&&) = default;
  ^
/Users/cgroza/git/odgi/build/libbf-prefix/include/bf/bloom_filter/counting.hpp:14:31: note: move constructor of 'counting_bloom_filter' is implicitly deleted because base class
      'bf::bloom_filter' has a deleted move constructor
class counting_bloom_filter : public bloom_filter
                              ^
/Users/cgroza/git/odgi/build/libbf-prefix/include/bf/bloom_filter.hpp:11:3: note: 'bloom_filter' has been explicitly marked deleted here
  bloom_filter(bloom_filter const&) = delete;
  ^
/Users/cgroza/git/odgi/src/subcommand/sort_main.cpp:165:43: error: no matching function for call to 'max'
                    max_path_step_count = std::max(max_path_step_count, path_index.get_path_step_count(path));
                                          ^~~~~~~~
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/algorithm:2633:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp'
      ('unsigned long long' vs. 'unsigned long')
max(const _Tp& __a, const _Tp& __b)
^
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/algorithm:2644:1: note: candidate template ignored: could not match 'initializer_list<type-parameter-0-0>' against
      'unsigned long long'
max(initializer_list<_Tp> __t, _Compare __comp)
^
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/algorithm:2624:1: note: candidate function template not viable: requires 3 arguments, but 2 were provided
max(const _Tp& __a, const _Tp& __b, _Compare __comp)
^
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/algorithm:2653:1: note: candidate function template not viable: requires single argument '__t', but 2 arguments were
      provided
max(initializer_list<_Tp> __t)
^
1 warning and 1 error generated.
make[2]: *** [CMakeFiles/odgi_objs.dir/src/subcommand/sort_main.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/odgi_objs.dir/all] Error 2
make: *** [all] Error 2
cgroza@mallow odgi %

Is compilation with Clang on OS X supported, or do I need special steps to compile odgi on Mac?

Thank you
Cristian

Constructing graph with ID that doesn't start at 1 can be very slow

The bottleneck appears to be this loop:
https://github.com/vgteam/odgi/blob/master/src/odgi.cpp#L432

I think it's hurting for a min ID offset to avoid this blow-up.
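
To sketch what a min ID offset buys: the toy table below addresses nodes by (node_id - min_id), so nothing is allocated for the unused ids below the first one. This assumes the slowdown comes from filling every slot from 1 up to the first id, which is a reading of the report rather than something confirmed here; the class is hypothetical and unrelated to odgi's real data structures.

```python
class OffsetNodeTable:
    """Dense node storage addressed by (node_id - min_id) rather than node_id."""

    def __init__(self, min_id):
        self.min_id = min_id
        self.records = []

    def add(self, node_id, sequence):
        rank = node_id - self.min_id
        if rank < 0:
            raise ValueError("node id below the chosen minimum")
        # Pad only up to the rank, not up to the absolute id.
        while len(self.records) <= rank:
            self.records.append(None)
        self.records[rank] = sequence

    def get(self, node_id):
        rank = node_id - self.min_id
        if not 0 <= rank < len(self.records):
            raise KeyError(node_id)
        return self.records[rank]


table = OffsetNodeTable(min_id=1_000_000)
table.add(1_000_002, "ACGT")
print(len(table.records))  # slots allocated, independent of the absolute id
```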

Bioconda Bump

Hi @dpryan79,

it seems the autobump of Bioconda still does not work? The current release of odgi is 0.4.1 but I am seeing 0.3.
Could you please trigger a Bioconda build? Thanks!

odgi bin does not seem to use odgi lib even when compiled non statically

In the quest to make smaller docker images I realized that the odgi binary does not depend on the compiled library.

Using the docker branch docker commit e9c6a59

docker build -t test/odgi .
docker run -it --entrypoint=bash test/odgi
odgi test
rm /opt/odgi/lib/libodgi.so
odgi test

Both test cases pass.
the docker builds with

-DBUILD_STATIC=0

Which leads to an assumption that the odgi library would use the shared libs. But it doesn't seem to do so in practice.

Feature: Output Pangenome sequence

Would you be so kind as to modify ODGI bin at 1bp bins to output a pangenome sequence? It'd be the concatenation of all node sequences in the order that you already sorted them. A single long string.

#Python pseudocode for pangenome matrix sequence
with open(bin_output_file, 'w') as out:
	out.write(regular_bin_output(my_sort_order))
	if bin_size == 1:
		out.write("'Pangenome Sequence':")
		pangenome = ''.join(node.seq for node in my_sort_order)
		out.write(pangenome)

This could potentially be triggered at every bin level or only at bin_size = 1bp. Sequence length with real data tends to be 120% the size of the starting genome, so 100s of MB, not 100s of GB. Perhaps a command line flag? --emit_sequence

Automatically optimize a graph where necessary instead of throwing an error.

This is motivated by #282. In order to not confuse users, we should just automatically optimize each non-optimized input graph instead of throwing a confusing error.

use LMF for sorting

It's been difficult to use topological sorting to get reasonable node orders. It seems that the algorithm has trouble distinguishing tips from actual chromosome starts. An alternative approach would be to sort on the basis of chromosome, which is something that should be implied directly by the graph structures.

An improvement would be to sort the graph using some kind of matrix block diagonal form of its adjacency matrix. Given these matrices are very sparse for genome graphs, we should probably use a method adapted to sparse matrices with good scaling properties. The best-looking implementation I've found is LMF. Perhaps there are other, newer ones.

please include Buildable Source Tarballs in the releases

That is, with the entire contents of deps/ with all git-submodules checked out

Thanks!

Build error -- When building on centos with cmake 3.15.3

I get a compile error of :

/public/home/nheyer/odgi/src/odgi.hpp:446:36: error: too many initializers for 'handlegraph::step_handle_t'
/public/home/nheyer/odgi/src/odgi.hpp:447:35: error: array must be initialized with a brace-enclosed initializer
         step_handle_t last = {0, 0};

This error doesn't occur on my local computer running Linux Mint.

'rdseed failed' error when using '-p Y' in odgi sort

I got the following error when trying to sort a graph made with minimap2+seqwish:

terminate called after throwing an instance of 'std::runtime_error'
  what():  random_device: rdseed failed
Aborted (core dumped)

Have you encountered this error before?

I was running the following pipe:

odgi build -g input.gfa -o - -p \
    | odgi chop -i - -o - -c 16 | odgi sort -t 4 -i - -o - -O \
    | odgi sort -t 4 -p Ygs -G 0.1 -i - -o - -P \
    | odgi view -i - -g > output.gfa

I narrowed the error down to the second sort: odgi sort -t 4 -p Ygs -G 0.1 -i - -o - -P. More specifically I only get the error when including Y in the -p pipeline parameter.

My input graph is in this vg-team s3 bucket for now at s3://vg-k8s/vgamb/chr20/seqwish/asm20-dropl10000-k256-l256/lc2019_12ont-hg38.seqwish.asm20-dropl10000-k256-l256.gfa.gz

replacing path "overlap" specifiers with a single *

This is now part of the GFA spec. It's just smaller, and seems fine by me as long as various parsers (e.g. in vg) can handle it.
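
For reference, the rewrite amounts to replacing the fourth tab-separated field of each P line. A minimal sketch, assuming GFA1 path lines of the form P, name, segment list, overlaps; the function name is hypothetical.

```python
def star_overlaps(gfa_line: str) -> str:
    """Replace the overlaps field of a GFA1 P line with a single '*'."""
    fields = gfa_line.rstrip("\n").split("\t")
    if fields[0] == "P" and len(fields) >= 4:
        fields[3] = "*"  # overlaps column; other record types are left untouched
    return "\t".join(fields)


print(star_overlaps("P\tx\t1+,2-,3+\t0M,0M"))
```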

--out flag doesn't work with matrix or bin

The --out flag worked for me with the build subcommand, but not with the matrix or bin subcommands. Those subcommands wrote the output to stdout, whether or not I specified --out.

Make sequence field optional in odgi bin when output via --json

viz testing of 12 Ath. thaliana genomes

Executing the following code on a given Ath. thaliana GFA with 12 genomes of 5 chromosomes each, I get the following output:

#!/bin/bash
ODGI=~/software/odgi/git/master/bin/odgi
GFA=sebastian.Athaliana.all.50000.gfa
OG=${GFA%.gfa}.og
SOG=${GFA%.gfa}.sorted.og

## Build the sparse matrix form of the gfa graph
echo "### odgi build"
BLDPREF=${0%.sh}_01_build
/usr/bin/time -v -o ${BLDPREF}.time \
ionice -c2 -n7 \
$ODGI build \
--progress \
--gfa=$GFA \
--out=$OG \
> ${BLDPREF}.log 2>&1
#--sort \

## Sort paths by 1D sorting
echo "### odgi sort"
SRTPREF=${0%.sh}_02_sort
/usr/bin/time -v -o ${SRTPREF}.time \
ionice -c2 -n7 \
$ODGI sort \
--pipeline="bSnSnS" \
--sgd-use-paths \
--paths-max \
--progress \
--idx=$OG \
--out=$SOG \
--threads="20" \
> ${SRTPREF}.log 2>&1

~/software/odgi/git/master/bin/odgi viz -i sebastian.Athaliana.all.50000.sorted.og -o sebastian.Athaliana.all.50000.sorted.og.png -x 5000 -y 1000 -P 10

Windows support?

Is this supported for windows?
I get this error
Severity Code Description Project File Line Suppression State
Error C2466 cannot allocate an array of constant size 0 [C:\corona\odgi\build\mondriaan-prefix\src\mondriaan-build\mondriaan_objs.vcxproj] mondriaan C:\corona\odgi\deps\mondriaan\src\Heap.c 192

mutex on every node

By adding a mutex to every node, it should be possible to implement distributed editing of the graph. This in turn could be used to speed up builds, by making them multithreaded.
