ggcat's Issues

Graph format issue (?)

Hello,

I'm trying a few tests with ggcat and I'm having an issue at query time.
Here are the commands I used:

ggcat build -c -l fof -k 25 -s 1 -o index_first_3_humans

With fof pointing to these files, downloaded from your Zenodo repository:

HG00096.fa
HG00097.fa
HG00099.fa

The computation takes 45 minutes and creates these files:

5313314765 Jan 16 15:54 index_first_3_humans
       181 Jan 16 15:38 index_first_3_humans.colors.dat
   2050740 Jan 16 15:54 index_first_3_humans.stats.log
       189 Jan 16 17:10 output.stats.log

I query the created graph with this command:

ggcat query --colors -k 25  -j 16  index_first_3_humans ../query_reads/head_D3_S1_L001_R1_001.fasta 

It quickly fails with:

Thread panicked at location: /scratch/ppeterlo/ggcat/pipeline/common/io/src/sequences_reader.rs:82:21
Error message: Cannot recognize file type of 'index_first_3_humans'

Any idea? Am I doing something wrong?
Thanks!
Pierre

Deadlock during construction?

Hello, I'm calling ggcat through the C++ API, but it sometimes hangs during construction. Below is the stack trace printed by gdb when I attach to the process during the hang.

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00005588d321c037 in parking_lot::condvar::Condvar::wait_until_internal ()
#2  0x00005588d41ae05c in parallel_processor::execution_manager::execution_context::ExecutionContext::wait_for_completion ()
#3  0x00005588d419f4cf in ggcat_minimizer_bucketing::GenericMinimizerBucketing::do_bucketing ()
#4  0x00005588d3ec7cb6 in ggcat_assembler_minimizer_bucketing::__minimizer_bucketing_static ()
#5  0x00005588d32a9b8b in ggcat_assembler::run_assembler ()
#6  0x00005588d3397624 in ggcat_assembler::__run_assembler_static ()
#7  0x00005588d329d9bd in ggcat_api::GGCATInstance::build_graph ()
#8  0x00005588d31f3a7b in ggcat_cpp_bindings::ggcat_build ()
#9  0x00005588d31f0622 in cxx::unwind::prevent_unwind ()
#10 0x00005588d31e8827 in cxxbridge1$ggcat_build_from_files ()
#11 0x00005588d31dc3a2 in ggcat_build_from_files(GGCATInstanceFFI const&, rust::cxxbridge1::Slice<rust::cxxbridge1::String const>, rust::cxxbridge1::String, rust::cxxbridge1::Slice<rust::cxxbridge1::String const>, unsigned long, unsigned long, bool, unsigned long, bool, unsigned long, unsigned long) ()
#12 0x00005588d31dba54 in ggcat::GGCATInstance::build_graph_from_files(ggcat::Slice<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, unsigned long, bool, unsigned long, ggcat::ExtraElaborationStep, bool, ggcat::Slice<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, unsigned long) ()

Installation from scratch fails with the example files

Hi,

I installed ggcat today following the instructions in the README, on a Linux machine: installing Rust and the nightly toolchain, cloning the sources, and so on.

I tried to build the colored graph on the provided examples with a file color_mapping.in:

A1	sal1.fa
A2	sal2.fa
A3	sal3.fa

Running the command ggcat build -k 21 -c -d color_mapping.in gave me the following error:

Allocator initialized: mem: 2 GiB chunks: 8192 log2: 18
Using m: 10 with k: 21
Add index with color A1 => 0
Add index with color A2 => 1
Add index with color A3 => 2
Thread panicked at location: crates/io/src/sequences_stream/fasta.rs:15:14
Backtrace:    0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: <unknown>
  17: __libc_start_call_main
             at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  18: __libc_start_main_impl
             at ./csu/../csu/libc-start.c:392:3
  19: <unknown>

I was able to replicate the error on separate Linux machines.

Error compiling using latest rust version

When compiling using the latest nightly build:

   Compiling ggcat_querier v0.1.1 (../ggcat/crates/querier)
error: to use a constant of type `TypeId` in a pattern, `TypeId` must be annotated with `#[derive(PartialEq, Eq)]`
   --> crates/assembler_minimizer_bucketing/src/lib.rs:212:1
    |
212 | / #[dynamic_dispatch(H = [
213 | |     hashes::cn_nthash::CanonicalNtHashIteratorFactory,
214 | |     #[cfg(not(feature = "devel-build"))] hashes::fw_nthash::ForwardNtHashIteratorFactory
215 | | ], CX = [
216 | |     #[cfg(not(feature = "devel-build"))] colors::bundles::multifile_building::ColorBundleMultifileBuilding,
217 | |     colors::non_colored::NonColoredManager,
218 | | ])]
    | |___^
    |
    = note: the traits must be derived, manual `impl`s are not sufficient
    = note: see https://doc.rust-lang.org/stable/std/marker/trait.StructuralEq.html for details
    = note: this error originates in the attribute macro `dynamic_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info)

Older nightly builds (2022-12-01, 2023-03-01, 2023-04-01) also raise an error, namely the following:

   Compiling ggcat_structs v0.1.0 (../tools/ggcat/ggcat/crates/structs)
error[E0635]: unknown feature `impl_trait_in_assoc_type`
 --> crates/hashes/src/lib.rs:3:12
  |
3 | #![feature(impl_trait_in_assoc_type)]
  |            ^^^^^^^^^^^^^^^^^^^^^^^^

Is there a way to solve this, or does a specific Rust version allow it to compile successfully?

Thanks :)

Error message: index out of bounds: the len is 10833 but the index is 48737

I get the error message below. The command is ggcat build -c -k 31 -o coli3682.fna ~/data/coli3682_dataset/*, where ~/data/coli3682_dataset is this dataset: https://zenodo.org/record/6577997#.Y2TfEduxXRY

Allocator initialized: mem: 2 GiB chunks: 8192 log2: 18
Using m: 12 with k: 31
Started phase: reads bucketing prev stats: 
Elaborated 399801 sequences! [9978998363 | 99.74% qb] (1872[1878]/3682 => 50.84%)  ptime: 23.13s gtime: 23.14s
Temp buckets files size: 19.10 GiB
Finished phase: reads bucketing. phase duration: 43.75s gtime: 43.76s
Started phase: kmers merge prev stats: 
Found bucket with max size 269042 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 145023 REAL_SIZE: 145027 SUB: 6
Found bucket with max size 190978 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 144964 REAL_SIZE: 144963 SUB: 5
Found bucket with max size 303654 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 145561 REAL_SIZE: 145559 SUB: 8
Found bucket with max size 189907 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 145192 REAL_SIZE: 145189 SUB: 4
Found bucket with max size 281469 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 146378 REAL_SIZE: 146375 SUB: 9
Found bucket with max size 191777 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 146586 REAL_SIZE: 146587 SUB: 1
Found bucket with max size 215135 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 145356 REAL_SIZE: 145353 SUB: 7
Found bucket with max size 193783 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 145767 REAL_SIZE: 145769 SUB: 0
Found bucket with max size 206533 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 145638 REAL_SIZE: 145641 SUB: 2
Found bucket with max size 326621 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 146576 REAL_SIZE: 146575 SUB: 10
Found bucket with max size 201093 ==> .temp_files/bucket.908 // EXPECTED_SIZE: 146539 REAL_SIZE: 146542 SUB: 3
Found bucket with max size 232594 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 144757 REAL_SIZE: 144755 SUB: 5
Found bucket with max size 245208 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 144758 REAL_SIZE: 144754 SUB: 4
Found bucket with max size 188785 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 145681 REAL_SIZE: 145683 SUB: 6
Found bucket with max size 513570 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 144768 REAL_SIZE: 144775 SUB: 10
Found bucket with max size 277270 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 144761 REAL_SIZE: 144755 SUB: 9
Found bucket with max size 244137 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 145985 REAL_SIZE: 145986 SUB: 2
Found bucket with max size 193494 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 146433 REAL_SIZE: 146430 SUB: 3
Found bucket with max size 230265 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 146471 REAL_SIZE: 146473 SUB: 1
Found bucket with max size 216359 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 144868 REAL_SIZE: 144870 SUB: 7
Found bucket with max size 261307 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 144762 REAL_SIZE: 144761 SUB: 8
Found bucket with max size 168929 ==> .temp_files/bucket.944 // EXPECTED_SIZE: 145456 REAL_SIZE: 145458 SUB: 0
Found bucket with max size 233427 ==> .temp_files/bucket.406 // EXPECTED_SIZE: 34206 REAL_SIZE: 34211 SUB: 11
Found bucket with max size 214098 ==> .temp_files/bucket.406 // EXPECTED_SIZE: 145090 REAL_SIZE: 145083 SUB: 2
Found bucket with max size 148920 ==> .temp_files/bucket.406 // EXPECTED_SIZE: 145070 REAL_SIZE: 145065 SUB: 0
Found bucket with max size 256156 ==> .temp_files/bucket.406 // EXPECTED_SIZE: 145669 REAL_SIZE: 145666 SUB: 7
Found bucket with max size 296412 ==> .temp_files/bucket.406 // EXPECTED_SIZE: 145240 REAL_SIZE: 145246 SUB: 5
Found bucket with max size 309196 ==> .temp_files/bucket.406 // EXPECTED_SIZE: 145990 REAL_SIZE: 145985 SUB: 9
Found bucket with max size 178942 ==> .temp_files/bucket.406 // EXPECTED_SIZE: 145848 REAL_SIZE: 145845 SUB: 1
Found bucket with max size 309570 ==> .temp_files/bucket.406 // EXPECTED_SIZE: 146398 REAL_SIZE: 146411 SUB: 8
Found bucket with max size 383367 ==> .temp_files/bucket.406 // EXPECTED_SIZE: 145845 REAL_SIZE: 145844 SUB: 10
Found bucket with max size 244681 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 145666 REAL_SIZE: 145665 SUB: 2
Found bucket with max size 235229 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 145670 REAL_SIZE: 145673 SUB: 1
Found bucket with max size 246194 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 146113 REAL_SIZE: 146111 SUB: 5
Found bucket with max size 231931 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 146500 REAL_SIZE: 146504 SUB: 4
Found bucket with max size 249543 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 146225 REAL_SIZE: 146231 SUB: 8
Found bucket with max size 522478 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 145690 REAL_SIZE: 145685 SUB: 10
Found bucket with max size 187340 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 145673 REAL_SIZE: 145674 SUB: 3
Found bucket with max size 227460 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 146210 REAL_SIZE: 146207 SUB: 6
Found bucket with max size 185691 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 146549 REAL_SIZE: 146550 SUB: 0
Found bucket with max size 326026 ==> .temp_files/bucket.22 // EXPECTED_SIZE: 145681 REAL_SIZE: 145677 SUB: 9
Found bucket with max size 229211 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 29230 REAL_SIZE: 29217 SUB: 11
Found bucket with max size 176171 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 145562 REAL_SIZE: 145558 SUB: 5
Found bucket with max size 262191 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 145520 REAL_SIZE: 145523 SUB: 9
Found bucket with max size 174267 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 146001 REAL_SIZE: 146009 SUB: 1
Found bucket with max size 196214 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 145751 REAL_SIZE: 145747 SUB: 4
Found bucket with max size 226865 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 145803 REAL_SIZE: 145810 SUB: 6
Found bucket with max size 245701 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 145962 REAL_SIZE: 145967 SUB: 8
Found bucket with max size 263075 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 146513 REAL_SIZE: 146516 SUB: 3
Found bucket with max size 243848 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 146003 REAL_SIZE: 146002 SUB: 0
Found bucket with max size 390354 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 145935 REAL_SIZE: 145934 SUB: 10
Found bucket with max size 161177 ==> .temp_files/bucket.10 // EXPECTED_SIZE: 146193 REAL_SIZE: 146190 SUB: 2
Found bucket with max size 183838 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 23173 REAL_SIZE: 23180 SUB: 11
Found bucket with max size 197540 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 145844 REAL_SIZE: 145840 SUB: 4
Found bucket with max size 285566 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 145843 REAL_SIZE: 145840 SUB: 8
Found bucket with max size 313293 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 146192 REAL_SIZE: 146196 SUB: 9
Found bucket with max size 174097 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 146123 REAL_SIZE: 146126 SUB: 6
Found bucket with max size 249186 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 145966 REAL_SIZE: 145965 SUB: 3
Found bucket with max size 182835 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 146211 REAL_SIZE: 146248 SUB: 2
Found bucket with max size 277355 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 146390 REAL_SIZE: 146326 SUB: 7
Found bucket with max size 244545 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 146529 REAL_SIZE: 146534 SUB: 5
Found bucket with max size 166957 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 146151 REAL_SIZE: 146150 SUB: 0
Found bucket with max size 418642 ==> .temp_files/bucket.112 // EXPECTED_SIZE: 146183 REAL_SIZE: 146200 SUB: 10

Thread panicked at location: /home/niklas/code/ggcat/pipeline/common/colors/src/managers/multiple.rs:312:37
Error message: index out of bounds: the len is 10833 but the index is 48737
Backtrace:    0: ggcat::main::{{closure}}
             at cmdline/src/main.rs:447:37
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/alloc/src/boxed.rs:2001:9
      std::panicking::rust_panic_with_hook
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panicking.rs:579:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/sys_common/backtrace.rs:137:18
   4: rust_begin_unwind
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panicking.rs:575:5
   5: core::panicking::panic_fmt
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/core/src/panicking.rs:65:14
   6: core::panicking::panic_bounds_check
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/core/src/panicking.rs:151:5
   7: <colors::managers::multiple::MultipleColorsManager<H,MH> as colors::colors_manager::ColorsMergeManager<H,MH>>::process_colors
   8: <kmers_merge::final_executor::ParallelKmersMergeFinalExecutor<H,MH,CX> as kmers_transform::KmersTransformFinalExecutor<kmers_merge::ParallelKmersMergeFactory<H,MH,CX>>>::process_map
             at pipeline/assembler/kmers_merge/src/final_executor.rs:193:13
   9: <kmers_transform::processor::KmersTransformProcessor<F> as parallel_processor::execution_manager::executor::AsyncExecutor>::async_executor_main::{{closure}}
             at pipeline/common/kmers_transform/src/processor.rs:113:26
      <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/core/src/future/mod.rs:91:19
      parallel_processor::execution_manager::thread_pool::ExecThreadPool::register_executors::{{closure}}::{{closure}}
             at libs/parallel-processor-rs/src/execution_manager/thread_pool.rs:78:25
      <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/core/src/future/mod.rs:91:19
      parallel_processor::execution_manager::thread_pool::ExecThreadPool::register_executors::{{closure}}
             at libs/parallel-processor-rs/src/execution_manager/thread_pool.rs:83:17
      <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/core/src/future/mod.rs:91:19
  10: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/core.rs:184:17
      tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/loom/std/unsafe_cell.rs:14:9
      tokio::runtime::task::core::CoreStage<T>::poll
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/core.rs:174:13
      tokio::runtime::task::harness::poll_future::{{closure}}
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs:480:19
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/core/src/panic/unwind_safe.rs:271:9
      std::panicking::try::do_call
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panicking.rs:483:40
      std::panicking::try
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panicking.rs:447:19
      std::panic::catch_unwind
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panic.rs:137:14
      tokio::runtime::task::harness::poll_future
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs:468:18
      tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs:104:27
      tokio::runtime::task::harness::Harness<T,S>::poll
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs:57:15
  11: tokio::runtime::task::raw::RawTask::poll
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/raw.rs:134:18
      tokio::runtime::task::LocalNotified<S>::run
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/mod.rs:385:9
      tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}}
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs:421:13
      tokio::coop::with_budget::{{closure}}
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/coop.rs:102:9
      std::thread::local::LocalKey<T>::try_with
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/thread/local.rs:446:16
      std::thread::local::LocalKey<T>::with
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/thread/local.rs:422:9
  12: tokio::coop::with_budget
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs:420:9
      tokio::coop::budget
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/coop.rs:72:5
      tokio::runtime::scheduler::multi_thread::worker::Context::run_task
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs:420:9
  13: tokio::runtime::scheduler::multi_thread::worker::Context::run
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs:387:24
  14: tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs:372:17
      tokio::macros::scoped_tls::ScopedKey<T>::set
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/macros/scoped_tls.rs:61:9
  15: tokio::runtime::scheduler::multi_thread::worker::run
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs:369:5
  16: tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/scheduler/multi_thread/worker.rs:348:45
      <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/blocking/task.rs:42:21
      tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/core.rs:184:17
      tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/loom/std/unsafe_cell.rs:14:9
      tokio::runtime::task::core::CoreStage<T>::poll
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/core.rs:174:13
  17: tokio::runtime::task::harness::poll_future::{{closure}}
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs:480:19
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/core/src/panic/unwind_safe.rs:271:9
      std::panicking::try::do_call
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panicking.rs:483:40
      std::panicking::try
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panicking.rs:447:19
      std::panic::catch_unwind
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panic.rs:137:14
      tokio::runtime::task::harness::poll_future
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs:468:18
      tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs:104:27
      tokio::runtime::task::harness::Harness<T,S>::poll
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/harness.rs:57:15
  18: tokio::runtime::task::raw::RawTask::poll
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/raw.rs:134:18
      tokio::runtime::task::UnownedTask<S>::run
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/task/mod.rs:422:9
      tokio::runtime::blocking::pool::Task::run
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/blocking/pool.rs:111:9
      tokio::runtime::blocking::pool::Inner::run
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/blocking/pool.rs:346:17
  19: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.2/src/runtime/blocking/pool.rs:321:13
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/sys_common/backtrace.rs:121:18
  20: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/thread/mod.rs:551:17
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/core/src/panic/unwind_safe.rs:271:9
      std::panicking::try::do_call
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panicking.rs:483:40
      std::panicking::try
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panicking.rs:447:19
      std::panic::catch_unwind
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/panic.rs:137:14
      std::thread::Builder::spawn_unchecked_::{{closure}}
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/thread/mod.rs:550:30
      core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/core/src/ops/function.rs:251:5
  21: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/alloc/src/boxed.rs:1987:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/alloc/src/boxed.rs:1987:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/bed4ad65bf7a1cef39e3d66b3670189581b3b073/library/std/src/sys/unix/thread.rs:108:17
  22: start_thread
             at /build/glibc-2ORdQG/glibc-2.27/nptl/pthread_create.c:463
  23: clone
             at /build/glibc-2ORdQG/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Coloring options for an input file list

Hi,

Would it be possible to add an option to supply a list of colors to use when building a colored DBG from an input file list? The current method of using the -l argument seems to give each sequence file in the list its own color, but in some cases it would be desirable to color several files with the same color (for example, according to some taxonomy).

Currently this can be done by concatenating the files that should get the same color into separate FASTA files, but this is quite cumbersome to set up for large inputs with many colors, and it requires duplicating the entire dataset in the temporary FASTA files, which can get pretty large.

Example of input I would like to use (first column is colors, second is the file path):

color-0    GCF_000160075.2_ASM16007v2_genomic.fna.gz
color-0    GCF_013267415.1_ASM1326741v1_genomic.fna.gz
color-1    GCF_000963925.1_ASM96392v1_genomic.fna.gz
color-1    GCF_002153775.1_ASM215377v1_genomic.fna.gz
color-1    GCF_007989305.1_ASM798930v1_genomic.fna.gz

where the colors and the sequence paths could be supplied either in the same file as above or as separate files.
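
For reference, a minimal sketch of parsing such a two-column mapping into per-color file groups (standard library only; the mapping file name is hypothetical):

use std::collections::HashMap;

fn main() -> std::io::Result<()> {
    // Hypothetical mapping file: one "<color> <path>" pair per line, as in the example above
    let text = std::fs::read_to_string("color_mapping.tsv")?;
    let mut groups: HashMap<&str, Vec<&str>> = HashMap::new();
    for line in text.lines() {
        if let Some((color, path)) = line.split_once(char::is_whitespace) {
            groups.entry(color).or_default().push(path.trim_start());
        }
    }
    // Each entry now lists all files that should share one color
    for (color, paths) in &groups {
        println!("{color}: {paths:?}");
    }
    Ok(())
}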

Thanks for creating ggcat!

Error 68 : Unfinished stream

Hi there! In jermp/sshash#39, @jermp suggested that I use ggcat to produce input datasets for sshash.

I tried using ggcat, but unfortunately something seems wrong:

$ gzip -d se.ust.k31.fa.gz 
$ ggcat build -k 31 -j 8 --eulertigs se.ust.k31.fa 
...
Final output saved to: output.fasta.lz4
$ lz4 -d output.fasta.lz4 
Decoding file output.fasta 
Error 68 : Unfinished stream 

This is using se.ust.k31.fa.gz as an input dataset for ggcat. Ultimately, I want to apply ggcat to compute eulertigs of Homo_sapiens.GRCh38.dna.toplevel.fa.gz for k=127, but with that dataset I also end up with Unfinished stream errors, and the resulting file is much smaller than I anticipate. Could you please advise if I'm doing anything wrong here?

Interaction between `-m 0` and `--prefer-memory` causes a crash

Hi, I've been playing with ggcat and noticed that running

ggcat build -k 31 -m 0 --prefer-memory input.fa.gz

causes ggcat to crash. Presumably this is due to -m 0, because increasing the value or removing --prefer-memory prevents the crash. It might be good to prevent this combination or to require a positive value for -m; a possible guard is sketched after the log below.

Log from running the above command with real inputs:
temaklin@xps13:~/Projects/African_E_colis$ ggcat build -k 31 -m 0 --prefer-memory assemblies/SRR13687106.fa.gz
Allocator initialized: mem: 0 octets chunks: 8192 log2: 8
Using m: 12 with k: 31
Started phase: reads bucketing prev stats: 
Temp buckets files size: 6.11 MiB
Finished phase: reads bucketing. phase duration: 432.93ms gtime: 432.94ms
Started phase: kmers merge prev stats: 
Thread panicked at location: libs-crates/parallel-processor-rs/src/buckets/readers/generic_binary_reader.rs:106:9
Backtrace:    0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: <unknown>
  20: <unknown>
  21: <unknown>
  22: <unknown>
  23: <unknown>
  24: <unknown>
  25: <unknown>
  26: start_thread
  27: clone3
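
A minimal sketch of the guard suggested above (a hypothetical helper, not ggcat's actual argument parsing):

fn parse_memory(arg: &str) -> Result<f64, String> {
    // Reject a zero (or negative) memory budget before it reaches the allocator
    let value: f64 = arg.parse().map_err(|e| format!("invalid number: {e}"))?;
    if value > 0.0 {
        Ok(value)
    } else {
        Err("-m must be a positive value".to_string())
    }
}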

code: 24, kind: Uncategorized, message: "Too many open files"

Hello, I'm trying to run the code on the attached file (file extension .txt because .fna is not allowed on GitHub). I'm getting the following error:

niklas@phoenix:~/code/ggcat$ ggcat build -k 31 ../Themisto/example_input/coli3.fna -o coli3
Allocator initialized: mem: 2 GiB chunks: 8192 log2: 18
Using m: 12 with k: 31
Started phase: reads bucketing prev stats: 
Temp buckets files size: 26.64 MiB
Finished phase: reads bucketing. phase duration: 373.73ms gtime: 373.80ms
Started phase: kmers merge prev stats: 
Finished phase: kmers merge. phase duration: 452.88ms gtime: 826.68ms
Started phase: hashes sorting prev stats: 
Thread panicked at location: libs/parallel-processor-rs/src/memory_fs/file/internal.rs:248:26
Error message: called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
Backtrace:    0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: <unknown>
  20: <unknown>

The ulimit for the number of open files is set to unlimited:

niklas@phoenix:~/code/ggcat$ ulimit
unlimited
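
Note that a bare ulimit prints the file-size limit; the open-files limit is shown by ulimit -n. A minimal sketch for checking the limit the running process actually sees, assuming the libc crate:

use libc::{getrlimit, rlimit, RLIMIT_NOFILE};

fn main() {
    let mut lim = rlimit { rlim_cur: 0, rlim_max: 0 };
    // Safety: getrlimit only writes into the struct we pass in
    let ret = unsafe { getrlimit(RLIMIT_NOFILE, &mut lim) };
    assert_eq!(ret, 0);
    println!("open files: soft={} hard={}", lim.rlim_cur, lim.rlim_max);
}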

coli3.txt

Using nightly Rust is problematic

The build was again broken on the latest nightly. It seems to be fixed now, but to prevent future breakage, I think it would be better if GGCAT used a fixed Rust version rather than the nightly version.

Query index shifting

When a query has fewer than k bases, it is removed and all subsequent queries get their indices decremented.
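
A possible workaround until this is fixed (a hypothetical helper, not part of ggcat): drop queries shorter than k before querying and keep their original positions so results can be mapped back:

fn filter_short_queries(queries: &[String], k: usize) -> (Vec<String>, Vec<usize>) {
    let mut kept = Vec::new();
    let mut original_index = Vec::new();
    for (i, q) in queries.iter().enumerate() {
        // Queries with fewer than k bases would be removed by ggcat,
        // shifting every later query index; exclude them up front instead
        if q.len() >= k {
            kept.push(q.clone());
            original_index.push(i);
        }
    }
    (kept, original_index)
}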

GGCAT API crash when building many graphs in memory from the same instance

Link to files + code that reproduce the crash on my system (Fedora 39 Linux 6.6.13-200.fc39.x86_64) at the end.

Description

The GGCAT API seems to have a bug where using it to build many (> 100) graphs from the same instance, initialized with prefer_memory: true, eventually causes a panic with the error message:

thread 'main' panicked at /home/temaklin/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parallel-processor-0.1.13/src/memory_fs/file/internal.rs:248:26:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I looked more into this by making the panicking function (create_writing_underlying_file in parallel-processor) print the file it's attempting to access, and the panic seems to be caused by the instance going into a state where it thinks that it has run out of memory after building some of the graphs. The first graphs are built normally in memory (no temporary files are created) but after a while the building seems to switch to 100% on disk. This eventually causes a crash with the error:

	create_writing_underlying_file: tmp/build_graph_95c7a77f-d9b1-4028-989f-f5676fdf4417/result.997
	create_writing_underlying_file: tmp/build_graph_95c7a77f-d9b1-4028-989f-f5676fdf4417/result.998
	create_writing_underlying_file: tmp/build_graph_95c7a77f-d9b1-4028-989f-f5676fdf4417/result.999
	create_writing_underlying_file: tmp/build_graph_02d33ca3-550c-42c5-b3f2-993a06afe332/maximal-links.207
thread 'main' panicked at /home/temaklin/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parallel-processor-0.1.13/src/memory_fs/file/internal.rs:249:26:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

because the file tmp/build_graph_02d33ca3-550c-42c5-b3f2-993a06afe332/maximal-links.207 doesn't exist in the temporary directory.

I also tried calling run_assembler directly but it results in the same crash, so it seems like the API isn't the issue.

Code

use std::collections::HashMap;
use std::path::PathBuf;

fn build_pangenome_graph(input_seq_names: &[String], prefix: &String, instance: &ggcat_api::GGCATInstance) {
    println!("Building graph {} from {} sequences:", prefix, input_seq_names.len());
    input_seq_names.iter().for_each(|x| { println!("\t{}", x) });

    let graph_file = PathBuf::from(prefix.to_string());
    let ggcat_inputs: Vec<ggcat_api::GeneralSequenceBlockData> = input_seq_names
        .iter()
        .map(|x| ggcat_api::GeneralSequenceBlockData::FASTA((PathBuf::from(x), None)))
        .collect();

    instance.build_graph(
        ggcat_inputs,
        graph_file,
        Some(input_seq_names),
        51 as usize,
        4 as usize,
        false,
        None,
        false, // No colors
        1 as usize,
        ggcat_api::ExtraElaboration::GreedyMatchtigs,
    );
}

fn main() {
    // Read in the inputs
    let f = std::fs::File::open("clusters_morethanone.tsv").unwrap();
    let mut reader = csv::ReaderBuilder::new()
        .delimiter(b'\t')
        .has_headers(false)
        .from_reader(f);
    let mut seqs_to_clusters: HashMap<String, Vec<String>> = HashMap::new();
    for line in reader.records() {
        let record = line.unwrap();
        let key = record[0].to_string();
        let val = record[1].to_string();
        // Group sequence paths under their cluster key
        seqs_to_clusters.entry(key).or_default().push(val);
    }

    let config = ggcat_api::GGCATConfig {
        temp_dir: Some(PathBuf::from("tmp")),
        memory: 2.0 as f64,
        prefer_memory: true,
        total_threads_count: 4 as usize,
        intermediate_compression_level: None,
        stats_file: None,
    };

    let instance = ggcat_api::GGCATInstance::create(config);

    // Build 170 graphs with > 1 genomes each
    seqs_to_clusters
        .iter()
        .for_each(|x| build_pangenome_graph(x.1, x.0, &instance));
}

Reproducing

Download the files from https://drive.google.com/file/d/11wj5h6D40zgQcncmCbNRhBT73HeFiAec/view?usp=sharing and run using cargo build --release && target/release/ggcat-tmpfiles-crash.

query jsonl output generates bad keys

The current JSONL output of query returns unquoted keys for query_index and matches, which does not match the JSON spec. This causes several JSON parsers to choke on the files. Quoting these keys resolves the issue.
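
For illustration (the values are made up), the current output resembles

{ query_index: 0, matches: { "color-0": 31 } }

whereas spec-compliant JSON requires the keys to be quoted:

{ "query_index": 0, "matches": { "color-0": 31 } }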

Build fails with `error[E0635]: unknown feature proc_macro_span_shrink`

Hi, I'm trying to build ggcat with Rust 1.72.0-nightly (0ab38e95b 2023-07-03), but the build keeps failing with an error related to unknown features. I'm guessing this has something to do with the nightly issues mentioned in #27?

$ cargo install --path crates/cmdline/ --locked
warning: some crates are on edition 2021 which defaults to `resolver = "2"`, but virtual workspaces default to `resolver = "1"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
warning: some crates are on edition 2021 which defaults to `resolver = "2"`, but virtual workspaces default to `resolver = "1"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
  Installing ggcat_cmdline v0.1.0 (/home/temaklin/software/ggcat/crates/cmdline)
    Updating crates.io index
warning: package `hermit-abi v0.3.1` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked
warning: package `pest v2.6.0` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked
warning: Patch `papi-bindings v0.5.2 (/home/temaklin/software/ggcat/libs-crates/papi-bindings-rs)` was not used in the crate graph.
Check that the patched package version and available features are compatible
with the dependency requirements. If the patch has a different version from
what is locked in the Cargo.lock file, run `cargo update` to use the new
version. This may also occur with an optional dependency that is not enabled.
   Compiling proc-macro2 v1.0.59
   Compiling getrandom v0.2.9
   Compiling num-traits v0.2.15
   Compiling crossbeam-utils v0.8.15
   Compiling hashbrown v0.12.3
   Compiling lazy_static v1.4.0
   Compiling proc-macro-error-attr v1.0.4
   Compiling lock_api v0.4.9
   Compiling memoffset v0.8.0
error[E0635]: unknown feature `proc_macro_span_shrink`
  --> /home/temaklin/.cargo/registry/src/index.crates.io-6f17d22bba15001f/proc-macro2-1.0.59/src/lib.rs:92:30
   |
92 |     feature(proc_macro_span, proc_macro_span_shrink)
   |                              ^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0635`.
error: could not compile `proc-macro2` (lib) due to previous error
warning: build failed, waiting for other jobs to finish...
error: failed to compile `ggcat_cmdline v0.1.0 (/home/temaklin/software/ggcat/crates/cmdline)`, intermediate artifacts can be found at `/home/temaklin/software/ggcat/target`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.

Is there any way to dump the whole colormap in a readable file?

Hi,

I am aware of the command ggcat dump-colors output.fasta.colors.dat that dumps the colors associated with each individual file into a JSON file, but I was wondering whether there is an option to dump not only the colors associated with each file, but also those associated with the powerset of those colors.

Say, for instance, that some k-mer is seen in files {1,2,4}, and that this subset {1,2,4} is associated with, say, color 8 in the colormap; in the final FASTA file this k-mer would have header C:8:1. But without querying this k-mer, I cannot know which subset of colors 8 corresponds to.

So basically I am interested in accessing the colormap in plain text. Is there any way to do this?
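
For concreteness, the indirection in question maps each distinct subset of input files to one color id; a toy sketch with made-up data:

use std::collections::HashMap;

fn main() {
    // Each distinct subset of input files gets one color id; many k-mers share it
    let mut colormap: HashMap<Vec<u32>, u32> = HashMap::new();
    let mut next_id = 0u32;
    for subset in [vec![1, 2, 4], vec![0], vec![1, 2, 4]] {
        let id = *colormap.entry(subset).or_insert_with(|| {
            let id = next_id;
            next_id += 1;
            id
        });
        println!("assigned color id {id}");
    }
    // Dumping colormap itself (e.g. 8 -> {1,2,4}) in plain text is what is asked for here
}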

Using ggcat-api in an external Rust project fails to build without including `dynamic-dispatch-proc-macro` in Cargo.toml

I noticed that including the ggcat API as a dependency in another Rust project fails to build with the following Cargo.toml file:

[package]
name = "ggcat-api-test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
ggcat-api = { version = "0.1.0", git = "https://github.com/algbio/ggcat" }

Adding dynamic-dispatch-proc-macro under the [patch.crates-io] section (see below for a working Cargo.toml file) fixes the issue, but this isn't documented in the API section or the example files. Is this intended?

Cargo.toml that builds successfully:

[package]
name = "ggcat-api-test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
ggcat-api = { version = "0.1.0", git = "https://github.com/algbio/ggcat" }

[patch.crates-io]
dynamic-dispatch-proc-macro = { version = "0.4.2", git = "https://github.com/Guilucand/dynamic-dispatch-rs"}

Log from failed build:
temaklin@xps13:~/software/ggcat-api-test$ cargo build
   Compiling proc-macro2 v1.0.76
   Compiling unicode-ident v1.0.12
   Compiling libc v0.2.152
   Compiling autocfg v1.1.0
   Compiling cfg-if v1.0.0
   Compiling version_check v0.9.4
   Compiling memchr v2.7.1
   Compiling serde v1.0.195
   Compiling syn v1.0.109
   Compiling once_cell v1.19.0
   Compiling libm v0.2.8
   Compiling zerocopy v0.7.32
   Compiling ahash v0.8.7
   Compiling num-traits v0.2.17
   Compiling allocator-api2 v0.2.16
   Compiling ppv-lite86 v0.2.17
   Compiling quote v1.0.35
   Compiling crossbeam-utils v0.8.19
   Compiling proc-macro-error-attr v1.0.4
   Compiling syn v2.0.48
   Compiling getrandom v0.2.12
   Compiling lazy_static v1.4.0
   Compiling rand_core v0.6.4
   Compiling cc v1.0.83
   Compiling aho-corasick v1.1.2
   Compiling hashbrown v0.14.3
   Compiling rand_chacha v0.3.1
   Compiling rand v0.8.5
   Compiling lock_api v0.4.11
   Compiling proc-macro-error v1.0.4
   Compiling regex-syntax v0.8.2
   Compiling thiserror v1.0.56
   Compiling parking_lot_core v0.9.9
   Compiling scopeguard v1.2.0
   Compiling smallvec v1.13.0
   Compiling either v1.9.0
   Compiling itoa v1.0.10
   Compiling proc-macro2 v0.4.30
   Compiling adler v1.0.2
   Compiling backtrace v0.3.69
   Compiling miniz_oxide v0.7.1
   Compiling crossbeam-epoch v0.9.18
   Compiling ryu v1.0.16
   Compiling gimli v0.28.1
   Compiling unicode-xid v0.1.0
   Compiling byteorder v1.5.0
   Compiling crossbeam-deque v0.8.5
   Compiling parking_lot v0.12.1
   Compiling object v0.32.2
   Compiling syn v0.15.44
   Compiling rustc-demangle v0.1.23
   Compiling quote v0.6.13
   Compiling lz4-sys v1.9.4
   Compiling typenum v1.17.0
   Compiling rayon-core v1.12.1
   Compiling addr2line v0.21.0
   Compiling regex-automata v0.4.3
   Compiling crossbeam-queue v0.3.11
   Compiling crossbeam-channel v0.5.11
   Compiling bytesize v1.3.0
   Compiling json v0.12.4
   Compiling serde_json v1.0.111
   Compiling crossbeam v0.8.4
   Compiling serde_derive v1.0.195
   Compiling thiserror-impl v1.0.56
   Compiling tokio-macros v2.2.0
   Compiling mt-debug-counters v0.1.3
   Compiling desse-derive v0.2.1
   Compiling nightly-quirks v0.1.4
   Compiling num_cpus v1.16.0
   Compiling filebuffer v0.4.0
   Compiling pin-project-lite v0.2.13
   Compiling crc32fast v1.3.2
   Compiling bytemuck v1.14.0
   Compiling tokio v1.35.1
   Compiling desse v0.2.1
   Compiling rayon v1.8.1
   Compiling unicode-segmentation v1.10.1
   Compiling replace_with v0.1.7
   Compiling unicode-width v0.1.11
   Compiling unchecked-index v0.2.2
   Compiling textwrap v0.11.0
   Compiling heck v0.3.3
   Compiling atty v0.2.14
   Compiling bitflags v1.3.2
   Compiling strsim v0.8.0
   Compiling ansi_term v0.12.1
   Compiling structopt-derive v0.4.18
   Compiling flate2 v1.0.28
   Compiling static_assertions v1.1.0
   Compiling bstr v1.9.0
   Compiling regex v1.10.2
   Compiling num-integer v0.1.45
   Compiling radium v0.7.0
   Compiling matrixmultiply v0.3.8
   Compiling paste v1.0.14
   Compiling equivalent v1.0.1
   Compiling tap v1.0.1
   Compiling indexmap v2.1.0
   Compiling wyz v0.5.1
   Compiling safe_arch v0.7.1
   Compiling num-complex v0.4.4
   Compiling num-rational v0.4.1
   Compiling heck v0.4.1
   Compiling semver v0.1.20
   Compiling rustversion v1.0.14
   Compiling funty v2.0.0
   Compiling rawpointer v0.2.1
   Compiling fixedbitset v0.4.2
   Compiling dynamic-dispatch-proc-macro v0.4.2
   Compiling rustc_version v0.1.7
   Compiling petgraph v0.6.4
   Compiling wide v0.7.13
   Compiling bitvec v1.0.1
   Compiling vec_map v0.8.2
   Compiling dashmap v5.5.3
   Compiling bincode v1.3.3
   Compiling clap v2.34.0
   Compiling dynamic-dispatch v0.5.3
   Compiling approx v0.5.1
   Compiling csv-core v0.1.11
   Compiling traitsequence v2.0.0
   Compiling feature-probe v0.1.1
   Compiling retain_mut v0.1.7
   Compiling bv v0.11.1
   Compiling csv v1.3.0
   Compiling roaring v0.10.2
   Compiling simba v0.6.0
   Compiling newtype_derive v0.1.6
   Compiling nalgebra-macros v0.1.0
   Compiling itertools v0.10.5
   Compiling rand_distr v0.4.3
   Compiling structopt v0.3.26
   Compiling streaming-libdeflate-rs v0.1.5
   Compiling atoi v2.0.0
   Compiling hashbrown v0.13.2
   Compiling thread_local v1.1.7
   Compiling siphasher v0.3.11
   Compiling anyhow v1.0.79
   Compiling utf8parse v0.2.1
   Compiling ref-cast v1.0.22
   Compiling instrumenter-proc-macro v0.1.1
   Compiling anstyle-parse v0.2.3
   Compiling strum_macros v0.25.3
   Compiling traitgraph v5.0.0
   Compiling instrumenter v0.1.3
   Compiling enum-map-derive v0.17.0
   Compiling ref-cast-impl v1.0.22
   Compiling derive-new v0.5.9
   Compiling itertools v0.11.0
   Compiling error-chain v0.12.4
   Compiling anstyle-query v1.0.2
   Compiling colorchoice v1.0.0
   Compiling bit-vec v0.6.3
   Compiling powerfmt v0.2.0
   Compiling anstyle v1.0.4
   Compiling time-core v0.1.2
   Compiling anstream v0.6.11
   Compiling time-macros v0.2.16
   Compiling deranged v0.3.11
   Compiling bit-set v0.5.3
   Compiling enum-map v2.7.3
   Compiling traitgraph-algo v8.1.0
   Compiling multimap v0.9.1
   Compiling bio-types v1.0.1
   Compiling ndarray v0.15.6
   Compiling nalgebra v0.29.0
   Compiling getset v0.1.2
   Compiling fxhash v0.2.1
   Compiling ordered-float v3.9.2
   Compiling itertools-num v0.1.3
   Compiling triple_accel v0.4.0
   Compiling editdistancek v1.0.2
   Compiling bitvector v0.1.5
   Compiling bytecount v0.6.7
   Compiling log v0.4.20
   Compiling custom_derive v0.1.7
   Compiling strsim v0.10.0
   Compiling num_threads v0.1.6
   Compiling clap_lex v0.6.0
   Compiling strum v0.25.0
   Compiling time v0.3.31
   Compiling bigraph v5.0.0
   Compiling clap_builder v4.4.18
   Compiling compact-genome v2.1.1
   Compiling clap_derive v4.4.7
   Compiling disjoint-sets v0.4.2
   Compiling termcolor v1.1.3
   Compiling lz4 v1.24.0
   Compiling parallel-processor v0.1.13
   Compiling simplelog v0.12.1
   Compiling memory-stats v1.1.0
   Compiling permutation v0.4.1
   Compiling atomic-counter v1.0.1
   Compiling ggcat_config v0.1.0 (https://github.com/algbio/ggcat#0ca7007e)
   Compiling ggcat_hashes v0.1.0 (https://github.com/algbio/ggcat#0ca7007e)
   Compiling ggcat_utils v0.1.0 (https://github.com/algbio/ggcat#0ca7007e)
   Compiling fs_extra v1.3.0
   Compiling fdlimit v0.3.0
   Compiling uuid v1.7.0
error: `TypeId::of` is not yet stable as a const fn
   --> /home/temaklin/.cargo/git/checkouts/ggcat-40470e373049bee8/0ca7007/crates/hashes/src/cn_nthash.rs:106:1
    |
106 | #[dynamic_dispatch]
    | ^^^^^^^^^^^^^^^^^^^
    |
    = help: add `#![feature(const_type_id)]` to the crate attributes to enable
    = note: this error originates in the attribute macro `dynamic_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info)

error: `TypeId::of` is not yet stable as a const fn
   --> /home/temaklin/.cargo/git/checkouts/ggcat-40470e373049bee8/0ca7007/crates/hashes/src/cn_nthash.rs:198:1
    |
198 | #[dynamic_dispatch]
    | ^^^^^^^^^^^^^^^^^^^
    |
    = help: add `#![feature(const_type_id)]` to the crate attributes to enable
    = note: this error originates in the attribute macro `dynamic_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info)

error: `TypeId::of` is not yet stable as a const fn
   --> /home/temaklin/.cargo/git/checkouts/ggcat-40470e373049bee8/0ca7007/crates/hashes/src/base/cn_seqhash_base.rs:105:1
    |
105 | #[dynamic_dispatch]
    | ^^^^^^^^^^^^^^^^^^^
    |
    = help: add `#![feature(const_type_id)]` to the crate attributes to enable
    = note: this error originates in the attribute macro `dynamic_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info)

error: `TypeId::of` is not yet stable as a const fn
  --> /home/temaklin/.cargo/git/checkouts/ggcat-40470e373049bee8/0ca7007/crates/hashes/src/fw_nthash.rs:92:1
   |
92 | #[dynamic_dispatch]
   | ^^^^^^^^^^^^^^^^^^^
   |
   = help: add `#![feature(const_type_id)]` to the crate attributes to enable
   = note: this error originates in the attribute macro `dynamic_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info)

error: `TypeId::of` is not yet stable as a const fn
   --> /home/temaklin/.cargo/git/checkouts/ggcat-40470e373049bee8/0ca7007/crates/hashes/src/fw_nthash.rs:188:1
    |
188 | #[dynamic_dispatch]
    | ^^^^^^^^^^^^^^^^^^^
    |
    = help: add `#![feature(const_type_id)]` to the crate attributes to enable
    = note: this error originates in the attribute macro `dynamic_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info)

error: `TypeId::of` is not yet stable as a const fn
  --> /home/temaklin/.cargo/git/checkouts/ggcat-40470e373049bee8/0ca7007/crates/hashes/src/base/fw_seqhash_base.rs:93:1
   |
93 | #[dynamic_dispatch]
   | ^^^^^^^^^^^^^^^^^^^
   |
   = help: add `#![feature(const_type_id)]` to the crate attributes to enable
   = note: this error originates in the attribute macro `dynamic_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info)

error: `TypeId::of` is not yet stable as a const fn
   --> /home/temaklin/.cargo/git/checkouts/ggcat-40470e373049bee8/0ca7007/crates/hashes/src/base/cn_rkhash_base.rs:165:1
    |
165 | #[dynamic_dispatch]
    | ^^^^^^^^^^^^^^^^^^^
    |
    = help: add `#![feature(const_type_id)]` to the crate attributes to enable
    = note: this error originates in the attribute macro `dynamic_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info)

error: `TypeId::of` is not yet stable as a const fn
   --> /home/temaklin/.cargo/git/checkouts/ggcat-40470e373049bee8/0ca7007/crates/hashes/src/base/fw_rkhash_base.rs:126:1
    |
126 | #[dynamic_dispatch]
    | ^^^^^^^^^^^^^^^^^^^
    |
    = help: add `#![feature(const_type_id)]` to the crate attributes to enable
    = note: this error originates in the attribute macro `dynamic_dispatch` (in Nightly builds, run with -Z macro-backtrace for more info)

error: could not compile `ggcat_hashes` (lib) due to 18 previous errors
warning: build failed, waiting for other jobs to finish...

installation not working

[22:18:22 ~/ggcat]$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
...
[22:18:48 ~/ggcat]$ source "$HOME/.cargo/env"
[22:18:51 ~/ggcat]$ which cargo
/home/ubuntu/.cargo/bin/cargo
[22:19:27 ~/ggcat]$ ls -la /home/ubuntu/.cargo/bin/
total 200936
drwxrwxr-x 2 ubuntu ubuntu 4096 Oct 25 22:12 .
drwxrwxr-x 3 ubuntu ubuntu 4096 Oct 25 22:13 ..
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 cargo
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 cargo-clippy
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 cargo-fmt
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 cargo-miri
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 clippy-driver
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 rls
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 rust-gdb
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 rust-gdbgui
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 rust-lldb
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 rustc
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 rustdoc
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 rustfmt
-rwxr-xr-x 13 ubuntu ubuntu 15825920 Oct 25 22:11 rustup
[22:19:31 ~/ggcat]$ cargo install --path .
error: found a virtual manifest at /home/ubuntu/ggcat/Cargo.toml instead of a package manifest

Additionally (not an issue) for the paper, consider comparing to BlastFrost instead of Bifrost for the queries:

https://github.com/nluhmann/BlastFrost

Thanks for your work on this interesting tool!

Plans to generate a GFA?

Hi,

Do you have any plans to produce GFA output instead of FASTA? It would be a natural way to represent the full graph, with adjacency information between unitigs.

Thanks

Issues with -d and -c flags (ggcat doesn't find sample in large color_mapping.in file)

Hi.

I am currently running ggcat with a color_mapping.in file of 360 samples (both single and paired-end reads). The command I am running is as follows:

ggcat build -k 31 -c -d color_mapping.in -j 20 -m 600

I get the output you can see in the attached ggcat.log file. Although ggcat adds the index with a color per sample without problems, it eventually throws an error as if a file was not found:

Panic: panicked at crates/io/src/sequences_stream/fasta.rs:15:14: Error while opening file /pasteur/appa/scratch/cduitama/RascovanProject/fastq_files/aOralNonHuman/SRR6877286_1.fastq.gz : Os { code: 2, kind: NotFound, message: "No such file or directory" }
Backtrace: 0: <unknown> 1: <unknown> 2: <unknown> 3: <unknown> 4: <unknown> 5: <unknown> 6: <unknown> 7: <unknown> 8: <unknown> 9: <unknown> 10: <unknown> 11: <unknown> 12: <unknown> 13: <unknown> 14: <unknown> 15: <unknown> 16: <unknown> 17: __libc_start_main 18: <unknown>

As an additional test, I took the sample that was reported as not found and created a smaller color_mapping.in file (2 samples) with it. Using the same parameters, ggcat runs without problems. So, for some reason, ggcat finds the sample in the small color_mapping.in file but not in the large one (360 samples).

Thanks!

Camila
ggcat.log

different runs return different number of unitigs and distinct color subsets

Hello @Guilucand and @alexandrutomescu.
Let me first congratulate you on this excellent algorithm!

We are currently using GGCAT for building Fulgor, a colored k-mer index based on SSHash (same functionalities offered by Themisto).

I noticed that, when running the same command multiple times, I get small differences in the number of unitigs/color subsets.

For example, if I run:

./fulgor build -l ../test_data/salmonella_10_filenames.txt -o ../test_data/salmonella_10 -k 31 -m 19 -d tmp_dir --verbose --check -t 16

I sometimes get 171 color subsets and 86630 unitigs, or 172 and 86632, or 173 and 86648.

I'm inclined to think this might be a multithreading issue, since I did not notice it when using only 1 thread.
Or is it just something that can happen because unitigs are not required to be maximal anyway?

(The correctness of Fulgor is not broken as long as the number of k-mers stays the same, as it seems to be.
But I just wanted you to know about this potential issue. Ideally, I'd expect to always get the same numbers across different runs.)

Thank you very much!

CC: @rob-p.

Compilation failed with error: `use of unstable library feature 'int_log'`

Hi, I am attempting to make a Docker image for GGCAT, and I get this result when building:

#0 91.42    Compiling instrumenter v0.1.1 (/software/ggcat/libs-crates/instrumenter-rs)
#0 92.07    Compiling parallel-processor v0.1.6 (/software/ggcat/libs-crates/parallel-processor-rs)
#0 92.70 error[E0658]: use of unstable library feature 'int_log'
#0 92.70  --> libs-crates/parallel-processor-rs/src/utils/mod.rs:9:36
#0 92.70   |
#0 92.70 9 |     ((size.octets as u64) * 2 - 1).ilog2() as u8
#0 92.70   |                                    ^^^^^
#0 92.70   |
#0 92.70   = note: see issue #70887 <https://github.com/rust-lang/rust/issues/70887> for more information
#0 92.70 
#0 92.72 For more information about this error, try `rustc --explain E0658`.
#0 92.73 error: could not compile `parallel-processor` due to previous error
#0 92.73 warning: build failed, waiting for other jobs to finish...
#0 99.79 error: failed to compile `ggcat_cmdline v0.1.0 (/software/ggcat/crates/cmdline)`, intermediate artifacts can be found at `/software/ggcat/target`
------
Dockerfile:34
--------------------
  32 |     RUN git clone https://github.com/algbio/ggcat --recursive && cd ggcat && git checkout a256dd5
  33 |     WORKDIR /software/ggcat/
  34 | >>> RUN cargo install --path crates/cmdline/ --locked
  35 |     ENV PATH=$PATH:$HOME/.cargo/bin
  36 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c cargo install --path crates/cmdline/ --locke

Any ideas what's going on? This image is built on top of the official Ubuntu 22.04 image with minimal other additions to the environment.

Thanks

Color set -> color class compression

Congratulations on the excellent manuscript! I have read it with great excitement and @jamshed will be presenting this paper in our lab's journal club soon. I was also very happy to see that you've chosen to implement ggcat in Rust.

I wanted to make brief mention of something that may be relevant to your color set representation. The concept of collapsing redundant color vectors into a minimal set of unique colors (i.e. color classes) and then representing them by their indices via an extra layer of indirection, is, depending on the specific data, a very powerful idea. This is essentially the same strategy adopted in rainbowfish and, later, mantis, and may be worth mention in your manuscript. Along these lines, I also wanted to convey an additional idea adopted in those tools that you may find useful to compress the color information even further.

In many types of data over which one wants to build a compacted colored dBG, not only do many k-mers have identical color sets (that is, they are equivalent under the color relation of k1 ~ k2 iff colors(k1) = colors(k2)), but the frequency distribution of the color sets themselves is highly skewed. That is, there are a small number of distinct color classes to which many k-mers map, and the distribution of frequencies of color class occurrences is often scale-free. This presents an additional opportunity to compress the color class labels themselves even further by choosing small labels (integers) for color ids that will be used frequently, and choosing larger labels (integers) for color ids that will be used infrequently. Specifically, as you build the information concerning the color sets, it is relatively easy to track the number of times a color ID has been used. Then, IDs can be re-ordered to assign smaller IDs to more frequent color sets and larger IDs to less frequent color sets. While this re-labeling can be done globally, it turns out that, in practice, the majority of the benefit comes from just assigning small IDs to the few most frequent color sets — this can be done at a relatively small cost by simply sampling the color sets for a small fraction of the k-mers. As long as the sampling is done randomly and unbiasedly, you'll expect to see all/most of the heavy hitters (probably many times) in your random sample, and you can assign IDs based on the frequency in this random sample (this is akin to the strategy we adopt to assign color IDs in mantis).
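
To make the idea concrete, here is a minimal sketch of the sampling-and-relabeling step (hypothetical names; this is not mantis's or GGCAT's actual code): count how often each color-set id occurs in a random sample of k-mer occurrences, then hand out the smallest new ids to the most frequent sets.

use std::collections::HashMap;

// sampled_ids: one color-set id per sampled k-mer occurrence.
// Returns a map old_id -> new_id assigning small ids to frequent sets.
fn relabel_by_frequency(sampled_ids: &[u64]) -> HashMap<u64, u64> {
    let mut freq: HashMap<u64, u64> = HashMap::new();
    for &id in sampled_ids {
        *freq.entry(id).or_insert(0) += 1;
    }
    let mut by_freq: Vec<(u64, u64)> = freq.into_iter().collect();
    // Most frequent first; ties broken by old id for determinism.
    by_freq.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    by_freq
        .into_iter()
        .enumerate()
        .map(|(new_id, (old_id, _))| (old_id, new_id as u64))
        .collect()
}

fn main() {
    let sample = [7, 7, 7, 3, 3, 9];
    let map = relabel_by_frequency(&sample);
    assert_eq!(map[&7], 0); // the most frequent set gets the smallest id
}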

Anyway, congrats again on the very cool paper, and thanks for helping to drive rust adoption in bioinformatics! We look forward to exploring and using your tool.

GGCAT keeps panicking about too many open files

I'm trying to run GGCAT on the following dataset: https://ftp.ebi.ac.uk/pub/databases/ENA2018-bacteria-661k/
I had 5T of free disk space and the following flags: -k 31 -m 32 -j 8 -s 1
However, my runs keep panicking with "too many open files" despite my raising ulimit -n. My first two runs, with ulimit -n set to 1024 and 4096, crashed in the following phase:

Started phase: kmers merge prev stats:
Thread panicked at location: libs-crates/parallel-processor-rs/src/memory_fs/file/internal.rs:248:26
Error message: called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
Backtrace:    0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: <unknown>
  20: <unknown>
  21: <unknown>
  22: <unknown>

Then I raised ulimit -n to 10240, and it crashed in the following phase:

Finished phase: kmers merge. phase duration: 20675.80s gtime: 30013.67s
Started phase: hashes sorting prev stats: 
Finished phase: hashes sorting. phase duration: 646.23s gtime: 30659.90s
Started phase: links compaction prev stats: 
Thread panicked at location: libs-crates/parallel-processor-rs/src/memory_fs/file/internal.rs:248:26
Error message: called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
Backtrace:    0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: <unknown>
  20: <unknown>
  21: <unknown>
  22: <unknown>
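
As an aside, for anyone driving GGCAT through its API rather than the command line: the soft descriptor limit can be raised up to the hard limit from inside the process. A minimal sketch using the libc crate (Unix-only; whether the hard limit itself is high enough is another matter):

// Raise the soft RLIMIT_NOFILE to the hard limit before heavy file I/O.
// Requires the `libc` crate as a dependency.
fn raise_nofile_limit() -> std::io::Result<()> {
    unsafe {
        let mut lim = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
        lim.rlim_cur = lim.rlim_max;
        if libc::setrlimit(libc::RLIMIT_NOFILE, &lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(())
}

fn main() {
    raise_nofile_limit().expect("could not raise RLIMIT_NOFILE");
    println!("soft limit raised to hard limit");
}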

Problem with GGCAT when using 32 threads

I was running ggcat on 72 human genomes (a subset of the human genomes used in the ggcat paper and published on Zenodo), executing it with 32 threads. After 24 hours it was still running, so I stopped it. I then changed the number of threads to 20, and it finished in 13 minutes. I tried to monitor both runs: in the first case (32 threads) the phase eta was always 18446744073709551615 s (which is u64::MAX, suggesting an overflow in the ETA computation), while for the 16-thread case it was 1290 s. I am not sure if this is the source of the problem?

Anyway, what could be the cause of such a problem? (It is supposed to be faster when using more threads.)

Thank you in advance.

GGCAT installation fails when compiling ggcat-api

Hi,

I wanted to try out ggcat; during installation I get the following error:

Compiling ggcat-api v0.1.0 (/home/Prog/fulgor/external/ggcat/crates/api)
error: could not write output to /home/Prog/fulgor/external/ggcat/target/release/deps/ggcat_cpp_bindings-8701aaa82dc571a8.ggcat_assembler_minimizer_bucketing-8434959e806de058.ggcat_assembler_minimizer_bucketing.4e5595e557a91e17-cgu.04.rcgu.o.rcgu.o: File name too long

error: could not compile `ggcat-cpp-bindings` (lib) due to 1 previous error
make[3]: *** [Makefile:15: lib/libggcat_cpp_bindings.a] Error 101
make[2]: *** [CMakeFiles/ggcat_cpp_api.dir/build.make:70: CMakeFiles/ggcat_cpp_api] Error 2
make[1]: *** [CMakeFiles/Makefile2:86: CMakeFiles/ggcat_cpp_api.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

This error appeared during the installation of Fulgor, but I get the same error when trying to install ggcat alone:

Compiling ggcat_cmdline v0.1.0 (/home/Prog/ggcat/crates/cmdline)
error: could not write output to /home/Prog/ggcat/target/release/deps/ggcat-b6d2d6c2a91228c2.ggcat_assembler_minimizer_bucketing-e11f3962450e6c80.ggcat_assembler_minimizer_bucketing.e9bbab0fcd303136-cgu.02.rcgu.o.rcgu.o: File name too long

error: could not compile `ggcat_cmdline` (bin "ggcat") due to previous error
error: failed to compile `ggcat_cmdline v0.1.0 (/home/Prog/ggcat/crates/cmdline)`, intermediate artifacts can be found at `/home/Prog/ggcat/target`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.

@Malfoy and @kamimrcht have had the same issue.
We suspect that this error only occurs on machines with encrypted home directories: eCryptfs, for example, caps file names at 143 characters, which is well below the length of the object-file names above. Pointing CARGO_TARGET_DIR at a directory on an unencrypted filesystem may work around it.

access datasets of origin

Hi,

I would need an option to recover, for each unitig constructed by GGCAT, the original datasets it comes from. The output could be heavy, but a first idea would be to write this information in the header instead of the colors.

Best.
