needle's Introduction

SeqAn - The Library for Sequence Analysis

NOTE
SeqAn3 is out and hosted in a different repository
We recommend using SeqAn3 for new applications.

What Is SeqAn?

SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data. Our library applies a unique generic design that guarantees high performance, generality, extensibility, and integration with other libraries. SeqAn is easy to use and simplifies the development of new software tools with a minimal loss of performance.

License

The SeqAn library itself, the tests and demos are licensed under the very permissive 3-clause BSD License. The licenses for the applications themselves can be found in the LICENSE files.

Prerequisites

Older compiler versions might work but are neither supported nor tested.

Linux, macOS, FreeBSD

  • GCC ≥ 11
  • Clang/LLVM ≥ 15
  • Intel oneAPI C++ Compiler 2024.0.2 (IntelLLVM)

Windows

  • Visual C++ ≥ 17.0 / Visual Studio ≥ 2022

Architecture support

  • Intel/AMD platforms, including optimisations for modern instruction sets (POPCNT, SSE4, AVX2, AVX512)
  • All Debian release architectures supported, including most ARM and all PowerPC platforms.

Build system

  • To build tests, demos, and official SeqAn applications you also need CMake ≥ 3.12.

Some official applications might have additional requirements or only work on a subset of platforms.

needle's People

Contributors

eseiler, irallia, joergi-w, marehr, mitradarja, mr-c, rrahn, smehringer

needle's Issues

Replace std::unordered_set

Use robin_hood::unordered_set instead of std::unordered_set

robin_hood::unordered_node_map instead of robin_hood::unordered_map?

Recalculate normalized expression value after preprocessing

It might make sense to use a different normalization method after preprocessing. This should be possible for all methods based on a genome mask, where new sequences are given. It remains impossible to calculate anything on the sequences of the experiment, because these are not given.

Calculate FPR for every file on every level

Instead of having one FPR for all files on a level, estimate the FPR for every file, store it, and then use it in the estimations. This matters when the amount of content differs a lot between files, because a single FPR for all files is then not correct.
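A per-file estimate could be sketched as follows. This is a minimal sketch, not needle's implementation: it assumes the standard Bloom filter approximation FPR ≈ (1 − e^(−h·n/m))^h for n inserted elements, m bits per bin, and h hash functions, and the names `estimate_fpr` and `estimate_fpr_per_file` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Approximate false positive rate of a single Bloom filter bin:
// (1 - exp(-h * n / m))^h for n elements, m bits, and h hash functions.
inline double estimate_fpr(std::size_t elements, std::size_t bin_bits, std::size_t hash_functions)
{
    assert(bin_bits > 0 && hash_functions > 0);
    double const h = static_cast<double>(hash_functions);
    double const load = h * static_cast<double>(elements) / static_cast<double>(bin_bits);
    return std::pow(1.0 - std::exp(-load), h);
}

// Hypothetical helper: one estimate per file instead of one per level.
// Assumes that all files on a level share the same bin size.
inline std::vector<double> estimate_fpr_per_file(std::vector<std::size_t> const & elements_per_file,
                                                 std::size_t bin_bits,
                                                 std::size_t hash_functions)
{
    std::vector<double> fprs;
    fprs.reserve(elements_per_file.size());
    for (std::size_t const elements : elements_per_file)
        fprs.push_back(estimate_fpr(elements, bin_bits, hash_functions));
    return fprs;
}
```

The returned vector could be stored next to the IBF, so that an estimation can look up the FPR of the file it is correcting for.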

Use biggest file to estimate size of ibf?

In order to guarantee an FPR for every file, I could change from using the mean to using the maximum... This would increase the size, but compression should be even smaller.
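The difference between the two choices can be sketched with the usual Bloom filter sizing formula, m = ⌈−h·n / ln(1 − p^(1/h))⌉ for a target FPR p. This is a sketch under that assumption only; the function names are hypothetical and do not reflect how needle sizes its IBFs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Bits needed per bin so that `elements` entries stay at or below `target_fpr`.
inline std::size_t bin_bits_for(std::size_t elements, double target_fpr, std::size_t hash_functions)
{
    assert(target_fpr > 0.0 && target_fpr < 1.0 && hash_functions > 0);
    double const h = static_cast<double>(hash_functions);
    double const denominator = std::log(1.0 - std::pow(target_fpr, 1.0 / h));
    return static_cast<std::size_t>(std::ceil(-h * static_cast<double>(elements) / denominator));
}

// Sizing by the mean: files above the mean exceed the target FPR.
inline std::size_t bin_bits_from_mean(std::vector<std::size_t> const & counts, double target_fpr, std::size_t hash_functions)
{
    assert(!counts.empty());
    std::size_t const total = std::accumulate(counts.begin(), counts.end(), std::size_t{0});
    return bin_bits_for(total / counts.size(), target_fpr, hash_functions);
}

// Sizing by the maximum: every file stays at or below the target FPR.
inline std::size_t bin_bits_from_max(std::vector<std::size_t> const & counts, double target_fpr, std::size_t hash_functions)
{
    assert(!counts.empty());
    return bin_bits_for(*std::max_element(counts.begin(), counts.end()), target_fpr, hash_functions);
}
```

With made-up counts of 1000, 2000, and 9000 elements and a target FPR of 0.05, sizing by the mean leaves the largest file at roughly twice the target, while sizing by the maximum keeps all three files within it at the cost of larger bins.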

Make more use of header files or use cereal

Currently, the header file information is not used at all when using ibf with minimiser files. So, a user would have to type in cutoffs, k, w, shape again. This does not make sense...
On the other hand, the expression level information should not be used; there is no reason why a user would want to keep those. They should only be used for statistics.

Maybe the information about k, w, shape can be stored in a better way, so that no extra file is necessary? Like in the name? experiment_k_w_shape_cutoff.minimiser could work. Then the header would only be used for the statistic function.
Or store them in a data structure.
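The file name idea could look roughly like this. It is only a sketch of the proposal: the struct, the function names, and the assumption that k, w, shape, and cutoff are all written as unsigned decimal integers are hypothetical, not part of needle.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical container for the parameters encoded in the file name.
struct minimiser_file_info
{
    std::string experiment;
    std::uint64_t k{};
    std::uint64_t w{};
    std::uint64_t shape{}; // assumed to be stored as a decimal number
    std::uint64_t cutoff{};
};

// Parses an unsigned integer; fails on empty input or trailing characters.
inline bool parse_number(std::string_view text, std::uint64_t & value)
{
    auto const [end, error] = std::from_chars(text.data(), text.data() + text.size(), value);
    return error == std::errc{} && end == text.data() + text.size();
}

// Parses "experiment_k_w_shape_cutoff.minimiser"; returns std::nullopt on any mismatch.
inline std::optional<minimiser_file_info> parse_minimiser_filename(std::string_view filename)
{
    constexpr std::string_view extension{".minimiser"};
    if (filename.size() <= extension.size() ||
        filename.substr(filename.size() - extension.size()) != extension)
        return std::nullopt;
    filename.remove_suffix(extension.size());

    minimiser_file_info info{};
    // Fields are read from right to left, so the experiment name may contain '_'.
    std::uint64_t * const fields[] = {&info.cutoff, &info.shape, &info.w, &info.k};
    for (std::uint64_t * const field : fields)
    {
        std::size_t const pos = filename.rfind('_');
        if (pos == std::string_view::npos || !parse_number(filename.substr(pos + 1), *field))
            return std::nullopt;
        filename.remove_suffix(filename.size() - pos);
    }

    if (filename.empty())
        return std::nullopt;
    info.experiment = std::string{filename};
    return info;
}
```

Reading the fields from the right keeps experiment names such as `sample_1` working, which a left-to-right split on `_` would break.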

[BUG] Multithread results sometimes in error

Sometimes the API test test_needle fails for the multithread option.

[ RUN      ] ibfmin.no_given_expression_levels_multiple_threads
/home/mitradarvish/Dokumente/develop/needle/test/api/test_needle.cpp:337: Failure
Expected equality of these values:
  expected_result2
    Which is: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
  res2
    Which is: [1,0,1,1,1,1,1,1,1,1,1,0,1,1,1,1]
[  FAILED  ] ibfmin.no_given_expression_levels_multiple_threads (1 ms)

Needle 2.0.0

  • Add HIBF #100
  • Add Merge option #38
  • Add update option? #38
  • Make Preprocessing better with multiple threads
  • Make count competitive with kallisto & Co.?
  • Add argument verbose #32

[Feature] Add Update possibilities

Add a possibility to add a new experiment and a possibility to add an IBF with a new expression level. For adding a new expression level, the IBFs "cornering" the new expression value need to be recalculated.
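One way to find the affected IBFs is a binary search over the existing levels. The sketch below assumes that "cornering" means the existing levels directly below and directly above the new value, that levels are kept sorted in ascending order, and that they fit into `std::uint16_t`; all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Indices of the existing levels that enclose a new expression level.
// An empty optional means there is no neighbour on that side.
struct neighbouring_levels
{
    std::optional<std::size_t> below;
    std::optional<std::size_t> above;
};

inline neighbouring_levels find_cornering_levels(std::vector<std::uint16_t> const & sorted_levels,
                                                 std::uint16_t new_level)
{
    assert(std::is_sorted(sorted_levels.begin(), sorted_levels.end()));
    // First existing level that is not smaller than the new one.
    auto const it = std::lower_bound(sorted_levels.begin(), sorted_levels.end(), new_level);
    std::size_t const index = static_cast<std::size_t>(it - sorted_levels.begin());

    neighbouring_levels result{};
    if (it != sorted_levels.begin())
        result.below = index - 1;
    if (it != sorted_levels.end())
        result.above = index; // equals new_level if that level already exists
    return result;
}
```

The IBFs at the returned indices would be the candidates for recalculation once the new level is inserted between them.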

"needle" name conflict

I put together a basic package of Needle for Debian and I discovered that there is another program with the same name.

/usr/bin/needle is already provided by EMBOSS
https://packages.debian.org/bullseye/emboss a.k.a https://bio.tools/needle-ebi or https://bio.tools/needle

This causes a problem with Debian policy:

Two different packages must not install programs with different functionality but with the same filenames. […] If this case happens, one of the programs must be renamed.

https://www.debian.org/doc/debian-policy/ch-files.html#binaries

And I would rather that the Debian package of Needle has the same program name as other packaging systems. Otherwise that causes problems for users' scripts and workflows.

Perhaps this Needle could be renamed? Sorry to ask it, but it would be best to not have the confusion.

Add preprocessing step

Add an optional preprocessing step that counts all minimizers and their occurrences for an experiment and saves them in a binary file; the IBF should then be built from these binary files.
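The step could be sketched as below. This is a simplified illustration, not needle's code: it assumes forward-strand, ungapped k-mers with a 2-bit encoding, picks the smallest k-mer hash in each window of w bases, and writes raw (hash, count) pairs in host byte order. All names and the file layout are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

using minimiser_counts = std::unordered_map<std::uint64_t, std::uint32_t>;

// 2-bit encodes every k-mer of the sequence (A=0, C=1, G=2, T=3, anything else 0).
inline std::vector<std::uint64_t> kmer_hashes(std::string const & sequence, std::size_t k)
{
    assert(k > 0 && k <= 32);
    std::uint64_t const mask = (k == 32) ? ~std::uint64_t{0} : ((std::uint64_t{1} << (2 * k)) - 1);
    std::vector<std::uint64_t> hashes;
    std::uint64_t hash = 0;
    for (std::size_t i = 0; i < sequence.size(); ++i)
    {
        std::uint64_t code = 0;
        if (sequence[i] == 'C') code = 1;
        else if (sequence[i] == 'G') code = 2;
        else if (sequence[i] == 'T') code = 3;
        hash = ((hash << 2) | code) & mask;
        if (i + 1 >= k)
            hashes.push_back(hash);
    }
    return hashes;
}

// Counts each minimiser once per position at which it is selected.
inline void count_minimisers(std::string const & sequence, std::size_t k, std::size_t w, minimiser_counts & counts)
{
    assert(w >= k);
    std::vector<std::uint64_t> const hashes = kmer_hashes(sequence, k);
    std::size_t const kmers_per_window = w - k + 1;
    std::size_t previous = hashes.size(); // sentinel: no minimiser selected yet
    for (std::size_t start = 0; start + kmers_per_window <= hashes.size(); ++start)
    {
        auto const first = hashes.begin() + static_cast<std::ptrdiff_t>(start);
        auto const smallest = std::min_element(first, first + static_cast<std::ptrdiff_t>(kmers_per_window));
        std::size_t const position = static_cast<std::size_t>(smallest - hashes.begin());
        if (position != previous)
        {
            ++counts[*smallest];
            previous = position;
        }
    }
}

// Writes (hash, count) pairs as raw bytes.
inline void write_counts(minimiser_counts const & counts, std::ostream & out)
{
    for (auto const & [hash, count] : counts)
    {
        out.write(reinterpret_cast<char const *>(&hash), sizeof(hash));
        out.write(reinterpret_cast<char const *>(&count), sizeof(count));
    }
}

// Reads the pairs back until the stream is exhausted.
inline minimiser_counts read_counts(std::istream & in)
{
    minimiser_counts counts;
    std::uint64_t hash{};
    std::uint32_t count{};
    while (in.read(reinterpret_cast<char *>(&hash), sizeof(hash)) &&
           in.read(reinterpret_cast<char *>(&count), sizeof(count)))
        counts[hash] = count;
    return counts;
}
```

For a real file, `write_counts` and `read_counts` would be given a `std::ofstream` or `std::ifstream` opened with `std::ios::binary`; a `std::stringstream` works for in-memory experiments. Building the IBF would then only need `read_counts` and a cutoff on the stored counts, without touching the sequence files again.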

Improve documentation

  • Add link to apptemplate in Readme
  • Explain how doc can be built
  • Check documentation for errors or missing paragraphs
  • Add more information, so that built documentation is easier to use. Maybe add a small tutorial?
