hpg-libs's People

Contributors

cyenyxe, gemalm3, imedina, j-coll, jmmut, jtarraga, martineh, pescobar, raulmorenogaldon

hpg-libs's Issues

Record IDs may contain semi-colons

According to the VCF 4.1 and 4.2 specifications:

https://github.com/samtools/hts-specs/blob/master/VCFv4.1.tex#L162
https://github.com/samtools/hts-specs/blob/master/VCFv4.2.tex#L179

It seems that record IDs can contain semi-colons, although the description is not clear:
ID - identifier: Semi-colon separated list of unique identifiers where available. If this is a dbSNP variant it is encouraged to use the rs number(s). No identifier should be present in more than one data record. If there is no identifier available, then the missing value should be used. (String, no white-space or semi-colons permitted)

The text in parentheses does not seem correct, since it contradicts the "semi-colon separated list" wording. The specification appears to allow ';' in the ID field, but our parser does not, according to:

https://github.com/opencb/hpg-libs/blob/develop/cpp/src/bioformats/vcf/vcf_v41.ragel#L364
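One possible reading of the specification is that ';' is a separator between identifiers, and that each individual identifier must contain neither whitespace nor ';'. A minimal C++17 sketch of that reading follows. The function name `split_vcf_ids` is hypothetical and is not part of hpg-libs or of the Ragel grammar linked above.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not hpg-libs API): splits a VCF ID column into
// individual identifiers, treating ';' as the list separator.
// Returns std::nullopt if any identifier is empty or contains whitespace.
// The missing value "." yields an empty list.
std::optional<std::vector<std::string>> split_vcf_ids(const std::string& field) {
    std::vector<std::string> ids;
    if (field == ".") {
        return ids;  // missing value: no identifiers
    }
    std::string current;
    for (char c : field) {
        if (c == ';') {
            if (current.empty()) return std::nullopt;  // empty identifier, e.g. "a;;b"
            ids.push_back(current);
            current.clear();
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            return std::nullopt;  // whitespace is not permitted
        } else {
            current.push_back(c);
        }
    }
    if (current.empty()) return std::nullopt;  // empty field or trailing ';'
    ids.push_back(current);
    assert(!ids.empty());  // a non-missing, valid field has at least one identifier
    return ids;
}
```

Under this interpretation "rs123;rs456" is accepted as two identifiers, while "rs1; rs2" and "rs1;" are rejected.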

Project refactoring

The project must be restructured to accommodate C code, C++ code and third-party libraries. The root level must contain 'c', 'cpp' and 'third_party' folders.
As a related task, SCons must be configured properly.

Manage dependencies using submodules

I have found out that all our dependencies are now maintained on GitHub. To stay as up to date as possible, it could be a good idea to manage them as git submodules. If we point each submodule to the tag of its latest release, we can also guarantee stability.
For applications that depend on hpg-libs this shouldn't be a problem, because the git submodule command has a --recursive option that handles nested submodules.

The links to the projects are:
https://github.com/samtools/samtools, https://github.com/samtools/htslib (tagged releases)
https://github.com/akheron/jansson (tagged releases)
https://github.com/mackyle/sqlite/ (unofficial, tagged releases)
https://github.com/argtable
https://github.com/hyperrealm/libconfig

Thoughts? :)

Implement transformation rules for VCF Variants in C++

The VCF format specification allows different types of variants and alleles to be combined in a single line. This makes the data hard to analyse and interpret.
A set of transformations must be applied to these variants, either when the lines are parsed or afterwards.
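The issue does not list the transformation rules. One plausible rule, assumed here purely for illustration, is to decompose a record with several alternate alleles into one record per allele. The `VariantRecord` struct and `split_multiallelic` function below are hypothetical and much simpler than a real VCF record: INFO, FORMAT and genotype columns would also need to be rewritten.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified record: only the columns needed for the sketch.
struct VariantRecord {
    std::string chrom;
    long pos;
    std::string ref;
    std::vector<std::string> alts;  // one entry per alternate allele
};

// Assumed rule: emit one record per alternate allele, keeping CHROM, POS
// and REF unchanged. Per-allele INFO and genotype fields are not handled.
std::vector<VariantRecord> split_multiallelic(const VariantRecord& rec) {
    std::vector<VariantRecord> out;
    out.reserve(rec.alts.size());
    for (const std::string& alt : rec.alts) {
        out.push_back(VariantRecord{rec.chrom, rec.pos, rec.ref, {alt}});
    }
    assert(out.size() == rec.alts.size());  // exactly one output per allele
    return out;
}
```

With this sketch, a line such as REF=A, ALT=C,AT would become two records, one with ALT=C and one with ALT=AT, each of which can then be classified independently.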

Support multiple variant types in a single filter

The VCF filter accepts only one variant type as an argument. Since the library supports SNVs, INDELs and SVs, it should accept two types at the same time, so that records of the remaining type can be removed from an input file.
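One way to accept several types at once is to encode them as bit flags and test membership with a bitwise AND. The sketch below is not the existing hpg-libs filter. The names `VariantType`, `classify` and `passes_type_filter` are hypothetical, and the classification is deliberately simplistic (symbolic alleles are treated as SVs, single-base substitutions as SNVs, everything else as INDELs).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical bit flags so that several types can be combined with '|'.
enum VariantType : std::uint8_t { SNV = 1, INDEL = 2, SV = 4 };

// Simplistic classification, assumed for illustration only.
VariantType classify(const std::string& ref, const std::string& alt) {
    assert(!ref.empty());  // REF is a required column
    if (!alt.empty() && alt.front() == '<') return SV;  // symbolic allele such as <DEL>
    if (ref.size() == 1 && alt.size() == 1) return SNV;
    return INDEL;
}

// Returns true if the variant's type is one of the accepted types.
bool passes_type_filter(const std::string& ref, const std::string& alt,
                        std::uint8_t accepted) {
    return (accepted & classify(ref, alt)) != 0;
}
```

Calling `passes_type_filter(ref, alt, SNV | INDEL)` keeps SNVs and INDELs and drops SVs, which matches the use case described in the issue.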

Support for bgzip'd files

It looks like bgzip output is not fully compatible with gzip, so parsing fails after a while (probably after the first compressed block). Support for this file format must be implemented.
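For background, a bgzip (BGZF) file is a series of concatenated gzip members, each carrying a 'BC' extra subfield whose 16-bit value is the total block size minus one. A reader that stops after the first gzip member would see only the first block, which may explain the failure described above. The sketch below only locates block boundaries and does no decompression. The function names are hypothetical and not part of hpg-libs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the total size in bytes of the BGZF block starting at `offset`,
// or 0 if the bytes there do not look like a BGZF block header.
std::size_t bgzf_block_size(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    // Fixed gzip header (10 bytes) followed by XLEN (2 bytes).
    if (buf.size() < offset + 12) return 0;
    const std::uint8_t* p = buf.data() + offset;
    // ID1=0x1f, ID2=0x8b, CM=8 (deflate), FLG with the FEXTRA bit set.
    if (p[0] != 0x1f || p[1] != 0x8b || p[2] != 8 || (p[3] & 4) == 0) return 0;
    std::size_t xlen = p[10] | (p[11] << 8);
    if (buf.size() < offset + 12 + xlen) return 0;
    // Scan the extra subfields for SI1='B', SI2='C' with SLEN == 2.
    std::size_t i = 12;
    while (i + 4 <= 12 + xlen) {
        std::size_t slen = p[i + 2] | (p[i + 3] << 8);
        if (p[i] == 'B' && p[i + 1] == 'C' && slen == 2 && i + 6 <= 12 + xlen) {
            std::size_t bsize = p[i + 4] | (p[i + 5] << 8);
            return bsize + 1;  // BSIZE stores the total block size minus one
        }
        i += 4 + slen;
    }
    return 0;
}

// Counts consecutive well-formed BGZF blocks from the start of the buffer.
std::size_t count_bgzf_blocks(const std::vector<std::uint8_t>& buf) {
    std::size_t offset = 0;
    std::size_t blocks = 0;
    while (offset < buf.size()) {
        std::size_t size = bgzf_block_size(buf, offset);
        if (size == 0 || offset + size > buf.size()) break;
        assert(size >= 18);  // a BGZF header alone occupies at least 18 bytes
        offset += size;
        ++blocks;
    }
    return blocks;
}
```

A real implementation would inflate each block in turn (for example with zlib, or by delegating to htslib) instead of only counting them; the point of the sketch is that the input must be processed block by block until the end of the file.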
