opencb / hpg-libs
C libs for HPG project
License: GNU General Public License v2.0
According to VCF 4.1 and 4.2 specifications:
https://github.com/samtools/hts-specs/blob/master/VCFv4.1.tex#L162
https://github.com/samtools/hts-specs/blob/master/VCFv4.2.tex#L179
It seems that record IDs can contain semicolons, although the description is not clear:
ID - identifier: Semi-colon separated list of unique identifiers where available. If this is a dbSNP variant it is encouraged to use the rs number(s). No identifier should be present in more than one data record. If there is no identifier available, then the missing value should be used. (String, no white-space or semi-colons permitted)
The text in parentheses seems incorrect. Although ';' appears to be allowed by the specification as a separator in the ID field, our parser does not accept it, according to:
https://github.com/opencb/hpg-libs/blob/develop/cpp/src/bioformats/vcf/vcf_v41.ragel#L364
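One reading of the specification is that ';' separates identifiers within the ID column, while each individual identifier may contain neither whitespace nor ';'. A minimal sketch of that reading follows. The function name `split_vcf_ids` is hypothetical and is not part of hpg-libs or of the Ragel grammar linked above.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not hpg-libs API): splits a VCF ID column on ';'.
// Assumption: ';' is a separator, and each identifier must be non-empty
// and free of whitespace. The missing value "." yields an empty list.
// Returns false when the field is malformed.
bool split_vcf_ids(const std::string& field, std::vector<std::string>& ids) {
    ids.clear();
    if (field == ".") {
        return true;  // missing value: no identifiers
    }
    std::string current;
    for (char c : field) {
        if (c == ';') {
            if (current.empty()) return false;  // empty identifier, e.g. "a;;b"
            ids.push_back(current);
            current.clear();
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            return false;  // whitespace is not permitted in an identifier
        } else {
            current.push_back(c);
        }
    }
    if (current.empty()) return false;  // empty field or trailing ';'
    ids.push_back(current);
    return true;
}
```

Under this reading, a field such as `rs123;rs456` would be accepted and split into two identifiers, whereas `rs1;;rs2` would be rejected.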
Old Ragel VCF parser must be adapted to C++
The project must be restructured to accommodate both C and C++ code and third-party libraries. At root level, 'c', 'cpp' and 'third_party' folders must be present.
As a related task, scons must be configured properly.
I have found out that all our dependencies are now maintained on GitHub. To be as up-to-date as possible, it could be a good idea to manage them as git submodules. If we point each submodule to the tag of its latest release, we can also guarantee stability.
From the point of view of the applications that depend on hpg-libs, it shouldn't be a problem, because the git submodule command has a --recursive option that handles nested submodules.
The links to the projects are:
https://github.com/samtools/samtools, https://github.com/samtools/htslib (tagged releases)
https://github.com/akheron/jansson (tagged releases)
https://github.com/mackyle/sqlite/ (unofficial, tagged releases)
https://github.com/argtable
https://github.com/hyperrealm/libconfig
Thoughts? :)
The VCF format specification allows different types of variants and alleles to be combined in a single line. This makes the analysis and interpretation of the data harder.
A set of transformations must be applied to these variants, either when the lines are parsed or afterwards.
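One such transformation could be to split a multi-allelic line into one record per alternate allele and then trim the bases that the reference and alternate share. The sketch below is only an illustration of that idea: `SimpleVariant` and `decompose` are hypothetical names, symbolic alleles such as `<DEL>` are not handled, and no left-alignment against the reference genome is attempted.

```cpp
#include <string>
#include <vector>

// Hypothetical record type used only for this illustration.
struct SimpleVariant {
    long position;          // 1-based position of the first reference base
    std::string reference;  // reference allele
    std::string alternate;  // a single alternate allele
};

// Splits one multi-allelic record into one record per alternate allele,
// then trims shared trailing and leading bases, always keeping at least
// one base in each allele.
std::vector<SimpleVariant> decompose(long position,
                                     const std::string& reference,
                                     const std::vector<std::string>& alternates) {
    std::vector<SimpleVariant> result;
    for (const std::string& alt_allele : alternates) {
        SimpleVariant v{position, reference, alt_allele};
        // Trim the common suffix first; the position is unaffected.
        while (v.reference.size() > 1 && v.alternate.size() > 1 &&
               v.reference.back() == v.alternate.back()) {
            v.reference.pop_back();
            v.alternate.pop_back();
        }
        // Trim the common prefix; each removed base shifts the position.
        while (v.reference.size() > 1 && v.alternate.size() > 1 &&
               v.reference.front() == v.alternate.front()) {
            v.reference.erase(0, 1);
            v.alternate.erase(0, 1);
            ++v.position;
        }
        result.push_back(v);
    }
    return result;
}
```

For example, a line at position 100 with reference `TCA` and alternates `TCG,T` would become an SNV `A>G` at position 102 and a deletion `TCA>T` at position 100, so each output record carries a single variant type.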
On 2WGM change SI 60, 61, 62, 63 to CVI.
Aggregated VCF files are currently not accepted by the VCF parser.
Using Boost.Iostreams, a set of variant readers must be implemented for:
Aggregated data must also be supported.
The VCF filter accepts only one variant type as an argument. Since the library supports SNVs, INDELs and SVs, it should accept two types at the same time, so that only the remaining type is removed from an input file.
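One way to accept several types at once is to pass the filter a bit mask instead of a single value. The following is a minimal sketch under that assumption. `VariantType`, `TypedRecord` and `filter_by_types` are hypothetical names and do not reflect the existing filter interface in hpg-libs.

```cpp
#include <string>
#include <vector>

// Hypothetical type flags; each type occupies its own bit so that
// several types can be combined with '|'.
enum VariantType : unsigned {
    SNV   = 1u << 0,
    INDEL = 1u << 1,
    SV    = 1u << 2
};

// Hypothetical record carrying an already-determined variant type.
struct TypedRecord {
    std::string id;
    VariantType type;
};

// Keeps only the records whose type is present in the 'accepted' mask.
std::vector<TypedRecord> filter_by_types(const std::vector<TypedRecord>& records,
                                         unsigned accepted) {
    std::vector<TypedRecord> kept;
    for (const TypedRecord& record : records) {
        if (record.type & accepted) {
            kept.push_back(record);
        }
    }
    return kept;
}
```

Calling `filter_by_types(records, SNV | INDEL)` would then drop only the SVs, which matches the use case described above.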
It looks like bgzip is not fully compatible with gzip, so the parsing fails after a while (probably after the first compressed block). Support for this file format must be implemented.