tobiasrausch / alfred

BAM Statistics, Feature Counting and Annotation

Home Page: https://www.gear-genomics.com/alfred/

License: BSD 3-Clause "New" or "Revised" License

insert-size coverage-distribution sequencing quality read-counts quality-control alfred alignment-metrics feature-counting

alfred's Introduction

Alfred: BAM alignment statistics, feature counting and feature annotation

Alfred is available as a Bioconda package, as a pre-compiled statically linked binary from Alfred's GitHub release page, as a Singularity container (SIF file), or as a minimal Docker container. Please see Alfred's documentation for any installation or usage questions.

Source Code

Web Application

Documentation

Citation

Tobias Rausch, Markus Hsi-Yang Fritz, Jan O Korbel, Vladimir Benes.
Alfred: interactive multi-sample BAM alignment statistics, feature counting and feature annotation for long- and short-read sequencing. Bioinformatics. 2019 Jul 15;35(14):2489-2491. https://doi.org/10.1093/bioinformatics/bty1007

License

Alfred is distributed under the BSD 3-Clause license. Consult the accompanying LICENSE file for more details.

Credits

HTSlib is heavily used for all alignment and VCF/BCF processing. Boost is used for various data structures and algorithms, and Edlib for pairwise alignments using edit distance.

alfred's People

Contributors

dependabot[bot], mhyfritz, tobiasrausch

alfred's Issues

Repeated value for Y-axis tick labels for some plots

Dear all,

I have used alfred to produce QC plots from ONT sequencing and I noticed that some plots (e.g. Base Content Distribution, Read Length Distribution) have only zeros as Y-axis tick labels. I thought this could be an issue with, for instance, my local R installation, which generates the plots. However, I have noticed that the plots that come with the installation in the example folder (`stats.tsv.gz.pdf`) also have only zeros or ones for some plots (e.g. Base Content Distribution and InDel Homopolymer Context).

On-target coverage BED file

Hi there, I'm trying to use Alfred to look at on-target statistics, but I can't figure out the correct BED format expected by the tool. At the very minimum I have my BED file of targets with the three required fields (chrom, chromStart, chromEnd). Is any additional information required?
Alfred still runs and generates output, but the on-target statistics are not populated (i.e., the following columns in the tsv are empty, which leads me to believe there's a problem with parsing the BED file).

#TotalBedBp #AlignedBasesInBed FractionInBed EnrichmentOverBed
0 0 0 -nan
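
One thing worth ruling out here (an assumption, not a confirmed diagnosis) is a mismatch between the BED file and the BAM header, for example `1` versus `chr1`, or space-separated instead of tab-separated columns. The sketch below is a plain sanity check written in Python; `check_bed` and its arguments are hypothetical helpers and are not part of Alfred.

```python
def check_bed(bed_lines, bam_chroms):
    """Return a list of human-readable problems found in BED lines.

    bed_lines:  iterable of text lines read from the BED file.
    bam_chroms: set of reference names taken from the BAM header (@SQ SN).
    """
    problems = []
    for number, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\n")
        # Skip empty lines and the usual BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append(f"line {number}: fewer than 3 tab-separated fields")
            continue
        chrom, start, end = fields[:3]
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {number}: start/end are not integers")
        elif int(start) >= int(end):
            problems.append(f"line {number}: start is not smaller than end")
        if chrom not in bam_chroms:
            problems.append(f"line {number}: chromosome {chrom!r} not in BAM header")
    return problems


if __name__ == "__main__":
    example = ["chr1\t100\t200\n", "1\t300\t400\n", "chr2 500 600\n"]
    for problem in check_bed(example, {"chr1", "chr2"}):
        print(problem)
```

The reference names for `bam_chroms` can be taken from the `@SQ` lines printed by `samtools view -H`.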

Error in Rscript

Hi,

Thank you for bringing this excellent tool.

I was trying to plot the qc.tsv.gz file but it keeps giving errors. This is the output I obtained:
WARNING: ignoring environment value of R_HOME
[1] "Base Content"
Error: Must request at least one colour from a hue palette.
Execution halted

unrecognised option '-f' with docker image

Hi,

I am trying to create JSON output according to the usage guide, using the Docker image with this command:
docker run -v /tmp/myDir:/tmp trausch/alfred alfred qc -f json -r /tmp/genome.fa -o /tmp/qc.json.gz /tmp/sorted.bam

It works fine without the -f parameter.

Additionally, I was trying to run the R-Script from the docker container and couldn't get this working either. How would I have to call this?

Thank you,
Best,
Nadine

input reference file is missing

Hi, I am trying to run this via docker on a mac El Capitan OS (couldn't build from source or find a mac version on Conda)

docker run trausch/alfred alfred qc -r /Volumes/Toshiba4Tb/GenomeReferences/hg38.fa -j qc.json.gz /Volumes/Toshiba4Tb/single_cell_data/67_10.picoplex.qs.oct18/bam/D38-7-S2.67_10.qiascout.picoplex.samblasted.sorted.bam

Input reference file is missing: /Volumes/Toshiba4Tb/GenomeReferences/hg38.fa

I have tried different directories with the same reference, and get the same result

Is there an obvious thing I am missing?

thanks
christos
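
A possible explanation (an assumption based on how Docker works, not on anything Alfred-specific) is that host paths such as `/Volumes/...` are not visible inside the container unless they are bind-mounted with `-v`, as the `docker run -v /tmp/myDir:/tmp ...` command in another issue on this page does. The sketch below only assembles such a command; `docker_alfred_qc` and the container-side paths `/ref`, `/bam` and `/out` are hypothetical names chosen for illustration, while `-r` and `-j` are the options from the command above.

```python
import os
import shlex


def docker_alfred_qc(reference, bam, out_json, image="trausch/alfred"):
    """Build a `docker run` command that bind-mounts the input directories."""
    ref_dir, ref_name = os.path.split(os.path.abspath(reference))
    bam_dir, bam_name = os.path.split(os.path.abspath(bam))
    out_dir, out_name = os.path.split(os.path.abspath(out_json))
    return [
        "docker", "run",
        "-v", f"{ref_dir}:/ref:ro",   # reference FASTA (and its .fai), read-only
        "-v", f"{bam_dir}:/bam:ro",   # BAM (and its index), read-only
        "-v", f"{out_dir}:/out",      # writable output directory
        image, "alfred", "qc",
        "-r", f"/ref/{ref_name}",
        "-j", f"/out/{out_name}",
        f"/bam/{bam_name}",
    ]


if __name__ == "__main__":
    cmd = docker_alfred_qc("/data/ref/hg38.fa", "/data/bam/sample.bam", "/data/out/qc.json.gz")
    print(shlex.join(cmd))
```

Mounting whole directories rather than single files keeps index files that sit next to the inputs visible inside the container.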

Replication timing help

I have six Repli-Seq datasets (G1, S1, S2, S3, S4, G2). I want to obtain replication timing values like those from Repli-chip.

  1. I ran alfred replication and found that the values in the profile.tsv file are all integers. It seems that no standardization has been applied.
  2. I found many -1 values in the reptime.tsv file. Can I remove them?
  3. How do I convert the values in reptime.tsv to positive and negative values like those in a Repli-chip file?

Discordant value from Alfred's #SoftClippedBases and CIGAR string's S

Hi,

I am using alfred version 0.2.1 to qc my RNA-seq data using the following command (alfred qc -r <hg38.fa> -b <target_gene> -o test.alfred.out.gz test.bam)
My test.bam file contains only one read pair for test purposes as described below. In the bam file, CIGAR string of read1 of readpair_11405287 is 10S65M and that of read2 is 32M1216N43M.
I expected #SoftClippedBases to be 10, but alfred reported 1. Does "#SoftClippedBases" represent the number of reads containing soft-clipped bases of any size, rather than the number of soft-clipped bases?

test.bam
readpair_11405287 99 chr17 36486687 255 **10S65M** = 36493975 8579 GAAACAGTCTCTCCCACAAAACCATGGCGTCGCTCAAATGTAGCACCGTCGTCTGCGTGATCTGCTTGGAGAAGC F~~a~~~~~~~~~~G~~~~~G~^~~~y~~~~~o~~~~t~~~~w~~~\~~~~~~~~~~~~~~~~~~~C~H~~~~~~ NH:i:1 HI:i:1 AS:i:137 nM:i:1 NM:i:1
readpair_11405287 147 chr17 36493975 255 **32M1216N43M** = 36486687 -8579 CAGAGTTTCTTTGCAGAATTTAAAGAATTTAGGGGAATCTGCAACATTAAGAAGCTTATTGCTCAATCCACACCT ~~~~R~~g~~~r~~~~~~~~s~~e~~e~~V~~~~Z~}~~~~~~~~~~~~}~~~~}~~~~}~~~~~~a~S~~~~i~ NH:i:1 HI:i:1 AS:i:137 nM:i:1 NM:i:0

alfred qc output
Sample test Library DefaultLib #QCFail 0 QCFailFraction 0 #DuplicateMarked 0 DuplicateFraction 0 #Unmapped 0 UnmappedFraction 0 #Mapped 2 MappedFraction 1 #MappedRead1 1 #MappedRead2 1 RatioMapped2vsMapped1 1 #MappedForward 1 MappedForwardFraction 0.5 #MappedReverse 1 MappedReverseFraction 0.5 #SecondaryAlignments 0 SecondaryAlignmentFraction 0 #SupplementaryAlignments 0 SupplementaryAlignmentFraction 0 #SplicedAlignments 1 SplicedAlignmentFraction 0.5 #Pairs 1 #MappedPairs 1 MappedPairsFraction 1 #MappedSameChr 1 MappedSameChrFraction 1 #MappedProperPair 1 MappedProperFraction 1 #ReferenceBp 83257441 #ReferenceNs 337237 #AlignedBases 140 #MatchedBases 139 MatchRate 0.992857 #MismatchedBases 1 MismatchRate 0.00714286 #DeletionsCigarD 0 DeletionRate 0 HomopolymerContextDel 0 #InsertionsCigarI 0 InsertionRate 0 HomopolymerContextIns 0 **#SoftClippedBases 1** SoftClipRate 0.00714286 #HardClippedBases 0 HardClipRate 0 ErrorRate 0.0142857 MedianReadLength 0:0 DefaultLibraryLayout 2 MedianInsertSize 0 MedianCoverage 0 SDCoverage 0.918795 CoveredBp 140 FractionCovered 1.68837e-06 BpCov1ToCovNRatio 1 BpCov1ToCov2Ratio inf MedianMAPQ 255 #TotalBedBp 7959 #AlignedBasesInBed 140 FractionInBed 1 EnrichmentOverBed 10418.4
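
The two CIGAR strings above can be tallied by hand or with a few lines of Python. This is a minimal sketch of the counting the reporter expected; it does not show how Alfred computes the value, and the function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Sum the lengths of all S (soft clip) operations in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op == "S")


def aligned_bases(cigar):
    """Sum the lengths of all M, = and X operations in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in "M=X")


cigars = ["10S65M", "32M1216N43M"]
clipped_bases = sum(soft_clipped_bases(c) for c in cigars)
clipped_reads = sum(1 for c in cigars if soft_clipped_bases(c) > 0)
total_aligned = sum(aligned_bases(c) for c in cigars)

print(clipped_bases)  # 10
print(clipped_reads)  # 1
print(total_aligned)  # 140
```

The aligned-base total matches the reported `#AlignedBases 140`, while the reported `#SoftClippedBases 1` matches the number of clipped reads rather than the number of clipped bases.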

install error

I installed alfred using the command "make all" and encountered a compilation error like the one below:

lboost_system -lboost_program_options -lboost_date_time -L/Analysis/biosoft/02.RNA_Soft/alfred/src/htslib/ -L/Analysis/biosoft/02.RNA_Soft/alfred/src/htslib//lib -lpthread -lhts -lz -llzma -lbz2 -Wl,-rpath,/Analysis/biosoft/02.RNA_Soft/alfred/src/htslib/
In file included from src/alfred.cpp:50:0:
src/count_rna.h:27:44: fatal error: boost/icl/split_interval_map.hpp: No such file or directory
#include <boost/icl/split_interval_map.hpp>
^
compilation terminated.
make: *** [src/alfred] Error 1

Is it possible that the installation package I downloaded is incomplete, or is it a problem with my system? Thanks!

MedianMAPQ

Hi,
thank you for your useful work!

I am confused about MedianMAPQ in the output.
Is it the median MAPQ of all records of the reads, or only of the primary records?

I have several bam files of ONT and all of them have MedianMAPQ=60.

SDCoverage

Hi,
Could you please provide a link to, or details on, the output values of alfred qc,
especially "SDCoverage"?
Jeffin

Fail to open index

Hello, I have been trying to generate a BAM quality control file, but it fails with the error below. I am assuming the -r option expects the reference genome FASTA file, which I downloaded from UCSC. The BAM file was generated from a bowtie alignment.

saddams-MBP:alfred saddamhusain$ ls
S4_L_.fastq	S4_L_bwt.bam	hg38.1.ebwt	hg38.2.ebwt	hg38.3.ebwt	hg38.4.ebwt	hg38.fa		hg38.fa.fai	hg38.rev.1.ebwt	hg38.rev.2.ebwt
saddams-MBP:alfred saddamhusain$ alfred qc -r hg38.fa -o qc.tsv.gz S4_L_bwt.bam
Fail to open index for S4_L_bwt.bam
saddams-MBP:alfred saddamhusain$
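
The directory listing above shows `hg38.fa.fai` but no index file for the BAM, which would be consistent with the error message. A small pre-flight check can be sketched in Python; `find_bam_index` is a hypothetical helper, and the file names it looks for are the usual BAM index naming conventions (`.bam.bai`, `.bai`, `.bam.csi`), assumed here rather than taken from Alfred's source.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the path of an index file next to the BAM, or None if absent."""
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # sample.bam.bai
        bam.with_suffix(".bai"),           # sample.bai
        bam.with_name(bam.name + ".csi"),  # sample.bam.csi
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    index = find_bam_index("S4_L_bwt.bam")
    if index is None:
        print("No BAM index found; the BAM may need to be coordinate-sorted and indexed.")
    else:
        print(f"Found index: {index}")
```

An index is typically created with `samtools sort` followed by `samtools index` on the sorted file.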

Insert size histogram

Hello,

I am having some difficulty interpreting the insert size histogram and couldn't find documentation that explains this plot in depth. What does "R+ pairs" mean and could you clarify this plot in general?
Also, what should this histogram look like for a good library preparation, so that it works with Delly? (I ran this tool because I ran into the warning "non-default paired-end layout").

Thank you very much!

feature request: multithreaded option for `alfred qc`

Hi Tobias

Thanks for the excellent tool!

I have a very brave request: we've included Alfred in our pipelines, where it has proved invaluable. Recently we've noticed that alfred qc takes a great deal of time for high-coverage WGS BAMs. Based on what I've gathered from the source code:

https://github.com/tobiasrausch/alfred/blob/master/src/qc.h

there is no multithreaded option for alfred qc.

How difficult would it be to implement this?

Thanks, Evan

Update BioConda recipe for using Python 3.7+

Dear @tobiasrausch ,

many thanks for developing your package. I just wanted to note that the current Conda package only supports python below 3.7. This unnecessarily leaves out environments created with the two most recent Python versions. It is not a huge problem (thanks to the statically linked binaries) but it would be nice to update the recipe.

Kind regards

make: *** [.boost] Error 1

Hi,
I got this error when installing:

make all
cd src/modular-boost && ./bootstrap.sh --prefix=/scratch/genomic_med/apps/alfred/src/modular-boost --without-icu --with-libraries=iostreams,filesystem,system,program_options,date_time && ./b2 && ./b2 headers && cd ../../ && touch .boost
./bootstrap.sh: line 196: ./tools/build/src/engine/build.sh: No such file or directory
Building Boost.Build engine with toolset ... 
Failed to build Boost.Build build engine
Consult 'bootstrap.log' for more details
make: *** [.boost] Error 1

Thanks!
Tommy

How ErrorRate is calculated?

Hi,

Thanks for creating such a useful tool!

I have referred to the FAQ, but am still not sure how ErrorRate is calculated. Is ErrorRate equal to the total number of mismatches divided by the total number of aligned bases?

Regards,
Frank

can't install alfred

Hi, I'm interested in using your tool but I can't install it.

Here's my environment:

rpm -qf /etc/redhat-release
centos-release-6-10.el6.centos.12.3.x86_64
uname -s -r
Linux 2.6.32-642.15.1.el6.x86_64

when trying to run precompiled executable I get:

FATAL: kernel too old
[1]    15238 abort (core dumped)  ./alfred_v0.1.17_linux_x86_64bit

I can install from bioconda no problem. when I try to run I get:

alfred
alfred: symbol lookup error: alfred: undefined symbol: _ZN5boost15program_options3argE
(miniconda3-latest)

Tried compiling from source with gcc/5.4.0, intelmpi/5.1.3, boost/1.6.3, and cmake/3.9.0 but get a lot of errors - mostly having to do with boost (see attached error log).

Any ideas as to what's going wrong?

error.log

cheers,
-shane

Something wrong with on-target rate calculation?

Hi, I was trying to use alfred on WES. The on-target rate seems much lower than the result I get from other tools. Also, in the InsertSize plot, what does "Insert size > 27701 (8.61%)" mean? Can the insert size be that large?

Chromosome mapping with `--bed` option

Great comprehensive tool! I was just wondering whether certain statistics, such as ObsExpRatio in Chromosome Mapping, take into consideration the lengths of the regions in the BED file provided with the option --bed. In other words, is the expected part of the ratio based on the full length of the genome, or just on the regions in the BED file?

Incomplete v0.2.1 release sources

Hello,

The sources (.tar.gz) of the v0.2.1 release won't compile:

averdier@bioinfo:/tmp/alfred/alfred-0.2.1$ make all
if [ -r src/htslib/Makefile ]; then cd src/htslib && autoheader && autoconf && ./configure --disable-s3 --disable-gcs --disable-libcurl --disable-plugins && make && make lib-static && cd ../../ && touch .htslib; fi
g++ -std=c++11 -isystem /tmp/alfred/alfred-0.2.1/src/jlib/ -isystem /tmp/alfred/alfred-0.2.1/src/htslib/ -pedantic -W -Wall -O3 -fno-tree-vectorize -DNDEBUG src/alfred.cpp -o src/alfred -L/tmp/alfred/alfred-0.2.1/src/htslib/ -L/tmp/alfred/alfred-0.2.1/src/htslib//lib -lboost_iostreams -lboost_filesystem -lboost_system -lboost_program_options -lboost_date_time  -lhts -lz -llzma -lbz2 -Wl,-rpath,/tmp/alfred/alfred-0.2.1/src/htslib/
In file included from src/alfred.cpp:28:0:
src/bamstats.h: In function ‘int32_t bamstats::bamStatsRun(TConfig&)’:
src/bamstats.h:415:119: error: no matching function for call to ‘min(hts_pos_t&, int&)’
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);
                                                                                                                       ^
In file included from /usr/include/c++/7/bits/char_traits.h:39:0,
                 from /usr/include/c++/7/ios:40,
                 from /usr/include/c++/7/ostream:38,
                 from /usr/include/c++/7/iostream:39,
                 from src/alfred.cpp:3:
/usr/include/c++/7/bits/stl_algobase.h:195:5: note: candidate: template<class _Tp> const _Tp& std::min(const _Tp&, const _Tp&)
     min(const _Tp& __a, const _Tp& __b)
     ^~~
/usr/include/c++/7/bits/stl_algobase.h:195:5: note:   template argument deduction/substitution failed:
In file included from src/alfred.cpp:28:0:
src/bamstats.h:415:119: note:   deduced conflicting types for parameter ‘const _Tp’ (‘long int’ and ‘int’)
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);

In file included from /usr/include/c++/7/bits/char_traits.h:39:0,
                 from /usr/include/c++/7/ios:40,
                 from /usr/include/c++/7/ostream:38,
                 from /usr/include/c++/7/iostream:39,
                 from src/alfred.cpp:3:
/usr/include/c++/7/bits/stl_algobase.h:243:5: note: candidate: template<class _Tp, class _Compare> const _Tp& std::min(const _Tp&, const _Tp&, _Compare)
     min(const _Tp& __a, const _Tp& __b, _Compare __comp)
     ^~~
/usr/include/c++/7/bits/stl_algobase.h:243:5: note:   template argument deduction/substitution failed:
In file included from src/alfred.cpp:28:0:
src/bamstats.h:415:119: note:   deduced conflicting types for parameter ‘const _Tp’ (‘long int’ and ‘int’)
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);
                                                                                                                       ^
In file included from /usr/include/c++/7/algorithm:62:0,
                 from /usr/include/boost/any.hpp:17,
                 from /usr/include/boost/program_options/value_semantic.hpp:12,
                 from /usr/include/boost/program_options/options_description.hpp:13,
                 from src/alfred.cpp:9:
/usr/include/c++/7/bits/stl_algo.h:3450:5: note: candidate: template<class _Tp> _Tp std::min(std::initializer_list<_Tp>)
     min(initializer_list<_Tp> __l)
     ^~~
/usr/include/c++/7/bits/stl_algo.h:3450:5: note:   template argument deduction/substitution failed:
In file included from src/alfred.cpp:28:0:
src/bamstats.h:415:119: note:   mismatched types ‘std::initializer_list<_Tp>’ and ‘long int’
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);
                                                                             
In file included from /usr/include/c++/7/algorithm:62:0,
                 from /usr/include/boost/any.hpp:17,
                 from /usr/include/boost/program_options/value_semantic.hpp:12,
                 from /usr/include/boost/program_options/options_description.hpp:13,
                 from src/alfred.cpp:9:
/usr/include/c++/7/bits/stl_algo.h:3456:5: note: candidate: template<class _Tp, class _Compare> _Tp std::min(std::initializer_list<_Tp>, _Compare)
     min(initializer_list<_Tp> __l, _Compare __comp)
     ^~~
/usr/include/c++/7/bits/stl_algo.h:3456:5: note:   template argument deduction/substitution failed:
In file included from src/alfred.cpp:28:0:
src/bamstats.h:415:119: note:   mismatched types ‘std::initializer_list<_Tp>’ and ‘long int’
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);
                                                                                                                       ^
Makefile:52: recipe for target 'src/alfred' failed
make: *** [src/alfred] Error 1
averdier@bioinfo:/tmp/alfred/alfred-0.2.1$ make all 2>&1 > compile_fail
In file included from src/alfred.cpp:28:0:
src/bamstats.h: In function ‘int32_t bamstats::bamStatsRun(TConfig&)’:
src/bamstats.h:415:119: error: no matching function for call to ‘min(hts_pos_t&, int&)’
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);

In file included from /usr/include/c++/7/bits/char_traits.h:39:0,
                 from /usr/include/c++/7/ios:40,
                 from /usr/include/c++/7/ostream:38,
                 from /usr/include/c++/7/iostream:39,
                 from src/alfred.cpp:3:
/usr/include/c++/7/bits/stl_algobase.h:195:5: note: candidate: template<class _Tp> const _Tp& std::min(const _Tp&, const _Tp&)
     min(const _Tp& __a, const _Tp& __b)
     ^~~
/usr/include/c++/7/bits/stl_algobase.h:195:5: note:   template argument deduction/substitution failed:
In file included from src/alfred.cpp:28:0:
src/bamstats.h:415:119: note:   deduced conflicting types for parameter ‘const _Tp’ (‘long int’ and ‘int’)
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);
                                                                                                                       ^
In file included from /usr/include/c++/7/bits/char_traits.h:39:0,
                 from /usr/include/c++/7/ios:40,
                 from /usr/include/c++/7/ostream:38,
                 from /usr/include/c++/7/iostream:39,
                 from src/alfred.cpp:3:
/usr/include/c++/7/bits/stl_algobase.h:243:5: note: candidate: template<class _Tp, class _Compare> const _Tp& std::min(const _Tp&, const _Tp&, _Compare)
     min(const _Tp& __a, const _Tp& __b, _Compare __comp)
     ^~~
/usr/include/c++/7/bits/stl_algobase.h:243:5: note:   template argument deduction/substitution failed:
In file included from src/alfred.cpp:28:0:
src/bamstats.h:415:119: note:   deduced conflicting types for parameter ‘const _Tp’ (‘long int’ and ‘int’)
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);

In file included from /usr/include/c++/7/algorithm:62:0,
                 from /usr/include/boost/any.hpp:17,
                 from /usr/include/boost/program_options/value_semantic.hpp:12,
                 from /usr/include/boost/program_options/options_description.hpp:13,
                 from src/alfred.cpp:9:
/usr/include/c++/7/bits/stl_algo.h:3450:5: note: candidate: template<class _Tp> _Tp std::min(std::initializer_list<_Tp>)
     min(initializer_list<_Tp> __l)
     ^~~
/usr/include/c++/7/bits/stl_algo.h:3450:5: note:   template argument deduction/substitution failed:
In file included from src/alfred.cpp:28:0:
src/bamstats.h:415:119: note:   mismatched types ‘std::initializer_list<_Tp>’ and ‘long int’
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);
                                                                                                                       ^
In file included from /usr/include/c++/7/algorithm:62:0,
                 from /usr/include/boost/any.hpp:17,
                 from /usr/include/boost/program_options/value_semantic.hpp:12,
                 from /usr/include/boost/program_options/options_description.hpp:13,
                 from src/alfred.cpp:9:
/usr/include/c++/7/bits/stl_algo.h:3456:5: note: candidate: template<class _Tp, class _Compare> _Tp std::min(std::initializer_list<_Tp>, _Compare)
     min(initializer_list<_Tp> __l, _Compare __comp)

     ^~~
/usr/include/c++/7/bits/stl_algo.h:3456:5: note:   template argument deduction/substitution failed:
In file included from src/alfred.cpp:28:0:
src/bamstats.h:415:119: note:   mismatched types ‘std::initializer_list<_Tp>’ and ‘long int’
    itRg->second.rc.brange[refIndex][psId].first = std::min(rec->core.pos, itRg->second.rc.brange[refIndex][psId].first);
                                                                                                                       ^
make: *** [src/alfred] Error 1

However, cloning the repo and then switching to the v0.2.1 tag before compiling works:

averdier@bioinfo:/tmp/alfred$ git clone --recursive https://github.com/tobiasrausch/alfred.git
Cloning into 'alfred'...
remote: Enumerating objects: 240, done.
remote: Counting objects: 100% (240/240), done.
remote: Compressing objects: 100% (180/180), done.
remote: Total 2866 (delta 113), reused 134 (delta 49), pack-reused 2626
Receiving objects: 100% (2866/2866), 20.88 MiB | 2.85 MiB/s, done.
Resolving deltas: 100% (1768/1768), done.
Submodule 'src/htslib' (https://github.com/samtools/htslib.git) registered for path 'src/htslib'
Cloning into '/tmp/alfred/alfred/src/htslib'...
remote: Enumerating objects: 75, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 13238 (delta 35), reused 35 (delta 21), pack-reused 13163
Receiving objects: 100% (13238/13238), 9.89 MiB | 2.08 MiB/s, done.
Resolving deltas: 100% (9508/9508), done.
Submodule path 'src/htslib': checked out '1832d3a1b75133e55fb6abffc3f50f8a6ed5ceae'

averdier@bioinfo:/tmp/alfred$ cd alfred

averdier@bioinfo:/tmp/alfred/alfred$ git checkout tags/v0.2.1 -b v0.2.1-branch
Switched to a new branch 'v0.2.1-branch'

averdier@bioinfo:/tmp/alfred/alfred$ git log --oneline --decorate -n 5
2812b18 (HEAD -> v0.2.1-branch, tag: v0.2.1) autoconf
4cebee6 autoconf
d0d7208 v0.2.1
7798f6f xz removed
2c8ff68 xz

averdier@bioinfo:/tmp/alfred/alfred$ make all
if [ -r src/htslib/Makefile ]; then cd src/htslib && autoheader && autoconf && ./configure --disable-s3 --disable-gcs --disable-libcurl --disable-plugins && make && make lib-static && cd ../../ && touch .htslib; fi
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
[...]
make[1]: Leaving directory '/tmp/alfred/alfred/src/htslib'
make[1]: Entering directory '/tmp/alfred/alfred/src/htslib'
make[1]: Nothing to be done for 'lib-static'.
make[1]: Leaving directory '/tmp/alfred/alfred/src/htslib'
g++ -std=c++11 -isystem /tmp/alfred/alfred/src/jlib/ -isystem /tmp/alfred/alfred/src/htslib/ -pedantic -W -Wall -O3 -fno-tree-vectorize -DNDEBUG src/alfred.cpp -o src/alfred -L/tmp/alfred/alfred/src/htslib/ -L/tmp/alfred/alfred/src/htslib//lib -lboost_iostreams -lboost_filesystem -lboost_system -lboost_program_options -lboost_date_time  -lhts -lz -llzma -lbz2 -Wl,-rpath,/tmp/alfred/alfred/src/htslib/

As the error suggests, the release tarball is missing htslib (alfred is the git clone, alfred-0.2.1 is the source tarball):

averdier@bioinfo:/tmp/alfred$ diff -qr alfred alfred-0.2.1
Only in alfred: .git
Only in alfred/src/htslib: .appveyor.yml
Only in alfred/src/htslib: autom4te.cache
Only in alfred/src/htslib: bcf_sr_sort.c
Only in alfred/src/htslib: bcf_sr_sort.h
Only in alfred/src/htslib: bgzf.c
Only in alfred/src/htslib: bgzip.1
Only in alfred/src/htslib: bgzip.c
Only in alfred/src/htslib: config.h
Only in alfred/src/htslib: config.h.in
Only in alfred/src/htslib: config.log
Only in alfred/src/htslib: config.mk
Only in alfred/src/htslib: config.mk.in
Only in alfred/src/htslib: config.status
Only in alfred/src/htslib: configure
Only in alfred/src/htslib: configure.ac
Only in alfred/src/htslib: cram
Only in alfred/src/htslib: errmod.c
Only in alfred/src/htslib: faidx.5
Only in alfred/src/htslib: faidx.c
Only in alfred/src/htslib: .git
Only in alfred/src/htslib: .gitattributes
Only in alfred/src/htslib: .gitignore
Only in alfred/src/htslib: hfile.c
Only in alfred/src/htslib: hfile_gcs.c
Only in alfred/src/htslib: hfile_internal.h
Only in alfred/src/htslib: hfile_libcurl.c
Only in alfred/src/htslib: hfile_net.c
Only in alfred/src/htslib: hfile_s3.c
Only in alfred/src/htslib: hts.c
Only in alfred/src/htslib: htsfile.1
Only in alfred/src/htslib: htsfile.c
Only in alfred/src/htslib: hts_internal.h
Only in alfred/src/htslib: htslib
Only in alfred/src/htslib: htslib.mk
Only in alfred/src/htslib: htslib.pc.in
Only in alfred/src/htslib: htslib.pc.tmp
Only in alfred/src/htslib: htslib_vars.mk
Only in alfred/src/htslib: hts_os.c
Only in alfred/src/htslib: INSTALL
Only in alfred/src/htslib: kfunc.c
Only in alfred/src/htslib: knetfile.c
Only in alfred/src/htslib: kstring.c
Only in alfred/src/htslib: LICENSE
Only in alfred/src/htslib: m4
Only in alfred/src/htslib: Makefile
Only in alfred/src/htslib: md5.c
Only in alfred/src/htslib: multipart.c
Only in alfred/src/htslib: NEWS
Only in alfred/src/htslib: os
Only in alfred/src/htslib: plugin.c
Only in alfred/src/htslib: probaln.c
Only in alfred/src/htslib: README
Only in alfred/src/htslib: README.md
Only in alfred/src/htslib: realn.c
Only in alfred/src/htslib: regidx.c
Only in alfred/src/htslib: sam.5
Only in alfred/src/htslib: sam.c
Only in alfred/src/htslib: synced_bcf_reader.c
Only in alfred/src/htslib: tabix.1
Only in alfred/src/htslib: tabix.c
Only in alfred/src/htslib: tbx.c
Only in alfred/src/htslib: test
Only in alfred/src/htslib: textutils.c
Only in alfred/src/htslib: textutils_internal.h
Only in alfred/src/htslib: thread_pool.c
Only in alfred/src/htslib: thread_pool_internal.h
Only in alfred/src/htslib: .travis.yml
Only in alfred/src/htslib: vcf.5
Only in alfred/src/htslib: vcf.c
Only in alfred/src/htslib: vcf_sweep.c
Only in alfred/src/htslib: vcfutils.c
Only in alfred/src/htslib: version.sh
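
The Makefile rule shown in the log builds the bundled HTSlib only when `src/htslib/Makefile` is readable. With an empty `src/htslib` directory that step is skipped, so the compile presumably falls back to whatever HTSlib headers the system provides (an inference from the log, not something confirmed in this thread). The same guard can be reproduced before running `make`; the helper name `htslib_present` is hypothetical.

```python
from pathlib import Path


def htslib_present(source_root):
    """Mirror the Makefile guard `[ -r src/htslib/Makefile ]` for a source tree."""
    return (Path(source_root) / "src" / "htslib" / "Makefile").is_file()


if __name__ == "__main__":
    for tree in ("alfred", "alfred-0.2.1"):
        state = "bundled htslib found" if htslib_present(tree) else "src/htslib is not populated"
        print(f"{tree}: {state}")
```

For a git checkout, `git clone --recursive`, as used above, populates the submodule directory.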

Insert size discordant with samtools

For the same aligned FASTQ pair from WGS, I get a mean insert size of 362 from samtools stats. From alfred qc I get a median insert size of 65535, and in fact all but a handful of read pairs are assigned this insert size according to the output from zgrep ^IS.

I can't really figure out what's going on. Any clues?

[Question] Calculation of indel rate

Hi Tobias,

In the output of alfred, in the .metrics.tsv file, there is an insertion rate and a deletion rate.

I'm wondering what exactly these rates mean and how they are calculated. Is it a rate per base? Is it calculated over the whole sequence, taking read coverage into account?

Thanks in advance for your answers !

Cheers,

Roxane

readgroup stats

Hi, I am looking for a lightweight tool to get stats from BAM files with multiple read groups.
I thought you implemented this feature in bamStat.

But when i run bamStat, it complains about:

Only one sample (@RG:SM) is allowed per input BAM file RNA2013.07080910.IL2-1ax.bam

Is this feature still in progress?
Thanks,
michel
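
The message refers to the `SM` field of the `@RG` header lines, so the BAM header presumably lists more than one sample name. Which samples a header declares can be checked with a short Python sketch; `samples_in_header` is a hypothetical helper that works on header text such as the output of `samtools view -H`.

```python
def samples_in_header(header_text):
    """Collect the distinct SM values from the @RG lines of a SAM header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


if __name__ == "__main__":
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@RG\tID:lane1\tSM:sampleA\tLB:lib1\n"
        "@RG\tID:lane2\tSM:sampleB\tLB:lib1\n"
    )
    print(sorted(samples_in_header(header)))  # ['sampleA', 'sampleB']
```

Several read groups that share one `SM` value yield a single sample name in this check.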

ase and split not working

Hi,

I am using a multi-sample VCF file phased with Eagle2, and I have also tried the 1KG phased VCF and a toy VCF. The BAM file passed the QC.

Both VCF and BAM are from RNA-seq. split did not report any error, but no read was split.

Thanks
A

Hybrid-selection metrics from alfred qc

Hi there

I was curious about how Alfred handles hybrid-selection metrics. The motivation is a discussion we had about the differences between Alfred and Picard/GATK4's "CollectHSMetrics" for WES samples:

https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.5.1/picard_analysis_directed_CollectHsMetrics.php

java -jar picard.jar CollectHsMetrics \
      I=input_reads.bam \
      O=output_hs_metrics.txt \
      R=reference.fasta \
      BAIT_INTERVALS=bait.interval_list \
      TARGET_INTERVALS=target.interval_list

This requires both target and bait intervals.

It's not clear to what extent Alfred provides HS metrics, or whether the metrics output by e.g. CollectHSMetrics would differ that much from Alfred.

Somewhat related to this issue: #17

Any insight on your part is appreciated. Thank you for the help

installation error

Hello, I have installed Alfred from bioconda but when I run the following command I am seeing this error. Could you please tell me how can I resolve it? Thank you in advance.

I am on OSX Catalina and have tried brew update and brew upgrade, as this worked for others with a similar issue.


saddams-MBP:~ saddamhusain$ alfred
dyld: Library not loaded: @rpath/libboost_iostreams.dylib
  Referenced from: /Users/saddamhusain/opt/anaconda2/pkgs/alfred-0.2.1-h098d73b_1/bin/alfred
  Reason: image not found
Abort trap: 6

Alfred consensus: using -p

Hi Tobias,

Thanks for Alfred and thanks for making it public! I'm trying to use Alfred to compute a consensus from comparatively short ONT (duplex) reads against short references (200nt in one, 400nt in another case), where samtools consensus misses some indels. I would like to compute the consensus for the entire length of the alignment. How am I supposed to use -p in this setting? I noticed that if I set this to 1 or reflen, I often don't get a consensus, because there are no overlapping reads/bases at the extreme ends. If I use something like -p sq:200 (for the 400nt reference) I do get an excellent consensus for seemingly the entire length, which is what I want. Does Alfred in this case only look at reads that overlap with pos 200? Should I use the setting differently?

Many thanks,
Andreas

paired-end view

I use alfred to extract reads aligned to a particular region from a BAM file. My BAM file is derived from Illumina paired-end read data. Currently I can extract all reads mapping to a particular location, which is more or less what one would see in the IGV individual read view. However, IGV has the option to view reads as pairs, and I would really like to be able to extract the reads from a BAM file as such. Would it be possible to provide that option in alfred? It would greatly help me and maybe some others as well. Thanks in advance.

Alignment metrics

I am using Alfred for evaluating Oxford Nanopore reads, and it serves my purpose very well, thanks! I have a question concerning the alignment metrics in the file qc.tsv.gz. Could you explain some of the stats? I am particularly interested in how you calculate the ErrorRate.
Best!

consensus command

Hi,
Is the consensus command available in the last version?

[]$ ./alfred_v0.1.12_linux_x86_64bit consensus
Unrecognized command consensus

Thanks!

A question of the example shell of the Haplotype-specific

Hi Tobias
I have a question about the example shell script in the software folder.
[image: WeCom screenshot_16418016297093]
When you convert the vcf file to the bcf format, you use the parameters (-m2 -M2 -c 1 -C 1) to filter some variant sites.
It seems that these parameters would retain the variant sites of type 0/1. Why do you use these parameters?
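
For readers following this question, here is a minimal Python sketch of what such a filter selects. It assumes the options are `bcftools view` options, where `-m`/`-M` bound the number of alleles (REF plus ALT) and `-c`/`-C` bound the non-reference allele count. The function `keep_site` and its arguments are hypothetical, and counting alleles from genotypes is an assumption of this sketch, not a statement about the example script.

```python
def keep_site(alt_alleles, genotypes,
              min_alleles=2, max_alleles=2, min_ac=1, max_ac=1):
    """Return True if a site passes the allele-number and allele-count bounds.

    alt_alleles: list of ALT allele strings at the site.
    genotypes:   list of per-sample tuples of allele indices
                 (0 = REF, 1.. = ALT, None = missing).
    """
    # Number of alleles at the site: REF plus every ALT allele.
    n_alleles = 1 + len(alt_alleles)
    if not (min_alleles <= n_alleles <= max_alleles):
        return False
    # Non-reference allele count, summed over all called alleles.
    ac = sum(1 for gt in genotypes for a in gt if a is not None and a > 0)
    return min_ac <= ac <= max_ac


# Single diploid sample (hypothetical genotypes).
print(keep_site(["T"], [(0, 1)]))       # heterozygous, biallelic
print(keep_site(["T"], [(1, 1)]))       # homozygous alternative
print(keep_site(["T", "G"], [(0, 1)]))  # multiallelic site
```

Under these assumptions, with one diploid sample only biallelic sites with exactly one non-reference allele pass, which corresponds to the 0/1 genotypes mentioned in the question.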

consensus command

Hello,

I'm sorry, but I'm not sure I understand what the consensus command does. When I tried it, it took one genome position and returned a sequence whose length differs depending on the genome location.
Does it take all the reads aligned at one position and then generate a consensus sequence from that group of reads?

I'm a virologist and I'm looking for a new tool (different from bcftools) that generates the consensus sequence of a complete genome (in my case fewer than one hundred thousand nucleotides). Can alfred help me with that?
Thanks a lot.

Alexandre

Softclip rate appears to be incorrect

Softclip rate seems to count the number of reads that have an S in the cigar string. However, error rate is computed by adding mismatch rate and softclip rate. This is only valid if softclip rate was measured in bases and not in reads, given that many bases can be clipped in a read. The result is a dramatic underestimate of error rate when significant clipping has occurred, e.g. in BWA.
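
The difference between the two ways of counting can be illustrated with a small Python sketch. It is not Alfred's implementation: the function `softclip_rates` is hypothetical, and using total read bases as the denominator for both rates is an assumption made only for this illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume read (query) bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")


def softclip_rates(cigars):
    """Return (per_read_event_rate, per_base_rate) for a list of CIGAR strings.

    per_read_event_rate: reads containing an S, divided by total read bases.
    per_base_rate:       soft-clipped bases, divided by total read bases.
    """
    clipped_reads = 0
    clipped_bases = 0
    total_bases = 0
    for cigar in cigars:
        ops = CIGAR_OP.findall(cigar)
        soft = sum(int(n) for n, op in ops if op == "S")
        total_bases += sum(int(n) for n, op in ops if op in QUERY_CONSUMING)
        clipped_bases += soft
        if soft > 0:
            clipped_reads += 1
    if total_bases == 0:
        return 0.0, 0.0
    return clipped_reads / total_bases, clipped_bases / total_bases


# Hypothetical input: four 100 bp reads, one with 60 soft-clipped bases.
event_rate, base_rate = softclip_rates(["100M", "100M", "100M", "40M60S"])
print(event_rate, base_rate)
```

With this input the read-based count gives 1/400 while the base-based count gives 60/400, so adding the read-based figure to a per-base mismatch rate would understate the error rate in the way the report describes.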

windowsize option not working

I tried to extract all reads aligning to a specific region from a bam file. I tried to set the window size larger than the default = 5, but no matter what value I entered larger or smaller, it returned to default.

merged bam files

I am finding this really useful. There is a minor issue I have (not a big deal, but I thought I should mention it). It usually doesn't work on merged bam files (samtools merge, even after sorting). It starts parsing and stops at ~92-98%. It worked for one bed file, for some samples only, but not for all samples, and not for other bed files. The unmerged bam files seem to work.

fatal error: boost/program_options/cmdline.hpp: No such file or directory

Hi,

I am trying to install alfred but getting the following error where it is unable to find the boost folder and the binaries in it:

(wes-env) [rathik@reslnrefo01 alfred]$ ls
alfred.png  docker  exampledata  exampleplots  gtf  LICENSE  Makefile  maps  R  README.md  src

(wes-env) [rathik@reslnrefo01 alfred]$ make all
g++ -isystem /home/rathik/tools/alfred/src/htslib/ -pedantic -W -Wall -Wno-unknown-pragmas -D__STDC_LIMIT_MACROS -fno-strict-aliasing -O3 -fno-tree-vectorize -DNDEBUG src/alfred.cpp -o src/alfred -L/home/rathik/tools/alfred/src/htslib/ -L/home/rathik/tools/alfred/src/htslib//lib -lboost_iostreams -lboost_filesystem -lboost_system -lboost_program_options -lboost_date_time  -lhts -lz -llzma -lbz2 -Wl,-rpath,/home/rathik/tools/alfred/src/htslib/
src/alfred.cpp:31:45: fatal error: boost/program_options/cmdline.hpp: No such file or directory
 #include <boost/program_options/cmdline.hpp>
                                             ^
compilation terminated.
make: *** [src/alfred] Error 1

Possible to get consensus sequence for whole bacterial genome?

Hi Tobias,

Is it possible to get the consensus sequence for a whole bacterial genome (5498578 bp - https://www.ncbi.nlm.nih.gov/nuccore/BA000007)?

When I run /alfred consensus -f bam -t ont ../ref494_aligned.sorted.bam

I get

[2019-Jul-10 11:57:01] alfred consensus -f bam -t ont ../ref494_aligned.sorted.bam
Input format: bam
Sequencing type: ont
Alignment scoring (match: 3, mismatch: -2, gapopen: -3, gapext: -1)
Window: 250
Position needs to be in the format chr:pos

Thanks in advance and best regards

What are the statistics affected by `alfred qc --bed`?

Hi Tobias

I haven't dug closely into the source code for this, so apologies if this question is a bit lazy:

Which metrics are affected by using the optional --bed flag for alfred qc?

I suspect this affects things like the target coverage calculated, but I'm not sure what else.

Given a standard WGS normal BAM at 40x, what would you expect the difference to be between including the target BED and excluding it?

Thank you for the help

format error

I am sorry to bother you once again, but alfred qc is throwing this error now. I have tried to find the cause of the error in the src/qc.h file but couldn't locate it there; perhaps it's in some other file. Please help me out.


saddams-MBP:alfred saddamhusain$ samtools view S4_L_sort.bam | head -4
SOLEXA-1GA-2_0001:7:69:374:1352/1	0	chr1	10159	255	35M	*	0	0	ACCCTAACCCTACCCCTAACCTAACCATACCCCTA	BBBBBBBBBBBCB?A@B>A<AABA@@BBA=@68AB	XA:i:3	MD:Z:12A13C2A5	NM:i:3	XM:i:2
SOLEXA-1GA-2_0001:7:62:1502:34/1	0	chr1	19747	255	35M	*	0	0	GCACCATCTCCTTCCAGTGAGGAAGCGGGAAAAAC	BCCBBCBBBCBBBBBAB@BABBBABBA9'(B>89B	XA:i:3	MD:Z:30C0C1C1	NM:i:3	XM:i:2
SOLEXA-1GA-2_0001:7:19:236:957/1	0	chr1	159007	255	35M	*	0	0	AACACACACACACACACAGAAACACCCCCAATATC	BBAB8B@A>A=@?B@A2@>?@A(>.(7:<?-7@?<	XA:i:3	MD:Z:18C1C11C2	NM:i:3	XM:i:2
SOLEXA-1GA-2_0001:7:62:220:1852/1	16	chr1	180839	255	34M	*	0	0	AACCCTAACAACCCTAACACTAACCCTAACCAAC	:1-<=7=<;@8?95;B>A>A@B>@BB@C=BBB@B	XA:i:3	MD:Z:18C11A1C1	NM:i:3	XM:i:2
saddams-MBP:alfred saddamhusain$ alfred qc -r S4_L_sort.bam.bai -o qc.tsv.gz S4_L_sort.bam
[E::fai_build_core] Format error, unexpected "B" at line 1
[E::fai_build_core] Format error, unexpected "B" at line 1
Fail to open genome fai index for S4_L_sort.bam.bai
saddams-MBP:alfred saddamhusain$ 
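
The log above shows the file given to -r being read as a genome fai index, and the parser stopping at an unexpected "B" on line 1. A BAM index file begins with the magic bytes `BAI\1`, so one way to check what kind of file is being passed is to look at its first bytes. The sketch below is a hypothetical helper (`sniff_file_type` is not part of Alfred or HTSlib); which file type -r expects should be confirmed in Alfred's documentation.

```python
def sniff_file_type(path):
    """Guess a file's type from its first bytes.

    Returns one of "bai", "gzip", "fasta" or "unknown".
    """
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head.startswith(b"BAI\x01"):
        return "bai"    # BAM index magic number
    if head.startswith(b"\x1f\x8b"):
        return "gzip"   # gzip/BGZF container, e.g. BAM or compressed FASTA
    if head.startswith(b">"):
        return "fasta"  # plain-text FASTA header line
    return "unknown"
```

For example, `sniff_file_type("S4_L_sort.bam.bai")` would be expected to return `"bai"` for a valid BAM index, which is consistent with the "unexpected B" message in the log.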
