
hla's Introduction

xHLA: Fast and accurate HLA typing from short read sequence data


Authors: Chao Xie, Haibao Tang (tanghaibao)
License: See included LICENSE

The Human Leukocyte Antigen (HLA) gene complex on human chromosome 6 is one of the most polymorphic regions in the human genome, and contributes in large part to the diversity of the immune system. Accurate typing of HLA genes with short read sequencing data has historically been difficult due to the sequence similarity between the polymorphic alleles. xHLA iteratively refines the mapping results at the amino acid level to achieve 99 to 100% 4-digit typing accuracy for both class I and II HLA genes, taking only about 3 minutes to process a 30X whole genome BAM file on a desktop computer.

Installation

Simply pull from Docker hub:

docker pull humanlongevity/hla

Or, build the Docker image yourself:

cd docker
make build
make deploy

Usage

Run the xHLA caller directly on an indexed BAM file generated with BWA-MEM against the hg38 reference without alt contigs:

docker run -v `pwd`:`pwd` -w `pwd` humanlongevity/hla \
    --sample_id test --input_bam_path tests/test.bam \
    --output_path test

For other types of BAMs, pre-processing is required; see the wiki page on BAMs compatible with xHLA (https://github.com/humanlongevity/HLA/wiki/BAMs-compatible-with-xHLA) for details.

Output is a JSON file that lists 12 HLA alleles, two for each of the six HLA genes typed:

{
 "subject_id": "test",
 "creation_time": "2017-10-04T18:59:34Z",
 "report_version": "1.2",
 "report_type": "hla_typing",
 "sample_id": "test",
 "hla": {
  "alleles": [
   "A*01:01",
   "A*02:01",
   "B*13:02",
   "B*37:01",
   "C*06:02",
   "C*06:02",
   "DPB1*04:01",
   "DPB1*04:01",
   "DQB1*02:02",
   "DQB1*05:01",
   "DRB1*07:01",
   "DRB1*10:01"
  ]
 }
}

Citation

Xie et al. (2017) Fast and accurate HLA typing from short-read next-generation sequence data with xHLA. PNAS doi:10.1073/pnas.1707945114

License

The HLA Typing Software Code (the "Code") is made available by Human Longevity, Inc. ("HLI") on a non-exclusive, non-sublicensable, non-transferable basis solely for non-commercial academic research use. Commercial use of the Code is expressly prohibited. If you would like to obtain a license to the Code for commercial use, please contact HLI at [email protected]. HLI MAKES NO REPRESENTATIONS OR WARRANTIES WHATSOEVER, EITHER EXPRESS OR IMPLIED, WITH RESPECT TO THE CODE PROVIDED HEREUNDER. IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO CODE ARE EXPRESSLY DISCLAIMED. THE CODE IS FURNISHED "AS IS" AND "WITH ALL FAULTS" AND DOWNLOADING OR USING THE CODE IS UNDERTAKEN AT YOUR OWN RISK. TO THE FULLEST EXTENT ALLOWED BY APPLICABLE LAW, IN NO EVENT SHALL HLI BE LIABLE, WHETHER IN CONTRACT, TORT, WARRANTY, OR UNDER ANY STATUTE OR ON ANY OTHER BASIS FOR SPECIAL, INCIDENTAL, INDIRECT, PUNITIVE, MULTIPLE OR CONSEQUENTIAL DAMAGES SUSTAINED BY YOU OR ANY OTHER PERSON OR ENTITY ON ACCOUNT OF USE OR POSSESSION OF THE CODE, WHETHER OR NOT FORESEEABLE AND WHETHER OR NOT HLI HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, INCLUDING WITHOUT LIMITATION DAMAGES ARISING FROM OR RELATED TO LOSS OF USE, LOSS OF DATA, DOWNTIME, OR FOR LOSS OF REVENUE, PROFITS, GOODWILL, BUSINESS OR OTHER FINANCIAL LOSS.

hla's People

Contributors

andrewrech, tanghaibao, xchao-hli


hla's Issues

full resolution xHLA

Hi,
Thanks very much for making this resource available.

I am not familiar with Docker, but I have successfully managed to get xHLA to run on my samples. However, I am unclear on how to run xHLA in full-resolution mode, to go beyond 4-digit typing. Is there a simple way of achieving this?

Thanks so much for your help,
--Nicky
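For reference, tracebacks in other issues on this page show typer.sh being invoked with a trailing 'full' argument (and printing "FULL=false" by default), so full-resolution mode appears to be switched on at that level. A sketch of what that looks like inside the container, with arguments copied from those tracebacks (unverified):

# inside the container; 'full' as the trailing argument, per tracebacks elsewhere on this page
/opt/bin/typer.sh tests/test.bam test full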

Error opening file /opt/data/hla.dmnd

Hello,

I receive the following error when attempting to call on a file as described in the README.

docker run -v $TMPRAIDDIR:$TMPRAIDDIR -w $TMPRAIDDIR docker-dev.hli.io/xchao/hla \
    --sample_id test --input_bam_path $1.bam \
    --output_path ./test
[06/Jul/2017 18:35:55] INFO - Xie Chao's HLA typing algorithm
[06/Jul/2017 18:35:55] INFO - Sample_id: test Input file: test.bam
Extracting reads from S3
Aligning reads to IMGT database
processing FASTQ file
        found 11634 reads
processing MSA file
        found 26381 HLA exons
processing FASTQ file
Error: Error opening file /opt/data/hla.dmnd
        matched to 0 HLA exons
        0 reads matched to HLA types not in the MSA file
translating matches to MSA
Typing
Error in fread(align.path) : File is empty: hla-test/test.tsv
Execution halted
Traceback (most recent call last):
  File "/opt/bin/run.py", line 57, in <module>
    check_call([bin_path, args.input_bam_path, args.sample_id])
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/bin/typer.sh', 'test.bam', 'test']' returned non-zero exit status 1

It appears hla.dmnd is missing. I see one line about this file commented out in data/script/batch.sh:

#diamond index hla.faa to hla.dmnd based on the DIAMOND version you are using

Thank you
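For reference, recent DIAMOND releases build the database with diamond makedb rather than a separate index command; a minimal sketch, assuming hla.faa sits in the data directory:

cd data
diamond makedb --in hla.faa -d hla    # writes hla.dmnd next to hla.faa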

issues with missing calls and accuracy

I ran xHLA on samples from 1000 Genomes Project WES data that are also in HapMap, and obtained a large number of missing calls (~20% for MHC class I and ~30% for MHC class II). The accuracy of the remaining calls is also low.

sample command used:
run.py --sample_id NA19137_SRR792560_output --input_bam_path NA19137_SRR792560.recal.bam --output NA19137_SRR792560_results

Any help or comment on how to correctly use xHLA is much appreciated! Since the paper mentioned an accuracy of 99%, our results seem far from that.

Also perhaps an explanation about the output files would be beneficial for troubleshooting.

Thanks!

how do you perform the full resolution typing?

I am able to perform the 4 digit typing on the sample data (test.bam) using the following command line:

docker run -v `pwd`:`pwd` -w `pwd` humanlongevity/hla \
    --sample_id test --input_bam_path tests/test.bam \
    --output_path test

However, it is not clear to me how to perform the full resolution typing. Please could you assist me with this?

samtools view: failed to open

Greetings,

I have not seen my particular issue posted yet, and hopefully it is not too trivial (but I suspect it is). I am able to run the test.bam no problem. When I try some of my data, samtools says it does not exist. And I assure you the file exists and can be viewed outside of this deployment.

docker run -v `pwd`:`pwd` -w `pwd` humanlongevity/hla --sample_id "120002" --input_bam_path "/home/bwubb/Parvati/bam_data/FLG2_AFR/GRCh38/120002.ready.bam" --output_path test
[19/Jan/2018 18:55:25] INFO - Xie Chao's HLA typing algorithm
[19/Jan/2018 18:55:25] INFO - Sample_id: 120002 Input file: /home/bwubb/Parvati/bam_data/FLG2_AFR/GRCh38/120002.ready.bam
typer.sh parameters: DELETE=false FULL=false
Extracting reads from S3
[E::hts_open_format] fail to open file '/home/bwubb/Parvati/bam_data/FLG2_AFR/GRCh38/120002.ready.bam'
samtools view: failed to open "/home/bwubb/Parvati/bam_data/FLG2_AFR/GRCh38/120002.ready.bam" for reading: No such file or directory
Traceback (most recent call last):
  File "/opt/bin/run.py", line 64, in <module>
    check_call(bin_args)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/bin/typer.sh', '/home/bwubb/Parvati/bam_data/FLG2_AFR/GRCh38/120002.ready.bam', '120002']' returned non-zero exit status 1

Hopefully, I am just overlooking something foolish. Thank you.
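One likely cause: only the working directory is mounted into the container, so the absolute path under /home/bwubb is not visible inside it. A sketch of a workaround that also mounts the BAM's directory (mount point /data is illustrative):

docker run \
    -v /home/bwubb/Parvati/bam_data/FLG2_AFR/GRCh38:/data \
    -v `pwd`:`pwd` -w `pwd` humanlongevity/hla \
    --sample_id "120002" --input_bam_path /data/120002.ready.bam \
    --output_path test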

Unable to compile xHLA

dhwani@dhwani-Precision-WorkStation-T7500:~/Documents/hla_scan/test_out/HLA-master/docker$ make build
docker build -t docker-dev.hli.io/xchao/hla-dev -f dev.dockerfile .
flag provided but not defined: -f

Usage: docker build [OPTIONS] PATH | URL | -

Build a new container image from the source code at PATH

-q=false: Suppress verbose build output
-t="": Tag to be applied to the resulting image in case of success
docker tag docker-dev.hli.io/xchao/hla-dev docker-dev.hli.io/xchao/hla-dev:0.0.0
2017/07/17 10:35:27 dial unix /var/run/docker.sock: permission denied
make: *** [build] Error 1
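Two separate problems appear in this log: the installed Docker is too old to recognize build -f, and the permission error on /var/run/docker.sock usually means the current user is not in the docker group. A sketch of the usual fix for the latter (assuming a standard Docker install):

sudo usermod -aG docker $USER    # add yourself to the docker group, then log out and back in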

Fail to run with test.bam file

(attached screenshot: xhla_errormessage)

I get an error when trying to run your test.bam (tests/test.bam) on Docker. It said:
[E::hts_open_format] fail to open file 'tests/test.bam'
samtools view: failed to open "tests/test.bam" for reading: No such file or directory

The script I ran is as below:
docker run -v `pwd`:`pwd` -w `pwd` humanlongevity/hla --sample_id test --input_bam_path tests/test.bam --output_path test

You can see the whole stack trace in the attached file. Can you help me run the script, as I am not familiar with Docker? Thank you very much.

error occurred when running

done lpsolve
[1] "A24:02" "A33:03" "B15:01" "B58:01" "C03:02"
[6] "C
04:01" "DPB102:01" "DPB104:01" "DQB102:01" "DQB105:03"
[11] "DRB103:01" "DRB114:05"
pulling non-core exons in
Error in mcfork() :
unable to fork, possible reason: Cannot allocate memory
Calls: do.call -> mclapply -> lapply -> FUN -> mcfork
Execution halted
Warning message:
system call failed: Cannot allocate memory
Traceback (most recent call last):
File "/opt/bin/run.py", line 64, in
check_call(bin_args)
File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/bin/typer.sh', 'hla-1-no-filter.sam.bam.sorted.bam', 'test']' returned non-zero exit status 1
################################################
The BAM file is only 170 MB and I used a cloud server (8 cores, 24 GB RAM); I don't know why this error happens. Can anyone help?
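A general (not xHLA-specific) mitigation for fork-time "Cannot allocate memory" errors is to add swap so the forked R workers have headroom; a sketch with illustrative sizes:

sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile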

problem when running get-reads-alt-unmap.sh

When I run get-reads-alt-unmap.sh in.bam out.bam to process my data, there is an error message:

*****WARNING: Query ERR091574.99978736 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
...
*****WARNING: Query ERR091574.9999745 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.

sambamba 0.7.0
by Artem Tarasov and Pjotr Prins (C) 2012-2019
LDC 1.13.0 / DMD v2.083.1 / LLVM7.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bwt_restore_sa] SA-BWT inconsistency: primary is not the same. Abort!
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "-".
sambamba-sort: not enough data in stream
rm: cannot remove 'out.bam.full.bam': No such file or directory
rm: cannot remove 'out.bam.full.bam.bai': No such file or directory

How can I solve this problem?
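The "[bam_header_read] EOF marker is absent" line suggests the input BAM itself is truncated, which would explain the downstream failures. A quick integrity check before re-running (standard samtools command; file name illustrative):

samtools quickcheck -v in.bam && echo "in.bam looks intact" || echo "in.bam is truncated or corrupt"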

An error in running get-reads-alt-unmap.sh script

I tried to run get-reads-alt-unmap.sh script:
./get-reads-alt-unmap.sh my_file.bam processed.bam
to pre-process a BAM file, but kept receiving an error starting with:
sambamba-view: Reference with name my_file.bam does not exist
I was working in the directory containing the BAM file.

xHLA for non-human samples

Hello,
I would like to use xHLA for non-human samples, such as primate and pig genome BAMs. Could you please let us know how to create a new reference folder (HLA/data/) for non-human organisms, and whether anything else needs to change to apply this to non-human genomes? Your help with this will be greatly appreciated.

Thanks,

Dharm

Script typing.r breaks

Hi,

I managed to generate a new BAM file using get-reads-alt-unmap.sh. When I use this new BAM file as input, the code breaks while running typing.r. By following the cat statements, I discovered that it breaks between lines 310 and 350, in the section starting with

more <- do.call(rbind, mclapply(solution, function(s){..............}

The error message reads like this.

/opt/bin/typer.sh: line 45:   822 Killed                  $BIN/typing.r $OUT/${ID}.tsv $OUT/${ID}.hla
Traceback (most recent call last):
  File "/opt/bin/run.py", line 64, in <module>
    check_call(bin_args)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/bin/typer.sh', '/myd/data/bam/NA06985.aligned.sorted_xhla.bam', 'NA06985.aligned.sorted', 'full']' returned non-zero exit status 137

Can anyone replicate this?
Thanks!
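Exit status 137 is 128 + 9 (SIGKILL); together with the "Killed" line it usually means the kernel's out-of-memory killer stopped typing.r, which is memory-hungry in full mode. Two quick checks to confirm (standard Linux commands):

free -h                              # how much memory and swap the host actually has
dmesg | grep -i 'killed process'     # whether the OOM killer was responsible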

change from detectCores() to future::availableCores() in testing.r?

Hi,

in testing.r, you use detectCores() to decide how many worker processes to use for parallel execution.

options(mc.cores = detectCores())

However, detectCores() returns all available cores on the machine, even if the process is not allowed to use them, as is the case for cluster jobs with per-core allocations (e.g., in our case, a Slurm cluster that uses cgroups for resource limits).

The availableCores() function from the future package, however, takes resource allocations into consideration (see https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html). Would you consider switching to availableCores()? Or, alternatively, allowing the size of the parallel cluster to be specified?

Wolfgang

HLA Typing with 10X scRNAseq data (5' chemistry)

Dear all,

I tried to apply the algorithm to a single cell RNA seq (scRNAseq) data set that was created from human material with 10X Genomics' technology (5' chemistry).

I checked the BAM file (generated with BWA-MEM against hg38) in the IGV browser, so I know there are many reads in the correct regions (e.g. HLA-A, -B, -C, etc.) with the correct read length (98 bp).

Unfortunately, though, I only get the message that no matches to HLA exons were found (see link below).

What are potential reasons? Still an incompatible BAM file? Incompatibility of the algorithm with the scRNAseq technology used? What else? I appreciate your comments!

https://www.dropbox.com/s/y1rc74ps25rhx7k/Screen%20Shot%202019-11-19%20at%2011.11.00.png?dl=0

xHLA invalid input file, /get-reads-alt-unmap.sh not working.

Dear Authors!

I'm dealing with the following problem when using xHLA:

In our research we are trying to carry out HLA genotyping on exome sequencing data from asthmatic patients. The .SRA files we downloaded are aligned to GRCh37.

So far we tried to convert our files to hg38 with crossmap

and also manually with samtools, using the following commands:
1. samtools bam2fq SAMPLE.bam > SAMPLE.fastq
2. cat SAMPLE.fastq | grep '^@.*/1$' -A 3 --no-group-separator > SAMPLE_r1.fastq
   cat SAMPLE.fastq | grep '^@.*/2$' -A 3 --no-group-separator > SAMPLE_r2.fastq
3. bwa mem "ref.genome_hg_38" sample_r1.fastq sample_r2.fastq
4. samtools sort sample.bam sample.sorted.bam
5. samtools index sample.sorted.bam
And the input file for xHLA was "sample.sorted.bam".

Each way, we got the following error message from xHLA:

processing FASTQ file
found 0 reads
processing MSA file
found 26381 HLA exons
processing FASTQ file
Error: Invalid input file format
matched to 0 HLA exons
0 reads matched to HLA types not in the MSA file

We also tried the recommended pre-processing you mention on GitHub, but we were unable to run it from Docker; running it manually resulted in the following error:
"./get-reads-alt-unmap.sh: line 37: sambamba: command not found
./get-reads-alt-unmap.sh: line 40: sambamba: command not found"

We would highly appreciate your help,
Thank you in advance.
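As a side note on step 2 of the manual conversion above: splitting pairs with grep on read names is fragile and silently drops reads whose names lack /1 and /2 suffixes. samtools can do the split directly; a sketch with illustrative file names:

samtools sort -n SAMPLE.bam -o SAMPLE.namesorted.bam
samtools fastq -1 SAMPLE_r1.fastq -2 SAMPLE_r2.fastq \
    -0 /dev/null -s /dev/null SAMPLE.namesorted.bam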

Cut ties to specific build?

Would it be possible to break the hard coded link to human GRCh38 and allow users to provide a set of reference data at execution time rather than embedding it in the image?

If it's possible to provide documentation as to how to generate the reference files this would make the tool both backwards and forwards compatible. We specifically would like GRCh37 supported in the short term (in our case without a chr prefix).

I'm happy to do testing of the resulting changes.

Which GRCh38 reference file to use?

Your documentation says to use an "hg38 reference without alt contigs". Could you provide a link to the reference genome file you used? I would like to try the same reference genome file. Thanks.

run get-reads-alt-unmap.sh in container

I'm struggling to get typing information from hg19 aligned bams. I think I'd like to run get-reads-alt-unmap.sh in the docker container - I don't have all the dependencies on my machine.

I think I need to change the entrypoint, but I'm not sure of the syntax. The coordinates in the .sh script appear to cover all genes in both references.

I tried several variations of:
$docker run -itv pwd:pwd pwd -w humanlongevity/hla --entrypoint /bin/bash ls
docker: invalid reference format: repository name must be lowercase.
See 'docker run --help'.

But doesn't seem like I've successfully changed the entrypoint.
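For reference, docker run requires --entrypoint to appear before the image name; everything after the image name is passed to the container as arguments. A sketch of getting an interactive shell in the image (mount flags as in the README):

docker run -it -v `pwd`:`pwd` -w `pwd` --entrypoint /bin/bash humanlongevity/hla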

[E::bwa_idx_load_from_disk] fail to locate the index files when running

Hi, I ran the script at https://github.com/humanlongevity/HLA/wiki/BAMs-compatible-with-xHLA to preprocess a file from the International Genome Sample Resource project, which is mapped to hg38 with HLA and alternative contigs. However, I got no output file from this, only an error:

.......[A lot of similar warning]
*****WARNING: Query A00296:45:HFMLCDSXX:4:2678:7283:5087 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping.
*****WARNING: Query A00296:45:HFMLCDSXX:4:2678:8712:11193 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping.
*****WARNING: Query A00296:45:HFMLCDSXX:4:2678:8874:30890 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping.
[E::bwa_idx_load_from_disk] fail to locate the index files

What is the problem in my case?

get-reads-alt-unmap.sh problem

Because my BAM files were generated with HISAT2, I tried to convert them first using get-reads-alt-unmap.sh:
./get-reads-alt-unmap.sh inbam outbam

Then there are a lot of warnings and finally an error message:
...
*****WARNING: Query K00153:41:H3MLCBBXX:7:2228:8450:33123 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
[E::bwa_idx_load_from_disk] fail to locate the index files

How do I deal with this? My input BAM file is already sorted and indexed.
Thanks

Fail to type several HLA types

I tried to run two different BAM files for HLA typing. However, the first run produced no output for DQB1, and the second run produced no output for A.

I guess something about the data may be causing this. Is it possible to do some pre-processing to prevent it?

Best.
Brianna

Error in dcast.data.table

I have a BAM file generated using BWA-MEM against GRCh37. I changed one line in typer.sh (https://github.com/humanlongevity/HLA/blob/master/bin/typer.sh#L34) from chr6:29886751-33090696 to 6:20000000-40000000, rebuilt the image, and got the following error:

typer.sh parameters: DELETE=false FULL=false
Extracting reads from S3
Aligning reads to IMGT database
processing FASTQ file
	found 1088 reads
processing MSA file
	found 26381 HLA exons
processing FASTQ file
	matched to 445 HLA exons
	2 reads matched to HLA types not in the MSA file
translating matches to MSA
Typing
[1] 1434
[1] 1434
[1] 231
[1] 0
[1] 1050
[1] 729
[1] 729
[1] 0
dcast
Error in dcast.data.table(core, q ~ type, value.var = "specific", fun.aggregate = max,  : 
  Can not cast an empty data.table
Calls: dcast -> dcast.data.table
Execution halted
Traceback (most recent call last):
  File "/opt/bin/run.py", line 64, in <module>
    check_call(bin_args)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/bin/typer.sh', 'SRR070472/SRR070472.bam', 'SRR070472']' returned non-zero exit status 1

It seems the algorithm is complaining about an empty table, but I have no idea how to fix it...
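For GRCh37 input, the route documented on the wiki (linked in the Usage section above) is to re-extract and realign the reads with the bundled preprocessing script rather than editing the coordinates in typer.sh; a sketch with illustrative file names:

bin/get-reads-alt-unmap.sh SRR070472.grch37.bam SRR070472.xhla.bam
docker run -v `pwd`:`pwd` -w `pwd` humanlongevity/hla \
    --sample_id SRR070472 --input_bam_path SRR070472.xhla.bam --output_path SRR070472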

README for output

Hi, I would like to know how to interpret the output of xHLA beyond the JSON report of the HLA alleles. What reads are in the fq.gz file? What do the fields in the .hla file mean? And what about the .tsv?
Thank you!

get-reads-alt-unmap.sh get "[bwt_restore_sa] SA-BWT inconsistency: primary is not the same. Abort!"

Hi,

1.I used

NCBI Human Genome Resources Reference Genome Sequence https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/

GRCh38 GRCh38_latest_genomic.fna.gz fasta file ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz

as BWA Reference Genome
instruction : bwa index ref.fa

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188

then I used BWA to do the sequence alignment (my HLA NGS FASTQ data is paired-end)
instruction : bwa mem ref.fa read1.fq read2.fq > aln-pe.sam

then used samtools to sort and index

instruction :
samtools view -b -S aln-pe.sam > aln-pe.bam
samtools sort -m 4096M -o aln-pe.sorted.sam -@ 12 aln-pe.sam
samtools index aln-pe.sorted.sam

samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.

then used xHLA bin/get-reads-alt-unmap.sh to convert the BAM file, but I got the following messages:

[yubau@CNUH-i7 bin]$ get-reads-alt-unmap.sh aln-pe.sorted.bam aln-pe.sorted_xHLA_getreasds.sam
NCORE=12 MEM=187GB

sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

[ ]Writing sorted chunks to temporary directory...
[=======================================================================================================================================]=========]
*****WARNING: Query M04622:111:000000000-G3CWM:1:1101:12467:2013 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M04622:111:000000000-G3CWM:1:1101:13761:14426 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M04622:111:000000000-G3CWM:1:1101:17424:18089 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.

[==============================================================================]
*****WARNING: Query M04622:111:000000000-G3CWM:1:2103:7820:5734 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M04622:111:000000000-G3CWM:1:2103:9073:15589 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
[bwt_restore_sa] SA-BWT inconsistency: primary is not the same. Abort!

sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

/local_work/bin/xHLA/bin/get-reads-alt-unmap.sh: line 43: 46328 Aborted (core dumped) /local_work/bin/bwa-0.7.17/bwa mem -t $NCORE "$BIN/../data/chr6/hg38.chr6.fna" "${OUT}.1.fq" "${OUT}.2.fq"
46329 Done | /local_work/bin/samtools-1.9/samtools view -b -
46330 Done | /local_work/bin/xHLA/bin/sambamba sort -m $MEM -t $NCORE -o "${OUT}.full.bam" /dev/stdin


1. What does "[bwt_restore_sa] SA-BWT inconsistency: primary is not the same. Abort!" mean?

2. What does "*****WARNING: Query M04622:111:000000000-G3CWM:1:1101:12467:2013 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping." mean?

thanks for your time!

Issue with accuracy

I am trying to use xHLA on WES data; however, I am not getting the same typing results as with Athlates. I have previously been aligning WES reads to the whole hg38 genome and am about to try aligning to just chr6.

Not doing anything to the BAM file except sorting and indexing after alignment.

bwa mem is run w/out any picard/extra flags

I am running samtools view with the -F 4 flag to remove unmapped reads, though.

Typing of DQA1 and DPA1

Hello,
I would like to ask if xHLA is also capable of typing the alpha chains of DQ and DP. This information would be useful for predicting the immunogenicity of an antigen or for calculating the risk of developing an immune disorder from WGS data (e.g. for celiac disease, the DQ alpha chain is crucial).
Is it possible to extend xHLA in order to perform that?

Kind regards and thanks in advance,
Matthias

Sort reads by coordinate in get-reads-unmap.sh

bamToFastq requires reads to be coordinate sorted:

sambamba sort -p -n -m 60GB -t $NCORE -o ${OUT}.sorted.bam -F "unmapped or mate_is_unmapped or (ref_name == 'chr6' and (position > 29844528 and position < 33100696))" $IN

That is, the -n flag should be dropped:

sambamba sort -p -m 60GB -t $NCORE -o ${OUT}.sorted.bam -F "unmapped or mate_is_unmapped or (ref_name == 'chr6' and (position > 29844528 and position < 33100696))" $IN

xHLA without docker

I am interested in using xHLA in my research; however, I am unable to use Docker on my cluster for security reasons. Can you provide alternative instructions for running xHLA without Docker?

Thanks!

Why xHLA does not work for test.bam

I used xHLA in WSL, but I found that I can't run the test, as follows:

docker pull humanlongevity/hla
Using default tag: latest
latest: Pulling from humanlongevity/hla
ab26a7a684ab: Pull complete
f39ad1e4649c: Pull complete
d8b3c1aee01f: Pull complete
2c6ea9f43a60: Pull complete
1cc53c212be4: Pull complete
5f54d3b9b0fa: Pull complete
ff22e3ae8553: Pull complete
55cdfec7d780: Pull complete
607aa3828258: Pull complete
e542261b255c: Pull complete
fa0348e95780: Pull complete
422e3508e617: Pull complete
f49897c45a91: Pull complete
ba9e2f3cc2c4: Pull complete
007700dc3f3f: Pull complete
270f7cdfe1aa: Pull complete
a98c82717073: Pull complete
f12abe75f70a: Pull complete
Digest: sha256:425487b5203485506d30254f03d438aee7d0322938b76bd4a7d3ac201a1ac0e9
Status: Downloaded newer image for humanlongevity/hla:latest

docker run -v `pwd`:`pwd` -w `pwd` humanlongevity/hla --sample_id test --input_bam_path tests/test.bam --output_path test
[06/Nov/2019 16:51:26] INFO - Xie Chao's HLA typing algorithm
[06/Nov/2019 16:51:26] INFO - Sample_id: test Input file: tests/test.bam
typer.sh parameters: DELETE=false FULL=false
Extracting reads from S3
[E::hts_open_format] fail to open file 'tests/test.bam'
samtools view: failed to open "tests/test.bam" for reading: No such file or directory
Traceback (most recent call last):
File "/opt/bin/run.py", line 64, in
check_call(bin_args)
File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/bin/typer.sh', 'tests/test.bam', 'test']' returned non-zero exit status 1

docker version
Client:
Version: 18.09.7
API version: 1.39
Go version: go1.10.1
Git commit: 2d0083d
Built: Fri Aug 16 14:20:06 2019
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 19.03.4
API version: 1.40 (minimum version 1.12)
Go version: go1.12.10
Git commit: 9013bf5
Built: Thu Oct 17 23:50:38 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
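The container can only see what is mounted into it, so this error usually means tests/test.bam does not exist under the directory that `pwd` expands to (or the backticks around pwd were lost when copying the command). A sketch, assuming the command is run from a checkout of the repository:

cd /path/to/HLA    # directory that actually contains tests/test.bam (path illustrative)
docker run -v `pwd`:`pwd` -w `pwd` humanlongevity/hla \
    --sample_id test --input_bam_path tests/test.bam --output_path test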

Issue with aligned bam on hg38+alt

Hi, there,
I tried your script to extract reads from my BAM aligned to hg38 plus alt contigs. However, in the bamToFastq step it skips a lot of reads because of:
WARNING: Query XXXXXXXXXXXXX is marked as paired, but its mate does not occur next to it in your BAM file. Skipping

I used BWA-MEM for the alignment. Do you have any suggestions?

missing chr6.fna

Hi, I am trying to use the preprocessing script get-reads-alt-unmap.sh.

I realize, however, that the script looks for the FASTA in the data/chr6/ folder, which contains the index files but not the corresponding FASTA file. I get an error when I run the script.

Help?
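The large chromosome 6 FASTA is not shipped with the repository. One way to recreate it, assuming an hg38 no-alt FASTA is available locally (file names illustrative), is to extract chromosome 6 and rebuild the BWA index the script expects:

samtools faidx hg38.fa chr6 > data/chr6/hg38.chr6.fna
bwa index data/chr6/hg38.chr6.fna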

Ability to specify max cpus

The diamond blastx process takes all available CPUs. For users running under Singularity, it would be helpful to be able to specify the number of CPUs, to limit the resources taken when running in an HPC environment.
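Until such an option exists, CPU usage can be capped at the container level rather than inside xHLA; a sketch using standard Docker flags (values illustrative; Singularity has equivalent cgroup controls):

docker run --cpus 4 -v `pwd`:`pwd` -w `pwd` humanlongevity/hla \
    --sample_id test --input_bam_path tests/test.bam --output_path test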

Can I run xHLA without docker?

Hello all,
I have no root access and cannot use Docker. Can I use this software without Docker? How would I install it?

Thank you!

Massive memory issue

Occasionally, even after running BAMs through get-reads-alt-unmap.sh, xHLA still uses upwards of 500 GB of RAM. Are there any further recommendations for pre-processing?

Error message

It would be great if you can take a look at what happened and give me some clue!
Thanks!

2018-01-11T18:02:17.464782174Z processing FASTQ file
2018-01-11T18:02:17.474653029Z found 2796 reads
2018-01-11T18:02:17.474664030Z processing MSA file
2018-01-11T18:02:17.897350913Z found 26381 HLA exons
2018-01-11T18:02:17.897375302Z processing FASTQ file
2018-01-11T18:02:26.875431019Z matched to 9790 HLA exons
2018-01-11T18:02:26.875466077Z 262 reads matched to HLA types not in the MSA file
2018-01-11T18:02:26.944128389Z translating matches to MSA
2018-01-11T18:02:28.212725340Z Error in apply(mat2[, solution], 1, max) :
2018-01-11T18:02:28.212744859Z dim(X) must have a positive length
2018-01-11T18:02:28.212755739Z Execution halted
2018-01-11T18:02:28.221492161Z Traceback (most recent call last):
2018-01-11T18:02:28.221508050Z File "/opt/bin/run.py", line 64, in
2018-01-11T18:02:28.221514672Z check_call(bin_args)
2018-01-11T18:02:28.221517873Z File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
2018-01-11T18:02:28.221525322Z raise CalledProcessError(retcode, cmd)
2018-01-11T18:02:28.221536499Z subprocess.CalledProcessError: Command '['/opt/bin/typer.sh', '/sbgenomics/Projects/11c7a09c-16a8-4ea2-a781-01709f9943c7/NA12878-Garvan-Vial1_R.bam', 'NA12878-Garvan-Vial1', 'delete', 'full']' returned non-zero exit status 1

Can xHLA be used for HLA-typing using RNA-Seq data

Hi, I have a large number of RNA-Seq samples available and was wondering whether I can use xHLA for typing. I was not able to find any mention of RNA-Seq data in the paper or on the GitHub page.

Could you please indicate if that is possible? Thanks a lot!

run errors

Hi, I am running xHLA on 1000 Genomes Project exome samples. It ran for most of them, but some samples produced run errors that fall into two kinds:
Type 1:
Warning message:
In mclapply(solution, function(s) { :
all scheduled cores encountered errors in user code
Error in more$solution : $ operator is invalid for atomic vectors
Calls: unique
Execution halted

Type 2:
Error in apply(mat2[, solution], 1, max) :
dim(X) must have a positive length
Execution halted

How do I troubleshoot and rectify them?
Thank you!

Update Stored HLA Database

The multiple sequence alignments in the data folder are almost 2 years old. Since then, many new alleles have been discovered and recorded in IMGT/HLA and existing alleles have been renamed or improved. The alleles contained in xHLA should be updated to release 3.29.0.

is this a bug or something wrong with my test data?

I used this software according to the article, and I got this error:
Extracting reads from S3
skipped 98412 duplicated reads
Aligning reads to IMGT database
processing FASTQ file
found 151748 reads
processing MSA file
found 26381 HLA exons
processing FASTQ file
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
matched to 0 HLA exons
0 reads matched to HLA types not in the MSA file
translating matches to MSA
Typing
Error in fread(align.path) : File is empty: hla-1c/1c.tsv
Execution halted
Traceback (most recent call last):
File "/opt/bin/run.py", line 64, in
check_call(bin_args)
File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/bin/typer.sh', 'HLA-1-clean_sorted.bam', '1c']' returned non-zero exit status 1

Missing HLA calls?

Thank you for making this wonderful tool open source.

Installed according to README

Linux master 4.4.0-1022-aws #31-Ubuntu SMP Tue Jun 27 11:27:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Docker version 17.06.0-ce, build 02c1d87

docker run -v $(pwd):$(pwd) -w $(pwd) docker-dev.hli.io/xchao/hla \
    --sample_id $1:t:r --input_bam_path $1 \
    --output_path ./"$1:t:r_xhla"

   allele        rank
1: B*07:02       11
2: C*06:103      NA
3: C*07:02       6
4: DPB1*01:01    NA
5: DRB1*13:171   NA

Do missing HLA calls imply that not enough quality reads were aligned? I can send representative .bams privately.

.tsv from full.r: https://transfer.sh/M3Aa/andrewrech_example.tsv

Thank you

Do not hard code memory in get-reads-unmap.sh

This will break for some users. Get memory with:

$((( $(dmidecode -t 17 | grep "Size.*MB" | awk '{s+=$2} END {print s / 1024}') - 2500 )))MB

To use available system memory minus a little.
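An alternative that does not need root (dmidecode usually does) is to read MemTotal from /proc/meminfo, keeping the same 2500 MB headroom:

awk '/MemTotal/ {printf "%dMB\n", $2/1024 - 2500}' /proc/meminfo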
