illumina / cyrius

A tool to genotype CYP2D6 with WGS data

License: Other

Python 100.00%

cyrius's Introduction

Cyrius: WGS-based CYP2D6 genotyper

Cyrius is a tool to genotype CYP2D6 from a whole-genome sequencing (WGS) BAM file. Cyrius uses a novel method to solve the problems caused by the high sequence similarity between CYP2D6 and its pseudogene paralog CYP2D7. This allows it to accurately detect all star alleles, particularly those that contain structural variants. Please refer to our paper for details about the method.

Cyrius has been integrated into the Illumina DRAGEN Bio-IT Platform since v3.7.

Running the program

This Python3 program can be run as follows:

python3 star_caller.py --manifest MANIFEST_FILE \
                       --genome [19/37/38] \
                       --prefix OUTPUT_FILE_PREFIX \
                       --outDir OUTPUT_DIRECTORY \
                       --threads NUMBER_THREADS

The manifest is a text file in which each line lists the absolute path to one input BAM/CRAM file. For CRAM input, we suggest providing the path to the reference FASTA file with --reference in the command.
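As a minimal sketch, a manifest can be generated from a directory of alignment files. The helper name write_manifest and the idea of collecting files from a single directory are assumptions for illustration; only the one-absolute-path-per-line layout comes from the description above.

```python
from pathlib import Path


def write_manifest(input_dir, manifest_path, suffixes=(".bam", ".cram")):
    """Write one absolute BAM/CRAM path per line and return the paths written.

    Hypothetical helper, not part of Cyrius.
    """
    # Index files such as sample.bam.bai have the suffix ".bai" and are skipped.
    paths = sorted(
        str(p.resolve())
        for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix in suffixes
    )
    Path(manifest_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

The resulting file can then be passed to --manifest.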

Cyrius can now be installed with pip install cyrius and run with cyrius -h, with the same parameters as listed above.

Interpreting the output

The program produces a .tsv file in the directory specified by --outDir.
The fields are explained below:

Fields in tsv	Explanation
Sample	Sample name
Genotype	Genotype call
Filter	Filters on the genotype call

A genotype of "None" indicates a no-call.
There are currently four possible values for the Filter column:
- PASS: a passing, confident call.
- More_than_one_possible_genotype: In rare cases, Cyrius reports two possible genotypes that it cannot distinguish from each other. These are different sets of star alleles that result in the same set of variants, which cannot be phased with short reads, e.g. *1/*46 and *43/*45. The two possible genotypes are reported together, separated by a semicolon.
- Not_assigned_to_haplotypes: In a very small portion of samples with more than two copies of CYP2D6, Cyrius calls a set of star alleles, but they can be assigned to haplotypes in more than one way. Cyrius reports the star alleles joined by underscores. For example, *1_*2_*68 is reported and the actual genotype could be *1+*68/*2, *2+*68/*1 or *1+*2/*68.
- LowQ_high_CN: In rare cases, at high copy number (>=6 copies of CYP2D6), Cyrius uses a less strict approximation when calling copy numbers to account for higher noise in depth, so the genotype call could be of lower confidence than usual.
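The separators described above (semicolon between alternative genotypes, slash between haplotypes, plus between alleles on one haplotype, underscore when alleles are not assigned to haplotypes) can be split apart mechanically. The following is a rough sketch under those conventions; parse_genotype and its return structure are hypothetical and not part of Cyrius.

```python
def parse_genotype(genotype):
    """Split a Genotype string into candidate calls.

    Returns an empty list for a no-call. Each candidate is a dict with
    "phased" set to True (alleles grouped per haplotype) or False (flat list).
    """
    if genotype is None or genotype == "None":
        return []
    candidates = []
    for candidate in genotype.split(";"):
        if "/" in candidate:
            # "/" separates the two haplotypes, "+" separates alleles on one haplotype.
            haplotypes = [hap.split("+") for hap in candidate.split("/")]
            candidates.append({"phased": True, "haplotypes": haplotypes})
        else:
            # Underscore-joined alleles are not assigned to haplotypes.
            candidates.append({"phased": False, "alleles": candidate.split("_")})
    return candidates
```

For example, parse_genotype("*1_*2_*68") yields a single unphased candidate with three alleles, while a semicolon-separated string yields one candidate per alternative.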

A .json file is also produced that contains more information about each sample.

Fields in json	Explanation
Coverage_MAD	Median absolute deviation of depth, measure of sample quality
Median_depth	Sample median depth
Total_CN	Total copy number of CYP2D6+CYP2D7
Total_CN_raw	Raw normalized depth of CYP2D6+CYP2D7
Spacer_CN	Copy number of CYP2D7 spacer region
Spacer_CN_raw	Raw normalized depth of CYP2D7 spacer region
Variants_called	Targeted variants called in CYP2D6
CNV_group	An identifier for the sample's CNV/fusion status
Variant_raw_count	Supporting reads for each variant
Raw_star_allele	Raw star allele call
d67_snp_call	CYP2D6 copy number call at CYP2D6/7 differentiating sites
d67_snp_raw	Raw CYP2D6 copy number at CYP2D6/7 differentiating sites
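A short sketch of reading these fields back is given here. It assumes the JSON is an object keyed by sample name and that it also carries a Genotype field, as in the JSON output quoted in one of the issues on this page; summarize_samples and the wording of its notes are hypothetical.

```python
import json
import math


def summarize_samples(json_text):
    """Return {sample: (genotype, median_depth, note)} from Cyrius-style JSON text."""
    summary = {}
    # Python's json module accepts the bare NaN token that appears in some outputs.
    for sample, fields in json.loads(json_text).items():
        depth = fields.get("Median_depth")
        mad = fields.get("Coverage_MAD")
        if not depth:
            note = "no depth measured"
        elif mad is None or math.isnan(mad):
            note = "coverage MAD unavailable"
        else:
            note = "ok"
        summary[sample] = (fields.get("Genotype"), depth, note)
    return summary
```

A median depth of zero together with a null genotype suggests that no reads were counted for that sample, which is worth checking before looking at the star-allele fields.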

Troubleshooting

Common causes for Cyrius to produce no-calls are:
- Low sequencing depth. We suggest a sequencing depth of 30x, which is standard practice for clinical genome sequencing.
- The depth of the CYP2D6/CYP2D7 region is much lower than the rest of the genome, most likely because reads are aligned to alternative contigs. If your reference genome includes alternative contigs, we suggest alt-aware alignment so that alignments to the primary assembly take precedence over alternative contigs.
- The majority of reads in the CYP2D6/CYP2D7 region have a mapping quality of zero. This is probably due to post-processing tools such as bwa-postalt that modify the mapQ in the BAM. We recommend using the BAM file from before such post-processing steps as input to Cyrius.

cyrius's Issues

How to correct "ZeroDivisionError: division by zero" ?

I ran star_caller.py and got this error:

INFO:root:Processing sample test at 2023-05-01 11:57:07.119751
Traceback (most recent call last):
  File "/path/Cyrius/Cyrius-v1.1.1/star_caller.py", line 562, in <module>
    main()
  File "/path/Cyrius/Cyrius-v1.1.1/star_caller.py", line 530, in main
    bam_name, call_parameters, threads, count_file, reference_fasta, index_name=index_name
  File "/path/Cyrius/Cyrius-v1.1.1/star_caller.py", line 312, in d6_star_caller
    raw_cn_call.d67_cn,
  File "/path/Cyrius/Cyrius-v1.1.1/caller/call_variants.py", line 269, in call_exon9gc
    d6_values.append(full_length_cn * count1 / (count1 + count2))
ZeroDivisionError: division by zero

How can I correct this? Does it mean my test sample doesn't have this region?

Pavita
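The failing line in the traceback divides by count1 + count2, so the error is raised when both counts are zero at that position. A guarded form of the same expression can be sketched as follows. This is only an illustration: safe_d6_fraction is a hypothetical name, it is not the project's fix, and whether a missing value is acceptable to the code that consumes it is an assumption.

```python
def safe_d6_fraction(full_length_cn, count1, count2):
    """Return full_length_cn * count1 / (count1 + count2), or None if both counts are zero."""
    total = count1 + count2
    if total == 0:
        # No supporting reads at this site: report a missing value instead of raising.
        return None
    return full_length_cn * count1 / total
```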

troubleshooting "no call" outputs for samples with coverage above 30X

Hi Xiao,

We've come across a few cases where Cyrius reports a no call, in spite of the sample having coverage > 30X. Would you have some recommendations to troubleshoot this further (e.g. not possible to resolve star alleles)?

Attaching a couple of examples: Archive.zip

Thanks

'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Hi,

I'm getting this error even though everything seems to be set up properly, including the WGS BAM file.

python3 star_caller.py --manifest /media/sf_Shared/Cyrius-master/caller/tests/test_data/NA23275.bam --genome 37 --prefix cyr --outDir test_output

Traceback (most recent call last):
  File "/media/sf_Shared/Cyrius-master/star_caller.py", line 562, in <module>
    main()
  File "/media/sf_Shared/Cyrius-master/star_caller.py", line 513, in main
    for line in read_manifest:
  File "/home/adnan/anaconda3/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
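The byte 0x8b at position 1 matches the second byte of the gzip magic number (0x1f 0x8b), which BAM files also start with because they are BGZF-compressed. In the command above a .bam path is passed to --manifest, whereas the manifest is expected to be a text file listing BAM/CRAM paths. A small check can catch this before running; looks_gzipped is a hypothetical helper, not part of Cyrius.

```python
def looks_gzipped(path):
    """Return True if the file starts with the gzip/BGZF magic bytes 0x1f 0x8b."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"
```

If this returns True for the file given to --manifest, the file is compressed binary data rather than a plain-text list of paths.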

WES

Hi!

Do you think Cyrius will work on WES data?

new version?

Hi,
Is there a newer version covering more genes, other than CYP2D6?
I found a related newsletter from Illumina.

Thank you

Alternate and Reference Read Count Output Format

I was running Cyrius to test its performance on the NA12878 data and noticed that one of the outputs in your JSON file is the field "Variant_raw_count". For each entry, the position and mutation information are followed by what I believe are the alternate allele read count and then the reference allele read count. Wouldn't the convention typically be the reference allele count followed by the alternate allele count? Or is that already how you define your output?

Thank you

"None" genotypes with GATK4 best practices BAM/CRAMs

We intend to run Cyrius on a cohort of 10K Asian genomes which have been analysed with the GATK4 best practices pipeline. Whilst running some tests, I've noticed that I consistently get a genotype of None in the outputs.

In parallel, I've also tried to launch Cyrius on one of the replicates of NA12878 available in BaseSpace, analysed with Dragen 3.2.8, and observe the same behaviour.

In both cases, alignment has been done against GRCh38 with alt contigs (using the fasta from the GATK4 bundle and hg38-altaware in Dragen, respectively). Seeing this previous issue (#1), I wonder if the alt-aware alignment could be what's leading to the no-call. Would you have any thoughts?

I'm attaching the json output for the two test samples mentioned above for additional info.
output.zip

Many thanks

Is there a way to use just a part of cram/bam file to get not-None Cyrius result?

I used sample NA19239 from 1000-genomes, aligned to hg38 in .cram format.

wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239454/NA19239.final.cram
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239454/NA19239.final.cram.crai

Result of Cyrius:

NA19239.final    *15/*17    PASS

When I extracted only chr22 using samtools view:
samtools view -C NA19239.final.cram chr22 > NA19239_chr22.cram
samtools index NA19239_chr22.cram

Result of Cyrius:

Sample	Genotype	Filter
NA19239_chr22	None	None

Thanks

Conda package?

Hi!

Is it your intention to provide Cyrius as a conda package?

Panel data

Hi,
I read that Cyrius cannot be used for panel data. Are you planning to develop this in the near future? (It would be most useful...)
Kind regards
Anna

Unknown variants?

Hi, I appreciate your kind response to the previous issue I submitted! :) I do have one more question about the results in your article.

I am practicing finding rare variants in multi-allelic genes. After running Cyrius, I found that 26 of the 2504 1kGP samples had variant calls that did not match any of the known star alleles. If they are available, can I get the sample IDs of these 26 samples?

Thank you very much.

Using Cyrius within other tools/workflow management

Hello, I am fairly new to coding and I need some help implementing Cyrius in my project. I am currently trying to implement Cyrius in the All of Us cloud-based platform, but I am having trouble. All of Us stores all of its WGS CRAM files in a Google bucket. When I copy the CRAMs to my active environment and use the paths to the CRAMs in my active environment, I am able to run Cyrius. However, copying CRAMs from the platform takes about 15 minutes per CRAM, so it is not feasible for a larger analysis. When I attempt to use the paths to the Google bucket location of the CRAM files (rather than copying them to the active environment), I receive the feedback that I do not have permissions and Cyrius does not run. I reached out to the All of Us data science team and they recommended that I use tools such as SAMtools and GATK, which work seamlessly with Google bucket files. They also recommended using tools in workflows such as dsub, Cromwell, or Nextflow to interact with the Google bucket files. Do you know how it would be possible to incorporate Cyrius into these other tools or workflows, or another way to make Cyrius able to interact with the Google bucket files?

Thank you in advance!
Tim Sanford

UnicodeDecodeError

Hi,

I have BAM and BAI files for a focused pharmacogene panel, including CYP2D6, generated by a clinical lab following an in-house pipeline. I performed the following analysis with Python 3.8.3 on a Linux system:

git clone https://github.com/Illumina/Cyrius.git

python3 star_caller.py --manifest /path/to/bam/file.bam \
                       --genome 37 \
                       --prefix file \
                       --outDir /path/to/work/directory/ \
                       --threads 1

But I got the following error:

Traceback (most recent call last):
  File "bin/Cyrius/star_caller.py", line 562, in <module>
    main()
  File "bin/Cyrius/star_caller.py", line 513, in main
    for line in read_manifest:
  File "/panfs/roc/msisoft/anaconda/miniconda3_4.8.3-jupyter/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Neither the input files nor the Cyrius folder are gzipped. Any suggestions?

Thank you.

RuntimeWarning: divide by zero encountered in true_divide

my command = python3 star_caller.py --manifest manifest.txt --genome 38 --prefix test --outDir result --threads 8
INFO:root:Processing sample HSxx at 2020-12-14 15:39:01.967956
xx/Cyrius/depth_calling/bin_count.py:84: RuntimeWarning: divide by zero encountered in true_divide
y_counts = y_counts / np.median(y_counts)
xx/Cyrius/depth_calling/bin_count.py:84: RuntimeWarning: invalid value encountered in true_divide
y_counts = y_counts / np.median(y_counts)
xx/envs/cyr_env/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
xx/envs/cyr_env/lib/python3.7/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
INFO:root:Processing sample HSxx at 2020-12-14 15:39:02.742290

result tsv file =
Sample Genotype Filter
HSxx None None
json file =
{"HSxx": {"Coverage_MAD": NaN, "Median_depth": 0.0, "Total_CN": null, "Spacer_CN": null, "Total_CN_raw": null, "Spacer_CN_raw": null, "Variants_called": null, "CNV_group": null, "Genotype": null, "Filter": null, "Raw_star_allele": null, "Call_info": null, "Exon9_CN": null, "CNV_consensus": null, "d67_snp_call": null, "d67_snp_raw": null, "Variant_raw_count": null}, "HS01002_mini": {"Coverage_MAD": NaN, "Median_depth": 0.0, "Total_CN": null, "Spacer_CN": null, "Total_CN_raw": null, "Spacer_CN_raw": null, "Variants_called": null, "CNV_group": null, "Genotype": null, "Filter": null, "Raw_star_allele": null, "Call_info": null, "Exon9_CN": null, "CNV_consensus": null, "d67_snp_call": null, "d67_snp_raw": null, "Variant_raw_count": null}}
Why can't I get a genotype result?

Sample specific error

Hello,

We've been using Cyrius-1.0 for a while and came across an error recently. I updated to v1.1 and the error persists.
The command we're using is:

python3 -u /gpfs/gpfs1/home/jholt/githubDL/Cyrius-1.1/star_caller.py             \
    --manifest {redacted}/cyrius-1.0/HALB3003261.txt             \
    --genome 38             \
    --outDir {redacted}/cyrius-1.0             \
    --prefix HALB3003261             \
    --threads 1

And the output is as follows:

INFO:root:Processing sample HALB3003261 at 2021-04-28 08:55:21.589540
Traceback (most recent call last):
  File "/gpfs/gpfs1/home/jholt/githubDL/Cyrius-1.1/star_caller.py", line 560, in <module>
    main()
  File "/gpfs/gpfs1/home/jholt/githubDL/Cyrius-1.1/star_caller.py", line 528, in main
    bam_name, call_parameters, threads, count_file, reference_fasta
  File "/gpfs/gpfs1/home/jholt/githubDL/Cyrius-1.1/star_caller.py", line 403, in d6_star_caller
    if ";" in final_star_allele_call:
TypeError: argument of type 'NoneType' is not iterable

Let me know if you need any additional information from me!

ValueError: invalid contig `22`

Hi,
I am trying to run star_caller.py exactly as described in the manual, and I am getting this error message:

INFO:root:Processing sample I-07-1524.normal.finalSorted at 2022-03-02 11:23:54.857425
Traceback (most recent call last):
  File "star_caller.py", line 562, in <module>
    main()
  File "star_caller.py", line 529, in main
    cyp2d6_call = d6_star_caller(
  File "star_caller.py", line 175, in d6_star_caller
    normalized_depth = get_normed_depth(
  File "/srv/ngs/analysis/dalteriog/Tools/Cyrius-1.1.1/depth_calling/bin_count.py", line 39, in get_normed_depth
    counts_for_normalization, gc_for_normalization, region_type_cn, read_length = count_reads_and_prepare_for_normalization(
  File "/srv/ngs/analysis/dalteriog/Tools/Cyrius-1.1.1/depth_calling/bin_count.py", line 187, in count_reads_and_prepare_for_normalization
    region_reads = get_read_count(bamfile, region)
  File "/srv/ngs/analysis/dalteriog/Tools/Cyrius-1.1.1/depth_calling/bin_count.py", line 103, in get_read_count
    reads = bamfile.fetch(region[0], region[1], region[2])
  File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig 22.

I am not a Python expert, but I think the error may lie in the contig nomenclature. Nevertheless, this is quite odd because all the BAM files specified in the MANIFEST_FILE use the following contig names:

chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr21
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrX
chrY

I also don't think that the problem is the dependencies, as I checked in this issue that I have installed all the packages with at least the version indicated (for some packages I have a newer version, but I don't think this should be a problem).
For completeness, here is the MANIFEST_FILE, where each line specifies the full path to a BAM file:
manifestBampath.txt

Thank you so much in advance
Giuseppe
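The failing call asks for contig 22 while the BAM headers listed above use chr22, so the requested naming and the file's naming disagree. One possible explanation, which is an assumption and is not confirmed in this report, is that the --genome value selects a region set with a different naming convention (for example 37 versus 19). A quick way to see which convention a file uses, given its contig names (for example taken from the header printed by samtools view -H), is sketched here; uses_chr_prefix is a hypothetical helper.

```python
def uses_chr_prefix(contig_names):
    """Return True if the list is non-empty and every contig name starts with 'chr'."""
    names = list(contig_names)
    return bool(names) and all(name.startswith("chr") for name in names)
```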

processing steps to be on the lookout for in a CRAM?

Thank you for developing this tool. I have a collaborator that has some CRAMS that I might use as input for this caller. I see in the documentation that processing steps like bwa-postalt should be avoided. I am wondering are there any other obvious processing steps that I should be on the lookout for when evaluating whether these CRAMS are suitable as input for Cyrius?

Thank you

snp_count.py:127: UserWarning, haplotype.py:70: UserWarning

When running Cyrius on 5 WGS CRAM samples (all listed in a manifest text file), I received the following messages after the message that the first sample was processing:

Cyrius/depth_calling/snp_count.py:127: UserWarning: multiple_iterators not implemented for CRAM
ignore_orphan = False,
Cyrius/depth_calling/haplotype.py:70: UserWarning: multiple_iterators not implemented for CRAM
ignore_orphan = False

Though I received this message, the output tsv and json files appear to show that Cyrius ran correctly. Can you help me understand these error messages and if anything needs to be fixed?

Thank you!

Calling on HG00463 results in s/w crash.

I downloaded the R1 and R2 fasta files for this sample referenced in the paper. I aligned the reads with sentieon bwa mem, produced a valid BAM/BAI file. When I ran star_caller.py, I ended up with the following crash.

(supersonic) jmajor@kahlo:/locus/data/external_data/research_experiments/investigations/CYP2D6/HG00463$ python ~/wgs_resources/bin/Cyrius/star_caller.py --reference ~/wgs_resources/data/reference/human/human_g1k_v37_modified.fasta/human_g1k_v37_modified.fasta --genome 37 --prefix CYP_ --outDir ./ --threads 88 --manifest manifest.txt
INFO:root:Processing sample HG00463.aligned.deduped.sort at 2020-07-20 05:36:41.476394
Traceback (most recent call last):
  File "/locus/home/jmajor/wgs_resources/bin/Cyrius/star_caller.py", line 580, in <module>
    main()
  File "/locus/home/jmajor/wgs_resources/bin/Cyrius/star_caller.py", line 548, in main
    bam_name, call_parameters, threads, count_file, reference_fasta
  File "/locus/home/jmajor/wgs_resources/bin/Cyrius/star_caller.py", line 339, in d6_star_caller
    raw_cn_call.spacer_cn,
  File "/locus/data/external_data/research_experiments/wgs_resources/bin/Cyrius/caller/cnv_hybrid.py", line 56, in get_cnvtag
    if exon9_intron4_sites_counter[0][1] >= EXON9_TO_INTRON4_SITES_MIN
IndexError: list index out of range

Recommended upstream alignment

I was tinkering with the software, mainly to see ease-of-use and ran into an issue with regards to upstream processing. I have two alignment processes that are almost identical (sentieon align, dedup, BQSR) except one performs postalt correction and the other does not. When I run on three NA12878 replicates I get the following results:
No-postalt:

Sample	Genotype
SL362490	*3/*4+*68
SL362491	*3/*4+*68
SL362492	*3/*4+*68

Postalt+:

Sample	Genotype
SL362490	None
SL362491	None
SL362492	None

The difference is obvious, which leads to the question of what's recommended for upstream alignment? Will performing postalt processing always lead to None or is there some workaround that will fix that?

Calling on Cyrius test data

I was trying to run your test data from this Github (Cyrius/depth_calling/tests/test_data/NA12878.bam) using this following command:

python3 star_caller.py --manifest cyp2d6_GRCh37_manifest.txt --genome 37 --prefix cyp2d6_GRCh37_NA12878 --outDir cyrius_run_and_visualization --threads 4

But it gave me these errors:

1. RuntimeWarning: Mean of empty slice.
2. RuntimeWarning: invalid value encountered in double_scalars  ret = ret.dtype.type(ret / rcount)
3. RuntimeWarning: invalid value encountered in true_divide  y_counts = y_counts / np.median(y_counts)

Since I'm using your test data, I'm unsure how it's giving me an error. Do you happen to know if I'm doing anything wrong?

TypeError: 'NoneType' object cannot be interpreted as an integer

Hello,
I have been able to successfully run Cyrius on many samples. However, for one sample I get the error shown below. Any thoughts on the cause? PyPGx and Aldy were both able to call this sample as *2x2/*4.

!python3 {cyrius_path} -m {manifest_file} -p cyrius_1152285 -g 38 -o cyrius_results

INFO:root:Processing sample 1152285 at 2023-11-13 15:07:38.992327
Traceback (most recent call last):
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/Cyrius/star_caller.py", line 562, in <module>
    main()
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/Cyrius/star_caller.py", line 530, in main
    bam_name, call_parameters, threads, count_file, reference_fasta, index_name=index_name
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/Cyrius/star_caller.py", line 362, in d6_star_caller
    bamfile, cnvtag, haplotype_db["g.42127526C>T_g.42127556T>C"]
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/Cyrius/caller/call_variants.py", line 339, in call_var42127526_var42127556
    for _ in range(var7526_cn):
TypeError: 'NoneType' object cannot be interpreted as an integer

Thanks,
Andrew

Alignment to PGx regions only

Hello,

Is there a way to align WGS fastq reads to specific regions of the reference genome? Can I somehow map only to the PGx gene regions and then use Cyrius? I want to speed up the whole process.

fetch called on bamfile without index?

Hi, I enjoyed your article very much. Now I am trying to apply Cyrius software tool on analyzing CYP2D6 gene for my research. However, while I was trying to run the tool, I faced an error that I cannot figure out.

The command I entered is as follows:

python3 star_caller.py --manifest manifest.txt --genome 38 --prefix test --outDir ./output/

This is the error I see:

ValueError: fetch called on bamfile without index

I don't think I need bam.bai file, but the error says that "could not retrieve index for bam file". (In the manifest.txt file, there is only one line of the bam file path.) So, I am wondering how I can troubleshoot this issue.
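The error indicates that an index is required: fetching reads by region needs random access into the file, which the index provides. An index can be created with samtools index, as done in another issue on this page. The sketch below looks for an index next to an alignment file using common naming conventions (.bam.bai, .bai, .csi, .cram.crai, .crai). find_index is a hypothetical helper, and the exact set of names that Cyrius or pysam will accept is an assumption.

```python
from pathlib import Path


def find_index(alignment_path):
    """Return the first conventional index file found next to a BAM/CRAM, else None."""
    path = Path(alignment_path)
    if path.suffix == ".cram":
        candidates = [Path(str(path) + ".crai"), path.with_suffix(".crai")]
    else:
        candidates = [
            Path(str(path) + ".bai"),
            path.with_suffix(".bai"),
            Path(str(path) + ".csi"),
        ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Running this over every path in the manifest before calling Cyrius shows which inputs still need indexing.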

Also, can we use a sliced WGS bam file as an input? For example, ch22.bam.

Thank you very much.

Questions about odd genotypes

I am using the latest version of Cyrius. Initial feedback I want to give is that Cyrius is very easy to install and use. I really appreciate that. Some tools can be a bear to get setup and running.

I have a few samples with odd genotypes that I'm not sure how to interpret.

*1_*1_*13 - I understand that this signifies that this is a Not_assigned_to_haplotypes scenario. But I'm not following what the possibilities would be. *1/*1+*13, shouldn't that just be *1/*13? or *1+*1/*13, shouldn't that just be *1x2/*13?
*1/*68+*68+*68+*4 and another one that is *1/*68+*68+*68+*68+*4 - why are there so many *68s? What does this mean?
*41/*4.013+*4 - is this the equivalent of *41/*4x2?

Thanks

Installing Cyrius using pip for ease of use with other python tools

Hello,

I ran "!pip install git+https://github.com/Illumina/Cyrius.git" in a cell in my jupyter notebook and received the following output:

Collecting git+https://github.com/Illumina/Cyrius.git
Cloning https://github.com/Illumina/Cyrius.git to /tmp/pip-req-build-atgcxnnp
Running command git clone --filter=blob:none --quiet https://github.com/Illumina/Cyrius.git /tmp/pip-req-build-atgcxnnp
Resolved https://github.com/Illumina/Cyrius.git to commit 9fb1c6d
ERROR: git+https://github.com/Illumina/Cyrius.git does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

Is there any way that I can install and import Cyrius so that it can be used in my workflow more efficiently? I am working in a cloud-based environment within a research project's platform. Let me know if there is a possible solution. Thank you for your help!

Using cyrius on a specific panel of genes not WGS

Hi,
I loved the article and the accuracy.
I have BAM files that contain a panel of 72 genes. I was wondering if I can use Cyrius for CYP2D6 with these specific files, or whether I have to use WGS BAM files.
Thanks in advance, looking forward to your answer.

Issue with file path error?

Hi there,

Is there a way to fix the error where the input file for the sample does not exist?

python star_caller.py --manifest test_filepath --genome 19 --prefix cyr --outDir test_output
WARNING:root:Input file for sample NA23275.bam does not exist.
INFO:root:Writing to json at 2021-11-08 00:30:38.127413
INFO:root:Writing to tsv at 2021-11-08 00:30:38.127615

The test_filepath txt file contains exactly the absolute filepath to the NA23275.bam file. It does not work for HG00611.bam within the test_data folder either. I have also tried other BAM files that have worked with other PGx programs, but I am unsure why Cyrius is giving this error.
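One way to narrow this down is to check each manifest entry independently of Cyrius. The sketch below reports entries that carry stray whitespace (including Windows line endings), are not absolute paths, or do not exist on disk. check_manifest is a hypothetical helper, and how Cyrius itself treats whitespace in manifest lines is not stated here.

```python
import os


def check_manifest(manifest_path):
    """Return a list of (line_number, entry, problem) for entries that look wrong."""
    problems = []
    with open(manifest_path) as handle:
        for number, raw in enumerate(handle, start=1):
            entry = raw.rstrip("\n")
            if not entry.strip():
                continue  # skip blank lines
            if entry != entry.strip():
                problems.append((number, entry, "leading or trailing whitespace"))
            elif not os.path.isabs(entry):
                problems.append((number, entry, "not an absolute path"))
            elif not os.path.isfile(entry):
                problems.append((number, entry, "file does not exist"))
    return problems
```

An empty result means every listed path is absolute and points to an existing file as seen by the Python process.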

Minor bug in target variants for GRCh37

Hello @xiao-chen-xc,

I was running your program with GRCh37 data and found a minor bug. I think the wrong position is used in the variant_name in the file Cyrius/data/CYP2D6_target_variant_37.txt. For example, the variant_name is g.42126611C>G, whereas the real position should be 42522613. I also found the same issue in Cyrius/data/CYP2D6_target_variant_19.txt.

Thanks !

Best regards
Benz

Warning message

Hello,

Thank you for developing this tool.

I got a Warning message from my sample:

cyrius -m bam_manifest.txt -g 38 -t 20 -p genome_cyp2d6 -o genome_cyp2d6
INFO:root:Processing sample deduped at 2023-10-29 20:35:41.743280
WARNING:root:Sample deduped has uneven coverage. CN calls may be unreliable.

The warning is self-explanatory, but I was wondering if there is any way I could get an output regardless.

WGS only?

Has anyone tested Cyrius with data from a chip or a single sequenced gene? Thank you in advance.

Question about "None" phenotype

Hello,

I wanted to ask if Cyrius yields "None" for the phenotype, is it the same as declaring "Indeterminate" for the gene's metabolizer status? I wanted to test concordance with another program!

Thank you.

Error Running CRAM Files

Hello! First off, I would like to say that I am very impressed with the Cyrius program. For research purposes I have been struggling to find an accurate tool to genotype CYP2D6 and, so far, Cyrius's accuracy has been unmatched.

I have run into an issue attempting to run GRCh38 CRAM files through Cyrius. The program reaches the "INFO:root:Processing sample SAMPLENAME at 2021-10-25 15:37:07.945922" message, runs for 2-3 minutes (which is usual for successful BAM runs), then fails with the error below.

[E::cram_read_container] Container header CRC32 failure
Traceback (most recent call last):
  File "Cyrius-1.1.1/star_caller.py", line 562, in <module>
    main()
  File "Cyrius-1.1.1/star_caller.py", line 530, in main
    bam_name, call_parameters, threads, count_file, reference_fasta, index_name=index_name
  File "Cyrius-1.1.1/star_caller.py", line 255, in d6_star_caller
    var_homo_db.dindex,
  File "Cyrius-1.1.1/depth_calling/snp_count.py", line 195, in get_supporting_reads
    bamfile_handle, nchr, dsnp1, dindex
  File "Cyrius-1.1.1/depth_calling/snp_count.py", line 127, in get_reads_by_region
    ignore_orphan=False,
  File "pysam/libcalignmentfile.pyx", line 2628, in pysam.libcalignmentfile.IteratorColumnRegion.next
ValueError: error during iteration

As recommended, I supplied the reference fasta file for the CRAM file, but this does not seem to be the issue: the error remains regardless of whether I supply a reference or not. Also, this error occurred with both WGS and WES samples, some with 30x depth and some with less. I was able to run Cyrius v1.1 on 70 GRCh37 BAM samples with no such issues.

I am still in the process of troubleshooting this but just wanted to post this in case I am unable to resolve it alone. If I find the solution I will reply to this post in case anyone else ever has a similar issue. Thank you in advance, any help would be appreciated. Let me know if any other information is needed.

interpreting d67_snp_* fields in json output

Hi Xiao,

We're trying to reproduce these 2 figures from the Cyrius manuscript:

We've identified the relevant fields from the json output to be used, namely d67_snp_call and d67_snp_raw for panel B, and Variants_called for panel C.

Since the d67_snp_* fields are not annotated with variant information, we wanted to confirm that we're interpreting them correctly. Are we OK to assume that each differentiated variant appears in the same order as in the bundled annotation files (https://github.com/Illumina/Cyrius/blob/master/data/CYP2D6_SNP_38.txt)?

Thanks in advance!

underscores in diplotype calls

Hi Xiao,

We've come across the following output from Cyrius where multiple solutions have been detected:

Sample  Genotype        Filter
SAM123  *119_*2;*1_*41  More_than_one_possible_genotype

Would you be able to comment on why the star alleles are separated by "_" instead of "/", even though there's only two alleles per solution?

Many thanks

Cyrius v1.1.1 incorrectly calls GeT-RM sample NA18565 that was correctly called by Cyrius v1.1

Hello there,

While testing Cyrius v1.1.1 for CYP2D6 calling on samples from GeT-RM, I encountered something unexpected. In a publication that used Cyrius v1.1, sample NA18565 is correctly genotyped as *10/*36x2 (supplementary materials). However, v1.1.1, which I used, calls this sample as *36/*36+*10.

The BAM file for this sample was downloaded from the ENA website and used as is.

Cyrius was called as:

python3 star_caller.py --manifest manifest.txt --genome 37 --prefix prefix --outDir outdir

Could this be due to an error on my end?

Thank you.
