absplice's Introduction

AbSplice: aberrant splicing prediction across human tissues

AbSplice is a method that predicts aberrant splicing across human tissues, as described in Wagner, Çelik et al., Nature Genetics 2023.

Precomputed AbSplice-DNA scores for all possible single-nucleotide variants genome-wide are available here (hg38) and here (hg19) for download.

AbSplice predictions are computed from VCF files and are based on enhanced tissue-specific splice site annotations (SpliceMaps). The scores represent the probability that a given variant causes aberrant splicing in a given tissue.

AbSplice-DNA: if only DNA is available, different DNA-based splicing predictions, as well as information from tissue-specific SpliceMaps, are combined into the integrative model AbSplice-DNA (see example use case).

AbSplice-RNA: if RNA-seq from clinically accessible tissues (e.g. blood or skin) is available, these direct splicing measurements can be used to predict aberrant splicing in another tissue from the same individual with the model AbSplice-RNA.

License

The source code to create and use SpliceMaps is under the MIT license. Pre-computed SpliceMaps are under the MIT license. The source code of AbSplice is under the MIT license. The pre-trained AbSplice models as well as the pre-computed AbSplice scores are under the CC BY NC 4.0 license for academic and non-commercial use. This is because running AbSplice predictions makes use of trained SpliceAI models, which Illumina currently licenses under CC BY NC 4.0 (as of 21st December 2023).

Installation

With container

Instead of Docker you can also use Podman (you just need to replace docker with podman in all the commands).

Download the image archive (file size is 5GB):

wget https://zenodo.org/record/8095625/files/absplice.oci

Load the image from archive:

docker load -i absplice.oci

Run the image with command line interface:

docker run -it --name absplice_container localhost/absplice:latest /bin/bash

Now you are working inside the container. The conda environment is already installed here; you just need to activate it:

conda activate absplice_dock

Clone the AbSplice repository to the container:

git clone https://github.com/gagneurlab/absplice.git
cd absplice

With creating a conda environment

Clone git repo:

git clone https://github.com/gagneurlab/absplice.git

cd into repo directory:

cd absplice

Install conda environment:

# Recommended if you have mamba installed
mamba env create -f environment.yaml
# otherwise
conda env create -f environment.yaml

Activate conda environment:

conda activate absplice

Install modules from absplice:

pip install -e .

Output

Note: if you run AbSplice on large datasets, you might experience memory issues with the full output. In such cases, and if you do not need the additional information from SpliceAI and SpliceMaps, we suggest setting the config fields extra_info_dna and extra_info_rna to False (both are True by default). This will crop the output, leaving only the columns with AbSplice features and unique row identifiers.
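
Independently of these config fields, downstream code can keep memory use low by streaming the output instead of loading it whole. A minimal sketch, assuming the output is a CSV file with the variant, gene_id, tissue and AbSplice_DNA columns described below; the function name and the file path in the usage comment are hypothetical:

```python
import csv
from typing import Dict, Iterable, Tuple


def max_score_per_variant_gene(
    rows: Iterable[dict],
) -> Dict[Tuple[str, str], Tuple[float, str]]:
    """Keep, for every (variant, gene_id), the highest AbSplice_DNA score
    and the tissue in which it was observed.

    Rows are consumed one at a time, so memory grows with the number of
    variant/gene pairs rather than with the number of rows.
    """
    best: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for row in rows:
        raw = row.get("AbSplice_DNA", "")
        if raw in ("", "NA", "nan"):  # skip rows without a score (assumed encodings)
            continue
        score = float(raw)
        key = (row["variant"], row["gene_id"])
        if key not in best or score > best[key][0]:
            best[key] = (score, row["tissue"])
    return best


# Usage (hypothetical path):
# with open("absplice_dna.csv", newline="") as fh:
#     best = max_score_per_variant_gene(csv.DictReader(fh))
```

Taking the maximum across tissues is only one possible reduction; a tissue-specific analysis would filter on the tissue column instead.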

The full output of AbSplice is tabular data in which variant, gene_id and tissue together form the unique row identifier. It contains the following columns:

| ID | Column | Description |
| --- | --- | --- |
| variant | Variant | Variant as chrom:pos:ref>alt. |
| gene_id | GeneID | Ensembl GeneID of the gene which the variant belongs to. |
| tissue | Tissue | Name of the tissue that was used from SpliceMap. |
| AbSplice_DNA | AbSplice-DNA | The AbSplice score is a probability estimate of how likely aberrant splicing of some sort takes place in a given tissue and reports the splice site with the strongest effect. The model was trained using scores from MMSplice and SpliceAI models as well as annotation features from tissue-specific SpliceMaps. To ease downstream applications we suggest three cutoffs (high: 0.2, medium: 0.05, low: 0.01), which approximately have the same recalls as the high, medium and low cutoffs of SpliceAI. |
| delta_score | SpliceAI DeltaScore | Input feature to AbSplice. The main score predicted by SpliceAI, computed as the maximum of the acceptor gain, acceptor loss, donor gain, and donor loss delta scores. The score represents the probability of the variant being splice-altering. |
| delta_logit_psi | MMSplice + SpliceMap score | Input feature to AbSplice. The score is computed by using SpliceMap as an annotation for MMSplice. It shows the effect of the variant on the inclusion level (PSI, percent spliced in) of the junction and is on a logit scale. A positive score indicates that the variant leads to a higher inclusion rate for the junction; a negative score indicates that it leads to a higher exclusion rate. |
| delta_psi | MMSplice + SpliceMap + Ψ_ref score | Input feature to AbSplice. The delta_psi (∆Ψ) score is computed by converting delta_logit_psi (∆logit(Ψ)) to natural scale with the splicing scaling law and ref_psi (Ψ_ref): ∆Ψ = σ(∆logit(Ψ) + logit(Ψ_ref)) − Ψ_ref |
| splice_site_is_expressed | Splice site expressed | Input feature to AbSplice. Binary feature indicating if the splice site is expressed in the target tissue using a cutoff of 10 split reads (median_n from SpliceMap). |
| junction | Intron | Coordinates of the respective intron from SpliceMap. |
| event_type | Type of splicing event | The event_type takes values either psi3 or psi5 for alternative donor or alternative acceptor site usage (from SpliceMap). |
| splice_site | Splice site location | Coordinates of the splice site. |
| ref_psi | Ψ reference score | Reference level of site usage from SpliceMap. |
| median_n | Median coverage Splice site | The median number of split reads sharing the splice site (from SpliceMap). |
| acceptor_gain | SpliceAI Delta score (acceptor gain) | Probability computed by SpliceAI that the variant will lead to an acceptor gain at acceptor_gain_position. |
| acceptor_loss | SpliceAI Delta score (acceptor loss) | Probability computed by SpliceAI that the variant will lead to an acceptor loss at acceptor_loss_position. |
| donor_gain | SpliceAI Delta score (donor gain) | Probability computed by SpliceAI that the variant will lead to a donor gain at donor_gain_position. |
| donor_loss | SpliceAI Delta score (donor loss) | Probability computed by SpliceAI that the variant will lead to a donor loss at donor_loss_position. |
| acceptor_gain_position | SpliceAI Delta position (acceptor gain) | Delta position represents the location of respective splicing change relative to the variant position: positive values are downstream of the variant, negative values are upstream. |
| acceptor_loss_position | SpliceAI Delta position (acceptor loss) | See description of acceptor_gain_position. |
| donor_gain_position | SpliceAI Delta position (donor gain) | See description of acceptor_gain_position. |
| donor_loss_position | SpliceAI Delta position (donor loss) | See description of acceptor_gain_position. |
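
The delta_psi formula and the suggested cutoffs from the table above can be sketched in a few lines of Python. This is an illustrative sketch, not the AbSplice implementation: the function names, the clipping constant eps, the below_low label and the use of >= at each cutoff are assumptions.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def delta_psi(delta_logit_psi: float, ref_psi: float, eps: float = 1e-5) -> float:
    """delta_psi = sigmoid(delta_logit_psi + logit(ref_psi)) - ref_psi.

    ref_psi is clipped to [eps, 1 - eps] so that logit stays finite;
    the clipping is an assumption of this sketch.
    """
    clipped = min(max(ref_psi, eps), 1.0 - eps)
    return sigmoid(delta_logit_psi + logit(clipped)) - clipped


# Suggested cutoffs from the AbSplice_DNA row, ordered from strictest to loosest.
CUTOFFS = (("high", 0.2), ("medium", 0.05), ("low", 0.01))


def absplice_category(score: float) -> str:
    """Return the strictest suggested cutoff that the score reaches."""
    for label, cutoff in CUTOFFS:
        if score >= cutoff:
            return label
    return "below_low"  # hypothetical label for scores under the low cutoff
```

A negative delta_logit_psi yields a negative delta_psi, and the same delta_logit_psi changes Ψ most when ref_psi is near 0.5 and least when it is near 0 or 1.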

Example use case

The example folder contains a snakemake workflow that generates AbSplice predictions from a vcf file and a fasta file (for either hg19 or hg38; the fasta file is downloaded automatically).
The snakemake workflow will download precomputed SpliceMaps from Zenodo and run AbSplice based on these annotations. To generate predictions run:

cd example/workflow
python -m snakemake -j 1 --use-conda

AbSplice-DNA:

To run the workflow on your own data do the following:

  • Store all (or provide a symlink to) vcf files for analysis in data/resources/analysis_files/vcf_files/.

  • Specify the genome version that you are going to use (hg19 or hg38) in the field genome of the config file.

  • In the field splicemap_tissues of the config file you can uncomment the tissues that AbSplice will use to generate predictions.

Optionally:

  • If you want to run the example on large datasets, you can enable a fast lookup interface spliceai_rocksdb that uses precomputed SpliceAI scores. The first time you use it, the precomputed database will be downloaded (this takes about 1 hour and uses approximately 180GB of storage). To enable fast lookup for SpliceAI, change the field use_rocksdb in the config file to True.

For users who work with the container:

To run AbSplice on your own vcf-files, you need to copy them from your disk to the container. If you are inside the container, run:

exit

To copy a vcf-file from your disk to the container run:

docker cp path/on/your/disk absplice_container:/app/absplice/example/data/resources/analysis_files/vcf_files/

To execute the container run:

docker start absplice_container
docker exec -it absplice_container /bin/bash

To edit the config file inside the container, use the pre-installed editor nano as follows (or optionally install any other editor):

nano config.yaml

To close the editor press Ctrl+X, choose whether to save the changes: Y or N, and press Enter.

AbSplice-RNA:

AbSplice-RNA combines DNA and RNA information. For each individual, DNA-based predictions of variants are combined with RNA-based predictions/measurements for junctions in the vicinity of the variants. The inputs are vcf files (either single-sample or multisample) from DNA and the results from running FRASER on RNA-seq samples using DROP.

The DNA IDs in the vcf file have to match the DNA IDs in the DNA_ID column of the sample annotation file from DROP.
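
The ID requirement above can be checked before starting the workflow. A minimal sketch, assuming uncompressed VCF text and a tab-separated sample annotation with a DNA_ID column; the function names are hypothetical and not part of AbSplice or DROP:

```python
import csv
import io
from typing import List


def vcf_sample_ids(vcf_text: str) -> List[str]:
    """Return the sample names from the #CHROM header line of a VCF."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            fields = line.split("\t")
            # The nine fixed columns end with FORMAT; the rest are sample names.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def missing_dna_ids(vcf_text: str, sample_annotation_tsv: str) -> List[str]:
    """List VCF samples that do not appear in the DNA_ID column."""
    reader = csv.DictReader(io.StringIO(sample_annotation_tsv), delimiter="\t")
    annotated = {row["DNA_ID"] for row in reader if row.get("DNA_ID")}
    return [s for s in vcf_sample_ids(vcf_text) if s not in annotated]
```

An empty list from missing_dna_ids means every sample in the vcf file has a matching DNA_ID entry.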

Example output for AbSplice-RNA.

To run AbSplice-RNA on your own data you need to:

  • Set the field AbSplice_RNA of the config file to True.
  • Run the aberrant splicing module of DROP and copy the root directory of DROP into this folder.
  • Specify the names of the DROP groups in the field DROP_group of the config file.
  • Specify the gene annotation that was used to run DROP in the field geneAnnotation of the config file.

absplice's Issues

Absplice-RNA

Hi,

I am trying to implement Absplice-RNA with a VCF file and RNAseq data from the same individual. I see the example code in README is made for Absplice-DNA. Could you point out how to use Absplice-RNA?

Thanks!
Yan

Can't find cause of the error

AbSplice was launched with the command from the tutorial, python -m snakemake -j 1 --use-conda, from the AbSplice container, which was installed strictly following the instructions in README.md.

End of the output:

[Fri Feb  9 17:37:54 2024]
Error in rule mmsplice_splicemap:
    jobid: 8
    input: ../data/resources/analysis_files/vcf_files/DIV_train_all_annotated.vcf.gz, ../data/resources/downloaded_files/GRCh38.primary_assembly.genome.fa, ../data/resources/downloaded_files/splicemap_hg38/Adipose_Subcutaneous_splicemap_psi5_method=kn_event_filter=median_cutoff.csv.gz, ../data/resources/downloaded_files/splicemap_hg38/Adipose_Subcutaneous_splicemap_psi3_method=kn_event_filter=median_cutoff.csv.gz
    output: ../data/results/hg38/model_scores_from_absplice_features/DIV_train_all_annotated.vcf.gz_MMSplice_SpliceMap.csv

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete

I think that the VCF file may be the cause of the problem (despite successful annotation by SpliceAI and SPiP). It is attached below (I changed the extension to txt to upload it on GitHub).
DIV_train_all_annotated.txt

Can you help me to find the cause of the error? Thank you!

How to get CATs data

I want to get the CATs data to validate the performance of absplice. How can I get it?

SpliceAI running time

Hello,
I am testing absplice on a human WGS sample using the default configuration (and tissues).
I am currently running the snakemake workflow inside the docker container using 20 threads.

MMSplice finished quite quickly, but SpliceAI has been running for 6 days and is still at chr2...

Is there a way to speed SpliceAI up?
Does SpliceAI scale up with the threads given as input to snakemake?

Splicemap_tissues

Dear Developer

Thank you for making the Docker container.

I want to know about "Splicemap_tissues"

If I have a blood DNA sample but want to know the splicing mutations in the liver, can I select Splicemap_tissues for the liver?

License

Hello developers of absplice,

First of all many thanks for such a nice open-source project to improve splicing variant prediction and annotation.
I'm opening this issue because I struggle to find a license for using the code or precomputed scores. Do you happen to have a license for your wonderful project?

Container Environment problem

Hello,

We are trying to make the container work on our slurm based HPCC. We don't have docker available to us on the server and images before allowing them to be used but .oci isn't accepted. They asked for a tar.gz version of the image to convert to a sif before we could use it.

I installed docker locally, loaded the oci following the commands on git:
docker load -i absplice.oci
docker run -it --name absplice_container localhost/absplice:latest /bin/bash
Then saved to a tar.gz:
docker save localhost/absplice:latest | gzip > absplice_latest.tar.gz
The IT team created a sif for us using this and gave us the following command to load it.
singularity shell -B /n /n/app/singularity/containers/absplice.sif

However, I run into an error trying to load the conda environment and conda init bash has no effect. Is this a problem with the system that we have to run it, the process of converting to tar.gz, or the image itself? And are there any recommendations for how to get around this problem?

[user@compute-node ~]$ singularity shell -B /n /path/to/absplice.sif
Apptainer> conda activate absplice_dock

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

$ conda init <SHELL_NAME>

Currently supported shells are:

  • bash
  • fish
  • tcsh
  • xonsh
  • zsh
  • powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

Thank you so much for your help!!!
Shayna

Can't get attribute 'EBMPreprocessor' on <module 'interpret.glassbox.ebm.ebm'

Hey guys, your tool looks really interesting to me! :)
And so I wanted to give it a try.

However, I stumbled over this error when running the example on my Linux-server.
Do you know how to get around it?

[Sat Dec 24 01:35:02 2022]
rule absplice_dna:
    input: mmsplice_splicemap.csv, spliceai.vcf
    output: absplice_dna.csv
    jobid: 1
    reason: Missing output files: absplice_dna.csv; Input files updated by another job: mmsplice_splicemap.csv
    resources: tmpdir=/tmp

2022-12-24 01:35:06.874355: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/mi/olivek95/git_projects/absplice/example/.snakemake/scripts/tmp5a1c9dig.absplice_dna.py", line 11, in <module>
    splicing_result.predict_absplice_dna()
  File "/home/mi/olivek95/git_projects/absplice/absplice/result.py", line 568, in predict_absplice_dna
    model = pickle.load(open(pickle_file, 'rb'))
AttributeError: Can't get attribute 'EBMPreprocessor' on <module 'interpret.glassbox.ebm.ebm' from '/buffer/ag_bsc/PS_SEQAN_STUDENTS/olivek95/miniconda3/envs/absplice/lib/python3.9/site-packages/interpret/glassbox/ebm/ebm.py'>
[Sat Dec 24 01:35:17 2022]
Error in rule absplice_dna:
    jobid: 1
    input: mmsplice_splicemap.csv, spliceai.vcf
    output: absplice_dna.csv

RuleException:
CalledProcessError in file /home/mi/olivek95/git_projects/absplice/example/Snakefile, line 88:
Command 'set -euo pipefail;  /buffer/ag_bsc/PS_SEQAN_STUDENTS/olivek95/miniconda3/envs/absplice/bin/python /home/mi/olivek95/git_projects/absplice/example/.snakemake/scripts/tmp5a1c9dig.absplice_dna.py' returned non-zero exit status 1.
  File "/home/mi/olivek95/git_projects/absplice/example/Snakefile", line 88, in __rule_absplice_dna
  File "/buffer/ag_bsc/PS_SEQAN_STUDENTS/olivek95/miniconda3/envs/absplice/lib/python3.9/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-12-24T013405.664404.snakemake.log

Initial command:
cd example; python -m snakemake -j 1

Bugs in Absplice implementation

Dear Dr. Gagneur.

We are also working on the implementation of AbSplice reported in this issue.
I have been following the readme at the following URL, but an error occurred with the test sample.
I would like to know what to do in this case.

The command that caused the error is as follows
python -m snakemake -j 1 --use-conda

The following is the ERROR message.

Error in rule download_splicemaps:
jobid: 4
output: splicemap_gtex_v8
shell:
splicemap_download --version gtex_v8 --splicemap_dir splicemap_gtex_v8 --tissues Testis --tissues Cells_Cultured_fibroblasts
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-05-17T141553.827340.snakemake.log
