
Introduction

FINSURF (Functional Identification of Non-coding Sequences Using Random Forests) is a tool designed to analyse lists of sequence variants in the human genome.

It assigns a score to each variant, reflecting its functional importance and therefore its likelihood to disrupt the physiology of its carrier. FINSURF scores single nucleotide variants (SNVs), insertions and deletions. Among SNVs, transitions and transversions are treated separately. Insertions are characterised by a score given to each base flanking the insertion point, and deletions by a score at every deleted base. FINSURF can optionally use a list of known or suspected disease genes to restrict results to variants overlapping cis-regulatory elements linked to these genes.
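As an illustration of which bases carry a score, the sketch below maps a variant to its scored positions following the rules above. It assumes VCF-style, left-anchored indels (REF and ALT share their first base at POS) and 1-based coordinates; the helper `scored_positions` is hypothetical and not part of FINSURF.

```python
from typing import List, Tuple


def scored_positions(pos: int, ref: str, alt: str) -> Tuple[str, List[int]]:
    """Return the variant kind and the 1-based positions that would carry a score.

    Hypothetical helper: assumes VCF-style left-anchored indels, where REF and
    ALT share their first (anchor) base at `pos`.
    """
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        # SNV: a single scored base (transitions and transversions are scored separately).
        return "snv", [pos]
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        # Insertion between the anchor (pos) and pos + 1: score both flanking bases.
        return "insertion", [pos, pos + 1]
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
        # Deletion of ref[1:]: score every deleted base, pos + 1 .. pos + len(ref) - 1.
        return "deletion", list(range(pos + 1, pos + len(ref)))
    raise ValueError(f"unsupported variant at {pos}: {ref}>{alt}")


print(scored_positions(100, "T", "TAG"))  # ('insertion', [100, 101])
print(scored_positions(100, "TCA", "T"))  # ('deletion', [101, 102])
```

Multi-base substitutions (e.g. `AC>GT`) are rejected here because the text above only describes SNVs, insertions and deletions.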

For a variant of interest, users can generate a graphical representation of "feature contributions", showing the relative contributions of genomic, functional or evolutionary information to its score.
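To make "relative contributions" concrete, the sketch below pools per-feature contributions into the three kinds of information mentioned above and reports each group's share of the total magnitude. It assumes contributions are signed, additive per-feature values, as produced by common random-forest contribution methods; the feature names, their grouping and the helper `contributions_by_group` are hypothetical, not FINSURF's actual columns.

```python
from collections import defaultdict
from typing import Dict, Tuple


def contributions_by_group(
    contribs: Dict[str, float], groups: Dict[str, str]
) -> Dict[str, Tuple[float, float]]:
    """Sum signed contributions per group; return {group: (sum, share of total |sum|)}.

    Groups are ordered from largest to smallest absolute contribution.
    Features missing from `groups` are pooled under "other".
    """
    per_group: Dict[str, float] = defaultdict(float)
    for feature, value in contribs.items():
        per_group[groups.get(feature, "other")] += value
    total = sum(abs(v) for v in per_group.values())
    ordered = sorted(per_group.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {g: (v, abs(v) / total if total else 0.0) for g, v in ordered}


# Hypothetical contributions for one variant.
example_contribs = {"phylop": 0.10, "gerp": 0.05, "dnase": -0.02, "gc_content": 0.03}
example_groups = {"phylop": "evolutionary", "gerp": "evolutionary",
                  "dnase": "functional", "gc_content": "genomic"}
for group, (value, share) in contributions_by_group(example_contribs, example_groups).items():
    print(f"{group:>12}  {value:+.3f}  {share:.0%}")
```

Summing signed values within a group means opposing features can cancel out; plotting per-feature values instead keeps that detail visible.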

FINSURF is implemented as Python 3 scripts.

License

This code may be freely distributed and modified under the terms of the GNU General Public License version 3 (GPL v3) and the CeCILL licence version 2 of the CNRS. These licences are contained in the files:

  1. LICENSE-GPL.txt (or on www.gnu.org)
  2. LICENCE-CeCILL.txt (or on www.cecill.info)

Copyright for this code is held by the Dyogen (DYnamic and Organisation of GENomes) team of the Institut de Biologie de l'Ecole Normale Supérieure (IBENS) 46 rue d'Ulm Paris and the individual authors.

  • Copyright © 2020 IBENS/Dyogen : Lambert MOYON, Alexandra LOUIS, Thi Thuy Nga NGUYEN, Camille Berthelot and Hugues ROEST CROLLIUS

Contact

Email finsurf {at} bio {dot} ens {dot} psl {dot} eu

If you use FINSURF, please cite:

Classification of non-coding variants with high pathogenic impact. Lambert Moyon, Camille Berthelot, Alexandra Louis, Nga Thi Thuy Nguyen, Hugues Roest Crollius PLoS Genet. 2022 Apr 29;18(4):e1010191. doi: 10.1371/journal.pgen.1010191.

Quick start

Below is a quick start guide to using FINSURF.


Installation

Installing conda

The Miniconda3 package management system manages all FINSURF dependencies, including python packages and other software.

To install Miniconda3:

  • Download the Miniconda3 installer for your system from the official Miniconda download page

  • Run the installation script: bash Miniconda3-latest-Linux-x86_64.sh or bash Miniconda3-latest-MacOSX-x86_64.sh, and accept the defaults

  • Open a new terminal, run conda update conda and press y to confirm updates

Installing FINSURF

  • Clone the repository and go to the FINSURF root folder

    git clone https://github.com/DyogenIBENS/FINSURF.git
    cd FINSURF
    
  • Create the main conda environment.

    We recommend using Mamba for a faster installation:

    conda install -c conda-forge mamba
    mamba env create -f env/finsurf.yaml
    

    Alternatively, you can use conda directly:

    conda env create -f env/finsurf.yaml
    
  • Download feature contributions and gene associations.

    Download the data files hosted on https://www.opendata.bio.ens.psl.eu/finsurf/ (about 4.8 GB for the files intersected with your variants, and 82 GB for the feature contributions):

    wget --no-check-certificate https://www.opendata.bio.ens.psl.eu/finsurf/finsurf_dataV1.tgz
    
    tar -xzvf finsurf_dataV1.tgz
    
    wget --no-check-certificate https://www.opendata.bio.ens.psl.eu/finsurf/plot_contribution_dataV1.tgz
    
    tar -xzvf plot_contribution_dataV1.tgz
    
    

    The FINSURF directory should then be organised as follows:

  • FINSURF

    • LICENSE.txt
    • README.md
    • env
    • scripts
    • static
      • data
        • 2020-05-11_table_genes_FINSURF_regions.tsv
        • FINSURF_REGULATORY_REGIONS_GENES.bed.gz
        • FINSURF_REGULATORY_REGIONS_GENES.bed.gz.tbi
        • FINSURF_model_objects
          • full-model_woTargs_columns.txt
          • rename_columns_model.tsv
        • FULL_FC_transition.tsv.gz
        • FULL_FC_transition.tsv.gz.tbi
        • FULL_FC_transversion.tsv.gz
        • FULL_FC_transversion.tsv.gz.tbi
        • NUM_FEATURES.tsv.gz
        • NUM_FEATURES.tsv.gz.tbi
        • SCALED_NUM_FEATURES.tsv.gz
        • SCALED_NUM_FEATURES.tsv.gz.tbi
        • scores_all_chroms_1e-4.tsv.gz
        • scores_all_chroms_1e-4.tsv.gz.tbi
        • samples
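After extracting both archives, a quick way to confirm that the data landed where the commands below expect it is to check for the files listed in the tree above. A minimal sketch, assuming it is run from the FINSURF root folder; `missing_data_files` is a hypothetical helper, not part of FINSURF.

```python
from pathlib import Path
from typing import List

# Paths relative to static/data, copied from the directory tree above.
EXPECTED_DATA_FILES = [
    "2020-05-11_table_genes_FINSURF_regions.tsv",
    "FINSURF_REGULATORY_REGIONS_GENES.bed.gz",
    "FINSURF_REGULATORY_REGIONS_GENES.bed.gz.tbi",
    "FINSURF_model_objects/full-model_woTargs_columns.txt",
    "FINSURF_model_objects/rename_columns_model.tsv",
    "FULL_FC_transition.tsv.gz",
    "FULL_FC_transition.tsv.gz.tbi",
    "FULL_FC_transversion.tsv.gz",
    "FULL_FC_transversion.tsv.gz.tbi",
    "NUM_FEATURES.tsv.gz",
    "NUM_FEATURES.tsv.gz.tbi",
    "SCALED_NUM_FEATURES.tsv.gz",
    "SCALED_NUM_FEATURES.tsv.gz.tbi",
    "scores_all_chroms_1e-4.tsv.gz",
    "scores_all_chroms_1e-4.tsv.gz.tbi",
]


def missing_data_files(finsurf_root: str = ".") -> List[str]:
    """Return the expected data files that are absent under <finsurf_root>/static/data."""
    data_dir = Path(finsurf_root) / "static" / "data"
    return [name for name in EXPECTED_DATA_FILES if not (data_dir / name).is_file()]


missing = missing_data_files()
print("All data files found." if not missing else "Missing:\n" + "\n".join(missing))
```

An empty result only means the files exist; it does not verify that the downloads completed without truncation.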

Usage

Setting up your working environment for FINSURF

Before any FINSURF run, you should:

  • go to the FINSURF root folder,
  • activate the conda environment with conda activate finsurf.

Running FINSURF on example data

Before using FINSURF on your data, we recommend running a test with our example data to ensure that installation was successful and to get familiar with the pipeline, inputs and outputs.

Example 1: Simple FINSURF run

To run FINSURF on example data:

python scripts/finsurf.py -i static/data/samples/variant.vcf -s static/data/scores_all_chroms_1e-4.tsv.gz -g static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz -ig static/data/samples/gene.txt

The following output should be generated: res/result_*.txt.
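The optional gene list described in the introduction restricts results to variants in cis-regulatory elements linked to those genes; in the command above this presumably corresponds to the `-g` regions file and the `-ig` gene list. The sketch below illustrates only the overlap logic, assuming regions have already been parsed into `(chrom, start, end, gene)` tuples with BED-style 0-based, half-open coordinates, and variants use 1-based VCF positions. The tuple layout and the helper `restrict_to_genes` are assumptions, not FINSURF's implementation; a genome-wide file would be queried through its tabix index (`.tbi`) rather than scanned linearly.

```python
from typing import Dict, Iterable, List, Set, Tuple

Region = Tuple[str, int, int, str]  # (chrom, start, end, gene); BED: 0-based, half-open
Variant = Tuple[str, int]           # (chrom, pos); VCF: 1-based


def restrict_to_genes(
    variants: Iterable[Variant], regions: Iterable[Region], genes: Set[str]
) -> List[Tuple[str, int, str]]:
    """Keep variants falling in a regulatory region linked to one of `genes`."""
    # Index only the regions whose associated gene is in the list.
    by_chrom: Dict[str, List[Tuple[int, int, str]]] = {}
    for chrom, start, end, gene in regions:
        if gene in genes:
            by_chrom.setdefault(chrom, []).append((start, end, gene))
    hits = []
    for chrom, pos in variants:
        for start, end, gene in by_chrom.get(chrom, []):
            # A 1-based position lies in BED [start, end) when start < pos <= end.
            if start < pos <= end:
                hits.append((chrom, pos, gene))
    return hits


example_regions = [("chr1", 100, 200, "GENE_A"), ("chr1", 150, 300, "GENE_B")]
print(restrict_to_genes([("chr1", 101), ("chr1", 250)], example_regions, {"GENE_A"}))
```

A variant overlapping regions of several listed genes is reported once per gene.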

To run FINSURF on the 49 variants from Genomizer:

python scripts/finsurf.py -i static/data/samples/Genomizer_49_var.vcf -s static/data/scores_all_chroms_1e-4.tsv.gz -g static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz -ig static/data/samples/Genomizer_49_var_GENES.tsv
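To run several VCF files in a row, the command lines above can be assembled from Python. A minimal sketch, assuming it is launched from the FINSURF root folder with the `finsurf` environment active; `finsurf_command` and `run_finsurf` are hypothetical wrappers that only reproduce the flags shown above, and leaving out `-ig` when no gene list is given is an assumption based on the gene list being described as optional.

```python
import shlex
import subprocess
from typing import List, Optional

DATA = "static/data"


def finsurf_command(vcf: str, gene_list: Optional[str] = None) -> List[str]:
    """Build the scripts/finsurf.py command used in the examples above."""
    cmd = [
        "python", "scripts/finsurf.py",
        "-i", vcf,
        "-s", f"{DATA}/scores_all_chroms_1e-4.tsv.gz",
        "-g", f"{DATA}/FINSURF_REGULATORY_REGIONS_GENES.bed.gz",
    ]
    if gene_list is not None:
        # Assumption: -ig may be omitted when no gene list is used.
        cmd += ["-ig", gene_list]
    return cmd


def run_finsurf(vcf: str, gene_list: Optional[str] = None) -> None:
    """Run FINSURF on one VCF; raises CalledProcessError if the script fails."""
    cmd = finsurf_command(vcf, gene_list)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


print(shlex.join(finsurf_command("static/data/samples/variant.vcf",
                                 "static/data/samples/gene.txt")))
```

For example, `run_finsurf("static/data/samples/Genomizer_49_var.vcf", "static/data/samples/Genomizer_49_var_GENES.tsv")` would reproduce the Genomizer run. Passing a list to `subprocess.run` avoids shell quoting issues with unusual file names.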

To plot the feature contributions for one specific variant:

python scripts/plot_contribution.py --variant "chr1:12005" --vartype "transition" --rename_cols_table static/data/FINSURF_model_objects/rename_columns_model.tsv --numFeat_path static/data/NUM_FEATURES.tsv.gz --scaled_numFeat_path static/data/SCALED_NUM_FEATURES.tsv.gz --featCont_transition_path static/data/FULL_FC_transition.tsv.gz --featCont_transversion_path static/data/FULL_FC_transversion.tsv.gz

To plot the feature contributions for one specific variant from the Genomizer dataset:

python scripts/plot_contribution.py --variant "chr8:21988220" --vartype "transition" --rename_cols_table static/data/FINSURF_model_objects/rename_columns_model.tsv --numFeat_path static/data/NUM_FEATURES.tsv.gz --scaled_numFeat_path static/data/SCALED_NUM_FEATURES.tsv.gz --featCont_transition_path static/data/FULL_FC_transition.tsv.gz --featCont_transversion_path static/data/FULL_FC_transversion.tsv.gz

The script should generate an HTML file in the res directory.
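The `--vartype` argument must match the SNV class: transitions exchange a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), and every other substitution is a transversion. The sketch below derives it from a VCF record's REF and ALT and assembles the same `plot_contribution.py` command as in the examples above. It assumes `--variant` takes the VCF chromosome and 1-based POS as `chrom:pos`, as those examples suggest; `snv_vartype` and `plot_command` are hypothetical helpers.

```python
from typing import List

DATA = "static/data"
PURINES = {"A", "G"}


def snv_vartype(ref: str, alt: str) -> str:
    """Classify an SNV as 'transition' (A<->G, C<->T) or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt or not {ref, alt} <= set("ACGT"):
        raise ValueError(f"not a simple SNV: {ref}>{alt}")
    # Same chemical class (both purines or both pyrimidines) means a transition.
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def plot_command(chrom: str, pos: int, ref: str, alt: str) -> List[str]:
    """Build the scripts/plot_contribution.py command for one SNV."""
    return [
        "python", "scripts/plot_contribution.py",
        "--variant", f"{chrom}:{pos}",
        "--vartype", snv_vartype(ref, alt),
        "--rename_cols_table", f"{DATA}/FINSURF_model_objects/rename_columns_model.tsv",
        "--numFeat_path", f"{DATA}/NUM_FEATURES.tsv.gz",
        "--scaled_numFeat_path", f"{DATA}/SCALED_NUM_FEATURES.tsv.gz",
        "--featCont_transition_path", f"{DATA}/FULL_FC_transition.tsv.gz",
        "--featCont_transversion_path", f"{DATA}/FULL_FC_transversion.tsv.gz",
    ]
```

With REF and ALT taken from your VCF, the resulting list can be passed to `subprocess.run(..., check=True)` from the FINSURF root folder.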
