yHydra: Deep Learning enables an Ultra Fast Open Search by Jointly Embedding MS/MS Spectra and Peptides of Mass Spectrometry-based Proteomics

This code is an implementation of the open search of yHydra as described in our preprint: https://doi.org/10.1101/2021.12.01.470818

Disclaimer:

This repository implements a GPU-accelerated open search that uses the joint embeddings of yHydra. It contains trained yHydra models; the yHydra training pipeline will be made available soon. Because this is ongoing research, the code may not reflect the final performance of yHydra.

Getting started

System requirements:

Note: installation of required packages takes up to several minutes.

Start by cloning this GitHub repository:

git clone https://github.com/tzom/yHydra
cd yHydra

If you are running in WSL, first convert the files to Unix line endings:

sudo apt install dos2unix
dos2unix *

To install the required packages, create a new conda environment (yhydra_env.yml installs the required packages automatically):

conda env create -f yhydra_env.yml
conda activate yhydra_env
bash install_thermorawfileparser.sh
conda install -y pandas lxml
sudo apt install jq

Run the example

The following commands will download example data (from https://www.ebi.ac.uk/pride/archive/projects/PXD007963) and run the pipeline of yHydra:

mkdir example
wget -nc -i example_data_urls.txt -P example/
gzip example/SynPCC7002_Cbase.fasta
bash run.sh config.yaml

Run yHydra

To run yHydra, specify the locations of the input files in ./config.yaml:

# Input - File Locations
FASTA: example/*.fasta.gz
RAWs: example/*.raw

# Output - Results directory
RESULTS_DIR: example/search

# General Parameters
BATCH_SIZE: 64
...

Then run yHydra with the specified parameters (./config.yaml):

bash run.sh config.yaml
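
Before starting a long run, it can help to confirm that the glob patterns in the config actually match files. The following is a minimal sketch, not part of yHydra: read_flat_config and resolve_inputs are hypothetical helpers, and the parser only handles flat top-level KEY: value lines such as those shown above (it is not a full YAML parser).

```python
import glob
from pathlib import Path


def read_flat_config(path):
    """Parse top-level `KEY: value` lines; comments and other lines are skipped.

    Simplification: this is not a YAML parser and ignores nested structures.
    """
    config = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()
    return config


def resolve_inputs(config, keys=("FASTA", "RAWs")):
    """Expand the glob pattern stored under each key into a sorted file list."""
    return {key: sorted(glob.glob(config.get(key, ""))) for key in keys}


def report_inputs(path="config.yaml"):
    """Print how many files each input pattern matches."""
    config = read_flat_config(path)
    for key, files in resolve_inputs(config).items():
        print(f"{key}: {len(files)} file(s) match {config.get(key)!r}")
```

Calling report_inputs("config.yaml") from the repository root prints one line per input key; a count of 0 suggests a wrong path or, for the example data, a FASTA file that has not been gzipped yet.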

Inspect search results

The search results are written as a dataframe to .hdf files in the results directory (set by RESULTS_DIR in ./config.yaml, e.g. RESULTS_DIR: example/search). To get a glimpse of the identifications, run:

python inspect_search.py

This prints output like the following:

(yhydra_env) animeshs@DMED7596:/mnt/f/OneDrive - NTNU/yHydra$ python inspect_search.py
                  raw_file     id                                           is_decoy  precursorMZ      pepmass  ...           best_peptide peptide_mass  delta_mass         q           accession
0       qe2_03132014_1WT-1  13677  [False, False, True, True, True, True, False, ...   699.032500  2094.074025  ...  ADTAGVHGAALGADEIELTRK  2094.070485    0.003540  0.000000  [SYNPCC7002_A1022]
1      qe2_03132014_13WT-3  17071  [False, False, False, True, False, True, True,...   900.941528  1799.867407  ...      DIVTQFHGAEAAVDAEK  1799.868945   -0.001538  0.000000  [SYNPCC7002_A1609]
2       qe2_03132014_1WT-1  20532  [False, False, False, False, True, False, Fals...   925.985352  1849.955053  ...     TLIEGLDEISHGGLPSGR  1849.953345    0.001708  0.000000  [SYNPCC7002_A0287]
3       qe2_03132014_5WT-2  18652  [False, False, True, True, False, False, False...   779.409100  2335.203825  ...  SIEAEQLKDDLPTIHVGDTVR  2335.201905    0.001920  0.000000  [SYNPCC7002_A1033]
4       qe2_03132014_1WT-1  17209  [False, False, True, True, False, True, False,...   900.942505  1799.869360  ...      DIVTQFHGAEAAVDAEK  1799.868945    0.000415  0.000000  [SYNPCC7002_A1609]
...                    ...    ...                                                ...          ...          ...  ...                    ...          ...         ...       ...                 ...
23643  qe2_03132014_13WT-3  17280  [True, False, False, True, False, True, True, ...   473.229431   944.443212  ...                MFDIFTR   928.447665   15.995548  0.009897  [SYNPCC7002_A2209]
23644   qe2_03132014_5WT-2   3564  [True, False, True, False, True, True, False, ...   329.179352  1312.686107  ...           KEESELIDAHGK  1354.672825  -41.986718  0.009980  [SYNPCC7002_A2459]
23645  qe2_03132014_13WT-3  14744  [False, True, False, True, False, False, False...   757.902893  1513.790136  ...          AEKNIILSIEDIR  1512.851125    0.939011  0.009980  [SYNPCC7002_F0019]
23646   qe2_03132014_5WT-2   4053  [False, True, False, True, False, False, False...   310.158813   927.452965  ...                RGGGGDR   673.325565  254.127401  0.009980  [SYNPCC7002_A0148]
23647  qe2_03132014_13WT-3   4498  [True, False, True, False, True, False, False,...   452.226990  1353.657494  ...          TNYVPHVSFTGTK  1449.725215  -96.067721  0.009980  [SYNPCC7002_A2578]

[23648 rows x 19 columns]
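
The mass columns in this output appear to be related by simple arithmetic, which is useful when reading open-search hits. The sketch below reproduces three of the printed rows under stated assumptions: the charge states are not visible in the truncated output and are inferred here, the constant 1.007825 is the value that matches the printed numbers rather than one taken from the yHydra source, and is_mass_shifted with its 0.01 Da tolerance is a hypothetical helper for illustration only.

```python
# Assumption: the printed pepmass values match subtracting the mass of a
# hydrogen atom (1.007825 Da) per charge; inferred from the output above.
H_MASS = 1.007825


def neutral_mass(precursor_mz, charge):
    """Neutral precursor mass for an assumed charge state."""
    return (precursor_mz - H_MASS) * charge


def delta_mass(pepmass, peptide_mass):
    """Observed precursor mass minus the mass of the matched peptide."""
    return pepmass - peptide_mass


def is_mass_shifted(delta, tolerance=0.01):
    """True if |delta| exceeds a hypothetical tolerance (in Da)."""
    return abs(delta) > tolerance


# (precursorMZ, assumed charge, peptide_mass, best_peptide)
# copied from rows 0, 1 and 23643 of the output above.
rows = [
    (699.032500, 3, 2094.070485, "ADTAGVHGAALGADEIELTRK"),
    (900.941528, 2, 1799.868945, "DIVTQFHGAEAAVDAEK"),
    (473.229431, 2, 928.447665, "MFDIFTR"),
]

for mz, charge, peptide_mass, peptide in rows:
    pepmass = neutral_mass(mz, charge)
    delta = delta_mass(pepmass, peptide_mass)
    print(f"{peptide:>22}  pepmass={pepmass:.6f}  "
          f"delta={delta:+.6f}  shifted={is_mass_shifted(delta)}")
```

Rows with a delta_mass near zero correspond to peptides matched at their unmodified mass, while larger values (such as the roughly 15.9955 Da shift in row 23643) are the mass differences that an open search permits.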

Author

Tom Altenburg (tzom)
