PS2MS

Repository for the NYCU JHHLab NPS detection project, containing the code for the paper "PS2MS".

Predict mass spectra (NEIMS)

Usage:

Prepare dataset

python3 make_train_test_split.py --main_sdf_name=path/to/mainlib_merge.SDF --replicates_sdf_name=path/to/test.SDF --output_master_dir=path/to/output/dir/spectra_tf_records
  • --main_sdf_name: SDF file containing the mass spectra of all compounds, covering both the training and test sets.
  • --replicates_sdf_name: SDF file containing the mass spectra of the test-set compounds.
  • --output_master_dir: Output directory for the preprocessed data.

Training

python3 molecule_estimator.py \
  --dataset_config_file=path/to/output/dir/spectra_tf_records/query_replicates_val_predicted_replicates_val.json \
  --train_steps=10000 \
  --model_dir=path/of/output/model/models/output \
  --hparams=make_spectra_plots=True,mask=False,mass_power=0 --alsologtostderr
  • --dataset_config_file: Path to the configuration file (JSON) of the preprocessed dataset.
  • --model_dir: Directory in which the model is stored.

Predicting

python3 make_spectra_prediction.py \
  --input_file=path/to/test_SMILES.txt \
  --output_file=path/to/test.SDF \
  --weights_dir=path/of/output/model/models/output
  • --input_file: SMILES of the test set, in txt format.
  • --output_file: Predicted mass spectra of the test set, in SDF format.
  • --weights_dir: Path to the trained model.

Convert file format

python3 sdf_to_msp.py ${input_file} ${is_predict_spectrum}
  • Converts a file from SDF format to msp format.
  • ${is_predict_spectrum}: A boolean value. Set it to True if the input file contains predicted mass spectra.
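
To make the target format concrete, the following is a minimal sketch of how a single msp-style record can be laid out: a `Name:` line, optional `Key: value` header lines, a `Num Peaks:` line, then one `m/z intensity` pair per line. It is not the code of `sdf_to_msp.py`. The function name `format_msp_record`, the `SMILES` header field, and the peak values are assumptions made only for illustration, and the exact fields the project's script writes are not stated here.

```python
def format_msp_record(name, peaks, extra_fields=None):
    """Return one msp-style record as a string.

    peaks is a list of (m/z, intensity) pairs. extra_fields is an optional
    dict of additional "Key: value" header lines (hypothetical, e.g. SMILES).
    """
    lines = [f"Name: {name}"]
    for key, value in (extra_fields or {}).items():
        lines.append(f"{key}: {value}")
    # Drop zero-intensity peaks so that Num Peaks matches the listed rows.
    kept = [(mz, inten) for mz, inten in peaks if inten > 0]
    lines.append(f"Num Peaks: {len(kept)}")
    # List the peaks in ascending m/z order.
    for mz, inten in sorted(kept):
        lines.append(f"{mz} {inten}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up peak values, used only to show the layout.
    print(format_msp_record(
        "example compound",
        [(77, 120), (105, 999), (51, 0)],
        {"SMILES": "CC(N)C(=O)c1ccccc1"},
    ))
```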

Predict the fingerprints (DeepEI)

Usage:

  • Please refer to the README.md in the repository of DeepEI.

Combine the mass spectrum and fingerprint

python3 merge_fp_into_msp.py \
  ${MS.msp} \
  ${FP.pkl} \
  ${result.msp}

Enumerate the derivatives of Cathinone (Enumeration)

  • By Samuel

Usage:

Install conda env

conda env create --name enumerate -f enumerate/environment.yml

Build project

cd enumerate 
conda activate enumerate

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j

Run permutation

cd enumerate 
conda activate enumerate

./build/build_database <split_count> <split_idx> <output_dir>

# for example: ./build/build_database 200 0 /tmp
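
Because the enumeration is divided into `<split_count>` parts, every split index has to be run to obtain the full database. The sketch below generates one command per split. It assumes that `<split_idx>` ranges from 0 to `split_count - 1`, which the example above suggests but does not state. The helper names `build_database_commands` and `run_all` are hypothetical and not part of the project.

```python
import subprocess


def build_database_commands(split_count, output_dir,
                            binary="./build/build_database"):
    """Return one argument list per split index (assumed 0 .. split_count - 1)."""
    if split_count <= 0:
        raise ValueError("split_count must be positive")
    return [
        [binary, str(split_count), str(idx), output_dir]
        for idx in range(split_count)
    ]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in build_database_commands(4, "/tmp"):
        print(" ".join(cmd))
```

Since each command is independent of the others, the same list could also be handed to a job scheduler rather than run sequentially.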

Compare drugs with the enumerated database (Drug detection)

  • Source code: cfm-id-code
    • This project reuses its data types and its function for computing the cosine similarity between two mass spectra.

Usage

Build project

Follow the steps in INSTALL.txt.

Run detection

  • Condition 1: The identities of the analytes are known.
    • database_file: msp file containing the mass spectra and fingerprints of the enumerated compounds.
    • test_file: msp file containing the mass spectra and fingerprints of the analytes.
    • result.txt: Records the ranking performance for every analyte. For each analyte, the first line gives the rank and SMSF score of the correct answer in the enumerated database. The following lines give the SMILES and SMSF score of every molecule ranked higher than the answer.
    • scores.txt: Records the similarity scores for every analyte. For each analyte, it lists the cosine similarity of the mass spectra, the Jaccard similarity of the mass spectra, the fingerprint similarity, and the SMSF score between the analyte and its corresponding entry in the enumerated database.
    • restrict_mw: A boolean value. If true, similarity is computed only for compounds whose molecular weight is within plus or minus one of the analyte's molecular weight.
    • top_n_of_JS_of_MS: A non-negative integer. It sets how many of the highest peaks are used to compute the Jaccard similarity between the mass spectra of two compounds. If set to zero, the Jaccard similarity of the mass spectra is not calculated.
./build/drug-comparation/cfm_comparation \
  ${database_file} \
  ${test_file} \
  ${result.txt} \
  ${scores.txt} \
  ${restrict_mw} ${top_n_of_JS_of_MS}
  • Condition 2: The identities of the analytes are unknown.
    • database_file: msp file containing the mass spectra and fingerprints of the enumerated compounds.
    • test_file: msp file containing the mass spectra and fingerprints of the analytes.
    • result.txt: Records the result for every analyte. For each analyte, the top 100 compounds are listed with their SMILES and SMSF scores.
    • restrict_mw: A boolean value. If true, similarity is computed only for compounds whose molecular weight is within plus or minus one of the analyte's molecular weight.
    • top_n_of_JS_of_MS: A non-negative integer. It sets how many of the highest peaks are used to compute the Jaccard similarity between the mass spectra of two compounds. If set to zero, the Jaccard similarity of the mass spectra is not calculated.
./build/drug-comparation/cfm_comparation_without_answer \
  ${database_file} \
  ${test_file} \
  ${result.txt} \
  ${restrict_mw} ${top_n_of_JS_of_MS}
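
The quantities named in the parameter descriptions above can be illustrated with a short Python sketch: cosine similarity between two spectra, Jaccard similarity over the m/z values of the top-n peaks, and the molecular-weight window used by restrict_mw. This is not the project's C++ implementation. Representing a spectrum as an `{m/z: intensity}` dict and all function names are assumptions. How these values are combined into the SMSF score is not described here, so it is not reproduced.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts."""
    dot = sum(inten * spec_b.get(mz, 0.0) for mz, inten in spec_a.items())
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_peaks(spec, n):
    """Return the set of m/z values of the n most intense peaks."""
    ranked = sorted(spec.items(), key=lambda item: item[1], reverse=True)
    return {mz for mz, _ in ranked[:n]}


def jaccard_top_n(spec_a, spec_b, n):
    """Jaccard similarity between the top-n peak positions of two spectra."""
    a, b = top_n_peaks(spec_a, n), top_n_peaks(spec_b, n)
    union = a | b
    # With n == 0 both sets are empty; this sketch returns 0.0 in that case,
    # whereas the tool described above skips the calculation entirely.
    return len(a & b) / len(union) if union else 0.0


def within_mw_window(candidate_mw, analyte_mw, tolerance=1.0):
    """True if the candidate's molecular weight is within +/- tolerance."""
    return abs(candidate_mw - analyte_mw) <= tolerance


if __name__ == "__main__":
    # Made-up spectra, used only to exercise the functions.
    analyte = {51: 10.0, 77: 50.0, 105: 100.0}
    candidate = {77: 100.0, 120: 100.0}
    print(cosine_similarity(analyte, candidate))
    print(jaccard_top_n(analyte, candidate, 2))
    print(within_mw_window(150.2, 149.3))
```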

Contributors

blenderwang9487, csweichen
